DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1, 2, 4, and 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zafeiriou et al.1 (“Zafeiriou”) in view of Gupta et al.2 (“Gupta”).
Regarding claim 1, Zafeiriou teaches a method, comprising:
receiving, by a processing device, an input image depicting a human subject that includes lighting effects (see Zafeiriou, paragraphs 0024-0026 and figure 1 teaching “2D image 102 comprising a face is input”, where lighting effects are included given that the images are “unconstrained” and the technique addresses “2D images captured in arbitrary recording conditions (also referred to as “in-the-wild”)” as in paragraph 0003; for example, because the 2D input is put through a de-lighting process and a re-lighting process, it is clear that the input image includes lighting effects which are then processed and analyzed);
generating, by the processing device, a segmentation mask that divides the input image into multiple semantic segments each representing a different portion of the human subject depicted in the input image (note that a segmentation mask is any mask or collection of data values representing some segment or smaller part of the whole of the input image; for example, each segment is not required to be semantically linked to a specific body part, nor must each segment only represent a specific body part. If input image data is divided in any manner to assign some classification or semantic category of any kind that separates it from another classification or semantic category, then this would be a segmentation mask that divides the input image into multiple semantic segments. Importantly, note that the segments must each represent a different portion of the human subject in the input image, but the semantic feature assigned to each portion does not need to be different from the semantic feature of any other portion; see Zafeiriou, paragraphs 0027-0033 teaching the method “one or more fitting neural networks 104 generates the… the low resolution 2D texture map 106” where “low resolution 2d texture map 106 may be any 2D map that can represent 3D textures. An example of such a map is a UV map. A UV map is a 2D representation of a 3D surface or mesh. Points in 3D space (for example described by (x, y, z) co-ordinates) are mapped onto a 2D space (described by (u, v) co-ordinates). A UV map may be formed by unwrapping a 3D mesh in a 3D space onto the u-v plane in the 2D UV space, and storing parameters associated with the 3D surface at each point in UV space. A texture UV map 110 may be formed by storing colour values of the vertices of a 3D surface/mesh in the 3D space at corresponding points in the UV space”, such that the UV map functions as the segmentation mask by partitioning the continuous 3D surface of the human subject into different 2D segments on the map, where one segment may represent a cheek portion and another a forehead, and the like, such that these are multiple segments each representing a different portion of the human subject. These segments, representing each pixel and portion of the face, result from dividing the input image into multiple segments, as in paragraphs 0025-0026 teaching the “input image may be cropped from a larger image based on detection of a face in the larger image”, such that this provides a segmentation mask whose values can then be used to generate a skin tone mask and other information);
generating, by the processing device, a skin tone mask identifying one or more color values for a skin region of the human subject depicted in the input image (note that a skin tone mask as defined is any mask or identified collection of values that identifies color values for a skin region of the human subject; see Zafeiriou, paragraphs 0025-0033 teaching “texture UV map 110 may be formed by storing colour values of the vertices of a 3D surface/mesh in the 3D space at corresponding points in the UV space”, such that the “2D texture map” functions as a skin tone mask identifying the color of the skin of the subject. Note further that the skin tone mask identifying the skin region colors may also be considered to be the high resolution texture map 312 derived from the low resolution texture map segments, as this gives the skin tone colors as a mask of UV mapping values that identify the colors to use to generate the diffuse albedo image); and
generating, by the processing device and using a machine learning lighting removal network, an unlit image by removing the lighting effects from the input image based on the segmentation mask and the skin tone mask (note that an “unlit image” as generated is unlit if there is any removing of the lighting effects of the initial subject from the input image; furthermore, the manner in which a “machine learning lighting removal network” is utilized to generate the unlit image is not specific and may include machine learning at any point in the pipeline, which would make such generating done by a network or module that relies on machine learning in some way. For example, the claims do not require a specific input/output relationship between the segmentation mask and skin tone mask to generate the unlit image; rather, “based on” such information an unlit image must be generated; see Zafeiriou, paragraphs 0025-0038 and figure 1 teaching a “2D diffuse albedo map 116 is generated from the high resolution 2D texture map 112 using an image-to-image translation neural network 114 (also referred to herein as a “de-lighting image-to-image translation network”)” and “delighting neural network may be pre-trained to generate un-lit diffuse albedos” in order to remove “baked illumination (e.g. reflection, shadows)”, where of course a “neural network” is a machine learning lighting removal network as it removes such lighting to create unlit images).
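For purposes of illustration only, the following non-limiting Python/NumPy sketch shows the claim 1 pipeline as interpreted above; the model callables (segment_human, delighting_net) and the single-skin-label assumption are hypothetical and do not represent the actual implementations of Zafeiriou or of the instant application.

    # Minimal sketch of the claim 1 pipeline; all model names are hypothetical.
    import numpy as np

    def build_skin_tone_mask(image, seg_mask, skin_labels):
        # Identify one or more color values for the skin region by averaging
        # the pixel colors of the segments labeled as skin, then fill those
        # segments with that color.
        skin_region = np.isin(seg_mask, skin_labels)        # boolean (H, W)
        skin_color = image[skin_region].mean(axis=0)        # one RGB value
        skin_tone_mask = np.zeros_like(image, dtype=np.float32)
        skin_tone_mask[skin_region] = skin_color
        return skin_tone_mask

    def delight(image, segment_human, delighting_net):
        # Segmentation mask: one integer label per pixel (e.g., face, hair).
        seg_mask = segment_human(image)                     # (H, W) ints
        skin_tone_mask = build_skin_tone_mask(image, seg_mask, skin_labels=[1])
        # The lighting removal network generates the unlit image based on
        # the input image, the segmentation mask, and the skin tone mask.
        return delighting_net(image, seg_mask, skin_tone_mask)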
Zafeiriou teaches all of the above, but fails to specifically teach that the segmentation mask divides the input image into multiple semantic segments, as Zafeiriou is silent as to whether the cropping that results in the divided image including the segments is based on a semantic segmentation in which pixels are assigned some classification or semantic category; it is therefore possible that the determination of the mask relied on a cropping technique that did not comprise such a classification or assignment of a semantic category or class. Thus Zafeiriou stands as a base device upon which the claimed invention can be seen as an improvement through a segmentation technique that provides a segmentation mask that divides the input image into multiple semantic segments and provides such semantic segments for further processing, which would improve upon the more basic masking and/or cropping that may include extraneous pixels and data.
In the same field of endeavor relating to object and skin detection for image editing, Gupta teaches that it is known in image editing that objects appearing in images with humans may have skin-like tones, and that the ability to enhance or edit images is improved by selecting only areas that belong to actual skin such that these areas are edited/enhanced. Gupta teaches to generate a segmentation mask that divides the input image into multiple semantic segments each representing a different portion of the human subject depicted in the input image and then generating further image data based on the segmentation mask (see Gupta, paragraphs 0015-0018 teaching that the “system initially determines a bounding box for each person depicted in a digital image” and “generates object masks for each person depicted in the digital image” where “an object mask indicates pixels of a digital image that correspond to a person surrounded by the respective bounding box”, and then this information may be further processed by additional stages; this object mask which “indicates pixels of a digital image that correspond to a person surrounded by the respective bounding box” is a segmentation mask that divides the input image into multiple semantic segments each representing a different portion of the human subject, as they have been assigned a semantic category such as person, as opposed to other pixels which do not make up the mask, which have been assigned a category other than person and may be excluded). Thus Gupta provides a known technique applicable to the base system of Zafeiriou.
Therefore it would have been obvious for one of ordinary skill in the art before the effective filing date to modify Zafeiriou to apply the known techniques of Gupta above, as doing so would be no more than the application of a known technique to a base system ready for improvement. Zafeiriou already teaches to divide an input image into multiple segments, with an input image cropped to bound a face, and to use the segmented regions for further processing, and Gupta’s technique for isolating the human subjects and portions of the human could be adapted such that the cropping in Zafeiriou is done according to Gupta’s technique, which provides multiple semantic segments that correspond to different portions of the human subject as it segments the pixels that correspond to a person and not the other pixels. The application of Gupta to Zafeiriou would yield the predictable result that the regions which belong to humans would be the ones considered in the further processing stages of Zafeiriou, such that the further stages are based on only relevant pixels without including extraneous background pixels, for example. This would also result in an improved system, as it would ensure that only relevant information is used to generate the image information for the next stages and would ensure that the information being mapped corresponds to the 3D model in Zafeiriou without any outliers.
Regarding claim 10, Zafeiriou as modified teaches all that is required as applied to claim 1 above and further teaches generating, by the processing device, a lighting representation that represents the lighting effects removed from the input image (see Zafeiriou, paragraphs 0024-0036 teaching generating such a lighting representation through generation of the albedo map, which represents the lighting effects removed from the input image by showing the subject without such lighting effects).
Regarding claim 2, Zafeiriou as modified teaches all that is required as applied to claim 1 above and further teaches wherein the generating the skin tone mask includes:
receiving user input specifying the one or more color values (see Zafeiriou, paragraphs 0025-0029 teaching “low resolution 2d texture map 106 may be any 2D map that can represent 3D textures. An example of such a map is a UV map. A UV map is a 2D representation of a 3D surface or mesh. Points in 3D space (for example described by (x, y, z) co-ordinates) are mapped onto a 2D space (described by (u, v) co-ordinates). A UV map may be formed by unwrapping a 3D mesh in a 3D space onto the u-v plane in the 2D UV space, and storing parameters associated with the 3D surface at each point in UV space. A texture UV map 110 may be formed by storing colour values of the vertices of a 3D surface/mesh in the 3D space at corresponding points in the UV space”, such that here the program provides input specifying that the one or more color values are those corresponding to the 2D image and the 3D space created, i.e., it is specified that one or more color values are to be filled in corresponding to the UV patches and the 2D image);
selecting one or more of the multiple semantic segments as the skin region (see Zafeiriou, paragraphs 0025-0029, the UV map passage quoted above, such that here the patches corresponding to the skin are selected as the UV patches in the UV space); and
filling the skin region with the one or more color values (see Zafeiriou, paragraphs 0025-0029, the UV map passage quoted above, where here the skin region of the patches is filled with the one or more color values specified by the input 2D image). However, Zafeiriou does not teach that the input received is necessarily “user input” that begins, causes, or is otherwise involved with such a process. Thus Zafeiriou stands as a base system upon which the claimed invention can be seen as an improvement by allowing a user control over specifying the one or more color values in some way, which would at least result in an improved ability of the user to guide the process and results, or allow an opportunity to initiate the process in some way.
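For purposes of illustration only, the claim 2 steps as mapped above (selecting segments as the skin region and filling them with specified color values) may be sketched as follows in Python/NumPy; the label-map representation, names, and shapes are illustrative assumptions, not any party’s disclosed implementation.

    import numpy as np

    def fill_skin_region(seg_mask, selected_segments, color_values):
        # Select one or more of the multiple semantic segments as the skin
        # region, then fill that region with the specified color value(s).
        h, w = seg_mask.shape
        skin_tone_mask = np.zeros((h, w, 3), dtype=np.float32)
        skin_region = np.isin(seg_mask, selected_segments)
        skin_tone_mask[skin_region] = np.asarray(color_values, np.float32)
        return skin_tone_mask

    # Example: a specified RGB value fills the segments selected as skin.
    mask = fill_skin_region(np.zeros((8, 8), dtype=int), [0], [0.8, 0.6, 0.5])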
In the same field of endeavor relating to object and skin detection for image editing, Gupta teaches that it is known in image editing that objects appearing in images with humans may have skin-like tones, and that the ability to enhance or edit images is improved by selecting only areas that belong to actual skin such that these areas are edited/enhanced. Gupta teaches to receive user input specifying one or more color values in order to select one or more of multiple semantic regions as skin regions, and then filling the skin regions based on the selection to specify actual skin and its color in the image as opposed to anything with a color close to skin (see Gupta, as explained above regarding the semantic segments, and further paragraphs 0038-0056 teaching “determining portions of a photorealistic digital image that depict a person's skin and as generating data to select the skin without selecting other portions of the image” and “object recognition module 116 and the skin selection module 118 in this example are incorporated as part of a system to generate data effective to select a portion of a digital image 302 that corresponds to exposed skin of a person depicted in the image. The digital image 302 may correspond to the photographic image 112 in accordance with the described techniques. To begin, the object recognition module 116 obtains the digital image 302 for skin selection. The object recognition module 116 may obtain the digital image 302, for instance, in response to a user selection of a skin selection option presented via a user interface, e.g., of the content editing application 104”, such that here, “in response to a user selection of a skin selection option”, a specifying of color values occurs as the system is now meant to specifically look for skin color values that belong to actual skin portions). Thus Gupta provides known techniques applicable to the base system of Zafeiriou above.
Therefore it would have been obvious for one of ordinary skill in the art before the effective filing date to modify Zafeiriou to apply the known techniques of Gupta above, as doing so would be no more than the application of a known technique to a base system ready for improvement. Here the Gupta technique could be used to initiate the de-lighting process in Zafeiriou, with the user first specifying that the image is to go through color selection identifying only skin regions, which are then used as the basis for the 2D image and UV maps that help to generate the unlit image. Such an application would yield the predictable result that only the specified skin colors of human subjects are operated on as in Zafeiriou, such that any extraneous objects and their colors are not utilized to fill the segments corresponding to the input image. This would also result in an improved system, as it would ensure that only relevant information is used to generate the skin tone color information and would ensure that the information being mapped corresponds to the 3D model in Zafeiriou without any outliers.
Regarding claim 4, Zafeiriou as modified teaches all that is required as applied to claim 1 above and further teaches generating, by the processing device, a separation mask that separates the human subject from other depicted objects of the input image, and wherein the generating the unlit image includes conditioning the machine learning lighting removal network on the input image, the separation mask, the segmentation mask, and the skin tone mask (see Zafeiriou, paragraphs 0025-0032 teaching, as explained above, that the input image, segmentation mask, and skin tone mask condition the machine learning lighting removal network to provide the outputs, which are conditioned on this information being created beforehand). However, Zafeiriou fails to teach generating a separation mask that separates the human subject from other depicted objects of the input image and using this information as well to condition the network. Thus Zafeiriou stands as a base device upon which the claimed invention can be seen as an improvement.
In the same field of endeavor relating to object detection in image editing, Gupta teaches known techniques, including to generate a separation mask that separates a human subject from other depicted objects of the input image and then to use such a separation mask as a condition for image processing downstream (see Gupta, paragraphs 0028-0056 teaching “determining portions of a photorealistic digital image that depict a person's skin and as generating data to select the skin without selecting other portions of the image” and “object recognition module 116 and the skin selection module 118 in this example are incorporated as part of a system to generate data effective to select a portion of a digital image 302 that corresponds to exposed skin of a person depicted in the image. The digital image 302 may correspond to the photographic image 112 in accordance with the described techniques. To begin, the object recognition module 116 obtains the digital image 302 for skin selection. The object recognition module 116 may obtain the digital image 302, for instance, in response to a user selection of a skin selection option presented via a user interface, e.g., of the content editing application 104” and “persons are identified in an image, the object recognition module 116 is further configured to determine a boundary of each person detected in the image. Then the skin selection module 118 determines which portions of the photographic image 112 correspond to skin based on the persons recognized by the object recognition module 116. In accordance with the techniques described herein, the skin selection module 118 limits a search for identifying skin-colored pixels to solely the image pixels that correspond to recognized persons—as indicated by data (e.g., person masks) provided by the object recognition module 116” and “By providing selections solely of skin of persons depicted in digital images, and not of other portions of digital images that are simply skin-colored (but are not actually skin), the image processing system 114 aids client device users in performing skin-based editing operations with content editing applications” and “Based in part on pixels within the object masks 310 identified as having skin colors, the skin selection module 118 generates the selected skin data 318. As noted above, this selected skin data 318 indicates the pixels of the digital image 302 that correspond to exposed skin without indicating other portions of the digital image 302, such as portions of the digital image 302 having a same or similar color as skin but that do not actually correspond to depicted skin. The object recognition module 116 and skin selection module 118 can generate selected skin data 318 for one or more persons depicted in the image, and do so without selecting other portions of the image”, where, for example, figures 2 and 4 show a separation mask of the human and a skin map). Thus Gupta provides a known technique applicable to the base system of Zafeiriou.
Therefore it would have been obvious for one of ordinary skill in the art before the effective filing date to modify Zafeiriou with the known techniques of Gupta to arrive at the claimed invention, as doing so would be no more than the application of a known technique to a base system ready for improvement. Here the Gupta technique could be used as a step before cropping to the 2D input image in Zafeiriou, with the system first ensuring other objects are removed from the image before supplying the 2D input image of the subject. Such an application would yield the predictable result that only relevant information from the photo, corresponding to actual skin color of a segment that is actually human, would be used as input to the de-lighting network. This would result in an improved system, as such extraneous objects would not be erroneously mapped to the 2D image texture, ensuring that the texture is more accurate; as suggested by Gupta, by “providing selections solely of skin of persons depicted in digital images, and not of other portions of digital images that are simply skin-colored (but are not actually skin), the image processing system 114 aids client device users in performing skin-based editing operations with content editing applications” (see Gupta, paragraph 0037).
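For purposes of illustration only, the Gupta-style constraint discussed above (limiting the search for skin-colored pixels to the person, i.e., separation, mask) may be sketched as follows; the simple per-channel color-range test and thresholds are simplified assumptions of this illustration, not Gupta’s disclosed classifier.

    import numpy as np

    def select_skin_pixels(image, person_mask, skin_lo, skin_hi):
        # Search for skin-colored pixels only within the pixels indicated
        # by the person mask, so skin-colored background objects (wood,
        # sand, etc.) are never selected.
        in_color_range = np.all((image >= skin_lo) & (image <= skin_hi),
                                axis=-1)                    # (H, W) bool
        return in_color_range & person_mask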
Claim(s) 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zafeiriou as modified as applied to claim 1 above, and further in view of Fu et al.3 (“Fu”).
Regarding claim 3, Zafeiriou as modified teaches all that is required as applied to claim 1 but fails to teach wherein the one or more color values correspond to a single color, and wherein the generating the unlit image includes shifting pixel color values in the skin region of the unlit image to be closer to the single color. Thus Zafeiriou stands as a base device upon which the claimed invention can be considered an improvement.
In the same field of endeavor relating to editing of facial images, Fu teaches to determine when one or more color values correspond to a single color in a segment of an input image and to shift the pixel color values in the skin region of the unlit image to be closer to the single color (see Fu, paragraphs 0120-0121 teaching in a system which recognizes skin portions of a human face, “the lip region image is converted into HSV color space. From experimentation, the inventors herein observed that: the “hue” channel usually changes when the lighting condition has changed or light lipsticks are applied; the “saturation” channel changes when red lipsticks are applied; and the “value” or “brightness” changes when a purple or darker color is applied. Based on these observations, one can edit the corresponding channels with different colors of lipsticks when detected. For lip color removal, the specific reference histogram of “saturation” and “value” can be learned from a collected non-makeup lip dataset. With those predefined lip histograms, an input lip makeup can be removed by matching the detected lip histogram to a corresponding non-makeup histogram. Note that for the “hue” channel, the lip region usually only has one value and does not need to be represented by a histogram” and “a lip appears differently under different lighting conditions. Therefore, the system of the present disclosure takes the lighting condition into consideration to provide a more realistic removal color prediction. For use in lip removal, a skin color dataset is collected under different lighting conditions with corresponding lip color shifting compared with a standard lip color. With this dataset, the system of the present disclosure first extracts the input skin color and finds the corresponding lip color shifting under this specific lighting condition. Then, the final revised removal lip color is provided with the detected color shifting. A skin color dataset is not needed for other removal areas, but is collected for guidance in lip makeup removal. Absent the dataset, a predefined color may also be used to detect the lip”, such that here colors may correspond to a single color of an input image and a “shifting” of color is determined so that the processed image is closer to a single color). Thus Fu provides a known technique applicable to the base system of Zafeiriou.
Therefore it would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Zafeiriou with the teachings of Fu to arrive at the claimed invention, as doing so would be no more than application of a known technique to a base device ready for improvement. Here the technique of Fu could be applied to Zafeiriou so that, predictably, an input image prior to de-lighting would have input skin color that appears closer to a target skin color, such that the generated unlit image would also then appear closer to the selected color as it would be based on this shifted color. This would result in an improved system, as it would allow similar areas to be processed similarly and provide a type of normalization to uniform regions, or regions which should otherwise be uniform, as is done in Fu.
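For purposes of illustration only, the histogram-matching removal described in Fu may be sketched for a single HSV channel (e.g., matching a detected “saturation” histogram to a learned non-makeup reference) via quantile mapping; the channel extraction and reference data are assumptions of this illustration, not Fu’s disclosed implementation.

    import numpy as np

    def match_channel(values, reference):
        # Map each detected channel value to the reference (non-makeup)
        # value at the same quantile, i.e., histogram matching for one
        # channel; 'values' and 'reference' are 1-D float arrays.
        ranks = np.argsort(np.argsort(values)) / max(len(values) - 1, 1)
        return np.quantile(np.sort(reference), ranks)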
Allowable Subject Matter
Claims 12-20 are allowed.
Regarding claim 12, it is most instructive to refer to the limitations of the rejected claims above and the prior art as applied in order to establish the novelty of claim 12. Claim 12, like claim 1, requires that similar operations be performed, such as use of a skin tone color value for a human depicted in an image with lighting effects and using machine learning to generate an unlit image that removes lighting information. Zafeiriou, as explained above, also teaches removing lighting from an image and using a skin tone mask. However, claim 12 requires “generating a second unlit image by shifting color values in the skin region of the first unlit image to be closer to the skin tone value”, which is not required by claim 1.
Furthermore, Zafeiriou does not teach any generation of a second unlit image by shifting color values; rather, after the unlit albedo image is generated, new lighting may be applied by another network, which applies the lighting but does not necessarily shift color values of the first unlit image to be closer to the skin tone value, especially where such skin tone value is specified as claimed. Thus claim 12 distinguishes over Zafeiriou. Note that the other cited prior art also fails to teach such generation of a second unlit image as recited. The Examiner is unable to find any other teaching or suggestion of such limitations, and thus the claim and its progeny are allowed.
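For conceptual illustration only of the shifting operation recited, the following non-limiting Python/NumPy sketch moves skin-region pixels of a first unlit image part-way toward a single skin tone value; the blending-weight formulation and all names are hypothetical assumptions of this illustration, not Applicant’s disclosed implementation.

    import numpy as np

    def shift_toward_skin_tone(unlit, skin_region, skin_tone, strength=0.5):
        # Generate a second unlit image by shifting color values in the
        # skin region of the first unlit image to be closer to the skin
        # tone value; strength in [0, 1] controls how far pixels move.
        second = unlit.astype(np.float32).copy()
        target = np.asarray(skin_tone, dtype=np.float32)
        second[skin_region] += strength * (target - second[skin_region])
        return second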
Regarding claim 16, claim 16 differs from the prior art in that, while it is known to generate an unlit image using a machine learning network as explained above (as in Zafeiriou, for example), it is not known or suggested to generate a lighting representation that represents light effects removed from the input image, where user input editing this lighting representation is then received, and the unlit image is then updated based on the edited lighting representation, the updated unlit image having the lighting effects represented by the edited lighting representation removed from the input image. Rather, while user input at various stages of image editing is known, including to update lighting, such techniques cannot be considered directly applicable to or combinable with techniques such as those claimed and in Zafeiriou, as the effect of user input at various stages in such processing pipelines would not be expected to necessarily yield predictable results. Furthermore, such techniques would not be editing lighting representations that represent lighting removed from an input image by a machine learning lighting removal network. Thus claim 16 is allowable. Claims 17-20 are thus allowable based at least upon being dependent upon an allowable claim.
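For conceptual illustration only, under an assumed intrinsic-image model (input = unlit albedo × shading) that is not taken from the record, editing a lighting representation (here, a shading map) and re-dividing could update the unlit image as follows; this sketch is non-limiting and hypothetical.

    import numpy as np

    def update_unlit(input_image, edited_shading, eps=1e-6):
        # Under the assumed model input = unlit * shading, dividing the
        # input image by the user-edited shading map yields an updated
        # unlit image with the edited lighting effects removed.
        return input_image / np.maximum(edited_shading, eps)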
Claims 5-9 and 11 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Regarding claim 5, the instant claims specify “generating, by the processing device, a combined input feature by concatenating the input image, the skin tone mask, the segmentation mask, and the separation mask; and subdividing, by a patch generation block, the combined input feature into a plurality of patches”. While the cited prior art teaches an input image, skin tone mask, segmentation mask, and separation mask, there is no teaching or suggestion to combine these into an input feature through concatenation, or to subdivide the combined feature into patches as claimed. Thus claim 5 and its dependent claims 6-9 contain allowable subject matter.
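For purposes of illustration only, the claim 5 operations (channel-wise concatenation of the four inputs and subdivision into patches) may be sketched as follows, assuming all inputs are (H, W, C) arrays with H and W divisible by the patch size; the shapes and patch size are illustrative assumptions, not Applicant’s disclosed implementation.

    import numpy as np

    def combined_patches(image, skin_mask, seg_mask, sep_mask, patch=16):
        # Concatenate the input image, skin tone mask, segmentation mask,
        # and separation mask along the channel axis into one combined
        # input feature, then subdivide it into non-overlapping patches.
        feat = np.concatenate([image, skin_mask, seg_mask, sep_mask], axis=-1)
        h, w, c = feat.shape
        feat = feat.reshape(h // patch, patch, w // patch, patch, c)
        return feat.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, c)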
Regarding claim 11, the closest prior art, as in Zafeiriou, does not teach any ability for user editing of the lighting representation where there is an updating of “the unlit image based on the edited lighting representation, the updated unlit image having the lighting effects represented by the edited lighting representation removed from the input image”. Furthermore, the Examiner is unable to find any teaching or suggestion of such user editing of a lighting representation which is used to generate an updated unlit image. Thus the claim contains allowable subject matter.
Response to Arguments
Applicant’s arguments, see “REMARKS”, filed 12/11/2025, with respect to the rejection(s) of claim(s) 1 under 35 U.S.C. 102 have been fully considered and are persuasive insofar as the amendments to the claims do require teachings beyond Zafeiriou, although Applicant’s arguments as to why the claims define over Zafeiriou are not necessarily persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Zafeiriou and Gupta.
Applicant has amended the claim to recite that the segmentation mask is one that “divides the input image into multiple semantic segments” and thus requires that, in some way, a semantic meaning or category is assigned to pixels or larger segments. However, the claims do not require that the multiple semantic segments have different semantic values assigned to different portions of the human subject; rather, even if a same semantic category is assigned to some pixel or larger segment of pixels that depicts a different portion of the human subject, this would still constitute multiple semantic segments each representing a different portion of the human subject. Applicant’s proposed amendment from the interview on 10/15/2025 recited that each segment would represent “different semantic features”, which as discussed at the time was not enough to distinguish over Zafeiriou, as there was no requirement that the multiple segments were necessarily the result of a semantic segmentation. However, as explained in the obviousness rejections above, Zafeiriou does teach segmenting an input image to derive the relevant information for processing, but does not teach that this process necessarily utilizes any assignment of a semantic category or value to provide this segmented data. Thus Gupta is provided, as explained above, which teaches to semantically segment an input image to isolate the human subject from a background category and, in some examples, the category of another human.
Applicant then argues further on pages 10-14 that Zafeiriou’s segmented subject and image data that informs the UV map, for example, cannot be the multiple semantic segments as recited in claim 1 for various reasons. However, in each argument Applicant is arguing limitations which are not present in the claims. In response to applicant's argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., “no operation in Zafeirou to divide the different portions into "multiple semantic segments," or delineate them” and “separating different portions from one another” and “not segmented into distinct semantic regions, such as "a first segment 212 that represents hair of the human subject, a second segment 214 that represents eyebrows of the human subject, a third segment 216 that represents eyes of the human subject, a fourth segment 218 that represents a neck of the human subject, a fifth segment 220 that represents clothing of the human subject, a sixth segment 222 that represents lips of the human subject, and a seventh segment 224 that represents a face of the human subject” and “different semantic features of a human depicted” and “divided into semantic segments corresponding to different semantic features of a human subject”) are not recited in the rejected claim(s). Applicant also provides an explanation of “advantages” of the segmentation mask on pages 13-14 and uses an example of a “hair region” and “region-specific color inconsistency difficulties”, but such advantages are also not necessarily conferred by the invention defined in the disputed claims, as the claims do not contain these concepts or utilize region-specificity. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). It can be seen that Applicant’s claim language and proposed amendments attempt to capture concepts from the Specification in which not only is semantic segmentation important, but also the semantic segmentation is of a kind that can assign different semantic categories to different features of the human subject; however, as explained above, the claims do not sufficiently recite that the segmentation mask must have different semantic categories or the like assigned to different features. Thus Applicant’s arguments are not persuasive, as the proper interpretation of the claims corresponds to the rejections as fully explained above.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SCOTT E SONNERS whose telephone number is (571)270-7504. The examiner can normally be reached Monday-Friday, 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SCOTT E SONNERS/Examiner, Art Unit 2613
/XIAO M WU/Supervisory Patent Examiner, Art Unit 2613
1 US PGPUB No. 20230077187
2 US PGPUB No. 20190180083
3 US PGPUB No. 20190014884