DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Objections
Claims 7 and 15 are objected to because of the following informalities: Claims 7 and 15 recite “the 2D mask” instead of “the additional 2D mask.” Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 4-7, 9-11, 14-15, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Simons et al. (US 2018/0350030) in view of Wang et al. (US 2024/0135483) and Kamath et al. (US 2019/0213705).
Regarding claim 1, Simons teaches/suggests: A computer-implemented method for performing style transfer, the method comprising:
converting a style sample into a first set of semantic features and a first set of visual features (Simons [0053] “a segmentation guide is generated by dividing the target image and the style exemplar image into various regions or features … An example of a semantic feature can include, but is not limited to, the hair, eyebrow, nose, lip, oral cavity, eye, skin, chin, etc. of the first or second person's head in the target image and style exemplar image” [0095] “the computer graphics system 102 can generate a skin region trimap for the first and second character by converting the target image and the style exemplar image to YCbCr color space and determining a likelihood of each pixel in the target image and each pixel in the style exemplar image being a skin pixel” [The style exemplar image meets the style sample.]);
determining a set of content samples corresponding to a three-dimensional (3D) scene (Simons [0050] “data indicating a target image and a style exemplar image is obtained or received” [0023] “generating or synthesizing stylized images (e.g., frames) or animations (e.g., videos)” [The video frames meet the content samples.]);
for each content sample included in the set of content samples (Simons [0023] “generating or synthesizing stylized images (e.g., frames) or animations (e.g., videos)”):
converting the content sample into an additional set of semantic features and an additional set of visual features (Simons [0053] “a segmentation guide is generated by dividing the target image and the style exemplar image into various regions or features … An example of a semantic feature can include, but is not limited to, the hair, eyebrow, nose, lip, oral cavity, eye, skin, chin, etc. of the first or second person's head in the target image and style exemplar image” [0095] “the computer graphics system 102 can generate a skin region trimap for the first and second character by converting the target image and the style exemplar image to YCbCr color space and determining a likelihood of each pixel in the target image and each pixel in the style exemplar image being a skin pixel” [The target image meets the content sample.]); and
determining a set of matches between (i) the additional set of semantic features and the additional set of visual features and (ii) the first set of semantic features and the first set of visual features (Simons [0060] “the computer graphics system 102 identifies one or more of the high-priority regions or features of the target image and one or more corresponding high-priority regions or features of the style exemplar image”); and
generating a style transfer result (Simons [0023] “generating or synthesizing stylized images (e.g., frames) or animations (e.g., videos)” [0069] “a stylized image is generated or created by applying a style or texture of the particular style feature to the particular target feature”),
wherein the style transfer result comprises one or more structural elements of the 3D scene and one or more stylistic elements of the style sample (Simons [0046] “the computer graphics system 102 uses various algorithms to generate stylized animations that preserve the identity of an object or character in the target image (e.g., preserves the identity of the first person in the target image) and the visual richness of the style exemplar image (e.g., by retaining the local textural details of the style exemplar image)”).
Simons does not teach/suggest a set of content samples corresponding to a plurality of views of a three-dimensional (3D) scene. Wang, however, teaches/suggests a plurality of views of a three-dimensional (3D) scene (Wang [0244] “generating dynamic free-viewpoint videos from multiple synchronized video streams … multi-viewpoint videos”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to apply the stylization of Simons to the multiple viewpoints of a video, as taught/suggested by Wang, in order to stylize a free-viewpoint video.
Nor does Simons teach/suggest a style transfer result that includes a representation of the 3D scene. Wang further teaches/suggests a representation of the 3D scene (Wang [0246] “Neural Radiance Fields (NeRFs) are a specific neural representation that trains a neural network (e.g., a multilayer perceptron (MLP)) to represent the appearance of a 3D scene”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the stylized scene of Simons to include the NeRF of Wang to represent its appearance.
Simons as modified by Wang does not teach/suggest a style transfer result based on one or more losses associated with the sets of matches determined for the set of content samples. Kamath, however, teaches/suggests one or more losses (Kamath [0135] “This pattern is then stylized, using a neural network (typically previously-trained), to produce a stylized output image” [0153]-[0154] “Two perceptual scalar loss functions are used in this process, to measure high-level semantic and perceptual differences between images … The first loss function indicates similarity to the originally-input image”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the computer graphics system of Simons as modified by Wang to include the neural network and perceptual loss functions of Kamath to perform the stylization via machine learning.
As such, Simons as modified by Wang and Kamath teaches/suggests a style transfer result based on one or more losses associated with the sets of matches determined for the set of content samples (Simons [0023] “generating or synthesizing stylized images (e.g., frames) or animations (e.g., videos)” [0060] “the computer graphics system 102 identifies one or more of the high-priority regions or features of the target image and one or more corresponding high-priority regions or features of the style exemplar image” Kamath [0153]-[0154] “Two perceptual scalar loss functions are used in this process, to measure high-level semantic and perceptual differences between images … The first loss function indicates similarity to the originally-input image”).
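For illustration of the mapped limitations, the following is a minimal, hypothetical Python sketch of nearest-neighbor matching over combined semantic and visual features, with the mean matched distance serving as a loss. The feature extractors, array shapes, and values are placeholders and do not represent the actual implementations of Simons, Wang, or Kamath.

    # Hypothetical sketch only; the extractors below are stand-ins, not the
    # segmentation guide of Simons or the networks of Wang/Kamath.
    import numpy as np

    def extract_features(image):
        """Return (semantic, visual) feature vectors for each pixel.

        A real system would use a segmentation network for the semantic
        features and a pretrained CNN for the visual features.
        """
        semantic = image.mean(axis=-1, keepdims=True)            # stand-in class score
        visual = np.stack([image.mean(-1), image.std(-1)], -1)   # stand-in texture stats
        return semantic.reshape(-1, 1), visual.reshape(-1, 2)

    def match_and_score(content_img, style_img):
        """Nearest-neighbor matches over concatenated semantic+visual features."""
        c_sem, c_vis = extract_features(content_img)
        s_sem, s_vis = extract_features(style_img)
        c = np.concatenate([c_sem, c_vis], axis=1)
        s = np.concatenate([s_sem, s_vis], axis=1)
        d = np.linalg.norm(c[:, None, :] - s[None, :, :], axis=-1)
        matches = d.argmin(axis=1)                    # closest style feature per content feature
        loss = d[np.arange(len(c)), matches].mean()   # mean matched distance as a loss
        return matches, loss

    content = np.random.rand(8, 8, 3)   # one rendered view of the 3D scene
    style = np.random.rand(8, 8, 3)     # the style exemplar
    matches, loss = match_and_score(content, style)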
Regarding claim 4, Simons as modified by Wang and Kamath teaches/suggests: The computer-implemented method of claim 1, wherein the style sample includes a 2D depiction of one or more of a painting, a sketch, a drawing, or a photograph (Simons [0003] “the computer graphics system creates a stylized image that mimics an artistic style or texture and looks like the target image … the computer graphics system may receive, as a target image, a photograph of a person”).
Regarding claim 5, Simons as modified by Wang and Kamath teaches/suggests: The computer-implemented method of claim 1, wherein the one or more structural elements include one or more of objects, lines, surfaces, or backgrounds, and the one or more stylistic elements include one or more of patterns, colors, textures, or lighting characteristics (Simons [0046] “the computer graphics system 102 uses various algorithms to generate stylized animations that preserve the identity of an object or character in the target image (e.g., preserves the identity of the first person in the target image) and the visual richness of the style exemplar image (e.g., by retaining the local textural details of the style exemplar image)”).
Regarding claim 6, Simons as modified by Wang and Kamath teaches/suggests: The computer-implemented method of claim 1, wherein generating the style transfer result comprises:
computing the one or more losses based on a set of distances associated with visual features included in the sets of matches determined for the set of content samples (Simons [0023] “generating or synthesizing stylized images (e.g., frames) or animations (e.g., videos)” [0060] “the computer graphics system 102 identifies one or more of the high-priority regions or features of the target image and one or more corresponding high-priority regions or features of the style exemplar image” Kamath [0154] “The feature reconstruction loss is the (squared, normalized) Euclidean distance between feature representations”); and
iteratively modifying the representation of the 3D scene based on the one or more losses (Wang [0246] “Neural Radiance Fields (NeRFs) are a specific neural representation that trains a neural network (e.g., a multilayer perceptron (MLP)) to represent the appearance of a 3D scene” [0211] “The output of the network is compared to the desired output using a loss function and an error value is calculated for each of the neurons in the output layer … The network can then learn from those errors using an algorithm, such as the stochastic gradient descent algorithm, to update the weights of the of the neural network” Kamath [0153]-[0154] “Two perceptual scalar loss functions are used in this process, to measure high-level semantic and perceptual differences between images … The feature reconstruction loss is the (squared, normalized) Euclidean distance between feature representations”).
The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.
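To illustrate the combined teaching applied in claim 6, the following is a hypothetical, deliberately simplified Python sketch of gradient descent on a squared-distance loss to iteratively modify the parameters of a differentiable scene representation. The renderer, gradient, and dimensions are placeholders; the NeRF training described by Wang and the loss functions of Kamath are far more involved.

    # Hypothetical, simplified sketch: iteratively updating a scene
    # representation by descending a squared-distance loss (cf. Kamath [0154]).
    import numpy as np

    def render(params):
        """Stand-in differentiable renderer: parameters -> rendered features."""
        return np.tanh(params)

    params = np.zeros(16)             # scene-representation weights (e.g., an MLP)
    style_feats = np.random.rand(16)  # features of the matched style regions
    lr = 0.5
    for step in range(200):
        rendered = render(params)
        residual = rendered - style_feats              # gradient of the squared Euclidean loss
        grad = 2 * residual * (1 - rendered ** 2) / params.size
        params -= lr * grad                            # iterative modification of the representation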
Regarding claim 7, Simons as modified by Wang and Kamath teaches/suggests: The computer-implemented method of claim 1, wherein determining the set of matches comprises:
determining a set of two-dimensional (2D) masks associated with the set of content samples and an additional 2D mask associated with the style sample (Simons [0023]-[0024] “generating or synthesizing stylized images (e.g., frames) or animations (e.g., videos) … the computer graphics device generates the segmentation guide for both the style exemplar image and the target image by creating a mask or soft mask of one or more features of the target image and the style exemplar image”); and
determining the set of matches between (i) a subset of the additional set of semantic features and the additional set of visual features associated with the set of 2D masks and (ii) a subset of the first set of semantic features and the first set of visual features associated with the [additional] 2D mask (Simons [0041] “the computer graphics application 140 identifies or detects one or more of the regions or features of the style exemplar image and target image using the segmentation guide” [0024] “the computer graphics device generates the segmentation guide for both the style exemplar image and the target image by creating a mask or soft mask of one or more features of the target image and the style exemplar image”).
The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.
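To illustrate the mask-restricted matching mapped above, the following hypothetical Python sketch limits the claim-1 style matching to features falling inside corresponding 2D masks. The boolean masks and feature arrays are arbitrary placeholders, not the segmentation guide of Simons.

    # Hypothetical sketch: restrict feature matching to masked regions only.
    import numpy as np

    def masked_match(c_feats, c_mask, s_feats, s_mask):
        """Match only content features inside c_mask against style features
        inside s_mask; returns index pairs into the full feature arrays."""
        c_idx = np.flatnonzero(c_mask)
        s_idx = np.flatnonzero(s_mask)
        d = np.linalg.norm(c_feats[c_idx, None, :] - s_feats[None, s_idx, :], axis=-1)
        return c_idx, s_idx[d.argmin(axis=1)]   # per masked content feature, its match

    c_feats = np.random.rand(16, 3); s_feats = np.random.rand(16, 3)
    c_mask = np.arange(16) < 8             # a 2D mask on the content sample (flattened)
    s_mask = np.arange(16) >= 8            # the additional 2D mask on the style sample
    content_ids, style_ids = masked_match(c_feats, c_mask, s_feats, s_mask)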
Regarding claim 9, Simons as modified by Wang and Kamath teaches/suggests: The computer-implemented method of claim 1, wherein each content sample included in the set of content samples includes a two-dimensional (2D) rendering of the 3D scene (Simons [0023] “generating or synthesizing stylized images (e.g., frames) or animations (e.g., videos)” Wang [0002] “This neural representation is trained to encode structural and color information that can be used to render a 2D image of the scene from novel viewpoints”). The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.
Regarding claim 10, Simons as modified by Wang and Kamath teaches/suggests: The computer-implemented method of claim 1, wherein the representation of the 3D scene comprises a neural radiance field (NeRF) (Wang [0246] “Neural Radiance Fields (NeRFs) are a specific neural representation that trains a neural network (e.g., a multilayer perceptron (MLP)) to represent the appearance of a 3D scene”). The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.
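For context on the cited NeRF representation, the following is a minimal, hypothetical Python sketch of a NeRF-style forward pass: a positional encoding followed by an MLP mapping a 3D point to a volume density and an RGB color. The random weights and layer sizes are placeholders; this is not the implementation described by Wang.

    # Minimal, hypothetical NeRF-style forward pass (not Wang's implementation).
    import numpy as np

    def positional_encoding(x, n_freqs=4):
        """Map coordinates to sin/cos features at increasing frequencies."""
        feats = [x]
        for k in range(n_freqs):
            feats += [np.sin((2.0 ** k) * np.pi * x), np.cos((2.0 ** k) * np.pi * x)]
        return np.concatenate(feats, axis=-1)

    rng = np.random.default_rng(0)
    in_dim = 3 * (1 + 2 * 4)                        # 3 coords, 4 sin/cos frequency pairs
    W1, b1 = rng.normal(size=(in_dim, 64)), np.zeros(64)
    W2, b2 = rng.normal(size=(64, 4)), np.zeros(4)  # outputs: density + RGB

    def nerf_mlp(points):
        h = np.maximum(positional_encoding(points) @ W1 + b1, 0.0)  # ReLU hidden layer
        out = h @ W2 + b2
        density = np.maximum(out[:, 0], 0.0)         # non-negative volume density
        rgb = 1.0 / (1.0 + np.exp(-out[:, 1:]))      # colors squashed to [0, 1]
        return density, rgb

    density, rgb = nerf_mlp(rng.uniform(-1, 1, size=(128, 3)))  # points sampled along rays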
Claims 11, 14-15, and 17 recite limitations similar in scope to those of claims 1, 6-7, and 19, respectively, and are rejected for the same reasons. Simons as modified by Wang and Kamath further teaches/suggests one or more non-transitory computer-readable media storing instructions (Simons [0039] “the computer graphics system 102 includes the computer graphics application 140, which can include one or more instructions stored on a computer-readable storage medium and executable by processors of the computing device 104”).
Claim 18 recites limitations similar in scope to those of claim 1, and is rejected for the same reasons. Simons as modified by Wang and Kamath further teaches/suggests one or more memories storing instructions; and one or more processors for executing the instructions (Simons [0039] “the computer graphics system 102 includes the computer graphics application 140, which can include one or more instructions stored on a computer-readable storage medium and executable by processors of the computing device 104”).
Claims 2-3, 12-13, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Simons et al. (US 2018/0350030) in view of Wang et al. (US 2024/0135483) and Kamath et al. (US 2019/0213705) as applied to claims 1, 11, and 18 above, and further in view of Ma et al. (US 2018/0060684).
Regarding claim 2, Simons as modified by Wang and Kamath does not teach/suggest: The computer-implemented method of claim 1, wherein determining the set of matches comprises computing a distance based on (i) a subset of the additional set of semantic features associated with a portion of the content sample, (ii) a subset of the additional set of visual features associated with the portion of the content sample, (iii) a subset of the first set of semantic features associated with a portion of the style sample, and (iv) a subset of the first set of visual features associated with the portion of the style sample. Ma, however, teaches/suggests computing a distance (Ma [0123] “whether two features are similar can be reflected by the size of an Euclidean distance”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the computer graphics system of Simons as modified by Wang and Kamath to include the Euclidean distance as taught/suggested by Ma to identify the corresponding regions/features.
As such, Simons as modified by Wang, Kamath, and Ma teaches/suggests computing a distance based on (i) a subset of the additional set of semantic features associated with a portion of the content sample, (ii) a subset of the additional set of visual features associated with the portion of the content sample, (iii) a subset of the first set of semantic features associated with a portion of the style sample, and (iv) a subset of the first set of visual features associated with the portion of the style sample (Simons [0060] “the computer graphics system 102 identifies one or more of the high-priority regions or features of the target image and one or more corresponding high-priority regions or features of the style exemplar image” Ma [0123] “the Euclidean distance is used to represent the texture similarity distance, color similarity distance and semantic attribute similarity distance” [0129] “according to different weights for three similarity distances calculated in Steps S202-S204, that is to fuse the similarity distances so as to comprehensively reflect the appearance similarity between the first image and the second image by three appearance features, i.e., texture, color and semantic attribute features”).
Regarding claim 3, Simons as modified by Wang, Kamath, and Ma teaches/suggests: The computer-implemented method of claim 2, wherein the distance comprises a weighted combination of (i) a first distance between the subset of the additional set of semantic features and the subset of the first set of semantic features and (ii) a second distance between the subset of the additional set of visual features and the subset of the first set of visual features (Ma [0123] “the Euclidean distance is used to represent the texture similarity distance, color similarity distance and semantic attribute similarity distance” [0129] “according to different weights for three similarity distances calculated in Steps S202-S204, that is to fuse the similarity distances so as to comprehensively reflect the appearance similarity between the first image and the second image by three appearance features, i.e., texture, color and semantic attribute features”). The same rationale to combine as set forth in the rejection of claim 2 above is incorporated herein.
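As an illustration of the weighted fusion described by Ma, the following hypothetical Python sketch combines a semantic-feature distance and a visual-feature distance into a single weighted score. The weights and feature vectors are arbitrary placeholders, not values taught by Ma.

    # Illustrative-only weighted semantic/visual distance (cf. Ma [0123], [0129]).
    import numpy as np

    def weighted_match_distance(c_sem, c_vis, s_sem, s_vis, w_sem=0.6, w_vis=0.4):
        """Fuse a semantic-feature distance and a visual-feature distance.

        c_*/s_* are feature vectors for a portion of the content sample and
        a portion of the style sample, respectively.
        """
        d_sem = np.linalg.norm(np.asarray(c_sem) - np.asarray(s_sem))  # first distance
        d_vis = np.linalg.norm(np.asarray(c_vis) - np.asarray(s_vis))  # second distance
        return w_sem * d_sem + w_vis * d_vis                           # weighted combination

    d = weighted_match_distance([0.2, 0.8], [0.1, 0.5, 0.9], [0.3, 0.7], [0.2, 0.4, 1.0])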
Claims 12 and 19 recite limitations similar in scope to those of claim 2, and claims 13 and 20 recite limitations similar in scope to those of claim 3; all are rejected for the same reasons.
Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Simons et al. (US 2018/0350030) in view of Wang et al. (US 2024/0135483) and Kamath et al. (US 2019/0213705) as applied to claims 7 and 15 above, and further in view of Sommerlade et al. (US 2022/0383034).
Regarding claim 8, Simons as modified by Wang and Kamath does not teach/suggest: The computer-implemented method of claim 7, wherein determining the set of 2D masks and the additional 2D mask comprises matching the set of 2D masks to the additional 2D mask based on a label associated with the set of 2D masks and the additional 2D mask. Sommerlade, however, teaches/suggests a label (Sommerlade [0051] “the semantic labeler 320 may classify every pixel in the input image according to a given class ... provides a pixel mask that labels hair adjacent to a detected face”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the computer graphics system of Simons as modified by Wang and Kamath to include the semantic labeler of Sommerlade for classification.
As such, Simons as modified by Wang, Kamath, and Sommerlade teaches/suggests matching the set of 2D masks to the additional 2D mask based on a label associated with the set of 2D masks and the additional 2D mask (Simons [0024] “the computer graphics device generates the segmentation guide for both the style exemplar image and the target image by creating a mask or soft mask of one or more features of the target image and the style exemplar image” Sommerlade [0051] “the semantic labeler 320 may classify every pixel in the input image according to a given class ... provides a pixel mask that labels hair adjacent to a detected face”).
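For illustration of label-based mask matching, a minimal hypothetical Python sketch follows; the labels and pixel sets are invented placeholders, not the semantic labeler of Sommerlade.

    # Illustrative-only: match content-side 2D masks to the style-side mask
    # by shared semantic label (cf. Sommerlade [0051]); data is made up.
    content_masks = [                      # masks for the rendered content samples
        {"label": "hair", "pixels": {(0, 1), (0, 2)}},
        {"label": "skin", "pixels": {(5, 5), (5, 6)}},
    ]
    style_mask = {"label": "hair", "pixels": {(3, 3), (3, 4)}}  # the additional 2D mask

    matched = [m for m in content_masks if m["label"] == style_mask["label"]]
    # Feature matching (claim 7) would then be restricted to these matched regions.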
Claim 16 recites limitations similar in scope to those of claim 8, and is rejected for the same reasons.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 2019/0244060 – domain stylization
US 2022/0383465 – multiview style transfer
US 2023/0351566 – appearance transfer by correspondence
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANH-TUAN V NGUYEN, whose telephone number is 571-270-7513. The examiner can normally be reached M-F, 9AM-5PM ET.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, JASON CHAN, can be reached at 571-272-3022. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANH-TUAN V NGUYEN/
Primary Examiner, Art Unit 2619