DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 15 is rejected under 35 U.S.C. 101 because claim 15 recites: “A computer program product comprising one or more computer-executable instructions …… ”, and the body of the claim recites computer program steps, such as “the method defined in claim 1”, without being encoded on a non-transitory computer-readable medium; the recited instructions are nothing more than programmed instructions to be performed by the system. Therefore, claim 15 is non-statutory. Similarly, computer programs claimed as computer listings per se, i.e., the descriptions or expressions of the programs, are not physical “things.” They are neither computer components nor statutory processes, as they are not “acts” being performed. Such claimed computer programs do not define any structural and functional interrelationships between the computer program and other claimed elements of a computer which permit the computer program’s functionality to be realized. In contrast, a claimed non-transitory computer-readable medium encoded with a computer program is a computer element which defines structural and functional interrelationships between the computer program and the rest of the computer which permit the computer program’s functionality to be realized, and is thus statutory. Accordingly, it is important to distinguish claims that define descriptive material per se from claims that define statutory inventions.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-4, 8-10 and 13-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over NPL Viazovetskyi et al. (“StyleGAN2 Distillation for Feed-forward Image Manipulation”), hereinafter as Viazovetskyi, in view of NPL Pinkney et al. (“Resolution Dependent GAN Interpolation for Controllable Image Synthesis Between Domains”), hereinafter as Pinkney, further in view of NPL Richardson et al. (“Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation”), hereinafter as Richardson.
Regarding claim 1, Viazovetskyi teaches A method for constructing cartoonization models (Viazovetskyi Page 3, First paragraph, “We propose a way to generate a paired dataset and then train a “student” network on the gathered data. This method is very flexible and is not limited to the particular image-to-image model.”), comprising: generating a predetermined number of sample authentic images using a pre-trained first generative model (Viazovetskyi, Page 6, First paragraph, “We generate 50 000 samples for each task. Each sample consists of two source images and a target image. Each source image is obtained by randomly sampling z from normal distribution, mapping it to intermediate latent code w, and generating image g(w) with StyleGAN2.”, Pinkney teaches the pretrained StyleGAN shown below); ……
acquiring a sample image pair by combining each of the sample authentic images with the corresponding sample cartoon image (Viazovetskyi Page 7, Figure 3, “Fig. 3: Dataset generation. We first sample random vectors z from normal distribution. Then for each z we generate a set of images along the vector ∆ corresponding to a facial attribute. Then for each set of images we select the best pair based on classification results”, Pinkney teaches the cartoon image generated by second GAN shown below); and generating a cartoonization model for converting a target image into a fully cartoonized image by fitting, based on a sample set consisting of multiple sample image pairs, a predetermined initial cartoonization model ……(Viazovetskyi teaches training the pix2pixHD model based on the generated sample image pairs as the cartoonization model, Page 7, 3.2 Training process, “As a result, we choose to train pix2pixHD6 [55] as a unified framework for image-to-image translation instead of selecting a custom model for every type of task …… Light blobs is a problem which is solved in StyleGAN2. We suppose that similar treatment also in use for pix2pixHD.”, Page 3, Second paragraph, “We propose a way to generate a paired dataset and then train a “student” network on the gathered data.”, Pinkney teaches the cartoonization model as shown below).
Viazovetskyi fails to teach constructing a second generative model based on the first generative model, and generating a sample cartoon image corresponding to each of the sample authentic images using the second generative model;…… with a weight corresponding to the second generative model as an initial weight.
Pinkney teaches constructing a second generative model based on the first generative model (Pinkney teaches pbase as the weights of the first generative model, and pinterp as the weights for the second generative mode, Page 2, Second paragraph, “1. Start with a pre-trained model with weights pbase, the Base model, 2. Train the model on a new dataset to create a model via transfer learning with weights ptransfer, the Transferred model. 3. Combine the weights from both original and new Generators into a new set of weights pinterp. 4. The new weights pinterp are then used to create the Interpolated model.”), and generating a sample cartoon image corresponding to each of the sample authentic images using the second generative model (Pinkney teaches using the interpolated model to generate cartoon style images. Page 2, Figure 3, “Figure 3: The interpolated model (a) produces images with the structural characteristics of a cartoon, but with photo-realistic rendering. When comparing the same latent vector input to the original FFHQ model (b) the identity appears largely preserved thus the interpolated model gives a "cartoonification" effect. The right most pair shows a result after encoding an image of Shinzo Abe”).
Viazovetskyi and Pinkney are in the same field of endeavor, namely computer graphics, especially in the field of GAN based image to image translation. Pinkney teaches a layer swapping interpolation scheme in StyleGAN to have a better control of the output image (Pinkney Page 1, Abstract, “It is also desirable to have a level of control so that there is a degree of artistic direction rather than purely curation of random results. Here we present a method for interpolating between generative models of the StyleGAN architecture in a resolution dependent manner. This allows us to generate images from an entirely novel domain and do this with a degree of control over the nature of the output.”). Therefore, it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Pinkney with the method of Viazovetskyi to have a better control of output image.
Viazovetskyi in view of Pinkney fails to teach …… with a weight corresponding to the second generative model as an initial weight. Richardson teaches …… with a weight corresponding to the second generative model as an initial weight (Richardson teaches pixel2style2pixel as the cartoonization model, and it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to substitute the pix2pixHD model of Viazovetskyi with the pixel2style2pixel model of Richardson. Richardson further teaches using a pretrained StyleGAN as the generator in the pSp model, implicitly teaching using the initial weight of the pretrained StyleGAN; Pinkney teaches using StyleGAN as the second generative model, and it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Richardson with Pinkney. Richardson Page 2, Left Column, Third Paragraph, “we propose using our encoder together with the pretrained StyleGAN generator as a complete image-to-image translation framework”; Page 7, Right Column, Last paragraph, “Besides the applications presented above, we have found pSp to be applicable to a wide variety of additional tasks with minimal changes to the training process. Specifically, we present samples of super-resolution and inpainting results using pSp in Figure 1 with further details and results presented in Appendix C. For both tasks, paired data is generated and training is performed in a supervised fashion.”).
Viazovetskyi, Pinkney and Richardson are in the same field of endeavor, namely computer graphics, especially in the field of GAN based image to image translation. Richardson teaches combining pSp with a pretrained StyleGAN to improve the training process (Richardson Page 1, Left Column, Abstract, “We show that solving translation tasks through StyleGAN significantly simplifies the training process, as no adversary is required.”). Therefore, it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Richardson with the method of Viazovetskyi in view of Pinkney to improve training speed.
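For illustration only (this sketch is not code from any cited reference and is not part of the claims), the paired-dataset generation attributed to Viazovetskyi and Pinkney above amounts to sampling a latent z and rendering the same latent through a base generator (authentic image) and a derived generator (cartoon image). The toy lambda "generators" below are hypothetical stand-ins for the StyleGAN2 models:

```python
# Illustrative sketch: build (authentic, cartoon) training pairs by feeding the
# same latent z to two generators. The generators here are toy functions.
import random

def build_paired_dataset(base_gen, cartoon_gen, num_samples, rng):
    pairs = []
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)  # latent sampled from a normal distribution
        pairs.append((base_gen(z), cartoon_gen(z)))  # (authentic, cartoon) pair
    return pairs

rng = random.Random(0)
# Stand-in generators: identity for the base model, doubling for the cartoon model.
pairs = build_paired_dataset(lambda z: z, lambda z: 2 * z, num_samples=3, rng=rng)
```

Each pair keeps the same latent, which is what makes the dataset supervised for the image-to-image "student" network.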
Regarding claim 2, Viazovetskyi in view of Pinkney and Richardson teach The method according to claim 1, wherein constructing the second generative model based on the first generative model comprises: and further teach generating an intermediate cartoon model by adjusting a weight of the first generative model (Pinkney teaches a transferred model as the intermediate cartoon model, Page 2, Second paragraph, “1. Start with a pre-trained model with weights pbase, the Base model. 2. Train the model on a new dataset to create a model via transfer learning with weights ptransfer, the Transferred model.”); generating the second generative model by replacing a weight corresponding to a partially specified layer in the intermediate cartoon model with a weight corresponding to the partially specified layer in the first generative model (Pinkney Page 2, Figure 2, “Figure 2: Schematic of the "layer swapping" interpolation scheme described in Section 2. Each block represents a resolution level in StyleGAN, the final interpolated model is composed of blocks taken from each of the input models depending on the resolution”) and performing weight interpolation (Pinkney Page 2, Second paragraph, “The new weights pinterp are then used to create the Interpolated model. pinterp = (1 − α)pbase + αptransfer”).
Viazovetskyi, Pinkney and Richardson are in the same field of endeavor, namely computer graphics, especially in the field of GAN based image to image translation. Pinkney teaches a layer swapping interpolation scheme in StyleGAN to have better control of the output image (Pinkney Page 1, Abstract, “It is also desirable to have a level of control so that there is a degree of artistic direction rather than purely curation of random results. Here we present a method for interpolating between generative models of the StyleGAN architecture in a resolution dependent manner. This allows us to generate images from an entirely novel domain and do this with a degree of control over the nature of the output.”). Therefore, it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Pinkney with the method of Viazovetskyi and Richardson to have better control of the output image.
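For illustration only (not code from Pinkney), the interpolation pinterp = (1 − α)pbase + αptransfer quoted above, together with layer swapping, can be sketched by giving each resolution level its own α (α = 0 keeps the base weights, α = 1 takes the transferred weights). The layer names and shapes below are hypothetical stand-ins for StyleGAN resolution blocks:

```python
# Illustrative sketch of Pinkney-style per-layer weight interpolation.
# "Layer swapping" is the special case where alpha is 0 or 1 per block.

def interpolate_weights(p_base, p_transfer, alpha_per_layer):
    """Blend two weight dicts layer by layer: (1 - a) * base + a * transfer."""
    p_interp = {}
    for name, w_base in p_base.items():
        alpha = alpha_per_layer.get(name, 0.0)  # default: keep the base weights
        w_transfer = p_transfer[name]
        p_interp[name] = [(1 - alpha) * b + alpha * t
                          for b, t in zip(w_base, w_transfer)]
    return p_interp

base = {"block_32px": [1.0, 2.0], "block_256px": [3.0, 4.0]}
transfer = {"block_32px": [5.0, 6.0], "block_256px": [7.0, 8.0]}
# Swap only the low-resolution block (alpha = 1); keep the high-resolution one.
swapped = interpolate_weights(base, transfer, {"block_32px": 1.0})
```

Low-resolution blocks control coarse structure (e.g., head pose), so swapping only those, as Pinkney's Figure 2 describes, transfers cartoon structure while retaining the base model's fine detail.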
Regarding claim 3, Viazovetskyi in view of Pinkney and Richardson teach The method according to claim 2, and further teach wherein the partially specified layer comprises at least one of: a layer controlling a pose of a character or a layer controlling skin color of a character (Pinkney, Page 1, Last Paragraph, “different resolution layers in the model are responsible for different features in the generated image [6] (e.g. low resolutions control head pose, high resolutions control lighting)”).
Viazovetskyi, Pinkney and Richardson are in the same field of endeavor, namely computer graphics, especially in the field of GAN based image to image translation. Pinkney teaches a layer swapping interpolation scheme in StyleGAN to have a better control of the output image (Pinkney Page 1, Abstract, “It is also desirable to have a level of control so that there is a degree of artistic direction rather than purely curation of random results. Here we present a method for interpolating between generative models of the StyleGAN architecture in a resolution dependent manner. This allows us to generate images from an entirely novel domain and do this with a degree of control over the nature of the output.”). Therefore, it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Pinkney with the method of Viazovetskyi and Richardson to have a better control of output image.
Regarding claim 4, Viazovetskyi in view of Pinkney and Richardson teach The method according to claim 1, and further teach wherein the initial cartoonization model comprises an encoder and a decoder (Richardson Page 8, Right Column, Last paragraph, “Combining our encoder with a StyleGAN decoder, we present a generic framework for solving various image-to-image translation tasks, all using the same architecture.”); and generating the cartoonization model for converting the target image into the fully cartoonized image by fitting, based on the sample set consisting of the multiple sample image pairs, the predetermined initial cartoonization model with the weight corresponding to the second generative model as the initial weight (Please refer to claim 1 for detailed rejection rationale) comprises:
acquiring a corresponding feature map and style attribute information by performing, by the encoder, feature extraction on the sample authentic image in the sample set, and outputting the feature map and the style attribute information to the decoder (Richardson Page 3, Figure 2, “Our pSp architecture. Feature maps are first extracted using a standard feature pyramid over a ResNet backbone. For each of the 18 target styles, a small mapping network is trained to extract the learned styles from the corresponding feature map, where styles (0-2) are generated from the small feature map, (3-6) from the medium feature map, and (7-18) from the largest feature map. The mapping network, map2style, is a small fully convolutional network, which gradually reduces spatial size using a set of 2-strided convolutions followed by LeakyReLU activations. Each generated 512 vector, is fed into StyleGAN”);
and acquiring the cartoonization model by training, by the decoder with the sample cartoon image in the sample set as a training target and the weight of the second generative model as the initial weight, the feature map and the style attribute information using a predetermined loss function (Richardson teaches an input and target pair for training the pSp model, Pinkney teaches an image and cartoon image pair, and it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Richardson with Pinkney, Richardson Page 11, Right Column, Second paragraph, “We train both our model and pix2pixHD [43] in a supervised fashion, where for each input we perform random bi-cubic down-sampling of ×1 (i.e. no down-sampling), ×2, ×4, ×8, ×16, or ×32 and set the original, full resolution image as the target.”, Page 4, Left Column, First paragraph, “While the style-based translation is the core part of our framework, the choice of losses is also crucial. Our encoder is trained using a weighted combination of several objectives. First, we utilize the pixel-wise L2 loss…… In addition, to learn perceptual similarities, we utilize the LPIPS [46] loss…… we additionally define the following regularization loss”).
Viazovetskyi, Pinkney and Richardson are in the same field of endeavor, namely computer graphics, especially in the field of GAN based image to image translation. Richardson teaches combining pSp with a pretrained StyleGAN to improve the training process (Richardson Page 1, Left Column, Abstract, “We show that solving translation tasks through StyleGAN significantly simplifies the training process, as no adversary is required.”). Therefore, it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Richardson with the method of Viazovetskyi in view of Pinkney to improve training speed.
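For illustration only (not Richardson's code), the "weighted combination of several objectives" quoted above, a pixel-wise L2 term plus LPIPS and regularization terms, has the general form L = w₂·L2 + w_lpips·L_lpips + w_reg·L_reg. The weight values and the stand-in loss terms below are hypothetical:

```python
# Illustrative sketch of a pSp-style weighted loss combination.
# The LPIPS and regularization terms are toy scalars here; in practice the
# LPIPS term would come from a perceptual-similarity network.

def l2_pixel(output, target):
    """Mean squared error over flattened pixel values."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)

def combined_loss(output, target, lpips_term, reg_term,
                  w_l2=1.0, w_lpips=0.8, w_reg=0.005):
    return (w_l2 * l2_pixel(output, target)
            + w_lpips * lpips_term
            + w_reg * reg_term)

# Toy example: one-pixel error plus fixed perceptual/regularization terms.
loss = combined_loss([0.0, 1.0], [0.0, 0.0], lpips_term=0.5, reg_term=2.0)
```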
Regarding claim 8, Viazovetskyi in view of Pinkney and Richardson teach The method according to claim 4, and further teach wherein the second generative model is a StyleGAN2 model (Pinkney, Page 4, B.1.2 Toonification, “For transfer learning the model was initialised with weights from the config-f 1024x1024 FFHQ model. This was trained on the new dataset at a learn rate of 0.002 with mirror augmentation. Training was conducted for 32 thousand images with the default settings for StyleGAN2 FFHQ, after which the exponentially weighted average model was used for interpolation.”), and the decoder has a same structure as a synthesis network of the StyleGAN2 model (Richardson Page 11, Left Column, First paragraph, “We use a fixed StyleGAN2 generator trained on the FFHQ [21] dataset.”).
Viazovetskyi, Pinkney and Richardson are in the same field of endeavor, namely computer graphics, especially in the field of GAN based image to image translation. Richardson teaches combining pSp with a pretrained StyleGAN to improve the training process (Richardson Page 1, Left Column, Abstract, “We show that solving translation tasks through StyleGAN significantly simplifies the training process, as no adversary is required.”). Therefore, it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Richardson with the method of Viazovetskyi in view of Pinkney to improve training speed.
Regarding claim 9, Viazovetskyi in view of Pinkney and Richardson teach The method according to claim 4, and further teach further comprising: acquiring the target image, and inputting the target image into the cartoonization model (Richardson Page 3, Figure 2 shows an input image as the target image into the pSp Encoder); and extracting, in the cartoonization model, a target feature map and target style attribute information of the target image by performing, by the encoder, feature extraction on the target image, and inputting the target feature map and the target style attribute information into the decoder (Richardson Page 3, Right Column, Third Paragraph, “in pSp we extend an encoder backbone with a feature pyramid [26], generating three levels of feature maps from which styles are extracted using a simple intermediate network — map2style — shown in Figure 2. The styles, aligned with the hierarchical representation, are then fed into the generator in correspondence to their scale to generate the output image”), and generating, by the decoder, a corresponding fully cartoonized image based on the target feature map and the target style attribute information, and outputting the fully cartoonized image (Richardson teaches a StyleGAN generator as the decoder, styles based on the hierarchical feature map are fed into the generator, and a 1024 x 1024 output image are generated. Page 3, Figure 2).
Viazovetskyi, Pinkney and Richardson are in the same field of endeavor, namely computer graphics, especially in the field of GAN based image to image translation. Richardson teaches combining pSp with a pretrained StyleGAN to improve the training process (Richardson Page 1, Left Column, Abstract, “We show that solving translation tasks through StyleGAN significantly simplifies the training process, as no adversary is required.”). Therefore, it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Richardson with the method of Viazovetskyi in view of Pinkney to improve training speed.
Regarding claim 10, Viazovetskyi in view of Pinkney and Richardson teach The method according to claim 9, and further teach wherein the target image comprises at least an image input via an image editing page; or a plurality of image frames from a target video (Viazovetskyi Page 3, Fourth paragraph, “We show that it is possible to train image-to-image network on synthetic data and then apply it to real world images” and Page 4, Third Paragraph, “There are numerous follow-up works based on pix2pixHD architecture, including those working with video”).
Regarding claim 13, it recites similar limitations of claim 1 but in an electronic device form. The rationale of claim 1 rejection based on Viazovetskyi in view of Pinkney and Richardson is applied to reject claim 13. In addition, Viazovetskyi in view of Pinkney and Richardson teach An electronic device for constructing cartoonization models, comprising: one or more processors; and a memory, configured to store one or more programs; wherein the one or more processors, when loading and running the one or more programs, are caused to perform (the electronic device with memory and processors are inherent features used by the system and method taught in Viazovetskyi, Pinkney and Richardson).
Regarding claim 14, it recites similar limitations of claim 1 but in a non-transitory computer-readable storage medium form. The rationale of claim 1 rejection based on Viazovetskyi in view of Pinkney and Richardson is applied to reject claim 14. In addition, Viazovetskyi in view of Pinkney and Richardson teach A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when loaded and run by a processor, causes the processor to perform: (the electronic device with memory and processors are inherent features used by the systems and methods taught in Viazovetskyi, Pinkney and Richardson).
Regarding claim 15, Viazovetskyi in view of Pinkney and Richardson teaches the method as defined in claim 1, and further teach A computer program product comprising one or more computer-executable instructions, wherein the one or more computer-executable instructions, when loaded and executed by a processor, cause the processor to perform …… (a computer program with instructions that can be run by processors are inherent features used by the system and methods taught in Viazovetskyi, Pinkney and Richardson).
Regarding claim 16, claim 16 has similar limitations as claim 2, therefore it is rejected under the same rationale as claim 2.
Regarding claim 17, claim 17 has similar limitations as claim 3, therefore it is rejected under the same rationale as claim 3.
Regarding claim 18, claim 18 has similar limitations as claim 4, therefore it is rejected under the same rationale as claim 4.
Claim(s) 6 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over NPL Viazovetskyi et al. (“StyleGAN2 Distillation for Feed-forward Image Manipulation”), hereinafter as Viazovetskyi, in view of NPL Pinkney et al. (“Resolution Dependent GAN Interpolation for Controllable Image Synthesis Between Domains”), hereinafter as Pinkney, further in view of NPL Richardson et al. (“Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation”), hereinafter as Richardson, and Park et al. (US20210358177 A1), hereinafter as Park.
Regarding claim 6, Viazovetskyi in view of Pinkney and Richardson teach The method according to claim 4, but fail to teach wherein the encoder comprises: an input layer, multiple residual layers, and a fully connected layer, wherein the multiple residual layers are configured to extract the feature map from the sample authentic image and output the feature map to a corresponding layer of the decoder and the fully connected layer is configured to extract the style attribute information from the sample authentic image and output the style attribute information to multiple layers of the decoder. Park teaches wherein the encoder comprises: an input layer, multiple residual layers, and a fully connected layer (Park paragraph [0081] “As illustrated in FIG. 6, the encoder neural network 206 includes the convolutional layers (“cony”), the residual blocks (“ResBlock”), the layout blocks (“LayoutBlock”), and a fully connected layer (“fc”), as outlined in relation to FIG. 5.”), wherein the multiple residual layers are configured to extract the feature map from the sample authentic image and output the feature map to a corresponding layer of the decoder (Park teaches generating the layout block as the feature map and output the spatial code to the corresponding layer in the generator, paragraph [0078-0080] “As illustrated in FIG. 5, the encoder neural network 206 includes convolutional layers, residual blocks, and layout blocks. In particular, the key in FIG. 5 indicates that the white layers of the encoder neural network 206 are convolutional layers, the diagonally patterned blocks are residual blocks, and the crosshatch patterned blocks are layout blocks……the encoder neural network 206 generates the spatial code by passing intermediate (e.g., non-output) activations or latent features into layout blocks. 
Each layout block upsamples the latent feature vector to a fixed size (e.g., a spatial resolution of 32 or 64, depending on the dataset) and reduces the channel dimension (e.g., to 1 or 2 channels). The encoder neural network 206 further aggregates (e.g., averages) the intermediate features to generate the spatial code.” And paragraph [0087] “the generator neural network 216 predicts spatially varying shifts and biases and injects the shifts and biases into a corresponding layer which has the same resolution as the spatial code”), and the fully connected layer is configured to extract the style attribute information from the sample authentic image and output the style attribute information to multiple layers of the decoder (Park teaches the global code as the style attribute information and further teaches feeding the global code to multiple layers of decoder, Figure 6, global code is generated after the final fully connected layer, paragraph [0082] “The encoder neural network 206 further includes a fully connected layer with 8192 input channels and 1024 output channels to generate the global code.” And paragraph [0087-0088] “The generator neural network 216 determines the scale and shift parameters from the mapping block based on the global code and the spatial code …..the generator neural network 216 thus flattens the spatial code (“flatten”) to put it through a fully connected layer to become a length 1024 feature vector to merge with the global code vector of length 1024 through concatenation (“concat”).”).
Viazovetskyi, Pinkney, Richardson and Park are in the same field of endeavor, namely computer graphics, especially in the field of GAN based image to image translation. Park teaches using a global and spatial autoencoder to achieve image to image translation in order to improve accuracy and efficiency (Park paragraph [0035] “In addition to improved accuracy, the deep image manipulation system further provides improved efficiency over many conventional digital image editing systems.”). Therefore, it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Park with the method of Viazovetskyi in view of Pinkney and Richardson to improve efficiency and accuracy.
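For illustration only (not Park's code), the split described above, intermediate features aggregated into a spatial code while the final feature vector passes through a fully connected layer to form a global code, can be sketched as follows. All shapes, the averaging scheme, and the fully connected weights are hypothetical stand-ins:

```python
# Illustrative sketch of a Park-style global/spatial code split.

def spatial_code(intermediate_features):
    """Average per-layer feature maps element-wise (Park aggregates upsampled
    layout-block outputs in a similar spirit, per paragraphs [0078]-[0080])."""
    n = len(intermediate_features)
    length = len(intermediate_features[0])
    return [sum(f[i] for f in intermediate_features) / n for i in range(length)]

def global_code(final_features, fc_weights):
    """Toy fully connected layer: one dot product per output channel."""
    return [sum(w * x for w, x in zip(row, final_features)) for row in fc_weights]

feats = [[1.0, 2.0], [3.0, 4.0]]      # two intermediate feature maps
s = spatial_code(feats)               # aggregated spatial code
g = global_code([1.0, 1.0], [[0.5, 0.5], [1.0, -1.0]])  # toy fc projection
```

The spatial code is injected at the decoder layer of matching resolution, while the global code modulates multiple decoder layers, mirroring the feature-map / style-attribute split of the claim.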
Regarding claim 20, claim 20 has similar limitations as claim 6, therefore it is rejected under the same rationale as claim 6.
Claim(s) 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over NPL Viazovetskyi et al. (“StyleGAN2 Distillation for Feed-forward Image Manipulation”), hereinafter as Viazovetskyi, in view of NPL Pinkney et al. (“Resolution Dependent GAN Interpolation for Controllable Image Synthesis Between Domains”), hereinafter as Pinkney, further in view of NPL Richardson et al. (“Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation”), hereinafter as Richardson, and NPL Karras et al. (“Training Generative Adversarial Networks with Limited Data”), hereinafter as Karras.
Regarding claim 11, Viazovetskyi in view of Pinkney and Richardson teach The method according to claim 1, but fail to teach further comprising: performing data augmentation on the sample set prior to model fitting using the sample set, wherein the data augmentation comprises randomly performing at least one of random rotation, random cropping, random zooming in, or random zooming out on the sample authentic image and the sample cartoon image. Karras teaches further comprising: performing data augmentation on the sample set prior to model fitting using the sample set (Karras Page 1, Third paragraph, “In this paper, we demonstrate how to use a wide range of augmentations to prevent the discriminator from overfitting,” and Page 3, Second Paragraph, “Our solution is similar to bCR in that we also apply a set of augmentations to all images shown to the discriminator.”), wherein the data augmentation comprises randomly performing at least one of random rotation, random cropping, random zooming in, or random zooming out on the sample authentic image and the sample cartoon image (Karras Page 4, Figure 3, (a) Isotropic image scaling (b) Random 90 degree rotations “Figure 3: Leaking behavior of three example augmentations”, Page 4, Third paragraph, “The strength of augmentations is controlled by the scalar p ∈ [0, 1], so that each transformation is applied with probability p or skipped with probability 1 − p. We always use the same value of p for all transformations. The randomization is done separately for each augmentation and for each image in a minibatch.” And Page 3, Fifth Paragraph, “Appendix C shows that this can be made to hold for a large class of widely used augmentations, including deterministic mappings (e.g., basis transformations), additive noise, transformation groups (e.g, image or color space rotations, flips and scaling), and projections (e.g., cutout [11]).”).
Viazovetskyi, Pinkney, Richardson and Karras are in the same field of endeavor, namely computer graphics, especially in the field of GAN based image to image translation. Karras teaches a data augmentation approach to improve the training process (Karras Page 1, Abstract, “We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes.”). Therefore, it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Karras with the method of Viazovetskyi in view of Pinkney and Richardson to improve training.
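For illustration only (not Karras's code), the stochastic augmentation quoted above, each transformation applied with probability p and skipped with probability 1 − p, randomized per image, can be sketched as follows. The toy list operations stand in for the rotations, flips, and scalings Karras describes:

```python
# Illustrative sketch of Karras-style stochastic augmentation: every transform
# in a fixed pipeline fires independently with the same probability p.
import random

def augment(image, transforms, p, rng):
    """Apply each transform independently with probability p."""
    for t in transforms:
        if rng.random() < p:
            image = t(image)
    return image

rotate90 = lambda img: [list(row) for row in zip(*img[::-1])]  # 90-degree rotation
hflip = lambda img: [row[::-1] for row in img]                 # horizontal flip

rng = random.Random(0)
img = [[1, 2], [3, 4]]
out = augment(img, [rotate90, hflip], p=0.0, rng=rng)  # p = 0: image unchanged
```

Karras controls p adaptively during training; with p = 0 no augmentation is applied, and with p = 1 every transform in the pipeline fires.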
Allowable Subject Matter
Claims 5, 7, 19 and 21 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Regarding claims 5 and 19, the closest prior art of Richardson teaches using multiple loss functions in training: a pixel-wise L2 loss, a perceptual LPIPS loss, a regularization loss, and an identity loss. However, Richardson fails to teach the combined limitation as a whole: “wherein the loss function comprises a combination of: an adversarial network loss function, a perceptual loss function, and a regression loss function L1 loss; wherein the adversarial network loss function is configured to determine authenticity of the fully cartoonized image generated by the cartoonization model, and calculate a loss based on a result of the determination; the perceptual loss function is configured to acquire a corresponding first feature map and second feature map output by a predetermined neural network model by separately inputting the fully cartoonized image output by the cartoonization model and a corresponding sample cartoon image in the sample set into the predetermined neural network model, and calculate an L2 loss between the first feature map and the second feature map; and the regression loss function L1_loss is configured to calculate an L1 loss between the fully cartoonized image output by the cartoonization model and the corresponding sample cartoon image in the sample set.” Furthermore, no prior art of record, either alone or in combination, teaches the above limitation as a whole. Therefore, claims 5 and 19 are considered allowable.
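For illustration, the three-term loss combination recited in claims 5 and 19 can be sketched as follows. This is a hypothetical sketch of the claimed combination only, not code from Richardson or any other reference of record; the specific adversarial term shown (a non-saturating generator loss) and the stand-in feature extractor are assumptions:

```python
import numpy as np

def combined_loss(generated, target, disc_score, feat):
    """Illustrative combination of the three claimed loss terms.
    disc_score: discriminator output in (0, 1) judging authenticity of
    the generated (fully cartoonized) image.
    feat: a fixed feature-extraction function standing in for the
    predetermined neural network model (an assumption for this sketch)."""
    # Adversarial term: loss based on the discriminator's authenticity result.
    adv = -np.log(disc_score + 1e-8)
    # Perceptual term: L2 loss between the two feature maps.
    perc = np.mean((feat(generated) - feat(target)) ** 2)
    # Regression term: L1 loss between the images themselves.
    l1 = np.mean(np.abs(generated - target))
    return adv + perc + l1
```

When the generated image matches the sample cartoon image and the discriminator scores it as fully authentic, all three terms vanish; any mismatch in pixels, features, or discriminator judgment increases the total loss.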
Regarding claims 7 and 21, the closest prior art of Park teaches an encoder neural network with global and spatial features. However, Park fails to teach the combined limitation as a whole: “wherein an initial weight of the encoder is a weight of an encoder in which various real person images are edited previously.” Furthermore, no prior art of record, either alone or in combination, teaches the above limitation as a whole. Therefore, claims 7 and 21 are considered allowable.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Milman et al. (US20190340419 A1) teaches a trained machine-learning model that outputs an avatar in a cartoon style based on an input digital photo.
Luo et al. (US20220375024 A1) teaches a stylized training and image generation system that is capable of generating a cartoonized image.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAOMING WEI whose telephone number is (571)272-3831. The examiner can normally be reached M-F 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/XIAOMING WEI/Examiner, Art Unit 2611
/KEE M TUNG/Supervisory Patent Examiner, Art Unit 2611