DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments, see Remarks pages 6-9, filed 10/23/2025, with respect to the rejection of claims 1-6 under 35 U.S.C. 101 have been fully considered and are persuasive. The rejections of claims 1-6 have been withdrawn.
Applicant's arguments, see Remarks pages 9-10, filed 10/23/2025, with respect to the rejection of amended claim 1 under 35 U.S.C. 103 have been fully considered but they are not persuasive.
On pages 9-10 of the Remarks, Applicant argues:
[Applicant’s argument is reproduced in the Remarks as an image (media_image1.png) and is not reproduced in text here; see Remarks, pages 9-10.]
Applicant’s arguments regarding prior art Ramesh failing to disclose the amended claim 1 limitation “generate synthetic images based on the extracted latent features by perturbing the extracted latent features with a random noise and a random bias using a residual multiplicative perturbation, the synthetic images having a same classification as the input image” have been fully considered, but are moot in view of the new grounds of rejection over Nikolenko in view of Ramesh and Li, and over Borovik in view of Ramesh and Li (detailed in the arguments and 35 U.S.C. 103 rejections below), necessitated by Applicant’s amendment to the claim(s).
Section 3 of Ramesh discloses: “Our approach allows us to encode any given image x into a bipartite latent representation (zi; xT) that is sufficient for the decoder to produce an accurate reconstruction. The latent zi describes the aspects of the image that are recognized by CLIP, while the latent xT encodes all of the residual information necessary for the decoder to reconstruct x. The former is obtained by simply encoding the image with the CLIP image encoder. The latter is obtained by applying DDIM inversion (Appendix F in [11]) to x using the decoder, while conditioning on zi,” wherein the image’s latent representations, including residuals, are processed by a DDIM inversion for the generation of a variation of the input image, wherein the variations preserve the semantic information of the input image as is disclosed in Figure 3. Thus, Ramesh discloses “generate synthetic images based on the extracted latent features by perturbing the extracted latent features…using a residual…perturbation, the synthetic images having a same classification as the input image,” but fails to disclose wherein the extracted latent features are perturbed with a random noise and bias, and wherein the disclosed residual perturbation is multiplicative.
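For context only, and not as part of the record, the bipartite-latent mechanism quoted above from Ramesh can be sketched abstractly: an image is split into a semantic latent zi and a residual latent xT sufficient for exact reconstruction, and decoding with added sampling stochasticity yields variations that preserve the semantic content. The following is a minimal toy analogue of that idea, not the actual CLIP/DDIM diffusion machinery; the trivial “encoder” and “decoder” below are assumptions introduced purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x):
    # Toy stand-in for CLIP encoding plus DDIM inversion: split the
    # signal into a "semantic" latent z_i (here just the mean) and a
    # residual latent x_T carrying everything needed to reconstruct x.
    z_i = x.mean()
    x_T = x - z_i
    return z_i, x_T

def decode(z_i, x_T, noise_scale=0.0):
    # noise_scale = 0 reconstructs x exactly; noise_scale > 0 injects
    # stochasticity into the residual, producing variations "centered"
    # on x while the semantic latent z_i is held fixed.
    return z_i + x_T + noise_scale * rng.normal(size=x_T.shape)

x = rng.normal(size=16)
z_i, x_T = encode(x)
reconstruction = decode(z_i, x_T)              # exact reconstruction
variation = decode(z_i, x_T, noise_scale=0.5)  # semantics-preserving variation
```

The point of the sketch is the structural claim in the quoted passage: the residual latent suffices for exact reconstruction, and perturbing only the stochastic pathway yields variations rather than unrelated images.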
Section 3.3. Stochastic Feature Augmentation (SFA) of Li discloses: “In this paper, we propose to augment feature representation using random noise. More specifically, the latent feature
embedding is augmented by simply multiplying and adding random variables sampled from certain distributions. The feature augmentation function is formulated as
ẑ = α ⊙ z + β
where α ∈ R^(N×K×H×W) and β ∈ R^(N×K×H×W) are the noise samples, and ⊙ indicates element-wise multiplication. Each element of α is sampled from some distributions,” wherein latent feature embeddings are augmented through a multiplicative perturbation with random variables α and β, which constitute noise and bias terms, respectively.
Therefore, as is further disclosed below in the claim 1 rejection under 35 U.S.C. 103, Nikolenko in view of Ramesh and Li, and Borovik in view of Ramesh and Li each disclose the limitation: “generate synthetic images based on the extracted latent features by perturbing the extracted latent features with a random noise and a random bias using a residual multiplicative perturbation, the synthetic images having a same classification as the input image.”
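For illustration only, and not as part of the record, the stochastic feature augmentation operation quoted above from Li (ẑ = α ⊙ z + β, with α a multiplicative random noise and β an additive random bias) can be sketched in a few lines. The Gaussian distribution choices and parameter names below are assumptions for the sketch, not disclosures of the reference:

```python
import numpy as np

def stochastic_feature_augmentation(z, sigma_alpha=0.1, sigma_beta=0.1, rng=None):
    """Perturb a latent feature tensor z (shape N x K x H x W) with
    element-wise multiplicative noise alpha and additive bias beta,
    i.e., z_aug = alpha * z + beta."""
    rng = np.random.default_rng() if rng is None else rng
    # alpha is centered at 1 so that zero variance leaves z unchanged;
    # beta is centered at 0 and acts as a random bias term.
    alpha = rng.normal(loc=1.0, scale=sigma_alpha, size=z.shape)
    beta = rng.normal(loc=0.0, scale=sigma_beta, size=z.shape)
    return alpha * z + beta  # element-wise multiply, then add

# Example: augment a batch of 2 feature maps with 4 channels of 8x8.
z = np.ones((2, 4, 8, 8), dtype=np.float32)
z_aug = stochastic_feature_augmentation(z)
```

The sketch shows why the operation is characterized above as a multiplicative perturbation with a random noise (α) and a random bias (β): both terms are freshly sampled per element, so the same input feature yields a different augmented feature on each call.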
Claim Objections
Claim 2 is objected to because of the following informality: the limitation “wherein the random noise and bias are optimized with one or objectives of…” should be corrected to read “wherein the random noise and bias are optimized with one or more objectives of…” Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-2 and 5-6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Nikolenko et al. (US 20200320345 A1), hereinafter referenced as Nikolenko, in view of Ramesh et al. (Hierarchical Text-Conditional Image Generation with CLIP Latents), hereinafter referenced as Ramesh, and Li et al. (A Simple Feature Augmentation for Domain Generalization), hereinafter referenced as Li.
Regarding claim 1, Nikolenko discloses: A computer-implemented media platform (Nikolenko: 0019: “the training datasets can be used in any number of applications, including visual or face recognition, autonomous vehicles, satellite imagery, drone imagery, gesture recognition, navigation, interior mapping, medical application and imagery, retail spaces, gaze estimation, robotics and control systems, animal agriculture, aquaculture, security systems, mobile or personal devices that perform recognition tasks (such as recognizing objects in images or landmarks on objects), personal shopping, etc.”), comprising:
one or more computers; one or more computer memory devices interoperably coupled with the one or more computer and having computer-readable storage media storing one or more instructions (Nikolenko: 0053) that when executed by the one or more computers, cause the one or more computers to instantiate: a deep neural network (DNN) image generator to generate images utilizing a DNN image generating model, wherein the deep neural network (DNN) image generating model is trained using generated synthetic images (Nikolenko: 0004: “The invention discloses a system and method for generating training data for training a recognition system to recognize a subject in a short period of time using training data… As the synthetic data is generated, labels are used to provide accurate information as part of the dataset that is used for training the system.”; 0031: “For example and in accordance with some aspects of the invention, Generative Adversarial Networks (GANs) are used to generate or enhance synthetic data.”; 0048: “In accordance with one aspect of the invention, the synthetic data is used to train a generative machine learning model, which in turn is used to make synthetic data (or images) more realistic.”; Wherein the synthetic images generated by the GAN model are used to refine the synthetic images).
Nikolenko does not disclose expressly: an image extractor configured to extract vectors representing latent features and at least one classification from an input image;
a generative model to: generate synthetic images based on the extracted latent features by perturbing the extracted latent features with a random noise and a random bias using a residual multiplicative perturbation, the synthetic images having a same classification as the input image;
and train a deep neural network (DNN) image generating model using the generated synthetic images.
Ramesh discloses: an image extractor configured to extract vectors representing latent features and at least one classification from an input image (Ramesh: Figure 2; 2. Method: “Our training dataset consists of pairs (x; y) of images x and their corresponding captions y. Given an image x, let zi and zt be its CLIP image and text embeddings, respectively…A prior P(zi |y) that produces CLIP image embeddings zi conditioned on captions y.”; Wherein an image’s embeddings are extracted and then conditioned on its caption, which constitutes its class); a generative model to: generate synthetic images based on the extracted latent features by perturbing the extracted latent features using a residual perturbation, the synthetic images having a same classification as the input image (Ramesh: Figure 3; 3. Image Manipulation: “Our approach allows us to encode any given image x into a bipartite latent representation (zi; xT) that is sufficient for the decoder to produce an accurate reconstruction. The latent zi describes the aspects of the image that are recognized by CLIP, while the latent xT encodes all of the residual information necessary for the decoder to reconstruct x. The former is obtained by simply encoding the image with the CLIP image encoder. The latter is obtained by applying DDIM inversion (Appendix F in [11]) to x using the decoder, while conditioning on zi…
Given an image x, we can produce related images that share the same essential content but vary in other aspects, such as shape and orientation (Figure 3). To do this, we apply the decoder to the bipartite representation (zi; xT) using DDIM with n > 0 for sampling…Larger values of n introduce stochasticity into successive sampling steps, resulting in variations that are perceptually “centered” around the original image x. As n increases, these variations tell us what information was captured in the CLIP image embedding (and thus is preserved across samples), and what was lost (and thus changes across the samples).”; Wherein, in order to generate synthetic images of similar content, which constitutes class as shown in Figure 3, the input image’s latent representations, including residuals, are perturbed).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement the DALL-E 2 model disclosed by Ramesh in order to generate additional training data for the training of the GAN model disclosed by Nikolenko. The suggestion/motivation for doing so would have been “We compare our text-to-image system with other systems such as DALL-E [40] and GLIDE [35], finding that our samples are comparable in quality to GLIDE, but with greater diversity in our generations.” (Ramesh: 1. Introduction; Wherein the model is able to produce diverse results, overall assisting the training of the models). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Nikolenko in view of Ramesh does not disclose expressly: generate synthetic images based on the extracted latent features by perturbing the extracted latent features with a random noise and a random bias using a residual multiplicative perturbation.
Li discloses: the augmentation of feature representations by perturbing the extracted latent features with a random noise and a random bias using a multiplicative perturbation (Li: 3.3. Stochastic Feature Augmentation (SFA): “In this paper, we propose to augment feature representation using random noise. More specifically, the latent feature embedding is augmented by simply multiplying and adding random variables sampled from certain distributions. The feature augmentation function is formulated as
ẑ = α ⊙ z + β    (4)
where α ∈ R^(N×K×H×W) and β ∈ R^(N×K×H×W) are the noise samples, and ⊙ indicates element-wise multiplication. Each element of α is sampled from some distributions”).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement the algorithms for data augmentation disclosed in Li on the latent representations disclosed in Nikolenko in view of Ramesh for the generation of synthetic images. The suggestion/motivation for doing so would have been “In this paper we proposed a simple stochastic feature augmentation approach to improving model performance under domain shift. Our SFA provides a simple plug-in module, that provides state of the art performance when used to augment a Vanilla ERM baseline. Unlike alternative data-augmentation based approaches which are extremely complex, our approach can be added to any existing model in a few lines of code, and induces almost no training overhead.” (Li). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Nikolenko in view of Ramesh with Li to obtain the invention as specified in claim 1.
Regarding claim 2, Nikolenko in view of Ramesh and Li discloses: The media platform of Claim 1, wherein the random noise and bias are optimized with one or objectives of maintaining classification consistency, increasing prediction entropy (Li: Figure 1: “the feature space must adapt to separate the classes with a more robust decision boundary. This new boundary is in turn more robust to domain-shift.”; Abstract: “Subsequent joint stochastic feature augmentation provides an effective domain randomization method, perturbing features in the directions of intra-class/cross-domain variability”; Wherein the feature space is adapted for class consistency and wherein the perturbing for intra-class/cross-domain variability constitutes increasing prediction entropy), or promoting diversity for subsequent images (Li: Abstract: “existing approaches primarily rely on image-space data augmentation, which requires careful augmentation design, and provides limited diversity of augmented data. We argue that feature augmentation is a more promising direction for DG”; Wherein the proposed method provides improved diversity compared to existing approaches.).
Regarding claim 5, Nikolenko in view of Ramesh and Li discloses: The media platform of Claim 1, wherein the media platform is a video security platform (Nikolenko: 0017: “The real data may come from any source, including video, real dynamic images and real static images.”; 0019: “the training datasets can be used in any number of applications, including visual or face recognition, autonomous vehicles, satellite imagery, drone imagery, gesture recognition, navigation, interior mapping, medical application and imagery, retail spaces, gaze estimation, robotics and control systems, animal agriculture, aquaculture, security systems, mobile or personal devices that perform recognition tasks (such as recognizing objects in images or landmarks on objects), personal shopping, etc.”; Wherein the usage of video for security systems constitutes a video security platform).
Regarding claim 6, Nikolenko in view of Ramesh and Li discloses: The media platform of Claim 5, wherein the DNN image generator is to generate images for an object detection model (Nikolenko: 0020: “This improves training of machine learning models for computer vision tasks, including, but not limited to, image classification, object detection, image segmentation, and scene understanding.”; Wherein the GAN model produces training data for an object detection model.).
Claim(s) 1 and 3-4 is/are rejected under 35 U.S.C. 103 as being unpatentable over Borovik et al. (DeepHumor: Image-Based Meme Generation Using Deep Learning) hereinafter referenced as Borovik, in view of Ramesh and Li.
Regarding claim 1, Borovik discloses: A computer-implemented media platform (Borovik: Abstract), comprising: one or more computers; one or more computer memory devices interoperably coupled with the one or more computer and having computer-readable storage media storing one or more instructions (Borovik: 5. Experiments and Results: “All the experiments were conducted in Google Colab on Tesla P100 PCIe 16 GB GPUs for faster training. The source codes with the experiments and demonstrations are given in the GitHub repository.”) that when executed by the one or more computers, cause the one or more computers to instantiate: a deep neural network (DNN) image generator to generate images utilizing a DNN image generating model, wherein the deep neural network (DNN) image generating model is trained using generated synthetic images (Borovik: Abstract: “We investigate the application of Transformer based models for meme generation and compare them with the LSTM-based models…We train the models on two levels of text tokenization, words and characters, and compare performance by estimating per-token perplexity for predictions on the test set.”).
Borovik does not disclose expressly: an image extractor configured to extract vectors representing latent features and at least one classification from an input image;
a generative model to: generate synthetic images based on the extracted latent features by perturbing the extracted latent features with a random noise and a random bias using a residual multiplicative perturbation, the synthetic images having a same classification as the input image;
and train a deep neural network (DNN) image generating model using the generated synthetic images.
Ramesh discloses: an image extractor configured to extract vectors representing latent features and at least one classification from an input image (Ramesh: Figure 2; 2. Method: “Our training dataset consists of pairs (x; y) of images x and their corresponding captions y. Given an image x, let zi and zt be its CLIP image and text embeddings, respectively…A prior P(zi |y) that produces CLIP image embeddings zi conditioned on captions y.”; Wherein an image’s embeddings are extracted and then conditioned on its caption, which constitutes its class); a generative model to: generate synthetic images based on the extracted latent features by perturbing the extracted latent features using a residual perturbation, the synthetic images having a same classification as the input image (Ramesh: Figure 3; 3. Image Manipulation: “Our approach allows us to encode any given image x into a bipartite latent representation (zi; xT) that is sufficient for the decoder to produce an accurate reconstruction. The latent zi describes the aspects of the image that are recognized by CLIP, while the latent xT encodes all of the residual information necessary for the decoder to reconstruct x. The former is obtained by simply encoding the image with the CLIP image encoder. The latter is obtained by applying DDIM inversion (Appendix F in [11]) to x using the decoder, while conditioning on zi…
Given an image x, we can produce related images that share the same essential content but vary in other aspects, such as shape and orientation (Figure 3). To do this, we apply the decoder to the bipartite representation (zi; xT) using DDIM with n > 0 for sampling…Larger values of n introduce stochasticity into successive sampling steps, resulting in variations that are perceptually “centered” around the original image x. As n increases, these variations tell us what information was captured in the CLIP image embedding (and thus is preserved across samples), and what was lost (and thus changes across the samples).”; Wherein, in order to generate synthetic images of similar content, which constitutes class as shown in Figure 3, the input image’s latent representations, including residuals, are perturbed).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement the DALL-E 2 model disclosed by Ramesh in order to generate additional training data for the training of the meme generation model disclosed by Borovik. The suggestion/motivation for doing so would have been “We compare our text-to-image system with other systems such as DALL-E [40] and GLIDE [35], finding that our samples are comparable in quality to GLIDE, but with greater diversity in our generations.” (Ramesh: 1. Introduction; Wherein the model is able to produce diverse results, overall assisting the training of the models). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Borovik in view of Ramesh does not disclose expressly: generate synthetic images based on the extracted latent features by perturbing the extracted latent features with a random noise and a random bias using a residual multiplicative perturbation.
Li discloses: the augmentation of features representations by perturbing the extracted latent features with a random noise and a random bias using a multiplicative perturbation (Li: 3.3. Stochastic Feature Augmentation (SFA): “In this paper, we propose to augment feature representation using random noise. More specifically, the latent feature embedding is augmented by simply multiplying and adding random variables sampled from certain distributions. The feature augmentation function is formulated as
ẑ = α ⊙ z + β    (4)
where α ∈ R^(N×K×H×W) and β ∈ R^(N×K×H×W) are the noise samples, and ⊙ indicates element-wise multiplication. Each element of α is sampled from some distributions”).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement the algorithms for data augmentation disclosed in Li on the latent representations disclosed in Borovik in view of Ramesh for the generation of synthetic images. The suggestion/motivation for doing so would have been “In this paper we proposed a simple stochastic feature augmentation approach to improving model performance under domain shift. Our SFA provides a simple plug-in module, that provides state of the art performance when used to augment a Vanilla ERM baseline. Unlike alternative data-augmentation based approaches which are extremely complex, our approach can be added to any existing model in a few lines of code, and induces almost no training overhead.” (Li). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Borovik in view of Ramesh with Li to obtain the invention as specified in claim 1.
Regarding claim 3, Borovik in view of Ramesh and Li discloses: The media platform of Claim 1, wherein the media platform is a social media platform (Borovik: Abstract: “Memes are an integral part of modern culture. In this project, we present several encoder-decoder architectures for image-based meme generation.”; Wherein the usage of models trained on memes for the purpose of meme generation constitutes a social media platform.).
Regarding claim 4, Borovik in view of Ramesh and Li discloses: The media platform of Claim 3, wherein the DNN image generator is to generate memes for the social media platform (Borovik: Abstract: “In this project, we present several encoder-decoder architectures for image-based meme generation.”).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANTHONY J RODRIGUEZ whose telephone number is (703)756-5821. The examiner can normally be reached Monday-Friday 10am-7pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz can be reached at (571) 272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANTHONY J RODRIGUEZ/Examiner, Art Unit 2672
/SUMATI LEFKOWITZ/Supervisory Patent Examiner, Art Unit 2672