DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 02/02/2026 has been entered. Claims 1-20 remain pending in the application.
Response to Arguments
Pg. 7 of the Remarks, filed 02/02/2026, indicates that a new Figure 1 has been submitted; however, amended Figure 1 has not been received.
Applicant’s arguments with respect to claim(s) 1, 8, and 15 have been considered but are moot because the new ground of rejection does not rely on any combination of references applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
In pg. 7, ¶4 of the Remarks, Applicant argues: “The ‘937 Publication relies on largely unsupervised disentanglement of theme and content.” Though ‘937 describes embodiments of unsupervised neural network training, ‘937 also describes methods for supervised training. Regarding style/content decoupling, ‘937 describes training a latent space in ¶74: “pre-training a latent space to disentangle theme information from content information of an image”. In ¶71, ‘937 describes supervised training of neural networks. Supervised training of the latent space in ¶74 involves using labels to identify theme versus content information of the training images.
Drawings
The drawings are objected to because of the following informalities: Figure 1 includes a diagram with no text labels. The unlabeled rectangular boxes shown in the drawing should be provided with descriptive text labels (See MPEP 608.02(b)). See Response to Arguments above.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the Examiner, the Applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Interpretation
Regarding claims 7, 14, and 20, a “relative prevalence” is interpreted to be a comparative frequency/sample size of an image class relative to other image classes in a group of images.
Claims 1, 8, and 15 recite selecting or retrieving “from a database, a prototype image.” The specification indicates that selecting or retrieving a prototype image, corresponding to a real-world image, can also be performed by converting the real-world image using the VPE model (see paragraph 45: “In doing so, the VPE model 302 performs image-to-image translation using a variational auto encoder (VAE) structure, converting noisy real-world traffic sign images into clean, canonical images. In other words, the VPE model 302 selects a prototype image that corresponds to the real-world image based on the classification” and paragraph 87: “the one or more processors retrieves a prototype image from a database that most closely resembles the latent space representation of the input image data. This can be a comparison of latent space of the input image data to the latent space of the prototype images stored in the database. This can be performed using the VPE model 302, for example, which performs image-to-image translation using a VAE structure, converting noisy real-world traffic sign images into clean, canonical images.”, emphasis added). For examination purposes, the limitation of selecting or retrieving “from a database, a prototype image” can be met by a prototype image being selected/retrieved from a database or by the VPE model performing image-to-image translation.
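For illustration only, the following is a minimal sketch of the latent-space comparison described in paragraph 87, assuming prototype latent codes have been precomputed and stored alongside the prototype images; the encoder, L2 distance metric, and data structures are assumptions and are not the disclosed VPE model 302.

# Illustrative sketch only: retrieve the stored prototype whose latent code
# is closest to the latent representation of the input image. The distance
# metric and database layout are assumptions, not the spec's implementation.
import numpy as np

def retrieve_prototype(input_latent: np.ndarray,
                       prototype_latents: np.ndarray,
                       prototype_images: list):
    """prototype_latents: (N, D) latent codes of the N stored prototype images."""
    distances = np.linalg.norm(prototype_latents - input_latent, axis=1)
    return prototype_images[int(np.argmin(distances))]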
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
Claim 6 is rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends: Claim 6 fails to further limit the subject matter of claim 1. Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-6 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (Kim et al., "Variational prototyping-encoder: One-shot learning with prototypical images" (2019) in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9462-9470), hereinafter Kim, in view of Kim et al. (U.S. Patent Application Publication No. 2022/0269937 A1), hereinafter ‘937, in further view of Rosebrock (Rosebrock, Adrian. Label smoothing with Keras, TensorFlow, and Deep Learning. PyImageSearch, 30 Dec 2019 [online], [retrieved on 2026-03-17]. Retrieved from the Internet <URL: https://pyimagesearch.com/2019/12/30/label-smoothing-with-keras-tensorflow-and-deep-learning/>), Xiong et al. (Xiong, F., Wang, Q., & Gao, Q. (2019). Consistent embedded GAN for image-to-image translation. IEEE Access, 7, 126651-126661.), hereinafter Xiong, and Zheng et al. (Zheng et al., "SAPNet: Segmentation-aware progressive network for perceptual contrastive deraining" (2022) in Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp. 52-62), hereinafter Zheng.
Regarding claim 1, Kim teaches a method of generating images for training a machine-learning model (Kim, abstract on pg. 9462: “one-shot classification with prototypical images as a single training example for each novel class”; see also combination below), the method comprising:
receiving image data corresponding to an image (Kim, first paragraph on pg. 9465: “real image inputs”);
altering the image data corresponding to a style of the image to create altered image data (Kim, section “3.3. Data augmentation”, bottom of pg. 9465 to pg. 9466: “We apply random rotation and horizontal flipping to both the real images and prototypes”; the “style” altered is the orientation of the image);
utilizing a machine-learning model (Kim, Variational Prototyping-Encoder, section “3.1. Variational Prototyping-Encoder”, pg. 9464) to encode the altered image data into a first latent space (Kim, test phase, Figure 2 caption on pg. 9463: “the encoder encodes real domain input images to latent distribution q(z|x)”);
decoding the first latent space (Kim, see Figure 2 caption citation below) and retrieving, from a database of previously-generated stored images, a prototype image (Kim, Figure 2 on pg. 9463, prototype database contains stored images that were previously generated) that is an unaltered, pre-existing image stored in the database and represents the altered image data (Kim, Figure 2 caption on pg. 9463: “The decoder then reconstructs the encoded distribution back to a prototype that corresponds to the input image”; see how decoded prototypes are stored in the prototype database and retrieved in the Test Phase of Figure 2; these images are pre-existing prior to retrieval and unaltered after generation).
Kim fails to teach wherein the image is captured by an image sensor;
extracting style encodings from the first latent space to classify a style of the altered image data in a second latent space, wherein the extracting includes using a content and style decoupling model trained using soft labels that capture style attributes including at least one of lighting conditions, blurriness, illumination, deformation, or another transformation; and
generating a new image by executing a pre-trained reconstructor model based on the first latent space, the second latent space, and the prototype image, wherein the pre-trained reconstructor model utilizes a contrastive loss and a perceptual loss when generating the new image.
However, ‘937 teaches an image sensor (‘937, part of an autonomous vehicle, para 173: “image sensors”);
extracting style encodings (‘937, para 74: spatially-independent, or “theme information”) from the first latent space (‘937, para 74: spatially-dependent, or “content information”) to classify a style of the altered image data in a second latent space (‘937, para 78: “an image encoder component of driving neural simulator 310 may separate a latent space of a given image into a spatially-independent latent space and a spatially-dependent latent space. A spatially-independent latent space may include theme information of given image, which includes information that does not depend on pixel locations such as a background color or weather of a scene”), wherein the extracting includes using a content and style decoupling model (‘937, image encoder that separates latent spaces, described in last citation; para 74: “pre-training a latent space to disentangle theme information from content information of an image”) trained using labels that capture style attributes including at least one of lighting conditions, blurriness, illumination, deformation, or another transformation (‘937, supervised training, para 71: “untrained neural network 206 is trained using supervised learning, wherein training dataset 202 includes an input paired with a desired output for an input”; lighting conditions – day vs. night, para 74: “identifying spatially-independent information referred to as a “theme” (e.g., such as weather, day vs. night, etc.)”; supervised training of the decoupling model involves content/theme labels to learn to disentangle the information in the images); and
generating a new image (‘937, para 82: “may be used for generating multiple variations of a real-world driving video sequence”) by executing a pre-trained reconstructor model (‘937, para 80: “a Generative Adversarial Networks (GAN) is used together with the encoder portion of the VAE”; para 113: “a neural network trained to…image reconstruction”) based on the first latent space and the second latent space (‘937, para 82: “variation of an original video may include a different weather in a scene of a simulation than a weather in an equivalent scene in original video”; see Figure 4A).
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the image sensor, decoupled style encodings, and reconstructor model of ‘937 with the method of Kim in order to capture real-world driving images used to generate training images for a machine learning model in an autonomous vehicle (‘937, para 149: “one or more of camera(s) may be used to perform advanced driver assistance systems (“ADAS”) functions”; para 236: “server(s) 1078 may receive, over network(s) 1090 and from vehicles, image data representative of images showing unexpected or changed road conditions”; para 237: “server(s) 1078 may be used to train machine learning models (e.g., neural networks) based at least in part on training data. In at least one embodiment, training data may be generated by vehicles, and/or may be generated in a simulation”; regarding style encodings and reconstructor model- para 82: “driving neural simulator 310 may be used for generating rich and diverse training datasets that can be used for training neural networks in autonomous driving vehicles”). Kim’s publication utilizes the latent space relationships between images and prototypes to classify real data (Kim, last paragraph of pg. 1 to first paragraph of pg. 2) and references the importance of diversifying training and prototype datasets (Kim, section 3.3 on pg. 4-5 and second paragraph in section 1 on pg. 1: “Thereby, the absence of a large number of training examples for a class often raises an issue when training a large capacity learner”). The combination of Kim and the reconstructor model, taught by ‘937, allows for the generation of additional road sign images, with further diversification by altering the style of images.
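For illustration only, the following sketch conveys the general idea of separating a latent code into content and style/theme parts, as discussed in ‘937 at paras 74 and 78; the split-by-index partition, dimensions, and function names are assumptions and do not represent ‘937’s trained image encoder.

# Illustrative sketch only: a toy partition of a latent code into
# spatially-dependent "content" dimensions and spatially-independent
# "style/theme" dimensions (e.g., lighting, weather). A trained encoder
# would learn this separation rather than split by index.
import numpy as np

def decouple_latent(z: np.ndarray, style_dims: int = 8):
    content = z[:-style_dims]
    style = z[-style_dims:]
    return content, style

def recombine(content: np.ndarray, new_style: np.ndarray) -> np.ndarray:
    """Swap in a different style code before reconstruction to obtain a
    style-altered variant of the same content."""
    return np.concatenate([content, new_style])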
Additionally, Rosebrock teaches a machine learned model trained using soft labels (Rosebrock, pg. 2, regarding Label smoothing: “Turns “hard” class label assignments to “soft” label assignments”; see examples of soft labels for an image at the top of pg. 5). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the use of soft labels for training, taught by Rosebrock, with the method of Kim in view of ‘937 in order to reduce overfitting and improve the performance of the trained model (Rosebrock, pg. 17: “The ultimate goal of applying regularization when training our deep neural networks is to reduce overfitting and increase the ability of our model to generalize. Typically we achieve this goal by sacrificing training loss/accuracy during training time in hopes of a better generalizable model — that’s the exact behavior we’re seeing here”).
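For illustration only, a minimal label-smoothing sketch consistent with the passage of Rosebrock quoted above; the smoothing factor and example values are assumptions.

# Illustrative sketch only: turn a "hard" one-hot label into a "soft" label
# by mixing it with a uniform distribution over the classes.
import numpy as np

def smooth_labels(one_hot: np.ndarray, factor: float = 0.1) -> np.ndarray:
    num_classes = one_hot.shape[-1]
    return one_hot * (1.0 - factor) + factor / num_classes

hard = np.array([0.0, 0.0, 1.0, 0.0])
print(smooth_labels(hard))  # [0.025 0.025 0.925 0.025]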
While ‘937 teaches the combining of content and style latent spaces to generate a style-altered image, Xiong teaches execution of a pre-trained reconstructor model (Xiong, 1st bullet point on pg. 126653: “We propose a novel image-to-image translation model by combining GAN and latent space learning, our model can generate both realistic and diverse images”) based on a latent space and a prototype image (Xiong, See Figure 4 from pg. 126655 below; section C on pg. 126655: “we encode the possible multiple outputs in the latent space and combine the latent code with the given image as the input of the generator”; see further Figure 5 on pg. 126657 where images are input to the model CEGAN and output images are generated). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the image reconstructor model based on latent space and a prototype image, taught by Xiong, with the method of Kim in view of ‘937 and Rosebrock in order to generate images that are influenced by both the prototype image and latent space, resulting in realistic and diverse generated images (Xiong, Conclusion on pg. 126660: “generate both realistic and diversity images. This method captures the full distribution of potential multiple modes of results by enforcing tight connections between the latent space and the real image space”). In the combination of Kim in view of ‘937, Rosebrock, and Xiong, a variety of realistic images with diverse styles, that also closely resemble a selected prototype image, can be generated. A person having ordinary skill in the art, before the effective filing date of the claimed invention, could have applied the known technique, as taught by Xiong, in the same way to the method of Kim in view of ‘937 and Rosebrock and achieved predictable results of further diversifying a machine learning training dataset.
[Xiong, Figure 4 (pg. 126655), reproduced in greyscale]
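For illustration only, the following sketch conveys the general idea of conditioning a generator on both an image and a latent code, as in the Xiong input scheme cited above; the tiling/concatenation layout, array shapes, and function name are assumptions and are not Xiong's CEGAN architecture.

# Illustrative sketch only: form a generator input by concatenating a latent
# code (tiled over the spatial grid) with the image channels.
import numpy as np

def generator_input(prototype_image: np.ndarray, latent_code: np.ndarray):
    """prototype_image: (H, W, C) array; latent_code: (D,) vector."""
    h, w, _ = prototype_image.shape
    tiled = np.broadcast_to(latent_code, (h, w, latent_code.shape[-1]))
    return np.concatenate([prototype_image, tiled], axis=-1)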
Lastly, Zheng teaches a deep learning method for generating augmented images (Zheng, removing rain from images, see abstract). Zheng teaches utilizing both a contrastive loss (Zheng, “Perceptual Contrastive Loss” section on pg. 56) and perceptual loss (Zheng, “Learned Perceptual Image Similarity Loss” section on pg. 56). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the contrastive and perceptual loss values of Zheng with the method of Kim in view of ‘937, Rosebrock, and Xiong in order to generate an image that still closely resembles the road sign prototype when changing the style (Zheng, third bullet point on the right column of pg. 53: “With the advantage of contrastive learning and perceptual similarity, the derained image is close to the groundtruth in terms of pixel-wise difference and fine details”).
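For illustration only, a simplified sketch of combining a perceptual loss with a contrastive loss over deep feature embeddings; Zheng’s actual formulation uses VGG feature maps and additional weighting terms, so the feature inputs, distance choices, and weighting below are assumptions.

# Illustrative sketch only: a combined objective in which the contrastive
# term pulls the output toward the clean target features and pushes it away
# from the degraded input features, while the perceptual term measures
# feature-space distance to the target.
import numpy as np

def perceptual_loss(feat_out: np.ndarray, feat_target: np.ndarray) -> float:
    return float(np.mean((feat_out - feat_target) ** 2))

def contrastive_loss(feat_out, feat_positive, feat_negative, eps=1e-8) -> float:
    d_pos = np.mean(np.abs(feat_out - feat_positive))
    d_neg = np.mean(np.abs(feat_out - feat_negative)) + eps
    return float(d_pos / d_neg)

def total_loss(feat_out, feat_positive, feat_negative, lam=0.1) -> float:
    return perceptual_loss(feat_out, feat_positive) + lam * contrastive_loss(
        feat_out, feat_positive, feat_negative)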
Regarding claim 2 (dependent on claim 1), Kim in view of ‘937, Rosebrock, Xiong, and Zheng teaches further comprising:
training an image-recognition machine-learning model using the new image generated from the pre-trained reconstructor model to produce a trained image-recognition machine-learning model (‘937, para 82: “driving neural simulator 310 may be used for generating rich and diverse training datasets that can be used for training neural networks in autonomous driving vehicles”; para 151: “front-facing cameras may also be used for ADAS functions and systems including, without limitation, Lane Departure Warnings (“LDW”), Autonomous Cruise Control (“ACC”), and/or other functions such as traffic sign recognition”; para 237: “server(s) 1078 may be used to train machine learning models (e.g., neural networks) based at least in part on training data. In at least one embodiment, training data may be generated by vehicles, and/or may be generated in a simulation”).
Regarding claim 3 (dependent on claim 1), Kim in view of ‘937, Rosebrock, Xiong, and Zheng further teaches wherein the first latent space includes the style encodings and categorical content encodings (Kim, encodings are not separated when the real image input is encoded in both the training and test phase, see q(z|x) in Figure 2), and wherein the second latent space does not include the categorical content encodings (‘937, latent spaces are separated, so the spatially-independent latent space does not include the spatially-dependent content latent space, see para 78: “may separate a latent space of a given image into a spatially-independent latent space and a spatially-dependent latent space. A spatially-independent latent space may include theme information of the given image”).
Regarding claim 4 (dependent on claim 1), Kim in view of ‘937, Rosebrock, Xiong, and Zheng further teaches wherein the style encodings include data representing blurriness of the image, orientation of the image, brightness of the image, or deformation of the image (‘937, brightness of the image; para 74: “as a ‘theme’ (e.g., such as weather, day vs. night, etc.)”; para 78: “A spatially-independent latent space may include theme information of the given image, which includes information that does not depend on pixel locations such as a background color or weather of a scene”).
Regarding claim 5 (dependent on claim 1), Kim in view of ‘937, Rosebrock, Xiong, and Zheng teaches wherein: the image sensor is mounted on a vehicle (‘937, para 150: “one or more camera may be mounted in a mounting assembly,”; para 173: “image sensors” of vehicle 1000),
the image captured is of a road sign (Kim, Figure 2; third paragraph on pg. 9463: “traffic sign datasets”), and
the generated new image differs in style from the captured image (‘937, para 82: “variation of an original video may include a different weather in a scene of a simulation than a weather in an equivalent scene in original video”).
Regarding claim 6 (dependent on claim 1), Kim in view of ‘937, Rosebrock, Xiong, and Zheng further teaches wherein the pre-trained reconstructor model utilizes a contrastive loss and a perceptual loss when generating the new image (taught in the combination with Zheng; see the rejection of claim 1 above).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of ‘937, Rosebrock, Xiong, Zheng, and Guo (CN Publication No. 112215181 A), hereinafter Guo.
Regarding claim 7 (dependent on claim 1), Kim in view of ‘937, Rosebrock, Xiong, and Zheng fails to teach further comprising: selecting the image data for utilization with the machine-learning model based upon a relative prevalence of a corresponding image class in a machine-learning database.
However, Guo teaches selecting the image data for utilization with the machine-learning model based upon a relative prevalence of a corresponding image class in a machine-learning database (Guo, para 16: “the rare category is a category divided according to social attributes as follows: the number of data samples under the category is no more than 15%, preferably 13%, and more preferably 10% of the number of original data samples”; para 64: “data samples of rare categories may be input into a generative model for training to form additional data samples that only include the rare categories”).
Guo teaches a method and system for data augmentation intended for intelligent driving systems. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the image selection method of Guo with the method of Kim in view of ‘937, Rosebrock, Xiong, and Zheng in order to improve the machine learning model reliability by increasing the number of training images in rare classes (Guo, para 15: “by targeted generation of rare categories among vulnerable traffic participants, the disparity in the number of rare categories and abundant categories in image classification can be reduced”; para 4: “The accuracy and richness of its information are closely related to the safety and reliability of the car”).
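For illustration only, a minimal sketch of selecting image data from classes with low relative prevalence; the 15% threshold mirrors Guo's example in para 16, while the data structures and function name are assumptions.

# Illustrative sketch only: keep images belonging to "rare" classes, i.e.,
# classes whose share of the dataset does not exceed the threshold.
from collections import Counter

def select_rare_class_images(images, labels, threshold: float = 0.15):
    counts = Counter(labels)
    total = len(labels)
    rare_classes = {c for c, n in counts.items() if n / total <= threshold}
    return [img for img, lab in zip(images, labels) if lab in rare_classes]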
Claims 8-12 and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of ‘937, in further view of Rosebrock and Xiong.
Regarding claim 8, Kim teaches a system of generating images for training a machine-learning model (Kim, abstract on pg. 9462: “one-shot classification with prototypical images as a single training example for each novel class”; see also combination below), the system comprising:
image data corresponding to the captured image (Kim, first paragraph on pg. 9465: “real image inputs”); and the following operations that a processor is programmed to perform (the processor itself is taught by ‘937 below):
alter a portion of the image data corresponding to a style of the image to create altered image data (Kim, section “3.3. Data augmentation”, bottom of pg. 9465 to pg. 9466: “We apply random rotation and horizontal flipping to both the real images and prototypes”; the “style” altered is the orientation of the image), encode, via a machine-learning model (Kim, Variational Prototyping-Encoder, section “3.1. Variational Prototyping-Encoder”, pg. 9464), the altered image data into a first latent space (Kim, Figure 2 caption on pg. 9463: “the encoder encodes real domain input images to latent distribution q(z|x)”), decode the first latent space (Kim, see Figure 2 caption citation below) and retrieve, from a database, an associated prototype image that is an unaltered, pre-existing image stored in the database (Kim, Figure 2 on pg. 9463, prototype database contains stored images; these images are pre-existing prior to retrieval and unaltered after generation) and represents the altered image based on the decoded first latent space (Kim, Figure 2 caption on pg. 9463: “The decoder then reconstructs the encoded distribution back to a prototype that corresponds to the input image”; see how decoded prototypes are stored in the prototype database and retrieved in the Test Phase of Figure 2).
Kim fails to explicitly teach an image sensor and processor, and therefore fails to teach: an image sensor configured to capture an image and generate image data corresponding to the captured image; and a processor in communication with the image sensor and programmed to (programmed actions taught above):
extract style encodings from the first latent space to classify a style of the altered image data in a second latent space, wherein the extraction is performed via using a content and style decoupling model trained using soft labels that capture style attributes including at least one of lighting conditions, blurriness, illumination, deformation, or another transformation, and
generate a new image by executing a pre-trained reconstructor model based on the first latent space, the second latent space, and the prototype image.
However, ‘937 teaches an image sensor configured to capture an image and generate image data corresponding to the captured image (‘937, part of an autonomous vehicle, para 173: “image sensors”; para 149: “one or more camera(s) (e.g., all cameras) may record and provide image data (e.g., video) simultaneously”) and wherein a processor is in communication with the image sensor (‘937, para 173: “images sensors”; para 149: “one or more of camera(s) may be used to perform advanced driver assistance systems (“ADAS”) functions; see Fig. 10C, processor is in communication with cameras);
extract style encodings (‘937, para 74: spatially-independent, or “theme information”) from the first latent space (‘937, para 74: spatially-dependent, or “content information”) to classify a style of the altered image data in a second latent space (‘937, para 78: “an image encoder component of driving neural simulator 310 may separate a latent space of a given image into a spatially-independent latent space and a spatially-dependent latent space. A spatially-independent latent space may include theme information of given image, which includes information that does not depend on pixel locations such as a background color or weather of a scene”), wherein the extraction is performed via using a content and style decoupling model (‘937, image encoder that separates latent spaces, described in last citation; para 74: “pre-training a latent space to disentangle theme information from content information of an image”) trained using labels that capture style attributes including at least one of lighting conditions, blurriness, illumination, deformation, or another transformation (‘937, supervised training, para 71: “untrained neural network 206 is trained using supervised learning, wherein training dataset 202 includes an input paired with a desired output for an input”; lighting conditions – day vs. night, para 74: “identifying spatially-independent information referred to as a “theme” (e.g., such as weather, day vs. night, etc.)”; supervised training of the decoupling model involves content/theme labels to learn to disentangle the information in the images), and
generate a new image (‘937, para 82: “may be used for generating multiple variations of a real-world driving video sequence”) by executing a pre-trained reconstructor model (‘937, para 80: “a Generative Adversarial Networks (GAN) is used together with the encoder portion of the VAE”; para 113: “a neural network trained to…image reconstruction”) based on the first latent space and the second latent space (‘937, para 82: “variation of an original video may include a different weather in a scene of a simulation than a weather in an equivalent scene in original video”; see Figure 4A).
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the image sensor, processor, decoupled style encodings, and reconstructor model of ‘937 with the system of Kim in order to capture real-world driving images used to generate training images for a machine learning model in an autonomous vehicle (‘937, para 149: “one or more of camera(s) may be used to perform advanced driver assistance systems (“ADAS”) functions”; para 236: “server(s) 1078 may receive, over network(s) 1090 and from vehicles, image data representative of images showing unexpected or changed road conditions”; para 237: “server(s) 1078 may be used to train machine learning models (e.g., neural networks) based at least in part on training data. In at least one embodiment, training data may be generated by vehicles, and/or may be generated in a simulation”; regarding style encodings and reconstructor model- para 82: “driving neural simulator 310 may be used for generating rich and diverse training datasets that can be used for training neural networks in autonomous driving vehicles”) and execute the programmed instructions of an autonomous vehicle machine learning model (‘937, servers, described above, include CPUs, see Fig. 10D). Kim’s publication utilizes the latent space relationships between images and prototypes to classify real data (Kim, last paragraph of pg. 1 to first paragraph of pg. 2) and references the importance of diversifying training and prototype datasets (Kim, section 3.3 on pg. 4-5 and second paragraph in section 1 on pg. 1: “Thereby, the absence of a large number of training examples for a class often raises an issue when training a large capacity learner”). The combination of Kim and the reconstructor model, as taught by ‘937, allows for the generation of additional road sign images, with further diversification by altering the style of images.
Additionally, Rosebrock teaches a machine learned model trained using soft labels (Rosebrock, pg. 2, regarding Label smoothing: “Turns “hard” class label assignments to “soft” label assignments”; see examples of soft labels for an image at the top of pg. 5). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the use of soft labels for training, taught by Rosebrock, with the system of Kim in view of ‘937 in order to reduce overfitting and improve the performance of the trained model (Rosebrock, pg. 17: “The ultimate goal of applying regularization when training our deep neural networks is to reduce overfitting and increase the ability of our model to generalize. Typically we achieve this goal by sacrificing training loss/accuracy during training time in hopes of a better generalizable model — that’s the exact behavior we’re seeing here”).
While ‘937 teaches the combining of content and style latent spaces to generate a style-altered image, Xiong teaches execution of a pre-trained reconstructor model (Xiong, 1st bullet point on pg. 126653: “We propose a novel image-to-image translation model by combining GAN and latent space learning, our model can generate both realistic and diverse images”) based on a latent space and a prototype image (Xiong, See Figure 4 from pg. 126655 in claim 1; section C on pg. 126655: “we encode the possible multiple outputs in the latent space and combine the latent code with the given image as the input of the generator”; see further Figure 5 on pg. 126657 where images are input to the model CEGAN and output images are generated). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the image reconstructor model based on latent space and a prototype image, taught by Xiong, with the system of Kim in view of ‘937 and Rosebrock in order to generate images that are influenced by both the prototype image and latent space, resulting in realistic and diverse generated images (Xiong, Conclusion on pg. 126660: “generate both realistic and diversity images. This method captures the full distribution of potential multiple modes of results by enforcing tight connections between the latent space and the real image space”). In the combination of Kim in view of ‘937, Rosebrock, and Xiong, a variety of realistic images with diverse styles, that closely resemble the selected prototype image, can be generated. A person having ordinary skill in the art, before the effective filing date of the claimed invention, could have applied the known technique, as taught by Xiong, in the same way to the system of Kim in view of ‘937 and Rosebrock and achieved predictable results of further diversifying a machine learning training dataset.
Regarding claim 9 (dependent on claim 8), Kim in view of ‘937, Rosebrock, and Xiong teaches wherein the processor is further programmed to:
train an image-recognition machine-learning model using the new image generated from the pre-trained reconstructor model to produce a trained image-recognition machine-learning model (‘937, para 82: “driving neural simulator 310 may be used for generating rich and diverse training datasets that can be used for training neural networks in autonomous driving vehicles”; para 151: “front-facing cameras may also be used for ADAS functions and systems including, without limitation, Lane Departure Warnings (“LDW”), Autonomous Cruise Control (“ACC”), and/or other functions such as traffic sign recognition”; para 237: “server(s) 1078 may be used to train machine learning models (e.g., neural networks) based at least in part on training data. In at least one embodiment, training data may be generated by vehicles, and/or may be generated in a simulation”).
Regarding claim 10 (dependent on claim 8), Kim in view of ‘937, Rosebrock, and Xiong further teaches wherein the first latent space includes the style encodings and categorical content encodings (Kim, encodings are not separated when the real image input is encoded in both the training and test phase, see q(z|x) in Figure 2), and wherein the second latent space does not include the categorical content encodings (‘937, latent spaces are separated, so the spatially-independent latent space does not include the spatially-dependent content latent space, see para 78: “may separate a latent space of a given image into a spatially-independent latent space and a spatially-dependent latent space. A spatially-independent latent space may include theme information of the given image”).
Regarding claim 11 (dependent on claim 8), Kim in view of ‘937, Rosebrock, and Xiong further teaches wherein the style encodings include data representing blurriness of the image, orientation of the image, brightness of the image, or deformation of the image (‘937, brightness of the image; para 74: “as a ‘theme’ (e.g., such as weather, day vs. night, etc.)”; para 78: “information that does not depend on pixel locations such as a background color or weather of a scene”).
Regarding claim 12 (dependent on claim 8), Kim in view of ‘937, Rosebrock, and Xiong teaches wherein:
the image sensor is mounted on a vehicle (‘937, para 150: “one or more camera may be mounted in a mounting assembly,”; para 173: “image sensors” of vehicle 1000),
the image captured is of a road sign (Kim, Figure 2; third paragraph on pg. 9463: “traffic sign datasets”), and
the generated new image differs in style from the captured image (‘937, para 82: “variation of an original video may include a different weather in a scene of a simulation than a weather in an equivalent scene in original video”).
Regarding claim 15, Kim teaches a method of training a machine-learning model with newly generated images to yield a trained machine-learning model (Kim, abstract on pg. 9462: “one-shot classification with prototypical images as a single training example for each novel class”; see also combination below), the method comprising:
receiving image data corresponding to an image (Kim, first paragraph on pg. 9465: “real image inputs”);
altering a portion of the image data corresponding to a style of the image, wherein the altering produces altered image data (Kim, section “3.3. Data augmentation”, bottom of pg. 9465 to pg. 9466: “We apply random rotation and horizontal flipping to both the real images and prototypes”; the “style” altered is the orientation of the image);
encoding the altered image data into a first latent space (Kim, Figure 2 caption on pg. 9463: “the encoder encodes real domain input images to latent distribution q(z|x)”);
selecting, from a database, a prototype image that is an unaltered, pre-existing image stored in the database (Kim, Figure 2 on pg. 9463, prototype database contains stored images; these images are pre-existing prior to retrieval and unaltered after generation) and corresponds to the altered image based on a decoding of the first latent space (Kim, Figure 2 caption on pg. 9463: “The decoder then reconstructs the encoded distribution back to a prototype that corresponds to the input image”; see how decoded prototypes are stored in the prototype database and retrieved in the Test Phase of Figure 2);
Kim fails to teach wherein the image is captured by an image sensor;
extracting style encodings from the first latent space to classify a style of the altered image data in a second latent space, wherein the extracting uses a content and style decoupling model trained using soft labels that capture style attributes including at least one of lighting conditions, blurriness, illumination, deformation or another transformation;
generating a new image by executing a pre-trained reconstructor model based on the first latent space, the second latent space, and the prototype image; and
training an image-recognition machine-learning model using the new image generated from the pre-trained reconstructor model to yield a trained image-recognition machine-learning model.
However, ‘937 teaches an image sensor (‘937, part of an autonomous vehicle, para 173: “image sensors”);
extracting style encodings (‘937, para 74: spatially-independent, or “theme information”) from the first latent space (‘937, para 74: spatially-dependent, or “content information”) to classify a style of the altered image data in a second latent space (‘937, para 78: “an image encoder component of driving neural simulator 310 may separate a latent space of a given image into a spatially-independent latent space and a spatially-dependent latent space. A spatially-independent latent space may include theme information of given image, which includes information that does not depend on pixel locations such as a background color or weather of a scene”), wherein the extracting uses a content and style decoupling model (‘937, image encoder that separates latent spaces, described in last citation; para 74: “pre-training a latent space to disentangle theme information from content information of an image”) trained using labels that capture style attributes including at least one of lighting conditions, blurriness, illumination, deformation, or another transformation (‘937, supervised training, para 71: “untrained neural network 206 is trained using supervised learning, wherein training dataset 202 includes an input paired with a desired output for an input”; lighting conditions – day vs. night, para 74: “identifying spatially-independent information referred to as a “theme” (e.g., such as weather, day vs. night, etc.)”; supervised training of the decoupling model involves content/theme labels to learn to disentangle the information in the images);
generating a new image (‘937, para 82: “may be used for generating multiple variations of a real-world driving video sequence”) by executing a pre-trained reconstructor model (‘937, para 80: “a Generative Adversarial Networks (GAN) is used together with the encoder portion of the VAE”; para 113: “a neural network trained to…image reconstruction”) based on the first latent space and the second latent space (‘937, para 82: “variation of an original video may include a different weather in a scene of a simulation than a weather in an equivalent scene in original video”; See also Figure 4A); and
training an image-recognition machine-learning model using the new image generated from the pre-trained reconstructor model to yield a trained image-recognition machine-learning model (‘937, para 82: “driving neural simulator 310 may be used for generating rich and diverse training datasets that can be used for training neural networks in autonomous driving vehicles”; para 151: “front-facing cameras may also be used for ADAS functions and systems including, without limitation, Lane Departure Warnings (“LDW”), Autonomous Cruise Control (“ACC”), and/or other functions such as traffic sign recognition”; para 237: “server(s) 1078 may be used to train machine learning models (e.g., neural networks) based at least in part on training data. In at least one embodiment, training data may be generated by vehicles, and/or may be generated in a simulation”).
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the image sensor, style encodings, and reconstructor model/training method of ‘937 with the method of Kim in order to capture real-world driving images used to generate training images for a machine learning model in an autonomous vehicle (‘937, para 149: “one or more of camera(s) may be used to perform advanced driver assistance systems (“ADAS”) functions”; para 236: “server(s) 1078 may receive, over network(s) 1090 and from vehicles, image data representative of images showing unexpected or changed road conditions”; para 237: “server(s) 1078 may be used to train machine learning models (e.g., neural networks) based at least in part on training data. In at least one embodiment, training data may be generated by vehicles, and/or may be generated in a simulation”; regarding style encodings and reconstructor model- para 82: “driving neural simulator 310 may be used for generating rich and diverse training datasets that can be used for training neural networks in autonomous driving vehicles”). Kim’s publication utilizes the latent space relationships between images and prototypes to classify real data (Kim, last paragraph of pg. 1 to first paragraph of pg. 2) and references the importance of diversifying training and prototype datasets (Kim, section 3.3 on pg. 4-5 and second paragraph in section 1 on pg. 1: “Thereby, the absence of a large number of training examples for a class often raises an issue when training a large capacity learner”). The combination of Kim and the reconstructor model, as taught by ‘937, allows for the generation of additional road sign images, with further diversification by altering the style of images.
Additionally, Rosebrock teaches a machine learned model trained using soft labels (Rosebrock, pg. 2, regarding Label smoothing: “Turns “hard” class label assignments to “soft” label assignments”; see examples of soft labels for an image at the top of pg. 5). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the use of soft labels for training, taught by Rosebrock, with the method of Kim in view of ‘937 in order to reduce overfitting and improve the performance of the trained model (Rosebrock, pg. 17: “The ultimate goal of applying regularization when training our deep neural networks is to reduce overfitting and increase the ability of our model to generalize. Typically we achieve this goal by sacrificing training loss/accuracy during training time in hopes of a better generalizable model — that’s the exact behavior we’re seeing here”).
While ‘937 teaches the combining of content and style latent spaces to generate a style-altered image, Xiong teaches execution of a pre-trained reconstructor model (Xiong, 1st bullet point on pg. 126653: “We propose a novel image-to-image translation model by combining GAN and latent space learning, our model can generate both realistic and diverse images”) based on a latent space and a prototype image (Xiong, See Figure 4 from pg. 126655 below; section C on pg. 126655: “we encode the possible multiple outputs in the latent space and combine the latent code with the given image as the input of the generator”; see further Figure 5 on pg. 126657 where images are input to the model CEGAN and output images are generated). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the image reconstructor model based on latent space and a prototype image, taught by Xiong, with the method of Kim in view of ‘937 and Rosebrock in order to generate images that are influenced by both the prototype image and latent space, resulting in realistic and diverse generated images (Xiong, Conclusion on pg. 126660: “generate both realistic and diversity images. This method captures the full distribution of potential multiple modes of results by enforcing tight connections between the latent space and the real image space”). In the combination of Kim in view of ‘937, Rosebrock, and Xiong, a variety of realistic images with diverse styles, that further closely resemble the selected prototype image, can be generated. A person having ordinary skill in the art, before the effective filing date of the claimed invention, could have applied the known technique, as taught by Xiong, in the same way to the method of Kim in view of ‘937 and Rosebrock and achieved predictable results of further diversifying a machine learning training dataset.
Regarding claim 16 (dependent on claim 15), Kim in view of ‘937, Rosebrock, and Xiong further teaches wherein the first latent space includes the style encodings and categorical content encodings (Kim, encodings are not separated when the real image input is encoded in both the training and test phase, see q(z|x) in Figure 2), and wherein the second latent space does not include the categorical content encodings (‘937, latent spaces are separated, so the spatially-independent latent space does not include the spatially-dependent content latent space, see para 78: “may separate a latent space of a given image into a spatially-independent latent space and a spatially-dependent latent space. A spatially-independent latent space may include theme information of the given image”).
Regarding claim 17 (dependent on claim 15), Kim in view of ‘937, Rosebrock, and Xiong further teaches wherein the style encodings include data representing blurriness of the image, orientation of the image, brightness of the image, or deformation of the image (‘937, brightness of the image; para 74: “as a ‘theme’ (e.g., such as weather, day vs. night, etc.)”; para 78: “information that does not depend on pixel locations such as a background color or weather of a scene”).
Regarding claim 18 (dependent on claim 15), Kim in view of ‘937, Rosebrock, and Xiong teaches wherein:
the image sensor is mounted on a vehicle (‘937, para 150: “one or more camera may be mounted in a mounting assembly,”; para 173: “image sensors” of vehicle 1000),
the image captured is of a road sign (Kim, Figure 2; third paragraph on pg. 9463: “traffic sign datasets”), and
the generated new image differs in style from the captured image of the road sign (‘937, para 82: “variation of an original video may include a different weather in a scene of a simulation than a weather in an equivalent scene in original video”).
Claims 13 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of ‘937, in further view of Rosebrock, Xiong, and Zheng.
Regarding claim 13 (dependent on claim 8), Kim in view of ‘937, Rosebrock, and Xiong fails to explicitly teach wherein the pre-trained reconstructor model utilizes a contrastive loss and a perceptual loss when generating the new image (‘937 teaches a perceptual loss, para 83: “for reconstruction term, a perceptual distance between input image 410 and output image 445 is reduced”).
However, Zheng teaches a deep learning method for generating augmented images (Zheng, removing rain from images, see abstract). Zheng teaches utilizing both a contrastive loss (Zheng, “Perceptual Contrastive Loss” section on pg. 56) and perceptual loss (Zheng, “Learned Perceptual Image Similarity Loss” section on pg. 56). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the contrastive and perceptual loss values of Zheng with the system of Kim in view of ‘937, Rosebrock, and Xiong in order to generate an image that still closely resembles the road sign prototype when changing the style (Zheng, third bullet point on the right column of pg. 53: “With the advantage of contrastive learning and perceptual similarity, the derained image is close to the groundtruth in terms of pixel-wise difference and fine details”).
Regarding claim 19 (dependent on claim 15), Kim in view of ‘937, Rosebrock, and Xiong fails to explicitly teach wherein the pre-trained reconstructor model utilizes a contrastive loss and a perceptual loss when generating the new image (‘937 teaches a perceptual loss, para 83: “for reconstruction term, a perceptual distance between input image 410 and output image 445 is reduced”).
However, Zheng teaches a deep learning method for generating augmented images (Zheng, removing rain from images, see abstract). Zheng teaches utilizing both a contrastive loss (Zheng, “Perceptual Contrastive Loss” section on pg. 56) and perceptual loss (Zheng, “Learned Perceptual Image Similarity Loss” section on pg. 56). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the contrastive and perceptual loss values of Zheng with the method of Kim in view of ‘937, Rosebrock, and Xiong in order to generate an image that still closely resembles the road sign prototype when changing the style (Zheng, third bullet point on the right column of pg. 53: “With the advantage of contrastive learning and perceptual similarity, the derained image is close to the groundtruth in terms of pixel-wise difference and fine details”).
Claims 14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of ‘937, Rosebrock, Xiong, and Guo.
Regarding claim 14 (dependent on claim 8), Kim in view of ‘937, Rosebrock, and Xiong fails to teach wherein the processor is further programmed to: select the image data for utilization with the machine-learning model based upon a relative prevalence of a corresponding image class in a machine-learning database.
However, Guo teaches select the image data for utilization with the machine-learning model based upon a relative prevalence of a corresponding image class in a machine-learning database (Guo, para 16: “the rare category is a category divided according to social attributes as follows: the number of data samples under the category is no more than 15%, preferably 13%, and more preferably 10% of the number of original data samples”; para 64: “data samples of rare categories may be input into a generative model for training to form additional data samples that only include the rare categories”).
Guo teaches a method and system for data augmentation intended for intelligent driving systems. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the image selection method of Guo with the system of Kim in view of ‘937, Rosebrock, and Xiong in order to improve the machine learning model reliability by increasing the number of training images in rare classes (Guo, para 15: “by targeted generation of rare categories among vulnerable traffic participants, the disparity in the number of rare categories and abundant categories in image classification can be reduced”; para 4: “The accuracy and richness of its information are closely related to the safety and reliability of the car”).
Regarding claim 20 (dependent on claim 15), Kim in view of ‘937, Rosebrock, and Xiong fails to teach further comprising: selecting the image data for utilization with the machine-learning model based upon a relative prevalence of a corresponding image class in a machine-learning database.
However, Guo teaches selecting the image data for utilization with the machine-learning model based upon a relative prevalence of a corresponding image class in a machine-learning database (Guo, para 16: “the rare category is a category divided according to social attributes as follows: the number of data samples under the category is no more than 15%, preferably 13%, and more preferably 10% of the number of original data samples”; para 64: “data samples of rare categories may be input into a generative model for training to form additional data samples that only include the rare categories”).
Guo teaches a method and system for data augmentation intended for intelligent driving systems. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the image selection method of Guo with the method of Kim in view of ‘937, Rosebrock, and Xiong in order to improve the machine learning model reliability by increasing the number of training images in rare classes (Guo, para 15: “by targeted generation of rare categories among vulnerable traffic participants, the disparity in the number of rare categories and abundant categories in image classification can be reduced”; para 4: “The accuracy and richness of its information are closely related to the safety and reliability of the car”).
Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure:
Oliveira Pinheiro (U.S. Patent Application Publication No. 2019/0325299 A1) teaches retrieving a similar prototype image from a database (para 18: “receiving said input image and extracting features of said input image; a database of pre-extracted features of prototype representations, each prototype representation being representative of a category of images; feature comparison block for receiving said features of said input image from said first convolutional neural network and for receiving said features of said prototype representations from said database, said feature comparison block also being for comparing features of said input image and features of said prototype representations to determine which of said prototype representations is most similar to said input image”).
Tan et al. (U.S. Patent Application Publication No. 2022/0156910 A1) teaches a similar method:
[Tan, reproduced drawing (greyscale)]
Park et al. (U.S. Patent Application Publication No. 2021/0358177 A1) teaches a style transfer method (abstract) using a contrastive loss (para 70) and a perceptual loss (para 63).
Luan et al. (Luan, F., Paris, S., Shechtman, E., & Bala, K. (2017). Deep photo style transfer. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4990-4998).) teaches a style transfer method.
Kafri et al. (Kafri, O., Patashnik, O., Alaluf, Y., & Cohen-Or, D. (2021). Stylefusion: A generative model for disentangling spatial segments. arXiv preprint arXiv:2107.07437.) teaches generating augmented images using style latent spaces (Hair, eyes, etc. input style codes in FIG 2) and a prototype image (Global input image whose latent code controls the spatial layout of the generated image, see section 4.5 on pg. 8).
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EMMA E DRYDEN whose telephone number is (571)272-1179. The examiner can normally be reached M-F 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ANDREW BEE can be reached at (571) 270-5183. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EMMA E DRYDEN/
Examiner, Art Unit 2677

/ANDREW W BEE/
Supervisory Patent Examiner, Art Unit 2677