Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 10/21/2025 has been entered.
Response to Remarks
Claim Rejections – 35 U.S.C. § 103
Applicant’s prior art arguments have been fully considered and are persuasive with respect to the references as previously applied; however, they are moot in view of the new ground of rejection set forth below.
Applicant argues (pgs. 15-18) that Fu teaches neither differential data nor correlation, and that the cited references do not teach the newly amended limitations, which further clarify "operation as a difference calculation unit configured to calculate, with respect to each of the learning data sets, a difference between the acquired background data and the training data, by calculating pixel-wise correlations between an object region of the training data and a corresponding region of the acquired background data, to generate differential data that indicates the difference between the background data and the training data;" and "operation as a first training unit configured to execute machine learning of an estimator, the execution of the machine learning of the estimator comprising training the estimator using the differential data and the correct answer data so that, with respect to each of the learning data sets, a result of estimating the feature by the estimator based on the generated differential data conforms to the correct answer data, wherein the estimator is trained using the differential data as input features, different from the training data, in combination with the correct answer data."
Examiner agrees. Accordingly, a new reference, Arnab et al. (“Pixelwise Instance Segmentation with a Dynamically Instantiated Network”), has been added to the rejection, as detailed below.
Because a new reference has been added to the rejection, the motivation to combine has also changed (see rejection below); this addresses Applicant’s argument of “No Motivation or Rationale to Combine” (pgs. 13-14).
The foregoing applies to all independent claims and their dependent claims.
Claim Rejections – 35 U.S.C. § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-6 and 8-20 are rejected under 35 U.S.C. 103 as being unpatentable over Fu et al. (US 20190295302 A1), hereinafter Fu, in view of Gupta et al. (“Automatic Trimap Generation for Image Matting”), hereinafter Gupta, and further in view of Arnab et al. (“Pixelwise Instance Segmentation with a Dynamically Instantiated Network”), hereinafter Arnab.
Regarding independent claim 1, Fu teaches:
…
operation as a background acquisition unit configured to acquire, with respect to each of the learning data sets, background data that indicates a background of the training data; (Fu [¶ 0092]: “Based on extracted 68-point landmarks, semantic facial segmentations consisting of eyes, nose, mouth, skin, and background regions were generated.” Fu teaches that the segmentation that is acquired consists of background regions. This background region is the background of the training data images.)
…
…
Fu does not explicitly teach:
A model generation apparatus comprising a processor configured with a program to perform operations comprising: operation as a first data acquisition unit configured to acquire a plurality of learning data sets each constituted by a combination of training data that comprises image data, and correct answer data that indicates a feature comprised in the training data;
However, Gupta teaches:
A model generation apparatus comprising a processor configured with a program to perform operations comprising: operation as a first data acquisition unit configured to acquire a plurality of learning data sets each constituted by a combination of training data that comprises image data, and correct answer data that indicates a feature comprised in the training data; (Gupta [Page 9, Paragraph 5]: “We implemented this framework in MATLAB on a PC with Intel i5-4460s 2.9 GHz processor and 12 GB RAM.” Gupta teaches a processor that can run the program for the acquisition of image data. Gupta [Page 6, Paragraph 4]: “Consider an input image I” Gupta [Page 8, Paragraph 1]: “This difference map is then added to the eroded saliency map SMe, which results into a trimap (TM)” Gupta teaches that following the input image, the computations result in a trimap, which is a map that separates the foreground from the background. Gupta [Page 9, Paragraph 1]: “We compare the trimaps generated by the proposed framework with the manually created trimaps.” Gupta teaches that the framework-created trimaps are compared with the manually created trimaps, showing that the manually created trimaps are the correct answer data.)
Fu and Gupta are in the same field of endeavor as the present invention, as both references are directed to the analysis of features in images that are separated into foreground and background. It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art, to combine gathering background data of images as training data, as taught in Fu, with using training data that comprises images paired with correct answer data, as taught in Gupta, which provides this additional functionality. The combination would allow the accuracy of the feature estimation to be evaluated by comparison with the correct answer image data, with the potential benefit of producing a more precise estimator, since the estimator can be trained toward the correct answer available in the training data.
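For illustration only, and not as a characterization of Gupta's actual implementation, the erosion-based trimap construction quoted above can be sketched as follows; the function name, the structuring-element radius, and the assumption of a binary saliency mask as input are hypothetical:

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def trimap_from_saliency(mask: np.ndarray, radius: int = 5) -> np.ndarray:
    """Sketch of a trimap (0 = background, 128 = unknown, 255 = foreground)
    built from a binary saliency mask by erosion/dilation, in the spirit of
    Gupta's eroded-saliency-map construction."""
    structure = np.ones((2 * radius + 1, 2 * radius + 1), dtype=bool)
    sure_fg = binary_erosion(mask.astype(bool), structure)    # eroded core: confident foreground
    maybe_fg = binary_dilation(mask.astype(bool), structure)  # dilated hull: possible foreground
    trimap = np.zeros(mask.shape, dtype=np.uint8)             # everything else: background
    trimap[maybe_fg] = 128                                    # band around the object: unknown
    trimap[sure_fg] = 255                                     # definite foreground
    return trimap
```

A subsequent matting step would then resolve the unknown band into foreground or background.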
Fu and Gupta do not explicitly teach:
operation as a difference calculation unit configured to calculate, with respect to each of the learning data sets, a difference between the acquired background data and the training data by calculating pixel-wise correlations between an object region of the training data and a corresponding region of the acquired background data, to generate differential data that indicates the difference between the background data and the training data;
and operation as a first training unit configured to execute machine learning of an estimator, the execution of the machine learning of the estimator comprising training the estimator using the differential data and the correct answer data so that, with respect to each of the learning data sets, a result of estimating the feature by the estimator based on the generated differential data conforms to the correct answer data, wherein the estimator is trained using the differential data as input features, different from the training data, in combination with the correct answer data.
However, Arnab teaches:
operation as a difference calculation unit configured to calculate, with respect to each of the learning data sets, a difference between the acquired background data and the training data by calculating pixel-wise correlations between an object region of the training data and a corresponding region of the acquired background data, to generate differential data that indicates the difference between the background data and the training data; (Arnab [Page 444, Column 2, Paragraph 4]: “a prior on the expected shape of an object category can help us to identify the foreground instance within a bounding box” Arnab teaches that separate from the training data, an idea of the shape of the object (and thus in the foreground) can help identify the object from the background. Arnab [Page 445, Column 1, Paragraph 1]: “select the shape prior which matches the segmentation prediction for the detected class within the bounding box, Q_B_k (l_k), the best according to the normalized cross correlation” Arnab teaches that the cross correlation between the object and the surrounding area is used, which is a pixel-wise correlation as it is a pixel-by-pixel operation.)
and operation as a first training unit configured to execute machine learning of an estimator, the execution of the machine learning of the estimator comprising training the estimator using the differential data and the correct answer data so that, with respect to each of the learning data sets, a result of estimating the feature by the estimator based on the generated differential data conforms to the correct answer data, wherein the estimator is trained using the differential data as input features, different from the training data, in combination with the correct answer data. (Arnab [Page 445, Column 1, Paragraph 4]: “The pairwise term consists of densely-connected Gaussian potentials and encourages appearance and spatial consistency. The weights governing the importance of these terms are also learnt via backpropagation. We find that these priors are useful in the case of instance segmentation as well, since nearby pixels that have similar appearance often belong to the same object instance. They are often able to resolve occlusions based on appearance differences between objects of the same class” Arnab teaches that, in addition to the training data, the priors described above are used to train the model to estimate the segmentation. Arnab [Page 445, Figure 6]: Arnab depicts the ground truth, prediction, and matched truth in this figure.)
Arnab is in the same field as the present invention, since it is directed to estimating where the objects of interest are in an image using machine learning models. It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art, to combine the precise estimation of features evaluated against correct answer image data, as taught in Fu as modified by Gupta, with using pixel-wise correlations to improve the model, as taught in Arnab, which provides this additional functionality. The combination would allow the difference between the object and the background of the image to be quantified as a pixel-wise correlation, with the potential benefit of improving the accuracy of the model when training and when testing the prediction against the ground truth.
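As a purely illustrative sketch of windowed, pixel-wise normalized cross-correlation of the general kind relied upon above (not Arnab's shape-prior matching code, and not Applicant's claimed difference calculation unit), the following assumes grayscale inputs of equal size; the function name and the window half-width are hypothetical:

```python
import numpy as np

def local_ncc_difference(img: np.ndarray, bg: np.ndarray, win: int = 3) -> np.ndarray:
    """For each pixel, compute the normalized cross-correlation (NCC)
    between the (2*win+1)^2 window of the training image and the
    corresponding window of the background image, and return 1 - NCC
    as 'differential data' (near 0 where the images agree, larger
    where the object differs from the background)."""
    h, w = img.shape
    ip = np.pad(img.astype(np.float64), win, mode="reflect")
    bp = np.pad(bg.astype(np.float64), win, mode="reflect")
    out = np.zeros((h, w))
    eps = 1e-8  # guards against division by zero in flat windows
    for y in range(h):
        for x in range(w):
            a = ip[y:y + 2 * win + 1, x:x + 2 * win + 1]
            b = bp[y:y + 2 * win + 1, x:x + 2 * win + 1]
            a = a - a.mean()
            b = b - b.mean()
            ncc = (a * b).sum() / (np.sqrt((a ** 2).sum() * (b ** 2).sum()) + eps)
            out[y, x] = 1.0 - ncc
    return out
```

Pixels where the training image matches the background give NCC near 1 and a differential value near 0; pixels inside the object region, where the two images disagree, give a lower NCC and hence a larger differential value.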
Regarding dependent claim 2, Fu, Gupta, and Arnab teach:
The model generation apparatus according to claim 1,
Fu teaches:
wherein the processor configured with the program to perform operations such that operation as the background acquisition unit comprises generating the background data for the training data with respect to each of the learning data sets, using a machine learned generator. (Fu [¶ 0122]: “segmentor network, according to an embodiment, attempts to determine semantic segmentations on both real images and fake images to deliver estimated segmentations to guide the generator in synthesizing spatially constrained images” Fu teaches a segmentor network, which generates a segmentation of image inputs, which includes data that separates the foreground from the background. This network is learned using unsupervised learning and can provide a segmentation for the training data set of images.)
The reasons to combine are substantially similar to those of claim 1.
Regarding dependent claim 3, Fu, Gupta, and Arnab teach:
The model generation apparatus according to claim 2,
Fu teaches:
wherein the processor configured with the program to perform operations further comprising: operation as a second data acquisition unit configured to acquire learning background data; (Fu [¶ 0123]: “to avoid foreground-background mismatch, the generator network is configured to first, extract spatial information from an input segmentation” Fu teaches a generator network that receives and extracts information from the foreground-background segmentation.)
and operation as a second training unit configured to execute machine learning using the acquired learning background data, and construct the machine learned generator trained to generate the background data for the training data. (Fu [¶ 0123]: “In another embodiment, to avoid foreground-background mismatch, the generator network is configured to first, extract spatial information from an input segmentation, second, concatenate that latent vector to provide variations, and third, use attribute labels to synthesize attribute-specific contents in the generated image.” Fu teaches that the generator network uses the segmentation, which is the background data, to assign attributes to the contents of the image. This is a machine learning task that generates the background data for the training data.)
The reasons to combine are substantially similar to those of claim 1.
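For illustration of the generator conditioning described in the Fu passage quoted above (a minimal sketch of assembling per-pixel generator input from a segmentation, a latent vector, and attribute labels; the function name, shapes, and one-hot encoding are assumptions, not Fu's disclosed network):

```python
from typing import Optional
import numpy as np

def generator_input(segmentation: np.ndarray, latent_dim: int = 64,
                    attributes: Optional[np.ndarray] = None,
                    rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Assemble a per-pixel input tensor in the spirit of Fu [0123]:
    spatial information from the segmentation map, a broadcast latent
    vector to provide variation, and broadcast attribute labels."""
    rng = rng or np.random.default_rng()
    h, w = segmentation.shape
    n_classes = int(segmentation.max()) + 1
    # One-hot spatial encoding of the segmentation: (h, w, n_classes).
    spatial = np.eye(n_classes, dtype=np.float32)[segmentation]
    # Latent vector broadcast to every pixel: (h, w, latent_dim).
    z = np.broadcast_to(
        rng.standard_normal(latent_dim).astype(np.float32), (h, w, latent_dim))
    parts = [spatial, z]
    if attributes is not None:  # e.g. hypothetical [hair_color, age, gender] labels
        parts.append(
            np.broadcast_to(attributes.astype(np.float32), (h, w, attributes.size)))
    return np.concatenate(parts, axis=-1)
```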
Regarding dependent claim 4, Fu, Gupta, and Arnab teach:
The model generation apparatus according to claim 1,
Arnab teaches:
wherein the processor configured with the program to perform operations such that operation as the difference calculation unit comprises generating the differential data by obtaining, based on the pixel-wise correlations between the object region comprising pixels of the training data and pixels surrounding these pixels, and the corresponding region comprising corresponding pixels of the background data and pixels surrounding these pixels, a difference between each of the pixels of the training data and a corresponding pixel of the background data. (Arnab [Page 444, Column 2, Paragraph 4]: “a prior on the expected shape of an object category can help us to identify the foreground instance within a bounding box” Arnab teaches that separate from the training data, an idea of the shape of the object (and thus in the foreground) can help identify the object from the background. Arnab [Page 445, Column 1, Paragraph 1]: “select the shape prior which matches the segmentation prediction for the detected class within the bounding box, Q_B_k (l_k), the best according to the normalized cross correlation” Arnab teaches that the cross correlation between the object and the surrounding area is used, which is a pixel-wise correlation as it is a pixel-by-pixel operation.)
The reasons to combine are substantially similar to those of claim 1.
Regarding dependent claim 5, Fu, Gupta, and Arnab teach:
The model generation apparatus according to claim 1,
Fu teaches:
wherein the feature relates to a foreground of the training data. (Fu [¶ 0092]: “Based on extracted 68-point landmarks, semantic facial segmentations consisting of eyes, nose, mouth, skin, and background regions were generated.” Fu teaches features relating to the foreground of the training data, such as the eyes, nose, mouth of the image.)
The reasons to combine are substantially similar to those of claim 1.
Regarding dependent claim 6, Fu, Gupta, and Arnab teach:
The model generation apparatus according to claim 1,
Fu teaches:
wherein the training data comprises image data comprising an image of an object, and the feature comprises an attribute of the object. (Fu [¶ 0040]: “given the input image 100 and target segmentation 101, the proposed SGGAN translates the input image 100 to various combinations of various attributes shown in the columns 103 (black hair and young), 104 (brown hair and old), and 105 (blonde hair and female)” Fu teaches that the input image data has features such as the color of the hair and the perceived age of the person in the image.)
The reasons to combine are substantially similar to those of claim 1.
Regarding independent claim 8, Fu teaches:
…
operation as a background acquisition unit configured to acquire object background data that corresponds to the object image data; (Fu [¶ 0092]: “Based on extracted 68-point landmarks, semantic facial segmentations consisting of eyes, nose, mouth, skin, and background regions were generated.” Fu teaches that the segmentation that is acquired consists of background regions. This background region is the background of the training data images.)
operation as a difference calculation unit configured to calculate a difference between the object image data and the object background data to generate object differential data; (Fu [¶ 0045]: “Then, based upon a segmentation loss, i.e., the difference between a segmentation determined from the generated image 274 and the target segmentation 271, the segmentor 260 is adjusted” Fu teaches that the difference between the target segmentation, which is the acquired background data, and the generated image, as training data, is calculated. This difference is denoted as the segmentation loss.)
operation as an estimation unit configured to estimate a feature comprised in the generated object differential data, using a machine learned estimator generated by the model generation apparatus according to claim 1; (Fu [¶ 0045]: “the segmentor 260 is adjusted, e.g., weights in a neural network implementing the segmentor 260 are modified so the segmentor 260 produces segmentations that are closer to the target segmentation 271. The generator 240 is likewise adjusted based upon the segmentation loss to generate images that are closer to the target segmentation 271. In this way, the segmentor 260 and generator 220 are trained collaboratively.” Fu teaches that the difference between the segmentor output and the target segmentation is what drives the training. This shows that the segmentor tends toward estimating the segmentation that conforms to the target, or correct answer.)
and operation as an output unit configured to output information relating to a result of estimating the feature. (Fu [¶ 0131]: “the segmentor 1380 takes either a fake or real image as input and outputs a segmentation result which is compared to the ground-truth segmentation to calculate segmentation loss” Fu teaches that the segmentor outputs the segmentation result, which is the segmentation of the features in the image. This result is used to calculate the segmentation loss.)
Gupta teaches:
An estimation apparatus comprising a processor configured with a program to perform operations comprising: operation as a data acquisition unit configured to acquire object image data; (Gupta [Page 9, Paragraph 5]: “We implemented this framework in MATLAB on a PC with Intel i5-4460s 2.9 GHz processor and 12 GB RAM.” Gupta teaches a processor that can run the program for the acquisition of image data. Gupta [Page 6, Paragraph 4]: “Consider an input image I” Gupta [Page 8, Paragraph 1]: “This difference map is then added to the eroded saliency map SMe, which results into a trimap (TM)” Gupta teaches that following the input image, the computations result in a trimap, which is a map that separates the foreground from the background. Gupta [Page 9, Paragraph 1]: “We compare the trimaps generated by the proposed framework with the manually created trimaps.” Gupta teaches that the framework-created trimaps are compared with the manually created trimaps, showing that the manually created trimaps are the correct answer data.)
The reasons to combine are substantially similar to those of claim 1.
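For illustration of the "segmentation loss" discussed in the Fu citations above (a minimal sketch; the array shapes and the choice of pixel-wise cross-entropy are assumptions, not Fu's disclosed implementation):

```python
import numpy as np

def segmentation_loss(pred_probs: np.ndarray, target: np.ndarray) -> float:
    """Mean pixel-wise cross-entropy between predicted class
    probabilities (H x W x C, each pixel's probabilities summing to 1)
    and an integer target segmentation map (H x W) -- one common way
    to score the difference between a determined segmentation and the
    target segmentation during training."""
    h, w, _ = pred_probs.shape
    eps = 1e-12  # avoids log(0)
    # Select the predicted probability of the correct class at every pixel.
    idx_y, idx_x = np.indices((h, w))
    correct = pred_probs[idx_y, idx_x, target]
    return float(-np.log(correct + eps).mean())
```

The gradient of such a quantity with respect to the segmentor weights would be what "adjusts" the segmentor toward the target segmentation, consistent with the Fu passage quoted above.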
Claim 9 is rejected on the same grounds under 35 U.S.C. 103 as claim 1, as they are substantially similar. Mutatis mutandis.
Claim 10 is substantially similar to claim 1, but has the following additional elements:
Regarding independent claim 10, Fu teaches:
A computer-readable medium, storing model generation program, which when read and executed, for causing a computer to perform operations comprising: (Fu [¶ 0191]: “If implemented in software, the software may be stored on any non-transient computer readable medium that is configured to enable a processor to load the software or subsets of instructions thereof” Fu teaches a computer readable medium that stores the instructions for the model generation.)
The reasons to combine are substantially similar to those of claim 1.
Claim 11 is rejected on the same grounds under 35 U.S.C. 103 as claim 4, as they are
substantially similar. Mutatis mutandis.
Claim 12 is rejected on the same grounds under 35 U.S.C. 103 as claim 4, as they are
substantially similar. Mutatis mutandis.
Claim 13 is rejected on the same grounds under 35 U.S.C. 103 as claim 5, as they are
substantially similar. Mutatis mutandis.
Claim 14 is rejected on the same grounds under 35 U.S.C. 103 as claim 5, as they are
substantially similar. Mutatis mutandis.
Claim 15 is rejected on the same grounds under 35 U.S.C. 103 as claim 5, as they are
substantially similar. Mutatis mutandis.
Claim 16 is rejected on the same grounds under 35 U.S.C. 103 as claim 6, as they are
substantially similar. Mutatis mutandis.
Claim 17 is rejected on the same grounds under 35 U.S.C. 103 as claim 6, as they are
substantially similar. Mutatis mutandis.
Claim 18 is rejected on the same grounds under 35 U.S.C. 103 as claim 6, as they are
substantially similar. Mutatis mutandis.
Claim 19 is rejected on the same grounds under 35 U.S.C. 103 as claim 8, as they are
substantially similar. Mutatis mutandis.
Claim 20 is rejected on the same grounds under 35 U.S.C. 103 as claim 8, as they are
substantially similar. Mutatis mutandis.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Fu in view of Gupta, further in view of Arnab, and further in view of Roberts et al. (“Deep Learning for Semantic Segmentation of Defects in Advanced STEM Images of Steels”), hereinafter Roberts.
Regarding dependent claim 7, Fu, Gupta, and Arnab teach:
The model generation apparatus according to claim 6,
Fu, Gupta, and Arnab do not explicitly teach:
wherein the object comprises a product, and the attribute of the object relates to a defect of the product.
However, Roberts teaches:
wherein the object comprises a product, and the attribute of the object relates to a defect of the product. (Roberts [Page 4, Paragraph 3]: “we explored several hybrid deep networks for pixel-wise semantic segmentation of the three defect features” Roberts [Page 6, Table 1]: Roberts teaches that the pixel-wise image segmentation of features are based on defects in a product, specifically in steels.)
Roberts is in the same field as the present invention, since it is directed to the analysis of features in an image that represents a product and a defect in the product. It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art, to combine estimating features using the difference between background image data and training data containing correct answer data, as taught in Fu as modified by Gupta and Arnab, with specifying that the feature is a defect of a product, as taught in Roberts, which provides this additional functionality. The combination would allow estimations to be made on possible defects in products using images of the product as input, with the potential benefit of improving quality control, as defects can be more easily identified and assessed. This may allow defective products to be either discarded or reworked so that they no longer have defects.
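As an illustrative note on how pixel-wise defect segmentations such as those in Roberts are conventionally scored against ground truth (a generic sketch; Roberts' own metrics and class layout are not reproduced here):

```python
import numpy as np

def per_class_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> list:
    """Intersection-over-union per class for integer-labeled segmentation
    maps, a standard metric for comparing a predicted defect segmentation
    with its ground truth."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        # A class absent from both maps yields NaN rather than a score.
        ious.append(float(inter) / union if union else float("nan"))
    return ious
```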
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KYU HYUNG HAN, whose telephone number is (703) 756-5529. The examiner can normally be reached Monday through Friday, 9:00 AM to 5:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Kyu Hyung Han/
Examiner
Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123