Prosecution Insights
Last updated: April 19, 2026
Application No. 18/300,202

SYSTEMS AND METHODS FOR IMPROVED TRAINING OF MACHINE LEARNING MODELS

Status: Non-Final OA (§103)
Filed: Apr 13, 2023
Examiner: ANSARI, TAHMINA N
Art Unit: 2674
Tech Center: 2600 — Communications
Assignee: Aligned AI Limited
OA Round: 1 (Non-Final)

Grant Probability: 86% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 8m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 86% — above average (743 granted / 868 resolved; +23.6% vs TC avg)
Interview Lift: +17.9% (strong), based on resolved cases with vs. without interview
Typical Timeline: 2y 8m avg prosecution; 33 applications currently pending
Career History: 901 total applications across all art units
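
As a quick cross-check, the headline rates above follow directly from the raw counts shown on this page. A minimal sketch in Python, assuming (this editor's reading, not stated on the page) that the "+23.6% vs TC avg" figure is an absolute percentage-point difference:

    # Cross-check of the dashboard's headline rates from the counts shown above.
    granted, resolved = 743, 868                    # "743 granted / 868 resolved"
    allow_rate = granted / resolved
    print(f"Career allow rate: {allow_rate:.1%}")   # 85.6%, displayed as 86%

    # Assumption: "+23.6% vs TC avg" is a percentage-point gap, not a ratio.
    implied_tc_avg = allow_rate - 0.236
    print(f"Implied TC average: {implied_tc_avg:.1%}")  # about 62.0%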

Statute-Specific Performance

§101: 12.2% (-27.8% vs TC avg)
§103: 40.4% (+0.4% vs TC avg)
§102: 22.6% (-17.4% vs TC avg)
§112: 10.5% (-29.5% vs TC avg)

Tech Center averages are estimates; figures based on career data from 868 resolved cases.

Office Action — §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-20 are pending in this application. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Specification

The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7, 10-14, and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Sugasawa et al. (US PGPub US20240185576A1, filed March 14, 2022), hereinafter referred to as “Sugasawa”, in view of Chen et al. (US PGPub US20240185576A1), hereinafter referred to as “Chen”.

Consider Claims 1, 11, and 16. Sugasawa teaches:

1. A computer-implemented method, comprising: / 11. A system, comprising: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the system to perform: / 16. A non-transitory computer-readable storage medium including instructions that, when executed by at least one processor of a computing system, cause the computing system to perform a method comprising:

(Sugasawa: abstract, An image determination device according to the present disclosure includes: a trainer that obtains one or more first models by training machine learning models of one or more types with use of a first training data set including first images and first labels, and obtains one or more second models by training machine learning models of one or more types with use of one or more second training data sets each including second images different from the first images, second labels, and at least part of the first training data set; an image obtainer that obtains a target image; and a determiner that outputs a determination result of a label of the target image obtained by the image obtainer, which is obtained by using, for the target image, at least two models including one of the one or more first models and one of the one or more second models. [0029]-[0044], Figures 1-2)

1. providing, by a computing system, to a machine learning model, one or more labeled training data instances; / 11. providing, to a machine learning model, one or more labeled training data instances; / 16. providing, to a machine learning model, one or more labeled training data instances;

(Sugasawa: Machine Learning Models, [0071]-[0074], [0072] Model 1-1, model 1-2, and so on are machine learning models of one or more types trained using same data set 1 as a training data set. Model 2-1, model 2-2, and so on are machine learning models of one or more types trained using the same training data set that includes at least part of data set 1 and updated data set 2. [0089] FIG. 7 illustrates a concept of a method for selecting, by machine learning, a combination of old and new machine learning models according to the embodiment. [0090] Part (a) of FIG. 7 illustrates an example in which trained models 1-1, 1-2, and 1-3 can be used as old machine learning models, and trained models 2-1, 2-2, and 2-3 can be used as new machine learning models. Part (b) of FIG. 7 illustrates that a combination obtained by selecting an optimal combination from among all the combinations of the models is a combination of models 1-1, 2-1, and 2-3.)

1. receiving, by the computing system, from the machine learning model, generated output, wherein the generated output comprises predicted labels for the labeled training data instances; / 11. receiving, from the machine learning model, generated output, wherein the generated output comprises predicted labels for the labeled training data instances; / 16. receiving, from the machine learning model, generated output, wherein the generated output comprises predicted labels for the labeled training data instances;

(Sugasawa: [0093]-[0095] FIG. 8A illustrates examples of machine learning models that are combination targets and a testing data set according to the embodiment. FIG. 8B illustrates examples of outputs from machine learning models that are combination targets, for images included in the testing data set. The testing data set includes, for example, part of training data that includes data set 1 and updated data set 1, for instance, described above. [0096]-[0097] Next, n outputs are selected from among the outputs from machine learning models as illustrated in FIG. 8B, for example, and used as explanatory variables, and a machine learning model for selecting a combination that predicts labels such as 0 (good) and 1 (poor) using explanatory variables is created. If a description is given using the example in FIG. 8A, combinations when n=2 are (model 1-1, model 2-1), (model 1-1, model 2-2), (model 1-2, model 2-1), and (model 1-2, model 2-2).)

1. determining, by the computing system, a first loss function term, wherein the first loss function term rewards each of multiple elements of the machine learning model for the extent to which it properly predicts labels of the labeled training data instances; / 11. determining a first loss function term, wherein the first loss function term rewards each of multiple elements of the machine learning model for the extent to which it properly predicts labels of the labeled training data instances; / 16. determining a first loss function term, wherein the first loss function term rewards each of multiple elements of the machine learning model for the extent to which it properly predicts labels of the labeled training data instances;

(Sugasawa: [0108] FIG. 10 is a flowchart briefly showing an operation of image determination device 10 according to the present embodiment. [0109] First, image determination device 10 obtains one or more models 1 by training in which data set 1 is used as training data, and obtains one or more models 2 by training in which a data set that includes updated data set 2 is used as training data (S1). More specifically, trainer 101 of image determination device 10 obtains one or more first models by training one or more machine learning models of one or more types using a first training data set that includes first images and first labels associated with the first images. Trainer 101 obtains one or more second models by training one or more machine learning models of one or more types, using one or more second training data sets. Here, each of the one or more second training data sets includes second images different from first images, second labels associated with the second images, and at least part of the first training data set.)

1. providing, by the computing system, to the machine learning model, one or more unlabeled training data instances; / 11. providing, to the machine learning model, one or more unlabeled training data instances; / 16. providing, to the machine learning model, one or more unlabeled training data instances;

(Sugasawa: [0063] Data set 1 includes a great number of inspection images collected evenly from the plurality of production lines, and labels associated with the great number of inspection images. In the example of data set 1 illustrated in FIG. 5, N1 inspection images are each associated with a label indicating dent failure, N2 inspection images are each associated with a label indicating scratch failure, N3 inspection images are each associated with a label indicating a non-defective product, and N4 inspection images are each associated with a label indicating crack failure. These images include N5 images collected from production line A, N6 images collected from production line B, and N7 images collected from production line C, while N5, N6, and N7 are all a predetermined number or more. [0064] Updated data set 2A includes, as a new data set, inspection images collected from production line A and labels associated with the inspection images, for example. In the example of updated data set 2A illustrated in FIG. 5, M1a inspection images are each associated with a label indicating a non-defective product, whereas M2a inspection images are each associated with a label indicating a defective product.)

1. receiving, by the computing system, from the machine learning model, generated output, wherein the generated output comprises predicted labels for the unlabeled training data instances; / 11. receiving, from the machine learning model, generated output, wherein the generated output comprises predicted labels for the unlabeled training data instances; / 16. receiving, from the machine learning model, generated output, wherein the generated output comprises predicted labels for the unlabeled training data instances;

(Sugasawa: [0065] Similarly, updated data set 2B includes, as a new data set, inspection images collected from production line B and labels associated with the inspection images, for example. In the example of updated data set 2B illustrated in FIG. 5, M1b inspection images are each associated with a label indicating a non-defective product, whereas M2b inspection images are each associated with a label indicating a defective product. [0068])

1. and determining, by the computing system, a second loss function term, wherein the second loss function term rewards each of the multiple elements of the machine learning model for the extent to which it disagrees with each of other elements of the machine learning model in predicting labels for a subset of the unlabeled training data instances. / 11. and determining a second loss function term, wherein the second loss function term rewards each of the multiple elements of the machine learning model for the extent to which it disagrees with each of other elements of the machine learning model in predicting labels for a subset of the unlabeled training data instances. / 16. and determining a second loss function term, wherein the second loss function term rewards each of the multiple elements of the machine learning model for the extent to which it disagrees with each of other elements of the machine learning model in predicting labels for a subset of the unlabeled training data instances.

(Sugasawa: [0102] More specifically, image determination device 10 may include a display that displays a degree of precision of a determination result of a label of a target image obtained by using, for a testing data set, a combination of at least one of one or more first models and at least one of one or more second models. The testing data set includes part of a first training data set and part of each of one or more second training data sets, for example. [0103] Thus, image determination device 10 may further include a display or a display device. Of course, image determination device 10 may be connected to an external display or an external display device. Image determination device 10 may cause the display or the display device to display a list of degrees of precision of the old machine learning models, degrees of precision of the new machine learning models, and degrees of precision of combinations of the old machine learning models and the new machine learning models, and prompt a user to select an optimal combination. [0106] Note that easy operations such as searching, narrowing down, and sorting can be made on the list as illustrated in FIG. 9. In this case, not only the user, but also determiner 104 may sort the list based on the degree of precision for data set 1, and select a combination that achieves at least a predetermined degree of precision such as at least 90% for data set 1 and the highest degree of precision for updated data set 2. Determiner 104 may sort the list based on the determination speed (takt time) and select a combination with the highest degree of precision, which makes determination at a predetermined determination speed or less.)

Sugasawa does not explicitly teach: unlabeled data.

Chen teaches:

1. A computer-implemented method, comprising: / 11. A system, comprising: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the system to perform: / 16. A non-transitory computer-readable storage medium including instructions that, when executed by at least one processor of a computing system, cause the computing system to perform a method comprising:

(Chen: abstract, Systems, methods, and computer program products for performing semi-supervised contrastive learning of visual representations are provided. For example, the present disclosure provides systems and methods that leverage particular data augmentation schemes and a learnable nonlinear transformation between the representation and the contrastive loss to provide improved visual representations. Further, the present disclosure also provides improvements for semi-supervised contrastive learning.
For example, a computer-implemented method may include performing semi-supervised contrastive learning based on a set of one or more unlabeled training data, generating an image classification model based on a portion of a plurality of layers in a projection head neural network used in performing the contrastive learning, performing fine-tuning of the image classification model based on a set of one or more labeled training data, and after performing the fine-tuning, distilling the image classification model to a student model comprising a relatively smaller number of parameters than the image classification model.)

1. providing, by a computing system, to a machine learning model, one or more labeled training data instances; / 11. providing, to a machine learning model, one or more labeled training data instances; / 16. providing, to a machine learning model, one or more labeled training data instances;

(Chen: Example Contrastive Learning Techniques, [0042] Example Contrastive Learning Framework, [0043] Example implementations of the present disclosure learn representations by maximizing agreement between differently augmented views of the same data example via a contrastive loss in the latent space. As illustrated in FIG. 2A, an example framework 200 can include the following four major components: [0044] A stochastic data augmentation module (shown generally at 203) that transforms any given data example (e.g., an input image x shown at 202) randomly, resulting in two correlated views of the same example, denoted x̃i and x̃j, which are shown at 212 and 222, respectively. These augmented images 212 and 222 can be considered as a positive pair. Although the present disclosure focuses on data examples from the image domain for ease of explanation, the framework is extensible to data examples of different domains as well which are susceptible to augmentation of some kind, including text and/or audio domains. Example types of images that can be used include video frames, LiDAR point clouds, computed tomography scans, X-ray images, hyper-spectral images, and/or various other forms of imagery.)

1. receiving, by the computing system, from the machine learning model, generated output, wherein the generated output comprises predicted labels for the labeled training data instances; / 11. receiving, from the machine learning model, generated output, wherein the generated output comprises predicted labels for the labeled training data instances; / 16. receiving, from the machine learning model, generated output, wherein the generated output comprises predicted labels for the labeled training data instances;

(Chen: [0032] One example aspect of the present disclosure is directed to particular compositions of data augmentations which enable the system to define effective predictive tasks. Composition of multiple data augmentation operations is crucial in defining the contrastive prediction tasks that yield effective representations. As one example, a combination of random crop and color distortions provides particular benefit. In addition, unsupervised contrastive learning benefits from stronger data augmentation than supervised learning. [0043] As illustrated in FIG. 2A, an example framework 200 can include the following four major components: [0044] A stochastic data augmentation module (shown generally at 203) that transforms any given data example (e.g., an input image x shown at 202) randomly, resulting in two correlated views of the same example, denoted x̃i and x̃j, which are shown at 212 and 222, respectively. These augmented images 212 and 222 can be considered as a positive pair. Although the present disclosure focuses on data examples from the image domain for ease of explanation, the framework is extensible to data examples of different domains as well which are susceptible to augmentation of some kind, including text and/or audio domains. Example types of images that can be used include video frames, LiDAR point clouds, computed tomography scans, X-ray images, hyper-spectral images, and/or various other forms of imagery.)

1. determining, by the computing system, a first loss function term, wherein the first loss function term rewards each of multiple elements of the machine learning model for the extent to which it properly predicts labels of the labeled training data instances; / 11. determining a first loss function term, wherein the first loss function term rewards each of multiple elements of the machine learning model for the extent to which it properly predicts labels of the labeled training data instances; / 16. determining a first loss function term, wherein the first loss function term rewards each of multiple elements of the machine learning model for the extent to which it properly predicts labels of the labeled training data instances;

(Chen: [0046], [0047] A projection head neural network 206 (represented in the notation herein as g(⋅)) that maps the intermediate representations to final representations within the space where contrastive loss is applied. For example, the projection head neural network 206 has generated final representations 216 and 226 from the intermediate representations 214 and 224, respectively. In some example implementations of the present disclosure, the projection head neural network 206 can be a multi-layer perceptron with one hidden layer to obtain zi = g(hi) = W(2)σ(W(1)hi), where σ is a ReLU non-linearity. As shown in the following sections, it is beneficial to define the contrastive loss on final representations zi's rather than intermediate representations hi's. [0048] A contrastive loss function can be defined for a contrastive prediction task. As one example, given a set {x̃k} including a positive pair of examples x̃i 212 and x̃j 222, the contrastive prediction task aims to identify x̃j in {x̃k}k≠i for a given x̃i, e.g., based on similarity between their respective final representations 216 and 226. [0049] In some implementations, to perform training within the illustrated framework, a minibatch of N examples can be randomly sampled and the contrastive prediction task can be defined on pairs of augmented examples derived from the minibatch, resulting in 2N data points.)

1. providing, by the computing system, to the machine learning model, one or more unlabeled training data instances; / 11. providing, to the machine learning model, one or more unlabeled training data instances; / 16. providing, to the machine learning model, one or more unlabeled training data instances;

(Chen: [0118] Method 1100 begins at block 1102 when, for example, a computer system performs contrastive learning based on a set of training data. In an example, the computer system performs contrastive learning based on one or more of the various examples provided in the present disclosure. For example, the computer system may perform contrastive learning based on example framework 200 and other examples provided throughout the present disclosure. [0119] In an example, the computer system performs unsupervised pretraining of a model using contrastive learning based on a set of unlabeled training data. For example, the computer system may pretrain a large, task-agnostic general convolutional network using a large number of unlabeled training data. In various examples, training data generally may include any type of visual and non-visual data including, but not limited to, images, video content, image frames of video content, audio data, textual data, geospatial data, sensor data, etc. Unlabeled training data generally refers to any data where labels, descriptions, features, and/or properties are not provided or otherwise have been deleted, discarded or fully ignored. In an example, pretraining of a model may be performed using unsupervised or self-supervised contrastive learning based on unlabeled, task agnostic training data without class labels and without being directed or tailored to a specific classification task.)

1. receiving, by the computing system, from the machine learning model, generated output, wherein the generated output comprises predicted labels for the unlabeled training data instances; / 11. receiving, from the machine learning model, generated output, wherein the generated output comprises predicted labels for the unlabeled training data instances; / 16. receiving, from the machine learning model, generated output, wherein the generated output comprises predicted labels for the unlabeled training data instances;

(Chen: [0052] The task specific model 250 and/or the base encoder neural network 204 can be additionally trained (e.g., “fine-tuned”) on additional training data (e.g., which may be task specific data). The additional training can be, for example, supervised learning training. [0053] After fine-tuning, an additional input 252 can be provided to the base encoder neural network 204 which can produce an intermediate representation 254. The task-specific model 250 can receive and process the intermediate representation 254 to generate a task-specific prediction 256. As examples, the task-specific prediction 256 can be a classification prediction; a detection prediction; a recognition prediction; a segmentation prediction; and/or other prediction tasks.)

1. and determining, by the computing system, a second loss function term, wherein the second loss function term rewards each of the multiple elements of the machine learning model for the extent to which it disagrees with each of other elements of the machine learning model in predicting labels for a subset of the unlabeled training data instances. / 11. and determining a second loss function term, wherein the second loss function term rewards each of the multiple elements of the machine learning model for the extent to which it disagrees with each of other elements of the machine learning model in predicting labels for a subset of the unlabeled training data instances. / 16. and determining a second loss function term, wherein the second loss function term rewards each of the multiple elements of the machine learning model for the extent to which it disagrees with each of other elements of the machine learning model in predicting labels for a subset of the unlabeled training data instances.

(Chen: [0049] In some implementations, to perform training within the illustrated framework, a minibatch of N examples can be randomly sampled and the contrastive prediction task can be defined on pairs of augmented examples derived from the minibatch, resulting in 2N data points. In some implementations, negative examples are not explicitly sampled. Instead, given a positive pair, the other 2(N−1) augmented examples within a minibatch can be treated as negative examples. Let sim(u, v) = uᵀv / (∥u∥∥v∥) denote the cosine similarity between two vectors u and v. Then one example loss function for a positive pair of examples (i, j) can be defined as

ℓ(i,j) = −log( exp(sim(zi, zj)/τ) / Σ{k=1..2N} 1[k≠i] exp(sim(zi, zk)/τ) ),

where 1[k≠i] ∈ {0, 1} is an indicator function evaluating to 1 iff k ≠ i and τ denotes a temperature parameter. The final loss can be computed across all positive pairs, both (i, j) and (j, i), in a mini-batch. For convenience, this loss is referred to further herein as NT-Xent (the normalized temperature-scaled cross entropy loss). [0050] The below example Algorithm 1 summarizes one example implementation of the proposed method: [0051] Algorithm 1 – Example Learning Algorithm. [0118]-[0119] In various examples, training data generally may include any type of visual and non-visual data including, but not limited to, images, video content, image frames of video content, audio data, textual data, geospatial data, sensor data, etc. Unlabeled training data generally refers to any data where labels, descriptions, features, and/or properties are not provided or otherwise have been deleted, discarded or fully ignored. In an example, pretraining of a model may be performed using unsupervised or self-supervised contrastive learning based on unlabeled, task agnostic training data without class labels and without being directed or tailored to a specific classification task.)

It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify Sugasawa’s method and system for machine learning for object inspection with Chen’s semi-supervised contrastive learning of visual representations, as they are both directed towards the same field of overall endeavor. The determination of obviousness is predicated upon the following findings. One skilled in the art would have been motivated to modify Sugasawa in order to improve the overall robustness of the learning model to account for differences with the contrastive-based architecture. Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained above using known engineering design, interface and programming techniques, without changing a “fundamental” operating principle of Sugasawa, while the teaching of Chen continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result of improving the overall accuracy and precision in object detection and inspection by incorporation of contrastive or difference learning.
It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claims in question.

Consider Claims 2, 12, and 17. The combination of Sugasawa and Chen teaches:

2. The computer-implemented method of claim 1, further comprising: training, by the computing system, the machine learning model using the first loss function term and the second loss function term. / 12. The system of claim 11, wherein the instructions, when executed by the at least one processor, further cause the system to perform: training the machine learning model using the first loss function term and the second loss function term. / 17. The non-transitory computer-readable storage medium of claim 16, wherein the instructions, when executed by the at least one processor of the computing system, further cause the computing system to perform: training the machine learning model using the first loss function term and the second loss function term.

(Chen: [0049] In some implementations, to perform training within the illustrated framework, a minibatch of N examples can be randomly sampled and the contrastive prediction task can be defined on pairs of augmented examples derived from the minibatch, resulting in 2N data points. In some implementations, negative examples are not explicitly sampled. Instead, given a positive pair, the other 2(N−1) augmented examples within a minibatch can be treated as negative examples. Let sim(u, v) = uᵀv / (∥u∥∥v∥) denote the cosine similarity between two vectors u and v. Then one example loss function for a positive pair of examples (i, j) can be defined as ℓ(i,j) = −log( exp(sim(zi, zj)/τ) / Σ{k=1..2N} 1[k≠i] exp(sim(zi, zk)/τ) ), where 1[k≠i] ∈ {0, 1} is an indicator function evaluating to 1 iff k ≠ i and τ denotes a temperature parameter. The final loss can be computed across all positive pairs, both (i, j) and (j, i), in a mini-batch. For convenience, this loss is referred to further herein as NT-Xent (the normalized temperature-scaled cross entropy loss). [0050] The below example Algorithm 1 summarizes one example implementation of the proposed method: [0051] Algorithm 1 – Example Learning Algorithm. [0118]-[0119] In various examples, training data generally may include any type of visual and non-visual data including, but not limited to, images, video content, image frames of video content, audio data, textual data, geospatial data, sensor data, etc. Unlabeled training data generally refers to any data where labels, descriptions, features, and/or properties are not provided or otherwise have been deleted, discarded or fully ignored. In an example, pretraining of a model may be performed using unsupervised or self-supervised contrastive learning based on unlabeled, task agnostic training data without class labels and without being directed or tailored to a specific classification task.)

Consider Claim 3. The combination of Sugasawa and Chen teaches:

3. The computer-implemented method of claim 1, wherein the multiple elements of the machine learning model are one or more of heads of the machine learning model, or ensemble elements of the machine learning model.

(Chen: [0046], [0047] A projection head neural network 206 (represented in the notation herein as g(⋅)) that maps the intermediate representations to final representations within the space where contrastive loss is applied. For example, the projection head neural network 206 has generated final representations 216 and 226 from the intermediate representations 214 and 224, respectively. In some example implementations of the present disclosure, the projection head neural network 206 can be a multi-layer perceptron with one hidden layer to obtain zi = g(hi) = W(2)σ(W(1)hi), where σ is a ReLU non-linearity. As shown in the following sections, it is beneficial to define the contrastive loss on final representations zi's rather than intermediate representations hi's. [0048] A contrastive loss function can be defined for a contrastive prediction task. As one example, given a set {x̃k} including a positive pair of examples x̃i 212 and x̃j 222, the contrastive prediction task aims to identify x̃j in {x̃k}k≠i for a given x̃i, e.g., based on similarity between their respective final representations 216 and 226. [0049] In some implementations, to perform training within the illustrated framework, a minibatch of N examples can be randomly sampled and the contrastive prediction task can be defined on pairs of augmented examples derived from the minibatch, resulting in 2N data points.)

Consider Claims 4, 13, and 18. The combination of Sugasawa and Chen teaches:

4. The computer-implemented method of claim 1, further comprising: selecting, by the computing system, the subset as a quantity of the unlabeled training data instances for which there is maximal disagreement among elements of the machine learning model. / 13. The system of claim 11, wherein the instructions, when executed by the at least one processor, further cause the system to perform: selecting the subset as a quantity of the unlabeled training data instances for which there is maximal disagreement among elements of the machine learning model. / 18. The non-transitory computer-readable storage medium of claim 16, wherein the instructions, when executed by the at least one processor of the computing system, further cause the computing system to perform: selecting the subset as a quantity of the unlabeled training data instances for which there is maximal disagreement among elements of the machine learning model.

(Examiner Note: the use of a contrastive loss metric in the maximizing-agreement-between-differently-augmented-views algorithm is analogous in scope to maximal disagreement. Chen: [0042]-[0043] Example implementations of the present disclosure learn representations by maximizing agreement between differently augmented views of the same data example via a contrastive loss in the latent space. As illustrated in FIG. 2A, an example framework 200 can include the following four major components: [0044] A stochastic data augmentation module (shown generally at 203) that transforms any given data example (e.g., an input image x shown at 202) randomly, resulting in two correlated views of the same example, denoted x̃i and x̃j, which are shown at 212 and 222, respectively. These augmented images 212 and 222 can be considered as a positive pair. Although the present disclosure focuses on data examples from the image domain for ease of explanation, the framework is extensible to data examples of different domains as well which are susceptible to augmentation of some kind, including text and/or audio domains. Example types of images that can be used include video frames, LiDAR point clouds, computed tomography scans, X-ray images, hyper-spectral images, and/or various other forms of imagery.)

Consider Claim 5. The combination of Sugasawa and Chen teaches:

5. The computer-implemented method of claim 1, wherein said predicted labels comprise class labels, reward labels, or recommendation labels.

(Sugasawa: [0096]-[0097] Next, n outputs are selected from among the outputs from machine learning models as illustrated in FIG. 8B, for example, and used as explanatory variables, and a machine learning model for selecting a combination that predicts labels such as 0 (good) and 1 (poor) using explanatory variables is created. If a description is given using the example in FIG. 8A, combinations when n=2 are (model 1-1, model 2-1), (model 1-1, model 2-2), (model 1-2, model 2-1), and (model 1-2, model 2-2). Chen: [0053] After fine-tuning, an additional input 252 can be provided to the base encoder neural network 204 which can produce an intermediate representation 254. The task-specific model 250 can receive and process the intermediate representation 254 to generate a task-specific prediction 256. As examples, the task-specific prediction 256 can be a classification prediction; a detection prediction; a recognition prediction; a segmentation prediction; and/or other prediction tasks.)

Consider Claim 6. The combination of Sugasawa and Chen teaches:

6. The computer-implemented method of claim 1, wherein the multiple elements of the machine learning model are trained to predict proper labels for the labeled training data instances and to be distinct on the unlabeled training data instances.

(Sugasawa: [0096]-[0097] Next, n outputs are selected from among the outputs from machine learning models as illustrated in FIG. 8B, for example, and used as explanatory variables, and a machine learning model for selecting a combination that predicts labels such as 0 (good) and 1 (poor) using explanatory variables is created. If a description is given using the example in FIG. 8A, combinations when n=2 are (model 1-1, model 2-1), (model 1-1, model 2-2), (model 1-2, model 2-1), and (model 1-2, model 2-2). Chen: [0053] After fine-tuning, an additional input 252 can be provided to the base encoder neural network 204 which can produce an intermediate representation 254. The task-specific model 250 can receive and process the intermediate representation 254 to generate a task-specific prediction 256. As examples, the task-specific prediction 256 can be a classification prediction; a detection prediction; a recognition prediction; a segmentation prediction; and/or other prediction tasks.)

Consider Claims 7, 14, and 19. The combination of Sugasawa and Chen teaches:

7. The computer-implemented method of claim 1, wherein the machine learning model is one of a transformer encoder-based classifier, convolutional neural network-based classifier, or an autoencoder-based machine learning model. / 14. The system of claim 11, wherein the machine learning model is one of a transformer encoder-based classifier, convolutional neural network-based classifier, or an autoencoder-based machine learning model. / 19. The non-transitory computer-readable storage medium of claim 16, wherein the machine learning model is one of a transformer encoder-based classifier, convolutional neural network-based classifier, or an autoencoder-based machine learning model.

(Chen: [0047] A projection head neural network 206 (represented in the notation herein as g(⋅)) that maps the intermediate representations to final representations within the space where contrastive loss is applied. For example, the projection head neural network 206 has generated final representations 216 and 226 from the intermediate representations 214 and 224, respectively. In some example implementations of the present disclosure, the projection head neural network 206 can be a multi-layer perceptron with one hidden layer to obtain zi = g(hi) = W(2)σ(W(1)hi), where σ is a ReLU non-linearity. [0052] The task specific model 250 and/or the base encoder neural network 204 can be additionally trained (e.g., “fine-tuned”) on additional training data (e.g., which may be task specific data). The additional training can be, for example, supervised learning training. Sugasawa: [0073] Here, types of machine learning models may be models trained by supervised learning, such as a logistic regression model, a support-vector machine, and a deep neural network (DNN). In addition, the types of machine learning models may also include an autoencoder and may also include a non-defective product model that is generated using data indicating non-defective products and outputs an outlier as an abnormal level. Thus, the machine learning models according to the present embodiment may be of one or more of the types stated as examples.)

Consider Claim 10. The combination of Sugasawa and Chen teaches:

10. The computer-implemented method of claim 1, wherein said training data instances comprise one or more of sentences, images, or images superimposed with text.

(Chen: [0043] As illustrated in FIG. 2A, an example framework 200 can include the following four major components: [0044] A stochastic data augmentation module (shown generally at 203) that transforms any given data example (e.g., an input image x shown at 202) randomly, resulting in two correlated views of the same example, denoted x̃i and x̃j, which are shown at 212 and 222, respectively. These augmented images 212 and 222 can be considered as a positive pair. Although the present disclosure focuses on data examples from the image domain for ease of explanation, the framework is extensible to data examples of different domains as well which are susceptible to augmentation of some kind, including text and/or audio domains. Example types of images that can be used include video frames, LiDAR point clouds, computed tomography scans, X-ray images, hyper-spectral images, and/or various other forms of imagery. [0118]-[0119] In various examples, training data generally may include any type of visual and non-visual data including, but not limited to, images, video content, image frames of video content, audio data, textual data, geospatial data, sensor data, etc. Unlabeled training data generally refers to any data where labels, descriptions, features, and/or properties are not provided or otherwise have been deleted, discarded or fully ignored. In an example, pretraining of a model may be performed using unsupervised or self-supervised contrastive learning based on unlabeled, task agnostic training data without class labels and without being directed or tailored to a specific classification task.)

Claims 8-9, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Sugasawa et al. (US PGPub US20240185576A1, filed March 14, 2022), hereinafter referred to as “Sugasawa”, in view of Chen et al. (US PGPub US20240185576A1), hereinafter referred to as “Chen”, further in view of Sriram et al. (US PGPub ).

Consider Claims 8, 15, and 20.
The combination of Sugasawa and Chen teaches the method of Claim 1, the system of Claim 11, and the non-transitory computer-readable storage medium of Claim 16. The combination of Sugasawa and Chen does not teach: wherein the machine learning model is a critic machine learning model, and wherein the critic machine learning model generates reward output for a second machine learning model.

Sriram teaches:

8. The computer-implemented method of claim 1, wherein the machine learning model is a critic machine learning model, / 15. The system of claim 11, wherein the machine learning model is a critic machine learning model, / 20. The non-transitory computer-readable storage medium of claim 16, wherein the machine learning model is a critic machine learning model,

(Sriram: C. ROBUST ASR EMBODIMENTS, [0028] 1. Encoder Distance Enhancer Embodiments, [0029]-[0044], [0030] FIG. 1 depicts architecture of a sequence-to-sequence ASR model with encoder distance enhancer introduced herein, according to embodiments of the present disclosure. [0035] 2. GAN Enhancer Embodiments, [0037] In one or more embodiments, Wasserstein GAN (WGAN) is used. FIG. 2 depicts architecture of a sequence-to-sequence model with WGAN enhancer, in accordance with embodiments of the present disclosure. The overall architecture in FIG. 2 is similar to the architecture depicted in FIG. 1, except that an Earth-Mover (EM) distance is used to replace the L1-distance shown in equation (1). As shown in FIG. 2, a critic f (210) is employed to output a first scalar score s (220) of the first hidden state z (125) and a second scalar score s̃ (230) of the second hidden state z̃ (130), respectively. The first scalar score s (220) and the second scalar score s̃ (230) are then used to determine the EM distance 240.)

8. and wherein the critic machine learning model generates reward output for a second machine learning model. / 15. and wherein the critic machine learning model generates reward output for a second machine learning model. / 20. and wherein the critic machine learning model generates reward output for a second machine learning model.

(Sriram: [0030] FIG. 1 depicts architecture of a sequence-to-sequence ASR model with encoder distance enhancer introduced herein, according to embodiments of the present disclosure. An encoder g (115) is applied to an audio input x (105) labeled with ground-truth label or transcription y (165) to produce a first hidden state (125) z = g(x). The same encoder 115 is applied to an unlabeled audio x̃ (110) to produce a second hidden state (130) z̃ = g(x̃). In one or more embodiments, the unlabeled audio 110 corresponds to the labeled audio input 105. A decoder h (150) models the conditional probability p(y|x) = p(y|z) = h(z) using the first hidden state z (125) and outputs a predicted text sequence 160 one character at a time. The predicted text sequence 160 and the ground-truth label or transcription 165 are used to generate a cross-entropy (CE) loss 170, which is used for training the ASR model. A discriminator 140 receives the first hidden state 125 and the second hidden state 130 to generate a discriminator loss 145 based on the first hidden state 125 and the second hidden state 130. [0038] Following the notations of WGAN, the seq-to-seq model and the critic shown in FIG. 2 are parameterized with θ (for the encoder and the decoder) and w (for the critic), respectively. The encoder distance in Equation (1) is replaced with a dual of the Earth-Mover (EM) distance, a distance between probability measures: [equation image omitted].)

It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the combination of Sugasawa and Chen for a semi-supervised contrastive machine learning algorithm for object inspection with Sriram for a general scalable GAN-based architecture, as they are all directed towards the same field of overall endeavor of machine learning. The determination of obviousness is predicated upon the following findings. One skilled in the art would have been motivated to modify the combination of Sugasawa and Chen in order to leverage Sriram’s scalable GAN-based machine learning model that can easily be modified to be applicable to image data and can improve the overall accuracy of the object detection and inspection data. Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained above using known engineering design, interface and programming techniques, without changing a “fundamental” operating principle of the combination of Sugasawa and Chen, while the teaching of Sriram continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result of improving the overall accuracy and precision in object detection and inspection by leveraging the GAN-based architecture to model the contrastive or difference learning. It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claims in question.

Consider Claim 9. The combination of Sugasawa, Chen, and Sriram teaches:

9. The computer-implemented method of claim 8, wherein the second machine learning model is one of a transformer decoder-based generative machine learning model, long short-term memory-based generative machine learning model, or a convolutional neural network-based generative machine learning model.

(Chen: [0119] In an example, the computer system performs unsupervised pretraining of a model using contrastive learning based on a set of unlabeled training data. For example, the computer system may pretrain a large, task-agnostic general convolutional network using a large number of unlabeled training data. In various examples, training data generally may include any type of visual and non-visual data including, but not limited to, images, video content, image frames of video content, audio data, textual data, geospatial data, sensor data, etc. Unlabeled training data generally refers to any data where labels, descriptions, features, and/or properties are not provided or otherwise have been deleted, discarded or fully ignored. In an example, pretraining of a model may be performed using unsupervised or self-supervised contrastive learning based on unlabeled, task agnostic training data without class labels and without being directed or tailored to a specific classification task. Sriram: [0030] FIG. 1 depicts architecture of a sequence-to-sequence ASR model with encoder distance enhancer introduced herein, according to embodiments of the present disclosure. An encoder g (115) is applied to an audio input x (105) labeled with ground-truth label or transcription y (165) to produce a first hidden state (125) z = g(x). The same encoder 115 is applied to an unlabeled audio x̃ (110) to produce a second hidden state (130) z̃ = g(x̃). In one or more embodiments, the unlabeled audio 110 corresponds to the labeled audio input 105. A decoder h (150) models the conditional probability p(y|x) = p(y|z) = h(z) using the first hidden state z (125) and outputs a predicted text sequence 160 one character at a time. The predicted text sequence 160 and the ground-truth label or transcription 165 are used to generate a cross-entropy (CE) loss 170, which is used for training the ASR model. A discriminator 140 receives the first hidden state 125 and the second hidden state 130 to generate a discriminator loss 145 based on the first hidden state 125 and the second hidden state 130. [0038] Following the notations of WGAN, the seq-to-seq model and the critic shown in FIG. 2 are parameterized with θ (for the encoder and the decoder) and w (for the critic), respectively. The encoder distance in Equation (1) is replaced with a dual of the Earth-Mover (EM) distance, a distance between probability measures: [equation image omitted].)

Conclusion

The prior art made of record in form PTO-892 and not relied upon is considered pertinent to applicant's disclosure. [PTO-892 reference listing image omitted.]

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAHMINA ANSARI, whose telephone number is 571-270-3379. The examiner can normally be reached on IFP Flex, Monday through Friday, 9 to 5. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, O'NEAL MISTRY, can be reached at 313-446-4912. The fax phone numbers for the organization where this application or proceeding is assigned are 571-273-8300 for regular communications and 571-273-8300 for After Final communications. TC 2600's customer service number is 571-272-2600. Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the receptionist, whose telephone number is 571-272-2600.

/Tahmina Ansari/
January 4, 2026
/TAHMINA N ANSARI/
Primary Examiner, Art Unit 2674
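
For readers skimming the mappings above, the heart of independent claim 1 is a two-term training objective: a supervised term rewarding each model element for matching the labels of labeled instances, and a diversity term rewarding each element for disagreeing with the other elements on a subset of unlabeled instances. The Python sketch below is only this editor's illustrative reading of that claim language, not the applicant's implementation and not anything disclosed in the cited references; the multi-head architecture, the variance-based disagreement measure, the top-k subset selection (echoing the "maximal disagreement" of claim 4), and the 0.1 weighting are all assumptions.

    # Hypothetical sketch of the two loss terms recited in claim 1.
    # "Multiple elements" are modeled here as classification heads on a
    # shared encoder; every design choice below is an assumption.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiHeadClassifier(nn.Module):
        def __init__(self, in_dim=32, hidden=64, n_classes=3, n_heads=4):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
            self.heads = nn.ModuleList(
                nn.Linear(hidden, n_classes) for _ in range(n_heads))

        def forward(self, x):
            h = self.encoder(x)
            return [head(h) for head in self.heads]  # one logits tensor per element

    def first_loss_term(logits_per_head, labels):
        # First term: rewards each element for correctly predicting the labels
        # of the labeled training instances (cross-entropy, averaged over heads).
        return torch.stack(
            [F.cross_entropy(lg, labels) for lg in logits_per_head]).mean()

    def second_loss_term(logits_per_head, k=8):
        # Second term: rewards each element for disagreeing with the others on a
        # subset of unlabeled instances. Disagreement is taken as the across-head
        # variance of predicted class probabilities; the subset is the k
        # most-contested examples (one reading of claim 4's "maximal disagreement").
        probs = torch.stack([F.softmax(lg, dim=-1) for lg in logits_per_head])  # (H, N, C)
        per_example = probs.var(dim=0).sum(dim=-1)                              # (N,)
        top_k = per_example.topk(min(k, per_example.numel())).values
        return -top_k.mean()  # negated: minimizing the loss maximizes disagreement

    model = MultiHeadClassifier()
    labeled_x = torch.randn(16, 32)
    labels = torch.randint(0, 3, (16,))
    unlabeled_x = torch.randn(64, 32)

    loss = (first_loss_term(model(labeled_x), labels)
            + 0.1 * second_loss_term(model(unlabeled_x)))  # 0.1 weight is arbitrary
    loss.backward()
    print(f"combined training loss: {loss.item():.3f}")

Written this way, both terms are losses to be minimized, so the disagreement "reward" of the claim enters with a negative sign.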

Prosecution Timeline

Apr 13, 2023
Application Filed
Jan 05, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586249
PROCESSING APPARATUS, PROCESSING METHOD, AND STORAGE MEDIUM FOR CALIBRATING AN IMAGE CAPTURE APPARATUS
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12586354
TRAINING METHOD, APPARATUS AND NON-TRANSITORY COMPUTER READABLE MEDIUM FOR A MACHINE LEARNING MODEL
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12573083
COMPUTER-READABLE RECORDING MEDIUM STORING OBJECT DETECTION PROGRAM, DEVICE, AND MACHINE LEARNING MODEL GENERATION METHOD OF TRAINING OBJECT DETECTION MODEL TO DETECT CATEGORY AND POSITION OF OBJECT
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12548297
IMAGE PROCESSING METHOD AND APPARATUS, COMPUTER DEVICE, STORAGE MEDIUM, AND PROGRAM PRODUCT BASED ON FEATURE AND DISTRIBUTION CORRELATION
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12524504
METHOD AND DATA PROCESSING SYSTEM FOR PROVIDING EXPLANATORY RADIOMICS-RELATED INFORMATION
Granted Jan 13, 2026 (2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 86%
With Interview: 99% (+17.9%)
Median Time to Grant: 2y 8m
PTA Risk: Low
Based on 868 resolved cases by this examiner. Grant probability derived from career allow rate.
