Prosecution Insights
Last updated: April 19, 2026
Application No. 17/503,140

AUGMENTATION OF TESTING OR TRAINING SETS FOR MACHINE LEARNING MODELS

Status: Final Rejection (§112)
Filed: Oct 15, 2021
Examiner: GIROUX, GEORGE
Art Unit: 2128
Tech Center: 2100 — Computer Architecture & Software
Assignee: Microsoft Technology Licensing, LLC
OA Round: 2 (Final)

Grant Probability: 66% (Favorable)
Expected OA Rounds: 3-4
Expected Time to Grant: 4y 6m
Grant Probability With Interview: 93%

Examiner Intelligence

Career Allow Rate: 66% (401 granted / 612 resolved; +10.5% vs Tech Center average; above average)
Interview Lift: +27.1% in resolved cases with interview
Typical Timeline: 4y 6m average prosecution; 28 applications currently pending
Career History: 640 total applications across all art units

Statute-Specific Performance

§101: 11.0% (-29.0% vs TC avg)
§103: 45.5% (+5.5% vs TC avg)
§102: 16.0% (-24.0% vs TC avg)
§112: 15.5% (-24.5% vs TC avg)

Tech Center (TC) averages are estimates. Based on career data from 612 resolved cases.

Office Action (§112)
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

This Office Action is in response to applicant's communication filed 9 July 2025, in response to the Office Action mailed 13 March 2025. The applicant's remarks and any amendments to the claims or specification have been considered, with the results that follow.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-19 and 21-23 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 1 recites "performing gap analysis on the private data items using the clustering algorithm and a diversity measure to identify classifications that are unrepresented or having fewer than a threshold number of examples." The specification describes that the "files can be sampled from all the clusters of the embedding space to capture the diversity represented in the target data" (para. [0075] of the specification) and "The diversity of the resulting sample was measured as the χ2 distance between the distribution of audio segments across the embedding clusters and a uniform distribution between clusters. The value was normalized by calculating χ2 of the contingency table over the percentage of data points in each cluster rather than raw frequency. The lower the χ2 distance, the more audio properties encoded in the embedding space the resulting test set covers and thus, the more diverse the conditions captured by the test set are." (para. [0083] of the specification). However, the specification, as filed, does not appear to describe any "gap analysis" performed on the private data items using the clustering algorithm and diversity measure or identifying a classification with fewer than a threshold number of examples.

Claim 1 also recites "wherein the threshold number of examples is sufficient to accurately test or train the model with respect to data items having that classification." The specification describes that "A given classification of data is 'under-represented' in a testing or training set when the testing or training set lacks sufficient examples of that classification to accurately test or train the model (e.g., fewer than 10 examples) with respect to data items having that classification. When a testing or training set is augmented with examples from an unrepresented or under-represented classification, the testing or training of the model often becomes more accurate with respect to other data items of the same classification.
A given classification of data is 'over-represented' when the number of examples of that classification is sufficiently large that the number can be reduced without significantly degrading the testing and/or training value of that dataset. For instance, in some cases, an over-represented classification might have 100 examples in an initial testing or training set, and this number could be reduced to 10 examples in an augmented testing or training set." (para. [0026] of the specification) and that "additional testing or training examples can be added to the testing or training set so that the testing or training set more accurately reflects real-world conditions" (para. [0090] of the specification). However, while providing certain examples for numbers of samples that may be considered under- or over-represented, the specification as filed does not appear to describe determining a threshold number of examples that is sufficient to accurately test or train a model.

Claims 2-15 depend upon claim 1, and thus include the aforementioned limitation(s).

Claim 22 recites "perform gap analysis on the private data items using the clustering algorithm and a diversity measure to identify classifications that are unrepresented or having fewer than a threshold number of examples." The specification describes that the "files can be sampled from all the clusters of the embedding space to capture the diversity represented in the target data" (para. [0075] of the specification) and "The diversity of the resulting sample was measured as the χ2 distance between the distribution of audio segments across the embedding clusters and a uniform distribution between clusters. The value was normalized by calculating χ2 of the contingency table over the percentage of data points in each cluster rather than raw frequency.
The lower the χ2 distance, the more audio properties encoded in the embedding space the resulting test set covers and thus, the more diverse the conditions captured by the test set are." (para. [0083] of the specification). However, the specification, as filed, does not appear to describe any "gap analysis" performed on the private data items using the clustering algorithm and diversity measure or identifying a classification with fewer than a threshold number of examples.

Claim 22 also recites "wherein the threshold number of examples is sufficient to accurately test or train the model with respect to data items having that classification." The specification describes that "A given classification of data is 'under-represented' in a testing or training set when the testing or training set lacks sufficient examples of that classification to accurately test or train the model (e.g., fewer than 10 examples) with respect to data items having that classification. When a testing or training set is augmented with examples from an unrepresented or under-represented classification, the testing or training of the model often becomes more accurate with respect to other data items of the same classification. A given classification of data is 'over-represented' when the number of examples of that classification is sufficiently large that the number can be reduced without significantly degrading the testing and/or training value of that dataset. For instance, in some cases, an over-represented classification might have 100 examples in an initial testing or training set, and this number could be reduced to 10 examples in an augmented testing or training set." (para. [0026] of the specification) and that "additional testing or training examples can be added to the testing or training set so that the testing or training set more accurately reflects real-world conditions" (para. [0090] of the specification).
However, while providing certain examples for numbers of samples that may be considered under- or over-represented, the specification as filed does not appear to describe determining a threshold number of examples that is sufficient to accurately test or train a model.

Claims 17-19 depend upon claim 22, and thus include the aforementioned limitation(s).

Claim 23 recites "performing gap analysis on the private data items using the clustering algorithm and a diversity measure to identify classifications that are unrepresented or having fewer than a threshold number of examples." The specification describes that the "files can be sampled from all the clusters of the embedding space to capture the diversity represented in the target data" (para. [0075] of the specification) and "The diversity of the resulting sample was measured as the χ2 distance between the distribution of audio segments across the embedding clusters and a uniform distribution between clusters. The value was normalized by calculating χ2 of the contingency table over the percentage of data points in each cluster rather than raw frequency. The lower the χ2 distance, the more audio properties encoded in the embedding space the resulting test set covers and thus, the more diverse the conditions captured by the test set are." (para. [0083] of the specification). However, the specification, as filed, does not appear to describe any "gap analysis" performed on the private data items using the clustering algorithm and diversity measure or identifying a classification with fewer than a threshold number of examples.
Claim 23 also recites "wherein the threshold number of examples is sufficient to accurately train the model with respect to data items having that classification." The specification describes that "A given classification of data is 'under-represented' in a testing or training set when the testing or training set lacks sufficient examples of that classification to accurately test or train the model (e.g., fewer than 10 examples) with respect to data items having that classification. When a testing or training set is augmented with examples from an unrepresented or under-represented classification, the testing or training of the model often becomes more accurate with respect to other data items of the same classification. A given classification of data is 'over-represented' when the number of examples of that classification is sufficiently large that the number can be reduced without significantly degrading the testing and/or training value of that dataset. For instance, in some cases, an over-represented classification might have 100 examples in an initial testing or training set, and this number could be reduced to 10 examples in an augmented testing or training set." (para. [0026] of the specification) and that "additional testing or training examples can be added to the testing or training set so that the testing or training set more accurately reflects real-world conditions" (para. [0090] of the specification). However, while providing certain examples for numbers of samples that may be considered under- or over-represented, the specification as filed does not appear to describe determining a threshold number of examples that is sufficient to accurately test or train a model.

Claim 21 depends upon claim 23, and thus includes the aforementioned limitation(s).

Claims 1-19 and 21-23 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the enablement requirement.
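As editorial context for the disputed diversity-measure language: the measure quoted from para. [0083] is a chi-squared (χ2, rendered "x2" in some extractions) distance between the per-cluster distribution of samples and a uniform distribution, computed over percentages rather than raw frequencies. The short sketch below is an illustrative reconstruction from that quoted text only, not the applicant's actual implementation; the function and variable names are our own.

```python
def chi2_uniform_distance(cluster_counts):
    """Chi-squared distance between the observed distribution of samples
    across embedding clusters and a uniform distribution, computed over
    percentages rather than raw frequencies (per the quoted para. [0083]).
    Lower values indicate a more uniform, and thus more diverse, sample."""
    k = len(cluster_counts)
    total = sum(cluster_counts)
    observed_pct = [100.0 * c / total for c in cluster_counts]
    expected_pct = 100.0 / k  # uniform distribution across k clusters
    return sum((o - expected_pct) ** 2 / expected_pct for o in observed_pct)

# A perfectly uniform sample scores 0; a skewed sample scores higher.
print(chi2_uniform_distance([25, 25, 25, 25]))  # 0.0
print(chi2_uniform_distance([70, 10, 10, 10]))  # 108.0
```

Because the statistic is normalized over percentages, it is invariant to total sample size; only the shape of the distribution across clusters matters, which matches the quoted normalization rationale.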
The claims contain subject matter which was not described in the specification in such a way as to enable one skilled in the art to which it pertains, or with which it is most nearly connected, to make and/or use the invention. The factors to be considered are as follows: (A) the breadth of the claims; (B) the nature of the invention; (C) the state of the prior art; (D) the level of one of ordinary skill; (E) the level of predictability in the art; (F) the amount of direction provided by the inventor; (G) the existence of working examples; (H) the quantity of experimentation needed to make or use the invention based on the content of the disclosure.

Claim 1 fails to comply with the enablement requirement, as the specification does not describe how the claimed subject matter is achieved. Namely, the specification does not describe how "gap analysis [is performed] on the private data items using the clustering algorithm and a diversity measure to identify classifications that are unrepresented or having fewer than a threshold number of examples." Regarding the above-identified factors to be considered, this limitation is addressed as follows:

Regarding factor (A), the claims are relatively narrow, in describing a series of steps being performed on private (and non-private) data items to produce an augmented training/testing set. This does not weigh toward or against enablement.

Regarding factor (B), the nature of the invention is drawn to augmenting testing/training data for a machine learning model. This weighs toward enablement.

Regarding factor (C), the prior art does not appear to describe all of the claimed elements/steps in the manner claimed (see below). This weighs against enablement.

Regarding factor (D), one of ordinary skill in the art would be familiar with augmenting training/testing data using various methods/models. This does not weigh toward or against enablement.
Regarding factor (E), the level of predictability in the art is generally low, as this is the reason that machine learning/predictive models are being developed and used. This weighs against enablement.

Regarding factor (F), the inventor appears to discuss clustering and a diversity measure (see above), but does not appear to provide any direction regarding a "gap analysis." This weighs against enablement.

Regarding factor (G), examples are provided regarding clustering and diversity measures (see above), but no working example appears to be provided of a "gap analysis." This weighs against enablement.

Regarding factor (H), the quantity of experimentation needed to make or use the invention based on the content of the disclosure appears to be high, due to a lack of working examples and low level of predictability. This weighs against enablement.

Additionally, the specification does not describe how "the threshold number of examples is sufficient to accurately test or train the model with respect to data items having the classification." Regarding the above-identified factors to be considered, this limitation is addressed as follows:

Regarding factor (A), the claims are relatively narrow, in describing a series of steps being performed on private (and non-private) data items to produce an augmented training/testing set. This does not weigh toward or against enablement.

Regarding factor (B), the nature of the invention is drawn to augmenting testing/training data for a machine learning model. This weighs toward enablement.

Regarding factor (C), the prior art does not appear to describe all of the claimed elements/steps in the manner claimed (see below). This weighs against enablement.

Regarding factor (D), one of ordinary skill in the art would be familiar with augmenting training/testing data using various methods/models. This does not weigh toward or against enablement.
Regarding factor (E), the level of predictability in the art is generally low, as this is the reason that machine learning/predictive models are being developed and used. This weighs against enablement.

Regarding factor (F), the inventor appears to provide direction toward there being certain thresholds for under- and over-representation, but not how these are identified/determined (see above). This weighs against enablement.

Regarding factor (G), examples of specific under- and over-representation thresholds are given (see above), but not how these are identified/determined. This weighs against enablement.

Regarding factor (H), the quantity of experimentation needed to make or use the invention based on the content of the disclosure appears to be high, as no specific means/method of making the determination appears to be provided. This weighs against enablement.

Therefore, upon consideration of all of the evidence related to each of these factors, and based on the evidence as a whole, claim 1 is not enabled.

Claims 2-15 depend upon claim 1, and thus include the aforementioned limitation(s).

Claim 22 fails to comply with the enablement requirement, as the specification does not describe how the claimed subject matter is achieved. Namely, the specification does not describe how "gap analysis [is performed] on the private data items using the clustering algorithm and a diversity measure to identify classifications that are unrepresented or having fewer than a threshold number of examples." Regarding the above-identified factors to be considered, this limitation is addressed as follows:

Regarding factor (A), the claims are relatively narrow, in describing a series of steps being performed on private (and non-private) data items to produce an augmented training/testing set. This does not weigh toward or against enablement.

Regarding factor (B), the nature of the invention is drawn to augmenting testing/training data for a machine learning model.
This weighs toward enablement.

Regarding factor (C), the prior art does not appear to describe all of the claimed elements/steps in the manner claimed (see below). This weighs against enablement.

Regarding factor (D), one of ordinary skill in the art would be familiar with augmenting training/testing data using various methods/models. This does not weigh toward or against enablement.

Regarding factor (E), the level of predictability in the art is generally low, as this is the reason that machine learning/predictive models are being developed and used. This weighs against enablement.

Regarding factor (F), the inventor appears to discuss clustering and a diversity measure (see above), but does not appear to provide any direction regarding a "gap analysis." This weighs against enablement.

Regarding factor (G), examples are provided regarding clustering and diversity measures (see above), but no working example appears to be provided of a "gap analysis." This weighs against enablement.

Regarding factor (H), the quantity of experimentation needed to make or use the invention based on the content of the disclosure appears to be high, due to a lack of working examples and low level of predictability. This weighs against enablement.

Additionally, the specification does not describe how "the threshold number of examples is sufficient to accurately test or train the model with respect to data items having the classification." Regarding the above-identified factors to be considered, this limitation is addressed as follows:

Regarding factor (A), the claims are relatively narrow, in describing a series of steps being performed on private (and non-private) data items to produce an augmented training/testing set. This does not weigh toward or against enablement.

Regarding factor (B), the nature of the invention is drawn to augmenting testing/training data for a machine learning model. This weighs toward enablement.
Regarding factor (C), the prior art does not appear to describe all of the claimed elements/steps in the manner claimed (see below). This weighs against enablement.

Regarding factor (D), one of ordinary skill in the art would be familiar with augmenting training/testing data using various methods/models. This does not weigh toward or against enablement.

Regarding factor (E), the level of predictability in the art is generally low, as this is the reason that machine learning/predictive models are being developed and used. This weighs against enablement.

Regarding factor (F), the inventor appears to provide direction toward there being certain thresholds for under- and over-representation, but not how these are identified/determined (see above). This weighs against enablement.

Regarding factor (G), examples of specific under- and over-representation thresholds are given (see above), but not how these are identified/determined. This weighs against enablement.

Regarding factor (H), the quantity of experimentation needed to make or use the invention based on the content of the disclosure appears to be high, as no specific means/method of making the determination appears to be provided. This weighs against enablement.

Therefore, upon consideration of all of the evidence related to each of these factors, and based on the evidence as a whole, claim 22 is not enabled.

Claims 17-19 depend upon claim 22, and thus include the aforementioned limitation(s).

Claim 23 fails to comply with the enablement requirement, as the specification does not describe how the claimed subject matter is achieved.
Namely, the specification does not describe how "gap analysis [is performed] on the private data items using the clustering algorithm and a diversity measure to identify classifications that are unrepresented or having fewer than a threshold number of examples." Regarding the above-identified factors to be considered, this limitation is addressed as follows:

Regarding factor (A), the claims are relatively narrow, in describing a series of steps being performed on private (and non-private) data items to produce an augmented training/testing set. This does not weigh toward or against enablement.

Regarding factor (B), the nature of the invention is drawn to augmenting testing/training data for a machine learning model. This weighs toward enablement.

Regarding factor (C), the prior art does not appear to describe all of the claimed elements/steps in the manner claimed (see below). This weighs against enablement.

Regarding factor (D), one of ordinary skill in the art would be familiar with augmenting training/testing data using various methods/models. This does not weigh toward or against enablement.

Regarding factor (E), the level of predictability in the art is generally low, as this is the reason that machine learning/predictive models are being developed and used. This weighs against enablement.

Regarding factor (F), the inventor appears to discuss clustering and a diversity measure (see above), but does not appear to provide any direction regarding a "gap analysis." This weighs against enablement.

Regarding factor (G), examples are provided regarding clustering and diversity measures (see above), but no working example appears to be provided of a "gap analysis." This weighs against enablement.

Regarding factor (H), the quantity of experimentation needed to make or use the invention based on the content of the disclosure appears to be high, due to a lack of working examples and low level of predictability. This weighs against enablement.
Additionally, the specification does not describe how "the threshold number of examples is sufficient to accurately train the model with respect to data items having the classification." Regarding the above-identified factors to be considered, this limitation is addressed as follows:

Regarding factor (A), the claims are relatively narrow, in describing a series of steps being performed on private (and non-private) data items to produce an augmented training/testing set. This does not weigh toward or against enablement.

Regarding factor (B), the nature of the invention is drawn to augmenting testing/training data for a machine learning model. This weighs toward enablement.

Regarding factor (C), the prior art does not appear to describe all of the claimed elements/steps in the manner claimed (see below). This weighs against enablement.

Regarding factor (D), one of ordinary skill in the art would be familiar with augmenting training/testing data using various methods/models. This does not weigh toward or against enablement.

Regarding factor (E), the level of predictability in the art is generally low, as this is the reason that machine learning/predictive models are being developed and used. This weighs against enablement.

Regarding factor (F), the inventor appears to provide direction toward there being certain thresholds for under- and over-representation, but not how these are identified/determined (see above). This weighs against enablement.

Regarding factor (G), examples of specific under- and over-representation thresholds are given (see above), but not how these are identified/determined. This weighs against enablement.

Regarding factor (H), the quantity of experimentation needed to make or use the invention based on the content of the disclosure appears to be high, as no specific means/method of making the determination appears to be provided. This weighs against enablement.
Therefore, upon consideration of all of the evidence related to each of these factors, and based on the evidence as a whole, claim 23 is not enabled.

Claim 21 depends upon claim 23, and thus includes the aforementioned limitation(s).

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-19 and 21-23 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 1 recites the limitation "that is unrepresented or under-represented has fewer than a threshold number of examples." The intended scope of the claim is not clear because it is not clear what is meant by "or under-represented has fewer than a threshold number of examples." For the purposes of examination, the examiner assumes that it is intended to be "that is unrepresented or has fewer than a threshold number of examples" (i.e., that having fewer than the threshold number is being "under-represented").

The term "semantically meaningful" in claim 1 is also a relative term which renders the claim indefinite. The term "semantically meaningful" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.
The examiner has assumed, for the purposes of examination, that the categories have a semantic label.

The term "sufficient to accurately test or train the model" in claim 1 is also a relative term which renders the claim indefinite. The term "sufficient to accurately test or train the model" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The examiner has assumed, for the purposes of examination, that having the threshold number of examples (of the classification) means that the data is sufficient to accurately test or train the model (also see above).

Claims 2-15 depend upon claim 1, and thus include the aforementioned limitation(s).

Claim 8 also recites the limitation "each respective cluster" in line 2. There is insufficient antecedent basis for this limitation in the claim.

Claim 15 also recites the term "over-represented," which is a relative term which renders the claim indefinite. The term "over-represented" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.

Claim 22 recites the limitation "the testing or training set" in line 22. There is insufficient antecedent basis for this limitation in the claim.

The term "semantically meaningful" in claim 22 is also a relative term which renders the claim indefinite. The term "semantically meaningful" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The examiner has assumed, for the purposes of examination, that the categories have a semantic label.
The term "sufficient to accurately test or train the model" in claim 22 is also a relative term which renders the claim indefinite. The term "sufficient to accurately test or train the model" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The examiner has assumed, for the purposes of examination, that having the threshold number of examples (of the classification) means that the data is sufficient to accurately test or train the model (also see above).

Claims 17-19 depend upon claim 22, and thus include the aforementioned limitation(s).

Claim 23 recites the limitation "the testing or training set" in line 19. There is insufficient antecedent basis for this limitation in the claim.

The term "semantically meaningful" in claim 23 is also a relative term which renders the claim indefinite. The term "semantically meaningful" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The examiner has assumed, for the purposes of examination, that the categories have a semantic label.

The term "sufficient to accurately train the model" in claim 23 is also a relative term which renders the claim indefinite. The term "sufficient to accurately train the model" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The examiner has assumed, for the purposes of examination, that having the threshold number of examples (of the classification) means that the data is sufficient to accurately test or train the model (also see above).
Claim 21 depends upon claim 23, and thus includes the aforementioned limitation(s).

Allowable Subject Matter

Examiner's Note: the cited art teaches various systems for augmenting training/testing data sets, including combinations of private and public data, labeled and unlabeled data, clustering and classifying data, and improving imbalanced data sets. However, none of the cited art appears to provide motivation for combining all of the claimed elements in the manner claimed, including:

classifying private data items in a repository by mapping the private data items into a feature space using one or more machine learning models trained on non-private data to build feature maps, wherein the one or more machine learning models are trained on auxiliary tasks to identify features in the data without extracting private information;

clustering the private data items in the feature space using a clustering algorithm to partition the feature space into semantically meaningful categories;

performing gap analysis on the private data items using the clustering algorithm and a diversity measure to identify classifications that are unrepresented or having fewer than a threshold number of examples in the testing or training set;

determining quality labels for the private data items using a quality estimation model trained on non-private data, wherein the quality labels are determined without manual inspection of the private data items; and

augmenting the testing or training set based on the classifications and gap analysis, where the augmented set includes additional examples from a particular classification that is unrepresented or has fewer than a threshold number of examples prior to augmenting, and wherein the threshold number of examples is sufficient to accurately train or test the model with respect to data items with that classification.

Response to Arguments

The prior rejections under 35 U.S.C. 112 have been withdrawn/updated based upon the amendments filed.
Examiner’s Note: the remarks, filed 28 May 2025, indicate that the term “over-represented” has been removed from the claims, but it still appears in claim 15.

The rejection of claims 20-21 under 35 U.S.C. 101 has been withdrawn due to the amendments filed.

Applicant’s arguments, see the remarks, filed 28 May 2025, with respect to the rejections under 35 U.S.C. 102 and 103 have been fully considered and are persuasive, in view of the amendments made to the independent claims (see above). The rejections of claims 1-19 and 21 have been withdrawn.

Conclusion

The following is a summary of the treatment and status of all claims in the application as recommended by M.P.E.P. 707.07(i): claim 20 is cancelled; claims 1-19 and 21-23 are rejected.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Müller (US 2021/0294945) – discloses training a neural network to approximate variance-optimal selection probabilities for training.

Gong et al. (PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation, May 2021, pgs. 1-15) – discloses classifying audio samples, including augmenting a training/testing dataset including balanced sampling for underrepresented classes in the dataset.

Hu (US 2020/0104643) – discloses unsupervised learning using cluster centers for each of the classes in a dataset, which are learned using a self-organizing feature map.

Zhang (US 2021/0158799) – discloses noise reduction for audio optimization for a machine learning model.

Gultekin (US 2021/0326646) – discloses automated generation of training samples, including class-specific thresholds to address sampling bias from a varying number and quality of training samples.

Xiao et al. (The Art of Labeling: Task Augmentation for Private (Collaborative) Learning on Transformed Data, May 2021, pgs. 1-17) – discloses using private transformations of private data/labels for shared data in a distributed learning system, and using generated noise as an auxiliary learning task.

Sattler et al. (FedAUX: Leveraging Unlabeled Auxiliary Data in Federated Learning, Feb 2021, pgs. 1-23) – discloses federated distillation using unlabeled auxiliary data to train a client/private model.

McMahan et al. (Learning Differentially Private Recurrent Language Models, Feb 2018, pgs. 1-14) – discloses training large recurrent language models with user-level differential privacy guarantees by adding user-level privacy protection to the federated averaging algorithm.

Alwassel et al. (Self-Supervised Learning by Cross-Modal Audio-Video Clustering, 2020, pgs. 1-13) – discloses Cross-Modal Deep Clustering (XDC) as a self-supervised method that applies unsupervised clustering in one modality (e.g., audio) to another modality (e.g., video), which includes assigning samples to clusters based on sample features, and which cluster assignments are then used as labels for the cross-modality.

Zhang et al. (Learning from crowdsourced labeled data: a survey, July 2016, pgs. 1-34) – discloses various systems/methods for combining different data sources including determining label and model quality.

The examiner requests, in response to this Office action, that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line number(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.

When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections.
See 37 CFR 1.111(c).

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GEORGE GIROUX whose telephone number is (571)272-9769. The examiner can normally be reached M-F 10am-6pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas, can be reached on 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/GEORGE GIROUX/
Primary Examiner, Art Unit 2128
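For context on the claim elements the examiner identified as allowable: they describe a data-augmentation pipeline in which private items are embedded into a feature space, clustered into categories, and a "gap analysis" flags categories with fewer than a threshold number of examples so that the training/testing set can be topped up from those categories. The sketch below is a hypothetical simplification of only the gap-analysis and augmentation steps; the function names and the simple count-based flagging are illustrative assumptions, not the applicant's actual implementation (which also involves feature mapping, a diversity measure, and quality estimation).

```python
import numpy as np

def gap_analysis(cluster_labels, threshold):
    """Count examples per cluster and return the clusters that have
    fewer than `threshold` examples (underrepresented classifications)."""
    counts = {c: int((cluster_labels == c).sum()) for c in np.unique(cluster_labels)}
    return [c for c, n in counts.items() if n < threshold]

def augment(dataset, labels, pool, pool_labels, threshold):
    """Add examples from a candidate pool to each underrepresented
    cluster until it reaches the threshold (or the pool runs out)."""
    dataset, labels = list(dataset), list(labels)
    for c in gap_analysis(np.asarray(labels), threshold):
        need = threshold - labels.count(c)
        extra = [x for x, y in zip(pool, pool_labels) if y == c][:need]
        dataset += extra
        labels += [c] * len(extra)
    return dataset, labels
```

For example, with a set whose cluster labels are `[0, 0, 0, 1]` and a threshold of 3, cluster 1 is flagged and two pool examples from that cluster would be added.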

Prosecution Timeline

Oct 15, 2021
Application Filed
Mar 08, 2025
Non-Final Rejection — §112
Apr 08, 2025
Interview Requested
Apr 16, 2025
Interview Requested
May 28, 2025
Examiner Interview Summary
May 28, 2025
Applicant Interview (Telephonic)
May 28, 2025
Response Filed
May 28, 2025
Response after Non-Final Action
Jul 09, 2025
Response Filed
Oct 18, 2025
Final Rejection — §112
Dec 04, 2025
Interview Requested

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12572807
Neural Network Methods for Defining System Topology
2y 5m to grant · Granted Mar 10, 2026
Patent 12572818
DEVICE AND METHOD FOR RANDOM WALK SIMULATION
2y 5m to grant · Granted Mar 10, 2026
Patent 12554986
WEIGHT QUANTIZATION IN NEURAL NETWORKS
2y 5m to grant · Granted Feb 17, 2026
Patent 12554983
MACHINE LEARNING-BASED SYSTEMS AND METHODS FOR IDENTIFYING AND RESOLVING CONTENT ANOMALIES IN A TARGET DIGITAL ARTIFACT
2y 5m to grant · Granted Feb 17, 2026
Patent 12541696
ENHANCED VALIDITY MODELING USING MACHINE-LEARNING TECHNIQUES
2y 5m to grant · Granted Feb 03, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
66%
Grant Probability
93%
With Interview (+27.1%)
4y 6m
Median Time to Grant
Moderate
PTA Risk
Based on 612 resolved cases by this examiner. Grant probability derived from career allow rate.
