DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application is being examined under the pre-AIA first to invent provisions.
Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 06/16/2023 and 08/01/2023 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 4, 9, 14, 18, and 19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claims 4, 9, 14, and 19 recite the limitation “wherein generating the second set of training data comprises adding, to the first set of training data, the subset of training examples selected from the first set of training data.” The limitation is unclear. The Examiner interprets the limitation to mean that the second set is generated by adding, to the first set of training data, the subset of examples selected from the first set of training data.
Claim 18 recites the limitation "The system of claim 12" in line 1. There is insufficient antecedent basis for this limitation in the claim. The Examiner interprets the limitation as “The system of claim 17.”
The claims depending from claims 4, 9, 14, 18, and 19 do not cure the deficiencies noted in the rejection of claims 4, 9, 14, 18, and 19. Therefore, these dependent claims are rejected under the same rationale as claims 4, 9, 14, 18, and 19.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
In reference to claim 1:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is directed to a process.
Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?
“selecting, from the first set of training data, a subset of training examples by selecting at least a top N training examples, with respect to score, from each of the predicted classes of the set of classes;” which is an abstract idea because it is directed to a mental process (an observation, evaluation, judgment, or opinion). The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)(C)). For example, a person could select at least a top N training examples, with respect to score, from each of the predicted classes of the set of classes.
“generating a second set of training data that includes the subset of training examples selected from the first set of training data, wherein each training example of the subset is labelled in the second set of training data as belonging to a respective class as predicted by the machine learning model;” which is an abstract idea because it is directed to a mental process (an observation, evaluation, judgment, or opinion). The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)(C)). For example, a person could generate a second set of training data that includes the subset of training examples selected from the first set of training data.
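For illustration only (this sketch is not part of the claim or the record, and all names are hypothetical), the recited per-class top-N selection and pseudo-labeling steps can be expressed as:

```python
from collections import defaultdict

def select_top_n_per_class(examples, n):
    """Group examples by the class the model predicted, then keep the
    top-N per class by confidence score (the recited selection step)."""
    by_class = defaultdict(list)
    for ex in examples:  # each ex: {"features", "predicted_class", "score"}
        by_class[ex["predicted_class"]].append(ex)
    subset = []
    for members in by_class.values():
        members.sort(key=lambda e: e["score"], reverse=True)
        subset.extend(members[:n])
    return subset

def generate_second_set(subset):
    """Label each selected example with the class the model predicted
    (the recited pseudo-labeling step)."""
    return [{"features": ex["features"], "label": ex["predicted_class"]}
            for ex in subset]
```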
Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application?
“A method comprising: obtaining a first set of training data that includes a first plurality of training examples;” (insignificant extra-solution activity: mere data gathering, MPEP 2106.05(g))
“obtaining a machine learning model that has been trained to select, from a set of classes, a predicted class for an input;” (insignificant extra-solution activity: mere data gathering, MPEP 2106.05(g))
“applying each training example of the first set of training data to the machine learning model to (i) predict a respective class from the set of classes and (ii) determine a respective score that is representative of a degree of confidence of the respective class;” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
“and using the second set of training data, further training the machine learning model, thereby generating an updated machine learning model.” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
The claim does not include additional elements that are integrated into a practical application.
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
“A method comprising: obtaining a first set of training data that includes a first plurality of training examples;” (well-understood, routine, and conventional activity, MPEP 2106.05(d))
“obtaining a machine learning model that has been trained to select, from a set of classes, a predicted class for an input;” (well-understood, routine, and conventional activity, MPEP 2106.05(d))
“applying each training example of the first set of training data to the machine learning model to (i) predict a respective class from the set of classes and (ii) determine a respective score that is representative of a degree of confidence of the respective class;” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
“and using the second set of training data, further training the machine learning model, thereby generating an updated machine learning model.” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
In reference to claim 2:
Claim 2 recites the judicial exception of the claim from which it depends and does not recite additional elements that either integrate the judicial exception into a practical application or amount to significantly more than the judicial exception.
In reference to claim 3:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is directed to a process.
Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?
“The method of claim 1, wherein determining the respective score comprises generating, as an output of the machine learning model when applying the given training example thereto, at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example.” which is an abstract idea because it is directed to a mental process (an observation, evaluation, judgment, or opinion). The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)(C)). For example, a person could generate at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example.
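For illustration only (not part of the claim or the record), one conventional way a classifier can output a per-class probability is a softmax over its raw class scores; the claim does not require any particular form:

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1
    (an illustrative sketch of a per-class probability output)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```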
Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application?
No
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
No
In reference to claim 4:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is directed to a process.
Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?
“wherein generating the second set of training data comprises adding, to the first set of training data, the subset of training examples selected from the first set of training data,” which is an abstract idea because it is directed to a mental process (an observation, evaluation, judgment, or opinion). The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)(C)). For example, a person could generate the second set of training data by adding the subset of training examples selected from the first set of training data.
“wherein applying each training example of the first set of training data to the machine learning model additionally comprises generating a respective embedding vector that represents a respective location in an embedding space,” which is an abstract idea because it is directed to a mental process (an observation, evaluation, judgment, or opinion). The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could generate a respective embedding vector for each training example of the first set of training data.
“wherein the method yet further comprises: applying each training example of the third set of training data to the machine learning model to generate a respective embedding vector that represents a respective location in the embedding space;” which is an abstract idea because it is directed to a mental process (an observation, evaluation, judgment, or opinion). The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could generate a respective embedding vector for each training example of the third set of training data.
“and determining, for the given training example, at least one distance between an embedding vector output by the machine learning model when applying the given training example thereto and at least one embedding vector generated for a respective at least one training example of the third set of training data that is of a common class as the predicted class of the given training example” which is an abstract idea because it is directed to mathematical concepts: mathematical relationships, mathematical formulas or equations, and mathematical calculations (MPEP 2106.04(a)(2)(I)(C)).
“and wherein determining, for the given training example of the first set of training data, a score that is representative of the degree of confidence of a class of the given training example comprises determining a weighted combination of (i) the at least one distance and (ii) the at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example.” which is an abstract idea because it is directed to mathematical concepts: mathematical relationships, mathematical formulas or equations, and mathematical calculations (MPEP 2106.04(a)(2)(I)(C)).
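For illustration only (not part of the claim or the record), the recited weighted combination of a distance and a probability can be sketched as follows; the weights, and the choice to negate the distance so that a smaller distance yields a higher confidence, are hypothetical and not taken from the claim:

```python
def combined_score(distance, probability, w_dist=0.5, w_prob=0.5):
    """Weighted combination of (i) an embedding-space distance and
    (ii) a predicted-class probability. Negating the distance term is
    one hypothetical convention (closer to same-class examples means
    a higher confidence score)."""
    return w_prob * probability - w_dist * distance
```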
Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application?
“The method of claim 3, further comprising, prior to applying each training example of the first set of training data to the machine learning model: (i) obtaining a third set of training data that includes a third plurality of training examples, wherein each training example of the third set of training data is labelled as belonging to a respective class selected from the set of classes,” (insignificant extra-solution activity: mere data gathering, MPEP 2106.05(g))
“and (ii) using the third set of training data, training the machine learning model to select, from the set of classes, a predicted class for an input,” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
The claim does not include additional elements that are integrated into a practical application.
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
“The method of claim 3, further comprising, prior to applying each training example of the first set of training data to the machine learning model: (i) obtaining a third set of training data that includes a third plurality of training examples, wherein each training example of the third set of training data is labelled as belonging to a respective class selected from the set of classes,” (well-understood, routine, and conventional activity, MPEP 2106.05(d))
“and (ii) using the third set of training data, training the machine learning model to select, from the set of classes, a predicted class for an input,” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
In reference to claim 5:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is directed to a process.
Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?
“The method of claim 4, wherein determining the at least one distance comprises determining at least one cosine similarity between the embedding vector output by the machine learning model when applying the given training example thereto and the at least one embedding vector generated for the respective at least one training example of the third set of training data that is of the common class as the predicted class of the given training example.” which is an abstract idea because it is directed to mathematical concepts: mathematical relationships, mathematical formulas or equations, and mathematical calculations (MPEP 2106.04(a)(2)(I)(B)).
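For illustration only (not part of the claim or the record), the recited cosine similarity between two embedding vectors is the dot product divided by the product of the vector norms:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```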
Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application?
No
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
No
In reference to claim 6:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is directed to a process.
Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?
“The method of claim 4, wherein determining the at least one distance comprises determining a distance between the embedding vector output by the machine learning model when applying the given training example thereto and an embedding vector generated for a training example of the third set of training data that is, of the training examples of the third set of training data that are the common class as the predicted class of the given training example, closest in the embedding space.” which is an abstract idea because it is directed to a mental process (an observation, evaluation, judgment, or opinion). The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)(C)). For example, a person could determine the at least one distance between the embedding vectors.
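For illustration only (not part of the claim or the record, with hypothetical names and Euclidean distance chosen as one possible metric), the recited nearest same-class distance can be sketched as:

```python
import math

def nearest_same_class_distance(query_embedding, third_set, predicted_class):
    """Distance from the query embedding to the closest third-set
    embedding whose label matches the predicted class."""
    same_class = [ex["embedding"] for ex in third_set
                  if ex["label"] == predicted_class]
    return min(math.dist(query_embedding, e) for e in same_class)
```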
Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application?
No
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
No
In reference to claim 7:
Claim 7 recites the judicial exception of the claim from which it depends and does not recite additional elements that either integrate the judicial exception into a practical application or amount to significantly more than the judicial exception.
In reference to claim 8:
Claim 8 recites the judicial exception of the claim from which it depends and does not recite additional elements that either integrate the judicial exception into a practical application or amount to significantly more than the judicial exception.
In reference to claim 9:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is directed to a process.
Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?
“wherein generating the second set of training data comprises adding, to the first set of training data, the subset of training examples selected from the first set of training data,” which is an abstract idea because it is directed to a mental process (an observation, evaluation, judgment, or opinion). The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)(C)). For example, a person could generate the second set of training data by adding the subset of training examples selected from the first set of training data.
“wherein applying each training example of the first set of training data to the machine learning model additionally comprises generating a respective embedding vector that represents a respective location in an embedding space,” which is an abstract idea because it is directed to a mental process (an observation, evaluation, judgment, or opinion). The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could generate a respective embedding vector for each training example of the first set of training data.
“wherein the method yet further comprises: applying each training example of the third set of training data to the machine learning model to generate a respective embedding vector that represents a respective location in the embedding space;” which is an abstract idea because it is directed to a mental process (an observation, evaluation, judgment, or opinion). The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could generate a respective embedding vector for each training example of the third set of training data.
“and determining, for the given training example, at least one distance between an embedding vector output by the machine learning model when applying the given training example thereto and at least one embedding vector generated for a respective at least one training example of the third set of training data that is of a common class as the predicted class of the given training example” which is an abstract idea because it is directed to mathematical concepts: mathematical relationships, mathematical formulas or equations, and mathematical calculations (MPEP 2106.04(a)(2)(I)(C)).
“and wherein determining a score that is representative of the degree of confidence of a class of the given training example comprises determining the score based on the at least one distance.” which is an abstract idea because it is directed to mathematical concepts: mathematical relationships, mathematical formulas or equations, and mathematical calculations (MPEP 2106.04(a)(2)(I)(A)).
Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application?
“The method of claim 1, further comprising, prior to applying each training example of the first set of training data to the machine learning model: (i) obtaining a third set of training data that includes a third plurality of training examples, wherein each training example of the third set of training data is labelled as belonging to a respective class selected from the set of classes,” (insignificant extra-solution activity: mere data gathering, MPEP 2106.05(g))
“and (ii) using the third set of training data, training the machine learning model to select, from the set of classes, a predicted class for an input,” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
The claim does not include additional elements that are integrated into a practical application.
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
“The method of claim 1, further comprising, prior to applying each training example of the first set of training data to the machine learning model: (i) obtaining a third set of training data that includes a third plurality of training examples, wherein each training example of the third set of training data is labelled as belonging to a respective class selected from the set of classes,” (well-understood, routine, and conventional activity, MPEP 2106.05(d))
“and (ii) using the third set of training data, training the machine learning model to select, from the set of classes, a predicted class for an input,” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
In reference to claim 10:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is directed to a process.
Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?
“The method of claim 9, wherein determining the at least one distance comprises determining at least one cosine similarity between the embedding vector output by the machine learning model when applying the given training example thereto and the at least one embedding vector generated for the respective at least one training example of the third set of training data that is of the common class as the predicted class of the given training example.” which is an abstract idea because it is directed to mathematical concepts: mathematical relationships, mathematical formulas or equations, and mathematical calculations (MPEP 2106.04(a)(2)(I)(B)).
Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application?
No
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
No
In reference to claim 11:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is directed to a process.
Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?
“The method of claim 9, wherein determining the at least one distance comprises determining a distance between the embedding vector output by the machine learning model when applying the given training example thereto and an embedding vector generated for a training example of the third set of training data that is, of the training examples of the third set of training data that are the common class as the predicted class of the given training example, closest in the embedding space.” which is an abstract idea because it is directed to a mental process (an observation, evaluation, judgment, or opinion). The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)(C)). For example, a person could determine the at least one distance between the embedding vectors.
Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application?
No
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
No
In reference to claim 12:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is directed to a manufacture.
Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?
“selecting, from the first set of training data, a subset of training examples by selecting at least a top N training examples, with respect to score, from each of the predicted classes of the set of classes;” which is an abstract idea because it is directed to a mental process (an observation, evaluation, judgment, or opinion). The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)(C)). For example, a person could select at least a top N training examples, with respect to score, from each of the predicted classes of the set of classes.
“generating a second set of training data that includes the subset of training examples selected from the first set of training data, wherein each training example of the subset is labelled in the second set of training data as belonging to a respective class as predicted by the machine learning model;” which is an abstract idea because it is directed to a mental process (an observation, evaluation, judgment, or opinion). The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)(C)). For example, a person could generate a second set of training data that includes the subset of training examples selected from the first set of training data.
Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application?
“A non-transitory computer-readable medium, having stored thereon program instructions that, upon execution by a computing system, cause the computing system to perform operations comprising:” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
“obtaining a first set of training data that includes a first plurality of training examples;” (insignificant extra-solution activity: mere data gathering, MPEP 2106.05(g))
“obtaining a machine learning model that has been trained to select, from a set of classes, a predicted class for an input;” (insignificant extra-solution activity: mere data gathering, MPEP 2106.05(g))
“applying each training example of the first set of training data to the machine learning model to (i) predict a respective class from the set of classes and (ii) determine a respective score that is representative of a degree of confidence of the respective class;” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
“and using the second set of training data, further training the machine learning model, thereby generating an updated machine learning model.” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
The claim does not include additional elements that are integrated into a practical application.
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
“A non-transitory computer-readable medium, having stored thereon program instructions that, upon execution by a computing system, cause the computing system to perform operations comprising:” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
“obtaining a first set of training data that includes a first plurality of training examples;” (well-understood, routine, and conventional activity, MPEP 2106.05(d))
“obtaining a machine learning model that has been trained to select, from a set of classes, a predicted class for an input;” (well-understood, routine, and conventional activity, MPEP 2106.05(d))
“applying each training example of the first set of training data to the machine learning model to (i) predict a respective class from the set of classes and (ii) determine a respective score that is representative of a degree of confidence of the respective class;” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
“and using the second set of training data, further training the machine learning model, thereby generating an updated machine learning model.” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
In reference to claim 13:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is directed to a manufacture.
Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?
“The non-transitory computer-readable medium of claim 12, wherein determining the respective score comprises generating, as an output of the machine learning model when applying the given training example thereto, at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example.” which is an abstract idea because it is directed to a mental process (an observation, evaluation, judgment, or opinion). The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)(C)). For example, a person could generate at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example.
Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application?
No
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
No
In reference to claim 14:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is directed to a manufacture.
Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?
“wherein generating the second set of training data comprises adding, to the first set of training data, the subset of training examples selected from the first set of training data,” which is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)(c)). For example, a person could generate the second set of training data by adding the subset of training examples selected from the first set of training data.
“wherein applying each training example of the first set of training data to the machine learning model additionally comprises generating a respective embedding vector that represents a respective location in an embedding space,” which is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could generate a respective embedding vector for the first set of training data.
“wherein the method yet further comprises: applying each training example of the third set of training data to the machine learning model to generate a respective embedding vector that represents a respective location in the embedding space;” which is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could generate a respective embedding vector for the third set of training data.
“and determining, for the given training example, at least one distance between an embedding vector output by the machine learning model when applying the given training example thereto and at least one embedding vector generated for a respective at least one training example of the third set of training data that is of a common class as the predicted class of the given training example” which is an abstract idea because it is directed to mathematical relationships, mathematical formulas or equations, and mathematical calculations (MPEP 2106.04(a)(2)(I)(c)).
“and wherein determining, for the given training example of the first set of training data, a score that is representative of the degree of confidence of a class of the given training example comprises determining a weighted combination of (i) the at least one distance and (ii) the at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example.” which is an abstract idea because it is directed to mathematical relationships, mathematical formulas or equations, and mathematical calculations (MPEP 2106.04(a)(2)(I)(c)).
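For illustration of the mathematical character of this limitation, the recited weighted combination can be sketched as a short computation. The function, variable names, and weight values below are hypothetical and are supplied for explanatory purposes only; they are not drawn from the claims or from the prior art of record.

```python
import numpy as np

# Hypothetical sketch of the recited score: a weighted combination of
# (i) an embedding distance and (ii) a predicted-class probability.
# The weights w_dist and w_prob are illustrative assumptions, not claim terms.
def confidence_score(example_vec, class_vecs, class_prob, w_dist=0.5, w_prob=0.5):
    # Distance between the example's embedding and the embeddings of
    # same-class training examples (here: mean cosine distance).
    sims = [
        np.dot(example_vec, v) / (np.linalg.norm(example_vec) * np.linalg.norm(v))
        for v in class_vecs
    ]
    distance = 1.0 - float(np.mean(sims))
    # Weighted combination of the distance and the probability.
    return w_dist * distance + w_prob * class_prob
```

Under this sketch, an example whose embedding coincides with its same-class embeddings contributes zero distance, so its score reduces to the weighted probability term alone.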
Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application?
“The non-transitory computer-readable medium of claim 13, wherein the operations further comprise, prior to applying each training example of the first set of training data to the machine learning model: (i) obtaining a third set of training data that includes a third plurality of training examples, wherein each training example of the third set of training data is labelled as belonging to a respective class selected from the set of classes,” (insignificant extra-solution activity mere data gathering MPEP 2106.05(g))
“and (ii) using the third set of training data, training the machine learning model to select, from the set of classes, a predicted class for an input,” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
The claim does not include additional elements that are integrated into a practical application.
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
“The non-transitory computer-readable medium of claim 13, wherein the operations further comprise, prior to applying each training example of the first set of training data to the machine learning model: (i) obtaining a third set of training data that includes a third plurality of training examples, wherein each training example of the third set of training data is labelled as belonging to a respective class selected from the set of classes,” (well-understood, routine, conventional MPEP 2106.05(d))
“and (ii) using the third set of training data, training the machine learning model to select, from the set of classes, a predicted class for an input,” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
In reference to claim 15:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is directed to a manufacture.
Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?
“The non-transitory computer-readable medium of claim 14, wherein determining the at least one distance comprises determining at least one cosine similarity between the embedding vector output by the machine learning model when applying the given training example thereto and the at least one embedding vector generated for the respective at least one training example of the third set of training data that is of the common class as the predicted class of the given training example.” which is an abstract idea because it is directed to mathematical relationships, mathematical formulas or equations, and mathematical calculations (MPEP 2106.04(a)(2)(I)(b)).
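For illustration, the cosine similarity recited in claim 15 is a standard mathematical calculation; a minimal sketch follows. The example vectors used in the note beneath it are hypothetical and are not drawn from the claims or the prior art of record.

```python
import math

# Illustrative cosine similarity between two embedding vectors:
# the dot product of the vectors divided by the product of their norms.
def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

For example, orthogonal vectors yield a similarity of 0, while parallel vectors yield a similarity of 1.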
Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application?
No
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
No
In reference to claim 16:
Claim 16 recites the judicial exception of the claim(s) from which it depends and does not recite additional elements that integrate the judicial exception into a practical application or that amount to significantly more than the judicial exception.
In reference to claim 17:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is directed to a machine.
Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?
“selecting, from the first set of training data, a subset of training examples by selecting at least a top N training examples, with respect to score, from each of the predicted classes of the set of classes;” which is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)(c)). For example, a person could select at least a top N training examples, with respect to score, from each of the predicted classes of the set of classes.
“generating a second set of training data that includes the subset of training examples selected from the first set of training data, wherein each training example of the subset is labelled in the second set of training data as belonging to a respective class as predicted by the machine learning model;” which is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)(c)). For example, a person could generate a second set of training data that includes the subset of training examples selected from the first set of training data.
Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application?
“A system comprising: one or more processors; and memory, containing program instructions that, upon execution by the one or more processors, cause the system to perform operations comprising:” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
“obtaining a first set of training data that includes a first plurality of training examples;” (insignificant extra-solution activity mere data gathering MPEP 2106.05(g))
“obtaining a machine learning model that has been trained to select, from a set of classes, a predicted class for an input;” (insignificant extra-solution activity mere data gathering MPEP 2106.05(g))
“applying each training example of the first set of training data to the machine learning model to (i) predict a respective class from the set of classes and (ii) determine a respective score that is representative of a degree of confidence of the respective class;” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
“and using the second set of training data, further training the machine learning model, thereby generating an updated machine learning model.” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
The claim does not include additional elements that are integrated into a practical application.
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
“A system comprising: one or more processors; and memory, containing program instructions that, upon execution by the one or more processors, cause the system to perform operations comprising:” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
“obtaining a first set of training data that includes a first plurality of training examples;” (well-understood, routine, conventional MPEP 2106.05(d))
“obtaining a machine learning model that has been trained to select, from a set of classes, a predicted class for an input;” (well-understood, routine, conventional MPEP 2106.05(d))
“applying each training example of the first set of training data to the machine learning model to (i) predict a respective class from the set of classes and (ii) determine a respective score that is representative of a degree of confidence of the respective class;” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
“and using the second set of training data, further training the machine learning model, thereby generating an updated machine learning model.” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
In reference to claim 18:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is directed to a machine.
Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?
“The system of claim 12, wherein determining the respective score comprises generating, as an output of the machine learning model when applying the given training example thereto, at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example.” which is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)(c)). For example, a person could generate at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example.
Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application?
No
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
No
In reference to claim 19:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is directed to a machine.
Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?
“wherein generating the second set of training data comprises adding, to the first set of training data, the subset of training examples selected from the first set of training data,” which is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)(c)). For example, a person could generate the second set of training data by adding the subset of training examples selected from the first set of training data.
“wherein applying each training example of the first set of training data to the machine learning model additionally comprises generating a respective embedding vector that represents a respective location in an embedding space,” which is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could generate a respective embedding vector for the first set of training data.
“wherein the method yet further comprises: applying each training example of the third set of training data to the machine learning model to generate a respective embedding vector that represents a respective location in the embedding space;” which is an abstract idea because it is directed to a mental process, an observation, evaluation, judgement, or opinion. The limitation as drafted, and under a broadest reasonable interpretation, can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, a person could generate a respective embedding vector for the third set of training data.
“and determining, for the given training example, at least one distance between an embedding vector output by the machine learning model when applying the given training example thereto and at least one embedding vector generated for a respective at least one training example of the third set of training data that is of a common class as the predicted class of the given training example” which is an abstract idea because it is directed to mathematical relationships, mathematical formulas or equations, and mathematical calculations (MPEP 2106.04(a)(2)(I)(c)).
“and wherein determining, for the given training example of the first set of training data, a score that is representative of the degree of confidence of a class of the given training example comprises determining a weighted combination of (i) the at least one distance and (ii) the at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example.” which is an abstract idea because it is directed to mathematical relationships, mathematical formulas or equations, and mathematical calculations (MPEP 2106.04(a)(2)(I)(c)).
Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application?
“The system of claim 18, wherein the operations further comprise, prior to applying each training example of the first set of training data to the machine learning model: (i) obtaining a third set of training data that includes a third plurality of training examples, wherein each training example of the third set of training data is labelled as belonging to a respective class selected from the set of classes,” (insignificant extra-solution activity mere data gathering MPEP 2106.05(g))
“and (ii) using the third set of training data, training the machine learning model to select, from the set of classes, a predicted class for an input,” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
The claim does not include additional elements that are integrated into a practical application.
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
“The system of claim 18, wherein the operations further comprise, prior to applying each training example of the first set of training data to the machine learning model: (i) obtaining a third set of training data that includes a third plurality of training examples, wherein each training example of the third set of training data is labelled as belonging to a respective class selected from the set of classes,” (well-understood, routine, conventional MPEP 2106.05(d))
“and (ii) using the third set of training data, training the machine learning model to select, from the set of classes, a predicted class for an input,” is merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
In reference to claim 20:
Step 1 - Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is directed to a machine.
Step 2A Prong 1 - Does the claim recite an abstract idea, law of nature, or natural phenomenon?
“The system of claim 19, wherein determining the at least one distance comprises determining at least one cosine similarity between the embedding vector output by the machine learning model when applying the given training example thereto and the at least one embedding vector generated for the respective at least one training example of the third set of training data that is of the common class as the predicted class of the given training example.” which is an abstract idea because it is directed to mathematical relationships, mathematical formulas or equations, and mathematical calculations (MPEP 2106.04(a)(2)(I)(b)).
Step 2A Prong 2 - Does the claim recite additional elements that integrate the judicial exception into a practical application?
No
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
No
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claim(s) 1, 3, 12, 13, 17, and 18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Lan-Zhe Guo et al., “Class-Imbalanced Semi-Supervised Learning with Adaptive Thresholding,” published 2022 (hereinafter “Guo”).
Regarding claim 1, Guo anticipates A method comprising: obtaining a first set of training data that includes a first plurality of training examples; (Guo Page 6 Paragraph 6; “We conduct experiments on long tailed variants of CIFAR-10… SVHN… and STL-10… datasets with various levels of class imbalance and different ratios of labeled data. These are all widely adopted datasets to evaluate SSL algorithms. For constructing the class-imbalanced training dataset, we use two parameters γl, γu to denote the imbalance ratio of labeled and unlabeled data” Examiner notes that a first set of training data that includes a first plurality of training examples (unlabeled data) is obtained)
obtaining a machine learning model that has been trained to select, from a set of classes, a predicted class for an input; (Guo Page 6 Paragraph 1; “the algorithm to determine sk exploits the class-wise confidence threshold effectively by ranking all the probabilities predicted as class k in descending order and setting sk such that exp(−sk) be equal to the predicted probability ranked at ρ×length (Ck),where length (Ck) is the number of unlabeled examples predicted as class k and ρ×100% denotes the percentage of selected confident pseudo-labels. Such a strategy takes the predicted probability ranked at ρ×100% separately from each class as a reference for thresholding.” Guo Page 7 Paragraph 3; “We train the model with batch size 64 for 218 iterations.” Examiner notes that a machine learning model (model) that has been trained to select, from a set of classes (pseudo-labels; each class), a predicted class for an input (number of unlabeled examples predicted as class k) is obtained through training)
applying each training example of the first set of training data to the machine learning model to (i) predict a respective class from the set of classes and (Examiner references the previous mapping to show that each training example of the first set of training data (unlabeled examples) is applied to the machine learning model to predict a respective class (class k) from the set of classes)
(ii) determine a respective score that is representative of a degree of confidence of the respective class; (Guo Page 6 Paragraph 1; “where length (Ck) is the number of unlabeled examples predicted as class k and ρ×100% denotes the percentage of selected confident pseudo-labels.” Examiner notes that a respective score that is representative of a degree of confidence (confidence level) of the respective class (class k) is determined as shown in algorithm 2)
selecting, from the first set of training data, a subset of training examples by selecting at least a top N training examples, with respect to score, from each of the predicted classes of the set of classes; (Guo Page 6 Paragraph 1; “the algorithm to determine sk exploits the class-wise confidence threshold effectively by ranking all the probabilities predicted as class k in descending order and setting sk such that exp (−sk) be equal to the predicted probability ranked at ρ×length(Ck), where length (Ck) is the number of unlabeled examples predicted as class k and ρ×100% denotes the percentage of selected confident pseudo-labels.” Guo Page 12 Paragraph 2; “This ensures pseudo-labels with the same confidence level within class can be selected for every class… Selecting the pseudo-labels by utilizing the adaptive thresholding gives the advantage of selecting examples that have relatively low confidence, but high within-class confidence” Examiner notes that a subset of training examples is selected, from the first set of training data (unlabeled examples), by selecting at least a top N training examples (selected examples with high within-class confidence), with respect to score (confidence level), from each of the predicted classes of the set of classes (selected for every class))
generating a second set of training data that includes the subset of training examples selected from the first set of training data, wherein each training example of the subset is labelled in the second set of training data as belonging to a respective class as predicted by the machine learning model; and (Guo Page 6 Paragraph 1; “where length (Ck) is the number of unlabeled examples predicted as class k and ρ×100% denotes the percentage of selected confident pseudo-labels.” Guo Page 6 Paragraph 3; “Selecting the pseudo-labels by utilizing the adaptive thresholding gives the advantage of selecting examples that have relatively low confidence, but high within-class confidence and thus help alleviate the bias problem of the original prediction under class-imbalanced distribution.” Examiner notes that a second set of training data (unlabeled data are given pseudo labels) that includes the subset of training examples selected from the first set of training data (unlabeled data) is generated, wherein each training example of the subset is labeled (pseudo label) in the second set of training data as belonging to a respective class as predicted by the machine learning model (shown in algorithm 2))
using the second set of training data, further training the machine learning model, thereby generating an updated machine learning model. (Guo Page 9 Paragraph 3; “We propose a novel Adsh approach that adaptively selects pseudo-labels to train models based on a class-dependent threshold.” Examiner notes that the second set of training data (pseudo labels) is used to further train the machine learning model, thereby generating an updated model)
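For context, the per-class selection of at least a top N training examples by score, as mapped above, can be illustrated with a short sketch. The data layout, names, and value of N below are hypothetical assumptions for explanation only; they are not features of Guo or of the claims.

```python
from collections import defaultdict

# Hypothetical sketch of selecting the top-N training examples per
# predicted class by confidence score (cf. the claimed selection step).
def select_top_n(examples, n):
    # examples: list of (example_id, predicted_class, score) tuples
    by_class = defaultdict(list)
    for ex_id, cls, score in examples:
        by_class[cls].append((score, ex_id))
    subset = []
    for cls, scored in by_class.items():
        # Rank within each class by score, descending, and keep the top n.
        top = sorted(scored, reverse=True)[:n]
        # Each selected example is pseudo-labelled with its predicted class.
        subset.extend((ex_id, cls) for _, ex_id in top)
    return subset
```

Ranking within each class separately, rather than globally, parallels Guo's class-wise thresholding, which selects examples with high within-class confidence even when their absolute confidence is relatively low.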
Regarding claim 3, Guo anticipates The method of claim 1, wherein determining the respective score comprises generating, as an output of the machine learning model when applying the given training example thereto, at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example. (Guo Page 6 Paragraph 1; “the algorithm to determine sk exploits the class-wise confidence threshold effectively by ranking all the probabilities predicted as class k in descending order and setting sk such that exp(−sk) be equal to the predicted probability ranked at ρ×length (Ck)” Examiner notes that respective score comprises generating, as an output of the machine learning model (model used as shown in algorithm 2) when applying the given training example (unlabeled data), a probability that the given training example is a member of a predicted class of the given training example (probabilities predicted as class k))
Regarding claim 12, Guo anticipates A non-transitory computer-readable medium, having stored thereon program instructions that, upon execution by a computing system, cause the computing system to perform operations comprising: (Guo Page 6 Paragraph 2; “The proportion ρ is computed from the majority class” Examiner notes that the word “computed” suggests a computing system having stored program instructions to perform the recited operations)
obtaining a first set of training data that includes a first plurality of training examples; (Guo Page 6 Paragraph 6; “We conduct experiments on long tailed variants of CIFAR-10… SVHN… and STL-10… datasets with various levels of class imbalance and different ratios of labeled data. These are all widely adopted datasets to evaluate SSL algorithms. For constructing the class-imbalanced training dataset, we use two parameters γl, γu to denote the imbalance ratio of labeled and unlabeled data” Examiner notes that a first set of training data that includes a first plurality of training examples (unlabeled data) is obtained)
obtaining a machine learning model that has been trained to select, from a set of classes, a predicted class for an input; (Guo Page 6 Paragraph 1; “the algorithm to determine sk exploits the class-wise confidence threshold effectively by ranking all the probabilities predicted as class k in descending order and setting sk such that exp(−sk) be equal to the predicted probability ranked at ρ×length (Ck),where length (Ck) is the number of unlabeled examples predicted as class k and ρ×100% denotes the percentage of selected confident pseudo-labels. Such a strategy takes the predicted probability ranked at ρ×100% separately from each class as a reference for thresholding.” Guo Page 7 Paragraph 3; “We train the model with batch size 64 for 218 iterations.” Examiner notes that a machine learning model (model) that has been trained to select, from a set of classes (pseudo-labels; each class), a predicted class for an input (number of unlabeled examples predicted as class k) is obtained through training)
applying each training example of the first set of training data to the machine learning model to (i) predict a respective class from the set of classes and (Examiner references the previous mapping to show that each training example of the first set of training data (unlabeled examples) is applied to the machine learning model to predict a respective class (class k) from the set of classes)
(ii) determine a respective score that is representative of a degree of confidence of the respective class; (Guo Page 6 Paragraph 1; “where length (Ck) is the number of unlabeled examples predicted as class k and ρ×100% denotes the percentage of selected confident pseudo-labels.” Examiner notes that a respective score that is representative of a degree of confidence (confidence level) of the respective class (class k) is determined as shown in algorithm 2)
selecting, from the first set of training data, a subset of training examples by selecting at least a top N training examples, with respect to score, from each of the predicted classes of the set of classes; (Guo Page 6 Paragraph 1; “the algorithm to determine sk exploits the class-wise confidence threshold effectively by ranking all the probabilities predicted as class k in descending order and setting sk such that exp (−sk) be equal to the predicted probability ranked at ρ×length(Ck), where length (Ck) is the number of unlabeled examples predicted as class k and ρ×100% denotes the percentage of selected confident pseudo-labels.” Guo Page 12 Paragraph 2; “This ensures pseudo-labels with the same confidence level within class can be selected for every class… Selecting the pseudo-labels by utilizing the adaptive thresholding gives the advantage of selecting examples that have relatively low confidence, but high within-class confidence” Examiner notes that a subset of training examples is selected, from the first set of training data (unlabeled examples), by selecting at least a top N training examples (selected examples with high within-class confidence), with respect to score (confidence level), from each of the predicted classes of the set of classes (selected for every class))
generating a second set of training data that includes the subset of training examples selected from the first set of training data, wherein each training example of the subset is labelled in the second set of training data as belonging to a respective class as predicted by the machine learning model; and (Guo Page 6 Paragraph 1; “where length (Ck) is the number of unlabeled examples predicted as class k and ρ×100% denotes the percentage of selected confident pseudo-labels.” Guo Page 6 Paragraph 3; “Selecting the pseudo-labels by utilizing the adaptive thresholding gives the advantage of selecting examples that have relatively low confidence, but high within-class confidence and thus help alleviate the bias problem of the original prediction under class-imbalanced distribution.” Examiner notes that a second set of training data (unlabeled data are given pseudo labels) that includes the subset of training examples selected from the first set of training data (unlabeled data) is generated, wherein each training example of the subset is labeled (pseudo label) in the second set of training data as belonging to a respective class as predicted by the machine learning model (shown in algorithm 2))
using the second set of training data, further training the machine learning model, thereby generating an updated machine learning model. (Guo Page 9 Paragraph 3; “We propose a novel Adsh approach that adaptively selects pseudo-labels to train models based on a class-dependent threshold.” Examiner notes that the second set of training data (pseudo labels) is used to further train the machine learning model, thereby generating an updated model)
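The per-class selection mapped above (Guo's Algorithm 2, as quoted from Page 6) can be illustrated with a short sketch. This is an illustrative approximation only: the function name, the use of the argmax probability as the confidence score, and the parameter rho (the fraction ρ of each predicted class retained) are assumptions for exposition, not code from the reference.

```python
import numpy as np

def select_confident_pseudo_labels(probs, rho=0.5):
    """Per-class confident pseudo-label selection, in the spirit of
    Guo's adaptive thresholding (Algorithm 2). `probs` is an (N, K)
    array of predicted class probabilities for N unlabeled examples;
    `rho` is the fraction of each predicted class to keep.
    Returns indices of the selected examples and their pseudo-labels."""
    preds = probs.argmax(axis=1)   # predicted class per unlabeled example
    conf = probs.max(axis=1)       # confidence of that prediction
    selected_idx, pseudo_labels = [], []
    for k in range(probs.shape[1]):
        idx_k = np.flatnonzero(preds == k)       # examples predicted as class k
        if idx_k.size == 0:
            continue
        order = idx_k[np.argsort(-conf[idx_k])]  # rank probabilities descending
        n_keep = int(np.ceil(rho * idx_k.size))  # top rho*100% within class k
        selected_idx.extend(order[:n_keep])
        pseudo_labels.extend([k] * n_keep)
    return np.array(selected_idx), np.array(pseudo_labels)
```

With rho = 0.5, the most confident half of each predicted class receives a pseudo-label, corresponding to the "percentage of selected confident pseudo-labels" language quoted above; selection is within-class, so even a low-probability class contributes examples.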
Regarding claim 13, Guo anticipates The non-transitory computer-readable medium of claim 12, wherein determining the respective score comprises generating, as an output of the machine learning model when applying the given training example thereto, at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example. (Guo Page 6 Paragraph 1; “the algorithm to determine sk exploits the class-wise confidence threshold effectively by ranking all the probabilities predicted as class k in descending order and setting sk such that exp(−sk) be equal to the predicted probability ranked at ρ×length (Ck)” Examiner notes that the respective score comprises generating, as an output of the machine learning model (model used as shown in algorithm 2) when applying the given training example (unlabeled data) thereto, a probability that the given training example is a member of a predicted class of the given training example (probabilities predicted as class k))
Regarding claim 17, Guo anticipates A system comprising: one or more processors; and memory, containing program instructions that, upon execution by the one or more processors, cause the system to perform operations comprising: (Guo Page 6 Paragraph 2; “The proportion ρ is computed from the majority class” Examiner notes that the word “computed” suggests a computing system comprising one or more processors and memory having stored instructions to perform the operations)
obtaining a first set of training data that includes a first plurality of training examples; (Guo Page 6 Paragraph 6; “We conduct experiments on long tailed variants of CIFAR-10… SVHN… and STL-10… datasets with various levels of class imbalance and different ratios of labeled data. These are all widely adopted datasets to evaluate SSL algorithms. For constructing the class-imbalanced training dataset, we use two parameters γl, γu to denote the imbalance ratio of labeled and unlabeled data” Examiner notes that a first set of training data that includes a first plurality of training examples (unlabeled data) is obtained)
obtaining a machine learning model that has been trained to select, from a set of classes, a predicted class for an input; (Guo Page 6 Paragraph 1; “the algorithm to determine sk exploits the class-wise confidence threshold effectively by ranking all the probabilities predicted as class k in descending order and setting sk such that exp(−sk) be equal to the predicted probability ranked at ρ×length (Ck),where length (Ck) is the number of unlabeled examples predicted as class k and ρ×100% denotes the percentage of selected confident pseudo-labels. Such a strategy takes the predicted probability ranked at ρ×100% separately from each class as a reference for thresholding.” Guo Page 7 Paragraph 3; “We train the model with batch size 64 for 218 iterations.” Examiner notes that a machine learning model (model) that has been trained to select, from a set of classes (pseudo-labels; each class) a predicted class for an input (number of unlabeled examples predicted as class K) is obtained through training)
applying each training example of the first set of training data to the machine learning model to (i) predict a respective class from the set of classes and (Examiners reference previous mapping to show that each training example of the first set of training data (unlabeled examples) is applied to the machine learning model to predict a respective class (class k) from the set of classes)
(ii) determine a respective score that is representative of a degree of confidence of the respective class; (Guo Page 6 Paragraph 1; “where length (Ck) is the number of unlabeled examples predicted as class k and ρ×100% denotes the percentage of selected confident pseudo-labels.” Examiner notes that a respective score that is representative of a degree of confidence (confidence level) of the respective class (class k) is determined as shown in algorithm 2)
selecting, from the first set of training data, a subset of training examples by selecting at least a top N training examples, with respect to score, from each of the predicted classes of the set of classes; (Guo Page 6 Paragraph 1; “the algorithm to determine sk exploits the class-wise confidence threshold effectively by ranking all the probabilities predicted as class k in descending order and setting sk such that exp (−sk) be equal to the predicted probability ranked at ρ×length(Ck), where length (Ck) is the number of unlabeled examples predicted as class k and ρ×100% denotes the percentage of selected confident pseudo-labels.” Guo Page 12 Paragraph 2; “This ensures pseudo-labels with the same confidence level within class can be selected for every class… Selecting the pseudo-labels by utilizing the adaptive thresholding gives the advantage of selecting examples that have relatively low confidence, but high within-class confidence” Examiner notes that selecting, from the first set of training data (unlabeled examples), a subset of training examples by selecting at least a top N training examples (selected examples with high within-class confidence), with respect to score (confidence level), from each of the predicted classes of the set of classes (selected for every class))
generating a second set of training data that includes the subset of training examples selected from the first set of training data, wherein each training example of the subset is labelled in the second set of training data as belonging to a respective class as predicted by the machine learning model; and (Guo Page 6 Paragraph 1; “where length (Ck) is the number of unlabeled examples predicted as class k and ρ×100% denotes the percentage of selected confident pseudo-labels.” Guo Page 6 Paragraph 3; “Selecting the pseudo-labels by utilizing the adaptive thresholding gives the advantage of selecting examples that have relatively low confidence, but high within-class confidence and thus help alleviate the bias problem of the original prediction under class-imbalanced distribution.” Examiner notes that a second set of training data (unlabeled data are given pseudo labels) that includes the subset of training examples selected from the first set of training data (unlabeled data) is generated, wherein each training example of the subset is labeled (pseudo label) in the second set of training data as belonging to a respective class as predicted by the machine learning model (shown in algorithm 2))
using the second set of training data, further training the machine learning model, thereby generating an updated machine learning model. (Guo Page 9 Paragraph 3; “We propose a novel Adsh approach that adaptively selects pseudo-labels to train models based on a class-dependent threshold.” Examiner notes that the second set of training data (pseudo labels) is used to further train the machine learning model, thereby generating an updated model)
Regarding claim 18, Guo anticipates The system of claim 12, wherein determining the respective score comprises generating, as an output of the machine learning model when applying the given training example thereto, at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example. (Guo Page 6 Paragraph 1; “the algorithm to determine sk exploits the class-wise confidence threshold effectively by ranking all the probabilities predicted as class k in descending order and setting sk such that exp(−sk) be equal to the predicted probability ranked at ρ×length (Ck)” Examiner notes that the respective score comprises generating, as an output of the machine learning model (model used as shown in algorithm 2) when applying the given training example (unlabeled data) thereto, a probability that the given training example is a member of a predicted class of the given training example (probabilities predicted as class k))
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 2, 8, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Lan-Zhe Guo et al.; “Class-Imbalanced Semi-Supervised Learning with Adaptive Thresholding,” published 2022 (hereinafter “Guo”) in view of Mu Qiao et al.; US 20240248920 A1, filed on Jan 19, 2023 (hereinafter “Qiao”).
Regarding claim 2, Guo does not teach The method of claim 1, wherein each training example in the first plurality of training examples represents a textual input, and wherein each class of the set of classes represents a respective different type of response that could be expressed in a textual input.
However, Qiao does teach The method of claim 1, wherein each training example in the first plurality of training examples represents a textual input, and wherein each class of the set of classes represents a respective different type of response that could be expressed in a textual input. (Qiao Paragraph 0053; “the first machine learning model 302 can comprise a first generative pre-trained transformer (GPT) model. GPT models are deep learning or neural network language models that are pre-trained on a large text corpus to generate text responses to input text prompts and are then adapted to perform specific tasks using additional training data specific to a specific task.” Qiao Paragraph 0048; “the in-context learning component 316 can utilize in-context learning, as described above in FIG. 1, to adapt the second GPT model to generate the second response portion. For example, the in-context learning component 316 can classify the input query into one or more emotion classes.” Examiner notes that each training example in the first plurality of training examples (large text corpus) represents a textual input (text prompts); each class of the set of classes represents a different type of response (emotions) that could be expressed in a textual input)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo and Qiao. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Qiao teaches using GPT to generate responses to inputs. One of ordinary skill would have had motivation to combine Guo and Qiao to leverage pretrained models to reduce training cost, computation cycles, and amount of training data: “In-context learning takes advantage of pre-trained models or previously trained models, such as GPT model 103, to reduce training cost, and thereby reduce the computation cycles and amount of training data called for to adapt a model to a specific task.” (Qiao Paragraph 0029).
Regarding claim 8, Guo does not teach The method of claim 1, wherein the machine learning model comprises a transformer.
However, Qiao does teach The method of claim 1, wherein the machine learning model comprises a transformer. (Qiao Paragraph 0028; “a machine learning model such as generative pre-trained transformer model (GPT) 103 can receive prompts 101 and input query 102.” Examiner notes that the machine learning model comprises a transformer (GPT))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo and Qiao. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Qiao teaches using GPT to generate responses to inputs. One of ordinary skill would have had motivation to combine Guo and Qiao to leverage pretrained models to reduce training cost, computation cycles, and amount of training data: “In-context learning takes advantage of pre-trained models or previously trained models, such as GPT model 103, to reduce training cost, and thereby reduce the computation cycles and amount of training data called for to adapt a model to a specific task.” (Qiao Paragraph 0029).
Regarding claim 16, Guo does not teach The non-transitory computer-readable medium of claim 12, wherein the machine learning model comprises a transformer.
However, Qiao does teach The non-transitory computer-readable medium of claim 12, wherein the machine learning model comprises a transformer. (Qiao Paragraph 0028; “a machine learning model such as generative pre-trained transformer model (GPT) 103 can receive prompts 101 and input query 102.” Examiner notes that the machine learning model comprises a transformer (GPT))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo and Qiao. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Qiao teaches using GPT to generate responses to inputs. One of ordinary skill would have had motivation to combine Guo and Qiao to leverage pretrained models to reduce training cost, computation cycles, and amount of training data: “In-context learning takes advantage of pre-trained models or previously trained models, such as GPT model 103, to reduce training cost, and thereby reduce the computation cycles and amount of training data called for to adapt a model to a specific task.” (Qiao Paragraph 0029).
Claim(s) 4-6, 14-15, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lan-Zhe Guo et al.; “Class-Imbalanced Semi-Supervised Learning with Adaptive Thresholding,” published 2022 (hereinafter “Guo”) in view of Adrian Tam; “A Gentle Introduction to Vector Space Models,” available online Jan 29, 2023 (hereinafter “Tam”), in further view of Georg Stemmer et al.; “Comparison and Combination of Confidence Measures,” available online Jan 01, 2002 (hereinafter “Stemmer”), and in further view of Busra Ozgode Yigin; “Effect of distance measures on confidences of t-SNE embeddings and its implications on clustering for scRNA-seq data,” available online April 21, 2023 (hereinafter “Yigin”).
Regarding claim 4, Guo teaches The method of claim 3, further comprising, prior to applying each training example of the first set of training data to the machine learning model: (i) obtaining a third set of training data that includes a third plurality of training examples, wherein each training example of the third set of training data is labelled as belonging to a respective class selected from the set of classes, (Guo Page 6 Paragraph 6; “We conduct experiments on long tailed variants of CIFAR-10… SVHN… and STL-10… datasets with various levels of class imbalance and different ratios of labeled data. These are all widely adopted datasets to evaluate SSL algorithms. For constructing the class-imbalanced training dataset, we use two parameters γl, γu to denote the imbalance ratio of labeled and unlabeled data” Examiner notes that a third set of training data (labeled data) that includes a third plurality of training examples, wherein each training example of the third set of training data is labelled as belonging to a respective class selected from the set of classes, is obtained)
and (ii) using the third set of training data, training the machine learning model to select, from the set of classes, a predicted class for an input, (Guo Page 7 Paragraph 3; “we adopt the Wide ResNet-28-2 (Zagoruyko & Komodakis, 2016) as the backbone since it is commonly adopted in various SSL methods (Oliver et al., 2018). We train the model with batch size 64 for 218 iterations.” Examiner notes that model is trained using the third set of training data to select from the set of classes, a predicted class for an input)
wherein generating the second set of training data comprises adding, to the first set of training data, the subset of training examples selected from the first set of training data, (Guo Page 6 Paragraph 1; “where length (Ck) is the number of unlabeled examples predicted as class k and ρ×100% denotes the percentage of selected confident pseudo-labels.” Guo Page 6 Paragraph 3; “Selecting the pseudo-labels by utilizing the adaptive thresholding gives the advantage of selecting examples that have relatively low confidence, but high within-class confidence and thus help alleviate the bias problem of the original prediction under class-imbalanced distribution.” Examiner notes that a second set of training data (unlabeled data are given pseudo labels) comprises adding the subset of training examples selected from the first set of training data (unlabeled data) is generated)
Guo does not teach wherein applying each training example of the first set of training data to the machine learning model additionally comprises generating a respective embedding vector that represents a respective location in an embedding space,
wherein the method yet further comprises: applying each training example of the third set of training data to the machine learning model to generate a respective embedding vector that represents a respective location in the embedding space; and
determining, for the given training example, at least one distance between an embedding vector output by the machine learning model when applying the given training example thereto and at least one embedding vector generated for a respective at least one training example of the third set of training data that is of a common class as the predicted class of the given training example, and
However, Tam does teach wherein applying each training example of the first set of training data to the machine learning model additionally comprises generating a respective embedding vector that represents a respective location in an embedding space, (Tam Section “Using vector space model for similarity” Paragraph 4; “To better illustrate the idea rather than hiding the actual manipulation in pandas or numpy functions, we first extract the data for each country as a vector:… The Python dictionary we created has the name of each country as a key and the economic metrics as a numpy array. There are 5 metrics, hence each is a vector of 5 dimensions.” Examiner notes that a respective embedding vector that represents a respective location in an embedding space (vector space) can be generated for each training example of the first set of training data)
wherein the method yet further comprises: applying each training example of the third set of training data to the machine learning model to generate a respective embedding vector that represents a respective location in the embedding space; and (Examiner references previous mapping to show that a respective embedding vector that represents a respective location in the embedding space is generated for the third set of training data)
determining, for the given training example, at least one distance between an embedding vector output by the machine learning model when applying the given training example thereto and at least one embedding vector generated for a respective at least one training example of the third set of training data that is of a common class as the predicted class of the given training example, and (Tam Section “Using vector space model for similarity” Paragraph 6; “we can use the vector representation of each country to see how similar it is to another. Let’s try both the L2-norm of the difference (the Euclidean distance) and the cosine distance.” Examiner notes that at least one distance (Euclidean/cosine distance) between an embedding vector output by the machine learning model and embedding vector generated for a respective at least one training example of the third set of training data that is of a common class as the predicted class of the given training example (vector representations of training example and third set of training data that is of a common class as the predicted class of the given training example))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo and Tam. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. One of ordinary skill would have had motivation to combine Guo and Tam to utilize a vector space and represent a data point as a vector for convenience: “It is useful to consider a vector space because it is useful to represent things as a vector. For example in machine learning, we usually have a data point with multiple features. Therefore, it is convenient for us to represent a data point as a vector.” (Tam Section “Vector Space and Cosine Formula” Paragraph 2).
Guo in view of Tam does not teach wherein determining, for the given training example of the first set of training data, a score that is representative of the degree of confidence of a class of the given training example comprises determining a weighted combination of (i) the at least one distance and (ii) the at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example.
However, Stemmer does teach wherein determining, for the given training example of the first set of training data, a score that is representative of the degree of confidence of a class of the given training example comprises determining a weighted combination of [(i) the at least one distance and ](ii) the at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example. (Stemmer Page 4 Paragraph 4; “As a word graph usually contains several instances wi of the word w which differ in ts(i) and te(i) the confidence measure can be improved by summing up P(wi,ts(i),te(i)| O) of all word hypotheses wi which overlap in the time domain.” Examiner notes that for the given training example of the first set of training data (word w), a score that is representative of the degree of confidence of a class of the given training example (confidence measure) comprises determining a weighted combination (each word hypotheses wi has a weight of 1) of at least a probability that the given training example is a member of a predicted class of the given training example (P(wi,ts(i),te(i)| O)))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo, Tam, and Stemmer. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. Stemmer teaches combining a number of features to obtain a confidence measure. One of ordinary skill would have had motivation to combine Guo, Tam, and Stemmer to utilize the combination of features for improved results and confidence: “The a-posteriori probability performs better than all other single features. The results indicate that feature combinations, like in GSCORE, improve results significantly. For WCPOS we were able to get a better confidence annotation by incorporating the part-of-speech labels of the neighboring words.” (Stemmer Page 5 Paragraph 3).
Guo in view of Tam in further view of Stemmer does not teach wherein determining, for the given training example of the first set of training data, a score that is representative of the degree of confidence of a class of the given training example comprises determining a weighted combination of (i) the at least one distance [and (ii) the at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example.]
However, Yigin does teach wherein determining, for the given training example of the first set of training data, a score that is representative of the degree of confidence of a class of the given training example comprises [determining a weighted combination of] [(i) the at least one distance and ](ii) the at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example. (Yigin Page 7 Paragraph 1; “For the estimation of the confidence, the distances between the neighbors in original and low dimensional spaces can be identified by different distance measures.” Examiner notes that degree of confidence comprises the at least one distance (distances between the neighbors))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo, Tam, Stemmer, and Yigin. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. Stemmer teaches combining a number of features to obtain a confidence measure. Yigin teaches effects of distance measures on confidences of t-SNE embeddings. One of ordinary skill would have had motivation to combine Guo, Tam, Stemmer, and Yigin to replace one of the features within the weighted combination with a distance measure that has a considerable effect on the precision of the confidence score, to improve clustering performance: “The experimental results show that distance measures have a considerable effect on the precision of confidence scores and clustering performance can be improved substantially if these confidence scores are incorporated before the clustering algorithm.” (Yigin Abstract).
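The weighted combination that the rejection attributes to the Guo/Tam/Stemmer/Yigin combination can be sketched as follows. The weights, the function name, and the distance-to-similarity conversion are hypothetical choices for illustration; none of the cited references prescribes these exact values.

```python
def combined_confidence(predicted_prob, embedding_distance, w_prob=0.5, w_dist=0.5):
    """Hypothetical weighted combination of (i) an embedding-space distance
    and (ii) the model's predicted class probability, as the claim recites.
    A smaller distance should raise confidence, so the distance is converted
    to a similarity term before weighting."""
    similarity = 1.0 / (1.0 + embedding_distance)  # maps distance [0, inf) -> (0, 1]
    return w_prob * predicted_prob + w_dist * similarity
```

Under this sketch, an example with high predicted probability but a large distance to same-class labeled embeddings receives a lower score than one that agrees on both features, which is the effect the claimed combination is directed to.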
Regarding claim 5, Guo does not teach The method of claim 4, wherein determining the at least one distance comprises determining at least one cosine similarity between the embedding vector output by the machine learning model when applying the given training example thereto and the at least one embedding vector generated for the respective at least one training example of the third set of training data that is of the common class as the predicted class of the given training example.
However, Yigin does teach The method of claim 4, wherein determining the at least one distance comprises determining at least one cosine similarity between the embedding vector output by the machine learning model when applying the given training example thereto and the at least one embedding vector generated for the respective at least one training example of the third set of training data that is of the common class as the predicted class of the given training example. (Yigin Page 7 Paragraph 1; “For the estimation of the confidence, the distances between the neighbors in original and low dimensional spaces can be identified by different distance measures…the most common distance measures namely Euclidean, cosine, … were used to extract features from the datasets from different domains for the prediction of confidence scores.” Examiner notes that the at least one distance comprises determining at least one cosine similarity between the embedding vector output and the one embedding vector of the third set of training data (distance between embedding vectors of the neighbors in dimensional space))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo, Tam, Stemmer, and Yigin. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. Stemmer teaches combining a number of features to obtain a confidence measure. Yigin teaches effects of distance measures on confidences of t-SNE embeddings. One of ordinary skill would have had motivation to combine Guo, Tam, Stemmer, and Yigin to replace one of the features within the weighted combination with a distance measure that has a considerable effect on the precision of the confidence score, to improve clustering performance: “The experimental results show that distance measures have a considerable effect on the precision of confidence scores and clustering performance can be improved substantially if these confidence scores are incorporated before the clustering algorithm.” (Yigin Abstract).
Regarding claim 6, Guo does not teach The method of claim 4, wherein determining the at least one distance comprises determining a distance between the embedding vector output by the machine learning model when applying the given training example thereto and an embedding vector generated for a training example of the third set of training data that is, of the training examples of the third set of training data that are the common class as the predicted class of the given training example, closest in the embedding space.
However, Yigin does teach The method of claim 4, wherein determining the at least one distance comprises determining a distance between the embedding vector output by the machine learning model when applying the given training example thereto and an embedding vector generated for a training example of the third set of training data that is, of the training examples of the third set of training data that are the common class as the predicted class of the given training example, closest in the embedding space. (Yigin Page 7 Paragraph 1; “For the estimation of the confidence, the distances between the neighbors in original and low dimensional spaces can be identified by different distance measures…the most common distance measures namely Euclidean, cosine, … were used to extract features from the datasets from different domains for the prediction of confidence scores.” Examiner notes that the at least one distance comprises determining a distance between the embedding vector output and the one embedding vector of the third set of training data (distance between embedding vectors of the neighbors in dimensional space) that is closest in the embedding space (neighbors includes closest embedding vectors of other neighbors))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo, Tam, Stemmer, and Yigin. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. Stemmer teaches combining a number of features to obtain a confidence measure. Yigin teaches effects of distance measures on confidences of t-SNE. One of ordinary skill would have been motivated to combine Guo, Tam, Stemmer, and Yigin to replace one of the features within the weighted combination with the distance measure that has a considerable effect on the precision of the confidence score, thereby improving clustering performance: “The experimental results show that distance measures have a considerable effect on the precision of confidence scores and clustering performance can be improved substantially if these confidence scores are incorporated before the clustering algorithm.” (Yigin Abstract).
Regarding claim 14, Guo teaches The non-transitory computer-readable medium of claim 13, wherein the operations further comprise, prior to applying each training example of the first set of training data to the machine learning model: (i) obtaining a third set of training data that includes a third plurality of training examples, wherein each training example of the third set of training data is labelled as belonging to a respective class selected from the set of classes, (Guo Page 6 Paragraph 6; “We conduct experiments on long tailed variants of CIFAR-10… SVHN… and STL-10… datasets with various levels of class imbalance and different ratios of labeled data. These are all widely adopted datasets to evaluate SSL algorithms. For constructing the class-imbalanced training dataset, we use two parameters γl, γu to denote the imbalance ratio of labeled and unlabeled data” Examiner notes that a third set of training data (labeled data) that includes a third plurality of training examples, wherein each training example of the third set of training data is labelled as belonging to a respective class selected from the set of classes, is obtained)
and (ii) using the third set of training data, training the machine learning model to select, from the set of classes, a predicted class for an input, (Guo Page 7 Paragraph 3; “we adopt the Wide ResNet-28-2 (Zagoruyko & Komodakis, 2016) as the backbone since it is commonly adopted in various SSL methods (Oliver et al., 2018). We train the model with batch size 64 for 218 iterations.” Examiner notes that the model is trained using the third set of training data to select, from the set of classes, a predicted class for an input)
wherein generating the second set of training data comprises adding, to the first set of training data, the subset of training examples selected from the first set of training data, (Guo Page 6 Paragraph 1; “where length (Ck) is the number of unlabeled examples predicted as class k and ρ×100% denotes the percentage of selected confident pseudo-labels.” Guo Page 6 Paragraph 3; “Selecting the pseudo-labels by utilizing the adaptive thresholding gives the advantage of selecting examples that have relatively low confidence, but high within-class confidence and thus help alleviate the bias problem of the original prediction under class-imbalanced distribution.” Examiner notes that a second set of training data (unlabeled data given pseudo-labels) is generated by adding the subset of training examples selected from the first set of training data (unlabeled data))
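As a non-limiting sketch of the per-class selection of confident pseudo-labels quoted from Guo above (the ρ×100% most confident examples within each predicted class), the following Python is illustrative only; the function name, values, and ranking details are hypothetical assumptions, not Guo's implementation.

```python
def select_confident_pseudo_labels(confidences, predicted_classes, rho):
    """Per predicted class, keep the top rho fraction of unlabeled examples
    ranked by within-class confidence (a sketch of adaptive per-class
    thresholding); returns the selected example indices."""
    selected = []
    for k in set(predicted_classes):
        idx = [i for i, c in enumerate(predicted_classes) if c == k]
        idx.sort(key=lambda i: confidences[i], reverse=True)
        keep = max(1, int(rho * len(idx)))  # keep rho*100% of class k, at least one
        selected.extend(idx[:keep])
    return sorted(selected)

# Hypothetical confidences and predicted classes for four unlabeled examples.
picked = select_confident_pseudo_labels(
    confidences=[0.9, 0.2, 0.8, 0.4], predicted_classes=["a", "a", "b", "b"], rho=0.5)
```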
Guo does not teach wherein applying each training example of the first set of training data to the machine learning model additionally comprises generating a respective embedding vector that represents a respective location in an embedding space,
wherein the method yet further comprises: applying each training example of the third set of training data to the machine learning model to generate a respective embedding vector that represents a respective location in the embedding space; and
determining, for the given training example, at least one distance between an
embedding vector output by the machine learning model when applying the given training example thereto and at least one embedding vector generated for a respective at least one training example of the third set of training data that is of a common class as the predicted class of the given training example, and
However, Tam does teach wherein applying each training example of the first set of training data to the machine learning model additionally comprises generating a respective embedding vector that represents a respective location in an embedding space, (Tam Section “Using vector space model for similarity” Paragraph 4; “To better illustrate the idea rather than hiding the actual manipulation in pandas or numpy functions, we first extract the data for each country as a vector:… The Python dictionary we created has the name of each country as a key and the economic metrics as a numpy array. There are 5 metrics, hence each is a vector of 5 dimensions.” Examiner notes that a respective embedding vector that represents a respective location in an embedding space (vector space) can be generated for each training example of the first set of training data)
wherein the method yet further comprises: applying each training example of the third set of training data to the machine learning model to generate a respective embedding vector that represents a respective location in the embedding space; and (Examiner references previous mapping to show that a respective embedding vector that represents a respective location in the embedding space is generated for the third set of training data)
determining, for the given training example, at least one distance between an
embedding vector output by the machine learning model when applying the given training example thereto and at least one embedding vector generated for a respective at least one training example of the third set of training data that is of a common class as the predicted class of the given training example, and (Tam Section “Using vector space model for similarity” Paragraph 6; “we can use the vector representation of each country to see how similar it is to another. Let’s try both the L2-norm of the difference (the Euclidean distance) and the cosine distance.” Examiner notes that at least one distance (Euclidean/cosine distance) is determined between an embedding vector output by the machine learning model and at least one embedding vector generated for a respective at least one training example of the third set of training data that is of a common class as the predicted class of the given training example (the vector representations of the given training example and of the third-set training example of the common class))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo and Tam. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. One of ordinary skill would have been motivated to combine Guo and Tam to utilize a vector space and represent a data point as a vector for convenience: “It is useful to consider a vector space because it is useful to represent things as a vector. For example in machine learning, we usually have a data point with multiple features. Therefore, it is convenient for us to represent a data point as a vector.” (Tam Section “Vector Space and Cosine Formula” Paragraph 2).
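For illustration of the Euclidean and cosine distances referenced from Tam above, a minimal Python sketch follows; the vectors are hypothetical and not taken from any cited reference.

```python
import numpy as np

# Two hypothetical embedding vectors (b is a scalar multiple of a).
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

# L2 norm of the difference (the Euclidean distance).
euclidean = float(np.linalg.norm(a - b))

# Cosine distance: 1 minus the cosine of the angle between the vectors.
cosine_dist = 1.0 - float(a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because b points in the same direction as a, the cosine distance is (numerically) zero even though the Euclidean distance is not, which is why the two measures can rank neighbors differently.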
Guo in view of Tam does not teach wherein determining, for the given training example of the first set of training data, a score that is representative of the degree of confidence of a class of the given training example comprises determining a weighted combination of (i) the at least one distance and (ii) the at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example.
However, Stemmer does teach wherein determining, for the given training example of the first set of training data, a score that is representative of the degree of confidence of a class of the given training example comprises determining a weighted combination of [(i) the at least one distance and ](ii) the at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example. (Stemmer Page 4 Paragraph 4; “As a word graph usually contains several instances wi of the word w which differ in ts(i) and te(i) the confidence measure can be improved by summing up P(wi,ts(i),te(i)| O) of all word hypotheses wi which overlap in the time domain.” Examiner notes that for the given training example of the first set of training data (word w), a score that is representative of the degree of confidence of a class of the given training example (confidence measure) comprises determining a weighted combination (each word hypothesis wi has a weight of 1) of at least a probability that the given training example is a member of a predicted class of the given training example (P(wi,ts(i),te(i)| O)))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo, Tam, and Stemmer. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. Stemmer teaches combining a number of features to obtain a confidence measure. One of ordinary skill would have been motivated to combine Guo, Tam, and Stemmer to utilize the combination of features for improved results and confidence: “The a-posteriori probability performs better than all other single features. The results indicate that feature combinations, like in GSCORE, improve results significantly. For WCPOS we were able to get a better confidence annotation by incorporating the part-of-speech labels of the neighboring words.” (Stemmer Page 5 Paragraph 3).
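A minimal, hypothetical sketch of combining a distance feature and a class probability into a single confidence score via a weighted combination follows; the equal weights and the distance-to-closeness mapping are illustrative assumptions, not drawn from Stemmer or the claims.

```python
def confidence_score(distance, probability, w_dist=0.5, w_prob=0.5):
    """Weighted combination of (i) a distance-derived feature and (ii) a
    class-membership probability. Smaller distance means higher confidence,
    so the distance is first mapped into (0, 1]."""
    closeness = 1.0 / (1.0 + distance)
    return w_dist * closeness + w_prob * probability

# Hypothetical example: small distance to the nearest same-class embedding
# and a high class probability yield a high confidence score.
score = confidence_score(distance=0.25, probability=0.9)
```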
Guo in view of Tam in further view of Stemmer does not teach wherein determining, for the given training example of the first set of training data, a score that is representative of the degree of confidence of a class of the given training example comprises determining a weighted combination of (i) the at least one distance [and (ii) the at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example.]
However, Yigin does teach wherein determining, for the given training example of the first set of training data, a score that is representative of the degree of confidence of a class of the given training example comprises [determining a weighted combination of] [(i) the at least one distance and ](ii) the at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example. (Yigin Page 7 Paragraph 1; “For the estimation of the confidence, the distances between the neighbors in original and low dimensional spaces can be identified by different distance measures.” Examiner notes that the degree of confidence comprises the at least one distance (distances between the neighbors))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo, Tam, Stemmer, and Yigin. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. Stemmer teaches combining a number of features to obtain a confidence measure. Yigin teaches effects of distance measures on confidences of t-SNE. One of ordinary skill would have been motivated to combine Guo, Tam, Stemmer, and Yigin to replace one of the features within the weighted combination with the distance measure that has a considerable effect on the precision of the confidence score, thereby improving clustering performance: “The experimental results show that distance measures have a considerable effect on the precision of confidence scores and clustering performance can be improved substantially if these confidence scores are incorporated before the clustering algorithm.” (Yigin Abstract).
Regarding claim 15, Guo does not teach The non-transitory computer-readable medium of claim 14, wherein determining the at least one distance comprises determining at least one cosine similarity between the embedding vector output by the machine learning model when applying the given training example thereto and the at least one embedding vector generated for the respective at least one training example of the third set of training data that is of the common class as the predicted class of the given training example.
However, Yigin does teach The non-transitory computer-readable medium of claim 14, wherein determining the at least one distance comprises determining at least one cosine similarity between the embedding vector output by the machine learning model when applying the given training example thereto and the at least one embedding vector generated for the respective at least one training example of the third set of training data that is of the common class as the predicted class of the given training example. (Yigin Page 7 Paragraph 1; “For the estimation of the confidence, the distances between the neighbors in original and low dimensional spaces can be identified by different distance measures…the most common distance measures namely Euclidean, cosine, … were used to extract features from the datasets from different domains for the prediction of confidence scores.” Examiner notes that the at least one distance comprises determining at least one cosine similarity between the embedding vector output and the one embedding vector of the third set of training data (distance between embedding vectors of the neighbors in dimensional space))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo, Tam, Stemmer, and Yigin. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. Stemmer teaches combining a number of features to obtain a confidence measure. Yigin teaches effects of distance measures on confidences of t-SNE. One of ordinary skill would have been motivated to combine Guo, Tam, Stemmer, and Yigin to replace one of the features within the weighted combination with the distance measure that has a considerable effect on the precision of the confidence score, thereby improving clustering performance: “The experimental results show that distance measures have a considerable effect on the precision of confidence scores and clustering performance can be improved substantially if these confidence scores are incorporated before the clustering algorithm.” (Yigin Abstract).
Regarding claim 19, Guo teaches The system of claim 18, further comprising, prior to applying each training example of the first set of training data to the machine learning model: (i) obtaining a third set of training data that includes a third plurality of training examples, wherein each training example of the third set of training data is labelled as belonging to a respective class selected from the set of classes, (Guo Page 6 Paragraph 6; “We conduct experiments on long tailed variants of CIFAR-10… SVHN… and STL-10… datasets with various levels of class imbalance and different ratios of labeled data. These are all widely adopted datasets to evaluate SSL algorithms. For constructing the class-imbalanced training dataset, we use two parameters γl, γu to denote the imbalance ratio of labeled and unlabeled data” Examiner notes that a third set of training data (labeled data) that includes a third plurality of training examples, wherein each training example of the third set of training data is labelled as belonging to a respective class selected from the set of classes, is obtained)
and (ii) using the third set of training data, training the machine learning model to select, from the set of classes, a predicted class for an input, (Guo Page 7 Paragraph 3; “we adopt the Wide ResNet-28-2 (Zagoruyko & Komodakis, 2016) as the backbone since it is commonly adopted in various SSL methods (Oliver et al., 2018). We train the model with batch size 64 for 218 iterations.” Examiner notes that the model is trained using the third set of training data to select, from the set of classes, a predicted class for an input)
wherein generating the second set of training data comprises adding, to the first set of training data, the subset of training examples selected from the first set of training data, (Guo Page 6 Paragraph 1; “where length (Ck) is the number of unlabeled examples predicted as class k and ρ×100% denotes the percentage of selected confident pseudo-labels.” Guo Page 6 Paragraph 3; “Selecting the pseudo-labels by utilizing the adaptive thresholding gives the advantage of selecting examples that have relatively low confidence, but high within-class confidence and thus help alleviate the bias problem of the original prediction under class-imbalanced distribution.” Examiner notes that a second set of training data (unlabeled data given pseudo-labels) is generated by adding the subset of training examples selected from the first set of training data (unlabeled data))
Guo does not teach wherein applying each training example of the first set of training data to the machine learning model additionally comprises generating a respective embedding vector that represents a respective location in an embedding space,
wherein the method yet further comprises: applying each training example of the third set of training data to the machine learning model to generate a respective embedding vector that represents a respective location in the embedding space; and
determining, for the given training example, at least one distance between an
embedding vector output by the machine learning model when applying the given training example thereto and at least one embedding vector generated for a respective at least one training example of the third set of training data that is of a common class as the predicted class of the given training example, and
However, Tam does teach wherein applying each training example of the first set of training data to the machine learning model additionally comprises generating a respective embedding vector that represents a respective location in an embedding space, (Tam Section “Using vector space model for similarity” Paragraph 4; “To better illustrate the idea rather than hiding the actual manipulation in pandas or numpy functions, we first extract the data for each country as a vector:… The Python dictionary we created has the name of each country as a key and the economic metrics as a numpy array. There are 5 metrics, hence each is a vector of 5 dimensions.” Examiner notes that a respective embedding vector that represents a respective location in an embedding space (vector space) can be generated for each training example of the first set of training data)
wherein the method yet further comprises: applying each training example of the third set of training data to the machine learning model to generate a respective embedding vector that represents a respective location in the embedding space; and (Examiner references previous mapping to show that a respective embedding vector that represents a respective location in the embedding space is generated for the third set of training data)
determining, for the given training example, at least one distance between an
embedding vector output by the machine learning model when applying the given training example thereto and at least one embedding vector generated for a respective at least one training example of the third set of training data that is of a common class as the predicted class of the given training example, and (Tam Section “Using vector space model for similarity” Paragraph 6; “we can use the vector representation of each country to see how similar it is to another. Let’s try both the L2-norm of the difference (the Euclidean distance) and the cosine distance.” Examiner notes that at least one distance (Euclidean/cosine distance) is determined between an embedding vector output by the machine learning model and at least one embedding vector generated for a respective at least one training example of the third set of training data that is of a common class as the predicted class of the given training example (the vector representations of the given training example and of the third-set training example of the common class))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo and Tam. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. One of ordinary skill would have been motivated to combine Guo and Tam to utilize a vector space and represent a data point as a vector for convenience: “It is useful to consider a vector space because it is useful to represent things as a vector. For example in machine learning, we usually have a data point with multiple features. Therefore, it is convenient for us to represent a data point as a vector.” (Tam Section “Vector Space and Cosine Formula” Paragraph 2).
Guo in view of Tam does not teach wherein determining, for the given training example of the first set of training data, a score that is representative of the degree of confidence of a class of the given training example comprises determining a weighted combination of (i) the at least one distance and (ii) the at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example.
However, Stemmer does teach wherein determining, for the given training example of the first set of training data, a score that is representative of the degree of confidence of a class of the given training example comprises determining a weighted combination of [(i) the at least one distance and ](ii) the at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example. (Stemmer Page 4 Paragraph 4; “As a word graph usually contains several instances wi of the word w which differ in ts(i) and te(i) the confidence measure can be improved by summing up P(wi,ts(i),te(i)| O) of all word hypotheses wi which overlap in the time domain.” Examiner notes that for the given training example of the first set of training data (word w), a score that is representative of the degree of confidence of a class of the given training example (confidence measure) comprises determining a weighted combination (each word hypothesis wi has a weight of 1) of at least a probability that the given training example is a member of a predicted class of the given training example (P(wi,ts(i),te(i)| O)))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo, Tam, and Stemmer. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. Stemmer teaches combining a number of features to obtain a confidence measure. One of ordinary skill would have been motivated to combine Guo, Tam, and Stemmer to utilize the combination of features for improved results and confidence: “The a-posteriori probability performs better than all other single features. The results indicate that feature combinations, like in GSCORE, improve results significantly. For WCPOS we were able to get a better confidence annotation by incorporating the part-of-speech labels of the neighboring words.” (Stemmer Page 5 Paragraph 3).
Guo in view of Tam in further view of Stemmer does not teach wherein determining, for the given training example of the first set of training data, a score that is representative of the degree of confidence of a class of the given training example comprises determining a weighted combination of (i) the at least one distance [and (ii) the at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example.]
However, Yigin does teach wherein determining, for the given training example of the first set of training data, a score that is representative of the degree of confidence of a class of the given training example comprises [determining a weighted combination of] [(i) the at least one distance and ](ii) the at least one of a likelihood or a probability that the given training example is a member of a predicted class of the given training example. (Yigin Page 7 Paragraph 1; “For the estimation of the confidence, the distances between the neighbors in original and low dimensional spaces can be identified by different distance measures.” Examiner notes that the degree of confidence comprises the at least one distance (distances between the neighbors))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo, Tam, Stemmer, and Yigin. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. Stemmer teaches combining a number of features to obtain a confidence measure. Yigin teaches effects of distance measures on confidences of t-SNE. One of ordinary skill would have been motivated to combine Guo, Tam, Stemmer, and Yigin to replace one of the features within the weighted combination with the distance measure that has a considerable effect on the precision of the confidence score, thereby improving clustering performance: “The experimental results show that distance measures have a considerable effect on the precision of confidence scores and clustering performance can be improved substantially if these confidence scores are incorporated before the clustering algorithm.” (Yigin Abstract).
Regarding claim 20, Guo does not teach The system of claim 19, wherein determining the at least one distance comprises determining at least one cosine similarity between the embedding vector output by the machine learning model when applying the given training example thereto and the at least one embedding vector generated for the respective at least one training example of the third set of training data that is of the common class as the predicted class of the given training example.
However, Yigin does teach The system of claim 19, wherein determining the at least one distance comprises determining at least one cosine similarity between the embedding vector output by the machine learning model when applying the given training example thereto and the at least one embedding vector generated for the respective at least one training example of the third set of training data that is of the common class as the predicted class of the given training example. (Yigin Page 7 Paragraph 1; “For the estimation of the confidence, the distances between the neighbors in original and low dimensional spaces can be identified by different distance measures…the most common distance measures namely Euclidean, cosine, … were used to extract features from the datasets from different domains for the prediction of confidence scores.” Examiner notes that the at least one distance comprises determining at least one cosine similarity between the embedding vector output and the one embedding vector of the third set of training data (distance between embedding vectors of the neighbors in dimensional space))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo, Tam, Stemmer, and Yigin. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. Stemmer teaches combining a number of features to obtain a confidence measure. Yigin teaches effects of distance measures on confidences of t-SNE. One of ordinary skill would have been motivated to combine Guo, Tam, Stemmer, and Yigin to replace one of the features within the weighted combination with the distance measure that has a considerable effect on the precision of the confidence score, thereby improving clustering performance: “The experimental results show that distance measures have a considerable effect on the precision of confidence scores and clustering performance can be improved substantially if these confidence scores are incorporated before the clustering algorithm.” (Yigin Abstract).
Claim(s) 7 is rejected under 35 U.S.C. 103 as being unpatentable over Lan-Zhe Guo et al; “Class-Imbalanced Semi-Supervised Learning with Adaptive Thresholding” published 2022 (hereinafter “Guo”) in further view of Adrian Tam; “A Gentle Introduction to Vector Space Models” available online Jan 29, 2023 (hereinafter “Tam”) in further view of Georg Stemmer et al; “Comparison and Combination of Confidence Measures” available online Jan 01 2002 (hereinafter “Stemmer”) in further view of Busra Ozgode Yigin; “Effect of distance measures on confidences of t-SNE embeddings and its implications on clustering for scRNA-seq data” available online April 21, 2023 (hereinafter “Yigin”) in further view of James Fowe; US 20200208992 A1 filed on Jan 2, 2019 (hereinafter “Fowe”).
Regarding claim 7, Guo does not teach The method of claim 4, wherein determining the weighted combination comprises determining a combination that is weighted between 0.6 and 0.85 toward the at least one distance.
However, Fowe does teach The method of claim 4, wherein determining the weighted combination comprises determining a combination that is weighted between 0.6 and 0.85 toward the at least one distance. (Fowe Paragraph 0078; “The higher the confidence value, the higher the likelihood of its accuracy and consequently trust. If for example, the P.sub.correct.sup.h is 0.8, the P.sub.wrong.sup.h is 0.1, the P.sub.correct.sup.d is 0.6, and the P.sub.wrong.sup.d is 0.3, then the h weight would be 0.85 and the d_weight would be 0.65.” Examiner notes that the weighted combination comprises a combination that is weighted between 0.6 and 0.85 (the h weight and d weight contain weights within 0.6 and 0.85); the weights can be adjusted toward the at least one distance)
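As a purely illustrative sketch of a combination weighted toward the distance feature within the claimed 0.6 to 0.85 range, the following Python is provided; the specific weight of 0.75 and the feature values are hypothetical choices, not taken from Fowe or the claims.

```python
def weighted_confidence(distance_feature, probability_feature, distance_weight=0.75):
    """Weighted combination of a distance feature and a probability feature,
    with the weight on the distance feature in the 0.6-0.85 range."""
    assert 0.6 <= distance_weight <= 0.85
    return distance_weight * distance_feature + (1.0 - distance_weight) * probability_feature

# Hypothetical feature values; the combination leans toward the distance feature.
score = weighted_confidence(0.8, 0.4)
```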
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo, Tam, Stemmer, Yigin, and Fowe. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. Stemmer teaches combining a number of features to obtain a confidence measure. Yigin teaches effects of distance measures on confidences of t-SNE. Fowe teaches using weights to calculate a confidence value. One of ordinary skill would have been motivated to combine Guo, Tam, Stemmer, Yigin, and Fowe to apply weights to improve the real time performance of the system: “The probability weights are used to improve the real time performance of the point map matcher.” (Fowe Abstract).
Claim(s) 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Lan-Zhe Guo et al; “Class-Imbalanced Semi-Supervised Learning with Adaptive Thresholding” published 2022 (hereinafter “Guo”) in further view of Adrian Tam; “A Gentle Introduction to Vector Space Models” available online Jan 29, 2023 (hereinafter “Tam”) in further view of Busra Ozgode Yigin; “Effect of distance measures on confidences of t-SNE embeddings and its implications on clustering for scRNA-seq data” available online April 21, 2023 (hereinafter “Yigin”).
Regarding claim 9, Guo teaches The method of claim 1, further comprising, prior to applying each training example of the first set of training data to the machine learning model: (i) obtaining a third set of training data that includes a third plurality of training examples, wherein each training example of the third set of training data is labelled as belonging to a respective class selected from the set of classes, (Guo Page 6 Paragraph 6; “We conduct experiments on long tailed variants of CIFAR-10… SVHN… and STL-10… datasets with various levels of class imbalance and different ratios of labeled data. These are all widely adopted datasets to evaluate SSL algorithms. For constructing the class-imbalanced training dataset, we use two parameters γl, γu to denote the imbalance ratio of labeled and unlabeled data” Examiner notes that a third set of training data (labeled data) that includes a third plurality of training examples, wherein each training example of the third set of training data is labelled as belonging to a respective class selected from the set of classes, is obtained)
and (ii) using the third set of training data, training the machine learning model to select, from the set of classes, a predicted class for an input, (Guo Page 7 Paragraph 3; “we adopt the Wide ResNet-28-2 (Zagoruyko & Komodakis, 2016) as the backbone since it is commonly adopted in various SSL methods (Oliver et al., 2018). We train the model with batch size 64 for 2^18 iterations.” Examiner notes that the model is trained using the third set of training data to select, from the set of classes, a predicted class for an input)
wherein generating the second set of training data comprises adding, to the first set of training data, the subset of training examples selected from the first set of training data, (Guo Page 6 Paragraph 1; “where length (Ck) is the number of unlabeled examples predicted as class k and ρ×100% denotes the percentage of selected confident pseudo-labels.” Guo Page 6 Paragraph 3; “Selecting the pseudo-labels by utilizing the adaptive thresholding gives the advantage of selecting examples that have relatively low confidence, but high within-class confidence and thus help alleviate the bias problem of the original prediction under class-imbalanced distribution.” Examiner notes that the second set of training data is generated by adding, to the first set of training data (the unlabeled data), the subset of training examples selected from it (the unlabeled examples given confident pseudo-labels))
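The per-class selection Guo describes (keeping the ρ×100% most confident pseudo-labels within each predicted class, so the effective threshold adapts to each class rather than being global) can be sketched as follows. This is an illustrative reconstruction, not Guo's released code; the function and variable names are assumptions:

```python
from collections import defaultdict

def select_pseudo_labels(preds, rho):
    """preds: list of (example_id, predicted_class, confidence) for the
    unlabeled (first) set; rho: fraction of pseudo-labels to keep per class."""
    by_class = defaultdict(list)
    for ex_id, cls, conf in preds:
        by_class[cls].append((conf, ex_id))
    selected = []
    for cls, items in by_class.items():
        items.sort(reverse=True)  # most confident examples first
        keep = max(1, int(rho * len(items)))  # top rho*100% within this class
        selected.extend((ex_id, cls) for _, ex_id in items[:keep])
    return selected
```

Because the cutoff is computed within each class, an example with modest absolute confidence but high within-class confidence can still be selected, which is the behavior Guo credits with alleviating class-imbalance bias.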
Guo does not teach wherein applying each training example of the first set of training data to the machine learning model additionally comprises generating a respective embedding vector that represents a respective location in an embedding space,
wherein the method yet further comprises: applying each training example of the third set of training data to the machine learning model to generate a respective embedding vector that represents a respective location in the embedding space; and
determining, for the given training example, at least one distance between an
embedding vector output by the machine learning model when applying the given training example thereto and at least one embedding vector generated for a respective at least one training example of the third set of training data that is of a common class as the predicted class of the given training example, and
However, Tam does teach wherein applying each training example of the first set of training data to the machine learning model additionally comprises generating a respective embedding vector that represents a respective location in an embedding space, (Tam Section “Using vector space model for similarity” Paragraph 4; “To better illustrate the idea rather than hiding the actual manipulation in pandas or numpy functions, we first extract the data for each country as a vector:… The Python dictionary we created has the name of each country as a key and the economic metrics as a numpy array. There are 5 metrics, hence each is a vector of 5 dimensions.” Examiner notes that a respective embedding vector that represents a respective location in an embedding space (vector space) can be generated for each training example of the first set of training data)
wherein the method yet further comprises: applying each training example of the third set of training data to the machine learning model to generate a respective embedding vector that represents a respective location in the embedding space; and (Examiner references previous mapping to show that a respective embedding vector that represents a respective location in the embedding space is generated for the third set of training data)
determining, for the given training example, at least one distance between an
embedding vector output by the machine learning model when applying the given training example thereto and at least one embedding vector generated for a respective at least one training example of the third set of training data that is of a common class as the predicted class of the given training example, and (Tam Section “Using vector space model for similarity” Paragraph 6; “we can use the vector representation of each country to see how similar it is to another. Let’s try both the L2-norm of the difference (the Euclidean distance) and the cosine distance.” Examiner notes that at least one distance (Euclidean/cosine distance) between an embedding vector output by the machine learning model and embedding vector generated for a respective at least one training example of the third set of training data that is of a common class as the predicted class of the given training example (vector representations of training example and third set of training data that is of a common class as the predicted class of the given training example))
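The two distance measures Tam applies to the country vectors can be sketched in plain Python (standard library only; the example vectors below are arbitrary illustrative values, not Tam's data):

```python
import math

def euclidean_distance(u, v):
    # L2-norm of the difference between the two vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    # 1 minus cosine similarity; 0 means the vectors point the same way
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

a = [1.0, 0.0]
b = [0.0, 1.0]
# euclidean_distance(a, b) -> sqrt(2); cosine_distance(a, b) -> 1.0
```

Either measure yields "at least one distance" between an embedding vector for a given training example and an embedding vector generated for a labeled training example.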
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo and Tam. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. One of ordinary skill would have been motivated to combine Guo and Tam to utilize a vector space and represent a data point as a vector for convenience: “It is useful to consider a vector space because it is useful to represent things as a vector. For example in machine learning, we usually have a data point with multiple features. Therefore, it is convenient for us to represent a data point as a vector.” (Tam Section “Vector Space and Cosine Formula” Paragraph 2).
Guo in view of Tam does not teach wherein determining a score that is representative of the degree of confidence of a class of the given training example comprises determining the score based on the at least one distance.
However, Yigin does teach wherein determining a score that is representative of the degree of confidence of a class of the given training example comprises determining the score based on the at least one distance. (Yigin Page 7 Paragraph 1; “For the estimation of the confidence, the distances between the neighbors in original and low dimensional spaces can be identified by different distance measures.” Examiner notes that the score representative of the degree of confidence is determined based on the at least one distance (distances between the neighbors))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo, Tam, and Yigin. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. Yigin teaches effects of distance measures on confidences of t-SNE. One of ordinary skill would have been motivated to combine Guo, Tam, and Yigin to use a distance measure that has a considerable effect on the precision of the confidence score to improve clustering performance: “The experimental results show that distance measures have a considerable effect on the precision of confidence scores and clustering performance can be improved substantially if these confidence scores are incorporated before the clustering algorithm.” (Yigin Abstract).
Regarding claim 10, Guo does not teach The method of claim 9, wherein determining the at least one distance comprises determining at least one cosine similarity between the embedding vector output by the machine learning model when applying the given training example thereto and the at least one embedding vector generated for the respective at least one training example of the third set of training data that is of the common class as the predicted class of the given training example.
However, Yigin does teach The method of claim 9, wherein determining the at least one distance comprises determining at least one cosine similarity between the embedding vector output by the machine learning model when applying the given training example thereto and the at least one embedding vector generated for the respective at least one training example of the third set of training data that is of the common class as the predicted class of the given training example. (Yigin Page 7 Paragraph 1; “For the estimation of the confidence, the distances between the neighbors in original and low dimensional spaces can be identified by different distance measures…the most common distance measures namely Euclidean, cosine, … were used to extract features from the datasets from different domains for the prediction of confidence scores.” Examiner notes that the at least one distance comprises determining at least one cosine similarity between the embedding vector output and the one embedding vector of the third set of training data (distance between embedding vectors of the neighbors in dimensional space))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo, Tam, and Yigin. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. Yigin teaches effects of distance measures on confidences of t-SNE. One of ordinary skill would have been motivated to combine Guo, Tam, and Yigin to use a distance measure that has a considerable effect on the precision of the confidence score to improve clustering performance: “The experimental results show that distance measures have a considerable effect on the precision of confidence scores and clustering performance can be improved substantially if these confidence scores are incorporated before the clustering algorithm.” (Yigin Abstract).
Regarding claim 11, Guo does not teach The method of claim 9, wherein determining the at least one distance comprises determining a distance between the embedding vector output by the machine learning model when applying the given training example thereto and an embedding vector generated for a training example of the third set of training data that is, of the training examples of the third set of training data that are the common class as the predicted class of the given training example, closest in the embedding space.
However, Yigin does teach The method of claim 9, wherein determining the at least one distance comprises determining a distance between the embedding vector output by the machine learning model when applying the given training example thereto and an embedding vector generated for a training example of the third set of training data that is, of the training examples of the third set of training data that are the common class as the predicted class of the given training example, closest in the embedding space. (Yigin Page 7 Paragraph 1; “For the estimation of the confidence, the distances between the neighbors in original and low dimensional spaces can be identified by different distance measures…the most common distance measures namely Euclidean, cosine, … were used to extract features from the datasets from different domains for the prediction of confidence scores.” Examiner notes that determining the at least one distance comprises determining a distance between the embedding vector output and an embedding vector of the third set of training data (distance between embedding vectors of the neighbors in the dimensional space) that is closest in the embedding space (the neighbors include the closest embedding vectors in the embedding space))
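For the "closest in the embedding space" notion in claim 11, a minimal sketch follows (the function names are assumptions for illustration, and cosine distance is chosen from the measures Yigin lists, which also include Euclidean):

```python
import math

def cosine_distance(u, v):
    # 1 minus cosine similarity between two vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def distance_to_closest_same_class(query_vec, predicted_class, labeled_embeddings):
    """labeled_embeddings: iterable of (vector, class_label) pairs generated
    from the labeled (third) training set. Returns the distance to the
    nearest labeled embedding whose class matches the predicted class."""
    same_class = [vec for vec, cls in labeled_embeddings if cls == predicted_class]
    return min(cosine_distance(query_vec, vec) for vec in same_class)
```

Restricting the minimum to embeddings of the predicted class is what ties the distance to "the common class as the predicted class of the given training example."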
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Guo, Tam, and Yigin. Guo teaches a framework that involves adaptive thresholding for different classes in SSL algorithms. Tam teaches using vectors to consider the relationship between data. Yigin teaches effects of distance measures on confidences of t-SNE. One of ordinary skill would have been motivated to combine Guo, Tam, and Yigin to use a distance measure that has a considerable effect on the precision of the confidence score to improve clustering performance: “The experimental results show that distance measures have a considerable effect on the precision of confidence scores and clustering performance can be improved substantially if these confidence scores are incorporated before the clustering algorithm.” (Yigin Abstract).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL DUC TRAN whose telephone number is (571)272-6870. The examiner can normally be reached Mon-Fri 8:00-5:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/D.D.T./Examiner, Art Unit 2147
/VIKER A LAMARDO/Supervisory Patent Examiner, Art Unit 2147