Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to the original application filed on May 22, 2023.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 3-11, 15, 17-23, 25, and 28 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claims 2, 12-14, 16, 24, 26, and 27 are cancelled and are not considered. The analysis of the claims follows the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50 (“2019 PEG”).
Claim 1
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Claim 1 recites “A method for training a set of trained models each comprising a common feature extractor by using an unlabelled training dataset to thereby obtain an updated common feature extractor, the method being executed by at least one processing device, the method comprising:”; therefore, it is directed to the statutory category of a process.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites, inter alia:
“updating the common feature extractor, the updating comprising: maximizing, for each given trained model of the set of trained models, a mutual information between the set of feature vectors and the respective predictions generated by the set of trained models.” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, an evaluation or observation that can practically be performed in the human mind with the assistance of pen and paper. A human is able to evaluate the results of the models and use that evaluation to alter a model accordingly. The limitation merely applies an abstract idea on a generic computer system. See MPEP § 2106.04(a)(2)(III)(C).
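For illustration only, and not as a characterization of Applicant's disclosure, the recited mutual information is conventionally the standard information-theoretic quantity; assuming that standard definition (the specification may define the claimed quantity differently), for a feature vector Z and a prediction Ŷ:

```latex
% Standard definition of mutual information (an assumed definition; the
% specification may define the claimed quantity differently):
\[
  I(Z;\hat{Y}) \;=\; H(\hat{Y}) - H(\hat{Y}\mid Z)
  \;=\; \sum_{z,\hat{y}} p(z,\hat{y}) \log \frac{p(z,\hat{y})}{p(z)\,p(\hat{y})}
\]
```

For small examples, evaluating this sum is arithmetic on observed probabilities that can be carried out with pen and paper.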
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “obtaining the set of trained models, each trained model of the set of trained models having the common feature extractor, each trained model of the set of trained models having been trained for a common prediction task during a supervised training phase on a labelled training dataset;” which is insignificant extra-solution activity required for any use of the mental process (see MPEP § 2106.05(g)). As such, it does not integrate the abstract idea into a practical application.
“obtaining the unlabelled training dataset;” is insignificant extra-solution activity required for any use of the mental process (see MPEP § 2106.05(g)). As such, it does not integrate the abstract idea into a practical application.
“training, for the common prediction task, the set of trained models by using the obtained unlabelled training dataset to thereby obtain the updated common feature extractor, the training comprising:” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“generating, using the common feature extractor of the set of trained models, a set of feature vectors for at least a portion of the unlabelled training dataset;” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“generating, using the set of trained models, a set of predictions, the set of predictions comprising a respective prediction for each of the set of feature vectors; and” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “obtaining the set of trained models, each trained model of the set of trained models having the common feature extractor, each trained model of the set of trained models having been trained for a common prediction task during a supervised training phase on a labelled training dataset;” is insignificant extra-solution activity required for any use of abstract ideas (see MPEP § 2106.05(g)) and is a well-understood, routine, and conventional activity (see MPEP § 2106.05(d)(II): “Receiving or transmitting data over a network, e.g., using the Internet to gather data”).
“obtaining the unlabelled training dataset;” is insignificant extra-solution activity required for any use of abstract ideas (see MPEP § 2106.05(g)) and is a well-understood, routine, and conventional activity (see MPEP § 2106.05(d)(II): “Receiving or transmitting data over a network, e.g., using the Internet to gather data”).
“training, for the common prediction task, the set of trained models by using the obtained unlabelled training dataset to thereby obtain the updated common feature extractor, the training comprising:” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“generating, using the common feature extractor of the set of trained models, a set of feature vectors for at least a portion of the unlabelled training dataset;” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“generating, using the set of trained models, a set of predictions, the set of predictions comprising a respective prediction for each of the set of feature vectors; and” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
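For illustration only, a minimal sketch of the arrangement recited in claim 1 — a set of models sharing one common feature extractor, each producing a respective prediction — assuming a PyTorch-style API (an assumption; the claims are framework-agnostic, and all names and shapes below are hypothetical):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical shapes and modules; the claims do not fix any architecture.
extractor = nn.Linear(4, 8)                  # common feature extractor
heads = [nn.Linear(8, 2) for _ in range(3)]  # one head per trained model

x = torch.randn(5, 4)                        # batch from the unlabelled dataset
features = extractor(x)                      # set of feature vectors
predictions = [h(features) for h in heads]   # a respective prediction per model
```

Each step in the sketch is a generic tensor operation on a generic computer, consistent with the analysis under MPEP § 2106.05(f) above.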
Claim 3
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A process, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites the abstract ideas of the preceding claims from which it depends.
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “wherein the common prediction task comprises one of: a regression task, and a classification task.” This element amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “wherein the common prediction task comprises one of: a regression task, and a classification task.” amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
Claim 4
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A process, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites, inter alia:
“wherein said updating of the common feature extractor further comprises: minimizing a dissimilarity measure between at least a given prediction of the set of predictions and at least one other given prediction of the set of predictions of the set of trained models.” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, an evaluation or observation that can practically be performed in the human mind with the assistance of pen and paper. A human is able to evaluate the results of models and adjust the parameters of a function based on that evaluation. The limitation merely applies an abstract idea on a generic computer system. See MPEP § 2106.04(a)(2)(III)(C).
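For illustration only, the claim does not specify the dissimilarity measure; assuming, as one hypothetical example, the Kullback-Leibler divergence between two predicted distributions p and q:

```latex
% Hypothetical example of a dissimilarity measure between two predictions
% (the claim does not specify the measure):
\[
  D_{\mathrm{KL}}(p \,\|\, q) \;=\; \sum_{i} p_i \log \frac{p_i}{q_i}
\]
```

For small examples, comparing two predictions under such a measure is an evaluation that can practically be performed with pen and paper.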
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
This claim does not recite any additional limitations which integrate the abstract idea into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea and thus the claim is subject-matter ineligible.
Claim 5
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A process, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites the abstract ideas of the preceding claims from which it depends.
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “further comprising, prior to said obtaining of the set of trained models: obtaining the labelled training dataset;” which is insignificant extra-solution activity required for any use of the mental process (see MPEP § 2106.05(g)). As such, it does not integrate the abstract idea into a practical application.
“initializing, based on a different respective condition, each initial model of a set of initial models for the common prediction task, each initial model comprising an initial common feature extractor; and” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“training the set of initial models for the common prediction task during the supervised training phase on the labelled training dataset to obtain the set of trained models, each trained model of the set of trained models comprising the common feature extractor.” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “further comprising, prior to said obtaining of the set of trained models: obtaining the labelled training dataset;” is insignificant extra-solution activity required for any use of abstract ideas (see MPEP § 2106.05(g)) and is a well-understood, routine, and conventional activity (see MPEP § 2106.05(d)(II): “Receiving or transmitting data over a network, e.g., using the Internet to gather data”).
“initializing, based on a different respective condition, each initial model of a set of initial models for the common prediction task, each initial model comprising an initial common feature extractor; and” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“training the set of initial models for the common prediction task during the supervised training phase on the labelled training dataset to obtain the set of trained models, each trained model of the set of trained models comprising the common feature extractor.” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
Claim 6
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A process, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites, inter alia:
“updating at least a portion of each of the set of initial models to obtain the set of trained models, the updating comprising: determining a respective loss for each of the set of initial predictions to obtain a set of losses.” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, an evaluation or observation that can practically be performed in the human mind with the assistance of pen and paper. A human is able to evaluate the results of a model and adjust the model according to the evaluated results. The limitation merely applies an abstract idea on a generic computer system. See MPEP § 2106.04(a)(2)(III)(C).
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “wherein said training of the set of initial models for the common prediction task during the supervised training phase on the labelled training dataset to obtain the set of trained models comprises:” which amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“generating, using the set of initial models, a set of initial predictions for the labelled training dataset; and” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “wherein said training of the set of initial models for the common prediction task during the supervised training phase on the labelled training dataset to obtain the set of trained models comprises:” amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“generating, using the set of initial models, a set of initial predictions for the labelled training dataset; and” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
Claim 7
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A process, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites, inter alia:
“wherein said updating of at least the portion of each of the set of initial models to obtain the set of trained models further comprises: determining an average loss based on the set of losses; and” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mathematical concept of using a mathematical formula to perform a calculation. A human is able to evaluate data and use arithmetic to average a set of results. The limitation recites a mathematical operation and therefore falls within the mathematical concepts grouping of abstract ideas.
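For illustration only, the recited averaging is the elementary arithmetic mean of the set of losses:

```latex
% Arithmetic mean of N losses, with a worked example:
\[
  \bar{L} \;=\; \frac{1}{N} \sum_{i=1}^{N} L_i,
  \qquad \text{e.g.,} \quad \frac{0.2 + 0.4 + 0.6}{3} \;=\; 0.4
\]
```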
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “backpropagating the average loss to the at least the portion of the set of initial models.” This element amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “backpropagating the average loss to the at least the portion of the set of initial models.” amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
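For illustration only, a minimal sketch of the training limitations recited in claims 6 and 7 — determining a respective loss per initial model, determining an average loss, and backpropagating it — assuming a PyTorch-style API (an assumption; the claims are framework-agnostic, and all names below are hypothetical):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical setup: three initial models and one labelled batch.
models = [nn.Linear(4, 2) for _ in range(3)]
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(
    [p for m in models for p in m.parameters()], lr=0.1
)

x = torch.randn(8, 4)          # features of the labelled training batch
y = torch.randint(0, 2, (8,))  # corresponding labels

losses = [criterion(m(x), y) for m in models]  # a respective loss per model
avg_loss = torch.stack(losses).mean()          # determining an average loss
optimizer.zero_grad()
avg_loss.backward()                            # backpropagating the average loss
optimizer.step()
```

Each step uses generic computer components as a tool, consistent with the analysis under MPEP § 2106.05(f) above.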
Claim 8
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A process, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites the abstract ideas of the preceding claims from which it depends.
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “wherein the labelled training dataset is associated with a first type of domain representation of a set of objects; and” which amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“wherein the unlabelled training dataset is associated with a second type of domain representation of at least a portion of the set of objects.” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “wherein the labelled training dataset is associated with a first type of domain representation of a set of objects; and” amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“wherein the unlabelled training dataset is associated with a second type of domain representation of at least a portion of the set of objects.” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
Claim 9
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A process, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites the abstract ideas of the preceding claims from which it depends.
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “wherein the labelled training dataset comprises labelled images; and” which amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“wherein the unlabelled training dataset comprises unlabelled images.” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “wherein the labelled training dataset comprises labelled images; and” amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“wherein the unlabelled training dataset comprises unlabelled images.” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
Claim 10
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A process, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites the abstract ideas of the preceding claims from which it depends.
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “wherein the labelled training dataset has been acquired using a first type of device; and” which amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“wherein the unlabelled training dataset has been acquired using a second type of device.” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “wherein the labelled training dataset has been acquired using a first type of device; and” amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“wherein the unlabelled training dataset has been acquired using a second type of device.” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
Claim 11
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A process, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites the abstract ideas of the preceding claims from which it depends.
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “further comprising using the updated common feature extractor to extract features from data in a radiomics process.” This element amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “further comprising using the updated common feature extractor to extract features from data in a radiomics process.” amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
Claim 15
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Claim 15 recites “A system for training a set of trained models each comprising a common feature extractor by using an unlabelled training dataset to thereby obtain an updated common feature extractor, the system comprising: a processor; and a non-transitory storage medium operatively connected to the processor, the non-transitory storage medium comprising computer-readable instructions; the processor, upon executing the instructions, being configured for:”; therefore, it is directed to the statutory category of a machine.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites, inter alia:
“updating the common feature extractor, the updating comprising: maximizing, for each given trained model of the set of trained models, a mutual information between the set of feature vectors and the respective predictions generated by the set of trained models.” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, an evaluation or observation that can practically be performed in the human mind with the assistance of pen and paper. A human is able to evaluate the results of the models and use that evaluation to alter a model accordingly. The limitation merely applies an abstract idea on a generic computer system. See MPEP § 2106.04(a)(2)(III)(C).
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “obtaining the set of trained models, each trained model of the set of trained models having the common feature extractor, each trained model of the set of trained models having been trained for a common prediction task during a supervised training phase on a labelled training dataset;” which is insignificant extra-solution activity required for any use of the mental process (see MPEP § 2106.05(g)). As such, it does not integrate the abstract idea into a practical application.
“obtaining the unlabelled training dataset;” is insignificant extra-solution activity required for any use of the mental process (see MPEP § 2106.05(g)). As such, it does not integrate the abstract idea into a practical application.
“training, for the common prediction task, the set of trained models by using the obtained unlabelled training dataset to thereby obtain the updated common feature extractor, the training comprising:” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“generating, using the common feature extractor of the set of trained models, a set of feature vectors for at least a portion of the unlabelled training dataset;” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“generating, using the set of trained models, a set of predictions, the set of predictions comprising a respective prediction for each of the set of feature vectors; and” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “obtaining the set of trained models, each trained model of the set of trained models having the common feature extractor, each trained model of the set of trained models having been trained for a common prediction task during a supervised training phase on a labelled training dataset;” is insignificant extra-solution activity required for any use of abstract ideas (see MPEP § 2106.05(g)) and is a well-understood, routine, and conventional activity (see MPEP § 2106.05(d)(II): “Receiving or transmitting data over a network, e.g., using the Internet to gather data”).
“obtaining the unlabelled training dataset;” is insignificant extra-solution activity required for any use of abstract ideas (see MPEP § 2106.05(g)) and is a well-understood, routine, and conventional activity (see MPEP § 2106.05(d)(II): “Receiving or transmitting data over a network, e.g., using the Internet to gather data”).
“training, for the common prediction task, the set of trained models by using the obtained unlabelled training dataset to thereby obtain the updated common feature extractor, the training comprising:” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“generating, using the common feature extractor of the set of trained models, a set of feature vectors for at least a portion of the unlabelled training dataset;” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“generating, using the set of trained models, a set of predictions, the set of predictions comprising a respective prediction for each of the set of feature vectors; and” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
Claim 17
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A machine, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites the abstract ideas of the preceding claims from which it depends.
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “wherein the common prediction task comprises one of: a regression task, and a classification task.” This element amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “wherein the common prediction task comprises one of: a regression task, and a classification task.” amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
Claim 18
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A machine, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites, inter alia:
“wherein said updating of the common feature extractor further comprises: minimizing a dissimilarity measure between at least a given prediction of the set of predictions and at least one other given prediction of the set of predictions of the set of trained models.” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, an evaluation or observation that can practically be performed in the human mind with the assistance of pen and paper. A human is able to evaluate the results of models and adjust the parameters of a function based on that evaluation. The limitation merely applies an abstract idea on a generic computer system. See MPEP § 2106.04(a)(2)(III)(C).
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
This claim does not recite any additional limitations which integrate the abstract idea into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea and thus the claim is subject-matter ineligible.
Claim 19
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A machine, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites the abstract ideas of the preceding claims from which it depends.
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “wherein the processor is further configured for, prior to said obtaining of the set of trained models: obtaining the labelled training dataset;” which is insignificant extra-solution activity required for any use of the mental process (see MPEP § 2106.05(g)). As such, it does not integrate the abstract idea into a practical application.
“initializing, based on a different respective condition, each initial model of a set of initial models for the common prediction task, each initial model comprising an initial common feature extractor; and” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“training the set of initial models for the common prediction task during the supervised training phase on the labelled training dataset to obtain the set of trained models, each trained model of the set of trained models comprising the common feature extractor.” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “wherein the processor is further configured for, prior to said obtaining of the set of trained models: obtaining the labelled training dataset;” is insignificant extra-solution activity required for any use of abstract ideas (see MPEP § 2106.05(g)) and is a well-understood, routine, and conventional activity (see MPEP § 2106.05(d)(II): “Receiving or transmitting data over a network, e.g., using the Internet to gather data”).
“initializing, based on a different respective condition, each initial model of a set of initial models for the common prediction task, each initial model comprising an initial common feature extractor; and” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“training the set of initial models for the common prediction task during the supervised training phase on the labelled training dataset to obtain the set of trained models, each trained model of the set of trained models comprising the common feature extractor.” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
Claim 20
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A machine, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites, inter alia:
“updating at least a portion of each of the set of initial models to obtain the set of trained models, the updating comprising: determining a respective loss for each of the set of initial predictions to obtain a set of losses.” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, an evaluation or observation that can practically be performed in the human mind with the assistance of pen and paper. A human is able to evaluate the results of a model and adjust the model according to the evaluated results. The limitation merely applies an abstract idea on a generic computer system. See MPEP § 2106.04(a)(2)(III)(C).
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “wherein said training of the set of initial models for the common prediction task during the supervised training phase on the labelled training dataset to obtain the set of trained models comprises:” which amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“generating, using the set of initial models, a set of initial predictions for the labelled training dataset; and” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “wherein said training of the set of initial models for the common prediction task during the supervised training phase on the labelled training dataset to obtain the set of trained models comprises:” amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“generating, using the set of initial models, a set of initial predictions for the labelled training dataset; and” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
Claim 21
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A machine, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites, inter alia:
“wherein said updating of at least the portion of each of the set of initial models to obtain the set of trained models further comprises: determining an average loss based on the set of losses; and” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mathematical concept of using a mathematical formula to perform a calculation. A human is able to evaluate data and use arithmetic to average a set of results. The limitation recites a mathematical operation and therefore falls within the mathematical concepts grouping of abstract ideas.
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “backpropagating the average loss to the at least the portion of the set of initial models.” This element amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “backpropagating the average loss to the at least the portion of the set of initial models.” amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
Claim 22
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A machine, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites the abstract ideas of the preceding claims from which it depends.
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “wherein the labelled training dataset is associated with a first type of domain representation of a set of objects; and” which amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“wherein the unlabelled training dataset is associated with a second type of domain representation of at least a portion of the set of objects.” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “wherein the labelled training dataset is associated with a first type of domain representation of a set of objects; and” amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“wherein the unlabelled training dataset is associated with a second type of domain representation of at least a portion of the set of objects.” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
Claim 23
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A machine, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites the abstract ideas of the preceding claims from which it depends.
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “wherein the labelled training dataset comprises labelled images;” which amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“and wherein the unlabelled training dataset comprises unlabelled images.” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “wherein the labelled training dataset comprises labelled images;” amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“and wherein the unlabelled training dataset comprises unlabelled images.” amounts to using generic computer components as a tool to perform an existing process. Thus, the additional element is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
Claim 25
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
A machine, as above.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites the abstract ideas of the preceding claims from which it depends.
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element “wherein the processor is further configured for using the updated common feature extractor to extract features from data in a radiomics process.” This element amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element “wherein the processor is further configured for using the updated common feature extractor to extract features from data in a radiomics process.” amounts to using generic computer components as a tool to perform an existing process and is no more than a recitation of the words “apply it” (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
Claim 28
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Claim 28 recites “A method of providing a final trained model by training a set of trained models each comprising a common feature extractor by using an unlabelled training dataset to thereby obtain an updated common feature extractor, the method being executed by at least one processing device, the method comprising:”; therefore, it is directed to the statutory category of a process.
Step 2A Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
The claim recites, inter alia:
“updating the common feature extractor, the updating comprising: maximizing, for each given trained model of the set of trained models, a mutual information between the set of feature vectors and the respective predictions generated by the set of trained models; and” Under its broadest reasonable interpretation in light of the specification, this limitation encompasses the mental process of evaluating and observing data, an evaluation or observation that can practically be performed in the human mind with the assistance of pen and paper. A human is able to evaluate the results of the models and use that evaluation to alter a model accordingly. The limitation merely applies an abstract idea on a generic computer system. See MPEP § 2106.04(a)(2)(III)(C).
Step 2A Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
The claim recites the additional element, “obtaining the set of trained models, each trained model of the set of trained models having the common feature extractor, each trained model of the set of trained models having been trained for a common prediction task during a supervised training phase on a labelled training dataset;” which is an insignificant extra-solution activity required for any use of the mental processes (see MPEP § 2106.05(g)). As such, the claim is ineligible.
“obtaining the unlabelled training dataset;” is an insignificant extra-solution activity required for any use of the mental processes (see MPEP § 2106.05(g)). As such, the claim is ineligible.
“assigned training, for the common prediction task, the set of trained models by using the obtained unlabelled training dataset to thereby obtain the updated common feature extractor, the training comprising:” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent), or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“generating, using the common feature extractor of the set of trained models, a set of feature vectors for at least a portion of the unlabelled training dataset;” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent), or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“generating, using the set of trained models, a set of predictions, the set of predictions comprising a respective prediction for each of the set of feature vectors; and” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent), or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“providing, using the set of trained models and the updated common feature extractor, the final trained model.” is an insignificant extra-solution activity required for any use of the mental processes (see MPEP § 2106.05(g)). As such, the claim is ineligible.
Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
Finally, the claim taken as a whole does not contain an inventive concept which provides significantly more than the abstract idea. The additional element, “obtaining the set of trained models, each trained model of the set of trained models having the common feature extractor, each trained model of the set of trained models having been trained for a common prediction task during a supervised training phase on a labelled training dataset;” is an insignificant extra-solution activity required for any use of abstract ideas (see MPEP § 2106.05(g)), and is a well-understood, routine, conventional activity (see MPEP § 2106.05(d)(i); “Receiving or transmitting data over a network, e.g., using the Internet to gather data”).
“obtaining the unlabelled training dataset;” is an insignificant extra-solution activity required for any use of the mental processes (see MPEP § 2106.05(g)). As such, the claim is ineligible.
“assigned training, for the common prediction task, the set of trained models by using the obtained unlabelled training dataset to thereby obtain the updated common feature extractor, the training comprising:” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent), or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“generating, using the common feature extractor of the set of trained models, a set of feature vectors for at least a portion of the unlabelled training dataset;” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent), or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“generating, using the set of trained models, a set of predictions, the set of predictions comprising a respective prediction for each of the set of feature vectors; and” amounts to generic computer components used as a tool to perform an existing process. Thus, the additional element amounts to no more than a recitation of the words "apply it" (or an equivalent), or mere instructions to implement an abstract idea or other exception on a computer (see MPEP § 2106.05(f)).
“providing, using the set of trained models and the updated common feature extractor, the final trained model.” is an insignificant extra-solution activity required for any use of abstract ideas (see MPEP § 2106.05(g)), and is a well-understood, routine, conventional activity (see MPEP § 2106.05(d)(i); “Receiving or transmitting data over a network, e.g., using the Internet to gather data”).
Taken alone or in combination, the additional elements of the claim do not provide an inventive concept and thus the claim is subject-matter ineligible.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3-6, 10, 15, 17-20, 22, 23, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Caltagirone et al. (“Lidar-Camera Co-Training for Semi-Supervised Road Detection”, Nov. 2019, hereinafter “Caltagirone”) in view of Ronneberger et al. (“U-Net: Convolutional Networks for Biomedical Image Segmentation”, May 2015, hereinafter “Ronneberger”).
Regarding claim 1, Caltagirone discloses, “A method for training a set of trained models each comprising a common feature extractor by using an unlabelled training dataset to thereby obtain an updated common feature extractor, the method being executed by at least one processing device, the method comprising:” (Algorithm 2, pp. 3; This algorithm discloses a process to train multiple convolutional neural networks (CNNs). The neural networks used in the algorithm are able to evaluate image data and produce classifications or predictions from labeled and unlabeled data using a semi-supervised method.)
“obtaining the set of trained models, each trained model of the set of trained models having the common feature extractor, each trained model of the set of trained models having been trained for a common prediction task during a supervised training phase on a labelled training dataset;” (Algorithm 1, pp. 2; This algorithm discloses a process of obtaining two models. The models, f1 and f2, are trained using the labeled dataset T. This process is also present in algorithm 2.) And (Model and Training procedure, pp. 4; “The experiments reported in this work were carried out using two popular semantic segmentation networks, U-Net [20] for the main study described in Sect. IV-C and FCN-ResNet50 [21] for generating the results submitted to the KITTI road benchmark reported in Sect. IV-D. Both neural networks were implemented and trained using the deep learning library PyTorch [22].” This discloses the use of the two classifier models used in algorithm 2. U-Net is a convolutional neural network (CNN) architecture that performs feature extraction on images.)
“obtaining the unlabelled training dataset;” (Algorithm 2, pp. 3; The algorithm parameters include a set of labeled examples T and a set of unlabeled examples U.)
“training, for the common prediction task, the set of trained models by using the obtained unlabelled training dataset to thereby obtain the updated common feature extractor, the training comprising:” (Loss Function, pp. 2; “Let us now consider an unlabeled example (x_s, x_t) ∈ U. Both the student’s prediction f_s(x_s; θ_s) and the teacher’s prediction f_t(x_t; θ_t) represent probability distributions over the possible classes. Here, the teacher’s prediction is considered as the true distribution whereas the student’s prediction is regarded as an approximation. The maximization of their agreement can be then achieved by bringing the student’s prediction closer to the teacher’s. By considering that the Kullback-Leibler divergence, denoted as D_KL, is a natural measure of difference between probability distributions, the co-training loss is implemented as follows: [See equation (2)]” The method in this paper discloses a process of training the models using predictions generated from the unlabeled examples. The teacher and the student models both generate a prediction, and those predictions are compared. The goal of this is to produce more similar results.)
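For illustration only, the co-training loss quoted above can be sketched in PyTorch, the library Caltagirone reports using. This is a minimal sketch, not code from the claims or the reference; the function name, tensor shapes, and batching are assumptions made for illustration:

```python
import torch
import torch.nn.functional as F

def cotraining_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor) -> torch.Tensor:
    """D_KL(teacher || student), averaged over a batch of unlabeled examples.

    Per the quoted passage, the teacher's prediction is treated as the target
    distribution (hence detached), and the student is pulled toward it.
    """
    log_p_student = F.log_softmax(student_logits, dim=-1)
    p_teacher = F.softmax(teacher_logits, dim=-1).detach()
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")
```

Minimizing this quantity brings the student’s predicted distribution closer to the teacher’s, which is the agreement-maximization behavior the quoted passage describes.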
“generating, using the set of trained models, a set of predictions, the set of predictions comprising a respective prediction for each of the set of feature vectors; and” (Algorithm 1, pp. 2; This algorithm will initially train the two classifier models using labeled examples. After this, a random subset of the unlabeled data set is sampled. Each unlabeled image in the set U contains two views of the same image, x_1 and x_2. The models will take these views and produce predictions ỹ_1 and ỹ_2.)
“updating the common feature extractor, the updating comprising: maximizing, for each given trained model of the set of trained models, a mutual information between the set of feature vectors and the respective predictions generated by the set of trained models.” (Loss Function, pp. 2; “The co-training algorithm tries to accomplish two different objectives: (1) error minimization on the manually labeled examples, and (2) agreement maximization on the unlabeled examples.” The method in this article uses a loss function to train the teacher and student models. This will attempt to minimize the loss between the predictions produced by the different models.) And (Loss Function, pp. 2; “Let us now consider an unlabeled example (x_s, x_t) ∈ U. Both the student’s prediction f_s(x_s; θ_s) and the teacher’s prediction f_t(x_t; θ_t) represent probability distributions over the possible classes. Here, the teacher’s prediction is considered as the true distribution whereas the student’s prediction is regarded as an approximation. The maximization of their agreement can be then achieved by bringing the student’s prediction closer to the teacher’s. By considering that the Kullback-Leibler divergence, denoted as D_KL, is a natural measure of difference between probability distributions, the co-training loss is implemented as follows: [See equation (2)]” This is the process used in the method: it evaluates the predictions by comparing the difference in the models’ outcomes.)
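For illustration of the recited mutual-information maximization, which the claim does not tie to any particular estimator, one common sketch estimates I(Z; Y) as the entropy of the marginal prediction minus the mean per-example prediction entropy. This sketch, including the function name and tensor shapes, is an assumption for illustration and is not taken from Caltagirone or the claims:

```python
import torch

def mutual_information(pred_probs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Estimate I(Z; Y) = H(Y) - H(Y|Z) from per-example class probabilities.

    pred_probs: (N, K) softmax outputs, one row per feature vector.
    """
    marginal = pred_probs.mean(dim=0)                                # p(y)
    h_marginal = -(marginal * (marginal + eps).log()).sum()          # H(Y)
    h_conditional = -(pred_probs * (pred_probs + eps).log()).sum(dim=1).mean()  # H(Y|Z)
    return h_marginal - h_conditional
```

In such a sketch, maximizing this estimate (for example, by minimizing its negative via gradient descent through the common feature extractor) would constitute the recited updating step.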
Caltagirone fails to explicitly disclose, “generating, using the common feature extractor of the set of trained models, a set of feature vectors for at least a portion of the unlabelled training dataset;”.
However, Ronneberger discloses, “generating, using the common feature extractor of the set of trained models, a set of feature vectors for at least a portion of the unlabelled training dataset;” (Network Architecture, pp. 4; “The network architecture is illustrated in Figure 1. It consists of a contracting path (left side) and an expansive path (right side). The contracting path follows the typical architecture of a convolutional network. It consists of the repeated application of two 3x3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU) and a 2x2 max pooling operation with stride 2 for down sampling. At each down sampling step, we double the number of feature channels. Every step in the expansive path consists of an up sampling of the feature map followed by a 2x2 convolution (“up-convolution") that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions, each followed by a ReLU. The cropping is necessary due to the loss of border pixels in every convolution. At the final layer a 1x1 convolution is used to map each 64-component feature vector to the desired number of classes. In total the network has 23 convolutional layers.” The CNN model used in Caltagirone is called U-net. This model is a standard CNN model for feature extraction and image processing. This CNN model is used to evaluate the data in Caltagirone to produce an image classification for a given image.)
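For illustration, the contracting (feature-extraction) path quoted above can be sketched as follows. Only the structural pattern — two 3x3 unpadded convolutions each followed by a ReLU, 2x2 max pooling with stride 2, and channel doubling at each downsampling step — comes from the quote; the channel sizes and depth are illustrative assumptions:

```python
import torch
import torch.nn as nn

def double_conv(c_in: int, c_out: int) -> nn.Sequential:
    # Two 3x3 unpadded convolutions, each followed by a ReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, kernel_size=3), nn.ReLU(inplace=True),
    )

class ContractingPath(nn.Module):
    """Feature extractor: double conv then 2x2 max pool per step,
    doubling the number of feature channels at each downsampling step."""
    def __init__(self, channels=(1, 64, 128, 256)):
        super().__init__()
        self.blocks = nn.ModuleList(
            double_conv(a, b) for a, b in zip(channels, channels[1:]))
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks[:-1]:
            x = self.pool(block(x))
        return self.blocks[-1](x)  # deepest feature map (the feature vectors)
```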
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Caltagirone and Ronneberger. Caltagirone teaches a semi-supervised learning method that uses CNN classifiers, labeled data, and unlabeled data to evaluate images. Ronneberger teaches a CNN classifier model which is able to evaluate images. One of ordinary skill would have been motivated to combine a system that trains and uses CNNs in a semi-supervised training environment with a CNN classifier. Further, Caltagirone explicitly states that the CNN model disclosed in Ronneberger was used for their experiments: “The experiments reported in this work were carried out using two popular semantic segmentation networks, U-Net [Ronneberger et al.] for the main study described in Sect. IV-C and FCN-ResNet50 [21] for generating the results submitted to the KITTI road benchmark reported in Sect. IV-D.” (Caltagirone, Model and Training procedure, pp. 4).
Regarding claim 3, Caltagirone discloses, “wherein the common prediction task comprises one of: a regression task, and a classification task.” (Model and Training procedure, pp. 4; “The experiments reported in this work were carried out using two popular semantic segmentation networks, U-Net [20] for the main study described in Sect. IV-C and FCN-ResNet50 [21] for generating the results submitted to the KITTI road benchmark reported in Sect. IV-D. Both neural networks were implemented and trained using the deep learning library PyTorch [22].” The models used in this article are U-Net and FCN-ResNet50, which are both image classification models. These models will evaluate images and produce a label for each image.)
Regarding claim 4, Caltagirone discloses, “wherein said updating of the common feature extractor further comprises: minimizing a dissimilarity measure between at least a given prediction of the set of predictions and at least one other given prediction of the set of predictions of the set of trained models.” (Algorithm 2, pp. 3; “Update θ_s using the KL-divergence loss weighted by factor λ_cot and by considering ỹ_t as the target distribution” “Swap student-teacher roles” The models will produce predictions from an unlabeled sample. The predictions produced by the different models are compared and a loss value is produced. This is used to train the models so that they produce more accurate and more similar results.)
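For illustration, one co-training update with role swapping, as characterized above, can be sketched using the cotraining_loss sketch given earlier. The names, the optimizer wiring, and the weighting parameter lambda_cot are illustrative assumptions, not details from the claims or the reference:

```python
def cotraining_step(model_s, model_t, x_s, x_t, opt_s, lambda_cot: float = 1.0):
    """One student update: the teacher's prediction is the target distribution."""
    y_t = model_t(x_t)                          # teacher logits (target)
    y_s = model_s(x_s)                          # student logits
    loss = lambda_cot * cotraining_loss(y_s, y_t)
    opt_s.zero_grad()
    loss.backward()
    opt_s.step()
    # The student and teacher roles are then swapped and the step repeated.
    return loss.item()
```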
Regarding claim 5, Caltagirone discloses, “further comprising, prior to said obtaining of the set of trained models: obtaining the labelled training dataset;” (Algorithm 2, pp. 3; There are four parameters for this algorithm: a set of labeled examples, a set of unlabeled examples, a CNN classifier model f1, and a CNN classifier model f2. These models are initialized and ready to be trained using supervised learning.)
“initializing, based on a different respective condition, each initial model of a set of initial models for the common prediction task, each initial model comprising an initial common feature extractor; and” (Algorithm 2, pp. 3; There are four parameters for this algorithm: a set of labeled examples, a set of unlabeled examples, a CNN classifier model f1, and a CNN classifier model f2. These models are initialized and ready to be trained using supervised learning.)
“training the set of initial models for the common prediction task during the supervised training phase on the labelled training dataset to obtain the set of trained models, each trained model of the set of trained models comprising the common feature extractor.” (Algorithm 2, pp. 3; As stated in the algorithm, the two models, f1 and f2, are trained on the labeled data set using supervised learning. This will update the model parameters accordingly, which would include the feature-extraction weights in the network.)
Regarding claim 6, Caltagirone discloses, “wherein said training of the set of initial models for the common prediction task during the supervised training phase on the labelled training dataset to obtain the set of trained models comprises:” (Algorithm 2, pp. 3; There are four parameters for this algorithm: a set of labeled examples, a set of unlabeled examples, a CNN classifier model f1, and a CNN classifier model f2. These models are initialized and ready to be trained using supervised learning.) And (Model and Training procedure, pp. 4; “The experiments reported in this work were carried out using two popular semantic segmentation networks, U-Net [20] for the main study described in Sect. IV-C and FCN-ResNet50 [21] for generating the results submitted to the KITTI road benchmark reported in Sect. IV-D. Both neural networks were implemented and trained using the deep learning library PyTorch [22].” This method uses U-Net and FCN-ResNet50, both of which are able to identify images.)
Caltagirone fails to explicitly disclose, “generating, using the set of initial models, a set of initial predictions for the labelled training dataset; and”.
However, Ronneberger discloses, “generating, using the set of initial models, a set of initial predictions for the labelled training dataset; and” (Training, pp. 4; “The input images and their corresponding segmentation maps are used to train the network with the stochastic gradient descent implementation of Caffe [6]. Due to the unpadded convolutions, the output image is smaller than the input by a constant border width. To minimize the overhead and make maximum use of the GPU memory, we favor large input tiles over a large batch size and hence reduce the batch to a single image. Accordingly, we use a high momentum (0.99) such that a large number of the previously seen training samples determine the update in the current optimization step.” Caltagirone states that the classifier models are trained using supervised learning, but does not explicitly disclose that process. However, Ronneberger is cited by Caltagirone, and this is the process used to train the model used in Caltagirone. This process includes inputting labeled images into the network and then producing a result. This result is used to train the model over many training iterations.)
“updating at least a portion of each of the set of initial models to obtain the set of trained models, the updating comprising: determining a respective loss for each of the set of initial predictions to obtain a set of losses.” (Training, pp. 4; “The energy function is computed by a pixel-wise soft-max over the final feature map combined with the cross-entropy loss function. The soft-max is defined as p_k(x) = exp(a_k(x)) / Σ_{k′=1..K} exp(a_{k′}(x)), where a_k(x) denotes the activation in feature channel k at the pixel position x ∈ Ω with Ω ⊂ Z². K is the number of classes and p_k(x) is the approximated maximum-function. I.e. p_k(x) ≈ 1 for the k that has the maximum activation a_k(x) and p_k(x) ≈ 0 for all other k. The cross entropy then penalizes at each position the deviation of p_ℓ(x)(x) from 1 using [see equation (1)] where ℓ : Ω → {1, …, K} is the true label of each pixel and w : Ω → R is a weight map that we introduced to give some pixels more importance in the training.” One of the models used in Caltagirone was U-Net, proposed in Ronneberger. This model initially trains by producing predictions from labeled images and evaluating the results: it generates a set of predictions and then compares them against the labels. This training method uses a loss function determined by comparing the outputs of the model, and during training the model will attempt to reduce the loss value.)
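For illustration, the quoted pixel-wise soft-max with weighted cross-entropy can be sketched as follows. The tensor shapes and the function name are assumptions; the per-pixel weight map corresponds to the weight map w(x) described in the quote:

```python
import torch
import torch.nn.functional as F

def weighted_pixel_cross_entropy(logits: torch.Tensor,
                                 labels: torch.Tensor,
                                 weight_map: torch.Tensor) -> torch.Tensor:
    """Pixel-wise soft-max + cross-entropy, weighted per pixel.

    logits: (N, K, H, W) activations a_k(x); labels: (N, H, W) true classes l(x);
    weight_map: (N, H, W) per-pixel importance weights w(x).
    """
    per_pixel = F.cross_entropy(logits, labels, reduction="none")  # -log p_l(x)(x)
    return (weight_map * per_pixel).mean()
```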
Regarding claim 10, Caltagirone discloses, “wherein the labelled training dataset has been acquired using a first type of device; and” (Data Set, pp. 3; “The road data set contains 289 labeled examples. As for the previous data set, each example consists of a color image and a point cloud; however, in this case the road ground truth is also available. Figure 1 shows an example of a road scene and its corresponding road label. The road data set is split into three balanced broad categories, urban marked (UM), urban multiple marked (UMM), and urban unmarked (UU) according to the presence and number of marked lanes or lack thereof.” The datasets used contain driving sequences recorded with different sensors and cameras. The training data contains different images containing objects pertaining to roadways and driving. As stated, a portion of this data is labeled and is in the set of labeled data T.)
“wherein the unlabelled training dataset has been acquired using a second type of device.” (Data Set, pp. 3; “This work makes use of two data sets, the KITTI raw data set [16] and the KITTI road data set [17]. The raw data set consists of many driving sequences recorded over several days in urban, rural, and highway roads in daytime and fair-weather conditions. The sensor setup used for recording the sequences included four cameras and a high-resolution lidar Velodyne HDL-64E. In this work, only one of the color cameras and the lidar are considered.” This model uses road data as disclosed above. This data has been captured using different types of devices, and the data contains images of objects pertaining to roadways and driving. A portion of this data is labeled and is separate from the unlabeled data; the remaining data forms the unlabeled set U in algorithm 2.)
Regarding claim 15, Caltagirone discloses, “A system for training a set of trained models each comprising a common feature extractor by using an unlabelled training dataset to thereby obtain an updated common feature extractor, the system comprising: a processor; and a non-transitory storage medium operatively connected to the processor, the non-transitory storage medium comprising computer-readable instructions; the processor, upon executing the instructions, being configured for:” (Model and Training procedure, pp. 4; “The experiments reported in this work were carried out using two popular semantic segmentation networks, U-Net [20] for the main study described in Sect. IV-C and FCN-ResNet50 [21] for generating the results submitted to the KITTI road benchmark reported in Sect. IV-D. Both neural networks were implemented and trained using the deep learning library PyTorch [22].” This method was carried out on a computing system able to implement the neural networks using PyTorch. This would lead one to reasonably believe this experiment was conducted on a generic computing system containing processors connected to memory to execute computer instructions stored in said memory. The instructions stored would be similar to the pseudocode of algorithm 2.)
“obtaining the set of trained models, each trained model of the set of trained models having the common feature extractor, each trained model of the set of trained models having been trained for a common prediction task during a supervised training phase on a labelled training dataset;” (Algorithm 1, pp. 2; This algorithm discloses a process of obtaining two models. The models, f1 and f2, are trained using the labeled dataset T. This process is also present in algorithm 2.) And (Model and Training procedure, pp. 4; “The experiments reported in this work were carried out using two popular semantic segmentation networks, U-Net [20] for the main study described in Sect. IV-C and FCN-ResNet50 [21] for generating the results submitted to the KITTI road benchmark reported in Sect. IV-D. Both neural networks were implemented and trained using the deep learning library PyTorch [22].” This discloses the use of the two classifier models used in algorithm 2. U-Net is a convolutional neural network (CNN) architecture that performs feature extraction on images.)
“obtaining the unlabelled training dataset;” (Algorithm 2, pp. 3; The algorithm parameters include a set of labeled examples T and a set of unlabeled examples U.)
“training, for the common prediction task, the set of trained models by using the obtained unlabelled training dataset to thereby obtain the updated common feature extractor, the training comprising:” (Loss Function, pp. 2; “Let us now consider an unlabeled example (x_s, x_t) ∈ U. Both the student’s prediction f_s(x_s; θ_s) and the teacher’s prediction f_t(x_t; θ_t) represent probability distributions over the possible classes. Here, the teacher’s prediction is considered as the true distribution whereas the student’s prediction is regarded as an approximation. The maximization of their agreement can be then achieved by bringing the student’s prediction closer to the teacher’s. By considering that the Kullback-Leibler divergence, denoted as D_KL, is a natural measure of difference between probability distributions, the co-training loss is implemented as follows: [See equation (2)]” The method in this paper discloses a process of training the models using predictions generated from the unlabeled examples. The teacher and the student models both generate a prediction, and those predictions are compared. The goal of this is to produce more similar results.)
“generating, using the set of trained models, a set of predictions, the set of predictions comprising a respective prediction for each of the set of feature vectors; and” (Algorithm 1, pp. 2; This algorithm will initially train the two classifier models using labeled examples. After this, a random subset of the unlabeled data set is sampled. Each unlabeled image in the set U contains two views of the same image, x_1 and x_2. The models will take these views and produce predictions ỹ_1 and ỹ_2.)
“updating the common feature extractor, the updating comprising: maximizing, for each given trained model of the set of trained models, a mutual information between the set of feature vectors and the respective predictions generated by the set of trained models.” (Loss Function, pp. 2; “The co-training algorithm tries to accomplish two different objectives: (1) error minimization on the manually labeled examples, and (2) agreement maximization on the unlabeled examples.” The method in this article uses a loss function to train the teacher and student models. This will attempt to minimize the loss between the predictions produced by the different models.) And (Loss Function, pp. 2; “Let us now consider an unlabeled example (x_s, x_t) ∈ U. Both the student’s prediction f_s(x_s; θ_s) and the teacher’s prediction f_t(x_t; θ_t) represent probability distributions over the possible classes. Here, the teacher’s prediction is considered as the true distribution whereas the student’s prediction is regarded as an approximation. The maximization of their agreement can be then achieved by bringing the student’s prediction closer to the teacher’s. By considering that the Kullback-Leibler divergence, denoted as D_KL, is a natural measure of difference between probability distributions, the co-training loss is implemented as follows: [See equation (2)]” This is the process used in the method: it evaluates the predictions by comparing the difference in the models’ outcomes.)
Caltagirone fails to explicitly disclose, “generating, using the common feature extractor of the set of trained models, a set of feature vectors for at least a portion of the unlabelled training dataset;”.
However, Ronneberger discloses, “generating, using the common feature extractor of the set of trained models, a set of feature vectors for at least a portion of the unlabelled training dataset;” (Network Architecture, pp. 4; “The network architecture is illustrated in Figure 1. It consists of a contracting path (left side) and an expansive path (right side). The contracting path follows the typical architecture of a convolutional network. It consists of the repeated application of two 3x3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU) and a 2x2 max pooling operation with stride 2 for down sampling. At each down sampling step we double the number of feature channels. Every step in the expansive path consists of an up sampling of the feature map followed by a 2x2 convolution (“up-convolution") that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions, each followed by a ReLU. The cropping is necessary due to the loss of border pixels in every convolution. At the final layer a 1x1 convolution is used to map each 64- component feature vector to the desired number of classes. In total the network has 23 convolutional layers.” The CNN model used in Caltagirone is called U-net. This model is a standard CNN model for feature extraction and image processing. This CNN model is used to evaluate the data in Caltagirone to produce an image classification for a given image.)
Regarding claim 17, Caltagirone discloses, “wherein the common prediction task comprises one of: a regression task, and a classification task.” (Model and Training procedure, pp. 4; “The experiments reported in this work were carried out using two popular semantic segmentation networks, U-Net [20] for the main study described in Sect. IV-C and FCN-ResNet50 [21] for generating the results submitted to the KITTI road benchmark reported in Sect. IV-D. Both neural networks were implemented and trained using the deep learning library PyTorch [22].” The models used in this article are U-Net and FCN-ResNet50, which are both image classification models. These models will evaluate images and produce a label for each image.)
Regarding claim 18, Caltagirone discloses, “wherein said updating of the common feature extractor further comprises: minimizing a dissimilarity measure between at least a given prediction of the set of predictions and at least one other given prediction of the set of predictions of the set of trained models.” (Algorithm 2, pp. 3; “Update θ_s using the KL-divergence loss weighted by factor λ_cot and by considering ỹ_t as the target distribution” “Swap student-teacher roles” The models will produce predictions from an unlabeled sample. The predictions produced by the different models are compared and a loss value is produced. This is used to train the models so that they produce more accurate and more similar results.)
Regarding claim 19, Caltagirone discloses, “wherein the processor is further configured for, prior to said obtaining of the set of trained models: obtaining the labelled training dataset;” (Algorithm 2, pp. 3; There are four parameters for this algorithm: a set of labeled examples, a set of unlabeled examples, a CNN classifier model f1, and a CNN classifier model f2. These models are initialized and ready to be trained using supervised learning.)
“initializing, based on a different respective condition, each initial model of a set of initial models for the common prediction task, each initial model comprising an initial common feature extractor; and” (Algorithm 2, pp. 3; There are four parameters for this algorithm: a set of labeled examples, a set of unlabeled examples, a CNN classifier model f1, and a CNN classifier model f2. These models are initialized and ready to be trained using supervised learning.)
“training the set of initial models for the common prediction task during the supervised training phase on the labelled training dataset to obtain the set of trained models, each trained model of the set of trained models comprising the common feature extractor.” (Algorithm 2, pp. 3; As stated in the algorithm, the two models, f1 and f2, are trained on the labeled data set using supervised learning. This will update the model parameters accordingly, which would include the feature-extraction weights in the network.)
Regarding claim 20, Caltagirone discloses, “wherein said training of the set of initial models for the common prediction task during the supervised training phase on the labelled training dataset to obtain the set of trained models comprises:” (Algorithm 2, pp. 3; There are four parameters for this algorithm: a set of labeled examples, a set of unlabeled examples, a CNN classifier model f1, and a CNN classifier model f2. These models are initialized and ready to be trained using supervised learning.) And (Model and Training procedure, pp. 4; “The experiments reported in this work were carried out using two popular semantic segmentation networks, U-Net [20] for the main study described in Sect. IV-C and FCN-ResNet50 [21] for generating the results submitted to the KITTI road benchmark reported in Sect. IV-D. Both neural networks were implemented and trained using the deep learning library PyTorch [22].” This method uses U-Net and FCN-ResNet50, both of which are able to identify images.)
Caltagirone fails to explicitly disclose, “generating, using the set of initial models, a set of initial predictions for the labelled training dataset; and” and “updating at least a portion of each of the set of initial models to obtain the set of trained models, the updating comprising: determining a respective loss for each of the set of initial predictions to obtain a set of losses.”.
However, Ronneberger discloses, “generating, using the set of initial models, a set of initial predictions for the labelled training dataset; and” (Training, pp. 4; “The input images and their corresponding segmentation maps are used to train the network with the stochastic gradient descent implementation of Caffe [6]. Due to the unpadded convolutions, the output image is smaller than the input by a constant border width. To minimize the overhead and make maximum use of the GPU memory, we favor large input tiles over a large batch size and hence reduce the batch to a single image. Accordingly, we use a high momentum (0.99) such that a large number of the previously seen training samples determine the update in the current optimization step.” Caltagirone states that the classifier models are trained using supervised learning, but does not explicitly disclose that process. However, Ronneberger is cited by Caltagirone, and this is the process used to train the model used in Caltagirone. This process includes inputting labeled images into the network and then producing a result. This result is used to train the model over many training iterations.)
“updating at least a portion of each of the set of initial models to obtain the set of trained models, the updating comprising: determining a respective loss for each of the set of initial predictions to obtain a set of losses.” (Training, pp. 4; “The energy function is computed by a pixel-wise soft-max over the final feature map combined with the cross-entropy loss function. The soft-max is defined as p_k(x) = exp(a_k(x)) / Σ_{k′=1..K} exp(a_{k′}(x)), where a_k(x) denotes the activation in feature channel k at the pixel position x ∈ Ω with Ω ⊂ Z². K is the number of classes and p_k(x) is the approximated maximum-function. I.e. p_k(x) ≈ 1 for the k that has the maximum activation a_k(x) and p_k(x) ≈ 0 for all other k. The cross entropy then penalizes at each position the deviation of p_ℓ(x)(x) from 1 using [see equation (1)] where ℓ : Ω → {1, …, K} is the true label of each pixel and w : Ω → R is a weight map that we introduced to give some pixels more importance in the training.” One of the models used in Caltagirone was U-Net, proposed in Ronneberger. This model initially trains by producing predictions from labeled images and evaluating the results: it generates a set of predictions and then compares them against the labels. This training method uses a loss function determined by comparing the outputs of the model, and during training the model will attempt to reduce the loss value.)
Regarding claim 22, Caltagirone discloses, “wherein the labelled training dataset is associated with a first type of domain representation of a set of objects; and” (Data Set, pp. 3; “The road data set contains 289 labeled examples. As for the previous data set, each example consists of a color image and a point cloud; however, in this case the road ground truth is also available. Figure 1 shows an example of a road scene and its corresponding road label. The road data set is split into three balanced broad categories, urban marked (UM), urban multiple marked (UMM), and urban unmarked (UU) according to the presence and number of marked lanes or lack thereof.” The datasets used contain driving sequences recorded with different sensors and cameras. The training data contains different images containing objects pertaining to roadways and driving.)
“wherein the unlabelled training dataset is associated with a second type of domain representation of at least a portion of the set of objects.” (Data Set, pp. 3; “This work makes use of two data sets, the KITTI raw data set [16] and the KITTI road data set [17]. The raw data set consists of many driving sequences recorded over several days in urban, rural, and highway roads in daytime and fair-weather conditions. The sensor setup used for recording the sequences included four cameras and a high-resolution lidar Velodyne HDL-64E. In this work, only one of the color cameras and the lidar are considered.” This model uses road data as disclosed above. This data has been captured using different types of devices and the data contains images of objects pertaining to roadways and driving. A portion of this data is labeled and is separate from the unlabeled data. The remaining data would be considered to be the unlabeled data.)
Regarding claim 23, Caltagirone discloses, “wherein the labelled training dataset comprises labelled images; and” (Data Set, pp. 3; “The road data set contains 289 labeled examples. As for the previous data set, each example consists of a color image and a point cloud; however, in this case the road ground truth is also available. Figure 1 shows an example of a road scene and its corresponding road label. The road data set is split into three balanced broad categories, urban marked (UM), urban multiple marked (UMM), and urban unmarked (UU) according to the presence and number of marked lanes or lack thereof.” The datasets used contain driving sequences recorded with different sensors and cameras. The training data contains different images containing objects pertaining to roadways and driving. As stated, a portion of this data is labeled and is in the set of labeled data T.)
“wherein the unlabelled training dataset comprises unlabelled images.” (Data Set, pp. 3; “This work makes use of two data sets, the KITTI raw data set [16] and the KITTI road data set [17]. The raw data set consists of many driving sequences recorded over several days in urban, rural, and highway roads in daytime and fair-weather conditions. The sensor setup used for recording the sequences included four cameras and a high-resolution lidar Velodyne HDL-64E. In this work, only one of the color cameras and the lidar are considered.” This model uses road data as disclosed above. This data has been captured using different types of devices, and the data contains images of objects pertaining to roadways and driving. A portion of this data is labeled and is separate from the unlabeled data; the remaining data forms the unlabeled set U in algorithm 2.)
Regarding claim 28, Caltagirone discloses, “A method of providing a final trained model by training a set of trained models each comprising a common feature extractor by using an unlabelled training dataset to thereby obtain an updated common feature extractor, the method being executed by at least one processing device, the method comprising:” (Model and Training procedure, pp. 4; “The experiments reported in this work were carried out using two popular semantic segmentation networks, U-Net [20] for the main study described in Sect. IV-C and FCN-ResNet50 [21] for generating the results submitted to the KITTI road benchmark reported in Sect. IV-D. Both neural networks were implemented and trained using the deep learning library PyTorch [22].” This method was carried out on a computing system able to implement the neural networks using PyTorch. This would lead one to reasonably believe this experiment was conducted in a generic computing system containing processors connected to memory to execute computer instructions stored in said memory. The instructions stored would be similar to the pseudo code of algorithm 2.)
“obtaining the set of trained models, each trained model of the set of trained models having the common feature extractor, each trained model of the set of trained models having been trained for a common prediction task during a supervised training phase on a labelled training dataset;” (Algorithm 1, pp. 2; This algorithm discloses a process of obtaining two models. The models, f1 and f2, are trained using the labeled dataset T. This process is also present in algorithm 2.) And (Model and Training procedure, pp. 4; “The experiments reported in this work were carried out using two popular semantic segmentation networks, U-Net [20] for the main study described in Sect. IV-C and FCN-ResNet50 [21] for generating the results submitted to the KITTI road benchmark reported in Sect. IV-D. Both neural networks were implemented and trained using the deep learning library PyTorch [22].” This discloses the use of the two classifier models used in algorithm 2. U-Net is a convolutional neural network (CNN) architecture that performs feature extraction on images.)
“obtaining the unlabelled training dataset;” (Algorithm 2, pp. 3; The algorithm parameters include a set of labeled examples T and a set of unlabeled examples U.)
“assigned training, for the common prediction task, the set of trained models by using the obtained unlabelled training dataset to thereby obtain the updated common feature extractor, the training comprising:” (Loss Function, pp. 2; “Let us now consider an unlabeled example (x_s, x_t) ∈ U. Both the student’s prediction f_s(x_s; θ_s) and the teacher’s prediction f_t(x_t; θ_t) represent probability distributions over the possible classes. Here, the teacher’s prediction is considered as the true distribution whereas the student’s prediction is regarded as an approximation. The maximization of their agreement can be then achieved by bringing the student’s prediction closer to the teacher’s. By considering that the Kullback-Leibler divergence, denoted as D_KL, is a natural measure of difference between probability distributions, the co-training loss is implemented as follows: [See equation (2)]” The method in this paper discloses a process of training the models using predictions generated from the unlabeled examples. The teacher and the student models both generate a prediction, and those predictions are compared. The goal of this is to produce more similar results.)
“generating, using the set of trained models, a set of predictions, the set of predictions comprising a respective prediction for each of the set of feature vectors; and” (Algorithm 1, pp. 2; This algorithm will initially train the two classifier models using labeled examples. After this, a random subset of the unlabeled data set is sampled. Each unlabeled image in the set U contains two views of the same image, x_1 and x_2. The models will take these views and produce predictions ỹ_1 and ỹ_2.)
“updating the common feature extractor, the updating comprising: maximizing, for each given trained model of the set of trained models, a mutual information between the set of feature vectors and the respective predictions generated by the set of trained models; and” (Loss Function, pp. 2; “The co-training algorithm tries to accomplish two different objectives: (1) error minimization on the manually labeled examples, and (2) agreement maximization on the unlabeled examples.” The method in this article uses a loss function to train the teacher and student models. This will attempt to minimize the loss between the predictions produced by the different models.) And (Loss Function, pp. 2; “Let us now consider an unlabeled example (x_s, x_t) ∈ U. Both the student’s prediction f_s(x_s; θ_s) and the teacher’s prediction f_t(x_t; θ_t) represent probability distributions over the possible classes. Here, the teacher’s prediction is considered as the true distribution whereas the student’s prediction is regarded as an approximation. The maximization of their agreement can be then achieved by bringing the student’s prediction closer to the teacher’s. By considering that the Kullback-Leibler divergence, denoted as D_KL, is a natural measure of difference between probability distributions, the co-training loss is implemented as follows: [See equation (2)]” This is the process used in the method: it evaluates the predictions by comparing the difference in the models’ outcomes.)
“providing, using the set of trained models and the updated common feature extractor, the final trained model.” (Main experiment, pp. 5; “The best performance overall was obtained when considering training sets with 144 labeled examples. In that case, the co-trained lidar-based road detector achieved 96.57% average F1-score, with an improvement of 1.04 percentage points over the supervised baseline. As can be noticed in Table I, another advantage of co-training was that the F1-score standard deviation decreased in all the considered cases. This result suggests that the co-training algorithm was able to compensate for the performance gap of data set splits where the domain of the training set and the domain of the validation set were quite different from each other thus resulting in low supervised F1-scores.” The method disclosed in the article was implemented and tested. The system was able to produce the classifier models and train them, and it was then used to perform experiments. Under the broadest reasonable interpretation, this indicates that the model in this system was fully trained and able to be used as a final trained model.)
Caltagirone fails to explicitly disclose, “generating, using the common feature extractor of the set of trained models, a set of feature vectors for at least a portion of the unlabelled training dataset;”.
However, Ronneberger discloses, “generating, using the common feature extractor of the set of trained models, a set of feature vectors for at least a portion of the unlabelled training dataset;” (Network Architecture, pp. 4; “The network architecture is illustrated in Figure 1. It consists of a contracting path (left side) and an expansive path (right side). The contracting path follows the typical architecture of a convolutional network. It consists of the repeated application of two 3x3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU) and a 2x2 max pooling operation with stride 2 for down sampling. At each down sampling step, we double the number of feature channels. Every step in the expansive path consists of an up sampling of the feature map followed by a 2x2 convolution (“up-convolution") that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions, each followed by a ReLU. The cropping is necessary due to the loss of border pixels in every convolution. At the final layer a 1x1 convolution is used to map each 64- component feature vector to the desired number of classes. In total the network has 23 convolutional layers.” The CNN model used in Caltagirone is called U-net. This model is a standard CNN model for feature extraction and image processing. This CNN model is used to evaluate the data in Caltagirone to produce an image classification for a given image.)
Claims 7-9 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Caltagirone and Ronneberger in view of Huang et al. (“Optimization of a Convolutional Neural Network Using a Hybrid Algorithm”, July 2019, hereinafter “Huang”).
Regarding claim 7, Caltagirone and Ronneberger fail to explicitly disclose the limitations of this claim; however, Huang discloses, “wherein said updating of at least the portion of each of the set of initial models to obtain the set of trained models further comprises: determining an average loss based on the set of losses; and” (Entropy Fitness Function, pp. 3; “The so-called learning means finding the loss function of the training data, as far as possible to find the parameters in order to reduce the value. Therefore, the loss function must be calculated for all training materials. In other words, in order to find the loss of the training data, the following formula must be used: [see equation (3)] Suppose there are N data, and t_nk represents the k-th of the n-th data (y_nk is the output of the neural network, t_nk is the training data). Expand the original loss function representing data into N data, and finally divide by N; Dividing by N can find each average loss function, and after the average, a unified index can be obtained, which is no longer affected by the amount of training data.” The method proposed to train the CNN in this article uses a cross-entropy error method. The article proposes that the loss over all the data is calculated using an average loss function. This is calculated and backpropagated to train the network.)
“backpropagating the average loss to the at least the portion of the set of initial models.” (Solution Coding Method, pp. 3; “The weight and the bias are placed in the matrix of the particles in order. When the iSSO is updated, the particles will be updated from front to back, also known as forward-looking. However, in the update of the BP algorithm, it is updated from the back to the front, which is called the BP. In this study, the two update methods will alternate.” This article discloses that the model will use backpropagation during the training phases. As stated above, the average loss is calculated and then backpropagated across the network from back to front.)
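For illustration, determining an average loss over a set of per-example losses and backpropagating it, as mapped above, can be sketched as follows. The function names and signatures are assumptions, and loss_fn is assumed to return one loss value per example:

```python
import torch

def train_step(model, inputs, targets, loss_fn, optimizer):
    """Average the per-example losses over N examples and backpropagate."""
    optimizer.zero_grad()
    outputs = model(inputs)
    per_example = loss_fn(outputs, targets)  # shape (N,): one loss per example
    avg_loss = per_example.mean()            # divide by N to obtain the average loss
    avg_loss.backward()                      # backpropagate from back to front
    optimizer.step()
    return avg_loss.item()
```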
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Caltagirone, Ronneberger and Huang. Caltagirone teaches a semi-supervised learning method that uses CNN classifiers, labeled data, and unlabeled data to evaluate images. Ronneberger teaches a CNN classifier model which is able to evaluate images. Huang teaches a CNN model which uses an averaging loss function to train the CNN model. One of ordinary skill would have been motivated to combine a system that trains and uses CNNs in a semi-supervised training environment with a CNN classifier able to process images: “In this study, five image datasets were validated. For the best classification accuracy, the highest accuracy of the five datasets was better than PSO-SGD. After a rigorous experimental verification, iSSO-SGD was found to be significantly better than other traditional algorithms, such as Adadelta, momentum and Rmsprop, which proved that the new hybrid algorithm can establish a better prediction model, and proved that it can effectively solve the BP algorithm’s problem of becoming trapped in local optima.” (Huang, Conclusion, pp. 8).
Regarding claim 8, Caltagirone discloses, “wherein the labelled training dataset is associated with a first type of domain representation of a set of objects; and” (Data Set, pp. 3; “The road data set contains 289 labeled examples. As for the previous data set, each example consists of a color image and a point cloud; however, in this case the road ground truth is also available. Figure 1 shows an example of a road scene and its corresponding road label. The road data set is split into three balanced broad categories, urban marked (UM), urban multiple marked (UMM), and urban unmarked (UU) according to the presence and number of marked lanes or lack thereof.” The datasets used contain driving sequences recorded with different sensors and cameras. The training data contains different images containing objects pertaining to roadways and driving.)
“wherein the unlabelled training dataset is associated with a second type of domain representation of at least a portion of the set of objects.” (Data Set, pp. 3; “This work makes use of two data sets, the KITTI raw data set [16] and the KITTI road data set [17]. The raw data set consists of many driving sequences recorded over several days in urban, rural, and highway roads in daytime and fair-weather conditions. The sensor setup used for recording the sequences included four cameras and a high-resolution lidar Velodyne HDL-64E. In this work, only one of the color cameras and the lidar are considered.” This model uses road data as disclosed above. This data has been captured using different types of devices and the data contains images of objects pertaining to roadways and driving. A portion of this data is labeled and is separate from the unlabeled data. The remaining data would be considered to be the unlabeled data.)
Regarding claim 9, Caltagirone discloses, “wherein the labelled training dataset comprises labelled images; and” (Data Set, pp. 3; “The road data set contains 289 labeled examples. As for the previous data set, each example consists of a color image and a point cloud; however, in this case the road ground truth is also available. Figure 1 shows an example of a road scene and its corresponding road label. The road data set is split into three balanced broad categories, urban marked (UM), urban multiple marked (UMM), and urban unmarked (UU) according to the presence and number of marked lanes or lack thereof.” The datasets used contain driving sequences recorded with different sensors and cameras. The training data contains different images containing objects pertaining to roadways and driving. As stated, a portion of this data is labeled and is in the set of labeled data T.)
“wherein the unlabelled training dataset comprises unlabelled images.” (Data Set, pp. 3; “This work makes use of two data sets, the KITTI raw data set [16] and the KITTI road data set [17]. The raw data set consists of many driving sequences recorded over several days in urban, rural, and highway roads in daytime and fair-weather conditions. The sensor setup used for recording the sequences included four cameras and a high-resolution lidar Velodyne HDL-64E. In this work, only one of the color cameras and the lidar are considered.” This model uses road data as disclosed above. This data has been captured using different types of devices, and the data contains images of objects pertaining to roadways and driving. A portion of this data is labeled and is separate from the unlabeled data; the remaining data forms the unlabeled set U in algorithm 2.)
Regarding claim 21, Caltagirone and Ronneberger fail to explicitly disclose the limitations of this claim, however, Huang discloses, “wherein said updating of at least the portion of each of the set of initial models to obtain the set of trained models further comprises: determining an average loss based on the set of losses; and” (Entropy Fitness Function, pp. 3; “The so-called learning means finding the loss function of the training data, as far as possible to find the parameters in order to reduce the value. Therefore, the loss function must be calculated for all training materials. In other words, in order to find the loss of the training data, the following formula must be used: [see equation (3)] Suppose there are N data, and t_nk represents the k-th of the n-th data (y_nk is the output of the neural network, t_nk is the training data). Expand the original loss function representing data into N data, and finally divide by N; Dividing by N can find each average loss function, and after the average, a unified index can be obtained, which is no longer affected by the amount of training data.” The method proposed in this article trains the CNN using a cross-entropy error method. Huang proposes that the loss over all the data is calculated using an average loss function, which is then backpropagated to train the network.)
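For illustration, the following is a minimal NumPy sketch of the averaged cross-entropy error described in the passage above; the function name and the exact form of equation (3) are assumptions inferred from the surrounding text (sum the per-example losses over the N data, then divide by N):

```python
import numpy as np

def average_cross_entropy(y, t, eps=1e-12):
    """Presumed form of equation (3): E = -(1/N) * sum_n sum_k t_nk * log(y_nk).

    y: (N, K) array of network outputs (probabilities)
    t: (N, K) array of one-hot training targets
    """
    n = y.shape[0]
    # Dividing by N yields a unified index that is no longer affected
    # by the amount of training data.
    return -np.sum(t * np.log(y + eps)) / n
```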
“backpropagating the average loss to the at least the portion of the set of initial models.” (Solution Coding Method, pp. 3; “The weight and the bias are placed in the matrix of the particles in order. When the iSSO is updated, the particles will be updated from front to back, also known as forward-looking. However, in the update of the BP algorithm, it is updated from the back to the front, which is called the BP. In this study, the two update methods will alternate.” This article discloses that the model will use backpropagation during the training phases. As stated above, the average loss is calculated and then backpropagated through the network from back to front.)
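A minimal sketch of backpropagating such an average loss through a network follows, assuming a PyTorch-style training step; the toy model, input shapes, and hyperparameters are illustrative and are not drawn from Huang:

```python
import torch
import torch.nn as nn

# Toy stand-in network; not Huang's architecture.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss(reduction="mean")  # averages the loss over N examples

inputs = torch.randn(8, 32)            # batch of N=8 examples
targets = torch.randint(0, 10, (8,))   # class labels

optimizer.zero_grad()
loss = criterion(model(inputs), targets)  # average loss over the batch
loss.backward()    # backpropagated through the network from back to front
optimizer.step()
```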
Claims 11 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Caltagirone and Ronneberger in view of Zhou et al. (Zhou et al., “A Radiomics Approach With CNN for Shear-Wave Elastography Breast Tumor Classification”, Sep. 2018, hereinafter “Zhou”).
Regarding claim 11, Caltagirone and Ronneberger fail to explicitly disclose the limitations of this claim, however, Zhou discloses, “further comprising using the updated common feature extractor to extract features from data in a radiomics process.” (Introduction, pp. 1936; “We built a CNN based radiomics classification model for differentiating benign and malignant breast tumors with SWE data. To the best of our knowledge, it is the first attempt that makes use of radiomics based on CNN for automatically extracting high-throughput features from SWE to classify the malignant and benign breast tumors. Our method can automatically extract a large number of features from recoded SWE image dataset. These meaningful features include not only small structures/edges but also high-level and high-abstract information that allow us to get a good classification accuracy.” The model proposed in this article uses current CNN architectures and radiomics. This model will be able to process input images and classify them. This CNN model can be ported into Caltagirone as one of the classifier models used.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Caltagirone, Ronneberger and Zhou. Caltagirone teaches a semi-supervised learning method that uses CNN classifiers, labeled data, and unlabeled data to evaluate images. Ronneberger teaches a CNN classifier model which is able to evaluate images. Zhou teaches a CNN model which is able to extract data from medical images using radiomic concepts. One of ordinary skill would have been motivated to combine a system which is able to train and use CNNs in a semi-supervised training environment, a CNN classifier able to process images, and a radiomics-based CNN able to extract features from medical images: “In order to further evaluate the proposed method, we compared it to two neural network-based methods, namely, the single layer neural network with PGBM and the two layers deep learning architectures PGBM [17], [18] and RBM. For the final classification, we adopted the support vector machine (SVM), the k-nearest neighbor (KNN) [19], and extreme learning machine (ELM) [20] classifier. The parameters in SVM were optimized by using the grid search method [21], and those in KNN and ELM were empirically set to yield their best performances. Table V shows the accuracy of different methods, we can find that the ELM classifier performance is better than SVM and KNN. And the two layers deep learning architecture (PGBM-RBM) performance is better than single layer neural network performance. Compared with our proposed method, we can easily find that our proposed method has an overwhelming advantage over other methods to distinguish the benign and malignant of breast tumors.” (Zhou, Comparison with State-of-the-art Methods, pp. 1940).
Regarding claim 25, Caltagirone and Ronneberger fail to explicitly disclose the limitations of this claim, however, Zhou discloses, “wherein the processor is further configured for using the updated common feature extractor to extract features from data in a radiomics process.” (Introduction, pp. 1936; “We built a CNN based radiomics classification model for differentiating benign and malignant breast tumors with SWE data. To the best of our knowledge, it is the first attempt that makes use of radiomics based on CNN for automatically extracting high-throughput features from SWE to classify the malignant and benign breast tumors. Our method can automatically extract a large number of features from recoded SWE image dataset. These meaningful features include not only small structures/edges but also high-level and high-abstract information that allow us to get a good classification accuracy.” The model proposed in this article uses current CNN architectures and radiomics. This model will be able to process input images and classify them. This CNN model can be ported into Caltagirone as one of the classifier models used.)
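For illustration, a minimal sketch of using a trained common feature extractor to extract features from an image in a radiomics-style pipeline; the layer sizes and input shape are assumptions for this example and do not reflect Zhou's actual SWE network:

```python
import torch
import torch.nn as nn

# Hypothetical common feature extractor; a stand-in, not Zhou's model.
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

image = torch.randn(1, 3, 224, 224)  # one input image (illustrative shape)
with torch.no_grad():
    features = feature_extractor(image)  # extracted feature vector
print(features.shape)  # torch.Size([1, 32])
```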
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL MICHAEL GALVIN-SIEBENALER whose telephone number is (571)272-1257. The examiner can normally be reached Monday - Friday 8AM to 5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo, can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PAUL M GALVIN-SIEBENALER/Examiner, Art Unit 2147
/VIKER A LAMARDO/Supervisory Patent Examiner, Art Unit 2147