DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
In regard to claim 6:
Applicant is advised that should claim 2 be found allowable, claim 6 will be objected to under 37 CFR 1.75 as being a substantial duplicate thereof. When two claims in an application are duplicates or else are so close in content that they both cover the same thing, despite a slight difference in wording, it is proper after allowing one claim to object to the other as being a substantial duplicate of the allowed claim. See MPEP § 608.01(m).
Claim Rejections - 35 USC § 112
Regarding 112(b):
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-9 and 14-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
In regard to Claim 1:
Claim 1 recites the limitation "and the one or more neural networks are trained using data for each prototype domain". There is insufficient antecedent basis for this limitation in the claim. There is no prior recitation of prototype domains within the claim to understand what “each prototype domain” is referring to.
In regard to Claim 7:
Claim 7 recites the limitation "wherein the one or more neural networks are derived based on a primitive neural network trained on the prototype domain ". There is insufficient antecedent basis for this limitation in the claim. There is no prior recitation of a prototype domain within the claims to understand what “the prototype domain” is referring to.
In regard to claim 14:
Claim 14 is analogous to claim 1 and thus contains the same 112(b) issues as claim 1.
In regard to claim 20:
Claim 20 is analogous to claim 7 and thus contains the same 112(b) issues as claim 7.
In regard to the dependent claims:
Each claim depending from a claim rejected under 112(b) is likewise rejected by virtue of its dependency on an indefinite claim.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
In regard to Claim 1:
Step 1: Is the claim directed to a process, machine, manufacture, or composition of matter?
Yes, the claim is directed to an apparatus, which is a machine.
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim recites an abstract idea.
Claim 1 recites the following abstract ideas:
determine a weight to be applied to one or more neural networks based on input data
This limitation is directed to the abstract idea of a mental process, that is, a concept performed in the human mind, including observation, evaluation, judgment, or opinion (see MPEP 2106.04(a)(2), subsection 3). Here the limitation is seen as evaluation.
Step 2A Prong 2: Does the claim recite additional elements that integrate the exception into a practical application of the exception?
No, the claim does not recite any additional elements that would integrate the abstract idea into a practical application.
Claim 1 recites the following additional elements:
An apparatus for constructing a domain adaptive network, the apparatus comprising: a memory configured to store data; and a processor configured to control the memory, wherein the processor is configured to
At a high level of generality, this is an activity of using a memory and a processor as an “apply it” use (see MPEP 2106.05(f)).
construct a final neural network by applying the weight to the one or more neural networks
At a high level of generality, this is an activity of applying a weight to neural networks as an “apply it” use (see MPEP 2106.05(f)).
output result data of the input data using the final neural network
This limitation is directed to the insignificant extra-solution activity of mere data outputting (see MPEP 2106.05(g), Consideration 3).
the one or more neural networks are trained using data for each prototype domain
At a high level of generality, this is an activity of using data as an “apply it” use (see MPEP 2106.05(f)).
Step 2B: Does the claim as a whole amount to significantly more than the judicial exception?
No, the claim as a whole does not amount to significantly more than the judicial exception. All elements of the claim, viewed individually or holistically, do not provide an inventive concept or otherwise significantly more than the abstract idea itself.
Claim 1 recites the following additional elements:
An apparatus for constructing a domain adaptive network, the apparatus comprising: a memory configured to store data; and a processor configured to control the memory, wherein the processor is configured to
At a high level of generality, this is an activity of using a memory and a processor as an “apply it” use (see MPEP 2106.05(f)). At said high level of generality, the memory and processor appear to be an implementation of the abstract idea on a computer, so merely using a computer as a tool to perform the abstract idea.
construct a final neural network by applying the weight to the one or more neural networks
At a high level of generality, this is an activity of applying a weight to neural networks as an “apply it” use (see MPEP 2106.05(f)). At said high level of generality, a generic recitation of “applying the weight” to one or more neural networks to generically/non-specifically construct a final neural network does not incorporate the abstract idea into a practical application and is seen as a variation of the phrase “apply it”.
A possible way to satisfy 101 could be to add more details that prevent the limitation from being read as a generic recitation. Adding details, such as how the construction is performed, could help satisfy the requirements under 101.
output result data of the input data using the final neural network
This limitation is directed to the insignificant extra-solution activity of mere data outputting (see MPEP 2106.05(g), Consideration 3). This is a well-understood, routine, conventional activity of presenting or transmitting data (see MPEP 2106.05(d)(2), examples iv (presenting offers and gathering statistics) and i (receiving or transmitting data over a network)).
the one or more neural networks are trained using data for each prototype domain
At a high level of generality, this is an activity of using data as an “apply it” use (see MPEP 2106.05(f)). At said high level of generality, a generic recitation of “trained using data” does not incorporate the abstract idea into a practical application and is seen as a variation of the phrase “apply it”.
In regard to Claim 2:
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim recites an abstract idea.
Claim 2 recites the following abstract ideas:
wherein the input data is data associated with one or more prototype domains
This limitation is directed to the continuation of an abstract idea of a mental process, that is, a concept performed in the human mind, including observation, evaluation, judgment, or opinion (see MPEP 2106.04(a)(2), subsection 3). Here the limitation is seen as a continuation of the evaluation in claim 1. Indicating that the input data is related to prototype domains does not integrate the invention into a practical application.
In regard to Claim 3:
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim recites an abstract idea.
Claim 3 recites the following abstract ideas:
wherein the one or more neural networks all have the same structure
This limitation is directed to the continuation of an abstract idea of a mental process, that is, a concept performed in the human mind, including observation, evaluation, judgment, or opinion (see MPEP 2106.04(a)(2), subsection 3). Here the limitation is seen as a continuation of the evaluation in claim 1. Indicating that the neural networks have the same structure does not integrate the invention into a practical application.
In regard to Claim 4:
Step 2A Prong 2: Does the claim recite additional elements that integrate the exception into a practical application of the exception?
No, the claim does not recite any additional elements that would integrate the abstract idea into a practical application.
Claim 4 recites the following additional elements:
wherein the one or more neural networks are stored in a neural network pool
At a high level of generality, this is an insignificant extra-solution activity of mere data gathering (see MPEP 2106.05(g)).
the neural network pool is compressed through a singular vector decomposition (SVD) technique
At a high level of generality, this is an activity of using SVD as an “apply it” use (see MPEP 2106.05(f)).
Step 2B: Does the claim as a whole amount to significantly more than the judicial exception?
No, the claim as a whole does not amount to significantly more than the judicial exception. All elements of the claim, viewed individually or holistically, do not provide an inventive concept or otherwise significantly more than the abstract idea itself.
Claim 4 recites the following additional elements:
wherein the one or more neural networks are stored in a neural network pool
At a high level of generality, this is an insignificant extra-solution activity of mere data gathering (see MPEP 2106.05(g)). Storing data for later use does not add a meaningful limitation. At a high level of generality this is a well-understood, routine, conventional activity (see MPEP 2106.05(d), example iv, for a computer). Storing data in memory is a well-understood, routine, conventional activity in the field of computers and computer science.
the neural network pool is compressed through a singular vector decomposition (SVD) technique
At a high level of generality, this is an activity of using SVD as an “apply it” use (see MPEP 2106.05(f)). At said high level of generality, a generic recitation of “compressed through” using SVD does not incorporate the abstract idea into a practical application and is seen as a variation of the phrase “apply it”.
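As technical background only (not a characterization of Applicant's disclosure; all names and values are hypothetical), the kind of SVD-based compression the claim recites can be sketched as replacing a stored weight matrix with a low-rank approximation built from its dominant singular component, here found by power iteration rather than a full SVD routine:

```python
import math

def rank1_approx(A, iters=100):
    """Approximate matrix A by its dominant singular component s * u * v^T."""
    m, n = len(A), len(A[0])
    v = [1.0] * n  # initial guess for the top right singular vector
    for _ in range(iters):
        # Power iteration on A^T A: v <- normalize(A^T (A v)).
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    s = math.sqrt(sum(x * x for x in Av))  # top singular value
    u = [x / s for x in Av]                # top left singular vector
    return [[s * u[i] * v[j] for j in range(n)] for i in range(m)]

# "Compressing" a stored weight matrix by keeping one singular component.
W = [[3.0, 0.0], [0.0, 1.0]]
W1 = rank1_approx(W)
```

A practical system would use a linear-algebra library's SVD and keep several components; the sketch only illustrates the general technique.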
In regard to Claim 5:
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim recites an abstract idea.
Claim 5 recites the following abstract ideas:
wherein the weight is in the form of a vector
This limitation is directed to the continuation of an abstract idea of a mental process, that is, a concept performed in the human mind, including observation, evaluation, judgment, or opinion (see MPEP 2106.04(a)(2), subsection 3). Here the limitation is seen as a continuation of the evaluation in claim 1. Indicating that the weight is in the form of a vector does not integrate the invention into a practical application.
In regard to Claim 6:
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim recites an abstract idea.
Claim 6 recites the same abstract ideas as analogous claim 2.
In regard to Claim 7:
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim recites an abstract idea.
Claim 7 recites the following abstract ideas:
wherein the one or more neural networks are derived based on a primitive neural network trained on the prototype domain
This limitation is directed to the continuation of an abstract idea of a mental process, that is, a concept performed in the human mind, including observation, evaluation, judgment, or opinion (see MPEP 2106.04(a)(2), subsection 3). Here the limitation is seen as a continuation of the evaluation in claim 1. Indicating that the one or more neural networks are based on a primitive neural network does not change the determination of a weight and does not integrate the invention into a practical application.
In regard to Claim 8:
Step 2A Prong 2: Does the claim recite additional elements that integrate the exception into a practical application of the exception?
No, the claim does not recite any additional elements that would integrate the abstract idea into a practical application.
Claim 8 recites the following additional elements:
wherein the primitive neural network is trained through supervised learning or representation learning
At a high level of generality, this is an activity of using supervised learning or representation learning as an “apply it” use (see MPEP 2106.05(f)).
Step 2B: Does the claim as a whole amount to significantly more than the judicial exception?
No, the claim as a whole does not amount to significantly more than the judicial exception. All elements of the claim, viewed individually or holistically, do not provide an inventive concept or otherwise significantly more than the abstract idea itself.
Claim 8 recites the following additional elements:
wherein the primitive neural network is trained through supervised learning or representation learning
At a high level of generality, this is an activity of using supervised learning or representation learning as an “apply it” use (see MPEP 2106.05(f)). At said high level of generality, a generic recitation of “trained through supervised learning or representation learning” does not incorporate the abstract idea into a practical application and is seen as a variation of the phrase “apply it”.
In regard to Claim 9:
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim recites an abstract idea.
Claim 9 recites the following abstract ideas:
wherein the weight is derived based on a multilayer neural network
This limitation is directed to the continuation of an abstract idea of a mental process, that is, a concept performed in the human mind, including observation, evaluation, judgment, or opinion (see MPEP 2106.04(a)(2), subsection 3). Here the limitation is seen as a continuation of the evaluation in claim 1. Indicating that the weight is based on a multilayer neural network does not change the determination of a weight and does not integrate the invention into a practical application.
Step 2A Prong 2: Does the claim recite additional elements that integrate the exception into a practical application of the exception?
No, the claim does not recite any additional elements that would integrate the abstract idea into a practical application.
Claim 9 recites the following additional elements:
and the multilayer neural network is trained based on a weighted sum of results of the one or more neural networks
At a high level of generality, this is an activity of using a weighted sum as an “apply it” use (see MPEP 2106.05(f)).
Step 2B: Does the claim as a whole amount to significantly more than the judicial exception?
No, the claim as a whole does not amount to significantly more than the judicial exception. All elements of the claim, viewed individually or holistically, do not provide an inventive concept or otherwise significantly more than the abstract idea itself.
Claim 9 recites the following additional elements:
and the multilayer neural network is trained based on a weighted sum of results of the one or more neural networks
At a high level of generality, this is an activity of using a weighted sum as an “apply it” use (see MPEP 2106.05(f)). At said high level of generality, a generic recitation of the “multilayer neural network is trained” using a weighted sum does not incorporate the abstract idea into a practical application and is seen as a variation of the phrase “apply it”.
In regard to Claim 10:
Step 1: Is the claim directed to a process, machine, manufacture, or composition of matter?
Yes, the claim is directed to an apparatus, which is a machine.
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim recites an abstract idea.
Claim 10 recites the following abstract ideas:
and perform multilayer neural network learning to determine a weight to be applied to one or more neural networks using the collected learning data, and the weight is derived to combine result values of the one or more neural networks
This limitation is directed to the abstract idea of a mental process, that is, a concept performed in the human mind, including observation, evaluation, judgment, or opinion (see MPEP 2106.04(a)(2), subsection 3). Here the limitation is seen as evaluation.
The use of a neural network does not preclude interpretation as an abstract idea (see MPEP 2106.04(a)(2), subsection 3), and the determination of a weight for the later purpose of combining result values is performable within the human mind or with pen and paper.
Step 2A Prong 2: Does the claim recite additional elements that integrate the exception into a practical application of the exception?
No, the claim does not recite any additional elements that would integrate the abstract idea into a practical application.
Claim 10 recites the following additional elements:
An apparatus for constructing a domain adaptive network, the apparatus comprising: a memory configured to store data; and a processor configured to control the memory, wherein the processor is configured to
At a high level of generality, this is an activity of using a processor and memory as an “apply it” use (see MPEP 2106.05(f)).
collect learning data associated with one or more prototype domains
This limitation is directed to the insignificant extra-solution activity of mere data gathering (see MPEP § 2106.05(g)).
Step 2B: Does the claim as a whole amount to significantly more than the judicial exception?
No, the claim as a whole does not amount to significantly more than the judicial exception. All elements of the claim, viewed individually or holistically, do not provide an inventive concept or otherwise significantly more than the abstract idea itself.
Claim 10 recites the following additional elements:
An apparatus for constructing a domain adaptive network, the apparatus comprising: a memory configured to store data; and a processor configured to control the memory, wherein the processor is configured to
At a high level of generality, this is an activity of using a processor and memory as an “apply it” use (see MPEP 2106.05(f)). At said high level of generality, a processor and memory appears to be an implementation of the abstract idea on a computer, so merely using a computer as a tool to perform the abstract idea.
collect learning data associated with one or more prototype domains
This limitation is directed to the insignificant extra-solution activity of mere data gathering (see MPEP § 2106.05(g)). This is a well-understood, routine, conventional activity of transmitting data (see MPEP 2106.05(d), example i of computer functions).
In regard to Claim 11:
Step 2A Prong 2: Does the claim recite additional elements that integrate the exception into a practical application of the exception?
No, the claim does not recite any additional elements that would integrate the abstract idea into a practical application.
Claim 11 recites the following additional elements:
wherein the multilayer neural network learning is based on a weighted sum of the one or more neural networks and a cross entropy loss function of GT-Label
At a high level of generality, this is an activity of using a weighted sum and a cross entropy loss function as an “apply it” use (see MPEP 2106.05(f)).
Step 2B: Does the claim as a whole amount to significantly more than the judicial exception?
No, the claim as a whole does not amount to significantly more than the judicial exception. All elements of the claim, viewed individually or holistically, do not provide an inventive concept or otherwise significantly more than the abstract idea itself.
Claim 11 recites the following additional elements:
wherein the multilayer neural network learning is based on a weighted sum of the one or more neural networks and a cross entropy loss function of GT-Label
At a high level of generality, this is an activity of using a weighted sum and a cross entropy loss function as an “apply it” use (see MPEP 2106.05(f)). At said high level of generality, a generic recitation of “multilayer neural network learning” based on a weighted sum and a cross entropy loss function does not incorporate the abstract idea into a practical application and is seen as a variation of the phrase “apply it” or generic training.
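For illustration only (not a characterization of Applicant's method; all names and values are hypothetical), training combination weights with a weighted sum of network outputs and a cross-entropy loss against a ground-truth label, at the level of generality recited, can be sketched as:

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

# Fixed class scores from two component networks for one training example.
net_outputs = [[2.0, 0.0], [0.0, 2.0]]
gt_label = 0      # ground-truth (GT-Label) class index
w = [0.5, 0.5]    # combination weights being learned
lr = 0.1

for _ in range(50):
    # Weighted sum of the component-network outputs, per class.
    combined = [sum(w[k] * net_outputs[k][c] for k in range(2)) for c in range(2)]
    p = softmax(combined)
    # Gradient of cross-entropy w.r.t. the combined scores: p - onehot(gt).
    grad_c = [p[c] - (1.0 if c == gt_label else 0.0) for c in range(2)]
    # Chain rule back to the combination weights.
    grad_w = [sum(grad_c[c] * net_outputs[k][c] for c in range(2)) for k in range(2)]
    w = [w[k] - lr * grad_w[k] for k in range(2)]
```

After training, the weight on the network that favors the ground-truth class dominates, which is the intended effect of this training signal.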
In regard to Claim 12:
Step 2A Prong 2: Does the claim recite additional elements that integrate the exception into a practical application of the exception?
No, the claim does not recite any additional elements that would integrate the abstract idea into a practical application.
Claim 12 recites the following additional elements:
wherein the multilayer neural network learning is performed based on a knowledge distillation method
At a high level of generality, this is an activity of using knowledge distillation as an “apply it” use (see MPEP 2106.05(f)).
Step 2B: Does the claim as a whole amount to significantly more than the judicial exception?
No, the claim as a whole does not amount to significantly more than the judicial exception. All elements of the claim, viewed individually or holistically, do not provide an inventive concept or otherwise significantly more than the abstract idea itself.
Claim 12 recites the following additional elements:
wherein the multilayer neural network learning is performed based on a knowledge distillation method
At a high level of generality, this is an activity of using knowledge distillation as an “apply it” use (see MPEP 2106.05(f)). At said high level of generality, a generic recitation of “learning is performed” based on knowledge distillation does not incorporate the abstract idea into a practical application and is seen as a variation of the phrase “apply it”.
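As technical background only (not a characterization of Applicant's method; names and values are hypothetical), the core of a generic knowledge-distillation objective, in which a student is trained to match a teacher's temperature-softened output distribution, can be sketched as:

```python
import math

def softened(z, T):
    """Temperature-softened softmax distribution over logits z."""
    m = max(z)
    e = [math.exp((x - m) / T) for x in z]
    s = sum(e)
    return [x / s for x in e]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of student soft outputs against teacher soft targets."""
    p_teacher = softened(teacher_logits, T)
    p_student = softened(student_logits, T)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

# The loss is lower when the student reproduces the teacher's behavior.
loss_match = distillation_loss([3.0, 1.0], [3.0, 1.0])
loss_mismatch = distillation_loss([1.0, 3.0], [3.0, 1.0])
```

Minimizing this loss over student parameters is what "learning is performed based on a knowledge distillation method" amounts to at this level of generality.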
In regard to Claim 13:
Step 2A Prong 2: Does the claim recite additional elements that integrate the exception into a practical application of the exception?
No, the claim does not recite any additional elements that would integrate the abstract idea into a practical application.
Claim 13 recites the following additional elements:
wherein the learning data is generated by a mixup method of adjusting a ratio of data to the prototype domain
This limitation is directed to the continuation of the insignificant extra-solution activity of mere data gathering (see MPEP § 2106.05(g)).
Step 2B: Does the claim as a whole amount to significantly more than the judicial exception?
No, the claim as a whole does not amount to significantly more than the judicial exception. All elements of the claim, viewed individually or holistically, do not provide an inventive concept or otherwise significantly more than the abstract idea itself.
Claim 13 recites the following additional elements:
wherein the learning data is generated by a mixup method of adjusting a ratio of data to the prototype domain
This limitation is directed to the continuation of the insignificant extra-solution activity of mere data gathering (see MPEP § 2106.05(g)). Thus it is seen as a continuation of the well-understood, routine, conventional activity of transmitting data (see MPEP 2106.05(d), example i of computer functions) from the mere-data-gathering limitation in claim 10. Noting that the data come from a mixup method does not integrate the invention into a practical application.
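As technical background only (not a characterization of Applicant's method; all values are hypothetical), mixup-style data generation blends two examples at an adjustable ratio, consistent with the claim's "adjusting a ratio" language:

```python
def mixup(x_a, y_a, x_b, y_b, lam):
    """Convex combination of two (features, label) pairs at ratio lam."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y

# Blend a sample with a prototype-domain sample at a 70/30 ratio.
x, y = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.7)
# x and y are approximately [0.7, 0.3]
```

Varying lam (often drawn from a Beta distribution in the literature) adjusts how much prototype-domain data contributes to each generated training example.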
In regard to claim 14:
Step 1: Is the claim directed to a process, machine, manufacture, or composition of matter?
Yes, the claim is directed to a process.
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim recites an abstract idea.
Claim 14 recites the same abstract ideas as recited in analogous claim 1.
Step 2A Prong 2: Does the claim recite additional elements that integrate the exception into a practical application of the exception?
No, the claim does not recite any additional elements that would integrate the abstract idea into a practical application.
Claim 14 recites the same additional elements as recited in analogous claim 1.
Step 2B: Does the claim as a whole amount to significantly more than the judicial exception?
No, the claim as a whole does not amount to significantly more than the judicial exception. All elements of the claim, viewed individually or holistically, do not provide an inventive concept or otherwise significantly more than the abstract idea itself.
Claim 14 recites the same additional elements as recited in analogous claim 1.
In regard to claim 15:
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim recites an abstract idea.
Claim 15 recites the same abstract ideas as recited in analogous claim 2.
In regard to claim 16:
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim recites an abstract idea.
Claim 16 recites the same abstract ideas as recited in analogous claim 3.
In regard to claim 17:
Step 2A Prong 2: Does the claim recite additional elements that integrate the exception into a practical application of the exception?
No, the claim does not recite any additional elements that would integrate the abstract idea into a practical application.
Claim 17 recites the same additional elements as recited in analogous claim 4.
Step 2B: Does the claim as a whole amount to significantly more than the judicial exception?
No, the claim as a whole does not amount to significantly more than the judicial exception. All elements of the claim, viewed individually or holistically, do not provide an inventive concept or otherwise significantly more than the abstract idea itself.
Claim 17 recites the same additional elements as recited in analogous claim 4.
In regard to claim 18:
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim recites an abstract idea.
Claim 18 recites the same abstract ideas as recited in analogous claim 5.
In regard to claim 19:
Step 2A Prong 2: Does the claim recite additional elements that integrate the exception into a practical application of the exception?
No, the claim does not recite any additional elements that would integrate the abstract idea into a practical application.
Claim 19 recites the following additional elements:
wherein the final neural network is derived based on a linear combination of parameters of the one or more neural networks using the weight
At a high level of generality, this is an activity of applying a weight to parameters as an “apply it” use (see MPEP 2106.05(f)).
Step 2B: Does the claim as a whole amount to significantly more than the judicial exception?
No, the claim as a whole does not amount to significantly more than the judicial exception. All elements of the claim, viewed individually or holistically, do not provide an inventive concept or otherwise significantly more than the abstract idea itself.
Claim 19 recites the following additional elements:
wherein the final neural network is derived based on a linear combination of parameters of the one or more neural networks using the weight
At a high level of generality, this is an activity of applying a weight to parameters as an “apply it” use (see MPEP 2106.05(f)). At said high level of generality, a generic recitation of “derived based on a linear combination of parameters” of the one or more neural networks using the weight does not incorporate the abstract idea into a practical application and is seen as a variation of the phrase “apply it”.
A possible way to satisfy 101 could be to add more details that prevent the limitation from being read as a generic recitation. Adding details, such as how the construction is performed, could help satisfy the requirements under 101. The claim limitation invokes more elements of what might be the construction of the final neural network, but how the weight is used in the linear combination is recited generically; thus how the linear combination is performed is generic. The weight being determined in a broad manner also prevents an indication of how the weight is supposed to be used. Details on what the weight is (what it is a weight of or for) could help prevent a generic recitation.
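As technical background only (not a characterization of Applicant's method; all names and values are hypothetical), a generic linear combination of network parameters using a weight vector, as distinct from combining network outputs, can be sketched as:

```python
def combine_parameters(param_sets, weights):
    """Per-parameter weighted sum across networks: theta = sum_k w_k * theta_k."""
    n = len(param_sets[0])
    return [sum(w * params[i] for w, params in zip(weights, param_sets))
            for i in range(n)]

# Two networks' flattened parameters and a weight vector.
theta_a = [1.0, 2.0, 3.0]
theta_b = [3.0, 2.0, 1.0]
final = combine_parameters([theta_a, theta_b], [0.25, 0.75])
# final == [2.5, 2.0, 1.5]
```

At this level of generality the weight's role is simply a coefficient in the sum; the examiner's point above is that the claim does not specify anything beyond that.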
In regard to claim 20:
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim recites an abstract idea.
Claim 20 recites the same abstract ideas as recited in analogous claim 7.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-9 and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hashem (“Optimal Linear Combinations of Neural Networks”), referred to as Hashem in this document, in view of Izmailov et al. (“Averaging Weights Leads to Wider Optima and Better Generalization”), referred to as Izmailov in this document, further in view of Samdani et al. (“Domain Adaptation with Ensemble of Feature Groups”), referred to as Samdani in this document, and further in view of Yeo et al. (US 2022/0036152 A1), referred to as Yeo in this document.
Regarding Claim 1:
Hashem teaches:
determine a weight to be applied to one or more neural networks based on input data,
[Hashem 3. Linear Combinations of Neural Networks page 3]: “Consider a multi-input single-output mapping approximated by a trained NN. A trained NN accepts a vector-valued input x and returns a scalar output (response) y(x) [determine a weight to be applied to one or more neural networks based on input data]. The approximation error is δ(x) = r(x) − y(x), where r(x) is the response of the real system (true response) for x.”
construct a final neural network by applying the weight to the one or more neural networks,
[Hashem Introduction page 1]: “Hashem and Schmeiser (1995) proposed forming a linear combination of the corresponding outputs of the trained NNs, instead of just using the apparent best network. Combining the trained networks may help integrate the knowledge acquired by the component networks and often produces superior model accuracy compared to the single best-trained network (Hashem, 1993; Hashem and Schmeiser, 1993; Hashem et al. (1993), Hashem et al. (1994)). Optimal linear combinations (OLCs) of neural networks are constructed by forming weighted sums [by applying the weight to the one or more neural networks] of the corresponding outputs of the networks.”
and output result data of the input data using the final neural network,
[Hashem 2 Related Work page 3]: “For a given input, x, the output [and output result data of the input data using the final neural network] of the combined model y, is the weighted sum of the corresponding outputs of the component NNs, yj, j = 1,...,p, and the aj’s are the associated combination-weights.”
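For illustration only (not part of the record), a minimal sketch of the weighted combination Hashem describes; the component outputs are stand-in numbers, not the outputs of trained neural networks:

```python
# Illustrative sketch only: the linear combination y(x; a) = sum_j a_j * y_j(x)
# that Hashem describes. The component outputs are stand-in numbers, not the
# outputs of trained neural networks.
def combined_output(component_outputs, weights):
    """Weighted sum of the corresponding outputs of the component models."""
    return sum(a * y for a, y in zip(weights, component_outputs))

# Three component outputs combined with combination-weights a_j.
outputs = [0.9, 1.1, 1.0]
weights = [0.5, 0.3, 0.2]
y_combined = combined_output(outputs, weights)  # approximately 0.98
```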
Hashem does not explicitly teach:
An apparatus for constructing a domain adaptive network, the apparatus comprising: a memory configured to store data; and a processor configured to control the memory, wherein the processor is configured to
construct a final neural network (Hashem notes a combined model [Hashem 2 Related Work page 3] which appears to fit the limitation under BRI, but another reference is used in hopes of progressing prosecution faster)
and the one or more neural networks are trained using data for each prototype domain
Yeo teaches:
An apparatus for constructing a domain adaptive network, the apparatus comprising: a memory configured to store data; and a processor configured to control the memory, wherein the processor is configured to
[Yeo 0164]: “The loaded dedicated artificial intelligence model may be a compressed artificial intelligence model having a smaller size than the generic-purpose artificial intelligence model stored in the memory 110. As described above, the processor 120 may load a dedicated artificial intelligence model having a small size and may perform an operation, and thus an operation amount for the target data can be reduced, and the processing speed can be improved, and the resources (e.g., the memory [An apparatus for constructing a domain adaptive network, the apparatus comprising: a memory configured to store data], the CPU [and a processor configured to control the memory, wherein the processor is configured to], the GPU, etc.) of the electronic apparatus 100 are not wasted, and thus usefulness of the resources can be enhanced.”
One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Hashem and Yeo. Hashem and Yeo are in the same field of endeavor of machine learning. One of ordinary skill in the art would have been motivated to combine Hashem and Yeo in order to be able to create a physical embodiment of the invention or to give the invention physical form ([Yeo 0164]: “the resources (e.g., the memory, the CPU, the GPU, etc.) of the electronic apparatus 100 are not wasted, and thus usefulness of the resources can be enhanced”).
Izmailov teaches:
construct a final neural network
[Izmailov 3.5 Connection To Ensembling page 7]: “In SWA instead of averaging the predictions of the models we average their weights [construct a final neural network]. However, the predictions proposed by FGE ensembles and SWA models have similar properties.”
Support for this interpretation of combining the weights/parameters of models is found at [Current Application page 4 line 18]: “In addition, the final neural network may be derived based on a linear combination of parameters of the one or more neural networks using the weight.” Izmailov notes that ensembling the outputs/predictions and averaging the weights have similar properties, thus supporting the motivation for the combination.
One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Hashem and Izmailov. Hashem and Izmailov are in the same field of endeavor of machine learning. One of ordinary skill in the art would have been motivated to combine Hashem and Izmailov in order to be able to create a better performing model by combining weights instead of predictions ([Izmailov 4.1 Cifar Datasets page 9]: “Amazingly, SWA is able to achieve comparable or better performance than FGE ensembles with just one model.”).
Samdani teaches:
and the one or more neural networks are trained using data for each prototype domain
[Samdani 3 Feature Ensembles for Domain Adaptation page 2]: “Compared to existing domain adaption methods, FEAD provides an additional degree of freedom to adjust the trade-off between the generalization error and domain distribution change of individual classifiers trained on the corresponding feature groups [and the one or more neural networks are trained using data for each prototype domain, as Samdani teaches aspects of domain adaptation and notes within this quote that classifiers can be trained on information for a form of domain, such as a feature group].”
Samdani is relevant for the combination of models as shown by [Samdani Introduction page 1]: “Given a set of feature groups that capture this notion, where the grouping can be decided by domain knowledge or statistics derived from unlabeled data, we first train individual classifiers separately using only the corresponding group of features. The final model is a weighted ensemble of individual classifiers, where the weights are tuned based on the performance of the ensemble on a small amount of labeled target data.”.
One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Hashem and Samdani. Hashem and Samdani are in the same field of endeavor of machine learning. One of ordinary skill in the art would have been motivated to combine Hashem and Samdani in order to be able to utilize domain adaptation to create more relevant or accurate models ([Samdani Introduction page 1]: “However, in several real-world applications, it is often highly desirable to train a classifier from one source domain, and apply it to a similar but different target domain, where the annotated data is unavailable or expensive to create. One example of this scenario is to learn a text categorizer from a large collection of labeled newswire articles, but use it to process regular Web documents. In this domain adaptation setting, the goal is to leverage the data available in the source domain to improve the accuracy of the model when testing on target domain examples.”)
Regarding Claim 2:
The apparatus of claim 1 is taught by Hashem, Izmailov, Samdani, and Yeo.
Samdani teaches:
wherein the input data is data associated with one or more prototype domains
[Samdani 3 Feature Ensembles for Domain Adaptation page 2]: “Compared to existing domain adaption methods, FEAD provides an additional degree of freedom to adjust the trade-off between the generalization error and domain distribution change of individual classifiers trained on the corresponding feature groups [wherein the input data is data associated with one or more prototype domains, as Samdani teaches aspects of domain adaptation and notes within this quote that classifiers can be trained on information for a form of domain, such as a feature group].”
The motivation to combine with Samdani is the same motivation as the motivation to combine with Samdani in claim 1.
Regarding Claim 3:
The apparatus of claim 1 is taught by Hashem, Izmailov, Samdani, and Yeo.
Hashem teaches:
wherein the one or more neural networks all have the same structure
[Hashem 2. Related Work page 2]: “Hansen and Salamon (1990) suggested training a group of networks of the same architecture [wherein the one or more neural networks all have the same structure] but initialized with different connection-weights. Then, a screened subset of the trained networks is used for making the final classification decision by some voting scheme.”
Regarding Claim 4:
The apparatus of claim 1 is taught by Hashem, Izmailov, Samdani, and Yeo.
Samdani teaches:
wherein the one or more neural networks are stored in a neural network pool,
[Samdani 3 Feature Ensembles for Domain Adaptation page 2]: “Compared to existing domain adaption methods, FEAD provides an additional degree of freedom to adjust the trade-off between the generalization error and domain distribution change of individual classifiers [wherein the one or more neural networks are stored in a neural network pool, as a network pool is interpreted as a collection of multiple neural networks, as shown by Figure 3 of the current application] trained on the corresponding feature groups.”
The motivation to combine with Samdani is the same as the motivation to combine with Samdani in claim 1.
Yeo teaches:
and the neural network pool is compressed through a singular vector decomposition (SVD) technique
[Yeo 0013]: “FIG. 3A is a diagram showing compression of an artificial intelligence model using an SVD algorithm [and the neural network pool is compressed through a singular vector decomposition (SVD) technique] according to an embodiment”
One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Hashem and Yeo. Hashem and Yeo are in the same field of endeavor of machine learning. One of ordinary skill in the art would have been motivated to combine Hashem and Yeo in order to be able to compress neural networks to preserve resources or use resources more efficiently ([Yeo 0164]: “The loaded dedicated artificial intelligence model may be a compressed artificial intelligence model having a smaller size than the generic-purpose artificial intelligence model stored in the memory 110. As described above, the processor 120 may load a dedicated artificial intelligence model having a small size and may perform an operation, and thus an operation amount for the target data can be reduced, and the processing speed can be improved, and the resources (e.g., the memory, the CPU, the GPU, etc.) of the electronic apparatus 100 are not wasted, and thus usefulness of the resources can be enhanced.”)
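For illustration only (not part of the record), a toy sketch of the storage saving behind the SVD-style compression referenced in Yeo; the rank-1 factorization below is a simplification, as an actual implementation would truncate the singular value decomposition of the weight matrix:

```python
# Illustrative sketch only: the storage saving behind SVD-style compression
# referenced in Yeo. A rank-1 weight matrix W = u v^T can be stored as the two
# vectors u and v (m + n values) instead of all m * n entries; an actual
# implementation would truncate the singular value decomposition of W.
def outer(u, v):
    """Rank-1 matrix from two vectors."""
    return [[ui * vj for vj in v] for ui in u]

u = [1.0, 2.0, 3.0]
v = [4.0, 5.0]
W = outer(u, v)                    # 3 x 2 matrix: 6 stored values
compressed_size = len(u) + len(v)  # factored form: 5 stored values
```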
Regarding Claim 5:
The apparatus of claim 1 is taught by Hashem, Izmailov, Samdani, and Yeo.
Hashem teaches:
wherein the weight is in the form of a vector
[Hashem 3. Linear Combinations of Neural Networks page 3]: “One approach for the multi-output case is to compute an optimal combination-weight vector [wherein the weight is in the form of a vector] for each output separately”
Regarding Claim 6:
This claim is analogous to claim 2.
Regarding Claim 7:
The apparatus of claim 1 is taught by Hashem, Izmailov, Samdani, and Yeo.
Izmailov teaches:
wherein the one or more neural networks are derived based on a primitive neural network trained on the prototype domain
[Izmailov 3.2 SWA Algorithm]: “Following Garipov et al. [2018], we start with a pretrained model wˆ [wherein the one or more neural networks are derived based on a primitive neural network trained on the prototype domain]. We will refer to the number of epochs required to train a given DNN with the conventional training procedure as its training budget and will denote it by B. The pretrained model wˆ can be trained with the conventional training procedure for full training budget or reduced number of epochs (e.g. 0.75B). In the latter case we just stop the training early without modifying the learning rate schedule. Starting from wˆ we continue training, using a cyclical or constant learning rate schedule. When using a cyclical learning rate we capture the models wi that correspond to the minimum values of the learning rate (see Figure 2), following Garipov et al. [2018]. For constant learning rates we capture models at each epoch. Next, we average the weights of all the captured networks wi to get our final model wSWA.”
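For illustration only (not part of the record), a minimal sketch of the SWA-style weight averaging Izmailov describes; each "model" is a flat list of stand-in parameters, not a captured DNN:

```python
# Illustrative sketch only: SWA-style averaging as Izmailov describes it,
# averaging the captured models' weights rather than their predictions.
# Each "model" is a flat list of stand-in parameters for illustration.
def average_weights(models):
    """Element-wise mean of the parameter vectors of the captured models."""
    n = len(models)
    return [sum(params) / n for params in zip(*models)]

w1 = [0.2, 0.4, 0.6]  # parameters captured at one point in training
w2 = [0.4, 0.6, 0.8]  # parameters captured at another point
w_swa = average_weights([w1, w2])  # approximately [0.3, 0.5, 0.7]
```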
One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Hashem and Izmailov for the same motivation provided for combining with Izmailov in claim 1; in addition, the derivation from a primitive neural network acts as a starting point for the method of providing a better model, so the combination remains sensible to one of ordinary skill.
Regarding Claim 8:
The apparatus of claim 7 is taught by Hashem, Izmailov, Samdani, and Yeo.
Hashem teaches:
wherein the primitive neural network is trained through supervised learning or representation learning
[Hashem Introduction page 2]: “This paper focuses mainly on function approximation or regression problems. However, MSE-OLCs are also applicable to supervised [wherein the primitive neural network is trained through supervised learning or representation learning as Hashem is teaching in this quote supervised learning in the example of supervised classification, where the primitive neural network is taught in claim 7] classification problems”
Regarding Claim 9:
The apparatus of claim 1 is taught by Hashem, Izmailov, Samdani, and Yeo.
Hashem teaches:
wherein the weight is derived based on a multilayer neural network,
[Hashem Introduction page 2]: “The class of neural networks investigated here is the class of multilayer [wherein the weight is derived based on a multilayer neural network as this quote teaches multilayer neural networks as claim 1 teaches the determination of a weight] feedforward networks. No further assumptions regarding the network architecture or the learning method are needed”
and the multilayer neural network is trained based on a weighted sum of results of the one or more neural networks
[Hashem 3. Linear Combinations of Neural Networks page 3]: “Consider a multi-input single-output mapping approximated by a trained NN. A trained NN accepts a vector-valued input x and returns a scalar output (response) y(x). The approximation error is δ(x) = r(x) − y(x), where r(x) is the response of the real system (true response) for x [and the multilayer neural network is trained based on a weighted sum of results of the one or more neural networks as Hashem is teaching that the network for predicting a weight is trained using the output of one or more neural networks].”
Further support is given in the words following the quote [Hashem 3. Linear Combinations of Neural Networks page 3]: “According to (Hashem & Schmeiser, 1995), a linear combination of the outputs of p NNs returns the scalar y(x; a) = Σ_{j=1}^{p} a_j y_j(x), with the corresponding approximation error δ(x; a) = r(x) − y(x; a); where y_j(x) is the output of the jth network and a_j is the associated combination-weight”
Regarding Claim 14:
Claim 14 is analogous to claim 1.
Regarding Claim 15:
Claim 15 is analogous to claim 2.
Regarding Claim 16:
Claim 16 is analogous to claim 3.
Regarding Claim 17:
Claim 17 is analogous to claim 4.
Regarding Claim 18:
Claim 18 is analogous to claim 5.
Regarding Claim 19:
The method of claim 14 is taught by Hashem, Izmailov, Samdani, and Yeo.
Hashem teaches:
wherein the final neural network is derived based on a linear combination of parameters of the one or more neural networks using the weight
[Hashem Introduction page 1]: “Hashem and Schmeiser (1995) proposed forming a linear combination [wherein the final neural network is derived based on a linear combination of parameters of the one or more neural networks using the weight where the construction of a final neural network is taught in claim 14] of the corresponding outputs of the trained NNs, instead of just using the apparent best network. Combining the trained networks may help integrate the knowledge acquired by the component networks and often produces superior model accuracy compared to the single best-trained network (Hashem, 1993; Hashem and Schmeiser, 1993; Hashem et al. (1993), Hashem et al. (1994)). Optimal linear combinations (OLCs) of neural networks are constructed by forming weighted sums of the corresponding outputs of the networks.”
Regarding Claim 20:
Claim 20 is analogous to claim 7.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Hashem (“Optimal Linear Combinations of Neural Networks”), referred to as Hashem in this document, and further in combination with Samdani et al. (“Domain Adaptation with Ensemble of Feature Groups”), referred to as Samdani in this document, and further in combination with Yeo et al. (US 20220036152 A1), referred to as Yeo in this document.
Regarding Claim 10:
A note about claim 10: claim 10 appears to refer to generic ensemble learning, as the claim recites result values or outputs rather than the combining of parameters noted in the specification quote used to support the construction of a final neural network for claim 1 and its analogues. This note is left to assist applicant in understanding the interpretation of claim 10, as that interpretation could affect both the art used in the rejection under 103 and the rejections that could be raised under 101.
Hashem teaches:
and perform multilayer neural network learning to determine a weight to be applied to one or more neural networks using the collected learning data, and the weight is derived to combine result values of the one or more neural networks
[Hashem 3. Linear Combinations of Neural Networks page 3]: “Consider a multi-input single-output mapping approximated by a trained NN. A trained NN accepts a vector-valued input x and returns a scalar output (response) y(x) [and perform multilayer neural network learning to determine a weight to be applied to one or more neural networks using the collected learning data, and the weight is derived to combine result values of the one or more neural networks]. The approximation error is δ(x) = r(x) − y(x), where r(x) is the response of the real system (true response) for x.”
Further support is given in the words following the quote [Hashem 3. Linear Combinations of Neural Networks page 3]: “According to (Hashem & Schmeiser, 1995), a linear combination of the outputs of p NNs returns the scalar y(x; a) = Σ_{j=1}^{p} a_j y_j(x), with the corresponding approximation error δ(x; a) = r(x) − y(x; a); where y_j(x) is the output of the jth network and a_j is the associated combination-weight”
Hashem does not explicitly teach:
An apparatus for constructing a domain adaptive network, the apparatus comprising: a memory configured to store data; and a processor configured to control the memory, wherein the processor is configured to
collect learning data associated with one or more prototype domains,
Yeo teaches:
An apparatus for constructing a domain adaptive network, the apparatus comprising: a memory configured to store data; and a processor configured to control the memory, wherein the processor is configured to
[Yeo 0164]: “The loaded dedicated artificial intelligence model may be a compressed artificial intelligence model having a smaller size than the generic-purpose artificial intelligence model stored in the memory 110. As described above, the processor 120 may load a dedicated artificial intelligence model having a small size and may perform an operation, and thus an operation amount for the target data can be reduced, and the processing speed can be improved, and the resources (e.g., the memory [An apparatus for constructing a domain adaptive network, the apparatus comprising: a memory configured to store data;], the CPU [and a processor configured to control the memory, wherein the processor is configured to], the GPU, etc.) of the electronic apparatus 100 are not wasted, and thus usefulness of the resources can be enhanced.”
The motivation to combine with Yeo is the same motivation to combine with Yeo as claim 1.
Samdani teaches:
collect learning data associated with one or more prototype domains,
[Samdani 3 Feature Ensembles for Domain Adaptation page 2]: “Compared to existing domain adaption methods, FEAD provides an additional degree of freedom to adjust the trade-off between the generalization error and domain distribution change of individual classifiers trained on the corresponding feature groups [collect learning data associated with one or more prototype domains, as Samdani teaches aspects of domain adaptation and notes within this quote that classifiers can be trained on information for a form of domain, such as a feature group].”
Where further support for collecting data is shown by Samdani noting data is collected for datasets [Samdani 3.3 FEAD as Product of Experts page 4]: “In this set of experiments, we use the benchmark dataset released by Blitzer et al. [2007], which consists of reviews of four different product categories: books, DVDs, electronics, and kitchen appliances, collected from Amazon.com.”
The motivation to combine with Samdani is the same motivation as the motivation to combine with Samdani in claim 1.
Claims 11 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Hashem (“Optimal Linear Combinations of Neural Networks”), referred to as Hashem in this document, and further in combination with Samdani et al. (“Domain Adaptation with Ensemble of Feature Groups”), referred to as Samdani in this document, and further in combination with Yeo et al. (US 20220036152 A1), referred to as Yeo in this document, and further in combination with Meng et al. (US 20200334538 A1), referred to as Meng in this document.
Regarding Claim 11:
The apparatus of claim 10 is taught by Hashem, Yeo, and Samdani.
Hashem teaches:
wherein the multilayer neural network learning is based on a weighted sum of the one or more neural networks
[Hashem 3. Linear Combinations of Neural Networks page 3]: “Consider a multi-input single-output mapping approximated by a trained NN. A trained NN accepts a vector-valued input x and returns a scalar output (response) y(x) [wherein the multilayer neural network learning is based on a weighted sum of the one or more neural networks]. The approximation error is δ(x) = r(x) − y(x), where r(x) is the response of the real system (true response) for x.”
Where additional support for multilayer neural networks is given in [Hashem Introduction page 2]: “The class of neural networks investigated here is the class of multilayer feedforward networks. No further assumptions regarding the network architecture or the learning method are needed”
Hashem does not explicitly teach:
and a cross entropy loss function of GT- Label
Meng teaches:
and a cross entropy loss function of GT- Label
[Meng 0027]: “where 0≤λ≤1 is the weight for the class posteriors and <custom character> is the indicator function which equals to 1 if the condition in the squared bracket is satisfied and 0 otherwise. Note that the interpolated T/S learning becomes soft T/S when λ=1.0 and becomes standard cross-entropy [and a cross entropy loss function of GT- Label] training with hard labels when λ=0.0. Although interpolated T/S compensates for the imperfection in knowledge transfer, the linear combination of soft and hard labels destroys the correct relationships among different classes embedded naturally in the soft class posteriors and deviates the student model parameters from the optimal direction. Moreover, the search for the best student model is subject to the heuristic tuning of λ between 0 and 1.”
Further support for relation with ground truth is given in [Meng 0023]: “One shortcoming of T/S learning is that a teacher model, not always perfect, sporadically makes incorrect predictions that mislead the student model toward a suboptimal performance. In such a case, it may be beneficial to utilize hard labels of the training data to alleviate this effect. Some approaches use an interpolated T/S learning called knowledge distillation, in which a weighted sum of the soft posteriors and the one-hot hard label is used to train the student model. One issue is that the simple linear combination with one-hot vectors destroys the relationships among different classes embedded naturally in the soft posteriors produced by the teacher model. Moreover, proper setting of the interpolation weight with a fixed value is known to be critical and it varies with the adaptation scenarios and the qualities of the teacher and ground truth labels”
One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Hashem and Meng. Hashem and Meng are in the same field of endeavor of machine learning. One of ordinary skill in the art would have been motivated to combine Hashem and Meng in order to be able to find a better model as the method acts to aid training ([Meng 0027]: “where 0≤λ≤1 is the weight for the class posteriors and <custom character> is the indicator function which equals to 1 if the condition in the squared bracket is satisfied and 0 otherwise. Note that the interpolated T/S learning becomes soft T/S when λ=1.0 and becomes standard cross-entropy training with hard labels when λ=0.0… Moreover, the search for the best student model is subject to the heuristic tuning of λ between 0 and 1.”)
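For illustration only (not part of the record), a minimal sketch of the interpolated teacher/student objective quoted from Meng; the targets and predictions are stand-in values:

```python
# Illustrative sketch only: the interpolated teacher/student objective quoted
# from Meng, a weighted sum of soft (teacher posterior) and hard (one-hot
# ground-truth) cross-entropy terms; lam = 1.0 gives pure soft T/S and
# lam = 0.0 gives standard cross-entropy training with hard labels.
import math

def cross_entropy(target, predicted, eps=1e-12):
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

def interpolated_loss(soft, hard, predicted, lam):
    return lam * cross_entropy(soft, predicted) + (1 - lam) * cross_entropy(hard, predicted)

soft = [0.7, 0.3]  # teacher posteriors
hard = [1.0, 0.0]  # one-hot ground-truth label
pred = [0.6, 0.4]  # student output
loss = interpolated_loss(soft, hard, pred, 0.5)
```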
Regarding Claim 12:
The apparatus of claim 10 is taught by Hashem, Yeo, and Samdani.
Hashem does not explicitly teach:
wherein the multilayer neural network learning is performed based on a knowledge distillation method
Meng teaches:
wherein the multilayer neural network learning is performed based on a knowledge distillation method
[Meng 0023]: “One shortcoming of T/S learning is that a teacher model, not always perfect, sporadically makes incorrect predictions that mislead the student model toward a suboptimal performance. In such a case, it may be beneficial to utilize hard labels of the training data to alleviate this effect. Some approaches use an interpolated T/S learning called knowledge distillation [wherein the multilayer neural network learning is performed based on a knowledge distillation method], in which a weighted sum of the soft posteriors and the one-hot hard label is used to train the student model. One issue is that the simple linear combination with one-hot vectors destroys the relationships among different classes embedded naturally in the soft posteriors produced by the teacher model. Moreover, proper setting of the interpolation weight with a fixed value is known to be critical and it varies with the adaptation scenarios and the qualities of the teacher and ground truth labels.”
One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Hashem and Meng. Hashem and Meng are in the same field of endeavor of machine learning. One of ordinary skill in the art would have been motivated to combine Hashem and Meng in order to be able to create a better model, such as a student model that is smarter than the teacher model ([Meng 0024]: “Some embodiments described herein utilize a conditional T/S learning scheme where a student model becomes smart so that it can criticize the knowledge imparted by the teacher model to make better use of the teacher and the ground truth. At the initial stage, when the student model is very weak, it may blindly follow all knowledge infused by the teacher model and use the soft posteriors as the sole training targets. As the student model grows stronger, it may begin to selectively choose the learning source from either the teacher model or the ground truth labels conditioned on whether the teacher's prediction coincides with the ground truth. That is, the student model may learn exclusively from the teacher when the teacher makes correct predictions on training samples, and otherwise from the ground truth when the teacher is wrong. With conditional T/S learning, the student makes good use of rich and correct knowledge encompassed by the teacher yet avoids receiving inaccurate knowledge generated by the teacher. Another advantage of the conditional T/S learning over the conventional T/S learning is that it forgoes tuning the interpolation weight between two knowledge sources.” )
Claims 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hashem (“Optimal Linear Combinations of Neural Networks”), referred to as Hashem in this document, and further in combination with Samdani et al (“Domain Adaptation with Ensemble of Feature Groups”), referred to as Samdani in this document, and further in combination with Yeo et al (US 20220036152 A1), referred to as Yeo in this document, and further in combination with Liang et al (“Understanding Mixup Training Methods”), referred to as Liang in this document.
Regarding Claim 13:
The apparatus of claim 10 is taught by Hashem, Yeo, and Samdani.
Hashem does not explicitly teach:
wherein the learning data is generated by a mixup method of adjusting a ratio of data to the prototype domain
Liang teaches:
wherein the learning data is generated by a mixup method of adjusting a ratio of data to the prototype domain
[Liang 3. Method A. General Mixup page 3]: “In order to explore the impact of the mixing of images and their labels on the performance of the network [wherein the learning data is generated by a mixup method of adjusting a ratio of data to the prototype domain as Liang is teaching the mixup method and related information], we either interpolate the images or interpolate their labels, or interpolate them at the same time. The Uniform distribution can better control the range of λ compared to the Beta distribution, and has the same effect on some datasets. Therefore, the Uniform distribution is adopted to control the sampling of λ . We use λx to represent the mixing ratio of two samples x for λ∈ Uniform(λ1,λ2) , and Rl to represent the mixing ratio of two labels y for λ∈ Uniform(λ1,λ2) , where 0≤λ1≤λ≤λ2≤1 . When λx=λl , we denote them as λ .”
One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Hashem and Liang. Hashem and Liang are in the same field of endeavor of machine learning. One of ordinary skill in the art would have been motivated to combine Hashem and Liang in order to be able to prevent overfitting or improve generalization ([Liang Introduction page 1]: “In SamplePairing [13] and mixup [14], a method of training a neural network using two samples simultaneously is proposed. SamplePairing randomly picks one sample in the training set to add to the original sample and uses the original sample’s label to train the network. The mixup uses a random value to weight the two samples and their corresponding labels. All of the above methods have some effect of data augmentation and regularization, and they can achieve better generalization performance than Empirical Risk Minimization (ERM) [15].”)
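For illustration only (not part of the record), a minimal sketch of the mixup interpolation Liang describes; the random sampling of the mixing ratio from a Uniform distribution is replaced by a fixed value here:

```python
# Illustrative sketch only: mixup as Liang describes it, interpolating two
# samples and their labels with a mixing ratio lam; random sampling of lam
# from a Uniform distribution is replaced by a fixed value for illustration.
def mixup(x1, y1, x2, y2, lam):
    """Convex combination of two samples and their (one-hot) labels."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

x_mix, y_mix = mixup([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], 0.5)
# both the mixed sample and the mixed label lie midway between the inputs
```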
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Chu et al. (US 20210374617 A1) is considered relevant art, as Chu et al. discusses the use of weighted averaging utilizing coefficients, i.e., weighted aggregation, in paragraphs 74-75. Thus Chu et al. is an example of some of the ideas in the current application.
Gupta et al. (“Stochastic Weight Averaging in Parallel: Large-batch Training that Generalizes Well”) is relevant art that discusses a method of combining model parameters called SWAP, or Stochastic Weight Averaging in Parallel. Thus Gupta et al. is relevant to the current application in that Gupta et al. discusses the creation of a new “final” model utilizing the combination of existing models’ weights.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHRISTOPHER D DEVORE whose telephone number is (703)756-1234. The examiner can normally be reached Monday-Friday 7:30 am - 5 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/C.D.D./Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129