DETAILED ACTION
Response to Amendment
1. This office action is in response to applicant’s communication filed on 11/20/2025 in response to PTO Office Action mailed 08/20/2025. The Applicant’s remarks and amendments to the claims and/or the specification were considered with the results as follows.
2. In response to the last Office Action, claims 1, 3-7, 9, 10, 13 and 17 are amended. No claims are added or canceled. As a result, claims 1-19 are pending in this office action.
Response to Arguments
3. Applicant's arguments with respect to 35 USC 101 have been fully considered but they are not persuasive, and the details are as follows:
Applicant argues that “Claim 1 recites a client that is not merely directed to performing a mental process…the claimed subject-matter is integrated into a practical application”.
In response to Applicant’s argument, the Examiner disagrees because claim 1, under its broadest reasonable interpretation, covers performance of the limitations in the mind but for the recitation of generic computer components. The abstract idea recited in claim 1 is not integrated into a practical application. Claim 1 and the other independent claims recite two models: a classification model that is trained by gradient information of a target model, and a target model whose gradient information is computed by computing the gradient that decreases the objective function obtained by adding the inference result. The output [e.g., the gradient information] of the target model is transmitted to a server. For example, claim 1 merely recites a computer that includes a processor and a memory to provide data results after a series of data-gathering steps using data models. The computer in all the steps is recited at a high level of generality [i.e., as a generic computer] performing a computer function of providing a subset of results to a user device, such that it amounts to no more than mere instructions to apply the exception using a computer as a tool to retrieve data results after a series of data-gathering steps. Training a classification model by gradient information using a second set of training data labelled with a teacher label, computing the gradient information that decreases an objective function by adding the inference result, and transmitting the gradient information to a server are insignificant extra-solution activities. Accordingly, this additional element [e.g., the computer] does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Mere instructions to apply an exception using a generic computer cannot provide an inventive concept. The claim is not patent eligible. There is no indication that the recited features improve the functioning of a computer or improve any other technology. Hence, claim 1 is ineligible under 35 USC 101. Claims 2-7 and 9-19 are rejected for similar reasons.
4. Applicant's arguments with respect to 35 USC 102 have been fully considered but they are moot in view of new ground(s) of rejection.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claims 1-7 and 9-19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The claims are generally narrative and indefinite, failing to conform with current U.S. practice. They appear to be a literal translation into English from a foreign document and are replete with grammatical and idiomatic errors. The amended features “…by a gradient information of a target model that is computed for a second training data labelled with a teacher label regarding the property of an input data…by computing the gradient that decreases the objective function obtained by adding the inference result, derived by inputting the gradient information to the classification model after training, as a regularization term to the loss function corresponding to the target model” are unclear and ambiguous.
Claim 1 recites the limitation "the property" in line 11. There is insufficient antecedent basis for this limitation in the claim.
Claim 1 recites the limitation "the gradient" in line 14. There is insufficient antecedent basis for this limitation in the claim.
Claim 1 recites the limitation "the objective function" in line 14. There is insufficient antecedent basis for this limitation in the claim.
Claim 1 recites the limitation "the inference result" in line 14. There is insufficient antecedent basis for this limitation in the claim.
Claim 1 recites the limitation "the loss function" in line 16. There is insufficient antecedent basis for this limitation in the claim.
Claim 6 recites the limitation "the property" in line 17. There is insufficient antecedent basis for this limitation in the claim.
Claim 6 recites the limitation "the gradient" in line 20. There is insufficient antecedent basis for this limitation in the claim.
Claim 6 recites the limitation "the objective function" in line 20. There is insufficient antecedent basis for this limitation in the claim.
Claim 6 recites the limitation "the inference result" in line 21. There is insufficient antecedent basis for this limitation in the claim.
Claim 6 recites the limitation "the loss function" in line 22. There is insufficient antecedent basis for this limitation in the claim.
Claim 7 recites the limitation "the property" in line 7. There is insufficient antecedent basis for this limitation in the claim.
Claim 7 recites the limitation "the gradient" in line 11. There is insufficient antecedent basis for this limitation in the claim.
Claim 7 recites the limitation "the objective function" in line 11. There is insufficient antecedent basis for this limitation in the claim.
Claim 7 recites the limitation "the inference result" in line 11. There is insufficient antecedent basis for this limitation in the claim.
Claim 7 recites the limitation "the loss function" in line 13. There is insufficient antecedent basis for this limitation in the claim.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
5. Claims 1-7 and 9-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Claims 1-7 and 9-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., an abstract idea) without significantly more. Claim 1 is directed to the abstract idea of training a target model by federated learning, as explained in detail below. The claim does not include elements that are sufficient to amount to significantly more than the judicial exception because the elements are concepts that can be performed in the human mind and do not add meaningful limits to practicing the abstract idea.
Claim 1 recites a client connectable to a server for federated learning comprising at least in part:
a property classification model training part that trains a classification model by a gradient information of a target model that is computed for a second training data labelled with a teacher label regarding the property of an input data, the classification model inferring a property of an input data from a gradient information (e.g., observing a classification model receiving input data and inferring a property from a collection of information can be performed in the human mind);
a target model training part that computes the gradient information of the target model by computing the gradient that decreases the objective function obtained by adding the inference result, derived by inputting the gradient information to the classification model after training, as a regularization term to the loss function corresponding to the target model (e.g., observing a target model computing the collection of information using a training data, a target model and the classification model can be performed in the human mind using mathematical calculations); and
transmits the gradient information […] (e.g., transmitting and storing the computed information can be performed in the human mind using pen and paper), wherein
the property of the input data that the classification model infers can be set for each client (e.g., observing the inferred property of the input data for each client can be performed in the human mind).
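For reference, the objective function recited in the target model training part can be read as a loss plus a regularization term produced by the trained classification model. The Python sketch below is one hypothetical reading of that limitation and is not taken from the application: the linear target model, the logistic classifier parameters `clf_w` and `clf_b`, the weight `lam`, the finite-difference gradient, and all function names are assumptions introduced only for illustration.

```python
import numpy as np

def loss(w, X, y):
    """Mean squared error of a linear target model (assumed for illustration)."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)

def loss_gradient(w, X, y):
    """Gradient information of the target model: d(loss)/dw."""
    return X.T @ (X @ w - y) / len(y)

def infer_property(grad, clf_w, clf_b):
    """Trained classification model (assumed logistic): gradient -> score in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(clf_w @ grad + clf_b)))

def objective(w, X, y, clf_w, clf_b, lam):
    """Loss plus the inference result used as a regularization term."""
    reg = infer_property(loss_gradient(w, X, y), clf_w, clf_b)
    return loss(w, X, y) + lam * reg

def objective_gradient(w, X, y, clf_w, clf_b, lam, eps=1e-6):
    """Central finite-difference gradient of the objective (keeps the sketch short)."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        g[i] = (objective(w + step, X, y, clf_w, clf_b, lam)
                - objective(w - step, X, y, clf_w, clf_b, lam)) / (2 * eps)
    return g

# Hypothetical local data and a hypothetical, already-trained classifier.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
clf_w, clf_b, lam = np.array([0.3, -0.1, 0.2]), 0.0, 0.1

before = objective(w, X, y, clf_w, clf_b, lam)
update = objective_gradient(w, X, y, clf_w, clf_b, lam)  # what a client might transmit
after = objective(w - 0.01 * update, X, y, clf_w, clf_b, lam)
```

Under these assumptions, a small step against `update` lowers the combined objective, which is the sense in which the gradient "decreases the objective function" in this reading.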
Claim 1, as it is recited, falls within one of the groupings of abstract ideas [e.g., mental process] enumerated in the 2019 PEG. The recited concept can be performed in the human mind, including observation, evaluation, judgment, and opinion. That is, other than reciting a client system comprising at least a processor configured to execute program instructions stored in a memory to implement the machine learning models, nothing in the claim precludes the steps from practically being performed in the mind. Claim 1 recites a client that applies machine learning to input data and a server that receives a model update parameter from the client and updates gradient information; these elements are recited at a high level of generality and add no more to the claimed invention than a computer that performs an abstract idea. Claim 1 uses a computer [e.g., a server or a client] as a tool to retrieve data results after a series of data-gathering steps, which is an insignificant extra-solution activity. Thus, the judicial exception is not integrated into a practical application. The additional features do not appear to be improvements to the functioning of a computer or to any other technology or technical field. The additional features do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation. Therefore, claim 1 is not patent eligible.
Claims 2-5 recite features similar to those of claim 1 and likewise fall within one of the groupings of abstract ideas [e.g., mental process] enumerated in the 2019 PEG. The recited concept can be performed in the human mind, including observation, evaluation, judgment, and opinion. Claims 2-5 further define the target model training part using a loss or a gain function, a classifier, an influence function and an influence function expression. Claims 2-5 use a computer [e.g., a server or a client] as a tool to retrieve data results after a series of data-gathering steps, which is an insignificant extra-solution activity. Thus, the judicial exception is not integrated into a practical application. The additional features do not appear to be improvements to the functioning of a computer or to any other technology or technical field. The additional features do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation. Therefore, claims 2-5 are not patent eligible.
Claim 6 recites a system comprising at least in part:
a server comprising a federated learning part that trains a target model by exchanging a model update parameter including gradient information with a client by a federated learning; and a plurality of clients (e.g., observing an updated parameter including a collection of information in each client and a server in order to train a target model can be performed in the human mind with pen and paper), wherein each client comprises:
a property classification model training part that trains a classification model by the gradient information of the target model that is computed for a second training data labelled with a teacher label regarding the property of an input data, the classification model inferring a property of an input data from a gradient information (e.g., observing a classification model receiving input data and inferring a property from the collection of information can be performed in the human mind);
a target model training part that computes the gradient information of the target model by computing the gradient that decreases the objective function obtained by adding the inference result, derived by inputting the gradient information to the classification model after training, as a regularization term to the loss function corresponding to the target model (e.g., observing a target model computing the collection of information using a training data, the target model and the classification model can be performed in the human mind using mathematical formula); and
transmits the gradient information […] (e.g., transmitting and storing the computed information can be performed in the human mind using pen and paper).
Claim 6, as it is recited, falls within one of the groupings of abstract ideas [e.g., mental process] enumerated in the 2019 PEG. The recited concept can be performed in the human mind, including observation, evaluation, judgment, and opinion. That is, other than reciting a server and a client system comprising at least a processor configured to execute program instructions stored in a memory to implement the machine learning models, nothing in the claim precludes the steps from practically being performed in the mind. Claim 6 recites a client that applies machine learning to input data and a server that receives a model update parameter from the client and updates gradient information; these elements are recited at a high level of generality and add no more to the claimed invention than a computer that performs an abstract idea. Claim 6 uses a computer [e.g., a server or a client] as a tool to retrieve data results after a series of data-gathering steps, which is an insignificant extra-solution activity. Thus, the judicial exception is not integrated into a practical application. The additional features do not appear to be improvements to the functioning of a computer or to any other technology or technical field. The additional features do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation. Therefore, claim 6 is not patent eligible.
Claim 7 recites a machine learning method for a client connectable to a server for federated learning comprising at least in part:
training a classification model by a gradient information of a target model that is computed (e.g., observing an updated parameter including a collection of information in a client and a server in order to train a target model can be performed in the human mind with pen and paper); for a second training data labelled with a teacher label regarding the property of an input data, the classification model inferring a property of an input data from the gradient information (e.g., observing a classification model receiving input data and inferring a property from a collection of information can be performed in the human mind);
computing the gradient information of the target model by computing the gradient that decreases the objective function obtained by adding the inference result, derived by inputting the gradient information to the classification model after training, as a regularization term to the loss function corresponding to the target model and transmitting the gradient information to the server (e.g., observing a target model computing the collection of information using a training data, a target model and the classification model can be performed in the human mind);
transmits the gradient information […] (e.g., observing and evaluating the property classification model training the classification model using the target model and a labelled training data can be performed in the human mind).
wherein the property of the input data that the classification model infers can be set for each client (e.g., observing the inferred property of the input data for each client can be performed in the human mind).
Claim 7, as it is recited, falls within one of the groupings of abstract ideas [e.g., mental process] enumerated in the 2019 PEG. The recited concept can be performed in the human mind, including observation, evaluation, judgment, and opinion. That is, other than reciting a client system comprising at least a processor configured to execute program instructions stored in a memory to implement the machine learning models, nothing in the claim precludes the steps from practically being performed in the mind. Claim 7 recites a client that applies machine learning to input data and a server that receives a model update parameter from the client and updates gradient information; these elements are recited at a high level of generality and add no more to the claimed invention than a computer that performs an abstract idea. Claim 7 uses a computer [e.g., a server or a client] as a tool to retrieve data results after a series of data-gathering steps, which is an insignificant extra-solution activity. Thus, the judicial exception is not integrated into a practical application. The additional features do not appear to be improvements to the functioning of a computer or to any other technology or technical field. The additional features do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation. Therefore, claim 7 is not patent eligible.
Claims 9-11 recite features similar to those of claim 1 and likewise fall within one of the groupings of abstract ideas [e.g., mental process] enumerated in the 2019 PEG. The recited concept can be performed in the human mind, including observation, evaluation, judgment, and opinion. Claims 9-11 further define the target model training part using a classifier, an influence function and an influence function expression. Claims 9-11 use a computer [e.g., a server or a client] as a tool to retrieve data results after a series of data-gathering steps, which is an insignificant extra-solution activity. Thus, the judicial exception is not integrated into a practical application. The additional features do not appear to be improvements to the functioning of a computer or to any other technology or technical field. The additional features do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation. Therefore, claims 9-11 are not patent eligible.
Claims 12-15 recite features similar to those of claim 6 and likewise fall within one of the groupings of abstract ideas [e.g., mental process] enumerated in the 2019 PEG. The recited concept can be performed in the human mind, including observation, evaluation, judgment, and opinion. Claims 12-15 further define the target model training part using a loss or a gain function, a classifier, an influence function and an influence function expression. Claims 12-15 use a computer [e.g., a server or a client] as a tool to retrieve data results after a series of data-gathering steps, which is an insignificant extra-solution activity. Thus, the judicial exception is not integrated into a practical application. The additional features do not appear to be improvements to the functioning of a computer or to any other technology or technical field. The additional features do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation. Therefore, claims 12-15 are not patent eligible.
Claims 17-19 recite features similar to those of claim 7 and likewise fall within one of the groupings of abstract ideas [e.g., mental process] enumerated in the 2019 PEG. The recited concept can be performed in the human mind, including observation, evaluation, judgment, and opinion. Claims 17-19 further define the target model training part using a classifier, an influence function and an influence function expression. Claims 17-19 use a computer [e.g., a server or a client] as a tool to retrieve data results after a series of data-gathering steps, which is an insignificant extra-solution activity. Thus, the judicial exception is not integrated into a practical application. The additional features do not appear to be improvements to the functioning of a computer or to any other technology or technical field. The additional features do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation. Therefore, claims 17-19 are not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-7, 9, 10, 12-14 and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Malik (US 2021/017780A1) in view of Theodorakopoulos (US 2018/0137417 A1).
Referring to claims 1 and 7, Malik discloses a client connectable to a server (See para. [0007] and para. [0009], a hybrid architecture system built upon both client-side processes and server-side processes) for federated learning (See para. [0010], the server has a global neural network model comprising a plurality of federated model parameters), the client comprising a processor; and a memory in circuit communication with the processor, wherein the processor is configured to execute program instructions stored in the memory to implement (See Figure 16, para. [0154] and para. [0155], the system comprises at least a processor and a memory):
a property classification model training part that trains a classification model (See para. [0076] and Figure 2, the remote NLU module on the server performs domain classification/selection using an intent classifier [e.g., a property classification model], the intent classifier determines the user’s intent associated with the user request, there can be one intent classifier for each domain to determine the most probable intents in a given domain, the intent classifier is based on a machine-learning model that takes the domain classification/selection result as input and calculates a probability of the input being associated with a particular predefined intent) by a gradient information of a target model that is computed for a second training data labelled with a teacher label regarding the property of an input data (See para. [0123] and Figure 12, implementing personalized federated learning in classification model training includes a client system receiving a current version of a global neural network model comprising a plurality of federated model parameters [e.g., a classification model], the client system accesses, from a local data store, a plurality of examples and a local personalization model comprising a plurality of local model parameters [e.g., a target model], wherein each of the plurality of examples comprises one or more features and one or more labels [e.g., second training data with labels], the client system trains the global neural network model [e.g., the classification model] and the local personalization model [e.g., the target model] together on the plurality of examples to generate a plurality of updated federated model parameters and a plurality of updated local model parameters, the client system stores the trained local personalization model comprising the plurality of updated local model parameters and sends the trained global neural network model comprising the plurality of updated federated model parameters to one or more servers), the classification model inferring a property of an input data from the gradient information (See para. [0076] and Figure 2, the intent classifier is based on a machine-learning model that takes the domain classification/selection result as input and calculates a probability of the input being associated with a particular predefined intent [e.g., inferring an intent of the input], the classifier determines categories that describe the user’s intent) and a target model training part that computes the gradient information of the target model using a training data, the target model and the classification model […] (See para. [0060] and para. [0096], the output of the NLU model [e.g., the classification model and the NLU’s training data based on user request] is sent to a local reasoning module, the local reasoning module conducts on-device learning that is based on learning algorithms particularly tailored for client systems [e.g., a target model]), and transmits the gradient information to the server (See para. [0060]-para. [0063], the output of the local reasoning module is sent to a dialog arbitrator, the dialog arbitrator aggregates output from both reasoning modules [e.g., the server’s reasoning module and the client’s reasoning module] to select the best reasoning result, the dialog arbitrator sends necessary information regarding the user input to the agents on the server-side in order to execute tasks in response to the user input), wherein the property of the input data that the classification model infers can be set for each client (See para. [0094], para. [0123] and Figure 12, for each client system, the distributed data can be in a plurality of examples in a local data store of the client system, each input example includes features and labels, both features and labels are known by the client system, the machine-learning model having model parameters w∈ℝ.sup.d can be configured to predict one or more candidate labels ŷ=ƒ(x; w) corresponding to a given input example having a feature x. In particular embodiments, the machine-learning model may be a neural network model configured to predict, in response to one or more n-grams received by a client system).
Malik does not explicitly disclose a regularization term using a gain obtained by inputting information into a model.
Theodorakopoulos discloses an influence function computation part that computes an influence function, the influence function representing a sensitivity with which an input data affects a parameter of the model (See para. [0081], the G.sub.i factors control the weight of the extra regularization term L.sub.aug in the cost function. The higher its value, the higher the influence. This in turn controls how sensitive the optimization process will be to the number of active kernels), wherein the model training part trains the model using the influence function as a regularization term (See para. [0056] and para. [0081], training a model including a regularization term L.sub.aug using gain factors G.sub. to tune the active kernels).
Therefore, it would have been obvious to a person of ordinary skill in the computer art before the effective filing date of the claimed invention to modify the model of Malik to include a regularization term using a gain obtained by inputting information into a model, as taught by Theodorakopoulos. A skilled artisan would have been motivated to control the inference accuracy of the network in a trade-off between accuracy and algorithm complexity (See Theodorakopoulos, para. [0081]). Both references (Theodorakopoulos and Malik) are analogous art directed to the same field of endeavor, namely training a neural network model. This close relation between the references strongly suggests a reasonable expectation of success.
As to claims 2, 12 and 16, Malik discloses wherein the target model training part computes the gradient information using a loss function corresponding to the target model and […] by inputting the gradient information into the classification model (See para. [0096] and para. [0111], “each selected client system S.sub.m.sup.t may train the current model on the locally stored plurality of examples (x.sub.k,y.sub.k) by inputting examples (x.sub.k,y.sub.k) into the model to generate, for each example, one or more candidate labels ŷ.sub.k corresponding to the input example, and then generating a plurality of updated model parameters w.sup.t+1=ƒ(x.sub.k,y.sub.k;w.sup.t) based on the generated candidate labels ŷ.sub.k and the known labels y.sub.k. In particular embodiments, generating the updated model parameters w.sup.t+1 may be based on, for each example (x.sub.k,y.sub.k), a measure of error between the one or more candidate labels ŷ.sub.k and the known label y.sub.k. In particular embodiments, the training may be based on one or more iterations of gradient descent, or a stochastic variation of gradient descent (SGD). In particular embodiments, the updated model parameters w.sup.t+1 may be produced by minimizing a loss function ƒ(x.sub.k,y.sub.k;w.sup.t)”).
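The local training step quoted from Malik above (candidate labels, a measure of error against known labels, and updated parameters obtained by gradient descent) can be illustrated with a short Python sketch. This is only a sketch under stated assumptions: the linear model, the squared-error loss, the learning rate, the number of iterations, and the function name `local_update` are hypothetical and do not come from Malik.

```python
import numpy as np

def local_update(w_t, examples, lr=0.5, epochs=200):
    """Return updated parameters after gradient descent on local (x_k, y_k) pairs."""
    X = np.array([x for x, _ in examples], dtype=float)
    y = np.array([label for _, label in examples], dtype=float)
    w = np.array(w_t, dtype=float)
    for _ in range(epochs):
        y_hat = X @ w                       # candidate labels for each example
        error = y_hat - y                   # error between candidate and known labels
        w -= lr * (X.T @ error) / len(y)    # gradient-descent step on mean squared error
    return w

# Hypothetical locally stored examples (features, known label).
examples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
w_next = local_update([0.0, 0.0], examples)
```

In a federated setting such as the one Malik describes, `w_next` would correspond to the updated parameters a client produces from the current model before anything is sent to the server; the aggregation step is outside this sketch.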
Malik does not explicitly disclose a regularization term using a gain obtained by inputting information into a model.
Theodorakopoulos discloses a regularization term using a gain obtained by inputting information into a model (See para. [0056] and para. [0081], training a model including a regularization term L.sub.aug using gain factors G.sub.i to tune the active kernels).
Therefore, it would have been obvious to a person of ordinary skill in the computer art before the effective filing date of the claimed invention to modify the model of Malik to include a regularization term using a gain obtained by inputting information into a model, as taught by Theodorakopoulos. A skilled artisan would have been motivated to control the inference accuracy of the network in a trade-off between accuracy and algorithm complexity (See Theodorakopoulos, para. [0081]). Both references (Theodorakopoulos and Malik) teach features that are directed to analogous art and to the same field of endeavor, such as training a neural network model. This close relation between the references strongly suggests an expectation of success.
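For illustration only, the combination discussed above (a loss-based gradient computation as in Malik, para. [0111], with a gain-weighted regularization term as in Theodorakopoulos, para. [0081]) can be sketched as follows. This is a minimal sketch under stated assumptions: the squared-error loss, the squared-norm stand-in for L.sub.aug, and every function name are hypothetical and are not taken from either reference or from the claims.

```python
import numpy as np

def local_loss_gradient(w, x, y):
    """Gradient of the mean squared error 0.5 * mean((x @ w - y)**2) w.r.t. w."""
    residual = x @ w - y
    return x.T @ residual / len(y)

def regularizer_gradient(w):
    """Gradient of a hypothetical stand-in term L_aug(w) = 0.5 * ||w||^2.

    Assumption: the actual L_aug in the cited reference relates to active
    kernels; a squared norm is used here only to keep the sketch short.
    """
    return w

def regularized_gradient(w, x, y, gain):
    """Gradient of loss + gain * L_aug; `gain` weights the regularization term."""
    return local_loss_gradient(w, x, y) + gain * regularizer_gradient(w)

def sgd_step(w, x, y, gain, lr=0.1):
    """One gradient-descent update on the regularized objective."""
    return w - lr * regularized_gradient(w, x, y, gain)
```

A larger `gain` makes the update more sensitive to the regularization term relative to the data loss, which is the trade-off the cited paragraph describes in general terms.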
As to claims 3, 13 and 17, Malik discloses wherein the target model training part comprises a classifier, the classifier judging whether or not data corresponding to the gradient information is data having a property that can be set for each client, based on an output of the classification model, trains the classifier to maximize an output of the classifier, and trains the target model using output of the classifier after training as the regularization term (See para. [0076], “the intent classifier may take the user request as input and formulate it into a vector. The intent classifier may then calculate probabilities of the user request being associated with different predefined intents based on a vector comparison between the vector representing the user request and the vectors representing different predefined intents. In a similar manner, the slot tagger may take the user request as input and formulate each word into a vector. The intent classifier may then calculate probabilities of each word being associated with different predefined slots based on a vector comparison between the vector representing the word and the vectors representing different predefined slots. The intent of the user may be classified as “changing money”. The slots of the user request may comprise “500”, “dollars”, “account”, and “Japanese yen”. The meta-intent of the user may be classified as “financial service”. The meta slot may comprise “finance””).
As to claims 4, 14 and 18, Malik does not disclose an influence function computation part that computes an influence function, the influence function representing a sensitivity with which input data affects a parameter of the target model.
Theodorakopoulos discloses an influence function computation part that computes an influence function, the influence function representing a sensitivity with which input data affects a parameter of the model (See para. [0081], the G.sub.i factors control the weight of the extra regularization term L.sub.aug in the cost function. The higher its value, the higher the influence. This in turn controls how sensitive the optimization process will be to the number of active kernels), wherein the model training part trains the model using the influence function as a regularization term (See para. [0056] and para. [0081], training a model including a regularization term L.sub.aug using gain factors G.sub.i to tune the active kernels).
Therefore, it would have been obvious to a person of ordinary skill in the computer art before the effective filing date of the claimed invention to modify the model of Malik to include a regularization term using a gain obtained by inputting information into a model, as taught by Theodorakopoulos. A skilled artisan would have been motivated to control the inference accuracy of the network in a trade-off between accuracy and algorithm complexity (See Theodorakopoulos, para. [0081]). Both references (Theodorakopoulos and Malik) teach features that are directed to analogous art and to the same field of endeavor, such as training a neural network model. This close relation between the references strongly suggests an expectation of success.
Referring to claim 6, Malik discloses a machine learning system comprising: a server comprising: at least a processor and a memory in circuit communication with the processor, wherein the processor is configured to execute program instructions stored in the memory (See para. [0007], server-side processes are performed remotely on one or more computing systems) to implement:
a federated learning part that trains a target model by exchanging a model update parameter including gradient information with a client by a federated learning (See para. [0060], the federated parameters can be trained remotely on the server and transmitted [e.g., a global model] to client systems, and gradients are calculated locally on the client systems); and a plurality of clients, wherein each of the clients (See para. [0007], para. [0008], the client-side processes are performed locally on one or more computing systems) comprises:
at least a processor and a memory in circuit communication with the processor (See Figure 16, para. [0154] and para. [0155], the system comprises at least a processor and a memory), wherein the processor is configured to execute program instructions stored in the memory to implement: a property classification model training part that trains a classification model (See para. [0076] and Figure 2, the remote NLU module on the server performs domain classification/selection using an intent classifier [e.g., a property classification model]; the intent classifier determines the user intent associated with the user request; there can be one intent classifier for each domain to determine the most possible intents in a given domain; the intent classifier is based on a machine-learning model that takes the domain classification/selection result as input and calculates a probability of the input being associated with a particular predefined intent), the classification model inferring a property of an input data from the gradient information (See para. [0076] and Figure 2, the intent classifier is based on a machine-learning model that takes the domain classification/selection result as input and calculates a probability of the input being associated with a particular predefined intent [e.g., inferring an intent of the input]; the classifier determines categories that describe the user’s intent); and
a target model training part that computes the gradient information of the target model using a training data, the target model and the classification model (See para. [0060] and para. [0096], the output of the NLU model [e.g., the classification model and the NLU’s training data based on user request] is sent to a local reasoning module; the local reasoning module conducts on-device learning that is based on learning algorithms particularly tailored for client systems [e.g., a target model]), and transmits the gradient information to the server (See para. [0060]-para. [0063], the output of the local reasoning module is sent to a dialog arbitrator; the dialog arbitrator aggregates output from both reasoning modules [e.g., the server’s reasoning module and the client’s reasoning module] to select the best reasoning result; the dialog arbitrator sends necessary information regarding the user input to the agents on the server side in order to execute tasks in response to the user input), wherein the property of the input data that the classification model infers can be set by each client (See para. [0094], para. [0123] and Figure 12, for each client system, the distributed data can be in a plurality of examples in a local data store of the client system; each input example includes features and labels; both features and labels are known by the client system; the machine-learning model having model parameters w∈ℝ.sup.d can be configured to predict one or more candidate labels ŷ=ƒ(x; w) corresponding to a given input example having a feature x. In particular embodiments, the machine-learning model may be a neural network model configured to predict, in response to one or more n-grams received by a client system), and the property classification model training part trains the classification model using the target model and a second training data labelled with a teacher label regarding the property of the input data (See para. [0123] and Figure 12, implementing personalized federated learning in classification model training includes a client system receiving a current version of a global neural network model comprising a plurality of federated model parameters [e.g., a classification model]; the client system accessing, from a local data store, a plurality of examples and a local personalization model comprising a plurality of local model parameters [e.g., a target model], wherein each of the plurality of examples comprises one or more features and one or more labels [e.g., second training data with labels]; the client system training the global neural network model [e.g., the classification model] and the local personalization model [e.g., the target model] together on the plurality of examples to generate a plurality of updated federated model parameters and a plurality of updated local model parameters; and the client system storing the trained local personalization model comprising the plurality of updated local model parameters and sending the trained global neural network model comprising the plurality of updated federated model parameters to one or more servers).
Malik does not explicitly disclose a regularization term using a gain obtained by inputting information into a model.
Theodorakopoulos discloses an influence function computation part that computes an influence function, the influence function representing a sensitivity with which input data affects a parameter of the model (See para. [0081], the G.sub.i factors control the weight of the extra regularization term L.sub.aug in the cost function. The higher its value, the higher the influence. This in turn controls how sensitive the optimization process will be to the number of active kernels), wherein the model training part trains the model using the influence function as a regularization term (See para. [0056] and para. [0081], training a model including a regularization term L.sub.aug using gain factors G.sub.i to tune the active kernels).
Therefore, it would have been obvious to a person of ordinary skill in the computer art before the effective filing date of the claimed invention to modify the model of Malik to include a regularization term using a gain obtained by inputting information into a model, as taught by Theodorakopoulos. A skilled artisan would have been motivated to control the inference accuracy of the network in a trade-off between accuracy and algorithm complexity (See Theodorakopoulos, para. [0081]). Both references (Theodorakopoulos and Malik) teach features that are directed to analogous art and to the same field of endeavor, such as training a neural network model. This close relation between the references strongly suggests an expectation of success.
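For illustration only, the server/client exchange mapped in the claim 6 discussion above (a global model sent to clients, gradients computed locally on each client and returned to the server) can be sketched as one round of gradient averaging. This is a hedged sketch, not the method of Malik or of the claims: the linear model, the squared-error loss, the plain averaging rule, and the names `client_gradient` and `server_round` are all assumptions.

```python
import numpy as np

def client_gradient(w_global, x, y):
    """Client side: gradient of 0.5 * mean((x @ w - y)**2) on local data only."""
    residual = x @ w_global - y
    return x.T @ residual / len(y)

def server_round(w_global, client_datasets, lr=0.1):
    """Server side: collect one gradient per client, average, and update.

    `client_datasets` is a hypothetical list of (x, y) pairs, one per client.
    Only gradients reach the server; raw examples stay with each client.
    """
    grads = [client_gradient(w_global, x, y) for x, y in client_datasets]
    mean_grad = np.mean(grads, axis=0)
    return w_global - lr * mean_grad
```

Because the transmitted gradients are functions of each client's local data, they are the quantity from which a property of that data could in principle be inferred, which is the concern the claimed classification model addresses.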
As to claim 9, Malik discloses wherein the target model training part comprises a classifier, the classifier judging whether or not data corresponding to the gradient information is data having a property that can be set for each client, based on an output of the classification model, trains the classifier to maximize an output of the classifier, and trains the target model using output of the classifier after training as the regularization term (See para. [0096] and para. [0111], “each selected client system S.sub.m.sup.t may train the current model on the locally stored plurality of examples (x.sub.k,y.sub.k) by inputting examples (x.sub.k,y.sub.k) into the model to generate, for each example, one or more candidate labels ŷ.sub.k corresponding to the input example, and then generating a plurality of updated model parameters w.sup.t+1=ƒ(x.sub.k,y.sub.k;w.sup.t) based on the generated candidate labels ŷ.sub.k and the known labels y.sub.k. In particular embodiments, generating the updated model parameters w.sup.t+1 may be based on, for each example (x.sub.k,y.sub.k), a measure of error between the one or more candidate labels ŷ.sub.k and the known label y.sub.k. In particular embodiments, the training may be based on one or more iterations of gradient descent, or a stochastic variation of gradient descent (SGD). In particular embodiments, the updated model parameters w.sup.t+1 may be produced by minimizing a loss function ƒ(x.sub.k,y.sub.k;w.sup.t)”).
As to claim 10, Malik does not explicitly disclose an influence function representing a sensitivity that an input data gives to a parameter of the target model.
Theodorakopoulos discloses an influence function computation part that computes an influence function, the influence function representing a sensitivity that an input data gives to a parameter of the target model (See para. [0081], the G.sub.i factors control the weight of the extra regularization term L.sub.aug in the cost function. The higher its value, the higher the influence. This in turn controls how sensitive the optimization process will be to the number of active kernels), wherein the target model training part trains the target model using the influence function as a regularization term (See para. [0056] and para. [0081], training a model including a regularization term L.sub.aug using gain factors G.sub.i to tune the active kernels).
Therefore, it would have been obvious to a person of ordinary skill in the computer art before the effective filing date of the claimed invention to modify the model of Malik to include a regularization term using a gain obtained by inputting information into a model, as taught by Theodorakopoulos. A skilled artisan would have been motivated to control the inference accuracy of the network in a trade-off between accuracy and algorithm complexity (See Theodorakopoulos, para. [0081]). Both references (Theodorakopoulos and Malik) teach features that are directed to analogous art and to the same field of endeavor, such as training a neural network model. This close relation between the references strongly suggests an expectation of success.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
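For illustration only, the date arithmetic in the preceding paragraph can be sketched as follows. This is a rough sketch under stated assumptions, not a docketing tool: the function names and example dates are hypothetical, a period ending on a day that does not exist in the target month is assumed to end on the last day of that month, and weekends, holidays, and extension fees are ignored.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Return the same day-of-month `months` later, clamped to the month's end."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def shortened_period_expiry(mailing, first_reply=None, advisory=None):
    """Expiry of the shortened statutory period after a final action.

    Default: three months from the mailing date. If a first reply was filed
    within two months and the advisory action is mailed after the three-month
    date, the period expires on the advisory mailing date, but never later
    than six months from the mailing date of the final action.
    """
    two = add_months(mailing, 2)
    three = add_months(mailing, 3)
    six = add_months(mailing, 6)
    if (first_reply is not None and advisory is not None
            and first_reply <= two and advisory > three):
        return min(advisory, six)
    return three
```

For a hypothetical mailing date of `date(2025, 1, 15)`, calling `shortened_period_expiry(date(2025, 1, 15))` returns `date(2025, 4, 15)`.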
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YUK TING CHOI whose telephone number is (571)270-1637. The examiner can normally be reached Monday-Friday 9am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, AMY NG, can be reached at (571)270-1698. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/YUK TING CHOI/Primary Examiner, Art Unit 2164