Last updated: July 17, 2026
Application No. 18/185,550
MACHINE LEARNING MODEL TRAINING METHOD AND RELATED DEVICE

Final Rejection §103
Filed
Mar 17, 2023
Priority
Sep 18, 2020 — CN 202010989062.5 +1 more
Examiner
TRAN, TAN H
Art Unit
2141
Tech Center
2100 — Computer Architecture & Software
Assignee
Huawei Technologies Co., Ltd.
OA Round
2 (Final)
This examiner grants 60% of cases after interview

— +32.6% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 315 resolved cases, 2023–2026
Examiner Intelligence

TRAN, TAN H View full profile →
Grants 60% of resolved cases
Career Allowance Rate
190 granted / 315 resolved
+5.3% vs TC avg
Strong +33% interview lift
Without
With
+32.6%
Interview Lift
resolved cases with interview
Typical timeline
3y 6m
Avg Prosecution
33 currently pending
Career history
371
Total Applications
across all art units
Statute-Specific Performance

§101
2.6%
-37.4% vs TC avg
§103
92.4%
+52.4% vs TC avg
§102
4.6%
-35.4% vs TC avg
§112
0.2%
-39.8% vs TC avg
Black line = Tech Center average estimate • Based on career data from 315 resolved cases
Office Action

§103
Notice of Pre-AIA  or AIA  Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

DETAILED ACTION
2.	This Office Action is sent in response to Applicant’s Communication received on 04/28/2026 for application number 18/185,550. 

Response to Amendments
3.	The Amendment filed 04/28/2026 has been entered. Claims 1-8, 10-17, 19, and 20 have been amended. Claims 1-20 remain pending in the application. 

4.	Applicant’s amendments to the claims 3, 11-12, and 19-20 have been fully considered and are persuasive. The objections to these claims are respectfully withdrawn.

Response to Arguments
	Applicant argues that Reisser fails to disclose the amended independent claims 1, 12, 19, and 20. However, the argument is moot since this is a newly presented limitation, thus changes the scope of the claim. However, a newly found reference, Benyahia, is applied.

Claim Rejections – 35 USC § 103
5.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


6.	Claims 1, 2, 7, 12-13, 15, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Reisser et al. (U.S. Patent Application Pub. No. US 20230118025 A1) in view of Benyahia et al. (U.S. Patent Application Pub. No. US 20200104688 A1).

	Claim 1: Reisser teaches a machine learning model training method (i.e. federated learning involves collaborative training of neural network models across multiple users, without the need to gather the data at a central location; para. [0065]) performed by a first client device (i.e. A client or user may receive a neural network model at a local device such as a smartphone; para. [0092, 0093]), the machine learning model training method comprises a plurality of rounds of iteration, and one of the plurality of rounds of iteration (i.e. multiple gradient updates for parameters w in the inner optimization of objective may be performed for each shard S, thus obtaining local models with parameters Ws. The multiple gradient updates may be referred to as local epochs such as an amount of data passes through the entire local data set, with an abbreviation of E. Each shard may then communicate data corresponding to the local model Ws to the server. In turn, the server updates the global model at round t by averaging the parameters of the local model. This may be referred to as federated averaging; para. [0070, 0089-0091], federated training process in which local updates are repeatedly produced and aggregated, and the server updates the global model “at round t”) comprises:
obtaining at least one first machine learning model (i.e. receives a neural network model from a server; para. [0092]), wherein the at least one first machine learning model is obtained based on a data feature of a first data set stored in the first client device (i.e. At block 706, the method 700 selects one or more of the specialized models based in part on a characteristic associated with the local dataset. A client or user has a different set of parameters and may select which experts to use based on local data; para. [0093-0095], candidate models being selected using that local data characteristic);
performing a training operation on the at least one first machine learning model by using the first data set, to obtain at least one trained first machine learning model (i.e. At block 708, the method 700 generates a personalized model by fine tuning the neural network model based the selected one or more specialized models and the local dataset; para. [0070, 0095], fine-tuning/multiple gradient updates on the local dataset are training operations producing a trained local model); and
sending at least one updated neural network module comprised in the at least one trained first machine learning model to a server communicably coupled to the first client device (i.e. Each shard may then communicate data corresponding to the local model Ws to the server; para. [0070, 0082, 0089], local model information/expert specific updates from client/shard to the central server), wherein the updated neural network module is used by the server to update weight parameters of a plurality of neural network modules stored in the server (i.e. At block 604, the method 600 computes a global update for the neural network model based on the local updates from the subset of the multiple users. In some aspects, the global update is computed by aggregating the local updates received from the subset of the multiple user. In some aspects, the neural network model comprises multiple independent neural network models; para. [0070, 0082, 0090], the server aggregates client updates and updates parameters (weights) accordingly). 
Reisser does not explicitly teach selecting at least one neural network module from each of at least two groups of neural network modules to construct the at least one first machine learning model, and wherein different neural network modules in a same group of the at least two groups have a same function.
However, Benyahia teaches selecting at least one neural network module from each of at least two groups of neural network modules (i.e. At each of these nodes, a plurality of options for the functionality of that node may be provided … The options for each node may be considered to be represented as a series of candidate nodes clustered together at a selected location in the graph (e.g., in a region where the selected node would be in the final neural network) … two candidate nodes 21 a, 21 b are provided as candidates for the first node 21 of the second layer 20; para. [0075, 0078, 0079, 0082) to construct the at least one first machine learning model (i.e. The example method further comprises identifying a preferred model for the neural network comprising a selection of nodes, edges and associated weightings from the computational graph. The preferred model is identified based on an analysis of the first and second trained candidate models; para. [0026]), and wherein different neural network modules in a same group of the at least two groups have a same function (i.e. When which nodes to select for the network 100 are determined, a computational graph may be constructed. The computational graph comprises a plurality of nodes connected by a plurality of edges. Each node is configured to receive at least one item of input data from a preceding node connected to it via an edge. Each node is arranged to process input data by applying a weighting to the input data and performing an operation on the weighted input data to provide output data. Different nodes may perform different operations, e.g., they may have different functionalities; para. [0073, 0078]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the invention of Reisser to include the feature of Benyahia. One would have been motivated to make this modification because it improves neural network model selection/training when different model architectures may perform better for different data/tasks.

Claim 2: Reisser and Benyahia teach the method according to claim 1. Reisser further teaches wherein the plurality of neural network modules are configured to construct at least two second machine learning models (i.e. At block 604, the method 600 computes a global update for the neural network model based on the local updates from the subset of the multiple users. In some aspects, the global update is computed by aggregating the local updates received from the subset of the multiple user. In some aspects, the neural network model comprises multiple independent neural network models; para. [0090]), and the at least one first machine learning model is selected from the at least two second machine learning models (i.e. The neural network model is collaboratively trainable across multiple clients via a set of specialized neural network models; para. [0092, 0094]).

Claim 7: Reisser and Benyahia teach the method according to claim 1. Reisser further teaches wherein the method further comprises: receiving a selector sent by the server, wherein the selector is a neural network configured to select, from the plurality of neural network modules, at least one neural network module that matches the data feature of the first data set (i.e. FIG. 7 is a flow diagram illustrating a method 700 for generating a personalized neural network model, according to aspects of the present disclosure. At block 702, the method 700 receives a neural network model from a server. The neural network model is collaboratively trainable across multiple clients via a set of specialized neural network models. Each specialized neural network is associated with a subset of a first dataset. A client or user may receive a neural network model at a local device such as a smartphone, for example. As described, a mixture of experts may model a data set where different subsets of the data exhibit different relationships between input x and output y; para. [0092]); inputting training data into the selector based on the first data set, to obtain indication information output by the selector, wherein the indication information comprises a probability that each of the plurality of neural network modules is selected (i.e. the global parameters of an expert are trained using all data points assigned to that expert across all shards to enable learning more robust features. The robustness of the expert’s features may serve as conditions for the gating function rather than training an entirely separate model for pθs (x|s). Given a set of intermediary features hs(x) of expert k, a local vector πs ∈ ℝK, the global parameters of an expert are trained using all data points assigned to that expert across all shards to enable learning more robust features. The robustness of the expert’s features may serve as conditions for the gating function rather than training an entirely separate model for pθs (x|s). Given a set of intermediary features hs(x) of expert k, a local vector πs ∈ ℝK; para. [0085]), and the indication information indicates a neural network module that constructs at least one first neural network (i.e. The neural network model is collaboratively trainable across multiple clients via a set of specialized neural network models. Each specialized neural network being associated with a subset of a first dataset; para. [0010]); and receiving, from the server, the neural network module that constructs the at least one first neural network (i.e. FIG. 7 is a flow diagram illustrating a method 700 for generating a personalized neural network model, according to aspects of the present disclosure. At block 702, the method 700 receives a neural network model from a server; para. [0092]).

Claim 12: Reisser teaches a machine learning model training method (i.e. federated learning involves collaborative training of neural network models across multiple users, without the need to gather the data at a central location; para. [0065]), performed by a server (i.e. a server; para. [0010]), the machine learning model training method comprises a plurality of rounds of iteration, and one of the plurality of rounds of iteration (i.e. multiple gradient updates for parameters w in the inner optimization of objective may be performed for each shard S, thus obtaining local models with parameters Ws. The multiple gradient updates may be referred to as local epochs such as an amount of data passes through the entire local data set, with an abbreviation of E. Each shard may then communicate data corresponding to the local model Ws to the server. In turn, the server updates the global model at round t by averaging the parameters of the local model. This may be referred to as federated averaging; para. [0070, 0089-0091], federated training process in which local updates are repeatedly produced and aggregated, and the server updates the global model “at round t”) comprises:
obtaining at least one first machine learning model (i.e. receives a neural network model from a server; para. [0092]) corresponding to a first client device, wherein the first client device is one of a plurality of client devices communicably coupled to the server (i.e. receiving a neural network model from a server. The neural network model is collaboratively trainable across multiple clients via a set of specialized neural network models; para. [0010]), and the at least one first machine learning model corresponds to a data feature of a first data set stored in the first client device (i.e. At block 706, the method 700 selects one or more of the specialized models based in part on a characteristic associated with the local dataset. A client or user has a different set of parameters and may select which experts to use based on local data; para. [0093-0095], candidate models being selected using that local data characteristic);
sending the at least one first machine learning model to the first client device (i.e. FIG. 7 is a flow diagram illustrating a method 700 for generating a personalized neural network model, according to aspects of the present disclosure. At block 702, the method 700 receives a neural network model from a server. The neural network model is collaboratively trainable across multiple clients via a set of specialized neural network models. Each specialized neural network is associated with a subset of a first dataset. A client or user may receive a neural network model at a local device such as a smartphone; para. [0092]), wherein the at least one first machine learning model indicates the first client device to perform a training operation on the at least one first machine learning model by using the first data set, to obtain at least one trained first machine learning model (i.e. At block 708, the method 700 generates a personalized model by fine tuning the neural network model based the selected one or more specialized models and the local dataset; para. [0070, 0095], fine-tuning/multiple gradient updates on the local dataset are training operations producing a trained local model); and
receiving, from the first client device, at least one updated neural network module comprised in the at least one trained first machine learning model, and updating weight parameters of a plurality of neural network modules stored based on the at least one updated neural network module (i.e. At block 604, the method 600 computes a global update for the neural network model based on the local updates from the subset of the multiple users. In some aspects, the global update is computed by aggregating the local updates received from the subset of the multiple users. In some aspects, the neural network model comprises multiple independent neural network models; para. [0070, 0082, 0090], the server aggregates client updates and updates parameters (weights) accordingly).
Reisser does not explicitly teach constructed by selecting at least one neural network module from each of at least two groups of neural network modules, wherein different neural network modules in a same group of the at least two groups have a same function
However, Benyahia teaches constructed by selecting (i.e. The example method further comprises identifying a preferred model for the neural network comprising a selection of nodes, edges and associated weightings from the computational graph. The preferred model is identified based on an analysis of the first and second trained candidate models; para. [0026]) at least one neural network module from each of at least two groups of neural network modules (i.e. At each of these nodes, a plurality of options for the functionality of that node may be provided … The options for each node may be considered to be represented as a series of candidate nodes clustered together at a selected location in the graph (e.g., in a region where the selected node would be in the final neural network) … two candidate nodes 21 a, 21 b are provided as candidates for the first node 21 of the second layer 20; para. [0075, 0078, 0079, 0082), wherein different neural network modules in a same group of the at least two groups have a same function (i.e. When which nodes to select for the network 100 are determined, a computational graph may be constructed. The computational graph comprises a plurality of nodes connected by a plurality of edges. Each node is configured to receive at least one item of input data from a preceding node connected to it via an edge. Each node is arranged to process input data by applying a weighting to the input data and performing an operation on the weighted input data to provide output data. Different nodes may perform different operations, e.g., they may have different functionalities; para. [0073, 0078]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the invention of Reisser to include the feature of Benyahia. One would have been motivated to make this modification because it improves neural network model selection/training when different model architectures may perform better for different data/tasks.

Claim 13: Reisser and Benyahia teach the method according to claim 12. Reisser further teaches wherein a plurality of neural network modules are stored in the server (i.e. the central server may aggregate the expert specific updates to generate a global update; para. [0082]) and configured to construct at least two second machine learning models (i.e. instead of learning a single global model, S individual models are learned; para. [0089]), and the at least one first machine learning model is selected from the at least two second machine learning models (i.e.  the method includes selecting one or more of the specialized models based on a characteristic associated with the local dataset; para [0010]).

Claim 15: Reisser and Benyahia teach the method according to claim 12. Reisser further teaches wherein the plurality of neural network modules stored in the server are neural network modules (i.e. The expert specific updates may be supplied to a central server; para. [0082]), and the method further comprises: receiving first identification information sent by the first client device (i.e. The method includes receiving a local update of the neural network model from a subset of multiple users. Each of the local updates is related to one or more subsets of a dataset and includes an indication of the one or more subsets of the dataset to which each local update relates; para. [0006]), wherein the first identification information is identification information of a first neural network or the a neural network module that constructs the first neural network (i.e. The method includes receiving a local update of the neural network model from a subset of multiple users. Each of the local updates is related to one or more subsets of a dataset and includes an indication of the one or more subsets of the dataset to which each local update relates; para. [0006]); and the sending the at least one first machine learning model to the first client device comprises: sending, to the first client device, the first neural network to which first identification information points, or the at least one neural network module that constructs the first neural network and to which the first identification information points (i.e. At block 606, the method 600 transmits the global update to the subset of the multiple users. For example, as described, the global parameters of an expert are trained using all data points assigned to that expert across all shards to enable learning more robust features; para. [0081, 0091]).

Claims 19-20 are similar in scope to Claims 1, 12 and are rejected under a similar rationale.


7.	Claims 3 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Reisser in view of Benyahia, and further in view of Ghanta et al. (U.S. Patent Application Pub. No. US 20190377984 A1).

Claim 3: Reisser and Benyahia teach the method according to claim 2. Reisser further teaches (i.e. The method includes receiving a neural network model from a server. The neural network model is collaboratively trainable across multiple clients via a set of specialized neural network models; para. [0010]) the first client device stores a first adaptation relationship (i.e. expert models learn to specialize in regions of the input space such that for a given expert, each client’s progress on that expert is aligned. Each client learns which experts are relevant for its shard or portion of the data; para. [0068]), the first adaptation relationship comprises a plurality of adaptation values (i.e. a mixture of experts may model a data set where different subsets of the data exhibit different relationships between input x and output y. Rather than training a single global model to fit this relationship at each client throughout the network, each expert k performs on a different subset of the input space. In some aspects, each expert may specialize on a region of the data set D; para. [0092]), and each of the adaptation values between the first data set and a second neural network module (i.e. expert models learn to specialize in regions of the input space such that for a given expert, each client’s progress on that expert is aligned. Each client learns which experts are relevant for its shard or portion of the data; para. [0068, 0085]); before the obtaining the at least one first machine learning model, the method further comprises: receiving the plurality of neural network modules sent by the server (i.e. block 702, the method 700 receives a neural network model from a server. The neural network model is collaboratively trainable across multiple clients via a set of specialized neural network models; para. [0092]); and the obtaining the at least one first machine learning model comprises: selecting the at least one neural network module from each of the at least two groups of neural network modules based on the first adaptation relationship (i.e. At block 706, the method 700 selects one or more of the specialized models based in part on a characteristic associated with the local dataset. A client or user has a different set of parameters and may select which experts to use based on local data; para. [0093-0095].
	Reisser does not explicitly teach the adaptation values indicates an adaptation degree, selecting the at least one neural network module from each of the at least two groups of neural network modules based on the first adaptation relationship, wherein the at least one neural network module comprises a neural network with a high adaptation value with the first data set.
	However, Ghanta teaches the first client device stores a first adaptation relationship, the first adaptation relationship comprises a plurality of adaptation values (i.e. A suitability score that satisfies (is equal to or greater than) a suitability threshold indicates that the training data set and the machine learning model trained with the training data set are suitable, accurate, or the like for the inference data set; para. [0095]), and each of the adaptation values indicates an adaptation degree between the first data set and a second neural network (i.e. A suitability score that satisfies (is equal to or greater than) a suitability threshold indicates that the training data set and the machine learning model trained with the training data set are suitable, accurate, or the like for the inference data set; para. [0095]); selecting the at least one neural network module from each of the at least two groups of neural network modules based on the first adaptation relationship, wherein the at least one neural network module comprises a neural network with a high adaptation value with the first data set (i.e. the action module 308 is configured to perform an action related to the machine learning system 200 in response to the suitability score satisfying (is equal to or less than) an unsuitability threshold, or in response to the suitability score not satisfying (e.g., is not equal to or greater than) a suitability threshold, which may be the same value as the unsuitability threshold. For example, the score module 306 may calculate a suitability score of 0.75 (or 0.25 unsuitability score) on a scale of 0 to 1, and if the unsuitability threshold is 0.3, or if the suitability threshold is 0.7, then the action module 308 may determine that the training data set, and by extension the machine learning model, is suitable for the particular inference data set; para. [0111, 0112]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the combination of Reisser and Benyahia to include the feature of Ghanta. One would have been motivated to make this modification because improves robustness by detecting mismatch between training characteristics and local data and using that to guide expert selection in a modular ensemble.

Claim 14: Reisser and Benyahia teach the method according to claim 13. Reisser further teaches (i.e. each of the experts may be implemented as a separate, independent artificial neural network; para. [0089]) the first client device stores a first adaptation relationship (i.e. a mixture of experts may model a data set where different subsets of the data exhibit different relationships between input x and output y. Rather than training a single global model to fit this relationship at each client throughout the network, each expert k performs on a different subset of the input space. In some aspects, each expert may specialize on a region of the data set D; para. [0092]), the first adaptation relationship comprises a plurality of adaptation values (i.e. expert models learn to specialize in regions of the input space such that for a given expert, each client’s progress on that expert is aligned. Each client learns which experts are relevant for its shard or portion of the data; para. [0068, 0085]), and the method further comprises: receiving an adaptation value that is between the first data set and at least one second neural network and that is sent by the first client device (i.e. The expert specific updates may be supplied to a central server; para. [0082]); and updating a second adaptation relationship (i.e. the central server may aggregate the expert specific updates to generate a global update; para. [0082]).
Reisser does not explicitly teach the first adaptation relationship comprises a plurality of adaptation values, and each of the adaptation values indicates an adaptation degree between the first data set and a second neural network; the obtaining the at least one first machine learning model comprises: selecting the at least one neural network module from each of the at least two groups of neural network module based on the second adaptation relationship, wherein the at least one neural network module comprises a neural network with a high adaptation value with the first data set.
However, Ghanta teaches the first adaptation relationship comprises a plurality of adaptation values, and each of the adaptation values indicates an adaptation degree between the first data set and a second neural network (i.e. A suitability score that satisfies (is equal to or greater than) a suitability threshold indicates that the training data set and the machine learning model trained with the training data set are suitable, accurate, or the like for the inference data set; para. [0095]); the obtaining the at least one first machine learning model comprises: selecting the at least one neural network module from each of the at least two groups of neural network module based on the second adaptation relationship, wherein the at least one neural network module comprises a neural network with a high adaptation value with the first data set (i.e. the action module 308 is configured to perform an action related to the machine learning system 200 in response to the suitability score satisfying (is equal to or less than) an unsuitability threshold, or in response to the suitability score not satisfying (e.g., is not equal to or greater than) a suitability threshold, which may be the same value as the unsuitability threshold. For example, the score module 306 may calculate a suitability score of 0.75 (or 0.25 unsuitability score) on a scale of 0 to 1, and if the unsuitability threshold is 0.3, or if the suitability threshold is 0.7, then the action module 308 may determine that the training data set, and by extension the machine learning model, is suitable for the particular inference data set; para. [0111, 0112]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the combination of Reisser and Benyahia to include the feature of Ghanta. One would have been motivated to make this modification because improves robustness by detecting mismatch between training characteristics and local data and using that to guide expert selection in a modular ensemble.

8.	Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Reisser in view of Benyahia, Ghanta, and further in view of Shen et al. (U.S. Patent Application Pub. No. US 20210393229 A1).

Claim 4: Reisser, Benyahia, and Ghanta teach the method according to claim 3. Reisser further teaches a first loss function (i.e. Such clients may perform a series of mini-batch gradient updates with the data from their shard Ds on a local loss function, which may involve each client moving in possibly different directions in the parameter space; para. [0045, 0081]); and the first loss function indicates a similarity between a prediction result of first data and a correct result of the first data (i.e. an error may be calculated between the output 222 and a target output. The target output is the ground truth of the image 226 (e.g., “sign” and “60”); para. [0045]), the prediction result of the first data is obtained based on the second neural network (i.e. a forward pass may then be computed to produce an output 222; para. [0042-0044]), and the first data and the correct result of the first data are obtained based on the first data set (i.e. federated learning involves learning a server model with parameters W such as a neural network with a data set of N data points D = {(x1,y1), ..., (xN,yN)} that is distributed across shards S or portions; para. [0069, 0070]).
Reisser does not explicitly teach wherein an adaptation value corresponds to a function value of a first loss function, and a smaller function value of the first loss function indicates a larger adaptation value between the first data set and the second network.
However, Shen teaches wherein an adaptation value between the first data set and the second neural network corresponds to a function value of a first loss function, and a smaller function value of the first loss function indicates a larger adaptation value between the first data set and the second neural network module (i.e. the best checkpoint model with the smallest validation loss is selected as final model 120; para. [0046]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the combination of Reisser, Benyahia, and Ghanta to include the feature of Shen. One would have been motivated to make this modification because provides a concrete for quantifying model to data fit and selecting the most suitable candidate model for a client dataset.

9.	Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Reisser in view of Benyahia, Ghanta, and further in view of Szeto et al. (U.S. Patent Application Pub. No. US 20180018590 A1).

Claim 5: Reisser, Benyahia, and Ghanta teach the method according to claim 3. Reisser does not explicitly teach wherein an adaptation value between the first data set and the second neural network module corresponds to a first similarity between the second neural network module and a third neural network, a larger first similarity indicates a larger adaptation value between the first data set and the second neural network module and the third neural network is a neural network with highest accuracy of outputting a prediction result in a previous round of iteration.
However, Szeto teaches wherein an adaptation value between the first data set and the second neural network module corresponds to a first similarity between the second neural network module and a third neural network (i.e. The similarity between trained proxy model 270 and trained actual model 240 can be measured through various techniques by modeling engine 226 calculating model similarity score 280 as a function of proxy model parameters 275 and actual model parameters 245. The resulting model similarity score 280 is a representation of how similar the two models are, at least to within similarity criteria; para. [0079]), a larger first similarity indicates a larger adaptation value between the first data set and the second neural network module and the third neural network is a neural network with highest accuracy of outputting a prediction result in a previous round of iteration (i.e. Operation 680, similar to operation 560 of FIG. 6, includes the global modeling engine calculating a model similarity score of the trained proxy model relative to the trained actual model(s) as a function of the proxy model parameters and the actual proxy model parameters. The actual proxy model parameters can be obtained along with the salient private data features as discussed with respect to operation 640 or could be obtained upon sending a request to the proxy data's modeling engine. Should the model similarity score fail to satisfy similarity requirements, then the global modeling engine can repeat operations 660 through 680 until a satisfactorily similar trained proxy model is generated; para. [0118]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the combination of Reisser, Benyahia, and Ghanta to include the feature of Szeto. One would have been motivated to make this modification because provides a concrete way to quantify how closely a candidate neural network matches a reference neural network and to use that similarity as an adaptation value for selecting candidate neural networks.

Claim 6: Reisser, Benyahia, Ghanta, and Szeto teach the method according to claim 5. Reisser does not explicitly teach wherein the first similarity between the second neural network module and the third neural network is determined based on: inputting same data to the second neural network module and the third neural network, and comparing a similarity between output data of the second neural network and output data of the third neural network; or calculating a similarity between a weight parameter matrix of the second neural network module and a weight parameter matrix of the third neural network.
However, Szeto further teaches wherein the first similarity between the second neural network module and the third neural network is determined based on: inputting same data to the second neural network module and the third neural network, and comparing a similarity between output data of the second neural network and output data of the third neural network; or calculating a similarity between a weight parameter matrix of the second neural network module and a weight parameter matrix of the third neural network (i.e. Operation 680, similar to operation 560 of FIG. 6, includes the global modeling engine calculating a model similarity score of the trained proxy model relative to the trained actual model(s) as a function of the proxy model parameters and the actual proxy model parameters. The actual proxy model parameters can be obtained along with the salient private data features as discussed with respect to operation 640 or could be obtained upon sending a request to the proxy data's modeling engine. Should the model similarity score fail to satisfy similarity requirements, then the global modeling engine can repeat operations 660 through 680 until a satisfactorily similar trained proxy model is generated; para. [0118]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the combination of Reisser, Benyahia, and Ghanta to include the feature of Szeto. One would have been motivated to make this modification because provides a concrete way to quantify how closely a candidate neural network matches a reference neural network and to use that similarity as an adaptation value for selecting candidate neural networks.

10.	Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Reisser in view of Benyahia, and further in view of Vierck et al. (U.S. Patent Application Pub. No. US 20230316072 A1).

Claim 8: Reisser and Benyahia teach the method according to claim 1. Reisser further teaches wherein (i.e. neural network model; para. [0089]) the plurality of neural network modules stored in the server are neural network modules (i.e. Each of the K experts specialized on a region of the input dataset D. Each of the experts may be implemented as a separate, independent artificial neural network, for example. Each of the K experts may correspond to one or more of the S models; para. [0082, 0089]), and wherein after the obtaining the at least one first machine learning model, the method further comprises: calculating an adaptation value between the first data set and each of at least one first neural network (i.e. A gating function controls selection of an expert for given data point of the input dataset; para. 0089]), wherein the first data set comprises a plurality of pieces of first training data (i.e. A mixture of expert models for data point (x, y); para. [0072]).
Reisser does not explicitly teach a larger adaptation value between the first training data and each of the at least one first neural network indicates a greater degree of modification of a weight parameter of the first neural network in a process of training the first neural network by using the first training data.
However, Vierck teaches a larger adaptation value between the first training data and each of the at least one first neural network indicates a greater degree of modification of a weight parameter (i.e. sample weighting can include telling the model to increase or decrease an amount of loss produced by a given piece of example data … the computed loss values are multiplied by the weights to determine with a set of final loss values; para. [0042, 0043]) of the first neural network in a process of training the first neural network by using the first training data (i.e. the new set of computed loss values to the set of weights such the set of weights is changed from the first state into a second state; para. [0011]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the combination of Reisser and Benyahia to include the feature of Vierck. One would have been motivated to make this modification because it provides predictable mechanisms for controlling the influence of individual training samples on parameter updates.

11.	Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Reisser in view of Benyahia, Vierck, and further in view of Balachandar et al. (U.S. Patent Application Pub. No. US 20210049473 A1).

Claim 9: Reisser, Benyahia, and Vierck teach the method according to claim 8. Reisser further teaches wherein the calculating an adaptation value between the first data set and each of the at least one first neural network comprises: clustering the first data set to obtain at least two data subsets, wherein a first data subset is a subset of the first data set, and the first data subset is one of the at least two data subsets (i.e. The mixture of experts may model a dataset where different subsets of the data exhibit different relationships between input x and output y; para. [0073, 0074]); and generating an adaptation value between the first data subset and one first neural network based on the first data subset and a first loss function (i.e. This objective corresponds to empirical risk minimization over the joint data set D with a loss L(·) for each data point; para. [0070]), wherein the first loss function indicates a similarity between a prediction result of first data and a correct result of the first data, the prediction result of the first data is obtained based on the first neural network (i.e. The mixture of experts may model a dataset where different subsets of the data exhibit different relationships between input x and output y; para. [0070, 0072, 0073]), the first data and the correct result of the first data are obtained based on the first data subset, and the adaptation value between the first data subset and the first neural network is determined as an adaptation value between each piece of data in the first data subset and the first neural network (i.e. The gating function models the decision boundary between input regions, assigning data points from subsets of the input region to their respective experts; para. [0070-0073]).
Reisser does not explicitly teach wherein a smaller function value of the first loss function indicates a larger adaptation value between the first data subset and the first neural network.
However, Balachandar teaches wherein a smaller function value of the first loss function indicates a larger adaptation value between the first data subset and the first neural network (i.e. some embodiments terminate model learning early, if an amount of iterations or epochs pass without an improvement in validation loss (e.g., model learning terminates if 4000 iterations and/or 20 epochs pass without an improvement in validation loss); para. [0042]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the combination of Reisser, Benyahia, and Vierck to include the feature of Balachandar. One would have been motivated to make this modification because it ensures the adaptation value directly reflects real predictive quality of the neural network on the client’s data.

12.	Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Reisser in view of Benyahia, and further in view of Fukuda et al. (U.S. Patent Application Pub. No. US 20200034703 A1).

Claim 10: Reisser and Benyahia teach the method according to claim 1. Reisser further teaches wherein the plurality of neural network modules stored in the server are neural network modules (i.e. Each of the experts may be implemented as a separate, independent artificial neural network; para. [0072]), and the performing a training operation on the at least one first machine learning model by using the first data set (i.e. multiple gradient updates for parameters w in the inner optimization of objective may be performed for each shard S, thus obtaining local models with parameters Ws; para. [0070]) comprises: performing a training operation on each of the at least one neural network module based on a second loss function by using the first data set (i.e. multiple gradient updates for parameters w in the inner optimization of objective may be performed for each shard S, thus obtaining local models with parameters Ws; para. [0070]), wherein the first data set comprises a plurality of pieces of first training data (i.e. A mixture of expert models for data point (x, y); para. [0072]), wherein the second loss function indicates a similarity between a first prediction result and a correct result of the first training data (i.e. an error may be calculated between the output 222 and a target output; para. [0045, 0046]); the first prediction result is a prediction result that is of the first training data and that is output by the neural network module after the first training data is input into the neural network module (i.e. During training, the DCN 200 may be presented with an image, such as the image 226 of a speed limit sign, and a forward pass may then be computed to produce an output 222; para. [0042]).
Reisser does not explicitly teach wherein the second loss function further indicates a similarity between the first prediction result and a second prediction result; the second prediction result is a prediction result that is of the first training data and that is output by a fourth neural network after the first training data is input into the fourth neural network; and the fourth neural network is a first neural network on which no training operation is performed.
However, Fukuda teaches wherein the second loss function further indicates a similarity between the first prediction result and a second prediction result (i.e. the student training section 150 may train the student neural network, at block 340, such that soft label errors between (1) a soft label output generated by the student neural network in response to receiving the teacher input data (e.g., Input Data 1) and (2) the soft label output generated by the selected teacher neural network (e.g., Teacher NN1) in response to receiving the same teacher input data, is minimized; para. [0050, 0053]); the second prediction result is a prediction result that is of the first training data and that is output by a fourth neural network after the first training data is input into the fourth neural network (i.e. inputting the input data into the teacher neural network, comparing an output data (e.g., a soft label output) of the teacher neural network with the corresponding correct training data; para. [0038]); and the fourth neural network is a first neural network on which no training operation is performed (i.e. The student training section 150 may train a student neural network. The student training section 150 may train the student neural network with at least the teacher input data and the plurality of soft label outputs output from the plurality of teacher neural networks; para. [0025]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the combination of Reisser and Benyahia to include the feature of Fukuda. One would have been motivated to make this modification because it provides predictable mechanisms for controlling the influence of individual training samples on parameter updates.

13.	Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Reisser in view of Benyahia, and further in view of Malik et al. (U.S. Patent Application Pub. No. US 20240112008 A1).

Claim 11: Reisser and Benyahia teach the method according to claim 7. Reisser further teaches wherein the first data set comprises a plurality of pieces of first training data (i.e. A mixture of expert models for data point (x, y);para. [0072]) and a correct result of each piece of first training data (i.e. The target output is the ground truth of the image 226 (e.g., “sign” and “60”); para. [0045]), and the method further comprises: wherein the selector is a neural network configured to select, from the plurality of neural network modules, at least one first neural network module that matches the data feature of the first data set (i.e. A gating function controls selection of an expert for given data point of the input dataset D; para. [0073, 0074, 0089]); the performing a training operation on the at least one first machine learning model by using the first data set comprises: inputting the first training data into the selector to obtain the indication information output by the selector, wherein the indication information comprises the probability that each of the plurality of neural network modules is selected, and the indication information indicates the neural network module that constructs the first neural network (i.e. the robustness of the expert’s features may serve as conditions for the gating function rather than training an entirely separate model for pθs (x|s). Given a set of intermediary features hs(x) of expert k, a local vector πs ∈ ℝK, with which the intermediate features are averaged before applying a linear transformation to compute the input to the softmax gates, which may scale with the number of experts, where θs = (πs, As, bs) are local learnable parameters and SM represents the softmax function; para. [0072, 0073, 0085]); obtaining, based on the plurality of neural network modules, the indication information and the first training data, a prediction result of the first training data and output by the first neural network (i.e. Each of the experts may be implemented as a separate, independent artificial neural network, for example. Each of the experts may be trained to determine a prediction for its designated region; para. [0072, 0073, 0089]); performing a training operation on the first neural network and the selector based on a third loss function (i.e. Fine-tuning may be performed by optimizing equation 5 for a small number of epochs (e.g., E = 1) with respect to w1:K, ϕ, and θs; para. [0080]), wherein the third loss function indicates a similarity between the prediction result of the first training data and a correct result (i.e. an error may be calculated between the output 222 and a target output; para. [0045, 0073, 0074]), and further indicates a dispersion degree of the indication information (i.e. to avoid prematurely pruning of experts and preserve model capacity, a marginal entropy term in the server H(Ep(y)[qϕ(z|y)]) may be included as a regularizer that encourages using all of the experts; para. [0079]).
Reisser does not explicitly teach receiving the selector sent by the server, sending a trained selector to the server.
However, Malik teaches receiving the selector sent by the server, sending a trained selector to the server (fig. 8, in a particular round of model training, server 510 may select a subset of client systems 130 including client system 130a to train the global neural network model 820 together with the respective local personalization models 820 of each selected client system 130. Client system 130a may then receive a current version of the global neural network model 820a from server 510. Client system 130a may then retrieve the plurality of examples 530a from the local data store of client system 130a. Client system 130a may then train the received global neural network model 820a together with the local personalization model 830a on the pluralities of examples 530a to generate a plurality of updated federated model parameters and a plurality of updated local model parameters. Client system 130a may then store in the local data store the trained local personalization model 830a including the updated local model parameters. Client system 130a may then send the trained global neural network model 820a including the updated federated model parameters to server 510 without sending any of the examples 530a to server 510; para. [0119]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the combination of Reisser and Benyahia to include the feature of Malik. One would have been motivated to make this modification because it provides predictable mechanisms for controlling the influence of individual training samples on parameter updates.

14.	Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Reisser in view of Benyahia, and further in view of Chen et al. (A Practical Algorithm for Distributed Clustering and Outlier Detection, arXiv, published 2018, pages 1-15), and further in view of Tang et al. (Input partitioning to mixture of experts, IEEE, published 2002, pages 227-232).

Claim 16: Reisser and Benyahia teach the method according to claim 12. Reisser further teaches wherein (i.e. each of the experts may be implemented as a separate, independent artificial neural network; para. [0089]) the plurality of neural network modules stored in the server are neural network modules (i.e. The expert specific updates may be supplied to a central server; para. [0082]), the server is further configured with a selector (i.e. A gating function controls selection of an expert for given data point of the input dataset D; para. [0072]), and the method further comprises: determining, based on the indication information, a neural network module that constructs at least one first neural network (i.e. A mixture of experts may include a set of K experts. Each of the K experts specialized on a region of the input dataset D. A gating function controls selection of an expert for given data point of the input dataset D. Each of the experts may be implemented as a separate, independent artificial neural network, for example. Each of the experts may be trained to determine a prediction for its designated region. Thus the gating function determines for each data point of input dataset D, an expert for determining a prediction for the data point; para. [0072]), wherein the indication information comprises a probability that each of the plurality of is selected (i.e. A mixture of expert models for data point (x, y) may be described by: probability equation, where z is a categorical variable that denotes the expert, wk are the parameters of the expert k, and θ are the parameters of the gating function; para [0072]); and the sending the at least one first machine learning model to the first client device comprises: sending, to the first client device, the neural network module that constructs the at least one first neural network (i.e. FIG. 7 is a flow diagram illustrating a method 700 for generating a personalized neural network model, according to aspects of the present disclosure. At block 702, the method 700 receives a neural network model from a server; para. [0092]).
Reisser does not explicitly teach receiving at least one clustering center sent by the first client device; and performing a clustering operation on the first data set to obtain at least one data subset, wherein one clustering center in the at least one clustering center is a clustering center of one data subset in the at least one data subset; the obtaining the at least one first machine learning model corresponding to the first client device comprises: inputting the clustering center into the selector to obtain indication information output by the selector.
However, Chen teaches receiving at least one clustering center sent by the first client device (i.e. Each site constructs a summary of the local dataset using the k-means++ algorithm, and sends it to the coordinator; pages 1-11); and performing a clustering operation on the first data set to obtain at least one data subset, wherein one clustering center in the at least one clustering center is a clustering center of one data subset in the at least one data subset (i.e. Each site constructs a summary of the local dataset using the k-means++ algorithm, and sends it to the coordinator. The coordinator feeds the unions all summaries to k-means-- for a second level clustering; pages 1-11).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the invention of Reisser to include the feature of Chen. One would have been motivated to make this modification because it reduces communication and avoids direct access to client raw data while still capturing dataset structure.
However, Tang teaches inputting the clustering center into the selector to obtain indication information output by the selector (i.e. the Potential Function method as applied here defines a region ‘center’ point in terms of corresponding SOM node. Each instance of these nodes is forwarded to the gating network of the MoE to provide the ‘global’ view necessary to establish expert interaction; Section II, pages 227-228), and determining, based on the indication information, a neural network module that constructs at least one first neural network (i.e. The network is composed of K experts and one gating network. Each expert is composed of M input nodes and one output node. The gating network is composed of M input nodes, and K output nodes, such that there is a single output for every expert; Section III, page 228).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the combination of Reisser, Benyahia, and Chen to include the feature of Tang. One would have been motivated to make this modification because it improves expert/module selection using partition-centroid signals in a MoE gating mechanism.

15.	Claims 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Reisser in view of Benyahia, and further in view of Liu et al. (U.S. Patent Application Pub. No. US 20170364799 A1).

Claim 17: Reisser and Benyahia teach the method according to claim 12. Reisser further teaches wherein the (i.e. each of the experts may be implemented as a separate, independent artificial neural network; para. [0089]) plurality of neural network modules stored in the server are neural network modules (i.e. The expert specific updates may be supplied to a central server; para. [0082]), one neural network is divided into at least two submodules (i.e. the DCN 200 may include a feature extraction section and a classification section; para. [0042]), the neural network modules stored in the server are divided into at least two groups corresponding to at least two submodules, and different neural network modules in a same group have a same function (i.e. each of the experts may be implemented as a separate, independent artificial neural network; para. [0089]), and wherein after the updating weight parameters of the plurality of neural network modules stored based on the at least one updated neural network module (i.e. Between each layer 356, 358, 360, 362, 364 of the deep convolutional network 350 are weights (not shown) that are to be updated; para. [0056, 0063]).
Reisser does not explicitly teach calculating a similarity between neural network modules in at least two neural network modules comprised in a same group, and combining each two of the neural network modules with similarity greater than a preset threshold.
However, Liu teaches calculating a similarity between neural network modules in at least two neural network modules comprised in a same group (i.e. based on the learnable parameters, the simplifying module 160 judges whether the operation executed by a first neuron can be merged into the operation executed by a second neuron. Once the first neuron is merged, one or more neuron connections connected to the first neuron is abandoned accordingly. The simplified neural network 200 in FIG. 2(C) is re-drawn in FIG. 3(A) as an example. First, based on the records in the memory 150, the simplifying module 160 tries to find out at least two weights conforming to both the following requirements: (1) corresponding to the same rear artificial neuron, and (2) having values close to each other (e.g. their difference falls in a predetermined small range); para. [0035]), and combining (i.e. the simplifying module 160 can merge the operation executed by the artificial neuron 121 into the operation executed by the artificial neuron 122; para. [0036]) each two of the neural network modules with similarity greater than a preset threshold (i.e. the simplifying module 160 further judges whether all the weights utilized in the computation of the preceding artificial neurons corresponding to the weights w4 and w5 are lower than a threshold T′; para. [0036]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the combination of Reisser and Benyahia to include the feature of Liu. One would have been motivated to make this modification because it reduces redundancy among same-function modules, thereby lowering memory footprint and compute cost.

Claim 18: Reisser, Benyahia, and Liu teach the method according to claim 17. Reisser further teaches wherein the different neural network modules comprise a second neural network module and a first neural network module (i.e. each of the experts may be implemented as a separate, independent artificial neural network; para. [0089]).
Reisser does not explicitly teach inputting same data to the second neural network module and the first neural network module, and comparing a similarity between output data of the second neural network module and output data of the first neural network module; or calculating a similarity between a weight parameter matrix of the second neural network module and a weight parameter matrix of the first neural network module.
However, Liu further teaches inputting same data to the second neural network module and the first neural network module, and comparing a similarity between output data of the second neural network module and output data of the first neural network module; or calculating a similarity between a weight parameter matrix of the second neural network module and a weight parameter matrix of the first neural network module (i.e. based on the learnable parameters, the simplifying module 160 judges whether the operation executed by a first neuron can be merged into the operation executed by a second neuron. Once the first neuron is merged, one or more neuron connections connected to the first neuron is abandoned accordingly. The simplified neural network 200 in FIG. 2(C) is re-drawn in FIG. 3(A) as an example. First, based on the records in the memory 150, the simplifying module 160 tries to find out at least two weights conforming to both the following requirements: (1) corresponding to the same rear artificial neuron, and (2) having values close to each other (e.g. their difference falls in a predetermined small range); para. [0035, 0036]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the combination of Reisser and Benyahia to include the feature of Liu. One would have been motivated to make this modification because it reduces redundancy among same-function modules, thereby lowering memory footprint and compute cost.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. 
Wu et al. (Pub. No. US 11748615 B1), a computer-implemented system includes a differentiable neural architecture search (DNAS) engine executing on one or more processors. The DNAS engine is configured with a stochastic super net defining a layer-wise search space having a plurality of candidate layers, each of the candidate layers specifying one or more operators for a neural network architecture.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way.  A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art.  In re Heck, 699 F.2d 1331, 1332-33, 216 U.S.P.Q. 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 U.S.P.Q. 275, 277 (C.C.P.A. 1968)).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAN TRAN whose telephone number is (303)297-4266.  The examiner can normally be reached on Monday - Thursday - 8:00 am - 5:00 pm MT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matt Ell can be reached on 571-270-3264.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/TAN H TRAN/Primary Examiner, Art Unit 2141
Read full office action
Prosecution Timeline

Mar 17, 2023
Application Filed
May 02, 2023
Response after Non-Final Action
Jan 28, 2026
Non-Final Rejection mailed — §103
Apr 28, 2026
Response Filed
Jun 16, 2026
Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/414,972
Patent 12682274
MODEL INTEGRATION APPARATUS, MODEL INTEGRATION METHOD, COMPUTER-READABLE STORAGE MEDIUM STORING A MODEL INTEGRATION PROGRAM, INFERENCE SYSTEM, INSPECTION SYSTEM, AND CONTROL SYSTEM
5y 0m to grant Granted Jul 14, 2026
17/654,824
Patent 12682621
META-LEARNING MODEL TRAINING BASED ON CAUSAL TRANSPORTABILITY BETWEEN DATASETS
4y 4m to grant Granted Jul 14, 2026
17/740,770
Patent 12682279
REINFORCEMENT MACHINE LEARNING FRAMEWORK FOR DYNAMIC DEMAND FORECASTING
4y 2m to grant Granted Jul 14, 2026
17/205,643
Patent 12675710
SYSTEMS AND METHODS FOR AUTOMATED ALERT PROCESSING
5y 3m to grant Granted Jul 07, 2026
17/518,059
Patent 12675679
CROSSBAR CIRCUIT FOR UNALIGNED MEMORY ACCESS IN NEURAL NETWORK PROCESSOR
4y 8m to grant Granted Jul 07, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
60%
Grant Probability
93%
With Interview (+32.6%)
3y 6m (~2m remaining)
Median Time to Grant
Moderate
PTA Risk
Based on 315 resolved cases by this examiner. Grant probability derived from career allowance rate.