Prosecution Insights
Last updated: April 19, 2026
Application No. 17/812,910

ACTIVE LEARNING FOR HIGH-COST TRAJECTORY COLLECTION FOR MOBILE EDGE DEVICES

Status: Non-Final Office Action (§103)
Filed: Jul 15, 2022
Examiner: COULSON, JESSE CHEN
Art Unit: 2122
Tech Center: 2100 (Computer Architecture & Software)
Assignee: DELL PRODUCTS, L.P.
OA Round: 3 (Non-Final)

Grant Probability: 25% (At Risk)
Expected OA Rounds: 3-4
Estimated Time to Grant: 3y 3m
Grant Probability with Interview: 99%

Examiner Intelligence

This examiner grants only 25% of cases.

Career Allow Rate: 25% (1 granted / 4 resolved; -30.0% vs TC avg)
Interview Lift: +100.0%, a strong lift, based on resolved cases with interview
Avg Prosecution (typical timeline): 3y 3m; 33 applications currently pending
Total Applications (career history): 37, across all art units

Statute-Specific Performance

§101: 30.6% (-9.4% vs TC avg)
§103: 29.8% (-10.2% vs TC avg)
§102: 22.6% (-17.4% vs TC avg)
§112: 17.1% (-22.9% vs TC avg)

Comparisons are against an estimated Tech Center average. Based on career data from 4 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 2/23/2026 has been entered. Claims 1, 7, 9, 11, 17, and 19 have been amended. Claims 1-20 are pending and have been examined.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 1/26/2026 is in compliance with the provisions of 37 CFR 1.97, 1.98, and MPEP § 609. It has been placed in the application file, and the information referred to therein has been considered as to the merits.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3-11, and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (U.S. Patent No. US 12094451 B1), hereinafter “Zhang”, in view of He et al., “Federated Learning for Internet of Things”, hereinafter “He”, further in view of Gal et al., “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning”, hereinafter “Gal”, further in view of Mirus et al., “Detection of abnormal driving situations using distributed representations and unsupervised learning”, hereinafter “Mirus”.

Regarding Claim 1, Zhang teaches:

A method comprising: monitoring performance of a model deployed on nodes in a first location… (col. 5, lines 51-53, “The cohort devices 115 may determine (140) observation data based on operation of the first trained model” and lines 42-45, “the cohort devices 115 may operate a first trained model to perform some function. The function may be any function for which the model may be trained for”);

sending, from the first location, reconstruction errors of the models to a central node, wherein the central node receives… errors from nodes in other locations, wherein errors from all of the locations are stored at the central node as aggregate errors (the system is the central node; variance is errors because it represents differences in operation of a model, including differences that have a negative impact on performance; Fig. 4; col. 6, lines 22-26, “The cohort device 125 may also send (148) the cohort variance data 147 to the system 120. The system 120 may then use the cohort variance data 147, as well as cohort variance data from one or more (or all) other cohorts, to determine multi-cohort variance data 149”; col. 4, lines 12-13, “a cohort may include a group of devices all local to a particular location”);

collecting sample data from the nodes in the first location and the other locations in response to the aggregate errors exceeding a threshold error level (determining a number of training steps for a local model based on cohort variance data is exceeding a threshold error level; col. 6, lines 32-35, “Using the cohort variance data 147 and/or the multi-cohort variance data 149, the cohort device 125 may determine (152) a number of training steps to use to create a personalized local model based on the first trained model”; after the model update, more sample data is collected; col. 7, lines 22-24, “After some period of time of use of the updated first trained model, the cohort devices 115 may determine new observation data”);

performing, at the central node, an uncertainty operation on the sample data… determining a standard deviation of the model outputs for each of the locations, wherein a near edge node corresponding to the location having a highest standard deviation is identified as a near edge node with a highest uncertainty (a cohort device is a near edge node; a cohort device has a highest uncertainty if determined to need training; standard deviation contributes to variance in a cohort, which is used to determine the highest uncertainty; col. 7, lines 39-40, “The variance may correspond to the square of the standard deviation σ from the mean” and lines 50-52, “Thus, σ.sub.m.sup.2 represents the intra-client uncertainty of the particular parameter θ for device m”; col. 6, lines 32-35, “Using the cohort variance data 147 and/or the multi-cohort variance data 149, the cohort device 125 may determine (152) a number of training steps to use to create a personalized local model based on the first trained model”);

collecting training data from the identified near edge node with the highest uncertainty (variance data is training data; col. 13, lines 46-52, “FIG. 5 is a flowchart illustrating operations 500 of training of an updated first trained model… The system may then receive (505) cohort variance data 147 from one or more cohorts of the system.”; col. 13, lines 65-67, “The system 120 may then determine (160) an updated first trained model using the cohort variance data and/or cohort model data”; cohort variance data is collected from one or more cohorts, which includes the identified near edge node with the highest uncertainty);

retraining the model using at least the collected training data to produce an updated model (col. 7, lines 3-8, “The system 120 may also receive cohort variance data for other parameters of the first trained model along with other data (e.g., gradient data) the system 120 may use to re-train the global model.”); and

deploying the updated model to all nodes in all of the locations (col. 14, lines 19-20, “The system 120 may then send (162) the updated first trained model 159 to various devices 110 including the cohort device 125, cohort devices 115, other devices not illustrated, etc.”).

Zhang does not expressly teach: …the model being an autoencoder configured to… based on a reconstruction loss of the model; …detect cornering events…; …reconstruction error…; the uncertainty operation including: performing multiple trial runs of the model on the sample data while randomly dropping different neurons of the model in each trial run to generate a distribution of model outputs.

However, He teaches: …the model being an autoencoder configured to… based on a reconstruction loss of the model (He, p. 415, col. 2, ¶4, “autoencoder focuses on the reconstruction of the input data”; p. 415, col. 2, ¶4, “This loss is also called reconstruction error calculated by mean square error”) and …reconstruction error… (He, p. 415, col. 2, ¶4, “This loss is also called reconstruction error calculated by mean square error”; the reconstruction error of the claims is equivalent to the observation data in Zhang because Zhang's observation data is any data relating to operation of a model; Zhang, p. 20, col. 5, lines 51-53, “The cohort devices 115 may determine (140) observation data based on operation of the first trained model”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use an autoencoder model when monitoring performance, as does He, in the invention of Zhang. The motivation to do so would be for models to evaluate simply and effectively (He, p. 415, col. 2, ¶3, “We apply Deep Autoencoder [17] as the model for anomaly detection. Deep Autoencoder is simple but effective and does not lose the generality to evaluate FL algorithms and our FedIot platform”).

Zhang in view of He does not expressly teach: …detect cornering events…; the uncertainty operation including: performing multiple trial runs of the model on the sample data while randomly dropping different neurons of the model in each trial run to generate a distribution of model outputs.

However, Gal teaches: the uncertainty operation including: performing multiple trial runs of the model on the sample data while randomly dropping different neurons of the model in each trial run to generate a distribution of model outputs (Gal, p. 3, col. 1, ¶3, “With dropout, we sample binary variables for every input point and for every network unit in each layer (apart from the last one). Each binary variable takes value 1 with probability pi for layer i. A unit is dropped (i.e. its value is set to zero) for a given input if its corresponding binary variable takes value 0.”; p. 4, col. 1, ¶2, “We will perform moment-matching and estimate the first two moments of the predictive distribution empirically… We refer to this Monte Carlo estimate as MC dropout. In practice this is equivalent to performing T stochastic forward passes through the network and averaging the results”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Bayesian deep learning metrics when finding uncertainty, as does Gal, in the variance data of Zhang. The motivation to do so would be to represent uncertainty without sacrificing computational complexity or test accuracy (Gal, p. 1, col. 1, Abstract, “This mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy.”). This would contribute to helping better identify which models need training in Zhang, and variance data can include any data representing operation of a model (Zhang, p. 20, col. 6, lines 1-3, “The device variance data 143 may also include other data representing operation of the first machine learning model by a device 110”).

Zhang in view of He and Gal does not expressly teach: …detect cornering events… However, Mirus teaches: …detect cornering events… (Mirus, p. 365, ¶1, “detect abnormal driving situations”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the dataset and purpose of Mirus with the model updating of Zhang. The motivation to do so would be to apply Zhang's improvements in federated learning to the domain of vehicle abnormal event detection.
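To make the mapped technique concrete: below is a minimal numpy sketch of the MC-dropout uncertainty operation recited in Claim 1, applied to an autoencoder's reconstruction loss. This is an illustration only, not code from the application or any cited reference; the network shapes, dropout rate, and trial count are assumptions (the trial count of twenty mirrors Claim 4's "at least twenty" recitation).

```python
# Sketch of the MC-dropout uncertainty operation, under assumed toy dimensions.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical autoencoder weights (assumed shapes: 8 -> 4 -> 8).
W_enc = rng.normal(size=(8, 4))
W_dec = rng.normal(size=(4, 8))

def reconstruction_loss(x, drop_prob=0.5):
    """One stochastic forward pass: randomly drop hidden neurons (Gal's
    MC dropout), then return the mean-squared reconstruction error (He)."""
    h = np.maximum(x @ W_enc, 0.0)            # encoder with ReLU
    mask = rng.random(h.shape) >= drop_prob   # a different subset dropped each run
    h = h * mask / (1.0 - drop_prob)          # inverted-dropout scaling
    x_hat = h @ W_dec                         # decoder
    return float(np.mean((x - x_hat) ** 2))   # reconstruction error (MSE)

def mc_dropout_uncertainty(x, trials=20):
    """Spread (standard deviation) of losses across stochastic passes,
    standing in for the claim's 'distribution of model outputs'."""
    losses = [reconstruction_loss(x) for _ in range(trials)]
    return float(np.std(losses))

x = rng.normal(size=(8,))
print(mc_dropout_uncertainty(x))
```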
Regarding Claim 3, Zhang in view of He, Gal, and Mirus teaches the method of Claim 1 as referenced above. In the combination as set forth above, Zhang teaches: wherein the uncertainty operation is performed separately on… samples and… samples collected at each location, and a combined uncertainty is determined for each location (Zhang, col. 4, lines 12-13, “a cohort may include a group of devices all local to a particular location”; col. 12, lines 13-14, “the cohort device 125 may calculate (304) the variance data based on the observation data.”). In the combination as set forth above, Mirus teaches: …positional data… and …inertial data… (Mirus, p. 364, ¶1, “variety of features such as position, velocity and acceleration”).

Regarding Claim 4, Zhang in view of He, Gal, and Mirus teaches the method of Claim 3 as referenced above. In the combination as set forth above, Gal teaches: wherein the uncertainty operation includes performing at least twenty dropout trial runs of the autoencoder (Gal, “We scatter 100 stochastic forward passes”), wherein in each dropout trial run a different subset of neurons is omitted (Gal, “we sample T sets of vectors of realisations from the Bernoulli distribution {z_1^t, …, z_L^t} for t = 1…T, with z_i^t = [z_{i,j}^t] for j = 1…K_i, giving {W_1^t, …, W_L^t}… We refer to this Monte Carlo estimate as MC dropout. In practice this is equivalent to performing T stochastic forward passes through the network and averaging the results”), and wherein the standard deviation of the model output is computed across the dropout trial runs (Gal, Equation 8, the term ½τ‖y − ŷ_t‖², which reflects the standard deviation).

Regarding Claim 5, Zhang in view of He, Gal, and Mirus teaches the method of Claim 3 as referenced above. In the combination as set forth above, Gal teaches: wherein the standard deviation is calculated across reconstruction losses output by the autoencoder for the sample data… (Gal, “we sample T sets of vectors of realisations from the Bernoulli distribution {z_1^t, …, z_L^t} for t = 1…T, with z_i^t = [z_{i,j}^t] for j = 1…K_i, giving {W_1^t, …, W_L^t}… We refer to this Monte Carlo estimate as MC dropout. In practice this is equivalent to performing T stochastic forward passes through the network and averaging the results”; Gal, Equation 8, ½τ‖y − ŷ_t‖²). In the combination as set forth above, Zhang teaches: …from each location (Zhang, col. 4, lines 12-13, “a cohort may include a group of devices all local to a particular location”; col. 12, lines 13-14, “the cohort device 125 may calculate (304) the variance data based on the observation data.”; col. 6, lines 1-3, “The device variance data 143 may also include other data representing operation of the first machine learning model by a device 110”; each location will have a model performing operations).

Regarding Claim 6, Zhang in view of He, Gal, and Mirus teaches the method of Claim 5 as referenced above. In the combination as set forth above, Zhang in view of He, Gal, and Mirus teaches: wherein the near edge node with the highest uncertainty corresponds to the location having the greatest deviation from the baseline reconstruction loss distribution determined during training of the autoencoder (in the combination set forth above, a near edge node is identified with cohort variance data for training, which is determined with the uncertainty operation involving deviation derived from reconstruction loss; Zhang, col. 6, lines 1-3, “The device variance data 143 may also include other data representing operation of the first machine learning model by a device 110”; col. 6, lines 32-35, “Using the cohort variance data 147 and/or the multi-cohort variance data 149, the cohort device 125 may determine (152) a number of training steps to use to create a personalized local model based on the first trained model”).

Regarding Claim 7, Zhang in view of He, Gal, and Mirus teaches the method of Claim 1 as referenced above. Zhang further teaches: further comprising collecting a smaller amount of training data from nodes of other near edge nodes associated with the other locations, wherein the smaller amount of training data is smaller than the collected data (data collected from other near edge nodes can be smaller than previously collected sample and training data; Zhang, col. 4, lines 6-7, “A cohort may include a collection of devices (or in some cases a single device)”; col. 6, lines 22-26, “The cohort device 125 may also send (148) the cohort variance data 147 to the system 120. The system 120 may then use the cohort variance data 147, as well as cohort variance data from one or more (or all) other cohorts, to determine multi-cohort variance data 149”).

Regarding Claim 8, Zhang in view of He, Gal, and Mirus teaches the method of Claim 7 as referenced above. Zhang further teaches: wherein the central node is configured to store the collected data (Zhang, col. 6, lines 22-26, “The cohort device 125 may also send (148) the cohort variance data 147 to the system 120. The system 120 may then use the cohort variance data 147, as well as cohort variance data from one or more (or all) other cohorts, to determine multi-cohort variance data 149”).

Regarding Claim 9, Zhang in view of He, Gal, and Mirus teaches the method of Claim 8 as referenced above. Zhang further teaches: further comprising retraining the model using newly obtained training data and original training data, which was used previously to train the model, to generate a new model (the model is retrained using newly obtained variance data, which is based on variance data of a previous retraining; col. 7, lines 4-11, “The system 120 may also receive cohort variance data for other parameters of the first trained model along with other data (e.g., gradient data) the system 120 may use to re-train the global model. The system 120 may then determine (160) an updated first trained model (e.g., an updated global model) based on the cohort variance data using federated learning approaches.”; col. 14, lines 19-26, “The system 120 may then send (162) the updated first trained model 159 to one or more device(s) 110. Part or all of the process may then repeat itself… In this manner the system 100 may create new global model(s) and new personalized model(s) as desired by the system configuration”).

Regarding Claim 10, Zhang in view of He, Gal, and Mirus teaches the method of Claim 9 as referenced above. Zhang further teaches: wherein the central node is configured to deploy the new model to all nodes in all of the locations (col. 14, lines 19-20, “The system 120 may then send (162) the updated first trained model 159 to various devices 110 including the cohort device 125, cohort devices 115, other devices not illustrated, etc.”).

Regarding Claim 11, Zhang teaches: A non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors (cols. 51-52, lines 64-67 and 1-3, “Computer instructions for operating each device (110/120/625) and its various components may be executed by the respective device's controller(s)/processor(s) (1304/1404), using the memory (1306/1406) as temporary “working” storage at runtime. A device's computer instructions may be stored in a non-transitory manner in non-volatile memory (1306/1406), storage”) to perform operations comprising:

monitoring performance of a model deployed on nodes in a first location… (col. 5, lines 51-53, “The cohort devices 115 may determine (140) observation data based on operation of the first trained model” and lines 42-45, “the cohort devices 115 may operate a first trained model to perform some function. The function may be any function for which the model may be trained for”);

sending, from the first location, reconstruction errors of the models to a central node, wherein the central node receives… errors from nodes in other locations, wherein errors from all of the locations are stored at the central node as aggregate errors (the system is the central node; variance is errors because it represents differences in operation of a model, including differences that have a negative impact on performance; Fig. 4; col. 6, lines 22-26, “The cohort device 125 may also send (148) the cohort variance data 147 to the system 120. The system 120 may then use the cohort variance data 147, as well as cohort variance data from one or more (or all) other cohorts, to determine multi-cohort variance data 149”; col. 4, lines 12-13, “a cohort may include a group of devices all local to a particular location”);

collecting sample data from the nodes in the first location and the other locations in response to the aggregate errors exceeding a threshold error level (determining a number of training steps for a local model based on cohort variance data is exceeding a threshold error level; col. 6, lines 32-35, “Using the cohort variance data 147 and/or the multi-cohort variance data 149, the cohort device 125 may determine (152) a number of training steps to use to create a personalized local model based on the first trained model”; after the model update, more sample data is collected; col. 7, lines 22-24, “After some period of time of use of the updated first trained model, the cohort devices 115 may determine new observation data”);

performing, at the central node, an uncertainty operation on the sample data… determining a standard deviation of the model outputs for each of the locations, wherein a near edge node corresponding to the location having a highest standard deviation is identified as a near edge node with a highest uncertainty (a cohort device is a near edge node; a cohort device has a highest uncertainty if determined to need training; standard deviation contributes to variance in a cohort, which is used to determine the highest uncertainty; col. 7, lines 39-40, “The variance may correspond to the square of the standard deviation σ from the mean” and lines 50-52, “Thus, σ.sub.m.sup.2 represents the intra-client uncertainty of the particular parameter θ for device m”; col. 6, lines 32-35, “Using the cohort variance data 147 and/or the multi-cohort variance data 149, the cohort device 125 may determine (152) a number of training steps to use to create a personalized local model based on the first trained model”);

collecting training data from the identified near edge node with the highest uncertainty (variance data is training data; col. 13, lines 46-52, “FIG. 5 is a flowchart illustrating operations 500 of training of an updated first trained model… The system may then receive (505) cohort variance data 147 from one or more cohorts of the system.”; col. 13, lines 65-67, “The system 120 may then determine (160) an updated first trained model using the cohort variance data and/or cohort model data”; cohort variance data is collected from one or more cohorts, which includes the identified near edge node with the highest uncertainty);

retraining the model using at least the collected training data to produce an updated model (col. 7, lines 3-8, “The system 120 may also receive cohort variance data for other parameters of the first trained model along with other data (e.g., gradient data) the system 120 may use to re-train the global model.”); and

deploying the updated model to all nodes in all of the locations (col. 14, lines 19-20, “The system 120 may then send (162) the updated first trained model 159 to various devices 110 including the cohort device 125, cohort devices 115, other devices not illustrated, etc.”).

Zhang does not expressly teach: …the model being an autoencoder configured to… based on a reconstruction loss of the model; …detect cornering events…; …reconstruction error…; the uncertainty operation including: performing multiple trial runs of the model on the sample data while randomly dropping different neurons of the model in each trial run to generate a distribution of model outputs.

However, He teaches: …the model being an autoencoder configured to… based on a reconstruction loss of the model (He, p. 415, col. 2, ¶4, “autoencoder focuses on the reconstruction of the input data”; p. 415, col. 2, ¶4, “This loss is also called reconstruction error calculated by mean square error”) and …reconstruction error… (He, p. 415, col. 2, ¶4, “This loss is also called reconstruction error calculated by mean square error”; the reconstruction error of the claims is equivalent to the observation data in Zhang because Zhang's observation data is any data relating to operation of a model; Zhang, p. 20, col. 5, lines 51-53, “The cohort devices 115 may determine (140) observation data based on operation of the first trained model”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use an autoencoder model when monitoring performance, as does He, in the invention of Zhang. The motivation to do so would be for models to evaluate simply and effectively (He, p. 415, col. 2, ¶3, “We apply Deep Autoencoder [17] as the model for anomaly detection. Deep Autoencoder is simple but effective and does not lose the generality to evaluate FL algorithms and our FedIot platform”).

Zhang in view of He does not expressly teach: …detect cornering events…; the uncertainty operation including: performing multiple trial runs of the model on the sample data while randomly dropping different neurons of the model in each trial run to generate a distribution of model outputs.

However, Gal teaches: the uncertainty operation including: performing multiple trial runs of the model on the sample data while randomly dropping different neurons of the model in each trial run to generate a distribution of model outputs (Gal, p. 3, col. 1, ¶3, “With dropout, we sample binary variables for every input point and for every network unit in each layer (apart from the last one). Each binary variable takes value 1 with probability pi for layer i. A unit is dropped (i.e. its value is set to zero) for a given input if its corresponding binary variable takes value 0.”; p. 4, col. 1, ¶2, “We will perform moment-matching and estimate the first two moments of the predictive distribution empirically… We refer to this Monte Carlo estimate as MC dropout. In practice this is equivalent to performing T stochastic forward passes through the network and averaging the results”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Bayesian deep learning metrics when finding uncertainty, as does Gal, in the variance data of Zhang. The motivation to do so would be to represent uncertainty without sacrificing computational complexity or test accuracy (Gal, p. 1, col. 1, Abstract, “This mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy.”). This would contribute to helping better identify which models need training in Zhang, and variance data can include any data representing operation of a model (Zhang, col. 6, lines 1-3, “The device variance data 143 may also include other data representing operation of the first machine learning model by a device 110”).

Zhang in view of He and Gal does not expressly teach: …detect cornering events… However, Mirus teaches: …detect cornering events… (Mirus, p. 365, ¶1, “detect abnormal driving situations”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the dataset and purpose of Mirus with the model updating of Zhang. The motivation to do so would be to apply Zhang's improvements in federated learning to the domain of vehicle abnormal event detection.

Regarding Claim 13, the rejection of Claim 11 is incorporated and, further, the claim is rejected for the same reasons as set forth in Claim 3.

Regarding Claim 14, the rejection of Claim 13 is incorporated and, further, the claim is rejected for the same reasons as set forth in Claim 4.

Regarding Claim 15, the rejection of Claim 13 is incorporated and, further, the claim is rejected for the same reasons as set forth in Claim 5.

Regarding Claim 16, the rejection of Claim 15 is incorporated and, further, the claim is rejected for the same reasons as set forth in Claim 6.

Regarding Claim 17, the rejection of Claim 11 is incorporated and, further, the claim is rejected for the same reasons as set forth in Claim 7.

Regarding Claim 18, the rejection of Claim 17 is incorporated and, further, the claim is rejected for the same reasons as set forth in Claim 8.

Regarding Claim 19, the rejection of Claim 18 is incorporated and, further, the claim is rejected for the same reasons as set forth in Claim 9.

Regarding Claim 20, the rejection of Claim 19 is incorporated and, further, the claim is rejected for the same reasons as set forth in Claim 10.

Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of He, further in view of Gal, further in view of Mirus, further in view of Shekhar, “What Are L1 and L2 Loss Functions”, hereinafter “Shekhar”.

Regarding Claim 2, Zhang in view of He, Gal, and Mirus teaches the method of Claim 1 as referenced above. In the combination as set forth above, He teaches: wherein the reconstruction loss comprises an… loss… corresponding reconstructed… data generated by the autoencoder (He, p. 415, col. 2, ¶4, “autoencoder focuses on the reconstruction of the input data”; p. 415, col. 2, ¶4, “This loss is also called reconstruction error calculated by mean square error”). In the combination as set forth above, Mirus teaches: …computed between positional data of detected cornering events… (Mirus, p. 364, ¶1, “data sets contain… features such as position”). Zhang in view of He, Gal, and Mirus does not expressly teach: …L1 loss… However, Shekhar teaches: …L1 loss… (Shekhar, p. 2, L1 Loss Function, “L1 Loss Function is used to minimize the error which is the sum of the all the absolute differences between the true value and the predicted value”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the L1 loss of Shekhar instead of the L2 MSE loss of He. The motivation to do so would be to minimize error from outliers (Shekhar, “when the outliers are present in the dataset, then the L2 Loss Function does not perform well… if the dataset is having outliers, then because of the consideration of the squared differences, it leads to the much larger error. Hence, L2 Loss Function is not useful here. Prefer L1 Loss Function”).

Regarding Claim 12, the rejection of Claim 11 is incorporated and, further, the claim is rejected for the same reasons as set forth in Claim 2.
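To make the Claim 2 distinction concrete: a minimal sketch contrasting L1 and L2 reconstruction loss on hypothetical values, following Shekhar's outlier rationale. The numbers are invented for illustration and do not come from the application or the cited references.

```python
# Sketch: L1 vs. L2 reconstruction loss on a signal with one outlier.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 50.0])      # hypothetical original signal (one outlier)
x_hat = np.array([1.1, 2.2, 2.9, 10.0])  # hypothetical autoencoder reconstruction

l1 = np.mean(np.abs(x - x_hat))          # L1: mean of absolute differences, ~10.1
l2 = np.mean((x - x_hat) ** 2)           # L2 (MSE): squaring magnifies the outlier, ~400

print(f"L1 loss: {l1:.2f}")
print(f"L2 loss: {l2:.2f}")
```

The single outlier dominates the L2 value far more than the L1 value, which is the trade-off Shekhar cites when preferring L1 for outlier-laden data.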
Response to Arguments

35 U.S.C. 103

Argument 1: Zhang, He, Gal, Mirus, and Shekhar fail to teach recited features of independent claims 1 and 11.

Examiner Response: Examiner respectfully disagrees. Regarding applicant's assertion that a near edge node corresponding to the location having a highest standard deviation being identified as a near edge node with a highest uncertainty is not taught by the references, Zhang teaches this. In Zhang, an identified near edge node with a highest uncertainty is a cohort device determined to need training (col. 6, lines 32-35, “Using the cohort variance data 147 and/or the multi-cohort variance data 149, the cohort device 125 may determine (152) a number of training steps to use to create a personalized local model based on the first trained model”). The cohort device having a highest standard deviation can be determined to need training because the determination is based on cohort variance data, to which standard deviation contributes (col. 7, lines 39-40, “The variance may correspond to the square of the standard deviation σ from the mean” and lines 50-52, “Thus, σ.sub.m.sup.2 represents the intra-client uncertainty of the particular parameter θ for device m”).

Regarding applicant's assertion that collecting training data from the identified near edge node with the highest uncertainty and retraining the model using at least the collected training data to produce an updated model is not taught by the references, Zhang teaches this. Zhang collects training data from a plurality of cohorts, which can include the identified near edge node (col. 13, lines 46-52, “operations 500 of training of an updated first trained model… The system may then receive (505) cohort variance data 147 from one or more cohorts of the system”). Zhang retrains the first trained model with the collected training data, which then retrains the global model (col. 13, lines 65-67, “The system 120 may then determine (160) an updated first trained model using the cohort variance data and/or cohort model data”; col. 7, lines 3-8, “The system 120 may also receive cohort variance data for other parameters of the first trained model along with other data (e.g., gradient data) the system 120 may use to re-train the global model.”).

Argument 2: Zhang discloses a localized machine learning model and a global model where variance data is collected from all device cohorts. Zhang nowhere discloses that a near edge node is identified as having the highest uncertainty and that the training data is collected from the near edge node, as recited in independent claims 1 and 11.

Examiner Response: Examiner respectfully disagrees. Zhang discloses a near edge node identified as having the highest uncertainty (col. 6, lines 32-35, “Using the cohort variance data 147 and/or the multi-cohort variance data 149, the cohort device 125 may determine (152) a number of training steps to use to create a personalized local model based on the first trained model”), where a cohort device is a near edge node that has a highest uncertainty if it is determined to need training. Zhang also discloses that training data is collected from the near edge node (col. 13, lines 46-52, “operations 500 of training of an updated first trained model… The system may then receive (505) cohort variance data 147 from one or more cohorts of the system.”), where the cohort variance data is used to train the model of the cohort device identified as needing training and is collected from one or more cohorts of the system, which includes the cohort device itself. There is nothing in the claim language itself that suggests the collecting of training data for the near edge node with the highest uncertainty would be different from the collection of data for all or a portion of the cohorts, including the near edge node (the cohort device determined to need training).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSE CHEN COULSON, whose telephone number is (571) 272-4716. The examiner can normally be reached Monday-Friday, 8:30-5:30. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JESSE C COULSON/
Examiner, Art Unit 2122

/KAKALI CHAKI/
Supervisory Patent Examiner, Art Unit 2122
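Taken together, the claim elements addressed in this Office action describe a pipeline: nodes report reconstruction errors, the central node aggregates them, a threshold triggers sample collection, MC-dropout uncertainty is scored per location, and the location with the highest standard deviation is targeted for training-data collection before retraining and redeployment. Below is a minimal sketch of that pipeline; the data, names, and threshold are stand-in assumptions, and none of it is code from the application or the cited references.

```python
# Sketch of the claimed workflow as characterized in this Office action.
import numpy as np

rng = np.random.default_rng(1)
THRESHOLD = 0.5  # assumed aggregate-error trigger

def mc_dropout_losses(samples, trials=20):
    # Stand-in for T stochastic forward passes of the autoencoder (Gal);
    # here we simply jitter a base reconstruction loss once per trial.
    base = np.mean(samples ** 2)
    return base + 0.1 * rng.standard_normal(trials)

# Step 1: each location reports recent reconstruction errors to the central node.
reported_errors = {"loc_a": [0.2, 0.3], "loc_b": [0.9, 1.1], "loc_c": [0.4, 0.5]}
aggregate = np.mean([e for errs in reported_errors.values() for e in errs])

if aggregate > THRESHOLD:
    # Step 2: collect sample data from every location (stand-in arrays).
    samples = {loc: rng.normal(size=(16,)) for loc in reported_errors}
    # Step 3: uncertainty operation -> std of MC-dropout losses per location.
    uncertainty = {loc: float(np.std(mc_dropout_losses(s)))
                   for loc, s in samples.items()}
    # Step 4: the near edge node with the highest standard deviation is the
    # target for training-data collection; the retrained model would then be
    # redeployed to all nodes in all locations.
    target = max(uncertainty, key=uncertainty.get)
    print(f"aggregate={aggregate:.2f}; collect training data from {target}")
```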

Prosecution Timeline

Jul 15, 2022
Application Filed
Jun 11, 2025
Non-Final Rejection — §103
Sep 12, 2025
Response Filed
Nov 26, 2025
Final Rejection — §103
Feb 23, 2026
Request for Continued Examination
Mar 04, 2026
Response after Non-Final Action
Mar 19, 2026
Non-Final Rejection — §103 (current)


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 25%
With Interview: 99% (+100.0%)
Median Time to Grant: 3y 3m
PTA Risk: High

Based on 4 resolved cases by this examiner. Grant probability is derived from the career allow rate.
