Office Action Analysis: 17850764 — SYSTEM AND METHOD FOR CAPACITY PLANNING FOR DATA AGGREGATION USING SIMILARITY GRAPHS

Office Action

§103
DETAILED ACTION
Status of Claims
This Office action is responsive to communications filed on 2026-03-17. Claim(s) 1-20 is/are pending and are examined herein.
Claim(s) 1-20 is/are rejected under 35 USC 103.

Notice of Pre-AIA or AIA Status
The present application, filed on or after 2013-03-16, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Regarding the rejections under 35 USC 103, the applicant’s arguments have been fully considered: 
Regarding the rejections of the previously filed claims, the applicant indicates that they do not acquiesce [remarks, page 13] but provides no rationale to explain why. A bare assertion of non-acquiescence fails to comply with 37 CFR 1.111(b) because it amounts to a general allegation that the previously filed claims defined a patentable invention without specifically pointing out how the language of the previously filed claims patentably distinguished them from the references.
Regarding the amended claims, the applicant’s arguments are not persuasive. The applicant argues that Jain does not disclose “forecasting the next data in the series in response to receiving any form of the real sensed data collected by the SNs” [remarks, page 15; emphasis in the original]. The examiner remarks that the pending claims also do not recite forecasting in response to receiving “the real sensed data collected by the SNs” as asserted here; rather, the claims recite forecasting in response to receiving “the reduced size data from each of the data collectors”. The examiner maintains, as described in the previous Office action and again below, that Jain does disclose “reduced size data” which is a “difference between a full data set… and an inference of the full data set”. It is merely the transmission of this difference that is not disclosed in Jain. The applicant goes on to assert cursorily that “Daoud, Petouis, Chakraborty, and Alakeel fail to supply that which Jain lacks” but the examiner respectfully disagrees. As noted in the previous Office action, Petousis discloses a “first computing entity” which “access[es] one or more streams of sensor data” (i.e., this first computing entity corresponds to an SN of Jain and a “data collector” of the claim) and which “comput[es] error values based on calculated differences between the actual sensor values and the predicted sensor values” (i.e., these error values correspond to the differences disclosed in Jain and to the “reduced size data” of the claim) [Petousis, 0019]. Moreover, Petousis discloses that the first computing entity “transmit[s] the computed error values to a second computing entity” which uses a “second instance of the trained machine learning model” in order to “reconstruct[…] estimates of the actual sensor values based on a reconstruction computation with the parallel predicted sensor values and the error values from the first computing entity” [Petousis, 0019].
In other words, the “second computing entity” of Petousis corresponds to the CH of Jain and the “data aggregator” of the claim, and since the reconstruction performed in Petousis uses the error values received from the first computing entity, this reconstruction is in fact “in response to receiving the reduced size data from each of the data collectors” as recited by the pending claims. 

The complete prior art mapping, updated in view of the applicant’s amendments, is given below. 

Claim Rejections - 35 USC 103
The following is a quotation of 35 USC 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 USC 102(b)(2)(C) for any potential 35 USC 102(a)(2) prior art against the later invention.

Claim(s) 1-10 and 12-20 is/are rejected under 35 USC 103 as being unpatentable over Khushboo JAIN et al. (A two-vector data-prediction model for energy-efficient data-aggregation in wireless sensor network, published 2022-02-20; hereafter, “Jain”) in view of Mohammad DAOUD et al. (A high performance algorithm for static task scheduling in heterogeneous distributed computing systems, published 2007-07-28; hereafter, “Daoud”), Roshni CHAKRABORTY et al. (Finding Representative Sampling Subsets in Sensor Graphs using Time Series Similarities, published 2022-02-18; hereafter, “Chakraborty”), and Ioannis PETOUSIS et al. (US20200097841A1, published 2020-03-26; hereafter, “Petousis”). 

Claim 1
Jain discloses: 
A method for managing data collection in a distributed environment where data collected by data collectors that are operably connected to a data aggregator via a communication system is to be aggregated by the data aggregator into aggregated data, comprising: ([Jain, section 1 and figure 1]: Jain works in the setting of wireless sensor networks (WSNs) in which “[s]ensor nodes (SNs) are typically deployed with the target of collecting a large amount of data through environmental monitoring” [Jain, section 1 first paragraph] and “SNs typically collect data over a time and deliver fused data to the cluster-head (CH) or directly to the base station on a regular basis” [Jain, section 1 third paragraph]. See [Jain, figure 1] for a depiction. The network maps to the “communication system” of the claim, the SNs map to the “data collectors” of the claim, and any one of the CHs maps to the “data aggregator” of the claim.)
obtaining an error limit for the data collected by the data collectors that is to be aggregated by the data aggregator; ([Jain, section 3.2]: In the system disclosed by Jain, “the BS will communicate its acceptable prediction threshold to all CHs and SNs as δ and cumulative threshold denoted as ε” [Jain, section 3.2 first bullet point]. These thresholds map to the “error limit” of the claim.)
obtaining a similarity graph for the data collectors; ([Jain, figure 1]: The “[f]irst [l]ayer” of the “three-layer network architecture” of Jain has nodes representing the SNs and these nodes are grouped into clusters [Jain, figure 1 “First Layer”]. This falls under the broadest reasonable interpretation of “a similarity graph for the data collectors” as recited by the claim, since each cluster of nodes is “similar” to other nodes in the same cluster.)
train a plurality of inference models ([Jain, sections 1, 3, and 4]: Jain discloses a “two-vector data-prediction model” which “predicts the sensor’s reading in the succeeding time slot at both SN and CH based on data stored on the vectors. When the next data reading is collected, each SN within the cluster compares its predicted data with the real sensed measurements. The SN will not relay the real sensor reading to the CH when the prediction error is lower than a predetermined threshold. When the CH does not receive a value from the SN, it uses the same data prediction algorithm to forecast the next data in the series” [section 1 paragraph beginning “This article”]. In other words, a “same instance of the data-prediction model will be employed both at the SNs and CHs” [Jain, section 3.1 second bullet]. The “goal is to reduce data transmission and enhance network lifespan by estimating future data based on prior sensed readings” [Jain, section 4 first paragraph], and Jain discloses algorithms for training this model (the process having an initialization stage [Jain, section 4.1 and algorithm 1] and model-building stage [Jain, section 4.2 and algorithm 2]). There is one model for each cluster, and each of these models maps to one of the “inference models” of the claim.)
grouping nodes of the similarity graph into groupings of the data collectors ([Jain, figure 1]: The clusters of Jain map to the “groupings” of the claim.)
wherein each of the plurality of inference models comprises identically trained copies of a machine learning model; ([Jain, sections 1 and 3 and figure 1]: As noted above, the model corresponding to a given cluster maps to one of the “inference models” of the claim; each cluster includes several SNs and a CH [Jain, figure 1] and instances of the same data-prediction model are used within a cluster [Jain, sections 1 and 3]. In other words, the instances of the same data-prediction model that are employed by a given cluster map to the “identically trained copies of a machine learning model” of the claim.)
the data aggregator, [the model training device,] and the data collectors all being implemented using computing devices, ([Jain, sections 1 and 3]: The SNs and CHs of Jain are “implemented using computing devices” as recited by the claim since they are all capable of computation (e.g., computations involving making predictions using models, as explained above [Jain, sections 1 and 3]).)
initiating aggregation of the data collected by the data collectors that is to be aggregated by the data aggregator [using the model training device] by distributing at least one inference model from among the plurality of inference models to each of the data collectors and the data aggregator; ([Jain, sections 1, 3, and 4]: The examiner notes that the specification [specification, 0078] indicates that “initiating aggregation” refers to training models and distributing them. Jain discloses initializing and building models [Jain, sections 4.1-4.2], and, as noted above, it also discloses each SN and CH having an instance of a data-prediction model, with devices in the same cluster having instances of the same data-prediction model.)
obtaining the aggregated data ([Jain, figure 1]: As explained above, the CH aggregates data collected by the SNs in its cluster.) using the plurality of inference models trained [by the model training device] by: ([Jain, section 1]: As explained above, the predictions made by the two-vector data-prediction model are used in the data aggregation process [Jain, section 1 paragraph beginning with “This article”].) 
a reduced size data from each of the data collectors, wherein for each of the data collectors, the reduced size data is a difference between a full data set collected by a respective data collector of the data collectors and an inference of the full data set, ([Jain, sections 1 and 3]: Jain discloses a vector RDV_{SN} which stores “real sensed readings” of an SN [Jain, section 3.2 second bullet]. This vector maps to the “full data set” of the claim. Jain also discloses a vector FDV_{SN} of the forecasted sensor readings [Jain, section 3.2 second bullet], which maps to the “inference of the full data set” of the claim. Jain also discloses computing differences |p_{n+1} - r_{n+1}| between the predicted and real data [Jain, section 3.3 first paragraph]. These differences map to the “difference between the full data set… and [the] inference of the full data set” of the claim, i.e., to the “reduced size data” of the claim.)
and the inference of the full data set is generated by the at least one inference model distributed to the respective data collector of the data collectors; ([Jain, sections 1 and 3]: As explained above, the forecasted readings within each cluster are produced by the model assigned to that cluster [Jain, sections 1 and 3].)
reconstructing, by the data aggregator, a representation of the data collected by the data collectors that is to be aggregated by the data aggregator into the aggregated data using [the reduced size data received from each of the data collectors] and inferences generated by the at least one inference model distributed to the data aggregator ([Jain, section 3]: The vector FDV_{CH} constructed at each CH [Jain, section 3.2 second bullet] maps to the “representation” of the claim.)
without the data aggregator actually receiving the full data set from any of the data collectors; ([Jain, section 3]: As noted above, the vectors RDV_{SN} map to the “full data set” of the claim. The CH does not receive this full data set from any of the SNs in its cluster; it receives only those readings which deviate sufficiently from the predictions made by the models (i.e., only those r_n for which either |p_{n+1} - r_{n+1}| or |\hat{p}_{n+1} - p_{n+1}| is larger than a given error limit) [Jain, section 3.3]. In other words, the “data aggregator [does not] actually receiv[e] the full data set from any of the data collectors”, as required by the claim.)
and using, by the data aggregator, the representation that is reconstructed as the aggregated data. ([Jain, section 3 and figure 1]: Jain discloses that the BS “receives aggregated data from the CHs” [Jain, section 3 first paragraph; see also, figure 1]. This transmission of the CH’s data maps to the step of “using, by the data aggregator, the representation” as recited by the claim.)
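
For illustration only, the suppression scheme relied upon in the mapping above (an SN withholds a reading when its prediction error is lower than the threshold δ, and the CH then substitutes its own forecast) can be sketched as follows. The sketch is not Jain's algorithm. The predictor predict_next (a two-point linear extrapolation), all function names, the warm-up of two shared readings, and the convention that both sides append the forecast to their history whenever a reading is suppressed are assumptions made so that the example is runnable.

```python
from typing import List, Optional


def predict_next(history: List[float]) -> float:
    """Hypothetical stand-in predictor: linear extrapolation from the last two values."""
    if len(history) < 2:
        return history[-1]
    return 2 * history[-1] - history[-2]


def sn_step(sn_history: List[float], reading: float, delta: float) -> Optional[float]:
    """Sensor-node side: return the reading to transmit, or None if it is suppressed."""
    predicted = predict_next(sn_history)
    if abs(predicted - reading) < delta:
        # Prediction error is below the threshold: nothing is sent, and the
        # forecast is stored so that the SN history matches the CH history.
        sn_history.append(predicted)
        return None
    sn_history.append(reading)
    return reading


def ch_step(ch_history: List[float], received: Optional[float]) -> float:
    """Cluster-head side: use the received reading, else forecast with the same predictor."""
    value = predict_next(ch_history) if received is None else received
    ch_history.append(value)
    return value


def simulate(readings: List[float], delta: float, warmup: int = 2):
    """Run both sides over a series; the first `warmup` readings are assumed shared."""
    sn_history = list(readings[:warmup])
    ch_history = list(readings[:warmup])
    sent = 0
    for reading in readings[warmup:]:
        message = sn_step(sn_history, reading, delta)
        if message is not None:
            sent += 1
        ch_step(ch_history, message)
    return ch_history, sent
```

In this sketch the CH never holds the SN's full series of real readings; it holds only the readings that exceeded the threshold together with its own forecasts for the remainder.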

Jain does not distinctly disclose a method of selecting devices on which the prediction models are trained based on the resource requirements of the training task, it does not distinctly disclose a method of constructing the clusters/groupings, and it does not distinctly disclose transmitting the differences between predicted and collected data. In other words, Jain might not distinctly disclose:
identifying, using the similarity graph, a quantity of computing resources required to [train a plurality of inference models] 
by at least: identifying an edge value threshold based on the error limit for the data collected by the data collectors that is to be aggregated by the data aggregator, and [grouping nodes of the similarity graph into groupings of the data collectors] based on the edge value threshold, 
obtaining a model training device based on the quantity of computing resources,
[the data aggregator,] the model training device, [and the data collectors all being implemented using computing devices]
[initiating aggregation…] using the model training device
[the plurality of inference models] trained by the model training device,
receiving, by the data aggregator, [a reduced size data from each of the data collectors,] 
and in response to receiving the reduced size data from each of the data collectors: [reconstructing, by the data aggregator, a representation of the data collected by the data collectors that is to be aggregated by the data aggregator into the aggregated data using] the reduced size data received from each of the data collectors 

Daoud works in the setting of task scheduling for a heterogeneous distributed computing system (HeDCS) which is “represented by a set P of m processors” [Daoud, section 2 second paragraph]. Moreover, Jain in view of Daoud discloses: 
identifying, using the similarity graph, a quantity of computing resources required to [train a plurality of inference models] ([Daoud, section 2; Jain, section 1]: Daoud works with a “parallel application” that “is represented by a directed acyclic graph, or DAG, defined by the tuple (T, E), where T is a set of n tasks” [Daoud, section 2 first paragraph] and associates to this an “n x m computation cost matrix W” where “[e]ach element w_{i, j} in W represents the estimated execution time of task t_i on processor p_j” [Daoud, section 2 second paragraph]. In the combination, the parallel application of Daoud is taken to be the task of training the prediction models for each cluster as disclosed by Jain [Jain, section 1], each task t_i in T within the parallel application being the task of training one of the prediction models associated with a given cluster. The computation cost matrix is the “quantity of computing resources” recited by the claim. This computation cost matrix falls under the broadest reasonable interpretation of “using the similarity graph” because the number n of rows in the matrix corresponds to the number of tasks, i.e., to the number of clusters in the WSN of Jain.)
obtaining a model training device based on the quantity of computing resources, ([Daoud, abstract and section 4]: Daoud discloses a “scheduling algorithm, called the longest dynamic critical path (LDCP) algorithm” [Daoud, abstract]. The LDCP algorithm includes a “processor selection” phase [Daoud, section 4 first paragraph] during which tasks are assigned to processors [Daoud, section 4.2]. Any of the processors selected for task assignment can map to the “model training device” of the claim, the processor selection mapping to the “obtaining” step of the claim. See [Daoud, figure 3] for pseudocode of the LDCP algorithm.)
[the data aggregator,] the model training device, [and the data collectors all being implemented using computing devices] ([Daoud, section 2]: As noted above, the HeDCS of Daoud is “represented by a set P of m processors” [Daoud, section 2 second paragraph]. In other words, the “model training device” as mapped above is a processor and is “implemented using computing devices” as required by the claim.)
[initiating aggregation…] using the model training device ([Daoud, section 2; Jain, section 1]: As explained above, the parallel application of Daoud [Daoud, section 2 first paragraph] is taken to be the task of training the prediction models for each cluster as disclosed by Jain [Jain, section 1]. In other words, in the combination, the “initiating aggregation” step as mapped above is performed by the “model training device” as mapped above.)
[the plurality of inference models] trained by the model training device, ([Daoud, abstract; Jain, sections 1 and 4]: As noted above, in the combination, the prediction models of Jain are trained on processors in a HeDCS using the scheduling algorithm disclosed by Daoud.)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the method for training and using data prediction models for sensor networks as disclosed by Jain with the scheduling algorithm of Daoud because this scheduling algorithm “outperforms [competing] algorithms” and “provides a practical solution for scheduling parallel applications with high communication costs” [Daoud, abstract], thereby resulting in a more efficient system. 

Jain in view of Daoud might not distinctly disclose: 
identifying, using the similarity graph, a quantity of computing resources required to [train a plurality of inference models] 
by at least: identifying an edge value threshold based on the error limit for the data collected by the data collectors that is to be aggregated by the data aggregator, and [grouping nodes of the similarity graph into groupings of the data collectors] based on the edge value threshold, 
receiving, by the data aggregator, [a reduced size data from each of the data collectors,] 
and in response to receiving the reduced size data from each of the data collectors: [reconstructing, by the data aggregator, a representation of the data collected by the data collectors that is to be aggregated by the data aggregator into the aggregated data using] the reduced size data received from each of the data collectors 

Chakraborty works in the setting of a “SwN that comprises of sensors” which communicate with “the base-station B” [Chakraborty, figure 1 caption]. Moreover, Jain in view of Daoud and Chakraborty discloses: 
by at least: identifying an edge value threshold ([Chakraborty, figure 2]: Chakraborty discloses “creat[ing] a similarity graph, G, of the sensors where there is an edge between a pair of sensors if the similarity is greater than the threshold” [Chakraborty, figure 2 caption]. This threshold maps to the “edge value threshold” of the claim. See [Chakraborty, section 4] for details.) based on the error limit for the data collected by the data collectors that is to be aggregated by the data aggregator, ([Jain, sections 1 and 3.2; Chakraborty, section 7.1]: Chakraborty discloses “vary[ing] the values of the threshold” and studies how varying the threshold affects the clustering of the graph [Chakraborty, section 7.1 first paragraph and table 1]. Each cluster in Jain shares a single prediction model, and the predictions made by the prediction model are compared against the prediction threshold δ (i.e., the “error limit” as mapped above) [Jain, section 1 paragraph beginning “This article” and section 3.2 first bullet point]. It would thus be obvious to a person of ordinary skill in the art, having both Jain and Chakraborty before them, to choose the “edge value threshold based on the error limit” as recited by the claim.) and [grouping nodes of the similarity graph into groupings of the data collectors] based on the edge value threshold, ([Chakraborty, figure 2; Jain, figure 1]: As noted above, Chakraborty discloses creating a similarity graph based on the edge value threshold. In the combination, this similarity graph constructed based on the edge value threshold is used to form the clusters which appear in Jain [Jain, figure 1].)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the process for training and using data prediction models for sensor networks as disclosed by Jain in view of Daoud with clustering by means of a similarity graph as disclosed by Chakraborty, because Chakraborty’s “approach can yield significant battery life improvements within realistic error bounds” [Chakraborty, abstract], thereby resulting in a more efficient network. 
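
For illustration only, the thresholded similarity graph relied upon above (an edge between two sensors whenever their similarity is greater than the threshold) and a grouping derived from it can be sketched as follows. The function names, the pairwise similarity scores, and the use of connected components as the grouping rule are assumptions of the sketch and are not taken from Chakraborty or Jain. Neither reference is cited here for a formula relating the edge value threshold to the error limit, so the threshold is simply passed in as a parameter.

```python
from typing import Dict, List, Set, Tuple


def build_similarity_graph(
    sensors: List[str],
    similarity: Dict[Tuple[str, str], float],
    threshold: float,
) -> Dict[str, Set[str]]:
    """Adjacency sets with an edge wherever the pairwise similarity exceeds the threshold."""
    adjacency: Dict[str, Set[str]] = {s: set() for s in sensors}
    for (a, b), score in similarity.items():
        if score > threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def group_sensors(adjacency: Dict[str, Set[str]]) -> List[List[str]]:
    """Assumed grouping rule: each connected component of the graph is one grouping."""
    seen: Set[str] = set()
    groups: List[List[str]] = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(component))
    return groups


# Hypothetical similarity scores for four sensors.
SENSORS = ["s1", "s2", "s3", "s4"]
SIMILARITY = {("s1", "s2"): 0.9, ("s2", "s3"): 0.6, ("s3", "s4"): 0.95, ("s1", "s4"): 0.2}
```

With these hypothetical scores, a stricter threshold yields more and smaller groupings, and a looser threshold merges the sensors into a single grouping, which is the kind of dependence on the threshold that Chakraborty is cited as studying.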

Jain in view of Daoud and Chakraborty might not distinctly disclose: 
receiving, by the data aggregator, [a reduced size data from each of the data collectors,] 
and in response to receiving the reduced size data from each of the data collectors: [reconstructing, by the data aggregator, a representation of the data collected by the data collectors that is to be aggregated by the data aggregator into the aggregated data using] the reduced size data received from each of the data collectors 

Petousis is in the field of machine learning. In particular, it describes a “first computing entity” which “access[es] one or more streams of sensor data” and “generates predictions of predicted sensor values” [Petousis, 0019]. In the combination, the first computing entity of Petousis corresponds to a sensor in Jain, i.e., to one of the “data collectors” of the claim. Moreover, Jain in view of Daoud, Chakraborty, and Petousis discloses:
receiving, by the data aggregator, [a reduced size data from each of the data collectors,] ([Petousis, 0019]: Petousis discloses that the first computing entity “comput[es] error values based on calculated differences between the actual sensor values and the predicted sensor values” and “transmit[s] the computed error values to a second computing entity” [Petousis, 0019]. In other words, the second computing entity corresponds to the CH of Jain, i.e., to the “data aggregator” of the claim, and the error values based on calculated differences correspond to the analogous differences in Jain, i.e., to the “reduced size data” of the claim as mapped above. In the combination, the method of Jain could be modified in view of Petousis so that, whenever an SN would transfer data to the CH, it would transfer a difference as described in Petousis rather than the reading itself.)
and in response to receiving the reduced size data from each of the data collectors: [reconstructing, by the data aggregator, a representation of the data collected by the data collectors that is to be aggregated by the data aggregator into the aggregated data using] the reduced size data received from each of the data collectors ([Petousis, 0019]: Petousis discloses that the “second computing entity” executes a “second instance of the trained machine learning model” in order to “reconstruct[…] estimates of the actual sensor values based on a reconstruction computation with the parallel predicted sensor values and the error values from the first computing entity” [Petousis, 0019]. As noted above, the “error values from the first computing entity” correspond to the “reduced size data” of the claim. Since the reconstruction is based on these error values received from the first computing entity, the reconstruction both “us[es] the reduced size data received from each of the data collectors” and occurs “in response to receiving the reduced size data” as recited by the claim.)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the wireless sensor network of Jain in view of Daoud and Chakraborty with the techniques for handling sensor data described in Petousis because they provide “intelligent compression techniques” for “minimiz[ing] bandwidth usage when transferring data streams” [Petousis, 0029], thereby resulting in a more efficient and effective system.
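
For illustration only, the error-value exchange relied upon above (the first computing entity transmits differences between actual and predicted sensor values, and the second computing entity reconstructs estimates from its own parallel predictions plus the received error values) can be sketched as follows. The sketch is not Petousis's implementation. The predictor model (which simply repeats the previous value), the function names, and the shared seed value are assumptions; in particular, the sketch assumes that both entities feed the same previously reconstructed value into identical copies of the model.

```python
from typing import List


def model(previous: float) -> float:
    """Hypothetical shared predictor: forecast that the next value equals the previous one."""
    return previous


def compute_errors(actual_values: List[float], seed: float) -> List[float]:
    """First entity: error value = actual sensor value minus predicted sensor value."""
    errors = []
    previous = seed
    for actual in actual_values:
        predicted = model(previous)
        errors.append(actual - predicted)
        previous = actual
    return errors


def reconstruct(errors: List[float], seed: float) -> List[float]:
    """Second entity: parallel prediction plus received error value gives the estimate."""
    estimates = []
    previous = seed
    for error in errors:
        estimate = model(previous) + error
        estimates.append(estimate)
        previous = estimate
    return estimates
```

Only the error values cross the link in this sketch, and the reconstruction at the second entity is computed from those received values, which mirrors the "in response to receiving the reduced size data" reading applied above.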

Claim 2
Jain in view of Daoud, Chakraborty, and Petousis discloses the elements of the parent claim(s). It also discloses: 
[The method of claim 1, wherein initiating aggregation of the data collected by the data collectors that is to be aggregated by the data aggregator using the model training device comprises:] training the plurality of inference models based on the error limit, ([Jain, section 4]: As described under the parent claim, Jain discloses training data-prediction models (i.e., the “inference models” as mapped above) making use of the acceptable prediction threshold δ as a parameter (i.e., the “error limit” as mapped above).)
wherein the distributing of at least one inference model from among the plurality of inference models to each of the data collectors and the data aggregator is based on the groupings of the data collectors. ([Jain, sections 1 and 4]: As described under the parent claim, Jain discloses each SN and CH having an instance of a data-prediction model, with devices in the same cluster having an instance of the same data-prediction model, and the clusters of SNs are the “groupings” of the claim. Thus, the distribution of instances of data-prediction models to the devices are “based on the groupings” as recited by the claim.)

The same motivation to combine applies. 

Claim 3
Jain in view of Daoud, Chakraborty, and Petousis discloses the elements of the parent claim(s). It also discloses:
[The method of claim 2, wherein identifying the quantity of computing resources further comprises:] calculating the quantity of computing resources based on a cardinality of the groupings and a per inference model computing resources training cost. ([Daoud, section 2 and figure 3]: As noted under the parent claim, Daoud discloses estimating an n x m computation cost matrix W [Daoud, section 2 second paragraph], which maps to the “quantity of computing resources” of the claim. In the combination, the number n of rows in W, i.e., the number of tasks, is the number of clusters in the WSN of Jain; in other words, n maps to the “cardinality of the groupings” as recited by the claim. Moreover, since the entries “w_{i, j} in W represents the estimated execution time of task t_i on processor p_j”, and each task t_i corresponds in Jain to the task of training a prediction model, any of these entries maps to the “per inference model computing resources training cost” of the claim. The examiner notes that the algorithm of Daoud “computes the finish time of [tasks] on every processor in the system” [Daoud, figure 3 line 11; emphasis added].)

The same motivation to combine applies. 
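
For illustration only, the cost-matrix mapping applied under claim 3 (one row of W per grouping, each entry being the estimated time to train one model on one processor) can be sketched as follows. The function names, the uniform per-model cost, the scaling of cost by a processor speed factor, and the simple column-sum comparison are assumptions of the sketch. The comparison shown is not the LDCP algorithm of Daoud.

```python
from typing import List


def build_cost_matrix(
    num_groupings: int,
    per_model_cost: float,
    processor_speeds: List[float],
) -> List[List[float]]:
    """n x m matrix: row i is the training task for grouping i, column j is processor j.

    Assumption: every model has the same nominal cost, divided by a speed factor.
    """
    return [
        [per_model_cost / speed for speed in processor_speeds]
        for _ in range(num_groupings)
    ]


def total_time_on(cost_matrix: List[List[float]], processor: int) -> float:
    """Estimated time to train every model sequentially on a single processor."""
    return sum(row[processor] for row in cost_matrix)


def fastest_processor(cost_matrix: List[List[float]]) -> int:
    """Simplified selection (not LDCP): the processor with the lowest total time."""
    num_processors = len(cost_matrix[0])
    return min(range(num_processors), key=lambda j: total_time_on(cost_matrix, j))
```

The number of rows equals the number of groupings, so both the cardinality of the groupings and the per-model cost enter the estimate in this sketch.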

Claim 4
Jain in view of Daoud, Chakraborty, and Petousis discloses the elements of the parent claim(s). It also discloses:
[The method of claim 3, wherein the similarity graph comprises:] nodes, each node of the nodes corresponding to one of the data collectors; and edges, each of the edges associating a pair of nodes, the respective edge indicating a similarity of data collected by the associated pair of the nodes ([Chakraborty, section 3.1]: Chakraborty indicates that they “create a similarity graph, G = (V, E) such that the vertices V are the sensors, S and the edges E represent the similarity of the recorded data between each pair of sensors, S_i and S_j” [Chakraborty, section 3.1 last paragraph].) 

The same motivation to combine applies.

Claim 5
Jain in view of Daoud, Chakraborty, and Petousis discloses the elements of the parent claim(s). It also discloses: 
[The method of claim 2, wherein] data collectors that are members of a first group of the groupings receive one inference model from a first set of the plurality of inference models, and data collectors that are members of a second groups of the groupings receive one inference model from a second set of the plurality of inference models, wherein the inference models making up the second set of the plurality of the inference models are separate from and trained differently from inference models making up the first set of the plurality of inference models, and the first group of the groupings is different from the second group of the groupings. ([Jain, sections 1 and 3]: As noted under the parent claim(s), each SN and CH has an instance of a data-prediction model, with each cluster having instances of the same model [Jain, sections 1 and 3]. The examiner notes that the fact that there is one model per cluster is also visible in [Jain, section 4 and algorithms 1-3], which also makes clear that the models corresponding to each cluster are separate and differently trained from those corresponding to other clusters. In other words, the mappings described under the parent claim already satisfy the limitation of this dependent claim, with any two distinct clusters mapping to the “first group” and the “second group” of the claim, and the models corresponding to these two clusters mapping to the “first set of the plurality of inference models” and the “second set of the plurality of inference models” of the claim, respectively.)

The same motivation to combine applies. 

Claim 6
Jain in view of Daoud, Chakraborty, and Petousis discloses the elements of the parent claim(s). It also discloses: 
[The method of claim 5, wherein obtaining the aggregated data comprises:] obtaining, from a first data collector of the data collectors that is a member of the first group of the groupings, ([Jain, figure 1]: Any cluster maps to the “group” of the claim, and any SN within a cluster maps to the “first data collector” of the claim.) first reduced size data of the reduced size data based on a portion of data collected by the first data collector; ([Jain, section 1]: The data that is actually collected by each SN maps to the “portion of data collected by the first data collector” of the claim. As noted under the parent claim, the SN does not transmit measurements which fall within a predetermined threshold of predicted measurements. The data that is actually transmitted to the CH by the SN (which is less than the data that is collected by the SN) maps to the “first reduced size data” of the claim.) 
obtaining, from a second data collector of the data collectors that is also a member of the first group of the groupings, ([Jain, figure 1]: Any SN within the same cluster/group maps to the “second data collector” of the claim. The examiner notes that the broadest reasonable interpretation of the claim does not require the “second data collector” to be distinct from the “first data collector”, so the “second data collector” of the claim could even be mapped to the same SN as the one that was mapped to the “first data collector” above. However, the clusters depicted in Jain have more than one SN each, so such a redundant mapping is not strictly necessary.) second reduced size data of the reduced size data based on a portion of data collected by the second data collector; ([Jain, section 1]: As above, the data that is actually transmitted to the CH by the “second data collector” maps to the “second reduced size data” of the claim.)
reconstructing the portion of the data collected by the first data collector using a first inference obtained from a first inference model from the first set of the plurality of inference models; ([Jain, section 1]: As noted under the parent claim, “[w]hen the CH does not receive a value from the SN, it uses the same data prediction algorithm to forecast the next data in the series” [Jain, section 1 paragraph beginning “This article”]. This use of the data prediction algorithm by the CH maps to the “reconstructing” step of the claim. In other words, the instance of the data-prediction model at the CH corresponding to the “first data collector” as mapped above is the “first inference model” of the claim.)
and reconstructing the portion of the data collected by the second data collector using a second inference obtained from a second inference model from the first set of the plurality of inference models. ([Jain, section 1]: As above, the use of the data prediction algorithm by the CH maps to the “reconstructing” step of the claim. The examiner notes that the “same data prediction algorithm” [Jain, section 1 paragraph beginning “This article”] is shared by the cluster (cf. [Jain, section 4 and algorithms 1-3]), so this “reconstructing” step uses the same prediction/inference model as the previous “reconstructing” step of this claim, as is required by the claim. In other words, the instance of the data-prediction model at the CH corresponding to the “second data collector” as mapped above is the “second inference model” of the claim.)

The same motivation to combine applies. 

Claim 7
Jain in view of Daoud, Chakraborty, and Petousis discloses the elements of the parent claim(s). It also discloses: 
[The method of claim 6, wherein obtaining the aggregated data further comprises:] obtaining, from a third data collector of the data collectors that is a member of the second group of the groupings, ([Jain, figure 1]: Any SN in the “second group” as mapped above maps to the “third data collector” of the claim.) third reduced size data based on a portion of data collected by the third data collector; ([Jain, section 1]: As described in the parent claim, the data that is actually transmitted to the corresponding CH by the “third data collector” maps to the “third reduced size data” of the claim.)
and reconstructing the portion of the data collected by the third data collector using a third inference obtained from a third inference model from the second set of the plurality of inference models. ([Jain, section 1]: As described in the parent claim, the use of the data prediction algorithm by the CH maps to this “reconstructing” step. In other words, the instance of the data-prediction model at the CH corresponding to the “third data collector” as mapped above is the “third inference model” of the claim.)

The same motivation to combine applies. 

Claim 8
Jain in view of Daoud, Chakraborty, and Petousis discloses the elements of the parent claim(s). It also discloses: 
[The method of claim 7, wherein] the reconstructed portion of the data collected by the first data collector comprises a quantity of error that is within the error limit. ([Jain, section 1]: As noted under the parent claims, Jain discloses that “[t]he SN will not relay the real sensor reading to the CH when the prediction error is lower than a predetermined threshold” [Jain, section 1 paragraph beginning “This article”]. The prediction error maps to the “quantity of error” recited in the claim, and the predetermined threshold δ to the “error limit” of the claim.)

The same motivation to combine applies. 

Claim 9
Jain in view of Daoud, Chakraborty, and Petousis discloses the elements of the parent claim(s). It also discloses: 
[The method of claim 1, wherein the model training device is obtained by] selecting the model training device from a plurality of model training devices, ([Daoud, sections 2 and 4]: As noted under the parent claim, Daoud selects processors to assign tasks to, and any of the selected processors map to the “[selected] model training device” of the claim (cf. [Daoud, figure 3 line 12]). As further noted under the parent claim, processors are selected from a HeDCS “represented by a set P of m processors that have diverse capabilities” [Daoud, section 2 paragraph beginning “The HeDCS”]. In other words, the HeDCS is the “plurality of model training device[s]” of the claim.)
the selected model training device having access to a quantity of computing resources that exceeds the quantity of computing resources required to train the plurality of inference models. ([Daoud, section 4.2]: The LDCP algorithm assigns tasks to time slots on the selected processor by “consider[ing] all possible idle time slots on p_j to find a time slot of equal or greater length than the execution time of t_i” [Daoud, section 4.2]. In other words, the time slot to which the task is assigned maps to the “quantity of computing resources that exceeds the quantity of computing resources required” of the claim.)
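
For illustration only, the idle-time-slot search quoted above can be sketched as follows. This is a minimal sketch and not Daoud's pseudocode: the function name find_idle_slot, the representation of a processor's schedule as a list of busy intervals, and the ready_time parameter are all assumptions introduced here.

```python
from typing import List, Tuple

Slot = Tuple[float, float]  # (start, end) of an interval already occupied on a processor


def find_idle_slot(busy: List[Slot], exec_time: float, ready_time: float = 0.0) -> float:
    """Return the earliest start time at which a task of length exec_time fits.

    Hypothetical helper: scans the gaps between busy intervals and accepts the
    first gap whose length is equal to or greater than exec_time.
    """
    start = ready_time
    for busy_start, busy_end in sorted(busy):
        # The gap before this busy interval is [start, busy_start).
        if busy_start - start >= exec_time:
            return start
        start = max(start, busy_end)
    # No interior gap is long enough, so the task goes after the last busy interval.
    return start


schedule = [(0.0, 2.0), (5.0, 7.0)]
print(find_idle_slot(schedule, 3.0))  # the gap from 2.0 to 5.0 is long enough
print(find_idle_slot(schedule, 4.0))  # no interior gap fits, so the task starts at 7.0
```

In this sketch the accepted gap is at least as long as the task, which is the sense in which the assigned time slot is mapped to a quantity of resources that meets or exceeds what the task requires.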

The same motivation to combine applies. 

Claim 10
Jain in view of Daoud, Chakraborty, and Petousis discloses the elements of the parent claim(s). It also discloses: 
[The method of claim 1, wherein the model training device is obtained by] allocating computing resources to the model training device until the model training device has access to a quantity of computing resources that exceeds the quantity of computing resources required to train the plurality of inference models. ([Daoud, section 4.2]: The LDCP algorithm assigns tasks to time slots on the selected processor by “consider[ing] all possible idle time slots on p_j to find a time slot of equal or greater length than the execution time of t_i” [Daoud, section 4.2]. In other words, the time slot to which the task is assigned maps to the “quantity of computing resources that exceeds the identified quantity of computing resources” of the claim. This search for and assignment of an appropriate time slot on the selected processor falls under the broadest reasonable interpretation of “allocating computing resources to the model training device until the model training device has access to” resources sufficient for performing the task, as recited by the claim.)

The same motivation to combine applies. 

Claim 12
Jain in view of Daoud, Chakraborty, and Petousis discloses the element(s) of the parent claim(s). It also discloses: 
[The method of claim 1, wherein obtaining a model training device based on the quantity of computing resources comprises:] increasing the error limit; ([Jain, section 5]: Jain discloses two values for the prediction threshold δ, namely, δ = 0.5 and δ = 1 [Jain, section 5 below table 1]. Increasing δ from 0.5 to 1 maps to “increasing the error limit” as recited by the claim.)
identifying, using the similarity graph, a second quantity of computing resources ([Daoud, section 2]: The same mapping described under the parent claim for identifying the first quantity of computing resources applies for the second quantity, this time using the increased value of δ.) required to train a second quantity of the plurality of inference models, the second quantity of computing resources being based on the increased error limit; ([Jain, section 5]: As noted under the parent claim, δ is a parameter of the prediction models used in Jain. In other words, the prediction models constructed using the parameter δ = 1 are the “second quantity of the plurality of inference models” of the claim. Since these models themselves are “based on the increased error limit”, so too is the quantity of computing resources required to train them, as required by the claim.)
and obtaining the model training device based on the second quantity of computing resources. ([Daoud, abstract and section 4]: The same mapping described under the parent claim for obtaining a model training device applies, this time using the increased value of δ.)

The same motivation to combine applies. 

Claim 13
Jain discloses: 
managing data collection in a distributed environment where data collected by data collectors that are operably connected to a data aggregator via a communication system is to be aggregated by the data aggregator into aggregated data, comprising: ([Jain, section 1 and figure 1]: Jain works in the setting of wireless sensor networks (WSNs) in which “[s]ensor nodes (SNs) are typically deployed with the target of collecting a large amount of data through environmental monitoring” [Jain, section 1 first paragraph] and “SNs typically collect data over a time and deliver fused data to the cluster-head (CH) or directly to the base station on a regular basis” [Jain, section 1 third paragraph]. See [Jain, figure 1] for a depiction. The network maps to the “communication system” of the claim, the SNs map to the “data collectors” of the claim, and any one of the CHs maps to the “data aggregator” of the claim.)
obtaining an error limit for the data collected by the data collectors that is to be aggregated by the data aggregator; ([Jain, section 3.2]: In the system disclosed by Jain, “the BS will communicate its acceptable prediction threshold to all CHs and SNs as δ and cumulative threshold denoted as ε” [Jain, section 3.2 first bullet point]. These thresholds map to the “error limit” of the claim.)
obtaining a similarity graph for the data collectors; ([Jain, figure 1]: The “[f]irst [l]ayer” of the “three-layer network architecture” of Jain has nodes representing the SNs and these nodes are grouped into clusters [Jain, figure 1 “First Layer”]. This falls under the broadest reasonable interpretation of “a similarity graph for the data collectors” as recited by the claim, since each cluster of nodes is “similar” to other nodes in the same cluster.)
train a plurality of inference models ([Jain, sections 1, 3, and 4]: Jain discloses a “two-vector data-prediction model” which “predicts the sensor’s reading in the succeeding time slot at both SN and CH based on data stored on the vectors. When the next data reading is collected, each SN within the cluster compares its predicted data with the real sensed measurements. The SN will not relay the real sensor reading to the CH when the prediction error is lower than a predetermined threshold. When the CH does not receive a value from the SN, it uses the same data prediction algorithm to forecast the next data in the series” [Jain, section 1 paragraph beginning “This article”]. In other words, a “same instance of the data-prediction model will be employed both at the SNs and CHs” [Jain, section 3.1 second bullet]. The “goal is to reduce data transmission and enhance network lifespan by estimating future data based on prior sensed readings” [Jain, section 4 first paragraph], and Jain discloses algorithms for training this model (the process having an initialization stage [Jain, section 4.1 and algorithm 1] and model-building stage [Jain, section 4.2 and algorithm 2]). There is one model for each cluster, and each of these models maps to one of the “inference models” of the claim.)
grouping nodes of the similarity graph into groupings of the data collectors ([Jain, figure 1]: The clusters of Jain map to the “groupings” of the claim.)
wherein each of the plurality of inference models comprises identically trained copies of a machine learning model; ([Jain, sections 1 and 3 and figure 1]: As noted above, the model corresponding to a given cluster maps to one of the “inference models” of the claim; each cluster includes several SNs and a CH [Jain, figure 1] and instances of the same data-prediction model are used within a cluster [Jain, sections 1 and 3]. In other words, the instances of the same data-prediction model that are employed by a given cluster map to the “identically trained copies of a machine learning model” of the claim.)
the data aggregator, [the model training device,] and the data collectors all being implemented using computing devices, ([Jain, sections 1 and 3]: The SNs and CHs of Jain are “implemented using computing devices” as recited by the claim since they are all capable of computation (e.g., computations involving making predictions using models, as explained above [Jain, sections 1 and 3]).)
initiating aggregation of the data collected by the data collectors that is to be aggregated by the data aggregator [using the model training device] by distributing at least one inference model from among the plurality of inference models to each of the data collectors and the data aggregator; ([Jain, sections 1, 3, and 4]: The examiner notes that the specification [specification, 0078] indicates that “initiating aggregation” refers to training models and distributing them. Jain discloses initializing and building models [Jain, sections 4.1-4.2], and, as noted above, it also discloses each SN and CH having an instance of a data-prediction model, with devices in the same cluster having instances of the same data-prediction model.)
obtaining the aggregated data ([Jain, figure 1]: As explained above, the CH aggregates data collected by the SNs in its cluster.) using the plurality of inference models trained [by the model training device] by: ([Jain, section 1]: As explained above, the predictions made by the two-vector data-prediction model are used in the data aggregation process [Jain, section 1 paragraph beginning with “This article”].) 
a reduced size data from each of the data collectors, wherein for each of the data collectors, the reduced size data is a difference between a full data set collected by a respective data collector of the data collectors and an inference of the full data set, ([Jain, sections 1 and 3]: Jain discloses a vector RDV_{SN} which stores “real sensed readings” of an SN [Jain, section 3.2 second bullet]. This vector maps to the “full data set” of the claim. Jain also discloses a vector FDV_{SN} of the forecasted sensor readings [Jain, section 3.2 second bullet], which maps to the “inference of the full data set” of the claim. Jain also discloses computing differences |p_{n+1} - r_{n+1}| between the predicted and real data [Jain, section 3.3 first paragraph]. These differences map to the “difference between the full data set… and [the] inference of the full data set” of the claim, i.e., to the “reduced size data” of the claim.)
and the inference of the full data set is generated by the at least one inference model distributed to the respective data collector of the data collectors; ([Jain, sections 1 and 3]: As explained above, the forecasted readings within each cluster are produced by the model assigned to that cluster [Jain, sections 1 and 3].)
reconstructing, by the data aggregator, a representation of the data collected by the data collectors that is to be aggregated by the data aggregator into the aggregated data using [the reduced size data received from each of the data collectors] and inferences generated by the at least one inference model distributed to the data aggregator ([Jain, section 3]: The vector FDV_{CH} constructed at each CH [Jain, section 3.2 second bullet] maps to the “representation” of the claim.)
without the data aggregator actually receiving the full data set from any of the data collectors; ([Jain, section 3]: As noted above, the vectors RDV_{SN} map to the “full data set” of the claim. The CH does not receive this full data set from any of the SNs in its cluster; it receives only those readings which deviate sufficiently from the predictions made by the models (i.e., only those r_n for which either |p_{n+1} - r_{n+1}| or |\hat{p}_{n+1} - p_{n+1}| is larger than a given error limit) [Jain, section 3.3]. In other words, the “data aggregator [does not] actually receiv[e] the full data set from any of the data collectors”, as required by the claim.)
and using, by the data aggregator, the representation that is reconstructed as the aggregated data. ([Jain, section 3 and figure 1]: Jain discloses that the BS “receives aggregated data from the CHs” [Jain, section 3 first paragraph; see also, figure 1]. This transmission of the CH’s data maps to the step of “using, by the data aggregator, the representation” as recited by the claim.)
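
For illustration only, the threshold-based suppression quoted from Jain above can be sketched as follows. This is a minimal sketch under stated assumptions: the last-value predictor stands in for Jain's two-vector data-prediction model (which is not reproduced here), and the names predict, sensor_node_step, cluster_head_step, and simulate are hypothetical.

```python
from typing import List, Optional, Tuple


def predict(history: List[float]) -> float:
    # Stand-in predictor (assumption): repeat the most recent stored value.
    return history[-1]


def sensor_node_step(history: List[float], reading: float, delta: float) -> Optional[float]:
    """SN side: return the reading to transmit, or None when it is suppressed."""
    predicted = predict(history)
    if abs(predicted - reading) < delta:
        history.append(predicted)  # keep the SN's state in step with the CH's state
        return None
    history.append(reading)
    return reading


def cluster_head_step(history: List[float], received: Optional[float]) -> float:
    """CH side: use the received reading, or forecast when nothing was received."""
    value = predict(history) if received is None else received
    history.append(value)
    return value


def simulate(readings: List[float], delta: float) -> Tuple[List[float], int]:
    """Run both sides over a series; the first reading is assumed to be shared."""
    sn_history, ch_history = [readings[0]], [readings[0]]
    reconstructed, sent = [readings[0]], 0
    for reading in readings[1:]:
        message = sensor_node_step(sn_history, reading, delta)
        if message is not None:
            sent += 1
        reconstructed.append(cluster_head_step(ch_history, message))
    return reconstructed, sent


readings = [20.0, 20.2, 20.4, 21.5, 21.6]
reconstructed, sent = simulate(readings, delta=0.5)
print(reconstructed, sent)
```

In this sketch only one of the four later readings is transmitted, and every reconstructed value differs from the corresponding real reading by less than the threshold, which is the relationship relied upon in the mappings of the “error limit” above.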

Jain does not distinctly disclose a method of selecting devices on which the prediction models are trained based on the resource requirements of the training task, a method of constructing the clusters/groupings, or transmitting the differences between predicted and collected data. In other words, Jain might not distinctly disclose:
A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for
identifying, using the similarity graph, a quantity of computing resources required to [train a plurality of inference models] 
by at least: identifying an edge value threshold based on the error limit for the data collected by the data collectors that is to be aggregated by the data aggregator, and [grouping nodes of the similarity graph into groupings of the data collectors] based on the edge value threshold, 
obtaining a model training device based on the quantity of computing resources,
[the data aggregator,] the model training device, [and the data collectors all being implemented using computing devices]
[initiating aggregation…] using the model training device
[the plurality of inference models] trained by the model training device,
receiving, by the data aggregator, [a reduced size data from each of the data collectors,] 
and in response to receiving the reduced size data from each of the data collectors: [reconstructing, by the data aggregator, a representation of the data collected by the data collectors that is to be aggregated by the data aggregator into the aggregated data using] the reduced size data received from each of the data collectors 

Daoud works in the setting of task scheduling for a heterogeneous distributed computing system (HeDCS) which is “represented by a set P of m processors” [Daoud, section 2 second paragraph]. Moreover, Jain in view of Daoud discloses: 
identifying, using the similarity graph, a quantity of computing resources required to [train a plurality of inference models] ([Daoud, section 2; Jain, section 1]: Daoud works with a “parallel application” that “is represented by a directed acyclic graph, or DAG, defined by the tuple (T, E), where T is a set of n tasks” [Daoud, section 2 first paragraph] and associates to this an “n x m computation cost matrix W” where “[e]ach element w_{i, j} in W represents the estimated execution time of task t_i on processor p_j” [Daoud, section 2 second paragraph]. In the combination, the parallel application of Daoud is taken to be the task of training the prediction models for each cluster as disclosed by Jain [Jain, section 1], each task t_i in T within the parallel application being the task of training one of the prediction models associated with a given cluster. The computation cost matrix is the “quantity of computing resources” recited by the claim. This computation cost matrix falls under the broadest reasonable interpretation of “using the similarity graph” because the number n of rows in the matrix corresponds to the number of tasks, i.e., to the number of clusters in the WSN of Jain.)
obtaining a model training device based on the quantity of computing resources, ([Daoud, abstract and section 4]: Daoud discloses a “scheduling algorithm, called the longest dynamic critical path (LDCP) algorithm” [Daoud, abstract]. The LDCP algorithm includes a “processor selection” phase [Daoud, section 4 first paragraph] during which tasks are assigned to processors [Daoud, section 4.2]. Any of the processors selected for task assignment can map to the “model training device” of the claim, the processor selection mapping to the “obtaining” step of the claim. See [Daoud, figure 3] for pseudocode of the LDCP algorithm.)
[the data aggregator,] the model training device, [and the data collectors all being implemented using computing devices] ([Daoud, section 2]: As noted above, the HeDCS of Daoud is “represented by a set P of m processors” [Daoud, section 2 second paragraph]. In other words, the “model training device” as mapped above is a processor and is “implemented using computing devices” as required by the claim.)
[initiating aggregation…] using the model training device ([Daoud, section 2; Jain, section 1]: As explained above, the parallel application of Daoud [Daoud, section 2 first paragraph] is taken to be the task of training the prediction models for each cluster as disclosed by Jain [Jain, section 1]. In other words, in the combination, the “initiating aggregation” step as mapped above is performed by the “model training device” as mapped above.)
[the plurality of inference models] trained by the model training device, ([Daoud, abstract; Jain, sections 1 and 4]: As noted above, in the combination, the prediction models of Jain are trained on processors in a HeDCS using the scheduling algorithm disclosed by Daoud.)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the method for training and using data prediction models for sensor networks as disclosed by Jain with the scheduling algorithm of Daoud because this scheduling algorithm “outperforms [competing] algorithms” and “provides a practical solution for scheduling parallel applications with high communication costs” [Daoud, abstract], thereby resulting in a more efficient system. 

Jain in view of Daoud might not distinctly disclose: 
A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for
identifying, using the similarity graph, a quantity of computing resources required to [train a plurality of inference models] 
by at least: identifying an edge value threshold based on the error limit for the data collected by the data collectors that is to be aggregated by the data aggregator, and [grouping nodes of the similarity graph into groupings of the data collectors] based on the edge value threshold, 
receiving, by the data aggregator, [a reduced size data from each of the data collectors,] 
and in response to receiving the reduced size data from each of the data collectors: [reconstructing, by the data aggregator, a representation of the data collected by the data collectors that is to be aggregated by the data aggregator into the aggregated data using] the reduced size data received from each of the data collectors 

Chakraborty works in the setting of a “SwN that comprises of sensors” which communicate with “the base-station B” [Chakraborty, figure 1 caption]. Moreover, Jain in view of Daoud and Chakraborty discloses: 
by at least: identifying an edge value threshold ([Chakraborty, figure 2]: Chakraborty discloses “creat[ing] a similarity graph, G, of the sensors where there is an edge between a pair of sensors if the similarity is greater than the threshold” [Chakraborty, figure 2 caption]. This threshold maps to the “edge value threshold” of the claim. See [Chakraborty, section 4] for details.) based on the error limit for the data collected by the data collectors that is to be aggregated by the data aggregator, ([Jain, sections 1 and 3.2; Chakraborty, section 7.1]: Chakraborty discloses “vary[ing] the values of the threshold” and studies how varying the threshold affects the clustering of the graph [Chakraborty, section 7.1 first paragraph and table 1]. Each cluster in Jain shares a single prediction model, and the predictions made by the prediction model are compared against the prediction threshold δ (i.e., the “error limit” as mapped above) [Jain, section 1 paragraph beginning “This article” and section 3.2 first bullet point]. It would thus be obvious to a person of ordinary skill in the art, having both Jain and Chakraborty before them, to choose the “edge value threshold based on the error limit” as recited by the claim.) and [grouping nodes of the similarity graph into groupings of the data collectors] based on the edge value threshold, ([Chakraborty, figure 2; Jain, figure 1]: As noted above, Chakraborty discloses creating a similarity graph based on the edge value threshold. In the combination, this similarity graph constructed based on the edge value threshold is used to form the clusters which appear in Jain [Jain, figure 1].)
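
For illustration only, the thresholded similarity graph and the resulting groupings can be sketched as follows. This is a minimal sketch under stated assumptions: grouping by connected components is used here as a stand-in and is not asserted to be Chakraborty's clustering procedure, and the function name group_by_threshold and the pairwise similarity dictionary are hypothetical.

```python
from typing import Dict, List, Tuple


def group_by_threshold(
    n: int, similarity: Dict[Tuple[int, int], float], threshold: float
) -> List[List[int]]:
    """Group n sensors by the connected components of the thresholded graph.

    An edge joins a pair of sensors only if their similarity is greater than
    the threshold, following the figure 2 caption quoted above.
    """
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in similarity.items():
        if score > threshold:
            parent[find(a)] = find(b)

    groups: Dict[int, List[int]] = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


similarity = {(0, 1): 0.9, (1, 2): 0.6, (2, 3): 0.95, (0, 3): 0.2}
print(group_by_threshold(4, similarity, 0.8))  # two groups
print(group_by_threshold(4, similarity, 0.5))  # one group
```

In this sketch a lower edge value threshold merges sensors into fewer groups. Under the mapping above, in which one prediction model is trained per cluster, fewer groups would correspond to fewer models to train.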

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the process for training and using data prediction models for sensor networks as disclosed by Jain in view of Daoud with clustering by means of a similarity graph as disclosed by Chakraborty, because Chakraborty’s “approach can yield significant battery life improvements within realistic error bounds” [Chakraborty, abstract], thereby resulting in a more efficient network. 

Jain in view of Daoud and Chakraborty might not distinctly disclose: 
A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for
receiving, by the data aggregator, [a reduced size data from each of the data collectors,] 
and in response to receiving the reduced size data from each of the data collectors: [reconstructing, by the data aggregator, a representation of the data collected by the data collectors that is to be aggregated by the data aggregator into the aggregated data using] the reduced size data received from each of the data collectors 

Petousis is in the field of machine learning. In particular, it describes a “first computing entity” which “access[es] one or more streams of sensor data” and “generates predictions of predicted sensor values” [Petousis, 0019]. In the combination, the first computing entity of Petousis corresponds to a sensor in Jain, i.e., to one of the “data collectors” of the claim. Moreover, Jain in view of Daoud, Chakraborty, and Petousis discloses:
A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for ([Petousis, 0087]: Petousis discloses that the methods disclosed therein are implemented using a “computer-readable medium storing computer-readable instructions” where the “instructions are preferably executed by computer-executable components” (such as a “general or application specific processor”) and the “computer-readable medium can be stored on any suitable computer-readable media such as RAMs, ROMs, flash memory, EEPROMs, optical devices (CD or DVD), hard drives, floppy drives, or any suitable device” [Petousis, 0087]. A CD, for example, maps to the “non-transitory machine-readable medium having instructions stored therein” of the claim, and the processor by which instructions are executed maps to the “processor” of the claim.)
receiving, by the data aggregator, [a reduced size data from each of the data collectors,] ([Petousis, 0019]: Petousis discloses that the first computing entity “comput[es] error values based on calculated differences between the actual sensor values and the predicted sensor values” and “transmit[s] the computed error values to a second computing entity” [Petousis, 0019]. In other words, the second computing entity corresponds to the CH of Jain, i.e., to the “data aggregator” of the claim, and the error values based on calculated differences correspond to the analogous differences in Jain, i.e., to the “reduced size data” of the claim as mapped above. In the combination, the method of Jain could be modified in view of Petousis so that, whenever an SN would transfer data to the CH, instead of transferring the reading itself, it would instead transfer a difference as described in Petousis.)
and in response to receiving the reduced size data from each of the data collectors: [reconstructing, by the data aggregator, a representation of the data collected by the data collectors that is to be aggregated by the data aggregator into the aggregated data using] the reduced size data received from each of the data collectors ([Petousis, 0019]: Petousis discloses that the “second computing entity” executes a “second instance of the trained machine learning model” in order to “reconstruct[…] estimates of the actual sensor values based on a reconstruction computation with the parallel predicted sensor values and the error values from the first computing entity” [Petousis, 0019]. As noted above, the “error values from the first computing entity” correspond to the “reduced size data” of the claim. Since the reconstruction is based on these error values received from the first computing entity, the reconstruction both “us[es] the reduced size data received from each of the data collectors” and occurs “in response to receiving the reduced size data” as recited by the claim.)
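
For illustration only, the error-value transmission and reconstruction quoted from Petousis above can be sketched as follows. This is a minimal sketch under stated assumptions: the last-value predictor stands in for the trained machine learning model of Petousis, the first reading is assumed to be shared between the two entities, and the names predict, encode, and decode are hypothetical.

```python
from typing import List, Tuple


def predict(history: List[float]) -> float:
    # Stand-in for the trained model (assumption): repeat the most recent value.
    return history[-1]


def encode(readings: List[float]) -> Tuple[float, List[float]]:
    """First computing entity: compute error values (actual minus predicted)."""
    history = [readings[0]]
    errors = []
    for reading in readings[1:]:
        errors.append(reading - predict(history))
        history.append(reading)
    return readings[0], errors


def decode(seed: float, errors: List[float]) -> List[float]:
    """Second computing entity: run a parallel prediction and add each error back."""
    history = [seed]
    for error in errors:
        history.append(predict(history) + error)
    return history


readings = [10.0, 10.5, 11.0, 10.0]
seed, errors = encode(readings)
print(errors)                # the values that would be transmitted
print(decode(seed, errors))  # the estimates reconstructed at the receiver
```

In this sketch the receiver never obtains the readings after the first one; it reconstructs them from its own predictions together with the received error values.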

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the wireless sensor network of Jain in view of Daoud and Chakraborty with the techniques for handling sensor data described in Petousis because they provide “intelligent compression techniques” for “minimiz[ing] bandwidth usage when transferring data streams” [Petousis, 0029], thereby resulting in a more efficient and effective system.

Claims 14-16 inherit limitations from claim 13 and recite additional limitations which are substantially similar to those recited by claims 2-4, respectively, so they are rejected by the same rationale. 

Claim 17
Jain discloses: 
managing data collection in a distributed environment where data collected by data collectors that are operably connected to a data aggregator via a communication system is to be aggregated by the data aggregator into aggregated data, comprising: ([Jain, section 1 and figure 1]: Jain works in the setting of wireless sensor networks (WSNs) in which “[s]ensor nodes (SNs) are typically deployed with the target of collecting a large amount of data through environmental monitoring” [Jain, section 1 first paragraph] and “SNs typically collect data over a time and deliver fused data to the cluster-head (CH) or directly to the base station on a regular basis” [Jain, section 1 third paragraph]. See [Jain, figure 1] for a depiction. The network maps to the “communication system” of the claim, the SNs map to the “data collectors” of the claim, and any one of the CHs maps to the “data aggregator” of the claim.)
obtaining an error limit for the data collected by the data collectors that is to be aggregated by the data aggregator; ([Jain, section 3.2]: In the system disclosed by Jain, “the BS will communicate its acceptable prediction threshold to all CHs and SNs as δ and cumulative threshold denoted as ε” [Jain, section 3.2 first bullet point]. These thresholds map to the “error limit” of the claim.)
obtaining a similarity graph for the data collectors; ([Jain, figure 1]: The “[f]irst [l]ayer” of the “three-layer network architecture” of Jain has nodes representing the SNs and these nodes are grouped into clusters [Jain, figure 1 “First Layer”]. This falls under the broadest reasonable interpretation of “a similarity graph for the data collectors” as recited by the claim, since each cluster of nodes is “similar” to other nodes in the same cluster.)
train a plurality of inference models ([Jain, sections 1, 3, and 4]: Jain discloses a “two-vector data-prediction model” which “predicts the sensor’s reading in the succeeding time slot at both SN and CH based on data stored on the vectors. When the next data reading is collected, each SN within the cluster compares its predicted data with the real sensed measurements. The SN will not relay the real sensor reading to the CH when the prediction error is lower than a predetermined threshold. When the CH does not receive a value from the SN, it uses the same data prediction algorithm to forecast the next data in the series” [Jain, section 1 paragraph beginning “This article”]. In other words, a “same instance of the data-prediction model will be employed both at the SNs and CHs” [Jain, section 3.1 second bullet]. The “goal is to reduce data transmission and enhance network lifespan by estimating future data based on prior sensed readings” [Jain, section 4 first paragraph], and Jain discloses algorithms for training this model (the process having an initialization stage [Jain, section 4.1 and algorithm 1] and model-building stage [Jain, section 4.2 and algorithm 2]). There is one model for each cluster, and each of these models maps to one of the “inference models” of the claim.)
grouping nodes of the similarity graph into groupings of the data collectors ([Jain, figure 1]: The clusters of Jain map to the “groupings” of the claim.)
wherein each of the plurality of inference models comprises identically trained copies of a machine learning model; ([Jain, sections 1 and 3 and figure 1]: As noted above, the model corresponding to a given cluster maps to one of the “inference models” of the claim; each cluster includes several SNs and a CH [Jain, figure 1] and instances of the same data-prediction model are used within a cluster [Jain, sections 1 and 3]. In other words, the instances of the same data-prediction model that are employed by a given cluster map to the “identically trained copies of a machine learning model” of the claim.)
the data aggregator, [the model training device,] and the data collectors all being implemented using computing devices, ([Jain, sections 1 and 3]: The SNs and CHs of Jain are “implemented using computing devices” as recited by the claim since they are all capable of computation (e.g., computations involving making predictions using models, as explained above [Jain, sections 1 and 3]).)
initiating aggregation of the data collected by the data collectors that is to be aggregated by the data aggregator [using the model training device] by distributing at least one inference model from among the plurality of inference models to each of the data collectors and the data aggregator; ([Jain, sections 1, 3, and 4]: The examiner notes that the specification [specification, 0078] indicates that “initiating aggregation” refers to training models and distributing them. Jain discloses initializing and building models [Jain, sections 4.1-4.2], and, as noted above, it also discloses each SN and CH having an instance of a data-prediction model, with devices in the same cluster having instances of the same data-prediction model.)
obtaining the aggregated data ([Jain, figure 1]: As explained above, the CH aggregates data collected by the SNs in its cluster.) using the plurality of inference models trained [by the model training device] by: ([Jain, section 1]: As explained above, the predictions made by the two-vector data-prediction model are used in the data aggregation process [Jain, section 1 paragraph beginning with “This article”].) 
a reduced size data from each of the data collectors, wherein for each of the data collectors, the reduced size data is a difference between a full data set collected by a respective data collector of the data collectors and an inference of the full data set, ([Jain, sections 1 and 3]: Jain discloses a vector RDV_{SN} which stores “real sensed readings” of an SN [Jain, section 3.2 second bullet]. This vector maps to the “full data set” of the claim. Jain also discloses a vector FDV_{SN} of the forecasted sensor readings [Jain, section 3.2 second bullet], which maps to the “inference of the full data set” of the claim. Jain also discloses computing differences |p_{n+1} - r_{n+1}| between the predicted and real data [Jain, section 3.3 first paragraph]. These differences map to the “difference between the full data set… and [the] inference of the full data set” of the claim, i.e., to the “reduced size data” of the claim.)
and the inference of the full data set is generated by the at least one inference model distributed to the respective data collector of the data collectors; ([Jain, sections 1 and 3]: As explained above, the forecasted readings within each cluster are produced by the model assigned to that cluster [Jain, sections 1 and 3].)
reconstructing, by the data aggregator, a representation of the data collected by the data collectors that is to be aggregated by the data aggregator into the aggregated data using [the reduced size data received from each of the data collectors] and inferences generated by the at least one inference model distributed to the data aggregator ([Jain, section 3]: The vector FDV_{CH} constructed at each CH [Jain, section 3.2 second bullet] maps to the “representation” of the claim.)
without the data aggregator actually receiving the full data set from any of the data collectors; ([Jain, section 3]: As noted above, the vectors RDV_{SN} map to the “full data set” of the claim. The CH does not receive this full data set from any of the SNs in its cluster; it receives only those readings which deviate sufficiently from the predictions made by the models (i.e., only those r_n for which either |p_{n+1} - r_{n+1}| or |hat{p}_{n+1} - p_{n+1}| is larger than a given error limit) [Jain, section 3.3]. In other words, the “data aggregator [does not] actually receiv[e] the full data set from any of the data collectors”, as required by the claim.)
and using, by the data aggregator, the representation that is reconstructed as the aggregated data. ([Jain, section 3 and figure 1]: Jain discloses that the BS “receives aggregated data from the CHs” [Jain, section 3 first paragraph; see also, figure 1]. This transmission of the CH’s data maps to the step of “using, by the data aggregator, the representation” as recited by the claim.)
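For illustration only, the dual-prediction behavior attributed to Jain above (the SN stays silent whenever its prediction is within the threshold δ, and the CH then falls back on its own identical prediction) can be sketched as follows. The last-value predictor, the function names, and the choice to store the prediction in the SN's history when no transmission occurs are assumptions made for this sketch; they are not Jain's two-vector model.

```python
from typing import List, Optional, Tuple


def predict_next(history: List[float]) -> float:
    """Hypothetical stand-in predictor (assumption): repeat the last stored value."""
    return history[-1]


def sensor_step(history: List[float], reading: float, delta: float) -> Optional[float]:
    """SN side: return the reading to transmit, or None to stay silent."""
    predicted = predict_next(history)
    if abs(predicted - reading) > delta:
        history.append(reading)      # prediction error too large: send the real reading
        return reading
    history.append(predicted)        # assumption: keep SN history in sync with the CH
    return None


def cluster_head_step(history: List[float], received: Optional[float]) -> float:
    """CH side: use the received reading, else the same prediction the SN made."""
    value = received if received is not None else predict_next(history)
    history.append(value)
    return value


def run_dual_prediction(readings: List[float], delta: float,
                        initial: float) -> Tuple[List[float], int]:
    """Simulate one SN/CH pair; return the CH's view and the number of transmissions."""
    sn_history, ch_history = [initial], [initial]
    ch_view, sent = [], 0
    for reading in readings:
        message = sensor_step(sn_history, reading, delta)
        if message is not None:
            sent += 1
        ch_view.append(cluster_head_step(ch_history, message))
    return ch_view, sent


if __name__ == "__main__":
    view, sent = run_dual_prediction([20.0, 20.1, 20.2, 21.0, 21.05], delta=0.5, initial=20.0)
    print(view, sent)
```

In this sketch every value held by the CH stays within δ of the corresponding real reading, although only the readings that exceed the threshold are transmitted.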

Jain does not distinctly disclose a method of selecting devices on which the prediction models are trained based on the resource requirements of the training task, a method of constructing the clusters/groupings, or the transmission of the differences between predicted and collected data. In other words, Jain might not distinctly disclose:
A data processing system, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations for 
identifying, using the similarity graph, a quantity of computing resources required to [train a plurality of inference models] 
by at least: identifying an edge value threshold based on the error limit for the data collected by the data collectors that is to be aggregated by the data aggregator, and [grouping nodes of the similarity graph into groupings of the data collectors] based on the edge value threshold, 
obtaining a model training device based on the quantity of computing resources,
[the data aggregator,] the model training device, [and the data collectors all being implemented using computing devices]
[initiating aggregation…] using the model training device
[the plurality of inference models] trained by the model training device,
receiving, by the data aggregator, [a reduced size data from each of the data collectors,] 
and in response to receiving the reduced size data from each of the data collectors: [reconstructing, by the data aggregator, a representation of the data collected by the data collectors that is to be aggregated by the data aggregator into the aggregated data using] the reduced size data received from each of the data collectors 

Daoud works in the setting of task scheduling for a heterogeneous distributed computing system (HeDCS) which is “represented by a set P of m processors” [Daoud, section 2 second paragraph]. Moreover, Jain in view of Daoud discloses: 
identifying, using the similarity graph, a quantity of computing resources required to [train a plurality of inference models] ([Daoud, section 2; Jain, section 1]: Daoud works with a “parallel application [that] is represented by a directed acyclic graph, or DAG, defined by the tuple (T, E), where T is a set of n tasks” [Daoud, section 2 first paragraph] and associates to this a “n x m computation cost matrix W” where “[e]ach element w_{i, j} in W represents the estimated execution time of task t_i on processor p_j” [Daoud, section 2 second paragraph]. In the combination, the parallel application of Daoud is taken to be the task of training the prediction models for each cluster as disclosed by Jain [Jain, section 1], each task t_i in T within the parallel application being the task of training one of the prediction models associated with a given cluster. The computation cost matrix is the “quantity of computing resources” recited by the claim. This computation cost matrix falls under the broadest reasonable interpretation of “using the similarity graph” because the number n of rows in the matrix corresponds to the number of tasks, i.e., to the number of clusters in the WSN of Jain.)
obtaining a model training device based on the quantity of computing resources, ([Daoud, abstract and section 4]: Daoud discloses a “scheduling algorithm, called the longest dynamic critical path (LDCP) algorithm” [Daoud, abstract]. The LDCP algorithm includes a “processor selection” phase [Daoud, section 4 first paragraph] during which tasks are assigned to processors [Daoud, section 4.2]. Any of the processors selected for task assignment can map to the “model training device” of the claim, the processor selection mapping to the “obtaining” step of the claim. See [Daoud, figure 3] for pseudocode of the LDCP algorithm.)
[the data aggregator,] the model training device, [and the data collectors all being implemented using computing devices] ([Daoud, section 2]: As noted above, the HeDCS of Daoud is “represented by a set P of m processors” [Daoud, section 2 second paragraph]. In other words, the “model training device” as mapped above is a processor and is “implemented using computing devices” as required by the claim.)
[initiating aggregation…] using the model training device ([Daoud, section 2; Jain, section 1]: As explained above, the parallel application of Daoud [Daoud, section 2 first paragraph] is taken to be the task of training the prediction models for each cluster as disclosed by Jain [Jain, section 1]. In other words, in the combination, the “initiating aggregation” step as mapped above is performed by the “model training device” as mapped above.)
[the plurality of inference models] trained by the model training device, ([Daoud, abstract; Jain, sections 1 and 4]: As noted above, in the combination, the prediction models of Jain are trained on processors in a HeDCS using the scheduling algorithm disclosed by Daoud.)
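For illustration only, the role of the n x m computation cost matrix W in selecting processors can be sketched as below, with one row per cluster-model training task and one column per processor. The greedy earliest-finish rule shown here is an assumption chosen for brevity; it is not the LDCP algorithm of Daoud, and it ignores task dependencies and communication costs.

```python
from typing import List, Tuple


def greedy_assign(W: List[List[float]]) -> Tuple[List[int], float]:
    """Assign each task (row of W) to the processor with the earliest finish time.

    W[i][j] is the estimated execution time of task i on processor j.
    Returns the chosen processor index per task and the overall finish time.
    """
    num_processors = len(W[0])
    ready = [0.0] * num_processors   # time at which each processor becomes free
    assignment = []
    for costs in W:
        best = min(range(num_processors), key=lambda p: ready[p] + costs[p])
        ready[best] += costs[best]
        assignment.append(best)
    return assignment, max(ready)


if __name__ == "__main__":
    # Hypothetical costs: three training tasks, two processors.
    W = [[4, 2], [3, 5], [2, 2]]
    print(greedy_assign(W))
```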

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the method for training and using data prediction models for sensor networks as disclosed by Jain with the scheduling algorithm of Daoud because this scheduling algorithm “outperforms [competing] algorithms” and “provides a practical solution for scheduling parallel applications with high communication costs” [Daoud, abstract], thereby resulting in a more efficient system. 

Jain in view of Daoud might not distinctly disclose: 
A data processing system, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations for 
identifying, using the similarity graph, a quantity of computing resources required to [train a plurality of inference models] 
by at least: identifying an edge value threshold based on the error limit for the data collected by the data collectors that is to be aggregated by the data aggregator, and [grouping nodes of the similarity graph into groupings of the data collectors] based on the edge value threshold, 
receiving, by the data aggregator, [a reduced size data from each of the data collectors,] 
and in response to receiving the reduced size data from each of the data collectors: [reconstructing, by the data aggregator, a representation of the data collected by the data collectors that is to be aggregated by the data aggregator into the aggregated data using] the reduced size data received from each of the data collectors 

Chakraborty works in the setting of a “SwN that comprises of sensors” which communicate with “the base-station B” [Chakraborty, figure 1 caption]. Moreover, Jain in view of Daoud and Chakraborty discloses: 
by at least: identifying an edge value threshold ([Chakraborty, figure 2]: Chakraborty discloses “creat[ing] a similarity graph, G, of the sensors where there is an edge between a pair of sensors if the similarity is greater than the threshold” [Chakraborty, figure 2 caption]. This threshold maps to the “edge value threshold” of the claim. See [Chakraborty, section 4] for details.) based on the error limit for the data collected by the data collectors that is to be aggregated by the data aggregator, ([Jain, sections 1 and 3.2; Chakraborty, section 7.1]: Chakraborty discloses “vary[ing] the values of the threshold” and studies how varying the threshold affects the clustering of the graph [Chakraborty, section 7.1 first paragraph and table 1]. Each cluster in Jain shares a single prediction model, and the predictions made by the prediction model are compared against the prediction threshold δ (i.e., the “error limit” as mapped above) [Jain, section 1 paragraph beginning “This article” and section 3.2 first bullet point]. It would thus be obvious to a person of ordinary skill in the art, having both Jain and Chakraborty before them, to choose the “edge value threshold based on the error limit” as recited by the claim.) and [grouping nodes of the similarity graph into groupings of the data collectors] based on the edge value threshold, ([Chakraborty, figure 2; Jain, figure 1]: As noted above, Chakraborty discloses creating a similarity graph based on the edge value threshold. In the combination, this similarity graph constructed based on the edge value threshold is used to form the clusters which appear in Jain [Jain, figure 1].)
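For illustration only, the thresholded similarity graph described above (an edge between two sensors when their similarity exceeds the threshold) and a grouping derived from it can be sketched as follows. Grouping by connected components, the similarity values, and the sensor names are assumptions made for this sketch; the rule for deriving the threshold from the error limit is left as an input because it is not specified above.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def build_similarity_graph(nodes: List[str],
                           similarity: Dict[Tuple[str, str], float],
                           threshold: float) -> Dict[str, Set[str]]:
    """Add an edge between two sensors when their similarity exceeds the threshold."""
    graph: Dict[str, Set[str]] = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        score = similarity.get((a, b), similarity.get((b, a), 0.0))
        if score > threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def group_nodes(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group sensors by connected component (an assumed grouping rule)."""
    seen: Set[str] = set()
    groups: List[List[str]] = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(component))
    return groups


if __name__ == "__main__":
    nodes = ["s1", "s2", "s3", "s4"]
    sim = {("s1", "s2"): 0.9, ("s2", "s3"): 0.6, ("s3", "s4"): 0.95}
    for threshold in (0.8, 0.5):
        print(threshold, group_nodes(build_similarity_graph(nodes, sim, threshold)))
```

Raising the threshold removes edges and splits the sensors into more, smaller groups, which in the combination would correspond to more clusters and hence more prediction models to train.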

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the process for training and using data prediction models for sensor networks as disclosed by Jain in view of Daoud with clustering by means of a similarity graph as disclosed by Chakraborty, because Chakraborty’s “approach can yield significant battery life improvements within realistic error bounds” [Chakraborty, abstract], thereby resulting in a more efficient network. 

Jain in view of Daoud and Chakraborty might not distinctly disclose: 
A data processing system, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations for 
receiving, by the data aggregator, [a reduced size data from each of the data collectors,] 
and in response to receiving the reduced size data from each of the data collectors: [reconstructing, by the data aggregator, a representation of the data collected by the data collectors that is to be aggregated by the data aggregator into the aggregated data using] the reduced size data received from each of the data collectors 

Petousis is in the field of machine learning. In particular, it describes a “first computing entity” which “access[es] one or more streams of sensor data” and “generates predictions of predicted sensor values” [Petousis, 0019]. In the combination, the first computing entity of Petousis corresponds to a sensor in Jain, i.e., to one of the “data collectors” of the claim. Moreover, Jain in view of Daoud, Chakraborty, and Petousis discloses:
A data processing system, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations for ([Petousis, 0087]: Petousis discloses that the methods disclosed therein are implemented using a “computer-readable medium storing computer-readable instructions” where the “instructions are preferably executed by computer-executable components” (such as a “general or application specific processor”) and the “computer-readable medium can be stored on any suitable computer-readable media such as RAMs, ROMs, flash memory, EEPROMs, optical devices (CD or DVD), hard drives, floppy drives, or any suitable device” [Petousis, 0087]. The processor maps to the “processor” of the claim, and a RAM, for example, maps to the “memory coupled to the processor to store instructions” of the claim.)
receiving, by the data aggregator, [a reduced size data from each of the data collectors,] ([Petousis, 0019]: Petousis discloses that the first computing entity “comput[es] error values based on calculated differences between the actual sensor values and the predicted sensor values” and “transmit[s] the computed error values to a second computing entity” [Petousis, 0019]. In other words, the second computing entity corresponds to the CH of Jain, i.e., to the “data aggregator” of the claim, and the error values based on calculated differences correspond to the analogous differences in Jain, i.e., to the “reduced size data” of the claim as mapped above. In the combination, the method of Jain could be modified in view of Petousis so that, whenever an SN would transfer data to the CH, instead of transferring the reading itself, it would instead transfer a difference as described in Petousis.)
and in response to receiving the reduced size data from each of the data collectors: [reconstructing, by the data aggregator, a representation of the data collected by the data collectors that is to be aggregated by the data aggregator into the aggregated data using] the reduced size data received from each of the data collectors ([Petousis, 0019]: Petousis discloses that the “second computing entity” executes a “second instance of the trained machine learning model” in order to “reconstruct[…] estimates of the actual sensor values based on a reconstruction computation with the parallel predicted sensor values and the error values from the first computing entity” [Petousis, 0019]. As noted above, the “error values from the first computing entity” correspond to the “reduced size data” of the claim. Since the reconstruction is based on these error values received from the first computing entity, the reconstruction both “us[es] the reduced size data received from each of the data collectors” and occurs “in response to receiving the reduced size data” as recited by the claim.)
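For illustration only, the error-value scheme attributed to Petousis above can be sketched as follows: the first entity transmits only the difference between each actual value and the model's prediction, and the second entity adds each received error to its own parallel prediction. The last-value predictor and all names are assumptions made for this sketch, not the trained machine learning model of Petousis.

```python
from typing import List


def predict(previous: int) -> int:
    """Hypothetical shared model (assumption): predict the previous value again."""
    return previous


def encode_errors(actual: List[int], initial: int) -> List[int]:
    """First computing entity: error = actual value minus predicted value."""
    errors, previous = [], initial
    for value in actual:
        errors.append(value - predict(previous))
        previous = value
    return errors


def reconstruct(errors: List[int], initial: int) -> List[int]:
    """Second computing entity: estimate = parallel prediction plus received error."""
    estimates, previous = [], initial
    for error in errors:
        value = predict(previous) + error
        estimates.append(value)
        previous = value
    return estimates


if __name__ == "__main__":
    readings = [10, 12, 11, 15]
    errors = encode_errors(readings, initial=10)
    print(errors)                           # small residuals are sent instead of readings
    print(reconstruct(errors, initial=10))  # recovers the original readings
```

Integer readings are used so that the round trip is exact; with floating-point readings a tolerance would be needed when comparing the reconstruction against the original.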

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the wireless sensor network of Jain in view of Daoud and Chakraborty with the techniques for handling sensor data described in Petousis because they provide “intelligent compression techniques” for “minimiz[ing] bandwidth usage when transferring data streams” [Petousis, 0029], thereby resulting in a more efficient and effective system.

Claims 18-20 inherit limitations from claim 17 and recite additional limitations which are substantially similar to those recited by claims 2-4, respectively, so they are rejected by the same rationale. 

Claim(s) 11 is/are rejected under 35 USC 103 as being unpatentable over Jain in view of Daoud, Chakraborty, and Petousis, further in view of Ali ALAKEEL (A Guide to Dynamic Load Balancing in Distributed Computer Systems, published 2010-06; hereafter, “Alakeel”).

Claim 11
Jain in view of Daoud, Chakraborty, and Petousis discloses the element(s) of the parent claim(s). It does not distinctly disclose:  
[The method of claim 1, wherein the model training device is obtained by] transferring workloads hosted by the model training device to other devices until a quantity of free computing resources of the model training device exceeds the quantity of computing resources required to train the plurality of inference models. 

Alakeel is in the field of distributed computing. Moreover, Jain in view of Daoud, Chakraborty, Petousis, and Alakeel discloses:
[The method of claim 1, wherein the model training device is obtained by] transferring workloads hosted by the model training device to other devices until a quantity of free computing resources of the model training device exceeds the quantity of computing resources required to train the plurality of inference models. ([Alakeel, abstract, figure 2 and sections 2.1.2.2-3]: Alakeel discusses dynamic load balancing algorithms for “redistributing the work load among nodes of the distributed system” [Alakeel, abstract]. Whenever a node receives an “[i]ncoming job” [Alakeel, figure 2], it uses a combination of a transfer strategy and a location strategy to determine whether to execute the job locally or to send it to another node [Alakeel, figure 2 and sections 2.1.2.2-3]. The transfer strategy can use a variety of criteria to mark jobs as candidates for transfer (e.g., “based on their future resource requirements” or based on “estimat[ing] a job’s execution time in the near future” [Alakeel, section 2.1.2.2]). The location strategy can use a variety of strategies to choose a node to which to transfer a task (e.g., “select[ing] a remote node randomly and transfers the job there for execution”, this process iterating until “a limit on the number of hops” is reached so as to “avoid having this job ponged among nodes without getting serviced” [Alakeel, section 2.1.2.3]). In the combination, the scheduling algorithm of Daoud is augmented with a load balancing algorithm as in Alakeel.)
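For illustration only, the claimed step of transferring workloads until the free computing resources of the device exceed the required quantity can be sketched as below. The largest-first transfer order, the resource units, and all names are assumptions made for this sketch; Alakeel's transfer and location strategies are not modeled.

```python
from typing import Dict, List, Tuple


def free_up_device(capacity: float, workloads: Dict[str, float],
                   required: float) -> Tuple[Dict[str, float], List[str]]:
    """Move workloads off a device until its free capacity exceeds `required`.

    Returns the workloads kept on the device and the names transferred away.
    If every workload is transferred and the device is still too small,
    the caller must check the remaining free capacity itself.
    """
    kept = dict(workloads)
    transferred: List[str] = []
    # Assumption: transfer the largest workloads first.
    for name in sorted(workloads, key=workloads.get, reverse=True):
        if capacity - sum(kept.values()) > required:
            break
        transferred.append(name)
        del kept[name]
    return kept, transferred


if __name__ == "__main__":
    kept, moved = free_up_device(capacity=16, workloads={"a": 6, "b": 5, "c": 3}, required=8)
    print(kept, moved)
```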

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the method for training and using data prediction models for sensor networks as disclosed by Jain in view of Daoud, Chakraborty, and Petousis with a load balancing algorithm as in Alakeel because load balancing “creates faster job service” [Alakeel, section 1], thereby resulting in a more efficient system. 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Shishir AGRAWAL whose telephone number is +1 703-756-1183. The examiner can normally be reached Monday through Thursday, 08:30-14:30 Pacific Time.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey SHMATOV can be reached on +1 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is +1 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at +1 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call +1 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.A./Examiner, Art Unit 2123                                                                                                                                                                                      
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123