Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 07/28/2025 has been entered.
Response to Arguments
Applicant’s arguments filed 07/28/2025 have been fully considered but they are not persuasive.
Applicant’s Argument: On pages 14 and 15 of Applicant’s response, applicant states “Applicant’s claim is directed to an improvement in computer functionality itself—specifically, the training efficiency and predictive accuracy of machine-learning models in distributed federated learning environments.
Applicant submits that claim 1 is integrated into a practical application. Claim 1 is not merely using a computer as a tool to perform abstract calculations. Instead, claim 1 operates within a specific computing architecture (a federated learning system), applies influence-based filtering to optimize data flow and model training, and results in a reduced training dataset that improves model accuracy and efficiency.”
Examiner’s Response: Applicant’s argument is not persuasive. The amended claims do not overcome the determination that the claim recites an abstract idea under subject matter eligibility analysis in step 2A prong 1. The claim as a whole still recites an abstract idea of a mental process that can be performed in the human mind. The process of determining a score to rank a plurality of sources and removing the source with the lowest score may be similar to a supermarket determining where to source its supplies, where the determination may be based on the reputation of each third-party supplier. A supermarket may want to continue to conduct business with suppliers that have a high reputation, and may also decide to reduce its communication with, and sourcing from, suppliers that have low reputation scores. Therefore, the claimed invention as a whole is directed to an abstract idea and is subject-matter ineligible.
McRO, Inc. v. Bandai Namco Games Am. Inc., 837 F.3d 1299 (Fed. Cir. 2016) recited claims that are directed to clear improvements to computer-related technology. The claimed invention is directed to a process of selecting training data based on a score and it is directed to an abstract idea. Therefore, the claimed invention is not analogous to the claims in McRO, Inc. v. Bandai Namco Games Am. Inc., 837 F.3d 1299 (Fed. Cir. 2016).
Applicant’s Argument: On pages 17 and 18 of Applicant’s response, applicant states “Applicant submits that Mars’ “seed samples” fails to disclose the claimed “validation dataset having a plurality of annotated datapoints configured to test an accuracy of the machine-learning model,” as recited in amended claim 1.”
Examiner’s Response: Applicant’s argument is not persuasive. Applicant’s arguments with respect to claim 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
The Li reference (pg. 8, col. 2, par. 4) has been included to teach a ground truth dataset that is used to evaluate the model provided in the framework.
Applicant’s Argument: On page 18 of Applicant’s response, applicant states “Mars also fails to disclose the claimed, “wherein a respective influential score of a data source indicates an influence of the data source on the machine-learning model accurately predicting each annotated datapoint of the plurality of annotated datapoints of the validation dataset,” as recited in amended independent claim 1. (Emphasis added). At best, Mars discloses “level of quality preferably relates to an accuracy of labels generated by the external training data source for each labeled training data sample provided thereby.””
Examiner’s Response: Applicant’s argument is not persuasive. Applicant’s arguments with respect to claim 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Li (pg. 3, col. 2, par. 2-5) additionally teaches determining a source reliability metric for a plurality of data sources and the source reliability metric determines how accurate the information from a source is compared to the ground truth data.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-9 and 11-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1:
Subject Matter Eligibility Analysis Step 1:
Claim 1 recites “A method, comprising” and is thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
“testing,” (a mental process that can be performed in the human mind with the aid of pen and paper)
“based on the influential score of each of the plurality of data sources, removing, ” (a mental process, i.e. judgement; selecting a data source with low influential score to be removed)
Claim 1 therefore recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
"training, at a central server of a federated learning system, a machine-learning model utilizing a training dataset including data received from a plurality of data sources connected to the central server, ” (mere instructions to apply the exception using a generic computer component - see MPEP 2106.05(f))
"” (merely specifies a particular technological environment in which the abstract idea is to take place, i.e. a field of use, and thus neither integrates the abstract idea into a practical application nor provides significantly more than the abstract idea itself - see MPEP 2106.05(h))
“testing, at the central server, the accuracy of the machine-learning model against the validation dataset ” (mere instructions to apply the exception using a generic computer component - see MPEP 2106.05(f))
“retraining, at the central server, the machine-learning model utilizing a reduced training dataset including the data from the subset of the plurality of data sources, wherein the reduced training dataset improves the accuracy of the machine-learning model in predicting the plurality of annotated datapoints of the validation dataset, while utilizing less training data” (mere instructions to apply the exception using a generic computer component - see MPEP 2106.05(f))
The additional elements as disclosed above, alone or in combination, do not integrate the judicial exception into a practical application, as they are merely insignificant extra-solution activity in combination with generic computer functions implemented with generic computer elements at a high level of generality to perform the abstract idea disclosed above. Therefore, Claim 1 is directed to the abstract idea.
Subject Matter Eligibility Analysis Step 2B:
"training, at a central server of a federated learning system, a machine-learning model utilizing a training dataset including data received from a plurality of data sources connected to the central server, ” (mere instructions to apply the exception using a generic computer component - see MPEP 2106.05(f))
"” (merely specifies a particular technological environment in which the abstract idea is to take place, i.e. a field of use, and thus neither integrates the abstract idea into a practical application nor provides significantly more than the abstract idea itself - see MPEP 2106.05(h))
“testing, at the central server, the accuracy of the machine-learning model against the validation dataset ” (mere instructions to apply the exception using a generic computer component - see MPEP 2106.05(f))
“retraining, at the central server, the machine-learning model utilizing a reduced training dataset including the data from the subset of the plurality of data sources, wherein the reduced training dataset improves the accuracy of the machine-learning model in predicting the plurality of annotated datapoints of the validation dataset, while utilizing less training data” (mere instructions to apply the exception using a generic computer component - see MPEP 2106.05(f))
The additional elements as disclosed above, alone or in combination, do not recite significantly more than the abstract idea itself, as they are merely insignificant extra-solution activity in combination with generic computer functions implemented with generic computer elements at a high level of generality to perform the abstract idea disclosed above. Therefore, Claim 1 is subject-matter ineligible.
Regarding Claim 12:
The claim recites a system (“A computer program product, comprising”) that performs the method as described in claim 1. Therefore, claim 12 is rejected for the same reasons as disclosed for claim 1. The limitations for additional elements of claim 12 are analyzed below.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Please see Step 2A Prong 1 analysis of claim 1
Subject Matter Eligibility Analysis Step 2A Prong 2 & 2B:
“a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code executable by a processor to perform operations comprising” (mere instructions to apply the exception using a generic computer component - see MPEP 2106.05(f))
Regarding Claims 2 and 13:
Subject Matter Eligibility Analysis Step 2A Prong 1:
“wherein the testing comprises evaluating the respective data of the data source against each annotated datapoint within the validation dataset” (a mental process, i.e. evaluation)
Subject Matter Eligibility Analysis Step 2A Prong 2 & 2B: None
Regarding Claims 3 and 14:
Subject Matter Eligibility Analysis Step 2A Prong 1:
“wherein the testing comprises generating rankings of the plurality of data sources for each annotated datapoint” (a mental process, i.e. judgement)
Subject Matter Eligibility Analysis Step 2A Prong 2 & 2B: None
Regarding Claims 4 and 15:
Subject Matter Eligibility Analysis Step 2A Prong 1:
“wherein the testing comprises aggregating the rankings of the plurality of data sources across the plurality of annotated datapoints of the validation dataset” (a mental process, i.e. judgement)
Subject Matter Eligibility Analysis Step 2A Prong 2 & 2B: None
Regarding Claims 5 and 16:
Subject Matter Eligibility Analysis Step 2A Prong 1:
“wherein the testing comprises computing the accuracy of the machine-learning model trained with the data source in predicting the plurality of annotated datapoints utilizing an influence function comprising a loss function component, a metrics of the data source, and a gradient of the loss function component, wherein the loss function component and the gradient of the loss function component is computed at the central server and wherein a result of the metrics of the data source is provided to the central server from a corresponding data source” (a mathematical calculation, see par. 57 in the Specification)
Subject Matter Eligibility Analysis Step 2A Prong 2 & 2B: None
Regarding Claims 6 and 17:
Subject Matter Eligibility Analysis Step 2A Prong 1:
“further comprising constructing a bipartite graph comprising the plurality of annotated datapoints and the plurality of data sources by generating weighted edges between annotated datapoints and data sources based upon the influence of the data source” (a mental process with the aid of pen and paper, i.e. evaluation)
Subject Matter Eligibility Analysis Step 2A Prong 2 & 2B: None
Regarding Claims 7 and 18:
Subject Matter Eligibility Analysis Step 2A Prong 1:
“further comprising selecting the subset of the plurality of data sources based upon a coverage budget provided by a user, wherein the coverage budget is selected from the group consisting of a first constraint on a number of the plurality of data sources and a second constraint on a number of the plurality of annotated datapoints” (a mental process, i.e. judgement)
Subject Matter Eligibility Analysis Step 2A Prong 2 & 2B: None
Regarding Claims 8 and 19:
Subject Matter Eligibility Analysis Step 2A Prong 1:
“further comprising selecting a least number of data sources that covers the validation dataset” (a mental process, i.e. judgement)
Subject Matter Eligibility Analysis Step 2A Prong 2 & 2B: None
Regarding Claims 9 and 20:
Subject Matter Eligibility Analysis Step 2A Prong 1: None
Subject Matter Eligibility Analysis Step 2A Prong 2 & 2B:
“wherein the respective data comprises a derivative of local data stored at the data source to preserve a privacy of the local data stored at the data source” (merely specifies a particular technological environment in which the abstract idea is to take place, i.e. a field of use, and thus neither integrates the abstract idea into a practical application nor provides significantly more than the abstract idea itself - see MPEP 2106.05(h))
Regarding Claim 11:
The claim recites a system (“An apparatus, comprising”) that performs the method as described in claim 1. Therefore, claim 11 is rejected for the same reasons as disclosed for claim 1. The limitations for additional elements of claim 11 are analyzed below.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Please see Step 2A Prong 1 analysis of claim 1
Subject Matter Eligibility Analysis Step 2A Prong 2 & 2B:
“at least one processor” (mere instructions to apply the exception using a generic computer component - see MPEP 2106.05(f))
“a computer readable storage medium having computer readable program code embodied therewith and executable by the at least one processor to cause the at least one processor to perform operations comprising” (mere instructions to apply the exception using a generic computer component - see MPEP 2106.05(f))
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 7, 11-15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Mars (US10296848B1) in view of Li, “Resolving Conflicts in Heterogeneous Data by Truth Discovery and Source Reliability Estimation”.
Regarding claim 1, Mars teaches:
“A method, comprising: training, at a central server of a federated learning system, a machine-learning model utilizing a training dataset including data received from a plurality of data sources connected to the central server, wherein the central server includes a validation dataset having a plurality of annotated datapoints ” ([col. 2, lines 55-61; col. 5, lines 30-53; col. 10, lines 6-19; col. 11, lines 8-26; col. 13, lines 32-62, Figure 2], The ML management console (central server) collects training data from a plurality of external sources. The ML management console comprises a collection of seed samples (validation dataset) that can be used as input data to retrieve a plurality of labeled training samples from external data sources. After the console collects and processes the ML training data, the training data is deployed to a ML model for execution. The slot identification engine may generate slot labels for the user input queries and may assign multiple labels to the user input. The seed samples are a type of user input that is received by the model, and the user query may have a plurality of slot labels (plurality of annotated datapoints) assigned to the input.)
“testing, at the central server, the accuracy of the machine-learning model ” ([col. 9, lines 39-45; col. 11, lines 50-53; col. 12, lines 36-58; col. 13, lines 53-67; col. 14, lines 1-5], The training data may have metadata that identifies which external data source it originated from. The fit score (influential score) for each of the training data samples generally represents how well a given training data sample fits the machine learning model or one or more of the seed training data samples. The system may test the performance of the model and measure one or more operational metrics of the model. When the training data sample is poor or bad, the system may re-evaluate the training data sample set by calculating the fit score for the training data. The model may be a classification model that outputs labels for the input data.)
“based on the influential score of each of the plurality of data sources, removing, at the central server, at least one data source of the plurality of data sources from contributing to the training dataset, wherein the at least one data source includes less influence, relative to a subset of the plurality of data sources, on the machine-learning model accurately predicting each annotated datapoint ” ([col. 12, lines 4-58; col. 13, lines 5-31], A threshold may be set for each of the external training data sources to limit the amount of data collected from the data source. After the fit score has been determined for the training data samples, the system may remove training data samples that have a fit score below a certain threshold. A training data source that has a low level of quality may be requested to stop transmitting machine learning training data, in contrast to training data sources that have a higher level of quality.)
“retraining, at the central server, the machine-learning model utilizing a reduced training dataset including the data from the subset of the plurality of data sources, wherein the reduced training dataset improves the accuracy of the machine-learning model in predicting the plurality of annotated datapoints ” ([col. 9, lines 39-45; col. 13, lines 32-67; col. 14, lines 1-5], After a subset of the data is determined from the external data sources, the system combines all the data into a training set and loads the training data samples into the ML model for execution. The system continuously receives data samples to update and execute the ML model. The system tests the performance of the machine learning model based on one or more operational metrics to measure the improvement with the updated training data samples. The model may be a classification model that outputs labels for the input data.)
Mars does not explicitly disclose an implementation of “a validation dataset having a plurality of annotated datapoints configured to test an accuracy of the machine-learning model”, and “testing ... the accuracy of the machine-learning model against the validation dataset”. However, Li discloses in the same field of endeavor:
“training, at a central server of a federated learning system, a machine-learning model utilizing a training dataset including data received from a plurality of data sources connected to the central server, wherein the central server includes a validation dataset having a plurality of annotated datapoints configured to test an accuracy of the machine-learning model” ([pg. 2, col. 1, par. 3; pg. 3, Section 2.2, par. 1-5; pg. 8, Section 3.2.1, par. 2-5; pg. 3, Section 2.1, Table 1 & 2; pg. 4, Section 2.2, Algorithm 1], The framework describes an optimization method for determining source reliability for a plurality of data sources. Weather forecast data is collected from 3 different sources and is validated against the ground truth dataset (validation dataset). The CRH framework is a machine learning model that consists of an optimization process to minimize an objective function. Algorithm 1 shows the framework performing multiple iterations to train the model until a convergence criterion (accuracy of the machine learning model) is satisfied.)
“testing, at the central server, the accuracy of the machine-learning model against the validation dataset to determine an influential score for each of the plurality of data sources based upon respective data from each of the plurality of data sources included in the training dataset, wherein a respective influential score of a data source indicates an influence of the data source on the machine-learning model accurately ” ([pg. 2, col. 1, par. 3; pg. 3, Section 2.2, par. 1-5; pg. 8, Section 3.2.1, par. 2-5; pg. 3, Section 2.1, Table 1 & 2], The framework consists of minimizing an objective function that contains weights of multi-source input that reflects the reliability degree (influential score) of the sources. The source reliability is validated against a ground truth dataset. The model consists of a loss function that measures the deviation of the source data from the truth. The model performs an optimization process based on a convergence criterion to minimize the weighted deviation from the truths to the multi-source input.)
“... wherein the at least one data source includes less influence, relative to a subset of the plurality of data sources, on the machine-learning model accurately ” ([pg. 2, col. 1, par. 3; pg. 3, Section 2.2, par. 1-5; pg. 8, Section 3.2.1, par. 2-5; pg. 3, Section 2.1, Table 1 & 2], Observations made by an unreliable source will be determined to have a lower source weight. A reliable source consists of a high source weight value and provides closer observations to the ground truth. The model determines the deviation between the truth and the observations.)
“... training dataset improves the accuracy of the machine-learning model in ” ([pg. 2, col. 1, par. 3; pg. 3, Section 2.2, par. 1-5; pg. 8, Section 3.2.1, par. 2-5; pg. 3, Section 2.1, Table 1 & 2], Observations made by an unreliable source will be determined to have a lower source weight. A reliable source consists of a high source weight value and provides closer observations to the ground truth. The model determines the deviation between the truth and the observations.)
It would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of “a validation dataset having a plurality of annotated datapoints configured to test an accuracy of the machine-learning model”, and “testing ... the accuracy of the machine-learning model against the validation dataset” from Li into the teaching of Mars. The source reliability determination from Li can be implemented into Mars to generate a score for each data source and rank the data sources based on the computed score. Doing so can train a ML model to determine an accurate estimation of source reliability by comparing observations to true information (Li, abstract).
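For illustration only, the combination described above (a per-source reliability score computed against ground truth, followed by ranking and removal of the lowest-scoring source) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of Mars, Li, or the claims: the function names, the squared-deviation loss, and the negative-log normalization of the weights are all hypothetical choices made for the example.

```python
import math

def source_weights(observations, truth):
    """Score each source by how closely its values track the ground truth.

    observations: dict mapping a source name to a list of numeric values.
    truth: list of ground-truth values, aligned with each source's list.
    Assumption: squared deviation as the loss and a negative-log
    normalized weight; both are illustrative choices only.
    """
    losses = {
        src: sum((o - t) ** 2 for o, t in zip(vals, truth))
        for src, vals in observations.items()
    }
    total = sum(losses.values())
    if total == 0:
        # Every source matches the truth exactly; treat them as equally reliable.
        return {src: 0.0 for src in losses}
    # A small floor avoids log(0) for a source with zero deviation.
    return {src: -math.log(max(loss, 1e-12) / total) for src, loss in losses.items()}

def drop_least_reliable(weights):
    """Return (kept sources, removed source), removing the lowest-weight source."""
    worst = min(weights, key=weights.get)
    kept = [src for src in weights if src != worst]
    return kept, worst

# Hypothetical data: three sources reporting two values each.
observations = {"a": [1.0, 2.0], "b": [1.5, 2.5], "c": [3.0, 5.0]}
truth = [1.0, 2.0]
weights = source_weights(observations, truth)
kept, removed = drop_least_reliable(weights)
```

In this hypothetical data, source "c" deviates most from the ground truth, so it receives the lowest weight and is the one removed.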
Regarding claim 12, Mars teaches:
Claim 12 recites a system (“A computer program product, comprising”) that performs the same process as described in Claim 1. Therefore claim 12 is rejected for the same reasons mentioned for claim 1. However, claim 12 has additional limitations and the claim elements are addressed below:
“a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code executable by a processor to perform operations comprising” ([col. 14, lines 6-20], A processor executes instructions (program code) from a computer-readable medium to perform the methods described by the reference.)
Regarding claims 2 and 13, Mars teaches:
“wherein the testing comprises evaluating the respective data of the data source against each annotated datapoint within the validation dataset” ([col. 12, lines 36-58], The system processes the training data to determine a fit score (influential score) to rank each of the training data samples from external sources. The fit score describes how well a training data sample matches the seed samples (validation set) of a training request and the overall data quality in representing the ML model task.)
Regarding claims 3 and 14, Mars teaches:
“wherein the testing comprises generating rankings of the plurality of data sources for each annotated datapoint” ([col. 11, lines 35-44; col. 12, lines 36-58], The system uses the calculated fit score to generate a ranking for each of the data samples that come from the plurality of data sources. The ranking represents how valuable a particular data source is for training the ML model. The training data samples from various external data sources can be stored into distinct datastores for specific processing of the set of data.)
Regarding claims 4 and 15, Mars teaches:
“wherein the testing comprises aggregating the rankings of the plurality of data sources across the plurality of annotated datapoints of the validation dataset” ([col. 11, lines 35-44; col. 13, lines 1-4], The training data samples from various external data sources can be stored into distinct datastores for specific processing of the set of data. The fit score may be calculated for a set of training samples that originated from the same data source. The system lists (aggregates) the ranks based on the fit score in descending or ascending order.)
Regarding claims 7 and 18, Mars teaches:
“further comprising selecting the subset of the plurality of data sources based upon a coverage budget provided by a user, wherein the coverage budget is selected from a group consisting of a first constraint on a number of the plurality of data sources and a second constraint on a number of the plurality of annotated datapoints” ([col. 10, lines 17-27; col. 12, lines 4-35], A user interface is provided where the administrator (user) is able to provide a specific number (first constraint) of external training data sources from a list of external data sources. It is implied that the user may have prior knowledge to gather data from known data sources that can highly benefit the ML model training and to limit the data gathered from a selected list of external data sources. In the user interface, the administrator may also provide an input value to define the threshold for limiting the number of training data samples (second constraint) received from the external data sources.)
Regarding claim 11, Mars teaches:
Claim 11 recites a system (“An apparatus, comprising”) that performs the same process as described in Claim 1. Therefore claim 11 is rejected for the same reasons mentioned for claim 1. However, claim 11 has additional limitations and the claim elements are addressed below:
“at least one processor” ([col. 14, lines 6-20], A processor executes instructions (program code) from a computer-readable medium to perform the methods described by the reference.)
“a computer readable storage medium having computer readable program code embodied therewith and executable by the at least one processor to cause the at least one processor to perform operations comprising” ([col. 14, lines 6-20], A processor executes instructions (program code) from a computer-readable medium to perform the methods described by the reference.)
Claims 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Mars (US10296848B1) in view of Li, “Resolving Conflicts in Heterogeneous Data by Truth Discovery and Source Reliability Estimation” and Hall (US20230162049A1).
Regarding claims 5 and 16, Mars in view of Li teaches:
“wherein the testing comprises computing the accuracy of the machine-learning model trained with the data source in predicting the plurality of annotated datapoints utilizing an influence function comprising a loss function component, a metrics of the data source ” (Mars teaches that the system (central server) may compute an operational metric (accuracy) of the ML model in making accurate predictions or classifying labels accurately. The system retrieves training data samples from external data sources. The system validates the performance of the ML model based on the training data samples by determining and comparing operational metrics to determine the reliability of the external data sources. The computation of the operational metrics is performed in a central server system after receiving the training data samples from external data sources. Li further teaches a model that determines the reliability of a source in providing information that is related to the truth and uses a loss function that measures the deviation between the truth and the observation.)
Mars in view of Li does not explicitly disclose an implementation of a computing function “a gradient of the loss function component”. However, Hall discloses in the same field of endeavor:
“wherein the computing comprises computing an accuracy of the data source in predicting the annotations utilizing an influence function comprising a loss function component, a metrics of the data source component, and a gradient of the loss function component, wherein the loss function component and the gradient of the loss function component is computed at the central server and wherein a result of the metrics of the data source component is provided to the central server from a corresponding data source” ([0169, 0185-0187], The training data for the ML model may come from a variety of data sources. In one embodiment, data may be collected at a central server to perform the method of determining the predictive power of the dataset from a particular data source. A balanced accuracy metric or a log loss function may be used to evaluate the dataset for each data source to determine the accuracy and data quality of a dataset.)
It would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of a computing function “a gradient of the loss function component” from Hall into the teaching of Mars in view of Li. Doing so can train a global ML model with high quality data and remove the noisy data in the dataset using metrics to determine which data are consistently providing incorrect predictions (Hall, abstract).
Claims 6 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Mars (US10296848B1) in view of Li, “Resolving Conflicts in Heterogeneous Data by Truth Discovery and Source Reliability Estimation” and Rounthwaite (US20100325133A1).
Regarding claims 6 and 17, Mars in view of Li teaches:
“further comprising ” (A fit score assesses the training data samples and each data source may be evaluated based on a level of quality score. The scores describe the data source and training data that can improve the machine learning model training.)
Mars in view of Li does not explicitly disclose an implementation of representing the data quality as a bipartite graph. However, Rounthwaite discloses in the same field of endeavor:
“wherein the selecting comprises constructing a bipartite graph comprising the plurality of ” ([0021], A bipartite graph is generated to show a visual representation of the relationship between two types of entities. A weighted edge defines how relevant a first set of nodes map to a second set of nodes.)
It would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of representing the data quality as a bipartite graph from Rounthwaite into the teaching of Mars in view of Li. Doing so can use a visual graphical representation to define high correlations between two different sets of data (Rounthwaite, par. 21).
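For illustration only, a bipartite graph with weighted edges between two kinds of entities (here, data sources on one side and annotated datapoints on the other) can be represented with two adjacency maps. This is a minimal sketch under stated assumptions and is not taken from Rounthwaite, Mars, Li, or the specification: the function name, the input layout, and the `min_weight` cutoff are hypothetical.

```python
from collections import defaultdict

def build_bipartite(influence, min_weight=0.0):
    """Build a weighted bipartite graph as two adjacency maps.

    influence: dict mapping (source, datapoint) pairs to a numeric weight.
    min_weight: hypothetical cutoff; pairs at or below it get no edge.
    Returns (by_source, by_point), where by_source[s][p] and by_point[p][s]
    both hold the weight of the edge between source s and datapoint p.
    """
    by_source = defaultdict(dict)
    by_point = defaultdict(dict)
    for (src, pt), weight in influence.items():
        if weight > min_weight:
            by_source[src][pt] = weight
            by_point[pt][src] = weight
    return dict(by_source), dict(by_point)

# Hypothetical weights: two sources, two annotated datapoints.
influence = {
    ("s1", "d1"): 0.9,
    ("s1", "d2"): 0.0,  # no edge: weight does not exceed the cutoff
    ("s2", "d2"): 0.4,
}
by_source, by_point = build_bipartite(influence)
```

Edges only run between the two node sets, never within one set, which is what makes the graph bipartite.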
Claims 8 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Mars (US10296848B1) in view of Li, “Resolving Conflicts in Heterogeneous Data by Truth Discovery and Source Reliability Estimation” and McDonald (US20170212241A1).
Regarding claims 8 and 19, Mars in view of Li teaches:
“further comprising selecting Mars teaches that the training data seed samples (validation dataset) are a set of data for which a user would like to obtain more, similar training data from a plurality of external data sources.)
Mars in view of Li does not explicitly disclose an implementation of selecting the least number of data sources.
However, McDonald discloses in the same field of endeavor:
“wherein the selecting comprises selecting the least number of data sources that covers the The system evaluates the data from multiple data sources and selects a data source that meets a certain criterion, such as a data quality metric.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of selecting the least number of data sources from McDonald into the teaching of Mars in view of Li. Doing so would allow selection of the data with the lowest uncertainty value to obtain the best results (Thompson, par. 33).
Claims 9 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Mars (US10296848B1) in view of Li, “Resolving Conflicts in Heterogeneous Data by Truth Discovery and Source Reliability Estimation” and Choudhury (US20210150269A1).
Regarding claims 9 and 20:
Mars in view of Li does not explicitly disclose an implementation of “the respective data comprises a derivative of local data stored at the data source to preserve a privacy of the local data stored at the data source”. However, Choudhury discloses in the same field of endeavor:
“wherein the respective data comprises a derivative of local data stored at the data source to preserve a privacy of the local data stored at the data source” ([0039-0040], The training data at each local data source have certain attributes filtered out prior to training the federated learning model. Attributes like direct identifiers are not used to train the model.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of “the respective data comprises a derivative of local data stored at the data source to preserve a privacy of the local data stored at the data source” from Choudhury into the teaching of Mars in view of Li. Doing so would protect sensitive information from being leaked by a data source during training of a global federated learning model (Choudhury, abstract).
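For illustration only, the filtering described in the cited passage (removing attributes such as direct identifiers at each local data source before the data is used for training) may be sketched as follows. This is a minimal sketch and is not the implementation of Choudhury. The attribute names and the record layout are hypothetical.

```python
# Hypothetical set of direct-identifier attributes (assumption).
DIRECT_IDENTIFIERS = frozenset({"name", "email", "phone"})


def strip_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without any direct-identifier attributes."""
    return [
        {key: value for key, value in record.items() if key not in identifiers}
        for record in records
    ]


# Hypothetical local data held at one data source.
local_records = [
    {"name": "A. Example", "email": "a@example.com", "age": 41, "label": 1},
    {"name": "B. Example", "phone": "555-0100", "age": 29, "label": 0},
]

# Derivative of the local data that would be used for training.
derived_records = strip_direct_identifiers(local_records)
```

The original records are left unmodified at the data source; only the derived copies would be used in training.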
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GARY MAC whose telephone number is (703)756-1517. The examiner can normally be reached Monday - Friday 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar, can be reached at (571) 270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/GARY MAC/Examiner, Art Unit 2127
/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127