Last updated: May 29, 2026
Application No. 17/696,593
MODEL TRAINING METHOD, APPARATUS, AND SYSTEM

Status: Non-Final OA (§101, §103)
Filed: Mar 16, 2022
Priority: Sep 17, 2019, CN 201910878280.9 (+1 more)
Examiner: DIEP, DUY T
Art Unit: 2123
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Huawei Technologies Co., Ltd.
OA Round: 2 (Non-Final)
Interview: Optional

Interview lift (+6.7%) is below the 15.0% threshold. A written response is recommended.
Based on 24 resolved cases, 2023–2026
Examiner Intelligence

Examiner: DIEP, DUY T
Career Allowance Rate: 29% (7 granted / 24 resolved), -25.8% vs TC avg
Interview Lift: +6.7% (moderate), based on resolved cases with interview
Avg Prosecution: 4y 3m (typical timeline)
Currently pending: 17
Statute-Specific Performance

§101: 2.0% (-38.0% vs TC avg)
§103: 98.0% (+58.0% vs TC avg)
Based on career data from 24 resolved cases.
Office Action

§101 §103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The amendments filed 09/02/2025 have been entered. Claims 1-20 remain pending in the application.
Applicant’s amendments and arguments, with respect to the rejections of claims 1, 2, 3, 8, 10, 11, 19, and 20 under 35 U.S.C. 101, filed 06/02/2025, have been considered and are not persuasive. Therefore, the rejections set forth in the previous Office Action are maintained.
The applicant argues that the claims are not directed to abstract ideas of a mental process, but are directed to specific technological solutions for distributed machine learning systems. As amended, claim 1 recites a concrete distributed architecture with a "first analysis device" performing offline training and sending models to "a plurality of local analysis devices" that perform incremental training. This represents a specific technological improvement to computer functionality, not an abstract concept. The claims are analogous to patent-eligible claims involving network communications and distributed computing systems that have been found patent-eligible. The distributed architecture requires specific technological components working together in a defined manner to achieve improved machine learning functionality. 
However, even if the claims could be construed as involving abstract concepts, they are integrated into practical applications. The claims require specific hardware components (processors, non-transitory memory), define particular network architectures (local networks, distributed analysis devices), and specify concrete training processes (offline training by a central device, incremental training based on local sample sets by distributed devices). The distributed architecture necessarily improves computer functionality by enabling machine learning across multiple devices. The claims solve the technological problem of how to train machine learning models while allowing them to be adapted to a requirement of the local analysis device.
The claims contain significantly more than any alleged abstract idea because they require a specific distributed computing architecture with defined roles for different types of analysis devices. The "first analysis device" must perform offline training on historical data and distribute the resulting model, while "local analysis devices" must receive the model and perform incremental training using local network data. This arrangement goes well beyond merely applying an abstract idea on a computer, as it requires specific technological components, network communications, and coordinated processing across multiple devices. The claims improve the functioning of computer systems by enabling distributed machine learning. 
The combination of offline training by a first device followed by distributed incremental training by local devices provides technological benefits including adaptability of the model to requirements of the local analysis devices. These concrete technological improvements constitute patent-eligible technological advances that transform the nature of the computing system and represent significantly more than any alleged abstract idea.
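For illustration only, the arrangement described in the applicant's argument (offline training by a first analysis device, distribution of the model to local analysis devices, and incremental training on local data) can be sketched in Python as follows. The `CentroidModel` class, the function names, the device names, and the sample data are all hypothetical; nothing here is taken from the application, which does not specify a model type in the text quoted above.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class CentroidModel:
    """Toy classifier (hypothetical stand-in): one running-mean centroid per label."""
    sums: dict = field(default_factory=dict)    # label -> per-feature sums
    counts: dict = field(default_factory=dict)  # label -> number of samples seen

    def fit_increment(self, samples):
        """Fold (features, label) pairs into the centroids without retraining from scratch."""
        for features, label in samples:
            acc = self.sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
            self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, features):
        """Return the label whose centroid is nearest (squared Euclidean distance)."""
        def dist(label):
            n = self.counts[label]
            return sum((f - s / n) ** 2 for f, s in zip(features, self.sums[label]))
        return min(self.counts, key=dist)


def offline_train(historical_samples):
    """First analysis device: train once on the historical training sample set."""
    model = CentroidModel()
    model.fit_increment(historical_samples)
    return model


def distribute(model, device_names):
    """Send an independent copy of the model to each local analysis device."""
    return {name: copy.deepcopy(model) for name in device_names}


# Assumed example data: one feature, two classes.
base = offline_train([([0.0], "normal"), ([10.0], "abnormal")])
local_models = distribute(base, ["site_a", "site_b"])

# Only site_a sees local-network samples and trains incrementally on them.
local_models["site_a"].fit_increment([([6.0], "normal")] * 3)

print(base.predict([6.0]))                    # abnormal
print(local_models["site_a"].predict([6.0]))  # normal
print(local_models["site_b"].predict([6.0]))  # abnormal
```

In this sketch each local copy diverges from the distributed model only through its own local samples, which is the adaptation to local requirements that the argument relies on.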
The examiner respectfully disagrees. As stated in the previous Office Action, claim 1 is rejected in combination with dependent claim 2: if dependent claim 2 were rewritten in independent form to include the limitations of claim 1, the rejection under 35 U.S.C. 101 would apply to the combination. Claim 2 recites the abstract idea of a mental process, namely “predicting a classification result ...” and “... evaluates, based on the prediction information, whether the machine learning model is degraded”. While claim 1 does not itself recite the abstract idea, claim 1 still recites an additional element that is a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), namely the amended limitation “by a first analysis device: performing offline training based on a historical training sample set to obtain a machine learning model”. This limitation simply recites the application of a conventional machine learning procedure, that is, training a machine learning model using a set of training data, without providing any improvement in the structure of the machine learning model, an improved learning algorithm, or an improvement to a computer element. Claim 1 also recites insignificant extra-solution activity in the form of the well-known technique of data transmission as identified in MPEP 2106.05(g), which is also well-understood, routine, conventional activity as identified in MPEP 2106.05(d)(II)(i), namely the amended limitations “sending the machine learning model to a plurality of local analysis device” and “by each of the local analysis device: receiving a machine learning model sent by the first analysis device”. These limitations simply recite that information is transmitted from one device to another, without providing any improvement to the machine learning algorithm, a specific machine learning architecture, or a computer element.
Although the applicant asserts that the amended claims are directed to specific technological solutions for distributed machine learning systems and improve computer functionality, the claims as presently drafted fail to recite any particular manner in which the alleged improvements are achieved. Instead, the claims merely describe the concept of distributing known machine learning operations (training, sending, predicting and evaluating) across generic computing devices, which constitutes an abstract idea implemented using routine computer components. The claim language describes only a generic allocation of machine learning functions between devices without specifying any technical implementation details (such as communication protocols or hardware-based optimization) that result in an improvement to the functioning of the computer itself. The mere involvement of multiple devices performing known machine learning tasks does not transform the nature of the abstract idea, which is predicting and evaluating. No evidence is provided that the claimed arrangement improves computer operation (e.g., speed, memory usage, accuracy) in a manner beyond what conventional distributed computing inherently provides.
Furthermore, the arrangement of devices on which the applicant relies, namely the first analysis device and the distributed local devices that perform incremental training, does not amount to a specific technological implementation. The claimed arrangement merely distributes well-understood machine learning tasks among conventional computing components and lacks any details as to how the devices interact in a manner that improves the functioning of the computer system. Without such details, the claimed “arrangement” simply represents a conceptual allocation of functions among generic computer entities, which is insufficient to integrate the abstract idea into a practical application. This arrangement is simply functional and result-oriented and does not constitute a technological improvement. The applicant also argues that the claims “improve the functioning of computer systems by enabling distributed machine learning”. However, such an improvement must be recited in the claims with sufficient specificity to demonstrate how the computer’s operation itself is improved, rather than merely performing an abstract idea using generic computer components. In the present case, the claims do not specify how the alleged improvement is achieved. The limitations merely describe distributing conventional machine learning operations (training, sending, predicting and evaluating) across a first device and local devices. These steps, even if executed by multiple computers, simply describe where the data processing occurs, not a change in how the computers operate to achieve a technological improvement.
Even when considering the elements in combination, the claims do not amount to significantly more than the judicial exception. The steps of receiving, sending, and training using data are conventional black-box AI practices of machine learning and can be performed by generic computing devices. Applicant has not identified any specific element or combination of elements that yields an improvement in the functioning of a computer or any other technology. The amended claims, even when viewed as a whole, remain directed to the abstract idea of predicting and evaluating, while performing training using data and distributing information among devices, which does not integrate the abstract idea into a practical application or provide significantly more than the judicial exception. Therefore, the rejections under 35 U.S.C. 101 are maintained.

Applicant’s amendments and arguments, with respect to the rejections of claims 1-20 under 35 U.S.C. 103, filed 06/02/2025, have been considered and are persuasive.
The applicant argues that the Dai and Song references, alone or in combination, fail to teach or suggest at least these features of amended claim 1. Song generally describes methods for determining target predictive models based on service type identifiers. However, Song does not teach distributed machine learning architectures, model distribution from central to local devices, or incremental training concepts. Song’s disclosure of service-type-based model selection is entirely different from the claimed distributed training architecture. Dai and Song do not teach or suggest performing offline training based on a historical training sample set and sending the machine learning model to a plurality of local analysis devices, as recited in the amended claims: "by a first analysis device: performing offline training based on a historical training sample set to obtain a machine learning model; and sending the machine learning model to a plurality of local analysis devices; and by each of the local analysis devices: receiving a machine learning model sent by the first analysis device; and performing incremental training...". The remaining references do not make up for the deficiencies of Dai and Song.

The examiner respectfully agrees that Dai and Song do not teach the newly added limitations of claim 1. However, these limitations were first introduced by applicant’s amendments filed on 09/02/2022, and therefore change the scope of the claim as compared to the claims previously examined. The newly amended claim introduces the concepts of offline training performed by a device and distribution of the resulting machine learning model to a plurality of local devices.
Therefore, upon further consideration, new ground(s) of rejection have been raised (see below).


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 2, 3, 8, 10, 11, 19, and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Although these claims are dependent claims, the rejection under 35 U.S.C. 101 would apply to each claim if it were rewritten in independent form.
	
Regarding claim 2, in combination with the limitations of claim 1
Step 1:
Claim 2 depends on claim 1, which recites a method, one of the four statutory categories of patentable subject matter.
Step 2A, Prong I:
Claim 2 further recites the limitations of:
“predicting a classification result ...”. The process of predicting a classification result is considered to be a mental process. A person can mentally predict a classification result.
“... evaluates, based on the prediction information, whether the machine learning model is degraded”. The process of evaluating whether a machine learning model is degraded based on the prediction information is a mental process. A person can mentally evaluate whether the machine learning model is degraded based on prediction information, for example by mentally reviewing various prediction results to determine whether the model's predictions are correct and therefore whether the model is degraded.
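The evaluation step quoted above can be illustrated with a short Python sketch. The quoted claim language does not say how degradation is judged, so the accuracy-threshold criterion, the `min_accuracy` value, the function names, and the instruction format below are assumptions made purely for illustration.

```python
def is_degraded(predicted, actual, min_accuracy=0.8):
    """Return True when accuracy on labeled feedback falls below min_accuracy.

    Assumption: degradation is judged by accuracy against known labels; the
    quoted claim text does not specify the criterion.
    """
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need equally sized, non-empty prediction and label lists")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(predicted) < min_accuracy


def evaluate(predicted, actual, min_accuracy=0.8):
    """Evaluation-device step: return a (hypothetical) training instruction, or None."""
    if is_degraded(predicted, actual, min_accuracy):
        return {"instruction": "train"}
    return None


# Three of four predictions are correct: accuracy 0.75 is below 0.8.
instruction = evaluate(
    ["normal", "normal", "abnormal", "abnormal"],
    ["normal", "normal", "abnormal", "normal"],
)
print(instruction)  # {'instruction': 'train'}
```

A local analysis device receiving such an instruction would then run incremental training on its first training sample set, as recited in the later limitation of claim 2.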
Step 2A, Prong II:
Claim 2 in combination with claim 1 further recites the limitations of:
“by a first analysis device: performing offline training based on a historical training sample set to obtain a machine learning model” (recited from claim 1). This additional element recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not provide integration into a practical application. The limitation recites the application of a machine learning model that is trained using the historical training sample set, without reciting in specific detail how the training is performed using the training data, whether the training is configured in an unconventional manner using an improved machine learning algorithm, or any improvement to computer elements.
“sending the machine learning model to a plurality of local analysis devices” (recited from claim 1). This additional element recites insignificant extra-solution activity, namely the well-known technique of data transmission, as identified in MPEP 2106.05(g), and does not provide integration into a practical application.
“by each of the local analysis device: receiving a machine learning model sent by a first analysis device” (recited from claim 1). This additional element recites insignificant extra-solution activity, namely the well-known technique of data transmission, as identified in MPEP 2106.05(g), and does not provide integration into a practical application.
“performing incremental training on the machine learning model based on a first training sample set, wherein feature data in the first training sample set is feature data from a local network corresponding to the local analysis device.” (recited from claim 1). This additional element recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not provide integration into a practical application. The limitation recites the application of a machine learning model that is incrementally trained using the first training sample set and the feature data, without reciting in specific detail how the incremental training is performed using the training data and feature data, whether the training is configured in an unconventional manner using an improved machine learning algorithm, or any improvement to computer elements.
“... using the machine learning model”. This additional element recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not provide integration into a practical application or significantly more than the abstract idea. The limitation recites using a machine learning model without reciting how the model is used or configured as a novel invention.
“sending prediction information to an evaluation device, wherein the prediction information comprises the predicted classification result, wherein the evaluation devices ...”. This additional element recites insignificant extra-solution activity, namely the well-known technique of data transmission, as identified in MPEP 2106.05(g), and does not provide integration into a practical application.
“after receiving a training instruction sent by the evaluation device, performing incremental training on the machine learning model based on the first training sample set, wherein the training instruction is used to instruct to train the machine learning model”. This additional element recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not provide integration into a practical application. The limitation recites performing incremental training on the machine learning model based on the first training sample set, without reciting whether the incremental training is performed in an unconventional manner using an improved machine learning algorithm, or any improvement to computer elements.
Step 2B:
When considered individually or in combination, the additional limitations and elements of claim 2 do not amount to significantly more than the judicial exception, for the same reasons discussed above as to why the additional limitations do not integrate the abstract idea into a practical application. The additional elements outlined in Step 2A, performing functions as designed, simply accomplish execution of the abstract ideas.
The additional element “by a first analysis device: performing offline training based on a historical training sample set to obtain a machine learning model” (recited from claim 1) recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not amount to significantly more than the judicial exception for the same reasons discussed above.
The additional element “sending the machine learning model to a plurality of local analysis devices” (recited from claim 1) further recites a well-understood, routine, conventional activity as identified in MPEP 2106.05(d)(II)(i), which indicates that transmitting data is a well-understood, routine, conventional activity when it is claimed in a generic manner (as it is here). Accordingly, a conclusion that the transmitting step is well-understood, routine, conventional activity is supported under Berkheimer option II.
The additional element “by each of the local analysis device: receiving a machine learning model sent by a first analysis device” (recited from claim 1) further recites a well-understood, routine, conventional activity as identified in MPEP 2106.05(d)(II)(i), which indicates that transmitting data is a well-understood, routine, conventional activity when it is claimed in a generic manner (as it is here). Accordingly, a conclusion that the transmitting step is well-understood, routine, conventional activity is supported under Berkheimer option II.
The additional element “performing incremental training on the machine learning model based on a first training sample set, wherein feature data in the first training sample set is feature data from a local network corresponding to the local analysis device.” (recited from claim 1) recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not amount to significantly more than the judicial exception for the same reasons discussed above.
The additional element “... using the machine learning model” recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not amount to significantly more than the judicial exception for the same reasons discussed above.
The additional element “sending prediction information to an evaluation device, wherein the prediction information comprises the predicted classification result, wherein the evaluation devices ...” further recites a well-understood, routine, conventional activity as identified in MPEP 2106.05(d)(II)(i), which indicates that transmitting data is a well-understood, routine, conventional activity when it is claimed in a generic manner (as it is here). Accordingly, a conclusion that the transmitting step is well-understood, routine, conventional activity is supported under Berkheimer option II.
The additional element “after receiving a training instruction sent by the evaluation device, performing incremental training on the machine learning model based on the first training sample set, wherein the training instruction is used to instruct to train the machine learning model” recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not amount to significantly more than the judicial exception for the same reasons discussed above. 
In conclusion, the findings above are carried over: the elements considered to be a mental process, the elements that are mere instructions to apply an exception as identified in MPEP 2106.05(f), and the elements reciting the well-known technique of data transmission as identified in MPEP 2106.05(g), which is well-understood, routine, conventional activity as identified in MPEP 2106.05(d), do not provide significantly more than the abstract idea. Looking at the limitations in combination and the claim as a whole does not change this conclusion, and the claim is ineligible.
Therefore, additional limitations of claim 2 in combination with additional limitations of claim 1 do not amount to significantly more than the judicial exception.
Thus, claim 2 in combination with claim 1 recites abstract ideas with additional elements rendered at a high level of generality resulting in claims that do not integrate the abstract idea into a practical application or amount to significantly more than the judicial exception. 
Therefore, claim 2 in consideration of combination with claim 1 is not patent eligible.  

Regarding claim 3: claim 3 depends on claim 2; thus the rejection of claim 2 is incorporated.
Claim 3 recites the limitation:
“the machine learning model is used to predict a classification result of to-be-predicted data consisting of one or more pieces of key performance indicator (KPI) feature data and the KPI feature data is feature data of one of a KPI time series or is KPI data”. This additional element recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f). The limitation recites using a machine learning model to perform prediction with data without reciting how the model performs the prediction process as a novel invention.
“the prediction information further comprises a KPI category corresponding to the KPI feature data in the to-be-predicted data, an identifier of a device to which the to-be-predicted data belongs, and a collection moment of KPI data corresponding to the to-be-predicted data”. This additional element further specifies the information with which the prediction is associated.
Thus, claim 3 recites additional elements rendered at a high level of generality resulting in claims that do not integrate the abstract idea into a practical application or amount to significantly more than the judicial exception. Therefore, claim 3 is not patent eligible.  
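As an illustration of the prediction information enumerated in claim 3, the following Python sketch shows one possible record layout. The class name, field names, and example values are hypothetical; the claim lists only the kinds of information carried, not a data format.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class PredictionInfo:
    """Hypothetical record for the prediction information listed in claim 3."""
    classification_result: str  # predicted classification result (claim 2)
    kpi_category: str           # KPI category of the KPI feature data
    device_id: str              # identifier of the device the data belongs to
    collected_at: datetime      # collection moment of the underlying KPI data


# Assumed example values, for illustration only.
info = PredictionInfo(
    classification_result="abnormal",
    kpi_category="packet_loss_rate",
    device_id="router-17",
    collected_at=datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc),
)

# A plain dict form, e.g. for sending to an evaluation device.
payload = asdict(info)
print(sorted(payload))
```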

Regarding claim 8: claim 8 depends on claim 1; thus the rejection of claim 1 is incorporated. The additional elements of claim 1 are incorporated into the rejection of claim 8.
Step 1:
Claim 8 depends on claim 1, which recites a method, one of the four statutory categories of patentable subject matter.
Step 2A, Prong I:
Claim 8 recites the limitations:
“an absolute value of a difference between any two probabilities in probabilities obtained by predicting a sample ... is less than a second difference threshold”. The process of evaluating a discrimination condition that computes an absolute value and compares it to a threshold is considered to be a mathematical formula as well as a mental process. Calculating the absolute value of a difference between two predicted probabilities is a mathematical formula, and comparing it with a threshold is a mental process, as a person can mentally predict one or more samples, then calculate the absolute value and compare it to a threshold.
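The quoted condition can be written out as a short Python sketch. Only the portion of the claim quoted above is implemented (the elided part of the limitation is not reconstructed); the function names, threshold value, and sample data are hypothetical.

```python
from itertools import combinations


def meets_low_discrimination(probabilities, second_diff_threshold):
    """True when |p - q| < threshold for every pair of predicted class probabilities.

    Equivalent to max(probabilities) - min(probabilities) < threshold.
    Note: with fewer than two probabilities there are no pairs, so this returns True.
    """
    return all(
        abs(p - q) < second_diff_threshold
        for p, q in combinations(probabilities, 2)
    )


def screen_samples(samples_with_probs, second_diff_threshold):
    """Keep samples whose predicted probabilities meet the condition (screening step)."""
    return [
        sample
        for sample, probs in samples_with_probs
        if meets_low_discrimination(probs, second_diff_threshold)
    ]


# Assumed example: the model is unsure about s1 and confident about s2.
candidates = [
    ("s1", [0.34, 0.33, 0.33]),
    ("s2", [0.90, 0.05, 0.05]),
]
print(screen_samples(candidates, 0.05))  # ['s1']
```

Under this reading, samples on which the model cannot clearly separate the classes are the ones selected for the first training sample set.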
Step 2A, Prong II:
Claim 8 in combination with claim 1 further recites the limitations:
“by a first analysis device: performing offline training based on a historical training sample set to obtain a machine learning model” (recited from claim 1). This additional element recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not provide integration into a practical application. The limitation recites the application of a machine learning model that is trained using the historical training sample set, without reciting in specific detail how the training is performed using the training data, whether the training is configured in an unconventional manner using an improved machine learning algorithm, or any improvement to computer elements.
“sending the machine learning model to a plurality of local analysis devices” (recited from claim 1). This additional element recites insignificant extra-solution activity, namely the well-known technique of data transmission, as identified in MPEP 2106.05(g), and does not provide integration into a practical application.
“by each of the local analysis device: receiving a machine learning model sent by a first analysis device” (recited from claim 1). This additional element recites insignificant extra-solution activity, namely the well-known technique of data transmission, as identified in MPEP 2106.05(g), and does not provide integration into a practical application.
“performing incremental training on the machine learning model based on a first training sample set, wherein feature data in the first training sample set is feature data from a local network corresponding to the local analysis device.” (recited from claim 1). This additional element recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not provide integration into a practical application. The limitation recites the application of a machine learning model that is incrementally trained using the first training sample set and the feature data, without reciting in specific detail how the incremental training is performed using the training data and feature data, whether the training is configured in an unconventional manner using an improved machine learning algorithm, or any improvement to computer elements.
“...by using the machine learning model ...”. This additional element recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not provide integration into a practical application. The limitation recites using a machine learning model to perform prediction of a sample, without reciting how the model performs the prediction, whether the prediction process is configured in an unconventional manner using an improved machine learning algorithm, or any improvement to computer elements.
“the first training sample set comprises a sample that is obtained by screening a sample obtained by the local analysis device and that meets a low discrimination condition”. This additional element recites insignificant extra-solution activity, namely mere data gathering, as identified in MPEP 2106.05(g), and does not provide integration into a practical application.
Step 2B:
When considered individually or in combination, the additional limitations and elements of claim 8 in combination with claim 1 do not amount to significantly more than the judicial exception, for the same reasons discussed above as to why the additional limitations do not integrate the abstract idea into a practical application. The additional elements outlined in Step 2A, performing functions as designed, simply accomplish execution of the abstract ideas.
The additional element “by a first analysis device: performing offline training based on a historical training sample set to obtain a machine learning model” (recited from claim 1) recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not amount to significantly more than the judicial exception for the same reasons discussed above.
The additional element “sending the machine learning model to a plurality of local analysis devices” (recited from claim 1) further recites a well-understood, routine, conventional activity as identified in MPEP 2106.05(d)(II)(i), which indicates that transmitting data is a well-understood, routine, conventional activity when it is claimed in a generic manner (as it is here). Accordingly, a conclusion that the transmitting step is well-understood, routine, conventional activity is supported under Berkheimer option II.
The additional element “by each of the local analysis device: receiving a machine learning model sent by a first analysis device” (recited from claim 1) further recites a well-understood, routine, conventional activity as identified in MPEP 2106.05(d)(II)(i), which indicates that transmitting data is a well-understood, routine, conventional activity when it is claimed in a generic manner (as it is here). Accordingly, a conclusion that the transmitting step is well-understood, routine, conventional activity is supported under Berkheimer option II.
The additional element “performing incremental training on the machine learning model based on a first training sample set, wherein feature data in the first training sample set is feature data from a local network corresponding to the local analysis device.” (recited from claim 1) recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not amount to significantly more than the judicial exception for the same reasons discussed above.
The additional element “the first training sample set comprises a sample that is obtained by screening a sample obtained by the local analysis device and that meets a low discrimination condition” further recites a well-understood, routine, conventional activity as identified in MPEP 2106.05(d)(II)(i), which indicates that receiving data is a well-understood, routine, conventional activity when it is claimed in a generic manner (as it is here). Accordingly, a conclusion that the receiving step is well-understood, routine, conventional activity is supported under Berkheimer option II.
The additional element “... predicting a sample by using the machine learning model ...” recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not amount to significantly more than the judicial exception for the same reasons discussed above.
In conclusion, the findings above are carried over: the elements considered to be a mental process, the elements that are mere instructions to apply an exception as identified in MPEP 2106.05(f), and the elements reciting the well-known technique of data transmission as identified in MPEP 2106.05(g), which is well-understood, routine, conventional activity as identified in MPEP 2106.05(d), do not provide significantly more than the abstract idea. Looking at the limitations in combination and the claim as a whole does not change this conclusion, and the claim is ineligible.
Therefore, additional limitations of claim 8 in combination with claim 1 do not amount to significantly more than the judicial exception.
Thus, claim 8 in combination with claim 1 recites abstract ideas with additional elements rendered at a high level of generality resulting in claims that do not integrate the abstract idea into a practical application or amount to significantly more than the judicial exception. 
Therefore, claim 8 in consideration of the combination with claim 1 is not patent eligible.  

Regarding claim 10 in combination with the limitations of claim 9, the applicant is further directed to the rejection of claim 2 above, because claim 10 recites limitations similar to those of claim 2 and claim 9 recites limitations similar to those of claim 1. Claim 10 is therefore rejected under the same rationale as claim 2.

Regarding claim 11, the applicant is further directed to the rejection of claim 3 above. Because claim 11 recites similar limitations, it is rejected under the same rationale.

Regarding claim 19, in combination with claim 9, wherein claim 9 recites limitations similar to those of claim 1:
Step 1:
Claim 19 depends on claim 9, wherein claim 9 recites a system, one of the four statutory categories of patentable subject matter.
Step 2A, Prong I:
Claim 19 further recites the limitations of:
“create a root node”. The process of creating a root node is considered to be a mental process. A person of ordinary skill in the art can manually create a root node using a pen and paper.
“determine a classification result for each leaf node to obtain the machine learning model”. The process of determining a classification result for each leaf node to obtain the machine learning model is considered to be a mental process. A person of ordinary skill in the art can mentally determine a classification result for each leaf node and manually create a decision tree of a machine learning model using a pen and paper.
“splitting the third node to obtain a left child node and a right child node of the third node”. The process of splitting the node to obtain a left child node and a right child node is considered to be a mental process. A person of ordinary skill in the art can mentally split the node, or manually demonstrate the splitting of the node by drawing it with a pen and paper.
Step 2A, Prong II:
Claim 19 further recites the limitations of:
“a non-transitory memory coupled to the processor and configured to store instructions that when executed by the processor” (recited from claim 9). These limitations are a high-level recitation of generic computer components used as a tool, and do not provide integration into a practical application.
“perform offline training based on a historical training sample set to obtain a machine learning model” (recited from claim 9). This additional element recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not provide integration into a practical application. The limitation recites performing offline training based on a training data set without reciting how the device configures the training process to be performed offline using the data, or how the machine learning model is configured.
“send the machine learning model to a plurality of local analysis devices” (recited from claim 9). This additional element recites insignificant extra-solution activity in the form of the well-known technique of data transmission, as identified in MPEP 2106.05(g), and does not provide integration into a practical application.
“wherein each of the local analysis device is configured to: receive the machine learning model sent by the first analysis device” (recited from claim 9). This additional element recites insignificant extra-solution activity in the form of the well-known technique of data transmission, as identified in MPEP 2106.05(g), and does not provide integration into a practical application.
“perform incremental training on the machine learning model based on a first training sample set, wherein feature data in a training sample set used by any local analysis device to train the machine learning model is feature data from a local network corresponding to the any local analysis device” (recited from claim 9). This additional element recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not provide integration into a practical application. The limitation recites performing incremental training on the machine learning model based on the first training sample set, which comprises feature data, without reciting in specific detail how the incremental training is performed using the training data and feature data, or whether the training is configured in an unconventional manner.
“use the root node as a third node, and execute an offline training process until a split stop condition is met”. This additional element recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not provide integration into a practical application. The limitation recites using the root node as a third node and executing an offline training process until a condition is met, without reciting how using the root node as a third node contributes to executing the training process or how the training process is configured to be performed in an unconventional manner.
“obtain a historical training sample set having a determined label, wherein a training sample in the historical training sample set comprises feature data in one or more feature dimensions, and the feature data is value data”. This additional element recites insignificant extra-solution activity in the form of the well-known technique of obtaining information, as identified in MPEP 2106.05(g), and does not provide integration into a practical application.
“using the left child node as an updated third node, using, as an updated historical training sample set, a left sample set that is in the historical training sample set and that is allocated to the left child node, and executing the offline training process again”. This additional element recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not provide integration into a practical application. The claim recites using the child node as an updated node and using an updated training sample set to execute the training process, without reciting how the training is performed or how the node is associated with the training process.
“using the right child node as the updated third node, using, as the updated historical training sample set, a right sample set that is in the historical training sample set and that is allocated to the right child node, and executing the offline training process again”. This additional element recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not provide integration into a practical application. The claim recites using the child node as an updated node and using an updated training sample set to execute the training process, without reciting how the training is performed or how the node is associated with the training process.
Step 2B:
When considered individually or in combination, the additional limitations and elements of claim 19 do not amount to significantly more than the judicial exception, for the same reasons discussed above as to why the additional limitations do not integrate the abstract idea into a practical application. The additional elements outlined in Step 2A, performing their functions as designed, simply accomplish execution of the abstract ideas.
The additional element “first analysis device to: performing offline training based on a historical training sample set to obtain a machine learning model” (recited from claim 9) recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not amount to significantly more than the judicial exception for the same reasons discussed above.
The additional element “sending the machine learning model to a plurality of local analysis devices” (recited from claim 9) further recites a well-understood, routine, conventional activity as identified in MPEP 2106.05(d)(II)(i), which indicates that transmitting data is a well-understood, routine, conventional activity when it is claimed in a generic manner (as it is here). Accordingly, a conclusion that the transmitting step is well-understood, routine, conventional activity is supported under Berkheimer option II.
The additional element “by each of the local analysis device: receiving a machine learning model sent by a first analysis device” (recited from claim 9) further recites a well-understood, routine, conventional activity as identified in MPEP 2106.05(d)(II)(i), which indicates that receiving or transmitting data is a well-understood, routine, conventional activity when it is claimed in a generic manner (as it is here). Accordingly, a conclusion that the receiving step is well-understood, routine, conventional activity is supported under Berkheimer option II.
The additional element “performing incremental training on the machine learning model based on a first training sample set, wherein feature data in the first training sample set is feature data from a local network corresponding to the local analysis device.” (recited from claim 9) recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not amount to significantly more than the judicial exception for the same reasons discussed above.
The additional element “use the root node as a third node, and execute an offline training process until a split stop condition is met” recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not amount to significantly more than the judicial exception for the same reasons discussed above.
The additional element “obtain a historical training sample set having a determined label, wherein a training sample in the historical training sample set comprises feature data in one or more feature dimensions, and the feature data is value data” further recites a well-understood, routine, conventional activity as identified in MPEP 2106.05(d)(II)(i), which indicates that receiving data is a well-understood, routine, conventional activity when it is claimed in a generic manner (as it is here). Accordingly, a conclusion that the receiving step is well-understood, routine, conventional activity is supported under Berkheimer option II.
The additional element “using the left child node as an updated third node, using, as an updated historical training sample set, a left sample set that is in the historical training sample set and that is allocated to the left child node, and executing the offline training process again” recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not amount to significantly more than the judicial exception for the same reasons discussed above.
The additional element “using the right child node as the updated third node, using, as the updated historical training sample set, a right sample set that is in the historical training sample set and that is allocated to the right child node, and executing the offline training process again” recites a mere instruction to apply an exception with a recitation of the words "apply it" (or an equivalent) as identified in MPEP 2106.05(f), and does not amount to significantly more than the judicial exception for the same reasons discussed above.
In conclusion, the elements considered to be mental processes, the elements that recite a mere instruction to apply an exception as identified in MPEP 2106.05(f), the elements that recite the well-known technique of data transmission as identified in MPEP 2106.05(g), and the elements that recite a well-understood, routine, conventional activity as identified in MPEP 2106.05(d) are carried over and do not provide significantly more than the abstract idea. Considering the limitations in combination and the claim as a whole does not change this conclusion, and the claim is ineligible.
Therefore, additional limitations of claim 19 in combination with claim 9 do not amount to significantly more than the judicial exception.
Thus, claim 19 in combination with claim 9 recites abstract ideas with additional elements rendered at a high level of generality resulting in claims that do not integrate the abstract idea into a practical application or amount to significantly more than the judicial exception. 
Therefore, claim 19 in consideration of the combination with claim 9 is not patent eligible.  

Regarding claim 20, which depends on claim 19, the rejection of claim 19 is incorporated.
Claim 20 recites the limitation:
“the split stop condition comprises at least one of the following: ... a depth of the third node in the machine learning model is greater than a depth threshold”. Claim 20 recites an abstract idea of a mental process. Determining that the node depth is greater than a threshold is a mental process, as a person of ordinary skill in the art can mentally compare the node depth with a threshold to determine whether it has a greater value.
Thus, claim 20 recites abstract ideas rendered at a high level of generality resulting in claims that do not integrate the abstract idea into a practical application or amount to significantly more than the judicial exception. Therefore, claim 20 is not patent eligible.  
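For illustration only, the recursive procedure recited in claims 19 and 20 (create a root node, split the current node into a left and a right child node, repeat on each child with its allocated sample set until a split stop condition is met, then determine a classification result for each leaf node) can be sketched in Python as follows. This is a minimal sketch and not the applicant's implementation: the `Node` class, the helper names, the mean-based split rule, and the extra stop test for a single remaining label are all assumptions, because the quoted claim language does not specify them. Only the depth comparison mirrors the quoted portion of claim 20.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# One labelled training sample: (feature values, label). Hypothetical layout.
Sample = Tuple[Sequence[float], str]


@dataclass
class Node:
    depth: int
    label: Optional[str] = None        # classification result, set on leaf nodes only
    feature: Optional[int] = None      # feature dimension used for the split
    threshold: Optional[float] = None  # split value in that dimension
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def majority_label(samples: List[Sample]) -> str:
    """Most frequent label among the samples allocated to a node."""
    return Counter(label for _, label in samples).most_common(1)[0][0]


def split_stop(node: Node, samples: List[Sample], depth_threshold: int) -> bool:
    """Stop when depth exceeds the threshold (claim 20) or, as an assumption,
    when only one label remains among the allocated samples."""
    return node.depth > depth_threshold or len({lbl for _, lbl in samples}) <= 1


def train(node: Node, samples: List[Sample], depth_threshold: int) -> None:
    """Recursively split `node` using its (non-empty) allocated sample set."""
    if split_stop(node, samples, depth_threshold):
        node.label = majority_label(samples)
        return
    # Assumed split rule: first feature dimension whose mean separates the samples.
    for dim in range(len(samples[0][0])):
        mean = sum(x[dim] for x, _ in samples) / len(samples)
        left = [s for s in samples if s[0][dim] <= mean]
        right = [s for s in samples if s[0][dim] > mean]
        if left and right:
            break
    else:
        # No dimension separates the samples, so this node becomes a leaf.
        node.label = majority_label(samples)
        return
    node.feature, node.threshold = dim, mean
    node.left = Node(depth=node.depth + 1)
    node.right = Node(depth=node.depth + 1)
    train(node.left, left, depth_threshold)    # left child becomes the current node
    train(node.right, right, depth_threshold)  # right child becomes the current node


def predict(node: Node, x: Sequence[float]) -> str:
    """Walk from the root to a leaf and return its classification result."""
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


samples = [([1.0], "a"), ([2.0], "a"), ([8.0], "b"), ([9.0], "b")]
root = Node(depth=0)
train(root, samples, depth_threshold=3)
```

In this sketch the depth test is a single integer comparison inside `split_stop`, while the allocation of samples to the left and right child nodes is what drives the recursion.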


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over DE BROUWER et al. (US 20200293887 A1), in view of Song et al. (US 20210027173 A1).

Regarding claim 1, 
DE BROUWER teaches the 1st limitation “by a first analysis device: performing offline training based on a historical training sample set to obtain a machine learning model” (paragraph 54 “In a federated workflow 915, we start with a base model 951 that may have been trained in this conventional manner. Once this base model 951 is trained, refinement can proceed without centrally collecting any further data. Instead, the base model is distributed to individual devices 953. These edge devices perform local training to generate local model updates 957, using data (not shown) that is on those devices.”, and paragraph 63 “Initial training of the base model can be offline. Then, the trained base model can be distributed to edge devices”, paragraph 64 “a coordinating server 1221 that manages training tasks and performs model aggregation”, and paragraph 133 “real-world data that has previously been collected from sources such as ... historical clinical trial data, etc. This can be done via a federated learning model”. DE BROUWER discloses a system and method with a federated learning model. Within the disclosure, DE BROUWER discloses the federated learning framework, comprising a coordinating server that can perform initial training of the base model offline using historical data such as historical clinical trial data. The coordinating server, which employs a base model that can be trained offline using historical data, is analogous to the claimed first analysis device that performs offline training based on a historical training sample set.)
DE BROUWER teaches the 2nd limitation “sending the machine learning model to a plurality of local analysis devices;” (paragraph 54 “In a federated workflow 915, we start with a base model 951 that may have been trained in this conventional manner. Once this base model 951 is trained, refinement can proceed without centrally collecting any further data. Instead, the base model is distributed to individual devices 953.” DE BROUWER discloses after the base model is trained, it can be distributed to individual edge devices for further training, which is analogous to the claimed sending the machine learning model to a plurality of local analysis devices.)
DE BROUWER teaches the 3rd limitation “and by each of the local analysis devices: receiving a machine learning model sent by a first analysis device” (paragraph 54 “In a federated workflow 915, we start with a base model 951 that may have been trained in this conventional manner. Once this base model 951 is trained, refinement can proceed without centrally collecting any further data. Instead, the base model is distributed to individual devices 953.” DE BROUWER discloses that, after the base model is trained, it can be distributed to individual edge devices by the coordinating server, which suggests that each edge device receives the trained base model from the server. This is analogous to the claimed receiving, by each local analysis device, of a machine learning model sent by a first analysis device.)
DE BROUWER teaches a part of the 4th limitation “performing incremental training on the machine learning model based on a first training sample set ...” (paragraph 53 “A federated learner (Flea) can be implemented as an end user side library, built for an edge device environment, to perform local model update calculations using data collected in the edge device environment. The Flea can perform post-processing after model updating, including applying perturbations (e.g., encryption and introduction of noise for privacy purposes), sharing the model update with a central update repository (i.e., an FL aggregator)”, and paragraph 54 “These edge devices perform local training to generate local model updates 957, using data (not shown) that is on those devices.” DE BROUWER discloses that each local edge device performs local model update calculations using data collected in the edge device environment and can further perform post-processing after model updating. These further training steps, which update the model locally using data that is on those devices, are analogous to the claimed incremental training on the machine learning model based on a first training sample set.)
DE BROUWER does not teach a part of the 4th limitation “... wherein feature data in the first training sample set is feature data from a local network corresponding to the local analysis device”. However, Song teaches this part of the limitation (paragraph 18 “generating the target predictive model by training a model based on the first training record.”, and paragraph 125 “x is a feature value of each dimension (for example, for a three-dimensional plane, the data x has three feature values) of the KPI data”. Song discloses an indicator determining method and related device. Within the disclosure, Song discloses an analysis apparatus configured to generate a predictive model by analyzing and processing a sample data set as well as network KPI feature data, such that feature values of the network KPI feature data are used to train the predictive model. A person of ordinary skill in the art would recognize that the predictive model of Song may correspond to the machine learning model at each edge device of DE BROUWER. Therefore, the network KPI feature values taught by Song are analogous to the claimed feature data from the local network corresponding to the local analysis device. The motivation to combine the teachings is set forth below.)
Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine the teaching of the system and method with a federated learning model by DE BROUWER with the teaching of feature value data by Song. The motivation to do so is found in Song’s disclosure (paragraph 4 “A reason why user experience results of the service are different may be a defect of a design of the application, or may be freezing, instability, and the like of a network connected to the service. Therefore, a network status of the network connected to the service also becomes one of the parameters that evaluate user experience. In an existing training model, a KQI is predicted by analyzing network KPI data. The training model can be generated by collecting a large amount of KPI data, and a larger data volume generally indicates a more accurate result of the training model”, and paragraph 13 “An embodiment of this disclosure has the following advantages: ... In this embodiment of this disclosure, when the KQI of the service is predicted by using the network KPI parameter, it is considered that services of different service types have different predictive models, so that the KQI result of the service predicted by using the network KPI parameter is more accurate”). Song discloses that incorporating network KPI feature data improves the accuracy of a machine learning model by adapting the model to network conditions specific to each device or service. Given that a machine learning model can be improved with network KPI data represented as feature values, one of ordinary skill in the art would recognize that incorporating the KPI feature values into each edge device of DE BROUWER would improve the accuracy and adaptability of the distributed model in a heterogeneous network environment, which is a predictable and advantageous result. Such a combination represents the use of known techniques to improve model adaptation to device-specific conditions, and therefore would have been obvious.
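For illustration only, the workflow mapped to claim 1 above (offline training of a base model on historical samples, distribution of that model to several local devices, and incremental training at each device on feature data from its own local network) can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is neither DE BROUWER's implementation nor the claimed method: the one-parameter linear model, the function names `sgd_fit`, `offline_train`, and `incremental_train`, the site names, and all sample values are hypothetical.

```python
from typing import Dict, List, Tuple

# One training sample: (feature value, target value). Hypothetical layout.
Sample = Tuple[float, float]


def sgd_fit(weight: float, samples: List[Sample],
            lr: float = 0.01, epochs: int = 200) -> float:
    """Fit y ~ weight * x by stochastic gradient descent, starting from `weight`."""
    for _ in range(epochs):
        for x, y in samples:
            weight -= lr * 2 * (weight * x - y) * x  # gradient of squared error
    return weight


def offline_train(historical: List[Sample]) -> float:
    """First analysis device: train the base model from scratch on historical data."""
    return sgd_fit(0.0, historical)


def incremental_train(base_weight: float, local_samples: List[Sample]) -> float:
    """Local analysis device: continue training the received model on local data."""
    return sgd_fit(base_weight, local_samples, epochs=20)


# Assumed data: the historical set follows y = 2x; each site drifts differently.
historical = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
local_sets: Dict[str, List[Sample]] = {
    "site_a": [(1.0, 2.5), (2.0, 5.0)],  # local network behaves like y = 2.5x
    "site_b": [(1.0, 1.5), (2.0, 3.0)],  # local network behaves like y = 1.5x
}

base = offline_train(historical)               # offline training
local_models = {site: incremental_train(base, data)  # "sending" is a plain
                for site, data in local_sets.items()}  # function argument here
```

In this sketch each local model starts from the same base weight and moves toward the behavior of its own data, which is the sense in which the local training is incremental rather than a retraining from scratch.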

Regarding claim 9, 
DE BROUWER teaches limitations “a processor”, and “a non-transitory memory coupled to the processor and configured to store instructions that when executed by the processor, cause the device” (paragraph 151 “These software modules are generally executed by processor alone or in combination with other processors”, and paragraph 164 “Other implementations may include a non-transitory computer readable storage medium storing instructions executable by a processor to perform actions of the system described above.” DE BROUWER discloses the invention may be configured within a non-transitory computer readable storage medium storing instructions executable by a processor to perform actions of the system and method as described above.)
The applicant is further directed to the rejection of claim 1 above, because the claim recites similar limitations, thus the claim is similarly rejected under the same rationale. 


Claims 2-4, 10-12, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over DE BROUWER et al. (US 20200293887 A1), in view of Song et al. (US 20210027173 A1), further in view of Fly et al. (US 20200082296 A1).

Regarding claim 2, which depends on claim 1, the rejection of claim 1 is incorporated.
DE BROUWER teaches the limitation “predicting a classification result by-using the machine learning model” (paragraph 62 “In practice, the base convolution model can be a MobileNet V2 model with supplemental training that builds on transfer learning of facial images. Transfer learning can leverage training on an ImageNet classification problem”, and paragraph 65 “The edge devices can use the new model for predictions and training as additional data is collected”. DE BROUWER discloses the training of the base model at each edge device, which can then be used for predictions on additionally collected data. The base model may be trained on an ImageNet classification problem, indicating that the edge device performs prediction on a classification problem, more particularly, classification of an image.)
DE BROUWER teaches the limitation “the performing incremental training on the machine learning model based on a first training sample set comprises: after receiving a training instruction sent by the evaluation device, performing incremental training on the machine learning model based on the first training sample set, wherein the training instruction is used to instruct to train the machine learning model” (paragraph 60 “The edge devices 953 train using respective partitions of the data 1015, producing the updated models 957, which are aggregated 959 into an updated model which can be distributed as a new base model 951. In this process, the base model resides locally on each device. Each device trains locally on data that is available on device.” DE BROUWER discloses the local update at each edge device using the received base model, which produces the updated models locally at each device. Each device trains locally on data available on that device, which is analogous to the claimed first training sample set. One of ordinary skill in the art may configure the local update of the base model at each edge device to occur after the base model is analyzed for performance degradation using the system and method of Fly; the motivation to combine the teachings is set forth below.)
DE BROUWER in view of Song does not teach the limitation “sending prediction information to an evaluation device, wherein the prediction information comprises the predicted classification result, wherein the evaluation device evaluates, based on the prediction information, whether the machine learning model is degraded”. However, Fly teaches this limitation (paragraph 83 “FIG. 2B illustrates an online scoring system 140, which scores incoming scoring requests, and establishes the baseline for detecting performance degradation.”, and paragraph 86 “the parity detection engine 260 measures the performance degradation of the online scoring system 140 against the model generated by the offline training system 130”. Fly discloses systems and methods for detecting drift that leads to errors in a predictive analytics system. Within the disclosure, Fly discloses an online scoring system that scores incoming requests and provides predictions for incoming scoring request data based on the offline-trained machine learning model, wherein the parity detection engine uses the prediction results of the online scoring system as well as the offline training system to detect performance degradation of the machine learning model.)
Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine the teaching of the system and method with a federated learning model by DE BROUWER, and the teaching of feature value data by Song, with the teaching of systems and methods for detecting drift that leads to errors in a predictive analytics system by Fly. The motivation to do so is found in Fly’s disclosure (paragraph 15 “once the offline and online environments are connected, the inventive system is enabled to detect drift in real-time or near real-time. The connected systems identify or detect performance degradation (caused by drift) in an “online” operational scoring environment. Specifically, the inventive system identifies degradation by establishing a baseline in an “online” operational scoring environment of trained and validated modeling dataset and scores, and by ensuring that new online data and scores match those that were established in the “offline” discovery environment when the model was trained.”, and paragraph 18 “In order to detect outcome anomalies, the inventive system and method detects, for example and without limitation, outcome in relation to prediction volume distribution changes (e.g.: low prediction scores are outperforming high prediction scores). Through the various embodiment and examples described herein, the inventive system detects anomalies in real-time or near real-time, while improving the prediction accuracy for a system that traditionally detects these anomalies too late, if at all.”). Fly discloses an improved method of detecting anomalies and degradation of a machine learning model by connecting an offline training system for the machine learning model with an online scoring system. Fly discloses that the inventive system detects anomalies in real-time or near real-time, while improving the prediction accuracy of the system. Therefore, a person of ordinary skill in the art could incorporate the teaching of Fly into the combination to further improve the training of the machine learning model at each edge device in the federated learning method and system of DE BROUWER.
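For illustration only, the evaluation step discussed for claim 2 above (an evaluation device receives predicted classification results and decides whether the model is degraded, which in turn can trigger a training instruction) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it is neither Fly's parity detection engine nor the claimed evaluation device. The comparison of class distributions by total variation distance, the `threshold` value of 0.2, the function names, and the sample labels are all hypothetical, and both prediction lists are assumed to be non-empty.

```python
from collections import Counter
from typing import Dict, List


def class_distribution(predictions: List[str]) -> Dict[str, float]:
    """Fraction of predictions per class label (expects a non-empty list)."""
    counts = Counter(predictions)
    total = len(predictions)
    return {label: n / total for label, n in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two discrete distributions."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(lbl, 0.0) - q.get(lbl, 0.0)) for lbl in labels)


def is_degraded(baseline_predictions: List[str],
                online_predictions: List[str],
                threshold: float = 0.2) -> bool:
    """Flag degradation when the online class distribution drifts from the
    offline baseline by more than the assumed threshold."""
    distance = total_variation(class_distribution(baseline_predictions),
                               class_distribution(online_predictions))
    return distance > threshold


# Assumed data: the offline baseline predicts "abnormal" for 10% of samples.
baseline = ["normal"] * 9 + ["abnormal"]
online_stable = ["normal"] * 8 + ["abnormal"] * 2
online_drifted = ["normal"] * 4 + ["abnormal"] * 6

# A training instruction would be sent only for the drifted local device.
send_training_instruction = is_degraded(baseline, online_drifted)
```

A distribution comparison of this kind needs no ground-truth labels at the evaluation device, which is why it is used here; an accuracy-based test would be an equally valid assumption where labels are available.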

Regarding claim 3, which depends on claim 2, the rejection of claim 2 is incorporated.
Song teaches the limitation “The method according to claim 2, wherein the machine learning model is used to predict a classification result of to-be-predicted data consisting of one or more pieces of key performance indicator (KPI) feature data, and the KPI feature data is feature data of one of a KPI time series or is KPI data” (paragraph 46 “an obtaining unit, configured to obtain to-be-predicted data of a service, where the to-be-predicted data includes a key performance indicator KPI”, and paragraph 125 “x is a feature value of each dimension (for example, for a three-dimensional plane, the data x has three feature values) of the KPI data”. Song discloses obtaining to-be-predicted data, wherein a person of ordinary skill in the art may use the trained machine learning model from Fly to predict a classification result from the to-be-predicted data, wherein the to-be-predicted data comprises KPI feature data with x feature values.)
Song teaches the limitation “the prediction information further comprises a KPI category corresponding to the KPI feature data in the to-be-predicted data, an identifier of a device to which the to-be-predicted data belongs, and a collection moment of KPI data corresponding to the to-be-predicted data” (paragraph 46 “an obtaining unit, configured to obtain to-be-predicted data of a service, where the to-be-predicted data includes a key performance indicator KPI of a network in which the service is located and a type identifier of the service, and the type identifier is used to indicate a type of the service”, and paragraph 51 “The obtaining unit is further configured to obtain a sample data set of the service, where the sample data set includes at least one piece of sample data, and each piece of sample data in the sample data set includes the KPI of the network in which the service is located.” Song discloses that the to-be-predicted data includes a key performance indicator (KPI) of the network in which the service is located and a type identifier of the service, such that the KPI data corresponds to each identified type of service to be predicted.)

Regarding claim 4, which depends on claim 1, the rejection of claim 1 is incorporated.
Fly teaches the limitation “when performance of the machine learning model obtained through incremental training does not meet a performance fulfillment condition, sending a retraining request to the first analysis device, wherein the retraining request is used to request the first analysis device to retrain the machine learning model” (paragraph 21 “In one embodiment, detecting whether drift is statistically significant is comprised of determining whether the combined result is above a threshold, and/or based on model prediction and score distributions, model variable profiling statistics, time window comparison and rate of change, model variable predictive power and/or relative importance, and/or risk factor associated with the identified drift.”, paragraph 90 “The new baseline generator 265 generates new baselines whenever a model is retrained based on more current, enriched data containing new features.”, and paragraph 118 “Drift indicates, generally, that the performance of the online scoring system may be degrading. In this manner, the process identifies and detect potential performance degradation issues”. Fly discloses retraining the model based on more current, enriched data, and a person of ordinary skill in the art may configure the retraining in accordance with detected drift that suggests a potential performance degradation issue. The retraining may occur after the drift detecting process evaluates the model at each edge device. One of ordinary skill in the art may configure the retraining of the base model to occur if the evaluation at an edge device indicates a potential performance degradation issue.)

Regarding claim 10, which depends on claim 9, the rejection of claim 9 is incorporated. The applicant is further directed to the rejection of claim 2 above; because the claim recites similar limitations, it is rejected under the same rationale.

Regarding claim 11, which depends on claim 10, the rejection of claim 10 is incorporated. The applicant is further directed to the rejection of claim 3 above; because the claim recites similar limitations, it is rejected under the same rationale.

Regarding claim 12, which depends on claim 9, the rejection of claim 9 is incorporated. The applicant is further directed to the rejection of claim 4 above; because the claim recites similar limitations, it is rejected under the same rationale.

Regarding claim 18, which depends on claim 9, the rejection of claim 9 is incorporated. The applicant is further directed to the rejection of claim 1 above; because the claim recites similar limitations, it is rejected under the same rationale.


Claims 5, 6, 13, 14, 19, 20 are rejected under 35 U.S.C. 103 as being unpatentable over DE BROUWER et al. (US 20200293887 A1), further in view of Song et al. (US 20210027173 A1), further in view of Fly et al. (US 20200082296 A1), further in view of Jagannath et al. (US 20070185896 A1).

Regarding claim 5, which depends on claim 1, the rejection of claim 1 is incorporated.
Fly teaches the limitation “The method according to claim 1, wherein the machine learning model is a tree model” (paragraph 62 “in general, the offline training system 130 may use any “offline” learning algorithm that may be known to a person of ordinary skill in the art without departing from the scope of the invention, including, large-scale distributed training of decision trees” Fly discloses training system may use any learning algorithm for the machine learning model, including decision trees. One of ordinary skilled in the art would have been able to configure the base machine learning model at the coordinating server and at each edge device as a decision tree machine learning model based on the combination of teachings above.)
Fly teaches the limitation “for any training sample in the first training sample set, starting traversal from a root node of the machine learning model, to execute one of the following traversal processes” (paragraph 62 “in general, the offline training system 130 may use any “offline” learning algorithm that may be known to a person of ordinary skill in the art without departing from the scope of the invention, including, large-scale distributed training of decision trees”. Fly discloses the machine learning model may be a decision tree, wherein a person ordinary skilled in the art would recognize that a decision tree includes a traversal from the root node.)
Song teaches a part of the 4th limitation “the first training sample is any training sample in the first training sample set, the first training sample comprises feature data in one or more feature dimensions, the feature data is value data” (paragraph 15 “obtaining a sample data set, where the sample data set includes at least one piece of sample data, and each piece of sample data in the sample data set includes the KPI”, and paragraph 125 “x is a feature value of each dimension (for example, for a three-dimensional plane, the data x has three feature values) of the KPI data”. Song discloses a training sample data set, wherein the sample data set includes KPI feature data with a feature value x for each dimension.)
DE BROUWER/Song/Fly does not teach the 2nd limitation “when a current split cost of a traversed first node is less than a historical split cost of the first node, adding an associated second node, wherein the first node is any non-leaf node in the machine learning model, and the second node is a parent node or a child node of the first node”. However, Jagannath teaches this limitation (paragraph 11 “In one aspect of the present invention, the binary tree may be constructed by recursively computing joint counts of predictor and target values, finding a split point for a node for a portion of the values of the predictor, computing a cost of representing the split node in the tree”, paragraph 83 “The best split point, for example, is the one that has the lowest Gini index value ... If this branch cannot be split any further, binning for this branch is complete. Otherwise, in step 914, the cost of representing this node in the tree is computed and stored. This cost is stored for every node”, and paragraph 47 “First, the root split is determined. Once this is done, the root's two child node bitmaps are generated and the best splits for those two children are determined. Once this is done, the process moves to the third level, and so on.” Jagannath discloses system and method for building decision trees in a database system as well as computing split cost. Within the disclosure, Jagannath discloses the tree may be constructed in a recursive manner with a split point for a node as well as computing a cost of representing the split node in the tree. 
Jagannath also discloses that the best split point is the one with the lowest Gini index value. The Gini index value serves as the split cost, so selecting the lowest value implies comparing split costs: as the tree is constructed recursively, the split cost currently computed at a node (the current split cost of the first node) may be less than the split cost of that node in a previous split (the historical split cost of the first node), which determines the best split at that node. Jagannath also discloses that the root split yields two child nodes, suggesting an associated second node, wherein the root node is a non-leaf node and the second node is its child node.)
DE BROUWER/Song/Fly does not teach part of the 4th limitation “wherein the current split cost of the first node is a cost at which node split is performed on the first node based on a first training sample, ... the historical split cost of the first node is a cost at which node split is performed on the first node based on a historical training sample set of the first node, and the historical training sample set of the first node is a set of samples that are grouped to the first node and that are in a historical training sample set of the machine learning model”. However, Jagannath teaches this part of the limitation (paragraph 11 “In one aspect of the present invention, the binary tree may be constructed by recursively computing joint counts of predictor and target values, finding a split point for a node for a portion of the values of the predictor, computing a cost of representing the split node in the tree”. Jagannath discloses computing the split cost for a split at a node in the tree, wherein a person ordinary skilled in the art can configure the node split performed on a node based on the sample data set based on the teaching combination below, as well as the node split performed on a node based on the sample data set at a previous iteration, wherein the sample data set is configured to be grouped with the first node to perform machine learning at the first node as a previous training iteration.)
Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine the teaching of a system and method with a Federated Learning model by DE BROUWER, the teaching of feature value data by Song, and the teaching of systems and methods for detecting drift that leads to errors in a predictive analytics system by Fly, with the teaching of a system and method for building decision trees in a database system as well as computing split cost by Jagannath. The motivation to do so is referred to in Jagannath’s disclosure (paragraph 0006 “Among the methods proposed, decision trees are popular for modeling data for classification purposes.”, paragraph 0009 “The present invention performs binning that provides useful models, but which reduces the information loss of the model and reduces the introduction of false information artifacts.”, paragraph 29 “Another advantage is that this method doesn't have to incur the expense, management, and security issues of moving the data to a specialized mining engine.”, and paragraph 34 “Trained model 210 includes representations of the decision tree model. Trained model 210 may also be evaluated and adjusted in order to improve the quality, i.e. prediction accuracy, of the model. Trained model 210 is then encoded in an appropriate format and deployed for use in making predictions or recommendations.” Jagannath discloses an embodiment for building a decision tree, and Fly also suggests that the trained machine learning model may be a decision tree. Jagannath also discloses various techniques to build and improve the decision tree machine learning model, such as a binning technique that reduces the information loss of the model and reduces the introduction of false information, a method that does not incur the expense, management, and security issues of moving the data to a specialized mining engine, and a technique to split nodes and compute cost to construct a tree.
A person of ordinary skill in the art would have been able to incorporate the teaching by Jagannath into the teaching combination for further improvement.)
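The Jagannath passage quoted above states that the best split point is the one with the lowest Gini index value. As a reading aid, the following minimal sketch computes a size-weighted Gini impurity for each candidate threshold on a single numeric feature and selects the lowest. It is a textbook formulation, not Jagannath's bitmap-based procedure, and the function names `gini`, `split_cost`, and `best_split` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity of a label collection (0.0 for an empty or pure set)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_cost(values: Sequence[float], labels: Sequence[Hashable], threshold: float) -> float:
    """Size-weighted Gini impurity of the two partitions induced by `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(values: Sequence[float], labels: Sequence[Hashable]) -> Optional[Tuple[float, float]]:
    """Return (threshold, cost) with the lowest cost, or None if no split exists."""
    # The largest value is excluded so that the right partition is never empty.
    candidates = sorted(set(values))[:-1]
    if not candidates:
        return None
    return min(((t, split_cost(values, labels, t)) for t in candidates), key=lambda p: p[1])
```

Under this formulation, comparing a "current" and a "historical" split cost at a node amounts to comparing two values returned by `split_cost` for different sample sets.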

Regarding claim 6, which depends on claim 5, the rejection of claim 5 is incorporated.
Fly teaches the limitation “The method according to claim 5, wherein the current split cost of the first node is negatively correlated with a size of a first value distribution range, the first value distribution range is a distribution range determined based on a feature value in the first training sample and a second value distribution range, the second value distribution range is a distribution range of feature values in the historical training sample set of the first node, and the historical split cost of the first node is negatively correlated with a size of the second value distribution range” (paragraph 92 “in one embodiment, the features preparation engine 310 converts numerical data associated with each relevant feature value into categorical data. For example, the features preparation engine 310 may convert specific age data associated with scoring requests into buckets or categories with age ranges”, and paragraph 93 “The dataset also illustrates at least one feature values that has been prepared by the features preparation engine 310. For example, the “age band” column represents feature values that have been converted from a specific number (a first feature value) into a category (a second feature value) comprising a range of numbers”. Fly discloses feature value may be associated with a distribution range, wherein the feature value associated with the training sample set of either current or previous iteration of training. A person ordinary skilled in the art would have been able to configure an inverse correlation between the split cost as disclosed in Jagannath above and range of feature value such that when the distribution range of a feature value increases, the split cost tends to decrease. This suggests that having a wide range of values for a feature in training sample set can lead to better splits in the decision tree.)
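The relationship recited in claim 6 (a split cost that is negatively correlated with the size of a value distribution range) can be made concrete with a small sketch. The reciprocal cost function below is an arbitrary illustrative choice, assumed only because it decreases as the range widens; it is not taken from the application, Fly, or Jagannath, and the names `extend_range`, `range_cost`, and `should_add_node` are hypothetical.

```python
from typing import Tuple

Range = Tuple[float, float]  # (lowest value, highest value) in one feature dimension


def extend_range(historical: Range, value: float) -> Range:
    """First value distribution range: the historical range widened to include `value`."""
    lo, hi = historical
    return (min(lo, value), max(hi, value))


def range_cost(value_range: Range) -> float:
    """Assumed cost that shrinks as the range grows (negative correlation)."""
    lo, hi = value_range
    return 1.0 / (1.0 + (hi - lo))


def should_add_node(historical: Range, value: float) -> bool:
    """Claim 5 condition: current split cost is less than historical split cost."""
    current = range_cost(extend_range(historical, value))
    return current < range_cost(historical)
```

With this choice, a new sample that falls outside the historical range lowers the cost and satisfies the condition, while a sample inside the range leaves the cost unchanged.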

Regarding claim 13, which depends on claim 9, the rejection of claim 9 is incorporated. The applicant is further directed to the rejection of claim 5 above; because the claim recites similar limitations, it is rejected under the same rationale.

Regarding claim 14, which depends on claim 13, the rejection of claim 13 is incorporated. The applicant is further directed to the rejection of claim 6 above; because the claim recites similar limitations, it is rejected under the same rationale.

Regarding claim 19, which depends on claim 9, the rejection of claim 9 is incorporated.
DE BROUWER teaches the limitation “obtain a historical training sample set having a determined label, wherein a training sample in the historical training sample set comprises feature data in one or more feature dimensions, and the feature data is value data” (paragraph 96 “The intermediary step in the training includes generating a feature vector from the input data using the convolution layers”, paragraph 115 “many participants who collect information on their edge device, label the information and compute it locally”, paragraph 133 “historical clinical trial data” DE BROUWER discloses each edge device may obtain collect information on their edge device and label them locally. That information may be historical clinical trial data as understood by one of ordinary skilled in the art and would be able to be utilized for training the model at each edge device. The training at each edge device may comprises training steps of generating a feature vector from the input data, suggesting that the historical training data with label comprises of feature data in one or more feature dimensions, as they are being converted into vector, wherein the feature vector is analogous to the feature value data within the claim.)
Jagannath teaches the limitation “create a root node” (paragraph 31 “the root of the tree”. Jagannath discloses creating a decision tree including a root node.)
Jagannath teaches the limitation “use the root node as a third node, and execute an offline training process until a split stop condition is met” (paragraph 31 “In order to obtain the prediction, information relating to the particular customer may be used to traverse the tree by, at each node of the tree, using values of the customer's information to select a branch of the tree to follow. For example, the root of the tree, with no information about the customer, the prediction is that the customer is 56% (150 Y, 120 N) likely to respond to the promotion.” Jagannath discloses perform prediction by traversing the tree, such as a prediction at the root node of the tree, wherein a person ordinary skilled in the art would have been able to configure the training for the prediction at the root node of the tree by the offline training system based on the teaching combination with Fly, and further configure a third node obtained after the split as the root node, wherein this root node may be further split.)
Jagannath teaches the limitation “determine a classification result for each leaf node to obtain the machine learning model” (paragraph 31 “In order to obtain the prediction, information relating to the particular customer may be used to traverse the tree by, at each node of the tree, using values of the customer's information to select a branch of the tree to follow.” Jagannath discloses traversing the tree to perform prediction, wherein a person ordinary skilled in the art may configure a prediction of classification result at each leaf node as the traversal of the tree occur.)
Jagannath teaches the limitation “splitting the third node to obtain a left child node and a right child node of the third node” (paragraph 47 “First, the root split is determined. Once this is done, the root's two child node bitmaps are generated and the best splits for those two children are determined. Once this is done, the process moves to the third level, and so on” Jagannath discloses splitting the root node to obtain two child nodes, wherein the two child nodes may be a left child node and right child node as configured by a person ordinary skilled in the art.)
Jagannath teaches the limitation “using the left child node as an updated third node, using, as an updated historical training sample set, a left sample set that is in the historical training sample set and that is allocated to the left child node, and executing the offline training process again” (paragraph 30 “A decision tree is represented as a directed acyclic graph consisting of links and nodes. The structure defines a set of parent-child relationships. Parent nodes contain splitting rules that define the conditions under which a specific child is chosen” Jagannath discloses the splitting from a parent node into child nodes, wherein a specific child is chosen according to splitting rules, wherein a person ordinary skilled in the art can configured the chosen child node as the left child node as an updated traversal node such that the offline training with sample data set may occur at the left child node.)
Jagannath teaches the limitation “using the right child node as the updated third node, using, as the updated historical training sample set, a right sample set that is in the historical training sample set and that is allocated to the right child node, and executing the offline training process again” (paragraph 30 “A decision tree is represented as a directed acyclic graph consisting of links and nodes. The structure defines a set of parent-child relationships. Parent nodes contain splitting rules that define the conditions under which a specific child is chosen” Jagannath discloses the splitting from a parent node into child nodes, wherein a specific child is chosen according to splitting rules, wherein a person ordinary skilled in the art can configured the chosen child node as the right child node as an updated traversal node such that the offline training with sample data set may occur at the right child node.)
The motivation to combine the teaching combination of DE BROUWER/Song/Fly with Jagannath is similar to the motivation in claim 5 above.

Regarding claim 20, which depends on claim 19, the rejection of claim 19 is incorporated.
Jagannath teaches the limitation “a depth of the third node in the machine learning model is greater than a depth threshold” (paragraph 65 “Comparing node depth to a pre-defined maximum value”. Jagannath discloses comparing node depth to a pre-defined maximum value, suggesting comparison of node depth to a pre-defined threshold, wherein a person of ordinary skill in the art would have been able to configure the comparison of node depth with the pre-defined value as a node split stop condition.)
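The offline training process recited in claims 19 and 20 (create a root node, split it into left and right children, recurse on each child's sample set, stop when a depth threshold is exceeded) can be sketched as follows. This is a simplified illustration under stated assumptions: one numeric feature, a median-style threshold instead of a cost-based split, and a majority vote at each leaf. The names `Node`, `build`, and `depth_threshold` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

Sample = Tuple[float, str]  # (feature value, determined label)


@dataclass
class Node:
    depth: int
    label: Optional[str] = None        # classification result, set on leaf nodes
    threshold: Optional[float] = None  # split point, set on non-leaf nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def majority_label(samples: List[Sample]) -> str:
    """Most frequent label among the samples allocated to a node."""
    return Counter(y for _, y in samples).most_common(1)[0][0]


def build(samples: List[Sample], depth: int = 0, depth_threshold: int = 3) -> Node:
    """Recursively split `samples` (assumed non-empty) until a stop condition is met."""
    node = Node(depth=depth)
    labels = {y for _, y in samples}
    values = sorted({x for x, _ in samples})
    # Stop conditions: depth exceeds the threshold, node is pure, or nothing to split on.
    if depth > depth_threshold or len(labels) == 1 or len(values) < 2:
        node.label = majority_label(samples)
        return node
    # Assumed split rule: a middle distinct value, so both children are non-empty.
    node.threshold = values[len(values) // 2 - 1]
    left = [s for s in samples if s[0] <= node.threshold]
    right = [s for s in samples if s[0] > node.threshold]
    # Each child becomes the updated node with its own allocated sample set.
    node.left = build(left, depth + 1, depth_threshold)
    node.right = build(right, depth + 1, depth_threshold)
    return node
```

A cost-based split rule such as a Gini criterion could replace the middle-value rule without changing the recursive structure.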


Claims 7, 15 are rejected under 35 U.S.C. 103 as being unpatentable over DE BROUWER et al. (US 20200293887 A1), further in view of Song et al. (US 20210027173 A1), further in view of Fly et al. (US 20200082296 A1), further in view of Jagannath et al. (US 20070185896 A1), further in view of Ignatov et al. (NPL: Decision Stream: Cultivating Deep Decision Trees).

Regarding claim 7, which depends on claim 5, the rejection of claim 5 is incorporated.
Jagannath teaches a part of the 2nd limitation “Wherein the first leaf node is a child node of the first non-leaf node, the second leaf node is a child node of the second non-leaf node ...” (paragraph 47 “First, the root split is determined. Once this is done, the root's two child node bitmaps are generated and the best splits for those two children are determined. Once this is done, the process moves to the third level, and so on.”)
DE BROUWER/Song/Fly/Jagannath does not teach the 1st limitation “combining a first non-leaf node and a second non-leaf node in the machine learning model, and combining a first leaf node and a second leaf node, to obtain a reduced machine learning model wherein the reduced machine learning model is used to predict a classification result” However, Ignatov teaches this part of the limitation (page 3 section 3.1 fig.2 “The overview of the merging operation is illustrated in the Fig. 2. After the classical decision tree branching, the merging algorithm takes as an input leaf nodes generated at the current stage (Fig. 2(a)) as well as previously obtained unsplit leaves from the upper levels of the model, and fuses statistically similar nodes (Fig. 2(b-c))” and page 1 “To evaluate the proposed solution, we test it on several common machine learning problems ..., MNIST and CIFAR image classification, .... Our experimental results reveal that the proposed approach significantly outperforms the standard decision tree learning methods on both regression and classification tasks, yielding a prediction error decrease up to 35 %” Ignatov discloses a method of building a tree structure that include merging nodes from different branches based on their similarity to generate a deep directed acyclic graph of decision rules. Within the disclosure, Ignatov discloses the process to merge leaf nodes demonstrated in figure 2. The decision tree in figure 2 also suggest and understood by a person ordinary skilled in the art the combining of non-leaf nodes (nodes with at least one child node) to construct the decision tree machine learning model, wherein the model may be a reduced machine learning model as leaf nodes are merged. The decision tree model is tested on classification task and provide prediction on classification tasks with a prediction error decrease up to 35%.)
DE BROUWER/Song/Fly/Jagannath does not teach a part of the 2nd limitation “... the first leaf node and the second leaf node comprise a same classification result, and span ranges of feature values that are in historical training sample sets allocated to the two leaf nodes and that are in a same feature dimension are adjacent”. However, Ignatov teaches this part of the limitation (page 2 section 2 “In [17], the number of terminal nodes is reduced by fusing the leaves with similar predictions after the training is finished.”, and page 6 section 3.5 “If the feature is continuous, all samples are firstly sorted according to values of the feature and then divided into √ N ranges, where N is a number of samples in the node. Samples from the same range are then associated with one leaf node (Fig. 3(a)). At the next step, the adjacent leaves are merged”. Ignatov discloses another algorithm that fuses leaf nodes with similar classification predictions, suggesting that a person of ordinary skill in the art can configure a check of whether the classification results at the leaf nodes are similar. Ignatov also discloses that the N samples are divided into √ N ranges, each associated with one leaf node, suggesting that the samples at each leaf node are in a same feature dimension because they are divided from the same N samples. The samples are associated with feature values, and the feature values span a range as discussed for claim 6 above based on the teaching by Fly. Because the leaf nodes may be adjacent to each other and may be further merged, the ranges of feature values at those leaf nodes are also adjacent.)
Before the effective filing date, it would have been obvious to a person ordinary skilled in the art to combine the teaching of system and method with Federated Learning model by DE BROUWER, and the teaching of feature value data by Song, and the teaching of systems and methods for detecting drift that leads to errors in a predictive analytics system by Fly, and the teaching of system and method for building decision trees in a database system as well as computing split cost by Jagannath, with the teaching of a method of building a tree structure that include merging nodes from different branches based on their similarity to generate a deep directed acyclic graph of decision rules by Ignatov. The motivation to do so is referred to in Ignatov’s disclosure (page 1 “To evaluate the proposed solution, we test it on several common machine learning problems ..., MNIST and CIFAR image classification, .... Our experimental results reveal that the proposed approach significantly outperforms the standard decision tree learning methods on both regression and classification tasks, yielding a prediction error decrease up to 35 %”, page 12 section 5 “In this paper we presented a novel decision tree based algorithm — a Decision Stream, which avoids the problems of data exhaustion and formation of unrepresentative data samples in decision tree nodes by merging the leaves from the same and/or different levels of the predictive model structure ... The experiments demonstrated that Decision Stream algorithm shows a strong advantage over the standard decision tree learning methods on both regression and classification tasks” Ignatov discloses a novel decision tree-based algorithm which provide several improvements such as avoiding the problems of data exhaustion and formation of unrepresentative data samples in decision tree nodes and the method show strong advantage over the standard decision tree learning methods. 
Because the teaching combination, through Jagannath, already discloses a decision tree, a person of ordinary skill in the art could further incorporate the teaching by Ignatov into that combination for further improvement.)
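The claim 7 condition discussed above (two leaf nodes with the same classification result whose feature-value span ranges in the same dimension are adjacent) can be illustrated with a minimal sketch. It models each leaf as a half-open interval `[lo, hi)` in a single feature dimension and treats two leaves as adjacent when one interval ends where the next begins. This is not Ignatov's statistical-similarity merging; the names `Leaf` and `merge_adjacent_leaves` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Leaf:
    lo: float   # inclusive lower bound of the span range
    hi: float   # exclusive upper bound of the span range
    label: str  # classification result of the leaf


def merge_adjacent_leaves(leaves: List[Leaf]) -> List[Leaf]:
    """Combine neighbouring leaves that share a label and touch end to end."""
    merged: List[Leaf] = []
    for leaf in sorted(leaves, key=lambda l: l.lo):
        last = merged[-1] if merged else None
        if last is not None and last.label == leaf.label and last.hi == leaf.lo:
            # Same result and adjacent ranges: replace the pair with one wider leaf.
            merged[-1] = Leaf(last.lo, leaf.hi, leaf.label)
        else:
            merged.append(leaf)
    return merged
```

Leaves with the same label but a gap between their ranges are left untouched, which mirrors the adjacency requirement in the claim language.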

Regarding claim 15, which depends on claim 13, the rejection of claim 13 is incorporated. The applicant is further directed to the rejection of claim 7 above; because the claim recites similar limitations, it is rejected under the same rationale.


Claims 7, 15 are rejected under 35 U.S.C. 103 as being unpatentable over DE BROUWER et al. (US 20200293887 A1), further in view of Song et al. (US 20210027173 A1), further in view of Fly et al. (US 20200082296 A1), further in view of Taniguchi et al. (US 20210012244 A1).

Regarding claim 8, which depends on claim 1, the rejection of claim 1 is incorporated.
Song teaches the limitation “The method according to claim 1,wherein the first training sample set comprises a sample that is obtained by screening a sample obtained by the local analysis device and that meets a low discrimination condition” (paragraph 51 “The obtaining unit is further configured to obtain a sample data set of the service, where the sample data set includes at least one piece of sample data, and each piece of sample data in the sample data set includes the KPI of the network in which the service is located” Song discloses the obtaining unit to obtain a sample data set wherein the sample data set can be configured to meet a low discrimination condition based on the teaching combination below.)
DE BROUWER/Song/Fly does not teach the limitation “an absolute value of a difference between any two probabilities in probabilities obtained by predicting a sample by using the machine learning model is less than a second difference threshold”. However, Taniguchi teaches this limitation (paragraph 129 “Further, for example, in the above method (4), when two predicted values (y1, y2) are given, the ascore is obtained as follows, and if the obtained ascore is equal to or more than a threshold, the shape may be determined to be defective. ascore=|y1−y2|−a ...” Taniguchi discloses a model generation system, a prediction system, and a model generation method that can reduce the number of samples for which an invalid transition of a predicted value is output in progress prediction. Within the disclosure, Taniguchi discloses calculating an absolute value of the difference between two predicted values, wherein the predicted values may be the probabilities from predicting a sample, as understood by a person of ordinary skill in the art. The score is obtained and compared to a threshold to determine whether a shape is defective, wherein a defective shape indicates a sample that cannot be interpreted.)
Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine the teaching of a system and method with a Federated Learning model by DE BROUWER, the teaching of feature value data by Song, and the teaching of systems and methods for detecting drift that leads to errors in a predictive analytics system by Fly, with the teaching of absolute values of differences between two predictions compared to a threshold by Taniguchi. The motivation to do so is referred to in Taniguchi’s disclosure (paragraph 123 “when a predicted value at each time point included in the confirmation period is obtained for a combination of explanatory variables, an amount of error between the predicted value and a curve (approximate curve) obtained by curve fitting to the transition of the predicted value, that is, the sum of differences between the value at each time point on the approximate curve and each predicted value is obtained as an asymptotic score. Then, if the asymptotic score is equal to or more than the predetermined threshold, the combination may be determined to be a defective sample.” Taniguchi discloses that comparing the difference between two predicted values against a threshold enables detection of defective or unreliable prediction outputs, thereby improving the accuracy and trustworthiness of the model’s results. Accordingly, a person of ordinary skill in the art would have been motivated to incorporate the teaching by Taniguchi into the distributed training framework by DE BROUWER to enhance prediction quality and detect degraded model outputs across edge devices, yielding a predictable and advantageous improvement.)
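The "low discrimination condition" recited in claim 8 (the absolute difference between any two predicted probabilities is less than a threshold) can be written directly as a pairwise check. The sketch below is only an illustration of the claim language; the names `is_low_discrimination` and `screen_samples` and the default threshold of 0.2 are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Sequence, Tuple


def is_low_discrimination(probabilities: Sequence[float], diff_threshold: float) -> bool:
    """True when every pair of class probabilities differs by less than the threshold."""
    return all(abs(p - q) < diff_threshold for p, q in combinations(probabilities, 2))


def screen_samples(
    samples_with_probs: Iterable[Tuple[str, Sequence[float]]],
    diff_threshold: float = 0.2,  # assumed value for the "second difference threshold"
) -> List[str]:
    """Keep only the samples whose predicted probabilities are nearly indistinguishable."""
    return [
        sample
        for sample, probs in samples_with_probs
        if is_low_discrimination(probs, diff_threshold)
    ]
```

Samples that pass this screen are the ones the model cannot confidently classify, which is why they would be candidates for the first training sample set.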


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DUY TU DIEP whose telephone number is (703)756-1738. The examiner can normally be reached M-F 8-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached at (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DUY T DIEP/Examiner, Art Unit 2123                                                                                                                                                                                                        
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123
Prosecution Timeline

Mar 16, 2022
Application Filed
Jun 09, 2022
Response after Non-Final Action
Jun 02, 2025
Non-Final Rejection mailed — §101, §103
Sep 02, 2025
Response Filed
Nov 19, 2025
Final Rejection mailed — §101, §103
Feb 05, 2026
Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

17/459,157
Patent 12608642
MODEL PARAMETER LEARNING METHOD AND MOVEMENT MODE DETERMINATION METHOD
4y 7m to grant Granted Apr 21, 2026
17/551,821
Patent 12579428
METHOD FOR INJECTING HUMAN KNOWLEDGE INTO AI MODELS
4y 3m to grant Granted Mar 17, 2026
17/557,096
Patent 12488223
FEDERATED LEARNING FOR TRAINING MACHINE LEARNING MODELS
3y 11m to grant Granted Dec 02, 2025
17/317,908
Patent 12412129
DISTRIBUTED SUPPORT VECTOR MACHINE PRIVACY-PRESERVING METHOD, SYSTEM, STORAGE MEDIUM AND APPLICATION
4y 4m to grant Granted Sep 09, 2025
Prosecution Projections

2-3
Expected OA Rounds
29%
Grant Probability
36%
With Interview (+6.7%)
4y 3m (~0m remaining)
Median Time to Grant
Moderate
PTA Risk
Based on 24 resolved cases by this examiner. Grant probability derived from career allowance rate.