Last updated: May 29, 2026
Application No. 17/614,920
DATA PROCESSING METHOD AND APPARATUS, AND NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM

Non-Final OA: §103, §112
Filed: Nov 29, 2021
Priority: May 31, 2019, CN 201910468502.X (+1 more)
Examiner: THAI, JASMINE THANH
Art Unit: 2129
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Jingdong City (Beijing) Digits Technology Co. Ltd.
OA Round: 4 (Non-Final)
This examiner grants 25% of cases after interview (+56.3% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 24 resolved cases, 2023–2026
Examiner Intelligence

THAI, JASMINE THANH
Career Allowance Rate: 25% (6 granted / 24 resolved), -30.0% vs TC avg
Interview Lift: +56.3% (resolved cases with interview)
Avg Prosecution: 3y 9m
16 currently pending
Statute-Specific Performance

§103: 83.6% (+43.6% vs TC avg)
§102: 16.4% (-23.6% vs TC avg)
Based on career data from 24 resolved cases
Office Action

§103 §112
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 09/11/2025 have been fully considered and they are partially persuasive.
Regarding applicant’s remarks directed to the rejection of claims under 35 USC § 101, the applicant argues that the amended claims are directed to a practical application. Applicant argues that “when considered as a whole, amended Claim 1 is directed to a specific improvement over the prior art as it improves the applicability and accuracy of data processing for electronic text data.” Examiner respectfully agrees and withdraws the prior rejection of claims under 35 USC § 101.

Regarding applicant’s remarks directed to the rejection of claims under 35 USC § 103:
Alleged no teaching that multiple machine learning models are trained to process multiple different types of data based on multiple data subsets of different types
	In Remarks pp. 9-12, Applicant contends:
The prior art of record does not teach “multiple machine learning models are trained to process multiple different types of data based on multiple data subsets of different types, wherein one corresponding machine learning model is trained to process one corresponding type of data.” As “Szeto only discloses that "Each hospital's local private data servers, via their modeling engines, constructs their own proxy data using training actual data as a foundation and based on the same data selection criteria. The global modeling engine then aggregates the individual proxy data sets together to create a global training data set. Operation 590 includes the global modeling engine train a global model on the aggregated sets of proxy data. The global model integrates the knowledge gained from each entity's private data. The global modeling engine can create the trained global model by accumulating sets of actual model parameters and combining them into a single trained model. (Szeto, paragraph [0106]). In other words, in Szeto, the proxy data set of each hospital is of the same type as it is constructed based on the same selection criteria.
Thus, in Szeto, the global training data set created based on the proxy data set of each hospital only comprises data of one single type, rather than multiple data subsets of different types. Furthermore, only one model is trained to process one single type of data, rather than multiple machine learning models trained to process multiple different types of data.”

	The relevant claim limitations appear to be “determining, by using the server, a machine learning model corresponding to each data subset, according to a type of the each data subset, wherein a data subset corresponding to different types is for training different types of machine learning models, and the different types of machine learning models are for processing different types of data” in claim 1. 
As noted in the previous Office Action, Szeto teaches (emphasis added):
Szeto teaches determining, by using the server, a machine learning model corresponding to each data subset, according to a type of the each data subset, wherein a data subset corresponding to different types is for training different types of machine learning models, and the different types of machine learning models are for processing different types of data; 
(Szeto, fig. 1, “[0106] Operation 580, performed by a global modeling engine or a peer private data machine, includes aggregating two or more proxy data sets from different private data servers. The aggregate proxy data sets (global proxy sets) are combined based on a given machine learning task and are generated according to the originally requested model instructions. Although each set of proxy data will likely be generated from different private data distributions, it should be appreciated that the corresponding private data training sets are constructed according to the same selection criteria. For example, a researcher might wish to build a prediction model on how well smokers respond to a lung cancer treatment. The research will request models to be built at many private hospitals where each hospital has its own private data. Each hospital receives the same data selection criteria; patients who are smokers, given the treatment, and their associated known outcome. Each hospital's local private data servers, via their modeling engines, constructs their own proxy data using training actual data as a foundation and based on the same data selection criteria. The global modeling engine then aggregates the individual proxy data sets together to create a global training data set [corresponding to each data subset, according to a type of the each data subset, wherein a data subset corresponding to different types is for training different types of machine learning models, and the different types of machine learning models are for processing different types of data; wherein the corresponding data subset is the aggregate proxy data sets for each entity ie the hospitals]. Operation 590 includes the global modeling engine train a global model on the aggregated sets of proxy data. The global model integrates the knowledge gained from each entity's private data. 
In some embodiments, the global modeling engine can create the trained global model by accumulating sets of actual model parameters and combining them into a single trained model [determining a machine learning model]. Such an approach is considered feasible for simplistic, linear algorithms, a linear SVM, for example. However, in more complex embodiments, say neural networks, using proxy data sets is considered superior due to the retention of the potential knowledge in proxy data sets that might be lost through mathematically combining (e.g., adding, averaging, etc.) individual parameters together.”)
	First, Examiner respectfully points out that “type of each data subset” is not explicitly defined and Examiner looks to para. [0051] of the instant application for examples of “type of each data subset” wherein it is disclosed “In step 130, a machine learning model corresponding to each data subset is determined according to the type of each data subset. For example, the server may configure an optimal model framework in advance as a machine learning model corresponding to various types of data according to factors such as modeling requirements (for example, solving a classification problem, a regression problem and the like), data types, and prior knowledge.” Thus, Examiner notes that an exemplary type can be different model tasks (classification/regression/etc), data types, and prior knowledge.
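For illustration only, the quoted step 130 (a server configuring a model framework in advance for each type of data) can be sketched as a lookup keyed on subset type. This is a minimal sketch, not the applicant's implementation: the names `MODEL_FRAMEWORKS`, `DataSubset`, and `determine_model`, and the two example types, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical registry: subset type -> preconfigured model framework.
# The type names and framework labels are illustrative assumptions only.
MODEL_FRAMEWORKS = {
    "classification": "logistic_regression",
    "regression": "linear_regression",
}


@dataclass
class DataSubset:
    subset_type: str  # e.g. the modeling requirement the subset serves
    rows: list


def determine_model(subset: DataSubset) -> str:
    """Return the model framework configured in advance for this subset's type."""
    try:
        return MODEL_FRAMEWORKS[subset.subset_type]
    except KeyError:
        raise ValueError(f"no model configured for type {subset.subset_type!r}") from None


subsets = [
    DataSubset("classification", rows=[[0.1, 1], [0.9, 0]]),
    DataSubset("regression", rows=[[1.0, 2.0], [2.0, 4.1]]),
]
# One model per subset, chosen by the subset's type.
assignments = {s.subset_type: determine_model(s) for s in subsets}
```

Under this reading, two subsets of different types map to two different model frameworks, which is the one-to-one relationship Applicant argues for.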
	After careful consideration, the argument is considered unpersuasive. Examiner points out that while each hospital constructs its own proxy data based on the same data selection criteria, it does so using the training actual data as a foundation, wherein the training actual data is interpreted to be prior knowledge. Szeto further discloses “Although each set of proxy data will likely be generated from different private data distributions, it should be appreciated that the corresponding private data training sets are constructed according to the same selection criteria.” Thus, as the data distribution is different (ie the prior knowledge of each hospital is different), Szeto teaches different types of data subsets. In the context of Szeto para. [0106], an example selection criterion is “patients who are smokers, given the treatment, and their associated known outcome” to provide a prediction model (a type of machine learning model) on how smokers respond to lung cancer treatment (data pertaining to smokers being a type of data, which is determined by the selection criteria). Szeto additionally discloses “For example, a researcher might wish to build a prediction model on how well smokers respond to a lung cancer treatment. The research will request models to be built at many private hospitals where each hospital has its own private data.” One of ordinary skill in the art would therefore be able to realize another selection criterion (a different data type) with another appropriate model (a different model type), as Szeto discloses that multiple models can be built (thus trained) at multiple hospitals.
	Additionally, Examiner respectfully points out the claim recites “determining, by using the server, a machine learning model corresponding to each data subset” ie a single machine learning model corresponding to each data subset (ie a one to many relationship wherein a single machine learning model is interpreted to be determined based on iterating through each data subset) such as when the global model is determined based on the aggregate proxy data of the participating hospitals.
Lastly, the arguments are directed to newly amended limitations that were not previously examined by the examiner. Therefore, applicant’s arguments are rendered moot. The examiner refers to the rejection under 35 USC § 103 in the current office action for more details.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-4, 9-11, and 13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 and analogous claims 13-15 recite the limitation "wherein for any of the plurality of data platforms, the first gradient is calculated by the any data platform based on an intermediate value calculated by itself and intermediate values from other ones of the plurality of data platforms." There is insufficient antecedent basis for this limitation in the claim, and it is unclear whether this first gradient is the same as the one recited in “wherein one of the first gradients is a gradient of a loss function obtained by one of the plurality of data platforms training its corresponding machine learning model according to its corresponding data subset.”
	Claim 1 and analogous claim 13 further recite “so that each data platform trains a corresponding machine learning model according to the second gradient.” It is unclear whether this “a corresponding machine learning model” is the same as the “corresponding machine learning model” recited in “sending the each data subset and corresponding machine learning model to each of the plurality of data platforms.”
	Claims 2-4, 9-11, and 14-15 are further rejected by virtue of their dependency on claim 1.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 9-11, and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pub. No. US20170093996A1 Amalapurapu et al. (“Amalapurapu”) in view of U.S. Pub. No. US20180018590A1 Szeto et al. (“Szeto”) in further view of Yang, Qiang, et al. "Federated machine learning: Concept and applications." (“Yang”).
Regarding claim 1 and analogous claim 13,
Amalapurapu teaches A data processing method, comprising: obtaining original data from a plurality of data platforms, 
(Amalapurapu, “[0018] Existing web sites [a plurality of data platforms] can generally track activities of users [obtaining original data ie activities of users on the website] on the web site (e.g., monitoring and tracking a user's browsing activity during a session on the web site). For example, the web site can use cookies (e.g., in pixel logs, in which cookies are used to identify a user, at least anonymously, per device, and in which the cookies are typically persistent on the device, that is, the cookies are stored across sessions) or other approaches to track a user's activity on the web site. Thus, what is being actively browsed, viewed, and/or other user activities on the web site can be tracked. But if the user is anonymously accessing (e.g., browsing) the web site (e.g., if the user has not logged into/signed into the web site and/or has not otherwise verified the user's identity with the web site using a type of user authorization/authentication, such as username, a username and password, a biometric verification of a user, a token, or other schemes or combinations thereof), then the web site generally cannot associate such tracked user activities with a specific/confirmed user identity.”)
Amalapurapu teaches using a server, wherein the server is a third-party server other than the plurality of data platforms; 
(Amalapurapu, “[0047] The web servers can also subscribe to a cross platform user joining service 120 (e.g., which can be provided as a cloud-based cross platform user joining service for web sites [using a server, wherein the server is a third-party server other than the plurality of data platforms]). In some implementations, the cross platform user joining service provides various techniques for identifying users across devices, even if such users are accessing the web sites anonymously and using multiple different/distinct devices to access the web sites, as disclosed herein.”)

Amalapurapu teaches wherein the original data comprises electronic text data;
(Amalapurapu, “[0003] For example, a web service can be delivered via a web site. In some cases, the web site can allow users to access content delivered via the web site using anonymous user access. Some web sites can allow or require that users login to access some or all of the content delivered via the web site (e.g., subscription access may be required to access certain content on the web site, such as for an online newspaper, an e-commerce shopping site [original data comprises electronic text data; wherein a cookie is electronic text data], a social networking web site, a web-based email service, a file sharing web site, and/or other web services)…
[0018] Existing web sites can generally track activities of users on the web site (e.g., monitoring and tracking a user's browsing activity during a session on the web site). For example, the web site can use cookies (e.g., in pixel logs, in which cookies are used to identify a user, at least anonymously, per device, and in which the cookies are typically persistent on the device, that is, the cookies are stored across sessions) or other approaches to track a user's activity on the web site”)

Amalapurapu teaches combining, by using the server, the original data from a plurality of data platforms to create a training data set, according to an overlap condition between the original data;
(Amalapurapu, “[0089] Referring to FIG. 6, pixel logs data input 604 is received at IP [the original data from a plurality of data platforms], UID mapping component 606. For example, the IP, UID mapping can be implemented using a Map Reduce (MR) (e.g., also commonly referred to as MapReduce) processing operation(s). The result of this IP, UID mapping is a generation of a bipartite graph 608 (e.g., providing an IP versus UID bipartite graph, such as shown in FIG. 4 as discussed above). Bipartite graph 608 is provided to a UID similarities component 610 for determining a mapping of (likely) related/associated UIDs at 612 [according to an overlap condition between the original data]. The UID to UID relationship mapping result 612 is provided to merge UID, behavior data component 620.
[0090] As also shown, pixel logs data input 614 is received at UID/behavior analysis component 616 [the original data from a plurality of data platforms]. For example, UID/behavior analysis component 616 can determine an association of various behaviors (e.g., products, categories, search terms, etc.) to each UID, which can be implemented using a Map Reduce (MR) processing operation(s). The result of this UID to behavior(s) relationships result 618 is also provided to merge UID, behavior data component 620.
[0091] Merge UID, behavior data component 620 receives input 612 and 618 and merges the received UID and behavior data to generate merged UID, behavior data results 622 [combining, by using the server,… ].”)
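For illustration only, the flow cited from Amalapurapu paras. [0089]-[0091] (IP-to-UID mapping, a bipartite graph, UID similarity, then a merge with behavior data) can be sketched as below. This is a simplified sketch under stated assumptions: the sample records are made up, "two UIDs share at least one IP" stands in for the unspecified similarity measure, and no MapReduce machinery is modeled.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical pixel-log records: (ip_address, uid) pairs.
pixel_logs = [
    ("10.0.0.1", "uid_a"),
    ("10.0.0.1", "uid_b"),
    ("10.0.0.2", "uid_b"),
    ("10.0.0.3", "uid_c"),
]


def build_bipartite(logs):
    """Map each IP to the set of UIDs seen on it (IP side of the bipartite graph)."""
    ip_to_uids = defaultdict(set)
    for ip, uid in logs:
        ip_to_uids[ip].add(uid)
    return ip_to_uids


def related_uids(ip_to_uids):
    """Pair up UIDs that share at least one IP (the assumed overlap condition)."""
    pairs = set()
    for uids in ip_to_uids.values():
        for a, b in combinations(sorted(uids), 2):
            pairs.add((a, b))
    return pairs


def merge_behavior(pairs, behavior):
    """Merge the behavior records of related UIDs into joined records."""
    return [
        {"uids": (a, b), "behavior": sorted(behavior[a] | behavior[b])}
        for a, b in sorted(pairs)
    ]


# Hypothetical UID-to-behavior relationships (e.g. product categories viewed).
behavior = {"uid_a": {"shoes"}, "uid_b": {"shoes", "hats"}, "uid_c": {"books"}}
graph = build_bipartite(pixel_logs)
pairs = related_uids(graph)
merged = merge_behavior(pairs, behavior)
```

Here `uid_a` and `uid_b` are joined because both appear on `10.0.0.1`, while `uid_c` stays unjoined.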

Amalapurapu teaches clustering, by using the server, data in the training data set into multiple types to obtain a plurality of data subsets corresponding to the multiple types, according to attributes of the data in the training data set; 
(Amalapurapu, “[0059] As shown in FIG. 3, one use case of merged profiles of users across devices are then provided as input into a search platform, such as an Apache Solr open source search platform (e.g., or another open source or commercially available search platform can be used), to serve personalized results to each profile in accordance with some embodiments. Specifically, at 312, user data received from UID correlation based on IP, email 306 and session analysis 310 can be merged. At 314, such users can be clustered into user profiles [clustering, by using the server, data in the training data set into multiple types to obtain a plurality of data subsets corresponding to the multiple types, according to attributes of the data in the training data set; wherein the data subset is the joined user profile] (e.g., various clustering techniques are further described below). At 316, the Apache Solr platform can also be used to provide user preferences based on the clustered user profiles (e.g., to associate common user session/behavior data parameters with clustered user profiles).”)
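For illustration only, clustering records into subsets according to their attributes might look like the sketch below. Amalapurapu refers only to "various clustering techniques", so the use of k-means here is an assumption, as are the attribute vectors and the starting centroids.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on tuples of floats; returns (labels, centroids)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centroids[k])),
            )
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster in place
        centroids = new_centroids
    return labels, centroids


# Hypothetical attribute vectors: (sessions per week, average basket value).
records = [(1.0, 10.0), (1.2, 11.0), (8.0, 90.0), (7.5, 95.0)]
labels, centers = kmeans(records, centroids=[(0.0, 0.0), (10.0, 100.0)])

# Each cluster label plays the role of a "type"; its members form one data subset.
subsets = {k: [r for r, lab in zip(records, labels) if lab == k] for k in set(labels)}
```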

Amalapurapu teaches and sending the each data subset and [corresponding machine learning model] to each of the plurality of data platforms, using the server,
(Amalapurapu, “[0142] At 910, sending the joined user profile [sending the each data subset] to a web service [to each of the plurality of data platforms], in which the web service can customize content presented by the web service to a user based on the joined user profile is performed [wherein the each data platform uses the each data subset to train a machine learning model corresponding to the each data subset for processing data of the type corresponding to the each data subset; wherein the web service uses some machine learning model to customize content (ie training) based on the given joined user profile]. For example, using a user profile based on a plurality of UIDs that have been joined based on such cross platform user joining techniques, content on the web site can be customized and/or personalized for presentation to the user based on the joined user profile. In some cases, a recommended category of content can be displayed automatically or in response to a user request on a web site.”)

However, Amalapurapu does not explicitly teach 
determining, by using the server, a machine learning model corresponding to each data subset, according to a type of the each data subset, wherein a data subset corresponding to different types is for training different types of machine learning models, and the different types of machine learning models are for processing different types of data; 

and sending [the each data subset and] corresponding machine learning model to each of the plurality of data platforms, using the server, wherein the each data platform uses the each data subset to train a machine learning model corresponding to the each data subset for processing data of the type corresponding to the each data subset.

Szeto teaches determining, by using the server, a machine learning model corresponding to each data subset, according to a type of the each data subset, wherein a data subset corresponding to different types is for training different types of machine learning models, and the different types of machine learning models are for processing different types of data; 
(Szeto, fig. 1, “[0106] Operation 580, performed by a global modeling engine or a peer private data machine, includes aggregating two or more proxy data sets from different private data servers. The aggregate proxy data sets (global proxy sets) are combined based on a given machine learning task and are generated according to the originally requested model instructions. Although each set of proxy data will likely be generated from different private data distributions, it should be appreciated that the corresponding private data training sets are constructed according to the same selection criteria. For example, a researcher might wish to build a prediction model on how well smokers respond to a lung cancer treatment. The research will request models to be built at many private hospitals where each hospital has its own private data. Each hospital receives the same data selection criteria; patients who are smokers, given the treatment, and their associated known outcome. Each hospital's local private data servers, via their modeling engines, constructs their own proxy data using training actual data as a foundation and based on the same data selection criteria. The global modeling engine then aggregates the individual proxy data sets together to create a global training data set [corresponding to each data subset, according to a type of the each data subset, wherein a data subset corresponding to different types is for training different types of machine learning models, and the different types of machine learning models are for processing different types of data; wherein the corresponding data subset is the aggregate proxy data sets for each entity ie the hospitals]. Operation 590 includes the global modeling engine train a global model on the aggregated sets of proxy data. The global model integrates the knowledge gained from each entity's private data. 
In some embodiments, the global modeling engine can create the trained global model by accumulating sets of actual model parameters and combining them into a single trained model [determining a machine learning model]. Such an approach is considered feasible for simplistic, linear algorithms, a linear SVM, for example. However, in more complex embodiments, say neural networks, using proxy data sets is considered superior due to the retention of the potential knowledge in proxy data sets that might be lost through mathematically combining (e.g., adding, averaging, etc.) individual parameters together.”)
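For illustration only, the two approaches contrasted in Szeto para. [0106] (combining locally trained parameters versus training one global model on aggregated proxy data) can be compared on a toy one-parameter linear model. The data values and the choice of a plain average as the combining rule are assumptions made for this sketch.

```python
def fit_slope(data):
    """Least-squares slope for y = w * x (no intercept)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / sxx


# Hypothetical proxy data sets from two private data servers.
proxy_a = [(1.0, 2.0), (2.0, 4.0)]
proxy_b = [(1.0, 3.0), (3.0, 9.0)]

# Option 1: combine locally trained parameters (here, a simple average).
w_avg = (fit_slope(proxy_a) + fit_slope(proxy_b)) / 2

# Option 2: aggregate the proxy data and train one global model on it.
w_pooled = fit_slope(proxy_a + proxy_b)
```

The two results differ (2.5 versus roughly 2.67) even in this small case. Both options end in a single global model.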

Szeto teaches and sending [the each data subset and] corresponding machine learning model to each of the plurality of data platforms, using the server,
(Szeto, “[0107] In other embodiments, the global modeling engine also transmits the trained global model back to one or more of the private data servers [sending [the each data subset and] corresponding machine learning model to each of the plurality of data platforms, using the server,]. The private data servers can then leverage the global trained model to conduct local prediction studies in support of local clinical decision making workflows. In addition, the private data servers can also use the global model as a foundation for continued online learning [wherein the each data platform uses the each data subset to train a machine learning model corresponding to the each data subset for processing data of the type corresponding to the each data subset]. Thus, the global model becomes a basis for continued machine learning as new private data becomes available. As new data becomes available, method 500 can be repeated to improve the global modeling engine.”)

However, Szeto does not explicitly teach calculating, by using the server, a second gradient, according to first gradients returned by the plurality data platforms, wherein one of the first gradients is a gradient of a loss function obtained by one of the plurality of data platforms training its corresponding machine learning model according to its corresponding data subset; and sending, by using the server, the second gradient to each of the plurality of data platforms, so that each data platform trains a corresponding machine learning model according to the second gradient, wherein for any of the plurality of data platforms, the first gradient is calculated by the any data platform based on an intermediate value calculated by itself and intermediate values from other ones of the plurality of data platforms, and a corresponding data subset from the server is used by the any data platform to train a corresponding machine learning model so as to obtain the intermediate value.
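For illustration only, one possible reading of the gradient limitation is sketched below: each platform returns a first gradient of its loss, the server derives a second gradient from them, and every platform updates its model with that second gradient. The unweighted mean, the single-parameter model, and the data are assumptions, and the exchange of intermediate values between platforms is not modeled.

```python
def local_gradient(w, data):
    """Gradient of mean squared error for y = w * x on one platform's subset."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def aggregate(first_gradients):
    """Assumed aggregation rule: unweighted mean of the first gradients."""
    return sum(first_gradients) / len(first_gradients)


# Hypothetical per-platform data subsets, all generated by y = 2 * x.
platform_data = {
    "platform_1": [(1.0, 2.0), (2.0, 4.0)],
    "platform_2": [(1.0, 2.0), (3.0, 6.0)],
}
w = {name: 0.0 for name in platform_data}  # each platform's model parameter
lr = 0.05

for _ in range(200):
    # Each platform computes a first gradient and returns it to the server.
    first = [local_gradient(w[name], d) for name, d in platform_data.items()]
    # The server computes the second gradient from the first gradients.
    second = aggregate(first)
    # The server sends the second gradient back; each platform takes a step.
    for name in w:
        w[name] -= lr * second
```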

Yang teaches calculating, by using the server, a second gradient, according to first gradients returned by the plurality data platforms, wherein one of the first gradients is a gradient of a loss function obtained by one of the plurality of data platforms training its corresponding machine learning model according to its corresponding data subset; and sending, by using the server, the second gradient to each of the plurality of data platforms, so that each data platform trains a corresponding machine learning model according to the second gradient, 
(Yang, Section 2.4.2, “Part 2. Encrypted model training. After determining the common entities, we can use these common entities’ data to train the machine learning model. The training process can be divided into the following four steps (as shown in Figure 4):
• Step 1: collaborator C creates encryption pairs, send public key to A and B;
• Step 2: A and B encrypt and exchange the intermediate results for gradient and loss calculations [first gradients returned by the plurality data platforms, wherein one of the first gradients is a gradient of a loss function obtained by one of the plurality of data platforms training its corresponding machine learning model according to its corresponding data subset];
• Step 3: A and B computes encrypted gradients and adds additional mask, respectively, and B also computes encrypted loss [calculating a second gradient]; A and B send encrypted values to C;
• Step 4: C [by using the server] decrypts and send the decrypted gradients and loss back to A and B; A and B unmask the gradients, update the model parameters accordingly [sending, .., the second gradient to each of the plurality of data platforms, so that each data platform trains a corresponding machine learning model according to the second gradient].

[Image: media_image1.png (greyscale)]

[Image: media_image2.png (greyscale)]
”)
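For illustration only, the four steps quoted from Yang Section 2.4.2 can be traced with a toy two-party linear model. This is a sketch, not Yang's protocol as published: the encryption is replaced by an identity stand-in (`collaborator_decrypt`, a hypothetical name), so nothing here is secure, and the data, learning rate, and mask range are made up. Only the message flow and the masking and unmasking of gradients are shown.

```python
import random

# Toy vertically partitioned data: A and B hold different features of the
# same samples, and only B holds the labels. All values are made up.
x_a = [1.0, 2.0, 3.0, 4.0]
x_b = [1.0, 0.0, 1.0, 0.0]
y = [3.0, 4.0, 7.0, 8.0]  # generated by y = 2 * x_a + 1 * x_b


def collaborator_decrypt(masked_value):
    """Stand-in for C decrypting with its private key (identity in this sketch)."""
    return masked_value


theta_a, theta_b, lr = 0.0, 0.0, 0.05
rng = random.Random(0)

# Step 1 (C creates a key pair and sends the public key) is omitted here.
for _ in range(2000):
    # Step 2: exchange intermediate results (encrypted in the real protocol).
    u_a = [theta_a * xa for xa in x_a]  # computed by A
    u_b = [theta_b * xb for xb in x_b]  # computed by B
    d = [ua + ub - yi for ua, ub, yi in zip(u_a, u_b, y)]  # shared residual

    # Step 3: each party computes its gradient and adds a private random mask.
    mask_a, mask_b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    g_a = sum(di * xa for di, xa in zip(d, x_a)) / len(y) + mask_a
    g_b = sum(di * xb for di, xb in zip(d, x_b)) / len(y) + mask_b

    # Step 4: C decrypts and returns the values; each party removes its own mask.
    g_a = collaborator_decrypt(g_a) - mask_a
    g_b = collaborator_decrypt(g_b) - mask_b
    theta_a -= lr * g_a
    theta_b -= lr * g_b
```

In this sketch each party's gradient depends on the residual `d`, which combines its own intermediate value with the other party's. The collaborator sees only masked values.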


Yang teaches wherein for any of the plurality of data platforms, the first gradient is calculated by the any data platform based on an intermediate value calculated by itself and intermediate values from other ones of the plurality of data platforms, and a corresponding data subset from the server is used by the any data platform to train a corresponding machine learning model so as to obtain the intermediate value.  
(Yang, Section 2.4.2, “Part 2. Encrypted model training. After determining the common entities, we can use these common entities’ data to train the machine learning model [and a corresponding data subset from the server is used by the any data platform to train a corresponding machine learning model so as to obtain the intermediate value]. The training process can be divided into the following four steps (as shown in Figure 4):
• Step 1: collaborator C creates encryption pairs, send public key to A and B;
• Step 2: A and B encrypt and exchange the intermediate results for gradient and loss calculations [wherein for any of the plurality of data platforms, the first gradient is calculated by the any data platform based on an intermediate value calculated by itself and intermediate values from other ones of the plurality of data platforms];”)

Yang teaches wherein during a training process, a training result of the training data set is determined according to a training result of the each data subset, and the training result of the each data subset is obtained by training a machine learning model corresponding to the each data subset by the each data platform using the each data subset, and wherein during a process of using the corresponding machine learning model on the each data platform, the corresponding machine learning model is used to process corresponding type of data so as to obtain the training result of the each data subset, and the training result of the each data subset is spliced into a final training result of the data.

[Image: media_image3.png (greyscale)]
 
(Yang, Section 2.4.2, “Security Analysis. The training protocol [wherein during a training process wherein a training protocol is a training process] shown in Table 1 does not reveal any information to C, because all C learns are the masked gradients and the randomness and secrecy of the masked matrix are guaranteed [16]. In the above protocol, party A learns its gradient at each step, but this is not enough for A to learn any information from B according to equation 8, because the security of scalar product protocol is well-established based on the inability of solving n equations in more than n unknowns [16, 65]. Here we assume the number of samples NA is much greater than nA, where nA is the number of features. Similarly, party B can not learn any information from A. Therefore the security of the protocol is proved. Note we have assumed that both parties are semi-honest. If a party is malicious and cheats the system by faking its input, for example, party A submits only one non-zero input with only one non-zero feature, it can tell the value of uBi [training result of each the data subset, and the training result of the each data subset is obtained by training a machine learning model corresponding to the each data subset by the each data platform using the each data subset; wherein ui^A is the training result for data platform ie party A; ex ui^A is the training result for data platform ie party A] for that feature of that sample. It still can not tell xBi or ΘB though, and the deviation will distort results for the next iteration, alarming the other party who will terminate the learning process. At the end of the training process, each party (A or B) remains oblivious to the data structure of the other party, and it obtains the model parameters associated only with its own features. At inference time [and wherein during a process of using the corresponding machine learning model on the each data platform; ie inference time], the two parties need to collaboratively compute the prediction results [the corresponding machine learning model is used to process corresponding type of data so as to obtain the training result of the each data subset; … a training result of the training data set is determined according to a training result of the each data subset…and the training result of the each data subset is spliced (see concatenation of ui^A and ui^B) into a final training result of the data; wherein the result is ui^A + ui^B], with the steps shown in Table 2, which still do not lead to information leakage.”)
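
As a rough illustration of the inference step cited above, the following sketch assumes a vertically partitioned linear model: each party holds only its own feature columns and parameters, computes a partial result (ui^A or ui^B) locally, and the joint prediction is the sum ui^A + ui^B. The function name, the toy values, and the omission of encryption are assumptions made for illustration; they are not taken from Yang or from the application.

```python
def partial_result(theta, rows):
    """Per-party partial result u_i = theta . x_i, using local features only."""
    return [sum(t * v for t, v in zip(theta, row)) for row in rows]


# Hypothetical toy data: 3 aligned samples.
# Party A holds 2 features per sample, party B holds 1.
x_a = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]]
x_b = [[4.0], [1.0], [2.0]]
theta_a = [0.5, -1.0]  # parameters known only to party A
theta_b = [2.0]        # parameters known only to party B

u_a = partial_result(theta_a, x_a)  # computed by party A
u_b = partial_result(theta_b, x_b)  # computed by party B

# Joint prediction: element-wise sum of the two partial results.
prediction = [a + b for a, b in zip(u_a, u_b)]

# The same numbers fall out of a single model over the concatenated features,
# which is why the per-party results can be combined after the fact.
x_joint = [ra + rb for ra, rb in zip(x_a, x_b)]
reference = partial_result(theta_a + theta_b, x_joint)
```

In this sketch neither party needs the other's raw features to produce its share of the prediction; only the partial results are combined.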

Amalapurapu in view of Szeto and Yang are both considered to be analogous to the claimed invention because they are in the same field of providing focused aggregated data based on the similarities obtained from consumers. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have substituted the joined user profile of Amalapurapu for the aggregate proxy data sets for each entity of Szeto in order to provide targeted aggregated data that maintains the privacy of patients ie users (Szeto, “[0005] Unfortunately, researchers often encounter obstacles when compiling data sets for their in-progress research, especially when attempting to build trained machine learning models capable of generating interesting predictions using in-the-field data. One major obstacle is that researchers often lack access to the data they require. Consider, for example, a scenario where a researcher wishes to build a trained model from patient data where the patient data is stored in multiple hospitals' electronic medical record databases. The researcher would likely not have authorization to access each hospital's patient data due to privacy restrictions or HIPAA compliance. In order to compile a desired data set, the researcher must request the data from the hospital. Assuming the hospital is amenable to the request, the hospital must then de-identify the data to remove references to specific patients before providing the data to the researcher. However, de-identification results in loss of possibly valuable information in the dataset that could be instrumental in training machine learning algorithms, which in turn can provide opportunities for discovering new relationships in the data or provide value predictive properties. Thus, because of the security restrictions, the datasets available to the researcher could lack information. 
Clearly, researchers would benefit from technologies that could extract learned information or “knowledge” while also respecting private or secured information distributed across multiple data stores.”)

Yang is considered to be analogous to the claimed invention because they are in the same field of federated learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Amalapurapu in view of Szeto to incorporate the teachings of Yang in order to protect data privacy and bolster data security of multi-party systems for a wide variety of applications (Yang, Section 4, “As an innovative modeling mechanism that could train a united model on data from multiple parties without compromising privacy and security of those data, federated learning has a promising application in sales, financial, and many other industries, in which data cannot be directly aggregated for training machine learning models due to factors such as intellectual property rights, privacy protection, and data security.
Take the smart retail as an example. Its purpose is to use machine learning techniques to provide customers with personalized services, mainly including product recommendation and sales services. The data features involved in the smart retail business mainly include user purchasing power, user personal preference, and product characteristics. In practical applications, these three data features are likely to be scattered among three different departments or enterprises. For example, a user’s purchasing power can be inferred from her bank savings and her personal preference can be analyzed from her social networks, while the characteristics of products are recorded by an e-shop. In this scenario, we are facing two problems. First, for the protection of data privacy and data security, data barriers between banks, social networking sites, and e-shopping sites are difficult to break. As a result, data cannot be directly aggregated to train a model. Second, the data stored in the three parties are usually heterogeneous, and traditional machine learning models cannot directly work on heterogeneous data. For now, these problems have not been effectively solved with traditional machine learning methods, which hinder the popularization and application of artificial intelligence in more fields.
Federated learning and transfer learning are the key to solving these problems. First, by exploiting the characteristics of federated learning, we can build a machine learning model for the three parties without exporting the enterprise data, which not only fully protects data privacy and data security, but also provides customers with personalized and targeted services and thereby achieves mutual benefits. Meanwhile, we can leverage transfer learning to address the data heterogeneity problem and break through the limitations of traditional artificial intelligence techniques. Therefore federated learning provides a good technical support for us to build a cross-enterprise, cross-data, and cross-domain ecosphere for big data and artificial intelligence.”)

In regards to claim 2,
Amalapurapu in view of Szeto and Yang teaches The data processing method according to claim 1,
Yang teaches wherein the original data comprises user identifiers and user characteristics, and the combining original data from a plurality of data platforms to create a training data set comprises:
selecting data with a same user identifier in the original data from different data platforms to create the training data set, in the case where an overlap degree of user identifiers exceeds an overlap degree of user characteristics in the original data from different data platforms.
(Yang, Section 2.3, “2.3 A Categorization of Federated Learning
In this section we discuss how to categorize federated learning based on the distribution characteristics of the data.
Let matrix Di denotes the data held by each data owner i. Each row of the matrix represents a sample, and each column represents a feature. At the same time, some data sets may also contain label data. We denote the features space as X, the label space as Y and we use I to denote the sample ID space. For example, in the financial field labels may be users’ credit; in the marketing field labels may be the user’s purchase desire; in the education field, Y may be the degree of the students. The feature X, label Y [user characteristics] and sample Ids I [user identifiers ] constitutes the complete training dataset (I,X,Y). The feature and sample space of the data parties may not be identical, and we classify federated learning into horizontally federated learning, vertically federated learning and federated transfer learning based on how data is distributed among various parties in the feature and sample ID space. Figure 2 shows the various federated learning frameworks for a two-party scenario.”)
(Yang, Section 2.3.2,  “Vertically federated learning is the process of aggregating these different features and computing the training loss and gradients in a privacy-preserving manner to build a model with data from both parties collaboratively. Under such a federal mechanism, the identity and the status of each participating party is the same, and the federal system helps everyone establish a "common wealth" strategy, which is why this system is called "federated learning.". Therefore, in such a system, we have:

[Equation image: media_image4.png (not reproduced here)]
”; wherein the overlap degree (equivalence) of the user identifiers Ii=Ij exceeds the overlap degree (equivalence) of user characteristics Yi=Yj )
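
For orientation, selecting data "with a same user identifier" from two platforms can be pictured as a join on the shared sample IDs. The sketch below is a minimal plaintext illustration under that assumption; the platform names, feature names, and the function `join_on_shared_ids` are hypothetical, and the encrypted ID alignment that Yang describes is not modeled.

```python
def join_on_shared_ids(platform_a, platform_b):
    """Keep only users present on both platforms and merge their features."""
    shared_ids = sorted(platform_a.keys() & platform_b.keys())
    return {uid: {**platform_a[uid], **platform_b[uid]} for uid in shared_ids}


# Hypothetical platforms: same kind of users, different features.
bank = {"u1": {"savings": 10}, "u2": {"savings": 5}, "u3": {"savings": 7}}
shop = {"u2": {"basket": 3}, "u3": {"basket": 1}, "u4": {"basket": 9}}

# Only u2 and u3 appear on both platforms, so only they enter the training set.
training_set = join_on_shared_ids(bank, shop)
```

Users that appear on only one platform (u1 and u4 here) are left out of the combined set.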

Yang is considered to be analogous to the claimed invention because they are in the same field of federated learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Amalapurapu in view of Szeto to incorporate the teachings of Yang in order to protect data privacy and bolster data security of multi-party systems for a wide variety of applications (Yang, Section 4, “As an innovative modeling mechanism that could train a united model on data from multiple parties without compromising privacy and security of those data, federated learning has a promising application in sales, financial, and many other industries, in which data cannot be directly aggregated for training machine learning models due to factors such as intellectual property rights, privacy protection, and data security.
Take the smart retail as an example. Its purpose is to use machine learning techniques to provide customers with personalized services, mainly including product recommendation and sales services. The data features involved in the smart retail business mainly include user purchasing power, user personal preference, and product characteristics. In practical applications, these three data features are likely to be scattered among three different departments or enterprises. For example, a user’s purchasing power can be inferred from her bank savings and her personal preference can be analyzed from her social networks, while the characteristics of products are recorded by an e-shop. In this scenario, we are facing two problems. First, for the protection of data privacy and data security, data barriers between banks, social networking sites, and e-shopping sites are difficult to break. As a result, data cannot be directly aggregated to train a model. Second, the data stored in the three parties are usually heterogeneous, and traditional machine learning models cannot directly work on heterogeneous data. For now, these problems have not been effectively solved with traditional machine learning methods, which hinder the popularization and application of artificial intelligence in more fields.
Federated learning and transfer learning are the key to solving these problems. First, by exploiting the characteristics of federated learning, we can build a machine learning model for the three parties without exporting the enterprise data, which not only fully protects data privacy and data security, but also provides customers with personalized and targeted services and thereby achieves mutual benefits. Meanwhile, we can leverage transfer learning to address the data heterogeneity problem and break through the limitations of traditional artificial intelligence techniques. Therefore federated learning provides a good technical support for us to build a cross-enterprise, cross-data, and cross-domain ecosphere for big data and artificial intelligence.”)

In regards to claim 3,
Amalapurapu in view of Szeto and Yang teaches The data processing method according to claim 1,
Yang teaches wherein the original data comprises user identifiers and user characteristics, and the combining original data from different data platforms to create a training data set comprises:
selecting data with a same user characteristic in the original data from different data platforms to create the training data set, in the case where an overlap degree of user characteristics exceeds an overlap degree of user identifiers in the original data from different data platforms.
(Yang, Section 2.3, “2.3 A Categorization of Federated Learning
In this section we discuss how to categorize federated learning based on the distribution characteristics of the data.
Let matrix Di denotes the data held by each data owner i. Each row of the matrix represents a sample, and each column represents a feature. At the same time, some data sets may also contain label data. We denote the features space as X, the label space as Y and we use I to denote the sample ID space. For example, in the financial field labels may be users’ credit; in the marketing field labels may be the user’s purchase desire; in the education field, Y may be the degree of the students. The feature X, label Y [user characteristics] and sample Ids I [user identifiers ] constitutes the complete training dataset (I,X,Y). The feature and sample space of the data parties may not be identical, and we classify federated learning into horizontally federated learning, vertically federated learning and federated transfer learning based on how data is distributed among various parties in the feature and sample ID space. Figure 2 shows the various federated learning frameworks for a two-party scenario.”)
(Yang, Section 2.3.1, “In [60], a multi-task style federated learning system is proposed to allow multiple sites to complete separate tasks, while sharing knowledge and preserving security. Their proposed multi-task learning model can in addition address high communication costs, stragglers, and fault tolerance issues. In [41], the authors proposed to build a secure client-server structure where the federated learning system partitions data by users, and allow models built at client devices to collaborate at the server site to build a global federated model. The process of model building ensures that there is no data leakage. Likewise, in [36], the authors proposed methods to improve the communication cost to facilitate the training of centralized models based on data distributed over mobile clients. Recently, a compression approach called Deep Gradient Compression [39] is proposed to greatly reduce the communication bandwidth in large-scale distributed training. We summarize horizontal federated learning as:

[Equation image: media_image5.png (not reproduced here)]
”; wherein the overlap degree (equivalence) of user characteristics between different data platforms is the Yi=Yj and exceeds an overlap degree of user identifiers ie the non-equivalence of Ii and Ij)
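
By contrast, selecting data "with a same user characteristic" can be pictured as keeping the feature columns that all platforms share and stacking the rows of their (different) users. The sketch below is one possible reading, not the method of Yang or of the application; the platform names, feature names, and helper functions are hypothetical.

```python
def feature_names(platform):
    """Collect every feature name that appears on a platform."""
    names = set()
    for row in platform.values():
        names.update(row)
    return names


def stack_on_shared_features(platforms):
    """Keep features common to all platforms; stack every user's row.

    Assumes at least one platform is supplied.
    """
    shared = sorted(set.intersection(*(feature_names(p) for p in platforms.values())))
    rows = []
    for name, platform in platforms.items():
        for uid, row in platform.items():
            rows.append((name, uid, {f: row[f] for f in shared if f in row}))
    return shared, rows


# Hypothetical platforms: different users, largely the same features.
platforms = {
    "bank_north": {"u1": {"age": 30, "savings": 10}, "u2": {"age": 52, "savings": 4}},
    "bank_south": {"u7": {"age": 41, "savings": 2, "loans": 1}},
}

shared, rows = stack_on_shared_features(platforms)
```

The feature that only one platform records (`loans`) is dropped, while all three users are retained.
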
In regards to claim 4,
Amalapurapu in view of Szeto and Yang teaches The data processing method according to claim 1,
Yang teaches wherein the original data comprises user identifiers and user characteristics, and the combining original data from different data platforms to create a training data set comprises:
determining which data platform has original data comprising label features, 
(Yang, Section 2.4.3, “2.4.3 Federated Transfer Learning. Suppose in the above vertical federated learning example, party A and B only have a very small set of overlapping samples and we are interested in learning the labels for all the data set in party A [determining which data platform has original data comprising label features]. The architecture described in the above section so far only works for the overlapping data set. To extend its coverage to the entire sample space, we introduce transfer learning. This does not change the overall architecture shown in Figure 4 but the details of the intermediate results that are exchanged between party A and party B. Specifically, transfer learning typically involves in learning a common representation between the features of party A and B, and minimizing the errors in predicting the labels for the target-domain party by leveraging the labels in the source-domain party (B in this case). Therefore the gradient computations for party A and party B are different from that in the vertical federated learning scenario. At inference time, it still requires both parties to compute the prediction results.”)
in the case where neither an overlap degree of user characteristics nor an overlap degree of user identifiers in original data from different data platforms exceeds a threshold; and
creating the training data set, according to the label features.
(Yang, Section 2.3, “2.3 A Categorization of Federated Learning
In this section we discuss how to categorize federated learning based on the distribution characteristics of the data.
Let matrix Di denotes the data held by each data owner i. Each row of the matrix represents a sample, and each column represents a feature. At the same time, some data sets may also contain label data. We denote the features space as X, the label space as Y and we use I to denote the sample ID space. For example, in the financial field labels may be users’ credit; in the marketing field labels may be the user’s purchase desire; in the education field, Y may be the degree of the students. The feature X, label Y [user characteristics] and sample Ids I [user identifiers ] constitutes the complete training dataset (I,X,Y) [creating the training data set]. The feature and sample space of the data parties may not be identical, and we classify federated learning into horizontally federated learning, vertically federated learning and federated transfer learning based on how data is distributed among various parties in the feature and sample ID space [according to the label features]. Figure 2 shows the various federated learning frameworks for a two-party scenario.”)
(Yang, Section 2.3.3, “2.3.3 Federated Transfer Learning (FTL). Federated Transfer Learning applies to the scenarios that the two data sets differ not only in samples but also in feature space. Consider two institutions, one is a bank located in China, and the other is an e-commerce company located in the United States. Due to geographical restrictions, the user groups of the two institutions have a small intersection. On the other hand, due to the different businesses, only a small portion of the feature space from both parties overlaps. In this case, transfer learning [50] techniques can be applied to provide solutions for the entire sample and feature space under a federation (Figure2c). Specially, a common representation between the two feature space is learned using the limited common sample sets and later applied to obtain predictions for samples with only one-side features. FTL is an important extension to the existing federated learning systems because it deals with problems exceeding the scope of existing federated learning algorithms:

[Equation image: media_image6.png (not reproduced here)]
”; wherein both user characteristics Yi and Yj are not overlapping (equivalent) nor is there an overlapping (equivalence) of user identifiers Ii and Ij)
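
The three cases recited in claims 2 to 4 can be read together as a decision rule over two overlap degrees. The sketch below is only one possible reading: the Jaccard ratio as the overlap measure, the threshold value of 0.3, the order of the checks, and all function names are assumptions and appear in neither the claims nor Yang.

```python
def jaccard(a, b):
    """Overlap degree as |A ∩ B| / |A ∪ B| (an assumed metric)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def choose_strategy(ids_a, ids_b, feats_a, feats_b, threshold=0.3):
    """Pick how to combine two platforms' data (threshold is hypothetical)."""
    id_overlap = jaccard(ids_a, ids_b)
    feat_overlap = jaccard(feats_a, feats_b)
    if id_overlap <= threshold and feat_overlap <= threshold:
        return "transfer"    # claim 4: build the set around label features
    if id_overlap > feat_overlap:
        return "vertical"    # claim 2: select data with the same user identifier
    if feat_overlap > id_overlap:
        return "horizontal"  # claim 3: select data with the same user characteristic
    return "undetermined"    # equal overlaps above the threshold


def platforms_with_labels(platforms, label_key="label"):
    """Name the platforms whose rows carry a label feature."""
    return [name for name, rows in platforms.items()
            if any(label_key in row for row in rows.values())]


vertical_case = choose_strategy({"u1", "u2", "u3"}, {"u2", "u3", "u4"},
                                {"savings"}, {"basket"})
horizontal_case = choose_strategy({"u1", "u2"}, {"u7", "u8"},
                                  {"age", "savings"}, {"age", "savings"})
transfer_case = choose_strategy({"u1", "u2", "u3", "u4", "u5"},
                                {"u5", "u6", "u7", "u8", "u9"},
                                {"a", "b", "c"}, {"c", "d", "e"})
label_holders = platforms_with_labels(
    {"A": {"u1": {"x": 1}}, "B": {"u9": {"z": 2, "label": 1}}}
)
```

In the transfer case, the ID overlap is 1/9 and the feature overlap is 1/5, so neither exceeds the assumed threshold and the label-holding platform (B in the toy data) becomes the reference for building the training set.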

In regards to claim 9,
Amalapurapu in view of Szeto and Yang teaches The data processing method according to claim 1,
Yang teaches wherein the sending the each data subset to each of a plurality of data platforms comprises:
encrypting and sending the each data subset to the each of a plurality of data platforms.
(Yang, Section 2.4.2, “Part 1. Encrypted entity alignment. Since the user groups of the two companies are not the same, the system uses the encryption-based [encrypting] user ID alignment techniques such as [38, 56] to confirm the common users of both parties without A and B exposing their respective data. During the entity alignment, the system does not expose users that do not overlap with each other.
Part 2. Encrypted model training. After determining the common entities, we can use these common entities’ data [sending the each data subset to the each of a plurality of data platforms; wherein the each data subset is determined by the entity alignment for each data platform ie party] to train the machine learning model. The training process can be divided into the following four steps (as shown in Figure 4):
• Step 1: collaborator C creates encryption pairs, send public key to A and B;
• Step 2: A and B encrypt and exchange the intermediate results for gradient and loss calculations;
• Step 3: A and B computes encrypted gradients and adds additional mask, respectively, and B also computes encrypted loss; A and B send encrypted values to C;
• Step 4: C decrypts and send the decrypted gradients and loss back to A and B; A and B unmask the gradients, update the model parameters accordingly.”)
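
The data flow of Steps 2 to 4 can be sketched in plain Python as follows. This is a simplified illustration, not Yang's protocol: encryption is left out entirely (values travel in plaintext, so nothing here is secure), the model is assumed to be linear regression without regularization, and every name and number is hypothetical. The point is only to show how a random mask hides each party's gradient from collaborator C and is removed afterwards.

```python
import random


def dot(theta, row):
    return sum(t * v for t, v in zip(theta, row))


def local_gradient(residuals, rows):
    """Gradient of squared error with respect to one party's own parameters."""
    return [sum(d * row[j] for d, row in zip(residuals, rows))
            for j in range(len(rows[0]))]


def add_mask(values, rng):
    """Return masked values plus the masks needed to undo them."""
    masks = [rng.uniform(-100.0, 100.0) for _ in values]
    return [v + m for v, m in zip(values, masks)], masks


def remove_mask(values, masks):
    return [v - m for v, m in zip(values, masks)]


# Hypothetical aligned data: A holds 2 features, B holds 1 feature and the labels.
x_a, x_b, y = [[1.0, 2.0], [0.0, 1.0]], [[3.0], [1.0]], [1.0, 0.0]
theta_a, theta_b = [0.0, 0.0], [0.0]
rng = random.Random(0)

# Step 2 (encryption omitted): exchange intermediate results, form residuals.
u_a = [dot(theta_a, r) for r in x_a]
u_b = [dot(theta_b, r) for r in x_b]
residuals = [a + b - t for a, b, t in zip(u_a, u_b, y)]

# Step 3: each party computes its gradient and masks it before sending to C.
grad_a, grad_b = local_gradient(residuals, x_a), local_gradient(residuals, x_b)
masked_a, mask_a = add_mask(grad_a, rng)
masked_b, mask_b = add_mask(grad_b, rng)

# Step 4: C would decrypt and return the still-masked values; parties unmask.
returned_a, returned_b = list(masked_a), list(masked_b)  # C's view: masked only
recovered_a = remove_mask(returned_a, mask_a)
recovered_b = remove_mask(returned_b, mask_b)

learning_rate = 0.1
theta_a = [t - learning_rate * g for t, g in zip(theta_a, recovered_a)]
theta_b = [t - learning_rate * g for t, g in zip(theta_b, recovered_b)]
```

Because only the masking party knows its mask, the values passing through C differ from the true gradients, yet each party recovers its own gradient (up to floating-point rounding) and updates only its own parameters.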

In regards to claim 10,
Amalapurapu in view of Szeto and Yang teaches The data processing method according to claim 1, 
Amalapurapu teaches wherein the attributes comprise at least one of spatial attributes, temporal attributes, and corresponding or business attributes of the data.
(Amalapurapu, “[0070] Small—Medium Group IP: if the IP address is shared by 5-20 people, most likely these people have some common interests, demographics, and/or other aspects or attributes that can be reasonably inferred by the cross platform user joining service. For example, such persons may be part of the same Small/Medium Business (SMB) or may be frequent visitors of the same coffee shop or another location [business attributes].”)

In regards to claim 11,
Amalapurapu in view of Szeto and Yang teaches The data processing method according to claim 1, 
Amalapurapu teaches wherein the original data is original electronic text data, a type of the data platform is at least one of a bank data platform or an electronic-commerce data platform, and the original electronic text data are electronic text data storing user-related information and business-related information.
(Amalapurapu, “[0003] For example, a web service can be delivered via a web site. In some cases, the web site can allow users to access content delivered via the web site using anonymous user access. Some web sites can allow or require that users login to access some or all of the content delivered via the web site (e.g., subscription access may be required to access certain content on the web site, such as for an online newspaper, an e-commerce shopping site [electronic-commerce data platform; wherein a cookie is electronic text data], a social networking web site, a web-based email service, a file sharing web site, and/or other web services).”)

Claim 13 is rejected under the same rationale as claim 1 as they are substantially similar.
In regards to claim 14,
Amalapurapu teaches A data processing apparatus, comprising:
a memory; and a processor coupled to the memory, wherein the processor is configured to perform the data processing method according to claim 1 based on instructions stored in the memory.
(Amalapurapu, “[0016] The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.”)
The remaining steps of claim 14 are rejected under the same rationale as the analogous steps of claim 1.

Claim 15 is rejected under the same rationale as claim 14 as they are substantially similar.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US20140279741A1 Sow et al. teaches Scalable online hierarchical meta-learning
US11580380B2 Maloney et al. teaches Systems and methods for distributed training of deep learning models
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASMINE THAI whose telephone number is (703)756-5904. The examiner can normally be reached M-F 8-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.T.T./Examiner, Art Unit 2129

/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129
Prosecution Timeline

Mar 06, 2025: Final Rejection mailed (§103, §112)
May 09, 2025: Request for Continued Examination
May 16, 2025: Response after Non-Final Action
Jun 12, 2025: Non-Final Rejection mailed (§103, §112)
Sep 11, 2025: Response Filed
Oct 27, 2025: Final Rejection mailed (§103, §112)
Dec 26, 2025: Response after Non-Final Action
May 28, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

17/366,773
Patent 12561603
SYSTEM FOR TIME BASED MONITORING AND IMPROVED INTEGRITY OF MACHINE LEARNING MODEL INPUT DATA
4y 7m to grant Granted Feb 24, 2026
17/588,175
Patent 12555000
GENERATION OF CONVERSATIONAL TASK COMPLETION STRUCTURE
4y 0m to grant Granted Feb 17, 2026
17/676,775
Patent 12462154
METHOD AND SYSTEM FOR ASPECT-LEVEL SENTIMENT CLASSIFICATION BY MERGING GRAPHS
3y 8m to grant Granted Nov 04, 2025
17/470,900
Patent 12395590
REDUCTION AND GEO-SPATIAL DISTRIBUTION OF TRAINING DATA FOR GEOLOCATION PREDICTION USING MACHINE LEARNING
3y 11m to grant Granted Aug 19, 2025
17/357,626
Patent 12380361
Federated Machine Learning Management
4y 1m to grant Granted Aug 05, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 4-5
Grant Probability: 25%
With Interview (+56.3%): 81%
Median Time to Grant: 3y 9m (~0m remaining)
PTA Risk: High
Based on 24 resolved cases by this examiner. Grant probability derived from career allowance rate.