DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 11/25/2025 have been fully considered but they are not persuasive.
Regarding applicant’s remarks directed to the rejection of the claims under 35 U.S.C. § 102, the arguments are directed to newly amended limitations that were not previously examined. Applicant’s arguments are therefore moot. The examiner refers to the rejection under 35 U.S.C. § 103 in the current Office action for further details.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1, 7-9, 12-14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over CN Pub. No. CN113051586A to Du et al. (“Du”) in view of CN Pub. No. CN110968886A to Zhou Yashun (“Zhou”), and further in view of US Pub. No. US20200366459A1 to Nandakumar et al. (“Nandakumar”).
In regards to claim 1,
Du teaches A training method for a neural network model, comprising: acquiring a feature representation ciphertext of a sample user from each feature provider of at least two feature providers separately, wherein the feature representation ciphertext is determined based on a feature sub-neural network in the each feature provider according to feature data of the sample user on a feature term associated with the each feature provider; determining a tag ciphertext of the sample user,
(Du, Disclosure of Invention, “According to a first aspect of the present disclosure, there is provided a federal modeling system [A training method for a neural network model], the federal modeling system comprises a plurality of modeling device and a characteristic processing device, and disposed in the label and data, wherein the characteristic processing device for obtaining the first sample characteristic and tag value of the label; performing iteration processing to the first sample feature to obtain the intermediate feature vector; The modeling device comprises: a vector receiving module, for receiving the encryption feature vector of the data party, wherein the encryption feature vector is obtained by encrypting the second sample feature after the iterative process by the data party modeling device based on the first public key encryption data of the data party [acquiring a feature representation ciphertext ie encryption feature vector of a sample user from each feature provider ie data party of at least two feature providers separately, wherein the feature representation ciphertext is determined based on a feature sub-neural network ie characteristic processing device/feature processing device wherein Examiner interprets characteristic to be analogous to feature and the characteristic/feature processing device generates the feature vector (thus, the determining is based on the characteristic/feature processing device) in the each feature provider according to feature data ie sample feature of the sample user on a feature term ie the size of the feature vector == feature dimensions and the feature term is the feature dimensions per the specification of the instant application associated with the each feature provider]; a data processing module, for based on the intermediate feature vector, encrypting the feature vector and the tag value [determining a tag ciphertext of the sample user; ie encrypted tag value], calculating and encrypting a plurality of local data corresponding to the first sample feature; a data decryption module for decrypting the local data related to the first public key encryption data in the data party, receiving and decrypting the local data decrypted by the data party to obtain the corresponding target decryption data; training the model to be trained based on the target decryption data to obtain the target model.”)
Du teaches and determining a loss error ciphertext and a gradient ciphertext of a tag neuron in a tag sub-neural network based on the tag sub-neural network according to the feature representation ciphertext and the tag ciphertext;
Examiner’s note: Under the broadest reasonable interpretation (BRI), the Examiner interprets a neuron to be a mathematical computation that receives an input and provides an output.
(Du, Disclosure of Invention, “In step S702, the characteristic processing device of the label party can obtain the first sample characteristic and label value; in step S704, the tag of the feature processing device can perform iterative processing to the first sample feature to obtain the intermediate feature vector; In step S706, the characteristic processing device of the label party can send the intermediate feature vector and label value to the label side of the modeling device; In step S708, the feature processing device of the data party can obtain the second sample feature; In step S710, the characteristic processing device of the data party can perform iterative processing to the second sample characteristic to obtain the dense characteristic vector; In step S712, the feature processing device of the data party can send the dense feature vector to the modeling device of the data party; In step S714, the modeling device of the data party can use the first public key encryption data of the data party to encrypt the dense feature vector to obtain the encrypted feature vector; In step S716, the modeling device of the label party can receive the encrypted feature vector; in step S718, the modeling device of the label party can be based on the intermediate feature vector, the encryption feature vector and the tag value [a tag neuron in a tag sub-neural network ie modeling device based on the tag sub-neural network according to the feature representation ciphertext and the tag ciphertext], calculating and encrypting the loss function [determining a loss error ciphertext], gradient [a gradient ciphertext] and joint calculation gradient of the intermediate parameter; In step S720, the modeling device of the label party can send the loss function after encryption, gradient and joint calculation gradient of the intermediate parameter to the modeling device of the data party; In step S722, the modeling device of the data party can encrypt data related to the first public key loss function; the gradient is decrypted; In step S724, the modeling device of the data party can calculate and encrypt the gradient of the data party based on the intermediate parameter; In step S726, the modeling device of the data party to the first public key encryption data related loss function, gradient after decrypting, the modeling device of the label party can receive and delete the loss function, the mask data in the gradient; In step S728, the modeling device of the label party can decrypt the gradient of the data party associated with the encryption of the second public key; in the step S730, the modeling device of the label party can decrypt the gradient of the data party associated with the second public key encryption, the modeling device of the data party can receive and delete the mask data in the gradient of the data party; In step S732, the characteristic processing device of the data party can receive the gradient of the data party after decryption, and calculating the model parameter gradient in the characteristic processing device of the data party; In step S734, the label side characteristic processing device can receive the gradient of the label side after decryption, and calculating the model parameter gradient in the characteristic processing device of the label party; in step S736, the modelling device of the label party based on the loss function after decryption, gradient training model to be trained to obtain the target model.”)
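For illustration only, the data flow in steps S718–S720 quoted above can be sketched as follows (Python, assuming the python-paillier "phe" package and its Paillier keypair/encrypt/decrypt interface; all variable names and values are hypothetical and are not terminology from Du or from the claims). A single linear tag neuron forms an encrypted prediction from encrypted feature representations, an encrypted loss error against the encrypted tag, and an encrypted gradient, using only ciphertext addition and plaintext-scalar multiplication.
```python
# Illustrative sketch only (not Du's or applicant's implementation): a single
# linear "tag neuron" operating on additively homomorphic Paillier ciphertexts,
# assuming the python-paillier package (pip install phe). Squared loss is used.
from phe import paillier

pub, priv = paillier.generate_paillier_keypair()

# Each feature provider encrypts the output of its feature sub-network
# (the "feature representation ciphertext") with the shared public key.
provider_a_repr = [0.7, -0.2]          # hypothetical plaintext outputs, provider A
provider_b_repr = [1.3]                # hypothetical plaintext output, provider B
enc_features = [pub.encrypt(v) for v in provider_a_repr + provider_b_repr]

# The tag provider encrypts the label ("tag ciphertext").
enc_tag = pub.encrypt(1.0)

# Label/tag party: its weights are plaintext, so the encrypted prediction is a
# plaintext-weighted sum of ciphertexts (scalar * ciphertext, ciphertext + ciphertext).
weights, bias = [0.5, -0.1, 0.3], 0.05
pred_ct = pub.encrypt(bias)
for w, x_ct in zip(weights, enc_features):
    pred_ct = pred_ct + x_ct * w

# Encrypted loss error for squared loss L = (y - z)^2, and the encrypted
# gradient of the bias term: dL/db = -2 * (y - z).
error_ct = enc_tag - pred_ct
grad_bias_ct = error_ct * (-2.0)

# In the quoted protocol these ciphertexts are masked and routed to the holder
# of the private key; here we simply decrypt to check the arithmetic.
print(priv.decrypt(error_ct), priv.decrypt(grad_bias_ct))
```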
Du teaches controlling the each feature provider to decrypt the gradient ciphertext of the tag neuron to obtain a decryption result
(Du, Disclosure of Invention, “In step S722, the modeling device of the data party can encrypt data related to the first public key loss function; the gradient is decrypted…In step S728, the modeling device of the label party can decrypt the gradient of the data party associated with the encryption of the second public key; in the step S730, the modeling device of the label party can decrypt the gradient of the data party associated with the second public key encryption, the modeling device of the data party can receive and delete the mask data in the gradient of the data party”; wherein the federal modeling system ‘controls’ each data party to decrypt the gradient ciphertext to obtain a decrypted gradient)
Du teaches and updating a network parameter of the tag neuron according to the decryption result acquired from the each feature provider;
(Du, Disclosure of Invention, “in step S736, the modelling device of the label party based on the loss function after decryption, gradient training model to be trained to obtain the target model.”; wherein training based on the decrypted gradient to obtain the target model includes updating network parameters of the model)
Du teaches and using a tag neuron connected to a feature neuron in the feature sub-neural network as an association neuron of the feature sub-neural network, sending a loss error ciphertext of the association neuron to the each feature provider,
(Du, Disclosure of Invention, “In step S720, the modeling device of the label party [using a tag neuron connected to a feature neuron in the feature sub-neural network as an association neuron of the feature sub-neural network] can send the loss function after encryption, gradient and joint calculation gradient of the intermediate parameter to the modeling device of the data party [sending a loss error ciphertext ie loss after encryption of the association neuron to the each feature provider];”)
Du teaches decrypting, by the each feature provider, the loss error ciphertext to obtain a loss error plaintext and updating a network parameter of the feature neuron according to the loss error plaintext,
(Du, Disclosure of Invention, “Optionally, the data decryption module further includes: a parameter determining unit for determining a decrypted loss function [decrypting, by the each feature provider, the loss error ciphertext to obtain a loss error plaintext]; the function judgment unit is used for finishing iterative update processing aiming at the parameters of the model to be trained under the condition that the decrypted loss function is smaller than a loss threshold value so as to obtain a target model; and if the loss function is larger than the loss threshold value, iteratively updating the parameters of the model to be trained [updating a network parameter of the feature neuron according to the loss error plaintext].”)
Du teaches wherein the to-be-trained neural network model comprises at least two feature sub- neural networks and the tag sub-neural network.
(Du, Disclosure of Invention, “in the technical solutions provided by some embodiments of the present disclosure, the federal modeling system includes [wherein the to-be-trained neural network model comprises] a plurality of modeling apparatuses [the tag sub-neural network] and feature processing apparatuses [at least two feature sub- neural networks], and is deployed on a tag side and a data side, where the feature processing apparatus is configured to obtain a first sample feature and a tag value of the tag side, and perform iterative processing on the first sample feature to obtain an intermediate feature vector; the modeling apparatus includes: the vector receiving module is used for receiving the encrypted characteristic vector of the data side, wherein the encrypted characteristic vector is obtained by encrypting the second sample characteristic after the iterative processing by the modeling device of the data side based on the first public key encrypted data of the data side;”)
Du teaches wherein the feature representation ciphertext is obtained by performing homomorphic encryption on a feature representation plaintext of the sample user,
(Du, Disclosure of Invention, “In the exemplary embodiment of the present disclosure, the encryption feature vector can be the data party of the modelling device based on the first public key encryption data of the data party to the second sample feature after the iteration processing to be encrypted. The invention can use homomorphic encryption method to encrypt the second sample characteristic.”)
Du teaches the feature representation plaintext is an output result of the feature sub- neural network with regard to the feature data,
(Du, Disclosure of Invention, “According to a first aspect of the present disclosure, there is provided a federal modeling system, the federal modeling system comprises a plurality of modeling device and a characteristic processing device, and disposed in the label and data, wherein the characteristic processing device for obtaining the first sample characteristic and tag value of the label; performing iteration processing to the first sample feature to obtain the intermediate feature vector [the feature representation plaintext is an output result of the feature sub- neural network with regard to the feature data];”)
Du teaches and the tag ciphertext is obtained by performing homomorphic encryption on tag data of the sample user
(Du, Disclosure of Invention, “a data processing module, for based on the intermediate feature vector, encrypting the feature vector and the tag value [and the tag ciphertext is obtained by performing homomorphic encryption on tag data of the sample user; wherein the tag value is encrypted and Du previously taught an encryption method as homomorphic encryption], calculating and encrypting a plurality of local data corresponding to the first sample feature”)
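As a brief, non-limiting illustration of the additive homomorphic property relied upon in the mapping above (Python sketch assuming the python-paillier "phe" package; the values are hypothetical), sums and plaintext-scalar products computed on ciphertexts decrypt to the corresponding results on the plaintexts:
```python
# Illustrative sketch only, assuming the python-paillier package (pip install phe).
from phe import paillier

pub, priv = paillier.generate_paillier_keypair()

enc_repr = pub.encrypt(0.8)   # hypothetical feature representation value, encrypted
enc_tag = pub.encrypt(1.0)    # hypothetical tag/label value, encrypted

# Additively homomorphic: Enc(a) + Enc(b) decrypts to a + b, and
# k * Enc(a) decrypts to k * a, without exposing a or b in between.
assert abs(priv.decrypt(enc_repr + enc_tag) - 1.8) < 1e-6
assert abs(priv.decrypt(enc_repr * 3.0) - 2.4) < 1e-6
```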
Du teaches wherein the determining the loss error ciphertext and the gradient ciphertext of the tag neuron in the tag sub-neural network based on the tag sub-neural network according to the feature representation ciphertext and the tag ciphertext
Examiner’s note: Under the broadest reasonable interpretation (BRI), the Examiner interprets a neuron to be a mathematical computation that receives an input and provides an output.
(Du, Disclosure of Invention, “In step S702, the characteristic processing device of the label party can obtain the first sample characteristic and label value; in step S704, the tag of the feature processing device can perform iterative processing to the first sample feature to obtain the intermediate feature vector; In step S706, the characteristic processing device of the label party can send the intermediate feature vector and label value to the label side of the modeling device; In step S708, the feature processing device of the data party can obtain the second sample feature; In step S710, the characteristic processing device of the data party can perform iterative processing to the second sample characteristic to obtain the dense characteristic vector; In step S712, the feature processing device of the data party can send the dense feature vector to the modeling device of the data party; In step S714, the modeling device of the data party can use the first public key encryption data of the data party to encrypt the dense feature vector to obtain the encrypted feature vector; In step S716, the modeling device of the label party can receive the encrypted feature vector; in step S718, the modeling device of the label party can be based on the intermediate feature vector, the encryption feature vector and the tag value [a tag neuron in a tag sub-neural network ie modeling device based on the tag sub-neural network according to the feature representation ciphertext and the tag ciphertext], calculating and encrypting the loss function [determining a loss error ciphertext], gradient [a gradient ciphertext] and joint calculation gradient of the intermediate parameter; In step S720, the modeling device of the label party can send the loss function after encryption, gradient and joint calculation gradient of the intermediate parameter to the modeling device of the data party; In step S722, the modeling device of the data party can encrypt data related to the first public key loss function; the gradient is decrypted; In step S724, the modeling device of the data party can calculate and encrypt the gradient of the data party based on the intermediate parameter; In step S726, the modeling device of the data party to the first public key encryption data related loss function, gradient after decrypting, the modeling device of the label party can receive and delete the loss function, the mask data in the gradient; In step S728, the modeling device of the label party can decrypt the gradient of the data party associated with the encryption of the second public key; in the step S730, the modeling device of the label party can decrypt the gradient of the data party associated with the second public key encryption, the modeling device of the data party can receive and delete the mask data in the gradient of the data party; In step S732, the characteristic processing device of the data party can receive the gradient of the data party after decryption, and calculating the model parameter gradient in the characteristic processing device of the data party; In step S734, the label side characteristic processing device can receive the gradient of the label side after decryption, and calculating the model parameter gradient in the characteristic processing device of the label party; in step S736, the modelling device of the label party based on the loss function after decryption, gradient training model to be trained to obtain the target model.”)
However, Du does not explicitly teach the each feature provider provides the feature data of the sample user and does not provide tag data of the sample user, and a tag provider provides the tag data of the sample user and does not provide the feature data of the sample user, [wherein the determining the loss error ciphertext and the gradient ciphertext of the tag neuron in the tag sub-neural network based on the tag sub-neural network according to the feature representation ciphertext and the tag ciphertext] comprises: obtaining an activation value ciphertext of the tag neuron by forward propagation based on a tag hidden layer and an output layer in the tag sub-neural network according to the feature representation ciphertext of the sample user acquired from the at least two feature providers; determining the loss error ciphertext of the tag neuron by backpropagation according to the activation value ciphertext of the tag neuron and the tag ciphertext of the sample user; and determining the gradient ciphertext of the tag neuron according to the loss error ciphertext of the tag neuron
Zhou teaches the each feature provider provides the feature data of the sample user and does not provide tag data of the sample user, and a tag provider provides the tag data of the sample user and does not provide the feature data of the sample user,
(Zhou, Detailed Description, “Fig. 8 is a flowchart of an example of a screening method for training samples of a machine learning model proposed in an embodiment of the present specification. As shown in fig. 8, the samples include a positive sample and a negative sample, the tag value corresponding to the positive sample is 1, the tag value corresponding to the negative sample is 0, the tag provider performs homomorphic encryption on the tag value 1 and the tag value 0, respectively, to obtain a corresponding tag value ciphertext, and sends the tag value ciphertext to the feature provider [a tag provider provides the tag data of the sample user].
And the feature provider establishes a corresponding relation between the class-type feature value x and the corresponding tag value ciphertext according to the class-type feature value x of the sample and the tag value ciphertext corresponding to the sample, adds the tag value ciphertexts corresponding to the class-type feature value x to generate the feature ciphertext corresponding to the class-type feature value x, and sends the feature ciphertext corresponding to each class-type feature value to the tag provider [the each feature provider provides the feature data of the sample user].
And the label provider determines the number of the sample cases with the category characteristic value of x according to the characteristic ciphertext corresponding to the category characteristic value of x. And sending the number of the positive examples with the class type characteristic value of x and the total number of the positive examples to a characteristic provider.
And the feature provider performs value evaluation on the category type feature variable according to the total number of the samples, the number of the positive example samples corresponding to each category type feature value and the total number of the positive example samples. And the feature provider screens the sample according to the evaluation result.”)
[Zhou, translated Figure 8: flowchart of the training-sample screening method]
(Zhou, Abstract, “Therefore, the feature provider cannot acquire the label value corresponding to each sample [the each feature provider …does not provide tag data of the sample user], and the label provider cannot acquire the category feature value corresponding to each sample [a tag provider … does not provide the feature data of the sample user], so that the privacy data of the user is prevented from being leaked, and the data information safety of the user is protected.”; see translated Figure 8 above; the Examiner notes that the claim recites that the feature provider does not provide tag data (it does not recite that it cannot acquire/receive tag data) and the tag provider does not provide feature data (it does not recite that it cannot acquire/receive feature data); thus, the data transfers and local storage of Zhou (as seen in Figure 8) can be applied to the system of Du)
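For illustration only, the Zhou mechanism quoted above can be sketched as follows (Python, assuming the python-paillier "phe" package; sample identifiers, labels, and feature values are hypothetical). The tag provider encrypts each 0/1 label; the feature provider, which sees only label ciphertexts, adds them per categorical feature value; and the tag provider decrypts only the per-value sums (the positive-example counts).
```python
# Illustrative sketch only, assuming the python-paillier package (pip install phe).
from collections import defaultdict
from phe import paillier

pub, priv = paillier.generate_paillier_keypair()

# Tag provider: homomorphically encrypt each sample's 0/1 label.
labels = {"s1": 1, "s2": 0, "s3": 1, "s4": 1}                 # tag provider only
enc_labels = {sid: pub.encrypt(y) for sid, y in labels.items()}

# Feature provider: knows each sample's categorical feature value, but only
# sees label *ciphertexts*; it adds them per feature value.
feature_value = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}  # feature provider only
group_ct = defaultdict(lambda: pub.encrypt(0))
for sid, x in feature_value.items():
    group_ct[x] = group_ct[x] + enc_labels[sid]

# Tag provider: decrypts only the aggregated counts of positive examples per
# feature value, never the per-sample feature values themselves.
positives_per_value = {x: priv.decrypt(ct) for x, ct in group_ct.items()}
print(positives_per_value)   # e.g. {'A': 1, 'B': 2}
```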
Nandakumar teaches comprises: obtaining an activation value ciphertext of the tag neuron by forward propagation based on a tag hidden layer and an output layer in the tag sub-neural network according to the feature representation ciphertext of the sample user acquired from the at least two feature providers; determining the loss error ciphertext of the tag neuron by backpropagation according to the activation value ciphertext of the tag neuron and the tag ciphertext of the sample user; and determining the gradient ciphertext of the tag neuron according to the loss error ciphertext of the tag neuron.
Examiner notes: The feature and tag sub-neural networks and providers are taught by Du; Nandakumar is relied upon to teach forward propagation/backpropagation and determining a gradient ciphertext in an encrypted learning scheme.
(Nandakumar, "[0065] In this part of this document, the deep learning model and the components of possible solutions are described that are needed to achieve full homomorphic encryption for learning and inference. In this document, the primarily focus is on supervised deep learning, where the broad objective is to learn a non-linear mapping between the inputs (e.g., training samples) [according to the feature representation ciphertext of the sample user acquired from the at least two feature providers; wherein Nandakumar teaches learning on encrypted inputs (see figure 2) and the encrypted inputs can be provided by Du] and the outputs (e.g., class labels of the training samples). Deep learning models are typically implemented as multi-layer neural networks, which allows higher-level abstract features to be computed as non-linear functions of lower-level features (starting with the raw data). FIG. 1 shows an exemplary neural network (NN) with two hidden layers. The NN 100 is a deep neural network (DNN), which is typically defined as a NN with multiple hidden layers. The input layer is x1, . . . , xd-1, xd the output layer has nodes y1, . . . , yc, there are two hidden layers shown, and weights (e.g., as matrices) W1, W2, and W3 are shown [based on a tag hidden layer and an output layer in the tag sub-neural network; wherein the tag layers would be referring to the tag layers in the neural networks provided by Du]. The output of each node in the network (where a node is also known as a neuron) is computed by applying a non-linear activation function to the weighted average of its inputs, which includes a bias term (shown darkened) that always emits value 1. The output vector of neurons in layer l(l=1, 2, …., L) is obtained as the following
a_l = f(W_l a_(l-1)), for l = 1, 2, …, L
Where f is the activation function, w is the weight matrix of layer l, and L is the total number of layers in the network…
[0069] where L is the loss function computed over the mini-batch B and α is the learning rate. The error or loss value at the output layer is computed based on the forward pass [forward propagation], while backpropagation [backpropagation according to the activation value ciphertext of the tag neuron and the tag ciphertext of the sample user] is used to propagate this error back through the network.
[0072] Two key steps in the algorithm require computing “complex” functions (such as exponentiation, and the like). These two steps are (i) computation of the activation function ƒ and its derivative, and (ii) computation of the loss function L and its derivative [determining the loss error ciphertext of the tag neuron… determining the gradient ciphertext of the tag neuron according to the loss error ciphertext of the tag neuron]. The “natural” approaches for computing these functions homomorphically, are either to approximate them by low-degree polynomials (e.g., using their Taylor expansion), or by pre-computing them in a table and performing homomorphic table lookup [obtaining an activation value ciphertext; ie computing the activation function homomorphically]. Namely, for a function ƒ that is needed to be computed, it is possible to pre-compute (in the clear) a table Tf such that Tf [x]=ƒ(x) for every X in some range. Subsequently, given the encryptions of the (bits of) x, homomorphic table lookup is performed to get the (bits of) value Tf [x]. Following Crawford et al. (see Crawford, J. L. H.; Gentry, C.; Halevi, S.; Platt, D.; and Shoup, V., “Doing real work with FHE: the case of logistic regression”, IACR Cryptology ePrint Archive 2018:202, 2018), the second approach is adopted here. This is faster and shallower when it is applicable, but it can only be used to get a low-precision approximation of these functions. In order to avoid the use of too many table lookups, a sigmoid activation function and a quadratic loss function are used, which have simpler derivatives.”)
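For illustration only, the two options Nandakumar describes for evaluating such functions under encryption can be sketched in the clear as follows (Python; the degree-3 Taylor polynomial is one example of the "low-degree polynomial" option, and the quantized table illustrates the "table lookup" option that the quoted passage states is adopted; the step size and range shown are hypothetical):
```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Option (i): low-degree polynomial approximation around 0
# (Taylor expansion of the sigmoid: 1/2 + x/4 - x^3/48 + ...),
# which an HE scheme can evaluate with additions and multiplications only.
def sigmoid_poly(x):
    return 0.5 + x / 4.0 - x**3 / 48.0

# Option (ii): precompute a lookup table T_f with T_f[x] = f(x) over a
# quantized input range, then perform (homomorphic) table lookup at run time.
STEP = 0.125
table = {round(i * STEP, 3): sigmoid(i * STEP) for i in range(-64, 65)}

def sigmoid_table(x):
    key = round(round(x / STEP) * STEP, 3)        # quantize to the table grid
    return table[max(min(key, 8.0), -8.0)]        # clamp to the tabulated range

for x in (-1.0, 0.0, 0.5, 1.5):
    print(x, sigmoid(x), sigmoid_poly(x), sigmoid_table(x))
```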
Zhou is considered to be analogous to the claimed invention because they are in the same field of multi-party learning on encrypted data. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Du to incorporate the teachings of Zhou in order to prevent the privacy data of the user from being leaked by limiting unnecessary data transfers (Zhou, Abstract, “Therefore, the feature provider cannot acquire the label value corresponding to each sample, and the label provider cannot acquire the category feature value corresponding to each sample, so that the privacy data of the user is prevented from being leaked, and the data information safety of the user is protected.”)
Nandakumar is considered to be analogous to the claimed invention because they are in the same field of multi-party learning on encrypted data. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Du to incorporate the teachings of Nandakumar in order to provide a fully homomorphic encryption scheme to protect privacy and data security while learning a network (Nandakumar, “[0058] Perhaps surprisingly, evidence is provided herein that in some setting, it may be fast enough to support even the demanding training phase of deep neural networks, in certain situations. To the best of our knowledge, this is the first demonstration that fully homomorphic encryption can be used not just for inferencing but also for training. The design approaches presented in this document demonstrate the feasibility of FHE to protect privacy and data security while learning a network.”)
In regards to claim 7,
Du in view of Zhou and Nandakumar teaches The method according to claim 1,
wherein controlling the each feature provider to decrypt the gradient ciphertext of the tag neuron to obtain the decryption result and updating the network parameter of the tag neuron according to the decryption result acquired from the each feature provider comprise: adding a random mask to the gradient ciphertext of the tag neuron to obtain a gradient masked ciphertext;
(Du, Disclosure of Invention, “Optionally, the data processing module comprises: the public key determining unit is used for determining second public key encrypted data of the label party; a data encryption unit for encrypting the intermediate parameter based on the second public key encryption data; and adding a mask data to the loss function and the gradient respectively for encryption [adding a random mask to the gradient ciphertext of the tag neuron to obtain a gradient masked ciphertext].”)
Du teaches sending the gradient masked ciphertext to any one feature provider of the at least two feature providers and decrypting, by the any one feature provider, the gradient masked ciphertext to obtain a gradient masked plaintext;
and acquiring the gradient masked plaintext from the any one feature provider,
removing the random mask from the gradient masked plaintext to obtain a gradient plaintext of the tag neuron and updating the network parameter of the tag neuron by using the gradient plaintext of the tag neuron.
(Du, Disclosure of Invention, “Optionally, the data decryption module includes: the data receiving unit is used for receiving the local data decrypted by the data side as intermediate decrypted data after the data side decrypts the local data related to the first public key encrypted data [sending the gradient masked ciphertext to any one feature provider of the at least two feature providers and decrypting, by the any one feature provider, the gradient masked ciphertext to obtain a gradient masked plaintext; and acquiring the gradient masked plaintext from the any one feature provider wherein decrypting the gradient masked data is thus acquiring it at the respective feature provider]; the data decryption unit is used for decrypting intermediate decryption data related to the second public key encryption data and deleting mask data in the intermediate decryption data to obtain corresponding target decryption data [removing the random mask from the gradient masked plaintext to obtain a gradient plaintext of the tag neuron]; and the data modeling unit is used for iteratively updating the parameters of the model to be trained based on the target decryption data to obtain the target model [and updating the network parameter of the tag neuron by using the gradient plaintext of the tag neuron].”)
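For illustration only, the masked-decryption round mapped above can be sketched as follows (Python, assuming the python-paillier "phe" package; in this toy setup the feature provider is modeled as the holder of the Paillier private key, and all names and values are hypothetical). The label/tag party masks the gradient ciphertext with a random value, the feature provider decrypts only the masked value, and the label/tag party removes the mask to recover the gradient plaintext and update the tag-neuron parameter.
```python
# Illustrative sketch only, assuming the python-paillier package (pip install phe).
import secrets
from phe import paillier

# In this toy setup the feature provider holds the Paillier private key,
# so the label/tag party can only operate on ciphertexts.
pub, feature_provider_priv = paillier.generate_paillier_keypair()

grad_ct = pub.encrypt(0.42)               # gradient ciphertext of the tag neuron

# Label/tag party: add a random mask homomorphically (ciphertext + plaintext scalar).
mask = secrets.randbelow(10**6) / 1000.0
masked_ct = grad_ct + mask

# Feature provider: decrypts only the *masked* gradient; it never learns the
# true gradient because the mask is unknown to it.
masked_plain = feature_provider_priv.decrypt(masked_ct)

# Label/tag party: removes the mask and applies a plain gradient step.
grad_plain = masked_plain - mask
weight, lr = 0.5, 0.1
weight -= lr * grad_plain
print(round(grad_plain, 6), round(weight, 6))
```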
Claim 8 is rejected on the same grounds under 35 U.S.C. 103 as claim 1.
Claim 9 is rejected on the same grounds under 35 U.S.C. 103 as claim 1.
Claim 12 is rejected on the same grounds under 35 U.S.C. 103 as claim 7.
Claim 13 is rejected on the same grounds under 35 U.S.C. 103 as claim 1 (see the teachings of Nandakumar).
In regards to claim 14,
Du teaches An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to execute a training method for a neural network model, and the training method comprises:
(Du, Disclosure of Invention, “According to a fifth aspect of the present disclosure, there is provided an electronic apparatus comprising: one or more processors; a storage device to store one or more programs that, when executed by the one or more processors, cause the one or more processors to implement a federated modeling approach as described above or a federated model prediction approach as described above.”)
The rest of the steps of claim 14 are taught by the analogous steps of claim 1.
Claim 20 is rejected on the same grounds under 35 U.S.C. 103 as claim 14.
Claim(s) 3-4, 10, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Du in view of Zhou and Nandakumar, and further in view of US Pub. No. US20210241112A1 to Dai et al. (“Dai”).
In regards to claim 3,
Du in view of Zhou and Nandakumar teaches The method according to claim 1,
Dai teaches wherein the feature sub-neural network comprises a feature input layer and at least one feature hidden layer,
[Dai, FIG. 2: input layer and first hidden layer]
(Dai, “[0115] First, an input layer [a feature input layer] and a first hidden layer [at least one feature hidden layer] are shown in FIG. 2.
[0116] Here, it is assumed that, in the training phase for the current task, the first two channels in the first hidden layer are active channels while the last two channels are inactive channels.
For the active channels in the first hidden layer, they are configured to receive information of channel variables of all channels of the input layer from all the channels of the input layer. Based on the information and current connection parameters (weights) for the connections between the active channels in the first hidden layer and all the channels of the input layer, connection parameters are updated and the updated connection parameters are generated to complete the training phase of the current task.”)
Dai teaches and the tag sub- neural network comprises at least one tag hidden layer and the output layer;
[Dai, FIGS. 4A and 4B: L-th hidden layer and output layer]
(Dai, “[0125] FIGS. 4A and 4B show an L-th hidden layer [one tag hidden layer] and an output layer [the output layer]. Here, it is assumed that L is the total number of hidden layers of the neural network model, that is, the L-th hidden layer is the last hidden layer. In multi-head learning, each task corresponds to a specific output layer channel. In FIG. 4A, the output layer channels used for the training phase of the current task are circled in the form of a box.”)
Dai teaches and wherein the method further comprises: acquiring a number of feature neurons in a tail feature hidden layer of the at least one feature hidden layer of the feature sub-neural network from the each feature provider separately;
(Dai, “[0116] Here, it is assumed that, in the training phase for the current task, the first two channels in the first hidden layer are active channels while the last two channels are inactive channels.
For the active channels in the first hidden layer, they are configured to receive information of channel variables of all channels of the input layer from all the channels of the input layer. Based on the information and current connection parameters (weights) for the connections between the active channels in the first hidden layer and all the channels of the input layer, connection parameters are updated and the updated connection parameters are generated to complete the training phase of the current task [acquiring a number of feature neurons ie active channels in a tail feature hidden layer of the at least one feature hidden layer of the feature sub-neural network from the each feature provider separately wherein the feature vectors from each data party of Du can be provided separately (ie acquiring new data and training the network (See Dai, Figure 1)]. Furthermore, in the training phase of the subsequent task, the updated connection parameters remain unchanged. In an embodiment, the connections between the active neurons in the first hidden layer and each neuron in the previous network layer remain unchanged in the training procedure.”)
Dai teaches and determining a number of tag neurons in a head tag hidden layer of the at least one tag hidden layer according to the number of feature neurons.
(Dai, “[0125] FIGS. 4A and 4B show an L-th hidden layer and an output layer. Here, it is assumed that L is the total number of hidden layers of the neural network model, that is, the L-th hidden layer is the last hidden layer. In multi-head learning, each task corresponds to a specific output layer channel. In FIG. 4A, the output layer channels used for the training phase of the current task are circled in the form of a box.
For the channels used for the training phase of the current task in the output layer, they are configured to receive the information for indicating the channel variables of the active channels of the L-th hidden layer from the active channels of the L-th hidden layer [determining a number of tag neurons ie active channels in a head tag hidden layer of the at least one tag hidden layer according to the number of feature neurons]. Based on the information and the current connection parameters for the connections between the channels used for the current task in the output layer and the active channels of the L-th hidden layer, the updated connection parameters are generated to complete the training phase of the current task. In the training phase of the subsequent task, the updated connection parameters remain unchanged. Furthermore, the connection parameters for the connections between the channels used for the current task in the output layer and the inactive channels of the L-th hidden layer are set to 0 and are fixed to 0 in the training phase of the subsequent task. In an embodiment, if the output layer is a multi-head output layer, the connections between the active neurons in the output layer and the active neurons in the previous network layer remain unchanged in the training procedure; the connections between the active neurons in the output layer and the inactive neurons in the previous network layer are 0 and remain unchanged in the training procedure; the connections between the inactive neurons in the output layer and each neuron in the previous network layer are trainable in the training procedure. Here, the multi-head output layer means an output layer to which at least one new output neuron that does not interact with an existing output neuron during the learning process is added.”)
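For orientation only, the dimensioning relationship mapped above can be sketched as follows (plain Python; the layer widths and the one-neuron-per-input policy are hypothetical and are not taken from Dai or from the claims): the head tag hidden layer on the tag side is sized from the number of feature neurons in each provider's tail feature hidden layer.
```python
# Hypothetical illustration only: sizing the head tag hidden layer from the
# tail feature hidden layers reported by each feature provider.
tail_widths = {"feature_provider_1": 4, "feature_provider_2": 3}

# The head tag hidden layer receives the concatenated feature representations,
# so its input dimension is the sum of the reported tail-layer widths.
head_input_dim = sum(tail_widths.values())

# One simple (purely illustrative) policy: one tag neuron per incoming
# feature neuron in the head tag hidden layer.
num_head_tag_neurons = head_input_dim

# With a fully connected head layer, every head tag neuron is connected to each
# provider's tail feature neurons, i.e., it can serve as an "association neuron"
# for that provider in the sense mapped above.
association_neurons = {p: list(range(num_head_tag_neurons)) for p in tail_widths}

print(head_input_dim, num_head_tag_neurons, association_neurons)
```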
Dai is considered to be analogous to the claimed invention because they are reasonably pertinent to the problem the inventor faced of retraining a neural network with new data and determining which neurons to utilize. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Du to incorporate the teachings of Dai in order to provide a method of continual learning to overcome catastrophic forgetting (Dai, “[0003] Traditional machine learning is performed for fixed tasks, that is, the dataset used to train a neural network (also called a neural network model or a learning model) contains training data with a fixed distribution. When a new dataset (e.g., a dataset containing training data with a new distribution different from the fixed distribution) is input, it is generally necessary to retrain the neural network. After retraining, the neural network can give a response to the new dataset, but may not give a response to the original dataset (e.g., the dataset containing the fixed class of data). This issue is called “Catastrophic Forgetting” in machine learning. In fact, the “Catastrophic Forgetting” is the result of the “Stability-Plasticity Dilemma” faced by machine learning, where the stability refers to the ability to maintain original knowledge while learning new knowledge and the plasticity refers to the ability to learn new knowledge. [0004] Continual learning is to perform training for a continuous sequence composed of multiple different tasks on a neural network. Continual learning aims to solve the above-mentioned “Catastrophic Forgetting” issue. More specifically, it maintains the performance of the neural network in completing historical tasks while training the neural network based on new input data to adapt to new tasks. Continual learning is the key to continuously adjust a neural network to rapidly changing learning tasks, so it is very critical to realize the application of artificial intelligence in real scenarios. Therefore, there is a need to optimize the neural network to better maintain the performance on historical tasks while adapting to new tasks.”)
In regards to claim 4,
Du in view of Zhou, Nandakumar, and Dai teaches The method according to claim 3, wherein the using the tag neuron connected to the feature neuron in the feature sub-neural network as the association neuron of the feature sub-neural network comprises:
Dai teaches selecting, from the tag neurons in the head tag hidden layer, a tag neuron connected to a feature neuron of the feature neurons in the tail feature hidden layer of the feature sub-neural network and using the selected tag neuron as the association neuron of the feature sub-neural network.
(Dai, “[0125] FIGS. 4A and 4B show an L-th hidden layer and an output layer. Here, it is assumed that L is the total number of hidden layers of the neural network model, that is, the L-th hidden layer is the last hidden layer. In multi-head learning, each task corresponds to a specific output layer channel. In FIG. 4A, the output layer channels used for the training phase of the current task are circled in the form of a box.
For the channels used for the training phase of the current task in the output layer, they are configured to receive the information for indicating the channel variables of the active channels of the L-th hidden layer from the active channels of the L-th hidden layer [selecting, from the tag neurons in the head tag hidden layer, a tag neuron ie active channel connected to a feature neuron of the feature neurons in the tail feature hidden layer of the feature sub-neural network and using the selected tag neuron as the association neuron of the feature sub-neural network; wherein identifying the active channel is selecting a “tag neuron”]. Based on the information and the current connection parameters for the connections between the channels used for the current task in the output layer and the active channels of the L-th hidden layer, the updated connection parameters are generated to complete the training phase of the current task. In the training phase of the subsequent task, the updated connection parameters remain unchanged. Furthermore, the connection parameters for the connections between the channels used for the current task in the output layer and the inactive channels of the L-th hidden layer are set to 0 and are fixed to 0 in the training phase of the subsequent task. In an embodiment, if the output layer is a multi-head output layer, the connections between the active neurons in the output layer and the active neurons in the previous network layer remain unchanged in the training procedure; the connections between the active neurons in the output layer and the inactive neurons in the previous network layer are 0 and remain unchanged in the training procedure; the connections between the inactive neurons in the output layer and each neuron in the previous network layer are trainable in the training procedure. Here, the multi-head output layer means an output layer to which at least one new output neuron that does not interact with an existing output neuron during the learning process is added.”)
Claim 10 is rejected on the same grounds under 35 U.S.C. 103 as claim 3.
Claim 16 is rejected on the same grounds under 35 U.S.C. 103 as claim 3.
Claim 17 is rejected on the same grounds under 35 U.S.C. 103 as claim 4.
Claim(s) 5, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Du in view of Zhou and Nandakumar, and further in view of WIPO Pub. No. WO2021114927A1 to Lu et al. (“Lu”).
In regards to claim 5,
Du in view of Zhou and Nandakumar teaches The method according to claim 1,
Lu teaches further comprising: acquiring a candidate user identifier associated with the each feature provider from the at least two feature providers separately;
calculating an intersection of candidate user identifiers associated with the at least two feature providers to obtain a common user identifier;
(Lu, Detailed ways, “In the process of information value calculation, data party A gets the ID of data party B which is encrypted by keyB and the corresponding Fb feature box, but this data is sufficiently secret for data party A, because: 1) The ID obtained by data party A is encrypted by keyB, and data party A cannot know the corresponding original ID behind it, and therefore cannot match the Fb binning result with the real ID; 2) binning information used when calculating the value of the information It is irrelevant to the order of bins, so the identification of the bin where the data party B transmits to the data party A can be in disorder (can be implemented when the order of the second encryption ID is disrupted), or the identification of the bin is just one Code name, so that data party A cannot know the order of feature size corresponding to the bins; 3) Each bin of the feature contains K IDs, which is equivalent to the information obtained by data party A about the characteristics of data party B is anonymized by K , The information of any ID has at least K IDs that are the same. Data party A also gets the result of the second encryption of data party A's ID. This encrypted ID has been shuffled by B and does not carry any additional information that can be identified. Therefore, data party A only knows these IDs. They are all the results obtained after their own ID is encrypted, and there is a one-to-one correspondence, but the correspondence relationship is not clear. After the data party A gets the two pieces of data [acquiring a candidate user identifier associated with the each feature provider from the at least two feature providers separately], they perform matching, intersections [calculating an intersection of candidate user identifiers associated with the at least two feature providers to obtain a common user identifier], and operations. These operations are equivalent to performing in an ID encrypted space, and the corresponding relationship between the encrypted space and the original space is unknown (this mapping relationship must have two The two keys keyA and keyB of the party can be known), therefore, the calculation is safe. Similarly, it can be seen that the data available to data party B is not enough for data party B to derive data information of data party A.”)
Lu teaches and sending the common user identifier to the at least two feature providers to determine the sample user based on the common user identifier.
(Lu, Detailed ways, “According to the first aspect, there is provided a method for multi-party joint feature evaluation to protect privacy. The multi-party includes at least a first device and a second device. The first device stores a first sample set and a label of each sample therein. , The second device stores a second sample set, and the method is applied to the first device; the method includes: using the first key to encrypt the initial ID of each sample in the first sample set to obtain the first sample set The first encrypted ID of each sample; send the first exchange information to the second device, which includes at least the first encrypted ID and tag of each sample in the first sample set; receive respectively from the second device The second exchange information and the third exchange information, wherein the second exchange information includes: the second device uses a second key to perform secondary encryption on the first encrypted ID of each sample in the first sample set The second encrypted ID and the corresponding label obtained later, and the relative order of the samples in the second exchange information has been disturbed by the second device; the third exchange information includes, for each of the second sample set A sample, the first encrypted ID obtained by the second device encrypting its initial ID based on the second key and the identification of the first sub-box where the sample is located, the identification of the first sub-box is determined by the The second device performs binning based on the feature value of the first feature of each sample in the second sample set; uses the first key to perform secondary encryption on the first encrypted ID of each sample in the third exchange information , Get the first encrypted set; based on the second encrypted ID in the second exchange information and the second encrypted ID in the first encrypted set [sending the common user identifier to the at least two feature providers], determine the common sample of the first sample set and the second sample set [determine the sample user based on the common user identifier]; based on the common sample The label of each sample in the sample and the identification of the first bin in which it is located determine the information value of the first feature for feature selection for the machine learning model.”)
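For illustration only, the double-encryption intersection that Lu describes can be sketched as follows (Python; toy parameters, not a secure implementation, and the commutative-exponentiation scheme shown is a generic stand-in rather than Lu's exact construction). Each party deterministically encrypts hashed user identifiers with its own secret exponent, the parties exchange and re-encrypt, and because (h^a)^b = (h^b)^a mod p the doubly encrypted identifiers can be intersected to find the common users.
```python
# Illustrative sketch only: commutative-encryption intersection of user IDs.
import hashlib
import secrets

# Toy public prime (2^127 - 1, a Mersenne prime); illustrative parameters only.
P = (1 << 127) - 1

def hash_to_group(user_id: str) -> int:
    h = int.from_bytes(hashlib.sha256(user_id.encode()).digest(), "big")
    return (h % (P - 1)) + 1          # nonzero element mod P

def enc(value: int, key: int) -> int:
    return pow(value, key, P)         # commutative: enc(enc(x, a), b) == enc(enc(x, b), a)

key_a = secrets.randbelow(P - 3) + 2  # provider A's secret exponent
key_b = secrets.randbelow(P - 3) + 2  # provider B's secret exponent

ids_a = {"user1", "user2", "user3"}   # hypothetical candidate user identifiers
ids_b = {"user2", "user3", "user4"}

# Each side encrypts its own IDs once and sends them to the other side,
# which encrypts them a second time with its own key.
double_a = {enc(enc(hash_to_group(u), key_a), key_b): u for u in ids_a}
double_b = {enc(enc(hash_to_group(u), key_b), key_a) for u in ids_b}

# Intersection in the doubly encrypted space recovers the common user IDs.
common = {u for ct, u in double_a.items() if ct in double_b}
print(common)                         # {'user2', 'user3'}
```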
Lu is considered to be analogous to the claimed invention because they are in the same field of multi-party joint learning on encrypted data. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Du to incorporate the teachings of Lu in order to protect privacy and security by providing a method of multi-party joint feature evaluation (Lu, summary of the invention, “One or more embodiments of this specification describe a method and device for multi-party joint feature evaluation to protect privacy and security, which can calculate the characteristics of users shared by both parties when the other party is unknown to the user and the tag and feature data are isolated.”)
Claim 11 is rejected on the same grounds under 35 U.S.C. 103 as claim 5.
Claim 18 is rejected on the same grounds under 35 U.S.C. 103 as claim 5.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASMINE THAI whose telephone number is (703)756-5904. The examiner can normally be reached M-F 8-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.T.T./Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129