Last updated: May 29, 2026
Application No. 18/843,060
INFERENCING ON HOMOMORPHICALLY ENCRYPTED VECTORS AT TRANSFORMER

Non-Final OA §103
Filed
Aug 30, 2024
Priority
Mar 30, 2022 — nonprovisional of PCTCN2022084134
Examiner
LONG, EDWARD X
Art Unit
2439
Tech Center
2400 — Computer Networks
Assignee
Microsoft Technology Licensing, LLC
OA Round
1 (Non-Final)
Interview Optional

— +48.0% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 73% grant rate with +48.0% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.
Based on 185 resolved cases, 2023–2026
Examiner Intelligence

LONG, EDWARD X View full profile →
Grants 73% — above average
Career Allowance Rate
135 granted / 185 resolved
+15.0% vs TC avg
Strong +48% interview lift
Without
With
+48.0%
Interview Lift
resolved cases with interview
Typical timeline
2y 11m
Avg Prosecution
13 currently pending
Career history
207
Total Applications
across all art units
Statute-Specific Performance

§101
0.6%
-39.4% vs TC avg
§103
99.4%
+59.4% vs TC avg
Black line = Tech Center average estimate • Based on career data from 185 resolved cases
Office Action

§103
DETAILED ACTION
This Office Action is in response to the application 18/843,060 filed on 08/30/2024.
Claims 1-20 have been examined and are pending.
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .  This Action is made Non-FINAL.
Priority
The present application is a U.S. National Phase of PCT/CN2022/084134, filed March 30, 2022. 
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 01/03/2025 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements have been considered by the examiner.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically discloses as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 5-8, 12-15, 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Emmadi et al. (“Emmadi,” US 20210367758, published Nov. 25, 2021) in view of Wang et al. (“Wang,” US 20230153381, filed Nov. 17, 2021). 
Regarding claim 1, Emmadi discloses  
A server computing device comprising: a processor configured to (Emmadi [0023]. According to an embodiment of the disclosure, the system 100 is implemented on the client machine 102 and the server 104. The system 100 further comprises an input module 106, one or more hardware processors 108 and a memory 110 in communication with the one or more hardware processors 108 as shown in the block diagram of FIG. 1. The one or more hardware processors 108 work in communication with the memory 110. The one or more hardware processors 108 are configured to execute a plurality of algorithms stored in the memory 110.):
receive a homomorphically encrypted input embedding vector from a client computing device (Emmadi FIG. 1, [0025]-[0026], [0028]. Initially at step 202, the website URL is provided as an input URL to the client machine 102, wherein the input URL is kept at a predefined character length. At step 204, a feature vector is extracted out of the input URL using one of a deep neural network (DNN) based technique, or an n-gram based feature extraction method. At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. Further at step 208, the representative encrypted feature vector of the input URL is sent to the server 104. At step 304, the character level URL and the word level URL are embedded. The matrix URL comprising feature vectors representative of the input URL.); 
at a [transformer] network, generate a plurality of homomorphically encrypted intermediate output vectors at least in part by performing inferencing on the homomorphically encrypted input embedding vector; transmit the plurality of homomorphically encrypted intermediate output vectors to the client computing device (Emmadi [0026]-[0027], [0052]. At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. Further at step 208, the representative encrypted feature vector of the input URL is sent to the server 104. At step 210, the encrypted intermediate computation results are computed by applying at least one of a deep neural network (DNN) based technique, a logistic regression (LR) based technique, or a hybrid of the DNN based technique and the LR based technique on the representative encrypted feature vector. At step 212, the encrypted intermediate computation results are sent to the client machine 102. At step 214, the encrypted intermediate computation results are decrypted using the fully homomorphic encryption (FHE) method. According to an embodiment of the disclosure, the inference component is described as follows: On receiving encrypted feature vector from the client machine, the server 104 passes these encrypted input through a fully connected network to give two 512 dimension feature vectors. Fully connected layer consists of four layers which convert input of 1024 feature vector, into 512, 256, 128 and 2 respectively. The first three layers use ReLU activation function and the output is sent to the client machine 102 which then decrypts and uses Softmax to get the classification results.); 
receive a plurality of homomorphically encrypted [intermediate] input vectors from the client computing device subsequently to transmitting the homomorphically encrypted [intermediate] output vectors to the client computing device (Emmadi FIG. 1, [0025]-[0026], [0028]. [At a later time (i.e., subsequently to receiving an encrypted output vector as in [0052]], [i]nitially at step 202, the website URL is provided as an input URL to the client machine 102, wherein the input URL is kept at a predefined character length. At step 204, a feature vector is extracted out of the input URL using one of a deep neural network (DNN) based technique, or an n-gram based feature extraction method. At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. Further at step 208, the representative encrypted feature vector of the input URL is sent to the server 104. At step 304, the character level URL and the word level URL are embedded. The matrix URL comprising feature vectors representative of the input URL.); 
at the [transformer] network, generate a homomorphically encrypted output vector at least in part by performing [additional] inferencing on the homomorphically encrypted [intermediate] input vectors; and transmit the homomorphically encrypted output vector to the client computing device (Emmadi [0026]-[0027], [0052]. At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. Further at step 208, the representative encrypted feature vector of the input URL is sent to the server 104. At step 210, the encrypted intermediate computation results are computed by applying at least one of a deep neural network (DNN) based technique, a logistic regression (LR) based technique, or a hybrid of the DNN based technique and the LR based technique on the representative encrypted feature vector. At step 212, the encrypted intermediate computation results are sent to the client machine 102. At step 214, the encrypted intermediate computation results are decrypted using the fully homomorphic encryption (FHE) method. According to an embodiment of the disclosure, the inference component is described as follows: On receiving encrypted feature vector from the client machine, the server 104 passes these encrypted input through a fully connected network to give two 512 dimension feature vectors. Fully connected layer consists of four layers which convert input of 1024 feature vector, into 512, 256, 128 and 2 respectively. The first three layers use ReLU activation function and the output is sent to the client machine 102 which then decrypts and uses Softmax to get the classification results.).
Emmadi does not explicitly disclose: a transformer network, intermediate input vector, intermediate output vector. 
Howerver, in an analogous art, Wang discloses a server computing device, comprising: 
a transformer network (Wang [0042]. FIG. 1 is a block diagram illustrating a multi-head attention module in a transformer in accordance with an example of the present disclosure. As illustrated in FIG. 1 , the multi-head attention module 100 may be implemented through linear projections, SDDMMs, softmax, and concatenation.). 
intermediate input vector; intermediate output vector (Wang [0033], [0035], [0042], [0044]. For example, a transformer may have an encoder-decoder structure. The encoder of the transformer and the decoder of the transformer may be respectively implemented on different GPUs. Before loading to the encoder, audio, video, or image data may be pre-stored in a sever, a terminal, or storages in clouds. The encoder of the transformer may include a plurality of stacked encoder layers that process the input iteratively one layer after another, each of which may include a multi-head attention layer and a position-wise fully connected feed-forward layer. FIG. 1 is a block diagram illustrating a multi-head attention module in a transformer in accordance with an example of the present disclosure. As illustrated in FIG. 1 , the multi-head attention module 100 may be implemented through linear projections, SDDMMs, softmax, and concatenation. The multi-head attention module may be the multi-head self-attention module in an encoder layer i or a decoder layer j, or the multi-head cross-attention module in a decoder layer j. In some examples, a layer norm layer will be provided following each linear layer. The first matrix Q1, the second matrix K1, and the third matrix V1 may be respectively intermediate representations of an encoder input of a current encoder layer or a decoder input of a current decoder layer.). 
Therefore, it would have been obvious to one of ordinary skill in the art on or before the effective filing date of the claimed invention to combine the teachings of Emmadi and Wang to include: a transformer network, intermediate input vector, intermediate output vector. One would have been motivated to provide user with a means for securely executing an AI algorithm (e.g., homomorphic encryption) in a cloud setting and across client and server devices. (See Wang [0033], [0044].)
 Regarding claim 2, Emmadi and Wang disclose the device of claim 1. Emmadi further discloses wherein the plurality of homomorphically encrypted [intermediate] input vectors [include a plurality of homomorphically encrypted rectified linear unit (ReLU) output vectors] (Emmadi [0052]-[0053]. According to an embodiment of the disclosure, the inference component is described as follows: On receiving encrypted [see par. [0026] for homomorphic encryption of vectors] feature vector from the client machine, the server 104 passes these encrypted input through a fully connected network to give two 512 dimension feature vectors. Fully connected layer consists of four layers which convert input of 1024 feature vector, into 512, 256, 128 and 2 respectively. The first three layers use ReLU activation function and the output is sent to the client machine 102 which then decrypts and uses Softmax to get the classification results. According to an embodiment of the disclosure, URLnet uses ReLU in the fully connected layers and Softmax in the final output layer.). 
 Wang further discloses wherein the plurality of [homomorphically encrypted] [intermediate] input vectors include a plurality of [homomorphically encrypted rectified linear unit (ReLU)] output vectors (Wang [0033], [0042], [0044]. For example, a transformer may have an encoder-decoder structure. The encoder of the transformer and the decoder of the transformer may be respectively implemented on different GPUs. Before loading to the encoder, audio, video, or image data may be pre-stored in a sever, a terminal, or storages in clouds. FIG. 1 is a block diagram illustrating a multi-head attention module in a transformer in accordance with an example of the present disclosure. As illustrated in FIG. 1 , the multi-head attention module 100 may be implemented through linear projections, SDDMMs, softmax, and concatenation. The first matrix Q1, the second matrix K1, and the third matrix V1 may be respectively intermediate representations of an encoder input of a current encoder layer or a decoder input of a current decoder layer.).
The motivation is the same as that of claim 1 above. 
Regarding claim 3, Emmadi and Wang disclose the device of claim 2. Emmadi further discloses wherein, when performing inferencing on the homomorphically encrypted input embedding vector, the processor is configured to compute an estimated softmax function at least in part by executing a softmax estimation machine learning algorithm (Emmadi [0026], [0028], [0052]. At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. Further at step 208, the representative encrypted feature vector of the input URL is sent to the server 104. At step 304, the character level URL and the word level URL are embedded. Further at step 306, the character level URL and the word level URL are changed into a matrix URL format using an embedding matrix. Fully connected layer consists of four layers which convert input of 1024 feature vector, into 512, 256, 128 and 2 respectively. The first three layers use ReLU activation function and the output is sent to the client machine 102 which then decrypts and uses Softmax to get the classification results.). 
Regarding claim 5, Emmadi and Wang disclose the device of claim 3. Wang further discloses
wherein: the transformer network includes a plurality of encoder layers and a plurality of decoder layers (Wang [0033], [0035], [0040]. For example, a transformer may have an encoder-decoder structure. The encoder of the transformer and the decoder of the transformer may be respectively implemented on different GPUs. The encoder of the transformer may include a plurality of stacked encoder layers that process the input iteratively one layer after another, each of which may include a multi-head attention layer and a position-wise fully connected feed-forward layer. A decoder may include a plurality of stacked decoder layers. For example, the plurality of stacked decoder layers may include decoder layer 1, decoder layer 2, . . . , decoder layer J, where J may be a positive integer.); 
the plurality of encoder layers and the plurality of encoder layers each include a respective plurality of attention heads (Wang [0033], [0035]. For example, a transformer may have an encoder-decoder structure. The encoder of the transformer and the decoder of the transformer may be respectively implemented on different GPUs. The encoder of the transformer may include a plurality of stacked encoder layers that process the input iteratively one layer after another, each of which may include a multi-head attention layer and a position-wise fully connected feed-forward layer.); and 
the processor is configured to compute the estimated softmax function at each of the plurality of attention heads (Wang [0042], [0044]. FIG. 1 is a block diagram illustrating a multi-head attention module in a transformer in accordance with an example of the present disclosure. As illustrated in FIG. 1 , the multi-head attention module 100 may be implemented through linear projections, SDDMMs, softmax, and concatenation. The multi-head attention module may be the multi-head self-attention module in an encoder layer i or a decoder layer j, or the multi-head cross-attention module in a decoder layer j. The multi-head attention module 100 may include a sparse attention module 110 which may include two SDDMM kernels including a first SDDMM kernel 104 and a second SDDMM kernel 106, and a softmax kernel 105.). 
The motivation is the same as that of claim 3 above. 
Regarding claim 6, Emmadi and Wang disclose the device of claim 3. Emmadi further discloses wherein the processor is further configured to:
homomorphically encrypted output vector (Emmadi [0026], [0052]. At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. Fully connected layer consists of four layers which convert input of 1024 feature vector, into 512, 256, 128 and 2 respectively. The first three layers use ReLU activation function and the output is sent to the client machine 102 which then decrypts and uses Softmax to get the classification results.). 
Wang further discloses 
at a final linear layer, receive a decoder layer output from a final decoder layer of the plurality of decoder layers; compute a final linear layer output at the final linear layer based at least in part on the decoder layer output (Wang [0040], [0044], [0058]. A decoder may include a plurality of stacked decoder layers. For example, the plurality of stacked decoder layers may include decoder layer 1, decoder layer 2, . . . , decoder layer J, where J may be a positive integer. A decoder input is fed into a process of decoder embedding first. A decoder embedding output generated by the process of decoder embedding is then sent to the decoder. Each decoder layer may include a plurality of modules including a multi-head attention module. The first matrix Q1, the second matrix K1, and the third matrix V1 may be respectively intermediate representations of an encoder input of a current encoder layer or a decoder input of a current decoder layer. For example, the softmax kernel 205 may apply the softmax function over the first output A as follows: B=softmax (A/√{square root over (d k)}), where dk indicates dimension of a query matrix or a key matrix, such as the first linearly projected matrix Q1′, the second linearly projected matrix K1′, as shown in FIG. 1.); and 
compute the estimated softmax function on the final linear layer output of the final linear layer to compute the [homomorphically encrypted] output vector (Wang [0040], [0044], [0058]. A decoder may include a plurality of stacked decoder layers. For example, the plurality of stacked decoder layers may include decoder layer 1, decoder layer 2, . . . , decoder layer J, where J may be a positive integer. A decoder input is fed into a process of decoder embedding first. A decoder embedding output generated by the process of decoder embedding is then sent to the decoder. Each decoder layer may include a plurality of modules including a multi-head attention module. The first matrix Q1, the second matrix K1, and the third matrix V1 may be respectively intermediate representations of an encoder input of a current encoder layer or a decoder input of a current decoder layer. For example, the softmax kernel 205 may apply the softmax function over the first output A as follows: B=softmax (A/√{square root over (d k)}), where dk indicates dimension of a query matrix or a key matrix, such as the first linearly projected matrix Q1′, the second linearly projected matrix K1′, as shown in FIG. 1.).
The motivation is the same as that of claim 5 above. 
Regarding claim 7, Emmadi and Wang disclose the device of claim 3. Emmadi further discloses
wherein performing inferencing on the homomorphically encrypted input embedding vector includes (Emmadi [0026], [0028], [0052]. At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. At step 304, the character level URL and the word level URL are embedded. The matrix URL comprising feature vectors representative of the input URL. Fully connected layer consists of four layers which convert input of 1024 feature vector, into 512, 256, 128 and 2 respectively. The first three layers use ReLU activation function and the output is sent to the client machine 102 which then decrypts and uses Softmax to get the classification results.), 
receiving a feed-forward network input vector (Emmadi [0026], [0028], [0039].   At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. At step 304, the character level URL and the word level URL are embedded. The matrix URL comprising feature vectors representative of the input URL. The neurons can be arranged in different ways which form different types of neural networks such as feed forward network, fully connected neural network, convolution network, Radial bias function neural network and so on.); 
at a first [linear layer], generating a homomorphically encrypted ReLU input vector based at least in part on the feed-forward network input vector (Emmadi [0026], [0028], [0039], [0052]. At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. At step 304, the character level URL and the word level URL are embedded. The matrix URL comprising feature vectors representative of the input URL. The neurons can be arranged in different ways which form different types of neural networks such as feed forward network, fully connected neural network, convolution network, Radial bias function neural network and so on. The first three layers use ReLU activation function and the output is sent to the client machine 102 which then decrypts and uses Softmax to get the classification results.); 
transmitting the homomorphically encrypted ReLU input vector to the client computing device; subsequently to transmitting the homomorphically encrypted ReLU input vector to the client computing device (Emmadi [0052]-[0053]. According to an embodiment of the disclosure, the inference component is described as follows: On receiving encrypted [see par. [0026] for homomorphic encryption of vectors] feature vector from the client machine, the server 104 passes these encrypted input through a fully connected network to give two 512 dimension feature vectors. Fully connected layer consists of four layers which convert input of 1024 feature vector, into 512, 256, 128 and 2 respectively. The first three layers use ReLU activation function and the output is sent to the client machine 102 which then decrypts and uses Softmax to get the classification results. According to an embodiment of the disclosure, URLnet uses ReLU in the fully connected layers and Softmax in the final output layer.); 
receiving a homomorphically encrypted ReLU output vector from the client computing device (Emmadi [0026], [0028], [0038]. At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. At step 304, the character level URL and the word level URL are embedded. Prominent examples of activation functions include Sigmoid, Tanh, ReLU, leaky ReLU and Softmax. FIG. 4 illustrates neuron computation in FHE setting. FIG. 2 describes DNN based classification using FHE.); 
at a second [linear layer], generating a feed-forward network output vector based at least in part on the homomorphically encrypted ReLU output vector (Emmadi [0039]. The encrypted activation output is the output of the neuron. These neurons are arranged in layers which are collection of neurons operating together to perform a prediction or classification task. The neurons can be arranged in different ways which form different types of neural networks such as feed forward network, fully connected neural network, convolution network, Radial bias function neural network and so on.). 
Wang further discloses 
at each of a plurality of feed- forward networks included in the transformer network (Wang [0035]. The encoder of the transformer may include a plurality of stacked encoder layers that process the input iteratively one layer after another, each of which may include a multi-head attention layer and a position-wise fully connected feed-forward layer.); 
linear layer (Wang [0042]. The multi-head attention module may be the multi-head self-attention module in an encoder layer i or a decoder layer j, or the multi-head cross-attention module in a decoder layer j. In some examples, a layer norm layer will be provided following each linear layer.); 
outputting the feed-forward network output vector to an additional computing process included in the transformer network (Wang [0036], [0038]-[0039]. The input embedding may be obtained by mapping one audio, video, or image feature sequence into an embedding vector based on a word embedding table. Each encoder layer may include a plurality of modules including a multi-head attention module and a feed forward module. The multi-head attention module may implement a process of multi-head attention and the feed forward module may implement a process of feed forward.). 
The motivation is the same as that of claim 2 above. 
Regarding claim 8, Emmadi and Wang disclose the server computing device of claim 1. Emmadi further discloses wherein performing inferencing on the homomorphically encrypted [intermediate] input vectors [includes computing a plurality of layernorm approximations] (Emmadi FIG. 1, [0025]-[0026], [0028]. [At a later time (i.e., subsequently to receiving an encrypted output vector as in [0052]], [i]nitially at step 202, the website URL is provided as an input URL to the client machine 102, wherein the input URL is kept at a predefined character length. At step 204, a feature vector is extracted out of the input URL using one of a deep neural network (DNN) based technique, or an n-gram based feature extraction method. At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. Further at step 208, the representative encrypted feature vector of the input URL is sent to the server 104. At step 304, the character level URL and the word level URL are embedded. The matrix URL comprising feature vectors representative of the input URL.).  
Wang further discloses intermediate input vectors includes computing a plurality of layernorm approximations (Wang [0035]. The encoder of the transformer may include a plurality of stacked encoder layers that process the input iteratively one layer after another, each of which may include a multi-head attention layer and a position-wise fully connected feed-forward layer. A residual connection may be provided around each of the stacked multi-head attention layer and the position-wise fully connected feed-forward layer, followed by layer normalization/layer norm.). 
The motivation is the same as that of claim 1 above. 
Regarding claim 12, Emmadi and Wang disclose the server computing device of claim 1. Emmadi further discloses wherein each computation performed on the homomorphically encrypted input embedding vector and the homomorphically encrypted [intermediate] input vectors during inferencing at the transformer network is an addition or multiplication operation (Emmadi [0026], [0035], [0052]. At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. The two primitive operations, addition and multiplication, can be used to realize any arbitrary computation on ciphertexts. According to an embodiment of the disclosure, the inference component is described as follows: On receiving encrypted feature vector from the client machine, the server 104 passes these encrypted input through a fully connected network to give two 512 dimension feature vectors. The server 104 then passes this through a fully connected network. Fully connected layer consists of four layers which convert input of 1024 feature vector, into 512, 256, 128 and 2 respectively. The first three layers use ReLU activation function and the output is sent to the client machine 102 which then decrypts and uses Softmax to get the classification results.). 
Wang further discloses intermediate input vector (Wang [0033], [0042], [0044]. For example, a transformer may have an encoder-decoder structure. The encoder of the transformer and the decoder of the transformer may be respectively implemented on different GPUs. Before loading to the encoder, audio, video, or image data may be pre-stored in a sever, a terminal, or storages in clouds. FIG. 1 is a block diagram illustrating a multi-head attention module in a transformer in accordance with an example of the present disclosure. As illustrated in FIG. 1 , the multi-head attention module 100 may be implemented through linear projections, SDDMMs, softmax, and concatenation. The first matrix Q1, the second matrix K1, and the third matrix V1 may be respectively intermediate representations of an encoder input of a current encoder layer or a decoder input of a current decoder layer.). 
The motivation is the same as that of claim 1 above. 
Regarding claim 13, claim 13 is directed to a method corresponding to the server computing device of claim 1. Claim 13 is similar in scope to claim 1 and is therefore rejected under similar rationale.
Regarding claim 14, claim 14 is directed to a method corresponding to the server computing device of claim 2. Claim 14 is similar in scope to claim 2 and is therefore rejected under similar rationale.
Regarding claim 15, claim 15 is directed to a method corresponding to the server computing device of claim 3. Claim 15 is similar in scope to claim 3 and is therefore rejected under similar rationale.
Regarding claim 17, claim 17 is directed to a method corresponding to the server computing device of claim 5. Claim 17 is similar in scope to claim 5 and is therefore rejected under similar rationale.
Regarding claim 18, claim 18 is directed to a method corresponding to the server computing device of claim 7. Claim 18 is similar in scope to claim 7 and is therefore rejected under similar rationale.
Regarding claim 19, claim 19 is directed to a method corresponding to the server computing device of claim 8. Claim 19 is similar in scope to claim 8 and is therefore rejected under similar rationale.
Claim 4 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Emmadi et al. (“Emmadi,” US 20210367758, published Nov. 25, 2021) in view of Wang et al. (“Wang,” US 20230153381, filed Nov. 17, 2021) and Antic (“Antic,” US 20210342700, published Nov. 4, 2021). 
Regarding claim 4, Emmadi and Wang disclose the server computing device of claim 3. Emmadi further discloses 
wherein, when computing the estimated softmax function, the processor is further configured to (Emmadi [0052]-[0053].  The first three layers use ReLU activation function and the output is sent to the client machine 102 which then decrypts and uses Softmax to get the classification results. According to an embodiment of the disclosure, URLnet uses ReLU in the fully connected layers and Softmax in the final output layer.), 
transmit a homomorphically encrypted ReLU input vector to the client computing device as a homomorphically encrypted [intermediate] output vector (Emmadi [0052]-[0053].  The first three layers use ReLU activation function and the output is sent to the client machine 102 which then decrypts and uses Softmax to get the classification results. According to an embodiment of the disclosure, URLnet uses ReLU in the fully connected layers and Softmax in the final output layer.); 
receive a homomorphically encrypted ReLU output vector from the client computing device as a [homomorphically encrypted intermediate input vector subsequently to transmitting the homomorphically encrypted ReLU input vector to the client computing device] (Emmadi FIG. 1, [0025]-[0026], [0028],  [0038]. [At a later time (i.e., subsequently to receiving an encrypted output vector as in [0052]], [i]nitially at step 202, the website URL is provided as an input URL to the client machine 102, wherein the input URL is kept at a predefined character length. At step 204, a feature vector is extracted out of the input URL using one of a deep neural network (DNN) based technique, or an n-gram based feature extraction method. At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. Further at step 208, the representative encrypted feature vector of the input URL is sent to the server 104. At step 304, the character level URL and the word level URL are embedded. The matrix URL comprising feature vectors representative of the input URL. This encrypted output is sent to an activation function to get an activation output. Prominent examples of activation functions include Sigmoid, Tanh, ReLU, leaky ReLU and Softmax.); and 
at the softmax estimation machine learning algorithm, compute the estimated softmax function based at least in part on the homomorphically encrypted ReLU output vector (Emmadi [0052]-[0053]. The first three layers use ReLU activation function and the output is sent to the client machine 102 which then decrypts and uses Softmax to get the classification results. According to an embodiment of the disclosure, URLnet uses ReLU in the fully connected layers and Softmax in the final output layer. ReLU is chosen to add non linearity in the network to perform robust prediction. ReLU function is as follows in equation (6): ReLU(x)=max(0,x)  (6), Let y1 . . . yn be the n real numbers the Softmax function normalizes these into a probability distribution having n probabilities as follows in equation (7)…).
In an analogous art, Antic discloses a device, comprising: 
receive a homomorphically encrypted [ReLU] output vector from the client computing device as a homomorphically encrypted intermediate input vector subsequently to transmitting the homomorphically encrypted [ReLU] input vector to the client computing device  (Antic [0018], [0045]-[0046], [0055]. The computing system 100 may represent, for example, a user device such as a desktop, a laptop, a mobile phone, personal entertainment device, DVR, and so on.  In order to train the AI model, the training set of input feature vectors and the training set of output feature vectors may be provided as input to the AI model. In some embodiments, additionally, at step 512, each of the training set of input feature vectors and the training set of output feature vectors may be encrypted based on Homomorphic keys. It may be noted that in such embodiments, the AI model may be trained using encrypted training set of input vectors and encrypted training set of output vectors. In order to train the AI model, the training set of input feature vectors and the training set of output feature vectors may be provided as input to the AI model. It may be noted that the AI model may be trained to generate the training set of output feature vectors based on the training set of input feature vectors as input. It may be noted that if the AI model is being trained using encrypted training set of input vectors and encrypted training set of output vectors, then the method may proceed to an additional step 516, where the encrypted output feature vectors may be decrypted, by the trained AI model, to generate the output data. At step 522, the output error and the training set of input feature vectors may be iteratively fed back into the AI model till the output error is below a predefined threshold. Further, the features of encrypted data presentation may be sent over a communication network, and homomorphic transformations may be applied as part of the service in the cloud. ). 
Therefore, it would have been obvious to one of ordinary skill in the art on or before the effective filing date of the claimed invention to combine the teachings of Emmadi, Wang and Antic to include: receive a homomorphically encrypted ReLU output vector from the client computing device as a homomorphically encrypted intermediate input vector subsequently to transmitting the homomorphically encrypted ReLU input vector to the client computing device. One would have been motivated to provide user with a means for secure AI training and inference across the cloud network including server and client devices. (See Antic [0045].)
Regarding claim 16, claim 16 is directed to a method corresponding to the server computing device of claim 4. Claim 16 is similar in scope to claim 4 and is therefore rejected under similar rationale.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Emmadi et al. (“Emmadi,” US 20210367758, published Nov. 25, 2021) in view of Wang et al. (“Wang,” US 20230153381, filed Nov. 17, 2021) and Boytsov et al. (“Boytsov,” US 20220253447, filed Feb. 8, 2021). 
Regarding claim 9, Emmadi and Wang disclose the server computing device of claim 1. Emmadi and Wang do not explicitly disclose: wherein the processor is configured to compute each of the layernorm approximations elementwise as y^ = x o y+b, where x is an input matrix element, o is a Hadamard product, and y and b are learned affine transform parameters. 
However, in an analogous art, Boytsov discloses 
wherein the processor is configured to compute each of the layernorm approximations elementwise as y^ = x o y+b, where x is an input matrix element, o is a Hadamard product, and y and b are learned affine transform parameters (Boytsov [0112], [0119], [0121]. Lastly, using embeddings for queries and document tokens, the system uses a feed-forward neural-network to compute translation probabilities T(q|d). There are multiple ways to do this. In one implementation the system proceeds as follows: 7. layer-norm is a layer normalization. xºy denotes the Hadamard product between vectors.). 
Therefore, it would have been obvious to one of ordinary skill in the art on or before the effective filing date of the claimed invention to combine the teachings of Emmadi, Wang and Boytsov to include: compute each of the layernorm approximations elementwise as y^ = x o y+b, where x is an input matrix element, o is a Hadamard product, and y and b are learned affine transform parameters. One would have been motivated to provide user with a means for training and executing AI models for outputting a search outcome. (See Boytsov [0108].)
Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Emmadi et al. (“Emmadi,” US 20210367758, published Nov. 25, 2021) in view of Wang et al. (“Wang,” US 20230153381, filed Nov. 17, 2021) and Huang et al. (“Huang,” US 20200410337, published Dec. 31, 2020). 
Regarding claim 10, Wang and Emmadi disclose the method of claim 1. Huang further discloses wherein the transformer network includes a convolution layer downstream of a plurality of attention heads (Huang [0032]-[0033]. In some neural networks, such as a convolutional neural network, a Transformer including multi-head attention models, a multi-layer perceptron, or other neural network models based on tensor operations, large input tensors may be processed to generate new output tensors (e.g., a tensor product). According to certain embodiments, a tensor operation, such as a convolution operation, a multi-head attention operation, or a multi-layer perceptron operation, may be split in certain manners into sub-operations to be performed in parallel by multiple computing engines, such that each computing engine may perform a sub-operation to generate a portion of the final results (e.g., an output tensor) of the tensor operation in a shorter time period.). 
Therefore, it would have been obvious to one of ordinary skill in the art on or before the effective filing date of the claimed invention to combine the teachings of Emmadi, Wang and Huang to include: wherein the transformer network includes a convolution layer downstream of a plurality of attention heads. One would have been motivated to provide user with a means for executing an AI transformer based multi-head attention models for training and inference. (See Huang [0032].)
Regarding claim 11, Wang and Emmadi disclose the method of claim 1. Huang further discloses wherein, at each of a plurality of attention heads included in the transformer network, the processor is configured to perform attention score scaling at a respective query projection layer (Huang [0153], [0155]. An attention function may map a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. A query vector q encodes the word/position that is paying attention. A key vector k encodes the word to which attention is being paid. The key vector k and the query vector q together determine the attention score between the respective words. The output is computed as a weighted sum of values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key. For instance, in the sentence “I like cats more than dogs,” one may want to capture the fact that the sentence compares two entities, while retaining the actual entities being compared. A Transformer may use the multi-head self-attention sub-layer to allow the encoder and decoder to see the entire input sequence all at once. To learn diverse representations, the multi-head attention applies different linear transformations to the values, keys, and queries for each attention head, where different weight matrices may be used for the multiple attention heads and the results of the multiple attention heads may be concatenated together.). 
Therefore, it would have been obvious to one of ordinary skill in the art on or before the effective filing date of the claimed invention to combine the teachings of Emmadi, Wang and Huang to include: at each of a plurality of attention heads included in the transformer network, the processor is configured to perform attention score scaling at a respective query projection layer. One would have been motivated to provide user with a means for executing an AI transformer based multi-head attention models for training and inference based on natural language processing. (See Huang [0155].)
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Emmadi et al. (“Emmadi,” US 20210367758, published Nov. 25, 2021) in view of Antic (“Antic,” US 20210342700, published Nov. 4, 2021). 
Regarding claim 20, Emmadi discloses 
A client computing device comprising: a client device processor configured to (Emmadi [0023]-[0024]. According to an embodiment of the disclosure, the system 100 is implemented on the client machine 102 and the server 104. The one or more hardware processors 108 work in communication with the memory 110. The one or more hardware processors 108 are configured to execute a plurality of algorithms stored in the memory 110.The input module 106 is accessible to the user via smartphones, laptop or desktop configuration thus giving the user the freedom to interact with the system 100 from anywhere anytime.): 
receive a plaintext query; generate an input embedding vector from the plaintext query; homomorphically encrypt the input embedding vector; transmit the homomorphically encrypted input embedding vector to a server computing device; subsequently to transmitting the homomorphically encrypted input embedding vector to the server computing device (Emmadi [0025]-[0026], [0028]. Initially at step 202, the website URL is provided as an input URL to the client machine 102, wherein the input URL is kept at a predefined character length. At step 204, a feature vector is extracted out of the input URL using one of a deep neural network (DNN) based technique, or an n-gram based feature extraction method. At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. Further at step 208, the representative encrypted feature vector of the input URL is sent to the server 104. At step 304, the character level URL and the word level URL are embedded.),  
receive a plurality of homomorphically encrypted rectified linear unit (ReLU) [input] vectors from the server computing device; generate a plurality of ReLU [input] vectors by decrypting the plurality of homomorphically encrypted ReLU [input] vectors (Emmadi [0052]-[0053]. These feature vectors are concatenated to form 1024 dimension feature vector. The server 104 then passes this through a fully connected network. Fully connected layer consists of four layers which convert input of 1024 feature vector, into 512, 256, 128 and 2 respectively. The first three layers use ReLU activation function and the output is sent to the client machine 102 which then decrypts and uses Softmax to get the classification results. According to an embodiment of the disclosure, URLnet uses ReLU in the fully connected layers and Softmax in the final output layer.); 
apply a ReLU function to each of the ReLU input vectors to compute a corresponding plurality of ReLU output vectors; homomorphically encrypt the plurality of ReLU output vectors (Emmadi [0026], [0038]. At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. Further at step 208, the representative encrypted feature vector of the input URL is sent to the server 104. This encrypted output is sent to an activation function to get an activation output. Prominent examples of activation functions include Sigmoid, Tanh, ReLU, leaky ReLU and Softmax.); 
transmit the plurality of homomorphically encrypted ReLU output vectors to the server computing device; subsequently to transmitting the plurality of homomorphically encrypted ReLU output vectors to the server computing device, receive a homomorphically encrypted output vector from the server computing device (Emmadi [0026], [0038], [0052]. At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. Further at step 208, the representative encrypted feature vector of the input URL is sent to the server 104. This encrypted output is sent to an activation function to get an activation output. Prominent examples of activation functions include Sigmoid, Tanh, ReLU, leaky ReLU and Softmax. The server 104 then passes this through a fully connected network. Fully connected layer consists of four layers which convert input of 1024 feature vector, into 512, 256, 128 and 2 respectively. The first three layers use ReLU activation function and the output is sent to the client machine 102 which then decrypts and uses Softmax to get the classification results.); 
compute a plaintext output at least by decrypting the homomorphically encrypted output vector; and output the plaintext output (Emmadi [0026], [0038], [0052]. At step 206, the feature vector is encrypted using a fully homomorphic encryption (FHE) method, wherein encryption results in generation of a representative encrypted feature vector of the input URL. Further at step 208, the representative encrypted feature vector of the input URL is sent to the server 104. This encrypted output is sent to an activation function to get an activation output. Prominent examples of activation functions include Sigmoid, Tanh, ReLU, leaky ReLU and Softmax. The server 104 then passes this through a fully connected network. Fully connected layer consists of four layers which convert input of 1024 feature vector, into 512, 256, 128 and 2 respectively. The first three layers use ReLU activation function and the output is sent to the client machine 102 which then decrypts and uses Softmax to get the classification results.).
Emmadi does not explicitly disclose: receive a plurality of homomorphically encrypted rectified linear unit (ReLU) input vectors from the server computing device. 
However, in an analogous art, Antic discloses a device, comprising: 
receive a plurality of homomorphically encrypted [rectified linear unit (ReLU)] input vectors from the server computing device (Antic [0018], [0045], [0055]. The computing system 100 may represent, for example, a user device such as a desktop, a laptop, a mobile phone, personal entertainment device, DVR, and so on.  In order to train the AI model, the training set of input feature vectors and the training set of output feature vectors may be provided as input to the AI model. It may be noted that the AI model may be trained to generate the training set of output feature vectors based on the training set of input feature vectors as input. It may be noted that if the AI model is being trained using encrypted training set of input vectors and encrypted training set of output vectors, then the method may proceed to an additional step 516, where the encrypted output feature vectors may be decrypted, by the trained AI model, to generate the output data. Further, the features of encrypted data presentation may be sent over a communication network, and homomorphic transformations may be applied as part of the service in the cloud.). 
Therefore, it would have been obvious to one of ordinary skill in the art on or before the effective filing date of the claimed invention to combine the teachings of Emmadi and Antic to include: receive a plurality of homomorphically encrypted rectified linear unit (ReLU) input vectors from the server computing device. One would have been motivated to provide user with a means for secure AI training and inference across the cloud network including server and client devices. (See Antic [0045].)
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWARD LONG whose telephone number is (571)272-8961.  The examiner can normally be reached on Monday to Friday, 9 AM - 6  PM EST (Alternate Fridays).
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Luu Pham can be reached on (571) 270-5002.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 



/EDWARD LONG/
Examiner, Art Unit 2439


/LUU T PHAM/            Supervisory Patent Examiner, Art Unit 2439
Read full office action
Prosecution Timeline

Aug 30, 2024
Application Filed
Jan 27, 2026
Non-Final Rejection mailed — §103
Mar 18, 2026
Interview Requested
Mar 31, 2026
Applicant Interview (Telephonic)
Apr 04, 2026
Examiner Interview Summary
Apr 27, 2026
Response Filed
Precedent Cases

Applications granted by this same examiner with similar technology

18/326,384
Patent 12619704
ESTABLISHMENT OF SIGNING PIPELINES AND VALIDATION OF SIGNED SOFTWARE IMAGES
2y 11m to grant Granted May 05, 2026
17/294,691
Patent 12608467
CONTROLLER SYSTEM, CONTROL APPARATUS, AND NON-TRANSITORY COMPUTER READABLE MEDIUM
4y 11m to grant Granted Apr 21, 2026
18/755,401
Patent 12603775
DATA INTERACTION
1y 9m to grant Granted Apr 14, 2026
18/645,371
Patent 12598090
INFORMATION PROCESSING SYSTEM
1y 11m to grant Granted Apr 07, 2026
18/473,717
Patent 12587387
PROTECTING WEBCAM VIDEO FEEDS FROM VISUAL MODIFICATIONS
2y 6m to grant Granted Mar 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

1-2
Expected OA Rounds
73%
Grant Probability
99%
With Interview (+48.0%)
2y 11m (~1y 2m remaining)
Median Time to Grant
Low
PTA Risk
Based on 185 resolved cases by this examiner. Grant probability derived from career allowance rate.