Last updated: May 29, 2026

Application No. 18/962,738

TRANSFORMER BLOCK BASED OBFUSCATION

Non-Final OA §102

Filed

Nov 27, 2024

Priority

Nov 27, 2023 — provisional 63/603,061

Examiner

SALEHI, HELAI

Art Unit

2433

Tech Center

2400 — Computer Networks

Assignee

Protopia AI Inc.

OA Round

1 (Non-Final)

Interview Optional

Examiner has a relatively high allowance rate (73%); +32.2% interview lift. A written response may suffice.

Based on 527 resolved cases, 2023–2026

Examiner Intelligence

SALEHI, HELAI

Grants 73% — above average

Career Allowance Rate

383 granted / 527 resolved

+14.7% vs TC avg

Strong +32% interview lift

+32.2%

Interview Lift

resolved cases with interview

Typical timeline

3y 5m

Avg Prosecution

14 currently pending

Career history

543

Total Applications

across all art units

Statute-Specific Performance

§101

2.3%

-37.7% vs TC avg

§103

72.6%

+32.6% vs TC avg

§102

23.4%

-16.6% vs TC avg

§112

0.2%

-39.8% vs TC avg

Based on career data from 527 resolved cases

Office Action

§102

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This initial office action has been issued in response to patent application 18/962,738, filed on 27 November 2024 with a provisional priority date of 27 November 2023. Claims 1-20, as originally filed, are currently pending and have been considered below.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.



Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by David (US 2023/0259787 A1, published 08/17/2023).

Claim 1:
With respect to claim 1, David discloses a method (Figure 4) (A set of obfuscated embeddings $\tilde{X}_{\text{emb}}$ 914 may be generated based on a learned transformation $T_{\phi}$, such that $\tilde{X}_{\text{emb}} := T_{\phi}(X_{\text{emb}})$. The set of obfuscated embeddings $\tilde{X}_{\text{emb}}$ 914 may protect privacy of information, 0134, Figure 9B) comprising:
obtaining, by a computer system, a machine learning model, the machine learning model comprising at least one transformer block (deep learning models such as the transformer, 0106) (transformer encoder 922, Figure 9B);
generating, by the computer system, one or more estimators based on the at least one transformer block (trained model, e.g., a vector of weights or thresholds, may be stored in memory and later retrieved for application to new calculations on newly calculated aggregate estimates. 0082),
wherein at least one estimator comprises a mean shift estimator (the stochastic layer may be a stochastic convolutional layer with a first filter that corresponds to the mean of a normal distribution and a second filter that corresponds to the standard deviation of the normal distribution. The second data may include data (e.g., data indicating the mean of the normal distribution) that is generated by convolving the first filter over an input image. In this example, the second data may include data (e.g., data indicating the standard deviation of the normal distribution) that is generated by convolving the second filter over the input image, 0063) (the mean generated via the first filter and the standard deviation generated via the second filter (e.g., as discussed above) may be used to sample one or more values. The one or more values may be used as input into a subsequent layer. The subsequent layer may be a stochastic layer (e.g., a stochastic convolution layer, stochastic fully connected layer, stochastic activation layer, stochastic pooling layer, stochastic batch normalization layer, stochastic embedding layer, or a variety of other stochastic layers) or a non-stochastic layer (e.g., convolution, fully-connected, activation, pooling, batch normalization, embedding, or a variety of other layers), 0064); and
wherein at least one estimator comprises a dispersion shift estimator (The obfuscator performing the transformation may be characterized as an unsupervised machine learning model. In order to determine a maximum noise that may be applied to the dataset D using gradient descent (such as stochastic gradient descent or other gradient based optimization) or another appropriate method, an autoencoder may be trained on the dataset D, 0033) (by application of stochastic noise, 0134);
training, by the computer system, the one or more estimators to obfuscate input data for the machine learning model (obfuscate training data. sufficiently accurate machine learning models may be trained using the transformed or obfuscated data set, 0031) (The autoencoder may instead or additionally be a neural network or other machine learning algorithm that generates embeddings. The autoencoder may be trained on a set of training data., 0055); and
storing, by the computer system, the trained one or more estimators in memory (The obfuscated data may include quasi-synthetic data, or multiple elements corresponding to different applications of stochastic noise to the same element of the un-obfuscated dataset. The obfuscated data may be stored. The parameters of the noise used to create the obfuscated data may be stored. The parameters of the autoencoder, with or without the noise, may be stored. 0058).
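The passage quoted from David at 0063-0064 describes sampling values from a normal distribution whose mean and standard deviation come from two separate filters. The following is a minimal Python sketch of that sampling step only, offered as an illustration and not as David's or the applicant's implementation. The function name `sample_stochastic_layer` and the example values are hypothetical, and the convolution that would produce the mean and standard deviation is omitted.

```python
import random


def sample_stochastic_layer(means, std_devs, rng):
    """Draw one value per position from N(mean, std_dev**2).

    Hypothetical helper: `means` stands in for the output of the
    "first filter" and `std_devs` for the output of the "second filter".
    """
    if len(means) != len(std_devs):
        raise ValueError("means and std_devs must have the same length")
    # Reparameterized draw: mean + std_dev * eps, with eps ~ N(0, 1).
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(means, std_devs)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
means = [0.5, -1.0, 2.0]

# With zero standard deviation the draw collapses to the deterministic mean.
deterministic = sample_stochastic_layer(means, [0.0, 0.0, 0.0], rng)

# With nonzero standard deviation each value is perturbed around its mean.
noisy = sample_stochastic_layer(means, [0.1, 0.1, 0.1], rng)
```

In this sketch the mean term behaves like a deterministic shift and the standard deviation term like a stochastic spread. That is one way to read the distinction between the claimed mean shift and dispersion shift estimators, although the office action does not state the mapping in these terms.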

Claim 2:
With respect to claim 2, David discloses wherein the mean shift estimator applies a deterministic mean shift (augment otherwise deterministic autoencoders and/or neural networks with stochastic conditional noise layers, 0041) (the autoencoder comprises a deterministic layer, Claim 1).

Claim 3:
With respect to claim 3, David discloses the method of claim 1, wherein the dispersion shift estimator applies a stochastic based on a sequence-dependent measure of dispersion (The obfuscator performing the transformation may be characterized as an unsupervised machine learning model. In order to determine a maximum noise that may be applied to the dataset D using gradient descent (such as stochastic gradient descent or other gradient based optimization) or another appropriate method, an autoencoder may be trained on the dataset D, 0033) (by application of stochastic noise, 0134).

Claim 4:
With respect to claim 4, David discloses wherein training the one or more estimators comprises training the one or more estimators by an optimization function (the obfuscation transform may be trained based on an optimization function, 0148) (adjust other machine learning models to reduce the error function, e.g., with a greedy algorithm that optimizes for the current iteration. The resulting, trained model, e.g., a vector of weights or thresholds, may be stored in memory and later retrieved for application to new calculations on newly calculated aggregate estimates., 0082).

Claim 5:
With respect to claim 5, David discloses wherein the optimization function comprises a first portion corresponding to a maximization of noise applied by at least some of the one or more estimators (to determine a maximum noise that may be applied to the dataset D using gradient descent (such as stochastic gradient descent or other gradient based optimization) or another appropriate method, an autoencoder may be trained on the dataset D, 0033) (the maximum noise variance may be determined as described herein and applied to one or more intermediate layers of a machine learning model, maximum reconstruction loss value may depend on the type of model as a subsequent machine learning model which is to be trained on the obfuscated data, 0068-0069).


Claim 6:
With respect to claim 6, David discloses wherein the optimization function comprises a second portion corresponding to a minimization of a performance loss for the machine learning model operating on obfuscated data output by the estimators (Data obfuscation may be presented as a gradient-based optimization that defines a loss function over a pre-trained machine learning model. This loss may be defined as finding the minimum perturbation (noise) over the input to the model that causes minimum reconstruction losses, 0046).
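Claims 5 and 6 together recite an optimization function with one portion that maximizes applied noise and another that minimizes performance loss on obfuscated data. A minimal sketch of such a two-term objective follows. The specific form (task loss minus a weighted mean log standard deviation), the name `obfuscation_objective`, and the `noise_weight` parameter are all assumptions made for illustration. Neither the claims nor the quoted passages of David specify this formula.

```python
import math


def obfuscation_objective(task_loss, noise_std_devs, noise_weight=0.1):
    """Combine a performance-loss term with a noise-reward term.

    Lower is better. The first term penalizes task error on obfuscated
    data; the second rewards larger noise through the mean log std dev.
    The functional form and the weight are illustrative assumptions.
    """
    if not noise_std_devs or any(s <= 0 for s in noise_std_devs):
        raise ValueError("noise_std_devs must be non-empty and positive")
    mean_log_std = sum(math.log(s) for s in noise_std_devs) / len(noise_std_devs)
    return task_loss - noise_weight * mean_log_std


# Same task loss, different noise levels: more noise gives a lower objective.
low_noise = obfuscation_objective(task_loss=0.30, noise_std_devs=[0.5, 0.5])
high_noise = obfuscation_objective(task_loss=0.30, noise_std_devs=[2.0, 2.0])
```

Minimizing a single scalar of this kind trades the two portions off against each other. The weight controls how much accuracy may be given up for additional noise.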

Claim 7:
With respect to claim 7, David discloses wherein training the one or more estimators further comprises:
generating a student machine learning model, the student machine learning model comprising the one or more estimators;
generating a teacher machine learning model, the teacher machine learning model not comprising the one or more estimators; and
training the one or more estimators by an optimization function based on the student machine learning model and the teacher machine learning model (One or more machine-learning models that are discussed above (e.g., in connection with FIG. 5 or the technical documentation) may be implemented, for example, as shown in FIG. 6. With respect to FIG. 6, machine-learning model 642 may take inputs 644 and provide outputs 646. 0075) (Figure 6) (Other models may also be used to account for the acquisition of information over time to predict future events, e.g., various recurrent neural networks, like long-short-term memory models trained on gradient descent after loop unrolling, reinforcement learning models, and time-series transformer architectures with multi-headed attention, 0082) (given a foundation model that generates representations of the source data (e.g., the input data, the data of the data collection, or another data store), a transformation, which may be stochastic, may be learned (e.g., trained) which is a significant transform (e.g., obfuscating) in the input space (e.g., on the data of the data collection), 0096).
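The claim 7 limitation can be pictured as two copies of one model: a teacher that sees raw input and a student that sees input passed through the estimators first. The sketch below is a toy illustration under that reading, not the applicant's method. The doubling model, the fixed mean shift standing in for a trained estimator, and the names `teacher_model`, `make_student_model`, and `mean_squared_error` are hypothetical.

```python
def teacher_model(inputs):
    """Toy stand-in for the unmodified model (hypothetical)."""
    return [2.0 * v for v in inputs]


def make_student_model(mean_shift):
    """Build a student: an estimator followed by the same toy model.

    The estimator here is only a fixed additive shift, an assumption
    used to keep the example small.
    """
    def student_model(inputs):
        obfuscated = [v + mean_shift for v in inputs]
        return teacher_model(obfuscated)
    return student_model


def mean_squared_error(a, b):
    """Average squared difference between two equal-length outputs."""
    if len(a) != len(b) or not a:
        raise ValueError("outputs must be non-empty and the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


inputs = [1.0, 2.0, 3.0]
teacher_out = teacher_model(inputs)

# No shift: the student reproduces the teacher exactly.
gap_no_shift = mean_squared_error(teacher_out, make_student_model(0.0)(inputs))

# A shift of 0.5 moves every output by 1.0, so the gap is 1.0.
gap_with_shift = mean_squared_error(teacher_out, make_student_model(0.5)(inputs))
```

An optimization function "based on the student and the teacher" could use a gap of this kind as its training signal. The claim itself does not say which measure is used.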

Claim 8:
With respect to claim 8, David discloses wherein the optimization function comprises a similarity optimization function between the student machine learning model and the teacher machine learning model (may use a triplet loss network or Siamese networks to compute similarity between out-of-sample records and example records in a training set, e.g., determining based on cosine distance, Manhattan distance, or Euclidian distance of corresponding vectors in an encoding space (e.g., with more than 5 dimensions, such as more than 50). 0083) (Both encoders may then be trained such that the cosine similarity between the encoded image and its encoded label description is maximized, while any incorrect label descriptions have minimal cosine similarity., 0130).



Claim 9:
With respect to claim 9, David discloses wherein the optimization function comprises a distance distillation optimization function between the student machine learning model and the teacher machine learning model (formulations have been created to distill necessary information into a data obfuscation process. 0113) (to compute similarity between out-of-sample records and example records in a training set, e.g., determining based on cosine distance, Manhattan distance, or Euclidian distance of corresponding vectors in an encoding space (e.g., with more than 5 dimensions, such as more than 50), 0083).
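The passages cited for claims 8 and 9 name three vector measures from David 0083: cosine distance, Manhattan distance, and Euclidean distance. The sketch below computes each one for a pair of example output vectors. The labels `teacher_vec` and `student_vec` and their values are hypothetical. David's paragraph 0083, as quoted, applies these measures to records in a training set and not to teacher and student models.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, nonzero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


teacher_vec = [1.0, 2.0, 3.0]   # hypothetical teacher output
student_vec = [2.0, 4.0, 6.0]   # hypothetical student output, same direction

cos_sim = cosine_similarity(teacher_vec, student_vec)
l1 = manhattan_distance(teacher_vec, student_vec)
l2 = euclidean_distance(teacher_vec, student_vec)
```

The example vectors point in the same direction but differ in length, so the cosine similarity is approximately 1 while both distances are nonzero. The two kinds of measure are therefore not interchangeable, which may matter when comparing a "similarity" function (claim 8) with a "distance" function (claim 9).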

Claim 10:
With respect to claim 10, David discloses wherein at least one of the student machine learning model and the teacher machine learning model comprises a truncated machine learning model (The tuning of the foundation model 1030 may be performed based on the original training data set (e.g., the training data set used to create the foundation model) or a subset thereof, or based on a different training data set (which may be a customization training data set). 0139).

Claim 11:
With respect to claim 11, David discloses wherein the optimization function comprises a third portion corresponding to a minimization of a performance loss for the machine learning model operating on obfuscated data output by the estimators (Data obfuscation may be presented as a gradient-based optimization that defines a loss function over a pre-trained machine learning model. This loss may be defined as finding the minimum perturbation (noise) over the input to the model that causes minimum reconstruction losses, 0046).

Claim 12:
With respect to claim 12, David discloses wherein the one or more estimators are substantially the same as the at least one transformer block (Various autoencoders may be used, including transformer architectures, 0033) (The transformer may be a building block for many foundation models: such as BERT and GPT-3 for language, and even ViT for vision., 0106) (performing step B can include the same computing device within the computer system performing both steps, 0153).

Claim 13:
With respect to claim 13, David discloses wherein at least one of the one or more estimators is non-causal (a non-stochastic layer (e.g., convolution, fully-connected, activation, pooling, batch normalization, embedding, or a variety of other layers). 0064).

Claim 14:
With respect to claim 14, David discloses wherein an attention mechanism of at least one of the one or more estimators is different than an attention mechanism of the transformer block (time-series transformer architectures with multi-headed attention, 0082).

Claim 15:
With respect to claim 15, David discloses wherein a flow mechanism of at least one of the one or more estimators is different than a flow mechanism of the transformer block (Dashed lines in FIG. 9A may represent a flow for obfuscated images. Figure 9A) (Figure 10).

Claim 16:
With respect to claim 16, David discloses wherein the one or more estimators operate on embeddings corresponding to input data for the machine learning model (a word embedding layer 910, which operates to generate embeddings, 0134, Figure 9B).

Claim 17:
With respect to claim 17, David discloses further comprising deploying, within a trusted network, the trained one or more estimators to obfuscate input data, wherein the obfuscated input data is transmitted to the machine learning model over an untrusted network (un-obfuscated training data may reside at a “trusted” computing device, and training may be performed on an “untrusted” computing device, The obfuscated training data may be provided to the untrusted destination where model training continues on the obfuscated data, 0042) (The obfuscation transform 1040 may be deployed, such as within an enterprise device 1070 (or otherwise within a secure or trusted computation unit, such as on a client device which obtains video recording), such that the sensitive data 1064 is obfuscated from the inference data D 1062 to produce inference data D′ 1080, from which private information is removed or otherwise obfuscated. The inference data D′ 1080 may then be transmitted to an unsecured or untrusted site, such as to cloud 1050 where a tuned version of the foundation model 1030 operates, 0144).

Claim 18:
With respect to claim 18, David discloses wherein training the one or more estimators comprises training the one or more estimators with the machine learning model on a trusted network, the method further comprising deploying the trained one or more estimators to provide obfuscated input data to the machine learning model over an untrusted network (The obfuscation transform 1040 may be deployed, such as within an enterprise device 1070 (or otherwise within a secure or trusted computation unit, such as on a client device which obtains video recording), such that the sensitive data 1064 is obfuscated from the inference data D 1062 to produce inference data D′ 1080, from which private information is removed or otherwise obfuscated. The inference data D′ 1080 may then be transmitted to an unsecured or untrusted site, such as to cloud 1050 where a tuned version of the foundation model 1030 operates, 0144).

Claim 19:
With respect to claim 19, David discloses a system (Figures 4, 10) (A set of obfuscated embeddings $\tilde{X}_{\text{emb}}$ 914 may be generated based on a learned transformation $T_{\phi}$, such that $\tilde{X}_{\text{emb}} := T_{\phi}(X_{\text{emb}})$. The set of obfuscated embeddings $\tilde{X}_{\text{emb}}$ 914 may protect privacy of information, 0134, Figure 9B) comprising:
memory (memory 720, Figure 7), the memory configured to store:
a machine learning model (model 130, Figure 1), the machine learning model configured to produce output data based on input data; and
an obfuscation system (obfuscation, 110, Figure 1), the obfuscation system configured to obfuscate input data provided to the machine learning model; and
a processor (processors, 710a-710n, Figure 7), the processor configured to train one or more estimators of the obfuscation system by:
obtaining at least one transformer block from the machine learning model (deep learning models such as the transformer, 0106) (transformer encoder 922, Figure 9B);
generating one or more estimators based on the at least one transformer block (trained model, e.g., a vector of weights or thresholds, may be stored in memory and later retrieved for application to new calculations on newly calculated aggregate estimates. 0082),
wherein at least one estimator comprises a mean shift estimator (the stochastic layer may be a stochastic convolutional layer with a first filter that corresponds to the mean of a normal distribution and a second filter that corresponds to the standard deviation of the normal distribution. The second data may include data (e.g., data indicating the mean of the normal distribution) that is generated by convolving the first filter over an input image. In this example, the second data may include data (e.g., data indicating the standard deviation of the normal distribution) that is generated by convolving the second filter over the input image, 0063) (the mean generated via the first filter and the standard deviation generated via the second filter (e.g., as discussed above) may be used to sample one or more values. The one or more values may be used as input into a subsequent layer. The subsequent layer may be a stochastic layer (e.g., a stochastic convolution layer, stochastic fully connected layer, stochastic activation layer, stochastic pooling layer, stochastic batch normalization layer, stochastic embedding layer, or a variety of other stochastic layers) or a non-stochastic layer (e.g., convolution, fully-connected, activation, pooling, batch normalization, embedding, or a variety of other layers), 0064); and
wherein at least one estimator comprises a dispersion shift estimator (The obfuscator performing the transformation may be characterized as an unsupervised machine learning model. In order to determine a maximum noise that may be applied to the dataset D using gradient descent (such as stochastic gradient descent or other gradient based optimization) or another appropriate method, an autoencoder may be trained on the dataset D, 0033) (by application of stochastic noise, 0134);
training the one or more estimators to obfuscate input data for the machine learning model (obfuscate training data. sufficiently accurate machine learning models may be trained using the transformed or obfuscated data set, 0031) (The autoencoder may instead or additionally be a neural network or other machine learning algorithm that generates embeddings. The autoencoder may be trained on a set of training data., 0055); and
storing the trained one or more estimators in memory (The obfuscated data may include quasi-synthetic data, or multiple elements corresponding to different applications of stochastic noise to the same element of the un-obfuscated dataset. The obfuscated data may be stored. The parameters of the noise used to create the obfuscated data may be stored. The parameters of the autoencoder, with or without the noise, may be stored. 0058).

Claim 20:
With respect to claim 20, David discloses further comprising an input device, wherein the obfuscation system is configured to obfuscate input data from the input device (input X, 906, Figure 9B).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure, see PTO Form 892.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Helai Salehi whose telephone number is 571-270-7468. The examiner can normally be reached on Monday - Friday from 9 am to 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Jeff Pwu, can be reached on 571-272-6798. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HELAI SALEHI/ Examiner, Art Unit 2433

/JEFFREY C PWU/ Supervisory Patent Examiner, Art Unit 2433

Prosecution Timeline

Nov 27, 2024

Application Filed

Apr 08, 2026

Non-Final Rejection mailed — §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/184,365

Patent 12632573

METHOD, APPARATUS, AND COMPUTER-READABLE MEDIUM FOR AUTOMATED CONSTRUCTION OF DATA MASKS

5y 2m to grant Granted May 19, 2026

18/176,791

Patent 12621349

DYNAMIC UNLOCKING OF SOFTWARE DEFINED SILICON (SDSi) PROCESSOR FEATURES VIA SECURITY PROTOCOL DATA MODEL (SPDM)

3y 2m to grant Granted May 05, 2026

17/663,766

Patent 12587382

METHOD AND SYSTEM FOR PROCESSING BIOMETRIC DATA

3y 10m to grant Granted Mar 24, 2026

19/195,323

Patent 12587504

CONNECTIONLESS-VIRTUAL PRIVATE NETWORK FOR SECURE CLOUD TO USER COMMUNICATION OVER THE INTERNET USING A PLURALITY OF SERVERS

10m to grant Granted Mar 24, 2026

17/552,322

Patent 12566860

STATIC-DYNAMIC INTEGRATION

4y 2m to grant Granted Mar 03, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

73%

Grant Probability

99%

With Interview (+32.2%)

3y 5m (~1y 11m remaining)

Median Time to Grant

Low

PTA Risk

Based on 527 resolved cases by this examiner. Grant probability derived from career allowance rate.
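The 73% grant probability can be checked against the counts reported earlier on this page (383 granted out of 527 resolved). The short Python sketch below reproduces that arithmetic. The 99% with-interview figure is not explained on the page. Adding the stated +32.2% lift and capping the result at 99% matches the displayed value, but that cap is an assumption and not a documented method.

```python
granted = 383          # from "383 granted / 527 resolved"
resolved = 527
interview_lift = 0.322  # from "+32.2% interview lift"

# Career allowance rate: granted cases as a share of resolved cases.
allowance_rate = granted / resolved
print(f"{allowance_rate:.1%}")  # prints 72.7%, which rounds to the displayed 73%

# ASSUMPTION: the page may add the lift and cap the result at 99%.
# The uncapped sum exceeds 100%, so some cap must be applied.
with_interview = min(allowance_rate + interview_lift, 0.99)
```

Because the uncapped sum is above 100%, the with-interview number is best read as "near certain under this model" and not as a measured outcome rate.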