Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This action is responsive to the application filed on 12/11/2025.
Claims 1-20 are pending.
Claims 1-3, 5-13, and 15-20 have been amended.
Response to Arguments
Applicant’s arguments with respect to the rejections of claims 1-20 under 35 U.S.C. 101 have been fully considered and are persuasive. Therefore, the rejections set forth in the previous Office action have been withdrawn.
Applicant’s arguments with respect to the rejections of claims 1, 11, and 19 under 35 U.S.C. 103 have been fully considered but are not persuasive. Applicant argues that no reference teaches the amended limitation of claims 1, 11, and 19, which now recites “wherein adjusting the one or more parameters comprises combining a first gradient determined with respect to the shared plurality of parameters based on the first model unit and a second, different gradient determined with respect to the shared plurality of parameters based on the second model unit”, because Gong does not discuss gradient operations and Deng “does not disclose ‘combining’ gradients”. In view of the breadth of the claim language, the examiner respectfully disagrees.
Deng has been found to teach the amended limitation under the breadth of the claim language. Deng, sections 3-5.1, teaches training the weights of a stacked (shared) weight matrix of deep neural network layers (units), and then performing weight “fine-tuning” per layer (unit) for a certain number of iterations while monitoring the “gradient” between layer weights in order to tune the weights (combining).
See the 35 U.S.C. § 103 section below for the full mapping of the claim limitations necessitated by applicant’s amendments.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Gong et al. (“Efficient Training of BERT by Progressively Stacking”, 2019), hereinafter Gong, in view of Deng et al. (“Scalable Stacking and Learning for Building Deep Architectures”, 2015), hereinafter Deng.
Regarding claims 1, 11, and 19, Gong teaches a computer-implemented method for reducing computational costs of training a machine-learned model; and a computing system for reducing computational costs of training a machine-learned model, comprising: one or more processors; and one or more tangible, non-transitory computer-readable media storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations (section 4 teaches performing the model training embodiments of the disclosure on a “computation environment” including “GPUs” with executable “code”, known to be included on a computer system with one or more memories), the operations comprising:
performing, by a computing system comprising one or more computing devices, a first plurality of training iterations with the machine-learned model to adjust one or more parameters of a shared plurality of parameters (sections 3 and 4.1-4.2 teach a GPU executing an “iterative training algorithm based on our stacking technique to train a deep BERT faster”, wherein “we first train a 3-layer BERT for 50,000 steps” for tuning “parameters”), wherein the machine-learned model comprises a first model unit comprising a first plurality of parameters tied to the shared plurality of parameters during the first plurality of training iterations and a second model unit comprising a second plurality of parameters tied to the shared plurality of parameters during the first plurality of training iterations (sections 3 and 4.1-4.2 teach the GPU training a BERT model (machine-learned model) wherein “we first train a 3-layer BERT for 50,000 steps” (iterations) for tuning “parameters” to be shared, with training progressing through progressive stacking (tied to the shared plurality of parameters) of model layers (model units) “with parameters” (each of the plurality of model units comprises a plurality of parameters). Section 1 further teaches “Once we have a shallow model, we can stack the shallow model into a deep model by sharing weight between the top self-attention layers and the bottom self-attention layers, and then fine-tune all the parameters.”), and
performing, by the computing system, a second plurality of training iterations with the machine-learned model to adjust one or more parameters of each of the first model unit and second model unit independent of the shared plurality of parameters (sections 3 and 4.1-4.2 teach, via a GPU, “we first train a 3-layer BERT for 50,000 steps (untying condition), stack it twice into a 6-layer BERT and then train this 6-layer BERT for 70,000 steps (second plurality of training iterations)” and so on, wherein the layers include “parameters” being trained according to the newly stacked separate layers (each of the first model unit and second model unit independent of the shared plurality of parameters). Further, “[w]hen fine-tuning models on downstream tasks (alternative second plurality of training iterations), we use the same hyperparameter search space as BERT for each down-stream task. We perform a hyperparameter search on the validation set of each task with our baseline model and apply the resulting hyperparameter to other models. We use a new set of random seeds that is different from the seeds for hyperparameter search to prevent over-fitting”).
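As a purely illustrative aside (not part of the claim mapping, and not code from either reference), the progressive-stacking schedule quoted above — train a shallow model, copy the trained layers on top of themselves, and continue training — can be sketched in a few lines of Python. The `train` and `stack` helpers and the toy weight update are hypothetical stand-ins.

```python
import copy

def train(layers, steps):
    """Toy stand-in for a pre-training phase: nudge each layer's weight."""
    for _ in range(steps):
        for layer in layers:
            layer["w"] += 0.001  # placeholder for a real gradient update
    return layers

def stack(layers):
    """Progressive stacking: copy the trained layers on top of themselves."""
    return layers + copy.deepcopy(layers)

# Mirror the quoted schedule: train a 3-layer model, stack to 6 layers,
# train again, stack to 12 layers, and train once more.
model = [{"w": 0.0} for _ in range(3)]
model = train(model, 50)          # shallow phase ("50,000 steps" in Gong)
model = train(stack(model), 70)   # 6-layer phase ("70,000 steps")
model = train(stack(model), 280)  # 12-layer phase ("280,000 steps")
```

Because stacking copies already-trained layers, every layer of the final 12-layer model starts the last phase with non-zero weights, which is the point of the technique.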
However, Gong does not explicitly teach and wherein adjusting the one or more parameters comprises combining a first gradient determined with respect to the shared plurality of parameters based on the first model unit and a second, different gradient determined with respect to the shared plurality of parameters based on the second model unit.
Deng teaches and wherein adjusting the one or more parameters comprises combining a first gradient determined with respect to the shared plurality of parameters based on the first model unit and a second, different gradient determined with respect to the shared plurality of parameters based on the second model unit (sections 3-5.1 teach training the weights of a stacked (shared) weight matrix of deep neural network layers (units), and then performing weight “fine-tuning” per layer (unit) for a certain number of iterations while monitoring the “gradient” between layer weights for tuning).
Further, Gong at least implies performing, by the computing system, a second plurality of training iterations with the machine-learned model to adjust one or more parameters of each of the first model unit and second model unit independent of the shared plurality of parameters (see the mappings above); however, Deng explicitly teaches performing…a second plurality of training iterations with the machine-learned model to adjust one or more parameters of each of the first model unit and second model unit independent of the shared plurality of parameters (sections 4-5.1 teach training the weights of a stacked weight matrix of deep neural network layers, and then performing weight “fine-tuning” per layer (first model unit and second model unit independent of the shared plurality of parameters) for a certain number of iterations (second plurality of training iterations) while monitoring the “gradient”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Deng’s teachings of DNN layer weight stacking in matrices for training and further fine-tuning while monitoring gradient calculations into Gong’s teaching of BERT layer weight stacking through copying trained layers and then fine-tuning weights, in order to increase training efficiency and accuracy through “parallel training on potentially very large data sets” (Deng, sections 4.1-4.2 and 7).
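As a purely illustrative aside, the claimed combining of per-unit gradients with respect to a shared parameter set can be sketched as follows. The linear “model units”, the mean squared-error loss, and the choice of summation as the combining rule are hypothetical, chosen for brevity; they are not drawn from Gong or Deng.

```python
import numpy as np

rng = np.random.default_rng(0)
w_shared = rng.normal(size=(4, 4))   # parameters tied to both model units
x = rng.normal(size=(8, 4))          # toy input batch
y = rng.normal(size=(8, 4))          # toy regression targets

def unit_grad(w, inp, target):
    """Gradient of a mean squared-error loss for one linear model unit,
    taken with respect to the shared weight matrix."""
    pred = inp @ w
    return 2.0 * inp.T @ (pred - target) / len(inp)

# Each unit applies the same tied weights but sees a different input,
# so the two gradients with respect to w_shared differ.
g1 = unit_grad(w_shared, x, y)               # first model unit
g2 = unit_grad(w_shared, x @ w_shared, y)    # second model unit (stacked input)

g_combined = g1 + g2           # combine the per-unit gradients
w_shared -= 0.01 * g_combined  # single update to the shared parameters
```

The sketch shows why the gradients are “different” even though they are taken with respect to the same tied parameters: each unit contributes its own gradient, and a single combined update is applied to the shared weights.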
Regarding claims 2, 12, and 20, the combination of Gong and Deng teach all the claim limitations of claims 1, 11, and 19 above; and further teach wherein: the shared plurality of parameters is a first shared plurality of parameters (Gong, sections 3 and 4.1-4.2 teach training a BERT model (machine-learned model) through progressive stacking (shared plurality of parameters is a first shared) of model layers “with parameters” (plurality of parameters). Section 1 further teaches “Once we have a shallow model, we can stack the shallow model into a deep model by sharing weight between the top self-attention layers and the bottom self-attention layers, and then fine-tune all the parameters.”); the machine-learned model comprises two model groups respectively comprising a first subset of model units and a second subset of model units of a plurality of model units of the machine-learned model, wherein the first subset comprises the first model unit and the second model unit, and wherein the parameters of each of the second subset of model units is tied to a second shared plurality of parameters during the first plurality of training iterations and the second plurality of training iterations (Gong, sections 3 and 4.1-4.2 teach the GPU training a BERT model (machine-learned model) through progressive stacking (tied to a shared plurality of first/second group parameters) of model layers (first/second subset model units) “with parameters” (parameters of each of the first/second subset of model units). Training starts with a compressed “3-layer BERT” (first/second model units) (first plurality of training iterations), and fine-tuning (alternative second plurality of training iterations) updates the layers to be stacked (second shared plurality of parameters) before training a “6-layer BERT for 70,000 steps (second plurality of training iterations)”.); and
performing the second plurality of training iterations comprises adjusting one or more of the second shared plurality of parameters (Gong, sections 3 and 4.1-4.2 teach via a GPU, “we first train a 3-layer BERT for 50,000 steps, stack it twice into a 6-layer BERT and then train this 6-layer BERT for 70,000 steps (second plurality of training iterations). In the final step, we stack the 6-layer BERT into a 12-layer BERT, and train the 12-layer BERT for 280,000 steps”, wherein the layers include “parameters” being trained according to the newly stacked separate layers. Here, it is interpreted that the “6-layer BERT” trains the previously shared parameters (as a “3-layer BERT”) (second plurality of training iterations…adjust…the second shared plurality of parameters) as independent parameters (second plurality of training iterations comprises adjusting one or more of the second shared plurality of parameters), then repeats the stacking process, thus making the 6-layer BERT the shared parameters (alternate second plurality of training iterations…adjust…the shared plurality of parameters) to be trained independently as a “12-layer BERT”.), wherein adjusting one or more of the second shared plurality of parameters comprises combining a third gradient determined with respect to the second shared plurality of parameters based on a third model unit of the second subset with a fourth, different gradient determined with respect to the second shared plurality of parameters based on a fourth model unit of the second subset (Deng, sections 4-5.1 teach repeatedly training a stacked weight matrix of deep neural network layer weights when stacking the network layers (second subset), and then performing weight “fine-tuning” per layer (third model unit and fourth model unit) for a certain number of iterations (second plurality of training iterations) while monitoring the “gradient” (third gradient…fourth, different gradient) between layer weights for tuning (combining)).
Gong and Deng are combinable for the same rationale as set forth above with respect to claims 1, 11, and 19.
Regarding claims 3 and 13, the combination of Gong and Deng teach all the claim limitations of claims 2 and 12 above; and further teach wherein the method further comprises:
performing, by the computing system, a third plurality of training iterations with the machine-learned model to adjust one or more parameters of at least one of the third model unit and the fourth model unit independent of the first shared plurality of parameters and the second shared plurality of parameters (Gong, sections 3 and 4.1-4.2 teach, via a GPU, “In the final step, we stack the 6-layer BERT into a 12-layer BERT, and train the 12-layer BERT for 280,000 steps (performing…a third plurality of training iterations)”, wherein the layers include “parameters” being trained according to the newly stacked separate layers (adjust…parameters…of the third model unit and the fourth model unit independent of the first shared plurality of parameters and the second shared plurality of parameters). Further, “[w]hen fine-tuning models on downstream tasks (alternative third plurality of training iterations…independent), we use the same hyperparameter search space as BERT for each down-stream task. We perform a hyperparameter search on the validation set of each task with our baseline model and apply the resulting hyperparameter to other models. We use a new set of random seeds that is different from the seeds for hyperparameter search to prevent over-fitting”).
Regarding claims 4 and 14, the combination of Gong and Deng teach all the claim limitations of claims 1 and 11 above; and further teach wherein performing, by the computing system, the second plurality of training iterations further adjusts one or more of the shared plurality of parameters (Gong, sections 3 and 4.1-4.2 teach via a GPU, “we first train a 3-layer BERT for 50,000 steps, stack it twice into a 6-layer BERT and then train this 6-layer BERT for 70,000 steps (second plurality of training iterations). In the final step, we stack the 6-layer BERT into a 12-layer BERT, and train the 12-layer BERT for 280,000 steps”, wherein the layers include “parameters” being trained according to the newly stacked separate layers. Here, it is interpreted that the “6-layer BERT” trains the previously shared parameters (as a “3-layer BERT”) independently, then repeats the stacking process, thus making the 6-layer BERT the shared parameters (second plurality of training iterations further adjusts one or more of the shared plurality of parameters) to be trained independently as a “12-layer BERT”.).
Regarding claims 5 and 15, the combination of Gong and Deng teach all the claim limitations of claims 1 and 11 above; and further teach: evaluating, by the computing system, one or more gradient statistics associated with at least one of the first plurality of training iterations (Deng, sections 4-5.1 teach training the weights of a stacked weight matrix of deep neural network layers, and then performing weight “fine-tuning” per layer for a certain number of iterations while monitoring the “gradient”), wherein the first plurality of training iterations is ended based at least in part on the one or more gradient statistics (Deng, sections 3-5.1 teach the stacked weight matrix of deep neural network layer (first model unit and the second model unit) weights being trained until an “optimization problem” is satisfied (ended) while monitoring the computed “gradient” (based at least in part on the one or more gradient statistics) between layer weights).
Gong and Deng are combinable for the same rationale as set forth above with respect to claims 1, 11, and 19.
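As a purely illustrative aside, ending a training phase based on a monitored gradient statistic — here, the gradient norm falling below a threshold — can be sketched as follows. The toy objective, learning rate, and tolerance are hypothetical and are not taken from either reference.

```python
import numpy as np

def grad(w):
    """Toy gradient of f(w) = ||w||^2 / 2, which is simply w."""
    return w

w = np.ones(5)
lr, max_iters, tol = 0.1, 1000, 1e-3
norms = []  # gradient statistic monitored across the phase

for step in range(max_iters):
    g = grad(w)
    norms.append(float(np.linalg.norm(g)))
    if norms[-1] < tol:   # end the phase once the statistic
        break             # falls below the threshold
    w -= lr * g
```

The phase terminates well before the iteration cap because the monitored statistic, not a fixed step count, controls when it ends.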
Regarding claims 6 and 16, the combination of Gong and Deng teach all the claim limitations of claims 1 and 11 above; and further teach wherein the first plurality of training iterations is ended responsive to the first plurality of training iterations exceeding a threshold number of training iterations (Gong, sections 3 and 4.1-4.3 teach via a GPU, “we first train a 3-layer BERT for 50,000 steps (first plurality of training iterations exceeds a threshold number of training iterations), stack it twice into a 6-layer BERT and then train this 6-layer BERT for 70,000 steps”, wherein the layers include “parameters” and the iteration times are tracked to be greater than the “threshold”).
Regarding claims 7 and 17, the combination of Gong and Deng teach all the claim limitations of claims 1 and 11 above; and further teach wherein the first model unit is adjacent to the second model unit (Gong, sections 3 and 4.1-4.2 teach connected (adjacent) 3 layer BERT model layers (units)).
Regarding claims 8 and 18, the combination of Gong and Deng teach all the claim limitations of claims 7 and 17 above; and further teach wherein the first plurality of training iterations is ended based at least in part on a correlation between gradients of at least the first model unit and the second model unit (Deng, sections 3-5.1 teach the stacked weight matrix of deep neural network layer (first model unit and the second model unit) weights being trained until an “optimization problem” is satisfied (untying condition) while monitoring the computed “gradient” (ended based at least in part on a correlation between gradients) between layer weights (first model unit and the second model unit)).
Gong and Deng are combinable for the same rationale as set forth above with respect to claims 1, 11, and 19.
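As a purely illustrative aside, a gradient-correlation signal of the kind recited in claims 8 and 18 can be sketched as a Pearson correlation between two units’ flattened gradients. The synthetic gradients and the 0.9 threshold are hypothetical; neither reference specifies this particular statistic.

```python
import numpy as np

def grad_corr(g1, g2):
    """Pearson correlation between two flattened gradient tensors."""
    a, b = g1.ravel() - g1.mean(), g2.ravel() - g2.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
g_unit1 = rng.normal(size=(3, 3))                  # synthetic gradient, unit 1
g_unit2 = g_unit1 + 0.1 * rng.normal(size=(3, 3))  # nearly aligned gradient, unit 2

corr = grad_corr(g_unit1, g_unit2)
# A phase-ending rule might compare this statistic to a threshold,
# e.g. end the tied-training phase once corr exceeds 0.9.
untie = corr > 0.9
```

High correlation between the tied units’ gradients suggests the shared phase is contributing redundant updates, which is one rationale for ending it at that point.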
Regarding claim 9, the combination of Gong and Deng teach all the claim limitations of claim 1 above; and further teach wherein the first model unit and the second model unit share a model unit architecture (Gong, sections 3 and 4.1-4.2 teach connected BERT model layers (first model unit and the second model unit share a model unit architecture)).
Regarding claim 10, the combination of Gong and Deng teach all the claim limitations of claim 9 above; and further teach wherein the model unit architecture comprises a sequence of model layers (Gong, sections 3 and 4.1-4.2 teach connected BERT model layers (sequence of model layers)).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX whose telephone number is 571-272-3241. The examiner can normally be reached on Mon - Fri 8:00-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/C.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123