DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 2-21 are presented for examination.
Response to Amendment
Applicant’s filing of a terminal disclaimer has overcome the nonstatutory double patenting rejection. Therefore, that rejection is withdrawn. However, Applicant has not submitted amendments to the specification. Therefore, the objections to the specification are maintained.
Specification
Examiner objects to the specification for containing various grammatical informalities. Examiner has attached a marked-up copy of the specification indicating where errors have occurred. To the extent that the markings are not self-explanatory and are not corrected, Examiner will enumerate the remaining objections in a subsequent Office Action.
Claim Rejections - 35 USC § 101
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 2-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The analysis of the claims will follow the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50 (“2019 PEG”).
Claim 2
Step 1: The claim recites a system comprising one or more computers and one or more storage devices; therefore, it is directed to the statutory category of machines.
Step 2A Prong 1: The claim recites, inter alia:
[G]enerat[ing] an attended input sequence comprising a respective attended input at each of the plurality of input positions: This limitation could encompass mentally generating the attended input sequence.
[F]or each input position in a second subset of the input positions, generating the attended input at the input position: This limitation could encompass mentally generating an input for each input position.
[U]sing the query at the input position to attend over only the keys at a corresponding proper subset of the input positions to generate a respective weight for each of the input positions in the corresponding proper subset and computing a weighted sum of the value inputs at the corresponding proper subset of the input positions in accordance with the respective weights for the corresponding proper subset of the input positions, the corresponding proper subset of input positions for each input position in the second subset including: a first proper subset of the input positions that is shared for each input position in the second subset; one or more input positions randomly selected from the input positions that are outside of the first proper subset; and each input position that is within a window of a fixed number of positions of the input position in the second subset: This limitation could encompass mentally generating the weight for a subset of the input positions including a shared first proper subset of the input positions, a randomly selected subset of input positions, and input positions within a window of another input position, and computing a weighted sum of these inputs, by paying attention only to keys at a subset of the input positions. Additionally, the calculation of the weighted sum is a mathematical concept.
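(For illustration only: the attention pattern recited above, a shared global block of positions, randomly selected positions, and a fixed local window, can be sketched in Python as follows. All function names, parameter names, and values are the undersigned's hypothetical illustration and are not drawn from the claims or the specification.)

```python
import numpy as np

def sparse_attention(queries, keys, values, n_global, n_random, window, rng):
    """Sketch of attending over only a proper subset of input positions:
    a shared block of global positions, a few randomly selected positions,
    and a local window around each position."""
    n, d = queries.shape
    outputs = np.zeros_like(values)
    global_idx = np.arange(n_global)  # first proper subset, shared across positions
    for i in range(n):
        # positions within a fixed window of position i
        window_idx = np.arange(max(0, i - window), min(n, i + window + 1))
        # random positions drawn from outside the shared global block
        pool = np.setdiff1d(np.arange(n_global, n), window_idx)
        if len(pool) > 0:
            random_idx = rng.choice(pool, size=min(n_random, len(pool)), replace=False)
        else:
            random_idx = np.array([], dtype=int)
        idx = np.unique(np.concatenate([global_idx, window_idx, random_idx]))
        # a respective weight for each selected position (softmax of scaled
        # dot products), then a weighted sum of the value inputs
        scores = keys[idx] @ queries[i] / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        outputs[i] = weights @ values[idx]
    return outputs
```

In this sketch, each position touches only the keys at its selected subset of positions rather than all n keys.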
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites a “system for performing a machine learning task on a network input to generate a network output, the system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to implement: an attention neural network configured to perform the machine learning task, the attention neural network comprising one or more sparse attention layers, each sparse attention layer comprising one or more sparse attention sub-layers”. However, this is a mere instruction to apply the judicial exception using a generic computer programmed with generically recited classes of computer algorithm. MPEP § 2106.05(f).
The claim further recites “receiv[ing] a sequence of queries derived from an input sequence to the sparse attention layer, the sequence of queries having a respective query at each of a plurality of input positions; receiv[ing] a sequence of keys derived from the input sequence to the sparse attention layer, the sequence of keys having a respective key at each of the plurality of input positions; [and] receiv[ing] a sequence of value inputs derived from the input sequence to the sparse attention layer, the sequence of value inputs having a respective value input at each of the plurality of input positions”. However, these limitations amount to the insignificant extra-solution activity of mere data gathering. MPEP § 2106.05(g).
Step 2B: The claim does not contain significantly more than the judicial exception. The three receiving limitations, in addition to being insignificant extra-solution activity, also recite the well-understood, routine, and conventional activity of receiving or transmitting data over a network. MPEP § 2106.05(d)(II); OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network). Otherwise, the analysis is identical to that of step 2A, prong 2. Viewed as an ordered combination, the claim is directed to a mathematical algorithm of generating an attended input sequence using generically recited attention neural networks. Nothing in the claim provides significantly more than this. As such, the claim is not patent eligible.
Claim 3
Step 1: A machine, as above.
Step 2A Prong 1: The claim recites, inter alia, “for each input position in the first proper subset of the input positions, generating the attended input at the input position by: using the query at the input position to attend over all of the keys in the sequence of keys to generate a respective weight for all of the input positions and computing a weighted sum of the value inputs at all of the input positions in accordance with the respective weights.” This limitation could encompass mentally paying attention to the keys in a sequence using the query at the input position, generating a weight for each input position, and computing a weighted sum of the values. Additionally, the calculation of the weighted sum is a mathematical concept.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
Step 2B: The claim does not contain significantly more than the judicial exception.
Claim 4
Step 1: A machine, as above.
Step 2A Prong 1: The claim recites that “the second subset is a proper subset of the input positions and the second subset is disjoint from the first proper subset”. The manipulation of the subsets as claimed in claim 2 remains an abstract idea under these further assumptions.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. See claim 3 analysis.
Step 2B: The claim does not contain significantly more than the judicial exception. See claim 3 analysis.
Claim 5
Step 1: A machine, as above.
Step 2A Prong 1: The claim recites, inter alia, “augment[ing] the network input by adding one or more pre-determine[d] global tokens before processing the network input, … wherein the first proper subset of input positions correspond[s] to the positions at which the one or more global tokens are added.” This limitation could encompass mentally adding tokens to a dataset or writing them down with a pen and paper.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites that “the network input is a sequence having a respective token at each of a plurality of the input positions” and that the operations are performed by an “attention neural network”. However, these are mere instructions to apply the judicial exception using a generic computer programmed with generically recited classes of computer algorithm. MPEP § 2106.05(f).
Step 2B: The claim does not contain significantly more than the judicial exception. The claim further recites that “the network input is a sequence having a respective token at each of a plurality of the input positions” and that the operations are performed by an “attention neural network”. However, these are mere instructions to apply the judicial exception using a generic computer programmed with generically recited classes of computer algorithm. MPEP § 2106.05(f).
Claim 6
Step 1: A machine, as above.
Step 2A Prong 1: The claim recites, inter alia, “designat[ing] a fixed number of the plurality of input positions as the first proper subset of input positions.” This limitation could encompass mentally designating a number of input positions as a proper subset.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites that “the network input is a sequence having a respective token at each of a plurality of the input positions” and that the designation is performed by an “attention neural network”. However, these are mere instructions to apply the judicial exception using a generic computer programmed with generically recited classes of computer algorithm. MPEP § 2106.05(f).
Step 2B: The claim does not contain significantly more than the judicial exception. The claim further recites that “the network input is a sequence having a respective token at each of a plurality of the input positions” and that the designation is performed by an “attention neural network”. However, these are mere instructions to apply the judicial exception using a generic computer programmed with generically recited classes of computer algorithm. MPEP § 2106.05(f).
Claim 7
Step 1: A machine, as above.
Step 2A Prong 1: The claim recites, inter alia, “appl[ying], for each sparse attention sub-layer, a respective query linear transformation to the input sequence to generate the sequence of queries for the sub-layer.” This limitation could encompass mentally applying a query linear transformation to an input sequence; this operation also recites a mathematical concept.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites that the transformation is applied by “each of the one or more sparse attention layers”. However, this is a mere instruction to apply the judicial exception using a generic computer programmed with a generic class of computer algorithm. MPEP § 2106.05(f).
Step 2B: The claim does not contain significantly more than the judicial exception. The claim further recites that the transformation is applied by “each of the one or more sparse attention layers”. However, this is a mere instruction to apply the judicial exception using a generic computer programmed with a generic class of computer algorithm. MPEP § 2106.05(f).
Claim 8
Step 1: A machine, as above.
Step 2A Prong 1: The claim recites, inter alia, “appl[ying], for each sparse attention sub-layer, a respective key linear transformation to the input sequence to generate the sequence of keys for the sparse attention sub-layer.” This limitation could encompass mentally applying a key linear transformation to an input sequence; this operation also recites a mathematical concept.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites that the transformation is applied by “each of the one or more sparse attention layers”. However, this is a mere instruction to apply the judicial exception using a generic computer programmed with a generic class of computer algorithm. MPEP § 2106.05(f).
Step 2B: The claim does not contain significantly more than the judicial exception. The claim further recites that the transformation is applied by “each of the one or more sparse attention layers”. However, this is a mere instruction to apply the judicial exception using a generic computer programmed with a generic class of computer algorithm. MPEP § 2106.05(f).
Claim 9
Step 1: A machine, as above.
Step 2A Prong 1: The claim recites, inter alia, “appl[ying], for each sparse attention sub-layer, a respective value linear transformation to the input sequence to generate the sequence of value inputs for the sparse attention sub-layer.” This limitation could encompass mentally applying a value linear transformation to an input sequence; this operation also recites a mathematical concept.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites that the transformation is applied by “each of the one or more sparse attention layers”. However, this is a mere instruction to apply the judicial exception using a generic computer programmed with a generic class of computer algorithm. MPEP § 2106.05(f).
Step 2B: The claim does not contain significantly more than the judicial exception. The claim further recites that the transformation is applied by “each of the one or more sparse attention layers”. However, this is a mere instruction to apply the judicial exception using a generic computer programmed with a generic class of computer algorithm. MPEP § 2106.05(f).
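(For illustration only: the query, key, and value linear transformations addressed in the analysis of claims 7-9 above amount to matrix multiplications of the input sequence by parameter matrices, as in the following sketch. The dimensions and variable names are hypothetical and not drawn from the claims or the specification.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_model, d_head = 6, 8, 4
x = rng.normal(size=(n, d_model))          # input sequence to the sparse attention layer
W_q = rng.normal(size=(d_model, d_head))   # query linear transformation
W_k = rng.normal(size=(d_model, d_head))   # key linear transformation
W_v = rng.normal(size=(d_model, d_head))   # value linear transformation
# one query, key, and value input at each of the plurality of input positions
queries, keys, values = x @ W_q, x @ W_k, x @ W_v
```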
Claim 10
Step 1: A machine, as above.
Step 2A Prong 1: The claim recites, inter alia, “generat[ing] a final … input sequence from the … input sequences generated by the one or more sub-layers.” This limitation could encompass mentally generating the final input sequence by combining the input sequences.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites that the input sequences are “attended”. However, this is a mere instruction to apply the judicial exception using a generic computer programmed with a generic class of computer algorithm. MPEP § 2106.05(f).
Step 2B: The claim does not contain significantly more than the judicial exception. The claim further recites that the input sequences are “attended”. However, this is a mere instruction to apply the judicial exception using a generic computer programmed with a generic class of computer algorithm. MPEP § 2106.05(f).
Claim 11
Step 1: A machine, as above.
Step 2A Prong 1: The claim recites, inter alia, “applying a sequence of transformations to the attended layer input at the input position to generate a layer output for the input position.” To the extent that the transformations in question are mathematical transformations, this represents a mathematical concept. Moreover, this could be performed mentally.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim further recites that “each sparse attention layer further comprises: one or more position-wise feed-forward layers that are configured to generate an output sequence for the layer from the final attended input sequence, the output sequence comprising a respective layer output at each of the plurality of input positions”. However, this is a mere instruction to apply the judicial exception using a generic computer programmed with a generic class of computer algorithm. MPEP § 2106.05(f).
The claim further recites “for each of the plurality of input positions: receiving an attended layer input at the input position”. This limitation represents the insignificant extra-solution activity of mere data gathering and output. MPEP § 2106.05(g).
Step 2B: The claim does not contain significantly more than the judicial exception. The claim further recites “for each of the plurality of input positions: receiving an attended layer input at the input position”. This limitation represents the well-understood, routine, and conventional activity of receiving or transmitting data over a network. MPEP § 2106.05(d)(II); OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network). Otherwise, the analysis at this step is identical to that of step 2A, prong 2.
Claim 12
Step 1: The claim recites one or more non-transitory computer-readable storage media; therefore, it is directed to the statutory category of articles of manufacture.
Step 2A Prong 1: The claim recites the same judicial exceptions as in claim 2.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The analysis at this step mirrors that of claim 2, except insofar as this claim recites “[o]ne or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to implement [the] attention neural network”. However, this is a mere instruction to apply the judicial exception using a generic computer. MPEP § 2106.05(f).
Step 2B: The claim does not contain significantly more than the judicial exception. The analysis at this step mirrors that of claim 2, except insofar as this claim recites “[o]ne or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to implement [the] attention neural network”. However, this is a mere instruction to apply the judicial exception using a generic computer. MPEP § 2106.05(f).
Claims 13-21
Step 1: The claims recite a method; therefore, they are directed to the statutory category of processes.
Step 2A Prong 1: The claims recite the same judicial exceptions as in claims 2-10, respectively.
Step 2A Prong 2: These judicial exceptions are not integrated into a practical application. The analysis at this step mirrors that of claims 2-10, respectively, except insofar as these claims recite “receiv[ing] a network input”, which is directed to the insignificant extra-solution activity of mere data gathering and output. MPEP § 2106.05(g).
Step 2B: The claims do not contain significantly more than the judicial exceptions. The analysis at this step mirrors that of claims 2-10, respectively, except insofar as these claims recite “receiv[ing] a network input”, which is directed to the well-understood, routine, and conventional activity of receiving or transmitting data over a network. MPEP § 2106.05(d)(II); OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network).
Response to Arguments
Applicant’s arguments filed February 26, 2026 (“Remarks”) have been fully considered but they are not persuasive.
Applicant argues that the claims are eligible under 35 U.S.C. § 101 because the specification describes an improvement to attention neural networks that allows them to operate in memory-constrained environments by reducing the number of keys that must be loaded into the constrained memory space to compute an attended input. Applicant further argues that the claims reflect these purported advantages by reciting that the mechanism attends over only the keys at a proper subset of the input positions. Remarks at 10-12. However, the limitations that, according to Applicant, provide the practical application are part of the judicial exception itself, and the judicial exception itself cannot provide the inventive concept. MPEP § 2106.05(I). Here, the claimed attention mechanism is widely understood to involve assigning a soft weight to each item in a sequence. Moreover, page 12 of the specification as filed (particularly lines 13-20), as well as the claim itself, indicates that the generation of the attended input involves generating a weight for each input position in a proper subset of input positions and computing a weighted sum of value inputs in accordance with the weights. That is, the specification supports the conclusion that the attention mechanism itself is a mathematical concept. Compare Example 47, claim 2 (concluding that a limitation directed to training a network using backpropagation with gradient descent recites a mathematical concept when the mathematical operations involved are disclosed by the specification). Moreover, because the claims place no limitations on the complexity of the data being attended to or on the mechanism by which the attention is carried out (beyond the limitations previously recited), attending over the keys at a proper subset of input positions may be performed mentally.
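(For illustration only: the conventional soft-weight attention computation referenced above, a softmax weight assigned to each item in a sequence followed by a weighted sum of the value inputs, can be expressed as follows. The function and variable names are hypothetical and not drawn from the claims or the specification.)

```python
import numpy as np

def soft_attention(query, keys, values):
    # one soft weight per item in the sequence (softmax of scaled dot products)
    scores = keys @ query / np.sqrt(keys.shape[1])
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    # attended input: weighted sum of the value inputs in accordance with the weights
    return weights @ values, weights
```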
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RYAN C VAUGHN whose telephone number is (571)272-4849. The examiner can normally be reached M-R 7:00a-5:00p ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RYAN C VAUGHN/ Primary Examiner, Art Unit 2125