Prosecution Insights
Last updated: April 19, 2026
Application No. 18/564,859

CHARACTER-LEVEL ATTENTION NEURAL NETWORKS

Final Rejection §103
Filed: Nov 28, 2023
Examiner: SONIFRANK, RICHA MISHRA
Art Unit: 2654
Tech Center: 2600 — Communications
Assignee: Google LLC
OA Round: 2 (Final)
Grant Probability: 66% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 3y 3m
With Interview: 91%

Examiner Intelligence

Career Allow Rate: 66% (above average; +4.0% vs TC avg), 250 granted / 379 resolved
Interview Lift: +24.9% (strong), among resolved cases with interview
Typical Timeline: 3y 3m average prosecution; 29 currently pending
Career History: 408 total applications across all art units
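The headline percentages in this panel can be reproduced from the raw counts it reports. A minimal sketch of that arithmetic (variable names are illustrative, not from any real API):

```python
# Reconstructing the dashboard's headline figures from the raw counts above.
granted = 250
resolved = 379

career_allow_rate = granted / resolved               # ~0.6596, shown as 66%
interview_lift = 0.249                               # +24.9 percentage points
with_interview = career_allow_rate + interview_lift  # ~0.9086, shown as 91%

print(f"{career_allow_rate:.1%}")  # 66.0%
print(f"{with_interview:.1%}")     # 90.9%
```

This confirms the "91% With Interview" figure is simply the career allow rate plus the observed interview lift, not an independently modeled probability.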

Statute-Specific Performance

§101: 16.6% (-23.4% vs TC avg)
§103: 56.1% (+16.1% vs TC avg)
§102: 11.2% (-28.8% vs TC avg)
§112:  8.2% (-31.8% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 379 resolved cases
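Because each statute reports both the examiner's rate and its delta against the Tech Center average, the baseline behind the chart's black line can be backed out as tc_avg = examiner_rate - delta. A quick check, using the figures from the panel above:

```python
# (examiner_rate_pct, delta_vs_tc_avg_pct) per statute, as shown in the panel.
rates = {"101": (16.6, -23.4), "103": (56.1, +16.1),
         "102": (11.2, -28.8), "112": (8.2, -31.8)}

for statute, (examiner_rate, delta) in rates.items():
    tc_avg = round(examiner_rate - delta, 1)
    print(f"§{statute}: examiner {examiner_rate}% vs TC avg {tc_avg}%")
# Every statute backs out the same 40.0% baseline, consistent with a single
# Tech Center average estimate rather than per-statute baselines.
```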

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This office action is sent in response to Applicant's communication received on 11/23/2023 for application number 18/564,859. The office hereby acknowledges receipt of the following items placed of record in the file: Specification, Abstract, Oath/Declaration, and Claims.

Status of the Claims

Claims 4-13 are amended. Claims 17-20 are added. Claims 1-20 are presented for examination.

Examiner's Note Regarding 101

Claims are not rejected under 101 since the claims do not recite an abstract idea. The independent claims relate to using a plurality of specific neural network models to perform the task. Based on the specification, the specific output generated by this combination of neural networks can perform the given task with reduced runtime latency, e.g., in terms of the wall clock time needed to perform an inference on an input. The claims are therefore consistent with Ex Parte Desjardins, which reads: "xiv. Improvements to computer component or system performance based upon adjustments to parameters of a machine learning model associated with tasks or workstreams; Ex Parte Desjardins, Appeal No. 2024-000567 (PTAB September 26, 2025, Appeals Review Panel Decision) (precedential)."

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.
Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 5-10, and 14-19 are rejected under 35 U.S.C. 103 as being unpatentable over Henderson (EP 3819809) in view of Morishita (US 20220229982) and further in view of Malkiel (US 20220318504).

Regarding claim 1, Henderson teaches a system for performing a machine learning task on an input sequence of characters that has a respective character at each of a plurality of character positions to generate a network output (first model 205 represents the unit text embedding, where unit text is a sequence of characters, Para 0063, Fig 5a), the system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform one or more operations to implement: a neural network configured to perform the machine learning task, the neural network comprising a sub-word tokenizer and an output neural network (each unit is converted to a list of units or tokens which includes subwords, Para 0062), the gradient-based sub-word tokenizer configured to: receive a sequence of character embeddings that includes a respective character embedding at each of the plurality of character positions (positional encodings of the subwords which include the character positions, Para 0084-0086, Fig 5a-c); and for each particular character position of the plurality of character positions: generate a plurality of candidate sub-word blocks, each candidate sub-word block comprising the respective character embeddings at each character position in a corresponding set of one or more continuous character positions that includes the particular character position (the subword embedding sequence is augmented with positional encodings. The positional encodings are in the form of vectors of length D, with one positional encoding vector corresponding to each embedding in the input sequence. The positional encoding vector is summed with the corresponding embedding in the sequence, Para 0086-0090; augmenting the positional embeddings of the subwords, Para 0086); generate a respective sub-word block embedding for each of the plurality of candidate sub-word blocks, comprising processing each of the plurality of sub-word block embeddings using a block scoring neural network (scoring, Fig 7 and Fig 8); and the output neural network configured to: receive an output neural network input derived from the latent sub-word representations at the plurality of character positions; and process the output neural network input to generate the network output for the machine learning task (output, Fig 7 and Fig 8; the task could be a translation task, Para 0106).

Henderson does not explicitly teach determining a respective relevance score for each of the plurality of sub-word block embeddings; and generating a latent sub-word representation for the particular character position, comprising determining a weighted combination of the plurality of sub-word block embeddings weighted by the relevance scores.

However, Morishita teaches determining a respective relevance score for each of the plurality of sub-word block embeddings (values of the subwords are determined, Para 0041-0044, Fig 3); and generating a sub-word representation for the particular character position, comprising determining a weighted combination of the plurality of sub-word block embeddings weighted by the relevance scores (all the values are merged and the meaning part of the word is determined, Para 0041-0044).

It would have been obvious having the teachings of Henderson to further include the concept of Morishita before the effective filing date to improve the processing efficiency of neural machine translation (Para 0080, Morishita).

Henderson does not teach a gradient-based sub-word tokenizer and an output neural network; a latent representation for the position.

However, Malkiel teaches a gradient-based sub-word tokenizer and an output neural network (gradient-based tokenization) and generating a latent representation for the position of the word (identify a word-pair from the set of word-pairs having a highest weight, wherein the identified word-pair having the highest weight is selected; [0138] scale at least one gradient map by a multiplication with the corresponding activation maps and summed across the feature dimensions to produce one or more saliency score(s) for every token associated with a selected paragraph; [0139] maximize the similarity score between the aggregated latent representation of a matched word associated with a description of the recommended item and a word associated with a description of the seed item; and [0140] aggregate token saliency scores associated with at least one word in an item description to generate word-scores, Para 0137-0140).

It would have been obvious having the teachings of Henderson and Morishita to further include the concept of Malkiel before the effective filing date to reduce prediction error (Para 0003, 0021, Malkiel).

Regarding claim 4, Henderson as above in claim 1 teaches, wherein the output neural network comprises one or more attention neural network layers that are each configured to: apply an attention mechanism to an attention layer input derived from the output neural network input to generate an attention layer output for the attention neural network layer (attention mechanism, Para 0084, 0089-0090, Henderson).

Regarding claim 5, Henderson as above in claim 1 teaches, wherein, for each particular character position of the plurality of character positions: each candidate sub-word block comprises the respective character embeddings at each of the one or more continuous character positions that begin from the particular character position (character positions, Fig 5a, Para 0084-0088, Henderson).

Regarding claim 7, Henderson as above in claim 1 teaches, wherein the gradient-based sub-word tokenizer is further configured to shift the sequence of character embeddings by one or more character positions prior to generating the plurality of candidate sub-word blocks (augmentation before generating the subword, Para 0086).

Regarding claim 10, Henderson as above in claim 1 teaches, wherein the gradient-based sub-word tokenizer is configured to determine the respective relevance score for each of the plurality of sub-word block embeddings based on applying a parameterized linear transformation function to each of the plurality of sub-word block embeddings (a parameterized linear transformation is applied to the embedding, Fig 6a, 6b, Henderson, and the scores are calculated for the embedding as shown in Fig 3, Morishita).

Regarding claim 12, arguments analogous to claim 1 are applicable. In addition, Henderson teaches one or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to implement: a neural network configured to perform a machine learning task on an input sequence of characters that has a respective character at each of a plurality of character positions to generate a network output (Para 0082-0086, Fig 5).

Regarding claim 13, arguments analogous to claim 1 are applicable.
Regarding claim 14, Henderson as above in claim 13 teaches training the neural network by jointly training the gradient-based sub-word tokenizer and the output neural network based on optimizing a supervised learning objective function (Fig 5b, Para 0111, 0113).

Regarding claim 19, arguments analogous to claim 4 are applicable.

Regarding claim 20, arguments analogous to claim 5 are applicable.

Claims 2, 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Henderson (EP 3819809) in view of Morishita (US 20220229982), further in view of Malkiel (US 20220318504), and further in view of Zhang (US 20210098134).

Regarding claim 2, Henderson modified by Malkiel as above in claim 1 does not explicitly teach wherein the gradient-based sub-word tokenizer is further configured to apply a down-sampling function to the latent sub-word representations at the plurality of character positions to generate the output neural network input.

However, Zhang teaches wherein the gradient-based sub-word tokenizer is further configured to apply a down-sampling function to the latent sub-word representations at the plurality of character positions to generate the output neural network input (downsampling the input through maxpooling, Para 0089-0090).

It would have been obvious having the teachings of Henderson, Morishita, and Malkiel to further include the concept of Zhang before the effective filing date since the downsampling reduces the dimensionality, which reduces complexity (Para 0090, Zhang).

Regarding claim 8, Henderson modified by Morishita and Malkiel as above in claim 1 does not explicitly teach wherein the gradient-based sub-word tokenizer is further configured to apply a 1-D convolution function to the sequence of character embeddings prior to generating the plurality of candidate sub-word blocks.

However, Zhang teaches wherein the gradient-based sub-word tokenizer is further configured to apply a 1-D convolution function to the sequence of character embeddings prior to generating the plurality of candidate sub-word blocks (the embeddings layer in the architecture internally learned the vectorized representations for characters. This was followed by two one-dimensional (1D) convolutions (with different kernel sizes) which tend to capture sub-word information from the input text. Maxpooling is a sample-based discretization process, Para 0089-0090, 0159, Claim 16).

It would have been obvious having the teachings of Henderson, Morishita, and Malkiel to further include the concept of Zhang before the effective filing date since the 1-D convolution will capture sub-word information, hence helping with the downstream task (Para 0159, Zhang).

Regarding claim 17, arguments analogous to claim 2 are applicable.

Allowable Subject Matter

Claims 3, 6, 9, 11, 15-16 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Richa Sonifrank, whose telephone number is (571) 272-5357. The examiner can normally be reached M-T 7AM - 5:30PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Phan Hai, can be reached at (571) 272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Richa Sonifrank/
Primary Examiner, Art Unit 2654
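To make the disputed claim-1 mechanism concrete, here is a minimal NumPy sketch of a tokenizer of the kind the claims recite: candidate sub-word blocks around each character position are embedded, scored by a parameterized linear transformation (per claim 10), and combined via softmax-normalized relevance weights. The mean-pooled block embedding and random weights are illustrative assumptions only; this is not the applicant's actual implementation.

```python
# Sketch of the claimed gradient-based sub-word tokenizer (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
num_chars, dim, max_block = 8, 4, 3

char_embeddings = rng.normal(size=(num_chars, dim))
scoring_weights = rng.normal(size=dim)  # stand-in for the block scoring network

latent = np.zeros((num_chars, dim))
for pos in range(num_chars):
    # Candidate sub-word blocks: contiguous spans of up to max_block
    # characters that include position `pos`.
    blocks = []
    for start in range(max(0, pos - max_block + 1), pos + 1):
        end = min(num_chars, start + max_block)
        blocks.append(char_embeddings[start:end].mean(axis=0))  # block embedding
    blocks = np.stack(blocks)

    # Relevance score per block embedding via a linear transform, normalized
    # with softmax so gradients flow through the (soft) tokenization choice,
    # which is what makes the tokenizer trainable end to end.
    scores = blocks @ scoring_weights
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()

    # Latent sub-word representation: weighted combination of block embeddings.
    latent[pos] = weights @ blocks

print(latent.shape)  # (8, 4)
```

The soft weighted combination, rather than a hard pick of one block, is the detail the examiner maps to Morishita: an argmax over blocks would not be differentiable, while the softmax mixture is.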

Prosecution Timeline

Nov 28, 2023: Application Filed
Dec 18, 2025: Non-Final Rejection — §103
Mar 23, 2026: Applicant Interview (Telephonic)
Mar 27, 2026: Response Filed
Apr 02, 2026: Examiner Interview Summary
Apr 13, 2026: Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602552: Machine-Learning-Based OKR Generation (2y 5m to grant; granted Apr 14, 2026)
Patent 12603085: Entity Level Data Augmentation in Chatbots for Robust Named Entity Recognition (2y 5m to grant; granted Apr 14, 2026)
Patent 12585883: Computer Implemented Method for the Automated Analysis or Use of Data (2y 5m to grant; granted Mar 24, 2026)
Patent 12585877: Grouping and Linking Facts from Text to Remove Ambiguity Using Knowledge Graphs (2y 5m to grant; granted Mar 24, 2026)
Patent 12579988: Method and Apparatus for Controlling Audio Frame Loss Concealment (2y 5m to grant; granted Mar 17, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 66%
With Interview: 91% (+24.9%)
Median Time to Grant: 3y 3m
PTA Risk: Moderate
Based on 379 resolved cases by this examiner. Grant probability derived from career allow rate.
