Last updated: May 29, 2026

Application No. 18/749,483

TOKENIZING PROGRAMMING CODE WITH CANONICAL REPRESENTATIONS

Non-Final OA §102

Filed: Jun 20, 2024
Priority: Jun 23, 2023 (provisional 63/509,953)
Examiner: AGUSTIN, PETER VINCENT
Art Unit: 2688
Tech Center: 2600 (Communications)
Assignee: Aurora Labs Ltd.
OA Round: 1 (Non-Final)

Interview: Optional. The interview lift (+12.6%) is below the 15.0% threshold, so a written response is recommended. Based on 865 resolved cases, 2023–2026.

Examiner Intelligence: AGUSTIN, PETER VINCENT

Career Allowance Rate: 84%, above average (726 granted / 865 resolved; +21.9% vs TC avg)
Interview Lift: +12.6%, moderate (resolved cases with interview vs. without)
Avg Prosecution: 1y 11m, fast prosecutor (6 currently pending)
Career history: 870 total applications across all art units

Statute-Specific Performance

§101: 3.7% (-36.3% vs TC avg)
§103: 45.2% (+5.2% vs TC avg)
§102: 29.6% (-10.4% vs TC avg)
§112: 9.8% (-30.2% vs TC avg)

TC avg values are Tech Center average estimates. Based on career data from 865 resolved cases.

Office Action (§102)

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-10, 12-21, 23 & 24 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ben-Artzi et al. (US 2009/0313613).
In regard to claim 1, Ben-Artzi et al. discloses a non-transitory computer-readable medium (see paragraph 0010) including instructions that, when executed by at least one processor, cause the at least one processor to perform operations for creating and using tokens representing portions of programming code (see abstract), the operations comprising: identifying a body of programming code (paragraph 0022: “source programming language code”); associating a plurality of tokens with respective portions of the body of programming code, wherein the associating comprises determining at least one canonical representation of at least one of the respective portions of the body of programming code (paragraph 0024: “Tokenizer 106 transforms streams of characters from the source programming language code into a list of tokens.”; see also paragraph 0033); configuring model input data for a code language processing model, wherein the model input data comprises the plurality of tokens including the at least one canonical representation (paragraph 0035: “In an embodiment of the invention, list of tokens 406 comprises columns of token list 408a and token type list 408b. Token list 408a comprises the tokens generated from input stream 402 and the token type list 408b comprises the description for the type of tokens. Tokens in list of tokens 406 are categorized block of text. Referring to list of tokens 406, the token `Sum` in tokens 408a is defined by tokenizer 106 as an `identifier` in type 408b. Similarly, the complete programming code of the source programming language can be processed to form a list of tokens. Subsequently, list of tokens 406 is processed by parser 108 to generate structured information.”); and analyzing at least a part of the body of programming code using the code language processing model influenced by the model input data (paragraph 0050: “the list of token is analyzed syntactically by parser 108 to generate a grammatical data structure, at step 804. 
In an embodiment of the invention, the grammatical data structure is a hierarchical data structure and is referred to as an Abstract Syntax Tree (AST). Thereafter, at step 806, the AST is processed by generator 110 to generate a document object model. Document object model is a simplified grammatical data structure in a hierarchical data structure format. Subsequently, the document object model is processed by analyzer 112 to generate a target list of tokens. The target list of tokens is thereafter processed by analyzer 112 to generate the target programming language code, at step 808”).
In regard to claim 2, Ben-Artzi et al. discloses that determining the at least one canonical representation comprises determining the at least one canonical representation from among a plurality of canonical representations, each of the canonical representations representing multiple programming code elements (see paragraph 0035).
In regard to claim 3, Ben-Artzi et al. discloses that the multiple programming code elements are associated with different programming languages (see paragraph 0023).
In regard to claim 4, Ben-Artzi et al. discloses that the multiple programming code elements are associated with different bodies of programming code (see paragraph 0035).
In regard to claim 5, Ben-Artzi et al. discloses that associations between the multiple programming code elements and the canonical representations are determined using the code language processing model (see paragraph 0035).
In regard to claim 6, Ben-Artzi et al. discloses that the associations between the multiple programming code elements and the canonical representations are determined by applying the code language processing model to the different bodies of programming code (see paragraph 0035).
In regard to claim 7, Ben-Artzi et al. discloses that the at least one canonical representation represents different code elements with a same functionality (see paragraph 0046).
In regard to claim 8, Ben-Artzi et al. discloses that the at least one canonical representation represents different code elements with functionalities within a similarity threshold range (see paragraphs 0037-0038).
In regard to claim 9, Ben-Artzi et al. discloses that the operations further comprise identifying a portion of the body of programming code for token designation (see paragraph 0036).
In regard to claim 10, Ben-Artzi et al. discloses that the operations further comprise: determining functionality of the identified portion; and based on the functionality, designating a new token for association with the identified portion (suggested in paragraph 0036).
Claims 12-21 have limitations similar to those of claims 1-10 and are therefore rejected on the same grounds.
In regard to claim 23, Ben-Artzi et al. discloses a non-transitory computer-readable medium (see paragraph 0010) including instructions that, when executed by at least one processor, cause the at least one processor to perform operations for creating and using tokens representing portions of programming code (see abstract), the operations comprising: identifying a body of programming code (paragraph 0022: “source programming language code”); associating a plurality of tokens with respective portions of the body of programming code to generate a token-based representation of the body of programming code, wherein the associating comprises determining at least one canonical representation of at least one of the respective portions of the body of programming code (paragraph 0024: “Tokenizer 106 transforms streams of characters from the source programming language code into a list of tokens.”; see also paragraph 0033); providing the token-based representation of the body of programming code to an emulator, the emulator being configured to interpret token-based representations (see paragraphs 0025 & 0026); and receiving, from the emulator, an emulation result (see paragraphs 0025 & 0026).
In regard to claim 24, Ben-Artzi et al. discloses that the emulator is not configured to interpret assembly language (see paragraphs 0004 & 0005).
Allowable Subject Matter
Claims 11 & 22 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
The prior art made of record and not relied upon (see attached PTO-892 form) is considered pertinent to applicant's disclosure.
Gosling (US 5,367,685) discloses a compiler comprising a lexical analyzer and parser, an intermediate representation builder, a semantic analyzer, and a code generator, wherein these elements are sequentially coupled to each other, and together, they transform a program source code into tokenized statements, intermediate representations, annotated intermediate representations, and ultimately intermediate form code with data references made on a symbolic basis.
Nackman et al. (US 6,182,281) discloses a computer-implemented method for compiling a C++ source code program in an enhanced compiler effecting lexical analysis to tokenize the source code program, parsing and semantic analysis to produce an intermediate representation of the source code program, comprising the steps of: parsing the tokenized source code program in any order with respect to declarations in the program through multiple parsing passes, each pass accumulating information to parse the declarations in the source code program for which all identifiers are unknown, from program definitions, wherein the multiple parsing passes comprise an initial pass that parses only type declarations, a second pass that parses types of functions and variables, and a third pass that parses variable initializers and function bodies.
Ota (US 7,657,878) discloses a compile apparatus for generating object code from the application program comprising a lexical analyzer configured to divide an operation described in a source code of the application program into tokens, a syntax analyzer configured to analyze whether or not the tokens conform to grammatical rules.
Kraft (US 2013/0212563) discloses a symbol database including a tokenized representation of a program code which is a higher-level representation where the characters of the program code text have been converted into lexemes (also known as tokens), according to the grammar of the programming language at hand.
Olson et al. (US 2021/0056211) discloses obtaining source code from a client codebase, wherein the client codebase is a complete or an incomplete body of the source code for a given software program or an application; and using a machine learning (ML) model to perform a ML based analysis on an abstract syntax tree (AST) for detecting a first security vulnerability over a static source code.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Peter Vincent Agustin whose telephone number is (571) 272-7567.  The examiner can normally be reached on Monday - Thursday 8:30 am - 6:30 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Steven Lim can be reached on 571-270-1210.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/Peter Vincent Agustin/
Primary Examiner, Art Unit 2688


Prosecution Timeline

Jun 20, 2024: Application Filed
Dec 31, 2025: Non-Final Rejection mailed, §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology:

19/170,063 (Patent 12626721): MAGNETIC TAPE, MAGNETIC TAPE CARTRIDGE, SERVO PATTERN RECORDING DEVICE, MAGNETIC TAPE DRIVE, MAGNETIC TAPE SYSTEM, DETECTION DEVICE, INSPECTION DEVICE, SERVO PATTERN RECORDING METHOD, MANUFACTURING METHOD OF MAGNETIC TAPE, DETECTION METHOD, AND INSPECTION METHOD. 1y 1m to grant; granted May 12, 2026.

18/981,954 (Patent 12603106): DISK DEVICE. 1y 3m to grant; granted Apr 14, 2026.

18/884,444 (Patent 12597440): MAGNETIC DISK DEVICE. 1y 6m to grant; granted Apr 07, 2026.

18/753,626 (Patent 12586603): DISK DEVICE. 1y 9m to grant; granted Mar 24, 2026.

18/865,779 (Patent 12579998): METHOD FOR STORING AND ACQUIRING INFORMATION USING FLUORESCENCE DEFECTS IN WIDE BANDGAP MATERIALS. 1y 4m to grant; granted Mar 17, 2026.

Study what changed to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 84%
With Interview: 96% (+12.6%)
Median Time to Grant: 1y 11m (~0m remaining)
PTA Risk: Low

Based on 865 resolved cases by this examiner. Grant probability derived from career allowance rate.