Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This action is in response to the claim listing filed on 02/13/2026.
Claims 1, 3-10, 12-19 are pending.
Response to Arguments
This is in response to the Remarks filed on 02/13/2026. The amendment necessitated the new grounds of rejection presented in this Action. Therefore, Applicant's submissions in the Remarks are moot in view of the prior art added in this Action.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3-4, 7-10, 12-13, and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Taviss et al., “Asm2Seq: Explainable Assembly Code Functional Summary Generation for Reverse Engineering and Vulnerability Analysis”, May 18, 2023, 25 pages (Applicant’s submitted prior art #2 in IDS, receipt date 11/01/2024; hereinafter: Taviss), in view of Taviss, “Asm2Seq: Explainable Assembly Code Functional Summary Generation”, 2021, Master’s Thesis, Queen’s University, Canada, 103 pages (hereinafter: Taviss Thesis), and further in view of Kusupati et al., “Natural Language to Code Using Transformers”, 2022, arXiv, 7 pages (Applicant’s submitted prior art #7 in IDS, receipt date 09/19/2023; hereinafter: Kusupati).
As per Claim 1: Taviss discloses the limitations in bold below:
1. (Currently Amended) A method for creating a model to add a code summary to functions of assembly language code, the method comprising:
tokenizing an assembly code dataset, wherein the assembly code dataset comprises [the functions of assembly language code and comment pairings];
(See p. 2, within the second bullet of the Introduction sec., ‘We created the first labelled datasets for binary code summarization….. The creation of the assembly-description pairs will be very beneficial for future work on developing an AI model that can truly understand the semantics of assembly code. The complete datasets are available at….’. See p. 3, within sec. 2.2, ‘A more simplified approach models code summarization as a machine translation problem, where the input is represented through a sequence of tokens, and the output is the natural language equivalent’.)
Thus, in the creation of the dataset (the input as a sequence of tokens), Taviss mentions that a complete dataset of assembly-description pairs will be very beneficial, but does not explicitly mention:
wherein the assembly code dataset comprises [the functions of assembly language code and comment pairings].
Taviss Thesis discloses “the assembly code dataset comprises the functions of assembly language code and comment pairings” (See Taviss Thesis, p. 5, lines 1-6: “Specially, we compile the Juliet Test Suite and the NDSS18 vulnerable source code dataset and link them with their corresponding vulnerability descriptions. In total, we generated 97,492 unique pairs. The creation of the assembly-description pairs will be very beneficial to future work on developing an AI model that can truly understand the semantics of assembly code”. On p. 27, within sec. 3.1.2, Code Summarization and Generation, second para.: “The comments included throughout source code are often a good means for producing an accurate summary of a codes purpose. In this regard, text summarization techniques are useful as comments are generally written in natural language. Due to potentially a multitude of reasons, software developers and programmers do not always include commenting in their code”.
P. 56, last four lines: “The assembly instructions associated with one source code file are combined into a string of tokens. This string of tokens is related to the description extracted from the source code file used to create these assembly instructions.”)
Thus, Taviss Thesis suggests that comments on the assembly code/subroutines/functions in a program would be helpful for summarization, and this suggestion shows that “assembly-description pair” is another term for assembly language code and comment pairings.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the application to combine the tokenized assembly-description pairs of Taviss with the code and comment pairings in the tokenized dataset of Taviss Thesis, as a necessity for model understanding and because it would be helpful for code summarization.
The claim continues to recite, where Taviss further discloses the limitations in bold below:
inputting the tokenized assembly code dataset to a pre-trained transformer-based model,
the pre-trained transformer-based model [having an architecture comprising self-attention layers];
(See Taviss above, “created the first labelled datasets”, and p. 20, sec. 10: ‘We focus on qualitatively evaluating the summary generated from the previously trained model from Juliet and NDSS dataset on both in-sample and out-of-sample CWE categories.’)
using an encoder to create fixed length embeddings;
(See Taviss, p. 5, within sec 3.1 Encoder, ‘An encoder is a neural network that processes the variable-length input sequences to a fixed-length vector’)
and using a decoder on the fixed length embeddings to generate the code summary.
(See Taviss, p. 6, within sec 3.2 Decoder ‘A decoder maps the fixed-length vectors back into variable-length sequences’; and see Fig. 1, Asm2Seq).
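For illustration of the encoder/decoder mechanism cited from Taviss secs. 3.1-3.2 (a variable-length token sequence compressed to a fixed-length vector, then expanded back into a variable-length sequence), a minimal sketch follows. The averaging encoder and nearest-neighbor decoder below are hypothetical simplifications for illustration only, not the actual Asm2Seq model:

```python
# Sketch: encoder maps any-length token sequence to a fixed-length
# vector; decoder maps the fixed-length vector back to tokens.
# Pseudo-embeddings via a toy string hash are illustrative only.

def hash_tok(tok):
    h = 0
    for ch in tok:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h

def encode(tokens, dim=4):
    """Average per-token pseudo-embeddings into one fixed-length vector."""
    vec = [0.0] * dim
    for tok in tokens:
        for i in range(dim):
            vec[i] += ((hash_tok(tok) >> (8 * i)) & 0xFF) / 255.0
    return [v / max(len(tokens), 1) for v in vec]

def decode(vector, vocab, length=3):
    """Toy decoder: emit `length` tokens whose encodings lie closest
    to the fixed-length vector."""
    out = []
    for _ in range(length):
        best = min(vocab, key=lambda w: sum((a - b) ** 2
                                            for a, b in zip(encode([w]), vector)))
        out.append(best)
    return out

# The embedding length stays fixed no matter the input length.
emb = encode(["mov", "eax", "1", "ret"])
```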
Taviss in combination with Taviss Thesis does not explicitly disclose the limitation in italics below:
the pre-trained transformer-based model [having an architecture comprising self-attention layers];
Kusupati discloses “the pre-trained transformer-based model having an architecture comprising self-attention layers” (See Figure 1 on p. 2; the transformer model includes pre-training per p. 5, sec. 3.4: “FINETUNE: We first pre-train the transformer using the mined data and then finetune the model using annotated data alone.”
On p. 2, Figure 1 shows layers of “Multi-Head Attention”, and sec. 2.1 (left col.) states, “A transformer is similar to many sequence to sequence models in the sense that it contains an encoder and decoder to compress the sentence to an encoding and further generate each token conditioned on the previous tokens.”, and (right col.) “In addition to this generic setup of self-attention and fully connected network, the decoder layer contains another attention in between them, which computes attention specific to each output word over the input encodings from the output of the encoder stack.”)
The self-attention layers in the pre-trained transformer of Kusupati help tokens interact with one another, resolving ambiguity during training.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the application to combine the teaching of tokenized assembly-description pairs in Taviss with the code and comment pairings in the tokenized dataset of Taviss Thesis, and further with the teaching of self-attention layers carrying positional information over long token sequences in Kusupati.
The combination would yield predictable results because self-attention layers in the transformer model would help code summarization by binding information, resolving ambiguity, and preserving information from being lost, thus improving the performance of code summarization.
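For illustration of the self-attention computation described in Kusupati sec. 2.1, a minimal sketch follows. The identity query/key/value projections and the toy two-dimensional embeddings are simplifying assumptions, not Kusupati's actual model:

```python
# Sketch of scaled dot-product self-attention: each token attends to
# every token in the sequence, letting context resolve ambiguity.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(X):
    """out[i] = sum_j softmax_j( X[i].X[j] / sqrt(d) ) * X[j],
    using identity Q/K/V projections as a simplification."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)  # attention over all positions, sums to 1
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

# three hypothetical token embeddings of dimension 2
Y = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Each output row is a convex combination of the input rows, i.e., every token's new representation mixes in information from the whole sequence.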
As per Claim 3: Taviss in view of Taviss Thesis, and further in view of Kusupati, where Taviss discloses the limitations in bold below:
3. (Previously presented) The method of claim 1, wherein the dataset is created by:
retrieving source code [with comment pairings]; compiling the source code to create a binary output; (Taviss: p. 2, Fig. 1, and in p. 4, item 4)
disassembling the binary output to assembly language code; and
correlating functions within the assembly language code and the source code to associate the comment pairings with the assembly language code.
(Taviss: p. 4, sec. 2.3, using disassembly tool IDA Pro, etc., and “A similar study using RNNs predicts function types from disassembled binary code functions..” )
Taviss does not mention “comment pairings” (See the rationale addressed in claim 1.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the application to combine the tokenized assembly-description pairs of Taviss, further in view of Kusupati, with the code and comment pairings in the tokenized dataset of Taviss Thesis, as a necessity for code summarization.
As per Claim 4: Taviss in view of Taviss Thesis, and further in view of Kusupati, where Taviss further discloses:
4. (Previously presented) The method of claim 1, further comprising training the pre-trained transformer-based model with a subset of the assembly code dataset and testing the model using a further subset of the assembly code dataset.
(Taviss, in p. 14: sec. 8.2, Experiments, “The validation set is the sample of data used to evaluate the model on the training dataset and is frequently used to fine-tune the hyperparameters. The testing set is the sample of data used to evaluate the final model after training is complete.”: The term fine-tune is the training step after pre-training. And see p.9, within sec. 5.1, ‘Rather than encoding a whole input sequence into a single fixed-length vector, attention models encode input sequences into a series of vectors and choose a subset of these vectors as the model decodes and predicts an output.’)
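For illustration of the train/validation/test partitioning Taviss describes in sec. 8.2, a minimal sketch follows. The 80/10/10 split ratio is an illustrative assumption, not a figure from Taviss:

```python
# Sketch: split an assembly-description pair dataset into a training
# subset, a validation subset (for hyperparameter fine-tuning), and
# a held-out test subset used to evaluate the final model.

def split_dataset(pairs, train=0.8, val=0.1):
    n = len(pairs)
    n_train = int(n * train)
    n_val = int(n * val)
    return (pairs[:n_train],
            pairs[n_train:n_train + n_val],
            pairs[n_train + n_val:])

# hypothetical assembly/description pairs
data = [(f"asm_{i}", f"desc_{i}") for i in range(100)]
train_set, val_set, test_set = split_dataset(data)
```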
As per Claim 7: Taviss in view of Taviss Thesis, and further in view of Kusupati, where Taviss further discloses:
7. The method of claim 1, wherein the fixed length embeddings are further created using padding and truncation.
(Taviss, p. 13, within sec. 8.1.2, ‘Sequences are truncated or padded with zeroes until they have the appropriate length to ensure all samples are of the same length.’)
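The cited mechanism from Taviss sec. 8.1.2 (sequences truncated or padded with zeroes to a common length) can be sketched minimally as follows; the token ids and the fixed length of 6 are illustrative assumptions:

```python
# Sketch: bring every token-id sequence to the same fixed length by
# truncating long sequences and padding short ones with zeroes.

def pad_or_truncate(seq, length, pad=0):
    if len(seq) >= length:
        return seq[:length]          # truncate to the fixed length
    return seq + [pad] * (length - len(seq))  # pad with zeroes

batch = [pad_or_truncate(s, 6) for s in [[5, 9, 2], [1, 2, 3, 4, 5, 6, 7, 8]]]
```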
As per Claim 8: Taviss in view of Taviss Thesis, and further in view of Kusupati, where Taviss further discloses:
8. The method of claim 7, wherein the fixed length is optimized for accuracy and model training time.
(Taviss, p. 10, entire sec. 6: Optimization)
As per Claim 9: Taviss in view of Taviss Thesis, and further in view of Kusupati, where Taviss further discloses:
9. The method of claim 1, wherein each of the fixed length embeddings is a contextual vector representation of an input token.
(Taviss, p. 5-6, sec. 3.1, Encoder, ‘An encoder is a neural network that processes the variable-length input sequences to a fixed-length vector ’; p. 6, sec. 3.2, Decoder ‘A decoder maps the fixed-length vectors back into variable-length sequences’)
As per Claims 10, 12-13, 16-18:
Claims 10, 12-13, and 16-18 recite a device, where the claims recite limitations that perform the method of claims 1, 3-4, and 7-9. These claims are rejected with the same rationales as addressed in the rejection of method claims 1, 3-4, and 7-9.
As per Claim 19:
Claim 19 recites a non-transitory computer readable medium, where the claim recites limitations that perform the method of claim 1. The claim is rejected with the same rationales as addressed in the rejection of method claim 1.
Claims 5-6 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Taviss, “Asm2Seq: Explainable Assembly Code Functional Summary Generation for Reverse Engineering and Vulnerability Analysis”, May 18, 2023, 25 pages (Applicant’s submitted prior art #2 in IDS, receipt date 11/01/2024; hereinafter: Taviss), in view of Taviss, “Asm2Seq: Explainable Assembly Code Functional Summary Generation”, 2021, Master’s Thesis, Queen’s University, Canada, 103 pages (hereinafter: Taviss Thesis), in view of Kusupati et al., “Natural Language to Code Using Transformers”, 2022, arXiv, 7 pages (Applicant’s submitted prior art #7 in IDS, receipt date 09/19/2023), and further in view of Feng et al., “CodeBERT: A Pre-Trained Model for Programming and Natural Languages”, 2020, 12 pages (Applicant’s submitted prior art #38 in IDS, receipt date 09/19/2023; hereinafter: Feng).
As per Claim 5: Regarding,
5. The method of claim 1, wherein the pre-trained transformer-based model is a CodeBERT model.
As per the above limitation, Taviss, in view of Taviss Thesis and Kusupati, does not explicitly mention that the model is “a CodeBERT model”.
Feng discloses that the pre-trained transformer-based model is a CodeBERT model (See Feng, Abstract; sec. 3, CodeBERT, on page 3). CodeBERT is an available pre-trained machine-learning model that operates bimodally on both programming language and natural language.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the application to include and utilize the pre-trained CodeBERT of Feng as the pre-trained model of Taviss, in view of Taviss Thesis and Kusupati, conforming to the availability of the model.
As per Claim 6: Regarding,
6. The method of claim 1, wherein the tokenizing is performed by a WordPiece tokenizer.
As per the above limitation, Taviss shows assembly code being tokenized into words and then mapped into numbers (p. 12, Fig. 8), but
Taviss, in view of Taviss Thesis and Kusupati, does not explicitly mention that the tokenizing is performed by “a WordPiece tokenizer”.
Feng discloses that tokenizing is performed by “a WordPiece tokenizer” (See page 3, within sec. 3.2, ‘Following the standard way of processing text in Transformer, we regard a natural language text as a sequence of words, and split it as WordPiece (Wu et al., 2016). We regard a piece of code as a sequence of tokens.’) Tokenizing a dataset with WordPiece is standardized for languages that use Latin characters and is used bimodally.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the application to perform the tokenizing of Taviss, in view of Taviss Thesis and Kusupati, with the WordPiece tokenizer of Feng, conforming to the standard and its availability.
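For illustration of the WordPiece scheme Feng cites (Wu et al., 2016), a minimal sketch of the greedy longest-match-first segmentation follows; the tiny subword vocabulary and the “##” continuation convention shown are illustrative assumptions:

```python
# Sketch: WordPiece-style segmentation of a word against a subword
# vocabulary. At each position, the longest matching piece is taken;
# "##" marks word-internal pieces; unmatched words map to [UNK].

VOCAB = {"token", "##iz", "##er", "[UNK]"}  # hypothetical tiny vocabulary

def wordpiece(word, vocab=VOCAB):
    pieces, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # word-internal continuation marker
            if sub in vocab:
                cur = sub         # longest match found at this position
                break
            end -= 1
        if cur is None:
            return ["[UNK]"]      # no piece matches: unknown word
        pieces.append(cur)
        start = end
    return pieces
```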
As per Claims 14-15:
Claims 14-15 recite a device, where the claims recite limitations that perform the method of claims 5-6. These claims are rejected with the same rationales as addressed in the rejection of method claims 5-6.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ted T Vo whose telephone number is (571)272-3706. The examiner can normally be reached 8am-4:30pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Wei Y Mui can be reached on (571) 272-3708. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
TTV
March 21, 2026
/Ted T. Vo/
Primary Examiner, Art Unit 2191