Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This action is in response to the communication filed on 02/28/2024.
Claims 1-20 are pending and addressed in the Action.
Specification
The blanks in paragraph [0001] of the specification should be filled in with the available data.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of copending Application No. US18/589,818 (reference application: US20250272089A). Although the claims at issue are not identical, they are not patentably distinct from each other because:
Current Application’s Claims
1. A method, comprising:
processing a binary code by using a file encoder model to obtain a file embedding vector; and
selecting one or more natural language description samples based on the file embedding vector and a distance function.
2. The method of claim 1, wherein the one or more natural language description samples are selected based on a text embedding vector of the one or more natural language description samples, wherein the text embedding vector and the file embedding vector have the same dimension.
3. The method of claim 2, wherein the text embedding vector is generated by using a text language model.
4. The method of claim 1, wherein the file encoder model is trained based on a training set of description sample pairs, wherein each description sample pair in the training set includes a description text sample and a binary code sample.
5. The method of claim 1, wherein the file encoder model comprises a pretrained embedding model in sequence with a translator model.
6. The method of claim 1, wherein the one or more natural language description samples are selected by using a k-nearest neighbors algorithm (k-NN).
7. The method of claim 1, further comprising: generating a text description of the binary code based on the one or more natural language description samples by using a large language model (LLM).
Claims of copending Application No. US18/589,818
1. A method, comprising:
processing a binary code by using a file encoder model to obtain a file embedding vector; and
selecting one or more source code samples based on the file embedding vector and a distance function.
2. The method of claim 1, wherein the one or more source code samples are selected based on a source code embedding vector of the one or more source code samples, wherein the source code embedding vector and the file embedding vector have a same dimension.
3. The method of claim 2, wherein the source code embedding vector is generated by using a text language model.
4. The method of claim 1, wherein the file encoder model is trained based on a training set of source code sample pairs, wherein each source code sample pair in the training set includes a source code training sample and a binary code sample.
5. The method of claim 1, wherein the file encoder model comprises a pretrained embedding model and a translator model.
6. The method of claim 1, wherein the one or more source code samples are selected by using a k-nearest neighbors algorithm (k-NN).
7. The method of claim 1, further comprising: generating a text description of the binary code based on the one or more source code samples by using a large language model (LLM).
Claims 8-14: The claims are directed to a computer-readable medium having the same functionality recited in the method of Claims 1-7. Claims 8-14 are not patentably distinct from Claims 8-14 of the copending Application under the same comparison above.
Claims 15-20: The claims are directed to a computer-implemented system having the same functionality recited in the method of Claims 1-6. Claims 15-20 are not patentably distinct from Claims 15-20 of the copending Application under the same comparison above.
The current claims recite selecting one or more natural language description samples, while the copending claims recite selecting one or more source code samples. The samples recited in both sets of claims are text. There is no structural difference that makes them patentably distinct; rather, the claims present a single structure of the same invention in which only the inputs are selectable.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the applications to select the training dataset (for example, natural language descriptions in the present Application, or the text of a program in the copending Application), and the trained model would generate results accordingly; thus, the selections are merely the input choices of a user into the same system.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 8-14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent-eligible subject matter because the claims recite “a computer-readable medium,” while the specification does not specifically limit “a computer-readable medium.” In the specification at [0098], it discloses,
“… one or more modules of computer program instructions encoded on a tangible, non transitory, computer-readable medium for execution by, or to control the operation of, a computer or computer-implemented system…”.
The claims do not recite “non-transitory.” Thus, the claims are not consistent with the specification, and the broadest reasonable interpretation (BRI) of the claims drawn to a computer-readable medium covers both statutory and non-statutory embodiments, i.e., the medium in the claims encompasses a signal per se, a non-statutory embodiment. See MPEP 2106.03(II). It is suggested that the claims be amended to recite a “non-transitory” computer-readable medium. Accordingly, Claims 8-14 fail to recite statutory subject matter under 35 U.S.C. 101.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Shanahan, U.S. Patent Application Publication No. US 2024/0152749 A1 (effective filing date: 05/27/2021), in view of Liu et al., “Codeformer: A GNN-Nested Transformer Model for Binary Code Similarity Detection,” 2023, Electronics, 18 pages.
As per Claim 1: Shanahan discloses the limitations shown in bold below:
1. A method, comprising:
processing [a binary code] by using a file encoder model to obtain a file embedding vector; and
(See Figure 1, where Figure 1 shows processing a “training data item” [see ‘#310’ in Figure 3] using model ‘#100’ having ‘Encoder #115’ [i.e., a file encoder model] to obtain ‘Encoding #120’ [i.e., a file embedding vector].
See [0013]: “..neural networks may be associated with a respective key. The method may further comprise determining a similarity between the encoding and each respective key; and selecting a subset of neural networks may be based upon the determined similarity.”; [0014]: “The respective keys may be generated by sampling a probability distribution based upon the embedding space represented by the encoder. That is, the keys may be vectors sampled from the embedding space represented by the encoder. In one example, the embedding space has dimensionality 512. In another example, the embedding space has dimensionality 2048…”; and [0041]: “…each of the plurality of neural networks may be associated with a respective key and the selection of a subset of neural networks 130…. The respective keys and the encoding 120 may therefore reside in the same latent embedding space.”
Thus, the ‘embedding space represented by the encoder’ corresponds to Encoding #120; the ‘keys’ associated with Neural Networks 1…N are embedding vectors; and ‘Encoding #120’ reads on the ‘file embedding vector’ since it is obtained from Encoder #115 in Figure 1.
Furthermore, see [0049]: “[T]he encoder may be represented as: z=f(x)∈ Rd where x is the input data item provided to the encoder which implements function f to provide the encoding z.” This shows that z is a d-dimensional vector).
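For orientation only, the quoted relation z=f(x)∈ Rd can be mirrored by a toy encoder; the weights, sizes, and activation below are hypothetical stand-ins, not Shanahan’s actual model:

```python
import math

d = 4                                   # hypothetical dimensionality of the embedding space
W = [[0.5, -0.2, 0.1],                  # stand-in encoder weights (purely illustrative)
     [0.0, 0.3, -0.4],
     [0.2, 0.2, 0.2],
     [-0.1, 0.0, 0.6]]

def f(x):
    """Toy encoder f: maps an input item x in R^3 to an encoding z = f(x) in R^d."""
    return [math.tanh(sum(w * xj for w, xj in zip(row, x))) for row in W]

z = f([1.0, 2.0, 3.0])                  # the encoding z is a d-dimensional vector
```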
selecting one or more natural language description samples based on the file embedding vector and a distance function.
(See Figure 3, ‘#315’, “Select a subset of neural networks,” and see [0083]: “…the neural network-based system is a sequence representing a spoken utterance, the output generated by the neural network may be a score for each of a set of pieces of text… As another example, if the input to the neural network-based system is a sequence representing a spoken utterance, the output generated by the neural network-based system can identify the natural language in which the utterance was spoken. …” [compared to the limitations in light of paragraph [0081] of the specification]. Thus, each of Neural Networks 1…N in #125 in Figure 1 reads on ‘one or more natural language description samples’, and a ‘Neural Network i’ in #125 in Figure 1 is selected based on Encoding #120, the ‘file embedding vector’.
See [0013]: “The similarity may be based upon a cosine distance between the encoding and the respective key. Thus, the k-nearest neighbors of keys to the encoding may be selected”. See also [0052]: “any other suitable distance metric may be used such as Euclidean distance”)
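The quoted selection mechanism, a cosine distance between the encoding and each key followed by taking the k-nearest keys, can be sketched as follows; the key matrix, sizes, and values are hypothetical placeholders, not taken from Shanahan:

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

def k_nearest_keys(encoding, keys, k):
    """Indices of the k keys nearest to the encoding under cosine distance."""
    order = sorted(range(len(keys)), key=lambda i: cosine_distance(encoding, keys[i]))
    return order[:k]

# Hypothetical memory of n = 4 keys, one per neural network, each of dimension d = 3.
M_key = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.7, 0.7, 0.0],
         [0.0, 0.0, 1.0]]
z = [0.9, 0.1, 0.0]                     # the encoding produced by the encoder
subset = k_nearest_keys(z, M_key, k=2)  # selects the two most similar networks' keys
```

Consistent with the citation to [0052], a Euclidean metric could be substituted for `cosine_distance` without changing the selection structure.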
In the processing using the encoder model as in #100 of Figure 1, Shanahan does not explicitly mention a “binary code”.
Liu discloses the missing “binary code” (Liu: p. 6, Figure 2, where the yellow file is a binary code file processed through Transformers, and the Transformer model architecture shown in Figure 1, p. 4, is an encoder model. See p. 5, sec. 3, Codeformer: “a normalized analysis of binary assembly code can determine that the curves of the instruction distribution conform to Zipf’s law, similar to natural language”. Moreover, see the Introduction, p. 1: “…to determine the similarity of binary code because a large amount of program semantics will be lost during the compilation.”)
Thus, in Liu the encoder is used to detect binary code similarity, while in Shanahan the encoder model detects the similarity of the data item entering the encoder to target data, where the data item may be any type of data, given the loss of program semantics during compilation.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the application to combine the training data item provided to an encoder in Shanahan with the teaching of binary code as input to the encoder model of Liu. The combination would yield predictable results because of the shared similarity detection purpose: one would choose a machine learning model provided with an encoder for similarity detection, and thus the encoder model would be the natural choice for any type of data in similarity detection.
As per Claim 2: In the combination of Shanahan and Liu, Shanahan further discloses:
2. The method of claim 1, wherein the one or more natural language description samples are selected based on a text embedding vector of the one or more natural language description samples, (referring to the ‘keys’ of Neural Networks 1…N in #125; see [0014]: “The respective keys may be generated by sampling a probability distribution based upon the embedding space represented by the encoder. That is, the keys may be vectors sampled from the embedding space represented by the encoder.”) wherein the text embedding vector (referring to the ‘keys’) and the file embedding vector (referring to Encoding #120, z=f(x)∈ Rd) have the same dimension (see [0041]: “…The respective keys and the encoding 120 may therefore reside in the same latent embedding space.”, and [0050]: “For example, the memory 125 may be represented as M=(Mkey, Mcfier) with Mkey ∈ Rn×d (i.e. n keys, each with dimensionality d)”: i.e., the ‘key’ vectors and the vector z (Encoding #120) have the same dimension d).
As per Claim 3: In the combination of Shanahan and Liu, Shanahan further discloses:
3. The method of claim 2, wherein the text embedding vector is generated by using a text language model.
(Referring to the keys as text embedding vectors; see [0083]: “if the input to the neural network-based system is a sequence representing a spoken utterance, the output generated by the neural network-based system can identify the natural language in which the utterance was spoken. Thus in general the network input may comprise audio data for performing an audio processing task and the network output may provide a result of the audio processing task e.g. to identify a word or phrase or to convert the audio to text.”: thus the neural network-based system uses a text language model.)
As per Claim 4: In the combination of Shanahan and Liu, Shanahan further discloses the limitations shown in bold below:
4. The method of claim 1, wherein the file encoder model (i.e., Encoder #115) is trained based on a training set of description sample pairs (i.e., Training Data #104, which generates Encoding #120 based on a selection of subset network #130; see Figure 4, #420, “process the Encoding,” and [0051]: “The memory 125 may be considered to comprise n pairs of keys and neural networks. For example, the memory 125 may be represented as M=(Mkey, Mcfier)”), wherein each description sample pair (as used in #130; see [0051]: “the memory 125 may be represented as M=(Mkey, Mcfier)”) in the training set includes a description text sample (i.e., a selection of Neural Network i; see [0051], referring to ‘Mkey’, which corresponds to the n keys, “n pairs of keys”) and [a binary code] sample (the sample reads on ‘Mcfier’, “(i.e. n pairs of corresponding classifier weights and biases).” See Figure 4, #430, which shows that Output #135 in Figure 1 is a classification of an aspect of the ‘Data Item’. Thus, Mcfier is a classifier that captures an aspect of the input item x from the encoder, where z=f(x)∈ Rd).
Shanahan is not specific about “a binary code”.
Liu further discloses “a binary code” (see Liu, abstract: “Binary code similarity detection is used to calculate the code similarity of a pair of binary functions or files, through a certain calculation method and judgment method. It is a fundamental task in the field of computer binary security.” See also p. 9, the calculation Ew(F1, F2), and sec. 3.4.2, Loss Function, with function pairs in accordance with the binary code similarity detection.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the application to modify the sample pair used and selected in the neural networks of Shanahan with a binary code paired with another function as in the similarity detection of Liu; the modification would conform the input item to binary form, as required when the selected data is carried through similarity detection.
As per Claim 5: In the combination of Shanahan and Liu, Shanahan further discloses:
5. The method of claim 1, wherein the file encoder model comprises a pretrained embedding model in sequence with a translator model (See [0009]: “The encoder may be pre-trained using a dataset different to the dataset that the training data item belongs to”, and [0084]: “if the input to the neural network-based system is a sequence of text in one language, the output generated by the neural network may be a score for each of a set of pieces of text in another language, with each score representing an estimated likelihood that the piece of text in the other language is a proper translation of the input text into the other language”).
As per Claim 6: In the combination of Shanahan and Liu, Shanahan further discloses:
6. The method of claim 1, wherein the one or more natural language description samples are selected by using a k-nearest neighbors algorithm (k-NN) (See [0013] “The similarity may be based upon a cosine distance between the encoding and the respective key. Thus, the k-nearest neighbors of keys to the encoding may be selected.”).
As per Claim 7: In the combination of Shanahan and Liu, Shanahan further discloses the limitations shown in bold below:
7. The method of claim 1, further comprising: generating a text description [of the binary code] based on the one or more natural language description samples by using a large language model (LLM) (Figure 1, the Neural Network Training System #100, where the neural network reads on an LLM; the system #100 generates Output #135 for Training Data Item #105 based on the one or more “Neural Networks 1…N” selected in memory #125, which read on the one or more natural language description samples as set forth in the rationale for claim 1 above).
As per Claims 8-14: The claims are directed to a computer-readable medium reciting limitations with functionality corresponding to the method of Claims 1-7 above. The claims are rejected under the same rationales addressed in Claims 1-7.
As per Claims 15-20: The claims are directed to a computer-implemented system reciting limitations with functionality corresponding to the method of Claims 1-6 above. The claims are rejected under the same rationales addressed in Claims 1-6.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ted T Vo whose telephone number is (571)272-3706. The examiner can normally be reached 8am-4:30pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Wei Y Mui can be reached at (571) 272-3708. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
TTV
January 8, 2026
/Ted T. Vo/
Primary Examiner, Art Unit 2191