Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
The instant application, having Application No. 18/896,763, is presented for examination by the examiner. Claims 1-20 have been examined and are currently pending.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “a large language model (LLM) orchestrator (LO) configured to place calls to the SOT, fetch the tool outputs from MRS” and “LLM configured to receive the natural language prompt from the LO and generate instructions to perform an investigatory step” as recited in claim 12. The terms “LLM orchestrator” and “LLM” are interpreted as generic placeholders that do not recite sufficient structure for performing the claimed functions and are coupled with the functional language through the phrase “configured to.” Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 10 is rejected under 35 U.S.C. § 112(b) or 35 U.S.C. § 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. § 112, the applicant), regards as the invention.
Regarding Claim 10: Claim 10 recites the limitation "wherein accessing the function of the binary comprises classifying the binary as malware or not." This limitation renders the claim indefinite because it is unclear how the term "accessing the function of the binary" relates to the parent claim's recitation of "analyzing the function of the binary" in Claim 1, step (i). Specifically, "accessing the function" could reasonably be interpreted as retrieving a pre-determined function value from the MRS, or as obtaining the function through the analysis steps (d)-(h) recited in Claim 1, which is the meaning suggested by the parent claim's use of "analyzing." Because Claim 10 does not specify which interpretation is intended, and the specification does not provide sufficient clarity to resolve this ambiguity, a person having ordinary skill in the art cannot determine with reasonable certainty whether Claim 10 further limits Claim 1's analyzing step to a specific type of analysis (malware classification) or recites a distinct operation (accessing pre-determined data) that operates outside the scope of Claim 1's iterative analysis. This ambiguity creates uncertainty regarding the scope and boundaries of Claim 10, and the claim does not reasonably apprise those skilled in the art of the scope of the invention. See Nautilus, Inc. v. Biosig Instruments, Inc., 572 U.S. 898 (2014). For purposes of examination, the limitation is interpreted as the LLM analyzing the function of the binary through the iterative steps (d)-(h) of Claim 1, with such analysis comprising the specific act of classifying the binary as malware or not.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 5-7 and 9-20 are rejected under 35 U.S.C. 103 as being unpatentable over Conway (US 20250013753 A1) in view of Siracusano (US 20240411994 A1).
Regarding Claim 1
Conway discloses:
A method for analyzing a function of a binary comprising:
(a) receiving the binary (Conway: ¶[0078]: “receiving an input dataset comprising at least an unknown binary”; see also ¶[0035]-[0036]: input data may include executable applications, malware, firmware, and other binary files obtained for analysis.);
(b) parsing the binary with one or more of a suite of tools (SOT) to create tool outputs (Conway: ¶[0023]: “pipelined reverse engineering techniques may be applied to the binary”; “a pipeline may begin by applying a disassembler to the binary to obtain disassembly data from the binary”; each stage of the pipeline is configured to “produce data or extract information from the binary or intermediate representations of the binary”);
(c) initializing a memory representation system (MRS) with the tool outputs (Conway: ¶[0046]: data collectors write their outputs to a database, where a primary datastore stores data generated by processing the input binary; disassembly data stored in the primary datastore are used by subsequent data collectors in the pipeline. ¶[0048]: representations such as function call graphs and binary images are stored in the primary datastore for subsequent use by other analysis components.);
(f) calling the tool of the SOT to implement the investigatory step (Conway: ¶[0023]: “applying a disassembler to the binary to obtain disassembly data”; ¶[0068]: processes may utilize tools such as Binwalk and Radare2 to analyze the binary input);
(g) modifying contents of the MRS with a subsequent tool output received from the tool (Conway: ¶[0046]: data collectors write their outputs to a database, and disassembly data stored in the primary datastore is used by a subsequent data collector to generate a function call graph, which is also stored in the primary datastore; ¶[0048]: binary image representations are stored to the primary datastore for subsequent use by another data collector or information extractor; ¶[0050]: generated binary images and associated data are stored to the primary datastore; ¶[0065]: data obtained by data collectors and information extracted by information extractors are stored in the primary datastore).
Conway teaches a pipeline-based system that analyzes an unknown binary using multiple reverse-engineering tools whose outputs are stored in a datastore and further processed by additional analysis modules to detect software vulnerabilities or malicious code. However, Conway does not explicitly disclose using an LLM that receives analysis outputs and generates prompts or queries to reason about the information, determine investigatory steps, and iteratively repeat the reasoning process. On the other hand, Siracusano teaches an LLM agent that receives parsed information and determines queries to send to an LLM (¶0070). Siracusano further teaches that the LLM agent forms prompts using portions of parsed information and sends the prompts to the LLM (¶0071-0072). Siracusano also teaches that the LLM agent performs several queries to the LLM and receives the outputs from the LLM (¶0073-0075). Additionally, Siracusano teaches iteratively modifying prompt templates and repeating the process based on feedback or insufficient output (¶0077-0080). Siracusano further teaches that LLMs operate based on prompts and that prompt pipelines can be used to automatically process information through interaction with the LLM (¶0090-0091; ¶0117; ¶0120; ¶0123). Conway already performs automated binary analysis using multiple tools and stores the outputs of those tools in a datastore for further processing (¶0053-0058), and Siracusano teaches using an LLM agent that reasons over parsed information through prompts and iterative queries.
It would have been obvious to use the LLM prompting and reasoning techniques of Siracusano to analyze and guide the processing of the tool outputs produced by Conway. Both references address automated cybersecurity analysis and reducing manual analyst effort by automating interpretation of complex security data. Applying the LLM prompting framework of Siracusano to the stored analysis outputs of Conway would allow automated reasoning about those outputs and determination of additional analysis steps. The claim is obvious because one of ordinary skill in the art could have combined methods known before the effective filing date to produce predictable results. Using an LLM prompt reasoning system to analyze and guide an existing automated binary analysis pipeline would predictably improve automation of analysis and interpretation of the results produced by the analysis tools.
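For illustration only, the following minimal Python sketch outlines the kind of iterative tool-orchestration loop the combination describes; every name (query_llm, run_tool, the MRS dictionary) is hypothetical and is not drawn from the application or from either reference:

# Illustrative sketch only -- not the applicant's implementation or either
# reference's code. All names and data shapes are hypothetical.

def query_llm(prompt: str) -> dict:
    """Stand-in for an LLM call that returns an investigatory step.
    A real system would call a hosted model; here a fixed instruction
    is returned so the sketch is runnable."""
    return {"tool": "disassembler", "operands": ["entry_point"], "done": True}

def run_tool(name: str, operands: list) -> str:
    """Stand-in for a suite-of-tools (SOT) invocation, e.g. a disassembler."""
    return f"{name} output for {operands}"

def analyze_binary(binary: bytes) -> dict:
    # (b)-(c): parse the binary with initial tools and seed the MRS
    mrs = {"strings": run_tool("string_extractor", [binary[:4]])}
    # (d)-(h): iterate -- prompt the LLM with MRS contents, execute the
    # investigatory step it returns, and fold the new output back into the MRS
    for _ in range(10):                      # bounded iteration for the sketch
        prompt = f"Tool outputs so far: {mrs}. What should be analyzed next?"
        step = query_llm(prompt)             # LLM chooses the next step
        output = run_tool(step["tool"], step["operands"])   # call the tool
        mrs[step["tool"]] = output           # modify the MRS
        if step.get("done"):                 # stop when analysis converges
            break
    return mrs

print(analyze_binary(b"\x7fELF...."))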
Regarding Claim 2
Conway discloses:
The method of claim 1, wherein the SOT comprises software reverse engineering tools including at least one of a decompiler, a disassembler, a string deobfuscator, an unpacker, a control flow extractor, or a memory analysis tool (Conway: ¶[0023]: pipelined reverse engineering techniques are applied to the binary, where a pipeline may begin by applying a disassembler to the binary to obtain disassembly data; ¶[0068]: reverse engineering tools such as Radare2 perform disassembly and debugging of code extracted from the binary; ¶[0073]: reverse engineering tools including disassemblers, debuggers, static analysis tools, and dynamic analysis tools are used to extract and analyze data from the binary.).
Regarding Claim 3
Conway teaches storing and retrieving outputs generated by reverse-engineering analysis tools within a pipeline architecture and using those outputs in subsequent analysis stages. For example, Conway teaches that outputs produced by data collectors analyzing a binary are written to a primary datastore and used by subsequent collectors or extractors within the pipeline to perform further analysis of the binary. However, Conway does not explicitly teach generating a natural language prompt by translating the tool outputs into natural language text and combining that text with descriptions of individual tools and requirements for using those tools. On the other hand, Siracusano teaches generating natural language prompts for LLMs using processed cybersecurity information and instructions that guide the reasoning performed by the LLM. For example, Siracusano teaches that prompts are natural language inputs provided to an LLM to generate responses (¶0091), and that users can program the behavior of an LLM using natural language instructions contained in prompts (¶0190). Siracusano further teaches defining pipelines of prompts that process cybersecurity information and specify which information should be extracted or processed (¶0117; ¶0123). These teachings demonstrate translating processed information into natural language prompt text and combining contextual information with task instructions to guide automated reasoning by the LLM.
It would have been obvious to use the natural-language prompt generation techniques of Siracusano within the automated binary analysis pipeline of Conway so that the outputs of Conway’s analysis tools stored in the datastore could be translated into natural language prompt text and combined with instructions describing available tools and their usage requirements. The claim is obvious because one of ordinary skill in the art could have combined methods known before the effective filing date to produce predictable results. Using LLM-based prompting to reason about the outputs of an existing automated analysis pipeline would predictably improve automation and efficiency of vulnerability analysis by enabling automated interpretation of intermediate analysis results and guiding subsequent analysis steps.
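As a purely illustrative sketch of such prompt construction, the following Python fragment combines translated tool outputs with tool descriptions and usage requirements; the tool names, descriptions, and prompt wording are invented for illustration and do not come from either reference:

# Hypothetical prompt-construction sketch.
TOOL_DESCRIPTIONS = {
    "disassembler": "Converts machine code to assembly. Requires a code offset.",
    "decompiler":   "Lifts assembly to pseudo-C. Requires a function address.",
}

def build_prompt(tool_outputs: dict) -> str:
    # Translate each stored tool output into a natural-language sentence
    output_text = "\n".join(
        f"The {tool} produced: {result}" for tool, result in tool_outputs.items()
    )
    # Combine with descriptions of the available tools and their requirements
    tool_text = "\n".join(
        f"- {name}: {desc}" for name, desc in TOOL_DESCRIPTIONS.items()
    )
    return (
        "You are analyzing an unknown binary.\n"
        f"Results so far:\n{output_text}\n\n"
        f"Available tools:\n{tool_text}\n\n"
        "Choose the next investigatory step."
    )

print(build_prompt({"string_extractor": "found 'cmd.exe' and a C2 URL"}))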
Regarding Claim 5
Conway teaches storing and retrieving outputs generated by reverse-engineering analysis tools within a pipeline architecture and using those outputs in subsequent analysis stages. However, Conway does not explicitly teach translating the tool outputs to natural language text using context provided by expert knowledge regarding the function and use of individual tools in the suite of tools to generate the natural language text. On the other hand, Siracusano teaches generating natural language prompts and processing cybersecurity information using prompts defined according to instructions provided by domain experts. For example, Siracusano teaches defining a pipeline of prompts to process cybersecurity information based on analyst instructions (¶0117). Siracusano further teaches that prompts can reason about the context of the information according to received instructions (¶0120) and can specify which information should be extracted or processed (¶0123). Additionally, Siracusano teaches that a CTI analyst can provide feedback and instructions that guide the generation or modification of prompts used by the system (¶0124), and that analysts can customize and revise prompts or outputs used by the LLM (¶0193). These teachings demonstrate using expert knowledge and instructions to guide how information is interpreted and translated into natural language prompts for processing by an LLM.
It would have been obvious to use the expert-guided prompt generation techniques of Siracusano when translating the analysis outputs generated by Conway’s reverse-engineering tools into natural language prompt text so that the outputs of the tools could be interpreted using context provided by expert knowledge regarding the function and use of the tools. The claim is obvious because one of ordinary skill in the art could have combined methods known before the effective filing date to produce predictable results. Using expert-guided prompt instructions to interpret the outputs of analysis tools in an automated pipeline would predictably improve the automation and accuracy of vulnerability analysis by enabling the system to reason about the meaning and use of the tool outputs when generating prompts for further analysis.
Regarding Claim 6
Conway discloses:
The method of claim 1, wherein calling the tool of the SOT comprises:
providing tool data from the MRS to the tool (Conway: ¶[0046]: data collectors write their outputs to a primary datastore, and the stored disassembly data may be used by a subsequent data collector in the pipeline to generate additional analysis such as a function call graph; ¶[0024]: outputs of each tool may be stored to a unified data store for further analysis and insight derivation.).
Conway teaches providing tool data from a memory representation system to analysis tools within a pipeline architecture. However, Conway does not explicitly teach parsing instructions to perform an investigatory step into structured information comprising reasoning for using a tool, identification of the tool, and one or more operands for the tool, and translating the structured information to data to generate a call to the tool. On the other hand, Siracusano teaches parsing information and generating structured representations using prompts and LLM reasoning. For example, Siracusano teaches obtaining and parsing relevant text data and defining a pipeline of prompts to extract structured cybersecurity information (¶0117). Siracusano further teaches that prompts can specify which information to extract and verify extraction (¶0123), and that pipeline outputs can be parsed and exported into structured machine-readable formats (¶0194). These teachings demonstrate parsing instructions into structured representations and translating those representations into data used by subsequent processing steps within a pipeline.
It would have been obvious to use the structured parsing and prompt-driven reasoning techniques of Siracusano within the automated analysis pipeline of Conway so that instructions for additional analysis steps could be parsed into structured information and translated into data used to invoke the appropriate analysis tools. The claim is obvious because one of ordinary skill in the art could have combined methods known before the effective filing date to produce predictable results. Using structured LLM parsing to determine and invoke tools within an existing automated analysis pipeline would predictably improve automation and efficiency of vulnerability analysis by enabling automated reasoning about the intermediate outputs produced by the analysis tools.
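For illustration, a minimal sketch of parsing an LLM's instructions into the claimed structured information (reasoning, tool identity, operands) and translating it into a tool call follows; the JSON schema shown is an assumption for this sketch, not a format disclosed by either reference:

# Hypothetical structured-parsing sketch.
import json

def parse_instructions(llm_text: str) -> dict:
    """Parse an LLM response of the assumed JSON form into structured fields."""
    step = json.loads(llm_text)
    return {
        "reasoning": step["reasoning"],   # why the tool should be used
        "tool": step["tool"],             # which tool to invoke
        "operands": step["operands"],     # arguments for the tool
    }

def make_call(structured: dict) -> str:
    """Translate the structured information into a concrete tool invocation."""
    return f"{structured['tool']}({', '.join(structured['operands'])})"

llm_text = '{"reasoning": "strings suggest packing", "tool": "unpacker", "operands": ["sample.bin"]}'
print(make_call(parse_instructions(llm_text)))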
Regarding Claim 7
Conway teaches an automated analysis pipeline in which multiple reverse engineering tools are executed and the outputs of those tools are stored and used by subsequent processing stages. However, Conway does not explicitly teach translating natural language instructions into data using context provided by expert knowledge regarding the function and use of individual tools. On the other hand, Siracusano teaches using prompts and contextual reasoning to guide extraction and processing of cybersecurity information. For example, Siracusano teaches defining pipelines of prompts that reason about the context of information and generate instructions for extracting relevant cybersecurity information (¶0117; ¶0120). Siracusano further teaches that prompts specify which information should be extracted and that structured outputs are generated based on those instructions (¶0123; ¶0194). Siracusano also explains that cybersecurity analysis traditionally requires expert knowledge from analysts to interpret reports and classify attack patterns using domain knowledge such as MITRE ATT&CK techniques (¶0145). Thus, Siracusano teaches using contextual and expert knowledge to guide automated processing of cybersecurity information.
It would have been obvious to use the teachings of Siracusano when translating natural language instructions into data used to invoke analysis tools in the pipeline of Conway. Both references address automating cybersecurity analysis that traditionally required expert analysts. Using expert knowledge encoded in prompts to determine how and when to invoke specific analysis tools in Conway would predictably improve automation and efficiency of vulnerability analysis. The claim is obvious because one of ordinary skill in the art could have combined methods known before the effective filing date to produce predictable results.
Regarding Claim 9
Conway teaches an automated analysis pipeline in which multiple reverse engineering tools are executed and the outputs of those tools are stored and used by subsequent processing stages. However, Conway does not explicitly teach terminating processing based on reaching a token limit. On the other hand, Siracusano teaches that large language models (LLMs) operate with limits on the size of prompt inputs and outputs, where the total size of input and output may be constrained by a fixed number of tokens (¶0096; ¶0191). These teachings demonstrate that LLM-based processing may be limited by a token capacity that constrains how much information can be processed in a prompt.
It would have been obvious to apply the token-limit constraints of Siracusano to the LLM reasoning process used in the analysis pipeline of Conway so that processing would terminate when the token limit is reached. The claim is obvious because one of ordinary skill in the art could have combined methods known before the effective filing date to produce predictable results. Using token limits as a termination condition in an automated LLM analysis pipeline would predictably ensure efficient processing and prevent exceeding the processing capacity of the LLM.
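As an illustrative sketch of a token-limit termination condition, the following Python fragment stops iterating once a token budget is exhausted; the 4-characters-per-token heuristic and the limit value are assumptions for illustration only:

# Hypothetical token-budget termination sketch.
TOKEN_LIMIT = 8192

def estimate_tokens(text: str) -> int:
    return len(text) // 4        # rough heuristic; real systems use a tokenizer

def iterate_until_limit(prompts: list) -> int:
    used = 0
    steps = 0
    for prompt in prompts:
        cost = estimate_tokens(prompt)
        if used + cost > TOKEN_LIMIT:   # terminate when the budget is reached
            break
        used += cost
        steps += 1
    return steps

print(iterate_until_limit(["analyze section .text " * 100] * 50))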
Regarding Claim 10
Conway discloses:
The method of claim 1, wherein accessing the function of the binary comprises classifying the binary as malware or not (Conway teaches analyzing a binary and determining whether the binary contains malicious code such as malware (¶0023). Conway further teaches comparing the function call graph of the binary to structures associated with known malware families and classifying the binary based on similarity scores exceeding a threshold (¶0057). Thus Conway teaches classifying the binary as malware or not.).
Regarding Claim 11
Conway teaches storing and retrieving outputs generated by reverse-engineering analysis tools within a pipeline architecture and using those outputs in subsequent analysis stages to analyze binaries. However, Conway does not explicitly teach determining that instructions to perform an investigatory step are invalid, generating an explanation why the instructions are invalid, providing the explanation as part of a prompt to revise the instructions to a large language model (LLM), and receiving revised instructions from the LLM. On the other hand, Siracusano teaches an LLM agent system that iteratively processes information using prompts and incorporates feedback to revise prompts and instructions. For example, Siracusano teaches that a user can evaluate whether the generated information is sufficient and provide feedback to the system when the output is not acceptable (¶0081–¶0083). Siracusano further teaches that the LLM agent can iteratively receive feedback and revise the generated prompt templates based on that feedback (¶0086). Additionally, Siracusano teaches that the process may be repeated by modifying prompts and querying the LLM again using the revised prompts until satisfactory results are obtained (¶0080). These teachings demonstrate determining that generated instructions or outputs are insufficient, providing feedback explaining the issue, incorporating that explanation into revised prompts, and receiving revised instructions from the LLM.
It would have been obvious to use the feedback-based prompt revision techniques of Siracusano within the automated binary analysis pipeline of Conway so that when instructions used for analysis are determined to be insufficient, an explanation can be generated and incorporated into a revised prompt to obtain updated instructions from the LLM. The claim is obvious because one of ordinary skill in the art could have combined methods known before the effective filing date to produce predictable results. Using iterative prompt revision and feedback within an automated analysis pipeline would predictably improve the reliability and accuracy of the analysis by allowing incorrect or insufficient instructions to be corrected through subsequent interactions with the LLM.
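For illustration, a minimal sketch of such a validate-explain-revise loop follows; the validation rule, the tool names, and the query_llm stub are hypothetical and not drawn from either reference:

# Hypothetical instruction-validation and prompt-revision sketch.
KNOWN_TOOLS = {"disassembler", "decompiler", "unpacker"}

def validate(step: dict):
    """Return (is_valid, explanation)."""
    if step["tool"] not in KNOWN_TOOLS:
        return False, f"'{step['tool']}' is not in the suite of tools."
    return True, ""

def query_llm(prompt: str) -> dict:
    # Stub: a real system would re-query the model with the revision prompt.
    return {"tool": "disassembler", "operands": ["0x401000"]}

def get_valid_step(step: dict, max_retries: int = 3) -> dict:
    for _ in range(max_retries):
        ok, explanation = validate(step)
        if ok:
            return step
        revision_prompt = (
            f"Your instructions were invalid: {explanation} "
            "Please revise them using only the available tools."
        )
        step = query_llm(revision_prompt)   # receive revised instructions
    raise RuntimeError("no valid instructions after retries")

print(get_valid_step({"tool": "ghidra", "operands": ["0x401000"]}))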
Regarding Claim 12
Conway teaches:
An expert system for analyzing a function of a binary comprising:
a processing unit;
a memory coupled to the processor and storing computer-executable instructions;
a memory representation system (MRS) configured to store tool outputs generated by a suite of tools (SOT) (Conway: ¶[0046]: data collectors 232 write outputs such as disassembly data and function call graphs to a primary datastore 236 for use by subsequent collectors in a pipeline; ¶[0050]: data collectors generate binary images and store the generated image and associated data in the primary datastore 236; ¶[0065]: input data, data generated by data collectors, and information extracted by information extractors are stored in the primary datastore to coordinate sharing of information between analysis components.);
Conway teaches a pipeline-based system that analyzes a binary using multiple analysis tools whose outputs are stored in a datastore and subsequently processed by additional analysis modules. However, Conway does not explicitly disclose using an LLM to reason over the stored analysis outputs and generate prompts that determine further investigatory analysis steps. Siracusano teaches an LLM agent that receives parsed information, generates prompts based on that information, and sends the prompts to an LLM to determine queries or tasks (¶0070–¶0072). Siracusano further teaches that the LLM agent performs multiple queries to the LLM and receives outputs that guide further analysis steps, including iteratively modifying prompts and repeating the process when additional reasoning is required (¶0073–¶0075; ¶0077–¶0080; ¶0090–¶0091). Thus, Siracusano teaches an LLM-based orchestration component that generates prompts, receives LLM outputs, and uses those outputs to determine further processing steps.
It would have been obvious to incorporate the LLM prompting and reasoning techniques of Siracusano into the automated binary analysis pipeline of Conway so that the stored tool outputs of Conway could be interpreted by an LLM and used to determine additional investigatory steps in the analysis pipeline. The claim is obvious because one of ordinary skill in the art could have combined methods known before the effective filing date to produce predictable results. Applying an LLM prompt reasoning system to guide an existing automated binary analysis pipeline would predictably improve automation and interpretation of analysis outputs.
Regarding Claim 13
Conway teaches a system for analyzing binaries using multiple reverse-engineering tools arranged in a pipeline, where outputs generated by the tools are stored in a datastore and subsequently used by additional analysis components. However, Conway does not explicitly disclose translating the stored tool outputs into natural language text using a data-to-natural-language translator. Siracusano teaches generating natural language text representations of processed information using an LLM processing pipeline. For example, Siracusano teaches that an LLM agent sends prompts containing summaries, descriptions, and parsed information to an LLM and receives generated text outputs that extract and describe relevant information (¶0073–¶0075). Siracusano further teaches preprocessing steps that generate summaries or expanded descriptions of information contained in source data to provide context for subsequent extraction and reasoning (¶0099; ¶0110; ¶0117). Additionally, Siracusano teaches that annotated datasets and models used in the pipeline may be generated using knowledge provided by cross-domain experts (¶0056), thereby providing expert context for interpreting and generating natural language descriptions of the processed information.
It would have been obvious to one of ordinary skill in the art to apply the natural language generation and prompt-based reasoning techniques of Siracusano to the analysis outputs generated by the binary analysis pipeline of Conway so that the stored tool outputs could be translated into natural language descriptions for reasoning and further automated analysis. Both references relate to automated cybersecurity analysis systems that process complex security data and reduce manual analyst effort by automating interpretation of extracted information. Applying the natural language generation techniques of Siracusano to the tool outputs generated and stored in Conway would predictably improve the ability of the system to interpret and reason about the results produced by the analysis tools.
Regarding Claim 14
Conway teaches a binary analysis system that utilizes multiple analysis tools arranged in a pipeline, where outputs generated by the tools are stored in a datastore and subsequently processed by additional analysis components. However, Conway does not explicitly disclose generating a natural language prompt that combines translated analysis outputs with descriptions of tools and requirements for using those tools. Siracusano teaches generating prompts for an LLM that combine contextual information and task instructions to guide processing and extraction operations. For example, Siracusano teaches prompts that combine context with instructions specifying the information to extract and the format in which it should be returned (¶0112–¶0113). Siracusano further teaches prompt templates that incorporate context, filtering conditions, and task requirements to guide automated processing through a pipeline of prompts interacting with an LLM (¶0091–¶0093; ¶0117; ¶0120). These prompts therefore combine natural language descriptions of information with instructions describing how the processing components should perform their tasks.
It would have been obvious to one of ordinary skill in the art to apply the prompt construction techniques of Siracusano to the automated binary analysis pipeline of Conway so that prompts generated for the LLM combine the natural language descriptions of the stored analysis outputs with instructions describing how the analysis tools should be used. Both references address automated cybersecurity analysis and reducing manual analyst effort by automating interpretation of complex security data. Applying the prompt construction techniques of Siracusano to the tool outputs produced by Conway would predictably improve automation of analysis and reasoning over the outputs produced by the analysis tools.
Regarding Claim 15
Conway teaches a binary analysis system that utilizes a plurality of analysis tools arranged in a pipeline to analyze an input binary, where outputs generated by the tools are stored in a datastore and used by subsequent analysis components to perform additional analysis operations. However, Conway does not explicitly disclose “further comprising a natural language to data translator (NL2DT) configured to translate the instructions to perform an investigatory step into a call to the SOT, wherein the NL2DT uses context provided by expert knowledge regarding the function and use of individual tools in the SOT to generate the call”.
Siracusano teaches using an LLM agent that processes natural language instructions and generates structured outputs that are subsequently parsed and formatted into a structured data representation. For example, Siracusano teaches prompts that instruct the LLM to extract specific information and output the information in a specified structured format (¶0112–¶0113). Siracusano further teaches that the output of the extraction component is processed by a formatting component that maps the extracted information into a structured data model such as a STIX ontology (¶0115–¶0116). Thus, Siracusano teaches translating natural language instructions and outputs generated by an LLM into structured representations that can be used by downstream processing components within an automated pipeline (¶0117).
It would have been obvious to one of ordinary skill in the art to apply the natural language processing and structured output translation techniques of Siracusano to the automated binary analysis pipeline of Conway so that natural language instructions generated during analysis are translated into structured calls to the analysis tools within the pipeline. Both references address automating cybersecurity analysis and reducing the manual effort required by security analysts to interpret complex security data. Applying the natural language-to-structured-data translation techniques of Siracusano to the tool-based analysis architecture of Conway would predictably enable automated translation of natural language investigatory instructions into structured commands that invoke the appropriate analysis tools within the pipeline. The claim is obvious because one of ordinary skill in the art could have combined methods known before the effective filing date to produce predictable results.
Regarding Claim 16
Conway teaches storing and retrieving outputs generated by reverse-engineering analysis tools within a pipeline architecture and using those outputs in subsequent analysis stages to analyze binaries. However, Conway does not explicitly teach “wherein the NL2DT is further configured to determine that the instructions to perform the investigatory step are invalid, generate an explanation why the instructions are invalid, the LO is further configured to provide the explanation as part of a prompt to revise the instructions to the LLM, and the LLM is further configured to generate revised instructions based on the prompt to revise.” Siracusano teaches an LLM system that evaluates generated outputs, determines when the outputs are insufficient or not acceptable, and iteratively modifies prompts and queries to the LLM to improve the results. For example, Siracusano teaches that the process may be repeated when the generated information is not sufficiently good, where feedback is provided to the LLM agent and the LLM agent modifies prompt templates and queries the LLM again using the modified prompts (¶0080). Siracusano further teaches that generated information can be evaluated and feedback provided through a user interface, and that the LLM agent receives the feedback and revises the generated information or prompt templates accordingly (¶0081–¶0083). Siracusano also teaches that system components evaluate whether sufficient information exists for forming prompts and processing instructions, and when insufficient information is detected, additional queries are performed to obtain improved results (¶0084–¶0086).
It would have been obvious to one of ordinary skill in the art to apply the iterative feedback and prompt revision techniques of Siracusano to the automated binary analysis architecture of Conway so that instructions generated for investigatory steps are evaluated, explanations or feedback regarding insufficient or invalid instructions are incorporated into revised prompts, and the LLM generates revised instructions for further analysis. Both references address automating complex cybersecurity analysis tasks while reducing manual analyst effort through iterative reasoning and improvement of generated outputs. Applying the iterative prompt modification and feedback mechanisms of Siracusano to the automated analysis pipeline of Conway would predictably enable automatic detection of invalid investigatory instructions and generation of revised instructions through repeated interaction with the LLM.
Regarding Claim 17
Conway teaches a binary analysis system that utilizes a plurality of analysis tools arranged in a pipeline to analyze an input binary, where outputs generated by the tools are stored and used by subsequent components to perform additional analysis operations. However, Conway does not explicitly disclose providing a pre-determined prompt to an LLM upon a result of an investigatory step meeting a certain condition. Siracusano teaches an LLM agent that determines and provides prompts (queries) to an LLM based on information received and processed within a pipeline. For example, Siracusano teaches that a data acquisition module provides parsed information to the LLM agent, which determines the queries that should be sent to the LLM (¶0070). Siracusano further teaches that the LLM agent generates prompts by selecting portions of parsed information and combining them with prompt requests such as summarization or description tasks (¶0071–¶0072). Siracusano additionally teaches that the LLM agent performs several queries to the LLM to preprocess and extract information from the provided data (¶0073–¶0074). Furthermore, Siracusano teaches that prompt templates may be modified or selected based on feedback and system processing, and the LLM agent provides the modified or selected prompts to the LLM for further processing (¶0077–¶0078).
It would have been obvious to one of ordinary skill in the art to apply the prompt selection and query generation techniques of Siracusano to the automated binary analysis architecture of Conway so that predetermined prompt templates are provided to the LLM when particular analysis conditions or results occur within the analysis pipeline. Both references address automating complex cybersecurity analysis tasks and reducing the manual effort required by security analysts by automatically reasoning over collected analysis information. Applying the prompt orchestration techniques of Siracusano to the automated analysis pipeline of Conway would predictably enable the system to provide predetermined prompts to the LLM when particular investigatory conditions are met during analysis.
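As an illustrative sketch of providing a pre-determined prompt when a result meets a condition, the following Python fragment selects a canned prompt when an assumed heuristic triggers; the condition, the prompt templates, and the query_llm stub are hypothetical:

# Hypothetical conditional prompt-selection sketch.
PREDETERMINED_PROMPTS = {
    "packed": "The binary appears packed. Identify the packer and unpack it.",
}

def query_llm(prompt: str) -> str:
    return f"LLM response to: {prompt}"      # stand-in for a model call

def handle_result(result: dict):
    # Upon the result meeting a certain condition, send the canned prompt
    if result.get("entropy", 0.0) > 7.5:     # assumed packing heuristic
        return query_llm(PREDETERMINED_PROMPTS["packed"])
    return None

print(handle_result({"entropy": 7.9}))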
Regarding Claim 18
Claim 18 is directed to storage media comprising instructions corresponding to the method of claim 1. Claim 18 is similar in scope to claim 1 and is therefore rejected under a similar rationale.
Regarding Claim 19
Conway teaches a binary analysis system that utilizes a plurality of analysis tools arranged in a pipeline to analyze an input binary, where outputs generated by the tools are stored and used by subsequent components to perform additional analysis operations. However, Conway does not explicitly disclose generating, by a large language model (LLM), a textual explanation of the function of the binary based on the analysis results. Siracusano teaches using an LLM to generate natural language responses describing cybersecurity information based on prompts and contextual input data. For example, Siracusano teaches prompting an LLM to analyze cybersecurity related text and generate answers or descriptions regarding security artifacts such as malware, threat actors, or attack targets (¶0112–¶0113). Siracusano further teaches that the LLM can reason over contextual information and produce textual responses explaining the information contained in the input data (¶0089–¶0091). Siracusano additionally teaches that the outputs generated by the LLM may describe cybersecurity entities and relationships which are then mapped into structured representations such as a STIX ontology (¶0114–¶0115). Thus, Siracusano teaches generating textual explanations of cybersecurity-related artifacts using an LLM.
It would have been obvious to one of ordinary skill in the art to apply the natural language generation capabilities of the LLM taught by Siracusano to the binary analysis outputs generated by the Conway system so that the system produces a textual explanation describing the function or behavior of the analyzed binary. Both references address automating cybersecurity analysis tasks and improving the interpretability of complex security information for analysts. Applying the LLM text generation techniques of Siracusano to the binary analysis results of Conway would predictably enable the system to generate natural language explanations describing the behavior or function of the analyzed binary.
Regarding Claim 20
Conway discloses:
The computer-readable storage media of claim 18, wherein the instructions further cause the computing device to perform operations comprising:
classifying the binary as malware (Conway: ¶[0057]: comparing the structure of the function call graph to structures associated with known malware or malware families.);
generating a signature of the binary (Conway: ¶[0061]: generate fingerprints associated with known vulnerabilities or malicious code.); and
submitting the signature to a malware tracking database (Conway: ¶[0062]: the fingerprints are stored in a database to create a catalog of vulnerabilities.).
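As a purely illustrative sketch of these three operations, the fragment below classifies a binary, generates a signature, and submits it; the SHA-256 fingerprint, the marker-based classifier, and the in-memory "tracking database" are assumptions for illustration (Conway describes fingerprints and a vulnerability catalog at ¶[0061]–¶[0062] without specifying this mechanism):

# Hypothetical classify/sign/submit sketch.
import hashlib

malware_db = set()                           # stand-in for a tracking database

def classify(binary: bytes) -> bool:
    # Placeholder classification; the cited art uses graph-similarity scoring
    return b"malicious_marker" in binary

def generate_signature(binary: bytes) -> str:
    return hashlib.sha256(binary).hexdigest()

def process(binary: bytes):
    if classify(binary):                     # classify the binary as malware
        sig = generate_signature(binary)     # generate a signature
        malware_db.add(sig)                  # submit to the tracking database

process(b"header" + b"malicious_marker")
print(malware_db)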
Claims 4 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Conway (US 20250013753 A1) in view of Siracusano (US 20240411994 A1) as applied to claim 1 above, and further in view of Nemtsov (US 12095806 B1).
Regarding Claim 4
Conway teaches storing and retrieving outputs generated by reverse-engineering analysis tools within a pipeline architecture and using those outputs in subsequent analysis stages to analyze a binary. Siracusano teaches generating natural language prompts for an LLM based on processed information and using those prompts in an iterative analysis framework. However, Conway and Siracusano do not explicitly teach that the tool outputs comprise runtime objects. Nemtsov teaches representing detected cybersecurity objects as nodes in a security graph generated during inspection of a resource. For example, Nemtsov teaches that detected cybersecurity objects are represented in a security graph and that a node is generated to represent a malware object detected on a resource (Column 12, Lines 15-54). Nemtsov further teaches receiving runtime data from an inspected resource and generating definitions and mitigation logic based on detected cybersecurity objects and runtime data (Column 12, Lines 15-54). These teachings demonstrate that outputs produced by inspection and analysis tools may comprise objects representing detected entities derived from runtime information.
It would have been obvious to incorporate the object-based representation of analysis results taught by Nemtsov into the automated analysis pipeline of Conway, as modified by Siracusano, so that outputs of analysis tools are represented as runtime objects. The claim is obvious because one of ordinary skill in the art could have combined methods known before the effective filing date to produce predictable results. Representing analysis outputs as objects in a graph structure would predictably improve automated reasoning and correlation of detected cybersecurity entities across analysis steps.
Regarding Claim 8
Conway teaches storing and retrieving outputs generated by reverse-engineering analysis tools within a pipeline architecture and providing those outputs to subsequent analysis tools through a shared datastore used by the analysis pipeline. Siracusano teaches generating structured prompts and instructions within a processing pipeline to guide automated analysis tasks and tool execution. However, Conway and Siracusano do not explicitly teach that a call to the tool indicates a runtime object in an MRS. On the other hand, Nemtsov teaches representing detected cybersecurity objects as nodes within a security graph stored in a graph database and generating instructions which, when executed, perform queries on the security graph to detect nodes representing resources or cybersecurity threats (Column 12, Line 15 - Column 13, Line 50). Because the nodes in the security graph represent detected cybersecurity objects, these nodes correspond to runtime objects stored within the system’s memory representation structures. The generated query instructions operate on and reference these nodes when executed by the graph database. These teachings demonstrate that tool calls reference runtime objects represented within a system memory structure.
It would have been obvious to incorporate the object-based graph representation and query operations of Nemtsov into the automated analysis pipeline of Conway, as modified by Siracusano, so that calls to analysis tools reference runtime objects stored in the system’s memory representation structures. The claim is obvious because one of ordinary skill in the art could have combined methods known before the effective filing date to produce predictable results. Using runtime object representations within a shared memory structure and invoking tools that operate on those objects would predictably improve the system’s ability to correlate and analyze detected cybersecurity entities across analysis steps.
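For illustration, a minimal sketch of a tool call that indicates a runtime object held in a memory representation follows, loosely analogous to the graph-node references described above; the classes, identifiers, and call shape are hypothetical and not drawn from any of the cited references:

# Hypothetical runtime-object reference sketch.
from dataclasses import dataclass, field

@dataclass
class RuntimeObject:
    obj_id: str
    kind: str                      # e.g. "function", "malware_node"
    data: dict = field(default_factory=dict)

class MRS:
    def __init__(self):
        self.objects = {}

    def add(self, obj: RuntimeObject):
        self.objects[obj.obj_id] = obj

    def get(self, obj_id: str) -> RuntimeObject:
        return self.objects[obj_id]

def call_tool(tool: str, mrs: MRS, obj_id: str) -> str:
    # The tool call indicates a runtime object in the MRS by its identifier
    target = mrs.get(obj_id)
    return f"{tool} ran on {target.kind} {target.obj_id}"

mrs = MRS()
mrs.add(RuntimeObject("fn_main", "function", {"address": "0x401000"}))
print(call_tool("decompiler", mrs, "fn_main"))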
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAAD ABDULLAH whose telephone number is (571) 272-1531. The examiner can normally be reached on Monday - Friday, 9:30am - 5:30pm, EST. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lynn Feild can be reached on (571) 272-2092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SAAD AHMAD ABDULLAH/Examiner, Art Unit 2431
/SHIN-HON (ERIC) CHEN/Primary Examiner, Art Unit 2431