DETAILED ACTION
Response to Arguments
Applicant's arguments ("REMARKS") filed 02 December 2025 have been fully considered, and they are partially persuasive as to the previous grounds of rejection.
Claims 1-2, 4-9, 11-16, and 18-20 were amended. Claims 1, 8, and 15 are independent. Claims 1-20 are currently pending.
Re: Claim Rejections Under 35 U.S.C. §103
As an initial matter, the Examiner notes that Applicant’s amendments to independent claims 1, 8, and 15 have removed the limitation “… wherein the similarity-based operation is performed entirely using CPU caching.” In the prior Office Action, the Hu et al., “Large-scale malware indexing using function-call graphs”, In Proceedings of the 16th ACM conference on Computer and communications security (CCS '09), 2009 (hereinafter, “Hu ‘09”), and Changkyu Kim et al., “FAST: fast architecture sensitive tree search on modern CPUs and GPUs”, In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data (SIGMOD '10), 2010 (hereinafter, “Kim ‘10”) references were relied upon solely to address this CPU caching limitation. Because this limitation is no longer present in the amended claims, the Hu ‘09 and Kim ‘10 references are hereby withdrawn from the rejection.
Applicant’s amendment and arguments, presented on pp. 7-18 of the REMARKS, in response to the rejection of the claims under 35 U.S.C. §103 with respect to Rao et al., US 2022/0036208 A1 (hereinafter, “Rao ‘208”), Huang et al., US 2018/0097822 A1 (hereinafter, “Huang ‘822”), and Kenefick, US 11,354,409 B1 (hereinafter, “Kenefick ‘409”) have been fully considered, and they are partially persuasive as to the previous grounds of rejection. In particular, with respect to the independent claims, Applicant argues that:
A. The cited references do not disclose the “confidence range” limitations.
B. The motivation to combine Rao ‘208, Huang ‘822, and Kenefick ‘409 is conclusory.
C. The reasonable expectation of success analysis is conclusory.
In response to Argument A
Applicant argues that none of the cited references disclose the limitation “… hashing the target code to produce a fuzzy hash of the target code; generating, by assessing the fuzzy hash of the target code using one or more vantage-point tree structures, a first malware classification output; determining, via a threshold operator, a confidence range associated with the first malware classification output, wherein the confidence range predicts (i) a high confidence that the target code is malicious, (ii) a high confidence that the target code is non-malicious, or (iii) a low confidence that the target code is either malicious or non-malicious …”, as amended in the independent claims.
The Examiner respectfully disagrees.
Specifically, with respect to the primary reference Rao ‘208, Applicant argues that Rao ‘208’s ‘classification values’ merely indicate ‘whether a label is likely correct’, which is allegedly different from the claimed “confidence range” that predicts three outcomes.
The Examiner, however, does not interpret there to be a substantive difference between Rao ‘208’s ‘classification values’ and the claimed “confidence range”. Rao ‘208 at ¶45 describes classification values as ‘confidence values or probability values that indicate a likelihood of the label for the corresponding classification being a correct label for a sample classified by the model’. A confidence value indicating a high likelihood that a ‘malicious’ label is correct is functionally equivalent to a high confidence that the target code is malicious, because the label is the classification of malicious or benign. Similarly, a high likelihood that a ‘benign’ label is correct is functionally equivalent to a high confidence that the target code is non-malicious.
Moreover, Rao ‘208 at ¶79 explicitly states that the incumbent model generates ‘a classification of malicious or benign based on the sample feature set and a confidence value that indicates a likelihood that the sample feature set corresponds to a benign sample’. This is a confidence prediction regarding whether the sample is benign or malicious, not merely a verification of a previously assigned label, as Applicant argues. Accordingly, the Examiner does not find this argument to be persuasive.
Applicant further argues that Rao ‘208’s ‘activation range’ merely determines whether the candidate model should be activated, which is allegedly different from predicting the three confidence outcomes, as recited in the claims. While the Examiner acknowledges that Rao ‘208 uses the term ‘activation range’ rather than “confidence range”, the functional operation of Rao ‘208’s ‘activation range’ creates a three-part decision structure that maps to the three recited confidence outcomes.
Specifically, as disclosed in Rao ‘208 at ¶¶79-83 and Fig.8: (1) When the incumbent model’s output falls outside the activation range and the classification is malicious, the malware detector indicates the sample is malware (block 815). This corresponds to the claimed prediction of “(i) a high confidence that the target code is malicious”. (2) When the incumbent model’s output falls outside the activation range and the classification is benign, the malware detector indicates the sample is benign (block 813). This corresponds to the claimed prediction of “(ii) a high confidence that the target code is non-malicious”. (3) When the incumbent model’s output falls inside the activation range, the sample is not immediately classified and is instead passed to the supplemental ML model for further analysis (blocks 809, 811). This corresponds to the claimed prediction of “(iii) a low confidence that the target code is either malicious or non-malicious”.
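For illustration only, and not as part of the grounds of rejection, the three-part decision structure of Rao ‘208’s Fig. 8 flow, as mapped above, can be sketched as follows. The function name, argument names, and return labels are hypothetical and are chosen solely to mirror the Examiner’s mapping:

```python
def classify_first_stage(confidence, classification, activation_low, activation_high):
    """Illustrative sketch of Rao '208's activation-range logic (Fig. 8).

    `confidence` is the incumbent model's output value; the activation range
    [activation_low, activation_high] marks the region of insufficient
    confidence. All names here are hypothetical.
    """
    inside_range = activation_low <= confidence <= activation_high
    if not inside_range and classification == "malicious":
        # Block 815: corresponds to claimed outcome (i), high confidence malicious.
        return "malware"
    if not inside_range and classification == "benign":
        # Block 813: corresponds to claimed outcome (ii), high confidence non-malicious.
        return "benign"
    # Blocks 809/811: corresponds to claimed outcome (iii), low confidence;
    # the sample escalates to the supplemental ML model.
    return "escalate_to_supplemental"
```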
The fact that Rao ‘208 describes this mechanism as ‘activating a supplemental model’ rather than ‘predicting low confidence’ is merely a difference in terminology. The underlying functional operation, namely evaluating samples based on the confidence level of the first-stage classification and escalating uncertain samples for further analysis, is the same. Accordingly, the Examiner does not find this argument to be persuasive.
Next, Applicant argues that the secondary reference Huang ‘822 relates to URLs rather than “target code”, and that Huang ‘822’s ‘classification score’ is different from a “confidence range” predicting three outcomes, as recited in the claims.
Firstly, Huang ‘822 is not relied upon to disclose the “target code” element. Rao ‘208 already discloses receiving and classifying code samples including executable files (Rao ‘208, ¶¶2, 78). Huang ‘822 is relied upon for its disclosure of a confidence-threshold framework for multi-stage malicious content classification. Both Rao ‘208 and Huang ‘822 are in the field of classification of malicious content, making them analogous art. In an obviousness analysis under 35 U.S.C. §103, it is not necessary that the references be directed to the identical subject matter; it is sufficient that they be reasonably pertinent to the problem being solved. See In re Oetiker, 977 F.2d 1443, 1447 (Fed. Cir. 1992).
Secondly, Huang ‘822 discloses a confidence determination with three outcomes, as recited in the claims. ¶36 of Huang ’822 discloses the confidence threshold as ‘threshold values or ranges of values that, if a malicious URL classification score satisfies, provides a high confidence that the URL is either malicious or not malicious’. Furthermore, at ¶53, Huang ‘822 discloses how the classification score either satisfies the confidence threshold (indicating high confidence in the classification) or does not satisfy the confidence threshold, in which case the content proceeds to further analysis. This maps directly to the three claimed outcomes: (i) high confidence malicious (score satisfies threshold on the malicious side), (ii) high confidence non-malicious (score satisfies threshold on the non-malicious side), and (iii) low confidence of either (score does not satisfy the threshold, triggering further analysis). Applicant’s characterization of Huang ‘822 as disclosing only ‘a score that indicates a probability that a URL is malicious’ does not accurately reflect the full scope of Huang ‘822’s disclosure. Accordingly, the Examiner does not find this argument to be persuasive.
Lastly, Applicant argues that even if Rao ‘208’s ‘activation range’ were modified in view of Huang ‘822, the result would still not disclose the claimed “confidence range” because Rao ‘208’s classification values only convey ‘label correctness’. The Examiner respectfully disagrees.
As stated above, Rao ‘208’s ‘classification values’ are confidence or probability values associated with malicious or benign classifications (Rao ‘208, ¶¶45, 79). They are not merely verifications of previously assigned labels. The ‘classification values’ of Rao ‘208 are predictions of whether the sample corresponds to malicious or benign content. Modifying Rao ‘208’s ‘activation range’ to incorporate Huang ‘822’s explicit confidence threshold framework results in a system that determines whether the first-stage classification output indicates (i) high confidence that the target code is malicious, (ii) high confidence that the target code is non-malicious, or (iii) low confidence of either, which would trigger further analysis by the supplemental ML model. ¶79 of Rao ‘208 clearly states: ‘a classification of malicious or benign based on the sample feature set and a confidence value that indicates a likelihood that the sample feature set corresponds to a benign sample’. Accordingly, the Examiner does not find this argument to be persuasive.
See Claim Rejections – 35 USC §103 below for further details.
In response to Argument B
Applicant argues that the Office Action’s motivation to combine Rao ‘208 and Huang ‘822 is generic and conclusory, analogizing it to the rejected rationales in ActiveVideo Networks, Inc. v. Verizon Commc’ns, Inc., 694 F.3d 1312 (Fed. Cir. 2012), and Google LLC v. Sonos, Inc., No. 2023-1259 (Fed. Cir. 2024). Applicant further argues that the cited portions of Huang ‘822 do not mention ‘efficiency’ and that multi-stage analysis would likely reduce efficiency.
Applicant’s argument is partially persuasive. Accordingly, Applicant's arguments and amendments have necessitated new ground(s) of rejection presented in this Office Action. A new ground of rejection has been asserted over modified interpretations and further clarification of the references. The Examiner has considered Applicant’s arguments and provides the following clarification and supplemental reasoning.
Both Rao ‘208 and Huang ‘822 address the same technical problem of classifying content as malicious or benign when initial analysis may be inconclusive, and escalating to further analysis when necessary. Rao ‘208 addresses this problem through its incumbent/supplemental model framework with an activation range (Rao ‘208, ¶¶15-16), while Huang ‘822 addresses this through a multi-stage analysis workflow with confidence thresholds (Huang ‘822, ¶¶14-17). Specifically, Huang ‘822 at ¶17 discloses that multi-stage analysis provides an ‘efficiency’ benefit: each stage ‘filters out some number of URLs by identifying those URLs as malicious or not malicious, such that later analysis stages operate on a smaller sub-set of URLs’, and as a result, ‘the overall computational intensity and/or analysis time to analyze a group of URLs is reduced’. Thus, the multi-stage confidence-threshold method of Huang ‘822 increases efficiency and reduces computational burden.
A person of ordinary skill in the art working with Rao ‘208’s activation range method would recognize that Huang ‘822’s confidence threshold framework provides a more structured and well-defined approach for deciding when the first-stage output is sufficiently confident to act upon and when further analysis is needed. Huang ‘822 provides an explicit three-way confidence determination that refines and clarifies the operation of Rao ‘208’s activation range. A person of ordinary skill in the art would be motivated to adopt Huang ‘822’s confidence threshold framework to make Rao ‘208’s activation range decision more predictable and well-defined.
Regarding Applicant’s argument that the Office Action assumes without evidence that Rao ‘208’s data ‘was not successfully analyzed by prior analyzers’, the Examiner notes that this is inherent in Rao ‘208’s framework. Rao ‘208’s activation range exists precisely because the incumbent model’s output for certain samples falls within a range of insufficient confidence, necessitating further analysis by the supplemental model (Rao ‘208, ¶¶80, 82-83). The premise of Rao ‘208’s system is that some data is not conclusively classified by the first-stage analysis. Accordingly, the Examiner finds this argument partially persuasive to the extent it identified areas for clearer articulation, but not persuasive to the extent it contends the combination is improper.
Next, Applicant argues that the Office Action fails to explain why a person of ordinary skill in the art would choose Kenefick ‘409’s LSH process to replace Rao ‘208’s first-stage incumbent model, and further argues that if Kenefick ‘409’s LSH process is sufficiently ‘precise and efficient’, there would be less need for subsequent analyzers, thereby undermining the motivation to combine Rao ‘208 and Huang ‘822. The Examiner has considered these arguments and provides clarification. A person of ordinary skill in the art would be motivated to use Kenefick ‘409’s LSH process using vantage point trees as the first-stage classification process in Rao ‘208’s multi-model framework for the following reasons.
First, Kenefick ‘409 discloses that locality-sensitive hashing with a distance metric and vantage point trees provides fast, scalable similarity-based malware detection that can operate in real time, even against databases containing hundreds of millions of entries (Kenefick ‘409, Col.2 lines 11-54; Col.8 lines 30-45). This fast, scalable first-stage screening complements the deeper, more computationally intensive ML analysis provided by Rao ‘208’s supplemental model. A person of ordinary skill in the art would recognize that pairing a fast similarity-based first stage with a more thorough ML-based second stage follows a well-established pattern of escalating analysis complexity.
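As an illustration of why a vantage-point tree supports the fast, scalable similarity search described above, the following minimal sketch shows one conventional VPT construction and nearest-neighbor search using recursive partitioning and triangle-inequality pruning. This sketch is offered for context only; it is an assumption about a standard implementation, not a reproduction of Kenefick ‘409’s code, and the integer distance metric is purely illustrative:

```python
def build_vpt(points, dist):
    """Build a vantage-point tree over `points` using metric `dist`.

    Each node picks a vantage point and splits the remaining points at the
    median distance, enabling search that prunes entire subtrees.
    """
    if not points:
        return None
    vp, rest = points[0], points[1:]
    if not rest:
        return {"vp": vp, "mu": 0, "inside": None, "outside": None}
    dists = sorted(dist(vp, p) for p in rest)
    mu = dists[len(dists) // 2]  # median distance splits the set
    inside = [p for p in rest if dist(vp, p) < mu]
    outside = [p for p in rest if dist(vp, p) >= mu]
    return {"vp": vp, "mu": mu,
            "inside": build_vpt(inside, dist),
            "outside": build_vpt(outside, dist)}

def search_vpt(node, query, dist, best=None):
    """Nearest-neighbor search; returns (distance, point).

    A subtree is visited only if, by the triangle inequality, it could
    contain a point closer than the best match found so far.
    """
    if node is None:
        return best
    d = dist(node["vp"], query)
    if best is None or d < best[0]:
        best = (d, node["vp"])
    if d < node["mu"]:
        best = search_vpt(node["inside"], query, dist, best)
        if d + best[0] >= node["mu"]:  # outside half may still hold a closer point
            best = search_vpt(node["outside"], query, dist, best)
    else:
        best = search_vpt(node["outside"], query, dist, best)
        if d - best[0] < node["mu"]:  # inside half may still hold a closer point
            best = search_vpt(node["inside"], query, dist, best)
    return best
```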
Second, regarding Applicant’s argument that Kenefick ‘409’s ‘precise and efficient’ LSH process would undermine the need for further analysis, the Examiner notes that Kenefick ‘409 itself recognizes that its LSH-based detection does not resolve all cases with high confidence. Kenefick ‘409 discloses returning a verdict of ‘“3” (unknown, feedback requested)’ for borderline cases where ‘we are not confident enough to flag a detection’ (Kenefick ‘409, Col.10 lines 46-62). Kenefick ‘409 further discloses ‘middle-of-the-road’ situations where the distance metric result is ambiguous (Kenefick ‘409, Col.9 lines 45-54). This demonstrates that even Kenefick ‘409’s LSH process produces uncertain results in some cases, supporting the need for a second-stage ML analysis as taught by Rao ‘208. The Specification of the current application also acknowledges that the fuzzy hash/VPT first stage will sometimes produce low-confidence results that require further ML analysis. Accordingly, the Examiner finds this argument partially persuasive to the extent it identified areas for clearer articulation, but not persuasive to the extent it contends the combination is improper.
Lastly, Applicant argues that the Examiner’s prior statement that ‘Rao ‘208’s machine learning model may be trained and adapted to process hashed representations just as effectively as it would raw data’ is unsupported, and that a person of ordinary skill in the art would understand hashed representations to be ‘nondescript text that cannot be reversed or decoded’. The Examiner has considered this argument and provides the following clarification.
The Examiner clarifies the mechanics of the proposed combination. The proposed modification does not require Rao ‘208’s incumbent ML model to process hashed representations directly. Rather, the proposed combination replaces Rao ‘208’s first-stage incumbent ML model with Kenefick ‘409’s LSH/VPT-based process for the first classification stage. In the modified system, the first-stage process operates as follows: (1) the target code is hashed to produce a fuzzy hash (per Kenefick ‘409’s TLSH computation), (2) the fuzzy hash is assessed against one or more vantage-point tree structures (per Kenefick ‘409’s fast search using balanced tree structures), and (3) the distance metric output from the VPT search serves as the first-stage classification output. This classification output is then evaluated against the confidence range thresholds (per Huang ‘822’s confidence threshold framework as applied to Rao ‘208’s activation range). Samples with ambiguous distance metrics (i.e., low confidence) proceed to the second-stage ML analysis, which is Rao ‘208’s supplemental model operating on the original sample features, not on hash values.
In other words, the hash values are used in the first-stage similarity-based process (Kenefick ‘409), and the second-stage ML model (Rao ‘208) operates on the original sample data. There is no requirement that Rao ‘208’s ML model process ‘nondescript’ hash values. Accordingly, Applicant’s argument that a person of ordinary skill in the art would understand hashed representations to be indecipherable does not undermine the proposed combination as clarified herein. The Examiner finds this argument partially persuasive to the extent it identified an ambiguity in the prior Office Action’s articulation of the combination, which is now clarified.
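The mechanics of the clarified combination, as articulated above, may be sketched as follows. This is an illustrative outline only; the function and parameter names (`fuzzy_hash`, `vpt_search`, `supplemental_model`, the `high_conf` threshold) are hypothetical placeholders for the corresponding teachings of Kenefick ‘409, Huang ‘822, and Rao ‘208, not actual implementations from the references:

```python
def combined_pipeline(sample, fuzzy_hash, vpt_search, thresholds, supplemental_model):
    """Illustrative sketch of the proposed combination (all names hypothetical).

    First stage: fuzzy hash + VPT search over hash values (per Kenefick '409).
    The distance metric output is evaluated against confidence thresholds
    (per Huang '822 as applied to Rao '208's activation range). Low-confidence
    samples escalate to the supplemental ML model (per Rao '208), which
    operates on the original sample, not on hash values.
    """
    h = fuzzy_hash(sample)                 # step (1): fuzzy hash of the target code
    distance, label = vpt_search(h)        # steps (2)-(3): VPT search yields distance metric
    if distance <= thresholds["high_conf"] and label == "malicious":
        return "malicious"                 # (i) high confidence malicious
    if distance <= thresholds["high_conf"] and label == "benign":
        return "benign"                    # (ii) high confidence non-malicious
    return supplemental_model(sample)      # (iii) low confidence: second-stage ML analysis
```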
See Claim Rejections – 35 USC §103 below for further details.
In response to Argument C
Applicant argues that the Office Action’s reasonable expectation of success analysis is conclusory and unsupported. Specifically, Applicant argues that: (a) the Office Action claims the core ideas of Rao ‘208 and Kenefick ‘409 are ‘complementary’ without supporting evidence; (b) the characterization of Applicant’s position as arguing that the references are ‘not physically combinable’ is insufficient; and (c) the Office Action does not explain why a person of ordinary skill in the art would use Kenefick ‘409’s LSH process but forego Kenefick ‘409’s distance metric in favor of Rao ‘208’s activation range modified by Huang ‘822.
The Examiner respectfully disagrees.
With respect to argument (a), the core ideas of Rao ‘208 and Kenefick ‘409 are complementary because they address different aspects of the same malware detection problem. Rao ‘208 provides a framework for joining two detection models with an activation range to manage the tradeoff between detection rate and false positive rate (Rao ‘208, ¶¶15-16). Kenefick ‘409 provides a specific, efficient first-stage detection technique using locality-sensitive hashing with vantage point trees that is optimized for fast, scalable similarity-based detection (Kenefick ‘409, Col.2 lines 11-54). A person of ordinary skill in the art would have a reasonable expectation of success in combining these teachings because the combination follows a predictable pattern: using a fast similarity-based technique (Kenefick ‘409) as the first stage in a multi-model framework (Rao ‘208), with uncertain results escalated to a more thorough ML-based second stage. Both approaches are proven malware detection techniques, and their sequential application follows a well-known process in the security arts.
With respect to argument (b), the Examiner reiterates that the proposed combination does not require physically combining the specific hardware or software structures of the references. As stated in MPEP § 2145(III), combining the teachings of references does not require an ability to combine their specific structures. A person of ordinary skill is also a person of ordinary creativity, not an automaton (see MPEP § 2141.03). The proposed combination takes Kenefick ‘409’s teaching of LSH-based detection using VPTs and applies it as the first-stage process within Rao ‘208’s teaching of a multi-model detection framework. A person of ordinary skill in the art would be capable of implementing such a combination.
With respect to argument (c), the proposed combination does not forego Kenefick ‘409’s distance metric. Rather, the distance metric output from Kenefick ‘409’s VPT search serves as the basis for the first-stage classification output. The distance metric values are the values that are evaluated against the confidence range thresholds (as modified by Huang ‘822). Specifically, a small distance to a node in a malware VPT indicates high confidence of maliciousness; a small distance to a node in a non-malware VPT indicates high confidence of non-maliciousness; and intermediate or ambiguous distances indicate low confidence, triggering the second-stage ML analysis. Thus, Kenefick ‘409’s distance metric is not being abandoned. Rather, it is being integrated into the confidence range evaluation. Applicant’s citation to Ex parte Manzi, Appeal No. 2018-003675 (PTAB 2018), is distinguishable because in that case the combination required abandoning a specific advantage taught by one reference. Here, Kenefick ‘409’s distance metric advantage is preserved and utilized within the combined system.
For the foregoing reasons, the Examiner maintains that a person of ordinary skill in the art would have had a reasonable expectation of success in combining the teachings of Rao ‘208, Huang ‘822, and Kenefick ‘409 as proposed.
See Claim Rejections – 35 USC §103 below for further details.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5 and 7-20 are rejected under 35 U.S.C. 103 as being unpatentable over Rao et al., US 2022/0036208 A1 (hereinafter, “Rao ‘208”), in view of Huang et al., US 2018/0097822 A1 (hereinafter, “Huang ‘822”), and further in view of Kenefick, US 11,354,409 B1 (hereinafter, “Kenefick ‘409”).
As per claim 1: Rao ‘208 discloses:
A method for generating a malware classification output for a target code, the method comprising (a method for detecting and classifying malware for executable code [Rao ‘208, ¶¶Abstract, 15-16, 29]):
receiving the target code (receiving an input sample, where the input sample may comprise executable code [Rao ‘208, ¶¶77-78; Fig.8]);
generating, (the malware detector, using the incumbent machine learning (ML) model, generates a first malware classification output with respect to the input code, where the classification output is associated with confidence values [Rao ‘208, ¶¶79-81, 84-85; Fig.8]);
determining, via a threshold operator, a confidence range associated with the first malware classification output (determining an activation range by an activation range tuner, where the activation range is applied to the first malware classification output by the incumbent ML model, and where the first malware classification output by the incumbent ML model is associated with confidence values [Rao ‘208, ¶¶19, 45, 79-80]), wherein the confidence range predicts (i) (a prediction is made on whether the input falls outside of the activation range and is considered malicious [Rao ‘208, ¶¶79-81, 85; Fig.8]), (ii) (a prediction is made on whether the input falls outside of the activation range and is considered benign [Rao ‘208, ¶¶79-81, 84; Fig.8]), or (iii) (a prediction is made on whether the input falls inside of the activation range, where if the input falls inside of the activation range, the input is not immediately classified as malicious or benign by the incumbent ML model; instead the input is passed to the supplemental ML model for further analysis [Rao ‘208, ¶¶80, 82-83; Fig.8]);
in response to the determination that the associated confidence range predicts the (in response to determining that the input falls inside of the activation range, such that the input is not immediately classified as malicious or benign, passing the input to the supplemental ML model and generating a second malware classification output, where the supplemental ML model operates on the original sample feature set rather than on hash values [Rao ‘208, ¶¶78, 80, 82-85; Fig.8]); and
performing one or more malware-based actions, including to reject, pass, and/or quarantine the target code, based on the first malware classification output or the second malware classification output (performing a malware-based action by another component, such as a firewall or a virtual machine, based on the first classification output or the second classification output, where the action may be confirming the verdict of the classifications (i.e., reject or pass), or executing the input code in a virtual environment to monitor potential malicious effects without exposing devices to malware (i.e., quarantine) [Rao ‘208, ¶¶31, 55, 84-85, 90; Fig.2]).
As stated above, Rao ‘208 does not explicitly disclose the limitations “… hashing the target code to produce a fuzzy hash of the target code; generating, by assessing the fuzzy hash of the target code using one or more vantage-point tree structures, a first malware classification output … wherein the confidence range predicts (i) a high confidence that the … is malicious, (ii) a high confidence that the … is non-malicious, or (iii) a low confidence that the … is either malicious or non-malicious; in response to the determination that the associated confidence range predicts the low confidence that the … is either malicious or non-malicious, generating …”.
Huang ‘822, however, discloses:
…
determining, via a threshold operator (a multi-stage method for identifying malicious content, comprising determining a classification score by an analysis manager 202 [Huang ‘822, ¶¶14-15, 36, 39; Fig.2, Fig.4]), a confidence range associated with the first … output (in the first analysis stage 402, a classification score is generated, where the classification score is associated with a confidence prediction of whether the content is malicious or not [Huang ‘822, ¶¶36-37, 51-53; Fig.4, Fig.6]),
wherein the confidence range predicts (i) a high confidence that the … is malicious (the classification score may indicate a confidence range of absolute malicious [Huang ‘822, ¶¶14, 36-37, 53]), (ii) a high confidence that the … is non-malicious (the classification score may indicate a confidence range of absolute not-malicious [Huang ‘822, ¶¶14, 36-37, 53]), or (iii) a low confidence that the target code is either malicious or non-malicious (the classification score does not satisfy a confidence threshold for either malicious or not-malicious [Huang ‘822, ¶¶15, 37, 53]);
in response to the determination that the associated confidence range predicts the low confidence that the … is either malicious or non-malicious, generating (in response to a determination the classification score does not satisfy a confidence threshold for either malicious or not-malicious, a second-stage analysis 404 is performed on the content using another analyzer which uses machine learning techniques [Huang ‘822, ¶¶15, 28, 30, 37-38, 53, 55; Fig.4, Fig.6, Fig.7]) …
Rao ‘208 and Huang ‘822 are analogous art because they are from the same field of endeavor, namely that of classification of malicious content. Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Rao ‘208 and Huang ‘822 before them, to modify the method in Rao ‘208 to include the teachings of Huang ‘822, namely to implement the activation range of Rao ‘208, such that being inside of the activation range indicates not reaching the confidence threshold for either malicious or not-malicious, while being outside of the activation range indicates reaching the confidence threshold for either malicious or not-malicious, as disclosed in Huang ‘822. A motivation for doing so would be to filter out content at each stage of analysis, reducing overall computational intensity, as each stage filters out some number of items by identifying them as malicious or not malicious such that later analysis stages operate on a smaller subset (see Huang ‘822, ¶¶14-15, 17).
As stated above, Rao ‘208 in view of Huang ‘822 does not explicitly disclose the limitation “… hashing the target code to produce a fuzzy hash of the target code; generating, by assessing the fuzzy hash of the target code using one or more vantage-point tree structures, a first malware classification output …”.
Kenefick ‘409, however, discloses:
… hashing the target code to produce a fuzzy hash of the target code (the malware detection agent (MDA) calculates a locality-sensitive hash value from an API call sequence string of an executing process, where the locality-sensitive hash is a TLSH value, which is a fuzzy hash representation of the behavior of the executing process [Kenefick ‘409, Col.7 line 63-Col.8 line 29]);
generating, by assessing the fuzzy hash of the target code using one or more vantage-point tree structures, a first malware classification output (the security server searches the blacklist database using the received TLSH value to determine if this value matches or is within the distance threshold of the entries in the database, where the database is organized into a tree structure using recursive partitioning to implement a fast search, and where the tree works well with metric trees and with a vantage point tree in particular [Kenefick ‘409, Col.2 lines 11-54; Col.9 line 55-Col.10 line 45; Fig.4]) …
Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 are analogous art because they are from the same field of endeavor, namely that of classification of malicious code. Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 before them, to modify the method in Rao ‘208 (modified by Huang ‘822) to include the teachings of Kenefick ‘409, namely to replace the first-stage classification process of Rao ‘208 with a locality-sensitive hashing (LSH) process using vantage point trees (VPT), as disclosed in Kenefick ‘409, such that the target code is hashed to produce a fuzzy hash and the fuzzy hash is assessed against VPT structures to generate the first malware classification output, where the distance metric output from the VPT search serves as the first-stage classification output that is evaluated against the confidence range thresholds. A motivation for doing so would be that Kenefick ‘409 provides a fast, scalable similarity-based detection process optimized for real-time operation against large databases, which complements the deeper ML analysis of Rao ‘208’s supplemental model, and because pairing a fast similarity-based first stage with a thorough ML-based second stage follows a well-established pattern in the security arts for balancing detection speed and accuracy (see Kenefick ‘409, Col.2 lines 11-54, Col.8 lines 30-45).
Moreover, Kenefick ‘409 itself recognizes that its LSH-based detection does not resolve all cases with high confidence, as it describes returning a verdict of ‘3’ (unknown, feedback requested) for borderline cases where the system is not confident enough to flag a detection (see Kenefick ‘409, Col.9 lines 45-54, Col.10 lines 46-61). This demonstrates that even Kenefick ‘409’s LSH process produces uncertain results in some cases, supporting the need for a second-stage ML analysis as taught by Rao ‘208.
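For context on the vantage-point tree search mechanism discussed in the mapping above, the following is a minimal illustrative sketch only; it is neither the claimed invention nor Kenefick ‘409’s actual implementation. It partitions hash values around vantage points at a median-distance radius and prunes subtrees during search via the triangle inequality. The character-wise distance and the example digests are hypothetical stand-ins for a real TLSH difference score.

```python
# Illustrative sketch of a vantage-point (VP) tree over fuzzy-hash digests.
# The character-wise distance below is a toy stand-in for a TLSH score.

def char_distance(a: str, b: str) -> int:
    """Toy metric: count of differing characters between equal-length digests."""
    return sum(x != y for x, y in zip(a, b))

class VPNode:
    def __init__(self, vantage, radius, inside, outside):
        self.vantage = vantage   # vantage-point hash stored at this node
        self.radius = radius     # median distance used to partition
        self.inside = inside     # subtree of hashes within the radius
        self.outside = outside   # subtree of hashes beyond the radius

def build_vp_tree(points):
    """Recursively partition the hashes around a vantage point
    (cf. the recursive partitioning described for the blacklist database)."""
    if not points:
        return None
    vantage, rest = points[0], points[1:]
    if not rest:
        return VPNode(vantage, 0, None, None)
    dists = sorted(char_distance(vantage, p) for p in rest)
    radius = dists[len(dists) // 2]  # median split
    inside = [p for p in rest if char_distance(vantage, p) <= radius]
    outside = [p for p in rest if char_distance(vantage, p) > radius]
    return VPNode(vantage, radius, build_vp_tree(inside), build_vp_tree(outside))

def search(node, query, threshold, best=None):
    """Find the closest stored hash within `threshold`, pruning subtrees
    that cannot contain a match (triangle inequality)."""
    if node is None:
        return best
    d = char_distance(query, node.vantage)
    if d <= threshold and (best is None or d < best[0]):
        best = (d, node.vantage)
    # Descend into a side only if it could hold a point within the threshold.
    if d - threshold <= node.radius:
        best = search(node.inside, query, threshold, best)
    if d + threshold >= node.radius:
        best = search(node.outside, query, threshold, best)
    return best
```

A query returning a match within the distance threshold would correspond to a blacklist hit; a query returning no match leaves the sample unresolved at this stage.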
As per claim 2: Rao ‘208 in view of Huang ‘822, and further in view of Kenefick ‘409 discloses all limitations of claim 1, as stated above, upon which claim 2 depends. Furthermore, Rao ‘208 discloses:
wherein the trained malware classification machine learning model comprises a trained neural network model (the supplemental ML model may be implemented using a neural network [Rao ‘208, ¶¶30, 34, 37-38, 58]).
As per claim 3: Rao ‘208 in view of Huang ‘822, and further in view of Kenefick ‘409 discloses all limitations of claim 1, as stated above, upon which claim 3 depends. Furthermore, Rao ‘208 discloses:
wherein the trained malware classification machine learning model is not executed until after the first malware classification output is generated (the supplemental ML model is executed after the incumbent ML model in response to a determination that the classification output from the incumbent ML model is within the activation range [Rao ‘208, ¶¶80, 82-83; Fig.8]).
As per claim 4: Rao ‘208 in view of Huang ‘822, and further in view of Kenefick ‘409 discloses all limitations of claim 1, as stated above, upon which claim 4 depends. Rao ‘208 in view of Huang ‘822 does not explicitly disclose the limitations of claim 4. Kenefick ‘409, however, discloses:
wherein the one or more vantage-point tree structures are generated based on a library of malware code (the blacklist database 180 is comprised of hash values of known malicious sequences, where these hash values in database 180 are organized into a tree structure in order to implement a fast search, and where the tree works well with metric trees and with a vantage point tree in particular [Kenefick ‘409, Col.2 lines 11-54, Col.5 lines 25-41; Fig.4]).
Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 are analogous art because they are from the same field of endeavor, namely that of classification of malicious code. For the reasons stated in claim 1, prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 before them, to modify the method in Rao ‘208 (modified by Huang ‘822) to include the teachings of Kenefick ‘409.
As per claim 5: Rao ‘208 in view of Huang ‘822, and further in view of Kenefick ‘409 discloses all limitations of claims 1 and 4, as stated above, upon which claim 5 depends. Rao ‘208 in view of Huang ‘822 does not explicitly disclose the limitations of claim 5. Kenefick ‘409, however, discloses:
wherein one or more vantage-point tree structures are generated based on a library of non-malware code (the whitelist database 170 supports fast search and is comprised of known good sequences, where database 170 is a whitelist of LSH values of API call sequences [Kenefick ‘409, Col.5 lines 20-41]).
Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 are analogous art because they are from the same field of endeavor, namely that of classification of malicious code. For the reasons stated in claim 4, prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 before them, to modify the method in Rao ‘208 (modified by Huang ‘822) to include the teachings of Kenefick ‘409.
As per claim 7: Rao ‘208 in view of Huang ‘822, and further in view of Kenefick ‘409 discloses all limitations of claim 1, as stated above, upon which claim 7 depends. Rao ‘208 in view of Huang ‘822 does not explicitly disclose the limitations of claim 7. Kenefick ‘409, however, discloses:
wherein the first malware classification output includes multiple results (a plurality of results generated from the search of the blacklist database 180 and the whitelist database 170 using the locality-sensitive hashing operation assessed using the vantage point tree [Kenefick ‘409, Abstract, Col.5 lines 25-41, Col.9 lines 55-61]).
Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 are analogous art because they are from the same field of endeavor, namely that of classification of malicious code. Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 before them, to modify the method in Rao ‘208 (modified by Huang ‘822) to include the teachings of Kenefick ‘409, namely to implement the first-stage classification process of Rao ‘208 such that the classification output includes multiple results generated from searching both a blacklist database and a whitelist database, as disclosed in Kenefick ‘409. A motivation for doing so would be to increase the accuracy of the nearest neighbor classification by searching against both known malicious and known non-malicious samples, thereby providing richer information for the confidence range determination (see Kenefick ‘409, Col.6 lines 35-41, Col.9 lines 55-61).
As per claim 8: Rao ‘208 discloses:
A system comprising: a processor; and memory having instructions stored thereon that, when executed by the processor, cause the processor to: (a computer system including a processor 1001 and memory 1007, where the functionality may be partially or entirely implemented in hardware and/or on the processor [Rao ‘208, ¶¶93, 98; Fig.10]):
receive target code (receiving an input sample, where the input sample may comprise executable code [Rao ‘208, ¶¶77-78; Fig.8]);
generate, (the malware detector, using the incumbent machine learning (ML) model, generates a first malware classification output with respect to the input code, where the classification output is associated with confidence values [Rao ‘208, ¶¶79-81, 84-85; Fig.8]);
determine, via a threshold operator, a confidence range associated with the first malware classification output (determining an activation range by an activation range tuner, where the activation range is applied to the first malware classification output by the incumbent ML model, and where the first malware classification output by the incumbent ML model is associated with confidence values [Rao ‘208, ¶¶19, 45, 79-80]),
wherein the confidence range predicts (i) (a prediction is made on whether the input falls outside of the activation range and is considered malicious [Rao ‘208, ¶¶79-81, 85; Fig.8]), (ii) (a prediction is made on whether the input falls outside of the activation range and is considered benign [Rao ‘208, ¶¶79-81, 84; Fig.8]), or (iii) (a prediction is made on whether the input falls inside of the activation range, where if the input falls inside of the activation range, the input is not immediately classified as malicious or benign by the incumbent ML model; instead the input is passed to the supplemental ML model for further analysis [Rao ‘208, ¶¶80, 82-83; Fig.8]);
in response to the determination that the associated confidence range predicts the (in response to determining that the input falls inside of the activation range, such that the input is not immediately classified as malicious or benign, passing the input to the supplemental ML model and generating a second malware classification output, where the supplemental ML model operates on the original sample feature set rather than on hash values [Rao ‘208, ¶¶78, 80, 82-85; Fig.8]); and
perform one or more malware-based actions, including to reject, pass, and/or quarantine the target code, based on the first malware classification output and the second malware classification output (performing a malware-based action by another component, such as a firewall or a virtual machine, based on the first classification output and the second classification output, where the malware detector communicates both the incumbent ML model output and the supplemental ML model output to a separate component, and where the action may be confirming the verdict of the classifications (i.e., reject or pass), or executing the input code in a virtual environment to monitor potential malicious effects without exposing devices to malware (i.e., quarantine) [Rao ‘208, ¶¶31, 55, 84-85, 90; Fig.2, Fig.8, Fig.9]).
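The two-stage flow mapped above (a first-stage score evaluated against an activation/confidence range, with uncertain samples deferred to a second-stage ML model) can be sketched in a minimal, purely illustrative form. The score semantics and cutoff values below are hypothetical and are not taken from Rao ‘208 or Huang ‘822.

```python
# Illustrative sketch of a threshold operator mapping a first-stage
# classification score onto the three confidence ranges discussed above.
# The cutoffs (0.2 and 0.8) are hypothetical.

MALICIOUS, NON_MALICIOUS, UNCERTAIN = "malicious", "non-malicious", "uncertain"

def confidence_range(score: float, low: float = 0.2, high: float = 0.8) -> str:
    """Scores at or above `high` predict high confidence the sample is
    malicious; scores at or below `low` predict high confidence it is
    non-malicious; scores inside the activation range (low, high) predict
    low confidence either way."""
    if score >= high:
        return MALICIOUS
    if score <= low:
        return NON_MALICIOUS
    return UNCERTAIN

def classify(score, second_stage_model):
    """Invoke the second-stage ML model only when the first stage is uncertain."""
    verdict = confidence_range(score)
    if verdict == UNCERTAIN:
        return second_stage_model(score)  # deeper analysis on uncertain samples
    return verdict
```

The design point is that the (costlier) second-stage model never runs on samples the first stage resolves with high confidence, which mirrors the speed/accuracy trade-off rationale articulated for the combination.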
As stated above, Rao ‘208 does not explicitly disclose the limitations “… hash the target code to produce a fuzzy hash of the target code; generate, by comparing the fuzzy hash to one or more nodes of a vantage-point tree, a first malware classification output … wherein the confidence range predicts (i) a high confidence that the … is malicious, (ii) a high confidence that the … is non-malicious, or (iii) a low confidence that the … is either malicious or non-malicious; in response to the determination that the associated confidence range predicts the low confidence that the … is either malicious or non-malicious, generate …”.
Huang ‘822, however, discloses:
…
determine, via a threshold operator (a multi-stage method for identifying malicious content, comprising determining a classification score by an analysis manager 202 [Huang ‘822, ¶¶14-15, 36, 39; Fig.2, Fig.4]), a confidence range associated with the first … classification output (in the first analysis stage 402, a classification score is generated, where the classification score is associated with a confidence prediction of whether the content is malicious or not [Huang ‘822, ¶¶36-37, 51-53; Fig.4, Fig.6]),
wherein the confidence range predicts (i) a high confidence that the … is malicious (the classification score may indicate a confidence range of absolute malicious [Huang ‘822, ¶¶14, 36-37, 53]), (ii) a high confidence that the … is non-malicious (the classification score may indicate a confidence range of absolute not-malicious [Huang ‘822, ¶¶14, 36-37, 53]), or (iii) a low confidence that the target code is either malicious or non-malicious (the classification score does not satisfy a confidence threshold for either malicious or not-malicious [Huang ‘822, ¶¶15, 37, 53]);
in response to the determination that the associated confidence range predicts the low confidence that the target code is either malicious or non-malicious, generate (in response to a determination the classification score does not satisfy a confidence threshold for either malicious or not-malicious, a second-stage analysis 404 is performed on the content using another analyzer which uses machine learning techniques [Huang ‘822, ¶¶15, 28, 30, 37-38, 53, 55; Fig.4, Fig.6, Fig.7]) …
Rao ‘208 and Huang ‘822 are analogous art because they are from the same field of endeavor, namely that of classification of malicious content. For the reasons stated in claim 1, prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Rao ‘208 and Huang ‘822 before them, to modify the method in Rao ‘208 to include the teachings of Huang ‘822.
As stated above, Rao ‘208 in view of Huang ‘822 does not explicitly disclose the limitation “… hash the target code to produce a fuzzy hash of the target code; generate, by comparing the fuzzy hash to one or more nodes of a vantage-point tree, a first malware classification output …”.
Kenefick ‘409, however, discloses:
… hashing the target code to produce a fuzzy hash of the target code (the malware detection agent (MDA) calculates a locality-sensitive hash value from an API call sequence string of an executing process, where the locality-sensitive hash is a TLSH value, which is a fuzzy hash representation of the behavior of the executing process [Kenefick ‘409, Col.7 line 63-Col.8 line 29]);
generate, by comparing the fuzzy hash to one or more nodes of a vantage-point tree, a first malware classification output (the security server searches the blacklist database by calculating a metric distance between the received TLSH value and the values at each node of a tree, where the tree is a balanced tree structure built using recursive partitioning, and where the tree works well with metric trees and with a vantage point tree in particular [Kenefick ‘409, Col.2 lines 11-54, Col.9 line 55-Col.10 line 15; Fig.4]) …
Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 are analogous art because they are from the same field of endeavor, namely that of classification of malicious code. For the reasons stated in claim 1, prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 before them, to modify the method in Rao ‘208 (modified by Huang ‘822) to include the teachings of Kenefick ‘409.
As per claim 9: Rao ‘208 in view of Huang ‘822, and further in view of Kenefick ‘409 discloses all limitations of claim 8, as stated above, upon which claim 9 depends. Furthermore, Rao ‘208 discloses:
wherein the trained malware classification machine learning model comprises a trained neural network model (the supplemental ML model may be implemented using a neural network [Rao ‘208, ¶¶30, 34, 37-38, 58]).
As per claim 10: Rao ‘208 in view of Huang ‘822, and further in view of Kenefick ‘409 discloses all limitations of claim 8, as stated above, upon which claim 10 depends. Furthermore, Rao ‘208 discloses:
wherein the trained malware classification machine learning model is not executed until after the first malware classification output is generated (the supplemental ML model is executed after the incumbent ML model in response to a determination that the classification output from the incumbent ML model is within the activation range [Rao ‘208, ¶¶80, 82-83; Fig.8]).
As per claim 11: Rao ‘208 in view of Huang ‘822, and further in view of Kenefick ‘409 discloses all limitations of claim 8, as stated above, upon which claim 11 depends. Rao ‘208 in view of Huang ‘822 does not explicitly disclose the limitations of claim 11. Kenefick ‘409, however, discloses:
wherein a first node of the one or more nodes of the vantage-point tree is generated from malware code (the blacklist database 180 is comprised of hash values of known malicious sequences, where these hash values are organized into a tree structure such that each node in the tree represents a hash value generated from a known malicious API call sequence [Kenefick ‘409, Col.5 lines 25-41; Fig.4]).
Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 are analogous art because they are from the same field of endeavor, namely that of classification of malicious code. Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 before them, to modify the system in Rao ‘208 (modified by Huang ‘822) to include the teachings of Kenefick ‘409, namely to implement the vantage-point tree of the first-stage classification process such that nodes in the tree are generated from known malicious code, as disclosed in Kenefick ‘409. A motivation for doing so would be to enable fast similarity-based comparison of an unknown sample against a database of known malicious behaviors, such that a match within a distance threshold indicates the unknown sample is likely malicious (see Kenefick ‘409, Col.5 lines 25-41).
As per claim 12: Rao ‘208 in view of Huang ‘822, and further in view of Kenefick ‘409 discloses all limitations of claims 8 and 11, as stated above, upon which claim 12 depends. Rao ‘208 in view of Huang ‘822 does not explicitly disclose the limitations of claim 12. Kenefick ‘409, however, discloses:
wherein a second node of the one or more nodes of the vantage point tree is generated from non-malware code (the whitelist database 170 supports fast search and is comprised of known good sequences, where database 170 is a whitelist of LSH values of API call sequences, such that each node in the whitelist tree represents a hash value generated from a known non-malicious API call sequence [Kenefick ‘409, Col.5 lines 20-41]).
Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 are analogous art because they are from the same field of endeavor, namely that of classification of malicious code. For the reasons stated in claim 11, prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 before them, to modify the system in Rao ‘208 (modified by Huang ‘822) to include the teachings of Kenefick ‘409.
As per claim 13: Rao ‘208 in view of Huang ‘822, and further in view of Kenefick ‘409 discloses all limitations of claims 8 and 11-12, as stated above, upon which claim 13 depends. Rao ‘208 in view of Huang ‘822 does not explicitly disclose the limitations of claim 13. Kenefick ‘409, however, discloses:
wherein comparing the fuzzy hash to the one or more nodes comprises calculating a first distance between the fuzzy hash and the first node, and calculating a second distance between the fuzzy hash and the second node (the security server uses a distance metric or approximate distance metric to calculate a distance between the received TLSH value and each of the values in the blacklist database 180, i.e., the first node generated from malware code as established in claim 11, and the whitelist database 170 may also be queried, i.e., calculating a distance between the received TLSH value and the second node generated from non-malware code as established in claim 12 [Kenefick ‘409, Col.3 lines 41-55, Col.9 lines 55-61; Fig.4]).
Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 are analogous art because they are from the same field of endeavor, namely that of classification of malicious code. Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 before them, to modify the system in Rao ‘208 (modified by Huang ‘822) to include the teachings of Kenefick ‘409, namely to implement the first-stage classification process such that distances are calculated between the fuzzy hash of the target code and nodes generated from both malware code and non-malware code in the vantage-point tree, as disclosed in Kenefick ‘409. A motivation for doing so would be to enable the system to determine the proximity of an unknown sample to both known malicious and known non-malicious samples, thereby providing a basis for classifying the sample by comparing the respective distances against a distance threshold (see Kenefick ‘409, Col.2 lines 11-54).
As per claim 14: Rao ‘208 in view of Huang ‘822, and further in view of Kenefick ‘409 discloses all limitations of claims 8 and 11-13, as stated above, upon which claim 14 depends. Rao ‘208 in view of Huang ‘822 does not explicitly disclose the limitations of claim 14. Kenefick ‘409, however, discloses:
wherein the instructions cause the processor to update the vantage-point tree to include a node generated based on the target code (once the target code has been classified as malicious or non-malicious, the hash value of the target code may be designated as benign and placed into the whitelist database 170, or designated as malicious and placed into the blacklist database 180, thereby updating the tree structure to include a new node generated based on the target code [Kenefick ‘409, Col.3 lines 18-27, Col.5 lines 25-41]).
Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 are analogous art because they are from the same field of endeavor, namely that of classification of malicious code. Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 before them, to modify the system in Rao ‘208 (modified by Huang ‘822) to include the teachings of Kenefick ‘409, namely to update the vantage-point tree to include a node generated based on the target code after classification, as disclosed in Kenefick ‘409. A motivation for doing so would be to increase the protection of the system by quickly updating the database with newly classified malicious or non-malicious behaviors, thereby enabling the system to address new threats within a reasonably short amount of time (see Kenefick ‘409, Col.3 lines 18-27).
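The database-update behavior mapped above (enrolling a newly classified hash into the blacklist or whitelist so future lookups benefit) can be sketched in a purely illustrative form. Here the "tree" is simplified to a set per class; a production system might instead insert into, rebuild, or rebalance the tree structure, and the function and verdict names are hypothetical.

```python
# Illustrative sketch: after classification, fold the new sample's hash
# back into the matching reference set, mirroring the whitelist/blacklist
# update described in Kenefick '409. The "tree" is simplified to sets.

def update_reference_sets(sample_hash, verdict, blacklist, whitelist):
    """Enroll a newly classified hash into the appropriate reference set."""
    if verdict == "malicious":
        blacklist.add(sample_hash)
    elif verdict == "non-malicious":
        whitelist.add(sample_hash)
    # Uncertain verdicts are not enrolled until confirmed.
```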
As per claims 15-17: Claims 15-17 define a non-transitory computer-readable medium that recites subject matter substantially similar to that of the method of claims 1-3, respectively. Specifically, claims 15-17 are directed to a non-transitory computer-readable medium having instructions stored thereon for generating a malware classification output for a target code, wherein execution of the instructions by a processor causes the processor to perform the method of claims 1-3, respectively. Thus, the rejection of claims 1-3 is equally applicable to claims 15-17, respectively.
As per claim 18: Rao ‘208 in view of Huang ‘822, and further in view of Kenefick ‘409 discloses all limitations of claim 15, as stated above, upon which claim 18 depends. Rao ‘208 in view of Huang ‘822 does not explicitly disclose the limitations of claim 18. Kenefick ‘409, however, discloses:
wherein at least one node of the one or more vantage-point tree structures are generated from a fuzzy hash of malware code (the blacklist database 180 is comprised of hash values of known malicious sequences, where the hash values are locality-sensitive hash values calculated using the TLSH algorithm, which is a fuzzy hash algorithm that generates a fuzzy hash representation of the behavior of a malicious process, and where these hash values are organized into a tree structure such that each node represents a fuzzy hash value generated from malware code [Kenefick ‘409, Col.5 lines 25-41; Fig.4]).
Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 are analogous art because they are from the same field of endeavor, namely that of classification of malicious code. Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 before them, to modify the computer-readable medium in Rao ‘208 (modified by Huang ‘822) to include the teachings of Kenefick ‘409, namely to implement the vantage-point tree structures such that nodes are generated from fuzzy hash values of known malicious code, as disclosed in Kenefick ‘409. A motivation for doing so would be to enable fast similarity-based comparison of an unknown sample against a database of known malicious behaviors using locality-sensitive hash values, which preserve similarity information and allow the use of a distance metric for efficient tree-based searching (see Kenefick ‘409, Col.2 lines 11-54, Col.5 lines 25-41).
As per claim 19: Rao ‘208 in view of Huang ‘822, and further in view of Kenefick ‘409 discloses all limitations of claims 15 and 18, as stated above, upon which claim 19 depends. Rao ‘208 in view of Huang ‘822 does not explicitly disclose the limitations of claim 19. Kenefick ‘409, however, discloses:
wherein at least one node of the one or more vantage-point tree structures are generated from a fuzzy hash of non-malware code (the whitelist database 170 supports fast search and is comprised of known good sequences, where database 170 is a whitelist of LSH values of API call sequences, and where the LSH values are locality-sensitive hash values calculated using the TLSH algorithm, which is a fuzzy hash algorithm that generates a fuzzy hash representation of the behavior of a non-malicious process [Kenefick ‘409, Col.5 lines 20-41]).
Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 are analogous art because they are from the same field of endeavor, namely that of classification of malicious code. For the reasons stated in claim 18, prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 before them, to modify the computer-readable medium in Rao ‘208 (modified by Huang ‘822) to include the teachings of Kenefick ‘409.
As per claim 20: Rao ‘208 in view of Huang ‘822, and further in view of Kenefick ‘409 discloses all limitations of claims 15 and 18-19, as stated above, upon which claim 20 depends. Rao ‘208 in view of Huang ‘822 does not explicitly disclose the limitations of claim 20. Kenefick ‘409, however, discloses:
wherein assessing the fuzzy hash of the target code includes calculating a first distance between the fuzzy hash of the target code and a node of the at least one node generated from the fuzzy hash of the malware code, and calculating a second distance between the fuzzy hash of the target code and a node of the at least one node generated from the fuzzy hash of the non-malware code (the security server uses a distance metric or approximate distance metric to calculate a distance between the received TLSH value of the target code and each of the TLSH values in the blacklist database 180, i.e., nodes generated from fuzzy hashes of malware code as established in claim 18, and the whitelist database 170 may also be queried, i.e., calculating a distance between the received TLSH value and nodes generated from fuzzy hashes of non-malware code as established in claim 19 [Kenefick ‘409, Col.3 lines 41-55, Col.9 lines 55-61]).
Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 are analogous art because they are from the same field of endeavor, namely that of classification of malicious code. Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Rao ‘208 (modified by Huang ‘822) and Kenefick ‘409 before them, to modify the computer-readable medium in Rao ‘208 (modified by Huang ‘822) to include the teachings of Kenefick ‘409, namely to implement the first-stage classification process such that distances are calculated between the fuzzy hash of the target code and nodes generated from fuzzy hashes of both malware code and non-malware code, as disclosed in Kenefick ‘409. A motivation for doing so would be to enable the system to determine the proximity of an unknown sample to both known malicious and known non-malicious samples, thereby providing a basis for classifying the sample by comparing the respective distances against a distance threshold (see Kenefick ‘409, Col.2 lines 34-45).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Rao ‘208, in view of Huang ‘822, and further in view of Kenefick ‘409, and further in view of Lancioni et al., US 2023/0029679 A1 (hereinafter, “Lancioni ‘679”).
As per claim 6: Rao ‘208, in view of Huang ‘822, and further in view of Kenefick ‘409, discloses all limitations of claim 1, as stated above, upon which claim 6 depends. Rao ‘208 in view of Huang ‘822 does not explicitly disclose the limitations of claim 6. Lancioni ‘679, however, discloses:
wherein assessing the fuzzy hash of the target code includes calculating a first distance between the fuzzy hash and a node in a first (the MinHashed vector 404 of an unknown sample is used to query a malware LSH forest 406, which includes samples of the malware class, and a custom distance metric is calculated between the unknown sample and items retrieved from the malware LSH forest 406 to produce a first malware distance 412 [Lancioni ‘679, ¶¶46-49; Fig.4]), and wherein assessing the fuzzy hash of the target code further includes calculating a second distance between the fuzzy hash and a node in a second (the MinHashed vector 404 is also used to query a clean LSH forest 420, which includes samples of the clean class, and a custom distance metric is calculated between the unknown sample and items retrieved from the clean LSH forest 420 to produce a first clean distance 426 [Lancioni ‘679, ¶¶46-50; Fig.4]).
Rao ‘208 (modified by Huang ‘822) and Lancioni ‘679 are analogous art because they are from the same field of endeavor, namely that of classification of malicious code using similarity-based operations. Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Rao ‘208 (modified by Huang ‘822) and Lancioni ‘679 before them, to modify the method in Rao ‘208 (modified by Huang ‘822) to include the teachings of Lancioni ‘679, namely to implement the first-stage classification process such that the fuzzy hash of the target code is queried against separate malware and non-malware tree structures, and distances are calculated between the fuzzy hash and nodes in each respective tree structure, as disclosed in Lancioni ‘679. A motivation for doing so would be to improve classification accuracy by computing distances to both malware and non-malware samples independently, thereby providing the classifier with richer proximity information from both classes to distinguish between malicious and non-malicious code (see Lancioni ‘679, ¶¶45, 47, 49-51).
As stated above, Rao ‘208 in view of Huang ‘822, and further in view of Lancioni ‘679 does not explicitly disclose the limitation “... a first vantage-point tree structure ... vantage-point tree structures ... vantage-point tree structure ... a second vantage-point tree structure ... vantage-point tree structure ... vantage-point tree structure ...”.
Kenefick ‘409, however, discloses:
... a first vantage-point tree structure ... vantage-point tree structures ... vantage-point tree structure ... a second vantage-point tree structure ... vantage-point tree structure ... vantage-point tree structure ... (identification and classification of malware using a similarity-based operation comprising locality-sensitive hashing (LSH) using metric or vantage point trees, where the similarity-based operation comprises calculating distances between a subject hash value and hash values organized in the tree structure [Kenefick ‘409, Col.2 lines 11-54; Fig.4]).
Rao ‘208 (modified by Huang ‘822 and Lancioni ‘679) and Kenefick ‘409 are analogous art because they are from the same field of endeavor, namely that of classification of malicious code. For the reasons stated in claim 1, prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Rao ‘208 (modified by Huang ‘822 and Lancioni ‘679) and Kenefick ‘409 before them, to modify the method in Rao ‘208 (modified by Huang ‘822 and Lancioni ‘679) to include the teachings of Kenefick ‘409.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Choi et al., US 20200082083 A1: Addresses verification and reliability of a machine learning classification model by deriving predictive information for a file suspected of maliciousness using various machine learning models, such as CNN and DNN, and determining the similarity for the suspicious file.
Kaluga et al., US 20230098919 A1: A system and method for detecting malware using hierarchical clustering analysis. Unknown files are classified by clustering in view of known malicious and known safe files. Machine learning models and detection rules are used to enhance classification accuracy.
Oliver et al., US 11,182,481 B1: A system for evaluating files for cyber threats includes a machine learning model and a locality sensitive hash (LSH) repository. When the machine learning model classifies a target file as normal, the system searches the LSH repository for a malicious locality sensitive hash that is similar to a target locality sensitive hash of the target file.
Chang et al., US 11,636,161 B1: An intelligent clustering system having a dual-mode clustering engine for mass-processing and stream-processing. A tree data model is utilized to describe heterogeneous data elements in an accurate and uniform way and to calculate a tree distance between each data element and a cluster representative.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALAN L KONG whose telephone number is (571)272-2646. The examiner can normally be reached Monday-Thursday 9:00am-7:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JUNG (JAY) KIM can be reached on (571)272-3804. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ALAN L KONG/Examiner, Art Unit 2494
/THEODORE C PARSONS/Primary Examiner, Art Unit 2494