Last updated: May 29, 2026
Application No. 18/482,519
BINARY FILE MALWARE DETECTION WITH STRUCTURE AWARE MACHINE LEARNING

Status: Non-Final OA §103
Filed: Oct 06, 2023
Priority: Jul 31, 2023 (provisional 63/516,659)
Examiner: PATEL, HARESH N
Art Unit: 2496
Tech Center: 2400 (Computer Networks)
Assignee: Palo Alto Networks Inc.
OA Round: 3 (Non-Final)
Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 78% grant rate with a +22.0% interview lift. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 824 resolved cases, 2023–2026
Examiner Intelligence

Examiner: PATEL, HARESH N
Career allowance rate: 78% (above average), 640 granted / 824 resolved, +19.7% vs TC avg
Interview lift: +22.0% (strong), comparing resolved cases with and without an interview
Typical timeline: 3y 0m average prosecution, 24 currently pending
Career history: 862 total applications across all art units
Statute-Specific Performance

§101: 1.3% (-38.7% vs TC avg)
§103: 66.8% (+26.8% vs TC avg)
§102: 24.1% (-15.9% vs TC avg)
§112: 1.3% (-38.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 824 resolved cases.
Office Action

§103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Status of Claims
Claims 1-3, 5-10, 12-17, 19, 20-23 are subject to examination. 
Claims 4, 11, 18 are cancelled.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Note: As claimed, regardless of the outcome of the claimed “comparison”, the result of the determining remains the same, i.e., the compression is performed either statically or dynamically.
Regardless of whether the claim recites “statically compressing” or “dynamically compressing”, the outcome is the same.
That outcome is “to generate the plurality of compressed representations”, regardless of which compression is used.
Since the outcome “to generate the plurality of compressed representations” is the same whether the compression is dynamic or static, the claimed compression is similar to any “compression” that also accomplishes the claimed “plurality of compressed representations”.

Claim(s) 1, 8, 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., CN 116149669 A in view of Delio 10762281, FENG et al., CN 114386034, Grymel et al., 20210406164, Rajkumar et al., 20200311616 and Official Notice.
Referring to claim(s) 1, Zhang substantially discloses a method comprising: generating a tree data structure comprising, at each node of the tree data structure, a corresponding one of a first plurality of strings, wherein the first plurality of strings comprises strings from analysis of a binary file, generate a plurality of sequences of strings from the first plurality of strings, wherein each of the plurality of sequences of strings corresponds to one or more paths of the tree data structure, wherein each of the one or more paths comprises a path from a root node to a leaf node of the tree data structure, traversing the tree data structure with (abstract, 
the dictionary tree (Trie Tree) is called a word search tree, key tree, is a tree-shaped data structure, which is a hashtree of variable. It divides the character string into several minimum units (such as single character), and forms the tree form (the path from the root node to the leaf node forms the original character string) in the way of public prefix. Such data structure can furthest reduce meaningless character string comparison so as to reduce query time overhead, typically applied to storage of a large number of character string, counting and ordering. In the above embodiment and does not limit how to obtain the analysis result, the embodiment application a specific scheme. matching the extracted features of the binary file to be detected by using the pre-established characteristic database to obtain the matching result comprising: matching the characteristic of the extracted binary file to be detected by using the pre-established characteristic database; if the characteristic of the extracted binary file to be detected is successfully matched with the characteristic of the path characterization in the dictionary tree, increasing the corresponding characteristic packet to count. correspondingly, analyzing the binary file to be detected according to the matching result to obtain the analysis result last para, page 10,
extracting string: based on the existing binary analysis tool, by analyzing the input binary file format and structure (common format such as PE, ELF, Mach-O and so on, static library and so on binary file should be split independent analysis), extracting comprises character string information of the data segment and the input and derived symbol table containing the symbol information. searching all the string only composed of ASCII readable character in the binary file data by linear search mode, as the character string set to be screened extracted from the binary file. At the same time, if the input is a binary sample library, the step can according to the binary file folder or software package to generate initial characteristic packet information (see text characteristic packet module). 3rd last para, page 6
segmentation: The character string is split sequentially into several words according to the following heuristic rules: 1. A sub-string composed of several capital letters and a plurality of small letters or numbers, as a word; 2. A sub-string composed of several small letters, numbers or English periods ("."), as a word; 3. A sub-string of special characters whose length is not less than 2, as a word; 4. All remaining characters are eliminated. The extracted words are then sorted into a word sequence according to the order in which they appear in the original character string.
sieve: screening each character string by using the word as the unit, eliminating or replacing words that are meaningless or unstable. The heuristic rules applied in practice are as follows: 1. words that are too short (such as fewer than 3 characters) and formed of letters, numbers or English periods should be removed; 2. words that are too long (such as more than 24 characters) should be removed; 3. if the word consists only of a decimal or hexadecimal number (digits or the letters "A" to "F", possibly beginning with "0x", not case-sensitive), the word is replaced by the fixed mark representing a number; 4. if the word satisfies the version number format (such as X.X.X.X.X.X.X.X.X.X.X.X.X.X, etc., wherein "X" is at least one number), the word is replaced by the fixed mark representing a version number. After that, if the selected character string duplicates another character string, comprises too few words (such as fewer than 3 words), or only comprises words composed of special characters, the character string is considered to carry too little effective information and is removed. All the obtained word sequences together form a feature set, 3rd para, page 7
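For orientation, the segmentation and sieve rules quoted from Zhang can be sketched in Python as follows. This is only an illustrative reading of the translated passage, not Zhang's implementation: the regular expressions, the placeholder marks `<NUM>` and `<VER>`, the order in which the sieve rules are applied, and the choice to keep the first copy of a repeated string are all assumptions.

```python
import re

NUM = "<NUM>"  # hypothetical fixed mark for numbers
VER = "<VER>"  # hypothetical fixed mark for version numbers

# Segmentation rules 1-3, tried in order at each position; anything that
# matches none of them is skipped, which corresponds to rule 4.
WORD_RE = re.compile(
    r"[A-Z]+[a-z0-9]*"      # rule 1: capitals followed by lowercase letters or digits
    r"|[a-z0-9.]+"          # rule 2: lowercase letters, digits, or periods
    r"|[^A-Za-z0-9\s]{2,}"  # rule 3: at least two special characters
)
ALNUM_RE = re.compile(r"[A-Za-z0-9.]+")
NUM_RE = re.compile(r"(?:0x)?[0-9a-f]+", re.IGNORECASE)
VER_RE = re.compile(r"\d+(?:\.\d+)+")


def segment(s):
    """Split a string into words, in order of appearance."""
    return WORD_RE.findall(s)


def sieve_words(words):
    """Apply the four word-level sieve rules."""
    kept = []
    for w in words:
        if ALNUM_RE.fullmatch(w) and len(w) < 3:
            continue  # sieve rule 1: too short
        if len(w) > 24:
            continue  # sieve rule 2: too long
        if NUM_RE.fullmatch(w):
            kept.append(NUM)  # sieve rule 3: decimal or hexadecimal number
        elif VER_RE.fullmatch(w):
            kept.append(VER)  # sieve rule 4: version number
        else:
            kept.append(w)
    return kept


def build_feature_set(strings):
    """Drop strings that are repeated, too short, or only special characters."""
    features, seen = [], set()
    for s in strings:
        seq = tuple(sieve_words(segment(s)))
        only_special = all(
            w not in (NUM, VER) and not ALNUM_RE.fullmatch(w) for w in seq
        )
        if len(seq) < 3 or only_special or seq in seen:
            continue
        seen.add(seq)
        features.append(seq)
    return features
```

Note that, read literally, sieve rule 3 also replaces ordinary words made only of the letters "a" to "f" (for example "Add") with the number mark; the sketch follows that literal reading.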

Inputting a second plurality of strings wherein the second plurality of strings at least comprises strings resulting from static analysis of the binary file (
analyzing for the binary file, the current scheme is the static analysis of the program code in the binary file. At present, one solution of binary file feature extraction and evaluation is for non-program code part in binary file, namely program symbol and character string and so on data for extracting and analyzing. The static analysis refers to program static analysis, by lexical analysis, grammar analysis, control flow and data stream analysis technology to scan the program code, verifying whether the code satisfy standard, security, reliability and maintainability index of a code analysis technology. the static analysis of the program can help software development staff and the quality assurance personnel to search the code for security vulnerability and so on, so as to ensure the whole quality of the software, and also can be used for helping software development the large-scale software system and system service logic extraction and other fields such as system service logic extraction and so on, 2nd para, page 11.

Zhang does not specifically mention, but Delio discloses, as is well-known in the art, compressing the plurality of sequences of strings to generate a plurality of compressed representations of the tree data structure, determine to compress a plurality of embeddings of the plurality of sequences of strings based on a comparison between a number of paths in the tree data structure corresponding to the plurality of sequences of strings compressing the plurality of embeddings of the plurality of sequences of strings, inputting the plurality of compressed representations (abstract
(54) FIGS. 7A-7D illustrate how compression and decompression can be performed in accordance with some embodiments described herein. First, producer 602 passes the string “Hello” to string compressor/decompressor 604. The string compressor/decompressor will then take the string “Hello” and add it to the internal string tree using the nodes already presented earlier in this disclosure. When the string compressor/decompressor is first given the string “Hello”, it creates the tree nodes shown earlier in this disclosure and returns the node identifier (offset) of a node that served as the base node (the node that contained as much of the string as possible that already exists within the tree), the offset within the string of that base node where the new string diverges, and the number of bytes that was added to the base node at the base offset, and lastly the identifier or key for the newly added string. col., 12, lines 32-45 
(32) One non-obvious feature of some embodiments described herein is that the common prefix node “Hel” is numbered with a value greater than the ending of the “Hello” string, “lo”. The reason for doing this is because the node or prefix string numbering must remain consistent throughout the use of the tree and compression. Recall that the identifiers of the strings are implicit. In the case of “Hello”, the implicit value was one. Notice that by keeping the ending portion of “Hello”, which is the “lo” segment, as one, it is still possible to refer to an identifier of one and walk up the tree and form the string “Hello”. Sure, it will be reversed but if a buffer was filled in reverse then the result would be “Hello”, col., 7, lines 28-38.
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations, and one of ordinary skill in the art would have been motivated to do so because it would provide compression of the strings. The strings would be compressed regardless of whether a tree is used. This would reduce the amount of space needed for storage of the strings, making efficient use of the storage resources, col., 12, lines 32-45.
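The prefix-tree compression described in the Delio passages above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not Delio's implementation: the class name `PrefixTree`, the parallel `parent` and `segment` lists, and the linear child scan are hypothetical. It does reproduce two behaviours the quoted text describes: when "Hello" is later split, its ending segment "lo" keeps identifier 1 while the shared prefix "Hel" receives a larger identifier, and a string is rebuilt by walking up from its node and reversing the collected segments.

```python
class PrefixTree:
    """Strings stored as chains of segments linked by parent pointers."""

    def __init__(self):
        self.parent = [0]     # node 0 is the root
        self.segment = [""]   # the root carries no text

    def _new(self, parent, segment):
        self.parent.append(parent)
        self.segment.append(segment)
        return len(self.parent) - 1

    def add(self, s):
        """Insert s and return the identifier of the node that ends it."""
        node, rest = 0, s
        while rest:
            match = None
            for c in range(1, len(self.parent)):
                if self.parent[c] == node and self.segment[c][0] == rest[0]:
                    match = c
                    break
            if match is None:
                return self._new(node, rest)
            seg = self.segment[match]
            k = 0
            while k < min(len(seg), len(rest)) and seg[k] == rest[k]:
                k += 1
            if k < len(seg):
                # Split: the shared prefix gets a new (larger) identifier,
                # the existing node keeps its identifier and the tail text.
                prefix = self._new(node, seg[:k])
                self.parent[match] = prefix
                self.segment[match] = seg[k:]
                node = prefix
            else:
                node = match
            rest = rest[k:]
        return node

    def decode(self, node):
        """Walk up to the root, then reverse the collected segments."""
        parts = []
        while node != 0:
            parts.append(self.segment[node])
            node = self.parent[node]
        return "".join(reversed(parts))
```

Adding "Hello" and then "Help" to an empty tree stores six characters in total ("Hel", "lo", "p") instead of nine, which is the storage saving the rejection relies on.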
Zhang and Delio do not specifically mention, but Feng discloses, as is well-known in the art, a dynamic analysis of a binary file, a tree search algorithm 
detecting engine comparison means also from the traditional static analysis based on feature development to dynamic analysis based on behavior, comprehensive utilization behavior monitoring, threat intelligence, machine learning and other techniques for malicious code detection. The malicious code analysis detection technology mainly comprises two kinds of static analysis and dynamic analysis. static analysis is mainly by analyzing the binary file of malicious code, extracting feature code to form malicious code feature library, detecting engine by comparing the characteristic of the sample file with the feature library to judge whether it is a malicious code, the advantage is that it can check all execution path of malicious code, the obtained characteristic code detection accuracy is high, but the characteristic code is heavy workload the analysis period is long, especially for polymorphic, modification, malicious code of the shell, difficult to extract effective characteristic code. dynamic analysis is to execute the sample file in the protected virtual environment, and monitoring the dynamic behavior of the sample file in the execution process through various monitoring points of the kernel state and the user state, such as file system, process, registry and network access, the advantages are not under multi-state, deformation and shell influence, 2nd para, page 2,
sample analysis capability and detection analysis algorithm 3rd para, page 2.
realizing the malicious code detection of multi-engine fusion, to improve the malicious code comprehensive detection level, 3rd para, page 2

Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations, and one of ordinary skill in the art would have been motivated to do so because it would make use of the dynamic analysis. Dynamic analysis of a binary file is a method of analyzing a software program or system while it is running. This technique is also known as "dynamic code analysis" and "dynamic program analysis". In contrast to static analysis, which examines the binary without running it, dynamic analysis focuses on observing and understanding the program's behavior during real-time execution, 2nd para, page 2.
Zhang, Delio, and Feng do not specifically mention, but Grymel discloses, as is well-known in the art, a threshold parameter that determines dimensionality of the plurality of compressed representations; and determining whether to statically compress or dynamically compress, statically compressing or dynamically
[0058] The example static storage controlling circuitry 314 compresses the uncompressed tensor. The example static storage controlling circuitry 314 divides the uncompressed tensor into storage elements. As used herein, a storage element is a tensor with relatively smaller dimensions. As an example base case for ZXY data, the data along the Z axis for a particular XY coordinate pair is a single storage element. However, in some examples, the static storage controlling circuitry 314 determines a number of storage elements to divide a tensor into along an axis. For example, the static storage controlling circuitry 314 can determine to divide the tensor into two storage elements along the Z axis. However, the static storage controlling circuitry 314 can additionally or alternatively determine to divide the tensor into a greater or fewer number of storage elements along the X, Y, and/or Z axis (e.g., three storage elements along the Z axis, etc.).
[0064] The example dynamic storage controlling circuitry 316 compresses the static, compressed tensor to generate a dynamic, compressed tensor. As described above, the static, compressed tensor includes compressed storage elements (e.g., the storage elements do not include zeros). That is, the storage elements of the static, compressed tensor are located at predetermined memory locations. The example dynamic storage controlling circuitry 316 compresses the storage elements and stores the start locations of the storage elements in a pointer table. That is, the pointer table enables access to the storage locations. Because the storage elements are not stored at fixed locations in memory, the dynamic storage controlling circuitry 316 can store the storage elements closer together in memory and, thus, the memory footprint decreases with respect to the static, compressed tensor. In some examples, the dynamic storage controlling circuitry 316 stores the start addresses of the storage elements in the pointer table in ascending order of the storage element number. In some examples, the dynamic storage controlling circuitry 316 stores the dynamic, compressed tensor and/or the pointer table in the local memory 108. An example pointer table is described below in connection with FIG. 13. An example dynamic, compressed tensor is described below in connection with FIG. 14.

Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations, and one of ordinary skill in the art would have been motivated to do so because it would make use of the dynamic compression. A compressed representation, such as the compressed tensor, would require less storage by utilizing the number of axes for the information, para 58, 64.
Zhang, Delio, Feng, and Grymel do not specifically mention, but Rajkumar discloses, generating a plurality of embeddings of the plurality of sequences of strings, based on determining to statically compress the plurality of embeddings of the plurality of sequences of strings from the comparison, statically compressing the plurality of embeddings of the plurality of sequences of strings to generate the plurality of compressed representations; and based on determining to dynamically compress the plurality of embeddings of the plurality of sequences of strings from the comparison, dynamically compressing the plurality of embeddings of the plurality of sequences of strings to generate the plurality of compressed representations,
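To make the distinction drawn in Grymel paragraphs [0058] and [0064] concrete, the following Python sketch models the two storage layouts with plain lists. It is an illustration under assumptions, not Grymel's circuitry: the function names, the `slot_size` parameter, the use of `None` for unused memory, and the per-element bitmaps (added only so the sketch can restore the zeros) are hypothetical. Static compression removes zeros but leaves each storage element at a predetermined location; dynamic compression packs the elements together and records their start locations in a pointer table.

```python
def static_compress(elements, slot_size):
    """Drop zeros, but keep each storage element at a fixed slot."""
    memory = [None] * (slot_size * len(elements))
    bitmaps = []
    for i, elem in enumerate(elements):
        nonzero = [v for v in elem if v != 0]
        start = i * slot_size  # predetermined location of element i
        memory[start:start + len(nonzero)] = nonzero
        bitmaps.append([v != 0 for v in elem])  # assumption: remembers zero positions
    return memory, bitmaps


def dynamic_compress(static_memory, slot_size):
    """Pack the static layout contiguously and build a pointer table."""
    packed, pointers = [], []
    for start in range(0, len(static_memory), slot_size):
        pointers.append(len(packed))  # start location of this element
        slot = static_memory[start:start + slot_size]
        packed.extend(v for v in slot if v is not None)
    return packed, pointers


def read_element(packed, pointers, bitmaps, i):
    """Restore storage element i from the dynamic layout."""
    end = pointers[i + 1] if i + 1 < len(pointers) else len(packed)
    values = iter(packed[pointers[i]:end])
    return [next(values) if bit else 0 for bit in bitmaps[i]]
```

For three storage elements of four values each, the static layout always occupies twelve slots, while the dynamic layout occupies only as many slots as there are non-zero values plus the pointer table.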
[0009] In some implementations, the representation of information learned by a robot can be an embedding generated using a machine learning model. When a set of input information, such as sensor data describing an object, is provided to the machine learning model, the processing of the machine learning model may encode the information in a form that is used directly as an embedding, or is further processed to generate the embedding. As a result, the embedding may be a compressed representation of an object or other observation, where the specific values of the embedding may depend on the structure and training state of the machine learning model used to generate the embedding.
[0115] the classification labels have different string characters, the context of “a cup” and “a coffee mug” have similar characteristics. If a contextual match does occur, the server system 112 may transmit a notification to the device 108 to notify the user 106 that two embeddings correspond to different or similar objects. Alternatively, the server system 112 can store and distribute new embeddings each time a new embedding is received from a robot, regardless of whether the new embedding is different from the stored embeddings in the database 114.
[0209] In some implementations, the training module 902 updates the machine learning model 506 by providing the received feature data 914 of “glasses” and the classification label 918 denoting “glasses” as input to the machine learning model 506. In response, the machine learning model 506 produces an embedding 920 that corresponds to the classification label 918 denoting “glasses.” The server system 112 stores the feature data 914, the produced embedding 920, and the classification label 918 denoting “glasses” in the database 114. Then, the training module 902 provides the next subsequent dataset, dataset 406C, to the machine learning model 506 to evaluate its output at a particular layer.
[0210] In some implementations, the evaluation module 904 evaluates the output at a particular layer of a machine learning model 506. After the training module 902 trains the machine learning model 506, the evaluation module 904 evaluates the newly machine learning model 506 with data stored in the database 114. The data stored in the database 114 includes reference data that is trusted and verified by previous machine learning model versions. The reference data includes a classification label corresponding to an embedding and feature data. For example, reference data can include a classification label for a “cup” object, an embedding of the “cup” object, and feature data captured by a robot 104 describing the “cup” object. The evaluation model 904 is required to evaluate the machine learning model 506 because the machine learning model 506 must be backwards compatible. For instance, the machine learning model 506 can identify a “cup” object, a “plate” object, and a “fork” object. The training module 902 may then train the machine learning model 506 to identify a “camera” object. After the training module 902 trains the machine learning model 506 to recognize the “camera” object, the newly machine learning model 506 needs to be evaluated to ensure it can still recognize the “cup” object, the “plate” object, the “fork” object, and now the “camera” object.
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations, and one of ordinary skill in the art would have been motivated to do so because it would make use of an available compression method. The embedding would be a compressed representation of an object or other observation, where the specific values of the embedding may depend on the structure and training state of the machine learning model used to generate the embedding, para 9.
Zhang, Delio, Feng, Grymel, and Rajkumar do not specifically mention inputting, with a second plurality of strings, into a machine learning ensemble to obtain as output a verdict indicating whether the binary file is malicious or benign. However, these limitations are well-known and expected in the art. 
For example, DASGUPTA et al., WO 2022232470 A1 discloses it,
An artificial intelligence (AI) based advanced malware detection tool (AIMaD), which uses a combination of both static and dynamic malware analysis in a machine learning (ML) framework, abstract
using a hybrid approach involving both static analysis and dynamic analysis of the samples. This same framework may be used to analyze a particular piece of code whose status as benign or malware is unknown, 2nd para, page 3.

YOON et al., KR 20230111844 A
Disclosed is an AI deep learning-based malware detection system that uses both static and dynamic information of a target to perform machine learning on the association of features with strong malicious characteristics and their continuity, and to determine whether or not it is a malicious code, last para, page 2.

Pi et al., WO 2020013958 A1
[0028] The hybrid machine learning detection system or method 60, such as the one illustrated in Fig. 2 can be used to monitor the process variables in ICSs 10. The hybrid system 60 of Fig. 2 includes a combination of three machine learning models and monitors the process variables from three different aspects. A combination of models from different aspects of process variable monitoring reduces the high false alarm rate by validating the results from multiple outputs using the different models.

Kim et al., KR 20190064264 A
detecting malicious code based on machine learning through hybrid analysis, performing static analysis and dynamic analysis with respect to the plurality of objects to generate a dataset, abstract
determine whether or not the plurality of objects are malicious by using a hybrid method in which static analysis and dynamic analysis of a plurality of objects are mixed, 5th para, page 3
The data set generator 210 may generate a data set using a hybrid analysis method using static analysis information and dynamic analysis information, 3rd para, page 5.
The detection rate can be measured by applying parameters of 5, 10, 20, 30, 40, 50, and 100. The detection rate for static analysis is from 59% to 89% and the detection rate for dynamic analysis is at least 64% Up to 99%, and the detection rate for hybrid analysis can range from a minimum of 81% to a maximum of 99%, 3rd para, page 10
It can be seen that the detection rate of the parameter value may be different, but the malicious detection rate is higher when the hybrid analysis method is applied than when only the static analysis and the dynamic analysis are performed. 4th para, page 10

Powers et al., 11010472, 
(7) the computer may use machine learning models to detect malware based on data acquired through both static and dynamic analysis of executables, non-executable data, and network traffic, col., 4, lines 10-14
WISTUBA et al., 20220092464 
[0088] Thus, the learning curve ranker 540 may include the history of each previous learning curve stored in the historic learning curve database 550 and each corresponding machine pipeline configurations (e.g., a first input “input 1” or first machine learning pipeline configuration—that is, the “input” is the configuration or description of the machine learning pipelines). Also, the learning curve ranker 540 may include learning curves and configurations of machine learning pipelines generated from the automated machine learning system 520 (e.g., a second input “input 2” or second machine learning pipeline configuration/description). Additionally, the automated machine learning system 520 may include machine learning pipeline configurations and partial learning curves of pipelines under consideration (e.g., a third input “input 3” or third machine learning pipeline configuration/description), and configuration and learning curve of the best pipeline found so far (e.g., a fourth input “input 4” or fourth machine learning pipeline configuration/description).

KOCHMAN et al., 20240283820 
[0051] Then, at operation 510, the system configures an automated machine learning training module with a plurality of corresponding machine learning models implemented by the plurality of candidate machine learning pipelines to process the input dataset.
[0020] The system can subsequently utilize the input datasets to initialize a plurality of candidate machine learning pipelines which serve to implement and execute an associated featurization approach upon the input dataset. As such, an individual candidate machine learning pipeline can implement different engineered features and/or data transforms to process the input dataset. The candidate machine learning pipelines can utilize any suitable type of machine learning model and can be selected based on a task associated with the input dataset (e.g., regression, classification).

Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations, and one of ordinary skill in the art would have been motivated to do so because it would make use of the machine learning ensemble. The machine learning ensemble would jointly provide a result indicating whether the binary file is malicious or benign. This would allow appropriate action to be taken when the file is malicious and would maintain the overall security of the system.

Claim(s) 2, 9, 16, is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., CN 116149669 A in view of Delio, FENG, Grymel, Rajkumar, Official Notice and HUANG, CN 109886016 B.
Referring to claim(s) 2, 9, 16, Zhang, Delio, Rajkumar, FENG do not disclose, data of application programming interface calls by the binary file from the dynamic analysis, wherein hierarchical structure of the tree data structure corresponds to hierarchical structure of data for the application programming interface calls, which Huang discloses (
the available dynamic attribute capture interaction with the host operating system, disk/optical disk, and network resource. The interaction with the operating system may include: dynamic library introduction, mutex activity and the operation of other processes running on the system. In addition, it can capture the execution track of all Windows API calls accessed by binary file, comprising parameter of any system call, parameter value and return value. The summary of the disc/optical disc activity may include a file system and a registry operation, which captures any persistent effect of the binary file. In addition, it also can capture the full and/or partial path using file system operation during feature extraction, type and/or number of operation of the file system; A particular registry key is also accessed or modified using a binary file. Finally, the features can be extracted from the network activity of the binary file, including HTTP and DNS traffic and IP addresses that are accessed via TCP and UDP.) 1st para, page 9
in order to reduce the sensitivity to changes in the 3-grams, an equivalence relationship between character types is defined, and each character is replaced by its typical (canonical) representation. For example, in some embodiments, the string 3PUe5f is canonicalized as 0BAa0b, wherein uppercase and lowercase vowels are respectively mapped to "A" and "a", uppercase and lowercase consonants are respectively mapped to "B" and "b", and numerical characters are mapped to "0". Similarly, the string 7SEi2d is also canonicalized as 0BAa0b. Sometimes, we classify the characters of the 3-gram, so as to further control the deformation and better capture the shape of the string. 1st para, page 8
Sequential (Sequential): A value of some attributes is a sequence of tokens, wherein each token is represented by a finite range of values. These sequential attribute and non-format string attribute have strong relevance; however, the individual token is not limited to individual character. in the software field, can use sequential feature extraction to capture API call information, because there is limited API call set and call occurs according to a specific order. similar to this, in the user behaviour field, can use sequential feature extraction to capture such as user triggering the website action of the command sequence, it also has limited website action and occurs according to a specific order. Similar to the non-format string features, the n-gram scheme can be used, wherein the sequence of each n adjacent tokens corresponds to a single feature. 2nd para, page 8
Free-form string (non-format string): Many important attributes appear as unbounded strings, such as review fields in software signature verification, content of user postings, and so on. If these attributes are represented as a sub-type feature, then the attacker may be able to avoid detection in the following way: changing a single character in the attribute, so that the attribute is mapped to a different dimension. In order to increase the robustness, the 3-grams of these strings can be captured, wherein each sequence of three adjacent characters represents a different 3-gram and each 3-gram is a different dimension. Because the scheme is still sensitive to changes in the 3-grams, an additional string simplification is introduced. 3rd para, page 8

Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations and also one of ordinary skill in the art would have been motivated to do so because it could provide utilizing information associated with application programming interface calls. This would enable verifying that the application programming interface calls are not malicious and would not cause any harm to the system. Overall system would be made secure from attacks by an attacker, 3rd para, page 8. 

Claim(s) 3, 10, 17, is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., CN 116149669 A in view of Delio, FENG, Grymel, Rajkumar, Official Notice and Liu et al., CN 114416966.
Referring to claim(s) 3, 10, 17, The citations of claim 1, discloses at least one of statically and dynamically compressing to generate the plurality of compressed representations and embeddings comprising  at least a subset of the plurality of embeddings. Zhang, Delio, FENG, Grymel, Rajkumar, do not disclose, which Liu discloses fusing each of the plurality of sequences of strings to generate a plurality of fused strings; tokenizing each of the plurality of fused strings to generate a plurality of tokens; embedding each of the plurality of tokens to generate a plurality of embeddings, at least one of statically and dynamically compress a plurality of embeddings, wherein the plurality of embeddings comprises the one or more embeddings for each of the plurality of sequences of strings ( 
A determining a search character string and a word dictionary, the word dictionary is used for dividing each sentence in different documents into different character strings for further use; constructing a search network model BERT, then optimizing the BERT network formed by stacking a plurality of transformers, and using the token embedding, dividing the embedding and position embedding, so that the embedded layer of the BERT network realizes the transmission of the character; step S4: adding classification label token at each character string starting position, taking the transformer output of the BERT network as the fusion sequence of the classification process, using the learning position of BERT network to embed, keeping the length of the fusion sequence as 256 tokens; the sentence pair generated in the BERT network operation process is marked as sentence A and sentence B, each token of the sentence A is embedded with a learning sentence X1, each token of the sentence B is embedded with a learning sentence X2; step S5: performing depth bidirectional representation training to the BERT network, using random shielding input token, predicting the shielded token; random shielding sentence A or sentence B 10 % of the component, training for unsupervised learning; in order to keep the token distributed characteristic and increasing information of each sentence in the transformer, in the shielding part, wherein 90 % adopts fixed template to shield, 5 % uses random template to shield, the remaining 5 % remains unchanged; step S6, claim 1.


Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations and also one of ordinary skill in the art would have been motivated to do so because it could provide utilizing information associated with application programming interface calls. This would enable verifying that the application programming interface calls are not malicious and would not cause any harm to the system. Overall system would be made secure from attacks, 3rd para, page 8. 

Claim(s) 5, 12, 19, is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., CN 116149669 A in view of Delio, FENG, Grymel, Rajkumar, Official Notice and Yerra et al., 20230252152.
Referring to claim(s) 5, 12, 19, The citations of claim 1, discloses wherein the tree data structure comprises one or more tree data structures. Zhang, Delio, FENG, Grymel, Rajkumar, do not disclose, which Yerra discloses strings from dynamic analysis of the binary file in sandboxes of one or more operating systems, 
[0033]
[0066] Machine-readable memory 210 may be configured to store in machine-readable data structures: binary code, source code, version control data, application metadata, code changes, dynamic analysis reports, and any other suitable information or data structures (
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations and also one of ordinary skill in the art would have been motivated to do so because it could provide utilizing sandboxes. Dynamic analysis would include execution of the update in a contained environment (also known as a “sandbox”) for various operating systems. Dynamic analysis would capture behavioral changes of the updated version over the previous version. The system would enable capture processing changes, memory changes and disk changes between the installed and updated versions of the software, para 33. 

Claim(s) 6, 13, 20, is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., CN 116149669 A in view of Delio, FENG, Rajkumar, Grymel, Official Notice and WISTUBA et al., 20220092464.
Referring to claim(s) 6, 13, The citations of claim 1, discloses having the plurality of compressed representations as inputs and having the second plurality of strings as inputs. Zhang, Delio, Rajkumar, FENG, Grymel do not disclose, which WISTUBA discloses first pipeline and one or more pipelines
[0088] Thus, the learning curve ranker 540 may include the history of each previous learning curve stored in the historic learning curve database 550 and each corresponding machine pipeline configurations (e.g., a first input “input 1” or first machine learning pipeline configuration—that is, the “input” is the configuration or description of the machine learning pipelines). Also, the learning curve ranker 540 may include learning curves and configurations of machine learning pipelines generated from the automated machine learning system 520 (e.g., a second input “input 2” or second machine learning pipeline configuration/description). Additionally, the automated machine learning system 520 may include machine learning pipeline configurations and partial learning curves of pipelines under consideration (e.g., a third input “input 3” or third machine learning pipeline configuration/description), and configuration and learning curve of the best pipeline found so far (e.g., a fourth input “input 4” or fourth machine learning pipeline configuration/description)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations and also one of ordinary skill in the art would have been motivated to do so because it could provide utilizing multiple pipelines with the machine learning. The pipelines would enable different configurations to be applied to different inputs of the machine learning. Hence, the pipelines would make the processing faster for outputting a result from the machine learning, para 88. 

Referring to claim(s) 20, The citations of claim 15, discloses plurality of compressed representations as inputs and second data as inputs, wherein the second data at least comprise data from static analysis of the binary file. WISTUBA discloses first pipeline and one or more pipelines
[0088] Thus, the learning curve ranker 540 may include the history of each previous learning curve stored in the historic learning curve database 550 and each corresponding machine pipeline configurations (e.g., a first input “input 1” or first machine learning pipeline configuration—that is, the “input” is the configuration or description of the machine learning pipelines). Also, the learning curve ranker 540 may include learning curves and configurations of machine learning pipelines generated from the automated machine learning system 520 (e.g., a second input “input 2” or second machine learning pipeline configuration/description). Additionally, the automated machine learning system 520 may include machine learning pipeline configurations and partial learning curves of pipelines under consideration (e.g., a third input “input 3” or third machine learning pipeline configuration/description), and configuration and learning curve of the best pipeline found so far (e.g., a fourth input “input 4” or fourth machine learning pipeline configuration/description).

Claim(s) 6, 13, 20, is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., CN 116149669 A in view of Delio, FENG, Rajkumar, Grymel, Official Notice and KOCHMAN et al., 20240283820.
Referring to claim(s) 6, 13, The citations of claim 1, discloses having the plurality of compressed representations as inputs and having the second plurality of strings as inputs. Zhang, Delio, FENG, Rajkumar, Grymel do not disclose, which KOCHMAN discloses first pipeline and one or more pipelines
[0051] Then, at operation 510, the system configures an automated machine learning training module with a plurality of corresponding machine learning models implemented by the plurality of candidate machine learning pipelines to process the input dataset.
[0020] The system can subsequently utilize the input datasets to initialize a plurality of candidate machine learning pipelines which serve to implement and execute an associated featurization approach upon the input dataset. As such, an individual candidate machine learning pipeline can implement different engineered features and/or data transforms to process the input dataset. The candidate machine learning pipelines can utilize any suitable type of machine learning model and can be selected based on a task associated with the input dataset (e.g., regression, classification).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations and also one of ordinary skill in the art would have been motivated to do so because it could provide utilizing multiple pipelines with the machine learning. The pipelines would enable different configurations to be applied to different inputs of the machine learning. Hence, the pipelines would make the processing faster for outputting a result from the machine learning, para 88. 

Referring to claim(s) 20, The citations of claim 15, discloses plurality of compressed representations as inputs and second data as inputs, wherein the second data at least comprise data from static analysis of the binary file. KOCHMAN discloses first pipeline and one or more pipelines
[0051] Then, at operation 510, the system configures an automated machine learning training module with a plurality of corresponding machine learning models implemented by the plurality of candidate machine learning pipelines to process the input dataset.
[0020] The system can subsequently utilize the input datasets to initialize a plurality of candidate machine learning pipelines which serve to implement and execute an associated featurization approach upon the input dataset. As such, an individual candidate machine learning pipeline can implement different engineered features and/or data transforms to process the input dataset. The candidate machine learning pipelines can utilize any suitable type of machine learning model and can be selected based on a task associated with the input dataset (e.g., regression, classification).

Claim(s) 14, is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., CN 116149669 A in view of Delio, FENG, Rajkumar, Grymel, Official Notice and Niewiadomski 10949465.
Referring to claim(s) 14, Zhang, Delio, FENG, Rajkumar, Grymel do not disclose, which Niewiadomski discloses traverse the tree data structure with a tree search algorithm to obtain the plurality of sequences of strings
(63) The sub-process 600A from start to end represents an instance of an iterative depth-first search algorithm for computing alignment between input strings and the sequences encoded in the address graph. When executing the iterative-deepening depth-first search algorithm on the tree of the address graph, each executed instance of the depth-first search algorithm terminates early when the number of unique address ID values of the recorded vertices in the result set is greater-than or equal-to n, as opposed to terminating early when the number of recorded vertices in the result set reaches n. When an instance of the depth-first search algorithm terminates with a non-empty result set, the addresses corresponding to the address ID values of the recorded vertices are ordered based on their hierarchical and lexicographic rank. The generation of the longest-common prefix method result set is achieved through the execution of the depth-first search algorithm with a d.sub.max value of zero. During the search process 500, a single path of the address graph is explored that corresponds to the prefix of the input (e.g., the first characters or component). The last vertex on the path is the root of a sub-tree whose terminal vertices correspond to all longest-common prefix matches. A subsequent lexicographically ordered traversal of the sub-tree allows for the generation of the result set), col., 14, lines 10-30.
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations and also one of ordinary skill in the art would have been motivated to do so because it could provide utilizing multiple pipelines with the machine learning. The pipelines would enable different configurations to be applied to different inputs of the machine learning. Hence, the pipelines would make the processing faster for outputting a result from the machine learning, col., 14, lines 10-30. 

Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., CN 116149669 A in view of Delio, FENG, Grymel, Rajkumar, Official Notice and Xu et al., 11210391.
Referring to claim(s) 7, The citations of claim 1, discloses having the plurality of compressed representations as inputs and having the second plurality of strings as inputs. Zhang, Rajkumar, Delio, Grymel, FENG do not disclose, which Xu discloses depth-first search or breadth-first search ((112) traversing the tree (e.g., using a depth-first-search approach)), col., 20, line 36.
 
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations and also one of ordinary skill in the art would have been motivated to do so because it could provide utilizing depth-first search or breadth-first search. Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. It explores as far as possible along each branch before backtracking. Starting at a chosen root node, DFS systematically visits nodes, prioritizing depth over breadth, until it encounters a dead end or a previously visited node, at which point it backtracks to explore other branches. Hence, it would systematically visit the nodes of the tree for the search, col., 20, line 36.
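
For illustration only (not taken from Xu or from the application): a minimal Python sketch of an iterative depth-first traversal that collects one sequence of strings per root-to-leaf path of a tree whose nodes each hold a string. The `Node` class, the labels, and the example tree are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                   # the string stored at this node
    children: list = field(default_factory=list)

def root_to_leaf_paths(root: Node) -> list:
    """Depth-first traversal returning the label sequence of every root-to-leaf path."""
    paths = []
    stack = [(root, [root.label])]
    while stack:
        node, path = stack.pop()
        if not node.children:          # a leaf closes one complete path
            paths.append(path)
            continue
        # Push children in reverse so the leftmost child is explored first.
        for child in reversed(node.children):
            stack.append((child, path + [child.label]))
    return paths

tree = Node("root", [Node("a", [Node("c"), Node("d")]), Node("b")])
print(root_to_leaf_paths(tree))
# [['root', 'a', 'c'], ['root', 'a', 'd'], ['root', 'b']]
```

Replacing the stack with a queue (popping from the front) would give a breadth-first visiting order instead; the set of root-to-leaf paths would be the same, only their order would differ.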

Claim(s) 21-23, is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., CN 116149669 A in view of Delio, FENG, Grymel, Rajkumar, Official Notice, Liu and Wu 20230135659.
Referring to claim(s) 21-23, The citations of claim 1, discloses at least one of statically and dynamically compressing to generate the plurality of compressed representations. Zhang, Delio, FENG, Grymel, Liu, Rajkumar, do not disclose, which Wu discloses tokenize each of the one or more fused strings with at least one of byte pair encoding and dictionary mapping (

[Five embedded greyscale images, media_image1.png through media_image5.png, omitted]

[0077] In at least one embodiment, tokens (e.g., token embeddings) are numerical representations of words or word pieces (e.g., subwords), which may be tokenized by a tokenization technique (e.g., splitting text into smaller units and representing said smaller units, word-based tokenization, character-based tokenization, subword-based tokenization, byte-pair encoding), which may comprise specialized tokens including classification tokens, separation tokens, end of sentence tokens, padding tokens, masking tokens, or some combination thereof. In at least one embodiment, left-side segment 202 is separated from right-side segment 204 by placement of a separation token (e.g., a token labelled as [SEP]) between a sequence of left-side tokens 206 and a sequence of right-side tokens 208. In at least one embodiment, a classification token (e.g., a token labelled as [CLS]) classifies all segments 202, 204. In at least one embodiment of training architecture 200, a token embedding layer comprises a sequence of tokens 202, 204.
[0510] In at least one embodiment, one or more execution units can be combined into a fused execution unit 3509A-3509N having thread control logic (3511A-3511N) that is common to fused EUs such as execution unit 3507A fused with execution unit 3508A into fused execution unit 3509A.
[0137] one or more neural networks perform a task of classification 804 of sentences (e.g., identifying sentences), topics and dependency structures (e.g., determining a relation between a word and its dependents within text). The classification 804 is performed by entity-pronoun mapping module 306, event-pronoun mapping module 310, or some combination thereof.
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations and also one of ordinary skill in the art would have been motivated to do so because it could provide utilizing byte pair encoding. One of ordinary skill in the art would readily know that Byte Pair Encoding (BPE) is a subword tokenization algorithm used in natural language processing, particularly in large language models. It addresses challenges of both word-based and character-based tokenization by creating a vocabulary of frequently occurring subword units. Hence, the byte pair encoding would enable generating a vocabulary of frequently occurring subword units and reduce the amount of space needed for storage, para 510, 77.
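
For illustration only (not taken from Wu or from the application): a minimal Python sketch of byte pair encoding as commonly described, which repeatedly merges the most frequent adjacent symbol pair into a new subword unit. The function names, the tie-breaking rule, and the example corpus of API-like names are assumptions made for this sketch.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    vocab = Counter(tuple(w) for w in words)   # each word starts as single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties broken lexicographically (an assumption).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged = Counter()
        for symbols, freq in vocab.items():
            merged[merge_pair(symbols, best)] += freq
        vocab = merged
    return merges

def encode(word, merges):
    """Tokenize a word by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

corpus = ["CreateFileW", "CreateFileA", "ReadFile", "WriteFile"]  # hypothetical inputs
merges = learn_bpe(corpus, num_merges=10)
print(encode("CreateFile", merges))
```

Concatenating the tokens returned by `encode` always reproduces the input word, so the tokenization itself loses no information; any space saving comes from representing frequent substrings by a single vocabulary entry.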


Response to Arguments
Remarks/Arguments filed 9/25/25, have been fully considered but they are not persuasive.  Therefore, rejection of claims 1-3, 5-10, 12-17, 19, 20-23 is maintained. 
Regarding the remarks for the amended claims, the rejections are updated accordingly. Please refer to the updated rejections for the amended limitations.
Zhang substantially discloses a method comprising: generating a tree data structure comprising, at each node of the tree data structure, a corresponding one of a first plurality of strings, wherein the first plurality of strings comprises strings from analysis of a binary file, generate a plurality of sequences of strings from the first plurality of strings, wherein each of the plurality of sequences of strings corresponds to one or more paths of the tree data structure, wherein each of the one or more paths comprises a path from a root node to a leaf node of the tree data structure, traversing the tree data structure with (abstract, 
the dictionary tree (Trie Tree), also called a word search tree or key tree, is a tree-shaped data structure and a variant of the hash tree. It divides a character string into several minimum units (such as single characters) and forms a tree by sharing common prefixes (the path from the root node to a leaf node forms the original character string). Such a data structure can minimize meaningless character string comparisons so as to reduce query time overhead, and is typically applied to the storage, counting and ordering of a large number of character strings. The above embodiment does not limit how the analysis result is obtained; this embodiment of the application provides a specific scheme. Matching the extracted features of the binary file to be detected by using the pre-established characteristic database to obtain the matching result comprises: matching the characteristics of the extracted binary file to be detected by using the pre-established characteristic database; if a characteristic of the extracted binary file to be detected is successfully matched with a characteristic characterized by a path in the dictionary tree, incrementing the count of the corresponding characteristic packet. Correspondingly, analyzing the binary file to be detected according to the matching result to obtain the analysis result last para, page 10,
extracting string: based on the existing binary analysis tool, by analyzing the input binary file format and structure (common format such as PE, ELF, Mach-O and so on, static library and so on binary file should be split independent analysis), extracting comprises character string information of the data segment and the input and derived symbol table containing the symbol information. searching all the string only composed of ASCII readable character in the binary file data by linear search mode, as the character string set to be screened extracted from the binary file. At the same time, if the input is a binary sample library, the step can according to the binary file folder or software package to generate initial characteristic packet information (see text characteristic packet module). 3rd last para, page 6
segmentation: The character string is sequentially divided into several words according to the following heuristic rules: 1. a sub-string composed of several capital letters and a plurality of small letters or numbers is taken as a word; 2. a sub-string composed of several small letters, numbers or English periods (".") is taken as a word; 3. a sub-string composed of a plurality of special characters with a length of not less than 2 is taken as a word; 4. all remaining characters are eliminated. The extracted words are then sorted into a word sequence according to the order in which they appear in the original character string.
sieve: screening each character string by using the word as the unit, eliminating or replacing words that are meaningless or unstable. The heuristic rules of the actual application are as follows: 1. a too-short word (such as less than 3 characters) formed of letters, numbers or English periods should be removed; 2. a too-long word (such as more than 24 characters) should be removed; 3. if the word consists only of a decimal or hexadecimal number (digits or the letters "A" to "F", possibly beginning with "0x", not distinguishing upper and lower case), the word is replaced by a fixed mark representing a number; 4. if the word satisfies the version number format (such as X.X.X.X.X.X.X.X.X.X.X.X.X.X, etc., wherein "X" is at least one number), the word is replaced by a fixed mark representing a version number. After that, if the screened character string duplicates another character string, comprises too few words (such as less than 3 words) or only comprises words composed of special characters, the character string is considered to carry too little effective information and is removed. All the obtained word sequences together form a feature set, 3rd para, page 7
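
For illustration only (not Zhang's implementation): the segmentation and screening heuristics quoted above can be approximated with regular expressions. The regex patterns, the `<NUM>` and `<VER>` marks, the rule order, and the function names are assumptions made for this sketch; the translated passage leaves several details open (for example, whether an all-capitals run counts as a word, which this sketch assumes it does). Removal of duplicate strings across a whole corpus is not modeled.

```python
import re

# Rules 1-3 of the segmentation step; unmatched characters are dropped (rule 4).
WORD_RE = re.compile(r"[A-Z]+[a-z0-9]*|[a-z0-9.]+|[^A-Za-z0-9.\s]{2,}")
ALNUM_RE = re.compile(r"[A-Za-z0-9.]+")
NUMBER_RE = re.compile(r"(0x)?[0-9a-f]+", re.IGNORECASE)   # decimal or hexadecimal
VERSION_RE = re.compile(r"\d+(\.\d+)+")                    # X.X, X.X.X, ...

def segment(s: str) -> list:
    """Split a string into words, in order of appearance."""
    return WORD_RE.findall(s)

def screen(words, min_len=3, max_len=24, min_words=3):
    """Apply the screening rules; return None when too little information remains."""
    kept, has_text = [], False
    for w in words:
        if ALNUM_RE.fullmatch(w) and len(w) < min_len:
            continue                      # rule 1: too short
        if len(w) > max_len:
            continue                      # rule 2: too long
        if VERSION_RE.fullmatch(w):
            w = "<VER>"                   # rule 4: version number mark (hypothetical)
            has_text = True
        elif NUMBER_RE.fullmatch(w):
            w = "<NUM>"                   # rule 3: number mark (hypothetical)
            has_text = True
        elif ALNUM_RE.fullmatch(w):
            has_text = True
        kept.append(w)
    if len(kept) < min_words or not has_text:
        return None
    return kept

words = segment("libcurl 7.88.1 OpenSSL error: 0 bytes")
print(words)          # ['libcurl', '7.88.1', 'Open', 'SSL', 'error', '0', 'bytes']
print(screen(words))  # ['libcurl', '<VER>', 'Open', 'SSL', 'error', 'bytes']
```

Note that rule 3 as quoted treats any word made only of the letters A to F as a number, so a word such as "face" would be replaced by the number mark under this reading.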

Inputting a second plurality of strings wherein the second plurality of strings at least comprises strings resulting from static analysis of the binary file (
analyzing for the binary file, the current scheme is the static analysis of the program code in the binary file. At present, one solution of binary file feature extraction and evaluation is for non-program code part in binary file, namely program symbol and character string and so on data for extracting and analyzing. The static analysis refers to program static analysis, by lexical analysis, grammar analysis, control flow and data stream analysis technology to scan the program code, verifying whether the code satisfy standard, security, reliability and maintainability index of a code analysis technology. the static analysis of the program can help software development staff and the quality assurance personnel to search the code for security vulnerability and so on, so as to ensure the whole quality of the software, and also can be used for helping software development the large-scale software system and system service logic extraction and other fields such as system service logic extraction and so on, 2nd para, page 11.

Zhang does not specifically mention the following, which is well known in the art and which Delio discloses: compressing the plurality of sequences of strings to generate a plurality of compressed representations of the tree data structure, determine to compress a plurality of embeddings of the plurality of sequences of strings based on a comparison between a number of paths in the tree data structure corresponding to the plurality of sequences of strings compressing the plurality of embeddings of the plurality of sequences of strings, inputting the plurality of compressed representations (abstract
(54) FIGS. 7A-7D illustrate how compression and decompression can be performed in accordance with some embodiments described herein. First, producer 602 passes the string “Hello” to string compressor/decompressor 604. The string compressor/decompressor will then take the string “Hello” and add it to the internal string tree using the nodes already presented earlier in this disclosure. When the string compressor/decompressor is first given the string “Hello”, it creates the tree nodes shown earlier in this disclosure and returns the node identifier (offset) of a node that served as the base node (the node that contained as much of the string as possible that already exists within the tree), the offset within the string of that base node where the new string diverges, and the number of bytes that was added to the base node at the base offset, and lastly the identifier or key for the newly added string. col., 12, lines 32-45 
(32) One non-obvious feature of some embodiments described herein is that the common prefix node “Hel” is numbered with a value greater than the ending of the “Hello” string, “lo”. The reason for doing this is because the node or prefix string numbering must remain consistent throughout the use of the tree and compression. Recall that the identifiers of the strings are implicit. In the case of “Hello”, the implicit value was one. Notice that by keeping the ending portion of “Hello”, which is the “lo” segment, as one, it is still possible to refer to an identifier of one and walk up the tree and form the string “Hello”. Sure, it will be reversed but if a buffer was filled in reverse then the result would be “Hello”, col., 7, lines 28-38.
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations and also one of ordinary skill in the art would have been motivated to do so because it could provide utilizing compression of the strings. Regardless of the tree or not the strings would be compressed. This would reduce amount of space needed for storage of the strings, which implement efficient use of the storage resources, col., 12, lines 32-45. 
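
For illustration only (a simplification, not Delio's data layout): a minimal Python sketch of storing strings in a shared-prefix tree, where each stored string is identified by a node number and is recovered by walking up to the root and reversing the collected characters. Delio's nodes hold multi-character segments; this sketch uses one character per node, and the class and method names are hypothetical.

```python
class StringTree:
    """Prefix-sharing string store; node 0 is an empty root."""

    def __init__(self):
        self.parent = [-1]
        self.char = [""]
        self.children = [{}]

    def add(self, s: str) -> int:
        """Insert `s`, reusing existing prefix nodes, and return its node identifier."""
        node = 0
        for ch in s:
            nxt = self.children[node].get(ch)
            if nxt is None:
                nxt = len(self.parent)
                self.parent.append(node)
                self.char.append(ch)
                self.children.append({})
                self.children[node][ch] = nxt
            node = nxt
        return node

    def get(self, node_id: int) -> str:
        """Walk up from the node to the root, then reverse to rebuild the string."""
        chars = []
        while node_id > 0:
            chars.append(self.char[node_id])
            node_id = self.parent[node_id]
        return "".join(reversed(chars))

tree = StringTree()
hello = tree.add("Hello")
help_ = tree.add("Help")       # shares the prefix "Hel" with "Hello"
print(tree.get(hello), tree.get(help_))   # Hello Help
print(len(tree.parent) - 1)               # 6 character nodes for 9 input characters
```

The saving comes from the shared prefix: "Hel" is stored once, so the two strings occupy six character nodes rather than nine.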
Zhang and Delio do not specifically mention the following, which is well known in the art and which Feng discloses: a dynamic analysis of a binary file, a tree search algorithm
detecting engine comparison means also from the traditional static analysis based on feature development to dynamic analysis based on behavior, comprehensive utilization behavior monitoring, threat intelligence, machine learning and other techniques for malicious code detection. The malicious code analysis detection technology mainly comprises two kinds of static analysis and dynamic analysis. static analysis is mainly by analyzing the binary file of malicious code, extracting feature code to form malicious code feature library, detecting engine by comparing the characteristic of the sample file with the feature library to judge whether it is a malicious code, the advantage is that it can check all execution path of malicious code, the obtained characteristic code detection accuracy is high, but the characteristic code is heavy workload the analysis period is long, especially for polymorphic, modification, malicious code of the shell, difficult to extract effective characteristic code. dynamic analysis is to execute the sample file in the protected virtual environment, and monitoring the dynamic behavior of the sample file in the execution process through various monitoring points of the kernel state and the user state, such as file system, process, registry and network access, the advantages are not under multi-state, deformation and shell influence, 2nd para, page 2,
sample analysis capability and detection analysis algorithm 3rd para, page 2.
realizing the malicious code detection of multi-engine fusion, to improve the malicious code comprehensive detection level, 3rd para, page 2

Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations and also one of ordinary skill in the art would have been motivated to do so because it could provide utilizing the dynamic analysis. Dynamic analysis of a binary file is a method of analyzing a software program or system while it is running. This technique is also known as "dynamic code analysis" and "dynamic program analysis". In contrast to static analysis, which examines the binary without running it, dynamic analysis focuses on and implements observing and understanding the program's behavior in real-time execution, 2nd para, page 2.
Zhang, Delio and Feng do not specifically mention the following, which is well known in the art and which Grymel discloses: a threshold parameter that determines dimensionality of the plurality of compressed representations; and determining whether to statically compress or dynamically compress, statically compressing or dynamically
[0058] The example static storage controlling circuitry 314 compresses the uncompressed tensor. The example static storage controlling circuitry 314 divides the uncompressed tensor into storage elements. As used herein, a storage element is a tensor with relatively smaller dimensions. As an example base case for ZXY data, the data along the Z axis for a particular XY coordinate pair is a single storage element. However, in some examples, the static storage controlling circuitry 314 determines a number of storage elements to divide a tensor into along an axis. For example, the static storage controlling circuitry 314 can determine to divide the tensor into two storage elements along the Z axis. However, the static storage controlling circuitry 314 can additionally or alternatively determine to divide the tensor into a greater or fewer number of storage elements along the X, Y, and/or Z axis (e.g., three storage elements along the Z axis, etc.).
[0064] The example dynamic storage controlling circuitry 316 compresses the static, compressed tensor to generate a dynamic, compressed tensor. As described above, the static, compressed tensor includes compressed storage elements (e.g., the storage elements do not include zeros). That is, the storage elements of the static, compressed tensor are located at predetermined memory locations. The example dynamic storage controlling circuitry 316 compresses the storage elements and stores the start locations of the storage elements in a pointer table. That is, the pointer table enables access to the storage locations. Because the storage elements are not stored at fixed locations in memory, the dynamic storage controlling circuitry 316 can store the storage elements closer together in memory and, thus, the memory footprint decreases with respect to the static, compressed tensor. In some examples, the dynamic storage controlling circuitry 316 stores the start addresses of the storage elements in the pointer table in ascending order of the storage element number. In some examples, the dynamic storage controlling circuitry 316 stores the dynamic, compressed tensor and/or the pointer table in the local memory 108. An example pointer table is described below in connection with FIG. 13. An example dynamic, compressed tensor is described below in connection with FIG. 14.
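The contrast that paragraphs [0058] and [0064] draw between static and dynamic storage can be illustrated with a minimal Python sketch. This is not Grymel's circuitry: the function names, the flat-list memory model, and the use of None to mark unused slot space are assumptions made for illustration. The sketch also does not record where the zeros were, so it cannot restore the original tensor.

```python
from typing import List, Optional, Tuple


def split_into_storage_elements(flat: List[int], element_len: int) -> List[List[int]]:
    """Divide a flat tensor into storage elements of element_len values each."""
    return [flat[i:i + element_len] for i in range(0, len(flat), element_len)]


def static_compress(elements: List[List[int]], slot_len: int) -> List[Optional[int]]:
    """Drop zeros from each element but keep every element at a fixed slot offset."""
    memory: List[Optional[int]] = []
    for elem in elements:
        nonzero = [v for v in elem if v != 0]
        # None marks slot space that stays reserved but unused (illustrative only).
        memory.extend(nonzero + [None] * (slot_len - len(nonzero)))
    return memory


def dynamic_compress(elements: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Pack nonzero values back to back and record each element's start address."""
    memory: List[int] = []
    pointer_table: List[int] = []
    for elem in elements:
        pointer_table.append(len(memory))  # ascending order of element number
        memory.extend(v for v in elem if v != 0)
    return memory, pointer_table


def read_element(memory: List[int], pointer_table: List[int], index: int) -> List[int]:
    """Fetch one packed storage element through the pointer table."""
    start = pointer_table[index]
    end = pointer_table[index + 1] if index + 1 < len(pointer_table) else len(memory)
    return memory[start:end]


flat_tensor = [1, 0, 0, 2, 0, 0, 0, 0, 3, 4, 0, 5]
elements = split_into_storage_elements(flat_tensor, 4)
static_memory = static_compress(elements, 4)
dynamic_memory, pointers = dynamic_compress(elements)
```

In this toy case the static layout still occupies 12 slots because each element keeps its predetermined location, while the dynamic layout occupies 5 values plus a 3-entry pointer table, which mirrors the smaller memory footprint described in [0064].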

Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations, and one of ordinary skill in the art would have been motivated to do so because the modification would enable the use of dynamic compression. A compressed representation such as the compressed tensor would require less storage, utilizing the number of axes for the information (paras. 58, 64).
Zhang, Delio, Feng, and Grymel do not specifically mention the following limitations, which Rajkumar discloses: generating a plurality of embeddings of the plurality of sequences of strings; based on determining to statically compress the plurality of embeddings of the plurality of sequences of strings from the comparison, statically compressing the plurality of embeddings of the plurality of sequences of strings to generate the plurality of compressed representations; and based on determining to dynamically compress the plurality of embeddings of the plurality of sequences of strings from the comparison, dynamically compressing the plurality of embeddings of the plurality of sequences of strings to generate the plurality of compressed representations.
[0009] In some implementations, the representation of information learned by a robot can be an embedding generated using a machine learning model. When a set of input information, such as sensor data describing an object, is provided to the machine learning model, the processing of the machine learning model may encode the information in a form that is used directly as an embedding, or is further processed to generate the embedding. As a result, the embedding may be a compressed representation of an object or other observation, where the specific values of the embedding may depend on the structure and training state of the machine learning model used to generate the embedding.
[0115] the classification labels have different string characters, the context of “a cup” and “a coffee mug” have similar characteristics. If a contextual match does occur, the server system 112 may transmit a notification to the device 108 to notify the user 106 that two embeddings correspond to different or similar objects. Alternatively, the server system 112 can store and distribute new embeddings each time a new embedding is received from a robot, regardless of whether the new embedding is different from the stored embeddings in the database 114.
[0209] In some implementations, the training module 902 updates the machine learning model 506 by providing the received feature data 914 of “glasses” and the classification label 918 denoting “glasses” as input to the machine learning model 506. In response, the machine learning model 506 produces an embedding 920 that corresponds to the classification label 918 denoting “glasses.” The server system 112 stores the feature data 914, the produced embedding 920, and the classification label 918 denoting “glasses” in the database 114. Then, the training module 902 provides the next subsequent dataset, dataset 406C, to the machine learning model 506 to evaluate its output at a particular layer.
[0210] In some implementations, the evaluation module 904 evaluates the output at a particular layer of a machine learning model 506. After the training module 902 trains the machine learning model 506, the evaluation module 904 evaluates the newly machine learning model 506 with data stored in the database 114. The data stored in the database 114 includes reference data that is trusted and verified by previous machine learning model versions. The reference data includes a classification label corresponding to an embedding and feature data. For example, reference data can include a classification label for a “cup” object, an embedding of the “cup” object, and feature data captured by a robot 104 describing the “cup” object. The evaluation model 904 is required to evaluate the machine learning model 506 because the machine learning model 506 must be backwards compatible. For instance, the machine learning model 506 can identify a “cup” object, a “plate” object, and a “fork” object. The training module 902 may then train the machine learning model 506 to identify a “camera” object. After the training module 902 trains the machine learning model 506 to recognize the “camera” object, the newly machine learning model 506 needs to be evaluated to ensure it can still recognize the “cup” object, the “plate” object, the “fork” object, and now the “camera” object.
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations, and one of ordinary skill in the art would have been motivated to do so because the modification would enable the use of an available compression method. The embedding would be a compressed representation of an object or other observation, where the specific values of the embedding may depend on the structure and training state of the machine learning model used to generate the embedding (para. 9).
Zhang, Delio, Feng, Grymel, and Rajkumar do not specifically mention inputting with a second plurality of strings into a machine learning ensemble to obtain as output a verdict indicating whether the binary file is malicious or benign. However, these limitations are well known and expected in the art.
For example, DASGUPTA et al., WO 2022232470 A1, discloses them:
An artificial intelligence (AI) based advanced malware detection tool (AIMaD), which uses a combination of both static and dynamic malware analysis in a machine learning (ML) framework, abstract
using a hybrid approach involving both static analysis and dynamic analysis of the samples. This same framework may be used to analyze a particular piece of code whose status as benign or malware is unknown, 2nd para, page 3.

YOON et al., KR 20230111844 A
Disclosed is an AI deep learning-based malware detection system that uses both static and dynamic information of a target to perform machine learning on the association of features with strong malicious characteristics and their continuity, and to determine whether or not it is a malicious code, last para, page 2.

Pi et al., WO 2020013958 A1
[0028] The hybrid machine learning detection system or method 60, such as the one illustrated in Fig. 2 can be used to monitor the process variables in ICSs 10. The hybrid system 60 of Fig. 2 includes a combination of three machine learning models and monitors the process variables from three different aspects. A combination of models from different aspects of process variable monitoring reduces the high false alarm rate by validating the results from multiple outputs using the different models.

Kim et al., KR 20190064264 A
detecting malicious code based on machine learning through hybrid analysis, performing static analysis and dynamic analysis with respect to the plurality of objects to generate a dataset, abstract
determine whether or not the plurality of objects are malicious by using a hybrid method in which static analysis and dynamic analysis of a plurality of objects are mixed, 5th para, page 3
The data set generator 210 may generate a data set using a hybrid analysis method using static analysis information and dynamic analysis information, 3rd para, page 5.
The detection rate can be measured by applying parameters of 5, 10, 20, 30, 40, 50, and 100. The detection rate for static analysis ranges from 59% to 89%, the detection rate for dynamic analysis ranges from at least 64% up to 99%, and the detection rate for hybrid analysis can range from a minimum of 81% to a maximum of 99%, 3rd para, page 10
It can be seen that the detection rate of the parameter value may be different, but the malicious detection rate is higher when the hybrid analysis method is applied than when only the static analysis and the dynamic analysis are performed. 4th para, page 10

Powers et al., 11010472
(7) the computer may use machine learning models to detect malware based on data acquired through both static and dynamic analysis of executables, non-executable data, and network traffic, col. 4, lines 10-14

WISTUBA et al., 20220092464
[0088] Thus, the learning curve ranker 540 may include the history of each previous learning curve stored in the historic learning curve database 550 and each corresponding machine pipeline configurations (e.g., a first input “input 1” or first machine learning pipeline configuration—that is, the “input” is the configuration or description of the machine learning pipelines). Also, the learning curve ranker 540 may include learning curves and configurations of machine learning pipelines generated from the automated machine learning system 520 (e.g., a second input “input 2” or second machine learning pipeline configuration/description). Additionally, the automated machine learning system 520 may include machine learning pipeline configurations and partial learning curves of pipelines under consideration (e.g., a third input “input 3” or third machine learning pipeline configuration/description), and configuration and learning curve of the best pipeline found so far (e.g., a fourth input “input 4” or fourth machine learning pipeline configuration/description).

KOCHMAN et al., 20240283820 
[0051] Then, at operation 510, the system configures an automated machine learning training module with a plurality of corresponding machine learning models implemented by the plurality of candidate machine learning pipelines to process the input dataset.
[0020] The system can subsequently utilize the input datasets to initialize a plurality of candidate machine learning pipelines which serve to implement and execute an associated featurization approach upon the input dataset. As such, an individual candidate machine learning pipeline can implement different engineered features and/or data transforms to process the input dataset. The candidate machine learning pipelines can utilize any suitable type of machine learning model and can be selected based on a task associated with the input dataset (e.g., regression, classification).

Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to modify the invention disclosed by Zhang to implement these limitations, and one of ordinary skill in the art would have been motivated to do so because the modification would enable the use of a machine learning ensemble. The models of the machine learning ensemble would jointly provide a result indicating whether the binary file is malicious or benign. This would allow appropriate action to be taken when the file is malicious and would maintain the overall security of the system.
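One way to picture a machine learning ensemble that outputs a malicious-or-benign verdict is soft voting over the scores of several models. The sketch below is an illustration under stated assumptions, not the claimed method or the implementation of any cited reference: the `Model` type, the `ensemble_verdict` function, the 0.5 threshold, and the three stand-in members are all hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical model type: maps a feature vector to a maliciousness score in [0, 1].
Model = Callable[[Sequence[float]], float]


def ensemble_verdict(models: List[Model], features: Sequence[float],
                     threshold: float = 0.5) -> str:
    """Average the member scores (soft voting) and compare against a threshold."""
    if not models:
        raise ValueError("ensemble needs at least one model")
    scores = [model(features) for model in models]
    mean_score = sum(scores) / len(scores)
    return "malicious" if mean_score >= threshold else "benign"


# Stand-in members; real ones would be trained models (for example, one fed
# static-analysis features and one fed dynamic-analysis features).
def static_model(features: Sequence[float]) -> float:
    return 0.9 if features[0] > 0.5 else 0.1


def dynamic_model(features: Sequence[float]) -> float:
    return 0.8 if features[1] > 0.5 else 0.2


def string_model(features: Sequence[float]) -> float:
    return 0.3  # constant score, purely for illustration


members = [static_model, dynamic_model, string_model]
verdict = ensemble_verdict(members, [0.7, 0.9])
```

Averaging is only one possible combination rule. Majority voting or a trained meta-classifier would fit the same interface, and the office action does not say which rule, if any, the cited references use.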




Conclusion
Pertinent reference:
KOCHMAN et al., 20240283820 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HARESH PATEL whose telephone number is (571)272-3973. The examiner can normally be reached on M-F 9-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jorge L. Ortiz-Criado, can be reached at (571) 272-7624. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HARESH N PATEL/Primary Examiner, Art Unit 2496
Prosecution Timeline

Sep 09, 2025: Examiner Interview Summary
Sep 09, 2025: Applicant Interview (Telephonic)
Sep 25, 2025: Response Filed
Oct 10, 2025: Final Rejection mailed (§103)
Dec 30, 2025: Request for Continued Examination
Jan 15, 2026: Response after Non-Final Action
Mar 31, 2026: Non-Final Rejection mailed (§103)
May 26, 2026: Interview Requested
Precedent Cases

Applications granted by this same examiner with similar technology

17/016,676 (Patent 12640928): DEVICE-INDEPENDENT AUTHENTICATION BASED ON A PASSPHRASE AND A POLICY. 5y 8m to grant; granted May 26, 2026.
18/440,862 (Patent 12626010): SYSTEM AND METHOD FOR ELECTRONICALLY COMMUNICATING PROTECTED ACCESSIBLE USER DATA TO AN AUTHORIZED THIRD PARTY. 2y 2m to grant; granted May 12, 2026.
18/191,834 (Patent 12619735): PERFORMING ACTION BASED ON MAPPING OF RUNTIME RESOURCE TO HIERARCHY OF ASSETS UTILIZED DURING DEVELOPMENT OF CODE. 3y 1m to grant; granted May 05, 2026.
18/842,708 (Patent 12598058): MUTABLE DIGITAL ASSET STORAGE UNITS FOR VERIFYING OTHER STORAGE UNITS IN A DECENTRALISED PEER-TO-PEER STORAGE NETWORK. 1y 7m to grant; granted Apr 07, 2026.
17/583,313 (Patent 12568384): BOOTSTRAPPING AND TROUBLESHOOTING OF REMOTE DEVICES. 4y 1m to grant; granted Mar 03, 2026.
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 78%
With Interview (+22.0%): 99%
Median Time to Grant: 3y 0m (~4m remaining)
PTA Risk: High
Based on 824 resolved cases by this examiner. Grant probability derived from career allowance rate.