DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-21 and 23-26 have been examined.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/12/2025 has been entered.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 01/20/2026 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Response to Amendment
Claims 1, 15, and 23-25 have been amended.
Applicant’s arguments with respect to claims 1, 15, and 23-25 regarding the new limitations: “plurality of subspaces organized in a hierarchical structure”, “wherein said identifying is based on one or more subspace security parameters that vary based on a hierarchical positioning of the first subspace within the hierarchical structure”, “wherein the identifying code corresponds to a depth within the hierarchical structure” have been considered but are moot in view of the new ground of rejection presented in the current office action.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1, 13, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over prior art of record US 11151250 to Chang et al (hereinafter Chang) and US 20240232355 to Briliauskas (hereinafter Briliauskas).
As per claim 1, Chang teaches:
A computer-implemented method, comprising:
for a plurality of iterations, obtaining, by a computing system comprising one or more processor devices, entity detection information from one or more client computing devices, wherein the entity detection information comprises: (a) information that indicates whether an entity detected at the client computing device is malicious; and (b) information that associates the entity with a particular subspace of a plurality of subspaces of an embedding space (Chang: column 2, lines 56-67: File information stored in the local LSH database 171 may be referenced using the locality sensitive hash of the file. In one embodiment, local file information stored in the local LSH database 171 includes the locality sensitive hash of the file, exact cryptographic hash of the file (e.g., SHA-1), the label of the file (i.e., whether the file is known bad, known good, or unknown) (the label indicates whether the file is malicious, and the locality sensitive hash associates the file with a subspace), etc. Column 3, lines 20-27 and 42-51: In the example of FIG. 1, the infrastructure of the MDR service 162 includes the SOC server 163. The SOC server 163 may comprise a computer system with associated software for receiving file information from subscribed computer networks and other sources, and for storing the file information in a global locality sensitive hash (LSH) database 164. The global LSH database 164 may include information of files received from a plurality of different private computer networks. For example, the SOC server 163 may receive, from the network security device 170, an LSH 191 and other information of a file of the private computer network 160 (see arrow 151). Similarly, the SOC server 163 may receive, from the network security device 180, an LSH 192 and other information of a file of the private computer network 161 (see arrow 152). Column 8, lines 24-30: The global LSH database may be updated when the target file is detected to pose a cybersecurity threat (FIG. 5, step 313 to FIG. 6, step 314). For example, the network security device may so inform the SOC server of the MDR service);
aggregating, by the computing system, the entity detection information received over the plurality of iterations to obtain aggregated threat information (Chang: Column 3, lines 20-27 and 42-51: The SOC server 163 may comprise a computer system with associated software for receiving file information from subscribed computer networks and other sources, and for storing the file information in a global locality sensitive hash (LSH) database 164), wherein the aggregated threat information is descriptive of a number of malicious entities and a total number of entities detected for each subspace of the plurality of subspaces (Chang: column 5, lines 5-7 and 33-45: The clusters of locality sensitive hashes may serve as entries of a global LSH database. FIG. 4 shows a data structure 270 of an example cluster (subspace) in accordance with an embodiment of the present invention. In one embodiment, a cluster includes a header 271 that includes metadata and other information about the cluster. In the example of FIG. 4, the header 271 indicates an identifier of the cluster (“GROUP”), the number of locality sensitive hashes that are members of the cluster (“N ITEMS”), etc. The header 271 may include other information, such as the name of the malware family to which the members of the cluster belong (“MW_NAME”), if applicable; confidence level of the label assigned to the cluster (e.g., 90% probability of malware or goodware), and other information. That is, “N ITEMS” describes both the number of malicious entities in each malware family cluster and the number of entities in each cluster (including malware and goodware));
based on the entity detection information, generating, by the computing system, subspace classification information (Chang: column 4, lines 57-67: In the example of FIG. 3, locality sensitive hashes of sample files are received (step 251). The sample files (“samples”) may be labeled as “known good”, “known bad,” or “unknown” for example. The locality sensitive hashes of the samples are grouped into a plurality of clusters (subspaces), with each cluster (subspace) comprising locality sensitive hashes that are similar to one another); and
identifying a first subspace of the plurality of subspaces as being a malicious subspace associated with malicious entities (Chang: Column 5, lines 33-45: FIG. 4 shows a data structure 270 of an example cluster (subspace) in accordance with an embodiment of the present invention. In one embodiment, a cluster includes a header 271 that includes metadata and other information about the cluster. The header 271 may include other information, such as the name of the malware family to which the members of the cluster belong (“MW_NAME”)).
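For illustration only, the per-subspace counting that the rejection reads on Chang's cluster header (“N ITEMS”) can be sketched as follows. The sketch assumes that each client report carries a subspace identifier and a malicious flag; the names `DetectionReport`, `aggregate`, and `malicious_subspaces`, and the 0.5 ratio threshold, are hypothetical and are not drawn from Chang or from the claims.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class DetectionReport:
    """One hypothetical client report for a single detected entity."""
    subspace_id: str    # identifier of the subspace the entity's embedding maps to
    is_malicious: bool  # client-side verdict for the entity


def aggregate(reports: Iterable[DetectionReport]) -> Dict[str, Tuple[int, int]]:
    """Return {subspace_id: (malicious_count, total_count)} over all reports."""
    counts = defaultdict(lambda: [0, 0])
    for report in reports:
        counts[report.subspace_id][1] += 1
        if report.is_malicious:
            counts[report.subspace_id][0] += 1
    return {sid: (m, t) for sid, (m, t) in counts.items()}


def malicious_subspaces(aggregated: Dict[str, Tuple[int, int]],
                        ratio_threshold: float = 0.5) -> Set[str]:
    """Flag subspaces whose malicious fraction exceeds a hypothetical threshold."""
    return {sid for sid, (m, t) in aggregated.items() if t and m / t > ratio_threshold}


reports = [DetectionReport("0110", True), DetectionReport("0110", True),
           DetectionReport("0110", False), DetectionReport("1011", False)]
agg = aggregate(reports)
print(agg)                       # {'0110': (2, 3), '1011': (0, 1)}
print(malicious_subspaces(agg))  # {'0110'}
```

Re-running the aggregation after additional reports can change a subspace's classification: in this sketch, three further benign reports for subspace "0110" lower its malicious fraction from 2/3 to 2/6, so it is no longer flagged, which loosely parallels the reclassification recited in claim 10.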
Chang does not teach: an embedding space that is partitioned into the plurality of subspaces organized in a hierarchical structure to which embeddings generated for one or more entities detected at the one or more client computing devices are mapped; plurality of subspaces to which the embeddings are mapped; and a malicious subspace associated with malicious entities for which corresponding embeddings from among the embeddings are generated, wherein said identifying is based on one or more subspace security parameters that vary based on a hierarchical positioning of the first subspace within the hierarchical structure. However, Briliauskas teaches:
an embedding space that is partitioned into the plurality of subspaces organized in a hierarchical structure to which embeddings generated for one or more entities detected at the one or more client computing devices are mapped; plurality of subspaces to which the embeddings are mapped (Briliauskas: [0003]; [0062]: a malware classification machine learning model may be configured to determine one or more similarity measures with respect to an encoded representation (e.g., embedding) of an input data object (e.g., a file or document) and one or more stored data objects in a multi-dimensional embedding space. [0071]: As shown in FIG. 1B, the anti-malware application 105′ of the client device 104 (shown as “Client Device #1” 104a or “Client device #n” 104b) can respectively generate the fuzzy hashes for the local target code 119 and transmit the fuzzy hashes, or one or more files derived therefrom, through a network 132 to the service provider computing system 102′. The service provider computing system 102′ can scan the received files from the client devices 104a, 104b via models 108′ and 110′, e.g., as described in relation to those in FIG. 1A, and transmit results back to the anti-malware application 105′. [0082] VPT Tree Generation. To generate a VPT tree for malware and non-malware files, a vantage point (VP) may be first randomly selected. The model generator (e.g., model generation modules 106) may compute the distances between the vantage point and the other points by setting the radius of the vantage point to the median of the distances. The model generator may then classify the points into two groups: an inner group and an outer group. Then, the points in the inner group may then be assigned to the left subtree of the vantage point, and the points in the outer group may then be assigned to the right subtree); and
a malicious subspace associated with malicious entities for which corresponding embeddings from among the embeddings are generated (Briliauskas: [0064]: the model 108 (shown as 108′) of the locality-sensitive hashing operation with the vantage-point tree data structure (also referred to as a VPT hash classification model 108) can be employed to predict or provide a likelihood or confidence value or score (i) whether a target code 119 is malicious, or non-malicious, based on the fuzzy hash space to the nearest neighbor to known malicious files or code and (ii) whether the target code 119 is non-malicious, or malicious, based on the distance to nearest neighbor known clean files or codes. [0084]: the VPT search can determine the fuzzy hash space of the query file and the node in the tree), wherein said identifying is based on one or more subspace security parameters that vary based on a hierarchical positioning of the first subspace within the hierarchical structure (Briliauskas: [0083] VPT Tree Search. During a scan, the generated vantage-point tree can be traversed from the root node. Typically, a tree is traversed by recursively exploring all children that intersect a hyperball of a pre-defined fuzzy hash space around the query point, e.g., using a triangle inequality and fuzzy hash stored in each node. Once a list of leaf nodes is found, each contained fuzzy hash may be verified as being within the target hash).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to employ the teachings of Briliauskas in the invention of Chang to include the above limitations. The motivation to do so would be to improve cyber security protection (Briliauskas: [0004]).
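Briliauskas [0082]-[0083] describes vantage-point (VP) tree construction and search in prose. A minimal sketch of those two steps follows: a randomly selected vantage point, a radius set to the median of the distances, an inner (left) and outer (right) split, and a range search from the root that uses the triangle inequality to skip subtrees that cannot intersect the hyperball around the query. Briliauskas measures distance in a fuzzy-hash space; the Hamming distance over bit strings used here is only a stand-in, and every name in the sketch is hypothetical.

```python
import random
import statistics
from dataclasses import dataclass
from typing import Callable, List, Optional

Distance = Callable[[str, str], float]


def hamming(a: str, b: str) -> int:
    """Stand-in distance (assumption): Hamming distance between equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


@dataclass
class Node:
    point: str
    radius: float = 0.0
    inner: Optional["Node"] = None  # left subtree: distance to vantage point <= radius
    outer: Optional["Node"] = None  # right subtree: distance to vantage point > radius


def build(points: List[str], dist: Distance, rng: random.Random) -> Optional[Node]:
    """Build a VP tree: random vantage point, median radius, inner/outer split."""
    if not points:
        return None
    rest = list(points)
    node = Node(rest.pop(rng.randrange(len(rest))))  # randomly selected vantage point
    if not rest:
        return node
    dists = [dist(node.point, p) for p in rest]
    node.radius = statistics.median(dists)  # radius = median of the distances
    node.inner = build([p for p, d in zip(rest, dists) if d <= node.radius], dist, rng)
    node.outer = build([p for p, d in zip(rest, dists) if d > node.radius], dist, rng)
    return node


def search(node: Optional[Node], query: str, tau: float, dist: Distance,
           out: Optional[List[str]] = None) -> List[str]:
    """Return every stored point within distance tau of query (a hyperball search)."""
    if out is None:
        out = []
    if node is None:
        return out
    d = dist(query, node.point)
    if d <= tau:
        out.append(node.point)
    # Triangle inequality: a match can lie in the inner ball only if d - tau <= radius,
    # and in the outer shell only if d + tau > radius; other subtrees are pruned.
    if d - tau <= node.radius:
        search(node.inner, query, tau, dist, out)
    if d + tau > node.radius:
        search(node.outer, query, tau, dist, out)
    return out


known_bad = ["0000", "0001", "0011", "1111", "1110"]
tree = build(known_bad, hamming, random.Random(7))
print(sorted(search(tree, "0010", 1, hamming)))  # ['0000', '0011']
```

Because the pruning conditions discard only subtrees that provably contain no match, the range search returns the same result as a brute-force scan regardless of which vantage points the generator selects; the tree only reduces how many distances are computed.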
As per claim 13, Chang in view of Briliauskas teaches:
The computer-implemented method of claim 1, wherein the entity detected by the client computing device comprises: communication information; behavioral information; a user account; or a computing device (Chang: column 4, lines 18-24: The file 231 may be an email attachment (communication information)).
As per claim 26, Chang in view of Briliauskas teaches:
The computer-implemented method of claim 1, wherein individual ones of the embeddings are generated by processing at least one of data descriptive of corresponding entities of the one or more entities, data associated with corresponding entities of the one or more entities, or corresponding entities of the one or more entities themselves (Briliauskas: [0062]: a malware classification machine learning model may be configured to determine one or more similarity measures with respect to an encoded representation (e.g., embedding) of an input data object (e.g., a file or document) and one or more stored data objects in a multi-dimensional embedding space. [0071]: As shown in FIG. 1B, the anti-malware application 105′ of the client device 104 (shown as “Client Device #1” 104a or “Client device #n” 104b) can respectively generate the fuzzy hashes for the local target code 119 and transmit the fuzzy hashes, or one or more files derived therefrom, through a network 132 to the service provider computing system 102′), and wherein the plurality of subspaces are identified by hashes, enabling the one or more client devices to locally determine individual ones of one or more of the plurality of subspaces to which corresponding embeddings from among the embeddings are mapped (Briliauskas: [0064]: the model 108 (shown as 108′) of the locality-sensitive hashing operation with the vantage-point tree data structure (also referred to as a VPT hash classification model 108) can be employed to predict or provide a likelihood or confidence value or score (i) whether a target code 119 is malicious, or non-malicious, based on the fuzzy hash space to the nearest neighbor to known malicious files or code and (ii) whether the target code 119 is non-malicious, or malicious, based on the distance to nearest neighbor known clean files or codes).
Claims 2-12 are rejected under 35 U.S.C. 103 as being unpatentable over Chang in view of Briliauskas as applied to claim 1 above, and further in view of prior art of record US 20150135329 to Aghasaryan et al (hereinafter Aghasaryan).
As per claim 2, Chang in view of Briliauskas teaches:
The computer-implemented method of claim 1, determining, by the computing system, a plurality of identifying codes that each identify a respective subspace of the plurality of subspaces of the embedding space (Chang: Column 5, lines 33-45: FIG. 4 shows a data structure 270 of an example cluster in accordance with an embodiment of the present invention. In one embodiment, a cluster includes a header 271 that includes metadata and other information about the cluster. In the example of FIG. 4, the header 271 indicates an identifier of the cluster (“GROUP”) (identifying code of a subspace)).
Chang in view of Briliauskas does not explicitly teach: wherein, prior to obtaining the entity detection information from the one or more client computing devices, the method comprises performing, by the computing system, a Locality Sensitive Hashing (LSH) process based on a plurality of randomized vectors, wherein performing the LSH process comprises: partitioning, by the computing system, the embedding space into the plurality of subspaces. However, Aghasaryan teaches:
wherein, prior to obtaining the entity detection information from the one or more client computing devices, the method comprises performing, by the computing system, a Locality Sensitive Hashing (LSH) process based on a plurality of randomized vectors, wherein performing the LSH process comprises: partitioning, by the computing system, the embedding space into the plurality of subspaces (Aghasaryan: [0033] The semantic representations thus obtained may be assigned a cluster identifier. The cluster identifiers may be assigned using a technique of locality sensitive hashing (LSH). The LSH technique involves converting each of the semantic representations into corresponding hash codes, i.e., the cluster identifiers, using the semantic representations and a set of hash functions defined by random values, such as a common sequence of random vectors generated at each of the user device. [0050]: The cluster identifiers may be understood as interest group identity codes that may be used for efficiently identifying the clusters (partitioning the embedding space into plural subspaces). [0052]: Further, the cluster identifier module 112 provides the cluster identifiers to one or more remote nodes 104. Also, [0066]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to employ the teachings of Aghasaryan in the invention of Chang in view of Briliauskas to include the above limitations. The motivation to do so would be to facilitate providing privacy protection with reduced errors in clustering (Aghasaryan: [0051]).
As per claim 3, Chang in view of Briliauskas and Aghasaryan teaches:
The computer-implemented method of claim 2, wherein performing the LSH process further comprises: providing, by the computing system, information indicative of the plurality of randomized vectors to the one or more client computing devices (Aghasaryan: [0050]: The cluster identifier module 112 may be configured to use a technique of LSH for assigning the cluster identifiers. The LSH technique may be used by the cluster identifier module 112 to convert each of the semantic representations into corresponding hash codes, i.e., the cluster identifiers. For the purpose, the cluster identifier module 112 utilizes a set of hash functions defined by random values, such as a common sequence of random vectors generated at each of the user devices 102. [0051]: In another implementation, the cluster identifier module 112 may be configured to generate the hash functions based on seed generation functions obtained from, for example, the central entity. Also, [0066]).
The examiner relies on the same rationale for combining Chang, Briliauskas, and Aghasaryan as set forth for claim 2 above.
As per claim 4, Chang in view of Briliauskas and Aghasaryan teaches:
The computer-implemented method of claim 3, wherein, prior to performing the LSH process, the method comprises generating, by the computing system, the plurality of randomized vectors based on one or more Random Number Generation (RNG) seeds; and wherein providing the information indicative of the plurality of randomized vectors to the one or more client computing devices comprises providing, by the computing system, the one or more RNG seeds to the one or more client computing devices (Aghasaryan: [0050]: For the purpose, the cluster identifier module 112 utilizes a set of hash functions defined by random values, such as a common sequence of random vectors generated at each of the user devices 102. [0051]: In another implementation, the cluster identifier module 112 may be configured to generate the hash functions based on seed generation functions obtained from, for example, the central entity).
The examiner relies on the same rationale for combining Chang, Briliauskas, and Aghasaryan as set forth for claim 2 above.
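Aghasaryan [0050]-[0051] ties the random vectors to seed generation functions obtained from a central entity. The sketch below, which assumes Gaussian-distributed vector components and Python's `random.Random` as the shared generator, illustrates why distributing a seed can stand in for distributing the vectors: any party using the same generator and seed derives identical vectors. The function name `random_vectors` and the seed value are hypothetical.

```python
import random
from typing import List


def random_vectors(seed: int, count: int, dim: int) -> List[List[float]]:
    """Derive `count` random vectors of length `dim` from a single RNG seed.

    The same seed and the same generator always yield the same vectors, so a
    server can send the seed instead of the vectors themselves.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


server_vectors = random_vectors(seed=2024, count=8, dim=16)
client_vectors = random_vectors(seed=2024, count=8, dim=16)  # client holds only the seed
print(server_vectors == client_vectors)  # True
```

This equivalence holds only if both sides run the exact same generator algorithm. Python's documentation promises cross-version reproducibility only for `random()` under a compatible seeder, so a deployed system would likely need to pin the generator explicitly.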
As per claim 5, Chang in view of Briliauskas and Aghasaryan teaches:
The computer-implemented method of claim 4, wherein the information that associates the entity with the particular subspace comprises an identifying code for the particular subspace generated as a dot product of the plurality of randomized vectors and an embedding of the entity (Aghasaryan: [0066]: In said method the cluster identifier module 112 obtains dot product between the semantic representations (embedding) and each of the random vectors and concatenates sign of the dot products to obtain the hash code as the cluster identifier for the semantic representation. The cluster identifiers thus obtained may be stored in the cluster identifier data 224. Also, [0048]).
The examiner relies on the same rationale for combining Chang, Briliauskas, and Aghasaryan as set forth for claim 2 above.
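Aghasaryan [0066] describes taking the dot product of a semantic representation with each random vector and concatenating the signs of those dot products. A minimal sketch of that computation follows. Each random vector acts as the normal of a hyperplane through the origin, so k vectors partition the embedding space into at most 2^k subspaces, and nearby embeddings tend to share a code. Mapping a zero dot product to bit 1, the function name `identifying_code`, and the three fixed vectors are assumptions for illustration.

```python
from typing import Sequence


def identifying_code(embedding: Sequence[float],
                     vectors: Sequence[Sequence[float]]) -> str:
    """Concatenate the signs of the dot products into a bit string (1 = non-negative)."""
    bits = []
    for v in vectors:
        dot = sum(e * x for e, x in zip(embedding, v))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


# Three fixed hyperplane normals in a 2-D embedding space (illustrative values only).
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
print(identifying_code([0.9, 0.2], vectors))   # 111
print(identifying_code([0.8, 0.3], vectors))   # 111 (nearby embedding, same subspace)
print(identifying_code([-0.5, 0.7], vectors))  # 010
```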
As per claim 6, Chang in view of Briliauskas and Aghasaryan teaches:
The computer-implemented method of claim 3, wherein the information that associates the entity with the particular subspace of the plurality of subspaces of the embedding space comprises information indicative of a particular identifying code of the plurality of identifying codes that is associated with the particular subspace (Chang: column 5, lines 33-45: FIG. 4 shows a data structure 270 of an example cluster in accordance with an embodiment of the present invention. In one embodiment, a cluster includes a header 271 that includes metadata and other information about the cluster. In the example of FIG. 4, the header 271 indicates an identifier of the cluster (“GROUP”). Column 2, lines 56-67: File information stored in the local LSH database 171 may be referenced using the locality sensitive hash of the file. In one embodiment, local file information stored in the local LSH database 171 includes the locality sensitive hash of the file. Column 3, lines 20-37 and 42-51: The SOC server 163 may comprise a computer system with associated software for receiving file information from subscribed computer networks and other sources, and for storing the file information in a global locality sensitive hash (LSH) database 164. Global file information stored in the global LSH database 164 may be referenced using the locality sensitive hash of the corresponding file. In one embodiment, file information stored in the global LSH database 164 is the same as those stored in a local LSH database (i.e., locality sensitive hash of the file, exact cryptographic hash of the file, label of the file, the timestamp of the file, etc.). The global LSH database 164 may include information of files received from a plurality of different private computer networks. Aghasaryan: [0067] The data transfer module 216 subsequently transmits the cluster identifiers to the remote node 104 over the communication channels 108).
The examiner relies on the same rationale for combining Chang, Briliauskas, and Aghasaryan as set forth for claim 2 above.
As per claim 7, Chang in view of Briliauskas and Aghasaryan teaches:
The computer-implemented method of claim 6, wherein the method further comprises: broadcasting, by the computing system to the one or more computing devices, information that identifies a first identifying code of the plurality of identifying codes as being associated with malicious entities, wherein the first identifying code is associated with the first subspace associated with malicious entities for which the one or more of the embeddings are generated (Chang: column 5, lines 15-32: When a target locality sensitive hash is received for similarity determination (step 254), the target locality sensitive hash may be compared to the centers of the clusters to find a cluster with members that are most similar to the target locality sensitive hash (step 255). Similarly, in the case where the target locality sensitive hash is most similar to a center of a cluster that is labeled as “bad”, the target locality sensitive hash may be deemed to be similar to a known bad locality sensitive hash. Column 6, lines 20-52: In one embodiment, the network security device 170 is configured to search the global LSH database 164 for a locality sensitive hash that is similar to the LSH 221 relative to other locality sensitive hashes that are stored in the global LSH database 164. For example, the network security device 170 may be configured to find, in the global LSH database 164, a locality sensitive hash that is within a predetermined threshold distance to the LSH 221. The network security device 170 may act on the file 231 depending on global file information of the file 231. It is inherent that the global LSH database 164 transmits, to the network security device, information that the locality sensitive hash belongs to a known bad cluster (malicious) in order for the network security device to take action), and wherein each of the one or more of the embeddings refers to a relatively lower-dimensional representation of a type of input (Briliauskas: [0062] In some embodiments, a malware classification machine learning model may be configured to determine one or more similarity measures with respect to an encoded representation (e.g., embedding) of an input data object (e.g., a file or document) and one or more stored data objects in a multi-dimensional embedding space).
The examiner relies on the same rationale for combining Chang, Briliauskas, and Aghasaryan as set forth for claim 2 above.
As per claim 8, Chang in view of Briliauskas and Aghasaryan teaches:
The computer-implemented method of claim 7, wherein the method further comprises: for one or more additional iterations, obtaining, by the computing system, additional entity detection information from the one or more client computing devices, wherein the additional entity detection information comprises: (a) information that indicates whether an additional entity detected at the client computing device is malicious; and (b) information that associates the additional entity with an additional particular subspace of the plurality of subspaces of the embedding space; and aggregating, by the computing system, the additional entity detection information received over the one or more additional iterations to obtain additional aggregated threat information (Chang: Column 3, lines 20-27 and 42-51: The SOC server 163 may comprise a computer system with associated software for receiving file information from subscribed computer networks and other sources, and for storing the file information in a global locality sensitive hash (LSH) database 164. The global LSH database 164 may include information of files received from a plurality of different private computer networks. For example, the SOC server 163 may receive, from the network security device 170, an LSH 191 and other information of a file of the private computer network 160 (see arrow 151). Similarly, the SOC server 163 may receive, from the network security device 180, an LSH 192 and other information of a file of the private computer network 161 (see arrow 152). As another example, the SOC server 163 may receive, from an external feed (e.g., from a server 190), an LSH 193 and other information of a file of some other computer (see arrow 153)).
As per claim 9, Chang in view of Briliauskas and Aghasaryan teaches:
The computer-implemented method of claim 8, wherein the method further comprises: based on the additional aggregated threat information, identifying, by the computing system, a second subspace of the plurality of subspaces as being a second malicious subspace associated with malicious entities; generating, by the computing system, additional subspace classification information to identify the second subspace as being the second malicious subspace associated with malicious entities (Chang: column 4, lines 57-67: The similarity between locality sensitive hashes may also be determined using a clustering algorithm. FIG. 3 shows a flow diagram of a method 250 of determining similarity between locality sensitive hashes in accordance with an embodiment of the present invention. In the example of FIG. 3, locality sensitive hashes of sample files are received (step 251). The locality sensitive hashes of the samples are grouped into a plurality of clusters, with each cluster comprising locality sensitive hashes that are similar to one another. Column 5, lines 33-49: FIG. 4 shows a data structure 270 of an example cluster in accordance with an embodiment of the present invention. In one embodiment, a cluster includes a header 271 that includes metadata and other information about the cluster. In the example of FIG. 4, the header 271 indicates an identifier of the cluster (“GROUP”), the number of locality sensitive hashes that are members of the cluster (“N ITEMS”), etc. The header 271 may include other information, such as the name of the malware family to which the members of the cluster belong (“MW_NAME”). It is inherent that the name of the malware family of the cluster distinguishes between a plurality of malicious clusters); and providing, by the computing system to the one or more client computing devices, information that identifies a second identifying code of the plurality of identifying codes as being associated with malicious entities, wherein the second identifying code is associated with the second subspace (Chang: column 5, lines 15-32: When a target locality sensitive hash is received for similarity determination (step 254), the target locality sensitive hash may be compared to the centers of the clusters to find a cluster with members that are most similar to the target locality sensitive hash (step 255). Similarly, in the case where the target locality sensitive hash is most similar to a center of a cluster that is labeled as “bad”, the target locality sensitive hash may be deemed to be similar to a known bad locality sensitive hash. Column 6, lines 20-52: In one embodiment, the network security device 170 is configured to search the global LSH database 164 for a locality sensitive hash that is similar to the LSH 221 relative to other locality sensitive hashes that are stored in the global LSH database 164. For example, the network security device 170 may be configured to find, in the global LSH database 164, a locality sensitive hash that is within a predetermined threshold distance to the LSH 221. The network security device 170 may act on the file 231 depending on global file information of the file 231. It is inherent that the global LSH database 164 transmits, to the network security device, information that the locality sensitive hash belongs to a known bad cluster (malicious) in order for the network security device to take action).
As per claim 10, Chang in view of Briliauskas and Aghasaryan teaches:
The computer-implemented method of claim 8, wherein the method further comprises: based on the additional aggregated threat information, modifying, by the computing system, the subspace classification information to identify the first subspace of the plurality of subspaces as being a non-malicious subspace associated with non-malicious entities (Chang: column 5, lines 19-32: For example, in the case where the clusters are in an LSH database and the target locality sensitive hash is most similar to a center of a cluster that is labeled as “good”, the target locality sensitive hash may be deemed to be similar to a known good locality sensitive hash. A cluster may be labeled “good” or “bad” depending on the labels of the members of the cluster); and broadcasting, by the computing system to the one or more client computing devices, information that identifies the first identifying code of the plurality of identifying codes as being associated with non-malicious entities, wherein the first identifying code is associated with the first subspace (Chang: Column 6, lines 20-52: In one embodiment, the network security device 170 is configured to search the global LSH database 164 for a locality sensitive hash that is similar to the LSH 221 relative to other locality sensitive hashes that are stored in the global LSH database 164. For example, the network security device 170 may be configured to find, in the global LSH database 164, a locality sensitive hash that is within a predetermined threshold distance to the LSH 221. The network security device 170 may act on the file 231 depending on global file information of the file 231. The network security device 170 may allow the file 231 to pass when the LSH 221 is similar to a known good locality sensitive hash in the global LSH database 164. It is inherent that the global LSH database 164 transmits, to the network security device, information that the locality sensitive hash belongs to a known good cluster (non-malicious) in order for the network security device to allow the file to pass).
As per claim 11, Chang in view of Briliauskas and Aghasaryan teaches:
The computer-implemented method of claim 6, wherein the method further comprises: receiving, by the computing system from a client computing device of the one or more client computing devices, a request to identify whether an identifying code for an entity detected at the client computing device is associated with a malicious subspace (Chang: column 4, lines 18-33: In the example of FIG. 2, a file 231 is received by the network security device 170 in the private computer network 160 (see arrow 201). The network security device 170 is configured to generate an LSH 221 (see arrow 202) of the file 231 using a locality sensitive hashing function. Column 6, lines 20-35: Continuing the example of FIG. 2, the network security device 170 uses the LSH 221 to query the global LSH database 164 for global file information of the file 231 (see arrow 203)); determining, by the computing system, that the identifying code is associated with the malicious subspace (Chang: column 5, lines 15-32: When a target locality sensitive hash is received for similarity determination (step 254), the target locality sensitive hash may be compared to the centers of the clusters to find a cluster with members that are most similar to the target locality sensitive hash (step 255). Similarly, in the case where the target locality sensitive hash is most similar to a center of a cluster that is labeled as “bad”, the target locality sensitive hash may be deemed to be similar to a known bad locality sensitive hash. 
Column 6, lines 18-33: For example, the network security device 170 may be configured to find, in the global LSH database 164, a locality sensitive hash that is within a predetermined threshold distance to the LSH 221); and providing, by the computing system to the client computing device, information indicating that the identifying code is associated with the malicious subspace (Chang: Column 6, lines 20-52: For example, the network security device 170 may be configured to find, in the global LSH database 164, a locality sensitive hash that is within a predetermined threshold distance to the LSH 221. The network security device 170 may act on the file 231 depending on global file information of the file 231. The network security device 170 may block the file 231 when the LSH 221 is similar to a known bad locality sensitive hash in the global LSH database 164. It is inherent that the global LSH database 164 transmits, to the network security device 170, information indicating that the locality sensitive hash belongs to a known bad (malicious) cluster, in order for the network security device to block the file).
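The threshold-distance lookup quoted from Chang (column 6) can likewise be sketched, hypothetically, with hashes modeled as integer bit strings and Hamming distance as the distance measure. The threshold value and the function name `decide` are assumptions made for the example, not details taken from Chang.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Count the differing bits between two fixed-width hashes."""
    return bin(a ^ b).count("1")


def decide(target: int, database: List[Tuple[int, str]], threshold: int) -> str:
    """Block or allow based on the closest stored hash within `threshold`."""
    best = None  # (distance, label) of the closest qualifying entry
    for stored_hash, label in database:
        distance = hamming(target, stored_hash)
        if distance <= threshold and (best is None or distance < best[0]):
            best = (distance, label)
    if best is None:
        return "no match"    # nothing similar enough in the database
    if best[1] == "bad":
        return "block"       # similar to a known bad hash
    if best[1] == "good":
        return "allow"       # similar to a known good hash
    return "no verdict"      # closest match is labeled unknown


database = [(0b1010_1010, "bad"), (0b0101_0101, "good")]
print(decide(0b1010_1011, database, threshold=2))  # block
```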
As per claim 12, Chang in view of Briliauskas and Aghasaryan teaches:
The computer-implemented method of claim 11, wherein, prior to receiving the request to identify whether the identifying code for the entity detected at the client computing device is associated with the malicious subspace, the method comprises: establishing, by the computing system, a homomorphic encryption protocol for communication between the computing system and the client computing device (Aghasaryan: [0024]: the interest profiles of the users are encrypted using crypto techniques, such as homomorphic encryption in order to provide privacy to the users. Such crypto techniques enable execution of protocol primitive operations, such as addition and multiplication for clustering the interest profiles using a distributed computation setting).
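The property Aghasaryan relies on at [0024], namely that arithmetic can be carried out on encrypted values, can be illustrated with a toy version of the Paillier cryptosystem, which is additively homomorphic. Aghasaryan does not identify Paillier; it is chosen here only as a well-known example. The tiny primes make this sketch completely insecure, and the function names are hypothetical.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Toy Paillier key generation with generator g = n + 1."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)  # Carmichael function of n (Python 3.9+)
    mu = pow(lam, -1, n)          # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, (lam, mu, n)


def encrypt(n: int, m: int) -> int:
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n - 1] coprime to n
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def decrypt(private_key, c: int) -> int:
    lam, mu, n = private_key
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n  # L(x) = (x - 1) / n, then multiply by mu


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)


public_key, private_key = keygen(293, 433)  # insecure demonstration primes
total = add_encrypted(public_key, encrypt(public_key, 20), encrypt(public_key, 22))
print(decrypt(private_key, total))  # 42
```

Paillier supports addition of encrypted values and multiplication of an encrypted value by a plaintext constant (raising the ciphertext to that power), but not multiplication of two ciphertexts; schemes that support both operations on encrypted data are considerably more involved.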
The examiner applies the same rationale for combining Chang, Briliauskas, and Aghasaryan as set forth for claim 2 above.
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Chang in view of Briliauskas as applied to claim 1 above, and further in view of prior art of record US 20160224803 to Frank et al (hereinafter Frank).
As per claim 14, Chang in view of Briliauskas does not teach the limitations of claim 14. However, Frank teaches:
wherein, prior to aggregating the entity detection information received over the plurality of iterations, the method comprises: mixing, by the computing system, at least some of the entity detection information received over the plurality of iterations to obtain mixed entity detection information; and adding, by the computing system, noise to the mixed entity detection information (Frank: [1189]: All these data sources may be combined in analysis, along with the measurements of affective response. [1190]: the ratings are combined with additional data (e.g., from other sources). [1199]: Typically, noise generated from a Laplace distribution is added to data in such a way that minimizes the information gained on a user from the release of a record of the user (e.g., measurement data), but still enables a statistical query, such as computing a score based on multiple measurements).
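Frank [1199] describes adding Laplace-distributed noise so that an aggregate statistic can still be computed while limiting what is revealed about any single record. A minimal Python sketch of that Laplace mechanism applied to pooled detection reports follows. Treating "mixing" as a random shuffle of the pooled reports, the report format, the privacy parameter `epsilon`, and all function names are assumptions made for illustration only.

```python
import random
from typing import Dict, List


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_malicious_count(reports: List[Dict[str, bool]], epsilon: float,
                          rng: random.Random) -> float:
    mixed = list(reports)
    rng.shuffle(mixed)  # "mixing": discard the order in which reports arrived
    true_count = sum(1 for r in mixed if r["malicious"])
    # A count changes by at most 1 when one report changes (sensitivity 1),
    # so the Laplace mechanism uses scale 1 / epsilon.
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random(0)
reports = [{"malicious": True}] * 3 + [{"malicious": False}] * 2
print(noisy_malicious_count(reports, epsilon=1.0, rng=rng))
```

The shuffle does not change the count itself; it only breaks the link between individual reports and their arrival order before the noisy aggregate is released. Smaller `epsilon` values give larger noise and stronger privacy.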
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to employ the teachings of Frank in the invention of Chang in view of Briliauskas to include the above limitations. The motivation to do so would be to protect privacy (Frank: [1199]).
Claims 15-21 and 23-25 are rejected under 35 U.S.C. 103 as being unpatentable over Chang, Briliauskas, and Aghasaryan.
As per claim 15, Chang teaches:
A client computing device, comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store a first set of instructions that, when executed by the one or more processors, cause the client computing device to perform operations, the operations comprising:
for one or more iterations: detecting an entity; determining whether the entity is malicious (Chang: column 2, lines 35-55: The private computer network 160 may further include a network security device 170, for evaluating files for malware or other cybersecurity threats. Files to be evaluated may be those attached to emails, stored in file servers, transmitted over computer networks, etc. The network security device 170 may consult a local LSH database 171 to obtain local file information. The local LSH database 171 may store information of files that are local to the private computer network 160 (i.e., received and/or stored in the private computer network 160). Local file information stored in the local LSH database 171 includes the locality sensitive hash of the file, exact cryptographic hash of the file (e.g., SHA-1), the label of the file (i.e., whether the file is known bad, known good, or unknown));
and
wherein the identifying code is one of a plurality of identifying codes respectively associated with the plurality of subspaces (Chang: Column 5, lines 33-45: FIG. 4 shows a data structure 270 of an example cluster in accordance with an embodiment of the present invention. In one embodiment, a cluster includes a header 271 that includes metadata and other information about the cluster. In the example of FIG. 4, the header 271 indicates an identifier of the cluster (“GROUP”) (identifying code of a subspace)); and
providing, to the computing system, entity detection information, wherein the entity detection information comprises: (a) information that indicates whether the entity is malicious (Chang: column 3, lines 20-27: The SOC server 163 may comprise a computer system with associated software for receiving file information from subscribed computer networks and other sources, and for storing the file information in a global locality sensitive hash (LSH) database 164. Column 2, lines 35-55: Local file information stored in the local LSH database 171 includes the locality sensitive hash of the file, exact cryptographic hash of the file (e.g., SHA-1), the label of the file (i.e., whether the file is known bad, known good, or unknown). Column 8, lines 24-30: The global LSH database may be updated when the target file is detected to pose a cybersecurity threat (FIG. 5, step 313 to FIG. 6, step 314). For example, the network security device may so inform the SOC server of the MDR service).
Chang does not teach: using a plurality of randomized vectors to determine, for the entity, an identifying code that identifies a particular subspace for the entity, wherein the plurality of randomized vectors are for performing a Locality Sensitive Hashing (LSH) process that partitions an embedding space of a computing system into a plurality of subspaces with a hierarchical structure to which embeddings generated for one or more entities detected at one or more client computing devices are mapped, the plurality of subspaces comprising the hierarchical structure that is unknown to the client computing device; wherein the identifying code corresponds to a depth within the hierarchical structure. However, Briliauskas teaches:
partition an embedding space of a computing system into a plurality of subspaces with a hierarchical structure to which embeddings generated for one or more entities detected at one or more client computing devices are mapped, the plurality of subspaces comprising the hierarchical structure that is unknown to the client computing device (Briliauskas: [0003]. [0062]: a malware classification machine learning model may be configured to determine one or more similarity measures with respect to an encoded representation (e.g., embedding) of an input data object (e.g., a file or document) and one or more stored data objects in a multi-dimensional embedding space. [0071]: As shown in FIG. 1B, the anti-malware application 105′ of the client device 104 (shown as “Client Device #1” 104a or “Client device #n” 104b) can respectively generate the fuzzy hashes for the local target code 119 and transmit the fuzzy hashes, or one or more files derived therefrom, through a network 132 to the service provider computing system 102′. The service provider computing system 102′ can scan the received files from the client devices 104a, 104b via models 108′ and 110′, e.g., as described in relation to those in FIG. 1A, and transmit results back to the anti-malware application 105′. [0082] VPT Tree Generation. To generate a VPT tree for malware and non-malware files, a vantage point (VP) may be first randomly selected. The model generator (e.g., model generation modules 106) may compute the distances between the vantage point and the other points by setting the radius of the vantage point to the median of the distances. The model generator may then classify the points into two groups: an inner group and an outer group. Then, the points in the inner group may then be assigned to the left subtree of the vantage point, and the points in the outer group may then be assigned to the right subtree. 
Since the models are operated on the service provider computing system, the VPT tree structure is unknown to the client devices);
providing, to the computing system, … (b) information indicative of the identifying code for the entity, wherein the identifying code corresponds to a depth within the hierarchical structure (Briliauskas: [0071]: As shown in FIG. 1B, the anti-malware application 105′ of the client device 104 (shown as “Client Device #1” 104a or “Client device #n” 104b) can respectively generate the fuzzy hashes for the local target code 119 and transmit the fuzzy hashes, or one or more files derived therefrom, through a network 132 to the service provider computing system 102′. [0064]: the model 108 (shown as 108′) of the locality-sensitive hashing operation with the vantage-point tree data structure (also referred to as a VPT hash classification model 108) can be employed to predict or provide a likelihood or confidence value or score (i) whether a target code 119 is malicious, or non-malicious, based on the fuzzy hash space to the nearest neighbor to known malicious files or code and (ii) whether the target code 119 is non-malicious, or malicious, based on the distance to nearest neighbor known clean files or codes. [0083] VPT Tree Search. During a scan, the generated vantage-point tree can be traversed from the root node. Typically, a tree is traversed by recursively exploring all children that intersect a hyperball of a pre-defined fuzzy hash space around the query point, e.g., using a triangle inequality and fuzzy hash stored in each node. Once a list of leaf nodes is found, each contained fuzzy hash may be verified as being within the target hash).
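The vantage-point tree construction quoted from Briliauskas [0082], and the triangle-inequality pruning relied on during the search described at [0083], can be sketched in Python as below. This is a hypothetical sketch: Euclidean distance between coordinate tuples stands in for the fuzzy-hash distance Briliauskas uses, the search shown is a single-nearest-neighbor variant rather than the hyperball query of [0083], and all names are invented for the example.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    vantage: Point
    radius: float              # median distance from the vantage point
    inner: Optional["Node"]    # points closer than the radius (left subtree)
    outer: Optional["Node"]    # points at or beyond the radius (right subtree)


def build(points: Sequence[Point], rng: random.Random) -> Optional[Node]:
    if not points:
        return None
    rest = list(points)
    vantage = rest.pop(rng.randrange(len(rest)))  # randomly selected vantage point
    if not rest:
        return Node(vantage, 0.0, None, None)
    dists = [math.dist(vantage, p) for p in rest]
    radius = sorted(dists)[len(dists) // 2]       # median of the distances
    inner = [p for p, d in zip(rest, dists) if d < radius]
    outer = [p for p, d in zip(rest, dists) if d >= radius]
    return Node(vantage, radius, build(inner, rng), build(outer, rng))


def nearest(node: Optional[Node], query: Point,
            best: Tuple[float, Optional[Point]] = (math.inf, None)):
    """Return (distance, point) of the stored point closest to `query`."""
    if node is None:
        return best
    d = math.dist(query, node.vantage)
    if d < best[0]:
        best = (d, node.vantage)
    if d < node.radius:
        best = nearest(node.inner, query, best)
        # Triangle inequality: outer points can only be closer if d + best >= radius.
        if d + best[0] >= node.radius:
            best = nearest(node.outer, query, best)
    else:
        best = nearest(node.outer, query, best)
        # Inner points can only be closer if d - best < radius.
        if d - best[0] < node.radius:
            best = nearest(node.inner, query, best)
    return best


rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(200)]
tree = build(points, rng)
print(nearest(tree, (0.5, 0.5)))
```

Choosing the median as the radius splits the remaining points roughly in half at each node, which keeps the tree depth close to logarithmic in the number of stored points.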
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to employ the teachings of Briliauskas in the invention of Chang to include the above limitations. The motivation to do so would be to improve cyber security protection (Briliauskas: [0004]).
Chang in view of Briliauskas does not teach: using a plurality of randomized vectors to determine, for the entity, an identifying code that identifies a particular subspace for the entity, wherein the plurality of randomized vectors are for performing a Locality Sensitive Hashing (LSH) process that partitions an embedding space of a computing system into a plurality of subspaces. However, Aghasaryan teaches:
using a plurality of randomized vectors to determine, for the entity, an identifying code that identifies a particular subspace for the entity, wherein the plurality of randomized vectors are for performing a Locality Sensitive Hashing (LSH) process that partitions an embedding space of a computing system into a plurality of subspaces (Aghasaryan: [0033] The semantic representations thus obtained may be assigned a cluster identifier. The cluster identifiers may be assigned using a technique of locality sensitive hashing (LSH). The LSH technique involves converting each of the semantic representations into corresponding hash codes, i.e., the cluster identifiers, using the semantic representations and a set of hash functions defined by random values, such as a common sequence of random vectors generated at each of the user device. [0034] The cluster identifiers thus obtained may be used for clustering the user device into one or more clusters, i.e., interest groups corresponding to the cluster identifiers. Further, the cluster identifiers are provided to one or more remote nodes, for example, a central aggregator. [0050]: The cluster identifiers may be understood as interest group identity codes that may be used for efficiently identifying the clusters (partitioning the embedding space into plural subspaces). [0052]: Further, the cluster identifier module 112 provides the cluster identifiers to one or more remote nodes 104. Also, [0066]).
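The random-vector LSH described in Aghasaryan (hash codes formed from a common sequence of random vectors at [0033], with the signs of the dot products concatenated at [0066]) can be sketched in Python as follows. Gaussian-distributed vectors, the code length, and all names are illustrative assumptions; the grouping function mirrors, in simplified form, the collection of cluster identifiers at a central aggregator described at [0034].

```python
import random
from typing import Dict, List

Vector = List[float]


def random_vectors(k: int, dim: int, rng: random.Random) -> List[Vector]:
    """Draw k random Gaussian vectors of length dim (one per hash bit)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


def lsh_code(embedding: Vector, vectors: List[Vector]) -> str:
    """Concatenate the sign of each dot product into a bit string."""
    bits = []
    for v in vectors:
        dot = sum(e * w for e, w in zip(embedding, v))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def group_by_code(embeddings: Dict[str, Vector],
                  vectors: List[Vector]) -> Dict[str, List[str]]:
    """Aggregator view: collect entity names under their shared code."""
    groups: Dict[str, List[str]] = {}
    for name, emb in embeddings.items():
        groups.setdefault(lsh_code(emb, vectors), []).append(name)
    return groups


rng = random.Random(7)
vectors = random_vectors(k=8, dim=4, rng=rng)
e = [0.5, -1.0, 2.0, 0.1]
print(lsh_code(e, vectors))
```

With k vectors, each code is one of at most 2^k bit strings, so the codes partition the embedding space into at most 2^k regions bounded by the k hyperplanes. For Gaussian random vectors, two embeddings separated by angle θ disagree on any given bit with probability θ/π, which is why nearby embeddings tend to share a code.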
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to employ the teachings of Aghasaryan in the invention of Chang in view of Briliauskas to include the above limitations. The motivation to do so would be to ensure cluster identification is done uniformly by all the user devices (Aghasaryan: [0051]).
As per claim 16, Chang in view of Briliauskas and Aghasaryan teaches:
The client computing device of claim 15, wherein using the plurality of randomized vectors to determine the identifying code for the entity comprises: receiving, from the computing system, Random Number Generation (RNG) seeds utilized to generate the plurality of randomized vectors at the computing system; and generating the plurality of randomized vectors based on the RNG seeds (Aghasaryan: [0050]: For the purpose, the cluster identifier module 112 utilizes a set of hash functions defined by random values, such as a common sequence of random vectors generated at each of the user devices 102. [0051]: In another implementation, the cluster identifier module 112 may be configured to generate the hash functions based on seed generation functions obtained from, for example, the central entity).
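Aghasaryan [0051] describes generating the hash functions from seed generation functions obtained from a central entity. The minimal Python sketch below shows why a shared seed suffices: two parties that seed the same pseudo-random generator with the same value derive identical random vectors, and therefore compute identical codes. The seed value, the dimensions, and the function name are hypothetical.

```python
import random
from typing import List


def vectors_from_seed(seed: int, k: int, dim: int) -> List[List[float]]:
    """Derive k Gaussian random vectors of length dim from a shared seed."""
    rng = random.Random(seed)  # independent generator; global state untouched
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


# The computing system distributes only the seed; each side regenerates locally.
server_vectors = vectors_from_seed(1234, k=16, dim=8)
client_vectors = vectors_from_seed(1234, k=16, dim=8)
print(server_vectors == client_vectors)  # True
```

Python guarantees cross-version reproducibility only for `random()` with a compatible seeder, so a deployment along these lines would need to pin the same generator and version on both sides, or derive its samples from `random()` directly.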
The examiner applies the same rationale for combining Chang, Briliauskas, and Aghasaryan as set forth for claim 15 above.
As per claim 17, Chang in view of Briliauskas and Aghasaryan teaches:
The client computing device of claim 16, wherein using the plurality of randomized vectors to determine the identifying code for the entity further comprises: processing information associated with the entity with a machine-learned embedding model to obtain an entity embedding (Briliauskas: [0061] Models 108 and 110 can be provided to the client devices 104a, 104b as an anti-malware application 105 (shown as “Malware classification machine-learning model” 105′). [0071]: As shown in FIG. 1B, the anti-malware application 105′ of the client device 104 (shown as “Client Device #1” 104a or “Client device #n” 104b) can respectively generate the fuzzy hashes for the local target code 119 and transmit the fuzzy hashes, or one or more files derived therefrom, through a network 132 to the service provider computing system 102′. [0062]: a malware classification machine learning model may be configured to determine one or more similarity measures with respect to an encoded representation (e.g., embedding) of an input data object (e.g., a file or document) and one or more stored data objects in a multi-dimensional embedding space.); and determining the identifying code for the entity based on the plurality of randomized vectors and the embedding (Aghasaryan: [0066]: In said method the cluster identifier module 112 obtains dot product between the semantic representations (embedding) and each of the random vectors and concatenates sign of the dot products to obtain the hash code as the cluster identifier for the semantic representation. The cluster identifiers thus obtained may be stored in the cluster identifier data 224. Also, [0048]).
The examiner applies the same rationale for combining Chang, Briliauskas, and Aghasaryan as set forth for claim 15 above.
As per claim 18, Chang in view of Briliauskas and Aghasaryan teaches:
The client computing device of claim 17, wherein the operations further comprise: detecting an additional entity (Chang: column 4, lines 18-20: In the example of FIG. 2, a file 231 is received by the network security device 170 in the private computer network 160 (see arrow 201)); determining that information associated with the additional entity is insufficient for determining whether the entity is malicious (Chang: column 4, lines 18-28: The file 231 may have been sent by an unknown server 230 over the Internet. The file 231 may be an email attachment or a file being downloaded by the network device 173. In one embodiment, the network security device 170 is configured to generate an LSH 221 (see arrow 202) of the file 231 using a locality sensitive hashing function. Column 6, lines 20-52: the network security device 170 uses the LSH 221 to query the global LSH database 164 for global file information of the file 231 (see arrow 203). The network security device 170 may act on the file 231 depending on global file information of the file 231. The network security device 170 may block the file 231 when the LSH 221 is similar to a known bad locality sensitive hash in the global LSH database 164. 
It is inherent that information associated with file 231 is insufficient to determine whether the file is malicious because the network security device is querying the global LSH database to determine whether the file is malicious); using the plurality of randomized vectors to determine an additional identifying code for the additional entity (Aghasaryan: [0066]: In said method the cluster identifier module 112 obtains dot product between the semantic representations (embedding) and each of the random vectors and concatenates sign of the dot products to obtain the hash code as the cluster identifier (identifying code) for the semantic representation); providing, to the computing system, a request to identify whether the additional identifying code for the additional entity is associated with a malicious subspace (Chang: column 6, lines 20-35: the network security device 170 uses the LSH 221 to query the global LSH database 164 for global file information of the file 231 (see arrow 203). Aghasaryan: [0067] The data transfer module 216 subsequently transmits the cluster identifiers (identifying codes) to the remote node 104 over the communication channels 108); and responsive to providing the request, receiving, from the computing system, information indicating that the additional entity is associated with the malicious subspace (Chang: column 6, lines 20-35: the network security device 170 uses the LSH 221 to query the global LSH database 164 for global file information of the file 231 (see arrow 203). In one embodiment, the network security device 170 is configured to search the global LSH database 164 for a locality sensitive hash that is similar to the LSH 221 relative to other locality sensitive hashes that are stored in the global LSH database 164. 
Column 5, lines 15-27: When a target locality sensitive hash is received for similarity determination (step 254), the target locality sensitive hash may be compared to the centers of the clusters to find a cluster with members that are most similar to the target locality sensitive hash (step 255). For example, in the case where the clusters are in an LSH database and the target locality sensitive hash is most similar to a center of a cluster that is labeled as “good”, the target locality sensitive hash may be deemed to be similar to a known good locality sensitive hash. Similarly, in the case where the target locality sensitive hash is most similar to a center of a cluster that is labeled as “bad”, the target locality sensitive hash may be deemed to be similar to a known bad locality sensitive hash).
The examiner applies the same rationale for combining Chang, Briliauskas, and Aghasaryan as set forth for claim 15 above.
As per claim 19, Chang in view of Briliauskas and Aghasaryan teaches:
The client computing device of claim 18, wherein the operations further comprise: performing a corrective action based on the information indicating that the additional entity is associated with the malicious subspace (Chang: Column 6, lines 20-52: The network security device 170 may block the file 231 when the LSH 221 is similar to a known bad locality sensitive hash in the global LSH database 164).
As per claim 20, Chang in view of Briliauskas and Aghasaryan teaches:
The client computing device of claim 19, wherein performing the corrective action comprises: assigning the additional entity to a location associated with malicious entities within a file system of the client computing device; generating, within an interface of an application executed by the client computing device, an alert element indicating that the entity is malicious; blocking transmission of data from the entity; providing reporting information indicating that the entity is malicious to a computing device other than the computing system; or deleting data received from the entity (Chang: Column 6, lines 20-52: The network security device 170 may block the file 231 when the LSH 221 is similar to a known bad locality sensitive hash in the global LSH database 164).
As per claim 21, Chang in view of Briliauskas and Aghasaryan teaches:
The client computing device of claim 20, wherein determining whether the entity is malicious comprises processing information associated with the entity with a machine-learned threat assessment model (Briliauskas: [0071]: As shown in FIG. 1B, the anti-malware application 105′ of the client device 104 (shown as “Client Device #1” 104a or “Client device #n” 104b) can respectively generate the fuzzy hashes for the local target code 119 and transmit the fuzzy hashes, or one or more files derived therefrom, through a network 132 to the service provider computing system 102′. The service provider computing system 102′ can scan the received files from the client devices 104a, 104b via models 108′ and 110′, e.g., as described in relation to those in FIG. 1A, and transmit results back to the anti-malware application 105′. [0064]: the model 108 (shown as 108′) of the locality-sensitive hashing operation with the vantage-point tree data structure (also referred to as a VPT hash classification model 108) can be employed to predict or provide a likelihood or confidence value or score (i) whether a target code 119 is malicious, or non-malicious, based on the fuzzy hash space to the nearest neighbor to known malicious files or code and (ii) whether the target code 119 is non-malicious, or malicious, based on the distance to nearest neighbor known clean files or codes), and wherein the operations further comprise: training the machine-learned threat assessment model based on the information indicating that the additional entity is associated with the malicious subspace (Briliauskas: [0058]: training malware classification machine learning models. [0062]: In some embodiments, a malware classification machine learning model may be trained using labeled training data. Using newly identified malicious files to train a malware classification machine learning model was well known to one of ordinary skill in the art before the effective filing date of the claimed invention).
The examiner applies the same rationale for combining Chang, Briliauskas, and Aghasaryan as set forth for claim 15 above.
As per claim 23, Chang teaches:
One or more non-transitory computer-readable media that collectively store a first set of instructions that, when executed by one or more processors of a client computing device, cause the client computing device to perform operations, the operations comprising:
wherein the identifying code is one of a plurality of identifying codes that each identify a respective subspace of the plurality of subspaces (Chang: Column 5, lines 33-45: FIG. 4 shows a data structure 270 of an example cluster in accordance with an embodiment of the present invention. In one embodiment, a cluster includes a header 271 that includes metadata and other information about the cluster. In the example of FIG. 4, the header 271 indicates an identifier of the cluster (“GROUP”) (identifying code of a subspace));
responsive to providing the identifying code, receiving, from the computing system, information associated with entities of the subspace identified by the identifying code (Chang: column 4, lines 18-28: The file 231 may have been sent by an unknown server 230 over the Internet. The file 231 may be an email attachment or a file being downloaded by the network device 173. In one embodiment, the network security device 170 is configured to generate an LSH 221 (see arrow 202) of the file 231 using a locality sensitive hashing function. Column 6, lines 20-52: the network security device 170 uses the LSH 221 to query the global LSH database 164 for global file information of the file 231 (see arrow 203). In one embodiment, the network security device 170 is configured to search the global LSH database 164 for a locality sensitive hash that is similar to the LSH 221 relative to other locality sensitive hashes that are stored in the global LSH database 164. The network security device 170 may act on the file 231 depending on global file information of the file 231. Column 5, lines 15-27: When a target locality sensitive hash is received for similarity determination (step 254), the target locality sensitive hash may be compared to the centers of the clusters to find a cluster with members that are most similar to the target locality sensitive hash (step 255). For example, in the case where the clusters are in an LSH database and the target locality sensitive hash is most similar to a center of a cluster that is labeled as “good”, the target locality sensitive hash may be deemed to be similar to a known good locality sensitive hash. Similarly, in the case where the target locality sensitive hash is most similar to a center of a cluster that is labeled as “bad”, the target locality sensitive hash may be deemed to be similar to a known bad locality sensitive hash. 
It is inherent that the network security device receives information regarding the LSH 221 from the global LSH database); and
performing an action based on the information associated with the entities of the subspace identified by the identifying code (Chang: Column 6, lines 20-52: The network security device 170 may act on the file 231 depending on global file information of the file 231. The network security device 170 may block the file 231 when the LSH 221 is similar to a known bad locality sensitive hash in the global LSH database 164).
Chang does not teach: using a plurality of randomized vectors to determine an identifying code that identifies a particular subspace, wherein the plurality of randomized vectors are for performing a Locality Sensitive Hashing (LSH) process that partitions an embedding space of a computing system into a plurality of subspaces organized in a hierarchical structure to which embeddings generated for one or more entities detected at one or more client computing devices are mapped; providing, to the computing system, the identifying code to the computing system; and wherein the identifying code corresponds to a depth within the hierarchical structure. However, Briliauskas teaches:
partitions an embedding space of a computing system into a plurality of subspaces organized in a hierarchical structure to which embeddings generated for one or more entities detected at one or more client computing devices are mapped; providing, to the computing system, the identifying code to the computing system (Briliauskas: [0003]. [0062]: a malware classification machine learning model may be configured to determine one or more similarity measures with respect to an encoded representation (e.g., embedding) of an input data object (e.g., a file or document) and one or more stored data objects in a multi-dimensional embedding space. [0071]: As shown in FIG. 1B, the anti-malware application 105′ of the client device 104 (shown as “Client Device #1” 104a or “Client device #n” 104b) can respectively generate the fuzzy hashes for the local target code 119 and transmit the fuzzy hashes, or one or more files derived therefrom, through a network 132 to the service provider computing system 102′. The service provider computing system 102′ can scan the received files from the client devices 104a, 104b via models 108′ and 110′, e.g., as described in relation to those in FIG. 1A, and transmit results back to the anti-malware application 105′. [0082] VPT Tree Generation. To generate a VPT tree for malware and non-malware files, a vantage point (VP) may be first randomly selected. The model generator (e.g., model generation modules 106) may compute the distances between the vantage point and the other points by setting the radius of the vantage point to the median of the distances. The model generator may then classify the points into two groups: an inner group and an outer group. 
Then, the points in the inner group may then be assigned to the left subtree of the vantage point, and the points in the outer group may then be assigned to the right subtree); and wherein the identifying code corresponds to a depth within the hierarchical structure (Briliauskas: [0064]: the model 108 (shown as 108′) of the locality-sensitive hashing operation with the vantage-point tree data structure (also referred to as a VPT hash classification model 108) can be employed to predict or provide a likelihood or confidence value or score (i) whether a target code 119 is malicious, or non-malicious, based on the fuzzy hash space to the nearest neighbor to known malicious files or code and (ii) whether the target code 119 is non-malicious, or malicious, based on the distance to nearest neighbor known clean files or codes. [0083] VPT Tree Search. During a scan, the generated vantage-point tree can be traversed from the root node. Typically, a tree is traversed by recursively exploring all children that intersect a hyperball of a pre-defined fuzzy hash space around the query point, e.g., using a triangle inequality and fuzzy hash stored in each node. Once a list of leaf nodes is found, each contained fuzzy hash may be verified as being within the target hash).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to employ the teachings of Briliauskas in the invention of Chang to include the above limitations. The motivation to do so would be to improve cyber security protection (Briliauskas: [0004]).
Chang in view of Briliauskas does not teach: using a plurality of randomized vectors to determine an identifying code that identifies a particular subspace, wherein the plurality of randomized vectors are for performing a Locality Sensitive Hashing (LSH) process that partitions an embedding space of a computing system into a plurality of subspaces. However, Aghasaryan teaches:
using a plurality of randomized vectors to determine an identifying code that identifies a particular subspace, wherein the plurality of randomized vectors are for performing a Locality Sensitive Hashing (LSH) process that partitions an embedding space of a computing system into a plurality of subspaces (Aghasaryan: [0033] The semantic representations thus obtained may be assigned a cluster identifier. The cluster identifiers may be assigned using a technique of locality sensitive hashing (LSH). The LSH technique involves converting each of the semantic representations into corresponding hash codes, i.e., the cluster identifiers, using the semantic representations and a set of hash functions defined by random values, such as a common sequence of random vectors generated at each of the user device. [0034] The cluster identifiers thus obtained may be used for clustering the user device into one or more clusters, i.e., interest groups corresponding to the cluster identifiers. Further, the cluster identifiers are provided to one or more remote nodes, for example, a central aggregator. [0050]: The cluster identifiers may be understood as interest group identity codes that may be used for efficiently identifying the clusters (partitioning the embedding space into plural subspaces). [0052]: Further, the cluster identifier module 112 provides the cluster identifiers to one or more remote nodes 104. See also [0066]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to employ the teachings of Aghasaryan in the invention of Chang in view of Briliauskas to include the above limitations. The motivation to do so would be to facilitate providing privacy protection with reduced errors in clustering (Aghasaryan: [0051]).
As per claim 24, Chang teaches:
A cybersecurity computing system, comprising: one or more processors; and a memory, comprising: an embedding space, wherein the embedding space is partitioned into a plurality of subspaces (Chang: column 4, lines 57-67 and column 5, lines 1-7: The similarity between locality sensitive hashes may also be determined using a clustering algorithm. The locality sensitive hashes may be grouped using a suitable clustering algorithm. A cluster may be labeled “good” or “bad” depending on the labels of the members of the cluster (embedding space with plurality of subspaces)); and one or more non-transitory computer-readable media that collectively store a first set of instructions that, when executed by the one or more processors, cause the cybersecurity computing system to perform operations, the operations comprising: receiving an entity identification request from a client computing device that comprises (Chang: column 4, lines 18-30: In the example of FIG. 2, a file 231 is received by the network security device 170 in the private computer network 160 (see arrow 201). The network security device 170 is configured to generate an LSH 221 (see arrow 202) of the file 231 using a locality sensitive hashing function. Column 6, lines Continuing the example of FIG. 2, the network security device 170 uses the LSH 221 to query the global LSH database 164 for global file information of the file 231 (see arrow 203)), wherein: (b) the identifying code is one of a plurality of identifying codes respectively associated with a plurality of subspaces of an embedding space (Chang: column 5, lines 33-45: FIG. 4 shows a data structure 270 of an example cluster in accordance with an embodiment of the present invention. In one embodiment, a cluster includes a header 271 that includes metadata and other information about the cluster. In the example of FIG. 4, the header 271 indicates an identifier of the cluster (“GROUP”) (identifying code of a subspace));
(c) determining that the subspace associated with the identifying code is a malicious subspace associated with malicious entities (Chang: column 5, lines 15-32: When a target locality sensitive hash is received for similarity determination (step 254), the target locality sensitive hash may be compared to the centers of the clusters to find a cluster with members that are most similar to the target locality sensitive hash (step 255). For example, in the case where the clusters are in an LSH database and the target locality sensitive hash is most similar to a center of a cluster that is labeled as “good”, the target locality sensitive hash may be deemed to be similar to a known good locality sensitive hash. Similarly, in the case where the target locality sensitive hash is most similar to a center of a cluster that is labeled as “bad”, the target locality sensitive hash may be deemed to be similar to a known bad locality sensitive hash); and
providing information to the client computing device indicating that the entity is a malicious entity (Chang: column 6, lines 35-52: The network security device 170 may act on the file 231 depending on global file information of the file 231. The network security device 170 may block the file 231 when the LSH 221 is similar to a known bad locality sensitive hash in the global LSH database 164; i.e., information indicating that file 231 is malicious is received from the global LSH database).
Chang does not teach: wherein the embedding space is partitioned into a plurality of subspaces organized in a hierarchical structure based on a plurality of randomized vectors; an entity identification request that comprises an identifying code; (a) the identifying code identifies a subspace and is determined, for an entity, based on the plurality of randomized vectors for an entity detected locally at the client computing device; embedding space that is partitioned into the plurality of subspaces to which embeddings generated for one or more entities detected at the one or more client computing devices are mapped; and (c) the identifying code corresponds to a depth within the hierarchical structure. However, Briliauskas teaches:
wherein the embedding space is partitioned into a plurality of subspaces organized in a hierarchical structure; an entity identification request that comprises an identifying code; (a) the identifying code identifies a subspace; embedding space that is partitioned into the plurality of subspaces to which embeddings generated for one or more entities detected at the one or more client computing devices are mapped (Briliauskas: [0003]; [0062]: a malware classification machine learning model may be configured to determine one or more similarity measures with respect to an encoded representation (e.g., embedding) of an input data object (e.g., a file or document) and one or more stored data objects in a multi-dimensional embedding space. [0071]: As shown in FIG. 1B, the anti-malware application 105′ of the client device 104 (shown as “Client Device #1” 104a or “Client device #n” 104b) can respectively generate the fuzzy hashes for the local target code 119 and transmit the fuzzy hashes, or one or more files derived therefrom, through a network 132 to the service provider computing system 102′. The service provider computing system 102′ can scan the received files from the client devices 104a, 104b via models 108′ and 110′, e.g., as described in relation to those in FIG. 1A, and transmit results back to the anti-malware application 105′. [0082] VPT Tree Generation. To generate a VPT tree for malware and non-malware files, a vantage point (VP) may be first randomly selected. The model generator (e.g., model generation modules 106) may compute the distances between the vantage point and the other points by setting the radius of the vantage point to the median of the distances. The model generator may then classify the points into two groups: an inner group and an outer group. Then, the points in the inner group may then be assigned to the left subtree of the vantage point, and the points in the outer group may then be assigned to the right subtree. [0083] VPT Tree Search. During a scan, the generated vantage-point tree can be traversed from the root node. Typically, a tree is traversed by recursively exploring all children that intersect a hyperball of a pre-defined fuzzy hash space around the query point, e.g., using a triangle inequality and fuzzy hash stored in each node. Once a list of leaf nodes is found, each contained fuzzy hash may be verified as being within the target hash) and (c) the identifying code corresponds to a depth within the hierarchical structure (Briliauskas: [0064]: the model 108 (shown as 108′) of the locality-sensitive hashing operation with the vantage-point tree data structure (also referred to as a VPT hash classification model 108) can be employed to predict or provide a likelihood or confidence value or score (i) whether a target code 119 is malicious, or non-malicious, based on the fuzzy hash space to the nearest neighbor to known malicious files or code and (ii) whether the target code 119 is non-malicious, or malicious, based on the distance to nearest neighbor known clean files or codes. [0083] VPT Tree Search. During a scan, the generated vantage-point tree can be traversed from the root node. Typically, a tree is traversed by recursively exploring all children that intersect a hyperball of a pre-defined fuzzy hash space around the query point, e.g., using a triangle inequality and fuzzy hash stored in each node. Once a list of leaf nodes is found, each contained fuzzy hash may be verified as being within the target hash).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to employ the teachings of Briliauskas in the invention of Chang to include the above limitations. The motivation to do so would be to improve cyber security protection (Briliauskas: [0004]).
Chang in view of Briliauskas does not teach: wherein the embedding space is partitioned into a plurality of subspaces based on a plurality of randomized vectors; (a) the identifying code … is determined, for an entity, based on the plurality of randomized vectors for an entity detected locally at the client computing device. However, Aghasaryan teaches:
wherein the embedding space is partitioned into a plurality of subspaces based on a plurality of randomized vectors; (a) the identifying code … is determined, for an entity, based on the plurality of randomized vectors for an entity detected locally at the client computing device (Aghasaryan: [0033] The semantic representations thus obtained may be assigned a cluster identifier. The cluster identifiers may be assigned using a technique of locality sensitive hashing (LSH). The LSH technique involves converting each of the semantic representations into corresponding hash codes, i.e., the cluster identifiers, using the semantic representations and a set of hash functions defined by random values, such as a common sequence of random vectors generated at each of the user device. [0034] The cluster identifiers thus obtained may be used for clustering the user device into one or more clusters, i.e., interest groups corresponding to the cluster identifiers. Further, the cluster identifiers are provided to one or more remote nodes, for example, a central aggregator. [0050]: The cluster identifiers may be understood as interest group identity codes that may be used for efficiently identifying the clusters (partitioning the embedding space into plural subspaces). [0052]: Further, the cluster identifier module 112 provides the cluster identifiers to one or more remote nodes 104. See also [0066]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to employ the teachings of Aghasaryan in the invention of Chang in view of Briliauskas to include the above limitations. The motivation to do so would be to facilitate providing privacy protection with reduced errors in clustering (Aghasaryan: [0051]).
As per claim 25, Chang teaches:
A client computing device, comprising: one or more processors; and a memory, comprising:
one or more non-transitory computer-readable media that collectively store a first set of instructions that, when executed by the one or more processors, cause the client computing device to perform operations, the operations comprising: detecting an entity; processing information associated with the entity with the machine-learned embedding model to generate an entity embedding (Chang: column 4, lines 18-34: In the example of FIG. 2, a file 231 is received by the network security device 170 in the private computer network 160 (see arrow 201). The network security device 170 is configured to generate an LSH 221 (see arrow 202) of the file 231 using a locality sensitive hashing function);
wherein the identifying code is one of a plurality of identifying codes respectively associated with a plurality of subspaces of an embedding space implemented by a computing system (Chang: Column 5, lines 28-45: A cluster may be labeled “good” or “bad” depending on the labels of the members of the cluster. FIG. 4 shows a data structure 270 of an example cluster in accordance with an embodiment of the present invention. In one embodiment, a cluster includes a header 271 that includes metadata and other information about the cluster. In the example of FIG. 4, the header 271 indicates an identifier of the cluster (“GROUP”) (identifying code of a subspace)),
providing, to the computing system, an entity identification request comprising (Chang: Column 6, lines 20-35: the network security device 170 uses the LSH 221 to query the global LSH database 164 for global file information of the file 231 (see arrow 203). In one embodiment, the network security device 170 is configured to search the global LSH database 164 for a locality sensitive hash that is similar to the LSH 221 relative to other locality sensitive hashes that are stored in the global LSH database 164); and
responsive to providing the entity identification request, receiving, from the computing system, information indicating that the entity is malicious (Chang: Column 5, lines 15-27: When a target locality sensitive hash is received for similarity determination (step 254), the target locality sensitive hash may be compared to the centers of the clusters to find a cluster with members that are most similar to the target locality sensitive hash (step 255). Similarly, in the case where the target locality sensitive hash is most similar to a center of a cluster that is labeled as “bad”, the target locality sensitive hash may be deemed to be similar to a known bad locality sensitive hash. Column 6, lines 20-52: The network security device 170 may act on the file 231 depending on global file information of the file 231. The network security device 170 may block the file 231 when the LSH 221 is similar to a known bad locality sensitive hash in the global LSH database 164).
Chang does not teach: a machine-learned embedding model, wherein the machine-learned embedding model is trained to process information associated with an entity to generate an embedding for the entity; based on a plurality of randomized vectors and the entity embedding, determining, for the entity, an identifying code that identifies a particular subspace; wherein the plurality of randomized vectors are the same vectors used to partition the embedding space into the plurality of subspaces organized in a hierarchical structure to which embeddings generated for one or more entities detected at one or more client computing devices are mapped; providing, to the computing system, an entity identification request comprising the identifying code; and wherein the identifying code corresponds to a depth within the hierarchical structure. However, Briliauskas teaches:
a machine-learned embedding model, wherein the machine-learned embedding model is trained to process information associated with an entity to generate an embedding for the entity (Briliauskas: [0061] Models 108 and 110 can be provided to the client devices 104a, 104b as an anti-malware application 105 (shown as “Malware classification machine-learning model” 105′). [0071]: As shown in FIG. 1B, the anti-malware application 105′ of the client device 104 (shown as “Client Device #1” 104a or “Client device #n” 104b) can respectively generate the fuzzy hashes for the local target code 119 and transmit the fuzzy hashes, or one or more files derived therefrom, through a network 132 to the service provider computing system 102′. [0062]: a malware classification machine learning model may be configured to determine one or more similarity measures with respect to an encoded representation (e.g., embedding) of an input data object (e.g., a file or document) and one or more stored data objects in a multi-dimensional embedding space.);
partition the embedding space into the plurality of subspaces organized in a hierarchical structure to which embeddings generated for one or more entities detected at one or more client computing devices are mapped (Briliauskas: [0003]; [0062]: a malware classification machine learning model may be configured to determine one or more similarity measures with respect to an encoded representation (e.g., embedding) of an input data object (e.g., a file or document) and one or more stored data objects in a multi-dimensional embedding space. [0071]: As shown in FIG. 1B, the anti-malware application 105′ of the client device 104 (shown as “Client Device #1” 104a or “Client device #n” 104b) can respectively generate the fuzzy hashes for the local target code 119 and transmit the fuzzy hashes, or one or more files derived therefrom, through a network 132 to the service provider computing system 102′. The service provider computing system 102′ can scan the received files from the client devices 104a, 104b via models 108′ and 110′, e.g., as described in relation to those in FIG. 1A, and transmit results back to the anti-malware application 105′. [0082] VPT Tree Generation. To generate a VPT tree for malware and non-malware files, a vantage point (VP) may be first randomly selected. The model generator (e.g., model generation modules 106) may compute the distances between the vantage point and the other points by setting the radius of the vantage point to the median of the distances. The model generator may then classify the points into two groups: an inner group and an outer group. Then, the points in the inner group may then be assigned to the left subtree of the vantage point, and the points in the outer group may then be assigned to the right subtree);
providing, to the computing system, an entity identification request comprising the identifying code (Briliauskas: [0071]: As shown in FIG. 1B, the anti-malware application 105′ of the client device 104 (shown as “Client Device #1” 104a or “Client device #n” 104b) can respectively generate the fuzzy hashes for the local target code 119 and transmit the fuzzy hashes, or one or more files derived therefrom, through a network 132 to the service provider computing system 102′); and
wherein the identifying code corresponds to a depth within the hierarchical structure (Briliauskas: [0064]: the model 108 (shown as 108′) of the locality-sensitive hashing operation with the vantage-point tree data structure (also referred to as a VPT hash classification model 108) can be employed to predict or provide a likelihood or confidence value or score (i) whether a target code 119 is malicious, or non-malicious, based on the fuzzy hash space to the nearest neighbor to known malicious files or code and (ii) whether the target code 119 is non-malicious, or malicious, based on the distance to nearest neighbor known clean files or codes. [0083] VPT Tree Search. During a scan, the generated vantage-point tree can be traversed from the root node. Typically, a tree is traversed by recursively exploring all children that intersect a hyperball of a pre-defined fuzzy hash space around the query point, e.g., using a triangle inequality and fuzzy hash stored in each node. Once a list of leaf nodes is found, each contained fuzzy hash may be verified as being within the target hash).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to employ the teachings of Briliauskas in the invention of Chang to include the above limitations. The motivation to do so would be to improve cyber security protection (Briliauskas: [0004]).
Chang in view of Briliauskas does not teach: based on a plurality of randomized vectors and the entity embedding, determining, for the entity, an identifying code that identifies a particular subspace; wherein the plurality of randomized vectors are the same vectors used to partition the embedding space into the plurality of subspaces. However, Aghasaryan teaches:
based on a plurality of randomized vectors and the entity embedding, determining, for the entity, an identifying code that identifies a particular subspace; wherein the plurality of randomized vectors are the same vectors used to partition the embedding space into the plurality of subspaces (Aghasaryan: [0033] The semantic representations thus obtained may be assigned a cluster identifier. The cluster identifiers may be assigned using a technique of locality sensitive hashing (LSH). The LSH technique involves converting each of the semantic representations into corresponding hash codes, i.e., the cluster identifiers, using the semantic representations and a set of hash functions defined by random values, such as a common sequence of random vectors generated at each of the user device. [0034] The cluster identifiers thus obtained may be used for clustering the user device into one or more clusters, i.e., interest groups corresponding to the cluster identifiers. Further, the cluster identifiers are provided to one or more remote nodes, for example, a central aggregator. [0050]: The cluster identifiers may be understood as interest group identity codes that may be used for efficiently identifying the clusters (partitioning the embedding space into plural subspaces). [0052]: Further, the cluster identifier module 112 provides the cluster identifiers to one or more remote nodes 104. See also [0066]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to employ the teachings of Aghasaryan in the invention of Chang in view of Briliauskas to include the above limitations. The motivation to do so would be to facilitate providing privacy protection with reduced errors in clustering (Aghasaryan: [0051]).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MADHURI R HERZOG whose telephone number is (571)270-3359. The examiner can normally be reached 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Taghi Arani can be reached at (571)272-3787. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
MADHURI R. HERZOG
Primary Examiner
Art Unit 2438
/MADHURI R HERZOG/Primary Examiner, Art Unit 2438