Last updated: May 04, 2026
Application No. 18/161,238
SIMILARITY HASHING OF BINARY FILE FEATURE SETS FOR CLUSTERING AND MALICIOUS DETECTION

Non-Final OA §102 §103
Filed: Jan 30, 2023
Examiner: NARRAMORE, BLAKE I
Art Unit: 2438
Tech Center: 2400 (Computer Networks)
Assignee: Palo Alto Networks Inc.
OA Round: 2 (Non-Final)
Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 78% grant rate with a +24.4% interview lift. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 162 resolved cases, 2023–2026
Examiner Intelligence

NARRAMORE, BLAKE I
Career Allowance Rate: 78% (127 granted / 162 resolved), above average, +20.4% vs TC avg
Interview Lift: +24.4% (resolved cases with interview)
Avg Prosecution (typical timeline): 2y 9m; 25 currently pending
Total Applications (career history): 187 across all art units
Statute-Specific Performance

§101: 8.2% (-31.8% vs TC avg)
§103: 56.4% (+16.4% vs TC avg)
§102: 10.1% (-29.9% vs TC avg)
§112: 20.5% (-19.5% vs TC avg)
Based on career data from 162 resolved cases
Office Action

§102 §103
Detailed Action
This is a Final Office action in response to communications received on 4/22/2025. Claims 1 and 16 were amended. Claims 1-20 are pending and are examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s amendments, filed 4/22/2025, to claim 16 correcting the claim to start the list with a colon are sufficient to overcome the objection to the aforementioned claim. Accordingly, the objection to claim 16, as filed in (2) of the Non-Final Office action filed 1/22/2025, is withdrawn.
Applicant’s amendments, filed 4/22/2025, to claim 16 correcting the claim to recite a non-transitory machine-readable medium are sufficient to overcome the rejection of the aforementioned claim. Accordingly, the rejection of claims 16-20 under 35 U.S.C. 101, as filed in (5) of the Non-Final Office action filed 1/22/2025, is withdrawn.
Applicant’s arguments regarding the rejection under 35 U.S.C. 102 of the claims under Pan have been considered, and are found unpersuasive.
Applicant argues on pages 8 and 9 of the Remarks, filed 4/22/2025, that the cited prior art fails to teach or suggest the limitations of the claim, specifically that: (i) Pan’s feature vectors are extracted from images, not “feature sets of binary file artifacts”; (ii) Pan lacks teaching of “hashes” of such feature sets; and (iii) Pan nowhere discloses that the feature sets are “generated using disassembly”. However, Examiner respectfully disagrees.
Regarding argument (i), Pan repeatedly describes each image in the data set as a “binary file” that is represented by a binary hash feature vector ([0025]; Step 510 of Fig. 5). Step 510 obtains a binary hash feature vector for each image using a trained CNN. The hash of the target image is the claimed “first hash vector” ([0025]; Step 510 of Fig. 5). Under BRI, a “binary file” is any file whose contents are encoded as binary data. An image file is indisputably a binary file. Nothing in the claim language narrows “binary file” to executable program code or to artifacts created during program execution. Therefore, Pan’s images fall within the scope of the claim term. Additionally, Pan explicitly teaches obtaining a feature vector for each image using a trained CNN and treating that vector as the representation used for similarity computation. Accordingly, the limitation of “generating a feature set of a binary file” is met by Pan’s generation of a feature vector for the binary image file.
Regarding argument (ii), Pan adds a hash coding layer and a binary coding layer after the CNN so that a hash feature vector corresponding to the image is produced and then binarized. These operations constitute the claimed taking of a hash of a feature set and generating of a binary hash feature vector, which addresses the absence of hashing argued by Applicant.
In regards to argument (iii), claim 1 merely recites generating a feature set via disassembly without specifying how the disassembly is performed or limiting it to traditional instruction-level separation. Pan’s pipeline, through convolution and pooling, disassembles the binary image file into its constituent numerical features. This satisfies the plain meaning under BRI.
Applicant states that Pan concerns “image retrieval” and is therefore in a “completely disparate technology area.” However, Pan is at least reasonably pertinent to the problem faced by the inventor. The problem faced – similarity determination of collections of binary files by hashing feature representations – is the same problem solved by Pan for image binaries. Therefore the reference is analogous because a POSITA seeking faster similarity-based retrieval of binary objects would have looked to Pan.
Consequently, the rejection of the claims under 35 U.S.C. 102 is sustained.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 3-4 and 16-17 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Pan (US 20220012526 A1).
Regarding claim 1, Pan teaches the limitations of claim 1 substantially as follows:
A method comprising: 
disassembling a binary file to generate a plurality of feature sets of the binary file, (Pan; Para(s). [0100]: feature vectors (e.g., binary hash feature vectors) (i.e. feature sets of the binary file) corresponding to the image and the remainder images. In some embodiments, the plurality of image difference degrees may be determined by the processing device)
wherein each of the plurality of feature sets corresponds to an artifact from the disassembled binary file; (Pan; Para(s). [0100]: feature vectors (e.g., binary hash feature vectors) corresponding to the image and the remainder images (i.e. each of the plurality of feature sets corresponds to an artifact from the disassembled binary file). In some embodiments, the plurality of image difference degrees may be determined by the processing device)
hashing each of the plurality of feature sets to generate a first hash vector for the binary file; (Pan; Para(s). [0100]: feature vectors (e.g., binary hash feature vectors) (i.e. hashing each of the plurality of feature sets to generate a first hash vector for the binary file) corresponding to the image and the remainder images. In some embodiments, the plurality of image difference degrees may be determined by the processing device)
identifying a first plurality of binary files with corresponding hash vectors that each match the first hash vector for the binary file, (Pan; Para(s). [0100]: for each image in the image set (i.e. identifying a first plurality of binary files), the plurality of image difference degrees between the image and remainder images in the image set may be determined based on feature vectors (i.e. corresponding hash vectors that each match the first hash vector for the binary file))
wherein each match is at least one of an exact match and an approximate match, (Pan; Para(s). [0100]: for each image in the image set, the plurality of image difference degrees (i.e. at least one of an exact match and an approximate match) between the image and remainder images in the image set may be determined based on feature vectors)
wherein exact and approximate matches of the first hash vector are according to a nearest neighbor search of hash vectors of binary files including the first plurality of binary files; and (Pan; Para(s). [0100]-[0101]: for each image in the image set, the plurality of image difference degrees between the image and the remainder images in the image set may be determined by determining at least one of a Hamming distance (i.e. matches of the first hash vector are according to a nearest neighbor search of hash vectors), a Euclidean distance, or a cosine distance based on the feature vectors corresponding to the image and the remainder images in the image set)
classifying the binary file according to a verdict for at least one of the first plurality of binary files.  (Pan; Para(s). [0100]-[0101] & [0105]-[0107]: for each image in the first neighbor image set, remainder images in the image set may be ranked based on difference degrees between the remainder images and the image (i.e. classifying the binary file according to a verdict for at least one of the first plurality of binary files))
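For orientation only, the sequence of steps recited in claim 1 (per-artifact feature sets, one hash per feature set, a hash vector, a match against known hash vectors, and a verdict) can be sketched in Python as below. This is a minimal illustration under stated assumptions and is not taken from the application, from Pan, or from any cited reference: the function names, the use of a truncated SHA-256 digest as the per-feature-set hash, and the mismatch threshold that stands in for an "approximate match" are all hypothetical choices made for the sketch.

```python
import hashlib

def hash_feature_set(features):
    """Hash one feature set (a set of strings) to a 32-bit integer.
    Assumption: a truncated SHA-256 over the sorted features is used purely
    for illustration; the claim does not name a hash function."""
    joined = "\x00".join(sorted(features)).encode("utf-8")
    return int.from_bytes(hashlib.sha256(joined).digest()[:4], "big")

def hash_vector(feature_sets):
    """One hash per feature set (one feature set per artifact)."""
    return tuple(hash_feature_set(fs) for fs in feature_sets)

def mismatches(a, b):
    """Number of positions at which two equal-length hash vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(query, known, max_mismatches=1):
    """Return the verdict of the closest known hash vector.
    `known` is a list of (hash_vector, verdict) pairs. Zero mismatches is an
    exact match; up to `max_mismatches` is treated here as an approximate
    match (a hypothetical threshold). Returns None when nothing is close."""
    best = min(known, key=lambda kv: mismatches(query, kv[0]), default=None)
    if best is None or mismatches(query, best[0]) > max_mismatches:
        return None
    return best[1]
```

In this sketch a file whose artifacts all hash identically to a known file inherits that file's verdict, and a file differing in one artifact still matches under the default threshold.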

Regarding claim 3, Pan teaches the limitations of claim 1.
Pan teaches the limitations of claim 3 as follows:
The method of claim 1, wherein the nearest neighbor search comprises an approximate nearest neighbor search.  (Pan; Para(s). [0100]-[0101]: for each image in the image set, the plurality of image difference degrees between the image and the remainder images in the image set may be determined by determining at least one of a Hamming distance (i.e. approximate nearest neighbor search), a Euclidean distance, or a cosine distance based on the feature vectors corresponding to the image and the remainder images in the image set)

Regarding claim 4, Pan teaches the limitations of claim 3.
Pan teaches the limitations of claim 4 as follows:
The method of claim 3, wherein the approximate nearest neighbor search is according to hamming distance between hash vectors.  (Pan; Para(s). [0100]-[0101]: for each image in the image set, the plurality of image difference degrees between the image and the remainder images in the image set may be determined by determining at least one of a Hamming distance (i.e. approximate nearest neighbor search is according to hamming distance between hash vectors), a Euclidean distance, or a cosine distance based on the feature vectors corresponding to the image and the remainder images in the image set)
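As a point of reference for the Hamming distance recited in claim 4 and cited from Pan at [0100]-[0101], the following Python sketch shows how a Hamming distance between two binary hash codes is computed and how it can rank candidates. It is illustrative only: the function names and the representation of each hash code as a Python integer are assumptions, not details from the application or from Pan.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance: the number of bit positions where a and b differ."""
    return bin(a ^ b).count("1")

def nearest_neighbors(query: int, candidates, k: int = 2):
    """Return the k candidates closest to `query` by Hamming distance.
    This scans every candidate, so it is an exhaustive (exact) search;
    approximate methods avoid the full scan at some cost in accuracy."""
    return sorted(candidates, key=lambda c: hamming(query, c))[:k]
```

For example, `hamming(0b1011, 0b0010)` is 2, because the two codes differ in the first and last of the four bit positions.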

Regarding claim 16, Pan teaches the limitations of claim 16 substantially as follows:
An apparatus comprising: a processor; and a non-transitory machine-readable medium having instructions stored thereon that are executable by the processor to cause the apparatus to: 
generate a first hash vector for a first binary file, wherein the first hash vector comprises hashes generated from artifacts of the first binary file; (Pan; Para(s). [0100]: feature vectors (e.g., binary hash feature vectors) (i.e. hashing each of the plurality of feature sets to generate a first hash vector for the binary file) corresponding to the image and the remainder images. In some embodiments, the plurality of image difference degrees may be determined by the processing device)
identify a subset of a plurality of hash vectors corresponding to a plurality of binary files as nearest neighbors to a first hash vector corresponding to a first binary file, (Pan; Para(s). [0100]: for each image in the image set (i.e. identifying a first plurality of binary files), the plurality of image difference degrees between the image and remainder images in the image set may be determined based on feature vectors (i.e. corresponding hash vectors that each match the first hash vector for the binary file))
wherein each of the plurality of hash vectors comprise vectors of hashes generated from artifacts of corresponding binary files; and (Pan; Para(s). [0100]: feature vectors (e.g., binary hash feature vectors) (i.e. hashing each of the plurality of feature sets to generate a first hash vector for the binary file) corresponding to the image and the remainder images. In some embodiments, the plurality of image difference degrees may be determined by the processing device)
based on a determination that a first cluster in a plurality of clusters of hash vectors comprises at least one hash vector of the subset of the plurality of hash vectors, (Pan; Para(s). [0100]-[0101]: for each image in the image set, the plurality of image difference degrees between the image and the remainder images in the image set may be determined by determining at least one of a Hamming distance (i.e. matches of the first hash vector are according to a nearest neighbor search of hash vectors), a Euclidean distance, or a cosine distance based on the feature vectors corresponding to the image and the remainder images in the image set)
indicate a verdict for the first binary file corresponding to a label of the first cluster.  (Pan; Para(s). [0100]-[0101] & [0105]-[0107]: for each image in the first neighbor image set, remainder images in the image set may be ranked based on difference degrees between the remainder images and the image (i.e. classifying the binary file according to a verdict for at least one of the first plurality of binary files))

Regarding claim 17, Pan teaches the limitations of claim 16.
Pan teaches the limitations of claim 17 as follows:
The apparatus of claim 16, wherein the instructions to identify the subset of the plurality of hash vectors as nearest neighbors to the first hash vector comprise instructions executable by the processor to cause the apparatus to identify the subset of the plurality of hash vectors as approximate nearest neighbors of the first hash vector. (Pan; Para(s). [0100]-[0101]: for each image in the image set, the plurality of image difference degrees between the image and the remainder images in the image set may be determined by determining at least one of a Hamming distance (i.e. approximate nearest neighbor search), a Euclidean distance, or a cosine distance based on the feature vectors corresponding to the image and the remainder images in the image set) 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 5-7, 10-11, 13-14, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Pan (US 20220012526 A1), in view of Shabi (US 20230113436 A1).
Regarding claim 2, Pan teaches the limitations of claim 1.
Pan does not teach the limitations of claim 2 as follows:
The method of claim 1, wherein hashing each of the plurality of feature sets to generate the first hash vector comprises inputting each of the plurality of feature sets into a locality sensitive hashing function to generate a plurality of hashes, wherein the first hash vector comprises the plurality of hashes.  
However, in the same field of endeavor, Shabi discloses the limitations of claim 2 as follows:
The method of claim 1, wherein hashing each of the plurality of feature sets to generate the first hash vector comprises inputting each of the plurality of feature sets into a locality sensitive hashing function to generate a plurality of hashes, wherein the first hash vector comprises the plurality of hashes.  (Shabi; Para(s). [0034]: The hash function applied by Hash Value Generation Logic may be part of a locality-sensitive hashing (LSH) scheme used by Selective Data Compression Logic)
Shabi is combinable with Pan because both are from the same field of endeavor of feature comparison methods. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Pan to incorporate locality sensitive hashing as in Shabi in order to improve the security of the system by providing a means by which hashes may be performed using a secure LSH scheme.
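For readers unfamiliar with the term, a locality sensitive hashing (LSH) function is one for which similar inputs tend to produce matching hash values. The Python sketch below uses MinHash, one well-known LSH family for sets, to illustrate the idea. MinHash is chosen here only as an example: neither the claim language quoted above nor the cited passage of Shabi names a particular LSH scheme, and the function names, the seeded BLAKE2b hash, and the signature length are hypothetical.

```python
import hashlib

def _seeded_hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of `item` under a given seed (illustrative)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash(feature_set, num_hashes: int = 64):
    """MinHash signature: for each seed, keep the smallest hash over the set.
    Two sets agree at a given position with probability equal to their
    Jaccard similarity, which is what makes the scheme locality sensitive."""
    return [min(_seeded_hash(seed, f) for f in feature_set)
            for seed in range(num_hashes)]

def estimated_similarity(sig_a, sig_b) -> float:
    """Fraction of signature positions that agree (estimates Jaccard)."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

Under this sketch, each feature set would be passed through `minhash` and the resulting hashes collected into the hash vector for the file.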

Regarding claim 5, Pan teaches the limitations of claim 1.
Pan does not teach the limitations of claim 5 as follows:
The method of claim 1, further comprising: 
clustering a plurality of hash vectors corresponding to a second plurality of binary files to generate a plurality of clusters, 
wherein the second plurality of binary files comprises the first plurality of binary files; and 
labelling each cluster of the plurality of clusters according to known verdicts of binary files corresponding to hash vectors in the cluster.  
However, in the same field of endeavor, Shabi discloses the limitations of claim 5 as follows:
The method of claim 1, further comprising: 
clustering a plurality of hash vectors corresponding to a second plurality of binary files to generate a plurality of clusters, (Shabi; Para(s). [0007]: Responsive to the hash values generated for the candidate pages, a set of similar candidate pages is selected from the candidate pages (i.e. clustering a plurality of hash vectors corresponding to a second plurality of binary files to generate a plurality of clusters))
wherein the second plurality of binary files comprises the first plurality of binary files; and (Shabi; Para(s). [0007]: Responsive to the hash values generated for the candidate pages, a set of similar candidate pages is selected from the candidate pages (i.e. the second plurality of binary files comprises the first plurality of binary files))
labelling each cluster of the plurality of clusters according to known verdicts of binary files corresponding to hash vectors in the cluster.  (Shabi; Para(s). [0007]: Responsive to the hash values generated for the candidate pages, a set of similar candidate pages is selected from the candidate pages (i.e. labelling each cluster of the plurality of clusters according to known verdicts of binary files))
Shabi is combinable with Pan because both are from the same field of endeavor of feature comparison methods. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Pan to incorporate comparison against sets of candidate pages as in Shabi in order to expand the functionality of the system by providing a means by which groupings of documents may be evaluated via comparison.
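The labelling limitation of claim 5 (each cluster labelled according to known verdicts of its member files) can be pictured with the short Python sketch below. It is an illustration only and assumes a simple majority vote over the members that have a known verdict; the claim does not specify how known verdicts are combined, and the names `label_clusters`, `clusters`, and `known_verdicts` are hypothetical.

```python
from collections import Counter

def label_clusters(clusters, known_verdicts, default="unknown"):
    """Label each cluster with the most common known verdict of its members.
    `clusters` maps a cluster id to a list of file identifiers and
    `known_verdicts` maps some file identifiers to a verdict string.
    Assumption: majority vote; clusters with no known member get `default`."""
    labels = {}
    for cluster_id, members in clusters.items():
        votes = Counter(known_verdicts[m] for m in members if m in known_verdicts)
        labels[cluster_id] = votes.most_common(1)[0][0] if votes else default
    return labels
```

A file later matched to a cluster would then take that cluster's label as its verdict, as recited in claim 6.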

Regarding claim 6, Pan and Shabi teach the limitations of claim 5.
Pan and Shabi teach the limitations of claim 6 as follows:
The method of claim 5, wherein classifying the binary file according to the verdict for at least one of the first plurality of binary files comprises, 
determining that the first hash vector is a nearest neighbor of a first cluster of the plurality of clusters; and (Pan; Para(s). [0100]-[0101]: for each image in the image set, the plurality of image difference degrees between the image and the remainder images in the image set may be determined by determining at least one of a Hamming distance (i.e. determining that the first hash vector is a nearest neighbor of a first cluster of the plurality of clusters), a Euclidean distance, or a cosine distance based on the feature vectors corresponding to the image and the remainder images in the image set)
indicating the verdict as a label of the first cluster.  (Shabi; Para(s). [0007]: Responsive to the hash values generated for the candidate pages, a set of similar candidate pages (i.e. indicating the verdict as a label of the first cluster) is selected from the candidate pages)
The same motivation to combine as in claim 5 is applicable to the instant claim.

Regarding claim 7, Pan and Shabi teach the limitations of claim 5.
Pan and Shabi teach the limitations of claim 7 as follows:
The method of claim 5, wherein clustering the plurality of hash vectors to generate the plurality of clusters comprises, 
for each hash vector in the plurality of hash vectors, determining a subset of the plurality of hash vectors as nearest neighbors of the hash vector; and (Pan; Para(s). [0100]-[0101]: for each image in the image set, the plurality of image difference degrees between the image and the remainder images in the image set may be determined by determining at least one of a Hamming distance (i.e. determining a subset of the plurality of hash vectors as nearest neighbors of the hash vector), a Euclidean distance, or a cosine distance based on the feature vectors corresponding to the image and the remainder images in the image set)
based on determining that a first cluster of the plurality of clusters comprises at least one of the subset of the plurality of hash vectors and the hash vector, assigning the subset of the plurality of hash vectors and the hash vector to the first cluster.  (Shabi; Para(s). [0007]: Responsive to the hash values generated for the candidate pages (i.e. based on determining that a first cluster of the plurality of clusters comprises at least one of the subset of the plurality of hash vectors and the hash vector), a set of similar candidate pages is selected from the candidate pages (i.e. assigning the subset of the plurality of hash vectors and the hash vector to the first cluster))
The same motivation to combine as in claim 5 is applicable to the instant claim.

Regarding claim 10, Pan teaches the limitations of claim 10 substantially as follows:
A non-transitory machine-readable medium having program code stored thereon, the program code comprising instructions to: 
wherein each of the plurality of hash vectors comprises hashes for a corresponding one of the plurality of binary files, (Pan; Para(s). [0100]: feature vectors (e.g., binary hash feature vectors) (i.e. each of the plurality of hash vectors comprises hashes for a corresponding one of the plurality of binary files) corresponding to the image and the remainder images. In some embodiments, the plurality of image difference degrees may be determined by the processing device)
wherein each of the hashes is a hash of a feature set generated from one of a plurality of binary file artifacts; (Pan; Para(s). [0100]: feature vectors (e.g., binary hash feature vectors) corresponding to the image and the remainder images (i.e. each of the plurality of feature sets corresponds to an artifact from the disassembled binary file). In some embodiments, the plurality of image difference degrees may be determined by the processing device)
determine that a first hash vector of a first binary file in the plurality of binary files is at least one of an exact and an approximate match of a second hash vector in a first cluster of the plurality of clusters; and (Pan; Para(s). [0100]: for each image in the image set, the plurality of image difference degrees (i.e. at least one of an exact match and an approximate match) between the image and remainder images in the image set may be determined based on feature vectors)
assign a verdict for the first binary file corresponding to a label of the first cluster.  (Pan; Para(s). [0100]-[0101] & [0105]-[0107]: for each image in the first neighbor image set, remainder images in the image set may be ranked based on difference degrees between the remainder images and the image (i.e. classifying the binary file according to a verdict for at least one of the first plurality of binary files))
Pan does not teach the limitations of claim 10 as follows:
generate a plurality of clusters for a plurality of hash vectors corresponding to a plurality of binary files according to nearest neighbor search on hash vectors in the plurality of hash vectors, 
assign each cluster of the plurality of clusters a label according to known labels of binary files in the plurality of binary files corresponding to hash vectors in the cluster; 
However, in the same field of endeavor, Shabi discloses the limitations of claim 10 as follows:
generate a plurality of clusters for a plurality of hash vectors corresponding to a plurality of binary files according to nearest neighbor search on hash vectors in the plurality of hash vectors, (Shabi; Para(s). [0007]: Responsive to the hash values generated for the candidate pages, a set of similar candidate pages is selected from the candidate pages (i.e. clustering a plurality of hash vectors corresponding to a second plurality of binary files to generate a plurality of clusters))
assign each cluster of the plurality of clusters a label according to known labels of binary files in the plurality of binary files corresponding to hash vectors in the cluster; (Shabi; Para(s). [0007]: Responsive to the hash values generated for the candidate pages, a set of similar candidate pages is selected from the candidate pages (i.e. labelling each cluster of the plurality of clusters according to known verdicts of binary files))
Shabi is combinable with Pan because both are from the same field of endeavor of feature comparison methods. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Pan to incorporate comparison against sets of candidate pages as in Shabi in order to expand the functionality of the system by providing a means by which groupings of documents may be evaluated via comparison.

Regarding claim 11, Pan and Shabi teach the limitations of claim 10.
Pan and Shabi teach the limitations of claim 11 as follows:
The non-transitory machine-readable medium of claim 10, wherein the instructions to generate the plurality of clusters for the plurality of hash vectors comprise instructions to, for each hash vector of the plurality of hash vectors, determine a subset of the plurality of hash vectors that are nearest neighbors of the hash vector; and (Pan; Para(s). [0100]-[0101]: for each image in the image set, the plurality of image difference degrees between the image and the remainder images in the image set may be determined by determining at least one of a Hamming distance (i.e. determining a subset of the plurality of hash vectors as nearest neighbors of the hash vector), a Euclidean distance, or a cosine distance based on the feature vectors corresponding to the image and the remainder images in the image set)
based on a determination that a first cluster of the plurality of clusters comprises at least one of the subset of the plurality of hash vectors and the hash vector, assign the subset of the plurality of hash vectors and the hash vector to the first cluster.  (Shabi; Para(s). [0007]: Responsive to the hash values generated for the candidate pages (i.e. based on determining that a first cluster of the plurality of clusters comprises at least one of the subset of the plurality of hash vectors and the hash vector), a set of similar candidate pages is selected from the candidate pages (i.e. assigning the subset of the plurality of hash vectors and the hash vector to the first cluster))
The same motivation to combine as in claim 10 is applicable to the instant claim.

Regarding claim 13, Pan and Shabi teach the limitations of claim 10.
Pan and Shabi teach the limitations of claim 13 as follows:
The non-transitory machine-readable medium of claim 10, wherein the instructions to determine that the first hash vector is at least one of an exact and an approximate match of the second hash vector comprise instructions to determine that the second hash vector is an approximate nearest neighbor of the first hash vector.  (Pan; Para(s). [0100]-[0101]: for each image in the image set, the plurality of image difference degrees between the image and the remainder images in the image set may be determined by determining at least one of a Hamming distance (i.e. approximate nearest neighbor search), a Euclidean distance, or a cosine distance based on the feature vectors corresponding to the image and the remainder images in the image set)

Regarding claim 14, Pan and Shabi teach the limitations of claim 10.
Pan and Shabi teach the limitations of claim 14 as follows:
The non-transitory machine-readable medium of claim 10, wherein the hashes for each of the plurality of binary files comprise locality sensitive hashes of feature sets generated from the plurality of binary file artifacts.  (Shabi; Para(s). [0034]: The hash function applied by Hash Value Generation Logic may be part of a locality-sensitive hashing (LSH) scheme used by Selective Data Compression Logic)
Shabi is further combinable with Pan because both are from the same field of endeavor of feature comparison methods. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Pan to incorporate locality sensitive hashing as in Shabi in order to improve the security of the system by providing a means by which hashes may be performed using a secure LSH scheme.

Regarding claim 18, Pan teaches the limitations of claim 16.
Pan does not teach the limitations of claim 18 as follows:
The apparatus of claim 16, wherein the hashes generated from artifacts of the first binary file comprise locality sensitive hashes generated from artifacts of the first binary file.  
However, in the same field of endeavor, Shabi discloses the limitations of claim 18 as follows:
The apparatus of claim 16, wherein the hashes generated from artifacts of the first binary file comprise locality sensitive hashes generated from artifacts of the first binary file.  (Shabi; Para(s). [0034]: The hash function applied by Hash Value Generation Logic may be part of a locality-sensitive hashing (LSH) scheme used by Selective Data Compression Logic)
Shabi is combinable with Pan because both are from the same field of endeavor of feature comparison methods. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Pan to incorporate locality sensitive hashing as in Shabi in order to improve the security of the system by providing a means by which hashes may be performed using a secure LSH scheme.

Regarding claim 19, Pan teaches the limitations of claim 16.
Pan does not teach the limitations of claim 19 as follows:
The apparatus of claim 16, wherein each cluster of the plurality of clusters is labelled based, at least in part, on known verdicts of binary files corresponding to at least a subset of hash vectors in the cluster.  
However, in the same field of endeavor, Shabi discloses the limitations of claim 19 as follows:
The apparatus of claim 16, wherein each cluster of the plurality of clusters is labelled based, at least in part, on known verdicts of binary files corresponding to at least a subset of hash vectors in the cluster.  (Shabi; Para(s). [0007]: Responsive to the hash values generated for the candidate pages, a set of similar candidate pages is selected from the candidate pages (i.e. labelling each cluster of the plurality of clusters according to known verdicts of binary files))
Shabi is combinable with Pan because both are from the same field of endeavor of feature comparison methods. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Pan to incorporate comparison against sets of candidate pages as in Shabi in order to expand the functionality of the system by providing a means by which groupings of documents may be evaluated via comparison.

Claims 8 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Pan (US 20220012526 A1), in view of Shabi (US 20230113436 A1), further in view of Das (US 20210407114 A1).
Regarding claim 8, Pan and Shabi teach the limitations of claim 7.
Pan and Shabi do not teach the limitations of claim 8 as follows:
The method of claim 7, further comprising, 
based on determining that none of the plurality of clusters comprise at least one of the subset of the plurality of hash vectors and the hash vector, 
initializing a second cluster of the plurality of clusters with the subset of the plurality of hash vectors and the hash vector.  
However, in the same field of endeavor, Das discloses the limitations of claim 8 as follows:
The method of claim 7, further comprising, 
based on determining that none of the plurality of clusters comprise at least one of the subset of the plurality of hash vectors and the hash vector, (Das; Para(s). [0090]: deriving the fundamental matrix based on these putative matches between the source image's set of features and the target image's set of features may be carried out using any of various techniques, one example of which may take the form of a Random Sample Consensus (RANSAC) technique that iterates through many different subsets (i.e. determining that none of the plurality of clusters comprise at least one of the subset of the plurality of hash vectors and the hash vector) of the putative matches in order to identify the “best” fundamental matrix in the presence of outliers)
initializing a second cluster of the plurality of clusters with the subset of the plurality of hash vectors and the hash vector.  (Das; Para(s). [0090]: deriving the fundamental matrix based on these putative matches between the source image's set of features and the target image's set of features may be carried out using any of various techniques, one example of which may take the form of a Random Sample Consensus (RANSAC) technique that iterates through many different subsets (i.e. initializing a second cluster of the plurality of clusters with the subset of the plurality of hash vectors and the hash vector) of the putative matches in order to identify the “best” fundamental matrix in the presence of outliers)
Das is combinable with Pan and Shabi because all are from the same field of endeavor of feature comparison methods. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Pan and Shabi to incorporate iterative comparison of subset features as in Das in order to expand the functionality of the system by providing a means by which comparison may be iterated over desired groups.
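
Illustrative note: the claim 8 limitation recites a conditional step, namely initializing a new cluster when no existing cluster contains the hash vector or any of the hash vectors in the identified subset. A minimal sketch of that control flow follows. It assumes clusters are plain sets of identifiers and that the "subset" is a list of vectors already found to be similar to the new one. The function name and data layout are hypothetical and are not drawn from the application, Pan, Shabi, or Das.

```python
def assign_or_initialize(clusters, hash_vector_id, similar_ids):
    """Place a new hash vector and its similar vectors into a cluster.

    If an existing cluster already holds any of them, that cluster is
    extended. If none does, a new cluster is initialized with the
    similar vectors and the new vector together.
    """
    group = set(similar_ids) | {hash_vector_id}
    for cluster in clusters:
        if cluster & group:      # some member is already clustered here
            cluster |= group
            return cluster
    clusters.append(group)       # no cluster contains any member
    return group

# Hypothetical usage: "x" and its neighbor "y" match no existing cluster.
clusters = [{"a", "b"}]
assign_or_initialize(clusters, "x", ["y"])
print(clusters)
```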

Regarding claim 12, Pan and Shabi teach the limitations of claim 11.
Pan and Shabi do not teach the limitations of claim 12 as follows:
The non-transitory machine-readable medium of claim 11, further comprising instructions to, based on a determination that none of the plurality of clusters comprise at least one of the subset of the plurality of hash vectors and the hash vector, 
initialize a second cluster of the plurality of clusters with the subset of the plurality of hash vectors and the hash vector.  
However, in the same field of endeavor, Das discloses the limitations of claim 12 as follows:
The non-transitory machine-readable medium of claim 11, further comprising instructions to, based on a determination that none of the plurality of clusters comprise at least one of the subset of the plurality of hash vectors and the hash vector, (Das; Para(s). [0090]: deriving the fundamental matrix based on these putative matches between the source image's set of features and the target image's set of features may be carried out using any of various techniques, one example of which may take the form of a Random Sample Consensus (RANSAC) technique that iterates through many different subsets (i.e. determining that none of the plurality of clusters comprise at least one of the subset of the plurality of hash vectors and the hash vector) of the putative matches in order to identify the “best” fundamental matrix in the presence of outliers)
initialize a second cluster of the plurality of clusters with the subset of the plurality of hash vectors and the hash vector.  (Das; Para(s). [0090]: deriving the fundamental matrix based on these putative matches between the source image's set of features and the target image's set of features may be carried out using any of various techniques, one example of which may take the form of a Random Sample Consensus (RANSAC) technique that iterates through many different subsets (i.e. initializing a second cluster of the plurality of clusters with the subset of the plurality of hash vectors and the hash vector) of the putative matches in order to identify the “best” fundamental matrix in the presence of outliers)
Das is combinable with Pan and Shabi because all are from the same field of endeavor of feature comparison methods. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Pan and Shabi to incorporate iterative comparison of subset features as in Das in order to expand the functionality of the system by providing a means by which comparison may be iterated over desired groups.

Claims 9, 15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Pan (US 20220012526 A1), in view of Shabi (US 20230113436 A1), as applied to claims 1, 10 and 16, further in view of Pan (US 20180096144 A1) hereinafter “Pan B”.
Regarding claim 9, Pan and Shabi teach the limitations of claim 1.
Pan and Shabi do not teach the limitations of claim 9 as follows:
The method of claim 1, wherein the plurality of feature sets comprises two or more of named functions features, unnamed functions features, function categories features, referenced strings features, and non-referenced strings features.  
However, in the same field of endeavor, Pan B discloses the limitations of claim 9 as follows:
The method of claim 1, wherein the plurality of feature sets comprises two or more of named functions features, unnamed functions features, function categories features, referenced strings features, and non-referenced strings features.  (Pan B; Para(s). [0006]-[0008] & [0012]: finding character string features between the character strings having close correlations to each of the key character strings, and acquiring the malicious code rule according to the character string features (i.e. referenced strings features, and non-referenced strings features))
Pan B is combinable with Pan and Shabi because all are from the same field of endeavor of feature comparison methods. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Pan and Shabi to incorporate comparison of character string features as in Pan B in order to expand the functionality of the system by providing a means by which alternative features may be examined for comparison.

Regarding claim 15, Pan and Shabi teach the limitations of claim 10.
Pan and Shabi do not teach the limitations of claim 15 as follows:
The non-transitory machine-readable medium of claim 10, wherein the plurality of binary file artifacts comprises two or more of named functions, non-named functions, function types, referenced strings, and non-referenced strings in assembly code of binary files.  
However, in the same field of endeavor, Pan B discloses the limitations of claim 15 as follows:
The non-transitory machine-readable medium of claim 10, wherein the plurality of binary file artifacts comprises two or more of named functions, non-named functions, function types, referenced strings, and non-referenced strings in assembly code of binary files.  (Pan B; Para(s). [0006]-[0008] & [0012]: finding character string features between the character strings having close correlations to each of the key character strings, and acquiring the malicious code rule according to the character string features (i.e. referenced strings features, and non-referenced strings features))
Pan B is combinable with Pan and Shabi because all are from the same field of endeavor of feature comparison methods. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Pan and Shabi to incorporate comparison of character string features as in Pan B in order to expand the functionality of the system by providing a means by which alternative features may be examined for comparison.

Regarding claim 20, Pan and Shabi teach the limitations of claim 16.
Pan and Shabi do not teach the limitations of claim 20 as follows:
The apparatus of claim 16 wherein the artifacts of the first binary file comprise two or more of named functions, non-named functions, function types, referenced strings, and non-referenced strings in assembly code of binary files. 
However, in the same field of endeavor, Pan B discloses the limitations of claim 20 as follows:
The apparatus of claim 16 wherein the artifacts of the first binary file comprise two or more of named functions, non-named functions, function types, referenced strings, and non-referenced strings in assembly code of binary files. (Pan B; Para(s). [0006]-[0008] & [0012]: finding character string features between the character strings having close correlations to each of the key character strings, and acquiring the malicious code rule according to the character string features (i.e. referenced strings features, and non-referenced strings features))
Pan B is combinable with Pan and Shabi because all are from the same field of endeavor of feature comparison methods. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Pan and Shabi to incorporate comparison of character string features as in Pan B in order to expand the functionality of the system by providing a means by which alternative features may be examined for comparison.

Prior Art Considered But Not Relied Upon
Rose (US 20210319709 A1) which teaches a method of image feature comparison to automatically compare and match the centroid and/or extracted features of the transformed image to a database of airport lighting positions, including lighting positions at the landing site (e.g., using a random sample consensus operation, using an iterative closest point operation, etc.).
Lee (US 20170041793 A1) which teaches authorizing a device based on comparison of feature activation keys.

Conclusion
For the above-stated reasons, claims 1-20 are rejected.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BLAKE ISAAC NARRAMORE whose telephone number is (303)297-4357.  The examiner can normally be reached on Monday - Friday 0700-1700 MT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Taghi T Arani can be reached on (571) 272-3787.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/B.I.N./Examiner, Art Unit 2438
/TAGHI T ARANI/Supervisory Patent Examiner, Art Unit 2438
Prosecution Timeline

Apr 21, 2025: Applicant Interview (Telephonic)
Apr 22, 2025: Response Filed
Jun 28, 2025: Final Rejection (§102, §103)
Oct 02, 2025: Notice of Allowance
Nov 26, 2025: Response after Non-Final Action
Nov 28, 2025: Response after Non-Final Action
Jan 10, 2026: Response after Non-Final Action
Mar 04, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

17/983,868 (Patent 12567986): Performing secure data interactions in a distributed network. 3y 3m to grant; granted Mar 03, 2026.
17/873,127 (Patent 12530458): LOCAL LEDGER BLOCK CHAIN FOR SECURE ELECTRONIC CONTROL UNIT UPDATES. 3y 6m to grant; granted Jan 20, 2026.
18/065,261 (Patent 12530474): METHOD FOR PROVING DEVICE IDENTITY TO SECURITY BROKERS. 3y 1m to grant; granted Jan 20, 2026.
18/003,265 (Patent 12526137): Method for Saving Ciphertext and Apparatus. 3y 0m to grant; granted Jan 13, 2026.
17/413,530 (Patent 12518059): DEVICE AND METHOD TO CONTROL ACCESS TO PROTECTED FUNCTIONALITY OF APPLICATIONS. 4y 7m to grant; granted Jan 06, 2026.
Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 78%
With Interview (+24.4%): 99%
Median Time to Grant: 2y 9m (~0m remaining)
PTA Risk: Moderate
Based on 162 resolved cases by this examiner. Grant probability derived from career allowance rate.