Prosecution Insights
Last updated: April 19, 2026
Application No. 18/756,540

TECHNIQUES FOR DETECTING FILE SIMILARITY

Non-Final OA (§103, §112)
Filed
Jun 27, 2024
Examiner
ADAMS, CHARLES D
Art Unit
2152
Tech Center
2100 — Computer Architecture & Software
Assignee
CrowdStrike, Inc.
OA Round
3 (Non-Final)
44%
Grant Probability
Moderate
3-4
OA Rounds
5y 1m
To Grant
88%
With Interview

Examiner Intelligence

Grants 44% of resolved cases
44%
Career Allow Rate
187 granted / 423 resolved
-10.8% vs TC avg
Strong +44% interview lift
+44.2%
Interview Lift
(allow rate among resolved cases, with vs. without interview)
Typical timeline
5y 1m
Avg Prosecution
32 currently pending
Career history
455
Total Applications
across all art units

Statute-Specific Performance

§101
21.4%
-18.6% vs TC avg
§103
53.3%
+13.3% vs TC avg
§102
12.3%
-27.7% vs TC avg
§112
9.3%
-30.7% vs TC avg
Black line = Tech Center average estimate • Based on career data from 423 resolved cases

Office Action

§103 §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 5 February 2026 has been entered.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1, 8, and 15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Claims 1, 8, and 15 introduce an element of “training, over a plurality of steps, a machine learning (ML) model… .” The claims then comprise a step of “providing to a machine learning (ML) model, a set of files…” It is unclear whether this refers to the same machine learning model or to different machine learning models. It also renders subsequent recitations of the element “the ML model” unclear, because both “ML models” were introduced using the same language.
Claims 8 and 15 contain a limitation wherein “the set of feature vectors is grouped based on a hierarchy of characteristics.” The element “a hierarchy of characteristics” had already been introduced previously in the claims. It is unclear whether this element refers to the same or a different hierarchy of characteristics.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 8-12, and 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (US Pre-Grant Publication 2021/0294840), in view of Park et al. (US Pre-Grant Publication 2024/0160890), and further in view of Larkin et al. (US Pre-Grant Publication 2023/0057414).
As to claim 1, Lee teaches a method comprising: … providing to a machine learning (ML) model, a set of files, wherein the ML model is configured to generate, based on the set of files, a feature vector database comprising a set of feature vectors, wherein each of the set of feature vectors corresponds to a particular file of the set of files and wherein the set of feature vectors is grouped based on … characteristics (see Lee paragraphs [0024]-[0025]. A set of music files may be used to train a neural network. Feature vectors of the music files are generated. Each of the feature vectors is generated from a music file. The feature vectors may be grouped based on characteristics into subspaces corresponding to musical attributes); in response to receiving a query file to be compared to the set of files, processing the query file using the ML model to generate a query feature vector (see Lee paragraphs [0026]-[0027]. The user may supply a query music file to the system. The system will generate a feature vector of the query music file); and querying, by a processing device, the feature vector database using the query feature vector to identify one or more of the set of files that are similar to the query file (see Lee paragraphs [0027]-[0028]. Based on the feature vector, the system will search for music files similar to the query music file. As noted in paragraph [0028], the searching compares the query music file vector to the stored feature vectors).
Lee does not explicitly teach: training, over a plurality of steps, a machine learning model to group files based on a hierarchy of characteristics, wherein at each of the plurality of steps, the ML model is trained to group files iteratively, and wherein at each progressive iteration the ML model learns to group files based on a characteristic from the hierarchy of characteristics that is progressively lower on the hierarchy of characteristics; wherein the set of feature vectors is grouped based on a hierarchy of characteristics.

Park teaches: training, over a plurality of steps, a machine learning model to group files based on … characteristics, wherein at each of the plurality of steps, the ML model is trained to group files iteratively, and wherein at each progressive iteration the ML model learns to group files based on a particular characteristic from the … characteristics (see paragraph [0123]. Park teaches to cluster (or group) nodes (or files) based on characteristics).

It would have been obvious to one of ordinary skill in the art before the earliest filing date of the invention to have modified Lee by the teachings of Park, because both references are directed towards training data. Park merely adds to Lee an additional method of training the data, which will help to identify groups of entities within an entity network based on relationships between the entities. The learning process of Park will help to categorize data in a more accurate manner (see Park paragraph [0032]).

Larkin teaches: training, over a plurality of steps, a machine learning model to [match] files based on a hierarchy of characteristics, wherein at each of the plurality of steps, the ML model is trained to group files iteratively (see Larkin paragraphs [0043] and [0094]. Larkin relies upon a hierarchy of match models, such that each iteration is associated with a different matching model of a hierarchical string matching machine learning framework), and wherein at each progressive iteration the ML model learns to group files based on a characteristic from the hierarchy of characteristics that is progressively lower on the hierarchy of characteristics (see Larkin paragraphs [0043] and [0094]-[0096]. A sequence of progressively “lower” match models in a hierarchy is used with each iteration. It is noted that Applicant neither defines “lower” nor provides any details regarding the nature of the hierarchy of characteristics); wherein the set of feature vectors is grouped based on the hierarchy of characteristics (see paragraphs [0043] and [0094]-[0096] for a hierarchy of characteristics. It is noted that Lee paragraphs [0024]-[0025] teach to calculate feature vectors based on characteristics. Larkin is simply relied upon to show wherein those characteristics used by a machine learning model may be based on a hierarchy).

It would have been obvious to one of ordinary skill in the art before the earliest filing date of the invention to have modified Lee by the teachings of Larkin, because both references are directed towards searching for files using feature vectors. Larkin merely adds to Lee an additional method of recognizing matching data, notably based on a hierarchy of matching characteristics. The process of Larkin will help to improve the predictive accuracy of string-based machine learning models (see Larkin paragraph [0021]).

As to claim 2, Lee as modified by Park teaches the method of claim 1, wherein the ML model is trained using training data comprising a plurality of training data batches, wherein each of the plurality of training data batches comprises a set of training files with a label for each characteristic in the hierarchy of characteristics (see Lee paragraph [0025].
The machine learning model is trained using a plurality of training data. Each of the model subspaces is trained using a different label for each characteristic. Larkin teaches the existence of a hierarchy of characteristics, see paragraphs [0043] and [0094]-[0096]) and wherein training the ML model comprises: at each of the plurality of steps: grouping, using the ML model, a respective training data batch iteratively based on the hierarchy of characteristics to generate an output for each iteration (see Park paragraph [0061] for training based on different clusters. See Park paragraph [0074], which indicates how there are multiple iterations); and for each iteration: analyzing the output with a hierarchical contrastive learning (HCL) loss function to determine a loss value (see Park paragraph [0061]. A contrastive learning loss function is used to identify hierarchical community loss); and adjusting one or more weights of the ML model based at least in part on the loss value (see Park paragraph [0075]. Loss is measured and weights are adjusted in the model).

As to claim 3, Lee as modified by Park teaches the method of claim 2, wherein training the ML model further comprises: for each iteration: analyzing the output with a focal loss function to determine a second loss value (see Park paragraph [0061]. Two samples may be used to calculate a total loss value); and adding the loss value and the second loss value to generate a total loss value, wherein the one or more weights of the ML model are adjusted based on the total loss value (see Park paragraph [0061]. The two samples are added to generate a community loss. As noted in paragraph [0075], node weights may be adjusted to minimize a loss).
As to claim 4, Lee as modified teaches the method of claim 1, wherein querying the feature vector database using the query feature vector comprises: using a nearest neighbors algorithm to identify, from the feature vector database, one or more of the set of feature vectors that are similar to the query feature vector (see Lee paragraph [0160]).

As to claim 5, Lee as modified teaches the method of claim 4, further comprising: for each of the identified one or more feature vectors, retrieving a file from the set of files corresponding to the identified feature vector to obtain the one or more of the set of files that are similar to the query file (see Lee paragraphs [0028]-[0029] and [0160]); and providing the one or more of the set of files that are similar to the query file as a result set (see Lee paragraphs [0028]-[0029] and [0160]).

As to claims 8 and 15, see the rejection of claim 1. As to claims 9 and 16, see the rejection of claim 2. As to claims 10 and 17, see the rejection of claim 3. As to claims 11 and 18, see the rejection of claim 4. As to claims 12 and 19, see the rejection of claim 5.

Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (US Pre-Grant Publication 2021/0294840), in view of Park et al. (US Pre-Grant Publication 2024/0160890), in view of Larkin et al. (US Pre-Grant Publication 2023/0057414), and further in view of Srivastava et al. (US Patent 8,561,193).

As to claim 6, Lee as modified teaches the method of claim 1. Lee does not teach wherein the hierarchy of characteristics comprises: threat type, malware family, subtype, compiler, packer, and library. Srivastava teaches wherein the hierarchy of characteristics comprises: threat type, malware family, subtype, compiler, packer, and library (see Srivastava 5:10-37. Srivastava teaches wherein each of these characteristics may be extracted and recorded as part of a file.
It is noted that Lee extracts characteristics from a file to use when creating feature vectors. Larkin teaches the creation of a hierarchy of features. Srivastava simply shows wherein such features may be related to malware attributes, including those claimed. It is additionally noted that the specific data types of the hierarchy do not appear to functionally change the invention, and that no order for the data types is claimed). It would have been obvious to one of ordinary skill in the art before the earliest filing date of the invention to have modified Lee by the teachings of Srivastava, because both references are directed towards extracting data. Srivastava merely adds to Lee an additional type of data entity that may be categorized and searched for using the system of Lee. This will make the search system of Lee able to respond to additional types of user requests.

As to claims 13 and 20, see the rejection of claim 6.

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (US Pre-Grant Publication 2021/0294840), in view of Park et al. (US Pre-Grant Publication 2024/0160890), in view of Larkin et al. (US Pre-Grant Publication 2023/0057414), and further in view of Alme et al. (US Patent 8,561,193).

As to claim 7, Lee as modified teaches the method of claim 1. Lee does not teach wherein each of the set of files and the query file are portable executable files. Alme teaches wherein each of the set of files and the query file are portable executable files (see Alme paragraphs [0033], [0035] and [0064]. Malicious executable files may be used for training. Additionally, executable files are searched when received). It would have been obvious to one of ordinary skill in the art before the earliest filing date of the invention to have modified Lee by the teachings of Alme, because both references are directed towards extracting data and searching vectors of data.
Alme merely adds to Lee an additional type of data entity that may be categorized and searched for using the system of Lee. This will make the search system of Lee able to respond to additional types of user requests.

As to claim 14, see the rejection of claim 7.

Response to Arguments

Applicant’s arguments with respect to the claims have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES D ADAMS, whose telephone number is (571) 272-3938. The examiner can normally be reached M-F, 9-5:30 EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil, can be reached at 571-270-0474. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CHARLES D ADAMS/
Primary Examiner, Art Unit 2152

Prosecution Timeline

Jun 27, 2024
Application Filed
Jun 12, 2025
Non-Final Rejection — §103, §112
Sep 15, 2025
Response Filed
Nov 01, 2025
Final Rejection — §103, §112
Jan 29, 2026
Applicant Interview (Telephonic)
Jan 30, 2026
Examiner Interview Summary
Feb 05, 2026
Request for Continued Examination
Feb 17, 2026
Response after Non-Final Action
Mar 07, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602392
SCALABLE METADATA-DRIVEN DATA INGESTION PIPELINE
2y 5m to grant • Granted Apr 14, 2026
Patent 12591595
ADAPTIVE SYSTEM FOR PROCESSING DISTRIBUTED DATA FILES AND A METHOD THEREOF
2y 5m to grant • Granted Mar 31, 2026
Patent 12572546
METHODS AND SYSTEMS FOR DISTRIBUTED DATA ANALYSIS
2y 5m to grant • Granted Mar 10, 2026
Patent 12566778
OPTIMIZING JSON STRUCTURE
2y 5m to grant • Granted Mar 03, 2026
Patent 12566706
PROVIDING ROLLING UPDATES OF DISTRIBUTED SYSTEMS WITH A SHARED CACHE
2y 5m to grant • Granted Mar 03, 2026
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
44%
Grant Probability
88%
With Interview (+44.2%)
5y 1m
Median Time to Grant
High
PTA Risk
Based on 423 resolved cases by this examiner. Grant probability derived from career allow rate.
