Prosecution Insights
Last updated: April 19, 2026
Application No. 18/756,540

TECHNIQUES FOR DETECTING FILE SIMILARITY

Non-Final OA (§103, §112)
Filed
Jun 27, 2024
Examiner
ADAMS, CHARLES D
Art Unit
2152
Tech Center
2100 — Computer Architecture & Software
Assignee
CrowdStrike, Inc.
OA Round
3 (Non-Final)
44%
Grant Probability
Moderate
3-4
OA Rounds
5y 1m
To Grant
88%
With Interview

Examiner Intelligence

Grants 44% of resolved cases
44%
Career Allow Rate
187 granted / 423 resolved
-10.8% vs TC avg
Strong +44% interview lift
+44.2%
Interview Lift
(allow rate among resolved cases, with vs. without interview)
Typical timeline
5y 1m
Avg Prosecution
32 currently pending
Career history
455
Total Applications
across all art units

Statute-Specific Performance

§101
21.4%
-18.6% vs TC avg
§103
53.3%
+13.3% vs TC avg
§102
12.3%
-27.7% vs TC avg
§112
9.3%
-30.7% vs TC avg
Black line = Tech Center average estimate • Based on career data from 423 resolved cases

Office Action

§103 §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 5 February 2026 has been entered.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1, 8, and 15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Claims 1, 8, and 15 introduce an element of “training, over a plurality of steps, a machine learning (ML) model… .” The claims then comprise a step of “providing to a machine learning (ML) model, a set of files…” It is unclear whether this refers to the same machine learning model or to different machine learning models. It also renders subsequent recitations of the element “the ML model” unclear, because both “ML models” were introduced using the same language.
Claims 8 and 15 contain a limitation wherein “the set of feature vectors is grouped based on a hierarchy of characteristics.” The element “a hierarchy of characteristics” had already been introduced previously in the claims. It is unclear whether this element refers to the same or a different hierarchy of characteristics.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 8-12, and 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (US Pre-Grant Publication 2021/0294840), in view of Park et al. (US Pre-Grant Publication 2024/0160890), and further in view of Larkin et al. (US Pre-Grant Publication 2023/0057414).
As to claim 1, Lee teaches a method comprising: … providing to a machine learning (ML) model, a set of files, wherein the ML model is configured to generate, based on the set of files, a feature vector database comprising a set of feature vectors, wherein each of the set of feature vectors corresponds to a particular file of the set of files and wherein the set of feature vectors is grouped based on … characteristics (see Lee paragraphs [0024]-[0025]. A set of music files may be used to train a neural network. Feature vectors of the music files are generated. Each of the feature vectors is generated from a music file. The feature vectors may be grouped based on characteristics into subspaces corresponding to musical attributes); in response to receiving a query file to be compared to the set of files, processing the query file using the ML model to generate a query feature vector (see Lee paragraphs [0026]-[0027]. The user may supply a query music file to the system. The system will generate a feature vector of the query music file); and querying, by a processing device, the feature vector database using the query feature vector to identify one or more of the set of files that are similar to the query file (see Lee paragraphs [0027]-[0028]. Based on the feature vector, the system will search for music files similar to the query music file. As noted in paragraph [0028], the searching compares the query music file vector to the stored feature vectors).
Lee does not explicitly teach: training, over a plurality of steps, a machine learning model to group files based on a hierarchy of characteristics, wherein at each of the plurality of steps, the ML model is trained to group files iteratively, and wherein at each progressive iteration the ML model learns to group files based on a characteristic from the hierarchy of characteristics that is progressively lower on the hierarchy of characteristics; wherein the set of feature vectors is grouped based on a hierarchy of characteristics.

Park teaches: training, over a plurality of steps, a machine learning model to group files based on … characteristics, wherein at each of the plurality of steps, the ML model is trained to group files iteratively, and wherein at each progressive iteration the ML model learns to group files based on a particular characteristic from the … characteristics (see paragraph [0123]. Park teaches to cluster (or group) nodes (or files) based on characteristics).

It would have been obvious to one of ordinary skill in the art before the earliest filing date of the invention to have modified Lee by the teachings of Park, because both references are directed towards training data. Park merely adds to Lee an additional method of training the data, which will help to identify groups of entities within an entity network based on relationships between the entities. The learning process of Park will help to categorize data in a more accurate manner (see Park paragraph [0032]).

Larkin teaches: training, over a plurality of steps, a machine learning model to [match] files based on a hierarchy of characteristics, wherein at each of the plurality of steps, the ML model is trained to group files iteratively (see Larkin paragraphs [0043] and [0094]. Larkin relies upon a hierarchy of match models, such that each iteration is associated with a different matching model of a hierarchical string matching machine learning framework), and wherein at each progressive iteration the ML model learns to group files based on a characteristic from the hierarchy of characteristics that is progressively lower on the hierarchy of characteristics (see Larkin paragraphs [0043] and [0094]-[0096]. A sequence of progressively “lower” match models in a hierarchy is used with each iteration. It is noted that Applicant neither defines “lower” nor provides any details regarding the nature of the hierarchy of characteristics); wherein the set of feature vectors is grouped based on the hierarchy of characteristics (see paragraphs [0043] and [0094]-[0096] for a hierarchy of characteristics. It is noted that Lee paragraphs [0024]-[0025] teach to calculate feature vectors based on characteristics. Larkin is simply relied upon to show wherein those characteristics used by a machine learning model may be based on a hierarchy).

It would have been obvious to one of ordinary skill in the art before the earliest filing date of the invention to have modified Lee by the teachings of Larkin, because both references are directed towards searching for files using feature vectors. Larkin merely adds to Lee an additional method of recognizing matching data, notably based on a hierarchy of matching characteristics. The process of Larkin will help to improve the predictive accuracy of string-based machine learning models (see Larkin paragraph [0021]).

As to claim 2, Lee as modified by Park teaches the method of claim 1, wherein the ML model is trained using training data comprising a plurality of training data batches, wherein each of the plurality of training data batches comprises a set of training files with a label for each characteristic in the hierarchy of characteristics (see Lee paragraph [0025].
The machine learning model is trained using a plurality of training data. Each of the model subspaces is trained using a different label for each characteristic. Larkin teaches the existence of a hierarchy of characteristics, see paragraphs [0043] and [0094]-[0096]) and wherein training the ML model comprises: at each of the plurality of steps: grouping, using the ML model, a respective training data batch iteratively based on the hierarchy of characteristics to generate an output for each iteration (see Park paragraph [0061] for training based on different clusters. See Park paragraph [0074], which indicates how there are multiple iterations); and for each iteration: analyzing the output with a hierarchical contrastive learning (HCL) loss function to determine a loss value (see Park paragraph [0061]. A contrastive learning loss function is used to identify hierarchical community loss); and adjusting one or more weights of the ML model based at least in part on the loss value (see Park paragraph [0075]. Loss is measured and weights are adjusted in the model).

As to claim 3, Lee as modified by Park teaches the method of claim 2, wherein training the ML model further comprises: for each iteration: analyzing the output with a focal loss function to determine a second loss value (see Park paragraph [0061]. Two samples may be used to calculate a total loss value); and adding the loss value and the second loss value to generate a total loss value, wherein the one or more weights of the ML model are adjusted based on the total loss value (see Park paragraph [0061]. The two samples are added to generate a community loss. As noted in paragraph [0075], node weights may be adjusted to minimize a loss).
As to claim 4, Lee as modified teaches the method of claim 1, wherein querying the feature vector database using the query feature vector comprises: using a nearest neighbors algorithm to identify, from the feature vector database, one or more of the set of feature vectors that are similar to the query feature vector (see Lee paragraph [0160]).

As to claim 5, Lee as modified teaches the method of claim 4, further comprising: for each of the identified one or more feature vectors, retrieving a file from the set of files corresponding to the identified feature vector to obtain the one or more of the set of files that are similar to the query file (see Lee paragraphs [0028]-[0029] and [0160]); and providing the one or more of the set of files that are similar to the query file as a result set (see Lee paragraphs [0028]-[0029] and [0160]).

As to claims 8 and 15, see the rejection of claim 1. As to claims 9 and 16, see the rejection of claim 2. As to claims 10 and 17, see the rejection of claim 3. As to claims 11 and 18, see the rejection of claim 4. As to claims 12 and 19, see the rejection of claim 5.

Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (US Pre-Grant Publication 2021/0294840), in view of Park et al. (US Pre-Grant Publication 2024/0160890), in view of Larkin et al. (US Pre-Grant Publication 2023/0057414), and further in view of Srivastava et al. (US Patent 8,561,193).

As to claim 6, Lee as modified teaches the method of claim 1. Lee does not teach wherein the hierarchy of characteristics comprises: threat type, malware family, subtype, compiler, packer, and library. Srivastava teaches wherein the hierarchy of characteristics comprises: threat type, malware family, subtype, compiler, packer, and library (see Srivastava 5:10-37. Srivastava teaches wherein each of these characteristics may be extracted and recorded as part of a file.
It is noted that Lee extracts characteristics from a file to use when creating feature vectors. Larkin teaches the creation of a hierarchy of features. Srivastava simply shows wherein such features may be related to malware attributes, including those claimed. It is additionally noted that the specific data types of the hierarchy do not appear to functionally change the invention, and that no order for the data types is claimed). It would have been obvious to one of ordinary skill in the art before the earliest filing date of the invention to have modified Lee by the teachings of Srivastava, because both references are directed towards extracting data. Srivastava merely adds to Lee an additional type of data entity that may be categorized and searched for using the system of Lee. This will make the search system of Lee able to respond to additional types of user requests.

As to claims 13 and 20, see the rejection of claim 6.

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (US Pre-Grant Publication 2021/0294840), in view of Park et al. (US Pre-Grant Publication 2024/0160890), in view of Larkin et al. (US Pre-Grant Publication 2023/0057414), and further in view of Alme et al. (US Patent 8,561,193).

As to claim 7, Lee as modified teaches the method of claim 1. Lee does not teach wherein each of the set of files and the query file are portable executable files. Alme teaches wherein each of the set of files and the query file are portable executable files (see Alme paragraphs [0033], [0035] and [0064]. Malicious executable files may be used for training. Additionally, executable files are searched when received). It would have been obvious to one of ordinary skill in the art before the earliest filing date of the invention to have modified Lee by the teachings of Alme, because both references are directed towards extracting data and searching vectors of data.
Alme merely adds to Lee an additional type of data entity that may be categorized and searched for using the system of Lee. This will make the search system of Lee able to respond to additional types of user requests.

As to claim 14, see the rejection of claim 7.

Response to Arguments

Applicant’s arguments with respect to the claims have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES D ADAMS, whose telephone number is (571) 272-3938. The examiner can normally be reached M-F, 9-5:30 EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil, can be reached at 571-270-0474. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CHARLES D ADAMS/
Primary Examiner, Art Unit 2152

Prosecution Timeline

Jun 27, 2024
Application Filed
Jun 12, 2025
Non-Final Rejection — §103, §112
Sep 15, 2025
Response Filed
Nov 01, 2025
Final Rejection — §103, §112
Jan 29, 2026
Applicant Interview (Telephonic)
Jan 30, 2026
Examiner Interview Summary
Feb 05, 2026
Request for Continued Examination
Feb 17, 2026
Response after Non-Final Action
Mar 07, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602392
SCALABLE METADATA-DRIVEN DATA INGESTION PIPELINE
2y 5m to grant • Granted Apr 14, 2026
Patent 12591595
ADAPTIVE SYSTEM FOR PROCESSING DISTRIBUTED DATA FILES AND A METHOD THEREOF
2y 5m to grant • Granted Mar 31, 2026
Patent 12572546
METHODS AND SYSTEMS FOR DISTRIBUTED DATA ANALYSIS
2y 5m to grant • Granted Mar 10, 2026
Patent 12566778
OPTIMIZING JSON STRUCTURE
2y 5m to grant • Granted Mar 03, 2026
Patent 12566706
PROVIDING ROLLING UPDATES OF DISTRIBUTED SYSTEMS WITH A SHARED CACHE
2y 5m to grant • Granted Mar 03, 2026
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
44%
Grant Probability
88%
With Interview (+44.2%)
5y 1m
Median Time to Grant
High
PTA Risk
Based on 423 resolved cases by this examiner. Grant probability derived from career allow rate.
