DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 4-5, 7-9, 11-12, 14-16 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Bender et al. (US PGPUB No. 2021/0294970; Pub. Date: Sep. 23, 2021) in view of Powles et al. (US PGPUB No. 2022/0391627; Pub. Date: Dec. 8, 2022).
Regarding independent claim 1,
Bender discloses a method, comprising: receiving a plurality of files; See Paragraph [0125], (Disclosing a system for processing natural-language text documents by mapping n-grams from concepts present in said documents for comparison using vector comparison techniques. Embodiments may obtain documents, tagged media files, data generated from media files and leverage an NLP system, NLU system or AI system to create a structured knowledge base of a knowledge fabric usable to provide data in response to queries, i.e. receiving a plurality of files;)
and creating, via a second parser, a high-level feature vector comprising a second plurality of values, each representing a corresponding one of a plurality of high-level features identified in the file; See Paragraph [0039], (The system may identify a vertex of an ontology graph based on a first embedding vector. Note [0061] wherein embedding vectors include a set of values in an embedding space for each n-gram in a document, i.e. creating, via a second parser, a high-level feature vector comprising a second plurality of values, each representing a corresponding one of a plurality of high-level features identified in the file;)
and creating, during a training workflow of a neural network model, a similarity space comprising a plurality of embedding vectors each corresponding to the respective pair of feature vectors for each of the received plurality of files, See Paragraph [0063], (A trained neural network is used to determine relationships between different n-grams or other values represented by ontology vertices. Note [0039] wherein a first embedding vector may be used to determine a closest embedding vector in an embedding space such as by determining a distance between the first embedding vector and a second embedding vector.) See Paragraph [0087], (Embedding vectors may correspond to different clusters associated with different contexts and domains. Note [0171] wherein the system may determine associations between concepts based on a shared set of n-grams, a shared set of documents, etc., i.e. creating, during a training workflow of a neural network model, a similarity space comprising a plurality of embedding vectors each corresponding to the respective pair of feature vectors for each of the received plurality of files (e.g. a trained neural network is used to determine relationships between n-grams wherein the n-grams are represented as vectors).)
wherein a proximity of two of the plurality of embedding vectors in the similarity space is based on a proximity of respective high-level feature vectors for a corresponding two of the received plurality of files. See Paragraph [0039], (An embedding space may be determined based on a first embedding vector. The system may determine a distance between two embedding vectors and select a vertex based on the distance satisfying a distance threshold, i.e. wherein a proximity of two of the plurality of embedding vectors in the similarity space is based on a proximity of respective high-level feature vectors for a corresponding two of the received plurality of files.)
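By way of illustration only, the distance-thresholded embedding comparison that Bender describes at [0039] may be sketched as follows; this is a minimal Python sketch, not Bender's implementation, and the function name, threshold value, and toy vectors are all hypothetical:

```python
import numpy as np

def select_closest_vertex(query_vec, candidate_vecs, threshold):
    # Compute the Euclidean distance from the query embedding to each
    # candidate embedding, as in the distance comparison of Bender [0039].
    distances = [np.linalg.norm(query_vec - c) for c in candidate_vecs]
    best = int(np.argmin(distances))
    # Select the closest vertex only if its distance satisfies the threshold.
    return best if distances[best] <= threshold else None

# Toy example: two candidate embeddings in a 3-dimensional embedding space.
query = np.array([1.0, 0.0, 0.0])
candidates = [np.array([0.9, 0.1, 0.0]), np.array([0.0, 1.0, 0.0])]
print(select_closest_vertex(query, candidates, threshold=0.5))  # -> 0
```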
Bender does not disclose the step of creating a respective pair of feature vectors for each of the received plurality of files, comprising: creating, via a first parser, a low-level feature vector comprising a first plurality of values, each representing a corresponding one of a plurality of low-level features identified in the file;
Powles discloses the step of creating a respective pair of feature vectors for each of the received plurality of files, comprising: creating, via a first parser, a low-level feature vector comprising a first plurality of values, each representing a corresponding one of a plurality of low-level features identified in the file; See Paragraphs [0042]-[0043], (Disclosing a system for generating 3-D virtual representations of a building construction structure. The system may process vector space data which may include low-level vector spaces and high-level vector spaces. Note [0349] Data objects may be converted into feature low-level vector space representations via a feature vector space model, i.e. creating a respective pair of feature vectors for each of the received plurality of files, comprising: creating, via a first parser, a low-level feature vector comprising a first plurality of values, each representing a corresponding one of a plurality of low-level features identified in the file (e.g. the system may process a received document such as a PDF file, image file, etc.);)
Bender and Powles are analogous art because they are in the same field of endeavor, document processing. It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Bender to include the method of generating low-level and high-level vectors representing information contained in a plurality of documents, as disclosed by Powles. Paragraphs [0153]-[0161] of Powles disclose that the system may allow for fast comparison of documents such as architectural plans based on any desired parameter.
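By way of illustration only, the claimed pair of parsers may be sketched as follows; this is a minimal Python sketch under the combination's teachings, and the parser functions, feature choices, and sample file contents are all hypothetical:

```python
import numpy as np

def low_level_parser(text):
    # Hypothetical first parser: coarse lexical statistics serve as the
    # low-level feature values (character count, token count, digit count).
    tokens = text.split()
    return np.array([len(text), len(tokens),
                     sum(ch.isdigit() for ch in text)], dtype=float)

def high_level_parser(text):
    # Hypothetical second parser: indicator values for higher-level
    # concepts, approximated here by keyword presence.
    concepts = ["floor", "wall", "door"]
    return np.array([float(c in text.lower()) for c in concepts])

# Each received file yields a respective pair of feature vectors.
files = ["Floor plan with three doors", "Wall section detail 12"]
pairs = [(low_level_parser(f), high_level_parser(f)) for f in files]
for low, high in pairs:
    print(low, high)
```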
Regarding dependent claim 2,
As discussed above with claim 1, Bender-Powles discloses all of the limitations.
Powles further discloses the step of identifying, via the first parser, the plurality of low-level features in the file; See Paragraph [0349], (A data object detection and recognition algorithm may convert object data into feature low-level vector space representations via a feature vector space model, i.e. identifying, via the first parser, the plurality of low-level features in the file;)
and identifying, via the second parser, the plurality of high-level features in the file. See Paragraph [0352], (The system may generate high-level feature vectors via a one-shot learning process. Note [0145] wherein the one-shot learning process is used in combination with machine learning algorithms in order to recognize objects on a building construction plan, i.e. identifying, via the second parser, the plurality of high-level features in the file.)
Regarding dependent claim 4,
As discussed above with claim 1, Bender-Powles discloses all of the limitations.
Powles further discloses the step wherein creating, during the training workflow of the neural network model, the similarity space comprising the plurality of embedding vectors each corresponding to the respective pair of feature vectors for each of the received plurality of files, See Paragraphs [0042]-[0043], (The system may process vector space data which may include low-level vector spaces and high-level vector spaces.) See Paragraph [0349], (Data objects may be converted into feature low-level vector space representations via a feature vector space model.) See Paragraph [0373], (A vectorization process is performed on identified objects and applied to a set of identified features in order to provide a high-level vector space of concepts relating to data items present in the processed document(s), i.e. wherein creating, during the training workflow of the neural network model, the similarity space comprising the plurality of embedding vectors each corresponding to the respective pair of feature vectors for each of the received plurality of files)
Additionally, Bender further discloses the step wherein the proximity of two of the plurality of embedding vectors in the similarity space is based on the proximity of respective high-level feature vectors for the corresponding two of the received plurality of files, comprises: creating, during the training workflow of the neural network model, an initial similarity space comprising the plurality of embedding vectors each corresponding to a respective low-level feature vector for each of the received plurality of files, wherein an initial proximity of two of the plurality of embedding vectors in the initial similarity space is based on a proximity of the corresponding low-level feature vectors for a corresponding two of the received plurality of files; See Paragraph [0063], (A trained neural network is used to determine relationships between different n-grams or other values represented by ontology vertices. Note [0039] wherein a first embedding vector may be used to determine a closest embedding vector in an embedding space such as by determining a distance between the first embedding vector and a second embedding vector, i.e. wherein an initial proximity of two of the plurality of embedding vectors in the initial similarity space is based on a proximity of the corresponding low-level feature vectors for a corresponding two of the received plurality of files;) See Paragraph [0087], (Embedding vectors may correspond to different clusters associated with different contexts and domains. Note [0171] wherein the system may determine associations between concepts based on a shared set of n-grams, a shared set of documents, etc., i.e. creating, during the training workflow of the neural network model, the initial similarity space comprising the plurality of embedding vectors each corresponding to a respective low-level feature vector for each of the received plurality of files (e.g. a trained neural network is used to determine relationships between n-grams wherein the n-grams are represented as vectors).)
and transforming the initial similarity space into the similarity space by adjusting, based on the proximity of the respective high-level feature vectors for the corresponding two of the received plurality of files, the initial proximity of two of the plurality of embedding vectors in the initial similarity space to yield the proximity of two of the plurality of embedding vectors in the similarity space. See Paragraph [0076], (The system may employ K-means clustering after determining an initial set of centroids of vectors in a multi-sense embedding space. A set of pairwise distances between the set of neighboring vertices and the centroid in the embedding space is used to determine an initial set of centroids, which are then re-computed based on the set of neighboring vertices, i.e. transforming the initial similarity space into the similarity space by adjusting, based on the proximity of the respective high-level feature vectors for the corresponding two of the received plurality of files, the initial proximity of two of the plurality of embedding vectors in the initial similarity space to yield the proximity of two of the plurality of embedding vectors in the similarity space.)
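By way of illustration only, the centroid re-computation described in Bender at [0076] corresponds to a standard K-means iteration, sketched below in Python; the array shapes and random seed are hypothetical:

```python
import numpy as np

def kmeans_step(vectors, centroids):
    # Pairwise Euclidean distances between each embedding vector and each
    # centroid (cf. the pairwise distances of Bender [0076]).
    dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)
    # Re-compute each centroid as the mean of its assigned vectors.
    new_centroids = np.array([
        vectors[assignments == k].mean(axis=0) if np.any(assignments == k)
        else centroids[k]
        for k in range(len(centroids))])
    return new_centroids, assignments

rng = np.random.default_rng(0)
vecs = rng.normal(size=(10, 4))      # ten embedding vectors
initial = vecs[:2].copy()            # initial set of centroids
centroids, labels = kmeans_step(vecs, initial)
print(labels)
```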
Regarding dependent claim 5,
As discussed above with claim 1, Bender-Powles discloses all of the limitations.
Bender further discloses the step wherein the proximity of the respective high-level feature vectors for the corresponding two of the received plurality of files approximates a distance between the respective high-level feature vectors for the corresponding two of the received plurality of files. See Paragraph [0039], (An embedding space may be determined based on a first embedding vector. The system may determine a distance between two embedding vectors and select a vertex based on the distance satisfying a distance threshold, i.e. wherein the proximity of the respective high-level feature vectors for the corresponding two of the received plurality of files approximates a distance between the respective high-level feature vectors for the corresponding two of the received plurality of files.)
Regarding dependent claim 7,
As discussed above with claim 4, Bender-Powles discloses all of the limitations.
Bender further discloses the step wherein the initial proximity of two of the plurality of embedding vectors in the initial similarity space approximates a Euclidian distance between the two of the plurality of embedding vectors in the initial similarity space; See Paragraph [0202], (The system may update an embedding vector or its distance based on an association between vertices corresponding with the ontology graphs. Note [0175] wherein distance metrics may be calculated as Euclidean distances in a domain category value space, i.e. wherein the initial proximity of two of the plurality of embedding vectors in the initial similarity space approximates a Euclidian distance between the two of the plurality of embedding vectors in the initial similarity space;)
wherein transforming the initial similarity space into the similarity space by adjusting, based on the proximity of the respective high-level feature vectors for the corresponding two of the received plurality of files, the initial proximity of two of the plurality of embedding vectors in the initial similarity space to yield the proximity of two of the plurality of embedding vectors in the similarity space comprises adjusting, based on the proximity of the respective high-level feature vectors for the corresponding two of the received plurality of files, the Euclidian distance between the two of the plurality of embedding vectors in the initial similarity space to yield the proximity of two of the plurality of embedding vectors in the similarity space. See Paragraph [0038], (The system may identify a vertex of an ontology graph based on shared n-grams which may be mapped to different learned representations which includes embedding vectors.) See Paragraph [0202], (The system may update an embedding vector or its distance based on an association between vertices corresponding with the ontology graphs. Note [0175] wherein distance metrics may be calculated as Euclidean distances in a domain category value space, i.e. adjusting, based on the proximity of the respective high-level feature vectors for the corresponding two of the received plurality of files, the Euclidian distance between the two of the plurality of embedding vectors in the initial similarity space to yield the proximity of two of the plurality of embedding vectors in the similarity space (e.g. vector distances may be calculated as a Euclidean distance which may be updated).)
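By way of illustration only, adjusting an initial Euclidean distance based on a second, high-level distance may be sketched as a simple weighted blend; the weighting scheme below is hypothetical and is offered only to make the claimed transformation concrete:

```python
import numpy as np

def adjusted_distance(v1, v2, high1, high2, weight=0.5):
    # Initial proximity: Euclidean distance in the initial similarity space.
    initial = np.linalg.norm(v1 - v2)
    # Proximity of the corresponding high-level feature vectors.
    high_level = np.linalg.norm(high1 - high2)
    # Adjust the initial distance toward the high-level distance.
    return (1 - weight) * initial + weight * high_level

v1, v2 = np.array([0.0, 0.0]), np.array([3.0, 4.0])
h1, h2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])
print(adjusted_distance(v1, v2, h1, h2))  # 0.5 * 5.0 + 0.5 * 1.0 = 3.0
```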
Regarding independent claim 8,
The claim is analogous to the subject matter of independent claim 1 directed to a computer system and is rejected under similar rationale.
Regarding dependent claim 9,
The claim is analogous to the subject matter of dependent claim 2 directed to a computer system and is rejected under similar rationale.
Regarding dependent claim 11,
The claim is analogous to the subject matter of dependent claim 4 directed to a computer system and is rejected under similar rationale.
Regarding dependent claim 12,
The claim is analogous to the subject matter of dependent claim 5 directed to a computer system and is rejected under similar rationale.
Regarding dependent claim 14,
The claim is analogous to the subject matter of dependent claim 7 directed to a computer system and is rejected under similar rationale.
Regarding independent claim 15,
The claim is analogous to the subject matter of independent claim 1 directed to a non-transitory, computer readable medium and is rejected under similar rationale.
Regarding dependent claim 16,
The claim is analogous to the subject matter of dependent claim 2 directed to a non-transitory, computer readable medium and is rejected under similar rationale.
Regarding dependent claim 18,
The claim is analogous to the subject matter of dependent claim 4 directed to a non-transitory, computer readable medium and is rejected under similar rationale.
Regarding dependent claim 19,
The claim is analogous to the subject matter of dependent claim 5 directed to a non-transitory, computer readable medium and is rejected under similar rationale.
Claims 3, 10 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Bender in view of Powles as applied to claim 1 above, and further in view of Sherman et al. (US PGPUB No. 2022/0415051; Pub. Date: Dec. 29, 2022).
Regarding dependent claim 3,
As discussed above with claim 1, Bender-Powles discloses all of the limitations.
Powles further discloses the step of storing the similarity space comprising the plurality of embedding vectors in a vector database; See Paragraph [0376], (The system may group a plurality of feature vectors representing architectural metadata into a high-level database, i.e. storing the similarity space comprising the plurality of embedding vectors in a vector database;)
creating a feature vector for the new file, comprising a plurality of low-level features in the new file; See Paragraph [0349], (Each object identified by the object detection and recognition algorithm is converted into feature low-level vector space representations using a feature vector space model, i.e. creating a feature vector for the new file, comprising a plurality of low-level features in the new file;)
Additionally, Bender further discloses the step of receiving a new file; See FIG. 3 & Paragraph [0057], (FIG. 3 illustrates method 300 comprising step 304 wherein the system may receive a corpus of text including documents from various sources, i.e. receiving a new file;)
computing, during an inference workflow of the neural network model, a new embedding vector in the similarity space corresponding to the feature vector for the new file, See FIG. 3 & Paragraph [0061], (Method 300 comprises step 308 of determining a learned representation of n-grams based on the received documents. A learned representation may include a set of embedding vectors. Note [0064] wherein the neural network model generates the embedding vectors, i.e. computing, during an inference workflow of the neural network model, a new embedding vector in the similarity space corresponding to the feature vector for the new file)
wherein a proximity of the new embedding vector to any one of the plurality of embedding vectors in the similarity space is based on a proximity of the feature vectors for the new file and a corresponding any one of the received plurality of files; See Paragraph [0063], (A trained neural network is used to determine relationships between different n-grams or other values represented by ontology vertices. Note [0039] wherein a first embedding vector may be used to determine a closest embedding vector in an embedding space such as by determining a distance between the first embedding vector and a second embedding vector.) See Paragraph [0087], (Embedding vectors may correspond to different clusters associated with different contexts and domains. Note [0171] wherein the system may determine associations between concepts based on a shared set of n-grams, a shared set of documents, etc., i.e. wherein a proximity of the new embedding vector to any one of the plurality of embedding vectors in the similarity space is based on a proximity of the feature vectors for the new file and a corresponding any one of the received plurality of files;)
Bender-Powles does not disclose the step of querying the vector database to output an indication about the new file based on the proximity of the new embedding vector for the new file in the similarity space to the plurality of embedding vectors for the received plurality of files in the similarity space.
Sherman discloses the step of querying the vector database to output an indication about the new file based on the proximity of the new embedding vector for the new file in the similarity space to the plurality of embedding vectors for the received plurality of files in the similarity space. See Paragraph [0049], (Disclosing a system for obtaining image data from a sensor. The system may perform search operations over a plurality of feature vectors stored in a custom vector database for similarity search which includes similarity metric calculations such as Euclidian distance, inner product, Hamming distance, Jaccard distance, cosine similarity, etc. Note [0070] wherein an extraction engine may store vectors in a database including newly generated vectors. Control unit 310 may obtain stored vectors and compare said vectors with newly generated vectors having the same one or more features, i.e. querying the vector database to output an indication about the new file based on the proximity of the new embedding vector for the new file in the similarity space to the plurality of embedding vectors for the received plurality of files in the similarity space.)
Bender, Powles and Sherman are analogous art because they are in the same field of endeavor, document processing. It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Bender-Powles to include the method of performing search operations over a corpus of vectors in a vector database, as disclosed by Sherman. Paragraph [0049] of Sherman discloses that the search process includes one or more components to improve operation accuracy and efficiency. These benefits are achieved via the use of a custom vector database for performing similarity searches.
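By way of illustration only, the similarity search over a vector database described in Sherman at [0049] may be sketched as follows; the in-memory dictionary standing in for the database, the metric names, and the file identifiers are all hypothetical:

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# A few of the similarity metrics Sherman [0049] enumerates; smaller = closer.
METRICS = {
    "euclidean": lambda a, b: np.linalg.norm(a - b),
    "inner_product": lambda a, b: -np.dot(a, b),
    "cosine": cosine_distance,
}

def query_vector_store(store, new_vec, metric="euclidean", k=1):
    # Rank stored embedding vectors by proximity to the new embedding
    # vector and return the k nearest file identifiers.
    ranked = sorted(store.items(),
                    key=lambda item: METRICS[metric](item[1], new_vec))
    return [file_id for file_id, _ in ranked[:k]]

store = {"file_a": np.array([1.0, 0.0]), "file_b": np.array([0.0, 1.0])}
print(query_vector_store(store, np.array([0.9, 0.1]), metric="cosine"))
# -> ['file_a']
```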
Regarding dependent claim 10,
The claim is analogous to the subject matter of dependent claim 3 directed to a computer system and is rejected under similar rationale.
Regarding dependent claim 17,
The claim is analogous to the subject matter of dependent claim 3 directed to a non-transitory, computer readable medium and is rejected under similar rationale.
Claims 6, 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Bender in view of Powles as applied to claim 1 above, and further in view of Ho (US PGPUB No. 2022/0351089; Pub. Date: Nov. 3, 2022).
Regarding dependent claim 6,
As discussed above with claim 1, Bender-Powles discloses all of the limitations.
Bender-Powles does not disclose the step wherein the distance between the respective high-level feature vectors for the corresponding two of the received plurality of files is based on one of a hamming loss, a Jaccard ratio, and a mean square error (MSE), calculated for the respective high-level feature vectors for the corresponding two of the received plurality of files.
Ho discloses the step wherein the distance between the respective high-level feature vectors for the corresponding two of the received plurality of files is based on one of a hamming loss, a Jaccard ratio, and a mean square error (MSE), calculated for the respective high-level feature vectors for the corresponding two of the received plurality of files. See Paragraph [0031], (Disclosing a method for segmenting text using a machine learning model for language processing. Training the model includes the use of a noise contrastive estimator (NCE) loss function which includes similarity functions for determining similarity between sentence or sentence fragment pairings based on embedding vectors provided by an encoder function. Examples of determining similarity in text-based documents include Jaccard distance, Cosine distance, Euclidean distance, Relaxed Word Mover's Distance, i.e. wherein the distance between the respective high-level feature vectors for the corresponding two of the received plurality of files is based on a Jaccard ratio.)
Bender, Powles and Ho are analogous art because they are in the same field of endeavor, document processing. It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Bender-Powles to include the method of determining document similarity as disclosed by Ho. Paragraphs [0019]-[0020] of Ho disclose that the process represents an improvement in the field of text segmentation by grouping sentences of an input text into coherent paragraphs, and said paragraphs may be further grouped into topically consistent sections such that the system may appropriately segment unstructured text inputs, making the documents and information easier for a person to understand.
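By way of illustration only, the three distance measures recited in the claim may be computed as follows for a pair of high-level feature vectors; the sample vectors below are hypothetical:

```python
import numpy as np

def hamming_loss(a, b):
    # Fraction of positions at which two binary vectors differ.
    return float(np.mean(a != b))

def jaccard_ratio(a, b):
    # Intersection over union of the active (nonzero) features.
    union = np.sum(np.logical_or(a, b))
    return float(np.sum(np.logical_and(a, b)) / union) if union else 1.0

def mse(a, b):
    # Mean square error between two feature vectors.
    return float(np.mean((a - b) ** 2))

h1 = np.array([1, 0, 1, 1])
h2 = np.array([1, 1, 0, 1])
print(hamming_loss(h1, h2), jaccard_ratio(h1, h2), mse(h1, h2))
# -> 0.5 0.5 0.5
```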
Regarding dependent claim 13,
The claim is analogous to the subject matter of dependent claim 6 directed to a computer system and is rejected under similar rationale.
Regarding dependent claim 20,
The claim is analogous to the subject matter of dependent claim 6 directed to a non-transitory, computer readable medium and is rejected under similar rationale.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Fernando M Mari whose telephone number is (571)272-2498. The examiner can normally be reached Monday-Friday 7am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J. Lo can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/FMMV/Examiner, Art Unit 2159
/ANN J LO/Supervisory Patent Examiner, Art Unit 2159