Prosecution Insights
Last updated: April 19, 2026
Application No. 19/196,767

SYSTEM AND METHOD FOR CLASSIFICATION AND RECLASSIFICATION OF STRUCTURED AND UNSTRUCTURED DATA USING SIMILARITY-BASED SIGNATURES

Status: Non-Final OA (§103)
Filed: May 02, 2025
Examiner: HALE, BROOKS T
Art Unit: 2166
Tech Center: 2100 — Computer Architecture & Software
Assignee: Securiti Inc.
OA Round: 1 (Non-Final)

Grant Probability: 49% (Moderate)
Expected OA Rounds: 1-2
Median Time to Grant: 3y 3m
Grant Probability With Interview: 80%

Examiner Intelligence

Career Allow Rate: 49% (grants 49% of resolved cases; 36 granted / 74 resolved; -6.4% vs TC avg)
Interview Lift: +31.4% (strong lift for resolved cases with interview)
Typical Timeline: 3y 3m avg prosecution; 37 applications currently pending
Career History: 111 total applications across all art units

Statute-Specific Performance

§101: 22.3% (-17.7% vs TC avg)
§103: 61.3% (+21.3% vs TC avg)
§102: 10.1% (-29.9% vs TC avg)
§112: 3.0% (-37.0% vs TC avg)
Tech Center averages are estimates. Based on career data from 74 resolved cases.

Office Action

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Status

Claims 1-20 are pending.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 6-20 are rejected under 35 U.S.C. 103 as being unpatentable over Selman et al. (US 12411896 B1), hereafter Selman, in view of Yan et al. (US 20140089400 A1), hereafter Yan, further in view of Sarrazin et al. (US 20150081369 A1), hereafter Sarrazin.

Regarding claim 1, Selman teaches a system comprising: a processor; and a machine-readable storage medium comprising instructions executable by the processor to: detect, by a pre-trained intelligence model, a plurality of entities within a text document of structured and unstructured data (Para 120, "The data 504 can be in different formats such as structured, unstructured or semi-structured data"); generate, from each of the plurality of entities, multi-level embeddings configured to capture contextual relationships, wherein the embeddings enable calculation of similarity metrics and generation of similarity-based signatures (Para 171, "The contexts may include various combinations of a direct lookup for specific values, vector similarity search results, and/or any other contexts"); cluster the plurality of entities based on the embeddings for at least one of visualization and batch classification (Para 117, "A few examples of ML algorithms include support vector machine (SVM), random forests, naive Bayes, K-means clustering, neural networks, and so forth. A SVM is an algorithm that can be used for both classification and regression problems").

Selman does not appear to explicitly teach: provide, by a user interface, an option for a user to submit feedback on the clustering results, wherein the feedback comprises identification of cluster assignments as one of a true positive and a false positive; and reclassify at least one of the plurality of entities based on user feedback, wherein the reclassification iteratively refines the artificial intelligence model and facilitates adaptive self-calibration of structured and unstructured data management.

In analogous art, Yan teaches provide, by a user interface, an option for a user to submit feedback on the clustering results, wherein the feedback comprises identification of cluster assignments as one of a true positive and a false positive (Para 0034, "Negative feedback, such as an 'X-out' action that indicates the advertisement was repetitive, irrelevant, offensive, or otherwise objectionable to the viewing user, as well as positive feedback in the form of clicking through the advertisements may be used in measuring the performance of these clusters"); and reclassify at least one of the plurality of entities based on user feedback, wherein the reclassification iteratively refines the artificial intelligence model and facilitates adaptive self-calibration of structured and unstructured data management (Para 0023, "feedback information may be used in modifying the scoring model used to determine the inference"). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Selman to include the teaching of Yan. One of ordinary skill in the art would be motivated to implement this modification in order to perform accurate data clustering, as taught by Yan (Para 0005, "The targeting cluster may be tested for accuracy using performance testing").

Selman in view of Yan teaches clustering. However, Selman in view of Yan does not appear to explicitly teach wherein the clustering comprises: a first mode configured to classify the plurality of entities based on header information and data types; and a second mode configured to classify the plurality of entities based on semantic meaning and format characteristics of one or more column data.

In analogous art, Sarrazin teaches wherein the clustering comprises: a first mode configured to classify the plurality of entities based on header information and data types (Para 0085, "the electronic device 100 searches for emails with attendee names listed in the email fields 'To', 'From', 'cc', and 'bcc'"); and a second mode configured to classify the plurality of entities based on semantic meaning and format characteristics of one or more column data (Para 0085, "device 100 may also search for emails with synonyms of the attendee names"). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Selman in view of Yan to include the teaching of Sarrazin. One of ordinary skill in the art would be motivated to implement this modification in order to determine relevance data, as taught by Sarrazin (Abs, "The related files can be ordered or ranked according to confidence values. The files are then displayed as suggestions").
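The claim 1 limitations at issue above turn on embedding entities, computing similarity metrics between the embeddings, and clustering on them. A minimal sketch of threshold-based clustering over embedding vectors is shown below; this is a generic illustration only, and none of the function names, vectors, or the 0.8 threshold come from the application or the cited references.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster_by_threshold(embeddings: list, threshold: float = 0.8) -> list:
    """Greedy single-pass clustering: each embedding joins the first cluster
    whose representative (its first member) is within the similarity
    threshold; otherwise it starts a new cluster."""
    representatives: list = []
    labels = []
    for emb in embeddings:
        for i, rep in enumerate(representatives):
            if cosine_similarity(emb, rep) >= threshold:
                labels.append(i)
                break
        else:
            representatives.append(emb)
            labels.append(len(representatives) - 1)
    return labels

# Two near-duplicate vectors and one orthogonal vector: the first two
# should land in the same cluster, the third in its own.
vecs = [np.array([1.0, 0.0]), np.array([0.99, 0.05]), np.array([0.0, 1.0])]
labels = cluster_by_threshold(vecs)
```

A production system would more likely use k-means or a vector database, as the Selman citations suggest, but the threshold pattern shows the role the similarity metric plays in cluster assignment.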
Regarding claim 2, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 1, wherein the self-calibration is performed by dividing each column into non-overlapping subsets and subsequently comparing the similarities between the said subsets to obtain an aggregated result that is used to compare a first column and a second column (Selman, Para 186, "One way to aggregate the embeddings is to take the mean or the maximum of the embeddings across all tokens in the sequence. This can be useful for tasks such as document content classification or sentiment analysis, where the search model 1406 assigns a label or score to a portion of a document or the entire document based on its content").

Regarding claim 3, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 2, wherein to cause to aggregate the similarities into a similarity threshold for further comparison of the plurality of columns (Selman, Para 30, "Nodes in the document graph structure may include natural language text and may store vector embeddings representative of such text, which, upon receiving of a query, may allow execution of a semantic similarity search to find nodes and/or edges in the document graph structure that may be semantically similar to the natural language text").

Regarding claim 4, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 1, wherein the self-calibration enables dynamic adjustment of the similarity metrics based on internal column characteristics (Selman, Para 122, "The training process involves feeding the pre-processed data 516 into the ML algorithm 524 to produce or optimize an ML model 330. The training process adjusts its parameters until it achieves an initial level of satisfactory performance").
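Claim 2's self-calibration step (dividing each column into non-overlapping subsets, comparing the subsets pairwise, and aggregating the result into a single column-to-column similarity) can be sketched as follows. The character-histogram embedding here is purely illustrative stand-in machinery, not the embedding the application or Selman uses.

```python
import numpy as np

def subset_similarity(col_a: list, col_b: list, n_subsets: int = 4) -> float:
    """Split each column's values into non-overlapping subsets, embed each
    subset, compare subsets pairwise, and aggregate (mean) into one score."""

    def embed(values: list) -> np.ndarray:
        # Toy embedding: normalized letter histogram of the subset's values.
        vec = np.zeros(26)
        for v in values:
            for ch in str(v).lower():
                if ch.isalpha():
                    vec[ord(ch) - ord("a")] += 1
        norm = np.linalg.norm(vec)
        return vec / norm if norm else vec

    def split(col: list) -> list:
        # Strided slices are disjoint and together cover the whole column.
        return [col[i::n_subsets] for i in range(n_subsets)]

    sims = [
        float(np.dot(embed(sa), embed(sb)))
        for sa in split(col_a)
        for sb in split(col_b)
    ]
    # Aggregated result used to compare the first column with the second.
    return float(np.mean(sims))

score = subset_similarity(["alice", "bob", "carol", "dave"],
                          ["anne", "bill", "cathy", "dan"])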
Regarding claim 6, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 1, wherein to cause to allow the user to select a column of interest from the classified plurality of entities for analysis (Selman, Para 27, "a user may need to search for information within a collection of electronic documents, such as warranty terms, contractual obligations, pricing information, and so forth").

Regarding claim 7, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 1, wherein to cause to display one or more similarities between the plurality of columns using a distance metric via the user interface (Selman, Para 31, "execution of a semantic similarity search to find nodes and/or edges in the document graph structure that may be semantically similar to the natural language text").

Regarding claim 8, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 4, wherein to cause to allow the user to adjust schema, content and morphological components to determine the similarities between the plurality of columns (Selman, Para 178, "The document graph engine 150 may use the lexical search generator 1434 to perform lexical searching in response to query 218 from the user device 216").

Regarding claim 9, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 3, wherein to cause to enable the user to filter the plurality of entities to focus on a plurality of columns with the selected column of interest (Selman, Para 30, "a semantic similarity search to find nodes and/or edges in the document graph structure that may be semantically similar to the natural language text").
Regarding claim 10, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 1, wherein to cause to assign a consistency score to perform at least one of direct the user to the column of interest for review and automatically update the column of interest (Selman, Para 123, "This is done using various metrics such as accuracy, precision, recall, and F1 score").

Regarding claim 11, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 1, wherein to cause to generate a multi-level similarity score by combining a plurality of similarity measurements using a classifier (Selman, Para 194, "Elasticsearch will return the top matching documents based on their similarity scores. Elasticsearch also provides various options for customizing the indexing, searching, and scoring of the embeddings, as well as integrating with other natural language processing tools and frameworks").

Regarding claim 12, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 1, wherein the clustering uses at least one of a cosine similarity and a Euclidean distance between the embeddings for measuring similarity between the plurality of entities (Selman, Para 194, "The document graph engine 150 can then search for similar documents by specifying a query embedding and using the cosine similarity as the similarity metric").

Regarding claim 13, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 1, wherein each column of the plurality of entities is embedded as a high-dimensional vector (Selman, Para 40, "document portion(s) included in each node in the document graph structure may be represented by at least one vector embedding").
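Claim 11's multi-level similarity score (combining a plurality of similarity measurements using a classifier) and claim 12's cosine/Euclidean metrics can be illustrated together. The fixed logistic weights below are a stand-in for a trained classifier; the weights, bias, and vectors are assumptions for illustration, not values from the application or the cited references.

```python
import math
import numpy as np

def multilevel_score(a: np.ndarray, b: np.ndarray,
                     weights=(2.0, 1.0), bias=-1.5) -> float:
    """Combine two similarity measurements (cosine similarity and an
    inverse-Euclidean-distance measure) into one score with a fixed
    logistic combiner standing in for a trained classifier."""
    cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    euc = 1.0 / (1.0 + float(np.linalg.norm(a - b)))  # maps distance to (0, 1]
    z = weights[0] * cos + weights[1] * euc + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes to (0, 1)

# Identical vectors should score higher than orthogonal ones.
identical = multilevel_score(np.array([1.0, 0.0]), np.array([1.0, 0.0]))
orthogonal = multilevel_score(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

In practice the combiner would be a fitted model (e.g. logistic regression over several similarity features), which is the sense in which the claim's "classifier" combines the measurements.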
Regarding claim 14, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 1, wherein the embeddings are stored in a database to enable further clustering as required (Selman, Para 95, "In determining relationships between various document portions of one or more documents, the engine 208 may cluster and/or group document portions into one or more groups based on various factors, functions, etc.").

Regarding claim 15, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 14, wherein the stored embeddings are used to perform clustering (Selman, Para 30, "Nodes in the document graph structure may include natural language text and may store vector embeddings representative of such text, which, upon receiving of a query, may allow execution of a semantic similarity search to find nodes and/or edges in the document graph structure that may be semantically similar to the natural language text").

Regarding claim 16, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 1, wherein the first mode signifies a table schema clustering, and the second mode signifies a column content clustering (Selman, Para 120, "The data 504 can be in different formats such as structured, unstructured or semi-structured data").

Regarding claim 17, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 1, wherein the feedback enables meaningful interaction by providing visual cues and interactive elements to the user (Selman, Para 99, "The document graph engine 150 may provide the results of the search and/or the candidate document vectors to a user via a graphical user interface (GUI) on a client device").
Regarding claim 18, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 1, wherein the feedback on the clustering of entities is utilized to directly assign initial classifications to one or more clusters, wherein the feedback comprises at least one of confirming a cluster as representative of a classification category and modifying a cluster to define a new classification category, thereby enabling initial classification of entities (Selman, Para 95, "In determining relationships between various document portions of one or more documents, the engine 208 may cluster and/or group document portions into one or more groups based on various factors, functions, etc.").

Claim 19 is the method claim corresponding to system claim 1, and is analyzed and rejected accordingly. Claim 20 is the medium claim corresponding to system claim 1, and is analyzed and rejected accordingly.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Selman in view of Yan in view of Sarrazin, further in view of Butvinik et al. (US 20250272514 A1), hereafter Butvinik.

Regarding claim 5, Selman in view of Yan further in view of Sarrazin teaches the system as claimed in claim 1, as shown above. Selman in view of Yan further in view of Sarrazin does not appear to explicitly teach wherein the embeddings is All-MiniLM-L6-v2 to distinguish between the plurality of columns. In analogous art, Butvinik teaches wherein the embeddings is All-MiniLM-L6-v2 to distinguish between the plurality of columns (Para 0067, "The global coherence score may then be used by a sentence transformer model for further sentence embeddings that capture complex sentence relationships. For example, the sentence transformer may correspond to the 'all-MiniLM-L6-v2' model provided by SentenceTransformer or similar model").
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Selman in view of Yan further in view of Sarrazin to include the teaching of Butvinik. One of ordinary skill in the art would be motivated to implement this modification in order to determine similarity, as taught by Butvinik (Para 0067, "The global coherence score may then be used by a sentence transformer model for further sentence embeddings that capture complex sentence relationships").

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Brooks Hale, whose telephone number is 571-272-0160. The examiner can normally be reached 9am to 5pm EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Sanjiv Shah, can be reached at (571) 272-4098. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.
For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/B.T.H./
Examiner, Art Unit 2166

/SANJIV SHAH/
Supervisory Patent Examiner, Art Unit 2166

Prosecution Timeline

May 02, 2025
Application Filed
Feb 16, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12572584: DATA STORAGE METHOD AND APPARATUS BASED ON BLOCKCHAIN NETWORK. Granted Mar 10, 2026 (2y 5m to grant)
Patent 12561344: CLASSIFICATION INCLUDING CORRELATION. Granted Feb 24, 2026 (2y 5m to grant)
Patent 12561309: CORRELATION OF HETEROGENOUS MODELS FOR CAUSAL INFERENCE. Granted Feb 24, 2026 (2y 5m to grant)
Patent 12561375: ENHANCED SEARCH RESULT GENERATION USING MULTI-DOCUMENT SUMMARIZATION. Granted Feb 24, 2026 (2y 5m to grant)
Patent 12555669: SYSTEMS AND METHODS FOR GENERATING AN INTEGUMENTARY DYSFUNCTION NOURISHMENT PROGRAM. Granted Feb 17, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 49%
With Interview: 80% (+31.4%)
Median Time to Grant: 3y 3m
PTA Risk: Low
Based on 74 resolved cases by this examiner. Grant probability derived from career allow rate.
