Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on March 5, 2024, is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-4, 6-10, and 12-20 are rejected under 35 U.S.C. 101 as being directed to a judicial exception without significantly more.
Regarding Independent Claims 1, 14, and 20, the Claims recite the following method steps:
receiving a specification for processing target data;
receiving the target data; and
processing the target data in accordance with the specification through one or both of a machine learning pre-processing sub-system and a search sub-system, wherein the machine learning pre-processing sub-system and the search sub-system are configured to receive the specification and execute different operations using the specification.
It is the position of the Examiner that the method steps claimed above are directed to an abstract mental process, as the acts of collecting data, pre-processing it, and outputting it without using it to perform a computer task constitute the type of activities that can be performed in the human mind or with the aid of pen and paper (see MPEP 2106.04(a)(2)(III)(A): claims recite a mental process when they contain limitations that can practically be performed in the human mind, including, for example, observations, evaluations, judgments, and opinions. Examples of claims that recite mental processes include a claim to “collecting information, analyzing it, and displaying certain results of the collection and analysis,” where the data analysis steps are recited at a high level of generality such that they could practically be performed in the human mind, Electric Power Group v. Alstom, S.A., 830 F.3d 1350, 1353-54, 119 USPQ2d 1739, 1741-42 (Fed. Cir. 2016)).
The additional method step of outputting the processed target data fails to integrate the claim into a practical application or provide significantly more because it constitutes insignificant extra-solution activity (see MPEP 2106.05(g)(3)).
The additional computer elements, including:
one or more processors; and
one or more non-transitory computer-readable storage media,
fail to integrate the claim into a practical application or provide significantly more because they are generic computer components recited at a high level of generality and thus constitute “apply it” language (see MPEP 2106.05(f)(2)).
Regarding dependent Claims 2-4, 6-10, 12, 13, and 15-19, the claims are directed to additional data processing steps and are thus directed to the same abstract mental process set forth above.
Regarding dependent Claims 5 and 11, it is the position of the Examiner that the claims integrate the abstract idea into a practical application and are thus not directed to an abstract idea without significantly more.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-3, 11, 14-16, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by GPTutorPro (“Elasticsearch for ML: Data Ingestion and Preprocessing”, https://gpttutorpro.com/elasticsearch-for-ml-data-ingestion-and-preprocessing/, January 31, 2024).
Regarding Claim 1, GPTutorPro discloses a system comprising:
one or more processors (see GPTutorPro, Section 1, paragraph 2, where ingestion is the process of collecting, importing, and processing data from various sources into a system that can store and analyze it [it is the position of the Examiner that a system suggests the use of one or more processors]) configured to:
receive a specification for processing target data (see GPTutorPro, Section 2, paragraph 2, where an ingest node is a special type of node in an Elasticsearch cluster that can apply transformations to the data before indexing it into Elasticsearch. An ingest node can have one or more ingest plugins installed, which are modules that provide specific functionality for data ingestion, such as parsing, enriching, or geocoding. You can define a pipeline, which is a set of processors that specify the transformations to be applied by the ingest plugins; see also Section 1, paragraph 1, where Elasticsearch is a powerful and scalable search engine that can handle large volumes of data and perform complex queries. But did you know that Elasticsearch can also be used for machine learning purposes? In this blog, you will learn how to use Elasticsearch for data ingestion and preprocessing, two essential steps in any machine learning project [emphasis added by Examiner; it is the position of the Examiner that a user defined pipeline for processing data is not patentably distinguishable from a specification for processing target data]);
receive the target data (see GPTutorPro, Section 1, paragraph 2, where Data ingestion is the process of collecting, importing, and processing data from various sources into a system that can store and analyze it);
process the target data in accordance with the specification through one or both of a machine learning pre-processing sub-system and a search sub-system, wherein the machine learning pre-processing sub-system and the search sub-system are configured to receive the specification and execute different operations using the specification (see GPTutorPro, Section 2, paragraph 2, where an ingest node is a special type of node in an Elasticsearch cluster that can apply transformations to the data before indexing it into Elasticsearch [emphasis added by Examiner; it is the position of the Examiner that indexing is not patentably distinguishable from pre-processing input data for a search subsystem]. An ingest node can have one or more ingest plugins installed [emphasis added by Examiner; it is the position of the Examiner that one or more ingest plugins is not patentably distinguishable from executing different operations defined by a specification], which are modules that provide specific functionality for data ingestion, such as parsing, enriching, or geocoding. You can define a pipeline, which is a set of processors that specify the transformations to be applied by the ingest plugins; see also Section 1, paragraph 1, where Elasticsearch is a powerful and scalable search engine that can handle large volumes of data and perform complex queries. But did you know that Elasticsearch can also be used for machine learning purposes? In this blog, you will learn how to use Elasticsearch for data ingestion and preprocessing, two essential steps in any machine learning project [emphasis added by Examiner; it is the position of the Examiner that a user defined pipeline for processing data is not patentably distinguishable from a specification for processing target data]); and
output the processed target data (see GPTutorPro, Section 1, paragraph 2, where Data ingestion is the process of collecting, importing, and processing data from various sources into a system that can store and analyze it).
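For context on the mapping above, the ingest-pipeline mechanism that the Examiner equates with the claimed "specification" is an ordered list of processors applied to a document before indexing. The following is a minimal, self-contained Python analogue of that mechanism; the processor names and document fields are illustrative only and are not taken from GPTutorPro or from the claims:

```python
# Minimal analogue of an Elasticsearch ingest pipeline: a "specification"
# (an ordered list of processors) applied to a document before indexing.
# Processor names and fields here are hypothetical, for illustration only.

def lowercase(field):
    # Processor that lowercases one field of the document.
    def proc(doc):
        doc[field] = doc[field].lower()
        return doc
    return proc

def set_field(field, value):
    # Processor that adds or overwrites one field of the document.
    def proc(doc):
        doc[field] = value
        return doc
    return proc

def run_pipeline(pipeline, doc):
    """Apply each processor in the specification, in order."""
    for proc in pipeline:
        doc = proc(doc)
    return doc

# The "specification for processing target data": an ordered processor list.
pipeline = [lowercase("title"), set_field("ingested", True)]

doc = {"title": "Elasticsearch FOR ML"}
processed = run_pipeline(pipeline, doc)
# processed == {"title": "elasticsearch for ml", "ingested": True}
```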
Regarding Claim 2, GPTutorPro discloses the system of Claim 1, wherein:
the machine learning pre-processing sub-system is configured to perform data pre-processing operations using the specification (see GPTutorPro, Section 1, paragraph 1, where Elasticsearch is a powerful and scalable search engine that can handle large volumes of data and perform complex queries. But did you know that Elasticsearch can also be used for machine learning purposes? In this blog, you will learn how to use Elasticsearch for data ingestion and preprocessing, two essential steps in any machine learning project [emphasis added by Examiner; it is the position of the Examiner that a user defined pipeline for processing data is not patentably distinguishable from a specification for processing target data]); and
the search sub-system is configured to pre-process the target data using the specification, and generate a search index for the pre-processed target data (see GPTutorPro, Section 2, paragraph 2, where an ingest node is a special type of node in an Elasticsearch cluster that can apply transformations to the data before indexing it into Elasticsearch [emphasis added by Examiner; it is the position of the Examiner that indexing is not patentably distinguishable from pre-processing input data for a search subsystem]).
Regarding Claim 3, GPTutorPro discloses the system of Claim 1, wherein:
the machine learning pre-processing sub-system is configured to execute one or more machine learning pre-processing operations using the specification (see GPTutorPro, Section 1, paragraph 1, where Elasticsearch is a powerful and scalable search engine that can handle large volumes of data and perform complex queries. But did you know that Elasticsearch can also be used for machine learning purposes? In this blog, you will learn how to use Elasticsearch for data ingestion and preprocessing, two essential steps in any machine learning project [emphasis added by Examiner; it is the position of the Examiner that a user defined pipeline for processing data is not patentably distinguishable from a specification for processing target data]), and
the search sub-system is configured to execute one or more search operations different from the one or more machine learning pre-processing operations and using the specification (see GPTutorPro, Section 2, paragraph 2, where an ingest node is a special type of node in an Elasticsearch cluster that can apply transformations to the data before indexing it into Elasticsearch [emphasis added by Examiner; it is the position of the Examiner that indexing is not patentably distinguishable from pre-processing input data for a search subsystem]. An ingest node can have one or more ingest plugins installed [emphasis added by Examiner; it is the position of the Examiner that one or more ingest plugins is not patentably distinguishable from executing different operations defined by a specification], which are modules that provide specific functionality for data ingestion, such as parsing, enriching, or geocoding).
Regarding Claim 11, GPTutorPro discloses the system of Claim 1, wherein the one or more processors are further configured to train or execute a machine learning model using the processed target data (see GPTutorPro, Section 1, paragraph 1, where Elasticsearch is a powerful and scalable search engine that can handle large volumes of data and perform complex queries. But did you know that Elasticsearch can also be used for machine learning purposes? In this blog, you will learn how to use Elasticsearch for data ingestion and preprocessing, two essential steps in any machine learning project [it is the position of the Examiner that a user defined pipeline for processing data is not patentably distinguishable from a specification for processing target data]).
Regarding Claim 14, GPTutorPro discloses a method, comprising:
receiving, by the one or more processors, a specification for processing target data (see GPTutorPro, Section 2, paragraph 2, where an ingest node is a special type of node in an Elasticsearch cluster that can apply transformations to the data before indexing it into Elasticsearch. An ingest node can have one or more ingest plugins installed, which are modules that provide specific functionality for data ingestion, such as parsing, enriching, or geocoding. You can define a pipeline, which is a set of processors that specify the transformations to be applied by the ingest plugins; see also Section 1, paragraph 1, where Elasticsearch is a powerful and scalable search engine that can handle large volumes of data and perform complex queries. But did you know that Elasticsearch can also be used for machine learning purposes? In this blog, you will learn how to use Elasticsearch for data ingestion and preprocessing, two essential steps in any machine learning project [emphasis added by Examiner; it is the position of the Examiner that a user defined pipeline for processing data is not patentably distinguishable from a specification for processing target data]);
receiving, by the one or more processors, the target data (see GPTutorPro, Section 1, paragraph 2, where Data ingestion is the process of collecting, importing, and processing data from various sources into a system that can store and analyze it);
processing, by the one or more processors, the target data in accordance with the specification through one or both of a machine learning pre-processing sub-system and a search sub-system, wherein the machine learning pre-processing sub-system and the search sub-system are configured to receive the specification and execute different operations using the specification (see GPTutorPro, Section 2, paragraph 2, where an ingest node is a special type of node in an Elasticsearch cluster that can apply transformations to the data before indexing it into Elasticsearch [emphasis added by Examiner; it is the position of the Examiner that indexing is not patentably distinguishable from pre-processing input data for a search subsystem]. An ingest node can have one or more ingest plugins installed [emphasis added by Examiner; it is the position of the Examiner that one or more ingest plugins is not patentably distinguishable from executing different operations defined by a specification], which are modules that provide specific functionality for data ingestion, such as parsing, enriching, or geocoding. You can define a pipeline, which is a set of processors that specify the transformations to be applied by the ingest plugins; see also Section 1, paragraph 1, where Elasticsearch is a powerful and scalable search engine that can handle large volumes of data and perform complex queries. But did you know that Elasticsearch can also be used for machine learning purposes? In this blog, you will learn how to use Elasticsearch for data ingestion and preprocessing, two essential steps in any machine learning project [emphasis added by Examiner; it is the position of the Examiner that a user defined pipeline for processing data is not patentably distinguishable from a specification for processing target data]); and
outputting, by the one or more processors, the processed target data (see GPTutorPro, Section 1, paragraph 2, where Data ingestion is the process of collecting, importing, and processing data from various sources into a system that can store and analyze it).
Regarding Claim 15, GPTutorPro discloses the method of Claim 14, wherein the method further comprises:
when the target data is processed through the machine learning pre-processing sub-system, performing data pre-processing operations using the specification (see GPTutorPro, Section 1, paragraph 1, where Elasticsearch is a powerful and scalable search engine that can handle large volumes of data and perform complex queries. But did you know that Elasticsearch can also be used for machine learning purposes? In this blog, you will learn how to use Elasticsearch for data ingestion and preprocessing, two essential steps in any machine learning project [emphasis added by Examiner; it is the position of the Examiner that a user defined pipeline for processing data is not patentably distinguishable from a specification for processing target data]); and
when the target data is processed through the search sub-system, pre-processing the target data using the specification and generating a search index for the pre-processed data (see GPTutorPro, Section 2, paragraph 2, where an ingest node is a special type of node in an Elasticsearch cluster that can apply transformations to the data before indexing it into Elasticsearch [emphasis added by Examiner; it is the position of the Examiner that indexing is not patentably distinguishable from pre-processing input data for a search subsystem]).
Regarding Claim 16, GPTutorPro discloses the method of Claim 15, wherein:
the machine learning pre-processing sub-system is configured to execute one or more machine learning pre-processing operations (see GPTutorPro, Section 1, paragraph 1, where Elasticsearch is a powerful and scalable search engine that can handle large volumes of data and perform complex queries. But did you know that Elasticsearch can also be used for machine learning purposes? In this blog, you will learn how to use Elasticsearch for data ingestion and preprocessing, two essential steps in any machine learning project [emphasis added by Examiner; it is the position of the Examiner that a user defined pipeline for processing data is not patentably distinguishable from a specification for processing target data]), and
the search sub-system is configured to execute one or more search operations different from the one or more machine learning pre-processing operations using the specification (see GPTutorPro, Section 2, paragraph 2, where an ingest node is a special type of node in an Elasticsearch cluster that can apply transformations to the data before indexing it into Elasticsearch [emphasis added by Examiner; it is the position of the Examiner that indexing is not patentably distinguishable from pre-processing input data for a search subsystem]. An ingest node can have one or more ingest plugins installed [emphasis added by Examiner; it is the position of the Examiner that one or more ingest plugins is not patentably distinguishable from executing different operations defined by a specification], which are modules that provide specific functionality for data ingestion, such as parsing, enriching, or geocoding).
Regarding Claim 20, GPTutorPro discloses one or more non-transitory computer-readable storage media storing instructions that are operable, when performed by one or more processors, to cause the one or more processors to perform operations comprising:
receiving a specification for processing target data (see GPTutorPro, Section 2, paragraph 2, where an ingest node is a special type of node in an Elasticsearch cluster that can apply transformations to the data before indexing it into Elasticsearch. An ingest node can have one or more ingest plugins installed, which are modules that provide specific functionality for data ingestion, such as parsing, enriching, or geocoding. You can define a pipeline, which is a set of processors that specify the transformations to be applied by the ingest plugins; see also Section 1, paragraph 1, where Elasticsearch is a powerful and scalable search engine that can handle large volumes of data and perform complex queries. But did you know that Elasticsearch can also be used for machine learning purposes? In this blog, you will learn how to use Elasticsearch for data ingestion and preprocessing, two essential steps in any machine learning project [emphasis added by Examiner; it is the position of the Examiner that a user defined pipeline for processing data is not patentably distinguishable from a specification for processing target data]);
receiving the target data (see GPTutorPro, Section 1, paragraph 2, where Data ingestion is the process of collecting, importing, and processing data from various sources into a system that can store and analyze it);
processing the target data in accordance with the specification through one or both of a machine learning pre-processing sub-system and a search sub-system, wherein the machine learning pre-processing sub-system and the search sub-system are configured to receive the specification and execute different operations using the specification (see GPTutorPro, Section 2, paragraph 2, where an ingest node is a special type of node in an Elasticsearch cluster that can apply transformations to the data before indexing it into Elasticsearch [emphasis added by Examiner; it is the position of the Examiner that indexing is not patentably distinguishable from pre-processing input data for a search subsystem]. An ingest node can have one or more ingest plugins installed [emphasis added by Examiner; it is the position of the Examiner that one or more ingest plugins is not patentably distinguishable from executing different operations defined by a specification], which are modules that provide specific functionality for data ingestion, such as parsing, enriching, or geocoding. You can define a pipeline, which is a set of processors that specify the transformations to be applied by the ingest plugins; see also Section 1, paragraph 1, where Elasticsearch is a powerful and scalable search engine that can handle large volumes of data and perform complex queries. But did you know that Elasticsearch can also be used for machine learning purposes? In this blog, you will learn how to use Elasticsearch for data ingestion and preprocessing, two essential steps in any machine learning project [emphasis added by Examiner; it is the position of the Examiner that a user defined pipeline for processing data is not patentably distinguishable from a specification for processing target data]); and
outputting the processed target data (see GPTutorPro, Section 1, paragraph 2, where Data ingestion is the process of collecting, importing, and processing data from various sources into a system that can store and analyze it).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 4-10, 13, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over GPTutorPro as applied to Claims 1-3, 11, 14-16, and 20 above, and further in view of Suhm (“Exploring vector databases: how to get the best of lexical and AI-powered search with Elastic’s vector database”, https://www.elastic.co/search-labs/blog/lexical-ai-powered-search-elastic-vector-database, Bernhard Suhm, July 3, 2023).
Regarding Claim 4, GPTutorPro discloses the system of Claim 3.
GPTutorPro does not disclose:
the one or more machine learning pre-processing operations comprise one or more operations for performing a text vectorization of the target data, wherein the target data is vectorized in accordance with the specification; and
the one or more search operations comprise one or more operations for generating one or more tokens of text from the target data, and generating a search index for the one or more tokens of text, wherein the one or more tokens of text are generated in accordance with the specification.
Suhm discloses:
the one or more machine learning pre-processing operations comprise one or more operations for performing a text vectorization of the target data, wherein the target data is vectorized in accordance with the specification (see Suhm, paragraph 4, where Elastic is positioned to be a leader in the rapidly evolving vector database market: fully performant and scalable vector database functionality, including storing embeddings and efficiently searching for nearest neighbor), and
the one or more search operations comprise one or more operations for generating one or more tokens of text from the target data, and generating a search index for the one or more tokens of text, wherein the one or more tokens of text are generated in accordance with the specification (see Suhm, paragraph 4, where Elastic is positioned to be a leader in the rapidly evolving vector database market … a proprietary sparse retrieval model that implements semantic search out of the box; industry-leading relevance of all types – keyword, semantic, and vector; see also paragraph 20, where lexical search that made Elasticsearch popular (BM25) is an example of a sparse retrieval method. It uses a bag-of-words representation for text and achieves high relevance by modifying the basic relevance scoring method known as TF-IDF (term frequency, inverse document frequency) for factors like length of the document).
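As background for the two operations mapped above, generating text tokens with a search index over them, and vectorizing text, can be sketched side by side. This is a simplified illustration of tokenization, an inverted index, and bag-of-words vectorization, not Elasticsearch's or Suhm's implementation; the delimiter pattern and sample documents are hypothetical:

```python
import re
from collections import Counter

def tokenize(text, delimiter_pattern=r"\W+"):
    # Split text into lowercase tokens using a specified delimiter pattern,
    # analogous to generating tokens "in accordance with the specification".
    return [t for t in re.split(delimiter_pattern, text.lower()) if t]

def build_index(docs):
    # Inverted search index: token -> set of document ids containing it.
    index = {}
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(doc_id)
    return index

def vectorize(text, vocabulary):
    # Bag-of-words vector over a fixed vocabulary
    # (one simple form of text vectorization).
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]

docs = {1: "search the data", 2: "process the data"}
index = build_index(docs)
# index["data"] == {1, 2}; index["search"] == {1}

vocab = sorted({t for text in docs.values() for t in tokenize(text)})
# vocab == ["data", "process", "search", "the"]
vec = vectorize("data data search", vocab)
# vec == [2, 0, 1, 0]
```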
GPTutorPro and Suhm are both directed to configuring Elasticsearch. Accordingly, it is the position of the Examiner that it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine GPTutorPro with Suhm as it amounts to combining prior art elements according to known techniques to yield predictable results (see MPEP 2143(I)(A)).
Regarding Claim 5, GPTutorPro in view of Suhm discloses the system of Claim 4, wherein the one or more processors are further configured to:
GPTutorPro does not explicitly disclose receive a search query on the target data and generate search results responsive to the search query using the generated search index. Suhm discloses receive a search query on the target data and generate search results responsive to the search query using the generated search index (see Suhm, paragraph 4, where Elastic is positioned to be a leader in the rapidly evolving vector database market … a proprietary sparse retrieval model that implements semantic search out of the box; industry-leading relevance of all types – keyword, semantic, and vector; see also paragraph 20, where lexical search that made Elasticsearch popular (BM25) is an example of a sparse retrieval method. It uses a bag-of-words representation for text and achieves high relevance by modifying the basic relevance scoring method known as TF-IDF (term frequency, inverse document frequency) for factors like length of the document).
GPTutorPro and Suhm are both directed to configuring Elasticsearch. Accordingly, it is the position of the Examiner that it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine GPTutorPro with Suhm as it amounts to combining prior art elements according to known techniques to yield predictable results (see MPEP 2143(I)(A)).
Regarding Claim 6, GPTutorPro in view of Suhm discloses the system of Claim 4, wherein:
the specification comprises an analysis type parameter value, the one or more machine learning pre-processing operations comprise one or more operations for filtering the target data in accordance with the analysis type parameter value, before performing the text vectorization on the target data (see GPTutorPro, Section 3, paragraph 4, where a transform is a way to create a new index from an existing one by applying transformations and aggregations on the data, such as grouping, filtering, or summarizing. A transform can help you to reshape the data into a more convenient format for machine learning purposes, such as reducing the dimensionality, creating features, or extracting insights), and
the one or more search operations comprise one or more operations for filtering the target data in accordance with the analysis type parameter value, before generating the search index (see GPTutorPro, Section 3.1, paragraph 1, where, in the previous section, you learned what pipelines and processors are, and how they can help you to ingest data from various sources and apply transformations to it before indexing it into Elasticsearch; see also Section 3, paragraph 4, where A transform is a way to create a new index from an existing one by applying transformations and aggregations on the data, such as grouping, filtering, or summarizing; see also Section 3.1, paragraph 2, where Data preprocessing can involve various tasks, such as removing noise, outliers, or duplicates, handling missing or inconsistent values, normalizing or scaling the data, encoding categorical variables, creating new features, or reducing the dimensionality).
Regarding Claim 7, GPTutorPro in view of Suhm discloses the system of Claim 6, wherein the analysis type parameter value is at least one of:
a pattern recognition parameter value for executing a pattern recognition function, a normalization parameter value for normalizing the text to a common standard, format, or value (see GPTutorPro, Section 3.1, paragraph 2, where Data preprocessing can involve various tasks, such as removing noise, outliers, or duplicates, handling missing or inconsistent values, normalizing or scaling the data, encoding categorical variables, creating new features, or reducing the dimensionality), a no-op parameter value for performing no analysis or processing on the text, or a delimiter parameter value for splitting the text into tokens using one or more specified delimiters.
Regarding Claim 8, GPTutorPro in view of Suhm discloses the system of Claim 7, wherein the target data comprises text data (see GPTutorPro, Section 2, paragraph 2, where you can use the ingest-attachment plugin to extract metadata and content from various types of files, such as PDF, Word, or Excel), and the pattern recognition parameter value further comprises a regular expression (see GPTutorPro, Section 3, paragraph 2, where A pipeline is a set of processors that specify the transformations to be applied to the data by the ingest nodes and plugins. A processor is a component that performs a single operation on the data, such as removing a field, renaming a field, or adding a field).
Regarding Claim 9, GPTutorPro in view of Suhm discloses the system of Claim 4, wherein:
the specification comprises a statistical process parameter value specifying a type of statistical process, the one or more machine learning pre-processing operations comprise one or more operations for generating statistical data from the target data by performing the specified type of statistical process (see GPTutorPro, Section 3, paragraph 5, where An aggregation is a way to summarize and analyze the data by grouping it into buckets and calculating metrics, such as count, sum, average, or percentiles. An aggregation can help you to explore and understand the data, as well as to create features or extract insights for machine learning purposes).
GPTutorPro does not disclose the one or more search operations comprise one or more operations for generating the search index based on the generated statistical data. Suhm discloses the one or more search operations comprise one or more operations for generating the search index based on the generated statistical data (see Suhm, paragraph 4, where Elastic is positioned to be a leader in the rapidly evolving vector database market … a proprietary sparse retrieval model that implements semantic search out of the box; industry-leading relevance of all types – keyword, semantic, and vector; see also paragraph 20, where lexical search that made Elasticsearch popular (BM25) is an example of a sparse retrieval method. It uses a bag-of-words representation for text and achieves high relevance by modifying the basic relevance scoring method known as TF-IDF (term frequency, inverse document frequency) for factors like length of the document).
GPTutorPro and Suhm are both directed to configuring Elasticsearch. Accordingly, it is the position of the Examiner that it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine GPTutorPro with Suhm as it amounts to combining prior art elements according to known techniques to yield predictable results (see MPEP 2143(I)(A)).
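For illustration only, the bucket-and-metric aggregation described in the GPTutorPro passage cited above (grouping data into buckets and calculating metrics such as count, sum, or average) can be sketched in miniature, without Elasticsearch itself. The documents and field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Toy illustration of a bucket aggregation: group documents by a field,
# then compute count, sum, and average metrics per bucket.
docs = [
    {"category": "a", "value": 10},
    {"category": "a", "value": 30},
    {"category": "b", "value": 5},
]

buckets = defaultdict(list)
for doc in docs:
    buckets[doc["category"]].append(doc["value"])

aggregation = {
    key: {"count": len(vals), "sum": sum(vals), "avg": mean(vals)}
    for key, vals in buckets.items()
}
print(aggregation)
```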
Regarding Claim 10, GPTutorPro in view of Suhm discloses the system of Claim 9, wherein:
GPTutorPro does not disclose the statistical process is at least one of a bag of words frequency calculation, a term frequency-inverse document frequency (TF-IDF) calculation, a cosine distance calculation, a Euclidean distance calculation, or a Levenshtein distance calculation. Suhm discloses the statistical process is at least one of a bag of words frequency calculation or a term frequency-inverse document frequency (TF-IDF) calculation (see Suhm, paragraph 4, where Elastic is positioned to be a leader in the rapidly evolving vector database market … a proprietary sparse retrieval model that implements semantic search out of the box; industry-leading relevance of all types – keyword, semantic, and vector; see also paragraph 20, where lexical search that made Elasticsearch popular (BM25) is an example of a sparse retrieval method. It uses a bag-of-words representation for text and achieves high relevance by modifying the basic relevance scoring method known as TF-IDF (term frequency, inverse document frequency) for factors like length of the document). Because the claim recites the statistical processes in the alternative, disclosure of any one of them meets the limitation.
GPTutorPro and Suhm are both directed to configuring Elasticsearch. Accordingly, it is the position of the Examiner that it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine GPTutorPro with Suhm as it amounts to combining prior art elements according to known techniques to yield predictable results (see MPEP 2143(I)(A)).
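For illustration only, the TF-IDF calculation referenced in the Suhm passages cited above can be sketched as follows: term frequency weighted by inverse document frequency. (BM25, as the cited passage notes, further modifies this scoring for factors like document length; that refinement is omitted here.) The corpus is hypothetical.

```python
import math

# Toy TF-IDF: tf = term count / document length;
# idf = log(number of documents / number of documents containing the term).
corpus = [
    "elasticsearch stores documents",
    "documents are indexed as tokens",
    "tokens drive lexical search",
]
docs = [text.split() for text in corpus]

def tf_idf(term, doc, docs):
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf

score = tf_idf("tokens", docs[2], docs)
print(round(score, 4))
```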
Regarding Claim 13, GPTutorPro discloses the system of Claim 1, wherein:
GPTutorPro does not disclose the machine learning pre-processing system and the search sub-system are configured to perform one or more overlapping operations. Suhm discloses the machine learning pre-processing system and the search sub-system are configured to perform one or more overlapping operations (see Suhm, paragraph 2, where If you’re looking for the best retrieval performance, hybrid approaches that combine keyword-based search (sometimes referred to as lexical search) with vector-based approaches represent the state of the art).
GPTutorPro and Suhm are both directed to configuring Elasticsearch. Accordingly, it is the position of the Examiner that it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine GPTutorPro with Suhm as it amounts to combining prior art elements according to known techniques to yield predictable results (see MPEP 2143(I)(A)).
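For illustration only, the hybrid (overlapping) retrieval idea in the Suhm passage cited above, combining keyword-based and vector-based scoring for the same document, can be sketched as a weighted blend. The scores, vectors, and the blending weight `alpha` are hypothetical.

```python
# Toy hybrid retrieval score: blend a lexical (keyword) relevance score
# with a vector-similarity (cosine) score for the same document.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: sum(a * a for a in x) ** 0.5
    return dot / (norm(u) * norm(v))

def hybrid_score(lexical_score, query_vec, doc_vec, alpha=0.5):
    # alpha balances lexical relevance against vector similarity
    return alpha * lexical_score + (1 - alpha) * cosine(query_vec, doc_vec)

s = hybrid_score(0.8, [1.0, 0.0], [1.0, 0.0])
print(s)
```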
Regarding Claim 17, GPTutorPro discloses the method of Claim 16, wherein:
GPTutorPro does not disclose:
the one or more machine learning pre-processing operations comprise one or more operations for performing a text vectorization of the target data, wherein the target data is vectorized in accordance with the specification; and
the one or more search operations comprise one or more operations for generating one or more tokens of text from the target data, and generating a search index for the one or more tokens of text, wherein the one or more tokens of text are generated in accordance with the specification.
Suhm discloses:
the one or more machine learning pre-processing operations comprise one or more operations for performing a text vectorization of the target data, wherein the target data is vectorized in accordance with the specification (see Suhm, paragraph 4, where Elastic is positioned to be a leader in the rapidly evolving vector database market: fully performant and scalable vector database functionality, including storing embeddings and efficiently searching for nearest neighbor), and
the one or more search operations comprise one or more operations for generating one or more tokens of text from the target data, and generating a search index for the one or more tokens of text, wherein the one or more tokens of text are generated in accordance with the specification (see Suhm, paragraph 4, where Elastic is positioned to be a leader in the rapidly evolving vector database market … a proprietary sparse retrieval model that implements semantic search out of the box; industry-leading relevance of all types – keyword, semantic, and vector; see also paragraph 20, where lexical search that made Elasticsearch popular (BM25) is an example of a sparse retrieval method. It uses a bag-of-words representation for text and achieves high relevance by modifying the basic relevance scoring method known as TF-IDF (term frequency, inverse document frequency) for factors like length of the document).
GPTutorPro and Suhm are both directed to configuring Elasticsearch. Accordingly, it is the position of the Examiner that it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine GPTutorPro with Suhm as it amounts to combining prior art elements according to known techniques to yield predictable results (see MPEP 2143(I)(A)).
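For illustration only, the two operations mapped for Claim 17, generating tokens of text with a search index over them, and producing a vector representation of the same text, can be sketched together. The corpus and the simple whitespace/word tokenizer are hypothetical; the vector shown is a plain bag-of-words count vector, not the dense embeddings the Suhm passage also describes.

```python
import re
from collections import defaultdict

# Toy tokenization + inverted index + bag-of-words vectorization.
corpus = {0: "vector search in Elasticsearch", 1: "lexical search uses tokens"}

index = defaultdict(set)  # token -> set of document ids (an inverted index)
for doc_id, text in corpus.items():
    for token in re.findall(r"\w+", text.lower()):
        index[token].add(doc_id)

vocab = sorted(index)

def vectorize(text):
    tokens = re.findall(r"\w+", text.lower())
    return [tokens.count(term) for term in vocab]

print(sorted(index["search"]))
print(vectorize("search search tokens"))
```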
Regarding Claim 18, GPTutorPro in view of Suhm discloses the method of Claim 17, wherein:
the specification comprises an analysis type parameter value, the one or more machine learning pre-processing operations comprise one or more operations for filtering the target data in accordance with the analysis type parameter value, before performing the text vectorization on the target data (see GPTutorPro, Section 3, paragraph 4, where a transform is a way to create a new index from an existing one by applying transformations and aggregations on the data, such as grouping, filtering, or summarizing. A transform can help you to reshape the data into a more convenient format for machine learning purposes, such as reducing the dimensionality, creating features, or extracting insights), and
the one or more search operations comprise one or more operations for filtering the target data in accordance with the analysis type parameter value, before generating the search index (see GPTutorPro, Section 3.1, paragraph 1, where in the previous section, you learned what pipelines and processors are, and how they can help you to ingest data from various sources and apply transformations to it before indexing it into Elasticsearch; see also Section 3, paragraph 4, where A transform is a way to create a new index from an existing one by applying transformations and aggregations on the data, such as grouping, filtering, or summarizing; see also Section 3.1, paragraph 2, where Data preprocessing can involve various tasks, such as removing noise, outliers, or duplicates, handling missing or inconsistent values, normalizing or scaling the data, encoding categorical variables, creating new features, or reducing the dimensionality).
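For illustration only, the transform behavior mapped for Claim 18, where a parameter of the specification selects a filter applied to the target data before vectorization or indexing, can be sketched as follows. The records, the `analysis_type` parameter name, and the filter condition are hypothetical.

```python
# Toy filter-before-processing transform: a specification parameter
# (analysis_type) determines which target-data records survive to the
# later vectorization / index-generation steps.
records = [
    {"type": "review", "text": "great product"},
    {"type": "log", "text": "disk full"},
    {"type": "review", "text": "poor battery"},
]

def filter_by_analysis_type(records, analysis_type):
    return [r for r in records if r["type"] == analysis_type]

filtered = filter_by_analysis_type(records, "review")
print(len(filtered))
```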
Regarding Claim 19, GPTutorPro in view of Suhm discloses the method of Claim 18, wherein the target data comprises text data (see GPTutorPro, Section 2, paragraph 2, where you can use the ingest-attachment plugin to extract metadata and content from various types of files, such as PDF, Word, or Excel), and the pattern recognition parameter value further comprises a regular expression (see GPTutorPro, Section 3, paragraph 2, where A pipeline is a set of processors that specify the transformations to be applied to the data by the ingest nodes and plugins. A processor is a component that performs a single operation on the data, such as removing a field, renaming a field, or adding a field).
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over GPTutorPro as applied to Claims 1-3, 11, 14-16, and 20 above, and further in view of Raghuvansh (PG Pub. No. 2025/0014112 A1).
Regarding Claim 12, GPTutorPro discloses the system of Claim 1, wherein:
GPTutorPro does not disclose the specification is part of a structured query language (SQL) statement. Raghuvansh discloses the specification is part of a structured query language (SQL) statement (see Raghuvansh, paragraph [0060], where once the mapping is done, the data gaps are checked and finally the data is ingested by writing the SQL ingestion query).
GPTutorPro and Raghuvansh are both directed to configuring ingestion pipelines. Accordingly, it is the position of the Examiner that it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine GPTutorPro and Raghuvansh as it amounts to combining prior art elements according to known techniques to yield predictable results (see MPEP 2143(I)(A)).
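For illustration only, the SQL-ingestion idea cited from Raghuvansh, a processing specification carried inside a SQL statement, can be sketched with an in-memory SQLite database. The table names, columns, and WHERE condition are hypothetical; here the WHERE clause plays the role of the specification governing which rows are ingested.

```python
import sqlite3

# Toy SQL ingestion query: an INSERT ... SELECT whose WHERE clause acts
# as the processing specification for which rows reach the target table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE staging (doc TEXT, score REAL)")
con.execute("CREATE TABLE target (doc TEXT, score REAL)")
con.executemany("INSERT INTO staging VALUES (?, ?)",
                [("a", 0.9), ("b", 0.2), ("c", 0.7)])

con.execute("INSERT INTO target SELECT doc, score FROM staging WHERE score >= 0.5")
rows = con.execute("SELECT doc FROM target ORDER BY doc").fetchall()
print(rows)
```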
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARHAD AGHARAHIMI whose telephone number is (571)272-9864. The examiner can normally be reached M-F 9am - 5pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Apu Mofiz, can be reached at 571-272-4080. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/FARHAD AGHARAHIMI/Examiner, Art Unit 2161
/APU M MOFIZ/Supervisory Patent Examiner, Art Unit 2161