Prosecution Insights
Last updated: April 19, 2026
Application No. 18/772,068

SYSTEMS AND METHODS FOR INTERFACING WITH DATA PROFILERS USING A MACHINE LEARNING MODEL

Non-Final OA: §101, §103
Filed
Jul 12, 2024
Examiner
MANOHARAN, SHASHIDHAR SHANKAR
Art Unit
2655
Tech Center
2600 — Communications
Assignee
Capital One Services LLC
OA Round
1 (Non-Final)
Grant Probability: 100% (Favorable)
OA Rounds: 1-2
To Grant: 2y 9m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 100% (2 granted / 2 resolved; +38.0% vs TC avg; above average)
Interview Lift: +0.0% (minimal lift; based on resolved cases with interview)
Avg Prosecution: 2y 9m (typical timeline)
Total Applications: 16 across all art units (14 currently pending)

Statute-Specific Performance

§101: 24.1% (-15.9% vs TC avg)
§103: 55.2% (+15.2% vs TC avg)
§102: 8.6% (-31.4% vs TC avg)
§112: 12.1% (-27.9% vs TC avg)
Tech Center averages are estimates. Based on career data from 2 resolved cases.

Office Action

§101, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is a mental process without significantly more. Independent claims 1, 2, and 12 recite a system with one or more processors and one or more non-transitory, computer-readable media comprising instructions, a method, and non-transitory computer-readable media comprising instructions that, when executed by one or more processors; as drafted, under the broadest reasonable interpretation (BRI), the claims cover evaluating language model responses, the method comprising:

receiving a data profiler configured to create and store a plurality of data profile attributes (a human can receive a data dictionary that describes what is in a filing cabinet (profiler), such as column headers like date, price, and ID (attributes));

training a profile query model to interface with data profilers and generate responses to user queries relating to data profiles based on training data comprising data profile attributes from a plurality of data profiles and example user queries (a human can study the data dictionary and list 50 common questions asked by the boss (example queries); they could then learn that if someone asks for cost, they should look at the price attribute, with this learning creating a mental rulebook (the model));

receiving, in a conversational program, a user query requesting information regarding one or more data profile attributes of the data profiler (another human can walk up (the conversational program) and ask how many entries are missing in the price column (user query));

using a language interpretation model, pre-processing the user query and the one or more data profile attributes to determine an activation pattern for the profile query model, wherein the language interpretation model is trained to produce real-valued embeddings associated with the user query (the first human listens to the sentence, ignoring filler words (e.g., "um", "like"), to mentally translate the word "missing" into the technical concept of null and "price" into the studied column (pre-processing); the mental spark between word and data is the activation pattern);

using the profile query model, processing the real-valued embeddings associated with the user query and the one or more data profile attributes to generate a preliminary response (a human can use the mental rulebook to look at the price column, count the number of blank rows, and write down that answer on paper (preliminary response));

post-processing the preliminary response to generate a verified response by applying a verification program to verify factual accuracy and confidentiality of the preliminary response (a human can recount to make sure they did not miscount (factual accuracy) and check whether those rows contain secret information they cannot share (confidentiality)); and

transmitting the verified response in the conversational program related to the user query (a human can tell the second human the answer that they counted and verified).

As described above, these limitations can be carried out as a series of mental steps.
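The pipeline recited in the independent claims (pre-process a query into embeddings, generate a preliminary response from profile attributes, then verify accuracy and confidentiality) can be sketched in a few lines. This is a minimal editorial illustration only: every function name, the toy keyword-match "embedding," and the sample profile are assumptions, not the applicant's or the examiner's implementation.

```python
# Hypothetical sketch of the claimed pipeline; all names and the toy
# keyword-match "embedding" are illustrative assumptions.

CONFIDENTIAL = {"ssn"}  # assumed privacy metadata

def preprocess(query, attributes):
    """Language interpretation step: a crude real-valued embedding that
    activates each known attribute mentioned in the query."""
    words = {w.strip("?.,").lower() for w in query.split()}
    return [1.0 if attr in words else 0.0 for attr in attributes]

def preliminary_response(embedding, profile):
    """Profile query model step: report null counts for activated attributes."""
    return {a: profile[a]["nulls"] for a, x in zip(profile, embedding) if x}

def verify(response):
    """Verification step: drop confidential attributes before transmitting."""
    return {a: n for a, n in response.items() if a not in CONFIDENTIAL}

profile = {"price": {"nulls": 3}, "ssn": {"nulls": 0}}
emb = preprocess("How many entries are missing in the price column?", list(profile))
answer = verify(preliminary_response(emb, profile))
# answer == {"price": 3}
```

The sketch mirrors the claim's staged structure; whether those stages are practically performable as mental steps is exactly what the §101 rejection asserts and the applicant may traverse.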
The judicial exception is not integrated into a practical application because the only additional elements recited are a system comprising a computer processor and memory, which is general purpose hardware being used as a tool to implement the mental process, and non-transitory computer-readable program code, which is a conventional component that utilizes the basic functions of a computer. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, as described above, the only additional elements recited are that same general purpose hardware and conventional program code.

The remaining dependent claims fail to add patent eligible subject matter to independent claim 1:

Claims 3 and 13 simply add determining supplemental data, algorithms, and mathematical formulae to identify an activation pattern, which a human can do with a pen and paper by looking up additional reference tables and selecting a calculation method (e.g., addition, subtraction) to apply to the query.

Claims 4 and 14 simply add using a language processing model to correspond queries to embeddings, which a human can do by asking ChatGPT to find the closest match between a user's term and the technical column name in the database.

Claims 5 and 15 simply add generating a text-based response from embeddings, which a human can do by asking ChatGPT to "write a three-sentence summary" based on a raw numerical result.

Claims 6 and 16 simply add checking null value distributions to determine desirability, which a human can do by asking ChatGPT to "count the empty rows" and "tell me if this data is reliable enough to use," or by counting the null values with a pen and paper.

Claims 7 and 17 simply add comparing mathematical formulae against results to determine accuracy, which a human can do by asking ChatGPT to calculate these two different ways and flag any discrepancies between the results, or by making the comparisons with a pen and paper.

Claims 8 and 18 simply add using privacy metadata to determine a disclosable dataset, which a human can do by asking ChatGPT to filter rows marked private or to show only public information.

Claims 9 and 19 simply add training an interpretation model using plain text descriptions, which a human can do by pasting a data dictionary into ChatGPT so that it understands the definitions of each data set.

Claims 10 and 20 simply add producing metadata attributes like medians and standard deviations, which a human can do by asking ChatGPT to "calculate the average, range, etc." of a provided list of numbers, or with a pen and paper.

Claim 11 simply adds producing histograms and linear regression models, which a human can do by asking ChatGPT to draw a bar chart or predict the next value in the sequence based on historic trends, or by drawing it with a pen and paper.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C.
103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claim(s) 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Setlur et al. (hereinafter Setlur) (US 20210303558 A1) in view of Srinivasulu et al. (hereinafter Srinivasulu) (US 20230124988 A1).
Regarding claim 1, Setlur teaches: A system for interfacing with data profilers using a machine learning model, the system comprising: one or more processors (Setlur, P[0020]); and one or more non-transitory, computer-readable media comprising instructions that, when executed by the one or more processors, cause operations comprising (Setlur, P[0020]):

receiving a data profiler configured to create and store a plurality of data profile attributes (Srinivasulu, P[0028]: "Database 217 can store data"; P[0027]: "Oracle Cloud" (a database is explicitly taught with cloud infrastructure that can store generated metadata profiles (data profile attributes)));

training a profile query model to interface with data profilers and generate responses to user queries relating to data profiles based on training data comprising data profile attributes from a plurality of data profiles and example user queries (Setlur, P[0041]: "training a first neural network model on a large corpus of text" (Setlur's "large corpus of text" used for training the neural network inherently includes example user queries (natural language utterances) necessary for the model to learn the "grammatical and lexical structure" of how users ask for data));

receiving, in a conversational program, a user query requesting information regarding one or more data profile attributes of the data profiler (Setlur, P[0005]: "providing a natural language interface… for an interactive query dialog… allowing users to access complex functionality using ordinary questions" (Setlur's ordinary questions (the user query) are directed at the complex functionality of the data visualization, which is built upon underlying metadata; thus, the query is specifically requesting information regarding the data profile attributes (e.g., asking about trends and distributions found in the profile)));

to determine an activation pattern (Setlur, P[0041]: "computing a first word vector for a first word… using a second neural network" (Setlur's second neural network model is a recurrent neural network (RNN) designed for language processing; it pre-processes the query into vectors to create the "mapping" (the activation pattern) to the data attributes)) for the profile query model, wherein the language interpretation model is trained to produce real-valued embeddings associated with the user query (Setlur, P[0041]: "computing a first word vector for a first word… using a second neural network model" (the second neural network model is the language interpretation model, which performs pre-processing to convert the user query into real-valued embeddings (word vectors/embeddings)));

using the profile query model, processing the real-valued embeddings associated with the user query (Setlur, P[0041]: "computing relatedness between the first word vector and the second word vector using a similarity metric." (Setlur's analytical engine (profile query model) processes the query-derived vectors to find the correct data intersection));

transmitting the verified response (Setlur, P[0006]: "The device then updates the data visualization based on the first set of one or more functional phrases." (updating the data visualization reads on transmitting the system's output to the user (verified because it is generated after the interpretation model and analytical functions have been applied))) in a conversational program related to the user query (Setlur, P[0005]: "By modeling the interaction behavior as a conversation", "providing a natural language interface… for an interactive query dialog", "the natural language interface provides appropriate visualization responses… through various techniques for deducing the grammatical and lexical structure of utterances."
(the response (visualization) is produced by "deducing" the structure of the user's specific utterance (query), leading to the output being related to the user query)).

Setlur does not teach:

using a language interpretation model, pre-processing the user query and the one or more data profile attributes;

and the one or more data profile attributes to generate a preliminary response;

post-processing the preliminary response to generate a verified response by applying a verification program to verify factual accuracy and confidentiality of the preliminary response.

However, Srinivasulu teaches:

using a language interpretation model, pre-processing the user query and the one or more data profile attributes (Srinivasulu, P[0016]: "For example, processing module 104 can generate metadata profiles based on input data 102. In some embodiments, the metadata profiles, not the input data itself, is fed to prediction module 106." (Srinivasulu teaches that data profile attributes (metadata profiles) are the foundational inputs required by the processing module to interpret and categorize incoming data requests));

and the one or more data profile attributes to generate a preliminary response (Srinivasulu, P[0016]: "the metadata profiles, not the input data itself, is fed to prediction module 106." (Srinivasulu teaches here that data profile attributes are the core data structures processed by the machine learning model to generate its output));

post-processing the preliminary response to generate a verified response by applying a verification program to verify factual accuracy and confidentiality of the preliminary response (Srinivasulu, P[0014]: "the processed incoming data (e.g., metadata profiles) can be fed to the trained machine learning model as input, and the model can generate predictions to improve the quality of the incoming data. For example, data categorization predictions can be generated that categorize this incoming data. These category predictions can improve the organization/structure of the data, and thus improve the data quality." (this ML model acts as the corrective program; it takes the initial data state (preliminary response or categorization) and applies quality-improvement logic (standardization/validation) to generate a verified response that is organized/structured)).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Setlur in view of Srinivasulu. Doing so would have provided the metadata profile generation of Srinivasulu (Srinivasulu, P[0016]) with the grammatical and lexical structure of utterances used to create visualizations in Setlur (Setlur, P[0049]), and thus would have added integrity and completeness to the data being visualized, allowing the combination to form a system capable of verifying that the activation pattern triggered by a user's query actually corresponds to viable, non-null data.

Regarding claim 2, Setlur teaches: receiving a user query concerning one or more data profiles associated with the data profiler (Setlur, P[0006]: "The device displays a data visualization based on a dataset retrieved from a database using a first set of one or more database queries. A user specifies a first natural language command related to the displayed data visualization." (the "natural language command" is the literal user query.
Since the command is "related to the displayed data visualization," which is built from the "retrieved dataset" (the data being profiled), the query is explicitly related to the data profiler));

using an interpretation model, pre-processing the user query and one or more data profile attributes of the data profiler to determine an activation pattern for the profile query model, wherein the interpretation model is trained to produce real-valued embeddings associated with user queries (Setlur, P[0041]: "computing a first word vector for a first word in a first phrase in the second set of one or more analytic phrases using a second neural network model, the first word vector mapping the first word to the word embeddings; computing a second word vector for a first data attribute in the one or more data attributes using the second neural network model, the second word vector mapping the first data attribute to the word embeddings; and computing relatedness between the first word vector and the second word vector using a similarity metric." (the second neural network model is the interpretation model; it generates real-valued embeddings (word vectors/word2vec) to map natural language to data attributes, and in neural network processing the mapping of a query vector to a specific data attribute neuron constitutes the determination of an activation pattern));

using the profile query model to process the user query (Setlur, P[0041]: "computing relatedness between the first word vector and the second word vector using a similarity metric." (the second neural network model is the profile query model, which processes the user query by converting it into a word vector and calculating its mathematical relatedness to the system's known data structures)) and generate a preliminary response (Setlur, P[0016]: "the natural language interface provides appropriate visualization responses", "through various techniques for deducing the grammatical and lexical structure of utterances" (the analytical engine (profile query model) takes the processed query and metadata attributes to generate the "visualization response"; this output constitutes the preliminary response before any final "corrective" logic is applied to verify the results));

transmitting the verified response (Setlur, P[0006]: "The device then updates the data visualization based on the first set of one or more functional phrases." (updating the data visualization reads on transmitting the system's output to the user (verified because it is generated after the interpretation model and analytical functions have been applied))) in a conversational program related to the user query (Setlur, P[0005]: "By modeling the interaction behavior as a conversation", "providing a natural language interface… for an interactive query dialog", "the natural language interface provides appropriate visualization responses… through various techniques for deducing the grammatical and lexical structure of utterances." (the response (visualization) is produced by "deducing" the structure of the user's specific utterance (query), leading to the output being related to the user query)).
Setlur does not teach:

receiving a data profiler configured to create and interface with a plurality of data profiles;

training a profile query model to generate responses to user queries relating to data profiles based on training data comprising data profile attributes from the data profiler;

and the one or more data profile attributes of the data profiler;

post-processing the preliminary response to generate a verified response by applying a corrective program, wherein the verified response is checked for accuracy and confidentiality.

However, Srinivasulu teaches:

receiving a data profiler configured to create and interface with a plurality of data profiles (Srinivasulu, P[0016]: "processing module 104 can generate metadata profiles based on input data 102. In some embodiments, the metadata profiles, not the input data itself, is fed to prediction module 106." (the "processing module" reads on the data profiler, which extracts features to "generate metadata profiles" that serve as the structural representation (the plurality of data profiles) the system interfaces with to perform downstream analysis));

training a profile query model to generate responses to user queries relating to data profiles based on training data comprising data profile attributes from the data profiler (Srinivasulu, P[0033]: "the neural network model can be built/trained based on the profile of data (e.g., a metadata profile) not on the actual data." (Srinivasulu explicitly teaches training a model (the profile query model) specifically using the attributes found within the metadata profiles rather than raw data values));

and the one or more data profile attributes of the data profiler (Srinivasulu, P[0016]: "processing module 104 can generate metadata profiles based on input data 102. In some embodiments, the metadata profiles, not the input data itself, is fed to prediction module 106." (the "metadata profiles" consist of the one or more data profile attributes (categorical, numeric, etc.) that are fed to the model, allowing the system to use the data profile attributes of the data profiler as the basis for analytical operation));

post-processing the preliminary response to generate a verified response by applying a corrective program, wherein the verified response is checked for accuracy and confidentiality (Srinivasulu, P[0014]: "the processed incoming data (e.g., metadata profiles) can be fed to the trained machine learning model as input, and the model can generate predictions to improve the quality of the incoming data. For example, data categorization predictions can be generated that categorize this incoming data. These category predictions can improve the organization/structure of the data, and thus improve the data quality." (this ML model acts as the corrective program; it takes the initial data state (preliminary response or categorization) and applies quality-improvement logic (standardization/validation) to generate a verified response that is organized/structured)).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Setlur in view of Srinivasulu. Doing so would have provided the metadata profile generation of Srinivasulu (Srinivasulu, P[0016]) with the grammatical and lexical structure of utterances used to create visualizations in Setlur (Setlur, P[0049]), and thus would have added integrity and completeness to the data being visualized, allowing the combination to form a system capable of verifying that the activation pattern triggered by a user's query actually corresponds to viable, non-null data.
Regarding claim 3, the combination of Setlur and Srinivasulu teaches the method disclosed in claim 2. Setlur further teaches: wherein determining the activation pattern for the profile query model comprises: determining supplemental data, wherein the supplemental data is computed based on the plurality of data profiles (Setlur, P[0005]: "Through various techniques for deducing the grammatical and lexical structure of utterances and their context, the natural language interface supports various pragmatic forms of natural language interaction with visual analytics. These pragmatic forms include understanding incomplete utterances, referring to entities within utterances and visualization properties, supporting long, compound utterances, identifying synonyms and related concepts" (the synonyms and related concepts here are the supplemental data computed from the profiles)); determining an algorithm for the profile query model (Setlur, see mapping below); and determining a mathematical formula from a set of mathematical formulae, wherein the mathematical formula is selected for applicability to the user query (Setlur, P[0041]-P[0043]: "computing relatedness between the first word vector and the second word vector using a similarity metric… based at least on (i) Wu-Palmer distance… (ii) a weighting factor, and (iii) a pairwise cosine distance" (the similarity metric (comprising cosine distance and Wu-Palmer) is the algorithm, and the specific mathematical formula is selected for applicability to the user query to determine the activation pattern)).
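The Setlur similarity metric quoted above blends a pairwise cosine distance with a Wu-Palmer distance under a weighting factor. A rough editorial sketch of such a blended metric follows; the 0.5 weight is assumed, and the Wu-Palmer score is passed in as a precomputed value, since it normally comes from a lexical taxonomy such as WordNet rather than from the vectors themselves:

```python
import math

# Sketch of a weighted relatedness metric of the kind Setlur describes:
# embedding-space cosine similarity blended with a lexical (Wu-Palmer)
# score. Weight and inputs are illustrative assumptions.

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def relatedness(u, v, wup_score, weight=0.5):
    """Blend vector similarity with a taxonomy-based similarity score."""
    return weight * cosine_similarity(u, v) + (1 - weight) * wup_score

# "cost" vs. "price": identical toy vectors plus a high Wu-Palmer score
score = relatedness([1.0, 0.0], [1.0, 0.0], wup_score=0.9)
# score is approximately 0.95
```

The point of the blend is that lexically related terms (cost/price) can score highly even when their raw embeddings alone would not.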
Regarding claim 4, the combination of Setlur and Srinivasulu teaches the method disclosed in claim 2. Setlur further teaches: wherein the interpretation model is a language processing model trained to correspond user queries to embeddings used as input to the profile query model (Setlur, P[0041]: "relatedness by performing a sequence of operations that comprises: training a first neural network model on a large corpus of text, thereby learning word embeddings; computing a first word vector for a first word in a first phrase in the second set of one or more analytic phrases using a second neural network model" (the second neural network model reads on the interpretation model, a language processing model (RNN/word2vec) specifically trained to convert user queries into real-valued embeddings (word vectors))).

Regarding claim 5, the combination of Setlur and Srinivasulu teaches the method disclosed in claim 2, wherein the profile query model is trained to use embeddings as input to generate a text-based response to queries related to the data profiler (Setlur, P[0005]: "The natural language interface allows users to access complex functionality using ordinary questions or commands… The natural language interface provides appropriate visualization responses either within an existing visualization or by creating new visualizations when necessary, and resolves ambiguity through targeted textual feedback and ambiguity widgets. In this way, the natural language interface allows users to efficiently explore data displayed (e.g., in a data visualization) within the data visualization application." (the system (profile query model) consumes the embeddings from the query (input) to generate targeted textual feedback (the text-based responses) regarding the dataset and the user's intent)).
Regarding claim 6, the combination of Setlur and Srinivasulu teaches the method disclosed in claim 2. Srinivasulu further teaches: wherein the corrective program compares the activation pattern output by the interpretation model against data profile attributes to determine feasibility (Srinivasulu, P[0014]: "predictions can be generated that categorize this incoming data… and thus improve the data quality" (Srinivasulu's rules (the corrective program) compare the categorized input (the activation pattern) against the metadata profiles (the data profile attributes) to ensure the categorization is high quality and logically sound (the feasibility))), comprising: using the data profiler, determining a null type and a null value distribution (Srinivasulu, P[0049]: "rules can target the data values… some data retrieved and/or processed can include missing data, null data" (the rules (corrective program) identify null data and its frequency/location (null value distribution) within the metadata attributes)); based on the null type and the null value distribution, determining a measure of correspondence with the activation pattern, wherein the measure of correspondence indicates an extent of null values and null types in data required to complete the activation pattern (Srinivasulu, P[0014]: "predictions can be generated that categorize this incoming data. These category predictions can improve the organization/structure of the data, and thus improve the data quality." (it is determined here whether the incoming data (the activation pattern) has sufficient quality; if the extent of null values is too high, the measure of correspondence fails to meet quality standards)); and based on the measure of correspondence, determining a feasibility of the activation pattern (Srinivasulu, P[0049]: "Rules can be generated that clean and/or discard certain data elements/values/rows." (if the measure of correspondence is too low (too many nulls), the system will discard the element, thus determining a feasibility or lack of feasibility for that specific activation pattern)).

Regarding claim 7, the combination of Setlur and Srinivasulu teaches the method disclosed in claim 2. Setlur further teaches: comparing mathematical formulae output by the interpretation model against the preliminary response, comprising (Setlur, P[0004], P[0041]: "computing relatedness… using similarity metric" (the similarity metric (the corrective program) compares the intent derived from the query (the mathematical formulae) against the potential data mappings (the preliminary response) to ensure a semantic match; the formal query described in P[0004] contains mathematical formulae (e.g., SUM, AVG) which are compared against the visualization output to ensure the command was executed correctly)): using the mathematical formulae and the plurality of data profiles, generating an expected result (Setlur, P[0033]: "combining the first analytical function with the second analytical function by applying one or more logical operators" (Setlur uses mathematical formulae (analytical functions/logical operators) and the data profiles to calculate a predicted or expected result)); extracting a reported result from an embedding of the preliminary response (Setlur, P[0005]: "'repairing' responses to previous utterances. Furthermore, the natural language interface provides appropriate visualization responses either within an existing visualization or by creating new visualizations when necessary, and resolves ambiguity" (the repair mechanism extracts the current state (the reported result) from the visualization's embedding, compares it to the expected result, and adjusts for accuracy (resolves ambiguity))); and comparing the expected result against the reported result to determine a measure of accuracy (Setlur, P[0005], see mapping above).
Setlur does not teach: wherein the corrective program compares to determine accuracy. However, Srinivasulu further teaches: wherein the corrective program compares to determine accuracy (Srinivasulu, P[0014]: "improving the data quality of the data… categorized using the neural network model." (ensuring the model output meets a specific quality threshold is the functional equivalent of accuracy)).

Regarding claim 8, the combination of Setlur and Srinivasulu teaches the method disclosed in claim 2. Srinivasulu further teaches: further comprising the corrective program using metadata of the data profiler to determine data integrity (Srinivasulu, P[0033]): based on privacy metadata associated with a data profile, determining a disclosable dataset, wherein the disclosable dataset comprises data in the data profile suitable for answering the user query (Srinivasulu, P[0033]: "neural network model can be built/trained based on the profile of data (e.g., a metadata profile) not on the actual data. This can cause embodiments of the tool to be flexible, widely usable, secure, and easily extendable." (Srinivasulu's system identifies the metadata profile (privacy metadata) as the disclosable dataset that is secure for answering queries without revealing the "actual" sensitive data)); and modifying the preliminary response to contain only data from the disclosable dataset (Srinivasulu, P[0016]: "In some embodiments, the metadata profiles, not the input data itself, is fed to prediction module 106." (the system modifies the response (the input to the prediction module) to ensure it contains only data from the disclosable dataset (metadata profile) rather than the raw "input data")).
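The claim 8 mapping (privacy metadata determining a disclosable dataset, with the preliminary response restricted to it) amounts to a filtering step. A minimal editorial sketch, with the field names and metadata shape assumed for illustration:

```python
# Sketch of a claim-8-style privacy gate: use privacy metadata to
# determine a disclosable dataset and restrict the response to it.
# Field names and the metadata shape are illustrative assumptions.

privacy_metadata = {"price": "public", "customer_ssn": "private"}

def disclosable(dataset, metadata):
    """Keep only fields marked public in the privacy metadata."""
    return {k: v for k, v in dataset.items() if metadata.get(k) == "public"}

preliminary = {"price": [9.99, None, 4.50], "customer_ssn": ["123-45-6789"]}
verified = disclosable(preliminary, privacy_metadata)
# verified == {"price": [9.99, None, 4.50]}
```

Note the asymmetry the rejection leans on: Srinivasulu avoids disclosure by operating on metadata profiles instead of raw data, whereas the claim filters the response itself.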
Regarding claim 9, the combination of Setlur and Srinivasulu teaches the method disclosed in claim 2. Setlur further teaches: further comprising training the interpretation model (Setlur, P[0041]), comprising: generating a training dataset comprising data profile descriptions in plain text (Setlur, P[0041]: "training a first neural network model on a large corpus of text"); training the interpretation model using a language processing algorithm to generate real-valued embeddings corresponding to input text tokens (Setlur, P[0041]: "learning word embeddings; computing a first word vector for a first word in a first phrase in the second set of one or more analytic phrases using a second neural network model" (Setlur trains the model using an algorithm (Word2vec/RNN) to generate real-valued embeddings (word vectors) from text tokens (words))); and based on the real-valued embeddings, training the interpretation model to correlate embeddings to activation patterns (Setlur, P[0041]: "the first word vector mapping the first word to the word embeddings" (Setlur trains the model to correlate (map) the real-valued embeddings to the specific data attributes and analytical intents (activation patterns))).
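The claim-9 mapping above (real-valued embeddings correlated to intents through a similarity metric) can be illustrated with a toy cosine-similarity sketch. The two-dimensional vectors and the tiny vocabulary are invented for illustration; they are not Setlur's trained embeddings or the application's model.

```python
# Illustrative sketch: map text tokens to real-valued embeddings, then
# correlate a token's embedding to the closest known intent via cosine
# similarity (a stand-in for the "similarity metric" of Setlur P[0041]).
import math

EMBEDDINGS = {  # toy word vectors standing in for trained ones
    "average": [0.9, 0.1],
    "mean": [0.85, 0.2],
    "count": [0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def nearest_intent(token, intents):
    """Pick the intent whose vector is most similar to the token's embedding."""
    vec = EMBEDDINGS[token]
    return max(intents, key=lambda i: cosine(vec, EMBEDDINGS[i]))

nearest_intent("mean", ["average", "count"])  # maps "mean" to "average"
```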
Regarding claim 10, the combination of Setlur and Srinivasulu teaches the method disclosed in claim 2. Srinivasulu further teaches: wherein the data profiler produces metadata attributes based on a dataset (Srinivasulu, P[0041]), including: a number of null values in the dataset (Srinivasulu, P[0049]: "null data"); a range (Srinivasulu, P[0041]: "metadata profile fields can include… length" (range)); and a variable type for each feature in the dataset (Srinivasulu, P[0041]: "categorical; alphabetic; numeric; alphanumeric; maxlength; meanlength"). Srinivasulu does not teach: median and quartile values of the dataset; standard deviation and skewness values of the dataset. However, Setlur teaches: median and quartile values of the dataset; standard deviation and skewness values of the dataset (Setlur, P[0004]: "understand a data set visually, including distribution, trends, outliers" (calculating the distribution as taught by Setlur inherently requires producing the median, quartiles, standard deviation, and skewness to identify outliers and trends)).

Regarding claim 11, the combination of Setlur and Srinivasulu teaches the method disclosed in claim 2. Srinivasulu further teaches: wherein the data profiler produces data structures based on a dataset (Srinivasulu, P[0021]), including: a histogram based on categories of feature values (Srinivasulu, P[0021]: "visual representation of the metadata profiles… can be displayed as a trend or a group." (a "visual representation" of "groups" and "categories" is a histogram)). Srinivasulu does not teach: an inferred distribution for one or more variables in the dataset; and a linear regression model based on the dataset. However, Setlur teaches: an inferred distribution for one or more variables in the dataset (Setlur, see mapping below); and a linear regression model based on the dataset (Setlur, P[0004]: "understand a data set visually, including distribution, trends, outliers." (Setlur's "visual understanding" of a distribution and trends represents the production of an inferred distribution and a linear regression model (trend line) based on the dataset)).

Regarding claim 12, Srinivasulu teaches: one or more non-transitory computer-readable media comprising instructions that, when executed by one or more processors, cause operations comprising (Srinivasulu, P[0026]): receiving a library configured to create and interface with a plurality of data profiles (Srinivasulu, P[0016]: “processing module 104 can generate metadata profiles based on input data 102.” (Srinivasulu's processing module (the library) creates the metadata profiles (the plurality of data profiles) based on input datasets); Srinivasulu, P[0033]: “the neural network model can be built/trained based on the profile of data (e.g., a metadata profile) not on the actual data” (using the profiles as the interface for training and querying represents the literal interface with the data profiles)); receiving a profile query model trained to answer user queries relating to data profiles (Srinivasulu, P[0014]: “prediction module 106 can be a machine learning module (e.g., neural network)”, “predictions can be generated that categorize this incoming data.” (Srinivasulu's neural network (the profile query model) is used to categorize and answer inquiries regarding data status)) based on data profile attributes (Srinivasulu, P[0033]: “the neural network model can be built/trained based on the profile of data (e.g., a metadata profile)” (the metadata profile contains the data profile attributes (e.g., data types, ranges) used for this training)); pre-processing using one or more data profile attributes (Srinivasulu, P[0016]: “metadata profiles… is fed to prediction module 106.” (Srinivasulu discloses the data profile attributes (metadata) as the required input for the model's pre-processing phase to identify relevant data categories)) to process the library (Srinivasulu, P[0006]: “A user specifies a first natural language command related to the displayed data visualization.” (the visualization is an expression of the underlying data profiles (metadata), so the command is directed at the data profiles stored within the system (library))). Srinivasulu does not teach: receiving a user query concerning one or more data profiles associated with the library; pre-processing using one or more data profile attributes; transmitting the response in a conversational program related to the user query. However, Setlur teaches: receiving a user query concerning one or more data profiles associated with the library (Setlur, P[0004], P[0005]: “providing a natural language interface… interactive query dialog… using ordinary questions” (Setlur's ordinary questions constitute the user query); P[0006]: “A user specifies a first natural language command related to the displayed data visualization.” (the visualization is an expression of the underlying data profiles/metadata, so the command is directed at the data profiles stored within the system (library))); using an interpretation model, pre-processing the user query to determine an activation pattern for the profile query model, wherein the interpretation model is trained to produce real-valued embeddings associated with user queries (Setlur, P[0041]: Setlur's second neural network model (the interpretation model) converts the user query into word embeddings (the real-valued embeddings)).
Setlur further teaches using the model to determine an activation pattern for the profile query model; mapping a vector to a specific intent or data feature through similarity is the functional determination of an activation pattern; using the profile query model to process the user query and generate a response (Setlur, P[0041]: “computing relatedness between the first word vector and the second word vector using a similarity metric.” (the analytical engine (profile query model) processes the query against the data structure to find the “relatedness,” resulting in the visualization output (the response))); transmitting the response in a conversational program (Setlur, P[0005]: “as a conversation… the natural language interface provides appropriate visualization responses” (the delivery of the visualization within the conversation is the transmission in the conversational program)) related to the user query (Setlur, P[0005]: “deducing the grammatical and lexical structure of utterances” (the response is generated specifically based on the structure of the user's utterance (query), making it inherently related)).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Setlur in view of Srinivasulu. Doing so would have provided the metadata profile generation of Srinivasulu (Srinivasulu, P[0016]) with the grammatical and lexical structure of utterances to create visualizations of Setlur (Setlur, P[0049]), and thus would have added integrity and completeness to the data being visualized, allowing the combination to form a system capable of verifying that the activation pattern triggered by a user's query actually corresponds to viable, non-null data.

Regarding claim 13, claim 13 recites the non-transitory computer-readable media corresponding to the method presented in claim 3 and is rejected under the same grounds stated above.
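Claims 10 and 11 above turn on the statistical attributes a data profiler emits: null counts, range, variable types, median, quartiles, standard deviation, and skewness. A minimal, hypothetical sketch of computing those attributes for one numeric column (this is an illustration, not the application's actual profiler):

```python
# Hypothetical profiler for a single numeric column, producing the
# metadata attributes named in claims 10-11.
import statistics

def profile_column(values):
    """Return null count, range, median, quartiles, std deviation, skewness."""
    nums = [v for v in values if v is not None]
    n = len(nums)
    mean = statistics.fmean(nums)
    sd = statistics.pstdev(nums)
    q1, med, q3 = statistics.quantiles(nums, n=4)  # quartile cut points
    # Population skewness: third standardized moment (0.0 if sd is 0).
    skew = (sum((x - mean) ** 3 for x in nums) / n) / (sd ** 3) if sd else 0.0
    return {
        "nulls": len(values) - n,
        "range": (min(nums), max(nums)),
        "median": med,
        "quartiles": (q1, q3),
        "std_dev": sd,
        "skewness": skew,
    }

profile_column([1, 2, 3, 4, 5, None])  # symmetric data: skewness 0, one null
```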
Regarding claim 14, claim 14 recites the non-transitory computer-readable media corresponding to the method presented in claim 4 and is rejected under the same grounds stated above.

Regarding claim 15, claim 15 recites the non-transitory computer-readable media corresponding to the method presented in claim 5 and is rejected under the same grounds stated above.

Regarding claim 16, claim 16 recites the non-transitory computer-readable media corresponding to the method presented in claim 6 and is rejected under the same grounds stated above.

Regarding claim 17, claim 17 recites the non-transitory computer-readable media corresponding to the method presented in claim 7 and is rejected under the same grounds stated above.

Regarding claim 18, claim 18 recites the non-transitory computer-readable media corresponding to the method presented in claim 8 and is rejected under the same grounds stated above.

Regarding claim 19, claim 19 recites the non-transitory computer-readable media corresponding to the method presented in claim 9 and is rejected under the same grounds stated above.

Regarding claim 20, claim 20 recites the non-transitory computer-readable media corresponding to the method presented in claim 10 and is rejected under the same grounds stated above.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHASHIDHAR S MANOHARAN whose telephone number is (571) 272-6772. The examiner can normally be reached M-F 8:00-4:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at 571-272-7516.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHASHIDHAR SHANKAR MANOHARAN/
Examiner, Art Unit 2655

/ANDREW C FLANDERS/
Supervisory Patent Examiner, Art Unit 2655

Prosecution Timeline

Jul 12, 2024
Application Filed
Mar 25, 2026
Non-Final Rejection — §101, §103 (current)


Prosecution Projections

1-2
Expected OA Rounds
100%
Grant Probability
99%
With Interview (+0.0%)
2y 9m
Median Time to Grant
Low
PTA Risk
Based on 2 resolved cases by this examiner. Grant probability derived from career allow rate.
