Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
The arguments and amendments filed 10/16/2025 have been examined.
Claims 1, 14, 20, 28, 41, 47, and 55 have been amended, and
Claims 17 and 44 have been cancelled.
Thus, Claims 1-16, 18-43, and 45-55 are currently pending.
This Office Action is Final.
Response to Arguments
As to the arguments concerning the rejections under 35 USC 103, Applicant’s arguments with respect to the claims as amended have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant’s arguments with respect to the subject matter rejections under 35 USC 101 have been fully considered and are persuasive. The subject matter rejections of Claims 1-27 under 35 USC 101 have been withdrawn as a result of the recent amendments, as claim 1 now recites statutory computer hardware/processors. Additionally, Applicant’s arguments with respect to the abstract idea rejections under 35 USC 101 have been fully considered and are persuasive. The abstract idea rejections under 35 USC 101 have been withdrawn.
Applicant’s arguments, with respect to the rejection(s) of claim(s) 14 and 41 under 35 USC 112(b), and the removal of the term “unexpected elements” have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of 35 USC 112(a). See the related rejection below.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-16, 18-43, and 45-55 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claims 1, 28, and 55 recite: “generating, a second prompt to cause the large language model to generate a natural language response to the natural language user query based on the results of the graph database query;”. However, the Examiner has searched the relevant priority documents and specification and could not find support for this limitation.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 7, 9-16, 18-19, 27-31, 34, 36-43, 46 and 54-55 are rejected under 35 U.S.C. 103 as being unpatentable over Difonzo et al., US Pub. No. 2022/0414228, in view of Xu et al., US Pub. No. 2025/0200044 A1, and further in view of Naufel et al., US Pub. No. 2024/0362208 A1.
As to claim 1 (and substantially similar claims 28 and 55):
Difonzo discloses:
A method for querying a graph database using natural language queries,
the method comprising, by one or more processors of a computing system:
(Difonzo abstract; [0012] and Fig. 13)
receiving a natural language user query via a user interface of the computing
system;
(Difonzo abstract: the methods may comprise receiving a first input from a user comprising a natural language query regarding data in a graph database; see also [0007] Disclosed herein are methods and systems for automated processing and translation of natural language based
user queries into graph database queries that facilitate ease-of-use and enable efficient access to the powerful analytical capabilities of graph database systems, e.g., graph-based cybersecurity analysis systems.;
see also [0002] The present disclosure relates generally to the use of graph databases for cybersecurity analysis, and more specifically to methods and systems for translating natural
language-based user queries into graph database queries to facilitate ease-of-use and efficient access to graph database analytical tools.)
identifying one or more node types from a type graph in the natural language user
query, wherein the one or more node types correspond to one or more words or phrases in
the natural language user query;
(Difonzo teaches using machine learning/large language models to generate, from the natural language query, a corresponding query in the formal graph database query language based on determined user intent, i.e. “identifying one or more node types from a type graph in the natural language user query”
See [0049] As noted above, the disclosed methods use contextual natural language processing and machine learning to go beyond rigid, pre-defined graph database queries and
support flexible, ad hoc user queries posed using natural language. The methods infer user intent from the natural language query and generate a corresponding query in the formal graph database query language.
[0114] And finally generates the resulting query as translated to the Neo4j Cypher language (the formal graph database query language for the CyGraph platform):
[0077] The CyGraph cybersecurity analysis system employs property graphs, i.e., attributed, multi-relational graphs with nodes and edges that may have arbitrary properties. Property
graphs have the ability to express and visualize a range of heterogeneous node and edge types which arise from combining data from a variety of sources into a coherent unified cybersecurity graph model. However, this richness of data types and properties contributes to the complexity of formulating user queries. Again, the disclosed methods and systems facilitate ease-of-use and enable users to pose questions that take full advantage of the powerful analytical
capabilities of graph database platforms such as CyGraph;
see also [0009] In some embodiments, the intent classification machine learning model comprises a Bidirectional Encoder Representations from Transformers (BERT) model, a long short-term memory (LSTM) model, or a Naive Bayes model.)
Difonzo does not disclose:
generating, a first prompt to cause a large language model to generate (a) a graph database query based on the one or more node types identified in the natural language user query and (b) natural language information directed to the graph database query;
displaying, via the user interface, the natural language information directed to the
graph database query;
querying a graph database using the graph database query generated by the large
language model;
receiving results of the graph database query;
generating a visualization corresponding to the natural language response,
wherein the visualization comprises an indication of one or more nodes and/or edges
from the graph database that were used to generate the natural language response; and
displaying the visualization via the user interface;
however, Xu discloses:
generating, a first prompt to cause a large language model to generate
(a) a graph database query based on the one or more node types identified in the natural language user query and (b) natural language information directed to the graph database query
(Xu teaches generating a prompt to cause an LLM to function as a query tagger or as an intent/entity extractor, i.e. “generating, a first prompt to cause a large language model to generate (a) a graph database query… and (b) natural language information”
See [0127-0128] [0127] In more detail, some embodiments of graph-based query parsing component 406 communicate bidirectionally with AI model service 416, e.g., a large language model operating as query tagger 419, via communications 444, to generate a tagged version of the query based on the intents and/or entities identified by the large language model operating
as intent/entity extractor 418. For example, communications 444 can include embodiments of graph-based query parsing component 406 passing the query and the identified intents and/or entities to the large language model with an instruction to cause the large language model to
function as a query tagging component. [0128] An example of a prompt configured to cause a large language model to function as query tagger 419 is shown in Table 6.
See also [0123-0125] [0123] That is, the instruction is or is included in a prompt that is configured for input to the large language model. The query is passed to the large language model along with the instruction, e.g., as an argument of the prompt. In some embodiments, the prompt also includes or references a predefined template, such as graph template 414 [0125] As shown in Table 5, the illustrative input understanding prompt contains an instruction to cause a large language model to function as an intent/entity extractor 418. The illustrated prompt also includes a number of examples and other instructions that are designed to constrain the large
language model's processing of the input query based on a template and to avoid AI hallucination.)
displaying, via the user interface, the natural language information directed to the
graph database query;
(Xu teaches path displays with reasoning, via Fig. 1C path 171/reasoning 172 and Fig. 1D path 185 and reasoning 186, i.e. displaying “the natural language information directed to the graph database query”; see also [0131] The graph path identification component 408 uses the output of the input understanding component 404 and the graph-based query parsing component 406 to generate a path through the graph of the document dataset. The path includes the intent and/or entities identified by the input understanding component 404 and the graph-based query parsing component 406. Using the example of FIG. 1A, the path includes a node that corresponds to the query intent 116, a node that corresponds to the entity 118, and at least one edge connecting the query intent node with the entity node.;
see also [0091] The generative graph-enhanced information retrieval system interprets the second query 182 in a similar way as described above with respect to the query 168, and generates and outputs subgraph 184, path 185;
see also [0087] generative graph-enhanced information retrieval system generates and outputs reasoning 172. Reasoning 172 includes a natural language explanation of the process by
which the generative graph-enhanced information retrieval system identified the path 171. Reasoning 172 includes a description of how the query 168 was parsed, how the subgraph 170 was identified, and the graph query used to identify the path 171. Reasoning 172 is machine-generated using the approaches described herein; e.g., via a generative artificial intelligence model.)
querying a graph database using the graph database query generated by the large
language model;
(Xu teaches that a large language model generates a graph query; see [0135] In other embodiments, graph query generation and execution are executed in an iterative process in which the large language model generates a graph query in a language that can be executed on the identified subgraph(s) by the graph database management system 428, the graph database management system 428 executes the large language model-generated graph query)
receiving results of the graph database query;
(Xu teaches interactive graph queries with path visualization and answers using a graph database management system, i.e. “receiving results of the graph database query”; see [0083] Various portions of the path visualization 150 are
implemented as interactive graphical user interface controls. As illustrated by element 160, each node and edge of the subgraph 151 is implemented as an interactive control such that selecting ( e.g., clicking or tapping on) a node or edge causes the path visualization 150 to display details about the selected node or link, such as information about a node or its schema, or weights and labels associated with an edge. For instance, navigating element 160 over a node or an edge
causes additional information about the node or edge to be displayed. Alternatively or in addition, selecting a node or edge via, e.g., element 160, enables one or more properties
of the node or edge to be modified and/or enables the node or edge to be added to or removed from the path 161. In some embodiments, a graph database management system,
such as NE04J, is used to generate, display, and manipulate the graphs, paths, and graph queries described herein
See also [0088] Based on receipt of a signal via control mechanism 174, the generative graph-enhanced information retrieval system can determine to proceed with generating and outputting an answer 176. The answer 176 includes a natural language description that is responsive to
the intent of the query 168. The answer 176 is machine generated using the approaches described herein; e.g., via a generative artificial intelligence model.;
see also Xu Fig. 1C item 176 “answer”;
See also [0088] The signal can be a result of a user input, such as an interaction with a user interface presentation of a graph and/or path as described herein, and/or generated automatically by the generative graph-enhanced information retrieval system.)
generating a visualization corresponding to the natural language response,
wherein the visualization comprises an indication of one or more nodes and/or edges
from the graph database that were used to generate the natural language response;
(Xu teaches displaying path displays with reasoning, via Fig. 1C path 171/reasoning 172 and Fig. 1D path 185 and reasoning 186
i.e. “generating a visualization corresponding to the natural language response,
wherein the visualization comprises an indication of one or more nodes and/or edges”
see also [0131] The graph path identification component 408 uses the output of the input understanding component 404 and the graph-based query parsing component 406 to generate a path through the graph of the document dataset. The path includes the intent and/or entities identified by the input understanding component 404 and the graph-based query parsing component 406. Using the example of FIG. 1A, the path includes a node that corresponds to the query intent 116, a node that corresponds to the entity 118, and at least one edge connecting the query intent node with the entity node.;)
and
displaying the visualization via the user interface.
(Xu teaches displaying path displays with reasoning, via Fig. 1C path 171/reasoning 172 and Fig. 1D path 185 and reasoning 186
i.e. “displaying the visualization via the user interface”; see also [0131] The graph path identification component 408 uses the output of the input understanding component 404 and the graph-based query parsing component 406 to generate a path through the graph of the document dataset. The path includes the intent and/or entities identified by the input understanding component 404 and the graph-based query parsing component 406. Using the example of FIG. 1A, the path includes a node that corresponds to the query intent 116, a node that corresponds to the entity 118, and at least one edge connecting the query intent node with the entity node.;)
It would have been obvious to one having ordinary skill in the art at the time of the effective filing date to apply graph query visualizations as taught by Xu to the system of Difonzo since it was known in the art that query systems provide for leveraging a large language model to accomplish many of the steps of the online, real-time searching and retrieval process, thereby
enabling flexible graph construction and exploration where the online query processing pipeline
described can achieve real-time data processing by using the described intent prediction and query tagging approaches, and can achieve efficient online real-time graph navigation
via the described online graph path identification approaches, including EBR-based graph node information fetching, and are capable of providing online real-time graph interaction and path modification where embodiments achieve acceptable latency for realtime applications by using embedding-based retrieval initially to narrow down the search space prior to graph construction and path determination and alternatively or in addition, the use of a large language models at one or more phases of the query processing pipeline can further improve latency to make real-time processing feasible (Xu [0039]).
Difonzo/Xu do not explicitly disclose:
generating a second prompt to cause the large language model to generate a natural language response to the natural language user query based on the results of the graph database query;
However, Naufel discloses:
generating a second prompt to cause the large language model to generate a natural language response to the natural language user query based on the results of the graph database query;
(Naufel teaches an iterative graph-based Natural Language Processing (NLP) system that generates natural language summaries and reports based on the data structure and the user's query, producing natural language summarizations/narratives in response to natural language questions, i.e. “generating a second prompt to cause the large language model to generate a natural language response to the natural language user query based on the results of the graph database query”
See Fig. 11 item 1151: “The data you have is related to rental properties, their locations, units, leases, tenants, and work orders. With this data, you can tell a story about the rental property market in different cities and states, the types of units available, their rental prices, and the tenants who ....”,
see also [0194-0195] [0195] In particular, there is another web page utility GUI providing more detailed visualization produced by GPT system 200 responsive to the user questions and requests described in the context of FIG. 10. For instance, shown here are various rental property data with visualizations generated by GPT system 200. The top sub-interface 1151 has
summarized the question from sub-interface 1051 as "What is my data about?" The top sub-interface 1151 thus provides a GPT system 200 generated narrative summarizing the data,
for instance, describing the data as being "related to rental properties, their locations, units, leases, tenants," and so forth.;
See also [0241-0243] [0241] Computing device 100 may execute the structured data query against the graph database 203 (1440). For example, processing circuitry 199 of computing device 100 may execute the structured data query against the graph database 203.
[0242] Computing device 100 may return output to a user in response to the question (1445). For example, processing circuitry 199 of computing device 100 may return output in a structured format to a user-device having originated the user-input. ;
See also [0292] According to alternative embodiments of the exemplary system, the natural language processing component includes a natural language understanding component
that understands the intent of the user's natural language input and a natural language generation component that generates natural language summaries and reports based on
the data structure and user's query.;
see also [0315] According to alternative embodiments of the exemplary system, the natural language generation component utilizes advanced text generation techniques, such as
neural networks and deep learning, to produce accurate, coherent, and contextually relevant natural language summaries and reports based on the user's queries and preferences.;
see also iterations via [0126] User Feedback and Iteration: According to certain examples, a user may optionally review the results of the data augmentation and provide user feedback 708, which GPT system 200 uses to refine and improve the process. GPT model for data augmentation 704 may therefore iterate and learn from user feedback; see also [0453] According to alternative embodiments of the exemplary system, the system is configured to dynamically iterate through the graph and append external relevant data driven by a user's prompting, enhancing the system's adaptability and customization to specific user requirements while complementing the primary components and functionalities of the disclosure.; see additionally, [0154] The modified query is then passed back to graph database querying engine 207 for execution. This iterative process continues until a successful query execution is achieved or a predefined number of attempts have been made.)
It would have been obvious to one having ordinary skill in the art at the time of the effective filing date to apply natural language processing to graph databases as taught by Naufel to the system of Difonzo/Xu since it was known in the art that query systems provide natural language processing components which utilize advanced natural language understanding algorithms to accurately interpret and process a wide range of user inputs, including queries, commands, and requests, enabling a more efficient interaction between the user and the system (Naufel [0310]).
As to claim 2, Difonzo as modified discloses the method of claim 1, wherein the type graph comprises a plurality of node types and a plurality of edge types
(Difonzo teaches that the CyGraph cybersecurity analysis system employs property graphs with nodes/edges of different types, i.e. “the type graph comprises a plurality of node types and a plurality of edge types”; see [0077] The CyGraph cybersecurity analysis system employs property graphs, i.e., attributed, multi-relational graphs with nodes and edges that may have arbitrary properties. Property graphs have the ability to express and visualize a range of heterogeneous node and edge types which arise from combining data from a variety of sources into a coherent unified cybersecurity graph model.;
See also [0055] Examples of query classes include, but are not limited to, loadAll ( e.g., shows all nodes or edges of a specific type)… [0055] dbinfo (e.g., show what database is presently loaded into the system, displays nodes and edge types),; see also [0047] a legend providing a summary of node types displayed ( e.g., in another GUI field or panel 110);
See also [0114] And finally generates the resulting query as translated to the Neo4j Cypher language (the formal graph database query language for the CyGraph platform):).
As to claim 3, Difonzo as modified discloses the method of claim 2, wherein the type graph comprises a semantic description of each node type and edge type (Difonzo teaches a named entity recognition (NER) model used/trained to identify and tag a number of different node and edge value pairs and node/edge properties, i.e. a semantic description of each node type and edge type; see [0044] For example, the disclosed machine learning-based natural language processing and graph database query generation methods compile graph database node and edge properties, semantically identify the formal property that is most similar to that referenced by the user in their English question, and then return a graph database query with formal node and edge properties. This allows the approach to be domain agnostic and be applied to any dataset.;
see also [0036] process the natural language query using a named entity recognition (NER) machine learning model to extract named entities from the natural language query and tag them
according to an entity type; process the tagged named entities using a semantic similarity algorithm to identify corresponding nodes and edges, and their associated properties,
in the graph database;
see also [0058-0059] In some instances, the named entity recognition model (NER) model used to implement the disclosed methods may be trained to identify and tag a number of different
node and edge value pairs in a natural language query. In some instances, the named entity recognition model (NER) model may be trained to identify and tag at least 2, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 different node and edge value pairs. [0059] III. Semantic Similarity Checking: a semantic similarity checking algorithm 416 is used to compare the extracted entities returned from the NER model to properties of the graph database model ( e.g., node types, node names,
edges, node properties, edge properties, unique identifiers, etc.) in a CyGraph knowledge base. This aspect of the process is included to keep the NLP and query generation process domain-agnostic.).
As to claim 4, Difonzo as modified discloses the method of claim 2, wherein the type graph comprises a name of a data source from which each node type and edge type originate
(Difonzo teaches IP address labels, i.e. a name of a data source; see [0047] The graph shown in the main panel 102 of the GUI comprises the set of nodes (represented in the figure by node
names or unique identifiers for a variety of different node types, e.g., virtual machines, IP addresses, services, warfighting functions (WfF), and non-lightweight directory interchange format (non-LDIF) entities);
see also [0048] The "(inside:Inside_IP)" portion of the query is an instruction to
identify IP addresses that reside inside of a protection perimeter. The "(:Outside_IP)" portion of the query is an instruction to identify IP addresses that reside outside of a protection perimeter. Thus, the "(:Services)-[s]-(inside: Inside_IP)-[r]-(:Outside_IP)" portion of the query is an
instruction to identify Inside IP addresses that are talking to Outside IP addresses, such that those Inside IP addresses support network services.;
see also [0069] In response to the user entering the query "most influential IP addr" in panel 1010, the system returns, e.g., a list of the most influential node(s) of type IP addr (by unique identifier) in panel 1012.).
As to claim 7, Difonzo as modified discloses the method of claim 1, comprising:
identifying one or more unrecognized words or phrases in the natural language
user query;
(Difonzo teaches named entity recognition for queries, i.e. identifying words/phrases in a query; see [0036] process the natural language query using a named entity recognition (NER) machine learning model to extract named entities from the natural language query and tag them according to an entity type; process the tagged named entities using a semantic similarity algorithm to identify corresponding nodes and edges, and their associated properties, in the graph database; process the natural language query using an intent classification machine learning model
to determine a user intent for the natural language query; and apply a user intent-based template to the identified nodes and edges to formulate a graph database query that corresponds to the natural language query;
see also [0059] III. Semantic Similarity Checking: a semantic similarity checking algorithm 416 is used to compare the extracted entities returned from the NER model to properties of the graph database model ( e.g., node types, node names, edges, node properties, edge properties, unique identifiers, etc.) in a CyGraph knowledge base. This aspect of the process is included to keep the NLP and query generation process domain-agnostic.)
And Naufel as modified discloses:
querying a vector database with the one or more unrecognized words or phrases;
(Naufel teaches a vector-based model to generate structured data queries and Retrieval-Augmented Generation (RAG), i.e. querying a vector database with the one or more unrecognized words or phrases; see [0073] Computing device 100 may implement a graph-based Natural Language Processing (NLP) system by implementing a hybrid model that combines graph-based and vector-based approaches. Such a hybrid model may append attribute data to the nodes within the graph database, wherein the attribute data includes but is not limited to metadata, context-specific information, or vector embeddings that represent the semantic meaning of the data or its relation to other data points within the graph. Utilizing the appended attributes to augment the retrieval process, may enable such a hybrid model system to perform more nuanced queries that consider both the graph structure and the semantic relationships encapsulated within the vector embeddings of the attribute data, thus allowing for a deeper and more comprehensive exploration and analysis of the data. For instance, such a hybrid model system may leverage an enhanced Retrieval-Augmented Generation (RAG) process to dynamically query, retrieve, and visualize data, facilitating both breadth and depth in data exploration and
insights generation, with the capability to address complex user queries that require understanding the interplay between different data points and their attributes. In certain examples, computing device 100 executes the AI language model in conjunction with the hybrid graph/vector-based model to generate structured data queries and predictive outputs that are informed by the enriched data model, providing outputs that reflect a more sophisticated understanding of the query context and user intent, based on the enhanced representation of data within the hybrid model.)
and
locating the one or more unrecognized words or phrases in the vector database
(Naufel [0246] In some examples, processing circuitry 199 is further configured to utilize the attribute data appended to the new nodes within the graph database to augment data retrieval of the data from the graph database. In some examples, the attribute data appended to the new nodes enables the system to execute queries with increased fidelity that consider both the data structure for the graph database and semantic relationships encapsulated within the first
vector embeddings of the attribute data. In some examples, use of the semantic relationships encapsulated within the first vector embeddings of the attribute data enable deeper analysis into the data and more comprehensive exploration of the data than without use of the semantic relationships.).
As to claim 9, Naufel as modified discloses the method of claim 7, wherein the vector database comprises a plurality of vectorized documents
(Naufel [0074] A vector space model considers the relationship between data represented by vectors. Such a model is an algebraic model for representing text documents (or more generally, items) as vectors such that the distance between vectors represents the relevance between the documents).
As to claim 10, Naufel as modified discloses the method of claim 9, wherein each vectorized document corresponds to a node of a knowledge graph
(Naufel teaches a hybrid model that combines graph-based and vector-based approaches, i.e., wherein each vectorized document corresponds to a node of a knowledge graph; see [0073] Computing device 100 may implement a graph-based Natural Language Processing (NLP) system by implementing a hybrid model that combines graph-based and vector-based approaches. Such a hybrid model may append attribute data to the nodes within the graph database, wherein the attribute data includes but is not limited to metadata, context-specific information, or vector embeddings that represent the semantic meaning of the data or its
relation to other data points within the graph.;
see also loading data sources into nodes [0015] generating, via the AI language model, an executable load script having self-written code to load the data extracted from the one or more original data sources into the data structure of the graph database as new nodes and new relationships with directionality between the new nodes and having metadata parameters within the new nodes describing the data loaded into the graph database;).
As to claim 11, Naufel as modified discloses the method of claim 9, wherein locating the one or more unrecognized words or phrases in the vector database comprises:
identifying, in the plurality of vectorized documents, one or more document portions semantically matching the one or more unrecognized words or phrases in the natural language user query;
(Naufel [0245] In some examples, processing circuitry 199 is further configured to append attribute data to the new nodes within the graph database. In some examples, the attribute data includes one or more of: metadata describing the data; context-specific information for the data; first vector embeddings representing semantic meaning of the data; and second vector embeddings representing relation of a first data point within the data to a second one or more data points within the data stored to the graph database.)
And Difonzo as modified discloses
adding node types and unique identifiers corresponding to the one or more document portions to a list of recognized node types
(Difonzo [0059] III. Semantic Similarity Checking: a semantic similarity checking algorithm 416 is used to compare the extracted entities returned from the NER model to properties
of the graph database model ( e.g., node types, node names, edges, node properties, edge properties, unique identifiers, etc.) in a CyGraph knowledge base. This aspect of the
process is included to keep the NLP and query generation process domain-agnostic.).
As to claim 12, Naufel as modified discloses the method of claim 1, wherein the natural language response comprises an indication that the results of the graph database query fail to answer the natural language user query
(Naufel [0153] When graph database querying engine 207 encounters an error or a failed query, the error information is passed back to contextualized GPT model 205/GPT-x model. Contextualized GPT model 205/GPT-x model, which is specifically designed to handle error feedback, processes this information and uses it to refine its understanding of the user's intent and the underlying data structure. With this enhanced understanding, contextualized GPT model 205/GPT-x model generates a modified query that aims to resolve the encountered error.).
As to claim 13, Naufel as modified discloses the method of claim 12, comprising:
providing, to a user, an indication of one or more unrecognized words or phrases in the natural language user query that caused the results of the graph database query to fail to answer the natural language user query
(Naufel teaches using various reinforcement learning feedback/user feedback for error correction, i.e., providing, to a user, an indication of one or more unrecognized words or phrases in the natural language user query that caused the results of the graph database query to fail to answer the natural language user query;
See [0078] A feedback loop with errors 261 between contextualized GPT model 205 (e.g., an improved GPT-x model) and graph database querying engine 207 is further provided, such that, when graph database querying engine 207 or compatible trained AI model returns an error, the error information is passed back to contextualized GPT model 205 via its error-feedback model specially trained for refining queries based on the received error information. The error-feedback model may generate a corrected query that avoids the previously encountered error or generate
improved model weights which are then transferred into the trained AI model or incorporated into a newly trained model variant. Feedback loop with errors 261 increases the reliability
and accuracy of GPT system 200, allowing it to adapt and improve its query generation over time.;
See also [0155-0156] [0155] This reinforcement learning feedback loop with errors 261 mechanism not only enhances the overall user experience but also contributes to the continuous improvement of understanding by contextualized GPT model 205/ GPT-x model of complex data structures and user intent. As a result, GPT system 200 becomes more efficient and effective over time, adapting to the specific needs and requirements of the users and the diverse data structures they interact with. This dynamic learning process further differentiates the present disclosure from existing solutions in the market and adds significant value to the user experience. [0156] Further still, the reinforcement learning feedback loop mechanism is extendable to incorporate user input, enabling users to provide direct feedback regarding performance of GPT system 200, query results ( e.g., queried data 258), and any encountered issues. This user feedback may optionally be integrated into learning processes of contextualized GPT model 205/GPT-x model, further enhancing ability of contextualized GPT model 205/GPT-x model to determine user intent, preferences, and the intricacies of the
underlying data structures.
See also [0126] User Feedback and Iteration: According to certain examples, a user may optionally review the results of the data augmentation and provide user feedback 708, which
GPT system 200 uses to refine and improve the process. GPT model for data augmentation 704 may therefore iterate and learn from user feedback 708, continuously enhancing performance and accuracy of GPT model for data augmentation 704 with identifying relevant external data sources 706 and integrating those identified external data sources 706 into graph database 203.;
See also [0454] According to alternative embodiments of the exemplary system, the configured GPT model assists in the data cleansing process during the ETL process by detecting and correcting errors, standardizing formats, and removing duplicates.).
As to claim 14, Naufel as modified discloses the method of claim 1, comprising:
identifying one or more nodes, edges in the results; and
adding the one or more nodes, edges to a results dictionary
(Naufel teaches data augmentation; see [0119-0121]:
[0119] According to certain examples, GPT model for data augmentation 704 significantly enhances the AI-based natural language data query and visualization system by adding
relevant external data 772 to pre-existing data 771 in graph database 203, so as to supplement pre-existing data 771, based on received user feedback 708 and requests based on interpretation 753. According to such examples, GPT model for data augmentation 704 applies at least the following operations: [0120] User Request for Data Augmentation: The user
specifies the need for data augmentation through natural language input, which GPT model for data augmentation 704 interprets to determine the context of the request to generate request based on interpretation 753 which is submitted to graph database 203. [0121] Configured GPT Model for Data Augmentation: Computing device 100 configured as a GPT system for data cleansing and augmentation works with GPT model for data augmentation 704 to identify relevant external data sources 706 that can be added to pre-existing data 771 in graph database 203. GPT model for data augmentation 704 leverages a priori understanding ( e.g., schema mappings) of the context and relationships within user data or within company
data to make intelligent recommendations for data augmentation.).
As to claim 15, Difonzo as modified discloses the method of claim 1, comprising:
identifying graph-specific terminology in the natural language response; and re-wording the graph-specific terminology using natural language
(Difonzo teaches determining various graph-related intents of the user query and rewording that intent to execute the query, i.e., identifying and re-wording the graph-specific terminology using natural language; see Fig. 10; see also [0069] As illustrated in FIG. 10, in response to a user entering the query "centrality" in panel 1002, the system returns, e.g., a list of the top five most influential nodes 1004 (listed by unique identifiers in this example). In response to the user entering the query "centrality of 10.1.0.179" in panel 1006, the system returns, e.g., values for the eigenvector centrality, betweenness centrality, and degree centrality for node 10.1.0.179 in panel 1008. In response to the user entering the query "most influential IP addr" in panel 1010, the system returns, e.g., a list of the most influential node(s) of type IP addr (by unique identifier) in panel 1012. In response to the user entering the query "degree of situational awareness" in panel 1014, the system returns, e.g., the degree for a SituationalAwareness node in panel 1016. Degree returns the number of edge connections a specific node has, e.g., in this example, the SituationalAwareness node contains 4,440 direct connections to other nodes in the dataset.;
See also [0068] For example, in some instances, the NLP service may determine that the
user's request is the centrality intent, which calls graph analytic algorithms that blend eigenvector centrality, betweenness centrality, and degree centrality measures to identify influential nodes. Eigenvector centrality is a measure of the influence of a node in a network based on an assignment of relative scores to a set of nodes where, for a given node, connections to other high-scoring nodes contribute more to the score of the node than connections to low-scoring nodes. Betweenness centrality is a measure of centrality based on shortest paths between a group of interconnected nodes. Degree centrality is a measure of
centrality based simply on the number of edges that a node has. Additionally, in some instances, the graph analytic algorithm may determine intents for degree, neighbor, flow, and clustering. The degree intent directly returns the number of edge connections a specific node has. The neighbor intent returns results that depict important nodes with respect to the target node referenced in the user's request. The flow algorithm is similar to the neighbor intent, but returns other important nodes that may not be directly connected through one edge connection to the target node. The flow algorithm may allow an analyst to stop further propagation of a threat. The clustering intent clusters the graph into communities of nodes with similar properties, and returns the cluster or group that a node may be part of in the graph model.;
see also [0065] After viewing a translated query, the system operator may press the cancel button (616 in FIG. 6) thereby prompting the GUI to display a "cancel" message in panel 702 and a confirmation message in panel 704. The system operator may rephrase the query and enter it in panel 706, thereby causing the NLP/query generation service to provide a new translation into formal graph database query language, as illustrated in panel 708.).
As to claim 16, Naufel as modified discloses the method of claim 1, comprising:
providing the natural language response to a user
(Naufel [0242] Computing device 100 may return output to a user in response to the question (1445). For example, processing circuitry 199 of computing device 100 may return output in a structured format to a user-device having originated the user-input.;
See also [0292] According to alternative embodiments of the exemplary system, the natural language processing component includes a natural language understanding component
that understands the intent of the user's natural language input and a natural language generation component that generates natural language summaries and reports based on
the data structure and user's query.;
see also [0315] According to alternative embodiments of the exemplary system, the natural language generation component utilizes advanced text generation techniques, such as
neural networks and deep learning, to produce accurate, coherent, and contextually relevant natural language summaries and reports based on the user's queries and preferences.).
As to claim 19, Difonzo as modified discloses the method of claim 18, wherein generating, based on the knowledge graph, training data for offline fine-tuning of the large language model comprises:
selecting a first node and a second node from the knowledge graph, wherein the
first node and the second node are connected by at least one path through the knowledge graph;
(Difonzo [0055] I. Intent Classification: an intent classification model 404 (e.g., a trained Bidirectional Encoder Representations from Transformers (BERT) model, a long short-term
memory (LSTM) model, or a Naive Bayes algorithm) is used to determine the query intent of the English question submitted by the user. Each learned intent corresponds to a parameterized class of queries. The intent classification model is trained in process step 1 of FIG. 4B using annotated training data comprising English query and intent pairs 402 as illustrated in FIG. 4A. Examples of query classes include, but are not limited to, loadAll ( e.g., shows all nodes or edges of a specific type), noDirection (e.g., relationship between nodes/entities with no specific direction from one entity to the other. Shows a general relation), toDirection (e.g., directional
relationship from one entity to), orQuery (e.g., relationship for more than one node/edge entity), returnCount)
identifying a shortest path between the first node and the second node;
(Difonzo [0068] Betweenness centrality is a measure of centrality based on shortest paths between a group of interconnected nodes. Degree centrality is a measure of centrality based simply on the number of edges that a node has. Additionally, in some instances, the graph analytic algorithm may determine intents for degree, neighbor, flow, and clustering.)
and
generating one or more prompts, wherein a response to the one or more prompts
comprises the shortest path between the first node and the second node
(Difonzo [0076] The CyGraph cybersecurity analysis system builds information-rich graph models from various network and host data sources, thereby fusing isolated data and events
into a unified model of a computer network. Using this graph model, security analysis and computer network operators can apply powerful graph queries that are able to identify, for example, multistep threat pathways for accessing key cyber assets, as well as other patterns of cyber risk. The tool correlates and prioritizes alerts in the context of network vulnerabilities and key network assets.;
see also claim 2 “2. The computer-implemented method of claim 1, further comprising dynamically analyzing paths in a graph data model stored in the graph database to automatically determine a number of edge connections between a pair of the identified nodes.”).
As to claim 27, Naufel as modified discloses
the method of claim 1, wherein receiving results of the graph database query comprises:
receiving a notification of an error in the graph database query;
(Naufel teaches error feedback, i.e., a notification of an error in the graph database query; see [0078] A feedback loop with errors 261 between contextualized GPT model 205 (e.g., an improved GPT-x model) and graph database querying engine 207 is further provided, such that, when graph database querying engine 207 or compatible trained AI model returns an error, the error
information is passed back to contextualized GPT model 205 via its error-feedback model specially trained for refining queries based on the received error information.;
see also [0346] According to alternative embodiments of the exemplary system, the system is designed to support realtime monitoring and alerting, enabling users to set up
custom alerts and notifications based on specific data events, trends, or anomalies, ensuring timely detection and response to potential issues or opportunities;
see also [0375] According to alternative embodiments of the exemplary system, the system is designed to support the implementation of custom alerting and notification mechanisms,
enabling users to configure and receive timely alerts based on specific data events or conditions, facilitating proactive decision-making and response.)
recasting, using the large language model, the graph database query to eliminate
the error;
(Naufel teaches generating a corrected query, i.e., "recasting, using the large language model, the graph database query to eliminate the error"; see [0078] The error-feedback model may generate a corrected query that avoids the previously encountered error or generate improved model weights which are then transferred into the trained AI model or incorporated into a newly trained model variant. Feedback loop with errors 261 increases the reliability and accuracy of GPT system 200, allowing it to adapt and improve its query generation over time.
See also [0110] Error Correction: A specially configured error correction GPT model may correct any identified errors, such as spelling mistakes, typos, and data entry errors. This process
applies algorithms and predefined thresholds and rules to identify and rectify errors, which results in the model systematically enhancing the overall quality of processed data 652.;
see also [0060] If querying engine 207 generates an error, this error, referred to as a "known error" is sent to contextualized GPT model 205, which has a particular model for
error correction utilizing feedback loop with errors 261. Contextualized GPT model 205, responsive to the error, re-generates query and visualization suggestion 259 as a new or modified query which is sent back to graph database 203. This process loops or repeats via feedback loop with errors 261 until data is properly returned responsive to the generated (or re-generated) graph DB query or until the process times out.).
querying the graph database using the recast graph database query generated by
the large language model;
(Naufel [0270] In some examples, processing circuitry 199, responsive to determining the one or more errors are triggered by loading the data extracted from the one or more original data sources into the graph database, loop the one or more errors from the graph database into the AI language model to self-correct the executable load script. [0271] In some examples, processing circuitry 199 iteratively repeats the loop until the data is loaded successfully
into the graph database or until a threshold number of attempts is satisfied.;
see also [0454] According to alternative embodiments of the exemplary system, the configured GPT model assists in the data cleansing process during the ETL process by detecting and correcting errors, standardizing formats, and removing duplicates.)
and
receiving results of the recast graph database query.
(Naufel [0270] In some examples, processing circuitry 199, responsive to determining the one or more errors are triggered by loading the data extracted from the one or more original data sources into the graph database, loop the one or more errors from the graph database into the AI language model to self-correct the executable load script. [0271] In some examples, processing circuitry 199 iteratively repeats the loop until the data is loaded successfully
into the graph database or until a threshold number of attempts is satisfied.;
see also [0454] According to alternative embodiments of the exemplary system, the configured GPT model assists in the data cleansing process during the ETL process by detecting and correcting errors, standardizing formats, and removing duplicates.;
see also [0060] If querying engine 207 generates an error, this error, referred to as a "known error" is sent to contextualized GPT model 205, which has a particular model for
error correction utilizing feedback loop with errors 261. Contextualized GPT model 205, responsive to the error, re-generates query and visualization suggestion 259 as a new or modified query which is sent back to graph database 203. This process loops or repeats via feedback loop with errors 261 until data is properly returned responsive to the generated (or re-generated) graph DB query or until the process times out.).
Referring to claim 29, this dependent claim recites similar limitations as claim 2;
therefore, the arguments above regarding claim 2 are also applicable to claim 29.
Referring to claim 30, this dependent claim recites similar limitations as claim 3;
therefore, the arguments above regarding claim 3 are also applicable to claim 30.
Referring to claim 31, this dependent claim recites similar limitations as claim 4;
therefore, the arguments above regarding claim 4 are also applicable to claim 31.
Referring to claim 34, this dependent claim recites similar limitations as claim 7;
therefore, the arguments above regarding claim 7 are also applicable to claim 34.
Referring to claim 36, this dependent claim recites similar limitations as claim 9;
therefore, the arguments above regarding claim 9 are also applicable to claim 36.
Referring to claim 37, this dependent claim recites similar limitations as claim 10;
therefore, the arguments above regarding claim 10 are also applicable to claim 37.
Referring to claim 38, this dependent claim recites similar limitations as claim 11;
therefore, the arguments above regarding claim 11 are also applicable to claim 38.
Referring to claim 39, this dependent claim recites similar limitations as claim 12;
therefore, the arguments above regarding claim 12 are also applicable to claim 39.
Referring to claim 40, this dependent claim recites similar limitations as claim 13;
therefore, the arguments above regarding claim 13 are also applicable to claim 40.
Referring to claim 41, this dependent claim recites similar limitations as claim 14;
therefore, the arguments above regarding claim 14 are also applicable to claim 41.
Referring to claim 42, this dependent claim recites similar limitations as claim 15;
therefore, the arguments above regarding claim 15 are also applicable to claim 42.
Referring to claim 43, this dependent claim recites similar limitations as claim 16;
therefore, the arguments above regarding claim 16 are also applicable to claim 43.
Referring to claim 46, this dependent claim recites similar limitations as claim 19;
therefore, the arguments above regarding claim 19 are also applicable to claim 46.
Referring to claim 54, this dependent claim recites similar limitations as claim 27;
therefore, the arguments above regarding claim 27 are also applicable to claim 54.
Claims 5-6, 18, 32-33, and 45 are rejected under 35 U.S.C. 103 as being unpatentable over
Difonzo et al., US Pub. No. 2022/0414228, in view of Xu et al., US Pub. No. 2025/0200044 A1, in view of Naufel et al., US Pub. No. 2024/0362208 A1, in view of Rogers et al., US Pub. No. 2021/0350248 A1.
As to claim 5, Difonzo/Xu/Naufel do not disclose:
generating a knowledge graph, wherein the knowledge graph comprises a
plurality of nodes and a plurality of edges;
grouping the plurality of nodes into a plurality of node types and the plurality of
edges into a plurality of edge types;
generating a type graph comprising the plurality of node types, the plurality of
edge types, a semantic description of each node type and edge type, and a name of a data
source from which each node type and edge type originate;
However, Rogers discloses:
generating a knowledge graph, wherein the knowledge graph comprises a
plurality of nodes and a plurality of edges;
(Rogers teaches generating a knowledge graph with a plurality of nodes and a plurality of edges; see [0064-0066]: [0064] The input knowledge graph 190 is generated by analyzing the events from the SIEM that reported the security incident, extracting observables/indicators from the events to make up the nodes of the graph, and enriching the graph 190 with indicator nodes from internal/external threat intelligence. The generation of the original knowledge graph
190 is not part of the exemplary embodiments herein, though it could be. Instead, exemplary embodiments take the already generated knowledge graph 190 and restructure the
graph to allow the graph to be visualized in a manner to achieve one or more goals described above and help a security analyst quickly identify a breach or compromise. [0065] The input knowledge graph 190 can contain nodes that represent internal network assets (e.g., IP addresses representing servers, desktops, mobile devices belonging to the organization that the security incident is reported for), nodes that represent external connection endpoints.. [0066] The edges 160 between each node in the input knowledge graph represent the relationship of nodes to each other. There are multiple possible types of edges, such as a CONNECT edge between two IP address nodes indicates that a connection from the source node to destination node was observed. A RESOLVE edge between an IP address and domain node indicates that domain name resolves to that IP address. A CONTAINS edge between an IP address and a
File node indicates that File has been observed on that host. A USES edge between an IP address and User/Person node indicates that the user has logged onto or has been using that
host.)
grouping the plurality of nodes into a plurality of node types and the plurality of
edges into a plurality of edge types;
(Rogers teaches grouping together related entities from the knowledge graph, i.e., grouping the plurality of nodes into a plurality of node types and the plurality of edges into a plurality of edge types; see [0008] accessing information for a knowledge graph, the knowledge graph having nodes and edges of a network and having information about one or more security incidents in the network; grouping together related entities from the knowledge graph, where the related entities that are grouped together are determined not only by types of the entities, but also by one or more threats impacting the entities,)
generating a type graph comprising the plurality of node types, the plurality of edge types, a semantic description of each node type and edge type, and a name of a data source from which each node type and edge type originate
(Rogers teaches grouping types of threats/sources into swim lane categories, i.e., a type graph; see [0079] Using this metaphor, the inventors of exemplary embodiments herein produced several different concepts, one of which featured the use of four "swim lanes." FIG. 2,
which illustrates a knowledge graph visualization 590 in accordance with an exemplary embodiment, shows swim lanes 510, which are different areas of the visualization embodied in the UI based on types. This illustrates the previous knowledge graph 190 but visualized in accordance with an exemplary embodiment herein. The swim lanes are the following: swim lane 510-1, corresponding to the alert source (of the node/icon 120, the person Celino_Espinosa);
swim lane 510-2, corresponding to assets, with a group of four assets 520-1 and a group of two assets 520-2; swim lane 510-3 of external connections of 530-1, which is an IP
address of 46.23.147.161, and 530-2, which is four ( 4) remote connections, and also two indications of threat relationships 540-1 and 540-2; and swim lane 510-4 of threats, which includes threats 550-1 of two threat actors and threats 550-2 of four malware. In more detail, reference 540 indicates an edge ( e.g., a link) from external connections 530 ( e.g., as nodes) to threats 550 (which also may be nodes) and shows the relationship between the external connections and threats. The "Known Threat" label for the edges for threat relationships 540-1 and 540-2 is showing the type of the edge. So, the external connections (e.g., nodes) 530 are
known to be associated with the threats 550, and reference 540 is showing this relationship. There is also a "Suspected Threat" type edge, which is shown as threat relationship 540-3. This would be used instead of"Known Threat" as in reference 540-2 if the system was not 100 percent (%) certain the nodes 530 were associated with threats 550, but in this case, the system was 100% certain of the relationship. The "Suspected Threat" is shown with dotted lines to
indicate it is a possibility, but the example of FIG. 2 (and other figures) assumes the threat is known, so "Suspected Threat" is not used. Also, this example uses 100% certainty, but a smaller threshold ( e.g., 90%) or another criterion could be used.;
see also [0088] Note that offense source, asset, external connections, and threats are the only swim lanes when visualizing an offense for one example. These swim lanes 510 are
picked based on what is most important to the consumer of the graph, in this case, a security analyst.; see also [0080-0085]).
It would have been obvious to one having ordinary skill in the art at the time of the effective filing date to apply the grouping of types of nodes/threats from a knowledge graph as taught by Rogers to the system of Difonzo/Xu/Naufel, since it was known in the art that such query systems provide for visualizing a knowledge graph in a way that reduces complexity by clustering (also referred to as grouping) related entities together, where the related entities that can be clustered together are determined not only by the type of the entities but also by the threats impacting them, and where these are separated by and correspond to the swim lanes. The new graph representation (illustrated as a visualization) also provides an easy-to-follow path starting from the source of the security incident, typically a user, an internal asset, or an external entity, and leading to the threat, which allows the security analyst to quickly identify how the security breach proceeded through their network, where the source is indicated by the node/icon and indicates a person and ends at the threats in a swim lane. The new graph representation further reduces the clutter of the old diagram by allowing security analysts to selectively expand clusters on which they would like to see more details (Rogers [0083]).
As to claim 6, Rogers as modified discloses the method of claim 5, wherein the graph database comprises the knowledge graph and the type graph
(Rogers teaches a security incident knowledge graph and grouped graph data structures/"swim lanes", i.e., the graph database comprises the knowledge graph and the type graph; see [0076] To address this and other goals, the inventors who created a new knowledge graph visualization, as described
herein, had the desire to create something that would help analysts "connect the dots" so that the analysts could tell the story of what had happened. The inventors recognized that the previous visualization, while technically correct, was not very consumable, nor did it meet the goals the inventors had for themselves for designing for AI. By contrast, and as an overview, the following material describes an exemplary utility to process an existing security incident knowledge graph to generate a new grouped graph data structure that can be used to achieve exemplary solutions described herein.;
see also [0089] FIG. 3 is a method of processing input knowledge graph data to make the data more useful for subsequent visualization via a grouped graph data structure. For
instance, the data could correspond to the input knowledge graph 190 and particularly to the source of the incident, which is illustrated in FIG. 2 by node/icon 120. The steps and sub-steps are numbered and are illustrated in FIG. 3, which is split into FIGS. 3A, 3B, 3C, 3D, and 3E. The blocks in this figure are performed by the computer system 220, e.g., under control of the security visualization program 230 and UI program 240 in some examples. The processing
results in an exemplary embodiment in a grouped graph data structure that is then used to display a knowledge graph visualization 590 as in FIG. 2.;
see also [0108] In step 5, the outputs of step 3 (e.g., one or more groups of internal assets, external connections, and threats) are then assigned to their respective swim lanes 510 by the
security visualization program 230 via the following. Note that this example places the swim lanes in columns, as illustrated in FIG. 2, but this is merely one example, and other techniques (e.g., rows) could be used. Note also that the swim lanes align with the swim lanes 510 of FIG. 2, but these swim lanes 510 are merely exemplary, and others might be used.).
As to claim 18, Naufel as modified discloses the method of claim 5, comprising:
generating, based on the knowledge graph, training data for offline fine-tuning of
the large language model
(Naufel teaches fine tuning using extracted data see [0096] Optionally, model fine-tuning 505 may be applied for finetuning extracted ETL data 551 and thus improving the training of contextualized GPT model 205.;
See also [0129] Model Training: Contextualized GPT model 205 may optionally be fine-tuned using the prepared data structure as well as through the use of additional domain-specific
training data, depending on the chosen implementation. This training process enables contextualized GPT model 205 to better determine the nuances and relationships within data
251 specified by a given user or company, which improves the accuracy of contextualized GPT model 205 responses to natural language queries 265;
See also [0250] In some examples, the condensed structure is used to fine-tune and to guide the AI language model in producing queries to the graph database which return responses which satisfy one or more of the relevance threshold, the accuracy threshold, and the usefulness threshold.).
Referring to claim 32, this dependent claim recites similar limitations as claim 5;
therefore, the arguments above regarding claim 5 are also applicable to claim 32.
Referring to claim 33, this dependent claim recites similar limitations as claim 6;
therefore, the arguments above regarding claim 6 are also applicable to claim 33.
Referring to claim 45, this dependent claim recites similar limitations as claim 18;
therefore, the arguments above regarding claim 18 are also applicable to claim 45.
Claim(s) 8, 35 is/are rejected under 35 U.S.C. 103 as being unpatentable over
Difonzo et al., US Pub. No. 2022/0414228, in view of Xu et al., US Pub. No. 2025/0200044 A1, in view of Naufel et al., US Pub. No. 2024/0362208 A1, in view of Lei et al., US Pub. No. 2022/0207343 A1.
As to claim 8, Difonzo/Xu/Naufel do not disclose:
adding one or more words or phrases in the natural language user query not
located in the vector database to a list of words or phrases having unrecognized node
types;
However, Lei discloses the method of claim 7, comprising:
adding one or more words or phrases in the natural language user query not
located in the vector database to a list of words or phrases having unrecognized node
types
(Lei teaches generating a vector representation for an unknown term and finding candidate lists for unknown terms/sets/subsets of target nodes based on thresholds/scoring of term/node vector matching, i.e. adding one or more words or phrases in the natural language user query not located in the vector database to a list of words or phrases having unrecognized node types;
see [0070] FIG. 8 demonstrates the embedding generation of a single target node F1.
In this regard, the process demonstrated in FIG. 8 can be applied to a query graph (e.g., query graph 302, query graph 604, and the like) to generate a vector representation for an unknown term (e.g., "ARF") as the target node. The processes demonstrated in FIG. 8 can also be separately applied to sub-graphs for each node in the KG 304 to generate reference vector representations for each node (e.g., as illustrated in FIG. 4);
See also [0046] A matching network 314 can further compare the query node vector representation 310 with (each) of the reference node vector representations 312 using one or more similarity scoring algorithms/metrics to determine a matching score 316 for each query node vector representation/ reference node vector representation pair. In some embodiments,
the target node/term with the highest matching score can be returned as the match for the unknown term. In other embodiments, a ranked list of the top N percent scoring (e.g., wherein N can vary based on the application) target nodes/terms can be returned as potential candidates. Still in other embodiments, a thresholding analysis can be applied wherein the system returns a finding of "no match found" if the highest scoring target node/term has a matching score (or
similarity score) below a defined threshold value.
See also [0013] With these embodiments, the method can further comprise determining,
by the system, degrees of similarity between the vector representation (for the unknown term) and the vector representations (for the terms in the KG) to facilitate identifying a term in the KG that corresponds to the unknown term.;
see also [0050] Additionally, or alternatively, a select subset of target nodes can be targeted and processed to generate vector representations therefore simultaneously with processing of
the query graph 302. The select subset of target nodes can be determined based on one or more parameters of the query graph 302.;
see also [0052] The disclosed systems then employ the learned model in the runtime querying process. Specifically, given a text snippet, the system creates a query graph for one ambiguous entity in the snippet.)
It would have been obvious to one having ordinary skill in the art at the time of the effective filing date to apply unknown/ambiguous term/node type vector matching, as taught by Lei, to the system of Difonzo/Xu/Naufel, since it was known in the art that query systems provide for a select subset of target nodes to be targeted and processed to generate vector representations therefore simultaneously with processing of the query graph, where the select subset of target nodes can be determined based on one or more parameters of the query graph, where the select subset of target nodes can include those nodes in the knowledge graph having at least one guided-metapath neighbor corresponding to a same node in a query graph, or where the select subset of target nodes can include nodes of a specific type (e.g., medication, adverse effect, indication, finding, etc.), and where the parameters of the query graph, such as information identifying the entities, their types, and their relationships, can be shared with the second instance of the heterogeneous GNN to facilitate identifying target nodes and reducing the search space for which node vector representations are generated. (Lei [0050]).
Referring to claim 35, this dependent claim recites similar limitations as claim 8;
therefore, the arguments above regarding claim 8 are also applicable to claim 35.
Claim(s) 20-26, 47-53 is/are rejected under 35 U.S.C. 103 as being unpatentable over
Difonzo et al., US Pub. No. 2022/0414228, in view of Xu et al., US Pub. No. 2025/0200044 A1, in view of Naufel et al., US Pub. No. 2024/0362208 A1, in view of Tomkins et al., US Pub. No. 2021/0295822 A1.
As to claim 20, Difonzo/Xu/Naufel do not disclose:
wherein generating a first prompt to cause a large language model to generate a graph database query comprises:
generating the first prompt, wherein the prompt comprises a user-role prompt component and a system-role prompt component;
providing the first prompt to the large language model; and
receiving the graph database query from the large language model in response to the
first prompt;
however, Tomkins discloses the method of claim 1,
wherein generating a first prompt to cause a large language model (Tomkins [0037] The machine learning models may include a transformer neural network model
such as Elmo, BERT, or the like. In some embodiments, the machine learning model may improve data ingestion accuracy by generating attention vectors for n-grams of an
ingested document when performing document analysis or text summarization operations)
to generate a graph database query comprises:
generating the first prompt, wherein the prompt comprises a user-role prompt component and a system-role prompt component
(Tomkins teaches expanding queries for ontology/domain/user context data i.e. wherein the prompt comprises a user-role prompt component and a system-role prompt component
See [0051] The client computing device 202 may send a query 204 to a computer system 250. Data sent in the query 204 from the client computing device 202 may include query text or terms
used to retrieve documents. In some embodiments, the query 204 may include or otherwise be associated with session data, such as an account identifier, a username, an indicated set of domain indicators, a feature associated with a user, or the like. In some embodiments, the session data may be provided as a list of context parameters, a vector of values, or the like. As further discussed below, some embodiments may expand a query based on an ontology graph to increase the effectiveness of a semantic search.
See also [0299] In some embodiments, the process 1900 may include retrieving a set of ontology graphs based on the set of context parameters, as indicated by block 1904. As described elsewhere in this disclosure, the set of context parameters may directly identify one or more ontology graphs available to a user. Alternatively, or in addition, some embodiments may determine a set of user roles or other user categories associated with a user and determine a set of ontology graphs based on the set of user roles or other user categories. For example, some embodiments may determine that a user account is labeled with the user role "Level 4
specialist," and retrieve a set of ontology graphs for use.;
see also [0302] In some embodiments, the process 1900 may include obtaining a first message requesting a set of documents of corpora, as indicated by block 1912. As discussed elsewhere in this disclosure, a message requesting a set of documents may be provided in the form of a query without identifying a specific document, where some embodiments may send the document in response to the query)
providing the first prompt to the large language model; and
(Tomkins teaches updates to one or more machine learning operations, query expansion operations, document retrieval operations, i.e. providing the first prompt to the large language model [0295] As described elsewhere in this disclosure, an update to an n-gram in a UI may update an association between a first n-gram and an ontology graph by generating a vertex mapped to the first n-gram, deleting the vertex, or modifying the vertex. The update to the n-gram may cause additional updates to other operations, such as updates to one or more
machine learning operations, query expansion operations, document retrieval operations, or the like. For example, as further described below, some embodiments may update a machine learning operation based on an update to a text document in a user interface. By augmenting a user interface with an updated ontology graph, some embodiments may reduce the computation time required to perform dynamic, user-specific content display in a user interface)
receiving the graph database query from the large language model in response to the
first prompt;
(Tomkins teaches generating queries for knowledge fabric augmented with data provided by the knowledge-processing system, i.e. receiving a graph database query from the large language model [0293] The language system 1808 may be used to provide a set of domain datasets 1830. The set of domain datasets 1830 may include data from the knowledge fabric 1812
augmented with data provided by the knowledge-processing system 1820. Some embodiments may then access the set of domain datasets 1830 when using the information retrieval or analysis system 1840. As described elsewhere in this disclosure, some embodiments may further augment the set of domain datasets 1830 with a set of indices 1832, where the set of indices 1832 may have been generated by the language system 1808 using one or more operations described in this disclosure. For example, the language system 1808 may generate a set of queries based on text from documents in the knowledge fabric 1812, where some
embodiments may generate or update the set of indices 1832 based on the set of queries.).
It would have been obvious to one having ordinary skill in the art at the time of the effective filing date to apply generating queries for a knowledge fabric augmented with data provided by the knowledge-processing system, as taught by Tomkins, to the system of Difonzo/Xu/Naufel, since it was known in the art that query systems provide for role association with search operations, as this allows some embodiments to surpass other language models when used to perform searches in domain-specific tasks, where, for example, some embodiments may achieve a 10 to 100% improvement in question-answer retrieval precision in a role-specific domain based on the role's association with specific classes of information associated with domain-specific knowledge, and where some embodiments may include domain-specific operations or terminology, relationships, or contexts that may be relevant in only one domain or a small number of domains. (Tomkins [0050]).
As to claim 21, Tomkins as modified discloses the method of claim 20, wherein the user-role prompt component comprises the natural language user query
(Tomkins [0036] Furthermore, the usefulness of any retrieved information may be limited if a user's choice of words, choice of phrasing, or the context of the query itself is not taken into
consideration. Retrieving meaningful information for a query under these conditions may require a different set of retrieval operations, where such operations may fall under the field of natural language understanding (NLU).).
As to claim 22, Tomkins as modified discloses the method of claim 20, wherein the system-role prompt component comprises a description of paths through the type graph between the one or more node types identified in the natural language user query
(Tomkins [0212] Various operations may be performed to retrieve related n-grams of an initial set of n-grams using an index. Some embodiments may search through a self-balancing search tree based on a key, where the key may be an n-gram or a learned representation of the n-gram. Some embodiments may search through the self-balancing search tree by starting at a root of the self-balancing search tree and recursively traversing tree nodes using the key to retrieve a second n-gram or corresponding embedding vector at a leaf node of the self-balancing search tree. Alternatively, or in addition, some embodiments may use an index stored in the form of a trie, where the trie may be associated with a first ontology and a second ontology such that it may be retrieved from a database or other data structure with identifiers of the first and second ontology. Some embodiments may traverse nodes of the trie based on an n-gram of the initial set of n-grams to retrieve a second n-gram, where the second
n-gram may be part of a different ontology. By using an index connecting n-grams or representations of n-grams between different ontologies, some embodiments may accelerate the speed of data retrieval, text summarization, or other operations described in this disclosure.).
As to claim 23, Tomkins as modified discloses the method of claim 22, wherein the description of paths through the type graph is generated by:
traversing one or more paths between each unique pair of node types identified in
the natural language user query;
(Tomkins [0193] Alternatively, or in addition, some embodiments may traverse edges of different ontology graphs to select n-grams of a first ontology graph based on n-grams of a second ontology graph.; see also [0212] Various operations may be performed to retrieve related n-grams of an initial set of n-grams using an index. Some embodiments may search through a self-balancing search tree based on a key, where the key may be an n-gram or a learned representation of the n-gram. Some embodiments may search through the self-balancing search tree by starting at a root of the self-balancing search tree and recursively traversing tree nodes using the key to retrieve a second n-gram or corresponding embedding vector at a leaf node of the self-balancing search tree)
and
for each path, generating a graph database query match pattern corresponding to
the path and a textual description corresponding to the path
(Tomkins [0274] As described elsewhere in this disclosure, various operations may be performed to retrieve related n-grams of an initial set of n-grams using an index. Some embodiments may search
through a self-balancing search tree based on a key, where the key may be an n-gram or a learned representation of the n-gram. Some embodiments may search through the self-balancing search tree by starting at a root of the self-balancing search tree and recursively traversing tree nodes using the key to retrieve a second n-gram or corresponding embedding vector at a leaf node of the self-balancing search tree. Alternatively, or in addition, some embodiments may use an index stored in the form of a trie (i.e. prefix tree), where the trie may be associated with a first ontology and a second ontology such that it may be retrieved
from a database or other data structure with identifiers of the first and second ontology. Some embodiments may traverse nodes of the trie based on an n-gram of the initial set of
n-grams to retrieve a second n-gram, where the second n-gram may be part of a different ontology. By using an index connecting n-grams or representations of n-grams between different ontologies, some embodiments may accelerate the speed of data retrieval, text summarization, or other operations described in this disclosure.).
As to claim 24, Difonzo as modified discloses the method of claim 22, wherein a first path through the type graph comprises a single step from a first node type identified in the natural language user query to a second node type identified in the natural language user query, wherein the first node type and second node type are connected by a first edge type
(Difonzo teaches multi-label classification and named entity recognition for node types/node/edge value pairs, i.e. multiple first node type and second node type(s) and edge types see [0056] In some instances, the intent classification model used to implement the disclosed methods may be trained to identify a number of different user intents in a natural language query. In some instances, the intent classification model may be trained to identify at least 5, at least 10, at least 15, at least 20, or at least 25 different user intents. In some instances, the intent classification model used to
implement the disclosed methods may be trained to perform multi-label classification of user intents rather than, e.g., binary classification, to provide more nuanced interpretation of user intent.;
see also [0058-0059] In some instances, the named entity recognition model (NER) model used to implement the disclosed methods may be trained to identify and tag a number of different
node and edge value pairs in a natural language query. In some instances, the named entity recognition model (NER) model may be trained to identify and tag at least 2, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 different node and edge value pairs. [0059] III. Semantic Similarity Checking: a semantic similarity checking algorithm 416 is used to compare the extracted entities returned from the NER model to properties of the graph database model (e.g., node types, node names,
edges, node properties, edge properties, unique identifiers, etc.) in a CyGraph knowledge base. This aspect of the process is included to keep the NLP and query generation process domain-agnostic.).
As to claim 25, Difonzo as modified discloses the method of claim 22, wherein a first path through the type graph comprises a plurality of steps from a first node type identified in the natural language user query to a second node type identified in the natural language user query via at least a third node type, wherein the first node type and third node type are connected by at least a first edge type, and the third node type and the second node type are connected by at least a second edge type
(Difonzo teaches multi-label classification and named entity recognition for node types/node/edge value pairs, i.e. multiple first node type and second node type(s) and edge types see [0056] In some instances, the intent classification model used to implement the disclosed methods may be trained to identify a number of different user intents in a natural language query. In some instances, the intent classification model may be trained to identify at least 5, at least 10, at least 15, at least 20, or at least 25 different user intents. In some instances, the intent classification model used to implement the disclosed methods may be trained to perform multi-label classification of user intents rather than, e.g., binary classification, to provide more nuanced interpretation of user intent.;
see also [0058-0059] In some instances, the named entity recognition model (NER) model used to implement the disclosed methods may be trained to identify and tag a number of different
node and edge value pairs in a natural language query. In some instances, the named entity recognition model (NER) model may be trained to identify and tag at least 2, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 different node and edge value pairs. [0059] III. Semantic Similarity Checking: a semantic similarity checking algorithm 416 is used to compare the extracted entities returned from the NER model to properties of the graph database model (e.g., node types, node names,
edges, node properties, edge properties, unique identifiers, etc.) in a CyGraph knowledge base. This aspect of the process is included to keep the NLP and query generation process domain-agnostic.).
As to claim 26, Difonzo discloses the method of claim 20, wherein the system-role prompt component comprises one or more n-example relevant traversals, wherein the one or more n-example relevant traversals are generated by:
identifying a plurality of single-step traversals between node types in the type graph;
(Difonzo teaches path analysis/finding neighbors/shortest paths, i.e. “identifying a plurality of single-step traversals between node types in the type graph”
See [0009] In some embodiments, the computer-implemented method further comprises dynamically analyzing paths in a graph data model stored in the graph database to automatically determine a number of edge connections between a pair of the identified nodes
See also [0068] Betweenness centrality is a measure of centrality based on shortest paths between a group of interconnected nodes. Degree centrality is a measure of
centrality based simply on the number of edges that a node has. Additionally, in some instances, the graph analytic algorithm may determine intents for degree, neighbor, flow,
and clustering. The degree intent directly returns the number of edge connections a specific node has. The neighbor intent returns results that depict important nodes with respect to the
target node referenced in the user's request.)
and Naufel as modified discloses:
for each single-step traversal, generating an example traversal comprising a description of the respective single-step traversal;
(Naufel [0244] In some examples, processing circuitry 199 is further configured to execute a hybrid AI model that combines graph-based functionality and vector-based functionality
to increase data querying, data retrieval, and data visualization capabilities of the system.
see also [0245] In some examples, processing circuitry 199 is further configured to append attribute data to the new nodes within the graph database. In some examples, the attribute
data includes one or more of: metadata describing the data; context-specific information for the data; first vector embeddings representing semantic meaning of the data; and second
vector embeddings representing relation of a first data point within the data to a second one or more data points within the data stored to the graph database.)
embedding the example traversals in a vector database;
(Naufel [0245] In some examples, processing circuitry 199 is further configured to append attribute data to the new nodes within the graph database. In some examples, the attribute
data includes one or more of: metadata describing the data; context-specific information for the data; first vector embeddings representing semantic meaning of the data; and second
vector embeddings representing relation of a first data point within the data to a second one or more data points within the data stored to the graph database.;
see also [0246] In some examples, use of the semantic relationships encapsulated within the
first vector embeddings of the attribute data enable deeper analysis into the data and more comprehensive exploration of the data than without use of the semantic relationships.)
querying the vector database with the natural language user query; and
(Naufel [0247] In some examples, processing circuitry 199 is further configured to apply an enhanced Retrieval-Augmented Generation (RAG) process to dynamically query the data, retrieve the data, and visualize the data. In some examples, application of the enhanced RAG process increases both breadth and depth of data exploration and insights generation into the data stored within the graph database. In some examples, application of the enhanced
RAG process further increases functionality of the AI language model to address complex user queries requiring understanding of interplay between different data points in the graph database and the corresponding attributes of the different data points.)
receiving the one or more n-example relevant traversals, wherein the one or more n-example relevant traversals comprise one or more example traversals corresponding to the natural language user query
(Naufel [0180] Users can interact with GPT system 200 using natural language, allowing them to ask complex questions and explore relationships between data entities without the need for specialized technical knowledge.; see also [0262] In some examples, processing circuitry 199 is further configured to return with the first output to the user device, second output including visualized data output in the form of one or more charts or graphs self-generated by the
AI language model.;
See also [0201] Another proof of concept test and evaluation for scientific research may be used in scientific research settings to analyze experimental data, literature, and findings from
various studies. Researchers can use natural language queries to explore connections between data points, identify trends, and generate hypotheses for further investigation.
[0202] Another proof of concept test and evaluation for supply chain management may be employed to analyze and visualize data related to inventory levels, supplier performance, and customer demand. Using natural language queries, supply chain managers can quickly access relevant information, identify inefficiencies, and make informed decisions to optimize processes.)
Referring to claim 47, this dependent claim recites similar limitations as claim 20;
therefore, the arguments above regarding claim 20 are also applicable to claim 47.
Referring to claim 48, this dependent claim recites similar limitations as claim 21;
therefore, the arguments above regarding claim 21 are also applicable to claim 48.
Referring to claim 49, this dependent claim recites similar limitations as claim 22;
therefore, the arguments above regarding claim 22 are also applicable to claim 49.
Referring to claim 50, this dependent claim recites similar limitations as claim 23;
therefore, the arguments above regarding claim 23 are also applicable to claim 50.
Referring to claim 51, this dependent claim recites similar limitations as claim 24;
therefore, the arguments above regarding claim 24 are also applicable to claim 51.
Referring to claim 52, this dependent claim recites similar limitations as claim 25;
therefore, the arguments above regarding claim 25 are also applicable to claim 52.
Referring to claim 53, this dependent claim recites similar limitations as claim 26;
therefore, the arguments above regarding claim 26 are also applicable to claim 53.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
CONTACT INFORMATION
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EVAN S ASPINWALL whose telephone number is (571)270-7723. The examiner can normally be reached Monday-Friday 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil can be reached at 571-270-0474. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Evan Aspinwall/Primary Examiner, Art Unit 2152 1/27/2026