DETAILED ACTION
This Office Action is sent in response to the Applicant’s Communication received on 12/04/2025 for application number 17/930,288. The Office hereby acknowledges receipt of the following items, which have been placed of record in the file: Specification, Drawings, Abstract, Oath/Declaration, IDS, and Claims.
Claims 4, 14, and 18 are canceled.
Claims 1-3, 5, 7, 8, 11-13, 15-17, 19, and 20 are amended.
Claims 1-3, 5-13, 15-17, 19, and 20 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
35 USC 112
In light of the amendments, the 35 USC 112 rejection of claim 7 has been overcome and is therefore withdrawn.
35 USC 101
In light of the newly amended claims, the Examiner finds the Applicant’s arguments persuasive. Therefore, the 35 USC 101 rejection of the Office Action dated 09/04/2025 has been withdrawn.
35 USC 103
In light of the newly amended claims, the Applicant’s arguments have been considered but are moot because they are directed to the rejections set forth in the Office Action dated 09/04/2025, which did not apply the prior art to the newly amended limitations.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 5-13, 15-17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kapanipathi et al. (Leveraging Abstract Meaning Representation for Knowledge Base Question Answering, published 2 Jun 2021), hereinafter Kapanipathi, in view of Kocijan et al. (Knowledge Base Completion Meets Transfer Learning, published 2021), hereinafter Kocijan, Lewis et al. (BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, published 2019), hereinafter Lewis, Cheng et al. (CN113792541A, see attached translation), hereinafter Cheng, Chen et al. (US 20220318593 A1), hereinafter Chen, Le et al. (US 20230376841 A1), hereinafter Le, and Yin et al. (Neural Machine Translating from Natural Language to SPARQL, published 2019), hereinafter Yin.
Regarding claim 1, Kapanipathi teaches,
A computer-implemented method [Abstract, a modular KBQA system] comprising: improving knowledge base question answering (KBQA) model [Abstract, In this work, we propose Neuro-Symbolic Question Answering (NSQA), a modular KBQA system] convergence [Sect 1, para 6, A pipeline-based modular approach that integrates multiple, reusable modules that are trained specifically for their individual tasks] and prediction performance [Sect 3.3, para 2, Using AMR and the path-based approach, NSQA was able to correctly predict the total number of constraints with comparable accuracies of 79% and 70% for single and two-hops, respectively],
generating a query graph skeleton, utilizing a model, based on a received question [Sect 2, para 1, Given a question in natural language, NSQA: (i) parses questions into an Abstract Meaning Representation (AMR) graph], wherein the skeleton contains one or more placeholder nodes for entities, relations, and variables [Sect 2.2, para 6, AMR Graph G contains nodes that are concepts or PropBank predicates which can correspond to both entities and relationships. For example in Figure 1, produce-01, star-01, and Spain are nodes in the AMR graph; Sect 2.2.1, para 6, a state-of-the-art relation linking system that takes in the question text and AMR predicate as input and returns a ranked list of KG relationships for each triple];
partial relation linking each relation placeholder (Sect 2.2.1, para 5, AMR predicate) comprised within the generated skeleton to one or more respective relation surface forms (Sect 2.2.1, para 5, KG relationships) [Sect 2.2.1, para 6, NSQA uses SemREL… a state-of-the-art relation linking system that takes in the question text and AMR predicate as input and returns a ranked list of KG relationships for each triple.];
responsive to a received natural language question (Sect 2, para 1, Given a question in natural language) for a target knowledge graph (KG) (Sect 2.2, para 1, underlying knowledge graph), producing one or more softly-tied query sketches (Sect 2.2, para 1, query graph) utilizing the generated skeleton (Sect 2.2.1, AMR graph) with the partial relation linking (Sect 2.2.1, AMR predicate) [Sect 2, para 1, Given a question in natural language, NSQA: (i) parses questions into an Abstract Meaning Representation (AMR) graph; (ii) transforms the AMR graph to a set of candidate KB-aligned logical queries, via a novel but simple graph transformation approach; Sect 2.2, para 1, The core contribution of this work is our next step where the AMR of the question is transformed to a query graph aligned with the underlying knowledge graph; Sect 2.2.1, para 6, NSQA uses SemREL… a state-of-the-art relation linking system that takes in the question text and AMR predicate as input and returns a ranked list of KG relationships for each triple.];
producing one or more executable queries (Sect 3.1, SPARQL query) for each softly-tied query sketch [Sect 2.2.2, para 1, Our query graph can be directly translated to the WHERE clause of the SPARQL; Sect 3.1, Each question has an associated SPARQL query].
Kapanipathi does not teach improving by generalizing model based on transfer learning, one or more computer processors, and wherein model comprises an auto-regressive decoder with a causal self-attention mask and an encoder comprising a bi-directional transformer, wherein generating the query graph skeleton further comprises: tokenizing, by one or more computer processors, text associated with the received question using bi-directional encoder representations from the encoder; producing, by one or more computer processors, encoder hidden states for each token at each layer of the model and producing, by one or more computer processors, a distribution over skeleton output tokens utilizing the decoder at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention, wherein the output of each decoding step is a SoftMax over a set of operators; and executing, by one or more computer processors, the one or more produced executable queries against a knowledge base.
Kocijan teaches,
improving by generalizing model based on transfer learning [Abstract, we introduce the first approach for transfer of knowledge from one collection of facts to another without the need for entity or relation matching. The method works for both canonicalized knowledge bases and uncanonicalized or open knowledge bases].
Kocijan is analogous to the claimed invention as they both relate to NLP models. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Kocijan and provide generalizing a model based on transfer learning [Kocijan, Abstract] in order to improve predictions on structured data from a specific domain.
Kapanipathi-Kocijan teach the above limitations of claim 1 including generating the query graph skeleton and text associated with the received question (Kapanipathi, Sect 2, para 1).
Kapanipathi-Kocijan do not teach one or more computer processors and wherein model comprises an auto-regressive decoder with a causal self-attention mask and an encoder comprising a bi-directional transformer, wherein generating the query graph skeleton further comprises: tokenizing, by one or more computer processors, text associated with the received question using bi-directional encoder representations from the encoder; producing, by one or more computer processors, encoder hidden states for each token at each layer of the model and producing, by one or more computer processors, a distribution over skeleton output tokens utilizing the decoder at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention, wherein the output of each decoding step is a SoftMax over a set of operators; and executing, by one or more computer processors, the one or more produced executable queries against a knowledge base.
Lewis teaches,
wherein model comprises an auto-regressive decoder (Sect 1, para 2, Auto-Regressive Transformers) with a causal self-attention mask and an encoder comprising a bi-directional transformer [Sect 1, para 2, In this paper, we present BART, which pre-trains a model combining Bidirectional and Auto-Regressive Transformers; Sect 3.4, para 3, In the first step, we freeze most of BART parameters and only update… the self-attention input projection matrix of BART’s encoder first layer],
tokenizing text using bi-directional encoder representations from the encoder [Sect 2, BART is a denoising autoencoder that maps a corrupted document to the original document it was derived from. It is implemented as a sequence-to-sequence model with a bidirectional encoder over corrupted text and a left-to-right autoregressive decoder; Sect 2.2, Token Masking Following BERT (Devlin et al., 2019), random tokens are sampled and replaced with [MASK] elements].
Lewis is analogous to the claimed invention as they both relate to natural language processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Lewis and provide a model with an auto-regressive decoder with a causal self-attention mask and an encoder comprising a bi-directional transformer [Lewis, Abstract] in order to achieve the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token.
Kapanipathi-Kocijan-Lewis do not teach one or more computer processors and producing, by one or more computer processors, encoder hidden states for each token at each layer of the model and producing, by one or more computer processors, a distribution over skeleton output tokens utilizing the decoder at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention, wherein the output of each decoding step is a SoftMax over a set of operators; and executing, by one or more computer processors, the one or more produced executable queries against a knowledge base.
Cheng teaches,
one or more computer processors [Para 0157, These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more flowcharts and/or one or more block diagrams] and
producing, by one or more computer processors, encoder hidden states for each token at each layer of the model [Para 0034, The BERT model uses N Transformer Encoder layers to extract sentiment features related to the aspects to be analyzed. The input of the first Encoder layer is the output of the feature preprocessing layer H₀={h₁,h₂,…,hₙ}. The input of each subsequent Encoder layer is the hidden state output of the previous Encoder layer].
Cheng is analogous to the claimed invention as they both relate to natural language processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Cheng and provide encoder hidden states for each token at each layer of the model in order to enable faster training and enhanced capture of long-distance dependencies.
Kapanipathi-Kocijan-Lewis-Cheng do not teach producing a distribution over skeleton output tokens utilizing the decoder at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention, wherein the output of each decoding step is a SoftMax over a set of operators; and executing the one or more produced executable queries against a knowledge base.
Chen teaches,
producing a distribution over skeleton output tokens (Para 0032, probability distribution over all tokens) utilizing the decoder (Para 0032, transformer layers; Para 0047, The transformer-based text generator network… includes a dynamic decoding… procedure) at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention [Para 0032, The text generator 330 is a stack of transformer layers. The input to the text generator 330 is a complete or partial sequence of tokens. At each position, the embedding of the input token is looked up and passed forward to the causal self-attention layer 214. The causal self-attention layer 214 transforms the embedding vector using a dynamically selected set of hidden states that are before the current position. The cross-attention layer 212 further applies a transformation using a dynamically selected set of time series encoder hidden states. A subsequent projection layer generates a probability distribution over all tokens in the vocabulary; Para 0047, The transformer-based text generator network 440 further includes a dynamic decoding and sampling procedure 448 for generating a diverse set of explanation texts].
Chen is analogous to the claimed invention as they both relate to natural language processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Chen and provide for producing a distribution over skeleton output tokens utilizing the decoder at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention, in order to enable dynamic, context-aware sequence generation, which significantly improves alignment, handles long sequences efficiently, and allows the decoder to select relevant input information at each step.
Kapanipathi-Kocijan-Lewis-Cheng-Chen do not teach wherein the output of each decoding step is a SoftMax over a set of operators; and executing the one or more produced executable queries against a knowledge base.
Le teaches,
wherein the output of each decoding step is a SoftMax over a set of operators [Para 0030, the LM 110 may be pretrained with pretraining tasks similar to those used with CodeT5 like masked span prediction (MSP). While the MSP task benefits code understanding, they have a large discrepancy with program synthesis objectives. To mitigate this gap, a pretraining task of next-token prediction (NTP) may be used in pretraining the LM 110. Specifically, a pivot location is uniformly sampled for each code sample, and then the content preceding the pivot is passed to the encoder of LM 110 and remaining to the decoder of LM 110; Para 0031, After pretraining, the pretrained LM 110 may then be finetuned for specific program synthesis tasks. Following a sequence-to-sequence approach, a program synthesis training pair of a natural language problem description 105, which takes the form of an input sequence D, and a corresponding solution code program 106 may be used to finetune the pretrained LM 110. In response to the input sequence D, the pretrained LM 110 may generate an output sequence of program Ŵ=(ŵ₁, …, ŵ_T), ŵ_t ∈ V that can solve the problem. The output at each decoding step t is a distribution over the vocabulary V, computed by the softmax function ŵ_t ~ softmax(Linear(s_t)), where s_t is the contextual hidden state at decoding step t].
Le is analogous to the claimed invention as they both relate to natural language processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Le and provide a SoftMax function in order to finetune a language model for specific program synthesis tasks [Le, para 0031].
Kapanipathi-Kocijan-Lewis-Cheng-Chen-Le do not teach executing the one or more produced executable queries against a knowledge base.
Yin teaches,
executing the one or more produced executable queries against a knowledge base [Sect 5.1, para 2, SPARQL queries that can be executed directly on a DBpedia endpoint. Sect 5.1, para 3, DBpedia subgraphs are extracted using a generic SPARQL query].
Yin is analogous to the claimed invention as they both relate to Natural Language Processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Yin and provide executing queries against a knowledge base [Yin, Abstract] to allow users to analyze model results without requiring expertise in syntax or semantics.
Regarding claim 2, Kapanipathi-Kocijan-Lewis-Cheng-Chen-Le-Yin teach the limitations of claim 1 including improving the KBQA model convergence and prediction performance by generalizing the KBQA model based on transfer learning and one or more computer processors.
Kapanipathi further teaches,
aligning (Sect 2.2, para 1, transformed) the one or more softly-tied query sketches (Sect 2.2, para 1, query graph) to the target KG [Sect 2.2, para 1, The core contribution of this work is our next step where the AMR of the question is transformed to a query graph aligned with the underlying knowledge graph].
Regarding claim 3, Kapanipathi-Kocijan-Lewis-Cheng-Chen-Le-Yin teach the limitations of claim 2 including the one or more executable queries (see claim 1).
Yin further teaches,
providing a result, associated with queries, to a user (Sect 5.1, para 2, endpoint; Abstract, web users) [Sect 5.1, para 2, SPARQL queries that can be executed directly on a DBpedia endpoint. Sect 5.1, para 3, DBpedia subgraphs are extracted using a generic SPARQL query.].
Yin is analogous to the claimed invention as they both relate to Natural Language Processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi and Kocijan’s teachings to incorporate the teachings of Yin and provide executing queries on KGs and providing the results to a user [Abstract] to allow average human web users to analyze model results without requiring expertise in syntax or semantics.
Regarding claim 5, Kapanipathi-Kocijan-Lewis-Cheng-Chen-Le-Yin teach the limitations of claim 2 including aligning the one or more softly-tied query sketches to the KG (see claim 2) and one or more computer processors (see claim 1).
Kapanipathi further teaches,
linking one or more entities (Sect 2.2, para 3, E is the set of entities) to each entity placeholder node comprised within the generated skeleton (Sect 2.2, para 3, Ve is the set of AMR entity nodes);
and disambiguating one or more relations, in a textual form (Sect 2.2, para 3, Spain and Benicio Del Toro), and linking the one or more disambiguated relations (Sect 2.2, para 3, used BLINK… for disambiguation) to one or more KG relations (Sect 2.2, para 3, DBpedia entries, dbr:Spain and dbr:Benicio del toro) [Sect 2.2, para 3, the question in Figure 1 contains two entities Spain and Benicio Del Toro that need to be identified and linked to DBpedia entries, dbr:Spain and dbr:Benicio del toro. Linking these entities is absolutely necessary… To do so, we trained a BERT-based neural mention detection model and used BLINK… for disambiguation. The entities are linked to AMR nodes based on the AMR node-text alignment information. The linking is a bijective mapping from Ve → E where Ve is the set of AMR entity nodes, and E is the set of entities in the underlying KG.].
Regarding claim 6, Kapanipathi-Kocijan-Lewis-Cheng-Chen-Le-Yin teach the limitations of claim 2 including the one or more produced executable queries and one or more computer processors (see claim 1).
Kapanipathi further teaches,
selecting a highest ranked answer [Sect 2.2.1, para 7, we choose the highest-ranked valid query graph].
Kapanipathi-Kocijan-Lewis-Cheng-Chen-Le-Yin do not teach executing queries against a knowledge base to retrieve one or more answers.
Yin further teaches,
executing queries (Sect 5.1, para 2, SPARQL query) against a knowledge base (Sect 5.1, para 2, DBpedia) to retrieve one or more answers (Sect 5.1, para 2, retrieve a list of entities and their corresponding English labels) [given a template pair in Table 1, where <A> belongs to the class dbo:Monument in DBpedia, one can then retrieve a list of entities and their corresponding English labels to replace <A> by executing an assistant SPARQL query on a DBpedia endpoint.].
Yin is analogous to the claimed invention as they both relate to Natural Language Processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi and Kocijan’s teachings to incorporate the teachings of Yin and provide executing queries against a knowledge base to retrieve an answer [Abstract] to allow average human web users to analyze model results without requiring expertise in syntax or semantics.
Regarding claim 7, Kapanipathi-Kocijan-Lewis-Cheng-Chen-Le-Yin teach the limitations of claim 1 including partial relation linking each relation placeholder comprised within the generated skeleton to the respective relation surface form and one or more computer processors (see claim 1).
Kapanipathi further teaches,
identifying the KG relation (Sect 2.2.1, para 2, relevant binary edges) to replace the relation placeholder (Sect 2.2.1, para 3, predicated from AMR) [Sect 2.2.1, para 2, Selecting the shortest paths reduces the n-ary predicates of AMR graph to only the relevant binary edges; Sect 2.2.1, para 3, This is done by line 18 of Algorithm 1 where the eventual query graph Q will have one edge with merged predicated from AMR graph G between the non-predicates (AC)] in order to produce a correct semantic representation of the query (Sect 2.2.2, para 1, SPARQL query constructs) [Sect 2.2.2, para 1, Our query graph can be directly translated to the WHERE clause of the SPARQL. We use existential first order logic (FOL) as an intermediate representation, where the non-logical symbols consist of the binary relations and entities in the KB as well as some additional functions to represent SPARQL query constructs (e.g. COUNT).].
Regarding claim 8, Kapanipathi-Kocijan-Lewis-Cheng-Chen-Le-Yin teach the limitations of claim 5 including disambiguating one or more relations, in the textual form, and linking the one or more disambiguated relations to the one or more KG relations (see claim 5) and one or more computer processors (see claim 1).
Kapanipathi further teaches,
replacing every relation surface form with each possible KG relation [Sect 2.2.1, para 2, we have developed a path-based approach depicted in Algorithm 1 that shows the steps for transforming the AMR Graph G into Query Graph Q; Sect 2.2.1, para 6, a state-of-the-art relation linking system that takes in the question text and AMR predicate as input and returns a ranked list of KG relationships for each triple.].
Regarding claim 9, Kapanipathi-Kocijan-Lewis-Cheng-Chen-Le-Yin teach the limitations of claim 5 including one or more computer processors (see claim 1).
Kapanipathi further teaches,
responsive [Sect 2, para 1, Given a question in natural language] to multiple entities, defining a position (Sect 2.2, para 3, bijective mapping) of a corresponding textual span as an alignment (Sect 2.2, para 3, AMR node-text alignment information) to a corresponding entity placeholder (Sect 2.1, para 3, Ve) [Sect 2.1, para 3, An advantage of transition-based systems is that they provide explicit question text to AMR node alignments; Sect 2.2, para 3, The entities are linked to AMR nodes based on the AMR node-text alignment information. The linking is a bijective mapping from Ve → E where Ve is the set of AMR entity nodes, and E is the set of entities in the underlying KG.].
Regarding claim 10, Kapanipathi-Kocijan-Lewis-Cheng-Chen-Le-Yin teach the limitations of claim 5 including one or more computer processors (see claim 1).
Kocijan further teaches,
jointly optimizing (Section 2, para 2, These are then used as the input to the KBC algorithm of choice to predict their score (correctness), using the loss function) skeleton generation (Section 2, para 2, vh and vt) loss and partial relation (Section 2, para 2, vr) linking loss [Section 2, para 2, The model consists of two encoders, one for entities and one for relations, and a KBC model. Given a triple (h, r, t,) the entity encoder is used to map the head h and the tail t into their vector embeddings vh and vt, while the relation encoder is used to map the relation r into its vector embedding vr. These are then used as the input to the KBC algorithm of choice to predict their score (correctness), using the loss function, as defined by the KBC model].
Kocijan is analogous to the claimed invention as they both relate to NLP models. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Kocijan and provide a loss function in order to compare model output and improve results over a plurality of iterations.
Regarding claim 11, Kapanipathi teaches,
improving knowledge base question answering (KBQA) model [Abstract, In this work, we propose Neuro-Symbolic Question Answering (NSQA), a modular KBQA system] convergence [Sect 1, para 6, A pipeline-based modular approach that integrates multiple, reusable modules that are trained specifically for their individual tasks] and prediction performance [Sect 3.3, para 2, Using AMR and the path-based approach, NSQA was able to correctly predict the total number of constraints with comparable accuracies of 79% and 70% for single and two-hops, respectively],
generating a query graph skeleton, utilizing a model, based on a received question [Sect 2, para 1, Given a question in natural language, NSQA: (i) parses questions into an Abstract Meaning Representation (AMR) graph], wherein the skeleton contains one or more placeholder nodes for entities, relations, and variables [Sect 2.2, para 6, AMR Graph G contains nodes that are concepts or PropBank predicates which can correspond to both entities and relationships. For example in Figure 1, produce-01, star-01, and Spain are nodes in the AMR graph; Sect 2.2.1, para 6, a state-of-the-art relation linking system that takes in the question text and AMR predicate as input and returns a ranked list of KG relationships for each triple];
partial relation linking each relation placeholder (Sect 2.2.1, para 5, AMR predicate) comprised within the generated skeleton to one or more respective relation surface forms (Sect 2.2.1, para 5, KG relationships) [Sect 2.2.1, para 6, NSQA uses SemREL… a state-of-the-art relation linking system that takes in the question text and AMR predicate as input and returns a ranked list of KG relationships for each triple.];
responsive to a received natural language question (Sect 2, para 1, Given a question in natural language) for a target knowledge graph (KG) (Sect 2.2, para 1, underlying knowledge graph), producing one or more softly-tied query sketches (Sect 2.2, para 1, query graph) utilizing the generated skeleton (Sect 2.2.1, AMR graph) with the partial relation linking (Sect 2.2.1, AMR predicate) [Sect 2, para 1, Given a question in natural language, NSQA: (i) parses questions into an Abstract Meaning Representation (AMR) graph; (ii) transforms the AMR graph to a set of candidate KB-aligned logical queries, via a novel but simple graph transformation approach; Sect 2.2, para 1, The core contribution of this work is our next step where the AMR of the question is transformed to a query graph aligned with the underlying knowledge graph; Sect 2.2.1, para 6, NSQA uses SemREL… a state-of-the-art relation linking system that takes in the question text and AMR predicate as input and returns a ranked list of KG relationships for each triple.];
producing one or more executable queries (Sect 3.1, SPARQL query) for each softly-tied query sketch [Sect 2.2.2, para 1, Our query graph can be directly translated to the WHERE clause of the SPARQL; Sect 3.1, Each question has an associated SPARQL query].
Kapanipathi does not teach A computer program product comprising: one or more computer-readable storage media; and program instructions stored on the one or more computer-readable storage media to perform operations comprising: improving by generalizing model based on transfer learning, one or more computer processors, and wherein model comprises an auto-regressive decoder with a causal self-attention mask and an encoder comprising a bi-directional transformer, wherein generating the query graph skeleton further comprises: tokenizing text associated with the received question using bi-directional encoder representations from the encoder; producing encoder hidden states for each token at each layer of the model and producing a distribution over skeleton output tokens utilizing the decoder at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention, wherein the output of each decoding step is a SoftMax over a set of operators; and executing the one or more produced executable queries against a knowledge base.
Kocijan teaches,
improving by generalizing model based on transfer learning [Abstract, we introduce the first approach for transfer of knowledge from one collection of facts to another without the need for entity or relation matching. The method works for both canonicalized knowledge bases and uncanonicalized or open knowledge bases].
Kocijan is analogous to the claimed invention as they both relate to NLP models. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Kocijan and provide generalizing a model based on transfer learning [Kocijan, Abstract] in order to improve predictions on structured data from a specific domain.
Kapanipathi-Kocijan teach the above limitations of claim 11 including generating the query graph skeleton and text associated with the received question (Kapanipathi, Sect 2, para 1).
Kapanipathi-Kocijan do not teach A computer program product comprising: one or more computer-readable storage media; and program instructions stored on the one or more computer-readable storage media to perform operations comprising: wherein model comprises an auto-regressive decoder with a causal self-attention mask and an encoder comprising a bi-directional transformer, wherein generating the query graph skeleton further comprises: tokenizing text associated with the received question using bi-directional encoder representations from the encoder; producing encoder hidden states for each token at each layer of the model and producing a distribution over skeleton output tokens utilizing the decoder at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention, wherein the output of each decoding step is a SoftMax over a set of operators; and executing the one or more produced executable queries against a knowledge base.
Lewis teaches,
wherein model comprises an auto-regressive decoder (Sect 1, para 2, Auto-Regressive Transformers) with a causal self-attention mask and an encoder comprising a bi-directional transformer [Sect 1, para 2, In this paper, we present BART, which pre-trains a model combining Bidirectional and Auto-Regressive Transformers; Sect 3.4, para 3, In the first step, we freeze most of BART parameters and only update… the self-attention input projection matrix of BART’s encoder first layer],
tokenizing text using bi-directional encoder representations from the encoder [Sect 2, BART is a denoising autoencoder that maps a corrupted document to the original document it was derived from. It is implemented as a sequence-to-sequence model with a bidirectional encoder over corrupted text and a left-to-right autoregressive decoder; Sect 2.2, Token Masking Following BERT (Devlin et al., 2019), random tokens are sampled and replaced with [MASK] elements].
Lewis is analogous to the claimed invention as they both relate to natural language processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Lewis and provide a model with an auto-regressive decoder with a causal self-attention mask and an encoder comprising a bi-directional transformer [Lewis, Abstract] in order to find the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token.
Kapanipathi-Kocijan-Lewis do not teach A computer program product comprising: one or more computer-readable storage media; and program instructions stored on the one or more computer-readable storage media to perform operations comprising: producing encoder hidden states for each token at each layer of the model and producing a distribution over skeleton output tokens utilizing the decoder at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention, wherein the output of each decoding step is a SoftMax over a set of operators; and executing the one or more produced executable queries against a knowledge base.
Cheng teaches,
A computer program product comprising: one or more computer-readable storage media; and program instructions stored on the one or more computer-readable storage media to perform operations comprising: [Para 0156, this application may take the form of a computer program product implemented on one or more computer-usable storage media… containing computer-usable program code.] and
producing encoder hidden states for each token at each layer of the model [Para 0034, The BERT model uses N Transformer Encoder layers to extract sentiment features related to the aspects to be analyzed. The input of the first Encoder layer is the output of the feature preprocessing layer H<sub>0</sub>={h<sub>1</sub>,h<sub>2</sub>,…,h<sub>n</sub>}. The input of each subsequent Encoder layer is the hidden state output of the previous Encoder layer].
Cheng is analogous to the claimed invention as they both relate to natural language processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Cheng and provide encoder hidden states for each token at each layer of the model in order to enable faster training and enhanced capture of long-distance dependencies.
Kapanipathi-Kocijan-Lewis-Cheng do not teach producing a distribution over skeleton output tokens utilizing the decoder at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention, wherein the output of each decoding step is a SoftMax over a set of operators; and executing the one or more produced executable queries against a knowledge base.
Chen teaches,
producing a distribution over skeleton output tokens (Para 0032, probability distribution over all tokens) utilizing the decoder (Para 0032, transformer layers; Para 0047, The transformer-based text generator network… includes a dynamic decoding… procedure) at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention [Para 0032, The text generator 330 is a stack of transformer layers. The input to the text generator 330 is a complete or partial sequence of tokens. At each position, the embedding of the input token is looked up and passed forward to the causal self-attention layer 214. The causal self-attention layer 214 transforms the embedding vector using a dynamically selected set of hidden states that are before the current position. The cross-attention layer 212 further applies a transformation using a dynamically selected set of time series encoder hidden states. A subsequent projection layer generates a probability distribution over all tokens in the vocabulary; Para 0047, The transformer-based text generator network 440 further includes a dynamic decoding and sampling procedure 448 for generating a diverse set of explanation texts].
Chen is analogous to the claimed invention as they both relate to natural language processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Chen and provide for producing a distribution over skeleton output tokens utilizing the decoder at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention in order to enable dynamic, context-aware sequence generation which significantly improves alignment, handles long sequences efficiently, and allows the decoder to select relevant input information at each step.
Kapanipathi-Kocijan-Lewis-Cheng-Chen do not teach wherein the output of each decoding step is a SoftMax over a set of operators; and executing the one or more produced executable queries against a knowledge base.
Le teaches,
wherein the output of each decoding step is a SoftMax over a set of operators [Para 0030, the LM 110 may be pretrained with pretraining tasks similar to those used with CodeT5 like masked span prediction (MSP). While the MSP task benefits code understanding, they have a large discrepancy with program synthesis objectives. To mitigate this gap, a pretraining task of next-token prediction (NTP) may be used in pretraining the LM 110. Specifically, a pivot location is uniformly sampled for each code sample, and then the content preceding the pivot is passed to the encoder of LM 110 and remaining to the decoder of LM 110; Para 0031, After pretraining, the pretrained LM 110 may then be finetuned for specific program synthesis tasks. Following a sequence-to-sequence approach, a program synthesis training pair of a natural language problem description 105, which take a form of an input sequence D, and a corresponding solution code program 106 may be used to finetune the pretrained LM 110. In response to the input sequence D, the pretrained LM 110 may generate an output sequence of program Ŵ=(ŵ.sub.1, . . . , ŵ.sub.T), ŵ.sub.t∈V that can solve the problem. The output at each decoding step t is a distribution over the vocabulary V, computed by the softmax function ŵ.sub.t˜softmax (Linear (s.sub.t)) where st is the contextual hidden state at decoding step t].
Le is analogous to the claimed invention as they both relate to natural language processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Le and provide a SoftMax function in order to [Le, para 0031] finetune a language model for specific program synthesis tasks.
Kapanipathi-Kocijan-Lewis-Cheng-Chen-Le do not teach executing the one or more produced executable queries against a knowledge base.
Yin teaches,
executing the one or more produced executable queries against a knowledge base [Sect 5.1, para 2, SPARQL queries that can be executed directly on a DBpedia endpoint; Sect 5.1, para 3, DBpedia subgraphs are extracted using a generic SPARQL query].
Yin is analogous to the claimed invention as they both relate to natural language processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Yin and provide executing queries against a knowledge base [Yin, Abstract] in order to allow users to analyze model results without requiring expertise in syntax or semantics.
Claim 12 is a computer program product claim that recites similar limitations to computer-implemented method claim 2. Therefore, claim 12 is rejected using the same rationale as claim 2.
Claim 13 is a computer program product claim that recites similar limitations to computer-implemented method claim 3. Therefore, claim 13 is rejected using the same rationale as claim 3.
Regarding claim 15, Kapanipathi teaches,
improving knowledge base question answering (KBQA) model [Abstract, In this work, we propose Neuro-Symbolic Question Answering (NSQA), a modular KBQA system] convergence [Sect 1, para 6, A pipeline-based modular approach that integrates multiple, reusable modules that are trained specifically for their individual tasks] and prediction performance [Sect 3.3, para 2, Using AMR and the path-based approach, NSQA was able to correctly predict the total number of constraints with comparable accuracies of 79% and 70% for single and two-hops, respectively],
generating a query graph skeleton, utilizing a model, based on a received question [Sect 2, para 1, Given a question in natural language, NSQA: (i) parses questions into an Abstract Meaning Representation (AMR) graph], wherein the skeleton contains one or more placeholder nodes for entities, relations, and variables [Sect 2.2, para 6, AMR Graph G contains nodes that are concepts or PropBank predicates which can correspond to both entities and relationships. For example in Figure 1, produce-01, star-01, and Spain are nodes in the AMR graph; Sect 2.2.1, para 6, a state-of-the-art relation linking system that takes in the question text and AMR predicate as input and returns a ranked list of KG relationships for each triple];
partial relation linking each relation placeholder (Sect 2.2.1, para 5, AMR predicate) comprised within the generated skeleton to one or more respective relation surface forms (Sect 2.2.1, para 5, KG relationships) [Sect 2.2.1, para 6, NSQA uses SemREL… a state-of-the-art relation linking system that takes in the question text and AMR predicate as input and returns a ranked list of KG relationships for each triple.];
responsive to a received natural language question (Sect 2, para 1, Given a question in natural language) for a target knowledge graph (KG) (Sect 2.2, para 1, underlying knowledge graph), producing one or more softly-tied query sketches (Sect 2.2, para 1, query graph) utilizing the generated skeleton (Sect 2.2.1, AMR graph) with the partial relation linking (Sect 2.2.1, AMR predicate) [Sect 2, para 1, Given a question in natural language, NSQA: (i) parses questions into an Abstract Meaning Representation (AMR) graph; (ii) transforms the AMR graph to a set of candidate KB-aligned logical queries, via a novel but simple graph transformation approach; Sect 2.2, para 1, The core contribution of this work is our next step where the AMR of the question is transformed to a query graph aligned with the underlying knowledge graph; Sect 2.2.1, para 6, NSQA uses SemREL… a state-of-the-art relation linking system that takes in the question text and AMR predicate as input and returns a ranked list of KG relationships for each triple.];
producing one or more executable queries (Sect 3.1, SPARQL query) for each softly-tied query sketch [Sect 2.2.2, para 1, Our query graph can be directly translated to the WHERE clause of the SPARQL; Sect 3.1, Each question has an associated SPARQL query].
Kapanipathi does not teach A computer system comprising: a processor set; one or more computer-readable storage media; and program instructions stored on the one or more computer-readable storage media to cause the processor set to perform operations comprising: improving by generalizing model based on transfer learning, one or more computer processors, and wherein model comprises an auto-regressive decoder with a causal self-attention mask and an encoder comprising a bi-directional transformer, wherein generating the query graph skeleton further comprises: tokenizing text associated with the received question using bi-directional encoder representations from the encoder; producing encoder hidden states for each token at each layer of the model and producing a distribution over skeleton output tokens utilizing the decoder at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention, wherein the output of each decoding step is a SoftMax over a set of operators; and executing the one or more produced executable queries against a knowledge base.
Kocijan teaches,
improving by generalizing model based on transfer learning [Abstract, we introduce the first approach for transfer of knowledge from one collection of facts to another without the need for entity or relation matching. The method works for both canonicalized knowledge bases and uncanonicalized or open knowledge bases].
Kocijan is analogous to the claimed invention as they both relate to NLP models. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Kocijan and provide generalizing a model based on transfer learning [Kocijan, Abstract] in order to improve predictions on structured data from a specific domain.
Kapanipathi-Kocijan teach the above limitations of claim 15 including generating the query graph skeleton and text associated with the received question (Kapanipathi, Sect 2, para 1).
Kapanipathi-Kocijan do not teach A computer system comprising: a processor set; one or more computer-readable storage media; and program instructions stored on the one or more computer-readable storage media to cause the processor set to perform operations comprising: wherein model comprises an auto-regressive decoder with a causal self-attention mask and an encoder comprising a bi-directional transformer, wherein generating the query graph skeleton further comprises: tokenizing text associated with the received question using bi-directional encoder representations from the encoder; producing encoder hidden states for each token at each layer of the model and producing a distribution over skeleton output tokens utilizing the decoder at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention, wherein the output of each decoding step is a SoftMax over a set of operators; and executing the one or more produced executable queries against a knowledge base.
Lewis teaches,
wherein model comprises an auto-regressive decoder (Sect 1, para 2, Auto-Regressive Transformers) with a causal self-attention mask and an encoder comprising a bi-directional transformer [Sect 1, para 2, In this paper, we present BART, which pre-trains a model combining Bidirectional and Auto-Regressive Transformers; Sect 3.4, para 3, In the first step, we freeze most of BART parameters and only update… the self-attention input projection matrix of BART’s encoder first layer],
tokenizing text using bi-directional encoder representations from the encoder [Sect 2, BART is a denoising autoencoder that maps a corrupted document to the original document it was derived from. It is implemented as a sequence-to-sequence model with a bidirectional encoder over corrupted text and a left-to-right autoregressive decoder; Sect 2.2, Token Masking Following BERT (Devlin et al., 2019), random tokens are sampled and replaced with [MASK] elements].
Lewis is analogous to the claimed invention as they both relate to natural language processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Lewis and provide a model with an auto-regressive decoder with a causal self-attention mask and an encoder comprising a bi-directional transformer [Lewis, Abstract] in order to find the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token.
Kapanipathi-Kocijan-Lewis do not teach A computer system comprising: a processor set; one or more computer-readable storage media; and program instructions stored on the one or more computer-readable storage media to cause the processor set to perform operations comprising: producing encoder hidden states for each token at each layer of the model and producing a distribution over skeleton output tokens utilizing the decoder at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention, wherein the output of each decoding step is a SoftMax over a set of operators; and executing the one or more produced executable queries against a knowledge base.
Cheng teaches,
A computer system comprising: a processor set; one or more computer-readable storage media; and program instructions stored on the one or more computer-readable storage media to cause the processor set to perform operations comprising: [Para 0156, this application may take the form of a computer program product implemented on one or more computer-usable storage media… containing computer-usable program code; Para 0157, This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application… These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified] and
producing encoder hidden states for each token at each layer of the model [Para 0034, The BERT model uses N Transformer Encoder layers to extract sentiment features related to the aspects to be analyzed. The input of the first Encoder layer is the output of the feature preprocessing layer H<sub>0</sub>={h<sub>1</sub>,h<sub>2</sub>,…,h<sub>n</sub>}. The input of each subsequent Encoder layer is the hidden state output of the previous Encoder layer].
Cheng is analogous to the claimed invention as they both relate to natural language processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Cheng and provide encoder hidden states for each token at each layer of the model in order to enable faster training and enhanced capture of long-distance dependencies.
Kapanipathi-Kocijan-Lewis-Cheng do not teach producing a distribution over skeleton output tokens utilizing the decoder at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention, wherein the output of each decoding step is a SoftMax over a set of operators; and executing the one or more produced executable queries against a knowledge base.
Chen teaches,
producing a distribution over skeleton output tokens (Para 0032, probability distribution over all tokens) utilizing the decoder (Para 0032, transformer layers; Para 0047, The transformer-based text generator network… includes a dynamic decoding… procedure) at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention [Para 0032, The text generator 330 is a stack of transformer layers. The input to the text generator 330 is a complete or partial sequence of tokens. At each position, the embedding of the input token is looked up and passed forward to the causal self-attention layer 214. The causal self-attention layer 214 transforms the embedding vector using a dynamically selected set of hidden states that are before the current position. The cross-attention layer 212 further applies a transformation using a dynamically selected set of time series encoder hidden states. A subsequent projection layer generates a probability distribution over all tokens in the vocabulary; Para 0047, The transformer-based text generator network 440 further includes a dynamic decoding and sampling procedure 448 for generating a diverse set of explanation texts].
Chen is analogous to the claimed invention as they both relate to natural language processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Chen and provide for producing a distribution over skeleton output tokens utilizing the decoder at each time step to consider the encoder hidden states via cross-attention and previous decoder states via self-attention in order to enable dynamic, context-aware sequence generation which significantly improves alignment, handles long sequences efficiently, and allows the decoder to select relevant input information at each step.
Kapanipathi-Kocijan-Lewis-Cheng-Chen do not teach wherein the output of each decoding step is a SoftMax over a set of operators; and executing the one or more produced executable queries against a knowledge base.
Le teaches,
wherein the output of each decoding step is a SoftMax over a set of operators [Para 0030, the LM 110 may be pretrained with pretraining tasks similar to those used with CodeT5 like masked span prediction (MSP). While the MSP task benefits code understanding, they have a large discrepancy with program synthesis objectives. To mitigate this gap, a pretraining task of next-token prediction (NTP) may be used in pretraining the LM 110. Specifically, a pivot location is uniformly sampled for each code sample, and then the content preceding the pivot is passed to the encoder of LM 110 and remaining to the decoder of LM 110; Para 0031, After pretraining, the pretrained LM 110 may then be finetuned for specific program synthesis tasks. Following a sequence-to-sequence approach, a program synthesis training pair of a natural language problem description 105, which take a form of an input sequence D, and a corresponding solution code program 106 may be used to finetune the pretrained LM 110. In response to the input sequence D, the pretrained LM 110 may generate an output sequence of program Ŵ=(ŵ.sub.1, . . . , ŵ.sub.T), ŵ.sub.t∈V that can solve the problem. The output at each decoding step t is a distribution over the vocabulary V, computed by the softmax function ŵ.sub.t˜softmax (Linear (s.sub.t)) where st is the contextual hidden state at decoding step t].
Le is analogous to the claimed invention as they both relate to natural language processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Le and provide a SoftMax function in order to [Le, para 0031] finetune a language model for specific program synthesis tasks.
Kapanipathi-Kocijan-Lewis-Cheng-Chen-Le do not teach executing the one or more produced executable queries against a knowledge base.
Yin teaches,
executing the one or more produced executable queries against a knowledge base [Sect 5.1, para 2, SPARQL queries that can be executed directly on a DBpedia endpoint; Sect 5.1, para 3, DBpedia subgraphs are extracted using a generic SPARQL query].
Yin is analogous to the claimed invention as they both relate to natural language processing. Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kapanipathi’s teachings to incorporate the teachings of Yin and provide executing queries against a knowledge base [Yin, Abstract] in order to allow users to analyze model results without requiring expertise in syntax or semantics.
Claims 16, 17, 19, and 20 are computer system claims that recite similar limitations to computer-implemented method claims 2, 3, 5, and 6, respectively. Therefore, claims 16, 17, 19, and 20 are rejected using the same rationale as claims 2, 3, 5, and 6, respectively.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SYED RAYHAN AHMED whose telephone number is (571)270-0286. The examiner can normally be reached Mon-Fri ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi can be reached at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SYED RAYHAN AHMED/Examiner, Art Unit 2126
/DAVID YI/Supervisory Patent Examiner, Art Unit 2126