Prosecution Insights
Last updated: April 19, 2026
Application No. 18/161,260

ONTOLOGY-DRIVEN PARAMETER EFFICIENT REPRESENTATIONS FOR KNOWLEDGE GRAPHS

Final Rejection (§102, §103)
Filed: Jan 30, 2023
Examiner: SPRAUL III, VINCENT ANTON
Art Unit: 2129
Tech Center: 2100 — Computer Architecture & Software
Assignee: Accenture Global Solutions Limited
OA Round: 2 (Final)
Grant Probability: 59% (Moderate)
OA Rounds: 3-4
To Grant: 4y 6m
With Interview: 94%

Examiner Intelligence

Career Allow Rate: 59% (20 granted / 34 resolved; +3.8% vs TC avg)
Interview Lift: +34.7% (resolved cases with interview)
Avg Prosecution: 4y 6m (typical timeline)
Total Applications: 64 across all art units; 30 currently pending

Statute-Specific Performance

§101: 22.6% (-17.4% vs TC avg)
§103: 48.4% (+8.4% vs TC avg)
§102: 9.1% (-30.9% vs TC avg)
§112: 14.4% (-25.6% vs TC avg)
Tech Center averages are estimates. Based on career data from 34 resolved cases.

Office Action

§102 §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Regarding the previous objection to claims 6-10 due to an informality, amendments to the claims have overcome the objection, which is withdrawn.

Regarding the previous rejection of claim 19 under 35 U.S.C. 101 as being directed to non-statutory subject matter, amendments to the claim have overcome the rejection, which is withdrawn. Examiner notes that Applicant states that in claim 1 and analogous claims 18-19, “‘the triple’ has been deleted to obviate the repetition of the phrase.” However, no such deletion is seen in the claims.

Regarding the rejection of claims under 35 U.S.C. 102(a)(1) and 35 U.S.C. 103, Applicant submits that Li does not teach every element of claim 1, in particular “retrieving, from an ontology lookup layer, ontology embeddings for the knowledge graph” and “updating, using a loss of the scored triple, the ontology embeddings stored in the ontology lookup layer.” Applicant summarizes: “It is evident that Li merely retrieves embeddings/triples from the KG, and not from an ontology lookup layer for the KG, as claimed. Accordingly, Li in its entirety nowhere talks about retrieving, from an ontology lookup layer, ontology embeddings for the knowledge graph. Also, Li does not anticipate about updating, using a loss of the scored triple, the ontology embeddings stored in the ontology lookup layer.” (emphasis in original)

Examiner respectfully disagrees. Neither the claim nor the specification provides a definition of “layer” or “ontology lookup layer.” In this context, then, an “ontology lookup layer” is any provider of “ontology embeddings for the knowledge graph.” Li, in Fig. 1, shows: [Fig. 1 of Li reproduced]. In this figure, the knowledge graphs, with the process of representation learning, form a mechanism that provides “ontology embeddings for the knowledge graph,” as described in Li, section 3.1, paragraph 2: “(iii) (incremental) representation learning phase, which learns KG embeddings based on the current triples.” Therefore, the KGs with representation learning are acting as an “ontology lookup layer” as described in the claim.

Further, as shown in Fig. 1, the method of Li includes a mechanism whereby the KGs are updated with augmented triples using the triple validation. Because the KGs are part of the mechanism that provides the ontology embeddings, updating the KGs is a form of “updating […] the ontology embeddings stored in the ontology lookup layer.” The triple validation process uses a scoring function with a normalized range for the candidate triples (Li, section 3.2.3, paragraph 1: “We use the current KG embeddings to score all candidate triples. Let f(⋅) denotes the scoring function of a KG embedding model. The confidence score of a candidate relational triple τ = (h,r,t) ∈ 𝒞i, denoted as p(τ), is calculated as follows: [equation image reproduced from Li] where we normalize f(τ) over all candidate triples, thus making p(τ) ∈ [0, 1]”); hence, the updating process is “using a loss of the scored triple.” The arguments are therefore unpersuasive.

Claim Objections

Claim 20 is objected to because of the following informality. Claim 20 recites:

[…] wherein the set of entity types is provided to an embedding generation layer of the neural link predictor by an entity type lookup table, the entity type lookup table stores the set of entity types of each node in the knowledge graph

The use of the comma after “table” is non-standard English.
Examiner respectfully suggests considering whether the claim should be amended to read “wherein the set of entity types is provided to an embedding generation layer of the neural link predictor by an entity type lookup table and the entity type lookup table […]” or the equivalent. Appropriate correction is required.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-2 and 13-17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Li et al., “Rule-based data augmentation for knowledge graph embedding,” 2021, https://doi.org/10.1016/j.aiopen.2021.09.003 (hereafter Li).

Regarding claim 1: Li teaches:

“A computer implemented method for training a neural link predictor, the method comprising”: Li, section 3.1, paragraph 2, “To resolve the aforementioned challenges, we propose a simple but effective data augmentation framework, as illustrated in Fig. 1. It is an iterative progress, combining KG embedding learning and rule-based data augmentation together, and the two components promote each other [A computer implemented method for training a neural link predictor]. On one hand, the augmented triples help make up the incomplete relational structures in KGs, and thus improve both the KG embedding learning and rule mining. On the other hand, the incrementally-trained and improved embeddings can help filter noisy data reasoned by rules.”

“obtaining a plurality of triples that represent a knowledge graph”: Li, section 3.1, paragraph 3, “We define a KG as a three-tuple (ℰ,ℛ,𝒯), where ℰ denotes the entity set and ℛ is the relation set. They have the size of m and n, respectively. 𝒯 denotes the initial observed triples of the KG [obtaining a plurality of triples that represent a knowledge graph]. We expect the size of the triple set grows during the iterations.”

“wherein each triple in the plurality of triples comprises data that specifies a subject node in the knowledge graph, an object node in the knowledge graph, and a relation type between the subject node and object node”: Li, section 1, paragraph 1, “A fact is represented in the form of a relational triple (head entity, relation, tail entity) such as (Washington, D.C., capitalOf, United States) [specifies a subject node in the knowledge graph, an object node in the knowledge graph, and a relation type between the subject node and object node].”

“obtaining data that specifies a set of entity types of nodes in the knowledge graph”: Li, section 3.1, paragraph 3, “We define a KG as a three-tuple (ℰ,ℛ,𝒯), where ℰ denotes the entity set [specifies a set of entity types of nodes in the knowledge graph] and ℛ is the relation set. They have the size of m and n, respectively. 𝒯 denotes the initial observed triples of the KG. We expect the size of the triple set grows during the iterations.”

“and for each triple: retrieving, from an ontology lookup layer, ontology embeddings for the knowledge graph, the ontology embeddings comprising embeddings for each entity type in the set of entity types”: Li, section 3.1, paragraph 2, “At each iteration, there are four stages: (i) rule mining phase, which infers rules automatically from the currently observed triples; (ii) rule grounding phase, which utilizes the mined rules and observed triples to generate candidate triples; (iii) (incremental) representation learning phase, which learns KG embeddings based on the current triples; (iv) triple validation phase, which leverages currently learned embeddings to score candidate triples and filter noisy data based on a propagation mechanism to obtain the final augmented data”; Li, Fig. 1 [Fig. 1 of Li reproduced; showing KGs, left, with representation learning, serving as an ontology lookup layer, ontology lookup layer interpreted as a component providing ontology embeddings; the layer producing ontology embeddings for the knowledge graph, the ontology embeddings comprising embeddings for each entity type in the set of entity types].

“generating, using the retrieved ontology embeddings for the knowledge graph, node embeddings for the subject node and the object node included in the triple”: Li, section 3.1, paragraph 2, “At each iteration, there are four stages: (i) rule mining phase, which infers rules automatically from the currently observed triples; (ii) rule grounding phase, which utilizes the mined rules and observed triples to generate candidate triples; (iii) (incremental) representation learning phase, which learns KG embeddings based on the current triples; (iv) triple validation phase, which leverages currently learned embeddings to score candidate triples and filter noisy data based on a propagation mechanism to obtain the final augmented data [generating, using the retrieved ontology embeddings for the knowledge graph, node embeddings for the subject node and the object node included in the triple].”

“scoring, using the generated node embeddings for the subject node and the object node included in the triple, the triple”: Li, section 3.1, paragraph 2, “At each iteration, there are four stages: (i) rule mining phase, which infers rules automatically from the currently observed triples; (ii) rule grounding phase, which utilizes the mined rules and observed triples to generate candidate triples; (iii) (incremental) representation learning phase, which learns KG embeddings based on the current triples; (iv) triple validation phase, which leverages currently learned embeddings to score candidate triples [scoring, using the generated node embeddings for the subject node and the object node included in the triple, the triple] and filter noisy data based on a propagation mechanism to obtain the final augmented data.”

“updating, using a loss of the scored triple, the ontology embeddings stored in the ontology lookup layer”: Li, section 3.2.3, paragraph 1, “We use the current KG embeddings to score all candidate triples. Let f(⋅) denotes the scoring function of a KG embedding model. The confidence score of a candidate relational triple τ = (h,r,t) ∈ 𝒞i, denoted as p(τ), is calculated as follows: [equation image reproduced from Li] where we normalize f(τ) over all candidate triples, thus making p(τ) ∈ [0, 1]. Our goal is to obtain new triples with high confidence scores from 𝒞i. We first label the candidate triple where p(τ) < 0.5 as negatives which would be directly excluded. Then, for the triples whose initial scores are above 0.5, we further design a propagation mechanism to filter the potential noise. Note that we choose 0.5 as the label boundary to split the candidate triples in advance. This is because 0.5 naturally divides the candidate triples into two halves, which guarantees that the number of potential positive candidate triples is neither too much, which would introduce much noise, nor too little, which would limit the diversity of the augmented triples. We also select other values during experiments and find that choosing 0.5 as the default value is a good trial [updating, using a loss of the scored triple, the ontology embeddings stored in the ontology lookup layer].”

Regarding claim 2: Li teaches “The method of claim 1.” Li further teaches “wherein the ontology embeddings further comprise embeddings for each relation type in the knowledge graph”: Li, Fig. 1 [Fig. 1 of Li reproduced; showing that representation learning produces KG embeddings including relations (r1 through rn), hence, the ontology embeddings further comprise embeddings for each relation type in the knowledge graph].
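As technical context for the p(τ) normalization cited throughout the rejection, the mechanism quoted from Li, section 3.2.3 (score candidate triples with a KG embedding scoring function f(⋅), then normalize the scores over the candidate set into [0, 1]) can be sketched in a few lines of Python. The TransE-style translation score and the min-max normalization below are illustrative assumptions; the quoted passage says only that f(⋅) is a scoring function of a KG embedding model and that f(τ) is normalized so p(τ) ∈ [0, 1].

```python
import math

def transe_score(h, r, t):
    # Illustrative TransE-style plausibility: negative L2 distance of the
    # translation h + r from t, so higher means more plausible. This is an
    # assumption; Li's f(.) may be any KG embedding scoring function.
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def confidence_scores(candidates, ent_emb, rel_emb):
    """Score every candidate triple (h, r, t) and min-max normalize the
    scores over the candidate set so each confidence lies in [0, 1],
    mirroring the normalized p(tau) in the quoted passage."""
    raw = [transe_score(ent_emb[h], rel_emb[r], ent_emb[t])
           for (h, r, t) in candidates]
    lo, hi = min(raw), max(raw)
    if hi == lo:                      # degenerate case: all scores equal
        return [1.0 for _ in raw]
    return [(s - lo) / (hi - lo) for s in raw]
```

Under this sketch, a candidate with normalized confidence below 0.5 would be labeled negative and excluded, mirroring the 0.5 boundary described in the quoted passage.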
Regarding claim 13: Li teaches “The method of claim 1.” Li further teaches “further comprising using the trained neural link predictor for the knowledge graph completion tasks”: Li, section 4.1.15, paragraph 1, “Table 3 lists the results on DBP15K. We can see that our data augmentation framework brings improvement to MTransE, AlignE and AliNet on all three entity alignment settings [using the trained neural link predictor for the knowledge graph completion tasks]. For example, on JA-EN, the H@1 score of AlignE is raised from 0.448 to 0.490 by using the augmented triples for training. These results demonstrate the effectiveness of our data augmentation framework.”

Regarding claim 14: Li teaches “The method of claim 13.” Li further teaches:

“receiving, at inference time, a request to score a new knowledge graph triple, wherein the new knowledge graph triple comprises a subject node included in the plurality of triples and an object node included in the plurality of triples”: Li, section 3.1, paragraph 2, “At each iteration, there are four stages: (i) rule mining phase, which infers rules automatically from the currently observed triples; (ii) rule grounding phase, which utilizes the mined rules and observed triples to generate candidate triples; (iii) (incremental) representation learning phase, which learns KG embeddings based on the current triples; (iv) triple validation phase, which leverages currently learned embeddings to score candidate triples [score a new knowledge graph triple] and filter noisy data based on a propagation mechanism to obtain the final augmented data”; Li, section 4.2, paragraphs 1-2, “Link prediction is the task of predicting the missing head or tail entity of the incomplete relational triples, for example, predicting h given (_, r, t) or predicting t given (h, r, _) [link prediction task results in a full h, r, t triple, where h and t are entities in triples in the graph, hence, wherein the new knowledge graph triple comprises a subject node included in the plurality of triples and an object node included in the plurality of triples]. We use four benchmark link prediction datasets FB15K, WN18 (Bordes et al., 2013), FB15K-237 (Toutanova et al., 2015) and WN18RR (Dettmers et al., 2018). FB15K and WN18 are subsets of Freebase and WordNet, respectively. But the two datasets were found to suffer from the test data leakage problem caused by inverse triples. Hence, two new versions FB15K-237 and WN18RR are created by excluding such inverse triples from the test data of FB15K and WN18, respectively. The dataset statistics are listed in Table 8. We reuse their training/validation/test [receiving, at inference time, a request] splits.”

“generating, using trained ontology embeddings included in the trained neural link predictor, node embeddings for the subject node and the object node included in the new knowledge graph triple”: Li, section 3.1, paragraph 2, “At each iteration, there are four stages: (i) rule mining phase, which infers rules automatically from the currently observed triples; (ii) rule grounding phase, which utilizes the mined rules and observed triples to generate candidate triples; (iii) (incremental) representation learning phase, which learns KG embeddings based on the current triples; (iv) triple validation phase, which leverages currently learned embeddings to score candidate triples and filter noisy data based on a propagation mechanism to obtain the final augmented data [generating, using trained ontology embeddings included in the trained neural link predictor, node embeddings for the subject node and the object node included in the new knowledge graph triple].”

“scoring the new knowledge graph triple, wherein node embeddings used to score the new knowledge graph triple comprise the generated node embeddings for the subject node and the object node included in the new knowledge graph triple, the score indicating a likelihood that the new knowledge graph triple is factually correct”: Li, section 3.2.3, paragraph 1, “We use the current KG embeddings to score all candidate triples. Let f(⋅) denotes the scoring function of a KG embedding model. The confidence score of a candidate relational triple τ = (h,r,t) ∈ 𝒞i, denoted as p(τ), is calculated as follows: [equation image reproduced from Li] where we normalize f(τ) over all candidate triples, thus making p(τ) ∈ [0, 1] [scoring the new knowledge graph triple, wherein node embeddings used to score the new knowledge graph triple comprise the generated node embeddings for the subject node and the object node included in the new knowledge graph triple, the score indicating a likelihood that the new knowledge graph triple is factually correct].”

Regarding claim 15: Li teaches “The method of claim 13.” Li further teaches:

“wherein using the trained neural link predictor for knowledge graph completion tasks comprises: receiving, at inference time, a request to score a new knowledge graph triple, wherein the new knowledge graph triple comprises a subject node or an object node not included in the plurality of triples”: Li, section 4.1, paragraphs 1-2, “Entity alignment aims at matching the counterpart entities that describe the same real-world identity of different KGs [receiving, at inference time, a request to score a new knowledge graph triple, wherein the new knowledge graph triple comprises a subject node or an object node not included in the plurality of triples]. The inference of entity alignment is based on the distances between entity embeddings. We adopt the widely-used dataset DBP15K (Sun et al., 2017) for evaluation. It was extracted from the multilingual DBpedia and contains three cross-lingual entity alignment settings, i.e., ZH-EN, JA-EN and FR-EN. Each setting consists of two KGs along with 15,000 aligned entity pairs between them. The statistics of DBP15K are reported in Table 1. To ensure a fair comparison, we reuse the data splits of DBP15K in our experiment.”

“generating, using trained ontology embeddings included in the trained neural link predictor, node embeddings for the subject node and the object node included in the new knowledge graph triple”: Li, section 3.1, paragraph 2, “At each iteration, there are four stages: (i) rule mining phase, which infers rules automatically from the currently observed triples; (ii) rule grounding phase, which utilizes the mined rules and observed triples to generate candidate triples; (iii) (incremental) representation learning phase, which learns KG embeddings based on the current triples; (iv) triple validation phase, which leverages currently learned embeddings to score candidate triples and filter noisy data based on a propagation mechanism to obtain the final augmented data [generating, using trained ontology embeddings included in the trained neural link predictor, node embeddings for the subject node and the object node included in the new knowledge graph triple].”

“scoring the new knowledge graph triple, wherein node embeddings used to score the new knowledge graph triple comprise the generated node embeddings for the subject node and the object node included in the new knowledge graph triple, the score indicating a likelihood that the new knowledge graph triple is factually correct”: Li, section 3.2.3, paragraph 1, “We use the current KG embeddings to score all candidate triples. Let f(⋅) denotes the scoring function of a KG embedding model. The confidence score of a candidate relational triple τ = (h,r,t) ∈ 𝒞i, denoted as p(τ), is calculated as follows: [equation image reproduced from Li] where we normalize f(τ) over all candidate triples, thus making p(τ) ∈ [0, 1] [scoring the new knowledge graph triple, wherein node embeddings used to score the new knowledge graph triple comprise the generated node embeddings for the subject node and the object node included in the new knowledge graph triple, the score indicating a likelihood that the new knowledge graph triple is factually correct].”

Regarding claim 16: Li teaches “The method of claim 13.” Li further teaches:

“wherein using the trained neural link predictor for knowledge graph completion tasks comprises: receiving, at inference time, a request to score a triple from a new knowledge graph”: Li, section 4.1, paragraphs 1-2, “Entity alignment aims at matching the counterpart entities that describe the same real-world identity of different KGs [receiving, at inference time, a request to score a triple from a new knowledge graph]. The inference of entity alignment is based on the distances between entity embeddings. We adopt the widely-used dataset DBP15K (Sun et al., 2017) for evaluation. It was extracted from the multilingual DBpedia and contains three cross-lingual entity alignment settings, i.e., ZH-EN, JA-EN and FR-EN. Each setting consists of two KGs along with 15,000 aligned entity pairs between them. The statistics of DBP15K are reported in Table 1. To ensure a fair comparison, we reuse the data splits of DBP15K in our experiment.”

“generating, using trained ontology embeddings included in the trained neural link predictor, node embeddings for a subject node and an object node included in the triple from the new knowledge graph”: Li, section 3.1, paragraph 2, “At each iteration, there are four stages: (i) rule mining phase, which infers rules automatically from the currently observed triples; (ii) rule grounding phase, which utilizes the mined rules and observed triples to generate candidate triples; (iii) (incremental) representation learning phase, which learns KG embeddings based on the current triples; (iv) triple validation phase, which leverages currently learned embeddings to score candidate triples and filter noisy data based on a propagation mechanism to obtain the final augmented data [generating, using trained ontology embeddings included in the trained neural link predictor, node embeddings for a subject node and an object node included in the triple from the new knowledge graph].”

“scoring the triple from the new knowledge graph, wherein node embeddings for scoring the triple from the new knowledge graph comprise the generated node embeddings for the subject node and the object node included in the triple from the new knowledge graph, the score indicating a likelihood that the triple from the new knowledge graph is factually correct”: Li, section 3.2.3, paragraph 1, “We use the current KG embeddings to score all candidate triples. Let f(⋅) denotes the scoring function of a KG embedding model. The confidence score of a candidate relational triple τ = (h,r,t) ∈ 𝒞i, denoted as p(τ), is calculated as follows: [equation image reproduced from Li] where we normalize f(τ) over all candidate triples, thus making p(τ) ∈ [0, 1] [scoring the triple from the new knowledge graph, wherein node embeddings for scoring the triple from the new knowledge graph comprise the generated node embeddings for the subject node and the object node included in the triple from the new knowledge graph, the score indicating a likelihood that the triple from the new knowledge graph is factually correct].”

Regarding claim 17: Li teaches “The method of claim 16.” Li further teaches “wherein the new knowledge graph comprises a same ontology as the knowledge graph represented by the plurality of triples”: Li, section 4.1, paragraphs 1-2, “Entity alignment aims at matching the counterpart entities that describe the same real-world identity [wherein the new knowledge graph comprises a same ontology as the knowledge graph represented by the plurality of triples] of different KGs. The inference of entity alignment is based on the distances between entity embeddings. We adopt the widely-used dataset DBP15K (Sun et al., 2017) for evaluation. It was extracted from the multilingual DBpedia and contains three cross-lingual entity alignment settings, i.e., ZH-EN, JA-EN and FR-EN. Each setting consists of two KGs along with 15,000 aligned entity pairs between them. The statistics of DBP15K are reported in Table 1. To ensure a fair comparison, we reuse the data splits of DBP15K in our experiment.”

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 3 is rejected under 35 U.S.C. 103 over Li in view of Bordes et al., “Translating Embeddings for Modeling Multi-relational Data,” 2013, Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2 (hereafter Bordes).

Li teaches “The method of claim 1.” Li further teaches:

“generating, using the retrieved ontology embeddings for the knowledge graph, node embeddings for the subject node and the object node included in the corrupted triple”: Li, section 3.1, paragraph 2, “At each iteration, there are four stages: (i) rule mining phase, which infers rules automatically from the currently observed triples; (ii) rule grounding phase, which utilizes the mined rules and observed triples to generate candidate triples; (iii) (incremental) representation learning phase, which learns KG embeddings based on the current triples; (iv) triple validation phase, which leverages currently learned embeddings to score candidate triples and filter noisy data based on a propagation mechanism to obtain the final augmented data [generating, using the retrieved ontology embeddings for the knowledge graph, node embeddings for the subject node and the object node included in the triple].” (bold only)

“scoring the corrupted triple, wherein embeddings used to score the corrupted triple comprise the generated node embeddings for the subject node and the object node included in the corrupted triple”: Li, section 3.2.3, paragraph 1, “We use the current KG embeddings to score all candidate triples. Let f(⋅) denotes the scoring function of a KG embedding model. The confidence score of a candidate relational triple τ = (h,r,t) ∈ 𝒞i, denoted as p(τ), is calculated as follows: [equation image reproduced from Li] where we normalize f(τ) over all candidate triples, thus making p(τ) ∈ [0, 1] [scoring the … triple, wherein embeddings used to score the … triple comprise the generated node embeddings for the subject node and the object node included in … triple].” (bold only)

“wherein updating the ontology embeddings comprises using a loss of the scored triple and a loss of the scored corrupted triple to update the ontology embeddings”: Li, section 3.2.3, paragraph 1, “We use the current KG embeddings to score all candidate triples. Let f(⋅) denotes the scoring function of a KG embedding model. The confidence score of a candidate relational triple τ = (h,r,t) ∈ 𝒞i, denoted as p(τ), is calculated as follows: [equation image reproduced from Li] where we normalize f(τ) over all candidate triples, thus making p(τ) ∈ [0, 1]. Our goal is to obtain new triples with high confidence scores from 𝒞i. We first label the candidate triple where p(τ) < 0.5 as negatives which would be directly excluded. Then, for the triples whose initial scores are above 0.5, we further design a propagation mechanism to filter the potential noise. Note that we choose 0.5 as the label boundary to split the candidate triples in advance. This is because 0.5 naturally divides the candidate triples into two halves, which guarantees that the number of potential positive candidate triples is neither too much, which would introduce much noise, nor too little, which would limit the diversity of the augmented triples. We also select other values during experiments and find that choosing 0.5 as the default value is a good trial [corrupted triples could be handled using the same scoring process to determine inclusion in the augmented knowledge graph, hence, using a loss of the scored triple and a loss of the scored … triple to update the ontology embeddings].”

Li does not explicitly teach:

“further comprising, for each triple: randomly replacing either the subject node included in the triple or the object node included in the triple with a randomly selected node in the knowledge graph to generate a corrupted triple.” (bold only)

“scoring the corrupted triple, wherein embeddings used to score the corrupted triple comprise the generated node embeddings for the subject node and the object node included in the corrupted triple” (bold only)

“wherein updating the ontology embeddings comprises using a loss of the scored triple and a loss of the scored corrupted triple to update the ontology embeddings”

Bordes teaches “further comprising, for each triple: randomly replacing either the subject node included in the triple or the object node included in the triple with a randomly selected node in the knowledge graph to generate a corrupted triple,” (bold only) “scoring the corrupted triple, wherein embeddings used to score the corrupted triple comprise the generated node embeddings for the subject node and the object node included in the corrupted triple,” and (bold only) “wherein updating the ontology embeddings comprises using a loss of the scored triple and a loss of the scored corrupted triple to update the ontology embeddings”: Bordes, section 2, paragraph 3, “The set of corrupted triplets, constructed according to Equation 2, is composed of training triplets with either the head or tail replaced by a random entity (but not both at the same time) [randomly replacing either the subject node included in the triple or the object node included in the triple with a randomly selected node in the knowledge graph to generate a corrupted triple]. The loss function (1) favors lower values of the energy for training triplets than for corrupted triplets, and is thus a natural implementation of the intended criterion. Note that for a given entity, its embedding vector is the same when the entity appears as the head or as the tail of a triplet.”

Bordes and Li are analogous arts as they are both related to knowledge graphs. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the corrupted triples of Bordes with the teachings of Li to arrive at the present invention, in order to provide negative examples to improve training results, as stated in Bordes, section 2, paragraphs 1-3, “The basic idea behind our model is that the functional relation induced by the l-labeled edges corresponds to a translation of the embeddings, i.e. we want that h + l ≈ t when (h, l, t) holds (t should be a nearest neighbor of h + l), while h + l should be far away from t otherwise […] The set of corrupted triplets, constructed according to Equation 2, is composed of training triplets with either the head or tail replaced by a random entity (but not both at the same time). The loss function (1) favors lower values of the energy for training triplets than for corrupted triplets, and is thus a natural implementation of the intended criterion.”

Claims 4-7 are rejected under 35 U.S.C. 103 over Li in view of Geng et al., “Relational Message Passing for Fully Inductive Knowledge Graph Completion,” 2022, arXiv:2210.03994v2 (hereafter Geng).
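The corrupted-triple procedure the examiner imports from Bordes (replace either the head or the tail of a training triple, never both, with a random entity, then apply a loss that prefers true triples over corrupted ones) can be sketched as follows. The higher-is-better score convention and the margin value of 1.0 are illustrative assumptions, not Bordes' exact formulation.

```python
import random

def corrupt(triple, entities, rng=random):
    """Bordes-style negative sampling: replace either the head or the
    tail (never both) with a randomly chosen different entity."""
    h, r, t = triple
    if rng.random() < 0.5:
        return (rng.choice([e for e in entities if e != h]), r, t)
    return (h, r, rng.choice([e for e in entities if e != t]))

def margin_loss(pos_score, neg_score, margin=1.0):
    # Margin ranking loss: zero once the true triple outscores the
    # corrupted one by at least the margin, mirroring a loss that favors
    # lower energy for training triplets than for corrupted triplets.
    return max(0.0, margin - pos_score + neg_score)
```

Updating embeddings by the gradient of such a loss uses a contribution from both the scored true triple and the scored corrupted triple, which is the combination the claim 3 rejection maps onto Bordes.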
Regarding claim 4: Li teaches “The method of claim 1.” Li does not explicitly teach “wherein generating node embeddings for the subject node and the object node included in the triple comprises: for a target node in the triple: identifying, from the plurality of triples, a k-hop subgraph of the knowledge graph, wherein the k-hop subgraph comprises k-hop neighbor nodes of the target node; and generating an embedding for the target node using ontology embeddings for entity types of the k-hop neighbor nodes of the target node.” Geng teaches: “wherein generating node embeddings for the subject node and the object node included in the triple comprises: for a target node in the triple: identifying, from the plurality of triples, a k-hop subgraph of the knowledge graph, wherein the k-hop subgraph comprises k-hop neighbor nodes of the target node”: Geng, Section II. B, paragraph 1, “The basic workflow of GraIL includes three steps. It first extracts the K-hop enclosing subgraph surrounding the target triple, denoted as G(u, rt, v) [identifying, from the plurality of triples, a k-hop subgraph of the knowledge graph, wherein the k-hop subgraph comprises k-hop neighbor nodes of the target node]. It then annotates the entities in the extracted subgraph according to their shortest distances to u and v. Namely, for each entity i in the subgraph, it is labeled with a tuple (d(i, u); d(i, v)), where d(i, u) (resp. d(i, v)) denotes the shortest distance between i and u (resp. v) without counting any path through v (resp. u).
It finally summarizes the subgraph through a GNN encoder and scores the likelihood of the target triple using the encoded subgraph representation and the target relation’s embedding.” “and generating an embedding for the target node using ontology embeddings for entity types of the k-hop neighbor nodes of the target node”: Geng, section I, paragraph 5, “It then learns the embedding of an unseen relation from the relational subgraph, by the relational message passing network, where novel graph pruning and neighborhood attention techniques are developed for both efficiency and effectiveness, with new neighborhood aggregations being used for addressing the issue of empty subgraphs [generating an embedding for the target node using ontology embeddings for entity types of the k-hop neighbor nodes of the target node].” Geng and Li are analogous arts as they are both related to knowledge graphs. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the use of subgraphs for embeddings of Geng with the teachings of Li to arrive at the present invention, in order to improve relational understanding, as stated in Geng, section II. B, paragraph 2, “The extracted enclosing subgraph are informative with some logical evidences (e.g., paths) that could help deduce the relation between two entities contained.” Regarding claim 5: Li as modified by Geng teaches “The method of claim 4.” Geng further teaches “wherein generating an embedding for the target node using ontology embeddings for entity types of the k-hop neighbor nodes of the target node comprises: initializing, for each node in the k-hop subgraph and using the ontology embeddings for entity types of the k-hop neighbor nodes of the target node, an embedding for the node as equal to an embedding for the entity type of the node”: Geng, Section III. 
C, paragraph 2, “In view of the direction of message passing, we sample all the incoming neighbors for each node. Then, the message passing is conducted layer-by-layer in steps 4-8. In particular, at the k-th GNN layer with k ∈ {1, 2, …, K}, we compute the latent representations for only neighbor nodes that are within the (K – k) hops by aggregating the features of their directed neighbors (step 6) and combining the features of themselves [initializing, for each node in the k-hop subgraph and using the ontology embeddings for entity types of the k-hop neighbor nodes of the target node, an embedding for the node as equal to an embedding for the entity type of the node] to make an update (step 7), as these nodes will contribute their features to the representation learning of the target relation node in the future message passing. For example, when K = 3, the first layer computes latent features for nodes that are within 2 hops; while the last layer only aggregates neighborhood features for the root node. Finally, we take the embedding of the target relation output at the last layer, which have already fused the messages from its K-hop neighbors, to make a prediction. In this way, we extremely reduce the computation cost with a decreasing number of nodes that participate in the aggregation and are updated at each iteration.” “and processing the initialized embeddings through multiple relational message passing layers and a readout layer to generate an embedding for the target node”: Geng, Section III. C, paragraph 2, “In view of the direction of message passing, we sample all the incoming neighbors for each node. Then, the message passing is conducted layer-by-layer in steps 4-8.
In particular, at the k-th GNN layer [multiple relational message passing layers] with k ∈ {1, 2, …, K}, we compute the latent representations for only neighbor nodes that are within the (K – k) hops by aggregating the features of their directed neighbors (step 6) and combining the features of themselves to make an update (step 7), as these nodes will contribute their features to the representation learning of the target relation node in the future message passing. For example, when K = 3, the first layer computes latent features for nodes that are within 2 hops; while the last layer only aggregates neighborhood features for the root node. Finally, we take the embedding of the target relation output at the last layer [a readout layer to generate an embedding for the target node], which have already fused the messages from its K-hop neighbors, to make a prediction. In this way, we extremely reduce the computation cost with a decreasing number of nodes that participate in the aggregation and are updated at each iteration.” Geng and Li are combinable for the rationale given under claim 4. Regarding claim 6: Li as modified by Geng teaches “The method of claim 5.” Geng further teaches: “wherein processing the initialized embeddings through multiple relational message passing layers to generate an embedding for the target node comprises, for each relational message passing layer: for each triple in the k-hop subgraph, applying a message function to embeddings for nodes included in the triple to generate respective messages”: Geng, section III. C, paragraph 1, “We next show how to perform message passing on R(G). Since R(G) is much larger and denser than the original graph G, conducting message passing over all relation nodes is more computational inefficiency, as theoretically proven by Wang et al. [37].
Motivated by the neighborhood sampling strategy used in GraphSAGE [18] — a GNN model for large scale graphs, which first samples a subset of neighbors for each graph node (up to depth K) and then performs message passing on this sampled graph to generate its embedding instead of considering the whole neighborhood. We also consider not updating all the graph nodes in one iteration. Considering that our ultimate goal is to iteratively pass (aggregate) the information for the target relation rt from its K-hop neighbors, we propose to (i) sample nodes on the graph according to whether they fall in the K-hop neighborhood of the target relation rt to generate a tree-like graph with the target relation as the root node, i.e., a target relation-guided graph pruning strategy, and (ii) aggregate and update only the representations that are necessary to satisfy the recursion at each depth [applying a message function to embeddings for nodes included in the triple to generate respective messages, interpreted as algorithmic processing of node embeddings to generate messages].” “for each node in the k-hop subgraph, applying an aggregation function to aggregate messages that correspond to nodes in the k-hop subgraph that neighbor the node”: Geng, section III. 
C, paragraph 2, “Formally, the AGGREGATE function [applying an aggregation function to aggregate messages that correspond to nodes in the k-hop subgraph that neighbor the node] at the k-th GNN layer is defined as follows: [equation image] where N_ri represents the directed incoming neighbors of node r_i, h^k_(N_ri) is the aggregated neighborhood vector; N^e_ri denotes the neighbors under edge type e and W^k_e is the corresponding transformation matrix at the k-th layer; h^(k-1)_rj is the representation of a neighbor r_j learned at previous iteration, while α^k_(rt,rj) is its attention weight with regard to the target relation r_t in the k-th layer, the value is computed by the similarity of the representations of r_j and r_t learned at previous layer, following the assumption that when the neighbor is more related to the target relation, their representations are more similar.” “for each node in the k-hop subgraph, applying an update function to an embedding for the node using the aggregated messages to obtain an updated embedding for the node”: Geng, Section III. C, paragraph 4, “After aggregating the neighborhood features, we then combine r_i’s representation h^(k-1)_ri obtained at previous layer with the aggregated vector to update its latent representation at the k-th layer. The combination is given by: [equation image] [applying an update function to an embedding for the node using the aggregated messages to obtain an updated embedding for the node]” Geng and Li are combinable for the rationale given under claim 4.
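The three-step message/aggregate/update pattern mapped to claim 6 above can be illustrated generically. The following is a hedged sketch with made-up scalar “embeddings” and hand-rolled functions; it is not Geng's network and abstracts away the transformation matrices and attention weights.

```python
import math

# Minimal sketch of the message/aggregate/update pattern discussed for
# claim 6 (hypothetical scalar "embeddings"; illustrative only).
def message(h_subj, h_obj):
    # message function applied to the embeddings of a triple's nodes
    return h_subj + h_obj

def aggregate(messages):
    # aggregation over a node's received messages (here a simple mean)
    return sum(messages) / len(messages) if messages else 0.0

def update(h_node, aggregated):
    # update function combining a node's embedding with its aggregate
    return math.tanh(h_node + aggregated)

# toy subgraph: triples over nodes A, B, C with scalar embeddings
h = {"A": 0.1, "B": 0.2, "C": -0.3}
triples = [("A", "r1", "B"), ("C", "r2", "A")]

msgs = {n: [] for n in h}
for s, _, o in triples:
    m = message(h[s], h[o])
    msgs[s].append(m)  # each endpoint of the triple receives the message
    msgs[o].append(m)

h_new = {n: update(h[n], aggregate(msgs[n])) for n in h}
```

Stacking several such passes and reading out the final embedding of the target node corresponds to the multi-layer processing recited in claims 5-7.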
Regarding claim 7: Li as modified by Geng teaches “The method of claim 6.” Geng further teaches “further comprising: when the relational message passing layer is not a last relational message passing layer in the multiple relational message passing layers, providing the updated embeddings as input to a subsequent relational message passing layer, or when the relational message passing layer is a last relational message passing layer in the multiple relational message passing layers, applying a readout function to the updated embeddings to obtain the embedding for the target node”: Geng, Section III. C, paragraph 2, “In view of the direction of message passing, we sample all the incoming neighbors for each node. Then, the message passing is conducted layer-by-layer in steps 4-8. In particular, at the k-th GNN layer with k ∈ {1, 2, …, K}, we compute the latent representations for only neighbor nodes that are within the (K – k) hops by aggregating the features of their directed neighbors (step 6) and combining the features of themselves to make an update (step 7), as these nodes will contribute their features to the representation learning of the target relation node in the future message passing [when the relational message passing layer is not a last relational message passing layer in the multiple relational message passing layers, providing the updated embeddings as input to a subsequent relational message passing layer]. For example, when K = 3, the first layer computes latent features for nodes that are within 2 hops; while the last layer only aggregates neighborhood features for the root node.
Finally, we take the embedding of the target relation output at the last layer, which have already fused the messages from its K-hop neighbors, to make a prediction [when the relational message passing layer is a last relational message passing layer in the multiple relational message passing layers, applying a readout function to the updated embeddings to obtain the embedding for the target node]. In this way, we extremely reduce the computation cost with a decreasing number of nodes that participate in the aggregation and are updated at each iteration.” Geng and Li are combinable for the rationale given under claim 4. Claim 8 is rejected under 35 U.S.C. 103 over Li as modified by Geng in view of Guoliang Ji et al., “Knowledge Graph Embedding via Dynamic Mapping Matrix,” 2015, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (hereafter Guoliang). Regarding claim 8: Li as modified by Geng teaches “The method of claim 6.” Geng further teaches (bold only) “wherein the message function is dependent on an entity type of the subject node included in the triple, an entity type of the object node included in the triple, and a relation type of the relation between the subject node and the object node included in the triple”: Geng, section III. C, paragraph 1, “We next show how to perform message passing on R(G). Since R(G) is much larger and denser than the original graph G, conducting message passing over all relation nodes is more computational inefficiency, as theoretically proven by Wang et al. [37]. Motivated by the neighborhood sampling strategy used in GraphSAGE [18] — a GNN model for large scale graphs, which first samples a subset of neighbors for each graph node (up to depth K) and then performs message passing on this sampled graph to generate its embedding instead of considering the whole neighborhood. We also consider not updating all the graph nodes in one iteration.
Considering that our ultimate goal is to iteratively pass (aggregate) the information for the target relation rt from its K-hop neighbors, we propose to (i) sample nodes on the graph according to whether they fall in the K-hop neighborhood of the target relation rt to generate a tree-like graph with the target relation as the root node, i.e., a target relation-guided graph pruning strategy, and (ii) aggregate and update only the representations that are necessary to satisfy the recursion at each depth [applying a message function, interpreted as algorithmic processing of node embeddings to generate messages].” Geng and Li are combinable for the rationale given under claim 4. Li as modified by Geng does not explicitly teach (bold only) “wherein the message function is dependent on an entity type of the subject node included in the triple, an entity type of the object node included in the triple, and a relation type of the relation between the subject node and the object node included in the triple.” Guoliang teaches (bold only) “wherein the message function is dependent on an entity type of the subject node included in the triple, an entity type of the object node included in the triple, and a relation type of the relation between the subject node and the object node included in the triple”: Guoliang, section 3.1, paragraph 1, “Considering the diversity of relations, CTransR segments triplets of a specific relation |r| into several groups and learns a vector representation for each group. However, entities also have various types. Figure 2 shows several kinds of head and tail entities of relation location.location.partially_containedby in FB15k. In both TransH and TransR/CTransR, all types of entities share the same mapping vectors/matrices. However, different types of entities have different attributes and functions, it is insufficient to let them share the same transform parameters of a relation. 
And for a given relation, similar entities should have similar mapping matrices and otherwise for dissimilar entities. Furthermore, the mapping process is a transaction between entities and relations that both have various types. Therefore, we propose a more fine-grained model TransD, which considers different types of both entities and relations, to encode knowledge graphs into embedding vectors via dynamic mapping matrices produced by projection vectors [function is dependent on an entity type of the subject node included in the triple, an entity type of the object node included in the triple, and a relation type of the relation between the subject node and the object node included in the triple].” Guoliang and Li are analogous arts as they are both related to knowledge graphs. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the entity-type and relation-type dependent processing of Guoliang with the teachings of Li to arrive at the present invention, in order to better project data using the type information, as stated in Guoliang, section 3.1, paragraph 1, “However, different types of entities have different attributes and functions, it is insufficient to let them share the same transform parameters of a relation. And for a given relation, similar entities should have similar mapping matrices and otherwise for dissimilar entities.” Claims 9-10 are rejected under 35 U.S.C. 103 over Li as modified by Geng in view of Hu et al., “Heterogeneous Graph Transformer,” 2020, arXiv:2003.01332v1 (hereafter Hu). Regarding claim 9: Li as modified by Geng teaches “The method of claim 6.” Geng further teaches “wherein the aggregation function comprises an attention mechanism”: Geng, section III.
C, paragraph 2, “Formally, the AGGREGATE function at the k-th GNN layer is defined as follows: [equation image] where N_ri represents the directed incoming neighbors of node r_i, h^k_(N_ri) is the aggregated neighborhood vector; N^e_ri denotes the neighbors under edge type e and W^k_e is the corresponding transformation matrix at the k-th layer; h^(k-1)_rj is the representation of a neighbor r_j learned at previous iteration, while α^k_(rt,rj) is its attention weight [wherein the aggregation function comprises an attention mechanism] with regard to the target relation r_t in the k-th layer, the value is computed by the similarity of the representations of r_j and r_t learned at previous layer, following the assumption that when the neighbor is more related to the target relation, their representations are more similar.” Geng and Li are combinable for the rationale given under claim 4. Li as modified by Geng does not explicitly teach “wherein the attention mechanism is dependent on an entity type of the subject node included in the triple, an entity type of the object node included in the triple, and a relation type of the relation between the subject node and the object node included in the triple.” Hu teaches “wherein the attention mechanism is dependent on an entity type of the subject node included in the triple, an entity type of the object node included in the triple, and a relation type of the relation between the subject node and the object node included in the triple”: Hu, section 1, paragraph 5, “To handle graph heterogeneity, we introduce the node-and-edge type dependent attention mechanism [wherein the attention mechanism is dependent on an entity type of the subject node included in the triple, an entity type of the object node included in the triple, and a relation type of the relation between the subject node and the object node included in the triple].
Instead of parameterizing each type of edges, the heterogeneous mutual attention in HGT is defined by breaking down each edge e = (s, t) based on its meta relation triplet, i.e., <node type of s, edge type of e between s, node type of t>. Figure 1 illustrates the meta relations of heterogeneous academic graphs. In specific, we use these meta relations to parameterize the weight matrices for calculating attention over each edge. As a result, nodes and edges of different types are allowed to maintain their specific representation spaces. Meanwhile, connected nodes in different types can still interact, pass, and aggregate messages without being restricted by their distribution gaps. Due to the nature of its architecture, HGT can incorporate information from high-order neighbors of different types through message passing across layers, which can be regarded as ‘soft’ meta paths.” Hu and Li are analogous arts as they are both related to knowledge graphs. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the type-specific attention mechanism of Hu with the teachings of Li to arrive at the present invention, in order to better encode nodes and edges of different types, as stated in Hu, section 1, paragraph 5, “In specific, we use these meta relations to parameterize the weight matrices for calculating attention over each edge. As a result, nodes and edges of different types are allowed to maintain their specific representation spaces. Meanwhile, connected nodes in different types can still interact, pass, and aggregate messages without being restricted by their distribution gaps. 
Due to the nature of its architecture, HGT can incorporate information from high-order neighbors of different types through message passing across layers, which can be regarded as ‘soft’ meta paths.” Regarding claim 10: Li as modified by Geng teaches “The method of claim 6.” Li as modified by Geng does not explicitly teach “wherein the update function is dependent on an entity type of the node of the embedding to be updated.” Hu teaches “wherein the update function is dependent on an entity type of the node of the embedding to be updated”: Hu, section 3.4, paragraph 1, “With the heterogeneous multi-head attention and message calculated, we need to aggregate them from the source nodes to the target node (See Figure 2 (3)). Note that the softmax procedure in Eq. 3 has made the sum of each target node t’s attention vectors to one, we can thus simply use the attention vector as the weight to average the corresponding messages from the source nodes and get the updated vector H~^(l)[t] as: [equation image] This aggregates information to the target node t from all its neighbors (source nodes) of different feature distributions. The final step is to map target node t’s vector back to its type specific distribution [the update function is dependent on an entity type of the node of the embedding to be updated], indexed by its node type τ(t). To do so, we apply a linear projection A-Linear_τ(t) to the updated vector H~^(l)[t], followed by residual connection [8] as: [equation image] In this way, we get the l-th HGT layer’s output H^(l)[t] for the target node t. Due to the ‘small-world’ property of real-world graphs, stacking the HGT blocks for L layers (L being a small value) can enable each node reaching a large proportion of nodes—with different types and relations—in the full graph.
That is, HGT generates a highly contextualized representation H^(L) for each node, which can be fed into any models to conduct downstream heterogeneous network tasks, such as node classification and link prediction.” Hu and Li are analogous arts as they are both related to knowledge graphs. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the update mechanism of Hu with the teachings of Li to arrive at the present invention, in order to better encode nodes and edges of different types, as stated in Hu, section 3.4, paragraph 1, “Due to the ‘small-world’ property of real-world graphs, stacking the HGT blocks for L layers (L being a small value) can enable each node reaching a large proportion of nodes—with different types and relations—in the full graph. That is, HGT generates a highly contextualized representation H^(L) for each node, which can be fed into any models to conduct downstream heterogeneous network tasks, such as node classification and link prediction.” Claims 11-12 are rejected under 35 U.S.C. 103 over Li as modified by Geng in view of Alesiani et al., US Pre-Grant Publication No. 2023/0102602 (hereafter Alesiani). Regarding claim 11: Li as modified by Geng teaches “The method of claim 5.” Geng further teaches (bold only) “further comprising updating, using the loss of the scored triples, parameters of the multiple relational message passing layers using backpropagation”: Geng, Section III. E, paragraph 2, “Following previous works, the whole model is trained [updating, …, parameters of the multiple relational message passing layers] by contrasting the scores of positive and negative triples using e.g. a margin-based ranking loss. In general, each existing triple in the given KGs is taken as a positive triple, while a negative triples is generated by replacing its head (or tail) with a uniformly sampled random entity.
Accordingly, another enclosing subgraph is extracted for the negative target triple. The loss function is as [using the loss of the scored triples]: [equation image] where Τ is the set of all triples in the training graph; p_i and n_i denote the positive and negative triples, respectively; γ is the margin hyperparameter, which is often a value greater than 0 to score positive triples higher than the negative ones. During prediction, for a testing triple, the same enclosing subgraph is extracted and used to estimate its plausibility”; Geng, Section III. C, paragraph 2, “In view of the direction of message passing, we sample all the incoming neighbors for each node. Then, the message passing is conducted layer-by-layer in steps 4-8. In particular, at the k-th GNN layer with k ∈ {1, 2, …, K}, we compute the latent representations for only neighbor nodes that are within the (K – k) hops by aggregating the features of their directed neighbors (step 6) and combining the features of themselves to make an update (step 7), as these nodes will contribute their features to the representation learning of the target relation node in the future message passing. For example, when K = 3, the first layer computes latent features for nodes that are within 2 hops; while the last layer only aggregates neighborhood features for the root node. Finally, we take the embedding of the target relation output at the last layer, which have already fused the messages from its K-hop neighbors, to make a prediction. In this way, we extremely reduce the computation cost with a decreasing number of nodes that participate in the aggregation and are updated at each iteration.” Geng and Li are combinable for the rationale given under claim 4.
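The margin-based ranking loss quoted from Geng (score positive triples above negatives by at least the margin γ) behaves as in this minimal sketch; the score values are made-up numbers and the scoring function itself is abstracted away.

```python
# Sketch of a margin-based ranking loss as described in the Geng quote above
# (illustrative only): each (positive, negative) pair contributes a hinge term.
def margin_ranking_loss(pos_scores, neg_scores, margin=1.0):
    # hinge term per pair: max(0, margin - positive_score + negative_score)
    return sum(max(0.0, margin - p + n)
               for p, n in zip(pos_scores, neg_scores))

# a well-separated pair contributes zero; a poorly separated pair does not
assert margin_ranking_loss([5.0], [1.0]) == 0.0
assert margin_ranking_loss([1.0], [0.5]) == 0.5
```

A positive triple scored at least γ above its negative contributes nothing to the loss; otherwise the gap contributes linearly, which is what pushes positive triples above their corrupted counterparts during training.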
Li as modified by Geng does not explicitly teach (bold only) “further comprising updating, using the loss of the scored triples, parameters of the multiple relational message passing layers using backpropagation.” Alesiani teaches (bold only) “further comprising updating, using the loss of the scored triples, parameters of the multiple relational message passing layers using backpropagation”: Alesiani, paragraph 0051, “Another approach leverages both of the above encoders by combining a fixed graph representation with a learned graph representation. For example, the fixed graph representation can be concatenated with the learn graph representation. While one may not back-propagate gradients to the fixed graph representation, one can still backpropagate gradients to the GNN to train it. For example, in one configuration, there may be two encoders: one is fixed (for the topology) and the gradient cannot be propagated and the second is a standard GNN where the gradient can be back propagated [using backpropagation].” Alesiani and Li are analogous arts as they are both related to knowledge graphs. 
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the backpropagation training of Alesiani with the teachings of Li to arrive at the present invention, in order to utilize a standard algorithm to train the model, as stated in Alesiani, paragraph 0051, “While one may not back-propagate gradients to the fixed graph representation, one can still backpropagate gradients to the GNN to train it.” Regarding claim 12: Li teaches “The method of claim 1.” Li does not explicitly teach “wherein updating, using a loss of the scored triples, the ontology embeddings stored in the ontology lookup layer comprises performing backpropagation of gradients of the loss of the scored triples.” Geng teaches (bold only) “wherein updating, using a loss of the scored triples, the ontology embeddings stored in the ontology lookup layer comprises performing backpropagation of gradients of the loss of the scored triples”: Geng, Section III. E, paragraph 2, “Following previous works, the whole model is trained [the model being used to generate the ontology embeddings stored in the ontology lookup layer, hence, updating … the ontology embeddings stored in the ontology lookup layer] by contrasting the scores of positive and negative triples using e.g. a margin-based ranking loss. In general, each existing triple in the given KGs is taken as a positive triple, while a negative triples is generated by replacing its head (or tail) with a uniformly sampled random entity. Accordingly, another enclosing subgraph is extracted for the negative target triple. 
The loss function is as [using a loss of the scored triples]: [equation image] where Τ is the set of all triples in the training graph; p_i and n_i denote the positive and negative triples, respectively; γ is the margin hyperparameter, which is often a value greater than 0 to score positive triples higher than the negative ones. During prediction, for a testing triple, the same enclosing subgraph is extracted and used to estimate its plausibility”; Geng, Section III. C, paragraph 2, “In view of the direction of message passing, we sample all the incoming neighbors for each node. Then, the message passing is conducted layer-by-layer in steps 4-8. In particular, at the k-th GNN layer with k ∈ {1, 2, …, K}, we compute the latent representations for only neighbor nodes that are within the (K – k) hops by aggregating the features of their directed neighbors (step 6) and combining the features of themselves to make an update (step 7), as these nodes will contribute their features to the representation learning of the target relation node in the future message passing. For example, when K = 3, the first layer computes latent features for nodes that are within 2 hops; while the last layer only aggregates neighborhood features for the root node. Finally, we take the embedding of the target relation output at the last layer, which have already fused the messages from its K-hop neighbors, to make a prediction. In this way, we extremely reduce the computation cost with a decreasing number of nodes that participate in the aggregation and are updated at each iteration.” Geng and Li are analogous arts as they are both related to knowledge graphs.
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the use of subgraphs for embeddings of Geng with the teachings of Li to arrive at the present invention, in order to improve relational understanding, as stated in Geng, section II. B, paragraph 2, “The extracted enclosing subgraph are informative with some logical evidences (e.g., paths) that could help deduce the relation between two entities contained.” Alesiani teaches (bold only) “wherein updating, using a loss of the scored triples, the ontology embeddings stored in the ontology lookup layer comprises performing backpropagation of gradients of the loss of the scored triples”: Alesiani, paragraph 0051, “Another approach leverages both of the above encoders by combining a fixed graph representation with a learned graph representation. For example, the fixed graph representation can be concatenated with the learn graph representation. While one may not back-propagate gradients to the fixed graph representation, one can still backpropagate gradients to the GNN to train it. For example, in one configuration, there may be two encoders: one is fixed (for the topology) and the gradient cannot be propagated and the second is a standard GNN where the gradient can be back propagated [performing backpropagation].” Alesiani and Li are analogous arts as they are both related to knowledge graphs. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the backpropagation training of Alesiani with the teachings of Li to arrive at the present invention, in order to utilize a standard algorithm to train the model, as stated in Alesiani, paragraph 0051, “While one may not back-propagate gradients to the fixed graph representation, one can still backpropagate gradients to the GNN to train it.” Claims 18-19 are rejected under 35 U.S.C.
103 over Li in view of Alesiani.

Regarding claim 18: Li teaches “The method of claim 1.” Claim 18 recites the same limitations of claim 1, except that in claim 18, the method steps are performed by a computing system: “A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations for training a neural link predictor, the operations comprising:”

Alesiani describes the use of such a system in the context of knowledge graph processing, in paragraphs 0130-0134: “Processing system [system] 2500 can be representative of each computing system disclosed herein. Processors 2502 can include one or more distinct processors [one or more computers], each having one or more cores. […] Examples of memory 2504 include a non-transitory computer-readable media such as RAM, ROM, flash memory, EEPROM, any kind of optical storage disk such as a DVD, a Blu-Ray® disc, magnetic storage, holographic storage, a HDD, a SSD, any medium that can be used to store program code in the form of instructions or data structures, and the like [storage devices storing instructions that are operable, when executed by the one or more computers].”

Alesiani and Li are analogous arts as they are both related to knowledge graphs.
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the computing system of Alesiani with the teachings of Li to arrive at the present invention, in order to implement the method on computing hardware, as stated in Alesiani, paragraph 0134, “Examples of memory 2504 include a non-transitory computer-readable media such as RAM, ROM, flash memory, EEPROM, any kind of optical storage disk such as a DVD, a Blu-Ray® disc, magnetic storage, holographic storage, a HDD, a SSD, any medium that can be used to store program code in the form of instructions or data structures, and the like.”

Regarding claim 19: Li teaches “The method of claim 1.” Claim 19 recites the same limitations of claim 1, except that in claim 19, the method steps are represented by instructions on a computer-readable storage medium: “A computer-readable storage medium comprising instructions stored thereon that are executable by a processing device and upon such execution cause the processing device to perform operations for training a neural link predictor, the operations comprising:”.

Alesiani describes the use of such a medium in the context of knowledge graph processing, in paragraph 0134: “Examples of memory 2504 include a non-transitory computer-readable media such as RAM, ROM, flash memory, EEPROM, any kind of optical storage disk such as a DVD, a Blu-Ray® disc, magnetic storage, holographic storage, a HDD, a SSD, any medium that can be used to store program code in the form of instructions or data structures, and the like [A computer-readable storage medium comprising instructions stored thereon that are executable by a processing device and upon such execution cause the processing device to perform operations].”

Alesiani and Li are analogous arts as they are both related to knowledge graphs.
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the computer-readable medium of Alesiani with the teachings of Li to arrive at the present invention, in order to implement the method on transportable media, as stated in Alesiani, paragraph 0134, “Examples of memory 2504 include a non-transitory computer-readable media such as RAM, ROM, flash memory, EEPROM, any kind of optical storage disk such as a DVD, a Blu-Ray® disc, magnetic storage, holographic storage, a HDD, a SSD, any medium that can be used to store program code in the form of instructions or data structures, and the like.”

Claim 20 rejected under 35 U.S.C. 103 over Li in view of Lou et al., US Pre-Grant Publication No. 2015/0331877 (hereafter Lou).

Li teaches “The method of claim 1.” Li further teaches (bold only) “wherein the set of entity types is provided to an embedding generation layer of the neural link predictor by an entity type lookup table, the entity type lookup table stores the set of entity types of each node in the knowledge graph”: Li, section 3.1, paragraph 3, “We define a KG as a three-tuple (ℰ,ℛ,𝒯 ), where ℰ denotes the entity set [hence, KGs are provided the set of entity types] and ℛ is the relation set. They have the size of m and n, respectively. 𝒯 denotes the initial observed triples of the KG. We expect the size of the triple set grows during the iterations”; Li, Fig. 1 [image; showing KGs, left, with representation learning, producing embeddings, hence serving as an embedding generation layer].
Li does not explicitly teach (bold only) “wherein the set of entity types is provided to an embedding generation layer of the neural link predictor by an entity type lookup table, the entity type lookup table stores the set of entity types of each node in the knowledge graph.”

Lou teaches (bold only) “wherein the set of entity types is provided to an embedding generation layer of the neural link predictor by an entity type lookup table, the entity type lookup table stores the set of entity types of each node in the knowledge graph”: Lou, paragraph 0027, “Table 130 is a data structure, which may include an array indexed by entity and entity type. For example, as illustrated, table 130 is indexed by entity references id1 and id2, and entity types T1, T2, and T3. The entries include target entity references having the type associated with the column that are nearest in distance or time, for example, to the entity reference associated with the row. In some implementations, table 130 is pregenerated and stored for subsequent access by processing element 100 to aid in addressing a compositional query such as, for example, query 110”; Lou, paragraph 0003, “In some implementations, a computer-implemented method comprises receiving a user input indicating a first entity type, a second entity type, and a relationship between a plurality of entity references of the first entity type and a plurality of entity references of the second entity type that defines a criterion. The computer-implemented method comprises identifying from a knowledge graph a plurality of pairs of entity references of the first entity type and the second entity type that meet the criterion [the entity type lookup table stores the set of entity types of each node in the knowledge graph].”

Lou and Li are analogous arts as they are both related to knowledge graph processing.
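As an aside, the kind of entity-type lookup table Lou describes (pregenerated, indexed by entity and entity type, consulted when embeddings are generated) can be sketched as a plain mapping. The identifiers below mirror Lou's illustrative id1/id2 and T1-T3; everything else is a hypothetical sketch, not content from the record.

```python
# Sketch of a pregenerated entity-type lookup table in the style of Lou,
# para. 0027: for each knowledge-graph node, store its set of entity types,
# so an embedding generation layer can look them up by node identifier.
entity_type_table = {
    "id1": {"T1", "T2"},  # node id1 carries types T1 and T2 (illustrative)
    "id2": {"T3"},        # node id2 carries type T3 (illustrative)
}

def types_for(entity_id):
    """Return the stored set of entity types for a node, empty if unknown."""
    return entity_type_table.get(entity_id, set())
```

A pregenerated structure like this trades storage for constant-time lookups, which is the access pattern Lou's paragraph 0027 describes.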
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the entity type table of Lou with the teachings of Li to arrive at the present invention, in order to simplify references to entity types, as stated in Lou, paragraph 0027, “In some implementations, table 130 is pregenerated and stored for subsequent access by processing element 100 to aid in addressing a compositional query such as, for example, query 110.”

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Wewer et al., “Updating Embeddings for Dynamic Knowledge Graphs,” 2021, arXiv:2109.10896v1, discloses a method of updating the embeddings for a dynamic knowledge graph performing link prediction, using various scoring functions.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VINCENT SPRAUL whose telephone number is (703) 756-1511.
The examiner can normally be reached M-F 9:00 am - 5:00 pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MICHAEL HUNTLEY, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/VAS/
Examiner, Art Unit 2129

/MICHAEL J HUNTLEY/
Supervisory Patent Examiner, Art Unit 2129

Prosecution Timeline

Jan 30, 2023
Application Filed
Nov 06, 2025
Non-Final Rejection — §102, §103
Jan 15, 2026
Response Filed
Feb 25, 2026
Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591634
COMPOSITE EMBEDDING SYSTEMS AND METHODS FOR MULTI-LEVEL GRANULARITY SIMILARITY RELEVANCE SCORING
2y 5m to grant Granted Mar 31, 2026
Patent 12591796
INTELLIGENT DISTANCE PROMPTING
2y 5m to grant Granted Mar 31, 2026
Patent 12572620
RELIABLE INFERENCE OF A MACHINE LEARNING MODEL
2y 5m to grant Granted Mar 10, 2026
Patent 12566974
Method, System, and Computer Program Product for Knowledge Graph Based Embedding, Explainability, and/or Multi-Task Learning
2y 5m to grant Granted Mar 03, 2026
Patent 12547616
SEMANTIC REASONING FOR TABULAR QUESTION ANSWERING
2y 5m to grant Granted Feb 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
59%
Grant Probability
94%
With Interview (+34.7%)
4y 6m
Median Time to Grant
Moderate
PTA Risk
Based on 34 resolved cases by this examiner. Grant probability derived from career allow rate.
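The headline projections appear to combine additively: the 59% baseline (20 granted of 34 resolved) plus the 34.7-point interview lift rounds to the 94% with-interview figure. That additive model is an inference from the displayed numbers, not something the page documents; a sketch of the assumed arithmetic:

```python
# Assumed (undocumented) arithmetic behind the projection figures:
# baseline grant probability plus interview lift, in percentage points.
granted, resolved = 20, 34
baseline = 20 / 34 * 100      # career allow rate, ~58.8%
interview_lift = 34.7         # observed lift for cases with an interview

allow_rate = round(baseline)            # displayed as 59%
with_interview = round(baseline + interview_lift)  # displayed as 94%
```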
