Prosecution Insights
Last updated: April 19, 2026
Application No. 17/643,501

SYSTEM AND METHOD FOR END-TO-END NEURAL ENTITY LINKING

Non-Final OA — §103, §112
Filed: Dec 09, 2021
Examiner: PHAM, JESSICA THUY
Art Unit: 2121
Tech Center: 2100 — Computer Architecture & Software
Assignee: JPMorgan Chase Bank, N.A.
OA Round: 3 (Non-Final)
Grant Probability: 33% (At Risk)
OA Rounds: 3-4
To Grant: 3y 3m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 33% (grants only 33% of cases; 1 granted / 3 resolved; -21.7% vs TC avg)
Interview Lift: -33.3% (minimal lift, measured across resolved cases with an interview)
Avg Prosecution: 3y 3m (typical timeline)
Total Applications: 41 across all art units (career history; 38 currently pending)

Statute-Specific Performance

§101: 26.8% (-13.2% vs TC avg)
§103: 35.5% (-4.5% vs TC avg)
§102: 11.0% (-29.0% vs TC avg)
§112: 22.7% (-17.3% vs TC avg)
Tech Center averages are estimates. Based on career data from 3 resolved cases.

Office Action

Rejections: §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/24/2025 has been entered.

Response to Amendments

Claims 1, 8, and 15 were amended. Claims 1, 3-5, 8, 10-12, 15, and 17-19 are pending and examined herein. Claims 1, 3-5, 8, 10-12, 15, and 17-19 are rejected under 35 U.S.C. 112(a). Claims 1, 3-5, 8, 10-12, 15, and 17-19 are rejected under 35 U.S.C. 112(b). Claims 1, 3-5, 8, 10-12, 15, and 17-19 are rejected under 35 U.S.C. 103.

Response to Arguments

Applicant's arguments, see pages 11-12, filed 11/24/2025, with respect to the 35 U.S.C. 112(a) rejection of claims 1, 3-5, 8, 10-12, 15, and 17-19 have been fully considered and are persuasive. The 35 U.S.C. 112(a) rejection of claims 1, 3-5, 8, 10-12, 15, and 17-19 has been withdrawn.

Applicant's arguments, see pages 12-13, filed 11/24/2025, with respect to the 35 U.S.C. 112(b) rejection of claims 1, 3-5, 8, 10-12, 15, and 17-19 have been fully considered but are not persuasive. Although Applicant has stated that the Examiner's interpretation of "linear layer to learn character patterns" is correct, the claim has not been amended to rectify the indefiniteness issue. The remainder of the 35 U.S.C. 112(b) issues presented in the previous Office action have been rectified. See the 35 U.S.C. 112(b) rejection below for further analysis.

Applicant's arguments, see pages 13-17, filed 11/24/2025, with respect to the rejection of claims 1, 3-5, 8, 10-12, 15, and 17-19 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of Liu (US 2021/0383069 A1), Moreno ("Combining Word and Entity Embeddings for Entity Linking", 2017), Mondal ("Medical Entity Linking using Triplet Network", 2019), Joshi ("Compromised Tweet Detection Using Siamese Networks and fastText Representations", 2019), Zhang ("Multi-view Knowledge Graph Embedding for Entity Alignment", 2019), and Cheng ("Entity Relationship Extraction Based on Bi-channel Neural Network", October 2020).

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1, 3-5, 8, 10-12, 15, and 17-19 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Claims 1, 8, and 15 recite the limitations "implementing a function that utilizes the embeddings of the entities to generate a semantic distance score;" and "computing, by the EENELM, a similarity score between the mention embedding vector and a pre-trained entity embedding vector based on Euclidean distance to measure semantic closeness between the mention and the entity;". The claim seems to state that these are two separate steps. However, the specification, see page 32 and Fig. 6, shows only one semantic distance score. As it is unclear whether these two limitations refer to the same step or not, the claims are rendered indefinite. Additionally, as the specification does not provide support for calculating the semantic distance score twice, this introduces new matter. For purposes of examination, the two limitations will be interpreted as referring to the same step.

Dependent claims 3-5, 10-12, and 17-19 fail to resolve these issues and are rejected with the same rationales.

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1, 3-5, 8, 10-12, 15, and 17-19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claims 1, 8, and 15 recite the limitation "reduced parameters". This is indefinite as it is unclear what the limitation is being compared to. In other words, it is unclear which claim element has the parameters that "reduced parameters" refers to. One of ordinary skill in the art would not be able to ascertain what the limitation is referring to. For purposes of examination, this limitation will be interpreted as "wherein the wide and deep learning model, through the first and second LSTM neural networks, generates targeted embeddings with parameters, thereby improving performance of the wide and deep learning model both in training phase and inference phase."

Claims 1, 8, and 15 recite the limitation "two identical linear layers with shared weights". This is indefinite as it is unclear if one of the layers in the "two identical linear layers with shared weights" is the "linear layer to learn character patterns" in the limitations starting with "deploying" in each of the claims. For purposes of examination, the "linear layer to learn character patterns" will be interpreted as one of the two linear layers in the "two identical linear layers with shared weights".

Claims 1, 8, and 15 recite the limitations "implementing a function that utilizes the embeddings of the entities to generate a semantic distance score;" and "computing, by the EENELM, a similarity score between the mention embedding vector and a pre-trained entity embedding vector based on Euclidean distance to measure semantic closeness between the mention and the entity;". The claim seems to state that these are two separate steps. However, the specification, see page 32 and Fig. 6, shows only one semantic distance score. As it is unclear whether these two limitations refer to the same step or not, the claims are rendered indefinite. Additionally, the specification does not provide support for calculating the semantic distance score twice, and this is also indefinite as it introduces new matter. For purposes of examination, the two limitations will be interpreted as referring to the same step.

Dependent claims 3-5, 10-12, and 17-19 fail to resolve these issues and are rejected with the same rationales.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 3-5, 8, 10-12, 15, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Liu (US 2021/0383069 A1), Moreno ("Combining Word and Entity Embeddings for Entity Linking", 2017), Mondal ("Medical Entity Linking using Triplet Network", 2019), Joshi ("Compromised Tweet Detection Using Siamese Networks and fastText Representations", 2019), Zhang ("Multi-view Knowledge Graph Embedding for Entity Alignment", 2019), and Cheng ("Entity Relationship Extraction Based on Bi-channel Neural Network", October 2020).

Regarding claim 1, Liu teaches:

A method for end-to-end neural entity linking by utilizing one or more processors and one or more memories, the method comprising: ([0006] states "In a first aspect, an embodiment of the present disclosure provides a method for linking an entity". [0124] states "The electronic device performing the method for linking an entity may further include: an input apparatus 1003 and an output apparatus 1004. The processor 1001, the memory 1002, the input apparatus 1003, and the output apparatus 1004 may be connected through a bus or in other methods".)

detecting all named entity mentions from a [data source]; ([0037] states "In the present embodiment, an executing body may perform various processing on the target text to determine the at least one entity mention included in the target text." The target text is interpreted as the data source. Further, [0037] states "Alternatively, the executing body may perform named entity recognition on the target text, and use obtained named entities as the entity mentions.")

computing, in response to detecting entity mentions, embeddings of entities in a knowledge [base] by implementing context information and; ([0038] states "The executing body may be connected to at least one preset knowledge base, and the knowledge base includes rich text semantic information." Further, [0038] states "Here, the candidate entity is an entity that exists in the knowledge base and is associated with the entity mention." [0040] states "In the present embodiment, the executing body may input each candidate entity into a pre-trained entity embedding vector determination model to obtain the embedding vector (embedding) of each candidate entity." [0050] states "A first embedding vector may be obtained using the first vector discrimination model. The first vector model may learn the relationship between the entity and the semantically related word in the description text of the entity, and make a distance between the obtained first embedding vector of the entity and a vector of the semantically related word closer. In this way, the first embedding vector contains semantic information of the entity, which may be used to improve the accuracy of the entity linking. The relationship between the entities may be learned using the second vector determination model. A second embedding vector obtained using the second vector determination model contains the relationship information between entities." As the second embedding vector is obtained using relationship information between entities, it is interpreted as implementing context information. [0062] states "After obtaining the first embedding vector and the second embedding vector, the executing body may perform fusion or concatenating or other processing on the first embedding vector and the second embedding vector to obtain the embedding vector.")

deploying … a machine learning model to match character and semantic information of the named entity mentions and entities in the knowledge [base] and ([0087] states "In the present embodiment, the executing body may input the embedding vector of each entity mention, the context semantic information of the target text and the type information of each entity mention into the learning to rank (LTR) model to obtain the ranking of each candidate entity corresponding to each entity mention. The executing body may use the candidate entity in the first place in the corresponding ranking of each entity mention as the entity linking result of the entity mention." The LTR model is interpreted as the machine learning model. [0068] states "Simply put, the Attention mechanism is to quickly filter high-value information from a large amount of information." Therefore, the attention mechanism is interpreted as the wide part of the model. [0076] states "The executing body may input the target text into a word vector determination model to determine the word vector sequence." [0077] states "The executing body may input the occluded target text into the pre-trained language model to obtain the type information of the entity mention. The pre-trained language model may be Bert (Bidirectional Encoder Representation from Transformers, bidirectional Transformer encoder), Ernie (Ernie is built based on Baidu's deep learning framework paddlepaddle), and so on." Thus, the model that determines type information of each entity mention is interpreted as the deep part of the model. [0069] states "The Attention mechanism uses parameter matrix A to learn the word vector sequence and the embedding vector of each candidate entity to obtain a vector." [0066] states "The executing body may input the target text into a word vector determination model to determine the word vector sequence." [0066] further states "The word vector determination model may be char2vec." According to Hussain, "Char2Vec: Learning the Semantic Embedding of Rare and Unseen Words in the Biomedical Literature", 2018, "Char2Vec (C2V) is a character-level semantic indexing model". As the attention mechanism uses the word vector sequence, and the word vector sequence is determined by a word vector determination model which may be char2vec, a character-level semantic indexing model, the attention mechanism is interpreted as the character information part of the matching model. [0077] states "In this way, the pre-trained language model may reinforcement-learn the context information of the target text, that is, learn the relationship between the nearest neighboring vocabulary of the occluded entity mention and the occluded entity mention." Therefore, the model that determines the type information of each entity mention is interpreted as the context information part of the matching model.)

linking, in response to deployment of the wide and deep learning model, the named mentions in text with corresponding entities in the knowledge [base]. ([0087] states, in reference to the LTR model, "The executing body may use the candidate entity in the first place in the corresponding ranking of each entity mention as the entity linking result of the entity mention.")

wherein the machine learning model is a wide and deep learning model and ([0068] states "Simply put, the Attention mechanism is to quickly filter high-value information from a large amount of information." Therefore, the attention mechanism is interpreted as the wide part of the model. [0076] states "The executing body may input the target text into a word vector determination model to determine the word vector sequence." [0077] states "The executing body may input the occluded target text into the pre-trained language model to obtain the type information of the entity mention. The pre-trained language model may be Bert (Bidirectional Encoder Representation from Transformers, bidirectional Transformer encoder), Ernie (Ernie is built based on Baidu's deep learning framework paddlepaddle), and so on." Thus, the model that determines type information of each entity mention is interpreted as the deep part of the model.)

generating, by an end-to-end neural entity linking module (EENELM), a mention embedding vector for each entity mention by: ([0036] states "Step 202, determining at least one entity mention included in the target text and a candidate entity corresponding to each entity mention". [0040] states "In the present embodiment, the executing body may input each candidate entity into a pre-trained entity embedding vector determination model to obtain the embedding vector (embedding) of each candidate entity. The entity embedding vector determination model is used to represent a corresponding relationship between the candidate entity and the embedding vector. The entity embedding vector determination model may be a plurality of existing language models, for example, Bert (Bidirectional Encoder Representation from Transformers, two-way Transformer encoder), Ernie (Ernie is built based on Baidu's deep learning framework paddlepaddle), and so on. The embedding vector is a vector representation of the candidate entity, which includes semantic information of the entity mention." Therefore, an embedding vector is generated for each entity mention. [0122] states "The memory 1002, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs and modules, such as program instructions/modules corresponding to the method for linking an entity in embodiments of the present disclosure (for example, the target text acquisition unit 901, the candidate entity determination unit 902, the embedding vector determination unit 903, the context determination unit 904, the type information determination unit 905 and the entity linking unit 906 as shown in FIG. 9). The processor 1001 executes the non-transitory software programs, instructions, and modules stored in the memory 1002 to execute various functional applications and data processing of the server, that is, to implement the method for linking an entity in the foregoing method embodiments." The program instructions/modules are interpreted as the end-to-end neural entity linking module (EENELM).)

computing, by the EENELM, a similarity score between the mention embedding vector and a pre-trained entity embedding vector … to measure semantic closeness between the mention and the entity. ([0040] states "In the present embodiment, the executing body may input each candidate entity into a pre-trained entity embedding vector determination model to obtain the embedding vector (embedding) of each candidate entity. The entity embedding vector determination model is used to represent a corresponding relationship between the candidate entity and the embedding vector. The entity embedding vector determination model may be a plurality of existing language models, for example, Bert (Bidirectional Encoder Representation from Transformers, two-way Transformer encoder), Ernie (Ernie is built based on Baidu's deep learning framework paddlepaddle), and so on. The embedding vector is a vector representation of the candidate entity, which includes semantic information of the entity mention." Therefore, the candidate entity embedding vector is a pre-trained entity embedding vector. [0091] states "In the present embodiment, for each entity mention, the executing body may further concatenate the context semantic information, the embedding vector of the entity mention, and the type information of the entity mention to obtain a vector representation of the entity mention, then calculate a distance between the vector representation and a vector of each candidate entity. Here, the distance is used to indicate the similarity between the entity mention and each candidate entity. Then, the candidate entity having the highest similarity is used as the entity linking result of the entity mention." Therefore, the distance, a similarity score, is determined between the mention embedding vector and the pre-trained entity embedding vector. The similarity score measures semantic closeness as it is a distance measure that involves embeddings with semantic significance, such as the type information.)
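For orientation, the nearest-candidate selection Liu's [0091] describes (score each candidate by its vector distance to the concatenated mention representation and keep the closest) can be sketched as follows. The function and entity names are illustrative assumptions, not code from Liu or from the application.

```python
import numpy as np

def link_mention(mention_vec, candidate_vecs):
    """Pick the knowledge-base candidate whose embedding lies closest to the
    mention vector; distance serves as the similarity signal, as in Liu [0091].

    `mention_vec` and the values of `candidate_vecs` are hypothetical numpy
    arrays of equal dimension; the IDs are placeholders.
    """
    best_id, best_dist = None, float("inf")
    for entity_id, entity_vec in candidate_vecs.items():
        dist = np.linalg.norm(mention_vec - entity_vec)  # Euclidean distance
        if dist < best_dist:
            best_id, best_dist = entity_id, dist
    return best_id

# Example: two candidates in a 3-d space; the closer one ("Q1") is returned.
print(link_mention(np.array([1.0, 0.0, 0.0]),
                   {"Q1": np.array([0.9, 0.1, 0.0]),
                    "Q2": np.array([0.0, 1.0, 0.0])}))
```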
Liu does not appear to explicitly teach:

[embeddings of entities in a] knowledge graph;
a plurality of data sources;
[computing embeddings by implementing] a margin-based loss function;
validating the embeddings of entities;
[deploying a machine learning model] in response to validating the embeddings of entities;
applies a linear layer to learn character patterns; and implementing two identical linear layers with shared weights;
implementing a distance algorithm to estimate similarities of the two identical linear layers with the shared weights and generating a syntax distance score;
implementing a function that utilizes the embeddings of the entities to generate a semantic distance score;
combining the syntax distance score and the semantic distance score to generate a contrastive loss;
applying a first long short-term memory (LSTM) neural network to embed words within a first context window including left n words of the mention, applying a second LSTM neural network to embed words within a second context window including right n words of the mention, applying an attention layer to weight outputs of the first and second LSTM networks based on contextual importance of the words, concatenating left and right context representations and processing the concatenated representation through a fully connected feed-forward layer to obtain a mention embedding vector;
[computing a similarity score between embeddings] based on Euclidean distance;
wherein the wide and deep learning model, through the first and second LSTM neural networks, generates targeted embeddings with reduced parameters, thereby improving performance of the wide and deep learning model both in training phase and inference phase.

However, Moreno—directed to analogous art—teaches [embeddings of entities in a] knowledge graph (Section 2 states "The use of embeddings for existing semantic networks has previously been studied by [2]. Representing knowledge information with embeddings consists in transforming each piece of information of the KB – usually represented by triples (head, relationship, tail) – into low dimensional vectors." As would be known by one of ordinary skill in the art, a knowledge graph is defined by nodes representing concepts/entities and edges representing relationships between them, and a semantic network is likewise defined by such nodes and edges. Additionally, triples are a common implementation of a knowledge graph, so one of ordinary skill in the art would be able to reason that the knowledge base, in this case, is a knowledge graph. Additionally, Section 2 states "This technique manages to mix the knowledge and text, which results in a unique representation space for words, entities and relations." One of ordinary skill in the art would be able to reason that the entities are from the knowledge graph.)

validating the embeddings of entities; (Section 5.2 discusses evaluation of the embeddings. The end of Section 5.2 states "Further experiments are performed with the EAT relaxed parameters version, e.g., our EAT implementation based on Gensim in order to have the maximum number of words and entities represented in the joint space." This is interpreted as validating the entity embeddings because, based on the result of the embedding strategy, an entity embedding implementation was chosen.)

[deploying a machine learning model] in response to validating the embeddings of entities (Section 5.4 is about the "Evaluation of Candidate Entity Generation". According to Section 4.2 on page 344, the candidate entities are generated using a classifier, which is a machine learning model. As stated above with regard to the previous limitation, the experiments performed after validating the entity embeddings are based on a result of the embedding evaluation. Therefore, the machine learning model is deployed in response to validating the entity embeddings.)

It would have been obvious for one of ordinary skill in the art before the effective filing date of the application to combine the teachings of Liu with the validation of embeddings and the entities in knowledge graphs taught by Moreno because, as Moreno states in Section 5.2, the goal is "to evaluate the quality of the obtained vectors for words and entities", and because, as stated by Moreno in Section 2, "Representing knowledge information with embeddings consists in transforming each piece of information of the KB – usually represented by triples (head, relationship, tail) – into low dimensional vectors." From that, a common knowledge base implementation using knowledge graphs would be obvious to one of ordinary skill in the art.

The combination of Liu and Moreno does not appear to explicitly teach:

a plurality of data sources;
[computing embeddings by implementing] a margin-based loss function;
applies a linear layer to learn character patterns; and implementing two identical linear layers with shared weights;
implementing a distance algorithm to estimate similarities of the two identical linear layers with the shared weights and generating a syntax distance score;
implementing a neural network architecture that utilizes the embeddings of the entities to generate a semantic distance score;
combining the syntax distance score and the semantic distance score to generate a contrastive loss;
applying a first long short-term memory (LSTM) neural network to embed words within a first context window including left n words of the mention, applying a second LSTM neural network to embed words within a second context window including right n words of the mention, applying an attention layer to weight outputs of the first and second LSTM networks based on contextual importance of the words, concatenating left and right context representations and processing the concatenated representation through a fully connected feed-forward layer to obtain a mention embedding vector;
[computing a similarity score between embeddings] based on Euclidean distance;
wherein the wide and deep learning model, through the first and second LSTM neural networks, generates targeted embeddings with reduced parameters, thereby improving performance of the wide and deep learning model both in training phase and inference phase.

However, Mondal—directed to analogous art—teaches a plurality of data sources (Section 2 describes the dataset that will be used for the project. It states that "The NCBI disease corpus (Dogan et al., 2014) contains 792 Pubmed abstracts with disorder concepts manually annotated." The Pubmed abstracts are interpreted as the plurality of data sources.)

[computing embeddings by implementing] a margin-based loss function (Section 3.3.2 contains the loss function L = max(d_p - d_n + α, 0). Additionally, it states "Another variable α, a hyperparameter, is added to the loss equation which defines how far away the dissimilarities should be." A person of ordinary skill in the art would recognize this as a margin.)

contrastive loss (One of ordinary skill in the art would realize that the margin-based loss function is a contrastive loss.)

It would have been obvious for one of ordinary skill in the art before the effective filing date of the application to combine the teachings of Liu and Moreno with the teachings of Mondal because, as stated by Mondal in Section 5.1, "In Example 1, the disease mention 'inherited neurodegeneration' was not mapped with 'heredodegenerative disorders' (D020271) by the existing methods, because of their incapability to capture the semantic similarity. In contrast to this, our system obtains additional semantic and syntactic information from the domain-specific subword embeddings and thereby maps to the correct concept ID." The increase in accuracy would render the combination obvious.

The combination of Liu, Moreno, and Mondal does not appear to explicitly teach:

applies a linear layer to learn character patterns; and implementing two identical linear layers with shared weights;
implementing a distance algorithm to estimate similarities of the two identical linear layers with the shared weights and generating a syntax distance score;
implementing a neural network architecture that utilizes the embeddings of the entities to generate a semantic distance score;
combining the syntax distance score and the semantic distance score to generate a contrastive loss;
applying a first long short-term memory (LSTM) neural network to embed words within a first context window including left n words of the mention, applying a second LSTM neural network to embed words within a second context window including right n words of the mention, applying an attention layer to weight outputs of the first and second LSTM networks based on contextual importance of the words, concatenating left and right context representations and processing the concatenated representation through a fully connected feed-forward layer to obtain a mention embedding vector;
[computing a similarity score between embeddings] based on Euclidean distance;
wherein the wide and deep learning model, through the first and second LSTM neural networks, generates targeted embeddings with reduced parameters, thereby improving performance of the wide and deep learning model both in training phase and inference phase.

However, Joshi—directed to analogous art—teaches applies a linear layer to learn character patterns (Page 3 states "For feature extraction, we combine the word, character and flexible n-grams in a single stack and consider this as one input. In comparison to this, we also consider just the tf-idf weighted word embedding feature set as the input." Therefore, the patterns learned by the model are character patterns. Page 3 further states "Fig 1 shows the architecture of the Siamese Network containing two branches of a MLP. The MLP has four fully connected layers, where three of them have 128 units with Tanh activation function, while the last layer having 512 units." One of ordinary skill in the art would recognize that a fully connected layer is a linear layer. Therefore, the linear layer is used to learn character patterns.)

implementing two identical linear layers with shared weights (Page 3 states "Fig 1 shows the architecture of the Siamese Network containing two branches of a MLP. The MLP has four fully connected layers, where three of them have 128 units with Tanh activation function, while the last layer having 512 units." The fully connected/linear layers are inside of the Siamese network, and one of ordinary skill in the art would understand that corresponding layers within the subnetworks of a Siamese network are identical. Fig. 1 shows the architecture. The two first fully-connected layers are interpreted as the two identical linear layers. Page 3 states "The weights for both branches are shared to learn a common representation of tweets." As the two layers sit in these branches, they share weights.)

implementing a distance algorithm to estimate similarities of the two identical linear layers with the shared weights and generating a syntax distance score; (Page 3 states "To this end, a representation of the user's style of writing is created by passing all tweets of a user into a MultiLayer Perceptron (MLP). Subsequently, the tweet whose author is to be determined is passed on to another MLP. The euclidean distance between the representation from both of these MLPs is calculated. During training, this signal is 1, if it is from the same user; and is 0, if it is from different users. During testing, we observe the euclidean distance and categorize it as the same user, if the value is greater than 0.5; and as a different user, if the value is less than 0.5." The Euclidean distance is interpreted as the distance algorithm. As the distance algorithm is used to determine if the tweets are from the same user, the distance algorithm estimates similarities of the output of the Siamese branches. As the branches each include an identical linear layer, the distance algorithm estimates similarities between them. The result of the Euclidean distance algorithm is interpreted as the syntax distance score.)

[computing a similarity score between embeddings] based on Euclidean distance (Page 3, as quoted above, describes calculating the Euclidean distance between the representations from the two MLPs. As the distance algorithm is used to determine if the tweets are from the same user, the distance algorithm computes a similarity score between embeddings based on Euclidean distance.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Liu, Moreno, and Mondal with the teachings of Joshi because, as Joshi states on page 1, "Latent similarity between two documents is the semantic closeness between them based on the context. In this paper, we employed Siamese Networks [8] to explore the learning of latent representations for analyzing tweeted messages. As Horiguchi et al. [9] stated, latent representations or metric based features perform well when there is less data, as in the case here."
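Taken together, Mondal's margin-based loss, L = max(d_p - d_n + α, 0), and Joshi's shared-weight Siamese branches compared by Euclidean distance can be sketched compactly as below. This is a generic PyTorch illustration under assumed dimensions, not code from either reference or from the application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseBranch(nn.Module):
    """One linear layer applied to every input, so the 'two identical linear
    layers' share a single weight matrix (the weight-sharing reading above)."""
    def __init__(self, dim_in=128, dim_out=64):
        super().__init__()
        self.linear = nn.Linear(dim_in, dim_out)

    def forward(self, x):
        return torch.tanh(self.linear(x))

def margin_loss(anchor, positive, negative, alpha=0.2):
    """Mondal-style triplet form: L = max(d_p - d_n + alpha, 0),
    with Euclidean distances and margin alpha."""
    d_p = F.pairwise_distance(anchor, positive)  # distance to the positive
    d_n = F.pairwise_distance(anchor, negative)  # distance to the negative
    return torch.clamp(d_p - d_n + alpha, min=0.0).mean()

# Illustrative usage: encode a batch of triplets with the shared branch.
branch = SiameseBranch()
a, p, n = (torch.randn(4, 128) for _ in range(3))
loss = margin_loss(branch(a), branch(p), branch(n))
```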
The combination of Liu, Moreno, Mondal, and Joshi does not appear to explicitly teach:

implementing a neural network architecture that utilizes the embeddings of the entities to generate a semantic distance score;
combining the syntax distance score and the semantic distance score to generate a contrastive loss;
applying a first long short-term memory (LSTM) neural network to embed words within a first context window including left n words of the mention, applying a second LSTM neural network to embed words within a second context window including right n words of the mention, applying an attention layer to weight outputs of the first and second LSTM networks based on contextual importance of the words, concatenating left and right context representations and processing the concatenated representation through a fully connected feed-forward layer to obtain a mention embedding vector;
wherein the wide and deep learning model, through the first and second LSTM neural networks, generates targeted embeddings with reduced parameters, thereby improving performance of the wide and deep learning model both in training phase and inference phase.

However, Zhang—directed to analogous art—teaches implementing a neural network architecture that utilizes the embeddings of the entities to generate a semantic distance score; (Page 3 states "The relation view characterizes the structure of KGs, in which entities are linked by relations. To preserve such relational structures, we adopt TransE [Bordes et al., 2013], to interpret a relation as a translation vector from its head entity to tail entity. Given a relation fact (h, r, t) in KGs, we use the following score function to measure the plausibility of the embeddings:". Page 4 states "We consider two kinds of similarity: name similarity based on literal embeddings and semantic similarity based on relation embeddings. We combine them as a weighted sum: sim(r, r̂) = α_1 cos(φ_name(r), φ_name(r̂)) + α_2 cos(r, r̂) (15)". Therefore, α_2 cos(r, r̂) is a semantic distance score.)

combining the syntax distance score and the semantic distance score to generate a … loss; (Page 5 states "We first train literal entity name embedding based on pre-trained word embeddings [Mikolov et al., 2018] and character embeddings. Thus, we can directly obtain entity name embeddings. Then, we train (and combine, if in-training combination is used) embeddings from other views and perform the cross-KG entity, relation and attribute identity inference alternately." Page 4, as quoted above, gives the weighted sum of Eq. (15). Therefore, as the embedding was based on character embeddings, α_1 cos(φ_name(r), φ_name(r̂)) is the syntax distance score, and the two scores are combined. Page 4 further states "We regard this similarity as a smooth coefficient to reduce the negative effect of inaccurate alignment and combine it into the loss of cross-KG relation identity inference:".)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Liu, Moreno, Mondal, and Joshi with the teachings of Zhang because, as Zhang states on pages 2-3, "So entity embeddings can be learned from each particular view and jointly optimized to improve the alignment performance."
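The weighted combination in Zhang's Eq. (15) is straightforward to sketch: two cosine terms stand in for the syntactic (name) and semantic (relation) scores, with illustrative weights. The tensor shapes and parameter values here are assumptions, not taken from Zhang.

```python
import torch
import torch.nn.functional as F

def combined_similarity(name_emb, name_emb_hat, rel_emb, rel_emb_hat,
                        alpha1=0.5, alpha2=0.5):
    """Weighted sum of a name-based (syntactic) similarity and a
    relation-based (semantic) similarity, mirroring Zhang's Eq. (15):
    sim = alpha1 * cos(name, name_hat) + alpha2 * cos(rel, rel_hat)."""
    syntax_score = F.cosine_similarity(name_emb, name_emb_hat, dim=-1)
    semantic_score = F.cosine_similarity(rel_emb, rel_emb_hat, dim=-1)
    return alpha1 * syntax_score + alpha2 * semantic_score

# Illustrative usage with random 32-d name and relation embeddings.
score = combined_similarity(torch.randn(32), torch.randn(32),
                            torch.randn(32), torch.randn(32))
```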
The combination of Liu, Moreno, Mondal, Joshi, and Zhang does not appear to explicitly teach:

applying a first long short-term memory (LSTM) neural network to embed words within a first context window including left n words of the mention, applying a second LSTM neural network to embed words within a second context window including right n words of the mention, applying an attention layer to weight outputs of the first and second LSTM networks based on contextual importance of the words, concatenating left and right context representations and processing the concatenated representation through a fully connected feed-forward layer to obtain a mention embedding vector;
wherein the wide and deep learning model, through the first and second LSTM neural networks, generates targeted embeddings with reduced parameters, thereby improving performance of the wide and deep learning model both in training phase and inference phase.

However, Cheng—directed to analogous art—teaches applying a first long short-term memory (LSTM) neural network to embed words within a first context window including left n words of the mention; (Page 350 states "The bi-directional long-short term memory network processes sequence data in two directions. It consists of two sub-networks, one is the forward long-short term memory network, and the other is the backward long-short term memory network. As shown in Figure 2, for the input sequence (x_0, x_1, x_2, …, x_i), the forward long-short term memory network A_0 → A_1 → A_2 → … → A_i participates in the forward calculation, and the input value of this sub-network at time t is the output data A_{t-1} and the sequence value x_t at time t; the backward long-short term memory network A'_i → … → A'_2 → A'_1 → A'_0 participates in the backward calculation, and the input value of the sub-network at time t is the sequence value x_t at time t and the output value at t+1, A'_{t+1}; the final output at time t is jointly decided by A_{t-1} and A'_{t+1}." Therefore, the forward LSTM is interpreted as the first LSTM neural network. Page 351 states "The input layer is a sentence containing n words, in which entities e_1 and e_2 are marked, and the word vector sequence is S = (w_1, w_2, …, e_1, …, e_2, …, w_n) using a bi-directional long-short term memory network. Encoding the sentence, the hidden information obtained is H = (h_1, h_2, …, e_1, …, e_2, …, w_n) and then all the hidden information H is input to the multi-head attention module." As the sentence is encoded, the words within the context window (the sentence) are embedded. As the first LSTM calculates in the forward direction, the left n words of the mention are included in the context window.)

applying a second LSTM neural network to embed words within a second context window including right n words of the mention; (Page 350, as quoted above, describes the backward long-short term memory network. Therefore, the backward LSTM is interpreted as the second LSTM neural network. Page 351, as quoted above, describes encoding the sentence with the bi-directional network, so the words within the context window (the sentence) are embedded. As the second LSTM calculates in the backward direction, the right n words of the mention are included in the context window.)

applying an attention layer to weight outputs of the first and second LSTM networks based on contextual importance of the words; (Page 351 states "Encoding the sentence, the hidden information obtained is H = (h_1, h_2, …, e_1, …, e_2, …, w_n) and then all the hidden information H is input to the multi-head attention module." Figure 3 shows that the outputs of the bi-LSTM, which consists of the first and second LSTM networks, are input into the multi-head attention. Page 350 states "The multi-head attention mechanism divides the model into multiple heads and multiple subspaces to focus on different information." As the input of the attention mechanism is the embedded words, the information that is focused on is the words. As one of ordinary skill in the art would understand, the contextual importance of the words is learned by the attention mechanism.)

concatenating left and right context representations and processing the concatenated representation through a fully connected feed-forward layer to obtain a mention embedding vector; (Page 351 states "Encoding the sentence, the hidden information obtained is H = (h_1, h_2, …, e_1, …, e_2, …, w_n) and then all the hidden information H is input to the multi-head attention module. The result obtained is finally input into the fully connected layer to obtain the extracted feature representation." The extracted feature representation is interpreted as the mention embedding vector. Figure 3 shows the architecture, where the information flows in one direction into the dense/fully-connected layer; thus, the fully-connected layer is feed-forward. Page 350, as quoted above, states that the final output at time t is jointly decided by A_{t-1} and A'_{t+1}. As the output of the bi-LSTM is jointly decided by the left representation A_{t-1} and the right representation A'_{t+1}, the output of the bi-LSTM and input of the fully connected feed-forward layer is the concatenated representations.)

wherein the wide and deep learning model, through the first and second LSTM neural networks, generates targeted embeddings with reduced parameters, thereby improving performance of the wide and deep learning model both in training phase and inference phase. (Page 350 states "The bi-directional long-short term memory network is an extension of the long-short term memory network (LSTM). With faster and more sufficient learning ability, the basis of the network model used in this article relies on the bi-directional long-short term memory network, and its structure is shown in Figure 1." The parameters calculated are shown in Equations 4, 5, and 6. The model is interpreted as the wide and deep learning model.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Liu, Moreno, Mondal, Joshi, and Zhang with the teachings of Cheng because, as Cheng states on page 352, "In the case that the convolutional neural network and the long short time memory network have been also used, the long short time memory network after the attention mechanism is introduced in this paper makes the effect of the model slightly higher than that of other models. The reason may be that the feature values of the output of the long short time memory network have been partially strengthened by the attention mechanism."
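For reference, the left/right-context mention encoder recited in the limitation above (two LSTMs, an attention layer, concatenation, and a feed-forward layer) admits a minimal generic sketch along the following lines. It is a PyTorch illustration under assumed dimensions, not the application's EENELM and not Cheng's bi-channel network.

```python
import torch
import torch.nn as nn

class MentionEncoder(nn.Module):
    """Left/right context LSTMs + attention + concatenation + feed-forward."""
    def __init__(self, emb_dim=100, hidden=64):
        super().__init__()
        self.left_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.right_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)          # scores each timestep
        self.ffn = nn.Linear(2 * hidden, hidden)  # fuses the two summaries

    def _attend(self, states):
        # Softmax over time gives per-word weights (contextual importance).
        weights = torch.softmax(self.attn(states), dim=1)
        return (weights * states).sum(dim=1)      # weighted sum over timesteps

    def forward(self, left_ctx, right_ctx):
        left_states, _ = self.left_lstm(left_ctx)     # embed left n words
        right_states, _ = self.right_lstm(right_ctx)  # embed right n words
        fused = torch.cat([self._attend(left_states),
                           self._attend(right_states)], dim=-1)
        return self.ffn(fused)  # mention embedding vector

# Illustrative usage: batch of 2 mentions, context windows of n = 5 words.
enc = MentionEncoder()
mention_vec = enc(torch.randn(2, 5, 100), torch.randn(2, 5, 100))
```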
Regarding claim 3, the rejection of claim 1 is incorporated herein. Further, the combination of Liu and Moreno does not appear to teach implementing a triplet loss model to generate the embeddings of entities from pre-trained word embedding models. However, Mondal—directed to analogous art—teaches this limitation (Figure 2 shows the architecture of the triplet loss model, which takes as input a positive candidate, a mention, and a negative candidate. Section 3.2.2 on page 98 states "We use 200-dimensional word2vec (Mikolov et al., 2013) embeddings trained on Wikipedia and Pubmed PMC-Corpus (Pyysalo et al., 2013) as input to Conv." Word2vec is interpreted as the pre-trained model.) It would have been obvious for one of ordinary skill in the art before the effective filing date of the application to combine the teachings of Liu and Moreno with the teachings of Mondal for the reasons given above with regard to claim 1.

Regarding claim 4, the rejection of claim 1 is incorporated herein. Further, Liu teaches embedding the entity mentions into vectors ([0085] states "After obtaining the embedding vector of each target text and the type information of each entity mention, the executing body may obtain the entity linking result through step 8061, or obtain the entity linking result through steps 8062 and 8072, or obtain the entity linking result through steps 8063 and 8073." [0090] states "In the present embodiment, for each entity mention, the executing body may further concatenate the context semantic information, the embedding vector of the entity mention, and the type information of the entity mention to obtain a vector representation of the entity mention".) and mathematically measuring similarities between the mentions and corresponding embeddings of entities based on the vectors ([0090] continues with "then calculate a distance between the vector representation and a vector of each candidate entity. Here, the distance is used to indicate the similarity between the entity mention and each candidate entity.").

Regarding claim 5, the rejection of claim 4 is incorporated herein. Further, the combination of Liu and Moreno does not appear to explicitly teach implementing a cosine similarity algorithm to measure similarities between each mention and corresponding entity embedding. However, Mondal—directed to analogous art—teaches implementing a cosine similarity algorithm to measure similarities between each mention and corresponding embeddings of entities. (Section 3.1 states "For a given mention m consisting of l words represented by {m_1, m_2, …, m_l}, we represent m as the sum of its word embeddings. The steps for the candidate generation algorithm are as follows: Step 1: Candidate Set 1, {C_1}: Calculate the cosine similarity between the vector representation of each synonym (candidate) of the KBIDs and the mention.") It would have been obvious for one of ordinary skill in the art before the effective filing date of the application to combine the teachings of Liu and Moreno with the teachings of Mondal for the reasons given above with regard to claim 1.

Regarding claim 8, Liu teaches a system for end-to-end neural entity linking, the system comprising: ([0008] states "In a third aspect, an embodiment of the present disclosure provides an electronic device for linking an entity.") a processor; ([0008] further states "at least one processor".) and a memory operatively connected to the processor via a communication interface, the memory storing computer readable instructions, when executed, causing the processor to perform the claimed steps ([0008] further states "and a memory, communicatively connected with the at least one processor, the memory storing instructions executable by the at least one processor, the instructions, when executed by the at least one processor, causing the at least one processor to perform the method according to the first aspect.") The remainder of claim 8 recites substantially similar subject matter as claim 1 and is rejected with the same rationale, mutatis mutandis.

Claims 10, 11, and 12 recite substantially similar subject matter as claims 3, 4, and 5, respectively, and are rejected with the same rationale, mutatis mutandis.
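A minimal sketch of the Mondal-style cosine candidate scoring cited for claims 5, 12, and 19, assuming the mention is represented as the sum of its word embeddings and that candidate knowledge-base embeddings are given; all names here are illustrative.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_candidates(mention_word_vecs, candidate_vecs):
    """Represent the mention as the sum of its word embeddings, then rank
    KB candidates by cosine similarity (highest first), following the
    candidate-generation step Mondal's Section 3.1 describes."""
    m = np.sum(mention_word_vecs, axis=0)  # mention = sum of word embeddings
    scored = [(kb_id, cosine(m, vec)) for kb_id, vec in candidate_vecs.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)

# Illustrative usage with a two-word mention and two hypothetical KB IDs.
ranked = rank_candidates([np.array([1.0, 0.0]), np.array([0.5, 0.5])],
                         {"D001": np.array([1.0, 0.4]),
                          "D002": np.array([0.0, 1.0])})
```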
Regarding claim 15, Liu teaches a non-transitory computer readable medium configured to store instructions for end-to-end neural entity linking, wherein, when executed, the instructions cause a processor to perform the following ([0008] states "In a third aspect, an embodiment of the present disclosure provides an electronic device for linking an entity, the electronic device including: at least one processor; and a memory, communicatively connected with the at least one processor, the memory storing instructions executable by the at least one processor, the instructions, when executed by the at least one processor, causing the at least one processor to perform the method according to the first aspect.") The remainder of claim 15 recites substantially similar subject matter as claim 1 and is rejected with the same rationale, mutatis mutandis.

Claims 17, 18, and 19 recite substantially similar subject matter as claims 3, 4, and 5, respectively, and are rejected with the same rationale, mutatis mutandis.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSICA THUY PHAM, whose telephone number is (571) 272-2605. The examiner can normally be reached Monday - Friday, 9:00 A.M. - 5:00 P.M.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Li Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.T.P./
Examiner, Art Unit 2121

/Li B. Zhen/
Supervisory Patent Examiner, Art Unit 2121

Prosecution Timeline

Dec 09, 2021
Application Filed
Apr 14, 2025
Non-Final Rejection — §103, §112
May 08, 2025
Interview Requested
May 14, 2025
Applicant Interview (Telephonic)
May 14, 2025
Examiner Interview Summary
Jul 10, 2025
Response Filed
Sep 25, 2025
Final Rejection — §103, §112
Nov 24, 2025
Response after Non-Final Action
Dec 22, 2025
Request for Continued Examination
Jan 15, 2026
Response after Non-Final Action
Feb 02, 2026
Non-Final Rejection — §103, §112 (current)

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 33%
With Interview: 0% (-33.3%)
Median Time to Grant: 3y 3m
PTA Risk: High
Based on 3 resolved cases by this examiner. Grant probability derived from career allow rate.
