Prosecution Insights
Last updated: April 19, 2026
Application No. 18/322,091

MACHINE LEARNING TECHNIQUES FOR DISAMBIGUATING UNSTRUCTURED DATA FIELDS FOR MAPPING TO DATA TABLES

Non-Final OA — §101, §112

Filed: May 23, 2023
Examiner: LU, HWEI-MIN
Art Unit: 2142
Tech Center: 2100 — Computer Architecture & Software
Assignee: Optum Inc.
OA Round: 1 (Non-Final)

Grant Probability: 62% (Moderate)
OA Rounds: 1-2
To Grant: 3y 1m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 62% (grants 62% of resolved cases; 134 granted / 217 resolved; +6.8% vs TC avg)
Interview Lift: +39.5% (strong lift for resolved cases with interview)
Avg Prosecution: 3y 1m (typical timeline; 37 currently pending)
Total Applications: 254 (career history, across all art units)

Statute-Specific Performance

§101: 11.2% (-28.8% vs TC avg)
§103: 43.8% (+3.8% vs TC avg)
§102: 9.4% (-30.6% vs TC avg)
§112: 33.0% (-7.0% vs TC avg)

Deltas are vs the Tech Center average estimate • Based on career data from 217 resolved cases

Office Action

Rejections: §101, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This Office action is responsive to communication(s): original application filed on 05/23/2023. Claims 1-20 are pending. Claims 1, 8, and 15 are independent.

Drawings

The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign(s) mentioned in the description: 802 and 804 in ¶ [0084]. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

The drawings are also objected to because "… the number of rows of a matrix representation may be equal to an amount of the plurality of data fields and the number of columns of the matrix representation may be equal to an amount of the plurality of data tables" in ¶¶ [0049] and [0084] is inconsistent with "an operational example of a matrix representation of a common data model" depicted in FIG. 8, where the number of columns of the matrix representation is equal to the amount of the plurality of data fields and the number of rows of the matrix representation is equal to the amount of the plurality of data tables.
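The flagged inconsistency is a simple transposition of axes. As an illustrative aside (the sizes below are hypothetical, and this sketch shows only the row/column orientation at issue, not the applicant's actual implementation), the two conventions differ by a transpose:

```python
import numpy as np

# Hypothetical sizes: 4 data fields, 3 data tables (illustration only).
n_fields, n_tables = 4, 3

# Orientation described in ¶¶ [0049] and [0084]:
# rows = data fields, columns = data tables.
m_spec = np.zeros((n_fields, n_tables))

# Orientation depicted in FIG. 8, per the examiner's reading:
# rows = data tables, columns = data fields.
m_fig8 = np.zeros((n_tables, n_fields))

# The two conventions are transposes of one another.
assert m_spec.T.shape == m_fig8.shape
```

Either orientation carries the same information; the objection is only that the specification text and FIG. 8 should agree on one.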
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification

The use of the terms "Bluetooth" and "Wi-Fi" in ¶¶ [0038], [0041], and [0043], and "WiMax" in ¶¶ [0038] and [0041], which are trade names or marks used in commerce, has been noted in this application. Each term should be accompanied by the generic terminology; furthermore, the term should be capitalized wherever it appears or, where appropriate, include a proper symbol indicating use in commerce such as ™, SM, or ® following the term. Although the use of trade names and marks used in commerce (i.e., trademarks, service marks, certification marks, and collective marks) is permissible in patent applications, the proprietary nature of the marks should be respected and every effort made to prevent their use in any manner which might adversely affect their validity as commercial marks.
Claim Objections

Claims 1, 5-6, 12-13, and 18-19 are objected to because of the following informalities: In Claim 1, lines 23-24, "… one or more prediction-based actions based on the assigning" appears to be "… one or more prediction-based actions based on the assignment of the one or more select data fields" according to Claims 8 and 15; in Claim 5, lines 2-3, Claim 12, lines 2-3, and Claim 18, lines 2-3, "… one or more mutual information scores between logical data type and data table …" appears to be "… one or more mutual information scores between a logical data type and a data table …" (see also the § 112 rejections of Claims 5, 12, and 18); and in Claim 6, line 2, Claim 13, line 3, and Claim 19, line 4, "… feedback data associated with the assigning …" appears to be "… feedback data associated with the assignment of the one or more select data fields …" according to Claims 8 and 15. Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 1, 8, and 15 recite the limitation "… wherein one query set of the one or more query sets comprises: (i) a plurality of candidate data tables selected from the plurality of data tables, and (ii) a select one of the plurality of data fields … a plurality of probability scores associated with a select data field matching to respective ones of the plurality of data tables … one of the plurality of probability scores is associated with one of the plurality of candidate data tables …" in lines 11-1, which renders these claims indefinite because it is unclear …. Claims 1, 8, and 15 recite the limitation "the performance of one or more prediction-based actions" in lines 23, 22, and 22, respectively. There is insufficient antecedent basis for this limitation in the claim. Clarification is required. Claims 2-7, 9-14, and 16-20 are rejected for fully incorporating the deficiencies of their respective base claims. Claims 5, 12, and 18 recite the limitation "… one or more mutual information scores between logical data type and data table … and one or more rarity scores of a logical data type with respect to the plurality of data tables" in lines 2-5, which renders these claims indefinite because it is unclear whether two instances of "…". Claims 7, 14, and 20 recite the limitation "... re-initiating …" in lines 2-3, 2-3, and 3, respectively, which renders these claims indefinite because "…".

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-2, 5-9, 12-15, and 18-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Independent Claims 1, 8, and 15

Step 1: Claim 1 is a process claim, Claim 8 is an apparatus claim, and Claim 15 is a claim for non-transitory computer-readable storage media. These claims fall within at least one of the four categories of patent eligible subject matter.

Step 2A Prong 1: The claim(s) recite(s) "…".

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) recite(s) additional elements/limitations of "one or more processors", "computing apparatus" (Claim 8), "memory" (Claim 8), "one or more non-transitory computer-readable storage media" (Claim 15), "machine learning", and "initiating/initiate the performance of one or more prediction-based actions based on the assigning/assignment of the one or more select data fields", which only amount to "apply it" with the use of generic computer components or insignificant extra-solution activity. None of the additional elements/limitations, taken alone or in combination, integrates the abstract idea into a practical application.

Step 2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception because (a) the additional limitation/element of "machine learning" is well-understood, routine, and conventional (WURC) activity similar to "performing repetitive calculations" (see MPEP 2106.05(d), "Performing repetitive calculations, Flook, 437 U.S. at 594, 198 USPQ2d at 199 (recomputing or readjusting alarm limit values)"); and (b) the additional limitation/element of "initiating/initiate the performance of one or more prediction-based actions based on the assigning/assignment of the one or more select data fields" is also well-understood, routine, and conventional (WURC) activity similar to "presenting offers and gathering statistics" (see MPEP 2106.05(d), "Presenting offers and gathering statistics, OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93").
Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea.

Claims 2 and 9

Step 1: Claim 2 is a process claim and Claim 9 is an apparatus claim. These claims fall within at least one of the four categories of patent eligible subject matter.

Step 2A Prong 1: The claim(s) does/do not further recite limitations/elements which can be reasonably considered as mental processes (i.e., which "…").

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) ….

Step 2B: The claim(s) does/do not further include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional limitation/element of "using a disambiguation machine learning model, wherein the disambiguation machine learning model comprises a neural network machine learning network" is also well-understood, routine, and conventional (WURC) activity similar to "performing repetitive calculations" (see MPEP 2106.05(d), "Performing repetitive calculations, Flook, 437 U.S. at 594, 198 USPQ2d at 199 (recomputing or readjusting alarm limit values)"). Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea.

Claims 3-4, 10-11, and 16-17

Step 1: Claims 3-4 are process claims, Claims 10-11 are apparatus claims, and Claims 16-17 are claims for non-transitory computer-readable storage media. These claims fall within at least one of the four categories of patent eligible subject matter.

Step 2A Prong 1: The claim(s) further recite(s) "…".

Step 2A Prong 2: This judicial exception is integrated into a practical application because the claim(s) ….

Claims 5, 12, and 18

Step 1: Claim 5 is a process claim, Claim 12 is an apparatus claim, and Claim 18 is a claim for non-transitory computer-readable storage media.
These claims fall within at least one of the four categories of patent eligible subject matter.

Step 2A Prong 1: The claim(s) further recite(s) "…".

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) ….

Step 2B: The claim(s) does/do not further include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea.

Claims 6, 13, and 19

Step 1: Claim 6 is a process claim, Claim 13 is an apparatus claim, and Claim 19 is a claim for non-transitory computer-readable storage media. These claims fall within at least one of the four categories of patent eligible subject matter.

Step 2A Prong 1: The claim(s) does/do not further recite limitations/elements which can be reasonably considered as mental processes (i.e., which "…").

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) ….

Step 2B: The claim(s) does/do not further include additional elements that are sufficient to amount to significantly more than the judicial exception because (a) the additional limitation/element of "receiving feedback data associated with the assigning" is also well-understood, routine, and conventional (WURC) activity similar to "receiving or transmitting data over a network" (see MPEP 2106.05(d), "Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir.
2014) (computer receives and sends information over a network)"); and (b) the additional limitation/element of "re-training the disambiguation machine learning model based on the feedback data" is also well-understood, routine, and conventional (WURC) activity similar to "performing repetitive calculations" (see MPEP 2106.05(d), "Performing repetitive calculations, Flook, 437 U.S. at 594, 198 USPQ2d at 199 (recomputing or readjusting alarm limit values)"). Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea.

Claims 7, 14, and 20

Step 1: Claim 7 is a process claim, Claim 14 is an apparatus claim, and Claim 20 is a claim for non-transitory computer-readable storage media. These claims fall within at least one of the four categories of patent eligible subject matter.

Step 2A Prong 1: The claim(s) further recite(s) "…".

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) ….

Step 2B: The claim(s) does/do not further include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea.

Allowable Subject Matter

Claims 1-2, 5-9, 12-15, and 18-20 would be allowable if rewritten or amended to overcome the rejection(s) under 35 U.S.C. 101 and 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, set forth in this Office action. Claims 3-4, 10-11, and 16-17 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: In regard to independent Claims 1, 8, and 15, the prior art of record, either singly or in combination, does not teach or suggest the combination of claimed elements including "a computer-implemented method comprising: generating, by one or more processors, a matrix representation of a common data model, wherein the common data model comprises (i) a plurality of rows associated with a plurality of data tables, and (ii) a plurality of columns associated with a plurality of data fields; determining, by the one or more processors, one or more logical data type weights for respective one or more data table-data field pairs associated with the matrix representation; generating, by the one or more processors, one or more disambiguation embeddings based on the matrix representation and the one or more logical data type weights; generating, by the one or more processors, a plurality of input embedding vectors for one or more prediction inputs based on the one or more disambiguation embeddings, wherein the one or more prediction inputs comprise one or more query sets, and wherein one query set of the one or more query sets comprises (i) a plurality of candidate data tables selected from the plurality of data tables, and (ii) a select one of the plurality of data fields; generating, by the one or more processors and using a disambiguation machine learning model, a plurality of prediction vectors based on the plurality of input embedding vectors, wherein (i) one of the plurality of prediction vectors comprises a plurality of probability scores associated with a select data field matching to respective ones of the plurality of data tables, and (ii) one of the plurality of probability scores is associated with one of the plurality of candidate data tables; assigning, by the one or more processors, one or more select data fields associated with the one or more query sets to respective one or
more candidate data tables based on the plurality of prediction vectors; and initiating, by the one or more processors, performance of one or more prediction-based actions based on the assignment of the one or more select data fields ", "a computing apparatus comprising memory and one or more processors communicatively coupled to the memory, the one or more processors configured to: generate a matrix representation of a common data model, wherein the common data model comprises (i) a plurality of rows associated with a plurality of data tables, and (ii) a plurality of columns associated with a plurality of data fields; determine one or more logical data type weights for respective one or more data table-data field pairs associated with the matrix representation; generate one or more disambiguation embeddings based on the matrix representation and the one or more logical data type weights; generate a plurality of input embedding vectors for one or more prediction inputs based on the one or more disambiguation embeddings, wherein the one or more prediction inputs comprise one or more query sets, and wherein one query set of the one or more query sets comprises (i) a plurality of candidate data tables selected from the plurality of data tables, and (ii) a select one of the plurality of data fields; generate, using a disambiguation machine learning model, a plurality of prediction vectors based on the plurality of input embedding vectors, wherein (i) one of the plurality of prediction vectors comprises a plurality of probability scores associated with a select data field matching to respective ones of the plurality of data tables, and (ii) one of the plurality of probability scores is associated with one of the plurality of candidate data tables; assign one or more select data fields associated with the one or more query sets to respective one or more candidate data tables based on the plurality of prediction vectors; and initiate performance of one or more 
prediction-based actions based on the assignment of the one or more select data fields", or "one or more non-transitory computer-readable storage media including instructions that, when executed by one or more processors, cause the one or more processors to: generate a matrix representation of a common data model, wherein the common data model comprises (i) a plurality of rows associated with a plurality of data tables, and (ii) a plurality of columns associated with a plurality of data fields; determine one or more logical data type weights for respective one or more data table-data field pairs associated with the matrix representation; generate one or more disambiguation embeddings based on the matrix representation and the one or more logical data type weights; generate a plurality of input embedding vectors for one or more prediction inputs based on the one or more disambiguation embeddings, wherein the one or more prediction inputs comprise one or more query sets, and wherein one query set of the one or more query sets comprises (i) a plurality of candidate data tables selected from the plurality of data tables, and (ii) a select one of the plurality of data fields; generate, using a disambiguation machine learning model, a plurality of prediction vectors based on the plurality of input embedding vectors, wherein (i) one of the plurality of prediction vectors comprises a plurality of probability scores associated with a select data field matching to respective ones of the plurality of data tables, and (ii) one of the plurality of probability scores is associated with one of the plurality of candidate data tables; assign one or more select data fields associated with the one or more query sets to respective one or more candidate data tables based on the plurality of prediction vectors; and initiate performance of one or more prediction-based actions based on the assignment of the one or more select data fields" when interpreted as a whole. Xie et al. 
("Joint Entity Linking for Web Tables with Hybrid Semantic Matching", in Krzhizhanovskaya et al. (Eds.): Computational Science – ICCS 2020, LNCS 12138, 20th International Conference Proceedings, Part II, Amsterdam, The Netherlands, June 3–5, 2020, pp. 618-631) discloses in the Abstract on Page 618 that (1) in order to extract the semantics of web tables to produce machine-readable knowledge, one of the critical steps is table entity linking, which maps the mentions in table cells to their referent entities in knowledge bases; (2) propose a novel model JHSTabEL, which converts table entity linking into a sequence decision problem and uses hybrid semantic features to disambiguate the mentions in web tables; (3) this model captures local semantics of the mentions and entities from different semantic aspects, and then makes full use of the information of previously referred entities for the subsequent entity disambiguation; (4) the decisions are made from a global perspective to jointly disambiguate the mentions in the same column; and (5) experimental results show that our proposed model significantly outperforms the state-of-the-art methods. Xie further discloses in Section 1 with FIG. 1 of Pages 618-620 that (1) in order to make machines understand these tables, one of the critical steps is to map the mentions in table cells to their corresponding entities in a given knowledge base (KB), which is called table entity linking or table entity disambiguation; (2) table entity linking is an important and challenging stage in table semantic understanding since the mentions in tables are usually ambiguous; (3) only focus on tables where rows clearly represent separate tuple-like objects, and columns represent different dimensions of each tuple (similar to Fig.
1); (4) since this paper does not focus on how to determine which cells can be linked to the knowledge base, assume that the linkable mentions are already known and perform entity linking on these linkable mentions, excluding un-linkable content, such as numbers, etc.; (5) compared with entity linking in free-format text, it is more difficult to disambiguate mentions in tables due to the limited context of table cells; (6) existing research has mainly used collective classification techniques, graph-based algorithms, multi-layer perceptrons, etc. to solve this problem; (7) these methods do not capture the semantic features of mentions and entities well, and cannot yield the desired disambiguation effect; (8) in order to better represent mentions and entities, use a hybrid semantic matching model to capture the local semantic information between table mentions and candidate entities from different semantic aspects; (9) since tables have the property of column consistency, that is, cells in the same column have similar contents and belong to the same category, it is natural to jointly disambiguate the mentions in the same column; (10) mentions usually have different difficulty in disambiguating depending on the quality of the contextual information; (11) by sorting the mentions in the same column and starting with mentions that are easier to disambiguate, it would be useful to utilize the information of previously referred entities for the subsequent entity disambiguation; (12) propose a joint model with hybrid semantic matching for table entity linking, which is called JHSTabEL for short; (13) this model consists of two modules: Hybrid Semantic Matching Model and Global Decision Model; (14) the Hybrid Semantic Matching Model encodes the contextual information of each mention and its candidate entities, which uses the representation-based and interaction-based models to capture matching features at abstract and concrete levels respectively, and then aggregates them to obtain the
hybrid semantic features, based on which the similarity scores of the mentions and entities are calculated; (15) before entering the global model, the mentions in the same column are sorted according to local similarity scores; (16) the Global Decision Model uses an LSTM network to encode the local representations of mention-entity pairs and jointly disambiguate the mentions in a sequential manner; (17) propose a hybrid semantic matching model which aggregates complementary abstract and concrete matching features to make full use of the local context; (18) use a global decision model to jointly disambiguate the mentions in the same column; (19) the disambiguation is made from a global perspective; and (20) evaluate our model on web table datasets and the experimental results show that our model significantly outperforms the state-of-the-art methods. Xie also discloses in Section 2 of Page 620 that (1) various efforts have been made to extract semantics from web tables, which usually include, but are not limited to, three tasks: table entity linking, column type identification and table relation extraction; (2) Syed et al. presented a pipeline approach, which first inferred the types of columns, then linked cell values to entities in the given KB, finally selected appropriate relations between columns; (3) Mulwad et al. and Limaye et al. described approaches to jointly model entity linking, column type identification and relation extraction tasks using graphical models; (4) these models, which handle all three tasks at the same time, rely on the correctness and completeness of the knowledge base, and therefore may run the risk of negatively affecting the performance of entity linking; (5) Shen et al. linked the mentions in list-like web tables (multiple rows with one column) to the entities in a knowledge base; (6) Efthymiou et al.
proposed three unsupervised annotation methods and attempted to map each table row to an entity in a KB, where this work was based on the assumption that the entity columns of tables were already known and their values served as the names of the described entities; (7) Bhagavatula et al. presented TabEL which used a collective classification technique to collectively disambiguate all mentions in web tables; (8) Wu et al. constructed a graph of mentions and candidate entities and used page rank to determine the similarity scores between mentions and candidates; (9) in the above methods, a lot of hand-designed features are applied, which is time-consuming and laborious; (10) recently, with the popularity of deep learning models, representation learning is used to automatically capture semantic features; e.g., Luo et al. proposed a neural network method for cross-language table entity linking, which took some embedding features as inputs and used a two-layer fully connected network to perform entity linking, where this model only used simple coherence features and an MLP network to link all mentions in tables, and thus cannot achieve the desired linking effect; (11) in this paper, automatically capture the semantic features of the mentions and candidate entities from different aspects to fully use the local information, and then use a global model to disambiguate the mentions in web tables from a global perspective. Xie further teaches in Sections 3, 3.1, 3.2, and 3.3 with FIGS. 2-4 of Pages 621-626 that (1) as shown in Fig.
2, the overall structure of JHSTabEL consists of two parts: (a) the hybrid semantic matching model which encodes the contextual information from two different semantic aspects to obtain the local semantic representations and matching scores of the mentions and the candidate entities; and (b) the global decision model which makes decisions from a global perspective to jointly disambiguate the mentions in the same column; (2) to generate candidate referent entities from a given knowledge base, use several heuristic rules to obtain the candidates: (i) the mention’s redirect and disambiguation page in Wikipedia; (ii) exact match of the string mention; (iii) fuzzy match (e.g., edit distance) of the string mention; (iv) entities containing the n-grams of the mention; (3) to optimize the memory and avoid unnecessary calculations during model training, use the XGBoost model to simplify the candidate sets; (4) the features used in XGBoost are the edit distance between the mentions and their candidate entities, the semantic similarity between the mention context representations and the entity embeddings, and the statistical features based on the pageview and hyperlinks in Wikipedia; (5) then take top K scored entities for each mention based on this model; (6) in contrast, if the number of candidate entities for a mention is less than K, complement it with negative examples from its candidate set; (7) aim to get a local representation and a match score for each mention-entity pair, which is essentially a semantic matching problem between the mention context X_M and the candidate entity context X_e; (8) due to the scarce context of table cells, construct the mention context X_M by using the other mentions in the row and the column of the table where the mention exists, and represent them as word embeddings using a pre-trained lookup table; (9) the context X_e of the candidate entity is obtained from the abstract of its corresponding page in Wikipedia and embedded in the same way;
(10) existing neural semantic matching models can be divided into two categories: representation-based model and interaction-based model; (11) the representation-based model first uses a neural network to construct a representation for a single text, such as a mention context or an entity abstract, and then conducts matching between the abstract representations of two pieces of text; (12) the interaction-based method attempts to establish a local interaction (e.g., cosine similarity) between two pieces of text, and then uses a neural network to learn the final matching score based on the local interaction; (13) the representation-based and interaction-based models can capture abstract and concrete level matching signals respectively; (14) propose to fuse these two models to perform semantic matching between the mention and entity contexts; (15) the left part of Fig. 2 shows the structure of our local hybrid model, which takes the mention and candidate entity as inputs and generates their corresponding contexts and embeddings, which are passed into the representation and interaction models; (16) finally, the hybrid semantic features and local ranking scores are acquired from this hybrid model; (17) Representation-Based Model: (a) given the mention context X_M and the candidate entity context X_e, aim to get their abstract representations using a siamese LSTM with tied weights; (b) Figure 3 illustrates the architecture of our representation-based model; (c) the mention context embedding Emb_M and the entity context embedding Emb_e are obtained from a pre-trained lookup table; (d) use two networks LSTM_a and LSTM_b with tied weights to encode the embeddings separately, and take the last hidden states of the LSTM networks as the representations of the word sequences; (e) in this way, get the mention representation V_M and the entity representation V_e, and feed their concatenation result to a multi-layer perceptron (MLP); (f) the output layer of the MLP produces a feature vector
Vabs(M, e) of dimension dabs as in Eqn. (1); (g) in this way, extract abstract-level features Vabs of the local contexts, and also calculate the local similarity between mention and candidate entity using the abstract-level features; and (h) however, if only this representation-based approach is used, the concrete matching signals (e.g., exact match) are lost, since the matching happens only after the individual representations are formed; (18) next introduce an interaction-based model to better capture the concrete matching features to complement the representation-based model; (19) Interaction-Based Model: (a) inspired by the latest advances in information retrieval, propose to use an interaction-based approach to capture the concrete-level features; (b) the interaction-based model using Conv-KNRM attempts to establish local interactions (e.g., cosine similarity) and get concrete-level features between mention and entity contexts; (c) as shown in Fig. 4, the Conv-KNRM model first composes n-gram embeddings using CNN networks, and then constructs translation matrices between n-grams of different lengths in the n-gram embedding space; (d) it uses a kernel-pooling layer to count the soft matches of word or n-gram pairs and gets the concrete-level features; (e) the Conv-KNRM model takes the mention context embedding EmbM and the entity context embedding Embe as inputs; (f) the convolutional layer applies convolution filters to compose n-grams from the text embeddings; (g) for each window of h words, the filter sums up all elements in the h words' embeddings Emb_{i:i+h}, weighted by the filter weights; (h) using F different filters of size h gives F scores for each window position, represented by a score vector g_i^h ∈ R^F, where each of the values in g_i^h describes the text in the i-th window from a different perspective as shown in Eqn. 
(2); (i) then the convolution feature matrix for the h-gram can be obtained by concatenating the convolution outputs g_i^h; (j) after getting the word-level n-gram feature matrices, the cross-match layer constructs translation matrices using n-grams of different lengths; (k) for mention n-grams of length hM and entity n-grams of length he, a translation matrix T_M^{hM,he} is constructed by calculating their cosine similarity as shown in Eqn. (3); (l) then kernel-pooling is applied to each T_M^{hM,he} matrix to generate the concrete feature vector ϕ(T_M^{hM,he}), which describes the distribution of match scores between mention hM-grams and entity he-grams as shown in Eqns. (4)-(5), wherein Eqn. (5) applies k RBF kernels to the i-th row of the translation matrix T_M^{hM,he}, and then generates a k-dimensional feature vector; (m) each kernel calculates how pairwise similarities between n-gram feature vectors are distributed around its mean μk; (n) the more similarities close to its mean, the higher the output value is; (o) then each of the translation matrices is pooled to a k-dimensional vector, and the concatenation of these vectors produces a scoring feature vector ϕ(TM); and (p) in this way, capture the concrete features Vcon(M, e) = ϕ(TM) based on the word-level n-gram interactions between mention and entity, where these features can complement the abstract features for a better semantic representation; (20) Hybrid Semantic Matching: (a) use the two sub-models introduced above to capture the abstract- and concrete-level features respectively, and combine them to get the hybrid semantic features; (b) then we pass the concatenation result to an MLP network to get the local similarity score for each mention-entity pair as shown in Eqn. 
(7); (c) in order to better distinguish the correct entity from the wrong entities in the candidate set when training the hybrid model, use the hinge loss function, which can rank the correct entity higher than others; (d) the loss function of the hybrid model is defined as shown in Eqn. (8); and (e) through the hybrid semantic matching model, we obtain the hybrid semantic features and local similarity scores of the mentions and candidate entities, which will serve as inputs to the subsequent global decision model; (21) the global decision model aims to enhance the topical consistency among the mentions in the same column; (22) as shown in the right part of Fig. 2, the global decision model takes the hybrid semantic features and local similarity scores acquired from the hybrid semantic matching model as inputs, and uses an LSTM network to deal with mentions in a sequential manner; (23) the LSTM network can maintain a long-term memory on features of entities selected in previous states; (24) therefore, the column consistency information can be fully utilized when disambiguating entities; (25) sort the mentions in the same column when disambiguating them; (26) in the table entity linking task, it is natural to divide all the mentions in a table into multiple segments according to the column they belong to; (27) then the mentions in a segment are sorted according to their local similarity scores, with the higher-scoring mention placed first; (28) take the maximum local similarity between the mention and its corresponding candidate entities as the criterion for each mention when sorting; (29) then an LSTM network is used to deal with these sorted segments in a sequential manner; (30) in this way, start with mentions that are easier to disambiguate and utilize the information provided by previously selected entities to disambiguate subsequent mentions; (31) the local similarity score indicates the probability of an entity being the target entity of the mention; (32) therefore, at 
each time step, randomly select a candidate for the mention based on this probability, and take the corresponding hybrid representations of the mention and the selected entity as inputs to the LSTM network; (33) then the output at each time step is passed into an MLP network to produce the label for the selected entity; (34) the objective function of the global decision model is defined as shown in Eqn. (9); and (35) in this way, the mentions in the same column are disambiguated jointly. Chatbri et al. (US 2021/0406283 A1, pub. date: 12/30/2021) discloses in ABSTRACT and ¶¶ [0002]-[0005] that (1) performing database management operations that require performing data field matching; (2) consolidating (e.g., combining, matching and/or the like) data from input data fields across a plurality of databases, database tables and/or the like; (3) performing automated data field matching across a plurality of input data fields; (4) for each input data field of the plurality of input data fields, identifying one or more occurred characters associated with the input data field, determining a per-character frequency score for each occurred character of the one or more occurred characters across the plurality of input data fields based on a cross-field per-character frequency score of the occurred character across the plurality of input data fields and a total size of the plurality of input data fields, determining a per-character increment score for each occurred character of the one or more occurred characters across the plurality of input data fields based on the per-character frequency score of the occurred character, and generating a per-field encoded representation of the input data field based on each per-character increment score for an occurred character of the one or more occurred characters; (5) performing the automated data field matching based on each per-field encoded representation for an input data field of the plurality of input data fields to generate one or more 
data field matching outputs across the plurality of input data fields; and (6) causing display of the one or more data field matching determinations using a data field matching output interface. Chatbri further discloses in ¶¶ [0071]-[0103] with FIGS. 4-9 that (1) utilize supervised machine learning models to perform database management operations (e.g., generate query outputs with respect to input data fields, generate user interface data and/or the like); (2) utilize a particular combination of input expansion rules, data field encoding and character-level embedding models in which the output of input expansion rules is supplied as an input of data field encoding operations, which in turn is supplied as an input of and/or combined with a character-level embedding model; (3) providing frequency-awareness and contextual-awareness in feature representations of input data fields improves accuracy of subsequent numerical operations and reduces the number of false positives in query results/outputs; (4) additionally, improved matching operations between input data fields enable the consolidation of related data across various databases and/or various database tables, which in turn reduces the storage needs of various existing data storage systems; (5) the content data storage subsystem 108 provides raw input data fields 411 to the database management computing entity 106 for operations; (6) the database management computing entity 106 comprises an input expansion unit 401 configured to generate expanded input data fields 412 from raw input data fields 411; (7) an input data field may refer to a data object that describes a data attribute that contains an atomic unit of structured data in a database (e.g., a database value in a database table of a database, where the database value is associated with a row identifier and a column identifier); (8) an input expansion rule may refer to a data object that describes a set of operations that are utilized to convert a raw input data 
field into an expanded input data field; (9) a raw input data field may comprise input data strings including truncated values, word order errors, typographical errors, shorthand and/or the like; (10) an input expansion rule may be utilized to perform one or more operations on the raw input data field in order to reduce sparsity of a numerical representation of the raw input data field in a multi-dimensional embedding space and increase accuracy of numerical operations (e.g., cross-field distance measurements) with respect to input data fields; (11) the database management computing entity 106 comprises a data field encoding unit 402 configured to generate per-field encoded representations 413 for the raw input data fields 411 (i.e., a per-field encoded representation for each raw input data field of the raw input data fields 411); (12) a per-field encoded representation may be a data object describing a numerical representation (e.g., a feature vector) of a corresponding input data field based on occurrence of characters in the input data field; (13) the numerical representation may comprise an ordered histogram; (14) an example numerical representation corresponding with an input data field that in turn comprises a plurality of occurred characters may comprise an N-dimensional vector, where N is the total number of candidate characters in an applicable character encoding system; (15) each of the N values in the N-dimensional vector may describe the per-character increment score of a candidate character that corresponds to the vector value as well as the per-field per-character frequency score of the noted character with respect to the input data field; (16) the database management computing entity 106 comprises a data field matching unit 403 configured to generate data field matching outputs 414 based on the per-field encoded representations 413; and (17) a data field matching output may refer to a data object that describes an output of a process that involves 
calculating at least one cross-field distance measure between a group of input data fields; e.g., (a) the data field matching output may be an output that describes a determination about whether two input data fields are deemed equivalent, where the noted determination is determined by calculating a cross-field distance measure between the two input data fields; and (b) the data field matching output may be an output that describes the output of a database join operation (e.g., a relational join operation, such as a relational inner join operation, a relational outer join operation, a relational left join operation, a relational right join operation, and/or the like), where the database join operation includes an equivalence determination, and where the equivalence determinations are determined by calculating cross-field distance measures; (18) perform one or more operations (based on input expansion rules) on raw input data fields in order to reduce sparsity of a numerical representation of the raw input data field in a multi-dimensional embedding space, which increases the accuracy of arithmetic operations performed on the numerical representations of the input data fields such as data field matching operations. 
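The cross-field matching flow summarized in items (17)-(18) can be sketched as follows; the function names, the mean-of-squared-differences distance form, and the threshold value are illustrative assumptions for discussion purposes, not details taken from Chatbri:

```python
def cross_field_distance(v1, v2):
    # Assumed distance form: mean of squared coordinate differences between
    # two K-dimensional per-field encoded representations.
    assert len(v1) == len(v2), "representations must share a dimensionality"
    k = len(v1)
    return sum((a - b) ** 2 for a, b in zip(v1, v2)) / k

def fields_match(v1, v2, threshold=0.01):
    # Deem two input data fields equivalent when the cross-field distance
    # between their encoded representations falls below a tolerance threshold.
    return cross_field_distance(v1, v2) < threshold
```

A database join operation of the kind described in item (17)(b) could then treat fields_match as its equivalence test, with the threshold derived from a deviation tolerance parameter.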
Exemplary input expansion rules and/or operations may include stemming, lemmatization techniques and/or the like; (19) at step/operation 601, the data field encoding unit 402 identifies occurred characters associated with an input data field; (20) at step/operation 602, the data field encoding unit 402 may determine a per-character increment score for each occurred character; (21) the per-character increment score for an occurred character may refer to a data object that describes a predictive significance of each occurrence of the occurred character within a dataset to performing data field matching operations across the data fields of the noted dataset; (22) the per-character increment score for the occurred character may be determined using at least one of the per-character frequency score for the occurred character and the per-character context score for the occurred character; (23) at step/operation 801, the data field encoding unit 402 determines a per-character frequency score for each occurred character; (24) a per-character frequency score is a data object that describes a measure of overall occurrence frequency of a corresponding character in a dataset containing a group of input data fields; (25) the frequency score of a corresponding character may be determined based on the occurrence frequency of the occurred character in the dataset relative to the overall size of the dataset; (26) the per-character frequency score for a corresponding character in a group of input data fields may be determined based on a cross-field per-character frequency score of the occurred character across the group of input data fields and a total size of the group of input data fields; (27) at step/operation 802, the data field encoding unit 402 determines a per-character context score for the occurred character; (28) the data field encoding unit 402 may utilize a character-level embedding model to determine a per-character context score for each occurred character; 
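The per-character frequency score described in items (23)-(26) can be sketched as follows; the function name and the exact normalization (cross-field character count divided by total field size) are assumptions for illustration, not taken from the reference:

```python
from collections import Counter

def per_character_frequency_scores(input_fields):
    # Cross-field occurrence count of each character, normalized by the
    # total size (in characters) of the group of input data fields.
    counts = Counter()
    total_size = 0
    for field in input_fields:
        counts.update(field)
        total_size += len(field)
    return {char: n / total_size for char, n in counts.items()}
```

A per-character increment score could then combine this frequency score with a context score from a character-level embedding model, e.g., by a weighted addition, as item (34) contemplates.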
(29) the character-level embedding model may be a machine learning model that is configured to process a dataset in order to determine a representation of a character that describes the occurrence context for the character relative to the characters that occur in the dataset; (30) the character-level embedding model may be configured to extract features from input data fields to facilitate the performance of machine learning operations that are in turn configured to generate a per-field encoded representation (e.g., numerical representation and/or a feature vector representation) for each character; (31) an example of a character-level embedding model is a convolutional neural network model, an autoencoder model (e.g., a regular autoencoder model, a variational autoencoder model, and/or the like), a convolutional-network-based encoder model, a recurrent-neural-network-based encoder model, a character to vector machine learning model, and/or the like; (32) given the expanded input data field 412B, the data field encoding unit 402 may utilize the character-level embedding model 901 (e.g., a convolutional neural network model, an encoder model and/or the like) to determine the character context modeling data object 911 that describes the per-character context score for each occurred character of a set of characters; (33) at step/operation 803, the data field encoding unit 402 determines a per-character increment score for the occurred character based on the per-character frequency score for the occurred character and the per-character context score for the occurred character; (34) the data field encoding unit 402 combines (e.g., adds, multiplies, performs a weighted addition of, and/or the like) the per-character frequency score for the occurred character and the per-character context score for the occurred character to determine the per-character increment score for the occurred character; (35) at step/operation 603, the data field encoding unit 402 generates the 
per-field encoded representation for the input data field based on each per-character increment score for an occurred character; (36) the per-field encoded representation may include the per-character context-aware frequency score for each candidate character with respect to the input data field, where the per-character context-aware frequency score for a candidate character is determined by combining (e.g., by multiplying) the per-field per-character frequency score for the candidate character in the input data field and the per-character increment score for the candidate character (e.g., by n times incrementing the value for character char in the per-field encoded representation of input data field i, where the value of n is determined based on the per-character context-aware frequency score for the character char in the input data field i, and where the magnitude of each increment is determined based on the per-character increment score for the character char); (37) the data field encoding unit 402 combines (e.g., in an ordered histogram) each per-character context-aware frequency score for a candidate character with respect to the input data field in order to generate the encoded representation of the input data field; (38) FIG. 
7 provides an operational example for generating a per-field encoded representation 413B for an expanded input data field 412B by the data field encoding unit 402; (39) a data field matching output 414 may be an output of a process that involves calculating at least one cross-field distance measure between a group of input data fields; e.g., the data field matching output may be an output that describes a determination about whether two input data fields are deemed equivalent, where the noted determination is determined by calculating a cross-field distance measure between the two input data fields; (40) to perform data field matching operations, the data field matching unit 403 may perform numerical operations configured to determine similarity of two or more input data fields based on the two or more per-field encoded representations of those input data fields; e.g., to determine a similarity measure for two input data fields, the data field matching unit 403 may compute a measure of distance between the mappings of the per-field encoded representations of the two input data fields in a multi-dimensional embedding space; (41) a multi-dimensional embedding space may be an N-dimensional space for modeling encoded representations of a group of terms, where each of the N dimensions of the N-dimensional space corresponds to a candidate character in a character encoding system; (42) a cross-field distance measure may describe the distance between the mappings of two encoded representations in a multi-dimensional embedding space; (43) the cross-field distance measure may be a measured distance between two input data fields (e.g., a primary data field and an associated secondary data field) in a multi-dimensional embedding space; (44) a cross-field distance operation may be determined using a distance between two per-field encoded representations, V1 and V2, where the distance may be calculated by utilizing the equation d(V1, V2) = (1/K) Σ_{k=0}^{K} (V1[k] − V2[k])², 
where "V1" and "V2" are two feature vectors representing data from two input data fields (e.g., input data strings), and "K" is the number of dimensions of each vector; (45) a cross-field distance operation may be determined using similarity determination measures such as cosine distance, Jaccard distance and/or the like; (46) once the data field matching outputs 414 are generated, an interface generation unit 404 may be configured to generate user interface data 415 based on the data field matching outputs 414 and provide the user interface data 415 to a client computing entity 102; (47) facilitate evaluating equivalence between data fields as part of performing join operations; (48) support at least two types of database join operations: non-probabilistic join operations and probabilistic join operations; (49) unlike non-probabilistic join operations, probabilistic join operations may be associated with (e.g., may specify) a deviation tolerance parameter, which can in turn be used to generate the identity threshold used to perform at least some aspects of the data field matching concepts; and (50) based on the specified deviation tolerance parameter, the database management computing entity 106 may determine that two data fields are equivalent if the cross-field distance measure between the noted two data fields falls within the least 1% of a maximal cross-field distance measure as defined by the hyper-parameters of a corresponding multi-dimensional embedding space. CHEN et al. (US 2018/0189265 A1, pub. 
date: 07/05/2018) discloses in ABSTRACT and ¶¶ [0003]-[0006] that (1) facilitate the learning of entity and word embeddings for entity disambiguation; (2) using a novel disambiguation model, accurately identify named entities across a large base of information; (3) generally, embeddings include a mapping or mappings of entities and words from training data to vectors of real numbers in a low-dimensional space, relative to a size of the training data (e.g., a continuous vector space); (4) training disambiguation models in continuous vector space comprises a machine learning component deployed thereon and configured to pre-process training data to generate one or more concurrence graphs of named entities, words, and document anchors extracted from the training data, define a probabilistic model for the one or more concurrence graphs, define an objective function based on the probabilistic model and the one or more concurrence graphs, and train at least one disambiguation model based on feature vectors generated through an optimized version of the objective function; (5) training data including free text and a plurality of document anchors, a pre-processing component configured to pre-process at least a portion of the training data to generate one or more concurrence graphs of named entities, words, and document anchors, and a training component configured to generate vector embeddings of entities and words based on the one or more concurrence graphs, wherein the training component is further configured to train at least one disambiguation model based on the vector embeddings; and (6) prepare training data for machine learning through extraction of a plurality of observations, wherein the training data comprises a corpus of text and a plurality of document anchors, generate a mapping table based on the plurality of observations of the training data, and generate one or more concurrence graphs of named entities, words, and document anchors extracted from the training 
data and based on the mapping table. CHEN further discloses in ¶¶ [0021]-[0029] with FIG. 1 that (1) a corpus of training data 101 may include a large amount of free text 102 and a plurality of document anchors 103 for training a disambiguation model 127; (2) the large amount of free text 102 may include a number of articles, publications, Internet websites, or other forms of text associated with one or more topics; (3) the one or more topics may include one or more named entities, or may be related to one or more named entities; (4) the document anchors 103 may include metadata or information related to a particular location in a document of the free text 102, and a short description of information located near or in the particular location of the document; e.g., a document anchor may refer a reader to a particular chapter in an article; (5) document anchors may also automatically advance a viewing pane in a web browser to a location in a web article; (6) document anchors may include "data anchors" if referring to data associated with other types of data, rather than particular documents; (7) document anchors and data anchors may be used interchangeably under some circumstances; (8) other forms of anchors, including document anchors, data anchors, glossaries, outlines, tables of contents, and other suitable anchors, are also applicable to the technologies described herein; (9) the training data 101 may be accessed by a machine learning system 120; (10) a number of pseudo-labeled observations 104 may be taken from the training data 101 by a pre-processing component 121; (11) using the pseudo-labeled observations 104, the pre-processing component 121 may generate one or more mapping tables 122, a number of concurrence graphs 123, and a tokenized text sequence 124; (12) upon pre-processing at least a portion of the training data 101 to create the mapping tables 122, concurrence graphs 123, and tokenized text sequence 124, a training component 125 may train 
embeddings of entities and words for development of training data; (13) the training component 125 may also generate a number of feature vectors 126 in continuous vector space, wherein the feature vectors 126 may be used to train the disambiguation model 127 in vector space as well; and (14) upon training the disambiguation model 127, a run-time prediction component 128 may utilize the disambiguation model 127 to identify named entities in a corpus of data. CHEN also discloses in ¶¶ [0030]-[0037] with FIGS. 1-2 that (1) the method 200 for pre-processing training data may begin pre-processing at block 201, and cease pre-processing at block 214; (2) the pre-processing component 121 may prepare the training data 101 for machine learning at block 202; (3) upon preparation of the training data 101 based on the pseudo-labeled observations 104, the pre-processing component generates the one or more mapping tables 122, at block 204; (4) the mapping table or tables 122 include tables configured to train a model to associate a correct candidate or an incorrect candidate; (5) therefore, the mapping table or tables 122 may be used to train the disambiguation model 127 with both positive and negative examples for any particular phrase mentioning a candidate entity; (6) the pre-processing component 121 also generates an entity-word concurrence graph from the document anchors 103 and text surrounding the document anchors 103, at block 206, an entity-entity concurrence graph from titles of articles as well as the document anchors 103, at block 208, and an entity-word concurrence graph from titles of articles and words contained in the articles, at block 210; e.g., a concurrence graph may also be termed a share topic graph; (7) a concurrence graph may be representative of a co-occurrence relationship between named entities; (8) upon generating the concurrence graphs, the pre-processing component 121 may generate a tokenized text sequence 124, at block 212; (9) the tokenized text 
sequence 124 may be a clean sequence that represents text, or portions of text, from the free text 102 as sequences of normalized tokens; (10) upon completing any or all of the pre-processing sequences with reference to blocks 201-212, the method 200 may cease at block 214; and (11) as shown in FIGS. 1 and 3, the training component 125 may receive the mapping table 122, concurrence graphs 123, and the tokenized text sequence 124 as input. CHEN further teaches in ¶¶ [0038]-[0043] with FIG. 3 that (1) the method 300 for training embeddings of entities and words may begin at block 301; (2) the training component 125 may initially define a probabilistic model for concurrences at block 302; (3) the probabilistic model may be based on each concurrence graph 123 based on vector representations of named entities and words; (4) word and entity representations are learned to discriminate the surrounding word (or entity) within a short text sequence; (5) the connections between words and entities are created by replacing all document anchors with their referent entities; e.g., a vector of ωv is trained to perform well at predicting the vector of each surrounding term ωu from a sliding window; (6) the vector in the corpus-vocabulary V is trained to predict the vectors in the context-vocabulary U; (7) the collection of word (or entity) and context pairs extracted from the phrases may be denoted as D; (8) as an example of a probabilistic model appropriate in this context, a corpus-context pair (ν, μ) ∈ D (ν ∈ V, μ ∈ U) may be considered; (9) the training component may model the conditional probability ρ(μ|ν) using a softmax function defined by Equation 1; (10) upon defining the probabilistic model, the training component 125 may also define an objective function for the concurrences, at block 304; (11) generally, the objective function may be defined as the likelihood of generating 
concurrences; e.g., the objective function based on Equation 1 may be defined as set forth in Equation 2; (12) in Equation 2, σ(x) = 1/(1 + exp(−x)) and c is the number of negative examples to be discriminated for each positive example; (13) given the objective function, the training component 125 may encourage a gap between concurrences that appeared in the training data and candidate occurrences that have not appeared, at block 306; (14) the training component 125 may further optimize the objective function at block 308, and the method 300 may cease at block 310; and (15) by training embeddings of entities and words in creation of a probabilistic model and an objective function, features may be generated to train the disambiguation model 127 to better identify named entities. CHEN also teaches in ¶¶ [0044]-[0052] with FIGS. 1 and 4 that (1) the method 400 for generating feature vectors 126 in vector space and training the disambiguation model 127 in vector space begins training in vector space at block 401; (2) generally, the training component 125 defines templates to generate features, at block 402, wherein the templates may be defined as templates for automatically generating features; (3) at least two templates are defined: (a) the first template may be based on a local context score, wherein the local context score template is a template to automatically generate features for neighboring or "neighborhood" words; and (b) the second template may be based on a topical coherence score, wherein the topical coherence score template is a template to automatically generate features based on an average semantic relatedness, or the assumption that unambiguous named entities may be helpful in identifying mentions of named entities in a more ambiguous context; (4) utilizing the generated templates, the training component 125 computes a score for each template, at block 404; (5) the score computed is based on each underlying assumption for the associated template; e.g., the 
local context template may have a score computed based on local contexts of mentions of a named entity; (6) an example equation to compute the local context score may be implemented as Equation 3; (7) in Equation 3, Γ(mi) denotes the candidate entity set of mention mi; (8) additionally, multiple local context scores may be computed by changing the context window size |T|; (9) with regard to a topical coherence template, a document-level disambiguation context C may be computed based on Equation 4; (10) in Equation 4, d is an analyzed document and D(d) = {ê1, ê2, …, êm} is the set of unambiguous entities identified in document d; (11) after computing scores for each template, the training component 125 generates features from the templates, based on the computed scores, at block 406; (12) generating the features may include, e.g., generating individual features for constructing one or more feature vectors based on a number of disambiguation decisions; (13) a function for the disambiguation decisions is defined by Equation 5; (14) in Equation 5, F = ∪_{j=1} f_j denotes the feature vector, while the basic features are local context scores cs(mi, ei, T) and topical coherence scores tc(mi, ei); (15) furthermore, additional features can also be combined utilizing Equation 5; (16) but generally, the training component is configured to optimize the parameters β, such that the correct entity has a higher score over irrelevant entities; (17) during optimization of the parameters β, the training component 125 defines the disambiguation model 127 and trains the disambiguation model 127 based on the feature vectors 126, at block 408; (18) the method 400 ceases at block 410; and (19) the disambiguation model 127 may be used to more accurately predict the occurrence of a particular named entity. CHEN further discloses in ¶¶ [0053]-[0058] with FIGS. 
1 and 5 that (1) run-time prediction begins at block 501, and may be performed by run-time prediction component 128, or may be performed by another portion of the system 100; (2) initially, run-time prediction component 128 receives a search request identifying one or more named entities, at block 502; (3) upon receipt of the search request, the run-time prediction component 128 may identify candidate entries of web articles or other sources of information, at block 504; (4) thereafter, the run-time prediction component 128 may retrieve feature vectors 126 of words and/or named entities, at block 506; (5) upon retrieval, the run-time prediction component 128 may compute features based on the retrieved vectors of words and named entities contained in the request, at block 508; (6) thereafter, the run-time prediction component 128 applies the disambiguation model to the computed features, at block 510; (7) upon application of the disambiguation model, the run-time prediction component 128 may rank the candidate entries based on the output of the disambiguation model, at block 512; (8) the ranking may include ranking the candidate entries based on a set of probabilities that any one candidate entry is more likely to reference the named entity than other candidate entries; (9) upon ranking, the run-time prediction component 128 may output the ranked entries at block 514; and (10) the method 500 may continually iterate as new requests are received, or alternatively, may cease after outputting the ranked entries. Trabelsi et al. ("Semantic Labeling Using a Deep Contextualized Language Model", ARXIV ID: 2010.16037, Oct. 29, 2020, pp. 
1-11) discloses in ABSTRACT of Page 1 that (1) generating schema labels automatically for column values of data tables has many data science applications such as schema matching, and data discovery and linking; e.g., automatically extracted tables with missing headers can be filled by the predicted schema labels, which significantly minimizes human effort; (2) furthermore, the predicted labels can reduce the impact of inconsistent names across multiple data tables; (3) understanding the connection between column values and contextual information is an important yet neglected aspect, as previously proposed methods treat each column independently; (4) propose a context-aware semantic labeling method using both the column values and context; (5) the proposed method is based on a new setting for semantic labeling, in which labels are sequentially predicted for an input table with missing headers; and (6) incorporate both the values and context of each data column using the pre-trained contextualized language model BERT, which has achieved significant improvements in multiple natural language processing tasks. Trabelsi further discloses in Section 1 with FIG. 
1 of Pages 1-2 that (1) given an unseen data table, the objective is to generate a schema label for each column from a set of labels; (2) schema labels of datasets are used in multiple tasks such as data discovery, schema matching, and data preparation and analysis; (3) existing methods generate schema labels solely on the basis of their content or data values, and thus ignore the contextual information of each column when predicting schema labels; e.g., both columns with labels nationality and location can contain data values from the class country, but the context of these two columns within the data table, such as other columns in the data table, has the potential to resolve the ambiguity when inferring the label; (4) many prior methods decouple the feature extraction and model building steps and require significant human effort to validate both phases; (5) propose a new context-aware semantic labeling method that incorporates both data values and the column’s context in order to infer the label; (6) present a new setting for generating schema labels in which the input is a table with missing schema labels or headers, instead of the traditional setting that treats each column separately; (7) the overview of the framework used in the method is described in Figure 1, wherein, given a previously-unseen table with missing headers, schema labels are sequentially predicted, and the already-predicted labels are incorporated as context for the next header prediction within the same table; (8) integrate BERT into the proposed method, denoted SeLaB (Semantic Labeling with BERT), to solve the schema generation task; (9) train a single BERT model that makes an initial prediction for the column’s label using only data values, and then updates its prediction by incorporating both data values and predicted contexts of the column; (10) SeLaB is trained end-to-end for feature extraction and model building, which reduces the significant human effort that is needed in prior methods, and gives the model the ability to capture specific features that are better than the hand-crafted ones for semantic labeling; (11) in addition, by incorporating the context, the method can predict labels from a richer and more fine-grained vocabulary, unlike the limited classes that are used to describe semantic labels; and (12) SeLaB does not assume that the column values match an existing KB, and therefore SeLaB generalizes to table collections from multiple domains. Trabelsi also discloses in Section 2 of Pages 2-3 that (1) in order to infer the semantic type of a column using data values, the authors define four categories of features, which are: global statistics, character distributions, pretrained word embeddings, and trained paragraph embeddings; (2) each feature category has a different performance and noise level, so the authors propose a multi-input neural network model, instead of simply concatenating all features and feeding the resulting feature vector to a single-input neural network; (3) the multi-input neural network model is composed of multiple identical subnetworks without weight sharing; (4) each subnetwork consists of two fully connected hidden layers with batch normalization, rectified linear unit (ReLU) activation functions, and dropout; (5) semantic types use a limited set of vocabulary, which can restrict the number of categories that can be considered when inferring the label of a given column; (6) generating schema labels is more challenging because the number of possible labels is large compared to the predefined set of semantic types; (7) schema matching is related to semantic type detection, where the objective is to find correspondence between attributes in different schemas; (8) the data values provide additional information that can disambiguate between the possible candidates; (9) matching functions are used to infer the correct semantic labels for data values; (10) after extracting similarity metric features from a pair of attributes, 
each feature vector is given a True/False label, where True means that the attributes have the same semantic type, and False indicates that the attributes do not share the same semantic type; (11) the matching score is estimated using the distance between the embeddings of two sets of data values; (12) the score is adjusted using the output of another neural network to distinguish two columns that are different but whose data values are identically distributed; (13) for textual data, the feature vector is a weighted bag of words with TF-IDF; (14) for numerical data, use statistical hypothesis testing to analyze the distribution of numerical data values that corresponds to a given semantic label; (15) the proposed method is based on the multiclass classification setting because the schema labels are easily collected from a data table corpus, unlike the matching-based strategy that requires additional human effort to define pairs of attributes that have a similar semantic type; (16) BERT is trained on unlabeled data over two pre-training tasks, which are the masked language model and next sentence prediction; (17) then, BERT can be used for downstream tasks on single text or text pairs using special tokens ([SEP] and [CLS]) that are added into the input; and (18) the sentence pair classification setting is used to solve multiple tasks in information retrieval including document retrieval, frequently asked question retrieval, passage reranking, and table retrieval. Trabelsi further teaches in Section 3 of Pages 3-4 with FIG. 
1 of Page 2 that (1) the goal is to generate schema labels or semantic types for table columns using data values and predicted contexts in order to resolve the ambiguity problem in the prediction phase; (2) use the multiclass classification setting to solve schema labeling; (3) the training data consists of a table corpus T = {𝑇1, 𝑇2, …, 𝑇𝑛}, where 𝑛 is the total number of data tables; (4) each table 𝑇𝑘 has a set of 𝑚 columns 𝐴1, 𝐴2, …, 𝐴𝑚, where each column 𝐴𝑖 has a schema label 𝑙𝑖 (column’s header) and a set of data values 𝑉𝑖 = {𝑣1, 𝑣2, …, 𝑣𝑟}, where 𝑟 is the number of rows in 𝑇𝑘; (5) the set of all possible schema labels is denoted by 𝐿; (6) resolving ambiguity when predicting schema labels requires the whole table as input to the model, instead of only using an independent column’s values; (7) therefore, the setting consists of table inputs that have missing headers, and the objective is to predict schema labels for all columns of the input table; (8) denote the proposed model by 𝑀 = 𝑁 ◦ 𝐹, where 𝐹 is the feature extractor function (Contextual input block in Figure 1), and 𝑁 is the classification layer (Model block in Figure 1); (9) the input to 𝑀 is a table 𝑇𝑘 with missing schema labels, and the output of the model is a sequence of predicted schema labels Â1, Â2, …, Â𝑚; and (10) the method learns both features and model simultaneously, leading to a significant reduction in the human effort spent in the feature engineering phase. 
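The setting summarized above (a table as a list of labeled columns, and a model 𝑀 = 𝑁 ◦ 𝐹 composing a feature extractor with a classification layer) can be illustrated with a minimal sketch. Everything here is a toy stand-in: Trabelsi uses BERT for 𝐹 and a softmax layer for 𝑁, while this sketch substitutes a bag-of-tokens extractor and a keyword-overlap scorer purely to show the composition.

```python
# Toy sketch of the Section 3 setting: a table T_k is a list of
# (header, values) columns; M = N ∘ F maps a column's values to a label.
# F, N, and all names/values below are invented for illustration only.

L = ["nationality", "location"]          # set of possible schema labels

table = [                                # T_k with m = 2 columns
    ("nationality", ["french", "german"]),
    ("location", ["paris", "berlin"]),
]

def F(values):
    # toy feature extractor: a bag of tokens (stand-in for a BERT embedding)
    return set(values)

def N(features):
    # toy classification layer: score each label by keyword overlap
    keywords = {"nationality": {"french", "german"},
                "location": {"paris", "berlin"}}
    return {label: len(features & kw) for label, kw in keywords.items()}

def M(values):
    # M = N ∘ F: return the highest-scoring schema label for a column
    scores = N(F(values))
    return max(scores, key=scores.get)
```

With these stand-ins, `M(["paris", "berlin"])` resolves to `"location"`; the real system replaces both functions with a single fine-tuned BERT model.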
Trabelsi also teaches in Section 4 of Pages 4-6 that (1) propose incorporating predicted context instead of the ground truth context; i.e., the model has two passes for predicting schema labels; (2) during the first pass, given a table 𝑇𝑘 with missing headers, only data values are used to make initial predictions for semantic labels, denoted by 𝐴'1, 𝐴'2, …, 𝐴'𝑚; (3) the initial predictions are context-free, as they only capture data values; (4) for the second pass, incorporate both data values 𝑉𝑖, and the predicted context 𝐴'1, 𝐴'2, …, 𝐴'𝑖-1, 𝐴'𝑖+1, …, 𝐴'𝑚 of 𝐴𝑖 to make the final context-aware prediction, denoted by Â𝑖; (5) incorporate data values and predicted contexts of a given attribute using the contextualized language model BERT; (6) for the proposed model 𝑀 = 𝑁 ◦ 𝐹, denoted by SeLaB, 𝐹 is equivalent to BERT with parameters 𝜃, as the hidden state of the [CLS] token from the last transformer block is used to compute the embedding of the input sentence, where 𝑁 denotes the softmax layer with parameters 𝑊 that is used to produce the probability distribution of a given sequence over all schema labels from 𝐿; (7) the general form of input to 𝑀 for an attribute 𝐴𝑖, denoted by contextual input, is the sequence [CLS]+𝑉𝑖+[SEP]+𝑐𝑜𝑛𝑡𝑒𝑥𝑡(𝐴𝑖)+[SEP], where 𝑐𝑜𝑛𝑡𝑒𝑥𝑡(𝐴𝑖) is the predicted context of 𝐴𝑖; (8) for first pass prediction, where 𝑐𝑜𝑛𝑡𝑒𝑥𝑡(𝐴𝑖) is missing, the input sequence form, denoted by only values, becomes [CLS]+𝑉𝑖+[SEP]+[SEP]; (9) training phase: (a) the steps of the training phase are shown in Algorithm 1; (b) the inputs to the training phase are: table corpus T = {𝑇1, 𝑇2, …, 𝑇𝑛} where semantic labels 𝑙1, 𝑙2, …, 𝑙𝑚 are available for all attributes 𝐴1, 𝐴2, …, 𝐴𝑚 of a given table 𝑇𝑘 ∈ T, set of possible semantic labels 𝐿, and pre-trained BERT model as a feature extractor 𝐹; (c) the compact notation of table 𝑇𝑘, that is used in the algorithms, is 𝑇𝑘 = [ [𝐴1, 𝑉1], [𝐴2, 𝑉2], …, [𝐴𝑚, 𝑉𝑚] ]; (d) the training process has three phases: (i) the first phase consists of predicting an 
initial label for each column using the only values input form as shown in Lines 4–9 of Algorithm 1, where the output of the first phase is a sequence 𝐴'1, 𝐴'2, …, 𝐴'𝑚 of initial predicted labels; (ii) during the second phase (Lines 10–15), construct the predicted context 𝑐𝑜𝑛𝑡𝑒𝑥𝑡(𝐴𝑖) for each attribute 𝐴𝑖, which is the set of predicted labels {𝐴'𝑗; 𝑗 ∈ [1, 𝑚] \ {𝑖}}, where, in order to avoid true label leakage in 𝑐𝑜𝑛𝑡𝑒𝑥𝑡(𝐴𝑖), remove 𝑙𝑖 from 𝑐𝑜𝑛𝑡𝑒𝑥𝑡(𝐴𝑖) if 𝑙𝑖 ∈ 𝑐𝑜𝑛𝑡𝑒𝑥𝑡(𝐴𝑖), and also remove duplicates from 𝑐𝑜𝑛𝑡𝑒𝑥𝑡(𝐴𝑖) as most data tables contain unique headers; and (iii) the final phase (Lines 16–21) computes the context-aware predictions by using the contextual input form, where the output of 𝑀 is the probability distribution p̂𝑖 over all labels in 𝐿, for every 𝐴𝑖 ∈ 𝑇𝑘; (e) these probability distributions are used to calculate the cross entropy loss, and to update the parameters of 𝑀 as indicated in Lines 22–23; (f) in addition to incorporating the context of a column for schema labeling, the model has the ability to accept two forms of sequence inputs (only values and contextual input), which significantly reduces the number of parameters compared to the case where a separate model is needed to handle each type of input sequence; (g) the BERT-based feature extractor 𝐹 is able to process string and numerical texts by taking advantage of BERT tokenizers; (h) the model 𝑀 = 𝑁 ◦ 𝐹 is trained end-to-end to jointly optimize the feature extractor 𝐹 and the classification layer 𝑁; and (i) SeLaB needs only BERT embeddings that are fine-tuned on the target table corpus to extract the features of each column, and therefore generalizes to data tables from multiple domains; (10) testing phase: (a) the steps of the testing phase are shown in Algorithm 2; (b) the inputs to the testing phase are: a testing table 𝑇𝑘 that has missing headers (𝑙1, 𝑙2, …, 𝑙𝑚 are not available), set of possible semantic labels 𝐿, trained model 𝑀, and two parameters 𝑢𝑛𝑖𝑞𝑢𝑒_ℎ𝑒𝑎𝑑𝑒𝑟𝑠 and 𝑡𝑜𝑝𝑘; (c) the testing 
process has three phases: (i) the first and second phases (Lines 2–3) are similar to the training process, where initial predictions are computed using the only values input form, and then used to produce the context of each attribute; and (ii) during the third phase, the final predicted labels for the testing data table are generated sequentially as shown in Lines 4–22; (d) for a given table 𝑇𝑘, initially all schema labels are missing, and the set of predicted attributes, denoted by 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑_𝑎𝑡𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑠, is empty; (e) given that the prediction is done sequentially, 𝑚 passes are needed to obtain a predicted schema label for each column in 𝑇𝑘; (f) for the 𝑗-th pass, 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑_𝑎𝑡𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑠 has 𝑗−1 labeled headers, and 𝑚−𝑗+1 columns in 𝑇𝑘, denoted by 𝑆𝑗, are still missing the predicted labels; (g) predict the probability distribution p̂𝑖, and a schema label Â𝑖, for each column 𝐴𝑖 ∈ 𝑆𝑗 using the model 𝑀 with the contextual input sequence; (h) the confidence of prediction for 𝐴𝑖 ∈ 𝑆𝑗 is given by 𝑝𝑖𝑚𝑎𝑥 = max𝑙∈𝐿 p̂𝑖[𝑙]; (i) 𝑢𝑛𝑖𝑞𝑢𝑒_ℎ𝑒𝑎𝑑𝑒𝑟𝑠 is a Boolean variable set to True to force the unique headers constraint for a given table; (j) when predicting duplicate headers is allowed, the column ℎ chosen to predict from 𝑆𝑗 in the 𝑗-th pass is given by ℎ = argmax𝑠∈[1,𝑚−𝑗+1] 𝑝𝑠𝑚𝑎𝑥 as shown in Lines 15–16; (k) on the other hand, when the unique headers constraint is required for a given data table, propose a routine, called 𝑈𝑛𝑖𝑞𝑢𝑒𝐻𝑒𝑎𝑑𝑒𝑟𝑠, that resolves the duplicate headers problem as shown in Algorithm 3, where the inputs to this routine are: the probability distributions p̂𝑖 for 𝐴𝑖 ∈ 𝑆𝑗, 𝑡𝑜𝑝𝑘, which denotes the number of top confidences per attribute that are used to find the label, and the set 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑_𝑎𝑡𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑠 that contains the 𝑗−1 semantic labels that are already assigned to 𝑗−1 columns of 𝑇𝑘; (l) the objective of the function 𝑈𝑛𝑖𝑞𝑢𝑒𝐻𝑒𝑎𝑑𝑒𝑟𝑠 is to find the label 𝑐ℎ𝑜𝑠𝑒𝑛_𝑙𝑎𝑏𝑒𝑙 with the highest confidence value, with respect to the unique headers constraint that requires 
𝑐ℎ𝑜𝑠𝑒𝑛_𝑙𝑎𝑏𝑒𝑙 ∉ 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑_𝑎𝑡𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑠; (m) for time complexity efficiency, limit the depth of search by choosing 𝑡𝑜𝑝𝑘 << |𝐿|; (n) by limiting the depth of search, 𝑈𝑛𝑖𝑞𝑢𝑒𝐻𝑒𝑎𝑑𝑒𝑟𝑠 can produce a duplicate header; (o) in this case, use a heuristic that returns the label that corresponds to the maximum confidence score; (p) 𝑈𝑛𝑖𝑞𝑢𝑒𝐻𝑒𝑎𝑑𝑒𝑟𝑠 is called in Lines 12–14 of Algorithm 2; (q) remove the chosen column ℎ from 𝑆𝑗 to obtain 𝑆𝑗+1 (the columns of 𝑇𝑘 that are still missing labels after the 𝑗-th pass), and add the chosen column ℎ to 𝑠𝑒𝑒𝑛_𝑐𝑜𝑙𝑢𝑚𝑛𝑠 set, and the predicted label Âℎ to 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑_𝑎𝑡𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑠 set (Lines 18–20); (r) finish the 𝑗-th pass by updating 𝑐𝑜𝑛𝑡𝑒𝑥𝑡(𝐴𝑖), where 𝐴𝑖 ∈ 𝑆𝑗+1, using the predicted label Âℎ from the 𝑗-th pass as shown in Line 21; (s) the objective of the context update step is to replace the only values predicted label by the contextual input inferred schema label, as the latter is more accurate than the former; (t) for the 𝑗-th pass, select the best schema label from 𝑚−𝑗+1 predicted labels; (u) the increase in the number of predictions is justified by the sequential nature of the testing algorithm where context is updated in each pass, and the most confident prediction is selected. Habti et al. (US 2019/0279101 A1, pub. date: 09/12/2019) discloses in ¶¶ [0026]-[0045] with FIG. 
1 that (1) the unstructured content (from various input sources and in different formats and/or languages) is provided to content ingestion pipeline 115 for processing (e.g., language detection, content extraction, content analyzing, tagging, etc.); (2) content ingestion pipeline 115 can decompose inputs of various content types from various content sources into respective source-specific metadata tables, map them to internal ingestion pipeline document 119, populate internal ingestion pipeline document 119 with inferred/derived/determined metadata utilizing uniform mapping schema 117 (with or without custom extensions, depending upon use case), and persist them in central repository 150 through metadata tables 154 that conform to single common data model 152 of central repository 150; (3) a numerical analysis on the input data can be performed by numerical content analyzer or ingestion pipeline 122 within advanced analytics system 120; (4) the output data from the numerical analysis (e.g., analyzed numerical information 124) can be mapped to the persistence layer (which, for instance, can include metadata table(s) 154 residing in central repository 150) utilizing the same or similar mapping scheme for the textual content; (5) a uniform mapping schema and a single common data model for mapping both unstructured text and structured data, wherein the uniform mapping schema is utilized by the advanced content ingestion pipeline to create internal ingestion pipeline documents which are then mapped to metadata tables using the single common data model; (6) because disparate contents can now be processed and persisted in a unified manner, this allows users of the advanced analytics system to build and train data models for predictive analytics using ML, with unstructured text and structured data as input data; (7) the advanced content ingestion pipeline can be configured with a variety of crawlers for crawling a variety of types of content from a variety of data sources; 
(8) the unified mapping schema and the single common data model together define how disparate contents would be mapped (e.g., dynamically mapped as they are crawled) and persisted in metadata tables and how they are related to each other; (9) through the ML model development environment, the data scientists may augment and/or incorporate other features into the metadata tables, potentially mutating and/or modifying the metadata tables depending on the kind of analysis and/or modeling that they are building; and (10) in this way, the platform flexibly enables the data scientists to use the platform the way they want to. Habti further discloses in ¶¶ [0055]-[0070] with FIGS. 2-9 that (1) although various crawlers can be used to extract various types of editorial metadata from disparate contents, these crawlers do not perform natural language processing (NLP) or have the necessary intelligence to understand or comprehend the meanings of words in the disparate contents; (2) FIG. 5 depicts a diagrammatic representation of a process flow for determining/inferring semantic metadata from disparate contents 501 (e.g., social media data feeds, Web content, enterprise content, etc.) 
and persisting the semantic metadata in respective metadata tables 550 in central repository 580; (3) disparate contents 501 are fed or otherwise provided to intelligent ingestion pipeline engine or text mining engine 510, wherein engine 510 is configured with basic and sophisticated NLP capabilities; (4) basic NLP capabilities of engine 510 can include language detection, tokenization and parsing, lemmatization/stemming, part-of-speech tagging, and identification of semantic relationships; (5) sophisticated NLP capabilities of engine 510 can include text mining functions such as concept extraction, categorization (also referred to as topic or classification), sentiment analysis, summarization, entity extraction, etc.; (6) for concept extraction, engine 510 is operable to extract key concepts, including complex concepts; e.g., concepts can be identified with an algorithm based on linguistic and statistical patterns (e.g., keywords and key phrases); (7) the extracted concepts can be weighted and ranked such that they are outputted with relevancy ranking; (8) for categorization/topic/classification, engine 510 is operable to programmatically examine the input text and determine, according to a controlled vocabulary (a taxonomy, i.e., a scheme of classification), a best topic for the document and attach the topic to the document; (9) the discovery feature provides a range of data engineering and enrichment methods that enable users to aggregate and decode data, build expressions to create calculated fields, create numeric and quantile ranges, build parametric columns consisting of query-based values, and rank records; (10) for sentiment analysis, engine 510 is operable to programmatically examine a piece of content (e.g., a post, a document, a tweet, an article, a message, etc.) 
in an even more fine-grained manner; e.g., for a given sentence in a document that describes a company releasing a new product, engine 510 is operable to analyze the sentence and determine whether the sentiment for the totality of the sentence is positive, negative, or neutral; (11) since engine 510 also extracts the named entities (e.g., company name, product name, etc.), the sentiment or tonality detected in a sentence by engine 510 can be associated with an entity or entities (e.g., the company and/or the product) in the sentence; (12) to perform summarization, engine 510 is operable to identify the most relevant sentences in a piece of content using, for instance, an output from the categorization, and generate a summary with the identified sentences; (13) for entity extraction, engine 510 is operable to extract named entities based on linguistic rules and statistical patterns; (14) all occurrences of an entity type can also be extracted as sub entities; (15) outputs from these text mining functions (e.g., language, concepts, categories/topics/classifications, document-level sentiments, sentence-level sentiments, summaries, named entities, sub entities, etc.) can be captured in internal ingestion pipeline document 530, where this capturing process is performed utilizing uniform mapping schema 515; (16) all internal ingestion pipeline documents conform to uniform mapping schema 515 which defines a set of master metadata; (17) depending upon use case, the set of master metadata can be extended to include custom metadata; (18) internal ingestion pipeline document 530 is not persisted in central repository 580, and so instead, metadata captured in internal ingestion pipeline document 530 (e.g., language, concepts, categories/topics/classifications, document-level sentiments, sentence-level sentiments, summaries, named entities, sub entities, etc.) 
can be mapped to metadata tables 550 using common data model 535 (regardless of the disparate nature of source contents); (19) common data model 600 defines how metadata captured in internal ingestion pipeline document 530 should be mapped to various metadata tables, all of which are associated with document table 210; (20) as illustrated in FIGS. 7-9, all of the metadata tables, including source-specific editorial metadata tables (e.g., social media metadata tables shown in FIG. 2, Web metadata table shown in FIG. 3, enterprise content metadata tables shown in FIG. 4) and semantic metadata tables conforming to the single common data model (e.g., metadata tables shown in FIGS. 6A-6B), are keyed to or otherwise associated with document table 210; and (21) this unique mapping schema allows disparate metadata tables to be associated with and through the same document. Luo et al. ("Implementing a Portable Clinical NLP System with a Common Data Model — a Lisp Perspective", 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Dec. 3-6, 2018, pp. 
461-466) discloses in ABSTRACT of Page 461 that (1) the paper presents a Lisp architecture for a portable NLP system, termed LAPNLP, for processing clinical notes; (2) LAPNLP integrates multiple standard, customized, and in-house developed NLP tools; (3) facilitate portability across different institutions and data systems by incorporating an enriched Common Data Model (CDM) to standardize necessary data elements; (4) utilize UMLS to perform domain adaptation when integrating generic domain NLP tools; (5) feature stand-off annotations that are specified by positional reference to the original document; (6) built an interval tree based search engine to efficiently query and retrieve the stand-off annotations by specifying positional requirements; (7) develop a utility to convert an inline annotation format to stand-off annotations to enable the reuse of clinical text datasets with inline annotations; (8) experiment with the system on several NLP facilitated tasks including computational phenotyping for lymphoma patients and semantic relation extraction for clinical notes; and (9) these experiments showcased the broader applicability and utility of LAPNLP. 
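The stand-off annotation querying attributed to LAPNLP above (retrieving annotations by positional reference) can be sketched minimally. LAPNLP uses an interval tree for efficiency; this illustration substitutes a linear scan, and the `Annotation` type, field names, and sample data are invented for the example.

```python
# Minimal sketch of querying stand-off annotations by position. A stand-off
# annotation carries character offsets into the original document rather than
# markup inline with the text. LAPNLP's interval-tree index answers the same
# overlap query faster; a linear scan suffices to show the semantics.
from dataclasses import dataclass

@dataclass
class Annotation:
    start: int    # character offset where the annotated span begins
    end: int      # character offset where the span ends (exclusive)
    label: str    # annotation type, e.g. an entity or relation tag

def annotations_in_span(annotations, lo, hi):
    """Return all annotations whose span overlaps the interval [lo, hi)."""
    return [a for a in annotations if a.start < hi and a.end > lo]
```

For example, with annotations at offsets 0-5, 10-20, and 18-25, a query over [15, 19) returns the latter two, since both overlap that window while the first ends before it.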
However, the closest prior art of record, as discussed above, singly or in combination does not teach or suggest at least the following features: "generate a matrix representation of a common data model, wherein the common data model comprises (i) a plurality of rows associated with a plurality of data tables, and (ii) a plurality of columns associated with a plurality of data fields; determine one or more logical data type weights for respective one or more data table-data field pairs associated with the matrix representation; generate one or more disambiguation embeddings based on the matrix representation and the one or more logical data type weights; generate a plurality of input embedding vectors for one or more prediction inputs based on the one or more disambiguation embeddings, wherein the one or more prediction inputs comprise one or more query sets, and wherein one query set of the one or more query sets comprises (i) a plurality of candidate data tables selected from the plurality of data tables, and (ii) a select one of the plurality of data fields; generate, using a disambiguation machine learning model, a plurality of prediction vectors based on the plurality of input embedding vectors, wherein (i) one of the plurality of prediction vectors comprises a plurality of probability scores associated with a select data field matching to respective ones of the plurality of data tables, and (ii) one of the plurality of probability scores is associated with one of the plurality of candidate data tables; assign one or more select data fields associated with the one or more query sets to respective one or more candidate data tables based on the plurality of prediction vectors; and initiate the performance of one or more prediction-based actions based on the assignment of the one or more select data fields" when combined with all other limitations of the claims as a whole. 
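The recited matrix representation (rows associated with data tables, columns with data fields, and a logical data type weight per table-field pair) can be illustrated with a minimal sketch. All table names, field names, and weight values below are invented for illustration and do not come from the application or the cited art.

```python
# Hypothetical sketch of the claimed matrix representation of a common data
# model: one row per data table, one column per data field, where a cell holds
# a logical-data-type weight when that table-field pair is associated.

tables = ["person", "visit"]                        # plurality of data tables
fields = ["person_id", "birth_date", "visit_date"]  # plurality of data fields

# logical data type weights for associated table-field pairs (illustrative
# values; e.g., date-typed fields weighted higher than identifier fields)
pair_weights = {
    ("person", "person_id"): 1.0,
    ("person", "birth_date"): 2.0,
    ("visit", "person_id"): 1.0,
    ("visit", "visit_date"): 2.0,
}

# rows = tables, columns = fields, as recited in the claim; unassociated
# pairs get a zero entry
matrix = [[pair_weights.get((t, f), 0.0) for f in fields] for t in tables]
```

In this toy example the matrix is 2 x 3, and a downstream embedding step (as claimed) would consume the weighted matrix rather than a bare 0/1 membership matrix.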
Conclusion Any inquiry concerning this communication or earlier communications from the examiner should be directed to HWEI-MIN LU whose telephone number is (313)446-4913. The examiner can normally be reached Mon - Fri: 9:00 AM - 6:00 PM EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela D. Reyes can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /HWEI-MIN LU/Primary Examiner, Art Unit 2142

Prosecution Timeline

May 23, 2023
Application Filed
Feb 20, 2026
Non-Final Rejection — §101, §112
Mar 27, 2026
Interview Requested
Apr 08, 2026
Applicant Interview (Telephonic)
Apr 08, 2026
Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602578
LIGHT SOURCE COLOR COORDINATE ESTIMATION SYSTEM AND DEEP LEARNING METHOD THEREOF
2y 5m to grant Granted Apr 14, 2026
Patent 12596954
MACHINE LEARNING FOR MANAGEMENT OF POSITIONING TECHNIQUES AND RADIO FREQUENCY USAGE
2y 5m to grant Granted Apr 07, 2026
Patent 12591770
PREDICTING A STATE OF A COMPUTER-CONTROLLED ENTITY
2y 5m to grant Granted Mar 31, 2026
Patent 12579466
DYNAMIC USER-INTERFACE COMPARISON BETWEEN MACHINE LEARNING OUTPUT AND TRAINING DATA
2y 5m to grant Granted Mar 17, 2026
Patent 12561222
REDUCING BIAS IN MACHINE LEARNING MODELS UTILIZING A FAIRNESS DEVIATION CONSTRAINT AND DECISION MATRIX
2y 5m to grant Granted Feb 24, 2026
Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
62%
Grant Probability
99%
With Interview (+39.5%)
3y 1m
Median Time to Grant
Low
PTA Risk
Based on 217 resolved cases by this examiner. Grant probability derived from career allow rate.
