Prosecution Insights
Last updated: April 19, 2026
Application No. 18/103,559

DEEP LEARNING ENTITY MATCHING SYSTEM USING WEAK SUPERVISION

Non-Final OA — §101, §103
Filed
Jan 31, 2023
Examiner
CHUANG, SU-TING
Art Unit
2146
Tech Center
2100 — Computer Architecture & Software
Assignee
Walmart Apollo, LLC
OA Round
1 (Non-Final)
52%
Grant Probability
Moderate
1-2
OA Rounds
4y 5m
To Grant
91%
With Interview

Examiner Intelligence

Grants 52% of resolved cases
52%
Career Allow Rate
52 granted / 101 resolved
-3.5% vs TC avg
Strong +40% interview lift
+39.7%
Interview Lift
resolved cases with interview
Typical timeline
4y 5m
Avg Prosecution
28 currently pending
Career history
129
Total Applications
across all art units

Statute-Specific Performance

§101  27.4%  (-12.6% vs TC avg)
§103  46.3%  (+6.3% vs TC avg)
§102  10.8%  (-29.2% vs TC avg)
§112  11.7%  (-28.3% vs TC avg)
Tech Center averages are estimates • Based on career data from 101 resolved cases

Office Action

§101 §103
DETAILED ACTION

Claims 1-20 are pending and have been examined.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 06/05/2023 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-6 and 11-16 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1: Claims 1-6 recite a system comprising processors and non-transitory computer-readable media. Claims 11-16 recite a method. Therefore, claims 1-6 are directed to a machine, and claims 11-16 are directed to a process.

With respect to claims 1 and 11:

2A Prong 1: The claim recites a judicial exception.
generating pairs of identities from a plurality of sources; (mental process – evaluation or judgment)
for each respective pair of identities of the pairs of identities: determining a match probability for the respective pair of identities… (mental process – evaluation or judgment)
linking the respective pair of identities as nodes on a graph when the match probability meets a predetermined threshold, wherein a linkage between the nodes represents a match for the respective pair of identities; (mental process – evaluation or judgment)
generating… clusters each containing identities representing a respective user; and (mental process – evaluation or judgment)
generating a respective user profile for the respective user for each cluster (mental process – evaluation or judgment)

2A Prong 2: The judicial exception is not integrated into a practical application.
(claim 11) execution of computing instruction configured to run on one or more processors and stored at one or more non-transitory computer-readable media (mere instructions to apply an exception – MPEP 2106.05(f), (2) invoking generic computer components)
using a deep-learning transformer-based binary classification model… using a connected component algorithm… (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception)
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
(claim 11) execution of computing instruction configured to run on one or more processors and stored at one or more non-transitory computer-readable media (mere instructions to apply an exception – MPEP 2106.05(f), (2) invoking generic computer components)
using a deep-learning transformer-based binary classification model… using a connected component algorithm… (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception)
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
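For orientation, the pipeline recited in claims 1 and 11 reduces to: score each identity pair, link the pairs whose match probability meets a threshold, and read the user clusters off the connected components of the resulting graph. A minimal sketch follows; the identity fields, the threshold value, and the match_probability stub are illustrative assumptions standing in for the claimed transformer-based classifier, not Applicant's implementation.

# Sketch of the claims 1/11 pipeline: score identity pairs, link pairs whose
# match probability meets a threshold, then cluster with connected components.
# `match_probability` is a hypothetical stub for the claimed deep-learning model.
from itertools import combinations
import networkx as nx

THRESHOLD = 0.9  # assumed; the claim only recites "a predetermined threshold"

def match_probability(a, b):
    # Stub: a real system would run the transformer-based binary classifier here.
    return 1.0 if a.get("email") and a.get("email") == b.get("email") else 0.0

def build_user_clusters(identities):
    g = nx.Graph()
    g.add_nodes_from(identities)               # identities as nodes on a graph
    for i, j in combinations(identities, 2):   # pairs from a plurality of sources
        if match_probability(identities[i], identities[j]) >= THRESHOLD:
            g.add_edge(i, j)                   # a linkage represents a match
    return list(nx.connected_components(g))    # clusters, one per user

ids = {"a": {"email": "x@y.com"}, "b": {"email": "x@y.com"}, "c": {"email": "z@y.com"}}
clusters = build_user_clusters(ids)
profiles = [{"identities": sorted(c)} for c in clusters]  # one profile per cluster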
With respect to claims 2 and 12:

2A Prong 1: The claim recites a judicial exception.
generating a probabilistic set of labels for an unlabeled training dataset to output a labeled training dataset (mental process – evaluation or judgment)

2A Prong 2: The judicial exception is not integrated into a practical application.
wherein the computing instructions, when executed on the one or more processors, further cause the one or more processors to perform functions comprising (mere instructions to apply an exception – MPEP 2106.05(f), (2) invoking generic computer components)
training the deep-learning transformer-based binary classification model using the labeled training dataset (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception)
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
wherein the computing instructions, when executed on the one or more processors, further cause the one or more processors to perform functions comprising (mere instructions to apply an exception – MPEP 2106.05(f), (2) invoking generic computer components)
training the deep-learning transformer-based binary classification model using the labeled training dataset (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception)
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
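Claims 2 and 12 describe the weak-supervision step: heuristic labeling functions vote on unlabeled pairs, and the votes are combined into probabilistic labels used to train the classifier. A minimal sketch, assuming toy labeling functions and a plain vote average as a crude stand-in for a learned label model such as Snorkel's:

# Sketch of claims 2/12 weak supervision: labeling functions emit noisy votes
# (1 match, 0 non-match, -1 abstain) over unlabeled pairs; averaging the
# non-abstaining votes yields a probabilistic label per pair.
import numpy as np

ABSTAIN, NO_MATCH, MATCH = -1, 0, 1

def lf_same_email(pair):   # heuristic labeling function (cf. spec [0114])
    return MATCH if pair["email_a"] == pair["email_b"] else ABSTAIN

def lf_name_clash(pair):   # another toy heuristic
    return NO_MATCH if pair["name_a"][0] != pair["name_b"][0] else ABSTAIN

def probabilistic_labels(pairs, lfs):
    votes = np.array([[lf(p) for lf in lfs] for p in pairs], dtype=float)
    votes[votes == ABSTAIN] = np.nan
    probs = np.nanmean(votes, axis=1)      # P(match) per pair
    return np.nan_to_num(probs, nan=0.5)   # uninformative prior if all abstain

pairs = [{"email_a": "x@y.com", "email_b": "x@y.com", "name_a": "Ann", "name_b": "Anne"}]
y_prob = probabilistic_labels(pairs, [lf_same_email, lf_name_clash])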
With respect to claims 3 and 13:

2A Prong 2: The judicial exception is not integrated into a practical application.
wherein the probabilistic set of labels is generated using heuristic functions (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception: in light of specification [0114] "heuristic functions can be written to perform labelling functions that can be processed through Snorkel's algorithm")
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
wherein the probabilistic set of labels is generated using heuristic functions (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception: in light of specification [0114] "heuristic functions can be written to perform labelling functions that can be processed through Snorkel's algorithm")
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

With respect to claims 4 and 14:

2A Prong 2: The judicial exception is not integrated into a practical application.
wherein generating the probabilistic set of labels uses a weak supervision model (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception)
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
wherein generating the probabilistic set of labels uses a weak supervision model (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception)
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
With respect to claims 5 and 15:

2A Prong 1: The claim recites a judicial exception.
wherein determining the match probability comprises (mental process – evaluation or judgment)

2A Prong 2: The judicial exception is not integrated into a practical application.
obtaining textual features for each identity of the respective pair of identities, wherein each of the textual features comprises unique string length distributions (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting)
generating a first sub-model based on the textual features (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception)
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
obtaining textual features for each identity of the respective pair of identities, wherein each of the textual features comprises unique string length distributions (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting, and WURC (well-understood, routine, conventional activity): receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 – MPEP 2106.05(d)(II)(i))
generating a first sub-model based on the textual features (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception)
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

With respect to claims 6 and 16:

2A Prong 1: The claim recites a judicial exception.
wherein determining the match probability further comprises (mental process – evaluation or judgment)

2A Prong 2: The judicial exception is not integrated into a practical application.
obtaining boolean features for each identity of the respective pair of identities, wherein the boolean features comprise external metadata and transaction history (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting)
generating a second sub-model based on the boolean features (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception)
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
obtaining boolean features for each identity of the respective pair of identities, wherein the boolean features comprise external metadata and transaction history (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting, and WURC: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 – MPEP 2106.05(d)(II)(i))
generating a second sub-model based on the boolean features (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception)
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
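Claims 5/15 and 6/16 together describe a two-branch scorer: a first sub-model over textual features (e.g., string-length distributions) and a second sub-model over Boolean features (e.g., external-metadata and transaction-history flags). A minimal sketch; all layer sizes are assumptions for illustration, not taken from the application:

# Sketch of the two sub-models of claims 5/15 (textual) and 6/16 (boolean).
import torch
import torch.nn as nn

class TextualSubModel(nn.Module):            # first sub-model (claims 5/15)
    def __init__(self, dim_in=128, dim_out=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, dim_out))
    def forward(self, x):
        return self.net(x)

class BooleanSubModel(nn.Module):            # second sub-model (claims 6/16)
    def __init__(self, n_flags=16, dim_out=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_flags, dim_out), nn.ReLU())
    def forward(self, flags):
        return self.net(flags.float())

text_feats = torch.randn(4, 128)             # e.g., string-length-distribution features
bool_feats = torch.randint(0, 2, (4, 16))    # e.g., metadata / transaction-history flags
u = TextualSubModel()(text_feats)
v = BooleanSubModel()(bool_feats)
combined = torch.cat([u, v], dim=-1)         # later fed to a final layer (cf. claims 9/19)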
Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-2 and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Krivosheev ("Siamese graph neural networks for data integration" 20200117) in view of Nie ("Deep Sequence-to-Sequence Entity Matching for Heterogeneous Entity Resolution" 20191103) in view of Ahn ("Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning" 20221205) in further view of Yan ("Relation-aware Heterogeneous Graph for User Profiling" 20211014).

In regard to claims 1 and 11, Krivosheev teaches:
generating pairs of identities from a plurality of sources; (Krivosheev, p. 4, 4 Proposed Approach "This paper assumes we can obtain a graph representation of the entities of interest. This representation can be extracted in many ways, either by exploiting structures in the database (links between entities, such as foreign keys) or by using an external source. [a plurality of sources]"; p. 5, 4.1 GNNs for Data Integration "If nodes include textual elements, such as names or descriptions, then features can be extracted using word embeddings... Our goal is to correctly match a node q* ∈ Q* to a node r* ∈ R* [pairs of identities] such that eq = er …")
for each respective pair of identities of the pairs of identities: determining a match... for the respective pair of identities using a deep-learning transformer-based... model; and (Krivosheev, p. 5, 4.1 GNNs for Data Integration "If nodes include textual elements, such as names or descriptions, then features can be extracted using word embeddings, like Word2Vec [30] and GloVe [38], or recent contextual language models like BERT [9] [transformer-based model] ... Our goal is to correctly match a node q* ∈ Q* to a node r* ∈ R* [for each respective pair of identities] such that eq = er … The methodology we propose is instead based on learning distributed representations... This can be achieved by applying deep metric learning methods, [deep-learning model] such as Siamese networks [5] or triplet networks [43]... We denote as γr the embedding of the node r* ∈ R*, whereas ΓR represents the set of all the embeddings for every node in R*... for a node q* ∈ Q* we compute an embedding vector γq ... by applying the already trained GNN model... we have three potential scenarios: (i) q* cannot be matched successfully to any node in R*; (ii) q* can be matched to a single node r* ∈ R*; (iii)... q*... = rc* if dist(γq, γr) < t, ⊥ otherwise (2) where dist(γq, γr) is the distance... between vectors γq and γr, rc* ∈ R* is the closest node to q* in the embedding space ΓR, [determining a match for the respective pair of identities] t ∈ R is a threshold on the distance, and ⊥ means no matches were provided according to the given threshold t.")
… linking the respective pair of identities as nodes on a graph when the match… meets a predetermined threshold, wherein a linkage between the nodes represents a match for the respective pair of identities; (Krivosheev, p. 5, 4.1 GNNs for Data Integration "For the first two cases, the rule to link q* to a node in the reference graph [linking the respective pair of identities as nodes on a graph, wherein a linkage between the nodes represents a match] is given by:... q*... = rc* if dist(γq, γr) < t, ⊥ otherwise (2) where dist(γq, γr) is the distance (e.g., Euclidean) between vectors γq and γr, rc* ∈ R* is the closest node to q* in the embedding space ΓR, t ∈ R is a threshold on the distance, [when the match… meets a predetermined threshold] and ⊥ means no matches were provided according to the given threshold t.")

Krivosheev does not teach, but Nie teaches:
… a match probability for the respective pair… the match probability… (Nie, p. 632, 4.1 Seq2Seq Entity Matching Network "Prediction Layer... a two layer fully-connected layer followed by a softmax classifier to get the final similarity score of the entity pair (S, T). [the match probability]"; the output of a softmax layer is a probability)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Krivosheev to incorporate the teachings of Nie by replacing the distance function with the Seq2Seq entity matching model, with heterogeneous data as input and a softmax output layer. Doing so would effectively solve the heterogeneous problem and achieve remarkable performance improvements on entity resolution tasks. (Nie, p. 629, Abstract "In this paper, we propose a deep sequence-to-sequence entity matching model, denoted Seq2SeqMatcher, which can effectively solve the heterogeneous and dirty problems by modeling ER as a token-level sequence-to-sequence matching task... our Seq2Seq entity matching model can achieve remarkable performance improvements on 9 standard entity resolution benchmarks.")

Krivosheev and Nie do not teach, but Ahn teaches:
using a deep-learning transformer-based binary classification model; (Ahn, p. 364, 4.4 Fine-tuner: Model for Code Similarity "Based on a generic BERT model with a large swath of binaries, we define a downstream task (3 in Figure 2); BCSD... Then, our binary classifier [a deep-learning transformer-based binary classification model] learns a weighted distance vector with the following binary cross entropy loss function: L = Ylogp(Xi, Xj) + (1-Y)log(1-p(Xi, Xj)) (6)"; fine-tuning the BERT model)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Krivosheev and Nie to incorporate the teachings of Ahn by including a weighted distance vector with a binary cross entropy as a loss function on top of BERT. Doing so would achieve results that are effective, transferable, practical, and robust for detecting the similarity of unseen pairs. (Ahn, p. 361 "We tackle the problem of detecting code similarity with one-shot learning... we adopt a weighted distance vector with a binary cross entropy as a loss function on top of BERT... our experimental results demonstrate the effectiveness, transferability, and practicality of BinShot, which is robust to detecting the similarity of previously unseen functions.")
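The combination mapped above amounts to a transformer encoder over a serialized identity pair with a two-way classification head, where the match probability is the softmax output for the "match" class. A minimal PyTorch sketch of that shape; the vocabulary, dimensions, and pooling choice are assumptions for illustration:

# Illustrative transformer-based binary pair classifier of the kind the
# rejection maps to Ahn/Nie: encode the serialized pair, pool, softmax over
# {non-match, match}; P(match) is the second softmax output.
import torch
import torch.nn as nn

class PairMatcher(nn.Module):
    def __init__(self, vocab=1000, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)  # multi-head attention
        self.head = nn.Linear(d_model, 2)                      # binary classification

    def forward(self, pair_tokens):                 # (batch, seq) ids for "A [SEP] B"
        h = self.encoder(self.embed(pair_tokens))
        logits = self.head(h.mean(dim=1))           # mean-pool the sequence
        return torch.softmax(logits, dim=-1)[:, 1]  # P(match), cf. Nie's softmax output

probs = PairMatcher()(torch.randint(0, 1000, (3, 20)))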
Krivosheev, Nie and Ahn do not teach, but Yan teaches:
A system comprising: one or more processors; and one or more non-transitory computer-readable media storing computing instructions that, when executed on the one or more processors, cause the one or more processors to perform functions comprising: (Yan, p. 4, 3.3 Experimental Setup "We implement our RHGN in the PyTorch framework for efficient GPU computation.")
… generating, using a connected component algorithm, clusters each containing identities representing a respective user; and (Yan, p. 1, Abstract "we propose to leverage the relation-aware heterogeneous graph method [using a connected component algorithm] for user profiling, which also allows capturing significant meta relations. [clusters containing users and relations]"; p. 1, 1 INTRODUCTION "we propose Relation-aware Heterogeneous Graph Network for user profiling (RHGN) that can model multiple relations on the heterogeneous graph... we adopt a graph with various relations between different types of entities, as illustrated in Figure 1."; using RHGN to generate relations between nodes (e.g. between a user node and item/Ads nodes) [using a connected component algorithm to generate clusters], and each cluster containing a user and her/his items [identities representing a user])
generating a respective user profile for the respective user for each cluster. (Yan, p. 1, Abstract "we propose to leverage the relation-aware heterogeneous graph method for user profiling, which also allows capturing significant meta relations."; p. 1, 1 INTRODUCTION "we propose Relation-aware Heterogeneous Graph Network for user profiling [a respective user profile for the respective user] (RHGN) that can model multiple relations on the heterogeneous graph [cluster]")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Krivosheev, Nie and Ahn to incorporate the teachings of Yan by including the relation-aware heterogeneous graph method for user profiling. Doing so would allow capturing significant meta relations. (Yan, p. 1, Abstract "we propose to leverage the relation-aware heterogeneous graph method for user profiling, which also allows capturing significant meta relations.")

Claim 11 recites substantially the same limitations as claim 1; therefore, the rejection applied to claim 1 also applies to claim 11. In addition, Krivosheev, Nie and Ahn do not teach, but Yan teaches:
A method being implemented via execution of computing instruction configured to run on one or more processors and stored at one or more non-transitory computer-readable media, the method comprising: (Yan, p. 4, 3.3 Experimental Setup "We implement our RHGN in the PyTorch framework for efficient GPU computation.")
The rationale for combining the teachings of Krivosheev, Nie, Ahn and Yan is the same as set forth in the rejection of claim 1.
In regard to claims 2 and 12, Krivosheev and Nie do not teach, but Ahn teaches:
generating a probabilistic set of labels for an unlabeled training dataset to output a labeled training dataset; and (Ahn, p. 362, 2 BACKGROUND "MLM randomly masks a certain portion of tokens in a given sentence (e.g., 15% in the original BERT scheme), exploiting unlabeled data [for an unlabeled training dataset] (i.e., masked positions) to yield labels (i.e., original tokens)."; p. 364, 4.3 Pre-trainer: Model for Assembly "MLM task. We take the identical strategy with the original BERT, replacing 15% of input tokens (instructions) with a mask symbol (i.e., [MASK] token). The parameters of MLM... t ∈ T... y^ = softmax [probabilistic] (Gm(X)) (4) where t, T, y and y^ denote a token, a set of tokens, an original token before masking, and a predicted token [a probabilistic set of labels] for MLM, respectively."; p. 364, Figure 3 "Siamese neural network for building a BCSD model. Our model learns a weighted distance vector from a labeled dataset (i.e., a set of two functions and a label). [a labeled training dataset]"; the output of the pre-trained BERT model is a labeled dataset, which is an input to a downstream task)
training the deep-learning transformer-based binary classification model using the labeled training dataset. (Ahn, p. 364, 4.4 Fine-tuner: Model for Code Similarity "Based on a generic BERT model with a large swath of binaries, we define a downstream task (3 in Figure 2); BCSD. To this end, we leverage a Siamese neural network into a classifier, learning a weighted distance from a labeled dataset [using the labeled training dataset] (i.e., (NF1, NF2, {0,1}) where 1 for a similar pair and 0 for a dissimilar one)... Then, our binary classifier [a deep-learning transformer-based binary classification model] learns a weighted distance vector with the following binary cross entropy loss function: L = Ylogp(Xi, Xj) + (1-Y)log(1-p(Xi, Xj)) (6)"; fine-tuning the BERT model)
The rationale for combining the teachings of Krivosheev, Nie and Ahn is the same as set forth in the rejection of claim 1.

Krivosheev, Nie and Ahn do not teach, but Yan teaches:
wherein the computing instructions, when executed on the one or more processors, further cause the one or more processors to perform functions comprising: (Yan, p. 4, 3.3 Experimental Setup "We implement our RHGN in the PyTorch framework for efficient GPU computation.")
The rationale for combining the teachings of Krivosheev, Nie, Ahn and Yan is the same as set forth in the rejection of claim 1.
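The MLM scheme quoted from Ahn turns unlabeled sequences into self-labeled training data by masking roughly 15% of tokens and predicting the originals. A minimal sketch, assuming toy token ids and BERT's conventional [MASK] id:

# Sketch of MLM self-labeling: mask ~15% of tokens; the original tokens at the
# masked positions become the training labels. Ids below are assumptions.
import torch

MASK_ID = 103        # assumed [MASK] id (BERT's WordPiece vocabulary uses 103)
tokens = torch.randint(200, 1000, (2, 12))            # unlabeled input sequences
mask = torch.rand(tokens.shape) < 0.15                # ~15% of positions, as in BERT
labels = torch.where(mask, tokens, torch.full_like(tokens, -100))  # -100 = ignore index
inputs = torch.where(mask, torch.full_like(tokens, MASK_ID), tokens)
# `inputs` go to the model; cross-entropy against `labels` skips index -100.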
Claims 3-4 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Krivosheev, Nie, Ahn and Yan as applied to claims 2 and 12, and in further view of Wu ("Demonstration of Panda: A Weakly Supervised Entity Matching System" 20210923).

In regard to claims 3 and 13, Krivosheev, Nie, Ahn and Yan do not teach, but Wu teaches:
wherein the probabilistic set of labels is generated using heuristic functions. (Wu, p. 1, Abstract "where labeling functions (LF) are user-provided programs that can generate large amounts of (somewhat noisy) labels quickly and cheaply"; p. 2 "To write LFs for EM, users need to examine tuple pairs from a specific EM task, in order to develop intuitions/heuristics that can be turned into code (LFs) [using heuristic functions (LFs)] to quickly label matches/non-matches."; p. 3, 3. Combining LFs "All possible triples ti, tj, tk form a feasible set Q for the probabilistic labels of the tuple pairs. [the probabilistic set of labels] We then enforce the transitivity constraint by projecting the estimated probabilistic labels to the feasible set... at each E-step.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Krivosheev, Nie, Ahn and Yan to incorporate the teachings of Wu by including heuristic labeling functions. Doing so would generate large amounts of (somewhat noisy) labels quickly and cheaply. (Wu, p. 1, Abstract "where labeling functions (LF) are user-provided programs that can generate large amounts of (somewhat noisy) labels quickly and cheaply")

In regard to claims 4 and 14, Krivosheev, Nie, Ahn and Yan do not teach, but Wu teaches:
wherein generating the probabilistic set of labels uses a weak supervision model. (Wu, p. 1, Abstract "In this paper, we introduce Panda, a weakly supervised system specifically designed for EM.")
The rationale for combining the teachings of Krivosheev, Nie, Ahn, Yan and Wu is the same as set forth in the rejection of claim 3.
Claims 5-6 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Krivosheev, Nie, Ahn and Yan as applied to claims 1 and 11, and in further view of Wilcke ("End-to-End Entity Classification on Multimodal Knowledge Graphs" 20200325).

In regard to claims 5 and 15, Krivosheev teaches:
… for each identity of the respective pair of identities (Krivosheev, p. 5, 4.1 GNNs for Data Integration "Our goal is to correctly match a node q* ∈ Q* to a node r* ∈ R* [the respective pair of identities] such that eq = er …")
Krivosheev does not teach, but Nie teaches:
wherein determining the match probability comprises: (Nie, p. 632, 4.1 Seq2Seq Entity Matching Network "Prediction Layer... a two layer fully-connected layer followed by a softmax classifier to get the final similarity score of the entity pair (S, T). [the match probability]"; the output of a softmax layer is a probability)
The rationale for combining the teachings of Krivosheev and Nie is the same as set forth in the rejection of claim 1.
Krivosheev, Nie, Ahn and Yan do not teach, but Wilcke teaches:
obtaining textual features..., wherein each of the textual features comprises unique string length distributions; and (Wilcke, p. 9, Table 4 "Distribution of datatypes in the datasets…. Textual information includes strings and its subsets, as well as raw URIs (e.g. links). [unique string length distributions]"; URIs (Uniform Resource Identifiers) are unique identifiers)
generating a first sub-model based on the textual features. (Wilcke, p. 4, Figure 2 "Solid circles represent entities, whereas open shapes represent literals of different modalities. The nodes' feature embeddings are learned using dedicated (neural) encoders (here f, g, and h) [a first sub-model]"; p. 5, 4.1.3 Textual Information "Vector representations for textual attributes with the datatype XSD:string or any subtype thereof...")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Krivosheev, Nie, Ahn and Yan to incorporate the teachings of Wilcke by including embeddings for node features belonging to five different types of modalities. Doing so would help the models obtain a better overall performance. (Wilcke, Abstract "Our model uses dedicated (neural) encoders to naturally learn embeddings for node features belonging to five different types of modalities, including images and geometries, which are projected into a joint representation space together with their relational information... Our result supports our hypothesis that including information from multiple modalities can help our models obtain a better overall performance.")

In regard to claims 6 and 16, Krivosheev teaches:
… for each identity of the respective pair of identities (Krivosheev, p. 5, 4.1 GNNs for Data Integration "Our goal is to correctly match a node q* ∈ Q* to a node r* ∈ R* [the respective pair of identities] such that eq = er …")
Krivosheev does not teach, but Nie teaches:
wherein determining the match probability further comprises: (Nie, p. 632, 4.1 Seq2Seq Entity Matching Network "Prediction Layer... a two layer fully-connected layer followed by a softmax classifier to get the final similarity score of the entity pair (S, T). [the match probability]"; the output of a softmax layer is a probability)
… wherein the boolean features comprise external metadata and transaction history; and (Nie, p. 631, 3.2 Types of ER Problems "Heterogeneous ER. In this category, entities in E and E' are described using different schemas (A1, · · · , Am) and (B1, · · · , Bn). These entities may come from different data sources [external metadata] where different schemas - (Name, Brand, Location) and (Name, Manufacturer, Location, Price) [transaction history] - are used to describe the same real-world entity (see Table 3)."; in light of specification [0101] "other qualities of an identity, such as external metadata and transaction history, are stored as Boolean features")
The rationale for combining the teachings of Krivosheev and Nie is the same as set forth in the rejection of claim 1.
Krivosheev, Nie, Ahn and Yan do not teach, but Wilcke teaches:
obtaining boolean features... (Wilcke, p. 9, Table 4 "Distribution of datatypes in the datasets. Numerical information includes all subsets of real numbers, as well as booleans...")
generating a second sub-model based on the boolean features. (Wilcke, p. 4, Figure 2 "Solid circles represent entities, whereas open shapes represent literals of different modalities. The nodes' feature embeddings are learned using dedicated (neural) encoders (here f, g, and h) [a second sub-model]"; p. 5, 4.1.1 Numerical Information "We also include values of the type XSD:boolean into this category...")
The rationale for combining the teachings of Krivosheev, Nie, Ahn, Yan and Wilcke is the same as set forth in the rejection of claim 5.
Claims 7-10 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Krivosheev, Nie, Ahn and Yan as applied to claims 6 and 16, and in view of Hou ("Token Dropping for Efficient BERT Pretraining" 20220324) in further view of Reimers ("Sentence-bert: Sentence embeddings using siamese bert-networks" 20190827).

In regard to claims 7 and 17, Krivosheev, Nie, Ahn and Yan do not teach, but Wilcke teaches:
wherein generating the first sub-model comprises: (Wilcke, p. 4, Figure 2 "Solid circles represent entities, whereas open shapes represent literals of different modalities. The nodes' feature embeddings are learned using dedicated (neural) encoders (here f, g, and h) [the first sub-model]"; p. 5, 4.1.3 Textual Information "Vector representations for textual attributes with the datatype XSD:string or any subtype thereof...")
generating character-level encodings to convert the textual features into numeric representations; (Wilcke, p. 5, 4.1.3 Textual Information "Vector representations for textual attributes with the datatype XSD:string or any subtype thereof, are created using a character-level encoding, [character-level encodings] proposed in [16]. Hereto, we let Es be a |Ω|×|s| matrix representing string s using vocabulary Ω, such that Es_ij = 1.0 if sj = Ωi, and 0.0 otherwise. [numeric representations] A character-level representation enables our models to be language agnostic and independent of controlled vocabularies...")
The rationale for combining the teachings of Krivosheev, Nie, Ahn, Yan and Wilcke is the same as set forth in the rejection of claim 5.

Krivosheev, Nie, Ahn, Yan and Wilcke do not teach, but Hou teaches:
sending the character-level encodings into a first embedding layer to generate a first embedding, wherein the first embedding layer is trained to remove sparsity from the character-level encodings; (Hou, p. 1, Abstract "We develop a simple but effective 'token dropping' method to accelerate the pretraining of transformer models, such as BERT, without degrading its performance on downstream tasks. In short, we drop unimportant tokens starting from an intermediate layer in the model to make the model focus on important tokens; the dropped tokens are later picked up by the last layer of the model so that the model still produces full length sequences."; p. 3, 3 Token-Dropping "Using sparse tensors can address the issue of having a different number of important tokens, but sparse tensor related operations in practice are slow."; dropping tokens keeps the first intermediate layer dense, i.e. [removing sparsity from the encodings])
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Krivosheev, Nie, Ahn, Yan and Wilcke to incorporate the teachings of Hou by including a token dropping method. Doing so would reduce the pretraining cost of BERT while achieving similar overall fine-tuning performance. (Hou, p. 1, Abstract "In our experiments, this simple approach reduces the pretraining cost of BERT by 25% while achieving similar overall fine-tuning performance on standard downstream tasks.")
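The character-level encoding quoted from Wilcke is a one-hot matrix: E is |Ω| × |s|, with E[i][j] = 1.0 exactly when character s_j equals vocabulary entry Ω_i. A minimal sketch with an assumed vocabulary:

# Sketch of Wilcke's character-level encoding: a |Ω| x |s| one-hot matrix.
import numpy as np

VOCAB = list("abcdefghijklmnopqrstuvwxyz")   # Ω, assumed for the example

def char_level_encoding(s):
    e = np.zeros((len(VOCAB), len(s)))       # |Ω| x |s|
    for j, ch in enumerate(s.lower()):
        if ch in VOCAB:
            e[VOCAB.index(ch), j] = 1.0      # E_ij = 1.0 if s_j == Ω_i
    return e

enc = char_level_encoding("walmart")          # 26 x 7 sparse one-hot matrix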
Krivosheev, Nie, Ahn, Yan, Wilcke and Hou do not teach, but Reimers teaches:
sending the first embedding to an encoder block to generate final encodings, wherein the encoder block comprises a transformer using multi-head attention and a first fully connected layer using a Siamese architecture in which the encoder block is shared between two textual features; (Reimers, p. 3 "Figure 1: SBERT architecture with classification objective function, e.g., for fine-tuning on SNLI dataset. The two BERT networks have tied weights (siamese network structure)."; p. 2, 2 Related Work "BERT (Devlin et al., 2018) is a pre-trained transformer network... Multi-head attention over 12 (base-model) or 24 layers (large-model) is applied and the output is passed to a simple regression function to derive the final label."; the 'BERT' block in Fig. 1 is [the encoder block] comprising a shared/same Siamese architecture, i.e. [a first fully connected layer], and u and v are [embeddings]; Hou also teaches a BERT Siamese architecture, 'FFW' in Fig. 2)
calculating an absolute difference between the final encodings; and (Reimers, p. 3, 3 Model "Classification Objective Function. We concatenate the sentence embeddings u and v with the element-wise difference |u−v| [an absolute difference]")
passing each difference of each textual feature encoding into a second fully connected layer. (Reimers, p. 3, 3 Model "softmax(Wt(u, v, |u − v|))"; the softmax layer is a second fully connected layer)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Krivosheev, Nie, Ahn, Yan, Wilcke and Hou to incorporate the teachings of Reimers by including Sentence-BERT to derive semantically meaningful sentence embeddings. Doing so would efficiently find the most similar pair while maintaining accuracy. (Reimers, p. 1 "In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that use siamese and triplet network structures to derive semantically meaningful sentence embeddings... This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT.")

In regard to claims 8 and 18, Krivosheev, Nie, Ahn and Yan do not teach, but Wilcke teaches:
wherein generating the second sub-model comprises: (Wilcke, p. 4, Figure 2 "Solid circles represent entities, whereas open shapes represent literals of different modalities. The nodes' feature embeddings are learned using dedicated (neural) encoders (here f, g, and h) [the second sub-model]"; p. 5, 4.1.1 Numerical Information "We also include values of the type XSD:boolean into this category...")
processing the boolean features using multiple fully connected layers. (Wilcke, p. 4, 3.2 Message Passing Neural Networks "A message passing neural network [3] is a graph neural network model that uses trainable functions to propagate node embeddings over the edges of the neural network."; p. 3, 2 Related Work "our approach includes a message passing layer, allowing multimodal information to be propagated through the graph, several hops, [multiple fully connected layers] before being used for classification.")
The rationale for combining the teachings of Krivosheev, Nie, Ahn, Yan and Wilcke is the same as set forth in the rejection of claim 5.

In regard to claims 9 and 19, Krivosheev does not teach, but Nie teaches:
wherein determining the match probability further comprises: (Nie, p. 632, 4.1 Seq2Seq Entity Matching Network "Prediction Layer... a two layer fully-connected layer followed by a softmax classifier to get the final similarity score of the entity pair (S, T). [the match probability]"; the output of a softmax layer is a probability)
concatenating each output of the first sub-model and the second sub-model to generate a combined output; (Nie, p. 632, 4.1 Seq2Seq Entity Matching Network "Prediction Layer. The prediction layer performs similarity assessment based on the two feature vectors generated in the previous step. Specifically, taking two feature vectors as input, we first concatenate them and then pass the resultant vector...")
passing the combined output, as concatenated, into a final fully connected layer; and (Nie, p. 632, 4.1 Seq2Seq Entity Matching Network "Prediction Layer... Specifically, taking two feature vectors as input, we first concatenate them and then pass the resultant vector to a two layer fully-connected layer followed by a softmax classifier...")
outputting the match probability. (Nie, p. 632, 4.1 Seq2Seq Entity Matching Network "Prediction Layer... a two layer fully-connected layer followed by a softmax classifier to get the final similarity score of the entity pair (S, T). [the match probability]"; the output of a softmax layer is a probability)
The rationale for combining the teachings of Krivosheev and Nie is the same as set forth in the rejection of claim 1.
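The SBERT head quoted from Reimers classifies a pair from the concatenation (u, v, |u − v|) of the two tied-encoder embeddings. A minimal sketch; the toy linear encoder is an assumption standing in for the shared BERT block:

# Sketch of the Siamese classification head (Reimers): shared encoder produces
# u and v; concatenate (u, v, |u - v|) and classify with a final FC + softmax.
import torch
import torch.nn as nn

d = 32
encoder = nn.Linear(100, d)                   # stand-in for the shared (tied) encoder
head = nn.Linear(3 * d, 2)                    # W_t over (u, v, |u - v|)

a, b = torch.randn(5, 100), torch.randn(5, 100)
u, v = encoder(a), encoder(b)                 # same weights both sides (Siamese)
logits = head(torch.cat([u, v, (u - v).abs()], dim=-1))
p_match = torch.softmax(logits, dim=-1)[:, 1]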
In regard to claims 10 and 20, Krivosheev and Nie do not teach, but Ahn teaches:
wherein weights of deep-learning transformer-based binary classification model are tuned using a binary cross-entropy loss function. (Ahn, p. 362, 2 BACKGROUND "the fine-tuning phase of BERT improves a generic model by retraining weights with another dataset for a specialized user-defined task (i.e., supervised learning)"; p. 364, 4.4 Fine-tuner: Model for Code Similarity "Based on a generic BERT [deep-learning transformer-based binary classification model] model with a large swath of binaries, we define a downstream task (3 in Figure 2); BCSD... Then, our binary classifier learns a weighted distance vector with the following binary cross entropy loss function: L = Ylogp(Xi, Xj) + (1-Y)log(1-p(Xi, Xj)) (6)")
The rationale for combining the teachings of Krivosheev, Nie and Ahn is the same as set forth in the rejection of claim 1.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SU-TING CHUANG, whose telephone number is (408) 918-7519. The examiner can normally be reached Monday - Thursday, 8-5 PT.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Usmaan Saeed, can be reached at (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SU-TING CHUANG/
Examiner, Art Unit 2146
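On the binary cross-entropy loss of claims 10 and 20: as quoted from Ahn, equation (6) reads L = Y log p + (1−Y) log(1−p), while the standard loss carries a leading minus sign, which the sketch below assumes. All values are illustrative.

# Sketch of BCE fine-tuning (claims 10/20): mean of -[y log p + (1-y) log(1-p)].
import torch
import torch.nn.functional as F

p = torch.tensor([0.9, 0.2, 0.7])            # predicted match probabilities
y = torch.tensor([1.0, 0.0, 1.0])            # pair labels (1 similar, 0 dissimilar)
loss = F.binary_cross_entropy(p, y)           # standard (negated) BCE
loss_manual = -(y * p.log() + (1 - y) * (1 - p).log()).mean()
assert torch.allclose(loss, loss_manual)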

Prosecution Timeline

Jan 31, 2023
Application Filed
Dec 21, 2025
Non-Final Rejection — §101, §103
Apr 07, 2026
Applicant Interview (Telephonic)
Apr 08, 2026
Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561600
LINEAR TIME ALGORITHMS FOR PRIVACY PRESERVING CONVEX OPTIMIZATION
2y 5m to grant Granted Feb 24, 2026
Patent 12518154
TRAINING MULTIMODAL REPRESENTATION LEARNING MODEL ON UNANNOTATED MULTIMODAL DATA
2y 5m to grant Granted Jan 06, 2026
Patent 12481725
SYSTEMS AND METHODS FOR DOMAIN-SPECIFIC ENHANCEMENT OF REAL-TIME MODELS THROUGH EDGE-BASED LEARNING
2y 5m to grant Granted Nov 25, 2025
Patent 12468951
Unsupervised outlier detection in time-series data
2y 5m to grant Granted Nov 11, 2025
Patent 12412095
COOPERATIVE LEARNING NEURAL NETWORKS AND SYSTEMS
2y 5m to grant Granted Sep 09, 2025
Study what changed to get past this examiner, based on the 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

1-2
Expected OA Rounds
52%
Grant Probability
91%
With Interview (+39.7%)
4y 5m
Median Time to Grant
Low
PTA Risk
Based on 101 resolved cases by this examiner. Grant probability derived from career allow rate.
