Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 17-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent-eligible subject matter because neither the claims nor the specification limits “computer-readable storage medium” to non-transitory forms. The claims therefore encompass transitory forms, i.e., signals per se, which do not fall within a statutory category.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 5-11, 13-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Shmueli, US Pre-Grant Publication No. 2017/0109344 (hereafter Shmueli), in view of Wong et al., US Pre-Grant Publication No. 2023/0118240 (hereafter Wong).
Regarding claim 1 and analogous claims 9 and 17:
Shmueli teaches:
“An apparatus comprising: a storage device comprising a feature store; and a processor configured to”: Shmueli, paragraph 0075, “As shown in FIG. 5, computer system/server 12 [apparatus] in cloud computing node 10 is shown in the form of a general purpose computing device. The components of computer system/server 12 may include, but are not limited to, one or more processors or processing units 16 [a processor], a system memory 28 [a storage device], and a bus 18 that couples various system components including system memory 28 to processor 16”; Shmueli, paragraph 0051, “An exemplary aspect of the disclosed invention provides a system, method, and non-transitory recording medium for determining and discerning items with multiple meanings which can provide a technical solution to the technical problem in the conventional approaches by examining the distributed representation of words in the sequence, deciding on main types of occurrences, and then, replacing each occurrence of the word in the input sequence with the ‘appropriate sense’ word and occurrence sequence by listing ‘strongly related’ words for each of the occurrences.”
“train a machine learning model using a neural network capability with central words and contextual words to convert text content into a vector”: Shmueli, paragraph 0025, “The first distribution representation device 101 receives the sequence of items 150 and produces a distributed representation for each item ‘w’ as a word vector and a context vector. That is, the distribution representation device 101 uses a vector producing algorithm such as word2vec (w2v) or Glove, and produces output vectors of two types per each item: word (Syn0 type in w2v) and context (Syn1neg in w2v). More specifically, given a sequence of items (words, concepts, nodes in a graph, etc. or a combination thereof [with central words and contextual words]), the first distribution representation device 101 learns [train] a distributed representation (i.e. vectors) of dimension n (a user specified parameter) [convert text content into a vector], using a tool such as word2vec or Glove [a machine learning model using a neural network capability].”
“test the trained machine learning model based on execution of the trained machine learning model on a test input to generate a predicted test output”: Shmueli, paragraph 0043, “Step 201 receives a sequence of items 150 and produces a distributed representation for each item ‘w’ as a word vector and a context vector [test the trained machine learning model based on execution of the trained machine learning model on a test input to generate a predicted test output]. Step 201 runs word2vec or similar vector producing tool and, obtains word vectors v1, . . . , vn for words w1, . . . , wn in the vocabulary V of text T=t1, t2, . . . , tu.”
“compare the predicted output to a known output based on the test”: Shmueli, paragraph 0046, “Step 204 produces a new sequence by replacing each word w occurrence oc in the input sequence by w_j where 1<=j<=k is the index of the class, among the k closest classes whose c' vector is cosine closest to the average context vector for the words in a window around oc as shown in FIG. 3 [compare the predicted output to a known output based on the test]. More specifically, step 204 replaces in the text word occurrence tm with a ‘variation’ wi_j. For example, if word occurrence tm=car and ‘car’ has 3 ‘strong’ classes and the current context (of tm) average vector is closest to the vector c'i2 representing the second strong class, then tm=car is replaced by car_2. So, each occurrence is replaced with the appropriate ‘sense’ of the word in the current window context usage.”
“receive an input via a graphical user interface (GUI)”: Shmueli, paragraph 0080, “Computer system/server 12 may also communicate with one or more external devices 14 such as a keyboard, a pointing device, a display 24, etc.; one or more devices that enable a user to interact with computer system/server 12 [receive an input via a graphical user interface (GUI)].”
“establish a hyperparameter of the machine learning which identifies a predefined number of vector slots based on the input via the GUI”: Shmueli, paragraph 0025, “The first distribution representation device 101 receives the sequence of items 150 and produces a distributed representation for each item ‘w’ as a word vector and a context vector. That is, the distribution representation device 101 uses a vector producing algorithm such as word2vec (w2v) or Glove, and produces output vectors of two types per each item: word (Syn0 type in w2v) and context (Syn1neg in w2v). More specifically, given a sequence of items (words, concepts, nodes in a graph, etc. or a combination thereof), the first distribution representation device 101 learns a distributed representation (i.e. vectors) of dimension n (a user specified parameter) [establish a hyperparameter of the machine learning which identifies a predefined number of vector slots based on the input via the GUI], using a tool such as word2vec or Glove.”
“execute the trained machine learning model on the encoding to identify latent features of the data, convert the latent features and the name into vectorized values, and embed the vectorized values into a vector that comprises the predetermined number of vector slots based on the hyperparameter”: Shmueli, paragraph 0025, “The first distribution representation device 101 receives the sequence of items 150 and produces a distributed representation for each item ‘w’ as a word vector and a context vector. That is, the distribution representation device 101 uses a vector producing algorithm such as word2vec (w2v) or Glove, and produces output vectors of two types per each item [execute the trained machine learning model on the encoding to identify latent features of the data, convert the latent features and the name into vectorized values]: word (Syn0 type in w2v) and context (Syn1neg in w2v). More specifically, given a sequence of items (words, concepts, nodes in a graph, etc. or a combination thereof), the first distribution representation device 101 learns a distributed representation (i.e. vectors) of dimension n [embed the vectorized values into a vector that comprises the predetermined number of vector slots based on the hyperparameter] (a user specified parameter), using a tool such as word2vec or Glove.”
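For illustration only (editor's sketch with hypothetical function names and toy vectors; not drawn from Shmueli), the sense-replacement step quoted above from paragraph 0046 — replacing a word occurrence with the variant whose class vector is cosine-closest to the average context vector of the surrounding window — can be expressed as:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def average(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sense_label(word, context_vectors, class_vectors):
    """Replace `word` with `word_j`, where j indexes the class whose
    vector is cosine-closest to the average context vector."""
    ctx = average(context_vectors)
    j = max(range(len(class_vectors)),
            key=lambda i: cosine(ctx, class_vectors[i]))
    return f"{word}_{j + 1}"

# 'car' with two hypothetical sense classes; the window context is
# closer to the second class, so the occurrence becomes 'car_2'.
classes = [[1.0, 0.0], [0.0, 1.0]]
window = [[0.1, 0.9], [0.2, 0.8]]
label = sense_label("car", window, classes)  # "car_2"
```

This mirrors the quoted example in which the occurrence tm=car, being closest to the second strong class, is replaced by car_2.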
Shmueli does not explicitly teach:
“query data read from a point of sale (POS) system and convert the data and a name associated with the data into an encoding”
“generate an entry comprising metadata of the data, a geographic location associated with the POS system, and the generated vector”
“store the entry in the feature store”
Wong teaches:
“query data read from a point of sale (POS) system and convert the data and a name associated with the data into an encoding”: Wong, paragraph 0003, “Digital payments have exploded over the last twenty years, with more than three-quarters of global payments using some form of payment card or electronic wallet. Point of sale systems are progressively becoming digital rather than cash based [a point of sale (POS) system]”; Wong, paragraph 0133, “At block 1302, the method 1300 comprises receiving transaction data for anomaly detection. This may comprise operations similar to block 522 in FIG. 5A. The transaction data may relate to a proposed transaction, e.g. as described with reference to blocks 512 to 518 in FIGS. 5A and 5B. The transaction data may be received as data packets accompanying an API request, e.g. with respect to an internal function, or external RESTful, interface. At block 1304, a first set of features are generated based on the received transaction data. This may be performed as part of one or more of blocks 522 or 524 in FIGS. 5A or 5B. It may comprise applying an observable feature generator 910 to generate an observable feature vector as described with reference to FIG. 9 [query data read from a point of sale (POS) system and convert the data and a name associated with the data into an encoding]. The first set of features may be configured as a vector of numeric values that are computed based, at least, on the data packets accompanying an API request, i.e. data for a proposed transaction.”
“generate an entry comprising metadata of the data, a geographic location associated with the POS system, and the generated vector”: Wong, paragraph 0052, “Other fields present in the transaction data can include, but are not limited to, an account number (e.g., a credit card number), a location of where the transaction is occurring [a geographic location associated with the POS system], and a manner (e.g., in person, over the phone, on a website) in which the transaction is executed [metadata of the data].”
“store the entry in the feature store”: Wong, paragraph 0052, “FIG. 3B shows how transaction data 330 for a particular transaction may be stored in numeric form for processing [store the entry in the feature store] by one or more machine learning models.”
Wong and Shmueli are analogous arts as they are both related to vectorizing data. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the application of vectorization to transactional data of Wong with the teachings of Shmueli to arrive at the present invention, in order to detect anomalous transactions, as stated in Wong, paragraph 0027, “The output of the machine learning system may be used to prevent a wide variety of fraudulent and criminal behaviour such as card fraud, application fraud, payment fraud, merchant fraud, gaming fraud and money laundering.”
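As a further illustrative sketch (hypothetical code, not taken from either reference), the user-specified dimension n quoted from Shmueli's paragraph 0025 acts as a hyperparameter that fixes the number of slots in every produced vector:

```python
import random

def init_embeddings(vocab, n):
    """Allocate one n-slot vector per vocabulary item; n plays the role
    of the user-specified dimension hyperparameter."""
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    return {w: [rng.uniform(-0.5, 0.5) for _ in range(n)] for w in vocab}

n = 4  # e.g., a value entered through a GUI field
emb = init_embeddings(["car", "bank", "river"], n)
```

Every embedding produced under this parameter has exactly n slots, regardless of vocabulary size.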
Regarding claim 2 and analogous claims 10 and 18:
Shmueli as modified by Wong teaches “The apparatus of claim 1.”
Shmueli further teaches (bold only) “wherein the data comprises a plurality of different variations of a name, and the processor is further configured to normalize the different variations of the name into a single name value and generate the encoding based on the single name value”: Shmueli, paragraph 0046, “Step 204 produces a new sequence by replacing each word w occurrence oc in the input sequence by w_j where 1<=j<=k is the index of the class, among the k closest classes whose c' vector is cosine closest to the average context vector for the words in a window around oc as shown in FIG. 3. More specifically, step 204 replaces in the text word occurrence tm with a ‘variation’ wi_j. For example, if word occurrence tm=car and ‘car’ has 3 ‘strong’ classes and the current context (of tm) average vector is closest to the vector c'i2 representing the second strong class, then tm=car is replaced by car_2. So, each occurrence is replaced with the appropriate ‘sense’ of the word in the current window context usage [normalize the different variations of the name into a single name value and generate the encoding based on the single name value].”
Regarding claim 3 and analogous claim 11:
Shmueli as modified by Wong teaches “The apparatus of claim 1.”
Shmueli further teaches “wherein the machine learning model comprises a word to vector (Word2Vec) model”: Shmueli, paragraph 0025, “The first distribution representation device 101 receives the sequence of items 150 and produces a distributed representation for each item ‘w’ as a word vector and a context vector. That is, the distribution representation device 101 uses a vector producing algorithm such as word2vec (w2v) or Glove, and produces output vectors of two types per each item: word (Syn0 type in w2v) and context (Syn1neg in w2v). More specifically, given a sequence of items (words, concepts, nodes in a graph, etc. or a combination thereof), the first distribution representation device 101 learns a distributed representation (i.e. vectors) of dimension n (a user specified parameter), using a tool such as word2vec [wherein the machine learning model comprises a word to vector (Word2Vec) model] or Glove.”
Regarding claim 5 and analogous claims 13 and 20:
Shmueli as modified by Wong teaches “The apparatus of claim 1.”
Wong further teaches “wherein the processor is configured to identify a category type of the data from among a plurality of possible category types, and store the vector within a location in the feature store based on the identified category type”: Wong, paragraph 0050, “In FIG. 2B, a secure logical storage layer 270 is provided using the physical data storage device 260. The secure logical storage layer 270 may be a virtualized system that appears as separate physical storage devices to the machine learning system 210 while actually being implemented independently upon the at least one data storage device 260. The logical storage layer 270 may provide separate encrypted partitions 280 for data relating to groups of entities (e.g., relating to different issuing banks etc.) [identify a category type of the data from among a plurality of possible category types, and store the vector within a location in the feature store based on the identified category type] and the different sets of historical transaction data 240-A to N and ancillary data 242-A to N may be stored in the corresponding partitions 280-A to N.”
Wong and Shmueli are analogous arts as they are both related to vectorizing data. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the partitioned storage to transactional data of Wong with the teachings of Shmueli to arrive at the present invention, in order to improve data security, as stated in Wong, paragraph 0006, “Moreover, engineers need to contend with the problem of implementing machine learning models on data that is siloed or partitioned based on access security, and in situations where the velocity of data updates is extreme.”
Regarding claim 6 and analogous claim 14:
Shmueli as modified by Wong teaches “The apparatus of claim 1.”
Wong further teaches “wherein the processor is further configured to execute the machine learning model on additional data to generate an additional vector that comprises vectorized values corresponding to latent features of the additional data embedded within slots of the additional vector, respectively, and store the additional vector within the feature store”: Wong, paragraph 0111, “In FIG. 9, the observable feature generator 910 receives transaction data and uses this to generate an observable feature vector 916. The context feature generator 930 receives ancillary data and uses this to generate a context feature vector 926 [generate an additional vector that comprises vectorized values corresponding to latent features of the additional data]. The observable feature vector 916 and the context feature vector 926 are then combined to generate the overall feature vector 930. In one case, the observable feature vector 916 and the context feature vector 926 may be combined by concatenating the two feature vectors 916 and 926 to generate a longer vector”; Wong, paragraph 0044, “The payment processor server 140 is communicatively coupled to a first data storage device 142 storing transaction data 146 and a second data storage device 144 storing ancillary data 148 [store the additional vector within the feature store].”
Wong and Shmueli are combinable for the rationale given under claim 1.
Regarding claim 7 and analogous claim 15:
Shmueli as modified by Wong teaches “The apparatus of claim 6.”
Wong further teaches “wherein the processor is configured to input the vector into the machine learning model when generating the additional vector”: Wong, paragraph 0111, “In FIG. 9, the observable feature generator 910 receives transaction data and uses this to generate an observable feature vector 916. The context feature generator 930 receives ancillary data and uses this to generate a context feature vector 926 [input the vector into the machine learning model when generating the additional vector]. The observable feature vector 916 and the context feature vector 926 are then combined to generate the overall feature vector 930. In one case, the observable feature vector 916 and the context feature vector 926 may be combined by concatenating the two feature vectors 916 and 926 to generate a longer vector.”
Wong and Shmueli are combinable for the rationale given under claim 1.
Regarding claim 8 and analogous claim 16:
Shmueli as modified by Wong teaches “The apparatus of claim 1.”
Wong further teaches “wherein the processor is configured to identify a combination of attributes including a name, a type, and a code, and convert the combination of attributes into numerical values within the encoding”: Wong, paragraph 0052, “FIG. 3B shows how transaction data for a particular transaction may be stored in numeric form for processing [convert the combination of attributes into numerical values] by one or more machine learning models. For example, in FIG. 3B, transaction data has at least fields: transaction amount, timestamp (e.g., as a Unix epoch), transaction type [a type] (e.g., card payment or direct debit), product description or identifier (i.e., relating to items being purchased), merchant identifier [a name], issuing bank identifier, a set of characters (e.g., Unicode characters within a field of predefined character length), country identifier [a code] etc. It should be noted that a wide variety of data types and formats may be received and pre-processed into appropriate numerical representations”; Wong, paragraph 0074, “In certain cases, the output of the first multilayer perceptron 610, which may be considered a normalized transaction data feature vector [within the encoding], may be concatenated, or otherwise combined with output data of the recurrent neural network architecture 620, to provide an input to the one or more attention neural network architectures 660.”
Wong and Shmueli are combinable for the rationale given under claim 1.
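For illustration (editor's sketch; the field names and code tables are hypothetical and are not taken from Wong), converting a combination of attributes such as a merchant name, a transaction type, and a country code into numerical values within an encoding might look like:

```python
def encode_transaction(record, type_codes, country_codes):
    """Map a transaction record's fields to a flat list of numeric values.

    Categorical fields (type, country) are looked up in small code
    tables; the merchant name is reduced to a deterministic numeric
    stand-in (a real system would use a learned or hashed encoding).
    """
    name_code = sum(ord(c) for c in record["merchant"]) % 10_000
    return [
        float(record["amount"]),
        float(record["timestamp"]),
        float(type_codes[record["type"]]),
        float(name_code),
        float(country_codes[record["country"]]),
    ]

type_codes = {"card": 0, "direct_debit": 1}
country_codes = {"US": 0, "GB": 1}
row = {"amount": 12.5, "timestamp": 1700000000,
       "type": "card", "merchant": "ACME", "country": "GB"}
vec = encode_transaction(row, type_codes, country_codes)
```

The result is a fixed-order numeric vector of the kind Wong's FIG. 3B describes for downstream machine learning models.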
Claims 4, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Shmueli as modified by Wong, in view of Yoon et al., US Pre-Grant Publication No. 2021/0081768 (hereafter Yoon).
Regarding claim 4 and analogous claim 12:
Shmueli as modified by Wong teaches “The apparatus of claim 3.”
Shmueli as modified by Wong does not explicitly teach “wherein the processor is further configured to execute the Word2Vec model on training data to generate a weight matrix for the Word2Vec model, prior to generating the vector, and determine the vector based on the weight matrix for the Word2Vec model.”
Yoon teaches “wherein the processor is further configured to execute the Word2Vec model on training data to generate a weight matrix for the Word2Vec model, prior to generating the vector, and determine the vector based on the weight matrix for the Word2Vec model”: Yoon, paragraph 0070, “In Equation 8, Ewi is a vector corresponding to the weight matrix W matched with the input word wi in FIG. 2, and E'wj is the vector for the output word w obtained using the weight matrix W'. When a corpus T is given, the Word2Vec algorithm learns [execute the Word2Vec model on training data to generate a weight matrix for the Word2Vec model, prior to generating the vector] to maximize the log probability of Equation 7 and obtains a weight matrix to be used as the latent vector for the word [determine the vector based on the weight matrix for the Word2Vec model]. To efficiently calculate the denominator of Equation 8, a negative sampling loss function may be used.”
Yoon and Shmueli are analogous arts as they are both related to vectorizing data. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the Word2Vec matrix training of Yoon with the teachings of Shmueli to arrive at the present invention, in order to use the well-known Word2Vec algorithm for producing context-based data embeddings, as stated in Yoon, paragraph 0069, “As shown in FIG. 1, the Word2Vec algorithm learns to predict the context words wj, wj+1, … , wj+n of the input word wi, which belongs to the word set W consisting of V words by using a neural-network-based model that comprises an input layer, a projection layer, and an output layer.”
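For illustration (a toy sketch by the editor, not Yoon's actual matrices), the quoted passage describes the trained Word2Vec weight matrix doubling as the embedding table: a word's latent vector is simply its row of the matrix.

```python
# Toy vocabulary and a hypothetical trained weight matrix W (V x n),
# here V = 3 words and dimension n = 2.
vocab = {"car": 0, "bank": 1, "river": 2}
W = [[0.2, 0.7],
     [0.9, 0.1],
     [0.4, 0.4]]

def latent_vector(word):
    """Look up the word's embedding: its row of the weight matrix W."""
    return W[vocab[word]]

vec_bank = latent_vector("bank")  # [0.9, 0.1]
```

Training adjusts the entries of W; generating a vector afterward is a pure lookup, which is the order of operations the claim recites.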
Regarding claim 19:
Shmueli as modified by Wong teaches “The computer-readable storage medium of claim 17.”
Shmueli as modified by Wong does not explicitly teach “wherein the method further comprises executing the trained machine learning model on training data to generate a weight matrix for the trained machine learning model, prior to generating the vector, and the executing comprises determining the vector based on the weight matrix.”
Yoon teaches “wherein the method further comprises executing the trained machine learning model on training data to generate a weight matrix for the trained machine learning model, prior to generating the vector, and the executing comprises determining the vector based on the weight matrix”: Yoon, paragraph 0070, “In Equation 8, Ewi is a vector corresponding to the weight matrix W matched with the input word wi in FIG. 2, and E'wj is the vector for the output word w obtained using the weight matrix W'. When a corpus T is given, the Word2Vec algorithm learns [executing the trained machine learning model on training data to generate a weight matrix for the trained machine learning model, prior to generating the vector] to maximize the log probability of Equation 7 and obtains a weight matrix to be used as the latent vector for the word [determining the vector based on the weight matrix]. To efficiently calculate the denominator of Equation 8, a negative sampling loss function may be used.”
Yoon and Shmueli are analogous arts as they are both related to vectorizing data. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the matrix training of Yoon with the teachings of Shmueli to arrive at the present invention, in order to use the well-known Word2Vec algorithm for producing context-based data embeddings, as stated in Yoon, paragraph 0069, “As shown in FIG. 1, the Word2Vec algorithm learns to predict the context words wj, wj+1, … , wj+n of the input word wi, which belongs to the word set W consisting of V words by using a neural-network-based model that comprises an input layer, a projection layer, and an output layer.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Heisler et al., US Pre-Grant Publication No. 2022/0292685, discloses a method including the generation of embeddings with the Word2Vec model, in which the number of dimensions of the resulting embedding is determined by user input via a graphical user interface.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VINCENT SPRAUL whose telephone number is (703) 756-1511. The examiner can normally be reached M-F 9:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MICHAEL HUNTLEY can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/VAS/ Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129