Prosecution Insights
Last updated: April 19, 2026
Application No. 17/589,614

Machine Learning based on Post-Transaction Data

Final Rejection (§103, §DP)
Filed: Jan 31, 2022
Examiner: PHAKOUSONH, DARAVANH
Art Unit: 2121
Tech Center: 2100 (Computer Architecture & Software)
Assignee: PayPal Inc.
OA Round: 2 (Final)
Grant Probability: 50% (Moderate)
OA Rounds: 3-4
To Grant: 4y 0m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 50% (grants 50% of resolved cases; 1 granted / 2 resolved; -5.0% vs TC avg)
Interview Lift: +100.0% (strong; based on resolved cases with interview)
Typical Timeline: 4y 0m avg prosecution; 33 currently pending
Career History: 35 total applications, across all art units

Statute-Specific Performance

§101: 31.2% (-8.8% vs TC avg)
§103: 38.1% (-1.9% vs TC avg)
§102: 14.8% (-25.2% vs TC avg)
§112: 13.2% (-26.8% vs TC avg)
Comparison values are Tech Center average estimates; based on career data from 2 resolved cases.

Office Action

§103 §DP
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments/Amendment

1. Applicant requests that the rejection based on non-statutory double patenting be held in abeyance. However, this argument is not persuasive. MPEP § 714.02 explains that “a request may be made that objections or requirements as to form not necessary to further consideration of the claims be held in abeyance.” The non-statutory double patenting rejection is not an objection or requirement as to form, but rather a substantive ground of rejection addressing whether the claims are patentably distinct from claims of a commonly owned patent. Accordingly, the request to hold the rejection in abeyance is not proper under MPEP § 714.02, and the non-statutory double patenting rejection is maintained.

2. Applicant’s arguments regarding the rejection under 35 U.S.C. 103 filed on December 23, 2025 are not persuasive. Applicant, on pages 10-11 of the Remarks, discusses paragraphs [0031]-[0032] of the Specification and Figures 2 and 3 to explain the meanings of the terms “pre-transaction information” and “post-transaction information.” However, this argument is not persuasive. During examination, claim terms are interpreted under the broadest reasonable interpretation consistent with the specification. The claims broadly recite information associated with a transaction that is available before the completion of the transaction (“pre-transaction information”) and information that becomes available after the completion of the transaction (“post-transaction information”). The claims do not require specific examples or attributes described in the specification, such as geofence information, internet activity information, or other particular user attributes. Accordingly, the terms are interpreted broadly as information associated with a transaction that is available before or after completion of the transaction, respectively, and the interpretation applied in the Office Action remains proper.

Applicant further provides, on page 11 of the Remarks, a non-limiting example embodiment of dependent claim 9 from the Specification describing a weight generator module and a particular weighting equation. However, claim 9 does not recite the specific equation or implementation described in the example embodiment. Instead, the claim broadly recites generating weights such that transactions whose output predictions are further from a labeled classification receive greater weights. Arguments based on specific embodiments in the Specification are therefore not commensurate in scope with the claims. Accordingly, this discussion and any subsequent arguments relying on these specific embodiments do not change the prior art analysis set forth in the Office Action.

Applicant’s arguments on pages 12-13 of the Remarks that the cited references do not teach or suggest electronic payment transactions are not persuasive. Applicant correctly notes that Vapnik provides an illustrative example involving medical procedures such as surgeries (Vapnik, pp. 2024-2026). However, Vapnik is not relied upon for teaching electronic payment transactions.
Rather, Vapnik is relied upon for teaching a teacher-student machine learning framework (learning using privileged information) in which training examples include labeled input data together with additional “privileged information” available during training but not during testing. Vapnik explains that such privileged information may be generated or provided by an “intelligent teacher” during training, while the resulting model operates using only the standard input information at test time (Vapnik, pp. 2024-2026). The surgery example cited by Applicant is merely one illustrative application of the disclosed learning framework and does not limit Vapnik’s techniques to medical data. Vapnik explicitly explains that privileged information may exist for many machine-learning problems and states that “privileged information is ubiquitous: it usually exists for almost any machine learning problem” (Vapnik, p. 2026). Vapnik further provides additional examples demonstrating broader applicability. For example, Vapnik describes predicting the direction of a currency exchange rate using information available before a given time together with additional information afterward (Vapnik, p. 2026, Example 3), illustrating that the disclosed learning techniques apply to financial or transactional prediction problems.

Furlanello likewise teaches a teacher-student training framework, in which a student model is trained using guidance from a previously trained teacher model, including weighting training examples based on the teacher model’s outputs. Thus, Vapnik and Furlanello teach complementary techniques in which additional information or guidance is used during training to improve the resulting model.

The electronic payment transaction context recited in the amended claims is taught by Chari, which describes financial transaction log data including credit card transactions, point-of-sale transactions, online purchase transactions, and mobile payment transactions performed by client devices (Chari, paragraph [0034]). Chari further teaches applying machine-learning classifiers for such transaction data to detect fraudulent transactions. This corresponds to the fraud-detection context described in Applicant’s own specification (Spec, [0036], [0051], [0052]). Accordingly, the cited references collectively teach applying the machine-learning techniques described in Vapnik and Furlanello to electronic payment transaction data such as that described in Chari.

Furthermore, Vapnik teaches that training examples may include additional “privileged information” associated with completed events that becomes available after the primary event (Vapnik, p. 2026). Vapnik explains that such information may include observations occurring after the event, such as complications during surgery and the development of symptoms in the weeks following surgery. In the combined system described by the cited references, the primary event corresponds to a transaction, as Chari describes financial transaction log data including credit card transactions, point-of-sale transactions, online purchase transactions, and mobile payment transactions associated with users (Chari, paragraph [0034]). Under the broadest reasonable interpretation, Vapnik’s disclosure of information occurring after an event therefore corresponds to post-transaction information occurring after completion of a transaction.
A person of ordinary skill in the art would have understood that such post-transaction information may include both subsequent transactions associated with a user as well as other user activity occurring after completion of the transaction. Chari further describes transaction log data including multiple financial transactions associated with users, which reasonably includes subsequent transactions occurring after a prior transaction. Accordingly, the cited references collectively teach or suggest post-transaction information including subsequent transactions and non-transaction activity when training a classifier on completed electronic payment transactions. Therefore, Applicant’s argument is not persuasive.

Applicant’s arguments on page 13 of the Remarks regarding Vapnik’s discussion of separable and non-separable cases at section 3, page 2027 are not persuasive. Applicant focuses on Vapnik’s discussion of VC theory and the rate of convergence of learning algorithms. However, the rejection does not rely on this portion of Vapnik to teach the claimed “subsequent transactions occurring subsequent to a given transaction of a user.” Rather, the rejection relies on Vapnik’s disclosure that training examples may include additional “privileged information” associated with completed events that becomes available after the primary event (Vapnik, pp. 2024-2026). As explained in the Office Action, Vapnik teaches a machine-learning paradigm in which training examples may include standard information together with additional privileged information available during training. Vapnik explains that, for previous patients, additional information may include procedures and complications during surgery and the development of symptoms in the weeks following surgery (Vapnik, p. 2026). These disclosures describe information associated with events that occur after completion of an initial event. Under the broadest reasonable interpretation, such later-occurring events correspond to information associated with subsequent events following the initial event. In the context of electronic payment transactions, a person of ordinary skill in the art would have understood that analogous subsequent events may include subsequent transactions performed by the same user after a prior transaction, as well as other activity occurring after completion of an initial transaction. Applicant’s arguments regarding Vapnik’s discussion of separable versus non-separable cases at page 2027 therefore do not address the teachings relied upon in the rejection. The rejection does not rely on Vapnik’s VC-theory discussion to teach subsequent transactions or post-transaction information, and Applicant’s arguments directed to that portion of the reference are therefore not persuasive.

Applicant further argues that Furlanello does not teach post-transaction data. This argument is not persuasive because the rejection relies on the combined teachings of the cited references. Vapnik is relied upon for teaching the use of additional information associated with events occurring after an initial event during training. Furlanello is relied upon for teaching a teacher-student training framework in which a student classifier is trained using guidance derived from a previously trained teacher model, including generating weights for training examples based on the teacher model’s outputs.
Additionally, Chari teaches the electronic payment transaction context recited in the amended claims, including financial transaction log data containing credit card transactions, point-of-sale transactions, online purchase transactions, and mobile payment transactions, and further teaches applying machine-learning classifiers to detect fraudulent transactions using such transaction data. Accordingly, the cited references collectively teach or suggest training a classifier using completed transactions and information associated with subsequent transactions or other activity occurring after those transactions. Therefore, Applicant’s arguments are not persuasive.

Applicant’s arguments on page 15 of the Remarks that the cited reference does not teach or suggest “generating the respective weights provides greater weights for transactions whose output predictions are further from the labeled classification,” as recited in claim 9, are not persuasive. Applicant argues that Furlanello weights training examples based on teacher confidence rather than whether the teacher’s prediction is further from the correct classification. However, Furlanello explicitly teaches generating weights for training examples based on the output of a previously trained teacher classifier. In particular, Furlanello explains that the training procedure may weight each training example in the student’s loss function based on the teacher model’s confidence in the prediction for that example (Furlanello, Sec. 3.2, Eq. 8). Thus, the weight assigned to each example is determined using the teacher classifier’s output associated with that example. Under the broadest reasonable interpretation, the teacher model’s confidence reflects how closely the predicted output corresponds to the labeled classification. Accordingly, the confidence values disclosed by Furlanello provide a metric indicating the relationship between predicted outputs and labeled classifications. Because Furlanello generates training weights directly from the teacher classifier’s output probabilities, the reference teaches generating weights based on classifier predictions relative to labeled classifications. Applicant’s argument that Furlanello teaches weighting based solely on correctness rather than distance from the labeled classification is therefore not persuasive. The claim does not require a specific mathematical formulation for determining weights or require that the weighting be based on classification correctness. Rather, the claim broadly recites generating weights based on the relationship between predicted outputs and labeled classifications, which is satisfied by the weighting mechanism disclosed in Furlanello. Accordingly, Furlanello teaches generating respective weights based on the teacher model’s output confidence relative to the labeled classification, which reflects how closely the prediction corresponds to the labeled classification. Under the broadest reasonable interpretation, such confidence-based weighting corresponds to generating weights based on the relationship between predicted outputs and labeled classifications as recited in claim 9. Therefore, Applicant’s argument is not persuasive.

Applicant’s remarks have been fully considered but are not persuasive. For the reasons set forth above, the rejection of claims 2-21 under 35 U.S.C. 103 is maintained.
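For readers following the claim 9 dispute above, the two weighting schemes at issue can be sketched numerically. The following is a minimal illustrative sketch only, not the claimed method, Applicant's implementation, or Furlanello's exact Eq. 8: it contrasts weighting each example by the teacher's maximum output probability (the confidence-based scheme attributed to Furlanello) with a weighting that grows as the teacher's prediction moves further from the labeled classification (the reading Applicant urges for claim 9). The function names, the normalization, and the toy numbers are assumptions introduced for illustration.

```python
# Illustrative sketch only; not part of the Office Action record.
import numpy as np

def confidence_weights(teacher_probs: np.ndarray) -> np.ndarray:
    """Weight each example by the teacher's maximum output probability
    (confidence-style weighting), normalized to sum to the batch size."""
    conf = teacher_probs.max(axis=1)
    return conf * len(conf) / conf.sum()

def distance_weights(teacher_probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Weight each example by how far the teacher's prediction is from the
    labeled classification (greater weight where the prediction is further from the label)."""
    p_correct = teacher_probs[np.arange(len(labels)), labels]
    dist = 1.0 - p_correct
    return dist * len(dist) / (dist.sum() + 1e-12)

def weighted_cross_entropy(student_probs, labels, weights):
    """Student loss in which each example's cross-entropy term is scaled by its weight."""
    eps = 1e-12
    per_example = -np.log(student_probs[np.arange(len(labels)), labels] + eps)
    return float(np.mean(weights * per_example))

# Toy batch: three completed examples, two classes (e.g., approve / do-not-approve).
teacher_probs = np.array([[0.95, 0.05], [0.60, 0.40], [0.30, 0.70]])
student_probs = np.array([[0.80, 0.20], [0.50, 0.50], [0.40, 0.60]])
labels = np.array([0, 0, 0])  # labeled classifications for the completed examples

print(confidence_weights(teacher_probs))        # larger where the teacher is confident
print(distance_weights(teacher_probs, labels))  # larger where the teacher is further from the label
print(weighted_cross_entropy(student_probs, labels, distance_weights(teacher_probs, labels)))
```

On this toy batch the two rules order the weights oppositely, which is the distinction the parties dispute for claim 9.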
Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claims 2, 10, and 16 are rejected on the ground of non-statutory double patenting as being unpatentable over claims 1, 10, and 16 of U.S. Patent No. 11,321,632 B2. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims from U.S. Patent No. 11,321,632 B2 anticipate the claims in the instant application. See table below. Instant Application U.S. Patent No. 11,321,632 B2 2. A method, comprising 1.
A method comprising receiving, at a trained second classifier module of a computer system, an indication of a pending electronic payment transaction initiated by a user computing device, wherein a first classifier module was trained using first data for a set of completed electronic payment transactions as training data input, wherein the first data includes, for respective electronic payment transactions in the set of completed transactions both pre-transaction information and post-transaction information, wherein labeled classifications for transactions in the set of completed transactions are known, wherein the post-transaction information for a first electronic payment transaction in the set of completed transactions includes information for one or more electronic payment transactions performed subsequent to a completion of the first electronic payment transaction and non-transaction activity occurring subsequent to the completion of the first electronic payment transaction for a user associated with the first electronic payment transaction, (The difference is that the instant application adds “receiving, at a trained second classifier module,” replaces “correct classifications” with “labeled classifications,” changes “subsequent to the first transaction” to “ subsequent to a completion of the first transaction,” and deletes determiners before “pre-transaction information” and “post-transaction information.” It would have been obvious to a person of ordinary skill in the art to recite the second classifier as trained at this point because the same claim later trains the second classifier by generating weights and training on second data, so a trained second classifier is already established; the label, timing, and determiner edits are drafting clarifications without substantive effect. Hereafter, this is considered to be the explanation for the difference in the “trained second classifier module” and related phrasing. The instant claims further recite that the pending transaction is an electronic payment transaction; however, this limitation merely specifies a particular type of transaction and therefore does not render the claims patentably distinct from the transaction classification framework recited in U.S. Patent No. 11,321,632.) 
training a first classifier module using first data for a set of completed transactions as training data input, wherein the first data includes, for respective transactions in the set of completed transactions both a set of pre-transaction information and a set of post-transaction information, wherein correct classifications for transactions in the set of completed transactions are known, and wherein the set of post-transaction information for a first transaction in the set of completed transactions includes information for one or more transactions performed subsequent to the first transaction and non-transaction activity occurring subsequent to the first transaction for a user associated with the first transaction wherein the trained second classifier module was trained using operations comprising: generating respective weights for multiple transactions in the set of completed transactions based on classification outputs of the trained first classifier module for the multiple transactions; and generating respective weights for multiple transactions in the set of completed transactions based on classification outputs of the trained first classifier module for the multiple transactions; training, based on the generated weights, a second classifier module using second data for the set of completed transactions as training data input, wherein the second data for the set of completed transactions includes pre-transaction information for transactions in the set of completed transactions; training, based on the generated weights, a second classifier module using second data for the set of completed transactions as training data input, wherein the second data for the set of completed transactions includes pre-transaction information for transactions in the set of completed transactions; classifying, using the trained second classifier module of the computer system, the pending transaction based on pre-transaction information for the pending transaction; and classifying, using the trained second classifier module, one or more pending transactions based only on respective sets of pre-transaction information for the one or more pending transactions; and the computer system generating, based on a classification output by the trained second classifier for the pending transaction, an authorization decision for the pending transaction, wherein the authorization decision specifies an indication of approval or non-approval for the pending transaction. (The difference is that the instant application specifies “the computer system” as performing entity, using singular “classification output” and “pending transaction” instead of “classifications” and “one or more pending transactions,” omits “module” after “trained second classifier,” and expressly states that the authorization decision indicates approval or non-approval. It would have been obvious to state the performing entity, adopt a singular phrasing, streamline the term, and specify the binary outcome as routine drafting clarifications with no substantive change. Hereinafter, this is considered to be the explanation for “the computer system generating,” singular phrasing, omission of “module,” and the explicit approval/non-approval recital.) generating, based on classifications output by the trained second classifier module for the one or more pending transactions, an authorization decision for the one or more pending transactions. 10. 
A non-transitory computer-readable medium having stored thereon instructions that are executable by a computer system having a processor and a memory to cause the computer system to perform operations comprising: (The difference is that the instant application specifies a computer system with a processor and a memory, phrases the preamble as “having stored thereon instructions…to cause the computer system to perform operations,” and reorders “instructions stored thereon,” whereas the patent recites a computing device and “to perform operations.” It would have been obvious to specify processor and memory, to use “cause the computer system to perform,” and to reorder the phrase as routine program-product drafting with no substantive effect. Hereinafter, this is considered to be the explanation for the difference in entity identification and phrasing.) 10. A non-transitory computer-readable medium having instructions stored thereon that are executable by a computing device to perform operations comprising: receiving, at a trained second classifier module of a computer system, an indication of a pending electronic payment transaction initiated by a user computing device, wherein a first classifier module was trained using first data for a set of completed electronic payment transactions as training data input, wherein the first data includes, for respective electronic payment transactions in the set of completed transactions both pre-transaction information and post-transaction information, wherein labeled classifications for transactions in the set of completed transactions are known, wherein the post-transaction information for a first electronic payment transaction in the set of completed transactions includes information for one or more transactions performed subsequent to a completion of the first electronic payment transaction and non-transaction activity occurring subsequent to the completion of the first electronic payment transaction for a user associated with the first electronic payment transaction, (The difference is that the instant application adds “receiving, at a trained second classifier module,” replaces “correct classifications” with “labeled classifications,” changes “subsequent to the first transaction” to “ subsequent to a completion of the first transaction,” and deletes determiners before “pre-transaction information” and “post-transaction information.” It would have been obvious to a person of ordinary skill in the art to recite the second classifier as trained at this point because the same claim later trains the second classifier by generating weights and training on second data, so a trained second classifier is already established; the label, timing, and determiner edits are drafting clarifications without substantive effect. Hereafter, this is considered to be the explanation for the difference in the “trained second classifier module” and related phrasing. The instant claims further recite that the pending transaction is an electronic payment transaction; however, this limitation merely specifies a particular type of transaction and therefore does not render the claims patentably distinct from the transaction classification framework recited in U.S. Patent No. 11,321,632.) 
training a first classifier module using first data for a set of completed transactions as training data input, wherein the first data includes, for respective transactions in the set of completed transactions both a set of pre-transaction information and a set of post-transaction information, wherein correct classifications for transactions in the set of completed transactions are known, and wherein the set of post-transaction information for a first transaction in the set of completed transactions includes information for one or more transactions performed subsequent to the first transaction and non-transaction activity occurring subsequent to the first transaction for a user associated with the first transaction wherein the trained second classifier module was trained using operations comprising: generating respective weights for multiple transactions in the set of completed transactions based on classification outputs of the trained first classifier module for the multiple transactions; and generating respective weights for multiple transactions in the set of completed transactions based on classification outputs of the trained first classifier module for the multiple transactions; training, based on the generated weights, a second classifier module using second data for the set of completed transactions as training data input, wherein the second data for the set of completed transactions includes pre-transaction information for transactions in the set of completed transactions; training, based on the generated weights, a second classifier module using second data for the set of completed transactions as training data input, wherein the second data for the set of completed transactions includes pre-transaction information for transactions in the set of completed transactions; classifying, using the trained second classifier module of the computer system, the pending transaction based on pre-transaction information for the pending transaction; and classifying, using the trained second classifier module, one or more pending transactions based only on respective sets of pre-transaction information for the one or more pending transactions; and the computer system generating, based on a classification output by the trained second classifier for the pending transaction, an authorization decision for the pending transaction, wherein the authorization decision specifies an indication of approval or non-approval for the pending transaction. (The difference is that the instant application specifies “the computer system” as performing entity, using singular “classification output” and “pending transaction” instead of “classifications” and “one or more pending transactions,” omits “module” after “trained second classifier,” and expressly states that the authorization decision indicates approval or non-approval. It would have been obvious to state the performing entity, adopt a singular phrasing, streamline the term, and specify the binary outcome as routine drafting clarifications with no substantive change. Hereinafter, this is considered to be the explanation for “the computer system generating,” singular phrasing, omission of “module,” and the explicit approval/non-approval recital.) generating, based on classifications output by the trained second classifier module for the one or more pending transactions, an authorization decision for the one or more pending transactions. 16. 
A computer system, comprising: a processor; a network interface; and a non-transitory computer-readable medium having stored thereon instructions executable by the computer system to cause the computer system to perform operations comprising: (The difference is that the instant application recites a computer system with a processor, a network interface, and non-transitory computer-readable medium, and phrases the instructions as executable by the computer system to cause the computer system to perform operations, whereas the patent recites a system with at least one processor and a memory having instructions executable by the processor to cause the system to act. It would have been obvious to enumerate a network interface, to describe storage as a non-transitory computer-readable medium, to use singular processor phrasing, and to adopt the “cause the computer system to perform operations” wording as routine program-product drafting with no substantive change. Hereinafter, this is considered to be the explanation for the difference in component enumeration, storage terminology, processor phrasing, and “cause to perform” wording.) 16. A system, comprising: at least one processor; and a memory having instructions stored thereon that are executable by the at least one processor to cause the system to: receiving, at a trained second classifier module of a computer system, an indication of a pending electronic payment transaction initiated by a user computing device, wherein a first classifier module was trained using first data for a set of completed electronic payment transactions as training data input, wherein the first data includes, for respective electronic payment transactions in the set of completed transactions both pre-transaction information and post-transaction information, wherein labeled classifications for transactions in the set of completed transactions are known, and wherein the post-transaction information for a first electronic payment transaction in the set of completed transactions includes information for one or more transactions performed subsequent to a completion of the first electronic payment transaction and non-transaction activity occurring subsequent to the completion of the first electronic payment transaction for a user associated with the first electronic payment transaction, (The difference is that the instant application adds “receiving, at a trained second classifier module,” replaces “correct classifications” with “labeled classifications,” changes “subsequent to the first transaction” to “ subsequent to a completion of the first transaction,” and deletes determiners before “pre-transaction information” and “post-transaction information.” It would have been obvious to a person of ordinary skill in the art to recite the second classifier as trained at this point because the same claim later trains the second classifier by generating weights and training on second data, so a trained second classifier is already established; the label, timing, and determiner edits are drafting clarifications without substantive effect. Hereafter, this is considered to be the explanation for the difference in the “trained second classifier module” and related phrasing. The instant claims further recite that the pending transaction is an electronic payment transaction; however, this limitation merely specifies a particular type of transaction and therefore does not render the claims patentably distinct from the transaction classification framework recited in U.S. Patent No. 
11,321,632.) training a first classifier module using first data for a set of completed transactions as training data input, wherein the first data includes, for respective transactions in the set of completed transactions both a set of pre-transaction information and a set of post-transaction information, wherein correct classifications for transactions in the set of completed transactions are known, and wherein the set of post-transaction information for a first transaction in the set of completed transactions includes information for one or more transactions performed subsequent to the first transaction and non-transaction activity occurring subsequent to the first transaction for a user associated with the first transaction wherein the trained second classifier module was trained using operations comprising: generating respective weights for multiple transactions in the set of completed transactions based on classification outputs of the trained first classifier for the multiple transactions; and generating respective weights for multiple transactions in the set of completed transactions based on classification outputs of the trained first classifier module for the multiple transactions; training, based on the generated weights, a second classifier module using second data for the set of completed transactions as training data input, wherein the second data for the set of completed transactions includes pre-transaction information for transactions in the set of completed transactions; training, based on the generated weights, a second classifier module using second data for the set of completed transactions as training data input, wherein the second data for the set of completed transactions includes pre-transaction information for transactions in the set of completed transactions; classifying, using the trained second classifier module of the computer system, the pending transaction based on pre-transaction information for the pending transaction; and classifying, using the trained second classifier module, one or more pending transactions based only on respective sets of pre-transaction information for the one or more pending transactions; and the computer system generating, based on a classification output by the trained second classifier for the pending transaction, an authorization decision for the pending transaction, wherein the authorization decision specifies an indication of approval or non-approval for the pending transaction. (The difference is that the instant application specifies “the computer system” as performing entity, using singular “classification output” and “pending transaction” instead of “classifications” and “one or more pending transactions,” omits “module” after “trained second classifier,” and expressly states that the authorization decision indicates approval or non-approval. It would have been obvious to state the performing entity, adopt a singular phrasing, streamline the term, and specify the binary outcome as routine drafting clarifications with no substantive change. Hereinafter, this is considered to be the explanation for “the computer system generating,” singular phrasing, omission of “module,” and the explicit approval/non-approval recital.) generating, based on classifications output by the trained second classifier module for the one or more pending transactions, an authorization decision for the one or more pending transactions. 
Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 6-10, 12-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Vapnik et al., (NPL: “Learning Using Privileged Information: Similarity Control and Knowledge Transfer” (Published: 2015)) in view of Furlanello et al., (NPL: “Born-Again Neural Networks” (Published: May 2018)) further in view of Alemdar et al., (NPL: “Ternary Neural Networks for Resource-Efficient AI Applications” (Published: 2016)) further in view of Chari et al., (Pub. No.: US 20170140382 A1 (Filed: 2015)).

Regarding claim 2, Vapnik in view of Furlanello further in view of Alemdar teaches the following limitations: A method, comprising: receiving, at a trained second classifier module of a computer system, an indication of a pending electronic payment transaction initiated by a user computing device (Furlanello, Introduction, col. 2, second last paragraph, the paper describes a process where, after a teacher model has been fully converged, a new student model is initialized and trained. Section 3.2, col. 2, last paragraph, The student model is trained with the objective of matching the teacher’s outputs, using methods like “Confidence-Weighted by Teacher Max” (CWTM) to weigh each example in the student’s loss function based on the teacher’s confidence. Section 5.2, table 2, It establishes that the “BAN student” models are “trained” with various losses, including “teacher loss” and “label loss.” It explicitly refers to the performance of these trained models on new data by mentioning the “test error on CIFAR-100.” In machine learning, “test data” is, by definition, new, unseen data used to evaluate a trained model’s ability to classify. Therefore, this section describes a trained student model classifying new data. Vapnik, page 2024, section 2, the paper describes a “test stage” where a “Student” model operates “without the supervision of Teacher.” This is analogous to real-time classification of a pending transaction. Page 2026, This document also provides a conceptual example of a prediction that is made “before the surgery,” which is analogous to a transaction that has been initiated but is still pending.
Chari, paragraph [0034] “client device 114 include transaction log data 116… the transaction log data may include, for example, financial transactions performed on a point-of-sale terminal, financial transactions performed on an automated teller machine, credit card account transaction logs, bank account transaction logs, online purchase transaction logs, mobile phone transaction payment logs, and the like.” – describes electronic payment transactions initiated through a user computing device. [0096] “To score current financial transactions…can be defined in either an unsupervised or a supervised manner. Possible examples of supervised fraud scoring functions S( ) may include logistic regression or support vector machines. These supervised machine learning systems require a set of labeled transactions (i.e., known instances of fraudulent transactions, such as fraudulent transaction data 246 in FIG. 2) to train a classifier.” – Chari further explains that machine learning techniques may be used to analyze such transactions, which corresponds to a trained classifier module used to analyze financial transactions. Alemdar, introduction, The hardware described in this paper is designed for resource-efficient AI applications on “ubiquitous computing devices such as smart phones, wearable and autonomous drones”. This shows that the trained student network could be deployed on a user computing device to perform its classification task.) wherein a first classifier module was trained using first data (Furlanello, Introduction “after teacher model converges, we initialize a new student and train it with the dual goals of predicting the correct labels and matching the output distribution” – teacher trained first) for a set of completed electronic payment transactions as training data input (Vapnik, page 2024, section 2, “given a set of training examples…” page 2026, the paper also provides examples using “historical data” and data from “previous patients,” which is analogous to completed transactions. Chari, paragraph [0034] further describes transaction log data corresponding to electronic financial transactions including online purchase transactions and mobile phone payment transactions. Such transaction log data represents historical electronic payment transactions that may be used as training data for machine learning classifiers.), wherein the first data includes, for respective electronic payment transactions in the set of completed transactions both pre-transaction information and post-transaction information (Vapnik, page 2024, the paper provides a direct analogy by describing a model that is trained on a standard dataset x (pre-event information) and additional “privileged information” x* (post-completion information) that is only available during the first (teacher) training. Pg.
2026 “we considered three examples of privileged information that could be generated by Intelligent teacher.”), wherein labeled classifications for transactions in the set of completed transactions are known (Furlanello, Introduction “new student and train it with the dual goals of predicting the correct labels” – student is trained with known labels), wherein the post-transaction information for a first electronic payment transaction in the set of completed transactions includes information for one or more electronic payment transactions performed subsequent to a completion of the first electronic payment transaction and non-transaction activity occurring subsequent to the completion of the first electronic payment transaction for a user associated with the first transaction (Vapnik, page 2026, “we use pairs (xi, yi) from previous patients. However, for previous patients there is also additional information x* about procedures and complications during surgery, development of symptoms in one or two weeks after surgery, and so on.” – The “privileged information” x* is analogous to “post-transaction information” which happens subsequent to a completion of the procedure (analogous to transaction). Chari, paragraph [0034] further describes transaction log data corresponding to electronic financial transactions including online purchase transactions and mobile phone payment transactions. Such transaction log data represents historical electronic payment transactions that may be used as training data for machine learning classifiers.). wherein the trained second classifier module was trained using operations comprising: generating respective weights for multiple transactions in the set of completed transactions based on classification outputs of the trained first classifier module for the multiple transactions; (Furlanello, Introduction “A dark knowledge term, containing the information on the wrong outputs, and a ground-truth component which corresponds to a simple rescaling of the original gradient that would be obtained using real labels. We interpret the second term as training from the real labels using importance weights and for each sample based on the teacher’s (first classifier) confidence in its maximum value. Experiments investigating the importance of each term are aimed at quantifying the contribution of dark knowledge to the success” Section 3.2 describes the methodology used to calculate the weight) and training, based on the generated weights, a second classifier module using second data for the set of completed transactions as training data input, wherein the second data for the set of completed transactions includes pre-transaction information for transactions in the set of completed transactions (Furlanello, page 4, section 3.2 “we weight each example in the student’s loss function…by the confidence of the teacher model on that example” – second classifier is trained on generated weights. Page 2, “labels using importance weights for each sample based on the teacher’s confidence in its maximum value.” – per-sample weights from teacher outputs. Introduction, “after the teacher model converges, we initialize a new student and train it…matching the output distribution of the teacher” – confirms training a second classifier.
Vapnik, page 2024, “The existing machine learning paradigm considers a simple scheme: given a set of training examples” – training a completed set; and page 2026 “Let our goal be to predict the direction of the exchange rate of a currency at the moment t.” – pre-event information.); classifying, using the trained second classifier module of the computer system, the pending transaction based on pre-transaction information for the pending transaction (Furlanello, Introduction, col. 2, second last paragraph: The paper describes a process where, after a teacher model has been fully trained and converged, a new student model is initialized and trained. This establishes the “trained second classifier.” Section 5.2, table 2: This section gives empirical evidence that the trained student model is used to classify new inputs by reporting “test error rates” on new data. In machine learning, “test data” is, by definition, new, unseen data used to evaluate a trained model’s ability to classify. Vapnik, page 2024, section 2: The paper provides the conceptual framework that “privileged information is available only at the training stage…and is not available at the test stage (when a Student operates without supervision of Teacher)”. This confirms that the student’s classification must be based on the information available at the time of the event, analogous to pre-transaction data. Page 2026: This section gives a general example of a classification decision, referring to the goal to “classify biopsy images…into two categories.” Alemdar, page 6-7: This paper describes a hardware architecture designed for a computer system with a processor. This hardware can be implemented on platforms such as “Field-Programmable Gate Array (FPGA) and Application-specific Integrated Circuit (ASIC) technologies” and can be coupled with “general-purpose multi-core processors.”) and the computer system generating, based on a classification output by the trained second classifier for the pending transaction, an authorization decision for the pending transaction, wherein the authorization decision specifies an indication of approval or non-approval for the pending transaction (Alemdar, page 6-7: This paper describes a hardware architecture designed for a computer system with a processor. This hardware can be implemented on platforms such as “Field-Programmable Gate Array (FPGA) and Application-specific Integrated Circuit (ASIC) technologies” and can be coupled with “general-purpose multi-core processors.” Furlanello, Introduction, col. 2, second last paragraph: The paper describes a process where “after a teacher model converges, we initialize a new student and train it.” This establishes the “trained second classifier module.” Section 5.2, table 2: This section gives empirical evidence that the trained student model is used to classify new inputs by reporting “test error rates” on new data. In machine learning, “test data” is, by definition, new, unseen data used to evaluate a trained model’s ability to classify. Vapnik, page 2026: The paper provides a conceptual example of a prediction that is made “before the surgery,” which is analogous to a transaction that has been initiated but still pending. It also provides a direct analogy for a classification decision that specifies an indication of approval or non-approval by referring to the goal to “classify biopsy images…into two categories…cancer…non-cancer.” This binary classification is analogous to approval or non-approval outcomes.)
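As an aside for readers following the mapping above, the two-stage training flow recited in claim 2 can be sketched in a few lines. This is a minimal illustrative sketch on synthetic data, not Applicant's implementation and not the systems of the cited references; the feature split, the confidence-based weighting rule, the scikit-learn stand-in models, and the 0.5 approval threshold are assumptions introduced for illustration.

```python
# Illustrative sketch only; synthetic data and assumed names throughout.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
pre = rng.normal(size=(n, 4))    # pre-transaction information (available while a transaction is pending)
post = rng.normal(size=(n, 3))   # post-transaction information (available only for completed transactions)
labels = (pre[:, 0] + post[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)  # labeled classifications

# 1) First classifier ("teacher"): trained on first data = pre- AND post-transaction information.
first = LogisticRegression(max_iter=1000).fit(np.hstack([pre, post]), labels)

# 2) Respective weights for the completed transactions, generated from the
#    first classifier's outputs (here, its confidence in its own prediction).
weights = first.predict_proba(np.hstack([pre, post])).max(axis=1)

# 3) Second classifier ("student"): trained on second data = pre-transaction
#    information only, with the generated weights applied to the training examples.
second = LogisticRegression(max_iter=1000).fit(pre, labels, sample_weight=weights)

# 4) Classify a pending transaction from its pre-transaction information alone,
#    then turn the classification output into an authorization decision.
pending_pre = rng.normal(size=(1, 4))
p_flagged = second.predict_proba(pending_pre)[0, 1]
decision = "non-approval" if p_flagged > 0.5 else "approval"
print(round(p_flagged, 3), decision)
```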
Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having Vapnik, Furlanello, Chari, and Alemdar before them, to implement a teacher-student workflow for analyzing electronic payment transactions on conventional machine learning computing hardware (processor, memory, program storage) as exemplified by Alemdar. In such a system, a teacher model may be trained on completed labeled examples that include pre-event information together with post-completion information (per Vapnik), the teacher’s outputs may be used to compute per-example importance weights, and a student model may then be trained on pre-event information using those weights (per Furlanello), with the student classifier operating on incoming records using pre-event information. Chari further teaches electronic financial transactions performed via client devices and electronic transaction channels (e.g., online purchases and mobile payment transactions) and describes the use of machine learning classifiers trained using labeled transactions for detecting fraudulent transactions. One would have been motivated to apply the improved machine learning training techniques of Vapnik and Furlanello within the electronic transaction analysis framework described by Chari, and implement the resulting trained classifier on computing hardware as described by Alemdar, in order to make effective use of information from completed events while keeping the operational classification path simple, thereby improving the accuracy, calibration, and stability of classification outcomes for electronic payment transactions.

Regarding claim 6, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 2, therefore is rejected for the same reasons as those presented for claim 2. Vapnik in view of Furlanello further in view of Alemdar further teaches: wherein the pending transaction is an electronic purchase transaction initiated by the user computing device via a software program executing on the user computer device (Alemdar, page 1, Abstract: The article discusses AI tasks being deployed on “ubiquitous computing devices such as smart phones, wearables and autonomous drones.” This supports the concept of a user computing device. The article also refers to “AI Applications,” which is analogous to a software program executing on a device. Vapnik, page 2026: Example 3 provides a conceptual example of a financial-related event by discussing how to predict “the direction of the exchange rate of a currency”. This is an analogy for an electronic purchase transaction.)

Regarding claim 7, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 2, therefore is rejected for the same reasons as those presented for claim 2. Furlanello further teaches: wherein the training based on the generated weights includes one or more of: performing a greater number of training iterations for a second transaction than for a third transaction based on the second transaction having a greater weight than the third transaction; or performing a greater training adjustment for a second transaction than for the third transaction based on the second transaction having a greater weight than the third transaction (Furlanello, page 4, section 3.2: This section describes a weighting mechanism based on a teacher model’s outputs.
It notes “knowledge distillation might resemble importance-weighting where the weight corresponds to the teacher’s confidence in the correct prediction.” The training process can be interpreted as using this importance weighting for each sample to perform a greater training adjustment for a sample with a greater weight. For example, the gradient from the correct choice is “rescaled by a factor p*,s”. This rescaling factor, which represents the teacher’s confidence, functions as an importance weight and is the mechanism for making a greater training adjustment. This principle of giving more importance to certain samples can also be applied by performing a greater number of training iterations for a sample with a greater weight.). Regarding claim 8, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 2, therefore is rejected for the same reasons as those presented for claim 2. Vapnik in view of Furlanello further teaches: wherein the generating the respective weights is based on relationships between labeled classifications and output predictions generated by the trained first classifier module for the transactions (Furlanello, Introduction, The paper states that the student is trained with “dual goals of predicting the correct labels and matching the output distribution of the teacher.” The “output distribution of the teacher” is the output prediction generated by the first classifier module. The paper also states that “we interpret the second term as training from the real labels using importance weights for each sample based on the teacher’s confidence in its maximum value.” The teacher’s confidence is the output prediction that is used to generate the weights. Vapnik, page 2024, section 2: The paper states that “Intelligence Teacher provides Student with information that contains, along with classification of each example, additional privileged information.” The “classification of each example” is the output prediction generated by the teacher model.). Regarding claim 9, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 2, therefore is rejected for the same reasons as those presented for claim 2. Furlanello further teaches: wherein the generating the respective weights provides greater weights for transactions whose output predictions are further from a labeled classification (Furlanello, page 4, section 3.2: This section describes a weighting mechanism based on a teacher model’s outputs. It notes that “knowledge distillation might resemble importance-weighting where the weight corresponds to the teacher’s confidence in the correct prediction.” This establishes a relationship where the weight (rescaling factor) is determined by the output prediction (confidence) of the teacher model for a given sample. The paper states that “samples with lower confidence have their gradients rescaled by a factor p*, s and have reduced contribution to overall training signal.” This demonstrates how the weight influences the training process.). Regarding claim 10, Vapnik in view of Furlanello further in view of Alemdar teaches the following limitations: A non-transitory computer-readable medium having stored thereon instructions that are executable by a computer system having a processor and a memory to cause the computer system to perform operations comprising (Alemdar, pages 6-7, Fig. 
3, The paper refers to “reprogrammable off-the-shelf circuits” like FPGAs and also notes that each layer of the hardware design contains a “memory that can be programmed at run-time with neuron weights or output ternarization thresholds.” This “memory” stores the data and instructions that cause the system to perform its operations, such as processing “one new input item per clock cycle.”). receiving, at a trained second classifier module of the computer system, an indication of a pending electronic payment transaction initiated by a user computing device (Furlanello, Introduction “after teacher model converges, we initialize a new student and train it..” and “Test error on CIFAR-100 for …students trained from…teachers” – established a trained student used at inference; Vapnik, page 2024 “privileged information is available only at the training stage…and is not available at the test stage (the Student operates without supervision of Teacher)” – student runs at test time. It would be inherent that a trained second classifier module is executed by a computer system and necessarily receives an input record to classify. Chari, paragraph [0034] “client device 114 include transaction log data 116… the transaction log data may include, for example, financial transactions performed on a point-of-sale terminal, financial transactions performed on an automated teller machine, credit card account transaction logs, bank account transaction logs, online purchase transaction logs, mobile phone transaction payment logs, and the like.” – describes electronic payment transactions initiated through user computing device. [0096] “To score current financial transactions…can be defined in either an unsupervised or a supervised manner. Possible examples of supervised fraud scoring functions S( ) may include logistic regression or support vector machines. These supervised machine learning systems require a set of labeled transactions (i.e., known instances of fraudulent transactions, such as fraudulent transaction data 246 in FIG. 2) to train a classifier.” – Chari further explains that machine learning techniques maybe be used to analyze such transactions, which corresponds to a trained classifier module used to analyze financial transactions.), wherein a first classifier module was trained using first data (Furlanello, Introduction “after teacher model converges, we initialize a new student and train it with the dual goals of predicting the correct labels and matching the output distribution” – teacher trained first) for a set of completed electronic payment transactions as training data input (Vapnik, page 2024, section 2, “given a set of training examples…” page 2026, the paper also provides examples using “historical data” and data from “previous patients,” which is analogous to completed transactions. Chari, paragraph [0034] further describes transaction log data corresponding to electronic financial transactions including online purchase transactions and mobile phone payment transactions. 
Such transaction log data represents historical electronic payment transactions that may be used as training data for machine learning classifiers.), wherein the first data includes, for respective electronic payment transactions in the set of completed transactions both pre-transaction information and post-transaction information (Vapnik, page 2024, the paper provides a direct analogy by describing a model that is trained on a standard dataset x (pre-information data) and additional “privileged information” x* (post-information) that is only available during the first training -teacher. Pg. 2026 “we considered three examples of privileged information that could generated by Intelligent teacher.”), wherein labeled classifications for transactions in the set of completed transactions are known (Furlanello, Introduction “new student and train it with the dual goals of predicting the correct labels” – student is trained with known labels) , wherein the post-transaction information for a first electronic payment transaction in the set of completed transactions includes information for one or more transactions performed subsequent to a completion of the first electronic payment transaction and non-transaction activity occurring subsequent to the completion of the first electronic payment transaction for a user associated with the first electronic payment transaction (Vapnik, page 2026, “we use pairs (xi, yi) from previous patients. However, for previous patients there is also additional information x* about procedures and complications during surgery, development of symptoms in one or two weeks after surgery, and so on.” – The “privilege information” x* is analogous to “post-transaction information” which happens subsequent to a completion of the procedure (analogous to transaction). Chari, paragraph [0034] further describes transaction log data corresponding to electronic financial transactions including online purchase transactions and mobile phone payment transactions. Such transaction log data represents historical electronic payment transactions that may be used as training data for machine learning classifiers.). wherein the trained second classifier module was trained using operations comprising: generating respective weights for multiple transactions in the set of completed transactions based on classification outputs of the trained first classifier module for the multiple transactions; (Furlanello, Introduction “A dark knowledge term, containing the information on the wrong outputs, and a ground-truth component which corresponds to a simple rescaling of the original gradient that would be obtained using real labels. We interpret the second term as training from the real labels using importance weights and for each sample based on the teacher’s (first classifier) confidence in its maximum value. Experiments investigating the importance of each term are aimed at quantifying the contribution of dark knowledge to the success” Section 3.2 describes the methodology used to calculate the weight) and training, based on the generated weights, a second classifier module using second data for the set of completed transactions as training data input, wherein the second data for the set of completed transactions includes pre-transaction information for transactions in the set of completed transactions (Furlanello, page 4 , section 3.2 “we weight each example in the student’s loss function…by the confidence of the teacher model on that example” – second classifier is trained on generated weights. 
Page 2, “labels using importance weights for each sample based on the teacher’s confidence in its maximum value.” – per-sample weights from teacher outputs. Introduction, “after the teacher model converges, we initialize a new student and train it…matching the output distribution of the teacher” – confirms training a second classifier. Vapnik, page 2024, “The existing machine learning paradigm considers a simple scheme: given a set of training examples” – training a completed set; and page 2026 “Let our goal be to predict the direction of the exchange rate of a currency at the moment t.” – pre-event information.); classifying, using the trained second classifier module of the computer system, the pending transaction based on pre-transaction information for the pending transaction (Furlanello, Introduction, col. 2, second last paragraph: The paper describes a process where, after a teacher model has been fully trained and converged, a new student model is initialized and trained. This establishes the “trained second classifier.” Section 5.2, table 2: This section gives empirical evidence that the trained student model is used to classify new inputs by reporting “test error rates” on new data. In machine learning, “test data” is, by definition, new, unseen data used to evaluate a trained model’s ability to classify. Vapnik, page 2024, section 2: The paper provides the conceptual framework that “privileged information is available only at the training stage…and is not available at the test stage (when a Student operates without supervision of Teacher)” This confirms that the student’s classification must be based on the information available at the time of the event, analogous to pre-transaction data. Page 2026: This section gives a general example of a classification decision, referring to the goal to “classify biopsy images…into two categories.” Alemdar, page 6-7: This paper describes a hardware architecture designed for a computer system with a processor. This hardware can be implemented on platforms such as “Field-Programmable Gate Array (FPGA) and Application-specific Integrated Circuit (ASIC) technologies” and can be coupled with “general-purpose multi-core processors.”) and the computer system generating, based on a classification output by the trained second classifier for the pending transaction, an authorization decision for the pending transaction, wherein the authorization decision specifies an indication of approval or non-approval for the pending transaction (Alemdar, page 6-7: This paper describes a hardware architecture designed for a computer system with a processor. This hardware can be implemented on platforms such as “Field-Programmable Gate Array (FPGA) and Application-specific Integrated Circuit (ASIC) technologies” and can be coupled with “general-purpose multi-core processors.”). Furlanello, Introduction, col. 2, second last paragraph: The paper describes a process where “after a teacher model converges, we initialize a new student and train it.” This establishes the “trained second classifier module.” Section 5.2, table 2: This section gives empirical evidence that the trained student model is used to classify new inputs by reporting “test error rates” on new data. In machine learning, “test data” is, by definition, new, unseen data used to evaluate a trained model’s ability to classify. 
Vapnik, page 2026: The paper provides a conceptual example of a prediction that is made “before the surgery,” which is analogous to a transaction that has been initiated but is still pending. It also provides a direct analogy for a classification decision that specifies an indication of approval or non-approval by referring to the goal to “classify biopsy images…into two categories…cancer…non-cancer.” This binary classification is analogous to approval or non-approval outcomes.)

Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having Vapnik, Furlanello, Chari, and Alemdar before them, to implement a teacher-student workflow for analyzing electronic payment transactions on conventional machine learning computing hardware (processor, memory, program storage) as exemplified by Alemdar. In such a system, a teacher model may be trained on completed labeled examples that include pre-event information together with post-completion information (per Vapnik), the teacher’s outputs may be used to compute per-example importance weights, and a student model may then be trained on pre-event information using those weights (per Furlanello), with the student classifier operating on incoming records using pre-event information. Chari further teaches electronic financial transactions performed via client devices and electronic transaction channels (e.g., online purchases and mobile payment transactions) and describes the use of machine learning classifiers trained using labeled transactions for detecting fraudulent transactions. One would have been motivated to apply the improved machine learning training techniques of Vapnik and Furlanello within the electronic transaction analysis framework described by Chari, and to implement the resulting trained classifier on computing hardware as described by Alemdar, in order to make effective use of information from completed events while keeping the operational classification path simple, thereby improving the accuracy, calibration, and stability of classification outcomes for electronic payment transactions.

Regarding claim 12, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 10, therefore is rejected for the same reasons as those presented in claim 10. The claim recites similar limitations corresponding to claim 6 and is rejected for similar reasons as claim 6 using similar teachings and rationale.

Regarding claim 13, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 10, therefore is rejected for the same reasons as those presented in claim 10. The claim recites similar limitations corresponding to claim 7 and is rejected for similar reasons as claim 7 using similar teachings and rationale.

Regarding claim 14, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 10, therefore is rejected for the same reasons as those presented in claim 10. The claim recites similar limitations corresponding to claim 8 and is rejected for similar reasons as claim 8 using similar teachings and rationale.

Regarding claim 15, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 10, therefore is rejected for the same reasons as those presented in claim 10. The claim recites similar limitations corresponding to claim 9 and is rejected for similar reasons as claim 9 using similar teachings and rationale.
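For illustration only, the teacher-student mapping applied to claims 10-15 above can be summarized in a minimal sketch. This is not the applicant's disclosed implementation or code from any cited reference; the feature arrays (X_pre, X_post), the confidence-based weighting formula, the choice of a gradient-boosted student (which also anticipates the claim 17 discussion below), and the 0.5 decision threshold are all assumptions made for the example.

```python
# Illustrative sketch only (not the applicant's implementation or any cited
# reference's code): a teacher model is trained on pre- plus post-transaction
# features, its per-example outputs are turned into importance weights, and a
# student model is trained on pre-transaction features alone with those weights.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 1000
X_pre = rng.normal(size=(n, 5))    # hypothetical pre-transaction features
X_post = rng.normal(size=(n, 3))   # hypothetical post-transaction features
y = (X_pre[:, 0] + X_post[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# 1. Teacher ("first classifier"): trained on pre- plus post- (privileged) information.
teacher = LogisticRegression().fit(np.hstack([X_pre, X_post]), y)

# 2. Per-example weights derived from the teacher's outputs relative to the labels.
#    Here weight = teacher's confidence in the true label; other mappings
#    (e.g., larger weights where the prediction is further from the label)
#    follow the same pattern with a different formula.
conf = teacher.predict_proba(np.hstack([X_pre, X_post]))[np.arange(n), y]
weights = conf

# 3. Student ("second classifier"): trained on pre-transaction features only,
#    using the generated weights; a gradient-boosted tree model is used here
#    because it accepts per-example weights directly.
student = GradientBoostingClassifier().fit(X_pre, y, sample_weight=weights)

# 4. At decision time only pre-transaction information is available; the
#    classification output drives an approval / non-approval decision.
pending = rng.normal(size=(1, 5))
p_bad = student.predict_proba(pending)[0, 1]
decision = "non-approval" if p_bad > 0.5 else "approval"
print(decision, round(p_bad, 3))
```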
Regarding claim 16, Vapnik in view of Furlanello further in view of Alemdar teaches the following limitations: A computer system, comprising: a processor (Alemdar, page 6-7, Fig. 3, the paper discusses the use of a hardware architecture that can be implemented on specialized chips like an ASIC, and also mentions that “high-performance cloud solutions use high-end FPGA tightly coupled with general-purpose multi-core processors.”); a network interface (Alemdar, Abstract, The document describes the system being used for “ubiquitous computing devices such as smart phones” and “cloud solutions,” both of which rely on network connectivity, thus implying a network interface.); and a non-transitory computer-readable medium having stored thereon instructions executable by the computer system to cause the computer system to perform operations comprising (Alemdar, pages 6-7, Fig. 3, The paper refers to “reprogrammable off-the-shelf circuits” like FPGAs and also notes that each layer of the hardware design contains a “memory that can be programmed at run-time with neuron weights or output ternarization thresholds.” This “memory” stores the data and instructions that cause the system to perform its operations, such as processing “one new input item per clock cycle.”). receiving, at a trained second classifier module of the computer system, an indication of a pending electronic payment transaction initiated by a user computing device (Furlanello, Introduction “after teacher model converges, we initialize a new student and train it..” and “Test error on CIFAR-100 for …students trained from…teachers” – established a trained student used at inference; Vapnik, page 2024 “privileged information is available only at the training stage…and is not available at the test stage (the Student operates without supervision of Teacher)” – student runs at test time. It would be inherent that a trained second classifier module is executed by a computer system and necessarily receives an input record to classify. Chari, paragraph [0034] “client device 114 include transaction log data 116… the transaction log data may include, for example, financial transactions performed on a point-of-sale terminal, financial transactions performed on an automated teller machine, credit card account transaction logs, bank account transaction logs, online purchase transaction logs, mobile phone transaction payment logs, and the like.” – describes electronic payment transactions initiated through user computing device. [0096] “To score current financial transactions…can be defined in either an unsupervised or a supervised manner. Possible examples of supervised fraud scoring functions S( ) may include logistic regression or support vector machines. These supervised machine learning systems require a set of labeled transactions (i.e., known instances of fraudulent transactions, such as fraudulent transaction data 246 in FIG. 
2) to train a classifier.” – Chari further explains that machine learning techniques maybe be used to analyze such transactions, which corresponds to a trained classifier module used to analyze financial transactions.), wherein a first classifier module was trained using first data (Furlanello, Introduction “after teacher model converges, we initialize a new student and train it with the dual goals of predicting the correct labels and matching the output distribution” – teacher trained first) for a set of completed electronic payment transactions as training data input (Vapnik, page 2024, section 2, “given a set of training examples…” page 2026, the paper also provides examples using “historical data” and data from “previous patients,” which is analogous to completed transactions. Chari, paragraph [0034] further describes transaction log data corresponding to electronic financial transactions including online purchase transactions and mobile phone payment transactions. Such transaction log data represents historical electronic payment transactions that may be used as training data for machine learning classifiers.), wherein the first data includes, for respective electronic payment transactions in the set of completed transactions both pre-transaction information and post-transaction information (Vapnik, page 2024, the paper provides a direct analogy by describing a model that is trained on a standard dataset x (pre-information data) and additional “privileged information” x* (post-information) that is only available during the first training -teacher. Pg. 2026 “we considered three examples of privileged information that could generated by Intelligent teacher.”), wherein labeled classifications for transactions in the set of completed transactions are known (Furlanello, Introduction “new student and train it with the dual goals of predicting the correct labels” – student is trained with known labels) , wherein the post-transaction information for a first electronic payment transaction in the set of completed transactions includes information for one or more transactions performed subsequent to a completion of the first electronic payment transaction and non-transaction activity occurring subsequent to the completion of the first electronic payment transaction for a user associated with the first electronic payment transaction (Vapnik, page 2026, “we use pairs (xi, yi) from previous patients. However, for previous patients there is also additional information x* about procedures and complications during surgery, development of symptoms in one or two weeks after surgery, and so on.” – The “privilege information” x* is analogous to “post-transaction information” which happens subsequent to a completion of the procedure (analogous to transaction). Chari, paragraph [0034] further describes transaction log data corresponding to electronic financial transactions including online purchase transactions and mobile phone payment transactions. Such transaction log data represents historical electronic payment transactions that may be used as training data for machine learning classifiers.). 
wherein the trained second classifier module was trained using operations comprising: generating respective weights for multiple transactions in the set of completed transactions based on classification outputs of the trained first classifier for the multiple transactions; (Furlanello, Introduction “A dark knowledge term, containing the information on the wrong outputs, and a ground-truth component which corresponds to a simple rescaling of the original gradient that would be obtained using real labels. We interpret the second term as training from the real labels using importance weights and for each sample based on the teacher’s (first classifier) confidence in its maximum value. Experiments investigating the importance of each term are aimed at quantifying the contribution of dark knowledge to the success” Section 3.2 describes the methodology used to calculate the weight) and training, based on the generated weights, a second classifier module using second data for the set of completed transactions as training data input, wherein the second data for the set of completed transactions includes pre-transaction information for transactions in the set of completed transactions (Furlanello, page 4 , section 3.2 “we weight each example in the student’s loss function…by the confidence of the teacher model on that example” – second classifier is trained on generated weights. Page 2, “labels using importance weights for each sample based on the teacher’s confidence in its maximum value.” – per-sample weights from teacher outputs. Introduction, “after the teacher model converges, we initialize a new student and train it…matching the output distribution of the teacher” – confirms training a second classifier. Vapnik, page 2024, “The existing machine learning paradigm considers a simple scheme: given a set of training examples” – training a completed set; and page 2026 “Let our goal be to predict the direction of the exchange rate of a currency at the moment t.” – pre-event information.); classifying, using the trained second classifier module of the computer system, the pending transaction based on pre-transaction information for the pending transaction (Furlanello, Introduction, col. 2, second last paragraph: The paper describes a process where, after a teacher model has been fully trained and converged, a new student model is initialized and trained. This establishes the “trained second classifier.” Section 5.2, table 2: This section gives empirical evidence that the trained student model is used to classify new inputs by reporting “test error rates” on new data. In machine learning, “test data” is, by definition, new, unseen data used to evaluate a trained model’s ability to classify. Vapnik, page 2024, section 2: The paper provides the conceptual framework that “privileged information is available only at the training stage…and is not available at the test stage (when a Student operates without supervision of Teacher)” This confirms that the student’s classification must be based on the information available at the time of the event, analogous to pre-transaction data. Page 2026: This section gives a general example of a classification decision, referring to the goal to “classify biopsy images…into two categories.” Alemdar, page 6-7: This paper describes a hardware architecture designed for a computer system with a processor. 
This hardware can be implemented on platforms such as “Field-Programmable Gate Array (FPGA) and Application-specific Integrated Circuit (ASIC) technologies” and can be coupled with “general-purpose multi-core processors.”) and the computer system generating, based on a classification output by the trained second classifier for the pending transaction, an authorization decision for the pending transaction, wherein the authorization decision specifies an indication of approval or non-approval for the pending transaction (Alemdar, page 6-7: This paper describes a hardware architecture designed for a computer system with a processor. This hardware can be implemented on platforms such as “Field-Programmable Gate Array (FPGA) and Application-specific Integrated Circuit (ASIC) technologies” and can be coupled with “general-purpose multi-core processors.”). Furlanello, Introduction, col. 2, second last paragraph: The paper describes a process where “after a teacher model converges, we initialize a new student and train it.” This establishes the “trained second classifier module.” Section 5.2, table 2: This section gives empirical evidence that the trained student model is used to classify new inputs by reporting “test error rates” on new data. In machine learning, “test data” is, by definition, new, unseen data used to evaluate a trained model’s ability to classify. Vapnik, page 2026: The paper provides a conceptual example of a prediction that is made “before the surgery,” which is analogous to a transaction that has been initiated but is still pending. It also provides a direct analogy for a classification decision that specifies an indication of approval or non-approval by referring to the goal to “classify biopsy images…into two categories…cancer…non-cancer.” This binary classification is analogous to approval or non-approval outcomes.)

Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having Vapnik, Furlanello, Chari, and Alemdar before them, to implement a teacher-student workflow for analyzing electronic payment transactions on conventional machine learning computing hardware (processor, memory, program storage) as exemplified by Alemdar. In such a system, a teacher model may be trained on completed labeled examples that include pre-event information together with post-completion information (per Vapnik), the teacher’s outputs may be used to compute per-example importance weights, and a student model may then be trained on pre-event information using those weights (per Furlanello), with the student classifier operating on incoming records using pre-event information. Chari further teaches electronic financial transactions performed via client devices and electronic transaction channels (e.g., online purchases and mobile payment transactions) and describes the use of machine learning classifiers trained using labeled transactions for detecting fraudulent transactions. One would have been motivated to apply the improved machine learning training techniques of Vapnik and Furlanello within the electronic transaction analysis framework described by Chari, and to implement the resulting trained classifier on computing hardware as described by Alemdar, in order to make effective use of information from completed events while keeping the operational classification path simple, thereby improving the accuracy, calibration, and stability of classification outcomes for electronic payment transactions.
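The per-example gradient rescaling attributed to Furlanello, section 3.2 in the mappings for claims 7-9 and 18-20 can likewise be sketched in a few lines, assuming a plain logistic model on synthetic data; the function name weighted_grad, the learning rate, and the randomly drawn stand-in for teacher confidence are illustrative assumptions, not details taken from the reference.

```python
# Minimal numpy sketch (an assumption-laden illustration, not Furlanello's code)
# of importance-weighted training: each example's gradient contribution to a
# logistic model is rescaled by a per-example weight taken from a teacher's
# confidence, so lower-weight examples contribute less to each update.
# Making extra passes over high-weight examples would be the "greater number
# of training iterations" variant of the same idea.
import numpy as np

def weighted_grad(w, X, y, sample_weight):
    """Gradient of a weighted cross-entropy loss for a linear logistic model."""
    p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted probabilities
    per_example = (p - y)[:, None] * X         # unweighted per-example gradients
    return (sample_weight[:, None] * per_example).mean(axis=0)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(float)
teacher_conf = rng.uniform(0.5, 1.0, size=200)  # stand-in for teacher confidence

w = np.zeros(4)
for _ in range(100):                             # plain gradient descent
    w -= 0.5 * weighted_grad(w, X, y, teacher_conf)
print(w.round(3))
```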
Regarding claim 18, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 16, therefore is rejected for the same reasons as those presented in claim 16. The claim recites similar limitations corresponding to claim 7 and is rejected for similar reasons as claim 7 using similar teachings and rationale.

Regarding claim 19, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 16, therefore is rejected for the same reasons as those presented in claim 16. The claim recites similar limitations corresponding to claim 8 and is rejected for similar reasons as claim 8 using similar teachings and rationale.

Regarding claim 20, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 16, therefore is rejected for the same reasons as those presented in claim 16. The claim recites similar limitations corresponding to claim 9 and is rejected for similar reasons as claim 9 using similar teachings and rationale.

Claim 3 is rejected under the 35 U.S.C. 103 as being unpatentable over Vapnik et al., (NPL: “Learning Using Privileged Information: Similarity Control and Knowledge Transfer” (Published: 2015) in view of Furlanello et al., (NPL: “Born-Again Neural Networks” (Published: May 2018)) further in view of Alemdar et al., (NPL: “Ternary Neural Networks for Resource-Efficient AI Applications” (Published: 2016)) further in view of Ding et al., (Pub. No.: US 20140324699 A1 (Filed: 2014)).

Regarding claim 3, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 2, therefore is rejected for the same reasons as those presented in claim 2. However, Vapnik in view of Furlanello further in view of Alemdar do not teach, but Ding does teach the limitation: wherein the post-transaction information includes location data for a plurality of user devices (Ding [0058] “The authorization request message may also include other information such as information that identifies the access device that generated the authorization request message, information about the location of the access device, etc.”).

Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having Vapnik, Furlanello, Alemdar, and Ding before them, to incorporate location data for a plurality of user devices in implementing a teacher-student workflow. One would have been motivated to do so because multi-user location provides strong, routinely collected contextual signals – supporting cohort baselines and peer-group comparisons, spatiotemporal consistency checks (e.g., impossible travel), detection of coordinated/co-located patterns, and broader coverage through cross-user corroboration – thereby improving the accuracy, calibration, and stability of operational classification outcomes.

Claim 4 is rejected under the 35 U.S.C. 103 as being unpatentable over Vapnik et al., (NPL: “Learning Using Privileged Information: Similarity Control and Knowledge Transfer” (Published: 2015) in view of Furlanello et al., (NPL: “Born-Again Neural Networks” (Published: May 2018)) further in view of Alemdar et al., (NPL: “Ternary Neural Networks for Resource-Efficient AI Applications” (Published: 2016)) further in view of Wilson et al., (Pub. No.: US 20170140262 A1 (Filed: 2017)).
Regarding claim 4, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 2, therefore is rejected for the same reasons as those presented in claim 2. However, Vapnik in view of Furlanello further in view of Alemdar do not teach, but Wilson does teach the limitation: wherein the post-transaction information includes Internet browsing history data associated with a plurality of user devices (Wilson, paragraph [0065] “An individual's web browsing history or ant trail can also provide insight into affinity for certain venues, as discerned from cookies or the various reviews an individual generates across multiple forums, including but not limited to websites associated with each venue. An individual's website navigation bookmarks and browsing history also reflect browsing behavior and may likewise be mined for source data. The geographic position of an individual over time, such as derived from cellular GPS data, can likewise be correlated with venues and thereby generate data reflective of venue affinity.”). Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having Vapnik, Furlanello, Alemdar, and Wilson before them, to specify that the post-completion information includes internet browsing history data associated with a plurality of user devices. One would have been motivated to make such a combination to understand behavioral patterns of users for more accurate predictions. Claims 5, 11, and 21 are rejected under the 35 U.S.C. 103 as being unpatentable over Vapnik et al., (NPL: “Learning Using Privileged Information: Similarity Control and Knowledge Transfer” (Published: 2015) in view of Furlanello et al., (NPL: “Born-Again Neural Networks” (Published: May 2018)) further in view of Alemdar et al., (NPL: “Ternary Neural Networks for Resource-Efficient AI Applications” (Published: 2016)) further in view of De Zeeuw et al., (Pub. No.: US 20130138427 A1 (Filed: 2011)). Regarding claim 5, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 2, therefore is rejected for the same reasons as those presented in claim 2. However, Vapnik in view of Furlanello further in view of Alemdar do not teach, but De Zeeuw does teach the limitation: wherein the post-transaction information includes textual data input by a plurality of users associated with the post-transaction information (Abstract “a method executed by at least one processor includes receiving text from submitted by a user. The method also includes determining a text score for the received text by comparing a first set of phrases included in the received text to a second set of phrases. The second set of phrases includes phrases from stored text. The stored text includes stored text known to be genuine and stored text known to be fraudulent. The method also includes determining that the received text is fraudulent based on the text score.”). Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having Vapnik, Furlanello, Alemdar, and De Zeeuw before them, to incorporate textual data input by a plurality of users into post-completion information when implementing a teacher-student workflow. One would have been motivated to do so to capture user-reported context and intent across users for better predictions. 
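For context on the post-transaction information categories addressed in claims 3, 4, and 5 (device location data, Internet browsing history data, and user-entered text), a hypothetical container for such signals might look like the following; every field name and the simple numeric encoding are assumptions made for illustration and are not drawn from the claims or from the cited references.

```python
# Hypothetical sketch of the kinds of post-transaction signals discussed for
# claims 3, 4, and 5; field names and the feature encoding are illustrative
# assumptions only.
from dataclasses import dataclass, field

@dataclass
class PostTransactionInfo:
    subsequent_txn_amounts: list[float] = field(default_factory=list)          # later transactions by the same user
    device_locations: list[tuple[float, float]] = field(default_factory=list)  # lat/lon from multiple user devices
    browsing_urls: list[str] = field(default_factory=list)                     # Internet browsing history data
    user_text: list[str] = field(default_factory=list)                         # textual data input by users (e.g., disputes)

    def to_features(self) -> list[float]:
        # One simple numeric encoding; a real system could use any featurization.
        return [
            float(len(self.subsequent_txn_amounts)),
            sum(self.subsequent_txn_amounts),
            float(len(set(self.device_locations))),
            float(len(self.browsing_urls)),
            float(sum(len(t) for t in self.user_text)),
        ]

post = PostTransactionInfo(
    subsequent_txn_amounts=[12.5, 80.0],
    device_locations=[(37.77, -122.42), (40.71, -74.01)],
    browsing_urls=["https://example.com/deals"],
    user_text=["Item never arrived"],
)
print(post.to_features())
```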
Regarding claim 11, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 10, therefore is rejected for the same reasons as those presented in claim 10. The claim recites similar limitations corresponding to claim 5 and is rejected for similar reasons as claim 5 using similar teachings and rationale.

Regarding claim 21, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 16, therefore is rejected for the same reasons as those presented in claim 16. The claim recites similar limitations corresponding to claim 5 and is rejected for similar reasons as claim 5 using similar teachings and rationale.

Claim 17 is rejected under the 35 U.S.C. 103 as being unpatentable over Vapnik et al., (NPL: “Learning Using Privileged Information: Similarity Control and Knowledge Transfer” (Published: 2015) in view of Furlanello et al., (NPL: “Born-Again Neural Networks” (Published: May 2018)) further in view of Alemdar et al., (NPL: “Ternary Neural Networks for Resource-Efficient AI Applications” (Published: 2016)) further in view of Liu et al., (Pub. No.: US 20180158552 A1 (Filed: 2017)).

Regarding claim 17, Vapnik in view of Furlanello further in view of Alemdar teaches all the elements of claim 16, therefore is rejected for the same reasons as those presented in claim 16. However, Vapnik in view of Furlanello further in view of Alemdar do not teach, but Liu does teach the limitation: wherein the training the second classifier module based on the generated weights is performed using gradient boosting tree machine learning techniques (Liu, paragraph [0052]: “In some embodiments, the learning processor may also output the interpretable machine learning model. For example, if the interpretable machine learning model is a gradient boosting tree model then the learning processor may output the various decision trees and weights of the gradient boosting tree mimic model.”).

Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having Vapnik, Furlanello, Alemdar, and Liu before them, to implement training the second classifier module, based on generated weights, using gradient boosting decision tree techniques. One would have been motivated to make such a combination because gradient boosting trees accept per-example weights directly, capture nonlinear feature interactions in tabular transaction data without heavy feature engineering, address class imbalance via weighting, and deploy efficiently within the computing architecture – thereby improving the accuracy, calibration, and stability of operational classification outcomes.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.
In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Daravanh Phakousonh whose telephone number is (571)272-6324. The examiner can normally be reached Mon - Thurs 7 AM - 5 PM, Every other Friday 7 AM - 4PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached at 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /Daravanh Phakousonh/Examiner, Art Unit 2121 /Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121

Prosecution Timeline

Jan 31, 2022
Application Filed
Feb 08, 2022
Response after Non-Final Action
Sep 05, 2025
Non-Final Rejection — §103, §DP
Nov 29, 2025
Interview Requested
Dec 10, 2025
Applicant Interview (Telephonic)
Dec 10, 2025
Examiner Interview Summary
Dec 23, 2025
Response Filed
Mar 05, 2026
Final Rejection — §103, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12572821
ACCURACY PRIOR AND DIVERSITY PRIOR BASED FUTURE PREDICTION
2y 5m to grant Granted Mar 10, 2026
Study what changed to get past this examiner. Based on the 1 most recent grant.


Prosecution Projections

3-4
Expected OA Rounds
50%
Grant Probability
99%
With Interview (+100.0%)
4y 0m
Median Time to Grant
Moderate
PTA Risk
Based on 2 resolved cases by this examiner. Grant probability derived from career allow rate.
