Prosecution Insights
Last updated: April 19, 2026
Application No. 17/810,123

INTEGRATED SYNTHETIC LABELING OPTIMIZATION FOR MACHINE LEARNING

Status: Final Rejection (§103), OA Round 4 (Final)
Filed: Jun 30, 2022
Examiner: MEIS, JON CHRISTOPHER
Art Unit: 2654
Tech Center: 2600 — Communications
Assignee: PayPal, Inc.

Grant probability: 46% (Moderate); 99% with interview
Expected OA rounds: 5-6
Time to grant: 3y 0m

Examiner Intelligence

Career allow rate: 46% (10 granted / 22 resolved), -16.5% vs Tech Center average
Interview lift: +59.0% among resolved cases with interview
Typical timeline: 3y 0m average prosecution; 30 applications currently pending
Career history: 52 total applications across all art units

Statute-Specific Performance

§101: 24.9% (-15.1% vs TC avg)
§103: 49.7% (+9.7% vs TC avg)
§102: 12.9% (-27.1% vs TC avg)
§112: 10.6% (-29.4% vs TC avg)

Tech Center averages are estimates. Based on career data from 22 resolved cases.

Office Action (§103)
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claims 1-9, 12-16, and 21-26 are pending. Claims 1, 12, and 21 are independent. This Application was published as US 20240005099. Apparent priority is 30 June 2022. Applicant’s amendments and arguments are considered but are either unpersuasive or moot in view of the new grounds of rejection that, if presented, were necessitated by the amendments to the Claims. This action is Final.

Response to Arguments — 35 USC 103

Applicant’s first arguments with respect to claim(s) 12 (pg. 10-11) have been considered but are not persuasive. In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

Applicant argues that “Zinkevich does not teach or suggest to execute a pair of jointly optimized label and supervised models in unison to classify a new data sample, much less that this execution includes generating synthetic labels using a label model and inputting the synthetic labels into a supervised model that corresponds to the label model.”

As argued in previous office actions, Suri discloses executing a pair of jointly optimized label and supervised models in unison to classify a new data sample. (Regarding classifying new data samples, see for example, Suri pg. 8, Section 6. Specifically: “We sample live traffic after this time to generate unlabeled data independent of previously labeled image data, ensuring no train-test leakage.”) Suri discloses generating synthetic labels using a label model, but Suri does not explicitly disclose inputting the synthetic labels into a supervised model that corresponds to the label model.
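Read the way the examiner reads it, the disputed limitation amounts to running the label model and its corresponding supervised model in unison on a new sample, with the synthetic label fed to the supervised model as an input feature. A minimal hypothetical sketch of that reading (all names, rules, and weights below are illustrative, not drawn from Suri, Zinkevich, or the application):

```python
# Hypothetical sketch of the disputed limitation: the label model's
# synthetic label is fed to the corresponding supervised model as one of
# its input features. All names, rules, and weights are illustrative.

def heuristic_label_model(sample):
    """Toy label model: a single domain heuristic -> synthetic label."""
    return 1 if "cat" in sample["text"] else 0

def supervised_model(features):
    """Stand-in supervised model: a fixed linear rule over the features."""
    weights = [0.2, 0.9]  # second weight attaches to the synthetic label
    score = sum(w * f for w, f in zip(weights, features))
    return 1 if score > 0.5 else 0

def classify(sample):
    synthetic_label = heuristic_label_model(sample)  # run label model first
    # The sample's own feature plus the synthetic label go into the
    # supervised model, i.e. the heuristic output becomes a feature.
    features = [sample["length_norm"], float(synthetic_label)]
    return supervised_model(features)

print(classify({"text": "a page about cats", "length_norm": 0.1}))  # prints 1
```

The sketch shows only the mechanics at issue: both models execute on the new sample, and the label model's output is an input of the supervised model.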
However, Zinkevich discloses inputting a heuristic into a model as a feature. (“Rule #7: Turn heuristics into features, or handle them externally.” Zinkevich pg. 6.) As noted by Applicant, Zinkevich defines a feature as: “Feature: A property of an instance used in a prediction task. For example, a web page might have a feature "contains the word 'cat'".” (pg. 2) Using a heuristic as a feature is the same as inputting the result of the heuristic into a model. Zinkevich does not explicitly disclose that heuristics are used to generate labels. However, Suri does disclose this. (“Rule-Based Services. Teams develop heuristics and rules to make manually collecting, analyzing and labeling data more efficient.” Suri pg. 5, Section 3.1.1) The instant application spec also describes synthetic labels as being generated by heuristic rules: “[0016] …The synthetic labels are generated using heuristic rules which are derived from domain knowledge…” Using the result of a heuristic as a synthetic label and also using the result of a heuristic as a feature to be input to the model is equivalent to using the synthetic label as a feature.

Therefore, the combination of Suri and Zinkevich does teach both executing a pair of jointly optimized label and supervised models in unison to classify a new data sample, and generating synthetic labels using a label model and inputting the synthetic labels into a supervised model that corresponds to the label model. Therefore, the rejection is maintained.

Kanter is also cited in the independent claims regarding selecting the best performing trained model, but this limitation is not specifically argued in the most recent communication.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-3, 6-9, 12-13, 15-16, 21, and 24-26 are rejected under 35 U.S.C. 103 as being unpatentable over Suri et al. (“Leveraging Organizational Resources to Adapt Models to New Data Modalities”), in view of Kanter et al. (US 20220351004 A1) and Zinkevich (“Rules of Machine Learning: Best Practices for ML Engineering”).

Regarding claim 1, Suri discloses:

1. A method for jointly optimizing a desired label model and a corresponding desired supervised model for a classification problem, comprising: generating, by a computer system based on different subsets of a set of rules, (Figure 6 - Subsets of features A, B, C, and D (A, AB, ABC, ABCD) are shown. Figures 1 and 2 show that the features contain rules.) respective sets of synthetic labels for unlabeled data for the classification problem, ("We automatically generate labels for the new, unlabeled data modality to develop a training dataset." Pg. 4, Section 2.4.2) wherein a given one of the respective sets of synthetic labels is produced by a corresponding one of a plurality of different label models; ("Factor analysis for CT 1 demonstrating the increase in AUPRC when adding additional sets of features (A,B,C,D)” Figure 6 - The combinations of feature sets generated by different services can be understood as label models, as seen in Fig. 6.) fitting, by the computer system, a set of supervised models, a given one of the set of supervised models being fitted with one of the respective sets of synthetic labels to produce a respective set of predictions; ("We train a model using both the weakly supervised data in the new modality and fully supervised data of existing modalities." Pg. 4, Section 2.4.2) evaluating, by the computer system, the set of supervised models based on their respective set of predictions ("We evaluate four types of services used to generate feature sets: URL-based, keyword-based, topic-model-based, page content-based, labeled as sets A, B, C, and D, which provide us with 3, 2, 5, and 5 features, respectively" Pg. 8, Section 6.2; Figure 6) and using a set of labeled data for the classification problem; ("We compute the area under the precision-recall curve (AUPRC) over the labeled image test set to evaluate our pipeline." Pg. 9, Section 6.3) selecting, by the computer system based on the evaluating, a particular label model from the plurality of different label models and a corresponding particular supervised model from the set of supervised models (Fig. 6 shows the feature sets (rule models) correspond to each model for training) and classifying, using both the particular label model and the corresponding particular supervised model, a newly received data sample, (Table 1 shows some of the data points are reserved as a test set, which would be a newly received sample.)
wherein the classifying includes: inputting the newly received data sample into the particular label model; and inputting the newly received data sample and one or more synthetic labels generated by the particular label model into the corresponding particular supervised model. (“We sample live traffic after this time to generate unlabeled data independent of previously labeled image data, ensuring no train-test leakage.” Pg. 8, Section 6.1 - performing the test would be putting newly received samples into the supervised model.) (Suri does not explicitly disclose that the synthetic label is input into the supervised model for classification.)

Suri does not disclose: selecting, by the computer system based on the evaluating, a particular label model from the plurality of different label models and a corresponding particular supervised model from the set of supervised models. Suri also does not disclose classifying a new sample using both the label model and the supervised model.

Kanter discloses: selecting, by the computer system based on the evaluating, a particular label model from the plurality of different label models and a corresponding particular supervised model from the set of supervised models. (Kanter: "The ranking module 350 selects one of the trained models based on the ranking, e.g., the training model having the best performance." [0063] - The supervised models taught by Suri correspond to the feature sets (label models), and are only identified based on the feature sets. Therefore, using Kanter’s method for selecting one of the supervised models taught by Suri would include selecting the corresponding feature set, because they are inseparably linked.)

Suri and Kanter are considered analogous art to the claimed invention because they disclose methods of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Suri to include a ranking module that selects a model based on ranking as taught by Kanter. Doing so would have been beneficial in order to select the model with the best performance. (Kanter [0063])

Kanter also does not disclose classifying a new sample using both the label model and the supervised model.

Zinkevich discloses: classifying, using both the particular label model and the corresponding particular supervised model, (“Create a feature. Directly creating a feature from the heuristic is great. For example, if you use a heuristic to compute a relevance score for a query result, you can include the score as the value of a feature.” Pg. 7, para 3 - Creating a feature from the heuristic rule set would mean that the label model (heuristic rule set) is used on the input sample, and the output of the label model is fed into the supervised model.) wherein the classifying includes: inputting the newly received data sample into the particular label model; (“Directly creating a feature from the heuristic is great.” Pg. 7, para 3 - A heuristic rule set is equivalent to a label model.) and inputting the newly received data sample and one or more synthetic labels generated by the particular label model into the corresponding particular supervised model. (“Directly creating a feature from the heuristic is great.” Pg. 7, para 3 - Creating a feature from the heuristic rule set would mean that the output of the label model (heuristic rule set) is fed into the supervised model. See also “Feature: A property of an instance used in a prediction task…” pg. 2)

Suri, Kanter, and Zinkevich are considered analogous art to the claimed invention because they disclose methods of machine learning.
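The joint-optimization loop recited in claim 1 can be sketched in a hypothetical, deliberately simplified form. Every rule, sample, and toy "model" below is illustrative (Suri's actual pipeline trains real classifiers and evaluates with AUPRC); the point is only the claimed sequence: rule subsets act as label models, one supervised model is fitted per synthetic label set, and the best pair is selected by evaluation against labeled data.

```python
# Hypothetical sketch of the claim 1 pipeline: each subset of a set of
# heuristic rules acts as a label model, a supervised model is fitted per
# synthetic label set, and the best (label model, supervised model) pair
# is selected by evaluating predictions against a small labeled set.
from itertools import combinations

RULES = {
    "has_cat": lambda s: "cat" in s,
    "has_dog": lambda s: "dog" in s,
    "is_long": lambda s: len(s) > 20,
}

def make_label_model(rule_subset):
    """Label model: majority vote of its rule subset -> synthetic label."""
    def label_model(sample):
        votes = [RULES[name](sample) for name in rule_subset]
        return 1 if 2 * sum(votes) >= len(votes) else 0
    return label_model

def fit_supervised(synthetic_labels):
    """Toy supervised model: remembers the majority synthetic label and
    otherwise passes the label model's output through at classify time."""
    default = round(sum(synthetic_labels) / len(synthetic_labels))
    return lambda sample, synth=None: default if synth is None else synth

unlabeled = ["a cat video", "a dog photo", "stock market news headline"]
labeled = [("my cat", 1), ("my dog", 1), ("tax forms", 0)]

best = None
for k in (1, 2, 3):
    for subset in combinations(RULES, k):
        lm = make_label_model(subset)
        synth = [lm(s) for s in unlabeled]   # one synthetic label set
        sm = fit_supervised(synth)           # fitted on that label set
        acc = sum(sm(x, lm(x)) == y for x, y in labeled) / len(labeled)
        if best is None or acc > best[0]:
            best = (acc, subset, lm, sm)

acc, subset, lm, sm = best
new_sample = "a cute cat gif"
prediction = sm(new_sample, lm(new_sample))  # both models run in unison
```

The final line is the limitation in dispute: the selected label model and its corresponding supervised model both execute on the newly received sample, with the synthetic label passed into the supervised model.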
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Suri in view of Kanter to include the heuristic rule set as a feature. Doing so would have been beneficial in order to utilize the intuition about the system rather than discard it. (Zinkevich pg. 7, para 1)

Regarding claim 2, Suri discloses: 2. The method of claim 1, wherein the classification problem is a text-classification problem. (Pg. 8, Section 6.1 - Suri teaches entity classification including text and images.) Suri does not explicitly teach the above method of claim 1 wherein the text data is unlabeled. (Pg. 3, paragraph 4 - Suri discloses that the text data points are previously hand labeled.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to generate labels for unlabeled text data. Suri discloses generating additional labels for previously labeled text data. (Figure 3 - the figure shows that organizational resources are applied to the text data to produce new labels with classifiers, rules, etc.; Figure 6 - shows that combinations of the sets of features (labels) are trained with the text data.) It would have been obvious to generate the labels for unlabeled text data if previously labeled data was not available.

Regarding claim 3, Suri discloses: The method of claim 2, wherein each supervised model is also fitted using a set of general features. ("a baseline fully supervised model trained using pre-trained image embeddings." Figure 6 - The feature sets are added to this pretrained model.)

Regarding claim 6, Suri discloses: 6. The method of claim 1, further comprising: utilizing the corresponding particular supervised model to evaluate the classification problem for subsequently generated data samples. (Pg. 1, Section 1: “The moderators must thus classify new video posts for the same violations as the text and image posts.” - The new video posts are subsequently generated data samples, which the model is trained for in this example.)

Regarding claim 7, Suri discloses: The method of claim 1, wherein the evaluating includes evaluating the respective set of predictions ("We evaluate four types of services used to generate feature sets: URL-based, keyword-based, topic-model-based, page content-based, labeled as sets A, B, C, and D, which provide us with 3, 2, 5, and 5 features, respectively" Pg. 8, Section 6.2; Figure 6) according to the set of labeled data. ("We compute the area under the precision-recall curve (AUPRC) over the labeled image test set to evaluate our pipeline." Pg. 9, Section 6.3)

Regarding claim 8, Suri discloses: 8. The method of claim 1, further comprising: determining whether a number of samples included in the set of labeled data meets a sample training threshold. (“Training Data Curation: use weak supervision with label propagation. Given the common feature space, one approach to cross-modal adaptation is to train a model with labeled data from existing modalities using just the features shared between modalities. We can then perform inference over the new data modality using the shared features. We find that this baseline performs worse than training a model on the target modality with respect to AUPRC, likely due to distribution differences in the feature spaces. Thus, we still require labeled training data in the new modality.” Pg. 2, para 4 - Worse performance with the limited labels compared to another model would be understood as a threshold.)

Regarding claim 9, Suri discloses: 9. The method of claim 1, wherein a size of the set of labeled data is sufficient to evaluate, but not train, the set of supervised models based on the set of labeled data including a threshold number of samples. (Pg. 2, para 4, quoted above - Suri discloses that the data in the scenario is insufficient to train the model (the model performs worse) but sufficient to evaluate (it is evaluated with respect to AUPRC). Worse performance with the limited labels compared to another model would be understood as a threshold.)

Claim 12 is a non-transitory computer-readable medium claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale. Additionally, “A non-transitory computer-readable medium having instructions stored thereon that are executable by a computer system” of the Claim is taught by Suri. (Pg. 9, Section 6.3 - The experimental setup mentions several software components [Snorkel Drybell, Expander, TFX] which intrinsically require a computer-readable medium.)

Regarding claim 13, Suri further discloses: The computer-readable medium of claim 12, wherein the different rule sets are different subsets (Figure 6 - Subsets of features A, B, C, and D (A, AB, ABC, ABCD) are shown. Figures 1 and 2 show that the features contain rules.) of a plurality of domain-specific heuristics for the classification problem. ("Teams develop heuristics and rules" Pg. 5, Section 3.1.1; “Domain experts typically construct LFs using task expertise.” Pg. 2, Section 1 - Suri teaches that heuristics are developed by domain experts, so it would be obvious that they are domain-specific.)

Claim 15 is a non-transitory computer-readable medium claim with limitations corresponding to the limitations of Claim 8 and is rejected under similar rationale. Claim 16 is a non-transitory computer-readable medium claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.

Claim 21 is a system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale. Additionally, “at least one processor; and a memory having instructions stored thereon that are executable by the at least one processor to cause the system to perform the method of claim 1” of the Claim is taught by Suri. (Pg. 9, Section 6.3 - The experimental setup mentions several software components [Snorkel Drybell, Expander, TFX] which intrinsically require a processor and a memory.)

Claim 24 is a non-transitory computer-readable medium claim with limitations corresponding to the limitations of Claim 7 and is rejected under similar rationale. Claim 25 is a system claim with limitations corresponding to the limitations of Claim 7 and is rejected under similar rationale. Claim 26 is a non-transitory computer-readable medium claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.

Claims 4, 14, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Suri in view of Kanter and Zinkevich, in further view of Yang (US 20200142948 A1).

Regarding claim 4, Suri discloses: The method of claim 3. Suri does not disclose: wherein the set of general features specifies values for a word embedding vector. Neither does Kanter or Zinkevich.
Yang discloses: wherein the set of general features specifies values for a word embedding vector. ("Word embedding layers are generally used to map words to numeric vectors, and the use of word embedding layers with vectors representing words is well-known in the art." [0054]) Suri, Kanter, Zinkevich, and Yang are considered analogous art to the claimed invention because they disclose methods of machine learning for natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Suri in view of Kanter and Zinkevich with the teaching of Yang to use a word embedding vector. This would have been a known method with predictable results.

Regarding claim 14, Suri discloses: The computer-readable medium of claim 12, wherein the classification problem is a text-classification problem, (Pg. 8, Section 6.1 - Suri teaches entity classification including text and images.) and wherein each of the plurality of supervised models is also fitted using a set of vector values for terms in a word embedding space. Suri does not disclose word embedding vectors. Neither does Kanter or Zinkevich. Yang discloses: using a set of vector values for terms in a word embedding space. ([0054], quoted above.) Suri, Kanter, Zinkevich, and Yang are considered analogous art to the claimed invention because they disclose methods of machine learning for natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Suri in view of Kanter and Zinkevich with the teaching of Yang to use a word embedding vector. This would have been a known method with predictable results.
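The "general features as word embedding vectors" limitation can be pictured with a toy lookup table: each term maps to a small numeric vector, and a sample's general feature vector is the mean of its term vectors. The table, dimensions, and averaging scheme below are illustrative assumptions, not taken from Yang or the application:

```python
# Hypothetical sketch: general features as word embedding vectors.
# Each known term maps to a toy 3-dimensional vector; a text sample's
# feature vector is the mean of its term vectors. All values illustrative.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "tax": [0.0, 0.1, 0.9],
}
DIM = 3

def embed(text):
    """Mean of the embedding vectors of known terms (zeros if none known)."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]

features = embed("cat dog")  # vector fed to a supervised model as features
```

In claim 14's framing, a supervised model would be fitted on such vectors alongside the synthetic labels.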
Claim 22 is a system claim with limitations corresponding to the limitations of Claim 14 and is rejected under similar rationale.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Suri in view of Kanter and Zinkevich, in further view of D'Ercoli et al. (US 20200189103 A1).

Regarding claim 5, Suri discloses: The method of claim 2. Suri does not disclose: wherein the text-classification problem is sentiment analysis. Neither does Kanter or Zinkevich. D'Ercoli discloses: wherein the text-classification problem is sentiment analysis. ("The Naïve Bayes classifier machine learning algorithm, which has been studied since the 1950s, is a well-known tool and often used in text classification, spam filtering, and sentiment analysis domains" [0070]) Suri, Kanter, Zinkevich, and D’Ercoli are considered analogous art to the claimed invention because they disclose methods of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Suri in view of Kanter and Zinkevich to use a machine learning algorithm for a text sentiment classification problem as taught by D’Ercoli. Using a machine learning algorithm for sentiment analysis would have been a known method with predictable results.

Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Suri in view of Kanter, Zinkevich, and Yang, in further view of D’Ercoli. Regarding claim 23, Suri discloses: The system of claim 22. Suri does not disclose: wherein the text-classification problem is sentiment analysis. Neither does Kanter, Zinkevich, or Yang. D'Ercoli discloses: wherein the text-classification problem is sentiment analysis. ([0070], quoted above.) Suri, Kanter, Zinkevich, Yang, and D’Ercoli are considered analogous art to the claimed invention because they disclose methods of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Suri in view of Kanter, Zinkevich, and Yang to use a machine learning algorithm for a text sentiment classification problem as taught by D’Ercoli. Using a machine learning algorithm for sentiment analysis would have been a known method with predictable results.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JON C MEIS, whose telephone number is (703) 756-1566. The examiner can normally be reached Monday - Thursday, 8:30 am - 5:30 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hai Phan, can be reached at 571-272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JON CHRISTOPHER MEIS/
Examiner, Art Unit 2654

/HAI PHAN/
Supervisory Patent Examiner, Art Unit 2654

Prosecution Timeline

Jun 30, 2022
Application Filed
Sep 12, 2024
Non-Final Rejection — §103
Nov 18, 2024
Interview Requested
Dec 04, 2024
Applicant Interview (Telephonic)
Dec 04, 2024
Examiner Interview Summary
Dec 17, 2024
Response Filed
Feb 27, 2025
Final Rejection — §103
Apr 16, 2025
Interview Requested
Apr 22, 2025
Examiner Interview Summary
Apr 22, 2025
Applicant Interview (Telephonic)
May 05, 2025
Request for Continued Examination
May 07, 2025
Response after Non-Final Action
May 14, 2025
Non-Final Rejection — §103
Jul 25, 2025
Interview Requested
Aug 08, 2025
Applicant Interview (Telephonic)
Aug 08, 2025
Examiner Interview Summary
Aug 19, 2025
Response Filed
Sep 18, 2025
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603087
VOICE RECOGNITION USING ACCELEROMETERS FOR SENSING BONE CONDUCTION
2y 5m to grant Granted Apr 14, 2026
Patent 12579975
Detecting Unintended Memorization in Language-Model-Fused ASR Systems
2y 5m to grant Granted Mar 17, 2026
Patent 12482487
MULTI-SCALE SPEAKER DIARIZATION FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS
2y 5m to grant Granted Nov 25, 2025
Patent 12475312
FOREIGN LANGUAGE PHRASES LEARNING SYSTEM BASED ON BASIC SENTENCE PATTERN UNIT DECOMPOSITION
2y 5m to grant Granted Nov 18, 2025
Patent 12430329
TRANSFORMING NATURAL LANGUAGE TO STRUCTURED QUERY LANGUAGE BASED ON MULTI-TASK LEARNING AND JOINT TRAINING
2y 5m to grant Granted Sep 30, 2025
Study what changed in these applications to get past this examiner. Based on the 5 most recent grants.

Prosecution Projections

Expected OA rounds: 5-6
Grant probability: 46% (99% with interview, +59.0% lift)
Median time to grant: 3y 0m
PTA risk: High

Based on 22 resolved cases by this examiner. Grant probability is derived from the career allow rate.
