Prosecution Insights
Last updated: April 19, 2026
Application No. 17/952,577

METHODS AND SYSTEMS FOR AUTOMATED CREATION OF ANNOTATED DATA AND TRAINING OF A MACHINE LEARNING MODEL THEREFROM

Final Rejection: §101, §103, §112
Filed: Sep 26, 2022
Examiner: PHAM, JESSICA THUY
Art Unit: 2121
Tech Center: 2100 — Computer Architecture & Software
Assignee: Robert Bosch GmbH
OA Round: 2 (Final)
Grant Probability: 33% (At Risk)
OA Rounds: 3-4
To Grant: 3y 3m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 33% (1 granted / 3 resolved; -21.7% vs TC avg). Grants only 33% of cases.
Interview Lift: -33.3% (minimal lift; based on resolved cases with interview)
Avg Prosecution (typical timeline): 3y 3m
Total Applications (career history): 41 across all art units, 38 currently pending

Statute-Specific Performance

§101: 26.8% allowance (-13.2% vs TC avg)
§103: 35.5% allowance (-4.5% vs TC avg)
§102: 11.0% allowance (-29.0% vs TC avg)
§112: 22.7% allowance (-17.3% vs TC avg)
Tech Center averages are estimates. Based on career data from 3 resolved cases.
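For concreteness, the dashboard figures above are simple derived quantities. The sketch below back-calculates the Tech Center averages from the displayed deltas; those back-calculated values are illustrative, not sourced from USPTO data.

```python
# Sketch of the arithmetic behind the examiner statistics shown above.
# The Tech Center averages are back-calculated from the displayed deltas;
# they are assumptions for illustration, not external USPTO figures.

granted, resolved = 1, 3
career_allow_rate = granted / resolved  # displayed as "33%"

# Statute-specific allowance rates and their deltas vs. the TC average.
examiner_rates = {"101": 0.268, "103": 0.355, "102": 0.110, "112": 0.227}
deltas = {"101": -0.132, "103": -0.045, "102": -0.290, "112": -0.173}

# delta = examiner_rate - tc_average, so tc_average = examiner_rate - delta
tc_averages = {s: examiner_rates[s] - deltas[s] for s in examiner_rates}

print(f"Career allow rate: {career_allow_rate:.0%}")
for s in examiner_rates:
    print(f"§{s}: examiner {examiner_rates[s]:.1%} vs TC avg {tc_averages[s]:.1%}")
```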

Office Action

Rejections: §101, §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims/Response to Amendment

Claims 1, 6, 8, 13, and 15 were amended. Claims 1-20 are pending and examined herein. Claims 1-20 are rejected under 35 U.S.C. 112(b). Claims 1-20 are rejected under 35 U.S.C. 101. Claims 1-20 are rejected under 35 U.S.C. 103.

Response to Arguments

Applicant's arguments, see page 8, filed 10/29/2025, with respect to the 35 U.S.C. 112(b) rejection of claims 1-20 have been fully considered and are persuasive. The 35 U.S.C. 112(b) rejection of claims 1-20 has been withdrawn.

Applicant's arguments filed 10/29/2025 regarding the 35 U.S.C. 101 rejection of claims 1-20 have been fully considered but they are not persuasive. Each amended limitation is either an abstract idea or an additional element which, taken alone or in combination with the other additional elements, does not integrate the judicial exception into a practical application nor amounts to significantly more than the judicial exception. See the amended 35 U.S.C. 101 rejection below for further explanation.

Applicant's arguments, see pages 9-11, filed 10/29/2025, with respect to the rejection(s) of claim(s) 1, 3-6, 8, 10-12, 15, and 17-19 under 35 U.S.C. 102 and claims 2, 7, 9, 13, 16, and 20 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Zhang ("PRBoost: Prompt-Based Rule Discovery and Boosting for Interactive Weakly-Supervised Learning", March 2022) and Li ("Weakly Supervised Named Entity Tagging with Learned Logical Rules", 2021).

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. MPEP § 2106(III) sets out steps for evaluating whether a claim is drawn to patent-eligible subject matter. The analysis of claims 1-20, in accordance with these steps, follows.

Step 1 Analysis: Step 1 is to determine whether the claim is directed to a statutory category (process, machine, manufacture, or composition of matter). Claims 1-7 are directed to a process, claims 8-14 are directed to a machine, and claims 15-20 are directed to a manufacture. All claims are directed to statutory categories and the analysis proceeds.

Step 2A Prong One, Step 2A Prong Two, and Step 2B Analysis: Step 2A Prong One asks if the claim recites a judicial exception (abstract idea, law of nature, or natural phenomenon). If the claim recites a judicial exception, analysis proceeds to Step 2A Prong Two, which asks if the claim recites additional elements that integrate the abstract idea into a practical application. If the claim does not integrate the judicial exception, analysis proceeds to Step 2B, which asks if the claim amounts to significantly more than the judicial exception. If the claim does not amount to significantly more than the judicial exception, the claim is not eligible subject matter under 35 U.S.C. 101. None of the claims represent an improvement to technology.

Regarding claim 1, the following claim elements are abstract ideas:

generating, … a first label for each of the unlabeled training data; (Generating labels for data can be practically performed in the human mind. This is a mental process.)

generating first labeled training data based on an identified one of the first labels having a score exceeding a first range and first unlabeled training data of the unlabeled training data associated with the identified first label; (Generating a labeled dataset based on a threshold and unlabeled data can be practically performed in the human mind. This is a mental process.)

generating, … a second label for each of the unlabeled training data; (Generating labels for data can be practically performed in the human mind. This is a mental process.)

generating second labeled training data based on an identified one of the second labels having a score exceeding a second range and second unlabeled training data of the unlabeled training data associated with the identified second label; (Generating a labeled dataset based on a threshold and unlabeled data can be practically performed in the human mind. This is a mental process.)

generating, …, a first labeling rule for each of the first labeled training data; (Generating labeling rules can be practically performed in the human mind. This is a mental process.)

determining a score of each first labeling rule based on, at least, a number of spans extracted by each first labeling rule; (Determining a score can be practically performed in the human mind. This is a mental process.)

identifying a first labeling rule having a score that is greater than a third range; and (Identifying a rule can be practically performed in the human mind. This is a mental process.)

generating at least one seed labeling rule based on the identified first labeling rule. (Generating labeling rules can be practically performed in the human mind. This is a mental process.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

A method for training a machine learning model using augmented training data, the method comprising: (This recites generic machine learning training. This amounts to mere instructions to apply an exception. See MPEP § 2106.05(f).)

receiving a plurality of unlabeled training data, at least one seed labeling rule, and at least one seed label; (Receiving data is the existing process of data transmission, which amounts to mere instructions to apply an exception.)

training a named entity recognizer model using the unlabeled training data and the at least one seed labeling rule; (This recites generic machine learning training and components. This amounts to mere instructions to apply an exception.)

by the named entity recognizer model, (This recites a generic machine learning component. This amounts to mere instructions to apply an exception.)

training a meta-learning model using the unlabeled training data and at least one seed label; (This recites generic machine learning training and components. This amounts to mere instructions to apply an exception.)

by the meta-learning model, (This recites a generic machine learning component. This amounts to mere instructions to apply an exception.)

training the named entity recognizer model using the first labeled training data and the second labeled training data; (This recites generic machine learning training and components. This amounts to mere instructions to apply an exception.)

by the named entity recognizer model (This recites a generic machine learning component. This amounts to mere instructions to apply an exception.)

Regarding claim 2, the rejection of claim 1 is incorporated herein. Further, the following is an abstract idea: further comprising generating the first labels … and generating the second labels by the meta-learning model in parallel. (One could, practically in the human mind, generate labels in parallel (i.e., generate a label for one dataset and then generate a label for another dataset). This is a mental process.)

The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: by the named entity recognizer model … by the meta-learning model (This recites generic machine learning components. This amounts to mere instructions to apply an exception.)

Regarding claim 3, the rejection of claim 1 is incorporated herein. Further, the following is an abstract idea: wherein conflicting labels are resolved based on voting, wherein a label losing the vote is modified in favor of a label that won the vote. (Voting in machine learning is an algorithm wherein one model is chosen based on the algorithm. This is a mathematical concept.)

Regarding claim 4, the rejection of claim 1 is incorporated herein. Further, the following is an abstract idea: wherein the first range and second range correspond to a respective percentage range. (From claim 1, the ranges are used as a threshold for choosing labels. This represents a mathematical relationship and is a mathematical concept.)

Regarding claim 5, the rejection of claim 1 is incorporated herein. Further, the following is an abstract idea: wherein the first range and second range correspond to one or more respective predetermined values that indicate a limit. (From claim 1, the ranges are used as a threshold for choosing labels. This represents a mathematical relationship and is a mathematical concept.)

Regarding claim 6, the rejection of claim 1 is incorporated herein. Further, the following is an abstract idea: wherein the first labeling rule is generated using one or more rule templates that include at least one simple rule with at least one predicate. (One could, practically in the human mind, generate rules using rule templates. This is a mental process.)

Regarding claim 7, the rejection of claim 1 is incorporated herein. The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: wherein the meta-learning model includes a ProtoBERT model. (This recites generic machine learning components. This is mere instructions to apply an exception.)

Regarding claim 8, the following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: A system for training a machine learning model using augmented training data, the system comprising: a processor; and a memory including instructions that, when executed by the processor, cause the processor to: (This recites generic computer components and processes and generic machine learning components and processes. This is mere instructions to apply an exception.)

The remainder of claim 8 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis. Claims 9-14 recite substantially similar subject matter to claims 2-7 respectively and are rejected with the same rationale, mutatis mutandis.

Regarding claim 15, the following is an abstract idea: generate at least one seed labeling rule based on an identified first labeling rule having a score determined based on a number of spans extracted by the identified first labeling rule and that is greater than a score threshold, wherein the first labeling rule is generated based on the first labeled training data using one or more rule templates that include at least one simple rule with at least one predicate. (Generating a labeling rule based on another labeling rule can be practically performed in the human mind. This is a mental process.)

The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: An apparatus for generating data augmentation labeling rules, the apparatus comprising: a processor; and a memory including instructions that, when executed by the processor, cause the processor to: (This recites generic computer components and processes and generic machine learning components and processes. This is mere instructions to apply an exception.)

The remainder of claim 15 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis. Claims 16-20 recite substantially similar subject matter to claims 2-5 and 7 respectively and are rejected with the same rationale, mutatis mutandis.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claim(s) 1, 3-6, 8, 10-13, 15, and 17-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang ("PRBoost: Prompt-Based Rule Discovery and Boosting for Interactive Weakly-Supervised Learning", March 2022) and Li ("Weakly Supervised Named Entity Tagging with Learned Logical Rules", 2021). Li ("Weakly Supervised Named Entity Tagging with Learned Logical Rules", 2021) was made available by Applicant via IDS.

Regarding claim 1, Zhang teaches A method for training a machine learning model using augmented training data, the method comprising: (Page 3 states "Weakly-supervised learning (WSL) creates weak labels for model training by applying labeling rules over unlabeled instances D_u. Given an unlabeled instance x ∈ D_u, a labeling rule r(⋅) maps x into an extended label space: r(x) → y ∈ Y ∪ {0}. Here Y is the original label set for the task, and 0 is a special label indicating x is unmatchable by r. Given a set R of labeling rules, we can apply each rule in R on unlabeled instances to create a weakly labeled dataset D_l'." The weakly labeled dataset is interpreted as the augmented training data.)

receiving a plurality of unlabeled training data, at least one seed labeling rule, and at least one seed label; (Page 3 states "Given an unlabeled instance x ∈ D_u, a labeling rule r(⋅) maps x into an extended label space: r(x) → y ∈ Y ∪ {0}. Here Y is the original label set for the task, and 0 is a special label indicating x is unmatchable by r. Given a set R of labeling rules, we can apply each rule in R on unlabeled instances to create a weakly labeled dataset D_l'." The set R is interpreted as the seed labeling rules and the unlabeled instances D_u are interpreted as the plurality of unlabeled training data. As they are used, they must have been received. Page 3 states "Besides D_u and D_l', we also assume access to a small set of clean labels D_l (|D_l| ≪ |D_u|), and the task is to iteratively find a set of new rules for model improvement." The small set of clean labels is interpreted as the seed labels. As they are used, they must have been received.)
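The weakly-supervised labeling setup quoted from Zhang (apply each rule in R to unlabeled instances, with 0 marking an unmatchable instance) can be sketched roughly as follows. The rule representation and the toy data are invented for illustration; this is not Zhang's code.

```python
# Illustrative sketch of weak-label creation per the Zhang quote: each
# labeling rule maps an instance to a label in Y ∪ {0}, where 0 means
# "unmatchable" (abstain). Rules and data are toy stand-ins.
from typing import Callable

ABSTAIN = 0  # the special label for unmatchable instances

def make_keyword_rule(keyword: str, label: int) -> Callable[[str], int]:
    """A toy rule: emit `label` if the keyword appears in x, else abstain."""
    return lambda x: label if keyword in x else ABSTAIN

unlabeled = ["acme corp posted earnings", "the river flooded", "acme hires ceo"]
rules = [make_keyword_rule("acme", 1), make_keyword_rule("river", 2)]

# Apply every rule to every unlabeled instance; keep non-abstaining votes
# to build the weakly labeled dataset D_l'.
weakly_labeled = []
for x in unlabeled:
    votes = [r(x) for r in rules if r(x) != ABSTAIN]
    if votes:
        weakly_labeled.append((x, votes[0]))  # first matching rule wins here

print(weakly_labeled)
```

A real system would track which rule fired and how confidently; here a single keyword match suffices to show the D_u → D_l' mapping.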
training a named entity recognizer model using the unlabeled training data and the at least one seed labeling rule; (Page 3 states "It takes as input the candidate rules proposed by the previous component, and asks humans to select the high quality ones. Then the human-selected rules R_t are used to generate weak labels for the unlabeled instances D_u in a soft-matching way." Page 3 further states "We train a new weak model m_{t+1} on the updated weakly labeled dataset D_r. Then we self-train the weak model m_{t+1} and integrate it into the ensemble model." As the seed labeling rules are used in combination with the unlabeled training data to generate the weakly labeled dataset, and the weakly labeled dataset is used for training the model, the seed labeling rules and the unlabeled training data are used for training the model ensemble. Table 1 shows that the rules generated by the model ensemble are used for recognizing entities, and therefore, the model ensemble is a named entity recognizer. Note that the model ensemble at each time step is interpreted as the named entity recognizer.)

generating, by the named entity recognizer model, a first label for each of the unlabeled training data; (Page 3 states "This component proposes candidate rules to be evaluated by human annotators. Using the small labeled dataset D_l, it measures the weakness of the current model by identifying large-error instances on D_l, and proposes rules based on these instances using PLM prompting." Page 3 further states "This component collects human feedback to improve the weak supervision quality. It takes as input the candidate rules proposed by the previous component, and asks humans to select the high quality ones. Then the human-selected rules R_t are used to generate weak labels for the unlabeled instances D_u in a soft-matching way.")

generating first labeled training data based on an identified one of the first labels having a score exceeding a first range and first unlabeled training data of the unlabeled training data associated with the identified first label; (Page 6 states "The instance x_u is matched by the rule r_j if s_j is higher than the matching threshold σ obtained on the development set." The threshold is interpreted as the first range. The instance x_u is the unlabeled training data. Matching the rule to the instance is generating labeled training data. They are associated with each other as they have a matching score.)

training a meta-learning model using the unlabeled training data and at least one seed label; (Page 3 states "It takes as input the candidate rules proposed by the previous component, and asks humans to select the high quality ones. Then the human-selected rules R_t are used to generate weak labels for the unlabeled instances D_u in a soft-matching way." Page 3 further states "We train a new weak model m_{t+1} on the updated weakly labeled dataset D_r. Then we self-train the weak model m_{t+1} and integrate it into the ensemble model." As the seed labeling rules are used in combination with the unlabeled training data to generate the weakly labeled dataset, and the weakly labeled dataset is used for training the model, the seed labeling rules and the unlabeled training data are used for training the model ensemble. Page 6 states "In iteration t, with the new rule-matched data D_r, we obtain an enlarged weakly labeled dataset D_t = D_{t-1} ∪ D_r. We fit a weak model m_t on D_t". The model at t+2 is interpreted as the meta-learning model. Note that 'meta-learning' is being interpreted broadly, as the term is not defined. This model is interpreted as a meta-learning model as it learns a task.)
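The soft-matching step cited repeatedly above (an instance x_u is matched by rule r_j only when the matching score s_j exceeds a threshold σ tuned on a development set, with conflicts going to the highest-scoring rule, per the Zhang quotes) could look roughly like this. The similarity function is a crude placeholder (token overlap), not the paper's actual scorer.

```python
# Sketch of threshold-gated soft matching with highest-score conflict
# resolution, per the Zhang quotes above. The scoring function below is
# an assumption for illustration (token overlap), not Zhang's method.

def score(rule_pattern: str, instance: str) -> float:
    """Toy matching score s_j: fraction of rule tokens present in the instance."""
    tokens = rule_pattern.split()
    return sum(t in instance for t in tokens) / len(tokens)

SIGMA = 0.5  # matching threshold σ, tuned on a dev set in the paper

def weak_label(instance, rules):
    """Return the label of the best rule whose score clears SIGMA, else None."""
    matches = [(score(pattern, instance), label) for pattern, label in rules]
    matches = [(s, y) for s, y in matches if s > SIGMA]
    if not matches:
        return None  # instance is unmatched at this threshold
    return max(matches)[1]  # conflicting labels: highest matching score wins

rules = [("earnings report quarterly", 1), ("flood river rain", 2)]
print(weak_label("quarterly earnings beat estimates", rules))  # expect 1
```

Raising σ trades coverage for precision, which is why the paper tunes it on held-out data rather than fixing it a priori.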
generating, by the meta-learning model, a second label for each of the unlabeled training data; (Page 3 states "This component proposes candidate rules to be evaluated by human annotators. Using the small labeled dataset D_l, it measures the weakness of the current model by identifying large-error instances on D_l, and proposes rules based on these instances using PLM prompting." Page 3 further states "This component collects human feedback to improve the weak supervision quality. It takes as input the candidate rules proposed by the previous component, and asks humans to select the high quality ones. Then the human-selected rules R_t are used to generate weak labels for the unlabeled instances D_u in a soft-matching way." The labels produced by the model at t+2 are interpreted as the second labels.)

generating second labeled training data based on an identified one of the second labels having a score exceeding a first range and second unlabeled training data of the unlabeled training data associated with the identified second label; (See 112(b) rejection for interpretation. Page 6 states "The instance x_u is matched by the rule r_j if s_j is higher than the matching threshold σ obtained on the development set." The threshold used for the rules generated by the model at t+2 is interpreted as the second range. The instance x_u is the unlabeled training data. Matching the rule to the instance is generating labeled training data. They are associated with each other as they have a matching score. The labels produced by the model at t+2 are interpreted as the second labels.)

training the named entity recognizer model using the first labeled training data and the second labeled training data; (Page 5 states "When a rule is accepted (d_j = 1), it will be incorporated into the accepted rule set R+ for later weak label generation." Page 6 states "In iteration t, with the new rule-matched data D_r, we obtain an enlarged weakly labeled dataset D_t = D_{t-1} ∪ D_r. We fit a weak model m_t on D_t by optimizing:". Therefore, at the time step after the meta-learning model (the model at t+2), another model would be trained using the first and second labeled training data. Page 6 states "Finally, we incorporate the self-trained weak model into the ensemble model. The final model is a weighted ensemble of the weak models". As the named entity recognizer model (the ensemble) would include the new model, the named entity recognizer model is trained using the first and second labeled training data.)

generating, by the named entity recognizer model, a first labeling rule for each of the first labeled training data; (Page 3 states "This strategy iteratively checks feature regimes in which the current model m_t is weak, and proposes candidate rules from such regimes." Page 5 states "Given a large-error instance x_e^i ∈ X_e, we first convert it into a prompt by x_p^i = τ(x_e^i). Such a prompt consists of the key components of the original input and a [MASK] token. By inheriting the original input, we construct context for the [MASK] token to be predicted by a pre-trained LM M." Page 5 further states "We collect the top-k predictions with highest p(MASK = v̂ | x_p^i) to form the candidate rules." Therefore, the named entity recognizer model generates labeling rules. Examples of the rules are available on Page 5, Table 1, which shows that the rules are labeling rules.)

generating at least one seed labeling rule based on the identified first labeling rule. (Page 5 states "As the candidate rules R_t can be still noisy, PRBOOST thus presents R_t to humans for selecting high-quality rules. Specifically, for each candidate rule r_j ∈ R_t, we present it along with its prompt template x_p^j to human experts, then they judge whether the rule r_j should be accepted or not. Formally, r_j is associated with a label d_j ∈ {1, 0}. When a rule is accepted (d_j = 1), it will be incorporated into the accepted rule set R+ for later weak label generation." The accepted rule set is interpreted as the seed labeling rule.)

Zhang does not appear to explicitly teach: determining a score of each first labeling rule based on, at least, a number of spans extracted by each first labeling rule; and identifying a first labeling rule having a score that is greater than a third range.

However, Li—directed to analogous art—teaches determining a score of each first labeling rule based on, at least, a number of spans extracted by each first labeling rule; (Page 5 states "We select new rules from rule candidates based on their confidence scores. We adopt the RlogF method (Thelen and Riloff, 2002) to compute the confidence score of a rule r: F(r) = (F_i / N_i) * log2(F_i), where F_i is the number of spans predicted with category label i and matched by rule r, and N_i is the total number of spans matched by rule r." N_i is interpreted as the number of spans extracted by each labeling rule, and F(r) is interpreted as the score.)

identifying a first labeling rule having a score that is greater than a third range; and (Page 6 further states, in relation to the score of the rules, "In our experiments, we select the top K rules for each entity class per iteration." The range is interpreted as the lowest score of the top-K rules, as above that, the labeling rule is selected/identified.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Zhang and Li because, as Li states on page 6, "This method allows a variety of logical rules to be considered, yet is precise enough that all logical rules are strongly associated with the category."

Regarding claim 3, the rejection of claim 1 is incorporated herein.
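Stepping back, Li's RlogF confidence score quoted in the claim 1 analysis above, F(r) = (F_i / N_i) * log2(F_i), together with top-K rule selection, can be sketched as follows. The per-rule counts are hypothetical, chosen only to exercise the formula.

```python
# Sketch of Li's RlogF rule scoring and top-K selection, per the quotes
# above. The rule names and counts below are hypothetical illustrations.
import math

def rlogf(f_i: int, n_i: int) -> float:
    """RlogF confidence: (F_i / N_i) * log2(F_i).

    f_i: spans matched by the rule AND predicted with category label i.
    n_i: total spans matched by the rule.
    """
    if f_i == 0:
        return 0.0
    return (f_i / n_i) * math.log2(f_i)

# Hypothetical statistics: rule -> (F_i, N_i)
rule_counts = {"r1": (8, 10), "r2": (4, 4), "r3": (16, 40)}

scores = {r: rlogf(f, n) for r, (f, n) in rule_counts.items()}

K = 2
top_k = sorted(scores, key=scores.get, reverse=True)[:K]
threshold = scores[top_k[-1]]  # the "third range" reading above: lowest top-K score
print(top_k, round(threshold, 3))
```

Note how the log term rewards rules that fire often (r1 beats the perfectly precise but rarely firing r2), while the F_i/N_i ratio penalizes imprecise rules like r3.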
Zhang teaches wherein conflicting labels are resolved based on voting, wherein a label losing the vote is modified in favor of a label that won the vote. (Page 7 states "Each rule is annotated by three humans, and the annotated rule labels are majority-voted for later weak label generation." One of ordinary skill in the art would realize that majority-voting means that a label losing the vote is modified in favor of a label that won the vote.)

Regarding claim 4, the rejection of claim 1 is incorporated herein. Zhang teaches wherein the first range and second range correspond to a respective percentage range. (Page 6 states "The instance x_u is matched by the rule r_j if s_j is higher than the matching threshold σ obtained on the development set." The threshold for the model at t+1 is interpreted as the first range and the threshold used for the rules generated by the model at t+2 is interpreted as the second range. The value is σ, which is a number that can be represented by a percentage.)

Regarding claim 5, the rejection of claim 1 is incorporated herein. Zhang teaches wherein the first range and second range correspond to one or more respective predetermined values that indicate a limit. (Page 6 states "The instance x_u is matched by the rule r_j if s_j is higher than the matching threshold σ obtained on the development set." The threshold for the model at t+1 is interpreted as the first range and the threshold used for the rules generated by the model at t+2 is interpreted as the second range. The value is σ, which is a predetermined value that indicates a limit for acceptance.)

Regarding claim 6, the rejection of claim 1 is incorporated herein. Zhang teaches wherein the first labeling rule is generated using one or more rule templates that include at least one simple rule with at least one predicate. (Page 6 states "When x_u is matched by multiple rules that provide conflicting labels, we use the one with the highest matching score to assign the weak label." The rule that assigns the weak label is interpreted as the labeling rule. Page 5 states "For example, as shown in Table 1, the prompt of the relation extraction task can be "entity [MASK] entity", which rephrases the original input using relation phrases while keeping the key semantics." The prompt, which is simple, is interpreted as the rule template. One of ordinary skill would realize that the mask is in place of a predicate. Support for this is also available in Table 1, where the first example mask is a predicate.)

Regarding claim 8, Zhang teaches A system for training a machine learning model using augmented training data, the system comprising: a processor; and a memory including instructions that, when executed by the processor, cause the processor to: (Page 3 states "Weakly-supervised learning (WSL) creates weak labels for model training by applying labeling rules over unlabeled instances D_u. Given an unlabeled instance x ∈ D_u, a labeling rule r(⋅) maps x into an extended label space: r(x) → y ∈ Y ∪ {0}. Here Y is the original label set for the task, and 0 is a special label indicating x is unmatchable by r. Given a set R of labeling rules, we can apply each rule in R on unlabeled instances to create a weakly labeled dataset D_l'." The weakly labeled dataset is interpreted as the augmented training data. Page 13 states "We test our code on the System Ubuntu 18.04.4 LTS with CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz and GPU: NVIDIA GeForce RTX 2080. We implement our method using Python 3.6 and PyTorch 1.2 (Paszke et al., 2019)." One of ordinary skill in the art would realize that using the Ubuntu system, the CPU and GPU (which are processors), and Python requires using a system with memory including instructions that are executed via the CPU and GPU.
As the method is implemented using this system, the processor executes the method.) The remainder of claim 8 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis. Claims 10-13 recite substantially similar subject matter to claims 3-6 respectively and are rejected with the same rationale, mutatis mutandis. Regarding claim 15, Zhang teaches An apparatus for generating data augmentation labeling rules, the apparatus comprising: a processor; and a memory including instructions that, when executed by the processor, cause the processor to: (Page 3 states "Weakly-supervised learning (WSL) creates weak labels for model training by applying labeling rules over unlabeled instances D u . Given an unlabeled instance x ∈ D u , a labeling rule r ⋅ maps x into an extended label space: r x → y ∈ Y ∪ { 0 } . Here Y is the original label set for the task, and 0 is a special label indicating x is unmatchable by r . Given a set R of labeling rules, we can apply each rule in R on unlabeled instances to create a weakly labeled dataset D l ' ." The weakly labeled dataset is interpreted as the augmented training data. Page 13 states "We test our code on the System Ubuntu 18.04.4 LTS with CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz and GPU: NVIDIA GeForce RTX 2080. We implement our method using Python 3.6 and PyTorch 1.2 (Paszke et al., 2019)." One of ordinary skill in the art would realize that using the system Ubuntu, the CPU and GPU (which are processors), and Python requires using a system with memory including instructions that are executed via the CPU and GPU. As the method is implemented using this system, the processor executes the method.) generate at least one seed labeling rule based on an identified first labeling rule having wherein the first labeling rule is generated based on the first labeled training data using one or more rule templates that include at least one simple rule with at least one predicate. 
(Page 5 states "As the candidate rules R_t can be still noisy, PRBOOST thus presents R_t to humans for selecting high-quality rules. Specifically, for each candidate rule r_j ∈ R_t, we present it along with its prompt template x_p_j to human experts, then they judge whether the rule r_j should be accepted or not. Formally, r_j is associated with a label d_j ∈ {1, 0}. When a rule is accepted (d_j = 1), it will be incorporated into the accepted rule set R+ for later weak label generation." The accepted rule set is interpreted as the seed labeling rule.)

Zhang does not appear to explicitly teach the following. However, Li, directed to analogous art, teaches a score determined based on a number of spans extracted by the identified first labeling rule and that is greater than a score threshold, (Page 5 states "We select new rules from rule candidates based on their confidence scores. We adopt the RlogF method (Thelen and Riloff, 2002) to compute the confidence score of a rule r: F(r) = (F_i / N_i) log2(F_i), where F_i is the number of spans predicted with category label i and matched by rule r, and N_i is the total number of spans matched by rule r." N_i is interpreted as the number of spans extracted by each labeling rule, and F(r) is interpreted as the score. Page 6 further states, in relation to the score of the rules, "In our experiments, we select the top K rules for each entity class per iteration." The threshold is interpreted as the lowest score of the top-K rules, as above that score the labeling rule is selected/identified. Page 6 states "When x_u is matched by multiple rules that provide conflicting labels, we use the one with the highest matching score to assign the weak label." The rule that assigns the weak label is interpreted as the labeling rule. Page 5 states "For example, as shown in Table 1, the prompt of the relation extraction task can be "entity [MASK] entity", which rephrases the original input using relation phrases while keeping the key semantics."
The prompt, which is simple, is interpreted as the rule template. One of ordinary skill would realize that the mask is in place of a predicate. Support for this is also available in Table 1, where the first example mask is a predicate.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Zhang and Li because, as Li states on page 6, "This method allows a variety of logical rules to be considered, yet is precise enough that all logical rules are strongly associated with the category."

The remainder of claim 15 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis. Claims 17-19 recite substantially similar subject matter to claims 3-5, respectively, and are rejected with the same rationale, mutatis mutandis.

Claim(s) 2, 9, and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang ("PRBoost: Prompt-Based Rule Discovery and Boosting for Interactive Weakly-Supervised Learning", March 2022) and Li ("Weakly Supervised Named Entity Tagging with Learned Logical Rules", 2021) as applied to claim 1 above, and further in view of Ma ("Self-paced Multi-view Co-training", April 2020).

Regarding claim 2, the rejection of claim 1 is incorporated herein. Zhang teaches the named entity recognizer model (See rejection of claim 1.) and the meta-learning model (See rejection of claim 1.). The combination of Zhang and Li does not appear to explicitly teach further comprising generating the first labels by [a model] and generating the second labels by [another model] in parallel. However, Ma, directed to analogous art, teaches further comprising generating the first labels by [a model] and generating the second labels by [another model] in parallel. (Page 15, Algorithm 1 states "8: Update Θ: train classifiers for all views in a distributed way 9: Update Y~: renew predictions on all unlabeled instances".
Page 13 states "Update Y~: The newly learned classifier is expected to perform gradually better since more confident data are expected to be used for training. It is then reasonable to make use of the updated predictions on the unlabeled set to update their pseudo-labels." The pseudo-labels are interpreted as the labels, where one of the models in the ensemble generates the first labels and another generates the second labels. As the models are trained in a distributed way and updated in the same step, the labels are generated in parallel.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Zhang and Li with the teachings of Ma because, as Ma states on page 15, "The training time becomes critical when deep neural networks are adopted for each view. The parallel training manner should be not only necessary but also a must." One of ordinary skill in the art would be motivated to modify the self-training taught by Zhang to the parallel co-training of Ma for this reason.

Claims 9 and 16 recite substantially similar subject matter to claim 2 and are rejected with the same rationale, mutatis mutandis.

Claim(s) 7, 14, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang ("PRBoost: Prompt-Based Rule Discovery and Boosting for Interactive Weakly-Supervised Learning", March 2022) and Li ("Weakly Supervised Named Entity Tagging with Learned Logical Rules", 2021) as applied to claim 1 above, and further in view of Ding ("Few-NERD: A Few-shot Named Entity Recognition Dataset", 2021). Ding was made available through the IDS of the application.

Regarding claim 7, the rejection of claim 1 is incorporated herein. Zhang does not appear to explicitly teach wherein the meta-learning model includes a ProtoBERT model.
However, Ding, directed to analogous art, teaches wherein the meta-learning model includes a ProtoBERT model. (Page 7 states "Inspired by achievements of meta-learning approaches (Finn et al., 2017; Snell et al., 2017; Ding et al., 2021) on few-shot learning. The first baseline model we implement is ProtoBERT, which is a method based on prototypical network (Snell et al., 2017) with a backbone of BERT (Devlin et al., 2019a) encoder.")

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Zhang and Li with the teachings of Ding because, as Ding states on page 6, "Recent studies show that pre-trained language models with deep transformers (e.g., BERT (Devlin et al., 2019a)) have become a strong encoder for NER (Li et al., 2020b)." Ding also states on page 8, "In the comparison across models, ProtoBERT generally achieves better performance than NNShot and StructShot, especially in 5∼10 shot setting where calculation by prototype may differ more from calculation by entity."

Claims 14 and 20 recite substantially similar subject matter to claim 7 and are rejected with the same rationale, mutatis mutandis.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSICA THUY PHAM whose telephone number is (571) 272-2605. The examiner can normally be reached Monday - Friday, 9:00 A.M. - 5:00 P.M. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Li Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.T.P./
Examiner, Art Unit 2121

/Li B. Zhen/
Supervisory Patent Examiner, Art Unit 2121
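The two mechanics quoted in the §103 rejections above, Zhang's weak-label assignment (r(x) → y ∈ Y ∪ {0}, resolving conflicts by highest matching score) and Li's RlogF rule-confidence score, can be sketched in a few lines of Python. The function names, toy rules, and inputs below are illustrative assumptions; neither reference's actual implementation is reproduced here.

```python
# Illustrative sketch only (hypothetical names): weak labeling per Zhang's
# r(x) -> y in Y ∪ {0}, plus Li's RlogF rule-confidence score.
from math import log2

def apply_rules(x, rules):
    """Map an unlabeled instance x to a weak label, or 0 if no rule matches.

    Each rule is a (predicate, label, matching_score) triple. When several
    rules match with conflicting labels, the rule with the highest matching
    score assigns the weak label, as quoted from Zhang's page 6."""
    matches = [(score, label) for predicate, label, score in rules if predicate(x)]
    if not matches:
        return 0  # special label: x is unmatchable by any rule
    return max(matches)[1]  # label of the highest-scoring matching rule

def rlogf(F_i, N_i):
    """RlogF confidence of a rule (Thelen and Riloff, 2002), as quoted from
    Li's page 5: F(r) = (F_i / N_i) * log2(F_i), where F_i is the number of
    spans predicted with category label i and matched by the rule, and N_i
    is the total number of spans the rule matches."""
    return (F_i / N_i) * log2(F_i)

# Toy example with two hypothetical string-matching rules.
rules = [
    (lambda s: "born in" in s, "PLACE_OF_BIRTH", 0.9),
    (lambda s: "works at" in s, "EMPLOYER", 0.7),
]
print(apply_rules("Ada was born in London", rules))  # PLACE_OF_BIRTH
print(apply_rules("no rule matches this", rules))    # 0
print(round(rlogf(8, 10), 3))  # (8/10) * log2(8) = 2.4
```

Under Li's selection scheme as characterized in the rejection, rules would then be ranked by this score and the top K kept per entity class, making the effective threshold the score of the K-th ranked rule.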

Prosecution Timeline

Sep 26, 2022: Application Filed
Aug 01, 2025: Non-Final Rejection (§101, §103, §112)
Oct 29, 2025: Response Filed
Jan 20, 2026: Final Rejection (§101, §103, §112) (current)


Prosecution Projections

3-4
Expected OA Rounds
33%
Grant Probability
0%
With Interview (-33.3%)
3y 3m
Median Time to Grant
Moderate
PTA Risk
Based on 3 resolved cases by this examiner. Grant probability derived from career allow rate.
