Prosecution Insights
Last updated: April 18, 2026
Application No. 18/037,149

INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND RECORDING MEDIUM

Non-Final Office Action: rejections under §101, §102, §103, §112, and nonstatutory double patenting

Filed: May 16, 2023
Examiner: BALAKRISHNAN, VIJAY MURALI
Art Unit: 2143
Tech Center: 2100 — Computer Architecture & Software
Assignee: NEC Corporation
OA Round: 1 (Non-Final)

Grant Probability: 43% (Moderate)
Predicted OA Rounds: 1-2
Predicted Time to Grant: 4y
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 43% of resolved cases (6 granted / 14 resolved; -12.1% vs TC avg)
Interview Lift: +85.7% allowance lift on resolved cases with an interview (strong)
Avg Prosecution: 4y (typical timeline)
Currently Pending: 26 applications
Total Applications: 40 (across all art units)

Statute-Specific Performance

§101: 26.4% (-13.6% vs TC avg)
§102: 13.2% (-26.8% vs TC avg)
§103: 31.5% (-8.5% vs TC avg)
§112: 24.3% (-15.7% vs TC avg)

Tech Center averages are estimates; based on career data from 14 resolved cases.

Office Action

DETAILED ACTION

This non-final action is in response to application 18/037,149 filed 05/16/2023, which is a national stage entry of international application PCT/JP2020/044486 filed 11/30/2020. Receipt of applicant’s preliminary amendment filed 05/16/2023 is acknowledged. Claims 1-9 are pending in the application. Claims 1, 8, and 9 are independent claims.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) filed 05/16/2023 has been fully considered by the examiner.

Specification

The specification is objected to because the title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed. The specification is further objected to because of the following informality: the specification contains abbreviations (e.g., SMOTE, MUNGE) that are not expressly defined [see ¶ 0031, 0040]. All abbreviations should be expressly defined in the specification the first time that they are used. Appropriate correction is required.

Claim Interpretation

As recited in MPEP § 2111, during patent examination, “the pending claims must be given their broadest reasonable interpretation consistent with the specification”. Under a broadest reasonable interpretation (BRI), claim terms must be given their plain and ordinary meaning (i.e., the meaning that the term would have to a person of ordinary skill in the art), unless applicant sets forth a special definition of a claim term within the specification. The plain and ordinary meaning of a term “may be evidenced by a variety of sources, including the words of the claims themselves, the specification, drawings, and prior art”.
Claims 1, 8, and 9 each recite the limitations “assign[ing] labels to the training examples using a teacher model” and “calculat[ing] errors between predictions of the one or more student models and predictions of the teacher model”. The specification further describes a teacher model as a model “which can be regarded as outputting absolutely correct predictions” [see ¶ 0013-0014], i.e., an oracle [see ¶ 0002] that always assigns correct labels to examples. It is well understood in the art that it is virtually impossible for a trained machine learning model to be “absolutely correct” in its predictions, i.e., 100% accurate, due to inherent probabilistic noise/randomness in real-world data and fundamental model limitations. The specification also does not appear to further explain how such a trained model would be generated or prepared. Based on the limited description of the specification and what would be understood by one of ordinary skill in the art, the examiner has thereby broadly interpreted a “teacher model” to encompass any oracle/expert system with access to correct predictions, e.g., a human annotator that assigns ground truth labels to examples, or a system/component that receives, stores and accesses ground truth labels.

Claims 1, 8, and 9 further recite the limitation “generating one or more student models using at least a part of the training examples to which the labels are assigned”. Although the specification describes the recited models in the context of machine learning, the claims do not expressly define the recited “student models” as being machine learning models, and the additional limitations of the claim do not particularly require the functionality of machine learning models. As such, the examiner has broadly interpreted “student models” to encompass any rule-based system or algorithm that is developed based on observed examples (i.e., training examples).
Claims 1, 8, and 9 further recite the limitation “extract[ing] and output[ting] each example for which the error is to be significant based on the calculated errors”. While neither the specification nor the claims explicitly set forth a requisite degree for determining an error to be “significant”, applicant’s specification does describe errors “greater than a predetermined threshold value” [¶ 0041], or errors “in which a weighted sum of a degree of appearance and the error increases” [¶ 0042], to be examples of significance. The examiner has thereby broadly interpreted “significant” errors to be any errors that meet a predetermined condition and/or exceed a predetermined measure or threshold.

Claim 3 recites the term “a degree of appearance”. While neither the specification nor the claims explicitly set forth a special definition, the specification does appear to describe “degrees of appearance” as synonymous to probability values with respect to examples within a distribution of predicted class outputs [¶ 0032, 0042]. The examiner has thereby broadly interpreted “a degree of appearance” to be any measure of likelihood or confidence with respect to a given example.

Claim Objections

Claims 1 and 6-9 are objected to because of the following informalities: In claims 1, 8, and 9, “extract[ing] and output[ting] each example for which the error is to be significant based on the calculated errors, from the data retention means” should read “extract[ing] and output[ting] each example, for which the error is to be significant based on the calculated errors, from the data retention means” (add preceding comma) to improve grammatical clarity. In claim 6, “predictions of the one or more students” should read “predictions of the one or more student models” to have clearer antecedent basis. In claim 7, “examples other than the training example” should read “examples other than the training examples” to have clearer antecedent basis. Appropriate corrections are required.
Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-9 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding claim 1, it recites the limitation “assign labels to the training examples using a teacher model”, followed by the limitation “calculate errors between predictions of the one or more student models and predictions of the teacher model”. However, the claim does not establish sufficient antecedent basis for “predictions” of the teacher model; it is thereby unclear if the recited “predictions of the teacher model” are equivalent to the previously recited “assign[ed] labels”, or are instead referring to an entirely different set of values. Consequently, one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For purposes of examination and as best understood in light of the specification, the examiner has interpreted the limitation “calculate errors between predictions of the one or more student models and predictions of the teacher model” as “calculate errors between predictions of the one or more student models and labels assigned by the teacher model”.
Claim 1 further recites “retain examples formed by features in a data retention means; and extract and output each example for which the error is to be significant based on the calculated errors”. It is unclear if the recited “examples” formed by features are in reference to the previously recited “receive[d] training examples formed by features”, or are instead in reference to the previously recited “error calculation examples different from the part of the training examples”. Consequently, one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For purposes of examination and as best understood in light of the specification, the limitation “retain examples formed by features in a data retention means; and extract and output each example for which the error is to be significant based on the calculated errors” is interpreted as “retain training examples formed by features in a data retention means; and extract and output each training example for which the error is to be significant based on the calculated errors”.

Regarding claim 3, it recites the limitation “determines each error calculation example for which a weighted sum of the degree of appearance and the error as an example for which the error is significant”. The clause “for which a weighted sum of the degree of appearance and the error” appears to be incomplete, as the claim does not recite a following condition to meet in order to determine the error to be “significant”, and thereby does not logically connect to the following clause (“as an example for which the error is significant”). It is further unclear if the term “an example” refers to the “error calculation example” at issue, or instead refers to a separate “training example”. The lack of clarity results in the apparent relationship between claim elements being indefinite, such that one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.
For purposes of examination and as best understood in light of the specification, the examiner has interpreted “determines each error calculation example for which a weighted sum of the degree of appearance and the error as an example for which the error is significant” as “determines each error calculation example, for which a weighted sum of the degree of appearance and the error is significant, as an error calculation example for which the error is significant”.

Regarding claim 6, it recites the limitation “calculates an average of the errors calculated for the one or more students as the errors with respect to the predictions of the one or more students and the predictions of the teacher model”. The clause “an average of the errors calculated for the one or more students” appears to be incomplete, and thereby does not logically connect to the following clause (“as the errors with respect to the predictions of the one or more students and the predictions of the teacher model”). It is unclear what the recited relationship is between “the errors calculated for the one or more students” and “the errors with respect to the predictions of the one or more students and the predictions of the teacher model”, or if these clauses are intended to be referring to the same element or distinct claim elements. Consequently, one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For purposes of examination and as best understood in light of the specification, the examiner has interpreted “calculates an average of the errors calculated for the one or more students as the errors with respect to the predictions of the one or more students and the predictions of the teacher model” as “calculates an average of calculated errors between the predictions of the one or more student models and the labels assigned by the teacher model”.
Regarding claim 7, it recites the limitation “examples other than the training example as the error calculation examples”. There is insufficient antecedent basis for this term in the claim; it is unclear if these “other” examples refer to previously unspecified received examples that are separate from the “receive[d] training examples formed by features”, or separate from the training examples with “assign[ed] labels”, or training examples separate from a subset of training examples that make up the previously recited “part of the training examples to which the labels are assigned”. Consequently, one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For purposes of examination and as best understood in light of the specification, the examiner has interpreted the limitation “examples other than the training example as the error calculation examples” as “a plurality of received examples separate from the received training examples as the error calculation examples”.

Regarding claims 2 and 4-5, they inherit the deficiencies of their parent claims. Consequently, they are also rejected under 35 U.S.C. 112(b) as being indefinite for depending on an indefinite parent claim. Any further indefinite references to “predictions of the teacher model” or “examples” are likewise interpreted as detailed above.

Regarding claims 8 and 9, they have the same deficiencies as those found in claim 1 above. Consequently, they are rejected for the same reasons as claim 1 and are likewise interpreted as detailed above.

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees.
A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c).
A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13. The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claims 1, 5-6, and 8-9 of the instant application are provisionally rejected on the grounds of nonstatutory double patenting as being unpatentable over claims 1 and 7-9 of co-pending application 18/037,298 in view of Hady et al. (“Semi-supervised Learning for Regression with Co-training by Committee”, available 2009), hereinafter Hady.

Claims 2-3 of the instant application are provisionally rejected on the grounds of nonstatutory double patenting as being unpatentable over claim 1 of co-pending application 18/037,298 in view of Hady, further in view of Kee et al. (“Query-by-committee improvement with diversity and density in batch active learning”, available online 3 May 2018), hereinafter Kee.

Claim 4 of the instant application is provisionally rejected on the grounds of nonstatutory double patenting as being unpatentable over claim 1 of co-pending application 18/037,298 in view of Hady, further in view of Nandi et al. (“Sampling Based Methods for Class Imbalance in Datasets”, available online 15 May 2017), hereinafter Nandi.

Claim 7 of the instant application is provisionally rejected on the grounds of nonstatutory double patenting as being unpatentable over claim 1 of co-pending application 18/037,298 in view of Hady, further in view of Yang et al. (“Active Learning Using Uncertainty Information”, available conference 2016), hereinafter Yang.

Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the instant application are substantially similar in scope to the claims of the co-pending application, such that a person of ordinary skill in the art would understand the claimed invention of the instant application to be an obvious variation. This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

The claims of the two applications are compared below, first those of the instant application (18/037,149), then those of the co-pending application (18/037,298). Differences that amount to more than minor variations in language and punctuation are further discussed below.

Instant Application (18/037,149)

Claim 1.
An information processing device comprising: a memory storing instructions; and one or more processors configured to execute the instructions to: receive training examples formed by features; assign labels to the training examples using a teacher model; generate one or more student models using at least a part of the training examples to which the labels are assigned, and calculate errors between predictions of the one or more student models and predictions of the teacher model by using error calculation examples different from the part of the training examples used to generate the one or more student models; retain examples formed by features in a data retention means; and extract and output each example for which the error is to be significant based on the calculated errors, from the data retention means.

(Claim 2, depending from claim 1) wherein the processor selects each example for which the calculated error is significant, extracts each example similar to the selected example from the data retention means, and outputs the extracted example as an example for which the error is predicted to be significant.

(Claim 3, depending from claim 2) wherein the processor calculates a degree of appearance, and determines each error calculation example for which a weighted sum of the degree of appearance and the error as an example for which the error is significant.

(Claim 4, depending from claim 1) wherein the processor generates new error calculation examples by oversampling from the training examples.

(Claim 5, depending from claim 1) wherein the processor generates the one or more student models, and calculates the errors using a remaining part of the training examples as the error calculation examples.

(Claim 6, depending from claim 1) wherein the processor generates a plurality of sample groups by random sampling with duplicates from the training examples, generates the one or more student models using respective sampling groups, calculates the errors using, as the error calculation examples, samples included in the training examples but not included in the sample groups for each of the one or more student models, and calculates an average of the errors calculated for the one or more students as the errors with respect to the predictions of the one or more students and the predictions of the teacher model.

(Claim 7, depending from claim 1) wherein the processor calculates the errors using examples other than the training example as the error calculation examples.

(Claim 8) An information processing method (Examiner Note: Method claim body corresponds to claim 1 as detailed above)

(Claim 9) A non-transitory computer-readable recording medium storing a program, the program causing a computer to perform a process (Examiner Note: Product claim body corresponds to claim 1 as detailed above)

Co-pending Application (18/037,298)

Claim 1.
An information processing device comprising: a memory storing instructions; and one or more processors configured to execute the instructions to: accept training examples formed by features; assign labels to the training examples; (Examiner Note: Assigning labels to training examples implicitly requires a component of the information processing device (i.e., teacher model – see Claim Interpretation above)) generate one or more student models using the training examples to which the labels are assigned, and calculate errors between predictions of the one or more student models and the labels; (Examiner Note: As best understood in light of the specification, “predictions of the teacher model” are interpreted as equivalent to assigned labels – see Claim Rejections - 35 USC § 112 above) (Examiner Note: The information processing device implicitly stores training examples in memory (i.e., a data retention means)) generate an error prediction model which is a model for predicting the errors; and output each example for which the error is predicted to be significant based on the error prediction model.

(For instant claims 2-5: claim 1, as above.)

(For instant claim 6, repeated from claim 1:) generate one or more student models using the training examples to which the labels are assigned, and calculate errors between predictions of the one or more student models and the labels; (Claim 7, depending from claim 1) wherein the processor generates a plurality of sampling groups by random sampling with duplicates from the training examples, generates the one or more student models using each of the sampling groups, calculates, for each of the one or more student models, the errors with respect to data which are included in the training examples but not included in the sampling group, and calculates an average of the errors calculated for the one or more student models

(For instant claim 7: claim 1, as above.)

(Claim 8) An information processing method (Examiner Note: Method claim body corresponds to claim 1 as detailed above)

(Claim 9) A non-transitory computer-readable recording medium storing a program, the program causing a computer to perform a process (Examiner Note: Product claim body corresponds to claim 1 as detailed above)

Claim 1 of the co-pending application does not expressly teach generating one or more student models using at least a part of the training examples, and calculating errors by using error calculation examples different from the part of the training examples used to generate the one or more student models, as recited in claims 1, 8, and 9 of the instant application, and further does not expressly teach using a remaining part of the training examples as the error calculation examples, as recited in claim 5 of the instant application.
In the same field of endeavor, Hady discloses a semi-supervised ensemble learning framework (“In this paper, a semi-supervised regression framework, denoted by CoBCReg is proposed, in which an ensemble of diverse regressors is used for semi-supervised learning that requires neither redundant independent views nor different base learning algorithms. Experimental results show that CoBCReg can effectively exploit unlabeled data to improve the regression estimates” [Hady Abstract]) that generates one or more student models using at least a part of the training examples (“In the experiments, an initial ensemble of four RBF network regressors, N = 4, is constructed by Bagging… Table 2 present the average of the RMSEs of the four RBF Network regressors used in CoBCReg and the RMSE of CoBCReg on the test set at iteration 0 (initial) trained only on the 10% available labeled data L” [Hady page 127 Methodology and Results]; see lines 1-3 of Algorithm 1, CoBC for Regression, wherein L is the set of m labeled training examples, N is the number of committee members (ensemble size), and hi is each ensemble member [Hady page 123]; the disclosed algorithm initially generates each RBF network regressor (i.e., student model) from a respective set of labeled training examples Li), and calculates errors by using error calculation examples different from the part of the training examples used to generate the one or more student models (“For each iteration t and for each ensemble member hi, a set U’ of u examples is drawn randomly from U. The SelectRelevantExamples method is applied such that the companion committee Hi (ensemble consists of all members except hi) estimates the output of each unlabeled example in U’” [Hady page 124 Co-training by Committee for Regression (CoBCReg)]; “Then, the root mean squared error (RMSE) of hj is evaluated first (εj)… It is worth mentioning that the RMSEs εj and ε’j should be estimated accurately. If the training data of hj is used, this will under-estimate the RMSE. Fortunately, since the bootstrap sampling [5] is used to construct the committee, the out-of-bootstrap examples are considered for a more accurate estimate of ε’j” [Hady pages 125-126 Confidence Measure]; see lines 8-9 of Algorithm 1, and lines 2-6 of Algorithm 2, SelectRelevantExamples [Hady page 123]; the disclosed algorithm calculates error for out-of-bag examples from validation set Vj (i.e., error calculation examples) by calculating root mean squared error (RMSE) between example label (i.e., predictions of teacher model) and RBF network output (i.e., predictions of student model) – note that examples of Vj are out-of-bag, i.e., separate from bagging examples Li used to initially generate ensemble members, as shown in line 2 of Algorithm 1), and us[es] a remaining part of the training examples as the error calculation examples (see line 2 of Algorithm 1 and lines 2-6 of Algorithm 2 [Hady page 123] as detailed in claim 1 above; examples of validation set Vj used to calculate errors are out-of-bag, i.e., separate from bagging examples Li used to initially generate ensemble members).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated generating one or more student models using at least a part of the training examples, calculating errors by using error calculation examples different from the part of the training examples used to generate the one or more student models, and using a remaining part of the training examples as the error calculation examples as taught by Hady into the co-pending application because they are both directed towards semi-supervised ensemble learning frameworks. Incorporating the teachings of Hady would allow for selection of the most informative unlabeled examples through leveraging validation error on labeled examples (“At each iteration and for each regressor, the companion committee labels the unlabeled examples then the regressor select the most informative newly-labeled examples for itself, where the selection confidence is based on estimating the validation error” [Hady pages 129-130 Conclusions and Future Work]), wherein utilizing labeled examples allows for more accurate estimation of the error (“Fortunately, since the bootstrap sampling [5] is used to construct the committee, the out-of-bootstrap examples are considered for a more accurate estimate of ∈’j” [Hady pages 125-126 Confidence Measure]). 
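As an illustration only (not part of the record), the bagging and out-of-bag error estimation that this rejection maps onto the claimed arrangement can be sketched in Python. The toy data, the least-squares "student" models, and the use of assigned labels as the teacher's predictions are all hypothetical stand-ins, not Hady's actual RBF-network implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: labels assigned by a hypothetical teacher stand in for the
# "predictions of the teacher model" (interpreted as the assigned labels).
X = rng.normal(size=(200, 3))
y_teacher = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

n_students = 4
errors = []
for _ in range(n_students):
    # Bootstrap sample ("random sampling with duplicates") trains one student,
    # so each student sees only a part of the training examples.
    idx = rng.integers(0, len(X), size=len(X))
    oob = np.setdiff1d(np.arange(len(X)), idx)  # out-of-bag rows

    # Student model: ordinary least squares fit on the bootstrap sample.
    w, *_ = np.linalg.lstsq(X[idx], y_teacher[idx], rcond=None)

    # RMSE between student predictions and teacher labels, computed on the
    # out-of-bag rows, i.e., "error calculation examples different from the
    # part of the training examples used to generate the student model".
    rmse = np.sqrt(np.mean((X[oob] @ w - y_teacher[oob]) ** 2))
    errors.append(rmse)

avg_error = np.mean(errors)  # averaged across students, as in instant claim 6
```

The out-of-bag rows here play the same role as Hady's out-of-bootstrap validation examples: they are never seen by the student they evaluate, which avoids the under-estimation of error that Hady warns about.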
Claim 1 of the co-pending application does not expressly teach selecting each example for which the calculated error is significant, extract[ing] each example similar to the selected example from the data retention means, and output[ting] the extracted example as an example for which the error is predicted to be significant, as recited in claim 2 of the instant application, and further does not expressly teach calculat[ing] a degree of appearance, and determin[ing] each error calculation example for which a weighted sum of the degree of appearance and the error as an example for which the error is significant, as recited in claim 3 of the instant application.

In the same field of endeavor, Kee teaches an ensemble learning framework that utilizes sets of labeled and unlabeled data to answer queries (“In this study, we utilize query-by-committee (QBC) for uncertainty and demonstrate that its performance can be improved by introducing diversity and density in instance utility. Test results show that uncertainty sampling by QBC can be significantly improved with diversity and density incorporated in instance selection. Furthermore, we investigate several distance measures for use in diversity and density and show that random forest dissimilarity can be an effective distance measure in batch active learning” [Kee Abstract]; “We describe general BAL procedures discussed in Sections 3.1 to 3.3, namely QBC only (QO), QBC and diversity (QD), and QBC, diversity, and density (QDD) settings, respectively. The overall procedures are similar in BAL scheme, yet show difference in the objective function and instance selection.
Initially, a set of labeled instances, L, a set of unlabeled instances, U, and a constant batch size, q, are given” [Kee page 405 Batch active learning procedures]) that extracts each unlabeled example similar to the selected labeled example from the data retention means, and outputs the extracted unlabeled example as an example for which the error is predicted to be significant (“One should determine a distance measure to evaluate instance utility introduced by diversity for unlabeled instances x∗. This decision is similar to linkage selection in clustering. A common approach is to take the minimum distance to the labeled instances and the unlabeled instances already in a query [equation not reproduced], where dist(x∗, x) is a distance function, L, U and Q are sets of labeled, unlabeled, and previously selected query instances, respectively.” [Kee page 404 Incorporating diversity]; see lines 7-14 in Algorithm 2, QD: Batch active learning with query-by-committee and diversity [Kee page 406]; the disclosed algorithm, for each selected labeled example, determines, and outputs via variable assignment (s in line 11), a related unlabeled example with maximal uncertainty (F(x) term in argmax expression) and minimal distance (i.e., similarity) to the labeled example (min(D(x)) term in argmax expression)), and calculates a degree of appearance, and determines each error calculation example for which a weighted sum of the degree of appearance and the error as an example for which the error is significant (“Unlike diversity which is generally incorporated in BAL, density has been often overlooked, but it is necessary to take into account density to prevent outliers in queries and label more representative instances which may improve uncertainty sampling further. Some studies incorporate density consideration with uncertainty under serial AL setting.
One approach is to utilize density estimation (DE) methods. … Both DE and ER approaches assume that instances from denser regions are more informative in estimating the decision boundary, yet how they compare denser regions are different. DE approaches evaluate the information of an instance based on the actual density at the instance, p(x), derived from the underlying probability distribution in the feature space...Introducing a density factor extends Eq. (6) to the following form [equation image] where f(x), d(x), and h(x) are the uncertainty, diversity, and density functions, respectively, and 0 ≤ λ, β ≤ 1 such that λ + β ≤ 1 control the relative importance" [Kee pages 404-405 Incorporating diversity and density]; see line 12 in Algorithm 3 QDD: Batch active learning with query-by-committee, diversity and density – [algorithm image] [Kee page 407]; A related unlabeled example s may be selected through the adjusted uncertainty measure u(x), which is a sum (weighted through importance terms λ and β) of uncertainty f(x), diversity d(x), and density h(x), wherein density h(x) is a confidence measure (i.e., degree of appearance) drawn from the underlying distribution of the data)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated extract[ing] each unlabeled example similar to the selected labeled example from the data retention means, and output[ting] the extracted unlabeled example as an example for which the error is predicted to be significant, and determin[ing] each error calculation example for which a weighted sum of the degree of appearance and the error as an example for which the error is significant as taught by Kee into the co-pending application because they are both directed towards an ensemble learning framework that utilizes sets of labeled and unlabeled data to answer queries. 
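For illustration, the weighted selection measure quoted above from Kee can be sketched as follows; this is a minimal sketch assuming a convex-combination reading of the constraint λ + β ≤ 1, and the per-instance score values are hypothetical:

```python
def combined_utility(f, d, h, lam=0.3, beta=0.3):
    """Sketch of Kee's extended utility: a weighted sum of uncertainty f(x),
    diversity d(x), and density h(x), where 0 <= lam, beta <= 1 and
    lam + beta <= 1 control the relative importance of each term."""
    assert 0 <= lam <= 1 and 0 <= beta <= 1 and lam + beta <= 1
    return [(1 - lam - beta) * fi + lam * di + beta * hi
            for fi, di, hi in zip(f, d, h)]

# Hypothetical per-instance scores for three unlabeled examples.
f = [0.9, 0.5, 0.7]  # uncertainty (e.g., committee disagreement)
d = [0.2, 0.8, 0.4]  # diversity (distance to labeled/queried instances)
h = [0.6, 0.3, 0.9]  # density (how representative the instance is)

u = combined_utility(f, d, h)
best = max(range(len(u)), key=u.__getitem__)  # instance selected for labeling
```

An instance with the largest u combines high committee error with high representativeness, which mirrors the "weighted sum of the degree of appearance and the error" mapping above.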
Incorporating the teachings of Kee would boost model performance through inclusion of diversity and density terms for example selection [Kee Abstract]. Claim 1 of the co-pending application does not expressly teach generat[ing] new error calculation examples by oversampling from the training examples, as recited in claim 4 of the instant application. In the same field of endeavor, Nandi teaches a means of re-sampling training examples (“In practice, we can't travel to a parallel universe (...yet) and re-collect this data, but we can simulate this using the bootstrap method. The idea behind bootstrap is simple: If we resample points with replacement from our data, we can treat the re-sampled dataset as a new dataset we collected in a parallel universe. Using the bootstrap method, I can create 2,000 re-sampled datasets from our original data and compute the mean of each of these datasets” [Nandi pages 3-4]) that generates new examples by oversampling from existing examples (“We want to use the general principle of bootstrap to sample with replacement from our minority class, but we want to adjust each re-sampled value to avoid exact duplicates of our original data. This is where the Synthetic Minority Oversampling Technique (SMOTE) algorithm comes in. The SMOTE algorithm can be broken down into four steps: 1. Randomly pick a point from the minority class. 2. Compute the k-nearest neighbors (for some pre-specified k) for this point. 3. Add k new points somewhere between the chosen point and each of its neighbors” [Nandi page 7]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated generat[ing] new examples by oversampling from existing examples as taught by Nandi into the co-pending application because they both are directed towards re-sampling of training examples. 
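The SMOTE steps quoted above from Nandi (pick a minority point, find its k nearest neighbors, synthesize points between the point and its neighbors) can be sketched as follows; the function name and the data are hypothetical, for illustration of the cited technique only:

```python
import math
import random

def smote_like(minority, k=2, n_new=3, seed=0):
    """Sketch of SMOTE-style oversampling per Nandi's steps: repeatedly pick
    a minority-class point, find its k nearest neighbors, and place a
    synthetic point at a random position on the segment between the chosen
    point and one of those neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        neighbors = sorted((q for q in minority if q != p),
                           key=lambda q: math.dist(p, q))[:k]
        q = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor along the p -> q segment
        synthetic.append(tuple(pi + gap * (qi - pi) for pi, qi in zip(p, q)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority)  # three synthetic minority examples
```

Because each synthetic point is a convex combination of two existing minority points, it stays inside the region the minority class already occupies, which is the class-imbalance rationale cited above.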
Given that availability of labeled training data is commonly limited in real-life scenarios, incorporating the teachings of Nandi would address class imbalance in labeled examples and thereby improve representativeness of sampled groups. Claim 1 of the co-pending application does not expressly teach using examples other than the training example as the error calculation examples, as recited in claim 7 of the instant application. In the same field of endeavor, Yang discloses an ensemble learning framework that utilizes sets of labeled and unlabeled data to answer queries (“The second class, retraining free active learning, contains the remaining methods which not need repeatedly train the model for each unlabeled instance during one single selection. For example, uncertainty sampling and query-by-committee belong to this category…We concentrate on the pool-based active learning setting which assumes a large pool of unlabeled data along with a small set of labeled data already available [2].” [Yang page 2647]) that calculat[es] errors using examples other than the training example as the error calculation examples (“Firstly, let us introduce some preliminaries and notation. Let [equation image] represent the training data set that consists of m labeled instances and U be the pool of unlabeled instances” [Yang page 2647 Retraining-Based Active Learning]; “Expected error reduction has demonstrated its effectiveness on text classification domain [8]. There are also some followup work of EER contributed by other researchers [9] [10] [11]. EER aims to select the sample which will reduce the future generalization error. Since we can not see the test data, the unlabeled pool can be used as the validation set to predict the future test error” [Yang page 2647 Expected Error Reduction]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated calculat[ing] errors using examples other than the training example as the error calculation examples as taught by Yang into the co-pending application because they are both directed towards ensemble learning frameworks that utilize sets of labeled and unlabeled data to answer queries. Given that availability of labeled training data is commonly limited in real-life scenarios, incorporating the teachings of Yang would be beneficial in instances where all available labeled data is reserved for initial training of ensemble models. Claim Rejections - 35 USC § 101 35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title. Claims 1-9 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The analysis of the claims will follow the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50 (“2019 PEG”). Independent Claims (Claim 1, Claim 8, Claim 9): Step 1: Claim 1 is drawn to an apparatus, claim 8 is drawn to a method, and claim 9 is drawn to a product. Therefore, each of these claims falls under one of the four categories of statutory subject matter (process/method, machine/apparatus, manufacture/product, or composition of matter). Step 2A Prong 1: Claims 1, 8, and 9 each recite a judicially recognized exception of an abstract idea. 
Claim 1 recites, inter alia: assign labels to the training examples using a teacher model; – This limitation amounts to a human expert (teacher model – see Claim Interpretation above) annotating data based on observation and analysis, and therefore recites a process of evaluation that a human could reasonably perform in the mind or using pen and paper. generate one or more student models using at least a part of the training examples to which the labels are assigned, – This limitation, under a broadest reasonable interpretation, amounts to observing examples, identifying patterns, and constructing rules or algorithms (i.e., a student model – see Claim Interpretation above) to reason about new situations, and therefore recites a process of evaluation that a human could reasonably perform in the mind or using pen and paper. and calculate errors between predictions of the one or more student models and predictions of the teacher model by using error calculation examples different from the part of the training examples used to generate the one or more student models; – This limitation expressly recites a procedure of using mathematical methods (calculat[ing] errors) to quantify difference between variables (predictions), and therefore recites mathematical calculation. extract and output each example for which the error is to be significant based on the calculated errors, from the data retention means – This limitation amounts to observing a list of values and selecting those which exceed a predetermined threshold (error is to be significant) and therefore recites a process of evaluation that a human could reasonably perform in the mind or using pen and paper. Claims 8 and 9 recite substantially similar abstract idea limitations to those recited in claim 1, and therefore recite the same judicial exception. Step 2A Prong 2: The following additional elements recited in claims 1, 8, and 9 also do not integrate the recited judicial exceptions into a practical application. 
Claim 1 additionally recites: An information processing device comprising: a memory storing instructions; and one or more processors configured to execute the instructions to: – This limitation amounts to mere instructions to implement an abstract idea on a computer or computer components. receive training examples formed by features – This limitation amounts to an insignificant pre-solution step of gathering data to enable further analysis, and therefore recites insignificant extra-solution activity. retain examples formed by features in a data retention means – This limitation amounts to an insignificant intermediary step of storing data to enable further analysis, and therefore recites insignificant extra-solution activity. Claims 8 and 9 recite substantially similar additional elements to those recited in claim 1, and therefore also do not integrate the recited judicial exceptions into a practical application. Step 2B: The additional elements recited in claims 1, 8, and 9, viewed individually or as an ordered combination, do not provide an inventive concept or otherwise amount to significantly more than the recited abstract ideas themselves. Claim 1 additionally recites: An information processing device comprising: a memory storing instructions; and one or more processors configured to execute the instructions to: – Mere instructions to implement an abstract idea on a computer or computer components do not provide an inventive concept or significantly more to the recited abstract idea. receive training examples formed by features – Receiving data is well-understood, routine, and conventional activity (see MPEP § 2106.05(d); “Receiving or transmitting data over a network”) and therefore does not provide an inventive concept or significantly more to the recited abstract idea. 
retain examples formed by features in a data retention means – Storing data in memory is well-understood, routine, and conventional activity (see MPEP § 2106.05(d); “Storing and retrieving information in memory”) and therefore does not provide an inventive concept or significantly more to the recited abstract idea. Claims 8 and 9 recite substantially similar additional elements to those recited in claim 1, and therefore also do not provide an inventive concept or otherwise amount to significantly more than the recited abstract ideas. Even when considered as an ordered combination, the additional elements recited in the claims ultimately do no more than add insignificant steps of computer implementation, data gathering, and data storage to an abstract procedure of data observation, pattern recognition, and mathematical calculation. As such, claims 1, 8 and 9 are not patent eligible. Dependent Claims (Claims 2-7): Dependent claims 2-7 narrow the scope of independent claim 1, and likewise narrow the recited judicial exception. They recite abstract idea limitations that are similar to those recited within the independent claims (i.e., mental processes and/or mathematical concepts), and thereby merely expand on the exception at issue. The dependent claims also do not recite any further additional elements that successfully integrate the recited judicial exception into a practical application or provide significantly more than the recited abstract ideas themselves. Consequently, claims 2-7 are also rejected under 35 U.S.C. 101. Step 1: Claims 2-7 are drawn to an apparatus. Therefore, each of these claims falls under one of the four categories of statutory subject matter (process/method, machine/apparatus, manufacture/product, or composition of matter). Step 2A Prong 1: Claims 2-7 each recite a judicially recognized exception of an abstract idea. 
Claim 2 recites, inter alia: selects each example for which the calculated error is significant, extracts each example similar to the selected example from the data retention means, and outputs the extracted example as an example for which the error is predicted to be significant – This limitation amounts to observing a list of values and selecting those which exceed a predetermined threshold (error is to be significant) and additionally recognizing and selecting other values which are “similar” in nature, and therefore recites a process of evaluation that a human could reasonably perform in the mind or using pen and paper. Claim 3 recites, inter alia: calculates a degree of appearance, and determines each error calculation example for which a weighted sum of the degree of appearance and the error as an example for which the error is significant – This limitation amounts to using mathematical methods to determine likelihood/confidence measures (calculate a degree of appearance), sum up values (weighted sum) and compare quantities to a threshold measure (significant error), and therefore recites mathematical calculation. Claim 4 recites, inter alia: generates new error calculation examples by oversampling from the training examples – This limitation amounts to using mathematical methods (oversampling, a statistical random sampling technique) to determine duplicates of existing values, and therefore recites mathematical calculation. Claim 5 recites, inter alia: generates the one or more student models, – Similarly to parent claim 1, this limitation amounts to, under a broadest reasonable interpretation, observing examples, identifying patterns, and constructing rules or algorithms (i.e., a student model – see Claim Interpretation above) to reason about new situations, and therefore recites a process of evaluation that a human could reasonably perform in the mind or using pen and paper. 
and calculates the errors using a remaining part of the training examples as the error calculation examples – Similarly to parent claim 1, this limitation expressly recites a procedure of using mathematical methods on existing values (calculat[ing] errors using a remaining part of the training examples) to quantify difference between variables (predictions), and therefore recites mathematical calculation. Claim 6 recites, inter alia: generates a plurality of sample groups by random sampling with duplicates from the training examples, – This limitation amounts to using mathematical methods (random sampling with duplicates) to determine duplicates of existing values, and therefore recites mathematical calculation. generates the one or more student models using respective sampling groups, – Similarly to parent claim 1, this limitation amounts to, under a broadest reasonable interpretation, observing examples, identifying patterns, and constructing rules or algorithms (i.e., a student model – see Claim Interpretation above) to reason about new situations, and therefore recites a process of evaluation that a human could reasonably perform in the mind or using pen and paper. calculates the errors using, as the error calculation examples, samples included in the training examples but not included in the sample groups for each of the one or more student models, and calculates an average of the errors calculated for the one or more students as the errors with respect to the predictions of the one or more students and the predictions of the teacher model – Similarly to parent claim 1, this limitation expressly recites a procedure of using mathematical methods on existing values (calculat[ing] errors using a remaining part of the training examples, calculat[ing] an average of the errors) to quantify difference between variables (predictions), and therefore recites mathematical calculation. 
Claim 7 recites, inter alia: calculates the errors using examples other than the training example as the error calculation examples – Similarly to parent claim 1, this limitation expressly recites a procedure of using mathematical methods on existing values (calculat[ing] errors on examples other than the training example) to quantify difference between variables (predictions), and therefore recites mathematical calculation. Step 2A Prong 2: Claims 2-7 do not recite any further additional elements besides those recited in the independent claims, and therefore do not integrate the recited judicial exceptions into a practical application. Step 2B: Claims 2-7 do not recite any further additional elements besides those recited in the independent claims, and therefore do not provide an inventive concept or otherwise amount to significantly more than the recited abstract ideas themselves. As such, claims 2-7 also are not patent eligible. Claim Rejections - 35 USC § 102 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention. Claims 1, 5-6, and 8-9 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hady (“Semi-supervised Learning for Regression with Co-training by Committee”, available 2009). 
Regarding claim 1, Hady discloses An information processing device (“In this paper, a semi-supervised regression framework, denoted by CoBCReg is proposed, in which an ensemble of diverse regressors is used for semi-supervised learning that requires neither redundant independent views nor different base learning algorithms. Experimental results show that CoBCReg can effectively exploit unlabeled data to improve the regression estimates” [Hady Abstract]; “An experimental study is conducted to evaluate CoBCReg framework on six data sets described in Table 1…All algorithms are implemented using WEKA library [12]” [Hady pages 126-127 Methodology]; Evaluating the disclosed CoBCReg framework through processing datasets and utilizing open source machine learning libraries inherently relies upon conventional computer implementation (i.e., device comprising memory coupled to at least one processor) to perform necessary functions) comprising: a memory storing instructions; ([Hady pages 126-127 Methodology] as detailed above) and one or more processors configured to execute the instructions ([Hady pages 126-127 Methodology] as detailed above) to: receive training examples formed by features; (“An experimental study is conducted to evaluate CoBCReg framework on six data sets described in Table 1… The input features and the real-valued outputs are scaled to [0, 1]. For each experiment, 5 runs of 4-fold cross-validation have been performed. That is, for each data set, 25% are used as test set, while the remaining 75% are used as training examples where 10% of the training examples are randomly selected as the initial labeled data set L while the remaining 90% of the 75% of data are used as unlabeled data set U” [Hady pages 126-127 Methodology]) assign labels to the training examples using a teacher model; (“For each experiment, 5 runs of 4-fold cross-validation have been performed. 
That is, for each data set, 25% are used as test set, while the remaining 75% are used as training examples where 10% of the training examples are randomly selected as the initial labeled data set L while the remaining 90% of the 75% of data are used as unlabeled data set U” [Hady pages 126-127 Methodology]; Via execution of cross-validation, a component of the disclosed device (i.e., teacher model) accesses stored data to assign a set of labeled training examples L) generate one or more student models using at least a part of the training examples to which the labels are assigned, (“In the experiments, an initial ensemble of four RBF network regressors, N = 4, is constructed by Bagging… Table 2 present the average of the RMSEs of the four RBF Network regressors used in CoBCReg and the RMSE of CoBCReg on the test set at iteration 0 (initial) trained only on the 10% available labeled data L” [Hady page 127 Methodology and Results]; see lines 1-3 of Algorithm 1. COBC for Regression (wherein L is the set of m labeled training examples, N is the number of committee members (ensemble size), and hi is each ensemble member) – [algorithm image] [Hady page 123]; The disclosed algorithm initially generates each RBF network regressor (i.e., student model) from a respective set of labeled training examples Li) and calculate errors between predictions of the one or more student models and predictions of the teacher model by using error calculation examples different from the part of the training examples used to generate the one or more student models; (“For each iteration t and for each ensemble member hi, a set U’ of u examples is drawn randomly from U. 
The SelectRelevantExamples method is applied such that the companion committee Hi (ensemble consists of all members except hi) estimates the output of each unlabeled example in U’” [Hady page 124 Co-training by Committee for Regression (CoBCReg)]; “Then, the root mean squared error (RMSE) of hj is evaluated first (∈j)… It is worth mentioning that the RMSEs ∈j and ∈’j should be estimated accurately. If the training data of hj is used, this will under-estimate the RMSE. Fortunately, since the bootstrap sampling [5] is used to construct the committee, the out-of-bootstrap examples are considered for a more accurate estimate of ∈’j” [Hady pages 125-126 Confidence Measure]; see lines 8-9 of Algorithm 1 – [algorithm image], and lines 2-6 of Algorithm 2. SelectRelevantExamples – [algorithm image] [Hady page 123]; The disclosed algorithm calculates error for out-of-bag examples from validation set Vj (i.e., error calculation examples) by calculating root mean squared error (RMSE) between example label (i.e., predictions of teacher model) and RBF network output (i.e., predictions of student model) – note that examples of Vj are out-of-bag, i.e., separate from bagging examples Li used to initially generate ensemble members, as shown in line 2 of Algorithm 1) retain examples formed by features in a data retention means; (“That is, for each data set, 25% are used as test set, while the remaining 75% are used as training examples where 10% of the training examples are randomly selected as the initial labeled data set L while the remaining 90% of the 75% of data are used as unlabeled data set U” [Hady pages 126-127 Methodology]; All examples, including unlabeled examples, are accessed from the data set and retained by the disclosed device) extract and output each example for which the error is to be significant based on the calculated errors, from the data retention means (“Thus, for each regressor hj 
, create a pool U_ of u unlabeled examples… Finally, the unlabeled example ˜xj which maximizes the relative improvement of the RMSE (Δxu) is selected as the most relevant example labeled by companion committee Hj” [Hady page 125 Confidence Measure]; see lines 7-14 of Algorithm 2 – [algorithm image] [Hady page 123]; Based on measure Δxu, which is determined based on calculated errors (see [equation image] in line 5 of Algorithm 2), the disclosed algorithm returns each example that meets a condition with respect to its impact on the calculated errors (i.e., error is significant)). Regarding claim 5, Hady discloses the limitations of parent claim 1, and further generates the one or more student models, (see lines 1-3 of Algorithm 1. COBC for Regression [Hady page 123] as detailed in claim 1 above; The disclosed algorithm initially generates each RBF network regressor (i.e., student model) from a respective set of labeled training examples Li) and calculates the errors using a remaining part of the training examples as the error calculation examples (see line 2 of Algorithm 1 and lines 2-6 of Algorithm 2 [Hady page 123] as detailed in claim 1 above; Examples of validation set Vj used to calculate errors are out-of-bag, i.e., separate from bagging examples Li used to initially generate ensemble members). 
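The out-of-bootstrap error estimation relied on in the mappings for claims 1 and 5 above can be sketched as follows; the toy base learner and data are hypothetical, and the sketch only illustrates measuring error on examples excluded from each member's bootstrap sample:

```python
import random

def average_oob_rmse(examples, labels, train, predict, n_members=4, seed=0):
    """Sketch of bagging with out-of-bag error estimation: each committee
    member is fit on a bootstrap sample (random sampling with duplicates)
    of the labeled data, its RMSE is measured only on the examples it did
    not see (the out-of-bag set), and the per-member errors are averaged."""
    rng = random.Random(seed)
    n = len(examples)
    member_errors = []
    for _ in range(n_members):
        bag = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        model = train([examples[i] for i in bag], [labels[i] for i in bag])
        oob = [i for i in range(n) if i not in set(bag)]  # held-out indices
        if oob:  # RMSE on the out-of-bag examples only
            se = [(predict(model, examples[i]) - labels[i]) ** 2 for i in oob]
            member_errors.append((sum(se) / len(se)) ** 0.5)
    return sum(member_errors) / len(member_errors)

# Toy base learner: predict the mean of the bagged labels.
train = lambda xs, ys: sum(ys) / len(ys)
predict = lambda model, x: model
avg_rmse = average_oob_rmse(list(range(10)), [float(i) for i in range(10)],
                            train, predict)
```

Because the out-of-bag indices are disjoint from each member's bootstrap sample, the error is computed on "examples different from the part of the training examples used to generate" each member, matching the claim 1 mapping above.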
Regarding claim 6, Hady discloses the limitations of parent claim 1, and further generates a plurality of sample groups by random sampling with duplicates from the training examples (see line 2 of Algorithm 1 – [algorithm image] [Hady page 123]), generates the one or more student models using respective sampling groups (see line 3 of Algorithm 1 – [algorithm image] [Hady page 123]), calculates the errors using, as the error calculation examples, samples included in the training examples but not included in the sample groups for each of the one or more student models (see line 2 of Algorithm 1 and lines 2-6 of Algorithm 2 [Hady page 123] as detailed in claim 1 above; Examples of validation set Vj used to calculate errors are out-of-bag, i.e., separate from bagging examples Li used to initially generate ensemble members), and calculates an average of the errors calculated for the one or more students as the errors with respect to the predictions of the one or more students and the predictions of the teacher model (“Figure 1 shows the RMSE of CoBCReg (CoBCReg), and the average of the RMSEs of the four regressors used in CoBCReg (RBFNNs) at the different SSL iterations. The dash and solid horizontal lines show the average of the RMSEs of the four regressors and the RMSE of the ensemble trained using only the 10% labeled data, respectively, as a baseline for the comparison” [Hady page 217 Results]; see Figure 1 [Hady page 218]). Regarding claims 8 and 9, they are method and product claims that largely correspond to the apparatus of claim 1, which is already disclosed by Hady as detailed above. Consequently, they are rejected for the same reasons as claim 1. Claim Rejections - 35 USC § 103 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 
102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 2-3 are rejected under 35 U.S.C. 103 as being unpatentable over Hady (“Semi-supervised Learning for Regression with Co-training by Committee”, available conference 2009), as applied to claim 1 above, further in view of Kee (“Query-by-committee improvement with diversity and density in batch active learning”, available online 3 May 2018). 
Regarding claim 2, Hady teaches the limitations of parent claim 1, and further teaches select[ing] each example for which the calculated error is significant (see lines 7-14 of Algorithm 2 [Hady page 123] as detailed in claim 1 above; Based on measure Δxu, which is determined based on calculated errors (see [equation image] in line 5 of Algorithm 2), the disclosed algorithm returns each unlabeled example that meets a condition with respect to its impact on the calculated errors (i.e., error is significant)). However, Hady does not expressly teach extract[ing] each example similar to the selected example from the data retention means, and output[ting] the extracted example as an example for which the error is predicted to be significant. In the same field of endeavor, Kee teaches a query-by-committee ensemble learning framework that utilizes sets of labeled and unlabeled data to answer queries (“In this study, we utilize query-by-committee (QBC) for uncertainty and demonstrate that its performance can be improved by introducing diversity and density in instance utility. Test results show that uncertainty sampling by QBC can be significantly improved with diversity and density incorporated in instance selection. Furthermore, we investigate several distance measures for use in diversity and density and show that random forest dissimilarity can be an effective distance measure in batch active learning” [Kee Abstract]; “We describe general BAL procedures discussed in Sections 3.1 to 3.3, namely QBC only (QO), QBC and diversity (QD), and QBC, diversity, and density (QDD) settings, respectively. The overall procedures are similar in BAL scheme, yet show difference in the objective function and instance selection. 
Initially, a set of labeled instances, L, a set of unlabeled instances, U, and a constant batch size, q, are given” [Kee page 405 Batch active learning procedures]) that extracts each unlabeled example similar to the selected labeled example from the data retention means, and outputs the extracted unlabeled example as an example for which the error is predicted to be significant (“One should determine a distance measure to evaluate instance utility introduced by diversity for unlabeled instances x∗. This decision is similar to linkage selection in clustering. A common approach is to take the minimum distance to the labeled instances and the unlabeled instances already in a query [equation image] where dist(x∗, x) is a distance function, L, U and Q are sets of labeled, unlabeled, and previously selected query instances, respectively.” [Kee page 404 Incorporating diversity]; see lines 7-14 in Algorithm 2 QD: Batch active learning with query-by-committee and diversity – [algorithm image] [Kee page 406]; The disclosed algorithm, for each selected labeled example, determines, and outputs via variable assignment (s in line 11), a related unlabeled example with maximal uncertainty (F(x) term in argmax expression) and minimal distance (i.e., similarity) to the labeled example (min(D(x)) term in argmax expression)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated extract[ing] each unlabeled example similar to the selected labeled example from the data retention means, and output[ting] the extracted unlabeled example as an example for which the error is predicted to be significant as taught by Kee into Hady because they are both directed towards ensemble learning frameworks that utilize sets of labeled and unlabeled data to answer queries. 
It is noted that Hady expressly discusses adaptability of the disclosed CoBCReg semi-supervised learning framework to a query by committee active learning environment (“There are many interesting directions for future work…Finally, to enhance the performance of CoBCReg by interleaving it with Query by Committee [8]. Combining semi-supervised learning and active learning within the Co-Training setting has been applied effectively for classification” [Hady page 130 Conclusions and Future Work]), as well as the suitability of additional confidence measures, beyond that disclosed in Hady, for selecting relevant examples (“Third, to explore other confidence measures that are more efficient and effective” [Hady page 130 Conclusions and Future Work]). A person of ordinary skill in the art would thereby recognize the value of incorporating the uncertainty, diversity, and density functions [Kee pages 404-405 Incorporating density and diversity], as taught by Kee, into the CoBCReg framework of Hady by modifying the algorithmic selection of relevant examples to prioritize selection of uncertain / least confident unlabeled examples, as is typical for an active learning framework. Incorporating these teachings would thereby enable utilization of the CoBCReg framework for an active learning training objective as suggested by Hady, and further boost model performance through inclusion of diversity and density terms for example selection [Kee Abstract]. 
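To make the cited selection rule concrete, the QBC-with-diversity batch selection that the examiner attributes to Kee can be sketched as follows. This is a minimal illustration only: the function and parameter names are invented for this sketch, committee prediction variance stands in for Kee's uncertainty term F(x), and Euclidean distance stands in for dist(x∗, x); none of this code appears in the cited references.

```python
import numpy as np

def qbc_diversity_select(committee_preds, X_unlabeled, X_labeled, q, lam=0.5):
    """Greedy batch selection in the spirit of Kee's QD scheme: at each
    step, pick the unlabeled instance maximizing a mix of committee
    disagreement (uncertainty) and minimum distance to the labeled set
    plus the batch selected so far (diversity)."""
    # Uncertainty proxy f(x): variance of committee predictions, normalized.
    f = committee_preds.var(axis=0)
    f = f / (f.max() + 1e-12)

    selected = []  # indices into X_unlabeled (the query batch Q)

    def min_dist(i):
        # Diversity term d(x): minimum Euclidean distance from candidate i
        # to any instance in the labeled set L or the current batch Q.
        pts = X_labeled
        if selected:
            pts = np.vstack([X_labeled, X_unlabeled[selected]])
        return np.linalg.norm(pts - X_unlabeled[i], axis=1).min()

    for _ in range(q):
        best, best_score = None, -np.inf
        for i in range(len(X_unlabeled)):
            if i in selected:
                continue
            score = (1 - lam) * f[i] + lam * min_dist(i)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

A batch chosen this way avoids querying near-duplicate instances, which is the benefit Kee attributes to the diversity term.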
Regarding claim 3, the combination of Hady and Kee teaches the limitations of parent claim 2, and Kee further teaches calculat[ing] a degree of appearance, and determin[ing], based on a weighted sum of the degree of appearance and the error, each error calculation example as an example for which the error is significant (“Unlike diversity which is generally incorporated in BAL, density has been often overlooked, but it is necessary to take into account density to prevent outliers in queries and label more representative instances which may improve uncertainty sampling further. Some studies incorporate density consideration with uncertainty under serial AL setting. One approach is to utilize density estimation (DE) methods. … Both DE and ER approaches assume that instances from denser regions are more informative in estimating the decision boundary, yet how they compare denser regions are different. DE approaches evaluate the information of an instance based on the actual density at the instance, p(x), derived from the underlying probability distribution in the feature space...Introducing a density factor extends Eq. (6) to the following form [equation reproduced as an image in the original] where f(x), d(x), and h(x) are the uncertainty, diversity, and density functions, respectively, and 0 ≤ λ, β ≤ 1 such that λ + β ≤ 1 control the relative importance” [Kee pages 404-405 Incorporating diversity and density]; see line 12 in Algorithm 3 QDD: Batch active learning with query-by-committee, diversity and density [reproduced as an image in the original] [Kee page 407]; a related unlabeled example s may be selected through the adjusted uncertainty measure u(x), which is a sum (weighted through importance terms λ and β) of uncertainty f(x), diversity d(x), and density h(x), wherein density h(x) is a confidence measure (i.e., degree of appearance) drawn from the underlying distribution of the data).

Claim 4 is rejected under 35 U.S.C.
103 as being unpatentable over Hady (“Semi-supervised Learning for Regression with Co-training by Committee”, conference paper, 2009), as applied to claim 1 above, further in view of Nandi (“Sampling Based Methods for Class Imbalance in Datasets”, available online 15 May 2017).

Regarding claim 4, Hady teaches the limitations of parent claim 1. However, Hady does not expressly teach generat[ing] new error calculation examples by oversampling from the training examples.

In the same field of endeavor, Nandi teaches a means of data re-sampling through bootstrapping techniques (“In practice, we can't travel to a parallel universe (...yet) and re-collect this data, but we can simulate this using the bootstrap method. The idea behind bootstrap is simple: If we resample points with replacement from our data, we can treat the re-sampled dataset as a new dataset we collected in a parallel universe. Using the bootstrap method, I can create 2,000 re-sampled datasets from our original data and compute the mean of each of these datasets” [Nandi pages 3-4]) that generates new examples by oversampling from existing examples (“We want to use the general principle of bootstrap to sample with replacement from our minority class, but we want to adjust each re-sampled value to avoid exact duplicates of our original data. This is where the Synthetic Minority Oversampling Technique (SMOTE) algorithm comes in. The SMOTE algorithm can be broken down into four steps: 1. Randomly pick a point from the minority class. 2. Compute the k-nearest neighbors (for some pre-specified k) for this point. 3. Add k new points somewhere between the chosen point and each of its neighbors” [Nandi page 7]).
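The SMOTE steps quoted from Nandi can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions (Euclidean neighbors, one synthetic point per draw rather than k per chosen point); the function name and parameters are invented for this sketch and do not come from Nandi.

```python
import numpy as np

def smote_oversample(X_minority, k=3, n_new=10, seed=0):
    """Generate synthetic minority-class examples following the SMOTE
    steps quoted from Nandi: pick a random minority point, find its k
    nearest neighbors, and interpolate a new point between them."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X))            # step 1: random minority point
        d = np.linalg.norm(X - X[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]  # step 2: its k nearest neighbors
        j = rng.choice(neighbors)
        gap = rng.random()                  # step 3: interpolate between the pair
        synthetic.append(X[i] + gap * (X[j] - X[i]))
    return np.array(synthetic)
```

Because every synthetic point lies on a segment between two existing minority points, the new examples avoid duplicating the originals while staying inside the minority class's region of the feature space.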
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated generat[ing] new examples by oversampling from existing examples, as taught by Nandi, into Hady because both are directed towards data re-sampling through bootstrapping techniques. Hady expressly utilizes bootstrap sampling to generate sampling groups for each ensemble predictor, the sampling groups further including labeled out-of-bag samples for validation error calculation (“Fortunately, since the bootstrap sampling [5] is used to construct the committee, the out-of-bootstrap examples are considered for a more accurate estimate of ∈’j” [Hady page 126]). Given that the availability of labeled training data is commonly limited in real-life scenarios (“For regression tasks, labeling the examples for training is a time consuming, tedious and expensive process” [Hady page 129 Conclusions and Future Work]), incorporating the oversampling taught by Nandi into Hady would address potential class imbalance in labeled examples and thereby improve the representativeness of bootstrap-sampled groups.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Hady (“Semi-supervised Learning for Regression with Co-training by Committee”, conference paper, 2009), as applied to claim 1 above, further in view of Yang (“Active Learning Using Uncertainty Information”, conference paper, 2016).

Regarding claim 7, Hady teaches the limitations of parent claim 1. However, Hady does not expressly teach calculating the errors using examples other than the training example as the error calculation examples. In the same field of endeavor, Yang discloses an ensemble learning framework that utilizes sets of labeled and unlabeled data to answer queries (“The second class, retraining free active learning, contains the remaining methods which not need repeatedly train the model for each unlabeled instance during one single selection.
For example, uncertainty sampling and query-by-committee belong to this category…We concentrate on the pool-based active learning setting which assumes a large pool of unlabeled data along with a small set of labeled data already available [2].” [Yang page 2647]) that calculat[es] errors using examples other than the training example as the error calculation examples (“Firstly, let us introduce some preliminaries and notation. Let [set notation reproduced as an image in the original] represent the training data set that consists of m labeled instances and U be the pool of unlabeled instances” [Yang page 2647 Retraining-Based Active Learning]; “Expected error reduction has demonstrated its effectiveness on text classification domain [8]. There are also some followup work of EER contributed by other researchers [9] [10] [11]. EER aims to select the sample which will reduce the future generalization error. Since we can not see the test data, the unlabeled pool can be used as the validation set to predict the future test error” [Yang page 2647 Expected Error Reduction]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated calculat[ing] errors using examples other than the training example as the error calculation examples, as taught by Yang, into Hady because they are both directed towards ensemble learning frameworks that utilize sets of labeled and unlabeled data to answer queries. Given that the availability of labeled training data is commonly limited in real-life scenarios (“For regression tasks, labeling the examples for training is a time consuming, tedious and expensive process” [Hady page 129 Conclusions and Future Work]), incorporating the teaching of Yang into Hady by modifying the sampling of validation sets to consist of unlabeled examples would be beneficial in instances where all available labeled data is reserved for initial training of the ensemble models.
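Yang's point that "the unlabeled pool can be used as the validation set" can be illustrated with a short sketch. Since the pool has no labels, committee disagreement (prediction variance) is used here as a proxy for the unobservable validation error; this proxy, and the function name, are assumptions of the sketch, not taken from Yang or Hady.

```python
import numpy as np

def pool_validation_error(models, X_pool):
    """Estimate error on examples outside the training set: with no
    labels for the pool, use the committee's prediction variance
    (disagreement) as a proxy for the expected error."""
    preds = np.stack([m(X_pool) for m in models])  # shape (n_models, n_pool)
    return preds.var(axis=0).mean()                # mean disagreement over the pool
```

An ensemble whose members agree everywhere on the pool scores zero; growing disagreement flags regions where additional labeled examples would most reduce error.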
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Hady et al. (“Combining Committee-Based Semi-Supervised Learning and Active Learning”, 2010) discloses two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and active learning.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VIJAY M BALAKRISHNAN, whose telephone number is (571) 272-0455. The examiner can normally be reached 10am-5pm EST Mon-Thurs.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JENNIFER WELCH, can be reached at (571) 272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/V.M.B./ Examiner, Art Unit 2143
/JENNIFER N WELCH/ Supervisory Patent Examiner, Art Unit 2143

Prosecution Timeline

May 16, 2023
Application Filed
Apr 04, 2026
Non-Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585912
GATED LINEAR CONTEXTUAL BANDITS
2y 5m to grant · Granted Mar 24, 2026
Patent 12468967
METHOD AND SYSTEM FOR GENERATING A SOCIO-TECHNICAL DECISION IN RESPONSE TO AN EVENT
2y 5m to grant · Granted Nov 11, 2025
Study what changed to get past this examiner. Based on 2 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
43%
Grant Probability
99%
With Interview (+85.7%)
3y 12m
Median Time to Grant
Low
PTA Risk
Based on 14 resolved cases by this examiner. Grant probability derived from career allow rate.
