DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-7 and 9 have been amended. Claims 1-9 have been examined.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
The following title is suggested: Error Prediction Model Generator
Claim Interpretation
Claims 1, 8, and 9 recite limitations including “the error is predicted to be significant based on the error prediction model.” Although the claims do not recite a precise numerical measure of significance, one of ordinary skill in the art would be apprised of the scope through descriptions in Applicant’s as-filed specification, e.g., at ¶ 0027, which discusses using a regression model to determine significance, and at ¶ 0031, which discusses an average or variance “close to a maximum value” or “greater than a predetermined threshold value.” A broad interpretation is made in view of Applicant’s disclosure. See MPEP 2173.05(b)(I).
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-3 and 5-9 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 6 and 8-9 of copending Application No. 18037149 in view of “Learning Loss for Active Learning” by Yoo et al. (“Yoo”).
In regard to claim 1, 18037149 claims:
1. An information processing device comprising: a memory storing instructions; and one or more processors configured to execute the instructions to: 18037149 claim 1: “An information processing device comprising: a memory storing instructions; and one or more processors configured to execute the instructions to”
accept training examples formed by features; claim 1: “receive training examples formed by features;”
assign labels to the training examples; claim 1: “assign labels to the training examples …”
generate one or more student models using the training examples to which the labels are assigned, and claim 1: “generate one or more student models using at least a part of the training examples to which the labels are assigned,”
calculate errors between predictions of the one or more student models and the labels; claim 1: “calculate errors between predictions of the one or more student models and predictions of the teacher model …”
18037149 does not expressly claim:
generate an error prediction model which is a model for predicting the errors; and This is taught by Yoo Fig. 2, “loss prediction module.” Also Fig. 3, “loss prediction module outputs a predicted loss …” Also section 3.2, “The loss prediction module is core to our task-agnostic active learning since it learns to imitate the loss defined in the target model.” Also section 3.3, “Then, the final loss function to jointly learn both of the target model and the loss prediction module is defined as Ltarget(ŷ, y) + λLloss(l̂, l).” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Yoo’s error prediction model with the student models of 18037149 in order to train a model using more informative data as suggested by Yoo (see 2nd paragraph, right column on p. 94).
18037149 also claims:
output each example for which the error is predicted to be significant based on the error prediction model. Claim 1: “output each example for which the error is to be significant based on the calculated errors.”
In regard to claim 2, 18037149 in view of Yoo also teaches:
2. The information processing device according to claim 1, wherein the processor generates a differentiable error prediction model based on the errors between the predictions of the one or more student models and the labels regarding a plurality of the training examples. Yoo, “After that, we learn the model set over L12K to obtain {Θ1target, Θ1loss}.” Also see Fig. 2, depicting differentiation between iterations of loss prediction.
In regard to claim 3, 18037149 in view of Yoo also teaches:
3. The information processing device according to claim 1, wherein the error prediction model is a regression model, and the processor predicts the errors of the examples based on a slope of the regression model. Yoo, p. 100, bottom left column, “Our loss prediction module predicts the regression loss with about 75% of ranking accuracy (Figure 5), which enables efficient active learning in this problem.” Note that use of a regression model inherently relies upon slope as a defining feature of regression.
In regard to claim 5, 18037149 claims:
5. The information processing device according to claim 1, wherein the processor assigns the labels to the training examples by using a teacher model which is generated using the training examples, and the processor calculates the errors between the labels corresponding to predictions of the teacher model and the predictions of the one or more student models. Claim 1: “calculate errors between predictions of the one or more student models and predictions of the teacher model by using error calculation examples different from the part of the training examples used to generate the one or more student models;”
In regard to claim 6, 18037149 claims:
6. The information processing device according to claim 1, wherein the processor generates the one or more student models using examples corresponding to at least a part of the training examples, and claim 1: “generate one or more student models using at least a part of the training examples.”
calculates the errors using examples different from the examples used to generate the one or more student models. Claim 1: “calculate errors … by using error calculation examples different from the part of the training examples used to generate the one or more student models;”
In regard to claim 7, 18037149 claims:
7. The information processing device according to claim 1, wherein the processor generates a plurality of sampling groups by random sampling with duplicates from the training examples, Claim 6: “wherein the processor generates a plurality of sample groups by random sampling with duplicates from the training examples”
generates the one or more student models using each of the sampling groups, claim 6: “generates the one or more student models using respective sampling groups,”
calculates, for each of the one or more student models, the errors with respect to data which are included in the training examples but not included in the sampling group, and claim 6: “calculates the errors using, as the error calculation examples, samples included in the training examples but not included in the sample groups for each of the one or more student models,”
calculates an average of the errors calculated for the one or more student models. Claim 6: “calculates an average of the errors calculated for the one or more students as the errors with respect to the predictions of the one or more students.”
In regard to claims 8-9:
See 18037149 claims 8-9.
This is a provisional nonstatutory double patenting rejection.
Claim 4 is provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of copending Application No. 18037149 in view of Yoo and U.S. Patent Application Publication 20150170020 by Garimella ("Garimella").
In regard to claim 4, 18037149 in view of Yoo also teaches:
4. The information processing device according to claim 1, wherein the error prediction model is a model which outputs a differentiable average … of the errors; and Yoo, p. 95, bottom right column, “Each feature map is reduced to a fixed dimensional feature vector through a global average pooling (GAP) layer and a fully-connected layer.” p. 98, Fig. 5 and left col., par. 3, “These binary scores from every pair of test sets are averaged to a value named “ranking accuracy”.” Note that Fig. 5 depicts a differentiable average.
18037149 and Yoo do not expressly teach:
… and variance … the processor outputs each example for which the error is predicted to be significant based on at least one of the differentiable average and variance. This is taught by Garimella, ¶ 0018, “One method of training models may use stochastic gradients. In stochastic gradient training, a modification to each parameter of a model may be based on the error in the output produced by the model. A derivative, or "gradient," can be computed that corresponds to the direction in which each individual parameter of the model is to be adjusted in order to improve the model output (e.g., to produce output that is closer to the correct or preferred output for a given input). The average variance per node at the output of the intermediate hidden linear layer may be estimated by assuming a fixed variance at each previous non-linear hidden output.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Garimella’s gradients with the error model of 18037149 and Yoo in order to provide model training as suggested by Garimella.
This is a provisional nonstatutory double patenting rejection.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-3 and 6-9 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by “Learning Loss for Active Learning” by Yoo et al. (“Yoo”).
In regard to claim 1, Yoo discloses:
1. An information processing device comprising: a memory storing instructions; and one or more processors configured to execute the instructions to: Yoo, e.g. section 4, “We have implemented our method and all the recognition tasks with PyTorch [40].” Yoo broadly teaches implementation using conventional technologies and datasets which inherently rely upon conventional implementation using programmed computers.
accept training examples formed by features; Yoo, section 3.1 “h is a feature set of x.” Also Fig. 3, “Given an input …”
assign labels to the training examples; Yoo, Fig. 3, “Given an input, the target model outputs a target prediction …” Also section 3.3, “With the target annotation y of x, …”
generate one or more student models using the training examples to which the labels are assigned, and Yoo, Fig. 3, “… learn the target model.” Also section 3.3, “Given a training data point x, we obtain a target prediction through the target model as yˆ = Θtarget(x),”
calculate errors between predictions of the one or more student models and the labels; Yoo, Fig. 3, “The target prediction and the target annotation are used to compute a target loss …” Also section 3.3, “With the target annotation y of x, the target loss can be computed as l = Ltarget(yˆ, y) to learn the target model.”
generate an error prediction model which is a model for predicting the errors; and Yoo, Fig. 2, “loss prediction module.” Also Fig. 3, “loss prediction module outputs a predicted loss …” Also section 3.2, “The loss prediction module is core to our task-agnostic active learning since it learns to imitate the loss defined in the target model.” Also section 3.3, “Then, the final loss function to jointly learn both of the target model and the loss prediction module is defined as Ltarget(ŷ, y) + λLloss(l̂, l).”
output each example for which the error is predicted to be significant based on the error prediction model. Yoo, section 3.1, “After initial training, we evaluate all the data points in the unlabeled pool by the loss prediction module to obtain data-loss pairs {(x, l̂)|x ∈ U0N-K}. Then, human oracles annotate the data points of the K-highest losses.” Also section 3.3, top left column on p. 97, “This loss prediction module will pick the most informative data points and ask human oracles to annotate them for the next active learning stage s + 1.”
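For illustration only, the jointly trained objective and K-highest selection quoted from Yoo can be sketched as follows. The model functions, data, and λ value below are hypothetical stand-ins, not Yoo’s actual implementation:

```python
import numpy as np

# Hypothetical stand-ins for Yoo's target model and loss prediction module.
def target_model(x):
    return x.mean(axis=-1)          # returns a prediction y_hat for input x

def loss_prediction_module(x):
    return np.abs(x).mean(axis=-1)  # returns a predicted loss l_hat for input x

def joint_loss(y_hat, y, l_hat, l, lam=1.0):
    """Combined objective Ltarget(y_hat, y) + lambda * Lloss(l_hat, l)."""
    target_loss = np.mean((y_hat - y) ** 2)
    loss_pred_loss = np.mean((l_hat - l) ** 2)
    return target_loss + lam * loss_pred_loss

def select_most_informative(unlabeled_pool, k):
    """Rank the unlabeled pool by predicted loss and return the K-highest."""
    predicted_losses = loss_prediction_module(unlabeled_pool)
    return np.argsort(predicted_losses)[-k:]

pool = np.arange(12.0).reshape(6, 2)      # 6 unlabeled examples, 2 features each
print(select_most_informative(pool, 2))   # indices of the 2 highest predicted losses
```

In an active learning stage the selected indices would be sent to human oracles for annotation, as Yoo describes.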
In regard to claim 2, Yoo also discloses:
2. The information processing device according to claim 1, wherein the processor generates a differentiable error prediction model based on the errors between the predictions of the one or more student models and the labels regarding a plurality of the training examples. Yoo, “After that, we learn the model set over L12K to obtain {Θ1target, Θ1loss}.” Also see Fig. 2, depicting differentiation between iterations of loss prediction.
In regard to claim 3, Yoo also discloses:
3. The information processing device according to claim 1, wherein the error prediction model is a regression model, and the processor predicts the errors of the examples based on a slope of the regression model. Yoo, p. 100, bottom left column, “Our loss prediction module predicts the regression loss with about 75% of ranking accuracy (Figure 5), which enables efficient active learning in this problem.” Note that use of a regression model inherently relies upon slope as a defining feature of regression.
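As an illustrative sketch of a regression-based error prediction model of the kind recited in claim 3 (the feature and error values here are hypothetical, not drawn from Yoo), the fitted slope governs the predicted errors:

```python
import numpy as np

# Hypothetical per-example scalar features and their observed errors.
features = np.array([1.0, 2.0, 3.0, 4.0])
errors = np.array([0.5, 1.0, 1.5, 2.0])

# Fit a degree-1 regression model; its slope defines the error prediction.
slope, intercept = np.polyfit(features, errors, deg=1)

def predict_error(x):
    """Predicted error for a new example, driven by the regression slope."""
    return slope * x + intercept

print(predict_error(5.0))  # extrapolated error for a new example
```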
In regard to claim 6, Yoo also discloses:
6. The information processing device according to claim 1, wherein the processor generates the one or more student models using examples corresponding to at least a part of the training examples, and calculates the errors using examples different from the examples used to generate the one or more student models. Yoo, section 3.1 in view of Fig. 1-(b), “After initial training, we evaluate all the data points in the unlabeled pool by the loss prediction module to obtain data-loss pairs {(x, l̂)|x ∈ U0N-K}. Then, human oracles annotate the data points of the K-highest losses. The labeled dataset L02K is updated with them and becomes L12K. After that, we learn the model set over L12K to obtain {Θ1target, Θ1loss}.”
In regard to claim 7, Yoo discloses:
7. The information processing device according to claim 1, wherein the processor generates a plurality of sampling groups by random sampling with duplicates from the training examples, Yoo section 3.1, “Then, we uniformly sample K data points at random from the unlabeled pool.”
generates the one or more student models using each of the sampling groups, Yoo section 3.1 “Once the initially labeled dataset L0K is obtained, we jointly learn an initial target model Θ1target …”
calculates, for each of the one or more student models, the errors with respect to data which are included in the training examples but not included in the sampling group, and Yoo, section 3.1, “After initial training, we evaluate all the data points in the unlabeled pool by the loss prediction module to obtain data-loss pairs …”
calculates an average of the errors calculated for the one or more student models. Yoo, section 4.1 on p. 98, 3rd par. left column, “These binary scores from every pair of test sets are averaged to a value named “ranking accuracy.””
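For illustration only, the claim 7 scheme of random sampling with duplicates, per-group student models, out-of-group errors, and averaging can be sketched as generic bagging with out-of-bag error estimation. The data, student model, and counts below are hypothetical, not the claimed or cited implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                 # hypothetical training examples
y = X @ np.array([1.0, -2.0, 0.5])           # hypothetical labels

n, n_students = len(X), 5
oob_errors = np.full((n_students, n), np.nan)

for s in range(n_students):
    # Random sampling with duplicates (bootstrap) forms one sampling group.
    group = rng.integers(0, n, size=n)
    # "Student" model: a least-squares fit on the sampling group only.
    coef, *_ = np.linalg.lstsq(X[group], y[group], rcond=None)
    # Errors on examples included in the training examples but NOT in this
    # student's sampling group (out-of-bag examples).
    oob = np.setdiff1d(np.arange(n), group)
    oob_errors[s, oob] = np.abs(X[oob] @ coef - y[oob])

# Average the errors calculated for the one or more student models,
# ignoring examples a given student trained on.
avg_error = np.nanmean(oob_errors, axis=0)
```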
In regard to claim 8, Yoo discloses:
8. An information processing method comprising: See Yoo, at least Fig. 3, broadly depicting a method.
All further limitations of claim 8 have been addressed in the above rejection of claim 1.
In regard to claim 9, Yoo discloses:
9. A non-transitory computer-readable recording medium storing a program, the program causing a computer to perform a process comprising: Yoo, e.g. section 4, “We have implemented our method and all the recognition tasks with PyTorch [40].” Yoo broadly teaches implementation using conventional technologies and datasets which inherently rely upon conventional implementation using programmed computers. Note that PyTorch in particular is a deep learning library which requires implementation using computer processors that utilize non-transitory recording media in order to store instructions and data.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Yoo as applied above, and further in view of U.S. Patent Application Publication 20150170020 by Garimella ("Garimella").
In regard to claim 4, Yoo does not expressly disclose:
4. The information processing device according to claim 1, wherein the error prediction model is a model which outputs a differentiable average … of the errors; and Yoo, p. 95, bottom right column, “Each feature map is reduced to a fixed dimensional feature vector through a global average pooling (GAP) layer and a fully-connected layer.” p. 98, Fig. 5 and left col., par. 3, “These binary scores from every pair of test sets are averaged to a value named “ranking accuracy”.” Note that Fig. 5 depicts a differentiable average.
Yoo does not expressly disclose:
… and variance … the processor outputs each example for which the error is predicted to be significant based on at least one of the differentiable average and variance. This is taught by Garimella, ¶ 0018, “One method of training models may use stochastic gradients. In stochastic gradient training, a modification to each parameter of a model may be based on the error in the output produced by the model. A derivative, or "gradient," can be computed that corresponds to the direction in which each individual parameter of the model is to be adjusted in order to improve the model output (e.g., to produce output that is closer to the correct or preferred output for a given input). The average variance per node at the output of the intermediate hidden linear layer may be estimated by assuming a fixed variance at each previous non-linear hidden output.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Garimella’s gradients with Yoo’s error model in order to provide model training as suggested by Garimella.
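As an illustrative sketch of selecting examples whose error is predicted to be significant based on at least one of a differentiable average and variance (the error values and threshold here are hypothetical, not from Yoo or Garimella):

```python
import numpy as np

# Hypothetical per-example errors from three student models (rows).
errors = np.array([
    [0.1, 0.9, 0.2, 1.4],   # student 1
    [0.2, 1.1, 0.1, 1.6],   # student 2
    [0.1, 1.0, 0.3, 1.5],   # student 3
])

avg = errors.mean(axis=0)   # average error per example (differentiable)
var = errors.var(axis=0)    # variance of the errors per example

threshold = 0.5             # hypothetical significance threshold
# Output each example for which at least one of the average and variance
# exceeds the threshold.
significant = np.where((avg > threshold) | (var > threshold))[0]
print(significant)
```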
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Yoo as applied above, and further in view of “Active Generative Adversarial Network for Image Classification” by Kong et al. (“Kong”).
In regard to claim 5, Yoo does not expressly disclose:
5. The information processing device according to claim 1, wherein the processor assigns the labels to the training examples by using a teacher model which is generated using the training examples, and the processor calculates the errors between the labels corresponding to predictions of the teacher model and the predictions of the one or more student models. This is taught by Kong, 1st full par. at upper left on p. 4, “AC-GAN model is used to generate labeled samples.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Kong’s AC-GAN teacher model with Yoo’s target model in order to improve learning performance as suggested by Kong (see p. 2, right column, bottom of 2nd paragraph).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. Patent Application Publication 20190188598 by Ishida et al. teaches thresholds for data selection. See ¶ 0025, “The threshold 10c is used to define a range (magnitude) of an error upon the learning of the error prediction model 14. For example, when ±10% of a true price are used to determine an error as a large error or a small error, ±10% of the true price are used as the threshold 10c.”
U.S. Patent 10963792 to Kim et al. teaches data selection based upon averages and variances. See col. 10, lines 7-15, “if at least part of the variances is equal to or greater than the variance threshold, a process of determining said at least part of the variances as the specific variances, and (iii) a process of acquiring each piece of the specific unlabeled data, corresponding to each of the specific variances, as the sub unlabeled data, and a process of excluding remaining unlabeled data, corresponding to the variances less than the variance threshold, from the training data.” Also Fig. 7 and col. 11 lines 60-64, “… apply the learning operation to the existing labeled data and thus to generate existing output information and may acquire an averaged loss, calculated by averaging the existing losses on the existing output information, as the base loss.”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James D Rutten whose telephone number is (571)272-3703. The examiner can normally be reached M-F 9:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached at (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/James D. Rutten/Primary Examiner, Art Unit 2121