Prosecution Insights
Last updated: April 19, 2026
Application No. 18/174,625

NON-TRANSITORY RECORDING MEDIUM, INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD

Non-Final OA: §101, §103
Filed
Feb 26, 2023
Examiner
TRAN, DAVID HOANG
Art Unit
2147
Tech Center
2100 — Computer Architecture & Software
Assignee
Fujitsu Limited
OA Round
1 (Non-Final)
Grant Probability: 14% (At Risk)
OA Rounds: 1-2
To Grant: 4y 2m
With Interview: 38%

Examiner Intelligence

Grants only 14% of cases.

Career Allow Rate: 14% (2 granted / 14 resolved; -40.7% vs TC avg)
Interview Lift: +23.2% (resolved cases with interview)
Avg Prosecution: 4y 2m (typical timeline)
Currently Pending: 35
Total Applications: 49 (career history, across all art units)

Statute-Specific Performance

§101: 30.4% (-9.6% vs TC avg)
§103: 45.5% (+5.5% vs TC avg)
§102: 9.3% (-30.7% vs TC avg)
§112: 13.3% (-26.7% vs TC avg)

Tech Center averages are estimates. Based on career data from 14 resolved cases.

Office Action

Rejections: §101, §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

Acknowledgment is made of the Information Disclosure Statement dated 02/26/2023. All of the cited references have been considered.

Drawings

The drawings have been received on 02/26/2023. These drawings are accepted.

Specification

The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding Claim 1, Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Step 1 Analysis: Claim 1 is directed to a non-transitory recording medium, i.e., a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitations: “for a classification model for classifying input data into one or another of a plurality of classes that was trained using a first data set, identifying, in a second data set that is different from the first data set, one or more items of data having a specific datum of which a degree of contribution to a change in a classification criterion is greater than a predetermined threshold, the classification criterion being a classification criterion of the classification model during re-training based on the second data set; and” “from among the one or more items of data, detecting an item of data, for which a loss reduces for the classification model by change to the classification criterion by re-training based on the second data set, as an item of data of an unknown class not contained in the plurality of classes.” As drafted, under their broadest reasonable interpretation, these limitations cover concepts performed in the human mind (including an observation, evaluation, judgement, or opinion, e.g., identifying, detecting). The above limitations in the context of this claim encompass, inter alia, identifying items of data and detecting an item of data (corresponding to mental processes which can be done mentally or by pen and paper).

Step 2A Prong Two Analysis: Please see the corresponding analysis of Claim 1.

Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

Regarding Claim 2, Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Step 1 Analysis: Claim 2 is directed to a non-transitory recording medium, i.e., a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitations: “wherein the classification criterion is a weight to identify a decision plane indicating a boundary between classes in the classification model.” As drafted, under their broadest reasonable interpretation, these limitations cover concepts performed in the human mind (including an observation, evaluation, judgement, or opinion, e.g., identifying). The above limitations in the context of this claim encompass, inter alia, identifying items of data (corresponding to mental processes which can be done mentally or by pen and paper).

Step 2A Prong Two Analysis: Please see the corresponding analysis of Claim 1.

Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

Regarding Claim 3, Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Step 1 Analysis: Claim 3 is directed to a non-transitory recording medium, i.e., a machine, one of the statutory categories.

Step 2A Prong One Analysis: The limitations: “wherein a movement distance, when each of the items of data contained in the second data set is moved so as to reduce an update value of the classification criterion of the classification model during re-training based on the second data set, is computed as the degree of contribution.” As drafted, under their broadest reasonable interpretation, these limitations cover concepts performed in the human mind (including an observation, evaluation, judgement, or opinion, e.g., moving). The above limitations in the context of this claim encompass, inter alia, moving items of data (corresponding to mental processes which can be done mentally or by pen and paper).

Step 2A Prong Two Analysis: Please see the corresponding analysis of Claim 1.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

Regarding Claim 4, Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Step 1 Analysis: Claim 4 is directed to a non-transitory recording medium, i.e., a machine, one of the statutory categories.

Step 2A Prong One Analysis: The limitations: “wherein an item of data having a positive increase amount of the loss when each of the one or more items of data is moved in a direction to suppress change of the classification criterion by re-training based on the second data set is detected as the item of data of an unknown class.” As drafted, under their broadest reasonable interpretation, these limitations cover concepts performed in the human mind (including an observation, evaluation, judgement, or opinion, e.g., moving). The above limitations in the context of this claim encompass, inter alia, moving items of data (corresponding to mental processes which can be done mentally or by pen and paper).

Step 2A Prong Two Analysis: Please see the corresponding analysis of Claim 1.

Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

Regarding Claim 5, Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Step 1 Analysis: Claim 5 is directed to a non-transitory recording medium, i.e., a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitations: “when the classification model is a differentiable model, a magnitude of a first gradient of the classification criterion with respect to the loss is computed as the update value, and a magnitude of a second gradient of each of the items of data contained in the second data set with respect to the magnitude of the first gradient is computed as the movement distance.” As drafted, under their broadest reasonable interpretation, these limitations cover concepts performed in the human mind (including an observation, evaluation, judgement, or opinion, e.g., identifying). The above limitations in the context of this claim encompass, inter alia, identifying items of data (corresponding to mental processes which can be done mentally or by pen and paper).

Step 2A Prong Two Analysis: Please see the corresponding analysis of Claim 1.

Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

Regarding Claim 6, Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Step 1 Analysis: Claim 6 is directed to a non-transitory recording medium, i.e., a machine, one of the statutory categories.

Step 2A Prong One Analysis: The limitations: “wherein an item of data having a positive inner product between the second gradient and a third gradient of each of the one or more items of data with respect to the loss is detected as being the item of data of an unknown class.” As drafted, under their broadest reasonable interpretation, these limitations cover concepts performed in the human mind (including an observation, evaluation, judgement, or opinion, e.g., identifying). The above limitations in the context of this claim encompass, inter alia, identifying items of data (corresponding to mental processes which can be done mentally or by pen and paper).
Step 2A Prong Two Analysis: Please see the corresponding analysis of Claim 1.

Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

Regarding Claim 7, Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Step 1 Analysis: Claim 7 is directed to a non-transitory recording medium, i.e., a machine, one of the statutory categories.

Step 2A Prong One Analysis: The limitations: “wherein a correct label is appended to each of the items of data contained in the second data set based on a classification result by the classification model for each of the items of data contained in the second data set, and an error between the classification result by the classification model for each of the items of data contained in the second data set and the correct label is computed as the loss.” As drafted, under their broadest reasonable interpretation, these limitations cover concepts performed in the human mind (including an observation, evaluation, judgement, or opinion, e.g., appending). The above limitations in the context of this claim encompass, inter alia, appending correct labels to items of data (corresponding to mental processes which can be done mentally or by pen and paper).

Step 2A Prong Two Analysis: Please see the corresponding analysis of Claim 1.

Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Regarding Claim 8, Claim 8 recites a device for performing steps similar to those of claim 1 and is rejected with the same rationale, mutatis mutandis, in view of the following additional elements, considered individually and as an ordered combination with the additional elements identified above, failing to integrate the abstract idea into a practical application or amount to significantly more than the abstract idea: “An information processing device comprising:” “a memory; and a processor coupled to the memory, the processor being configured to execute processing, the processing comprising:” This is a recitation of generic computing components to be used in performing the abstract idea, which does not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea. See MPEP 2106.05(f).

Regarding Claim 9, Claim 9 recites a device for performing steps substantially similar to those of claim 2 and is rejected with the same rationale, mutatis mutandis.

Regarding Claim 10, Claim 10 recites a device for performing steps substantially similar to those of claim 3 and is rejected with the same rationale, mutatis mutandis.

Regarding Claim 11, Claim 11 recites a device for performing steps substantially similar to those of claim 4 and is rejected with the same rationale, mutatis mutandis.

Regarding Claim 12, Claim 12 recites a device for performing steps substantially similar to those of claim 5 and is rejected with the same rationale, mutatis mutandis.

Regarding Claim 13, Claim 13 recites a device for performing steps substantially similar to those of claim 6 and is rejected with the same rationale, mutatis mutandis.

Regarding Claim 14, Claim 14 recites a device for performing steps substantially similar to those of claim 7 and is rejected with the same rationale, mutatis mutandis.

Regarding Claim 15, Claim 15 recites a method for performing steps similar to those of claim 1 and is rejected with the same rationale, mutatis mutandis.
Regarding Claim 16, Claim 16 recites a method for performing steps substantially similar to those of claim 2 and is rejected with the same rationale, mutatis mutandis.

Regarding Claim 17, Claim 17 recites a method for performing steps substantially similar to those of claim 3 and is rejected with the same rationale, mutatis mutandis.

Regarding Claim 18, Claim 18 recites a method for performing steps substantially similar to those of claim 4 and is rejected with the same rationale, mutatis mutandis.

Regarding Claim 19, Claim 19 recites a method for performing steps substantially similar to those of claim 5 and is rejected with the same rationale, mutatis mutandis.

Regarding Claim 20, Claim 20 recites a method for performing steps substantially similar to those of claim 6 and is rejected with the same rationale, mutatis mutandis.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 3, 4, 8, 9, 10, 11, 15, 16, 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Koh et al. (Understanding Black-box Predictions via Influence Functions), hereinafter Koh, in view of Zhang et al. (US20200349434A1), hereinafter Zhang, and in further view of Zhu et al. (Confidence-based stopping criteria for active learning for data annotation), hereinafter Zhu.

Claim 1 is rejected over Koh, Zhang and Zhu.
Regarding claim 1, Koh teaches identifying, in a second data set that is different from the first data set, one or more items of data having a specific datum of which a degree of contribution [to a change in a classification criterion is greater than a predetermined threshold,] (“We highlight two key differences from x · x_test. First, σ(−yθᵀx) gives points with high training loss more influence, revealing that outliers can dominate the model parameters. Second, the weighted covariance matrix H_{θ̂}^{−1} measures the “resistance” of the other training points to the removal of z; if ∇_θ L(z, θ̂) points in a direction of little variation, its influence will be higher since moving in that direction will not significantly increase the loss on other training points.”; [Section 2.3. Relation to Euclidean]) the classification criterion being a classification criterion of the classification model during re-training based on the second data set; and (“Since removing a point z is the same as upweighting by ε = −1/n, we can linearly approximate the parameter change due to removing z by computing θ̂_{−z} − θ̂ ≈ −(1/n) I_{up,params}(z)”; [Section 2.1. Upweighting a training point]; and “Influence functions assume that the weight on a training point is changed by an infinitesimally small ε. To investigate the accuracy of using influence functions to approximate the effect of removing a training point and retraining, we compared … while results were noisier, it was still able to identify the most influential points.”; [4.1. Influence functions vs. leave-one-out retraining]) from among the one or more items of data, detecting an item of data, [for which a loss reduces for the classification model by change to the classification criterion by re-training based on the second data set, as an item of data of an unknown class not contained in the plurality of classes.] (“ambiguous or mislabeled training images are effective points to attack, since the model has low confidence and thus high loss on them, making them highly influential (recall Section 2.3). For example, the image in Fig 5 contains both a dog and a fish and is highly ambiguous; as a result, it is the training example that the model is least confident on”; [Section 5.2. Adversarial training examples])

Koh does not appear to explicitly teach a non-transitory recording medium storing a program that causes a computer to execute a process, the process comprising: for a classification model for classifying input data into one or another of a plurality of classes that was trained using a first data set. However, Zhang teaches a non-transitory recording medium storing a program that causes a computer to execute a process, the process comprising: (“in various embodiments, a computer readable storage medium as used herein can include non-transitory and tangible computer readable storage mediums.”; [0089]) for a classification model for classifying input data into one or another of a plurality of classes that was trained using a first data set, (“The training data samples 102 can include data samples that were previously used to train and/or develop a particular ML model (not shown) that resulted in correct or accurate predictions by the ML model (referred to herein as correctly predicted training samples).”; [0039]; and “For example, in various embodiments, the data samples comprise images and the ML model comprises an inferencing model configured to automatically classify the images and/or features in the images.”; [0028])

It would have been obvious before the effective filing date to combine the influential data points of Koh with the test dataset and classification model of Zhang to effectively evaluate new, previously unknown datasets from multiple sources (Zhang, [0025]).
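For orientation only, the Koh approximation quoted above, θ̂_{−z} − θ̂ ≈ −(1/n)·I_{up,params}(z) with I_{up,params}(z) = −H^{−1}∇_θ L(z, θ̂), can be sketched numerically for a small logistic-regression model. This is a minimal illustration, not the applicant's or the reference's code; the function names and the Hessian damping term are assumptions made here for the sketch.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def grad_loss(theta, x, y):
    # Gradient of the logistic log-loss at one point z = (x, y), with y in {-1, +1}.
    return -y * x * sigmoid(-y * (theta @ x))

def hessian(theta, X, damp=1e-3):
    # Average Hessian of the training loss; the small damping term is an
    # assumption added so the linear solve below stays well conditioned.
    p = sigmoid(X @ theta)
    return (X.T * (p * (1.0 - p))) @ X / len(X) + damp * np.eye(X.shape[1])

def removal_param_change(theta, X, Y, i):
    # Koh's linear approximation of removing training point i:
    # theta_hat_{-z} - theta_hat ≈ (1/n) H^{-1} grad L(z, theta_hat).
    g = grad_loss(theta, X[i], Y[i])
    return np.linalg.solve(hessian(theta, X), g) / len(X)
```

Ranking points by the norm of this approximate parameter change is one plausible way to read a "degree of contribution to a change in a classification criterion" onto Koh's influence machinery.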
Koh and Zhang are analogous art because they both concern determining the influence of data points on a trained classification model.

Koh does not appear to explicitly teach [identifying, in a second data set that is different from the first data set, one or more items of data having a specific datum of which a degree of contribution] to a change in a classification criterion is greater than a predetermined threshold. However, Zhu teaches [identifying, in a second data set that is different from the first data set, one or more items of data having a specific datum of which a degree of contribution] to a change in a classification criterion is greater than a predetermined threshold, (“we present a method, called the Threshold Update (TU) strategy, to automatically adjust the predefined threshold of a stopping criterion during the active learning process. This method considers the potential ability of each unlabeled example on changing decision boundaries and checks whether there is any classification label change to the remaining unlabeled examples during two recent consecutive learning cycles (previous and current). It checks whether the active learning becomes stable when the current stopping criterion is satisfied. If not, we believe there are some remaining unlabeled examples that can potentially shift the decision boundaries. In such cases, the threshold of the current stopping criterion can be revised to keep the active learning process going”; page 11)

It would have been obvious before the effective filing date to combine the influential data points of Koh with the classification error of Zhu to improve learning performance (Zhu, page 15). Koh and Zhu are analogous art because they both concern classification models.

Claim 2 is rejected over Koh, Zhang and Zhu with the incorporation of claim 1.
Regarding claim 2, Koh teaches a weight to identify (“Since removing a point z is the same as upweighting by ε = −1/n, we can linearly approximate the parameter change due to removing z by computing θ̂_{−z} − θ̂ ≈ −(1/n) I_{up,params}(z)”; [Section 2.1. Upweighting a training point]).

Koh does not appear to explicitly teach wherein the classification criterion is [a weight to identify] a decision plane indicating a boundary between classes in the classification model. However, Zhang teaches wherein the classification criterion is [a weight to identify] a decision plane indicating a boundary between classes in the classification model. (“the outlier detection model 114 can classify data samples into inliers and outliers by inexplicitly measuring the distance between individual cases from an unseen data sample and the training dataset (e.g., one data sample compared against the entirety of the training data samples) … Another suitable outlier detection method that can be employed by the outlier detection model 114 to detect inliers and outliers from unseen data against the training data set (e.g., including the collection of the training data samples 102) can include the one-class support vector machine (OCSVM) method. The OCSVM method identifies the smallest hypersphere consisting of all the data. Data points that fall inside the hypersphere are considered to be inliers, whereas data points outside the hypersphere are considered to be outliers.”; [0049])

It would have been obvious before the effective filing date to combine the influential data points of Koh with the test dataset and classification model of Zhang to effectively evaluate new, previously unknown datasets from multiple sources (Zhang, [0025]). Koh and Zhang are analogous art because they both concern determining the influence of data points on a trained classification model.

Claim 3 is rejected over Koh, Zhang and Zhu with the incorporation of claim 1.
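To make the OCSVM mechanism Zhang describes concrete (inliers inside the learned boundary, outliers outside), here is a minimal scikit-learn sketch; the data, parameters, and variable names are invented for illustration and are not from the cited reference.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(200, 2))   # stands in for the "first data set"
unseen = np.vstack([
    rng.normal(0.0, 1.0, size=(5, 2)),        # resembles the training data
    rng.normal(6.0, 0.5, size=(5, 2)),        # far from it
])

# Fit the one-class boundary on the training data only, then score unseen
# points: +1 = inlier (inside the boundary), -1 = outlier (outside it).
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(train)
labels = clf.predict(unseen)
```

In Zhang's framing, the −1 points are the unseen-data outliers, i.e., the candidates for data outside the classes the model was trained on.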
Regarding claim 3, Koh teaches wherein a movement distance, when each of the items of data contained in the second data set is moved so as to reduce an update value of the classification criterion of the classification model during re-training based on the second data set, is computed as the degree of contribution. (“We highlight two key differences from x · x_test. First, σ(−yθᵀx) gives points with high training loss more influence, revealing that outliers can dominate the model parameters. Second, the weighted covariance matrix H_{θ̂}^{−1} measures the “resistance” of the other training points to the removal of z; if ∇_θ L(z, θ̂) points in a direction of little variation, its influence will be higher since moving in that direction will not significantly increase the loss on other training points.”; [Section 2.3. Relation to Euclidean])

Claim 4 is rejected over Koh, Zhang and Zhu with the incorporation of claim 1.

Regarding claim 4, Koh teaches wherein an item of data having a positive increase amount of the loss when each of the one or more items of data is moved in a direction to suppress change of the classification criterion by re-training based on the second data set is detected as the item of data of an unknown class. (“ambiguous or mislabeled training images are effective points to attack, since the model has low confidence and thus high loss on them, making them highly influential (recall Section 2.3). For example, the image in Fig 5 contains both a dog and a fish and is highly ambiguous; as a result, it is the training example that the model is least confident on”; [Section 5.2. Adversarial training examples]; and “We highlight two key differences from x · x_test. First, σ(−yθᵀx) gives points with high training loss more influence, revealing that outliers can dominate the model parameters. Second, the weighted covariance matrix H_{θ̂}^{−1} measures the “resistance” of the other training points to the removal of z; if ∇_θ L(z, θ̂) points in a direction of little variation, its influence will be higher since moving in that direction will not significantly increase the loss on other training points.”; [Section 2.3. Relation to Euclidean])

Claim 8 is rejected over Koh, Zhang and Zhu.

Regarding claim 8, Koh does not appear to explicitly teach an information processing device comprising: a memory; and a processor coupled to the memory, the processor being configured to execute processing, the processing comprising: However, Zhang teaches an information processing device comprising: a memory; and a processor coupled to the memory, the processor being configured to execute processing, the processing comprising: (“As used herein, the term “memory” and “memory unit” are interchangeable. Further, one or more embodiments described herein can execute code of the computer executable components in a distributed manner, e.g., multiple processors combining or working cooperatively to execute code from one or more distributed memory units.”; [0093])

It would have been obvious before the effective filing date to combine the influential data points of Koh with the test dataset and classification model on an information processing device of Zhang to effectively evaluate new, previously unknown datasets from multiple sources (Zhang, [0025]). Koh and Zhang are analogous art because they both concern determining the influence of data points on a trained classification model.

The remainder of claim 8 is claim 1 in the form of a processor and is rejected for the same reasons as claim 1 stated above.

Dependent claim 9 is claim 2 in the form of a processor and is rejected for the same reasons as claim 2 stated above. For the rejection of the limitations specifically pertaining to the processor of claim 8, see the rejection of claim 8 above.
Dependent claim 10 is claim 3 in the form of a processor and is rejected for the same reasons as claim 3 stated above. For the rejection of the limitations specifically pertaining to the processor of claim 8, see the rejection of claim 8 above.

Dependent claim 11 is claim 4 in the form of a processor and is rejected for the same reasons as claim 4 stated above. For the rejection of the limitations specifically pertaining to the processor of claim 8, see the rejection of claim 8 above.

Claim 15 is rejected over Koh, Zhang and Zhu.

Regarding claim 15, Koh does not appear to explicitly teach an information processing method comprising: by a processor, However, Zhang teaches an information processing method comprising: by a processor, (“As used herein, the term “memory” and “memory unit” are interchangeable. Further, one or more embodiments described herein can execute code of the computer executable components in a distributed manner, e.g., multiple processors combining or working cooperatively to execute code from one or more distributed memory units.”; [0093])

It would have been obvious before the effective filing date to combine the influential data points of Koh with the test dataset and classification model on an information processing device of Zhang to effectively evaluate new, previously unknown datasets from multiple sources (Zhang, [0025]). Koh and Zhang are analogous art because they both concern determining the influence of data points on a trained classification model.

The remainder of claim 15 is claim 1 in the form of a method and is rejected for the same reasons as claim 1 stated above.

Dependent claim 16 is claim 2 in the form of a method and is rejected for the same reasons as claim 2 stated above. For the rejection of the limitations specifically pertaining to the processor of claim 15, see the rejection of claim 15 above.

Dependent claim 17 is claim 3 in the form of a method and is rejected for the same reasons as claim 3 stated above.
For the rejection of the limitations specifically pertaining to the processor of claim 15, see the rejection of claim 15 above.

Dependent claim 18 is claim 4 in the form of a method and is rejected for the same reasons as claim 4 stated above. For the rejection of the limitations specifically pertaining to the processor of claim 15, see the rejection of claim 15 above.

Dependent claim 19 is claim 5 in the form of a method and is rejected for the same reasons as claim 5 stated above. For the rejection of the limitations specifically pertaining to the processor of claim 15, see the rejection of claim 15 above.

Dependent claim 20 is claim 6 in the form of a method and is rejected for the same reasons as claim 6 stated above. For the rejection of the limitations specifically pertaining to the processor of claim 15, see the rejection of claim 15 above.

Claims 5, 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Koh, Zhang and Zhu in view of Pruthi et al. (Estimating Training Data Influence by Tracing Gradient Descent), hereinafter Pruthi.

Claim 5 is rejected over Koh, Zhang, Zhu and Pruthi with the incorporation of claim 1.

Regarding claim 5, Koh does not teach wherein, when the classification model is a differentiable model, a magnitude of a first gradient of the classification criterion with respect to the loss is computed as the update value, and a magnitude of a second gradient of each of the items of data contained in the second data set with respect to the magnitude of the first gradient is computed as the movement distance.
However, Pruthi teaches wherein, when the classification model is a differentiable model, a magnitude of a first gradient of the classification criterion with respect to the loss is computed as the update value, and (“Since the step-sizes used in updating the parameters in the training process are typically quite small, we can approximate the change in the loss of a test example in a given iteration t via a simple first-order approximation: ℓ(w_{t+1}, z′) = ℓ(w_t, z′) + ∇ℓ(w_t, z′)·(w_{t+1} − w_t) + O(‖w_{t+1} − w_t‖²). Here, the gradient is with respect to the parameters and is evaluated at w_t. Now, if stochastic gradient descent is utilized in training the model, using the training point z_t at iteration t, then the change in parameters is w_{t+1} − w_t = −η_t ∇ℓ(w_t, z_t), where η_t is the step size in iteration t.”; Section 3.2. First-order Approximation to Idealized Influence, and Extension to Minibatches) a magnitude of a second gradient of each of the items of data contained in the second data set with respect to the magnitude of the first gradient is computed as the movement distance. (“To handle minibatches of size b ≥ 1, we compute the influence of a minibatch on the test point z, mimicking the derivation in Section 3.1, and then take its first-order approximation: First-Order Approximation … , because the gradient for the minibatch B_t is … . Then, for each training point z ∈ B_t, we attribute the … portion of the influence B_t on the test point z′”; Section 3.2. First-order Approximation to Idealized Influence, and Extension to Minibatches)

It would have been obvious before the effective filing date to combine the influential data points of Koh with the gradients of Pruthi to improve the analysis of the training data (Pruthi, Section 1. Motivation). Koh and Pruthi are analogous art because they both concern data influence.

Dependent claim 12 is claim 5 in the form of a processor and is rejected for the same reasons as claim 5 stated above.
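For a single SGD step, the first-order bookkeeping in the Pruthi passage reduces to a dot product of gradients: substituting w_{t+1} − w_t = −η_t ∇ℓ(w_t, z_t) into the Taylor expansion gives a test-loss change of about −η_t ∇ℓ(w_t, z_t)·∇ℓ(w_t, z′). A minimal sketch of that quantity for a linear model with log-loss follows; the model choice and function names are assumptions for illustration only.

```python
import numpy as np

def grad_logloss(w, x, y):
    # Gradient of log(1 + exp(-y * w.x)) with respect to w, y in {-1, +1}.
    return -y * x / (1.0 + np.exp(y * (w @ x)))

def step_influence(w, z_train, z_test, eta):
    """Pruthi's first-order approximation for one SGD step: the step on
    z_train changes the test loss by about -eta * <grad l(w, z_train),
    grad l(w, z_test)>, so the *reduction* in test loss credited to
    z_train is eta times that gradient inner product."""
    x, y = z_train
    xt, yt = z_test
    return eta * (grad_logloss(w, x, y) @ grad_logloss(w, xt, yt))
```

A positive value means the training point's gradient points the same way as the test point's, i.e., the step helps the test example; note this is the same gradient-inner-product structure the claim-6 rejection reads onto Shi.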
For the rejection of the limitations specifically pertaining to the processor of claim 8, see the rejection of claim 8 above.

Dependent claim 19 is claim 5 in the form of a method and is rejected for the same reasons as claim 5 stated above. For the rejection of the limitations specifically pertaining to the processor of claim 15, see the rejection of claim 15 above.

Claims 6, 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Koh, Zhang, Zhu and Pruthi in further view of Shi et al. (Gradient Matching for Domain Generalization); hereinafter Shi.

Claim 6 is rejected over Koh, Zhang, Zhu, Pruthi and Shi with the incorporation of claim 1.

Regarding claim 6, Koh does not teach wherein an item of data having a positive inner product between the second gradient and a third gradient of each of the one or more items of data with respect to the loss is detected as being the item of data of an unknown class.

However, Shi teaches wherein an item of data having a positive inner product between the second gradient and a third gradient of each of the one or more items of data with respect to the loss is detected as being the item of data of an unknown class. ("we propose an inter-domain gradient matching objective that targets domain generalization by maximizing the inner product between gradients from different domains. Since direct optimization of the gradient inner product can be computationally prohibitive — it requires computation of second-order derivatives — we derive a simpler first-order algorithm named Fish that approximates its optimization."; Abstract; and "The goal of domain generalization is to train models that performs well on unseen, out-of-distribution data, which is crucial in practice for model deployment in the wild."; Section 1, Introduction, Figure 1)

It would have been obvious before the effective filing date to combine the influential data points of Koh with the inner product of Shi to improve model performance (Shi, page 5).
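The positive-inner-product test recited in claim 6 can be sketched as follows. This is a hypothetical illustration in the spirit of Shi's gradient-matching objective, not the claimed detection step or the Fish algorithm itself; the reference gradient and the per-item gradients below are made-up values.

```python
# Sketch of a gradient inner-product test: items whose loss gradient has a
# positive inner product with a reference gradient are flagged (the claim
# treats such items as unknown-class data).
import numpy as np

def detect_by_inner_product(reference_grad, item_grads):
    """Return indices of items whose gradient points 'with' the reference
    gradient, i.e. has a strictly positive inner product with it."""
    return [i for i, g in enumerate(item_grads)
            if float(reference_grad @ g) > 0.0]

ref = np.array([1.0, -1.0])                      # hypothetical "second gradient"
items = [np.array([2.0, 0.5]),                   # inner product  1.5 -> flagged
         np.array([-1.0, 0.0]),                  # inner product -1.0
         np.array([0.0, 1.0])]                   # inner product -1.0
flagged = detect_by_inner_product(ref, items)
```

Only the first item has a positive inner product with the reference direction, so it alone is flagged.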
Koh and Shi are analogous art because they both concern unseen data.

Dependent claim 13 is claim 6 in the form of a processor and is rejected for the same reasons as claim 6 stated above. For the rejection of the limitations specifically pertaining to the processor of claim 8, see the rejection of claim 8 above.

Dependent claim 20 is claim 6 in the form of a method and is rejected for the same reasons as claim 6 stated above. For the rejection of the limitations specifically pertaining to the processor of claim 15, see the rejection of claim 15 above.

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Koh, Zhang and Zhu in further view of Lee et al. (Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks); hereinafter Lee.

Claim 7 is rejected over Koh, Zhang, Zhu and Lee with the incorporation of claim 1.

Regarding claim 7, Koh does not teach wherein a correct label is appended to each of the items of data contained in the second data set based on a classification result by the classification model for each of the items of data contained in the second data set, and an error between the classification result by the classification model for each of the items of data contained in the second data set and the correct label is computed as the loss.

However, Lee teaches wherein a correct label is appended to each of the items of data contained in the second data set based on a classification result by the classification model for each of the items of data contained in the second data set, ("For unlabeled data, Pseudo-Labels, just picking up the class which has the maximum predicted probability every weights update, are used as if they were true labels."; Section 1, Introduction)

and an error between the classification result by the classification model for each of the items of data contained in the second data set and the correct label is computed as the loss.
("The whole network can be trained by minimizing supervised loss function [Equation 4] where C is the number of labels, yi is the 1-of-K code of the label, fi is the network output for the i'th label, x is input vector."; Section 2.1, Deep Neural Networks)

It would have been obvious before the effective filing date to combine the influential data points of Koh with optimizing training of a classifier with unlabeled data of Lee to provide a "simpler way of training neural network in a semi-supervised fashion" by increasing the amount of "labeled" data (Lee, Section 1, Introduction). Koh and Lee are analogous art because they both concern classification.

Dependent claim 14 is claim 7 in the form of a processor and is rejected for the same reasons as claim 7 stated above. For the rejection of the limitations specifically pertaining to the processor of claim 8, see the rejection of claim 8 above.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID H TRAN whose telephone number is (703) 756-1525. The examiner can normally be reached M-F 9:30 am - 5:30 pm.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Viker Lamardo, can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DAVID H TRAN/
Examiner, Art Unit 2147

/VIKER A LAMARDO/
Supervisory Patent Examiner, Art Unit 2147
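For reference, the pseudo-labeling scheme quoted from Lee in the rejection of claims 7 and 14 above can be sketched as follows. The probability values and helper names are hypothetical; the sketch only illustrates the argmax-as-label and cross-entropy-as-loss reading of the claim, not Lee's full training procedure.

```python
# Sketch of pseudo-labeling per Lee: for unlabeled data, the class with the
# maximum predicted probability is used as if it were the true label, and
# the ordinary cross-entropy loss is computed against that pseudo-label.
import numpy as np

def pseudo_label(probs):
    """Pick the argmax class for each row of predicted probabilities."""
    return np.argmax(probs, axis=1)

def cross_entropy(probs, labels):
    """Mean cross-entropy between predictions and (pseudo-)labels."""
    n = probs.shape[0]
    return float(-np.mean(np.log(probs[np.arange(n), labels] + 1e-12)))

probs = np.array([[0.7, 0.2, 0.1],     # hypothetical model outputs for
                  [0.1, 0.1, 0.8]])    # two unlabeled items
labels = pseudo_label(probs)           # argmax classes become the "labels"
loss = cross_entropy(probs, labels)    # error against the pseudo-labels
```

Because the pseudo-label is the model's own most confident class, the resulting loss term pushes the network toward sharper, more confident predictions on the unlabeled set.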

Prosecution Timeline

Feb 26, 2023
Application Filed
Jan 30, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579404
PROCESSOR FOR NEURAL NETWORK, PROCESSING METHOD FOR NEURAL NETWORK, AND NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM
2y 5m to grant Granted Mar 17, 2026
Study what changed to get past this examiner, based on the most recent grant.


Prosecution Projections

1-2
Expected OA Rounds
14%
Grant Probability
38%
With Interview (+23.2%)
4y 2m
Median Time to Grant
Low
PTA Risk
Based on 14 resolved cases by this examiner. Grant probability derived from career allow rate.
