Prosecution Insights
Last updated: April 19, 2026
Application No. 18/184,428

MACHINE LEARNING MODEL TRAINING USING FEATURE SPACE ANALYSIS

Status: Non-Final OA (§101, §103)
Filed: Mar 15, 2023
Examiner: GRUSZKA, DANIEL PATRICK
Art Unit: 2121
Tech Center: 2100 (Computer Architecture & Software)
Assignee: The Aerospace Corporation
OA Round: 1 (Non-Final)
Grant Probability: Favorable
Predicted OA Rounds: 1-2
Predicted Time to Grant: 3y 3m

Examiner Intelligence

Career Allow Rate: 0% (0 granted / 0 resolved; -55.0% vs TC average)
Interview Lift: +0.0% (minimal; resolved cases with interview vs. without)
Avg Prosecution: 3y 3m (typical timeline)
Career History: 32 total applications across all art units, 32 currently pending

Statute-Specific Performance

§101: 38.3% (-1.7% vs TC avg)
§103: 42.3% (+2.3% vs TC avg)
§102: 12.0% (-28.0% vs TC avg)
§112: 7.4% (-32.6% vs TC avg)
Deltas are relative to Tech Center average estimates • Based on career data from 0 resolved cases

Office Action

Rejections under §101 and §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification

The abstract of the disclosure is objected to because it is less than 50 words. A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. See MPEP § 608.01(b).

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.

101 Subject Matter Eligibility Analysis

Step 1: Claims 1-20 fall within the four statutory categories (a process, machine, manufacture, or composition of matter). Claims 1-6 and 17-20 describe a machine, and claims 7-16 describe a process.

With respect to claim 1:

Step 2A Prong 1: The claim recites an abstract idea enumerated in the 2019 PEG:

evaluate a first training data input item … to determine a first feature space point in a feature space; (This is an abstract idea of a "Mental Process." The "evaluate" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind. The evaluation could be made manually by an individual.)

generate first training output data based on the first feature space point, wherein the first training output data represents a first class of a plurality of classes; (This is an abstract idea of a "Mental Process." The "generate" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind. The generation could be made manually by an individual.)
evaluate feature space data regarding a plurality of feature space points generated from evaluating at least a subset of the plurality of training data input vectors; (This is an abstract idea of a "Mental Process." The "evaluate" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind. The evaluation could be made manually by an individual.)

determine, based on results of evaluating the feature space data, that a separability criterion is not satisfied; (This is an abstract idea of a "Mental Process." The "determine" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind. The determination could be made manually by an individual.)

modify a structure of the artificial neural network; and (This is an abstract idea of a "Mental Process." The "modify" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind. The modification could be made manually by an individual.)

Step 2A Prong 2: The judicial exception is not integrated into a practical application. Additional elements:

obtain a corpus of training data comprising a plurality of training data input vectors and a plurality of reference data output vectors, wherein a reference data output vector of the plurality of reference data output vectors represents a desired output generated by an artificial neural network from a corresponding training data input vector of the plurality of training data input vectors; (This limitation amounts to adding insignificant extra-solution activity to the judicial exception.)

execute a first training epoch to train the artificial neural network using the corpus of training data (This amounts to no more than mere instructions to “apply” the exception using a generic computer component.)
using the artificial neural network (This amounts to no more than mere instructions to “apply” the exception using a generic computer component.)

execute a second training epoch of the artificial neural network. (This amounts to no more than mere instructions to “apply” the exception using a generic computer component.)

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element “obtain a corpus of training data…” adds insignificant extra-solution activity to the judicial exception and cannot provide an inventive concept. Storing and retrieving information in memory is a well-understood, routine, conventional activity of data transmission (MPEP 2106.05(d)(II)(iv)). The additional elements “execute a first training epoch…”, “using the artificial neural network”, and “execute a second training epoch…” are recited at a generic level and represent generic computer components used to apply the abstract idea. Mere instructions to apply an exception cannot provide an inventive concept (MPEP 2106.05(f)). When considered in combination, these additional elements represent insignificant extra-solution activity and mere instructions to apply an exception, which do not provide an inventive concept. Therefore, claim 1 is ineligible.

With respect to claim 2:

Step 2A Prong 1: Claim 2, which incorporates the rejection of claim 1, does not recite an additional abstract idea.

Step 2A Prong 2: The judicial exception is not integrated into a practical application.

to modify the structure of the artificial neural network, the one or more processors are further programmed by the executable instructions to add a layer to the artificial neural network or add a node to the layer of the artificial neural network. (This limitation amounts to adding insignificant extra-solution activity to the judicial exception.)
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element adds insignificant extra-solution activity to the judicial exception and cannot provide an inventive concept. Storing and retrieving information in memory is a well-understood, routine, conventional activity of data transmission (MPEP 2106.05(d)(II)(iv)). Therefore, claim 2 is ineligible.

With respect to claim 3:

Step 2A Prong 1: Claim 3, which incorporates the rejection of claim 1, recites an additional abstract idea:

identify a first feature space point cluster comprising a subset of the plurality of feature space points, wherein the first feature space point cluster is associated with the first class; and (This is an abstract idea of a "Mental Process." The "identify" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind. The identification could be made manually by an individual.)

determine that the first feature space point cluster is less than a threshold distance from a second feature space point cluster associated with a second class of the plurality of classes. (This is an abstract idea of a "Mental Process." The "determine" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind. The determination could be made manually by an individual.)

Step 2A Prong 2: Claim 3 does not recite any additional elements, so the judicial exception is not integrated into a practical application.

Step 2B: Claim 3 does not recite an additional element. Therefore, claim 3 is ineligible.
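Claims 3 and 4 recite comparing a distance between two feature space point clusters against a threshold, with claim 4 naming a Bhattacharyya distance, a Mahalanobis distance, or a Wasserstein metric. As a purely illustrative sketch (hypothetical code, not taken from the application or the cited art), the Mahalanobis variant of such a cluster-separability test might look like the following; the 2-D restriction, pooled covariance, and threshold convention are all assumptions made for the example.

```python
# Hypothetical sketch of the cluster-distance test recited in claims 3-4:
# compute a (squared) Mahalanobis distance between two feature space point
# clusters and compare it against a threshold. 2-D only, stdlib only, with
# the pooled 2x2 covariance inverted by hand to keep the sketch self-contained.

def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def pooled_covariance(a, b):
    # Pooled 2x2 sample covariance of the two clusters.
    def scatter(points, m):
        s = [[0.0, 0.0], [0.0, 0.0]]
        for p in points:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s
    sa, sb = scatter(a, mean(a)), scatter(b, mean(b))
    n = len(a) + len(b) - 2
    return [[(sa[i][j] + sb[i][j]) / n for j in range(2)] for i in range(2)]

def mahalanobis_sq(a, b):
    # Squared Mahalanobis distance between the two cluster means.
    ma, mb = mean(a), mean(b)
    c = pooled_covariance(a, b)
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    inv = [[c[1][1] / det, -c[0][1] / det],
           [-c[1][0] / det, c[0][0] / det]]
    d = [ma[0] - mb[0], ma[1] - mb[1]]
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))

def separability_satisfied(cluster_a, cluster_b, threshold):
    # Claim 3's test inverted: clusters "less than a threshold distance"
    # apart are too close, so the separability criterion fails.
    return mahalanobis_sq(cluster_a, cluster_b) >= threshold
```

Two tight clusters far apart yield a large distance and pass the check, while overlapping clusters fail it. A Bhattacharyya distance or Wasserstein metric could be substituted for `mahalanobis_sq` without changing the thresholding structure.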
With respect to claim 4:

Step 2A Prong 1: Claim 4, which incorporates the rejection of claim 3, recites an additional abstract idea:

determine a distance of the first feature space point cluster from the second feature space point cluster, wherein the distance comprises one of: a Bhattacharyya distance, a Mahalanobis distance, or a Wasserstein metric. (This is an abstract idea of a "Mathematical Concept." The recited Bhattacharyya distance, Mahalanobis distance, and Wasserstein metric are metrics that fall under the "mathematical concepts" grouping.)

Step 2A Prong 2: Claim 4 does not recite any additional elements, so the judicial exception is not integrated into a practical application.

Step 2B: Claim 4 does not recite an additional element. Therefore, claim 4 is ineligible.

With respect to claim 5:

Step 2A Prong 1: Claim 5, which incorporates the rejection of claim 1, recites an additional abstract idea:

determine, based on results of evaluating the feature space data, that a convergence criterion is satisfied. (This is an abstract idea of a "Mental Process." The "determine" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind. The determination could be made manually by an individual.)

Step 2A Prong 2: Claim 5 does not recite any additional elements, so the judicial exception is not integrated into a practical application.

Step 2B: Claim 5 does not recite an additional element. Therefore, claim 5 is ineligible.

With respect to claim 6:

Step 2A Prong 1: Claim 6, which incorporates the rejection of claim 1, recites an additional abstract idea:

determine, based on results of evaluating the feature space data, that a convergence criterion is not satisfied. (This is an abstract idea of a "Mental Process." The "determine" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind. The determination could be made manually by an individual.)
Step 2A Prong 2: Claim 6 does not recite any additional elements, so the judicial exception is not integrated into a practical application.

Step 2B: Claim 6 does not recite an additional element. Therefore, claim 6 is ineligible.

With respect to claim 7: The claim recites limitations similar to those of claim 1. The same subject matter eligibility analysis applied to claim 1, as described above, is equally applicable to claim 7. Therefore, claim 7 is ineligible.

With respect to claim 8: The claim recites limitations similar to those of claim 6. The same subject matter eligibility analysis applied to claim 6, as described above, is equally applicable to claim 8. Therefore, claim 8 is ineligible.

With respect to claim 9: The claim recites limitations similar to those of claim 5. The same subject matter eligibility analysis applied to claim 5, as described above, is equally applicable to claim 9. Therefore, claim 9 is ineligible.

With respect to claim 10: The claim recites limitations similar to those of claim 3. The same subject matter eligibility analysis applied to claim 3, as described above, is equally applicable to claim 10. Therefore, claim 10 is ineligible.

With respect to claim 11: The claim recites limitations similar to those of claim 4. The same subject matter eligibility analysis applied to claim 4, as described above, is equally applicable to claim 11. Therefore, claim 11 is ineligible.

With respect to claim 12:

Step 2A Prong 1: Claim 12, which incorporates the rejection of claim 7, does not recite an additional abstract idea.

Step 2A Prong 2: The judicial exception is not integrated into a practical application.

obtaining a supplemental corpus of training data; and (This limitation amounts to adding insignificant extra-solution activity to the judicial exception.)

executing a training epoch using the supplemental corpus of training data.
(This amounts to no more than mere instructions to “apply” the exception using a generic computer component.)

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element “obtaining…” adds insignificant extra-solution activity to the judicial exception and cannot provide an inventive concept. Storing and retrieving information in memory is a well-understood, routine, conventional activity of data transmission (MPEP 2106.05(d)(II)(iv)). The additional element “executing a training…” is recited at a generic level and represents a generic computer component used to apply the abstract idea. Mere instructions to apply an exception cannot provide an inventive concept (MPEP 2106.05(f)). Therefore, claim 12 is ineligible.

With respect to claim 13:

Step 2A Prong 1: Claim 13, which incorporates the rejection of claim 7, does not recite an additional abstract idea.

Step 2A Prong 2: The judicial exception is not integrated into a practical application.

modifying the training of the artificial neural network comprises reinitializing at least a subset of parameters of the artificial neural network. (This amounts to no more than mere instructions to “apply” the exception using a generic computer component.)

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element is recited at a generic level and represents a generic computer component used to apply the abstract idea. Mere instructions to apply an exception cannot provide an inventive concept (MPEP 2106.05(f)). Therefore, claim 13 is ineligible.

With respect to claim 14:

Step 2A Prong 1: Claim 14, which incorporates the rejection of claim 7, does not recite an additional abstract idea.

Step 2A Prong 2: The judicial exception is not integrated into a practical application.
modifying the training of the artificial neural network comprises modifying a loss function, used to update parameters of the artificial neural network, to adjust loss function output associated with a subset of the plurality of feature space points. (This amounts to no more than mere instructions to “apply” the exception using a generic computer component.)

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element is recited at a generic level and represents a generic computer component used to apply the abstract idea. Mere instructions to apply an exception cannot provide an inventive concept (MPEP 2106.05(f)). Therefore, claim 14 is ineligible.

With respect to claim 15:

Step 2A Prong 1: Claim 15, which incorporates the rejection of claim 7, does not recite an additional abstract idea.

Step 2A Prong 2: The judicial exception is not integrated into a practical application.

adding a layer to an artificial neural network; removing the layer from the artificial neural network; adding a node to an existing layer of the artificial neural network; removing a node from the existing layer of the artificial neural network; changing a type of a node; or adjusting a hyperparameter. (This limitation amounts to adding insignificant extra-solution activity to the judicial exception.)

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element adds insignificant extra-solution activity to the judicial exception and cannot provide an inventive concept. Storing and retrieving information in memory is a well-understood, routine, conventional activity of data transmission (MPEP 2106.05(d)(II)(iv)). Therefore, claim 15 is ineligible.

With respect to claim 16:

Step 2A Prong 1: Claim 16, which incorporates the rejection of claim 7, does not recite an additional abstract idea.
Step 2A Prong 2: The judicial exception is not integrated into a practical application.

modifying the training of the artificial neural network comprises generating a kernel for the artificial neural network, wherein the kernel is configured to evaluate a subset of the plurality of feature space points. (This amounts to no more than mere instructions to “apply” the exception using a generic computer component.)

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element is recited at a generic level and represents a generic computer component used to apply the abstract idea. Mere instructions to apply an exception cannot provide an inventive concept (MPEP 2106.05(f)). Therefore, claim 16 is ineligible.

With respect to claim 17: The claim recites limitations similar to those of claim 1. The same subject matter eligibility analysis applied to claim 1, as described above, is equally applicable to claim 17. Therefore, claim 17 is ineligible.

With respect to claim 18: The claim recites limitations similar to those of claims 2 and 3. The same subject matter eligibility analysis applied to claims 2 and 3, as described above, is equally applicable to claim 18. Therefore, claim 18 is ineligible.

With respect to claim 19: The claim recites limitations similar to those of claim 14. The same subject matter eligibility analysis applied to claim 14, as described above, is equally applicable to claim 19. Therefore, claim 19 is ineligible.

With respect to claim 20: The claim recites limitations similar to those of claim 2. The same subject matter eligibility analysis applied to claim 2, as described above, is equally applicable to claim 20. Therefore, claim 20 is ineligible.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C.
102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-10, 12, 14-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kansizoglou (NPL: ‘Deep Feature Space: A Geometrical Perspective’) in view of Srivastava (NPL: ‘Dropout: a simple way to prevent neural networks from overfitting’).

Regarding claim 1, Kansizoglou teaches:

A system comprising: computer-readable memory storing executable instructions; and one or more processors programmed by the executable instructions to at least: (The Abstract and Section IV. Experiments imply they had hardware to perform the experiments.)

obtain a corpus of training data comprising a plurality of training data input vectors and a plurality of reference data output vectors, wherein a reference data output vector of the plurality of reference data output vectors represents a desired output generated by an artificial neural network from a corresponding training data input vector of the plurality of training data input vectors; (Section IV. Experiments A.
Datasets describes their training datasets.)

execute a first training epoch to train the artificial neural network using the corpus of training data, wherein to train the artificial neural network, the one or more processors are programmed to: (In Section IV. Experiments we can see that they execute multiple training epochs.)

evaluate a first training data input item using the artificial neural network to determine a first feature space point in a feature space; and (Section IV. Experiments: “In other words, for n ∈ N∗ number of points, we end up with n feature vectors in F [feature space] classified by the hyperplanes defined by the weights of the last layer.”)

generate first training output data based on the first feature space point, wherein the first training output data represents a first class of a plurality of classes; (Section III. Method: “The term deep feature vector will be employed to describe the whole output of the penultimate layer, thus referring to a vector that captures the input’s properties as quantified by the entire DNN and forming a global descriptor of the input data”)

evaluate feature space data regarding a plurality of feature space points generated from evaluating at least a subset of the plurality of training data input vectors; (Section III. Method D. Metric describes evaluating the features.)

determine, based on results of evaluating the feature space data, that a separability criterion is not satisfied; (Section III. Method, subsection 2) Simple Case, describes the separation line which is used to separate features; Fig. 1 and Fig. 2 illustrate this. Also, subsection D. Metrics defines in detail their separability metric, which is used to see if overfitting occurs. This implies that the separability metric is used as a criterion for overfitting.)

Kansizoglou does not teach: modify a structure of the artificial neural network; and execute a second training epoch of the artificial neural network.
However, Srivastava does:

modify a structure of the artificial neural network; and (Page 3: “The term “dropout” refers to dropping out units (hidden and visible) in a neural network. By dropping a unit out, we mean temporarily removing it from the network, along with all its incoming and outgoing connections, as shown in Figure 1.”)

execute a second training epoch of the artificial neural network. (Figure 4 shows multiple weight updates, implying multiple training epochs.)

Kansizoglou and Srivastava are considered analogous art to the claimed invention because they are in the same field of endeavor, neural network training. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the feature analysis of Kansizoglou with the neural network modification process of Srivastava: Kansizoglou describes how to detect overfitting (Section III. Method D. Metric), and Srivastava aims to prevent overfitting with their modifications.

Regarding claim 2, Kansizoglou in view of Srivastava teaches claim 1 as outlined above. Srivastava further teaches:

to modify the structure of the artificial neural network, the one or more processors are further programmed by the executable instructions to add a layer to the artificial neural network or add a node to the layer of the artificial neural network. (Page 3: “The term “dropout” refers to dropping out units (hidden and visible) in a neural network. By dropping a unit out, we mean temporarily removing it from the network, along with all its incoming and outgoing connections, as shown in Figure 1.” Sometimes they remove parts of the model, and sometimes they add parts to the model.)

Regarding claim 3, Kansizoglou in view of Srivastava teaches claim 1 as outlined above.
Kansizoglou further teaches:

identify a first feature space point cluster comprising a subset of the plurality of feature space points, wherein the first feature space point cluster is associated with the first class; and (Page 7, Separability metric, describes using the feature points and finding a distance between them.)

determine that the first feature space point cluster is less than a threshold distance from a second feature space point cluster associated with a second class of the plurality of classes. (Page 7, Separability metric, describes using the feature points and finding a distance between them.)

Regarding claim 5, Kansizoglou in view of Srivastava teaches claim 1 as outlined above. Kansizoglou further teaches:

determine, based on results of evaluating the feature space data, that a convergence criterion is satisfied. (Fig. 11 shows the accuracy of the model over multiple epochs; in view of the specification, the convergence criterion is based on how accurate the model is. Fig. 11 also graphs the loss of the model. They use a Softmax loss function (the Introduction mentions this), implying some sort of convergence. Also, in Section III. Method, subsection 1) Preface, they say: “In any other case, either the angle should initially be fixed to minimize the calculated loss, or the loss is too small and no further variation is required.”)

Regarding claim 6, Kansizoglou in view of Srivastava teaches claim 1 as outlined above. Kansizoglou further teaches:

determine, based on results of evaluating the feature space data, that a convergence criterion is not satisfied. (Fig. 11 shows the accuracy of the model over multiple epochs; in view of the specification, the convergence criterion is based on how accurate the model is. Fig. 11 also graphs the loss of the model. They use a Softmax loss function (the Introduction mentions this), implying some sort of convergence.
Also, in Section III. Method, subsection 1) Preface, they say: “In any other case, either the angle should initially be fixed to minimize the calculated loss, or the loss is too small and no further variation is required.”)

Regarding claim 7, Kansizoglou teaches:

A computer-implemented method comprising: under control of a computing system comprising one or more processors configured to execute specific instructions, (The Abstract and Section IV. Experiments imply they had hardware to perform the experiments.)

initiating training of an artificial neural network using a corpus of training data comprising a plurality of training data input items and a plurality of reference data output items, wherein training the artificial neural network comprises: (Section IV. Experiments A. Datasets describes the training datasets used to train the model.)

evaluating a first training data input item using the artificial neural network to determine a first feature space point in a feature space; and (Section IV. Experiments: “In other words, for n ∈ N∗ number of points, we end up with n feature vectors in F [feature space] classified by the hyperplanes defined by the weights of the last layer.”)

generating first training output data based on the first feature space point, wherein the first training output data represents a first class of a plurality of classes; (Section III. Method: “The term deep feature vector will be employed to describe the whole output of the penultimate layer, thus referring to a vector that captures the input’s properties as quantified by the entire DNN and forming a global descriptor of the input data”)

evaluating feature space data regarding a plurality of feature space points generated from evaluating at least a subset of the plurality of training data input items; (Section III. Method D. Metric describes evaluating the features.)

determining, based on results of evaluating the feature space data, that a separability criterion is not satisfied; and (Section III.
Method, subsection 2) Simple Case, describes the separation line which is used to separate features; Fig. 1 and Fig. 2 illustrate this. Also, subsection D. Metrics defines in detail their separability metric, which is used to see if overfitting occurs. This implies that the separability metric is used as a criterion for overfitting.)

Kansizoglou does not fully teach modifying training of the artificial neural network based on the separability criterion not being satisfied. However, Srivastava teaches the parts Kansizoglou lacks:

modifying training of the artificial neural network based on the separability criterion not being satisfied. (Kansizoglou teaches an alternative training based on a criterion not being satisfied, Section V. Conclusion: “It has been shown that in cases of low centrality and separability values in the unimodal feature extractors, an alternative training strategy should be considered”. Srivastava then teaches modifying the training of the artificial neural network more explicitly, Page 3: “The term “dropout” refers to dropping out units (hidden and visible) in a neural network. By dropping a unit out, we mean temporarily removing it from the network, along with all its incoming and outgoing connections, as shown in Figure 1.”)

Kansizoglou and Srivastava are considered analogous art to the claimed invention because they are in the same field of endeavor, neural network training. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the feature analysis of Kansizoglou with the neural network modification process of Srivastava: Kansizoglou describes how to detect overfitting (Section III. Method D. Metric), and Srivastava aims to prevent overfitting with their modifications.

Regarding claim 8, Kansizoglou in view of Srivastava teaches claim 7 as outlined above.
Kansizoglou further teaches:

determining that a convergence criterion is not satisfied, wherein modifying the training of the artificial neural network is further based on the convergence criterion not being satisfied. (Fig. 11 shows the accuracy of the model over multiple epochs; in view of the specification, the convergence criterion is based on how accurate the model is. Fig. 11 also graphs the loss of the model. They use a Softmax loss function (the Introduction mentions this), implying some sort of convergence. Also, in Section III. Method, subsection 1) Preface, they say: “In any other case, either the angle should initially be fixed to minimize the calculated loss, or the loss is too small and no further variation is required.”)

Regarding claim 9, Kansizoglou in view of Srivastava teaches claim 7 as outlined above. Kansizoglou further teaches:

determining that a convergence criterion is satisfied, wherein modifying the training of the artificial neural network is further based on the convergence criterion being satisfied. (Fig. 11 shows the accuracy of the model over multiple epochs; in view of the specification, the convergence criterion is based on how accurate the model is. Fig. 11 also graphs the loss of the model. They use a Softmax loss function (the Introduction mentions this), implying some sort of convergence. Also, in Section III. Method, subsection 1) Preface, they say: “In any other case, either the angle should initially be fixed to minimize the calculated loss, or the loss is too small and no further variation is required.”)

Regarding claim 10, Kansizoglou in view of Srivastava teaches claim 7 as outlined above.
Kansizoglou further teaches:

identifying a first feature space point cluster comprising a subset of the plurality of feature space points, wherein the first feature space point cluster is associated with a first class of the plurality of classes; and (Page 7, Separability metric, describes using the feature points and finding a distance between them.)

determining that the first feature space point cluster is less than a threshold distance from a second feature space point cluster associated with a second class of the plurality of classes. (Page 7, Separability metric, describes using the feature points and finding a distance between them.)

Regarding claim 12, Kansizoglou in view of Srivastava teaches claim 7 as outlined above. Kansizoglou further teaches:

obtaining a supplemental corpus of training data; and executing a training epoch using the supplemental corpus of training data. (Section IV. Experiments A. Datasets describes the training datasets used to train the model. They could use any of the datasets to train the model a second time.)

Regarding claim 14, Kansizoglou in view of Srivastava teaches claim 7 as outlined above. Srivastava further teaches:

modifying the training of the artificial neural network comprises modifying a loss function, used to update parameters of the artificial neural network, to adjust loss function output associated with a subset of the plurality of feature space points. (Page 6: “In dropout, we minimize the loss function stochastically under a noise distribution. This can be seen as minimizing an expected loss function. Previous work of Globerson and Roweis (2006); Dekel et al. (2010) explored an alternate setting where the loss is minimized when an adversary gets to pick which units to drop”)

Regarding claim 15, Kansizoglou in view of Srivastava teaches claim 7 as outlined above.
Srivastava further teaches: adding a layer to an artificial neural network; removing the layer from the artificial neural network; adding a node to an existing layer of the artificial neural network; removing a node from the existing layer of the artificial neural network; changing a type of a node; or adjusting a hyperparameter. (Page 3: "The term 'dropout' refers to dropping out units (hidden and visible) in a neural network. By dropping a unit out, we mean temporarily removing it from the network, along with all its incoming and outgoing connections, as shown in Figure 1.")

Regarding claim 16, Kansizoglou in view of Srivastava teaches claim 7 as outlined above. Kansizoglou further teaches: modifying the training of the artificial neural network comprises generating a kernel for the artificial neural network, wherein the kernel is configured to evaluate a subset of the plurality of feature space points. (Based on the specification, it seems the kernel is used to classify new groups that do not fit into other groups. Kansizoglou performs classification of features, as described in Section III. Method, A. Feature Space Division.)

Regarding claim 17, A system comprising: computer-readable memory storing a corpus of training data comprising a plurality of training data input items and a plurality of reference data output items; and one or more processors programmed by executable instructions to at least: (The Abstract and Section IV. Experiments imply hardware was used to perform the experiments) initiate training of a machine learning model using the corpus of training data, wherein to train the machine learning model, the one or more processors are further programmed by the executable instructions to: (Section IV. Experiments, A. Datasets, describes the training datasets used to train the model) evaluate a first training data input item using the machine learning model to determine a first feature space point in a feature space; and (Section IV.
Experiments: "In other words, for n ∈ N∗ number of points, we end up with n feature vectors in F [feature space] classified by the hyperplanes defined by the weights of the last layer.") generate first training output data based on the first feature space point; (Section III. Method: "The term deep feature vector will be employed to describe the whole output of the penultimate layer, thus referring to a vector that captures the input's properties as quantified by the entire DNN and forming a global descriptor of the input data") evaluate feature space data regarding a plurality of feature space points generated from evaluating at least a subset of the plurality of training data input items; (Section III. Method, D. Metric, describes evaluating the features) determine, based on results of evaluating the feature space data, that a separability criterion is not satisfied; and (Section III. Method, subsection 2) Simple Case, describes the separation line used to separate features; Fig. 1 and Fig. 2 illustrate this. Subsection D. Metrics also defines in detail the separability metric used to detect overfitting, implying that the separability metric serves as a criterion for overfitting.)

Kansizoglou does not fully teach modifying training of the artificial neural network based on the separability criterion not being satisfied. However, Srivastava teaches what Kansizoglou lacks: modifying training of the artificial neural network based on the separability criterion not being satisfied. (Kansizoglou teaches an alternative training strategy based on a criterion not being satisfied; Section V. Conclusion: "It has been shown that in cases of low centrality and separability values in the unimodal feature extractors, an alternative training strategy should be considered".
Then Srivastava teaches modifying the training of the artificial neural network more explicitly; Page 3: "The term 'dropout' refers to dropping out units (hidden and visible) in a neural network. By dropping a unit out, we mean temporarily removing it from the network, along with all its incoming and outgoing connections, as shown in Figure 1.")

Kansizoglou and Srivastava are considered analogous art to the claimed invention because they are in the same field of endeavor, neural network training. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the feature analysis of Kansizoglou with the neural network modification process of Srivastava: Kansizoglou describes detecting overfitting (Section III. Method, D. Metric), and Srivastava aims to prevent overfitting with its modifications.

Regarding claim 19, Kansizoglou in view of Srivastava teaches claim 17 as outlined above. Srivastava further teaches: modify a loss function, used to update parameters of the machine learning model, to adjust loss function output associated with a subset of the plurality of feature space points. (Page 6: "In dropout, we minimize the loss function stochastically under a noise distribution. This can be seen as minimizing an expected loss function. Previous work of Globerson and Roweis (2006); Dekel et al. (2010) explored an alternate setting where the loss is minimized when an adversary gets to pick which units to drop")

Regarding claim 20, Kansizoglou in view of Srivastava teaches claim 17 as outlined above. Srivastava further teaches: add a layer to the machine learning model, wherein the machine learning model is an artificial neural network. (Page 3: "The term 'dropout' refers to dropping out units (hidden and visible) in a neural network.
By dropping a unit out, we mean temporarily removing it from the network, along with all its incoming and outgoing connections, as shown in Figure 1." Sometimes parts of the model are removed, and sometimes parts are added back.)

Claims 4, 11, 13 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Kansizoglou in view of Srivastava and Khanna (US 2022/0083901 A1).

Regarding claim 4, Kansizoglou in view of Srivastava teaches claim 1 as outlined above. Neither teaches: determine a distance of the first feature space point cluster from the second feature space point cluster, wherein the distance comprises one of: a Bhattacharyya distance, a Mahalanobis distance, or a Wasserstein metric. However, Khanna does ([0032] "In one embodiment, the measure of distance among feature vectors in multivariate space is represented by the Mahalanobis distance. Those skilled in the art will appreciate various other distance metrics may be used, including, but not limited to Wasserstein distance, Bhattacharyya distance, Kolmogorov-Smirnov statistic, Energy distance, Lubaszyk-Karmowski metric and f-divergences, such as Kullback-Leibler divergence.")

Kansizoglou, Srivastava and Khanna are considered analogous art to the claimed invention because they are in the same field of endeavor, neural network training. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the feature analysis of Kansizoglou with the neural network modification process of Srivastava and the distance metrics of Khanna. One would want to do this for better statistical modeling (Khanna [0032]).

Regarding claim 11, Kansizoglou in view of Srivastava teaches claim 7 as outlined above. Neither teaches: determining a distance of the first feature space point cluster from the second feature space point cluster, wherein the distance comprises one of: a Bhattacharyya distance, a Mahalanobis distance, or a Wasserstein metric.
However, Khanna does ([0032] "In one embodiment, the measure of distance among feature vectors in multivariate space is represented by the Mahalanobis distance. Those skilled in the art will appreciate various other distance metrics may be used, including, but not limited to Wasserstein distance, Bhattacharyya distance, Kolmogorov-Smirnov statistic, Energy distance, Lubaszyk-Karmowski metric and f-divergences, such as Kullback-Leibler divergence.")

Kansizoglou, Srivastava and Khanna are considered analogous art to the claimed invention because they are in the same field of endeavor, neural network training. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the feature analysis of Kansizoglou with the neural network modification process of Srivastava and the distance metrics of Khanna. One would want to do this for better statistical modeling (Khanna [0032]).

Regarding claim 13, Kansizoglou in view of Srivastava teaches claim 7 as outlined above. Khanna further teaches: modifying the training of the artificial neural network comprises reinitializing at least a subset of parameters of the artificial neural network. ([0077] "The machine learning model training system 302 may receive feedback from the machine learning model or other sources and may retrain the model by sending a machine learning parameter update." A parameter update could comprise new parameters.)

Kansizoglou, Srivastava and Khanna are considered analogous art to the claimed invention because they are in the same field of endeavor, neural network training. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the feature analysis of Kansizoglou with the neural network modification process of Srivastava and the parameter update of Khanna. One would want to do this to achieve a more accurate neural network (Khanna [0077]).
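The parameter-reinitialization reading of Khanna's paragraph [0077] above can be sketched as follows. This is a hypothetical illustration, not Khanna's implementation: the flat list-of-weights representation, index selection, and initializer range are all assumed for the example.

```python
import random

def reinitialize_subset(params, indices, rng=None):
    """Reinitialize only the parameters at `indices`, leaving the
    rest of the trained parameters untouched."""
    rng = rng or random.Random(42)
    out = list(params)
    for i in indices:
        out[i] = rng.uniform(-0.1, 0.1)  # fresh small random weight
    return out

# Hypothetical trained weights; positions 1 and 3 get reinitialized.
trained = [0.8, -1.2, 0.5, 2.0]
updated = reinitialize_subset(trained, indices=[1, 3])
print(updated[0], updated[2])  # 0.8 0.5 (untouched parameters survive)
```

The point of the sketch is the "subset" limitation: only some parameters receive new values, so prior training is partially preserved.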
Regarding claim 18, Kansizoglou in view of Srivastava teaches claim 17 as outlined above. Kansizoglou further teaches: identify a first feature space point cluster comprising a subset of the plurality of feature space points, wherein the first feature space point cluster is associated with a first class of a plurality of classes; (Page 7, Separability metric, describes using the feature points and finding a distance between them) determine that the first feature space point cluster is less than a threshold distance from a second feature space point cluster associated with a second class of the plurality of classes; and (Page 7, Separability metric, describes using the feature points and finding a distance between them)

Neither Kansizoglou nor Srivastava teaches: determine a distance of the first feature space point cluster from the second feature space point cluster, wherein the distance comprises one of: a Bhattacharyya distance, a Mahalanobis distance, or a Wasserstein metric. However, Khanna does ([0032] "In one embodiment, the measure of distance among feature vectors in multivariate space is represented by the Mahalanobis distance. Those skilled in the art will appreciate various other distance metrics may be used, including, but not limited to Wasserstein distance, Bhattacharyya distance, Kolmogorov-Smirnov statistic, Energy distance, Lubaszyk-Karmowski metric and f-divergences, such as Kullback-Leibler divergence.")

Kansizoglou, Srivastava and Khanna are considered analogous art to the claimed invention because they are in the same field of endeavor, neural network training. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the feature analysis of Kansizoglou with the neural network modification process of Srivastava and the distance metrics of Khanna. One would want to do this for better statistical modeling (Khanna [0032]).
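For reference, two of the claimed distance metrics cited from Khanna [0032] can be sketched directly. These are textbook formulas (a diagonal-covariance Mahalanobis distance and the Bhattacharyya distance between two univariate Gaussians), not implementations from any of the cited references.

```python
import math

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance of point `x` from a distribution with
    per-dimension `mean` and variance `var` (diagonal covariance)."""
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, mean, var)))

def bhattacharyya_gauss_1d(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two univariate Gaussians."""
    return (0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
            + 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2))))

print(mahalanobis_diag((2.0, 0.0), (0.0, 0.0), (1.0, 1.0)))  # 2.0
print(bhattacharyya_gauss_1d(0.0, 1.0, 0.0, 1.0))            # 0.0
```

Either value could serve as the claimed cluster-to-cluster distance; the Bhattacharyya distance is zero only when the two distributions coincide, which makes it a natural separability measure.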
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL PATRICK GRUSZKA, whose telephone number is (571) 272-5259. The examiner can normally be reached M-F, 9:00 AM - 6:00 PM ET.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Li Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DANIEL GRUSZKA/
Examiner, Art Unit 2121

/James D. Rutten/
Primary Examiner, Art Unit 2121

Prosecution Timeline

Mar 15, 2023: Application Filed
Jan 06, 2026: Non-Final Rejection (§101, §103)
Mar 25, 2026: Examiner Interview Summary
Mar 25, 2026: Applicant Interview (Telephonic)


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: Favorable
Median Time to Grant: 3y 3m
PTA Risk: Low

Based on 0 resolved cases by this examiner. Grant probability derived from career allow rate.
