DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant argues that the Step 2A, Prong One analysis is conclusory (Remarks, pgs. 2-4). The rejection is maintained because the claim limitations recite mathematical/statistical calculations and evaluative determinations that constitute an abstract idea. Specifically, the claim recites “determining respective values of an acquisition function,” “determining an autocorrelation,” and “wherein the respective value… is determined as a function of a root mean square using the autocorrelation.” These limitations are expressly directed to computing numeric quantities using mathematical relationships (acquisition function values, autocorrelation, and root mean square (RMS)), which constitute a mathematical concept.
Further, the claim recites that autocorrelation is determined “via a plurality of feature representations of various layers of the model,” which amounts to evaluating information derived from feature representations to determine a correlation measure. This is an evaluation/comparison-based determination (i.e., selecting/assessing samples based on computed scores), which also falls within mental processes when claimed at this level of generality. The claim therefore recites an abstract idea under Step 2A, Prong One.
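For reference, the mathematical character of the recited operations is apparent from their generic textbook forms. The following is an illustrative sketch only (generic definitions, with the scoring function f and lag count L as hypothetical placeholders; this is not the claimed formulation):

```latex
% Sample autocorrelation of a sequence x_1, ..., x_N at lag \tau
% (generic textbook definition; illustrative only):
\hat{R}(\tau) = \frac{1}{N} \sum_{t=1}^{N-\tau} x_t \, x_{t+\tau}
% An acquisition-function value "determined as a function of a root
% mean square using the autocorrelation" then has the generic shape
a(x) = f\!\left( \sqrt{ \frac{1}{L} \sum_{\tau=1}^{L} \hat{R}(\tau)^{2} } \right)
% where f and L are hypothetical placeholders, not recited claim elements.
```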
Applicant’s arguments under Step 2A, Prong Two are not persuasive (Remarks, pgs. 4-8). The claim’s alleged “additional elements” do not integrate the judicial exception into a practical application, because they amount to (i) obtaining/organizing data for use with the abstract calculations, (ii) training a model using the selected data, and (iii) using the resulting model output to perform a result-oriented action. In particular, “providing annotated data” and “acquiring, from the unannotated data, those… whose values… satisfy a criterion” merely recite data gathering and data selection to feed the abstract scoring/correlation computations, which does not constitute integration into a practical application. Further, “training a model for a classification of data as a function of the annotated data” is itself generic machine-learning information processing performed on data using the results of the abstract acquisition-function/autocorrelation/RMS computations. Finally, the recitation of “controlling a physical operation of a vehicle or a robot based on an output signal from the trained model” is claimed at a high level of generality and merely uses the output of the abstract model to perform an intended result without reciting any particular improvement to the operation of a vehicle/robot, any specific control architecture, or any specific manner of achieving improved control. As claimed, these additional elements therefore amount to no more than applying the abstract idea in a conventional machine-learning workflow and using the result, which does not integrate the exception into a practical application under MPEP 2106.04(d).
Applicant’s reliance on statements in the specification regarding alleged technological improvements is also unavailing because the claims do not recite any concrete mechanism that effects an improvement in computer functionality or in vehicle/robot control. While the specification may discuss the desirability of “reliable uncertainties” in safety-critical settings, the claims do not require any particular uncertainty estimation technique beyond the recited abstract mathematical/statistical computations, nor do they impose any specific control-loop constraints, actuator/sensor handling, stability/safety logic, or other technical implementation details that would reflect an improvement to a technological process. Rather, the claims recite computing acquisition-function values using autocorrelation and RMS and then using those computed values to select data and train a classifier, followed by a results-based “controlling” step. Such field-of-use context or post-result usage of an abstract output does not render the claim integrated into a practical application where the claim fails to tie the abstract calculations to a specific technological improvement. To the extent the prior Office Action described “and controlling a physical operation of vehicle or a robot based on an output signal from the trained model” as directed to insignificant extra-solution activity under MPEP 2106.05(g), it is now more properly characterized as a field-of-use limitation under MPEP 2106.05(h).
Applicant’s arguments under Step 2B are likewise not persuasive because the claim, considered as an ordered combination, does not include additional elements that amount to significantly more than the judicial exception (Remarks, pg. 9). The remaining limitations outside the abstract mathematical/statistical computations recite generic data provision/selection, generic training of a classification model, and generic output-based action (control) performed using the model’s output. These features are conventional components of implementing abstract analysis and decision logic on a general-purpose computing system, and they merely apply the abstract idea using routine machine-learning operations without adding any specialized hardware, non-conventional control technique, or other inventive concept. Accordingly, the claim does not recite an inventive concept under Step 2B and the rejection under 35 U.S.C. 101 is maintained.
Applicant’s argument that the Office “misidentifies” equation (4) of Stumpf as a root mean square is not persuasive (Remarks, pg. 10). Although Stumpf characterizes equation (4) as a standard deviation (sd) metric, a standard deviation is computed as the square root of an average of squared terms, i.e., of the squared deviations from a mean value. Accordingly, Stumpf’s equation (4) teaches computing a dispersion statistic having a root-mean-square mathematical structure (an RMS-type statistic), and therefore meets the claim requirement that the acquisition function value is determined as a function of a root mean square. The claims do not require that the RMS be taken over raw data values without centering, nor do they exclude RMS-type statistics computed over deviations; thus, Stumpf’s sd computation remains within the scope of the claimed “function of a root mean square.” To the extent the prior Office Action described Stumpf’s sd as “a root mean square,” the rejection is clarified as follows: Stumpf’s sd of equation (4) is an RMS-type statistic (the square root of an average of squared deviations), and the claim requires only that the acquisition function value be determined as a function of such a root-mean-square computation.
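A worked comparison makes the structural point explicit. The following generic forms (illustrative notation; not Stumpf’s equation (4) verbatim) show that a standard deviation is the root mean square of mean-centered values:

```latex
% RMS of raw values x_1, ..., x_N:
\mathrm{RMS}(x) = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} x_i^{2} }
% Standard deviation: the same root-mean-square structure applied to
% deviations from the mean \bar{x}:
\sigma = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^{2} }
       = \mathrm{RMS}(x - \bar{x})
```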
Applicant’s argument that Stumpf fails to disclose “wherein samples of the unannotated data whose root mean square exceeds a threshold value are acquired” is not persuasive (Remarks, pg. 10). Stumpf teaches computing the dispersion statistic and using it as a basis for selection (e.g., selecting the region/candidate batch associated with the highest dispersion/standard deviation), which necessarily involves comparing the computed value against a selection criterion. Such a selection criterion encompasses implementations based on exceeding a threshold value, including a fixed threshold, a dynamic threshold derived from the distribution of candidate scores, or an equivalent thresholding procedure implementing “select the highest” or “select those satisfying a criterion.” At minimum, implementing Stumpf’s selection using a threshold comparison is a routine and predictable design choice for operationalizing score-based selection, and would have been obvious to a POSITA.
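As a minimal illustration of this equivalence (hypothetical code and variable names; not taken from Stumpf), “select the highest” can be implemented as a threshold comparison in which the threshold is derived from the candidate score distribution itself:

```python
# Minimal sketch (hypothetical): "select the highest" expressed as a
# threshold comparison over candidate dispersion scores.
scores = {"batch_a": 0.42, "batch_b": 0.91, "batch_c": 0.63}

# Dynamic threshold: the second-highest score. Any candidate whose
# score exceeds this threshold is acquired.
second_highest = sorted(scores.values())[-2]
acquired = [name for name, s in scores.items() if s > second_highest]

# With distinct scores this yields exactly the highest-scoring
# candidate, i.e., the same selection as an argmax query function.
assert acquired == [max(scores, key=scores.get)]
```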
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Regarding claim 1 and analogous claims 14 and 15:
Step 1: is the claim directed to one of the four statutory categories?
Claim 1 is directed to a method, claim 14 is directed to a machine, and claim 15 is directed to a non-transitory computer-readable storage medium.
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes. The limitations: “determining respective values of an acquisition function for unannotated data;” and “determining an autocorrelation using a respective feature representation for each sample from the unannotated data to be assessed for the acquiring step,” and “wherein the respective value of the acquisition function of each sample is determined as a function of a root mean square using the autocorrelation, in at least one dimension;” are directed to a mathematical concept under MPEP 2106.04(a)(2)(I).
Further, the limitation: “wherein for the sample to be assessed, the autocorrelation is determined via a plurality of feature representations of various layers of the model;” is directed to a mental process of evaluation under MPEP 2106.04(a)(2)(III).
Step 2A, prong 2: Do the additional elements integrate into a practical application?
No. The limitations: “providing annotated data;” and “acquiring, from the unannotated data and for the active machine learning, those of the unannotated data whose values for the acquisition function satisfy a criterion;” are directed to mere data gathering under MPEP 2106.05(g).
Further, the limitation: “training a model for a classification of data as a function of the annotated data;” is directed to insignificant extra-solution activity under MPEP 2106.05(g).
Still further, the limitation: “and controlling a physical operation of vehicle or a robot based on an output signal from the trained model” is directed to field of use under MPEP 2106.05(h).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The limitations: “providing annotated data;” and “acquiring, from the unannotated data and for the active machine learning, those of the unannotated data whose values for the acquisition function satisfy a criterion;” are directed to well-understood, routine, and conventional activity of “receiving or transmitting data over a network” under MPEP 2106.05(d).
Further, the limitation: “training a model for a classification of data as a function of the annotated data;” is directed to generic training under MPEP 2106.05(g).
Still further, the limitation: “and controlling a physical operation of vehicle or a robot based on an output signal from the trained model” is directed to field of use under MPEP 2106.05(h).
Regarding claim 2:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim is dependent on claim 1.
Step 2A, prong 2: Do the additional elements integrate into a practical application?
No. The limitation: “wherein the feature representation is from at least one layer of the model,” is directed to field of use under MPEP 2106.05(h).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The limitation: “wherein the feature representation is from at least one layer of the model,” is directed to field of use under MPEP 2106.05(h).
Regarding claim 3:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim is dependent on claim 1. Further, the limitation: “wherein the annotated data is determined from the subset by manual, or semi-automatic, or automatic annotation of unannotated data” is directed to a mental process of judgment under MPEP 2106.04(a)(2)(III).
Step 2A, prong 2: Do the additional elements integrate into a practical application?
No. The limitation: “selecting a subset from the unannotated data;” is directed to selecting or manipulating data under MPEP 2106.05(g).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The limitation: “selecting a subset from the unannotated data;” is directed to selecting or manipulating data under MPEP 2106.05(g).
Regarding claim 4:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim is dependent on claim 1.
Step 2A, prong 2: Do the additional elements integrate into a practical application?
No. The limitation: “wherein the subset includes the acquired unannotated data for the active machine learning” is directed to field of use under MPEP 2106.05(h).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The limitation: “wherein the subset includes the acquired unannotated data for the active machine learning” is directed to field of use under MPEP 2106.05(h).
Regarding claim 6:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim is dependent on claim 1.
Step 2A, prong 2: Do the additional elements integrate into a practical application?
No. The limitation: “wherein samples of the unannotated data whose root mean square exceeds a threshold value are acquired from the unannotated data” is directed to mere data gathering under MPEP 2106.05(g).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The limitation: “wherein samples of the unannotated data whose root mean square exceeds a threshold value are acquired from the unannotated data” is directed to well-understood, routine, and conventional activity of “receiving or transmitting data over a network” under MPEP 2106.05(d).
Regarding claim 7:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim is dependent on claim 1. Further, the limitation: “wherein the threshold value is determined as a function of at least one sample from the annotated data with which the model is trained” is directed to a mathematical concept under MPEP 2106.04(a)(2)(I).
Regarding claim 8:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim is dependent on claim 1. Further, the limitation: “a check being made as to whether an abort criterion is satisfied, and the active machine learning being ended when the abort criterion is satisfied” is directed to a mental process of evaluation under MPEP 2106.04(a)(2)(III).
Step 2A, prong 2: Do the additional elements integrate into a practical application?
No. The limitation: “wherein the model is iteratively trained” is directed to insignificant extra-solution activity under MPEP 2106.05(g).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The limitation: “wherein the model is iteratively trained” is directed to generic training under MPEP 2106.05(d).
Regarding claim 9:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim is dependent on claim 1.
Step 2A, prong 2: Do the additional elements integrate into a practical application?
No. The limitation: “wherein the abort criterion defines a reference for an accuracy of a classification of annotated or unannotated data by the model, the abort criterion being satisfied when the accuracy of the classification reaches or exceeds the reference” is directed to field of use under MPEP 2106.05(h).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The limitation: “wherein the abort criterion defines a reference for an accuracy of a classification of annotated or unannotated data by the model, the abort criterion being satisfied when the accuracy of the classification reaches or exceeds the reference” is directed to field of use under MPEP 2106.05(h).
Regarding claim 10:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim is dependent on claim 1. Further, the limitation: “wherein unannotated data are randomly selected in a first iteration of the method for a determination of the annotated data” is directed to a mathematical concept under MPEP 2106.04(a)(2)(I).
Regarding claim 11:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim is dependent on claim 1.
Step 2A, prong 2: Do the additional elements integrate into a practical application?
No. The limitation: “wherein only data that are not already acquired for the subset are selected from the unannotated data for the subset” is directed to selecting a type of data to be manipulated under MPEP 2106.05(g).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The limitation: “wherein only data that are not already acquired for the subset are selected from the unannotated data for the subset” is directed to well-understood, routine, and conventional activity of “receiving or transmitting data over a network” under MPEP 2106.05(d).
Regarding claim 12:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim is dependent on claim 1. Further, the limitation: “wherein for the trained model, as a function of the root mean square, it is established via the autocorrelation whether the sample to be assessed differs from a sample from a training set of samples with which the trained model has been trained” is directed to a mathematical concept under MPEP 2106.04(a)(2)(I).
Regarding claim 13:
Step 2A, prong 1: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim is dependent on claim 1.
Step 2A, prong 2: Do the additional elements integrate into a practical application?
No. The limitation: “wherein the trained model is an artificial neural network” is directed to field of use under MPEP 2106.05(h).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The limitation: “wherein the trained model is an artificial neural network” is directed to field of use under MPEP 2106.05(h).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 6, 7, and 11-15 are rejected under 35 U.S.C. 103 as being unpatentable over STUMPF et al., “Active Learning in the Spatial Domain for Remote Sensing Image Classification” (hereafter STUMPF), in view of CHENG et al., “Deep High-order Supervised Hashing for Image Retrieval” (hereafter CHENG), and further in view of US Pre-Grant Publication 2017/0334066 (Levine et al.; hereafter Levine).
Regarding claim 1, STUMPF teaches the invention substantially as claimed, including:
A computer-implemented method for active machine learning, the method comprising the following steps: providing annotated data; (Page 2496, Algorithm 2, Page 2493, Section II, Paragraph 1, Lines 8-12, “The general underlying idea of most AL approaches is to initialize a machine learning model using a small training set (i.e., “providing annotated data”) and to exploit the model state and/or the data structure to iteratively select the most valuable samples that should be labeled by the user and added to the training set.”. In effect, the algorithm contains an initial labeled training set.)
training a model for a classification of data as a function of the annotated data; (Page 2496, Algorithm 2, Page 2493, Section II, Paragraph 1, Lines 8-12, “The general underlying idea of most AL approaches is to initialize a machine learning model (i.e., “training a model”) using a small training set (i.e., “annotated data”) and to exploit the model state and/or the data structure to iteratively select the most valuable samples that should be labeled by the user and added to the training set.” and Page 2494, Section III, Paragraph 1, Lines 9-12, “Here, only a binary classification problem (0—nonlandslide; 1—landslide) is considered (i.e., “classification of data”), but the vote entropy can be easily extended to multiple probabilities in a multiclass setting,”. In effect, the algorithm shows that a model is trained off of the training set.)
determining respective values of an acquisition function for unannotated data; (Page 2495, Equation 5, Page 2495, Right Column, Paragraph 1, Lines 10-20, “The first one, which is formulated in (4), is σd, which represents the standard deviation of the Euclidean distances ρk (X, c) between each of the samples (i.e., “unannotated data”) in the candidate batch (c ∈ W ᵐ) and their respective nearest training point (s ∈ X) … Here, |Wᵐ| denotes the cardinality of the candidate set. In general, a larger σd indicates a higher feature space spread of the contained samples in relation to the already acquired training data (see Fig. 2). The corresponding query function (i.e., “determining respective values of an acquisition function”) formulated in (5) can be used to select the region with the highest standard deviation out of m candidate batches”)
acquiring, from the unannotated data and for the active machine learning, those of the unannotated data whose values for the acquisition function satisfy a criterion; (Page 2496, Algorithm 2, Page 2495, Right Column, Paragraph 2, Lines 4-6, “The corresponding query function (i.e., “acquisition function”) formulated in (5) can be used to select the region with the highest standard deviation out of m candidate batches (i.e., “satisfy a criterion”)”. In effect, Algorithm 2 contains U (i.e., “unannotated data”), which is a pool of unlabeled samples that is refined based on W×.)
and determining an autocorrelation using a respective feature representation for each sample from the unannotated data to be assessed for the acquiring step, wherein for the sample to be assessed, the autocorrelation is determined via a plurality of feature representations of various layers of the model; (Page 2496, Algorithm 2, Page 2494, Right Column, Paragraph 1, Lines 14-24, “The sliding window is subsequently applied to the search grid to locate the global maximum. Fig. 1(b) shows the effect of changing the resolution of the search grid at a constant search window size (w = 100 m) and illustrates the uncertainty in the position of the maximum, which typically is <= g/2 … Based on this sliding-window method (i.e., “determining an autocorrelation”), a region-based query function can be formulated” and Page 2495, Right Column, Paragraph 1, Lines 7-10, “distance metrics used for clustering can also be employed to directly measure the dispersion of samples (i.e., “unannotated data”) in feature space (i.e., “respective feature representation for each sample”) and thereby quantify the diversity of the batch.” and Page 2497, Right Column, Paragraph 3, Lines 1-4, “Using a semivariogram analysis with an exponential model fit [49], it is possible to measure the spatial autocorrelation of the gray values within the images and gain quantitative insights into the spatial structure of the observed areas.”. In effect, Algorithm 2 shows that the sliding-window method is used in a function to assess unlabeled data.)
wherein the respective value of the acquisition function of each sample is determined as a function of a root mean square using the autocorrelation, in at least one dimension. (Page 2495, Equations 4 & 5, Page 2495, Right Column, Paragraph 1, Lines 10-20, “The first one, which is formulated in (4), is σd (i.e., “a root mean square”), which represents the standard deviation of the Euclidean distances ρk(X, c) between each of the samples in the candidate batch (c ∈ Wᵐ) and their respective nearest training point (s ∈ X) … Here, |Wᵐ| denotes the cardinality of the candidate set. In general, a larger σd indicates a higher feature space spread of the contained samples in relation to the already acquired training data (see Fig. 2). The corresponding query function (i.e., “determining respective values of an acquisition function”) formulated in (5) can be used to select the region with the highest standard deviation out of m candidate batches” and Page 2496, Left Column, Paragraph 2, Lines 1-4, “Using a semivariogram analysis with an exponential model fit [49], it is possible to measure the spatial autocorrelation (i.e., “in at least one dimension”) of the gray values within the images and gain quantitative insights into the spatial structure of the observed areas.”. In effect, the factor 1/|Wᵐ| applied to the sum over Wᵐ is the mean, each summand in that sum is a squared term, and the result is taken under a square root, yielding a function of a root mean square.)
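For clarity of the mapping, the structure the rejection attributes to Stumpf’s σd can be sketched as follows (notation taken from the quotation above; the mean-distance term ρ̄ is an assumption about the standard-deviation form, not quoted text):

```latex
% Sketch of the standard-deviation structure attributed to Stumpf's
% equation (4); \bar{\rho} (the mean distance) is assumed, not quoted.
\sigma_d = \sqrt{ \frac{1}{|W^{m}|} \sum_{c \in W^{m}} \bigl( \rho_k(X, c) - \bar{\rho} \bigr)^{2} }
% i.e., the square root of a mean of squared terms: an RMS-type statistic.
```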
While STUMPF teaches an active machine learning method, STUMPF does not explicitly disclose the autocorrelation within the layers of a model:
and determine an autocorrelation using a respective feature representation for each sample from the unannotated data to be assessed for the acquiring step, wherein for the sample to be assessed, the autocorrelation is determined via a plurality of feature representations of various layers of the model.
However, in analogous art, CHENG teaches:
and determine an autocorrelation using a respective feature representation for each sample from the unannotated data to be assessed for the acquiring step, wherein for the sample to be assessed, the autocorrelation is determined via a plurality of feature representations of various layers of the model. (Page 2963, Abstract, Sentence 5, "This layer captures the local feature (i.e., “respective feature representation”) interactions of the image by outer product, employing the autocorrelation information and cross-correlation information of deep features. Furthermore, our DHoSH method systematically exploits the high-order statistics of features of multiple layers.", Page 2965, Right Column, Paragraph 2, Sentence 3-4, " Therefore, besides obtaining autocorrelation information (i.e., “autocorrelation is determined”) of individual convolutional layer… information of features (i.e., “plurality of feature representations”) of different convolutional layers [i.e., “feature representations of various layers of the model”]. Its forward and backward propagation share the same forms with Eq. (4) and Eq. (5), respectively. Note that both autocorrelation information and cross-correlation information can be employed using various layers (i.e., “various layers of the model”)”, and Page 2696, Section III (C), Sentences 1-3, “introduce the performance of our DHoSH-PO model with different layers on CIFAR-10 and MNIST. We can see clearly that DHoSH-PO model employs the autocorrelation information with pool5 layer achieving the best competitive performance among all code length on both datasets”)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to have combined CHENG’s teaching that feature representations within layers of a model determine the autocorrelation with STUMPF’s teaching of which unlabeled data should be labeled by active learning, with a reasonable expectation of success, to create a method to select valuable unlabeled data to be labeled with active learning, as in STUMPF, with use of feature representations in layers for autocorrelation, as in CHENG. A person of ordinary skill would have been motivated to make this combination because, as CHENG explains, “The comprehensive comparison leads us to the conclusion that both autocorrelation and cross-correlation information significantly improve the performance. It demonstrates the effectiveness of the proposed high-order (second-order) statistics.” [Page 2696, Right Column, Paragraph 5, Lines 13-14 & Page 2697, Left Column, Paragraph 1, Lines 1-3, CHENG].
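As a minimal sketch of the interpretation applied above (hypothetical NumPy code; not CHENG’s DHoSH implementation), second-order autocorrelation and cross-correlation statistics can be formed from feature representations of different layers via outer products:

```python
import numpy as np

# Hypothetical pooled feature vectors from two layers of a trained
# model (names, shapes, and values are illustrative only).
rng = np.random.default_rng(0)
feat_layer1 = rng.random(64)
feat_layer2 = rng.random(64)

# Autocorrelation information of an individual layer's features via
# outer product (second-order statistics of that layer).
auto_1 = np.outer(feat_layer1, feat_layer1)
auto_2 = np.outer(feat_layer2, feat_layer2)

# Cross-correlation information across features of different layers.
cross_12 = np.outer(feat_layer1, feat_layer2)

# One illustrative way to reduce the per-sample second-order
# statistics to a scalar with root-mean-square structure.
entries = np.concatenate([auto_1.ravel(), auto_2.ravel(), cross_12.ravel()])
rms_score = np.sqrt(np.mean(entries ** 2))
```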
Neither Stumpf nor Cheng explicitly teaches:
and controlling a physical operation of vehicle or a robot based on an output signal from the trained model.
Levine teaches:
and controlling a physical operation of vehicle or a robot based on an output signal from the trained model.
(Levine, ¶0043)
“In an interactive setting, the agent's candidate actions and internal state (such as the pose of the robot gripper) also influence the next image, and both can be integrated into the model by tiling a vector of the concatenated internal state and candidate action(s) across the spatial extent of the lowest-dimensional activation map [i.e. and controlling a physical operation of vehicle or a robot]. Note, though, that the agent's internal state (e.g., the current robot gripper pose) is only input into the network at the initial time step, and in some implementations must be predicted from the actions in future time steps. For example, the robot gripper pose can be predicted at a future time step based on modification of the current robot gripper pose in view of the candidate action(s). In other words, the robot gripper pose at a future time step can be determined based on assuming that candidate action(s) of prior time step(s) have been implemented. The neural network may be trained using an ℓ2 reconstruction loss [i.e. based on an output signal from the trained model].”
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have been motivated to modify Stumpf and Cheng with Levine. The motivation is to improve upon the framework of the field of robotics through the use of predictive neural networks; “to scale real-world interaction learning to a variety of scenes and objects, acquiring manually labeled object information becomes increasingly impractical. To learn about physical object motion without necessitating labels of objects, implementations described herein employ an action-conditioned motion prediction model (Levine, ¶0029).”
Regarding claim 2, while STUMPF teaches an active machine learning method, STUMPF does not explicitly disclose a feature representation from layers of a model:
The method as recited in claim 1, wherein the feature representation is from at least one layer of the model.
However, in analogous art, CHENG teaches:
The method as recited in claim 1, wherein the feature representation is from at least one layer of the model. (Page 2963, Abstract, Sentence 5, "This layer captures the local feature (i.e., “respective feature representation”) interactions of the image by outer product, employing the autocorrelation information and cross-correlation information of deep features. Furthermore, our DHoSH method systematically exploits the high-order statistics of features of multiple layers." and Page 2965, Right Column, Paragraph 2, Sentence 4, "We exploit the cross-correlation information of features of different convolutional layers. Its forward and backward propagation share the same forms with Eq. (4) and Eq. (5), respectively. Note that both autocorrelation information and cross-correlation information can be employed using various layers" (i.e., “at least one layer of the model”).)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to have combined CHENG’s teaching of feature representation within layers of a model with STUMPF’s teaching of which unlabeled data should be labeled by active learning, with a reasonable expectation of success, to create a method to select valuable unlabeled data to be labeled with active learning, as in STUMPF, with use of feature representation in layers of a model, as in CHENG. A person of ordinary skill would have been motivated to make this combination because, as CHENG explains, “The comprehensive comparison leads us to the conclusion that both autocorrelation and cross-correlation information significantly improve the performance. It demonstrates the effectiveness of the proposed high-order (second-order) statistics.” [Page 2696, Right Column, Paragraph 5, Lines 13-14 & Page 2697, Left Column, Paragraph 1, Lines 1-3, CHENG].
Regarding claim 3, STUMPF teaches the invention substantially as claimed, including:
The method as recited in claim 1, further comprising:
selecting a subset from the unannotated data; (Page 2496, Algorithm 2, Page 2495, Left Column, Paragraph 2, Lines 1-5, “Subsequently, all samples with a center of gravity that is spatially contained within W× are queried (i.e., “selecting a subset from the set of unannotated data”). For a binary classification, it is then convenient to ask the user to identify only the positive examples and automatically assign all nonlabelled samples as negative examples.”. In effect, Algorithm 2 contains U (i.e., “set of unannotated data”), which is a pool of unlabeled samples that is refined based on W×.)
wherein the annotated data is determined from the subset by manual, or semi-automatic, or automatic annotation of unannotated data. (Page 2496, Algorithm 2, Page 2495, Left Column, Paragraph 2, Lines 1-5, “Subsequently, all samples with a center of gravity that is spatially contained within W× are queried. For a binary classification, it is then convenient to ask the user to identify only the positive examples [i.e., “manual annotation of unannotated data”] and automatically assign all nonlabelled samples as negative examples (i.e., “automatic annotation of unannotated data”).”. In effect, Algorithm 2, step 6 shows that Si (i.e., “unannotated data”) is then annotated.)
Regarding claim 4, STUMPF teaches the invention substantially as claimed, including:
The method as recited in claim 3, wherein the subset includes the acquired unannotated data for the active machine learning. (Page 2496, Algorithm 2, Page 2495, Left Column, Paragraph 2, Lines 1-5, “Subsequently, all samples with a center of gravity that is spatially contained within W× are queried [i.e., “subset of unannotated data”]. For a binary classification, it is then convenient to ask the user to identify only the positive examples and automatically assign all nonlabelled samples as negative examples. (i.e., “active machine learning”)”. In effect, Algorithm 2 contains U (i.e., “set of unannotated data”), which is a pool of unlabeled samples that is refined based on W×.)
Regarding claim 6, STUMPF teaches the invention substantially as claimed, including:
The method as recited in claim 1, wherein samples of the unannotated data whose root mean square exceeds a threshold value are acquired from the unannotated data. (Page 2496, Algorithm 2, Page 2495, Right Column, Paragraph 2, Lines 4-6, “The corresponding query function (i.e., “acquisition function”) formulated in (5) can be used to select the region with the highest standard deviation out of m candidate batches (i.e., “exceeds a threshold value”)”. In effect, Algorithm 2 contains U (i.e., “unannotated data”), which is a pool of unlabeled samples that is refined based on W×. Also, a threshold value could be any value exceeding the second-highest score among the candidate batches, in which case selecting the batch with the highest standard deviation is equivalent to acquiring the samples whose score exceeds the threshold.)
Regarding claim 7, STUMPF teaches the invention substantially as claimed, including:
The method as recited in claim 6, wherein the threshold value is determined as a function of at least one sample from the annotated data with which the model is trained. (Page 2495, Equation 4, Page 2496, Algorithm 2, Page 2495, Right Column, Paragraph 1, Lines 10-20, “The first one, which is formulated in (4), is σd, which represents the standard deviation of the Euclidean distances ρk(X, c) between each of the samples in the candidate batch (c ∈ Wᵐ) and their respective nearest training point (i.e., “annotated data”) (s ∈ X) … Here, |Wᵐ| denotes the cardinality of the candidate set. In general, a larger σd indicates a higher feature space spread of the contained samples in relation to the already acquired training data (see Fig. 2). The corresponding query function (i.e., “function of at least one sample from the annotated data”) formulated in (5) can be used to select the region with the highest standard deviation out of m candidate batches (i.e., “threshold value is determined”)”. In effect, Algorithm 2, Step 1 shows that the model is trained off of a training set. Also, a threshold value could be any value exceeding the second-highest score among the candidate batches, where each score is calculated as a function of the training data.)
Regarding claim 11, STUMPF teaches the invention substantially as claimed, including:
The method as recited in claim 3, wherein only data that are not already acquired for the subset are selected from the unannotated data for the subset. (Page 2496, Algorithm 2, Page 2495, Left Column, Paragraph 2, Lines 1-5, “For a binary classification, it is then convenient to ask the user to identify only the positive examples and automatically assign all nonlabelled samples as negative examples … Note that step 4 creates a buffer around all W queried in previous iterations to avoid querying previously labeled parts (i.e., “data that are not already acquired for the subset are selected”) of the image.”. In effect, Algorithm 2 contains U (i.e., “set of unannotated data”), which is a pool of unlabeled samples that is refined based on W×.)
Regarding claim 12, STUMPF teaches the invention substantially as claimed, including:
The method as recited in claim 1, wherein for the trained model, as a function of the root mean square, it is established via the autocorrelation whether the sample to be assessed differs from a sample from a training set of samples with which the trained model has been trained. (Page 2496, Algorithm 2, Page 2494, Right Column, Paragraph 1, Lines 14-24 “The sliding window is subsequently applied to the search grid to locate the global maximum. Fig. 1(b) shows the effect of changing the resolution of the search grid at a constant search window size (w = 100 m) and illustrates the uncertainty in the position of the maximum, which typically is <= g/2 … Based on this sliding-window method (i.e., “established via autocorrelation”), a region-based query function can be formulated” and Page 2495, Right Column, Paragraph 1, Lines 10-18, “The first one, which is formulated in (4), is σd (i.e., “root mean square”), which represents the standard deviation of the Euclidean distances ρk (X, c) between each of the samples in the candidate batch (c ∈ W ᵐ) and their respective nearest training point (s ∈ X) … Here, |Wᵐ| denotes the cardinality of the candidate set. In general, a larger σd indicates a higher feature space spread of the contained samples (i.e., “sample to be assessed”) in relation to the already acquired training data (i.e., “differs from a sample from a training set”) (see Fig. 2).”. In Algorithm 2, refer to steps 3 & 5. Step 3 establishes the correlation of samples and then in step 5, a root mean square function is calculated according to the sliding-window method.)
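As an illustrative sketch of this interpretation (hypothetical code; not Stumpf’s implementation), a root-mean-square statistic over feature-space distances can indicate whether an assessed sample lies apart from the training set:

```python
import numpy as np

# Hypothetical feature vectors (shapes and values illustrative only).
rng = np.random.default_rng(0)
train_feats = rng.random((100, 32))  # features of training samples
sample_feat = rng.random(32)         # feature of the sample to assess

# Euclidean distances from the assessed sample to each training sample.
dists = np.linalg.norm(train_feats - sample_feat, axis=1)

# RMS of the distances: a dispersion-style score; a larger value
# suggests the sample differs from the training set in feature space.
rms = np.sqrt(np.mean(dists ** 2))
differs = rms > 1.0  # threshold is illustrative only
```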
Regarding claim 13, STUMPF teaches the invention substantially as claimed, including:
The method as recited in claim 12, wherein the trained model is an artificial neural network. (Page 2492, Section I, Paragraph 1, Lines 3-5, “State-of-the-art supervised algorithms, such as … artificial neural networks (i.e., “artificial neural network”) … have already been developed … for land cover analysis”)
Regarding claim 14, STUMPF teaches the invention substantially as claimed, including:
A device for active machine learning, the device configured to: provide annotated data; (Page 2496, Algorithm 2, Page 2493, Section II, Paragraph 1, Lines 8-12, “The general underlying idea of most AL approaches is to initialize a machine learning model using a small training set (i.e., “providing annotated data”) and to exploit the model state and/or the data structure to iteratively select the most valuable samples that should be labeled by the user and added to the training set.”. In effect, the algorithm contains an initial labeled training set.)
train a model for a classification of data as a function of the annotated data; (Page 2496, Algorithm 2, Page 2493, Section II, Paragraph 1, Lines 8-12, “The general underlying idea of most AL approaches is to initialize a machine learning model (i.e., “training a model”) using a small training set (i.e., “annotated data”) and to exploit the model state and/or the data structure to iteratively select the most valuable samples that should be labeled by the user and added to the training set.” and Page 2494, Section III, Paragraph 1, Lines 9-12, “Here, only a binary classification problem (0—nonlandslide; 1—landslide) is considered (i.e., “classification of data”), but the vote entropy can be easily extended to multiple probabilities in a multiclass setting,”. In effect, the algorithm shows that a model is trained off of the training set.)
determine respective values of an acquisition function for unannotated data; (Page 2495, Equation 5, Page 2495, Right Column, Paragraph 1, Lines 10-20, “The first one, which is formulated in (4), is σd, which represents the standard deviation of the Euclidean distances ρk (X, c) between each of the samples (i.e., “unannotated data”) in the candidate batch (c ∈ W ᵐ) and their respective nearest training point (s ∈ X) … Here, |Wᵐ| denotes the cardinality of the candidate set. In general, a larger σd indicates a higher feature space spread of the contained samples in relation to the already acquired training data (see Fig. 2). The corresponding query function (i.e., “determining respective values of an acquisition function”) formulated in (5) can be used to select the region with the highest standard deviation out of m candidate batches”)
acquire, from the unannotated data and for the active machine learning, those of the unannotated data whose values for the acquisition function satisfy a criterion; (Page 2496, Algorithm 2, Page 2495, Right Column, Paragraph 2, Lines 4-6, “The corresponding query function (i.e., “acquisition function”) formulated in (5) can be used to select the region with the highest standard deviation out of m candidate batches (i.e., “satisfy a criterion”)”. In effect, Algorithm 2 contains U (i.e., “unannotated data”), which is a pool of unlabeled samples that is refined based on W×.)
and determine an autocorrelation using a respective feature representation for each sample from the unannotated data to be assessed for the acquiring step, wherein for the sample to be assessed, the autocorrelation is determined via a plurality of feature representations of various layers of the model; (Page 2496, Algorithm 2, Page 2494, Right Column, Paragraph 1, Lines 14-24, “The sliding window is subsequently applied to the search grid to locate the global maximum. Fig. 1(b) shows the effect of changing the resolution of the search grid at a constant search window size (w = 100 m) and illustrates the uncertainty in the position of the maximum, which typically is <= g/2 … Based on this sliding-window method (i.e., “determining an autocorrelation”), a region-based query function can be formulated” and Page 2495, Right Column, Paragraph 1, Lines 7-10, “distance metrics used for clustering can also be employed to directly measure the dispersion of samples (i.e., “unannotated data”) in feature space (i.e., “respective feature representation for each sample”) and thereby quantify the diversity of the batch.” and Page 2497, Right Column, Paragraph 3, Lines 1-4, “Using a semivariogram analysis with an exponential model fit [49], it is possible to measure the spatial autocorrelation of the gray values within the images and gain quantitative insights into the spatial structure of the observed areas.”. In effect, Algorithm 2 shows that the sliding-window method is used in a function to assess unlabeled data.)
wherein the respective value of the acquisition function of each sample is determined as a function of a root mean square using the autocorrelation, in at least one dimension. (Page 2495, Equations 4 & 5, Page 2495, Right Column, Paragraph 1, Lines 10-20, “The first one, which is formulated in (4), is σd (i.e., “a root mean square”), which represents the standard deviation of the Euclidean distances ρk(X, c) between each of the samples in the candidate batch (c ∈ Wᵐ) and their respective nearest training point (s ∈ X) … Here, |Wᵐ| denotes the cardinality of the candidate set. In general, a larger σd indicates a higher feature space spread of the contained samples in relation to the already acquired training data (see Fig. 2). The corresponding query function (i.e., “determining respective values of an acquisition function”) formulated in (5) can be used to select the region with the highest standard deviation out of m candidate batches” and Page 2496, Left Column, Paragraph 2, Lines 1-4, “Using a semivariogram analysis with an exponential model fit [49], it is possible to measure the spatial autocorrelation (i.e., “in at least one dimension”) of the gray values within the images and gain quantitative insights into the spatial structure of the observed areas.”. In effect, the factor 1/|Wᵐ| applied to the sum over Wᵐ is the mean, each summand in that sum is a squared term, and the result is taken under a square root, yielding a function of a root mean square.)
While STUMPF teaches an active machine learning method, STUMPF does not explicitly disclose the autocorrelation within the layers of a model:
and determine an autocorrelation using a respective feature representation for each sample from the unannotated data to be assessed for the acquiring step, wherein for the sample to be assessed, the autocorrelation is determined via a plurality of feature representations of various layers of the model.
However, in analogous art, CHENG teaches:
and determine an autocorrelation using a respective feature representation for each sample from the unannotated data to be assessed for the acquiring step, wherein for the sample to be assessed, the autocorrelation is determined via a plurality of feature representations of various layers of the model. (Page 2963, Abstract, Sentence 5, "This layer captures the local feature (i.e., “respective feature representation”) interactions of the image by outer product, employing the autocorrelation information and cross-correlation information of deep features. Furthermore, our DHoSH method systematically exploits the high-order statistics of features of multiple layers.", Page 2965, Right Column, Paragraph 2, Sentence 3-4, " Therefore, besides obtaining autocorrelation information (i.e., “autocorrelation is determined”) of individual convolutional layer… information of features (i.e., “plurality of feature representations”) of different convolutional layers [i.e., “feature representations of various layers of the model”]. Its forward and backward propagation share the same forms with Eq. (4) and Eq. (5), respectively. Note that both autocorrelation information and cross-correlation information can be employed using various layers (i.e., “various layers of the model”)”, and Page 2696, Section III (C), Sentences 1-3, “introduce the performance of our DHoSH-PO model with different layers on CIFAR-10 and MNIST. We can see clearly that DHoSH-PO model employs the autocorrelation information with pool5 layer achieving the best competitive performance among all code length on both datasets”)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to have combined CHENG’s teaching that feature representations within layers of a model determine the autocorrelation with STUMPF’s teaching of which unlabeled data should be labeled by active learning, with a reasonable expectation of success, to create a method to select valuable unlabeled data to be labeled with active learning, as in STUMPF, with use of feature representations in layers for autocorrelation, as in CHENG. A person of ordinary skill would have been motivated to make this combination because, as CHENG explains, “The comprehensive comparison leads us to the conclusion that both autocorrelation and cross-correlation information significantly improve the performance. It demonstrates the effectiveness of the proposed high-order (second-order) statistics.” [Page 2696, Right Column, Paragraph 5, Lines 13-14 & Page 2697, Left Column, Paragraph 1, Lines 1-3, CHENG].
Neither Stumpf nor Cheng explicitly teaches:
and control a physical operation of vehicle or a robot based on an output signal from the trained model.
Levine teaches:
and control a physical operation of vehicle or a robot based on an output signal from the trained model.
(Levine, ¶0043)
“In an interactive setting, the agent's candidate actions and internal state (such as the pose of the robot gripper) also influence the next image, and both can be integrated into the model by tiling a vector of the concatenated internal state and candidate action(s) across the spatial extent of the lowest-dimensional activation map [i.e. and control a physical operation of vehicle or a robot]. Note, though, that the agent's internal state (e.g., the current robot gripper pose) is only input into the network at the initial time step, and in some implementations must be predicted from the actions in future time steps. For example, the robot gripper pose can be predicted at a future time step based on modification of the current robot gripper pose in view of the candidate action(s). In other words, the robot gripper pose at a future time step can be determined based on assuming that candidate action(s) of prior time step(s) have been implemented. The neural network may be trained using an ℓ2 reconstruction loss [i.e. based on an output signal from the trained model].”
One of ordinary skill in the art, before the effective filing date of the claimed invention, would have been motivated to modify Stumpf and Cheng with Levine. The motivation is to improve upon the framework of the field of robotics through the use of predictive neural networks; “to scale real-world interaction learning to a variety of scenes and objects, acquiring manually labeled object information becomes increasingly impractical. To learn about physical object motion without necessitating labels of objects, implementations described herein employ an action-conditioned motion prediction model (Levine, ¶0029).”
Regarding claim 15, STUMPF teaches the invention substantially as claimed, including:
A non-transitory computer-readable storage medium on which is stored a computer program including computer-readable instructions for active machine learning, the instructions, when executed by a computer, causing the computer to perform the following steps: providing annotated data; (Page 2498, Section V, Paragraph 1, Lines 1-2, “All experiments are carried out using the RF algorithm (i.e., “computer-readable instructions for active machine learning,” as an algorithm requires instructions to be stored on a computer memory for execution) with 500 trees.”, Page 2496, Algorithm 2, Page 2493, Section II, Paragraph 1, Lines 8-12, “The general underlying idea of most AL approaches is to initialize a machine learning model using a small training set (i.e., “providing annotated data”) and to exploit the model state and/or the data structure to iteratively select the most valuable samples that should be labeled by the user and added to the training set.”. In effect, the algorithm contains an initial labeled training set.)
training a model for a classification of data as a function of the annotated data; (Page 2496, Algorithm 2, Page 2493, Section II, Paragraph 1, Lines 8-12, “The general underlying idea of most AL approaches is to initialize a machine learning model (i.e., “training a model”) using a small training set (i.e., “annotated data”) and to exploit the model state and/or the data structure to iteratively select the most valuable samples that should be labeled by the user and added to the training set.” and Page 2494, Section III, Paragraph 1, Lines 9-12, “Here, only a binary classification problem (0—nonlandslide; 1—landslide) is considered (i.e., “classification of data”), but the vote entropy can be easily extended to multiple probabilities in a multiclass setting,”. In effect, the algorithm shows that a model is trained off of the training set.)
determining respective values of an acquisition function for unannotated data; (Page 2495, Equation 5, Page 2495, Right Column, Paragraph 1, Lines 10-20, “The first one, which is formulated in (4), is σd, which represents the standard deviation of the Euclidean distances ρk (X, c) between each of the samples (i.e., “unannotated data”) in the candidate batch (c ∈ W ᵐ) and their respective nearest training point (s ∈ X) … Here, |Wᵐ| denotes the cardinality of the candidate set. In general, a larger σd indicates a higher feature space spread of the contained samples in relation to the already acquired training data (see Fig. 2). The corresponding query function (i.e., “determining respective values of an acquisition function”) formulated in (5) can be used to select the region with the highest standard deviation out of m candidate batches”)
acquiring, from the unannotated data and for the active machine learning, those of the unannotated data whose values for the acquisition function satisfy a criterion; (Page 2496, Algorithm 2, Page 2495, Right Column, Paragraph 2, Lines 4-6, “The corresponding query function (i.e., “acquisition function”) formulated in (5) can be used to select the region with the highest standard deviation out of m candidate batches (i.e., “satisfy a criterion”)”. In effect, Algorithm 2 contains U (i.e., “unannotated data”), which is a pool of unlabeled samples that is refined based on W×.)
and determining an autocorrelation using a respective feature representation for each sample from the unannotated data to be assessed for the acquiring step, wherein for the sample to be assessed, the autocorrelation is determined via a plurality of feature representations of various layers of the model; (Page 2496, Algorithm 2, Page 2494, Right Column, Paragraph 1, Lines 14-24, “The sliding window is subsequently applied to the search grid to locate the global maximum. Fig. 1(b) shows the effect of changing the resolution of the search grid at a constant search window size (w = 100 m) and illustrates the uncertainty in the position of the maximum, which typically is <= g/2 … Based on this sliding-window method (i.e., “determining an autocorrelation”), a region-based query function can be formulated” and Page 2495, Right Column, Paragraph 1, Lines 7-10, “distance metrics used for clustering can also be employed to directly measure the dispersion of samples (i.e., “unannotated data”) in feature space (i.e., “respective feature representation for each sample”) and thereby quantify the diversity of the batch.” and Page 2497, Right Column, Paragraph 3, Lines 1-4, “Using a semivariogram analysis with an exponential model fit [49], it is possible to measure the spatial autocorrelation of the gray values within the images and gain quantitative insights into the spatial structure of the observed areas.”. In effect, Algorithm 2 shows that the sliding-window method is used in a function to assess unlabeled data.)
wherein the respective value of the acquisition function of each sample is determined as a function of a root mean square using the autocorrelation, in at least one dimension. (Page 2495, Equations 4 & 5, Page 2495, Right Column, Paragraph 1, Lines 10-20, “The first one, which is formulated in (4), is σd (i.e., “a root mean square”), which represents the standard deviation of the Euclidean distances ρk(X, c) between each of the samples in the candidate batch (c ∈ Wᵐ) and their respective nearest training point (s ∈ X) … Here, |Wᵐ| denotes the cardinality of the candidate set. In general, a larger σd indicates a higher feature space spread of the contained samples in relation to the already acquired training data (see Fig. 2). The corresponding query function (i.e., “determining respective values of an acquisition function”) formulated in (5) can be used to select the region with the highest standard deviation out of m candidate batches” and Page 2496, Left Column, Paragraph 2, Lines 1-4, “Using a semivariogram analysis with an exponential model fit [49], it is possible to measure the spatial autocorrelation (i.e., “in at least one dimension”) of the gray values within the images and gain quantitative insights into the spatial structure of the observed areas.”. In effect, the factor 1/|Wᵐ| applied to the sum over Wᵐ is the mean, each summand in that sum is a squared term, and the result is taken under a square root, yielding a function of a root mean square.)
While STUMPF teaches an active machine learning method, STUMPF does not explicitly disclose the autocorrelation within the layers of a model:
and determining an autocorrelation using a respective feature representation for each sample from the unannotated data to be assessed for the acquiring step, wherein for the sample to be assessed, the autocorrelation is determined via a plurality of feature representations of various layers of the model.
However, in analogous art, CHENG teaches:
and determining an autocorrelation using a respective feature representation for each sample from the unannotated data to be assessed for the acquiring step, wherein for the sample to be assessed, the autocorrelation is determined via a plurality of feature representations of various layers of the model. (Page 2963, Abstract, Sentence 5, “This layer captures the local feature (i.e., “respective feature representation”) interactions of the image by outer product, employing the autocorrelation information and cross-correlation information of deep features. Furthermore, our DHoSH method systematically exploits the high-order statistics of features of multiple layers.”; Page 2965, Right Column, Paragraph 2, Sentences 3-4, “Therefore, besides obtaining autocorrelation information (i.e., “autocorrelation is determined”) of individual convolutional layer… information of features (i.e., “plurality of feature representations”) of different convolutional layers [i.e., “feature representations of various layers of the model”]. Its forward and backward propagation share the same forms with Eq. (4) and Eq. (5), respectively. Note that both autocorrelation information and cross-correlation information can be employed using various layers (i.e., “various layers of the model”)”; and Page 2966, Section III (C), Sentences 1-3, “introduce the performance of our DHoSH-PO model with different layers on CIFAR-10 and MNIST. We can see clearly that DHoSH-PO model employs the autocorrelation information with pool5 layer achieving the best competitive performance among all code length on both datasets”)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to have combined CHENG’s teaching that feature representations within layers of a model determine the autocorrelation with STUMPF’s teaching of which unlabeled data should be labeled by active learning, with a reasonable expectation of success, to create a method to select valuable unlabeled data to be labeled with active learning, as in STUMPF, with use of feature representations in layers for autocorrelation, as in CHENG. A person of ordinary skill would have been motivated to make this combination because “The comprehensive comparison leads us to the conclusion that both autocorrelation and cross-correlation information significantly improve the performance. It demonstrates the effectiveness of the proposed high-order (second-order) statistics.” [Page 2966, Right Column, Paragraph 5, Lines 13-14 & Page 2967, Left Column, Paragraph 1, Lines 1-3, CHENG]. An illustrative sketch of the multilayer autocorrelation concept relied upon is set out below.
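Solely to illustrate the mapped concept (this is not CHENG's DHoSH implementation; the layer names, shapes, and normalization are the examiner's hypothetical choices), the outer-product autocorrelation of feature representations drawn from multiple layers can be sketched as:

import numpy as np

def layer_autocorrelation(features):
    # features: (channels, positions) feature representation of one layer,
    # e.g., a flattened convolutional activation map
    # the outer product of the features with themselves yields the
    # second-order (autocorrelation) statistic of that layer
    return features @ features.T / features.shape[1]

# "plurality of feature representations of various layers of the model"
layer_features = {
    "conv4": np.random.randn(256, 196),  # hypothetical layer shapes
    "pool5": np.random.randn(512, 49),
}
autocorrelations = {name: layer_autocorrelation(f) for name, f in layer_features.items()}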
Neither STUMPF nor CHENG explicitly teaches:
1. and controlling a physical operation of vehicle or a robot based on an output signal from the trained model.
Levine teaches:
1. and controlling a physical operation of vehicle or a robot based on an output signal from the trained model.
(Levine, ¶0043)
“In an interactive setting, the agent's candidate actions and internal state (such as the pose of the robot gripper) also influence the next image, and both can be integrated into the model by tiling a vector of the concatenated internal state and candidate action(s) across the spatial extent of the lowest-dimensional activation map [i.e. and controlling a physical operation of vehicle or a robot]. Note, though, that the agent's internal state (e.g., the current robot gripper pose) is only input into the network at the initial time step, and in some implementations must be predicted from the actions in future time steps. For example, the robot gripper pose can be predicted at a future time step based on modification of the current robot gripper pose in view of the candidate action(s). In other words, the robot gripper pose at a future time step can be determined based on assuming that candidate action(s) of prior time step(s) have been implemented. The neural network may be trained using an l.sub.2 reconstruction loss [i.e. based on an output signal from the trained model].”
One of ordinary skill in the art, before the effective filing date of the invention, would have been motivated to modify STUMPF and CHENG with Levine. The motivation is to improve the field of robotics through the use of predictive neural networks: “to scale real-world interaction learning to a variety of scenes and objects, acquiring manually labeled object information becomes increasingly impractical. To learn about physical object motion without necessitating labels of objects, implementations described herein employ an action-conditioned motion prediction model” (Levine, ¶0029). An illustrative sketch of the tiling operation quoted above is set out below.
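For illustration only (this is the examiner's sketch, not Levine's code; all shapes and identifiers are hypothetical), the operation of tiling a concatenated internal-state/action vector across the spatial extent of an activation map may be expressed as:

import numpy as np

def tile_state_and_action(activation_map, internal_state, candidate_action):
    # activation_map: (C, H, W) lowest-dimensional activation map of the network
    # internal_state / candidate_action: 1-D vectors (e.g., robot gripper pose, action)
    v = np.concatenate([internal_state, candidate_action])  # concatenated vector
    c, h, w = activation_map.shape
    tiled = np.broadcast_to(v[:, None, None], (v.size, h, w))  # tile across spatial extent
    # integrate into the model by appending the tiled vector as additional channels
    return np.concatenate([activation_map, tiled], axis=0)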
Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over STUMPF, in view of CHENG, further in view of US Pre-Grant Patent 2017/0334066 (Levine et al.; Levine), and in further view of HARMA et al. US 20220044148 A1 (hereafter HARMA).
Regarding claim 8, STUMPF teaches the invention substantially as claimed, including:
the active machine learning (Page 2503, Right Column, Paragraph 1, Lines 4-7, “Integrating stratified bootstrap sampling in the tree construction during the AL (i.e., “active machine learning”) iterations leads to convergence during the learning process and provides a significantly higher F-measure”)
While STUMPF teaches an active machine learning method, STUMPF does not explicitly disclose an abort criterion:
The method as recited in claim 1, wherein the model is iteratively trained, a check being made as to whether an abort criterion is satisfied, and the active machine learning being ended when the abort criterion is satisfied.
However, in analogous art, HARMA teaches:
The method as recited in claim 1, wherein the model is iteratively trained, a check being made as to whether an abort criterion is satisfied, and the active machine learning being ended when the abort criterion is satisfied. (Paragraph [0096], “During training (i.e., “iteratively trained”), the generic prediction model (i.e., “the model”) is applied to each sample input data to generate a respective number of predicted sample answer data (i.e., each associated with a respective actual sample answer data). The prediction model is then modified with the aim of reducing an overall/average difference (e.g., accuracy value) between each predicted sample answer data and associated actual sample answer data. This process can be iteratively repeated (e.g., … until a difference is below a predetermined value)”; Paragraph [0146], “determining a similarity between input data of the benchmark data and input data of the training data 25 and/or a similarity between known answer data of the benchmark test and known answer data of the training data. This similarity, or similarities, may be used in step 89 to determine (i.e., “a check being made as to whether an abort criterion is satisfied”) whether to perform a method of modifying the prediction model”; and Paragraph [0040], “determining whether input data (to be processed by the prediction model) is statistically different to example input data used to train the prediction model. It can be assumed that if there is no statistical difference (i.e., “abort criterion is satisfied”) (i.e., there is a similarity) between input data and example input data, then no drift has occurred—and the prediction model continues (i.e., “being ended”) to accurately define a relationship between input data and answer data.”. In effect, this shows that when the model is not in drift, the abort criterion is satisfied.)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to have combined HARMA’s teaching of an abort criterion with STUMPF and CHENG’s teaching of which unlabeled data should be labeled by active learning, with a reasonable expectation of success, to create a method to select valuable unlabeled data to be labeled with active learning, as in STUMPF, with use of an abort criterion for the model, as in HARMA. A person of ordinary skill would have been motivated to make this combination because “improving a general prediction model has a direct effect on the processing performance and accuracy of a processing element using the prediction model.” [Paragraph [0013], HARMA]. A generic illustrative sketch of such an abort check is set out below.
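Purely as illustration of the mapped limitation (a generic sketch, not HARMA's method; the scikit-learn-style fit/score interface, the threshold value, and the acquire_and_annotate helper supplied by the caller are all hypothetical):

def active_learning_loop(model, pool, annotated, acquire_and_annotate,
                         reference_accuracy=0.95, max_iters=50):
    # the model is iteratively trained; after each iteration a check is made
    # as to whether an abort criterion is satisfied
    for _ in range(max_iters):
        model.fit(*annotated)               # train on the annotated data
        accuracy = model.score(*annotated)  # accuracy of the classification
        if accuracy >= reference_accuracy:  # abort criterion satisfied
            break                           # active machine learning is ended
        annotated = acquire_and_annotate(pool, annotated)  # hypothetical helper
    return model

The accuracy-reference check shown here also corresponds to the claim 9 variant discussed next.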
Regarding claim 9, while STUMPF teaches an active learning method, STUMPF does not explicitly disclose an abort criterion:
The method as recited in claim 8, wherein the abort criterion defines a reference for an accuracy of a classification of annotated or unannotated data by the model, the abort criterion being satisfied when the accuracy of the classification reaches or exceeds the reference.
However, in analogous art, HARMA teaches:
The method as recited in claim 8, wherein the abort criterion defines a reference for an accuracy of a classification of annotated or unannotated data by the model, the abort criterion being satisfied when the accuracy of the classification reaches or exceeds the reference. (Paragraph [0040], “determining whether input data (to be processed by the prediction model) is statistically different to example input data used to train the prediction model. It can be assumed that if there is no statistical difference (i.e., “abort criterion being satisfied when the accuracy of the classification reaches or exceeds the reference”) (i.e., there is a similarity) between input data and example input data, then no drift has occurred—and the prediction model continues to accurately define a relationship between input data and answer data.”; Paragraph [0130], “This is because a sudden drift indicates that the existing training data no longer accurately reflects the relationship between input data and answer data, and is therefore unreliable. New training data 25 should therefore be used to correct the prediction model 2′.”; and Paragraph [0160], “There is therefore a desire to accurately identify a drift of input data [i.e., “abort criterion defines a reference for an accuracy of a classification of unannotated data by the model”], which can be used to control whether the prediction model needs to be modified”. In effect, this shows that data drift causes degradation of a model’s accuracy.)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to have combined HARMA’s teaching of an abort criterion based on accuracy with STUMPF and CHENG’s teaching of which unlabeled data should be labeled by active learning, with a reasonable expectation of success, to create a method to select valuable unlabeled data to be labeled with active learning, as in STUMPF, with determination of an abort criterion value, as in HARMA. A person of ordinary skill would have been motivated to make this combination because “improving a general prediction model has a direct effect on the processing performance and accuracy of a processing element using the prediction model.” [Paragraph [0013], HARMA].
Regarding claim 10, STUMPF teaches the invention substantially as claimed, including:
The method as recited in claim 8, wherein unannotated data are randomly selected in a first iteration of the method for a determination of the annotated data. (Page 2498, Right Column, Paragraph 3, Lines 10-12, “The initiation of each run is performed through stratified random sampling (i.e., “randomly selected”) in order to ensure the presence of at least one example per class for the first iteration. All segments of the data sets are labeled (i.e., “determination of annotated data”), and therefore, learning can be performed on the full map.”. A minimal illustrative sketch of such first-iteration random selection is set out below.)
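As illustration only (a minimal sketch of random first-iteration selection; this is not STUMPF's stratified sampling procedure, and all identifiers are hypothetical):

import numpy as np

def initial_random_selection(unannotated, k, seed=0):
    # first iteration: unannotated data are randomly selected for annotation
    rng = np.random.default_rng(seed)
    indices = rng.choice(len(unannotated), size=k, replace=False)
    return indices  # these samples are then annotated to seed the training set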
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL JUSTIN BREENE whose telephone number is (571) 272-6320. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley, can be reached at 303-297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/P.J.B./ Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129