Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(4) because reference character “21” has been used to designate both “Training Data Storage Unit” and “First Parameter Storage Unit” in figure 5. The specification, paragraph 35, calls it training data storage unit 20: “The input unit 15 acquires a set of the input image and the correct answer data from the training data storage unit 20 via the interface 13.” Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Specification
Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc. In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided.
The abstract of the disclosure is objected to because the abstract contains legal phraseology and reference numbers. A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. See MPEP § 608.01(b).
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite mathematical concepts and mental processes, identified below, without significantly more. The subject matter eligibility test for products and processes is described below for claim 1 in view of the dependent claims.
Regarding claim 1:
Step 1: Is the claim to a process, machine, manufacture, or composition of matter?
Yes – Claim 1 recites a learning device, which is a system that falls under the statutory categories.
Step 2A Prong 1: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes – The claim recites the following:
“generate a probabilistic inference result that is probabilistically generated for an input data;” - The limitation recites a mathematical process of generating a probabilistic inference result (see MPEP 2106.04(a)(2)I).
“generate a formatted inference result obtained by formatting the probabilistic inference result;” - The limitation recites a mental process of generating a formatted probabilistic inference result (see MPEP 2106.04(a)(2)I).
Step 2A Prong 2: Does the claim recite additional elements that integrate the judicial exception into a particular application? No –
The claim includes the additional element(s):
“ A learning device comprising: at least one memory configured to store instructions; and at least one processor configured to execute the instructions to: train a correction learning model that is a learning model configured to correct the formatted inference result, based on the input data, correct answer data corresponding to the input data, and the formatted inference result.”
The additional elements fall under “apply it” as using a generic computer to train a correction learning model configured to correct the formatted inference result based on the input data. See Mere Instructions to Apply an Exception (MPEP 2106.05(f)).
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No - The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As an ordered combination, the claim is directed to generating probabilistic inference results by using inputted data. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of training and formatting fall under using a generic computer to apply an exception and mere data gathering. The method does not improve the functioning of a computer, transform an article into another article, and is not applied by a particular machine, making the claim not patent eligible.
Regarding claim 2:
Step 2A Prong 2, Step 2B: The additional element(s):
“The learning device according to claim 1, wherein the at least one processor is configured to execute the instructions to generate the probabilistic inference result based on a probabilistic inference model that is a model obtained by probabilistically changing one or more parameters of an already-trained model whose inference result is to be corrected by the correction learning model.”
The additional elements fall under “apply it” as using a generic computer to generate the probabilistic inference result based on a probabilistic inference model by changing one or more parameters of the trained model. See Mere Instructions to Apply an Exception (MPEP 2106.05(f)).
Regarding claim 3:
Step 2A Prong 2, Step 2B: The additional element(s):
“The learning device according to claim 2, wherein the already-trained model is a model based on a neural network, and wherein the at least one processor is configured to execute the instructions to generate the probabilistic inference result based on the probabilistic inference model that is a model obtained by probabilistically setting one or more weight parameters of the already-trained model to 0.”
The additional elements fall under “apply it” as using a generic computer to generate a probabilistic inference model by setting one or more weight parameters to 0. See Mere Instructions to Apply an Exception (MPEP 2106.05(f)).
Regarding claim 4:
Step 2A Prong 2, Step 2B: The additional element(s):
“The learning device according to ”
The additional elements fall under Insignificant Extra-Solution Activity. See MPEP 2106.05(g). The judicial exception is not integrated into a practical application and does not provide an improvement. The additional elements do not provide an inventive concept or a practical application.
Regarding claim 5:
Step 2A Prong 2, Step 2B: The additional element(s):
“The learning device according to claim 4, wherein the at least one processor is configured to execute the instructions to format the probabilistic inference result into a data format necessary for input to the input layer or to the intermediate layer.”
The additional elements fall under “apply it” as using a generic computer to format the probabilistic inference result into the data format necessary for the input layer or the intermediate layer. See Mere Instructions to Apply an Exception (MPEP 2106.05(f)).
Regarding claim 6:
Step 2A Prong 2, Step 2B: The additional element(s):
“The learning device according to claim 1, wherein the at least one processor is further configured to execute the instructions to apply, to data used for training of an already-trained model whose inference result is to be corrected by the correction learning model, an augmentation that is not used in the training, to thereby generate the input data and the correct answer data corresponding to the input data.”
The additional elements fall under “apply it” as using a generic computer to apply an augmentation to the data (see MPEP 2106.05(f)). The judicial exception is not integrated into a practical application and does not provide an improvement. The additional elements do not provide an inventive concept or a practical application.
Regarding claim 7:
Step 2A Prong 2, Step 2B: The additional element(s):
“The learning device according to claim 1, wherein the already-trained model whose inference result is to be corrected by the correction learning model is trained with labels based on separate name definition in which feature points in a symmetrical relation are separately labeled,”
The additional element falls under “apply it” by using computers to train the model using labels based on separate name definition (MPEP 2106.05(f)). The judicial exception is not integrated into a practical application and does not provide an improvement. The additional elements do not provide an inventive concept or a practical application.
“wherein the correction learning model is trained with labels based on same name definition in which the feature points in the symmetrical relation are labeled as a same label, and”
The additional element falls under “apply it” by using computers to train the model using labels based on same name definition (MPEP 2106.05(f)). The judicial exception is not integrated into a practical application and does not provide an improvement. The additional elements do not provide an inventive concept or a practical application.
“wherein the at least one processor is configured to execute the instructions to generate the formatted inference result labeled based on the same name definition into which the probabilistic inference result labeled based on the separate name definition is converted.”
The additional element falls under “apply it” by using computers to generate the formatted inference result labeled based on the same name definition (MPEP 2106.05(f)). The judicial exception is not integrated into a practical application and does not provide an improvement. The additional elements do not provide an inventive concept or a practical application.
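For illustration only, the conversion from separate name definition to same name definition recited in claim 7 can be sketched as follows; the labels and mapping are hypothetical and are not taken from the application:

```python
# Hypothetical mapping: symmetric feature points labeled separately
# ("left_eye"/"right_eye") are relabeled under one shared name ("eye").
SAME_NAME = {"left_eye": "eye", "right_eye": "eye",
             "left_hand": "hand", "right_hand": "hand"}

def to_same_name(labeled_points):
    """Relabel feature points in a symmetrical relation as a same label."""
    return [(SAME_NAME.get(label, label), pt) for label, pt in labeled_points]

converted = to_same_name([("left_eye", (3, 4)), ("right_eye", (9, 4))])
# both points now carry the label "eye"; coordinates are unchanged
```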
Regarding claim 8:
Step 2A Prong 2, Step 2B: The additional element(s):
“The learning device according to claim 7, wherein the at least one processor is configured to execute the instructions to generate, as the input data, a reversed image obtained by reversing an image used for training the already-trained model which learned to extract feature points of an object having a symmetry shown in the image,”
The additional element falls under “apply it” by using computers to generate, as the input data, a reversed image (MPEP 2106.05(f)). The judicial exception is not integrated into a practical application and does not provide an improvement. The additional elements do not provide an inventive concept or a practical application.
“and to generate correct answer data corresponding to the reversed image from the correct answer data corresponding to the image, wherein the at least one processor is configured to execute the instructions to train the correction learning model based on the formatted inference result, the reversed image, and the correct answer data corresponding to the reversed image.”
The additional element falls under “apply it” by using computers to generate correct answer data and train the correction learning model (MPEP 2106.05(f)). The judicial exception is not integrated into a practical application and does not provide an improvement. The additional elements do not provide an inventive concept or a practical application.
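For illustration only, generating a reversed image and the corresponding reversed correct answer data, as recited in claim 8, can be sketched as follows; the helper names are hypothetical and this is not the applicant's implementation:

```python
def reverse_image(image):
    """Horizontally flip an image given as rows of pixel values."""
    return [row[::-1] for row in image]

def reverse_points(points, width):
    """Map feature-point x-coordinates into the horizontally flipped image,
    producing correct answer data corresponding to the reversed image."""
    return [(width - 1 - x, y) for (x, y) in points]

img = [[1, 2, 3],
       [4, 5, 6]]
flipped = reverse_image(img)        # [[3, 2, 1], [6, 5, 4]]
pts = reverse_points([(0, 1)], 3)   # [(2, 1)]
```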
Claim 9 recites a method and is analogous to the system of claim 1. Therefore, the rejection of claim 1 above applies to claim 9.
Claim 10 recites a computer-readable medium product and is analogous to the system of claim 1. Therefore, the rejection of claim 1 above applies to claim 10.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 9, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Makropoulos et al. (WO2020249972) (“Makropoulos”) in view of Lee et al. (KR20190134865A) (“Lee”).
Regarding claim 1 and analogous claims 9 and 10, Makropoulos teaches a learning device comprising: at least one memory configured to store instructions; and at least one processor configured to execute the instructions to: (Makropoulos Page 6 line 15-22, Referring now to Figure 4, there is shown the use of the trained probabilistic model 440 (the model 350 from Figure 3 once trained).
During inference of the base network 420, 430, 445, 450, the latent space 430 of the transformed input is observed and variants of the latent space 430 according to the previously learned probabilistic modelling function 440 are injected back into the network. After observing a large number of outputs based on the injected variants, frequent changes of the output 450 are assumed to indicate uncertainty, whereas in contrast no changes or minor changes are assumed to indicate robust decision making [generate a probabilistic inference result that is probabilistically generated for an input data].
Page 8, lines 12-18: The system is configured, using any suitable arrangement of computing hardware in a local or distributed/networked arrangement, to have a data acquisition device 510 in communication with a processor 520. The data acquisition device 510 can be a sensor (e.g. an ultrasound sensor, x-ray sensor, etc) or a dedicated separate device (e.g. an ultrasound machine, x-ray machine or mobile phone/tablet computer/laptop computer connected to a sensor or having a sensor integrated therein) from which data can be obtained for processing by the processor 530 (optionally via a memory storage or computer-readable medium) [A learning device comprising:
at least one memory configured to store instructions; ].
Page 8 line 22-26, The processor 530 is also in communication with a prediction confidence device 520 which can output a prediction confidence 560. The prediction confidence device 520 may be implemented as a software process that is executed by and in operation on the processor 530 or may be a standalone system or computer/process that is in communication with the processor 530 but operates independently from the processor 530. [and at least one processor configured to execute the instructions]);
((Page 22, FIG. 4 [media_image1.png];
Page 13, lines 13-18: The latent space is a vector of values. These are all connected to as many output units as there are classes for the classification task. The output of these units are continuous real values, aka. logits. Logits are converted with e.g. a "Softmax" function into a space between [0, 1] [obtained by formatting the probabilistic inference result]. After "Softmax" the result is a vector with the same number of values (between [0, 1]) as there are classes [generate a formatted inference result]. The index of the highest value in this vector defines the predicted class (argmax));
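For illustration only, the logit-to-probability formatting described in the quoted passage (softmax followed by argmax) can be sketched as follows; the function and variable names are hypothetical and are not taken from Makropoulos:

```python
import math

def softmax(logits):
    """Convert real-valued logits into values in [0, 1] that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                 # one logit per class
probs = softmax(logits)                  # vector of values in [0, 1]
predicted_class = max(range(len(probs)), key=probs.__getitem__)  # argmax
```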
Makropoulos does not explicitly teach training a correction learning model configured to correct the formatted inference result, based on the input data, correct answer data corresponding to the input data, and the formatted inference result.
However Lee teaches (KR20190134865A Page 2 of machine translation para 10-14, In order to achieve the above object, according to an aspect of the present invention, a face detection unit for detecting a face region from the target image to generate a face detection image; A feature point output unit to output a plurality of feature points using a preset algorithm for the face detection image; A correction vector output network trained to correct coordinates of a plurality of feature points output from the feature point output unit, and outputting a correction vector for correcting the coordinates of the feature points by receiving the face detection image; And a feature point determiner configured to determine a final feature point by reflecting the correction vector for each of the feature points [to train a correction learning model that is a learning model configured to correct the formatted inference result].
The preset algorithm includes a handcraft algorithm.
The correction vector includes a t.x component for moving each of the feature point coordinates on the x-axis and a /1y component for moving on the y-axis.
The correction vector output network includes a convolutional neural network and is trained to update the coefficients of the filter applying the convolution operation.
The correction vector output network receives the cost function corresponding to the difference between the true value of the reference point of the reference image and the feature point coordinates reflecting the correction vector, which is the output of the correction vector output network, on the feature point detected by the preset algorithm. Update the filter coefficients [based on the input data, correct answer data corresponding to the input data, and the formatted inference result]).
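For illustration only, the correction-vector step that Lee describes (reflecting a per-point Δx/Δy onto detected feature-point coordinates to determine the final feature points) can be sketched as follows; the names are hypothetical and this is not Lee's implementation:

```python
def apply_correction(points, corrections):
    """Determine final feature points by adding each correction vector
    (delta-x, delta-y) to the corresponding detected feature point."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(points, corrections)]

detected = [(10.0, 20.0), (30.0, 40.0)]   # feature points from a preset algorithm
deltas = [(1.5, -0.5), (-2.0, 3.0)]       # output of a correction network
final = apply_correction(detected, deltas)  # [(11.5, 19.5), (28.0, 43.0)]
```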
Makropoulos and Lee are considered to be analogous to the claimed invention because they are in the same field of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Makropoulos to incorporate the teachings of Lee to use a correction vector output network. Doing so would properly identify features with relatively adaptive amounts of learning data (Lee, Page 2 of translation, Description paras 10-11: The present invention proposes a feature point detection apparatus and method using learning that can effectively detect feature points even with relatively adaptive amounts of learning data.
In order to achieve the above object, according to an aspect of the present invention, a face detection unit for detecting a face region from the target image to generate a face detection image; A feature point output unit to output a plurality of feature points using a preset algorithm for the face detection image; A correction vector output network trained to correct coordinates of a plurality of feature points output from the feature point output unit, and outputting a correction vector for correcting the coordinates of the feature points by receiving the face detection image; And a feature point determiner configured to determine a final feature point by reflecting the correction vector for each of the feature points.).
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Makropoulos in view of Lee and in further view of Singh et al. (US20200234110A1) (“Singh”).
Regarding claim 2, Makropoulos in view of Lee teaches the learning device according to claim 1.
Makropoulos and Lee are combined with the same rationale as in claim 1.
However, Makropoulos does not explicitly teach wherein the at least one processor is configured to execute the instructions to generate the probabilistic inference result based on a probabilistic inference model that is a model obtained by probabilistically changing one or more parameters of an already-trained model whose inference result is to be corrected by the correction learning model.
However, Singh teaches wherein the at least one processor is configured to execute the instructions to generate the probabilistic inference result based on a probabilistic inference model that is a model obtained by probabilistically changing one or more parameters of an already-trained model whose inference result is to be corrected by the correction learning model (Singh, para 0024: One or more embodiments include an adversarially-robust neural-network training system that generates neural networks with improved robustness against adversarial attacks by implementing a dynamic dropout routine and/or a cyclic learning rate routine during training. For the dynamic dropout routine, the adversarially-robust neural-network training system can generate a dropout probability distribution over neurons in a particular layer of a neural network. Indeed, for a neural network to learn distinguishable features, the adversarially-robust neural-network training system ensures that the gradient loss of each label (e.g., classification label) with respect to a given neuron is different for all neurons in the same layer. Additionally (or alternatively), the adversarially-robust neural-network training system can utilize a cyclic learning rate routine to ensure that decision boundaries learned by a neural network are different. For example, the adversarially-robust neural-network training system can initialize a copy of an initially trained neural network with weights equal to the weights of the initially trained neural network and can cyclically modify the learning rate without decreasing prediction accuracy.
Para 0062, As illustrated in FIG. 3, the adversarially-robust neural-network training system 102 trains the neural network 304 to generate accurate predictions. Particularly, the adversarially-robust neural-network training system 102 accesses a training digital input 302 within a database 314 to utilize as training data for the neural network 304. For example, the adversarially-robust neural-network training system 102 inputs the training digital input 302 into the neural network 304, whereupon the neural network 304 generates a predicted classification 306. Indeed, the neural network 304 analyzes the training digital input 302 utilizing its various layers, neurons, and weights. Based on the analysis of the training digital input 302, the neural network 304 generates a predicted classification 306 of the training digital input 302 [to generate the probabilistic inference result based on a probabilistic inference model].
para 0117, For example, in one implementation, act 1210 can involve probabilistically selecting a set of neurons to drop during the training iteration based on the dynamic dropout probability distribution. Act 1210 can then involve generating an updated set of weights by back propagating the gradient losses to modify weights of the neurons of the neural network other than the neurons in the set of neurons selected to drop. Act 1210 can also involve providing second training input to the neural network for a second training iteration. Act 1210 can involve determining, for the plurality of classification labels, second gradient losses associated with neurons of one or more layers of the neural network based on the training input using the updated set of weights. Act 1210 can then involve generating, based on the second gradient losses, second similarity scores between the neurons in the one or more layers of the neural network and determining, based on the second similarity scores, a second dynamic dropout probability distribution for the neurons in the one or more layers. Act 1210 can involve probabilistically selecting a second set of neurons to drop during the training iteration based on the dynamic dropout probability distribution; and back propagating the second gradient losses to modify the updated set of weights of the neurons of the neural network other than the neurons in the second set of neurons selected to drop [that is a model obtained by probabilistically changing one or more parameters of an already-trained model]).
Makropoulos and Singh are considered to be analogous to the claimed invention because they are in the same field of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Makropoulos to incorporate the teachings of Singh to probabilistically change one or more parameters of an already-trained model. Doing so would improve the robustness of the neural network (Singh, para 0070, lines 1-7: As mentioned, to improve the robustness of the neural network 304, the adversarially-robust neural-network training system 102 implements a dynamic dropout routine during training. FIG. 5 illustrates an example modified version of the neural network 304 as a result of the adversarially-robust neural-network training system 102 implementing a dynamic dropout routine during a training cycle).
Regarding claim 3, Makropoulos in view of Lee and Singh teaches the learning device according to claim 2.
Makropoulos and Lee are combined with the same rationale as in claim 1.
Makropoulos and Singh are combined with the same rationale as in claim 2.
Makropoulos further teaches wherein the already-trained model is a model based on a neural network (Makropoulos, page 5, lines 6-11: The convolutional layers 120 and fully connected layers 140 are together trained to classify a certain type of input data, for example x-ray data, into one or more of a set of classifications. Supervised training of the neural network can be performed by optimizing parameters or weights within the neural network using a set of training data for which the classifications are known in order to cause the neural network to output the correct classifications for the training data set [based on a neural network]. Page 6, lines 15-22: Referring now to Figure 4, there is shown the use of the trained probabilistic model 440 (the model 350 from Figure 3 once trained).
During inference of the base network 420, 430, 445, 450, the latent space 430 of the transformed input is observed and variants of the latent space 430 according to the previously learned probabilistic modelling function 440 are injected back into the network. After observing a large number of outputs based on the injected variants, frequent changes of the output 450 are assumed to indicate uncertainty, whereas in contrast no changes or minor changes are assumed to indicate robust decision making [wherein the already-trained model].),
Singh further teaches wherein the at least one processor is configured to execute the instructions to generate the probabilistic inference result based on the probabilistic inference model that is a model obtained by probabilistically setting one or more weight parameters of the already-trained model to 0 (Singh
[media_image2.png]
[to generate the probabilistic inference result based on the probabilistic inference model]
Para 0077: Thus, based on the dropout probability distribution, the adversarially-robust neural-network training system 102 performs an act 608 to drop neurons from the neural network 304 for the given training iteration. More specifically, the adversarially-robust neural-network training system 102 probabilistically selects and drops neurons from the neural network 304 based on the dropout probability distribution. For example, the adversarially-robust neural-network training system 102 drops neurons from the penultimate layer 406 (as illustrated in FIG. 5) to prevent multiple neurons from learning features that are too similar to features learned by other neurons [that is a model obtained by probabilistically setting one or more weight parameters of the already-trained model to 0]).
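For illustration only, the probabilistic neuron dropping that Singh describes (zeroing a neuron's outgoing weights according to a per-neuron dropout probability distribution) can be sketched as follows; the names are hypothetical and this is not Singh's implementation:

```python
import random

def drop_neurons(weights, drop_probs, rng=random.Random(0)):
    """Probabilistically set each neuron's outgoing weights to 0 according
    to a per-neuron dropout probability distribution."""
    dropped = []
    for i, p in enumerate(drop_probs):
        if rng.random() < p:
            weights[i] = [0.0] * len(weights[i])  # neuron i contributes nothing
            dropped.append(i)
    return weights, dropped

# three neurons, each with two outgoing weights; neuron 1 is dropped with p=1
w = [[0.5, -0.2], [1.0, 0.3], [-0.7, 0.9]]
w, dropped = drop_neurons(w, [0.0, 1.0, 0.0])
```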
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Makropoulos in view of Lee and in further view of Barsoum, Emad, John Kender, and Zicheng Liu, "Hp-gan: Probabilistic 3d human motion prediction via gan," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018 (“Barsoum”).
Regarding claim 4, Makropoulos in view of Lee teaches the learning device according to claim 1.
Makropoulos and Lee are combined with the same rationale as in claim 1.
Makropoulos does not explicitly teach wherein the formatted inference result is inputted, together with the input data, to an input layer to which the input data is inputted, or wherein the formatted inference result is inputted to an intermediate layer that is different from the input layer.
However Barsoum teaches wherein the formatted inference result is inputted, together with the input data, to an input layer to which the input data is inputted, or wherein the formatted inference result is inputted to an intermediate layer that is different from the input layer (Barsoum page 1533 3.1. Generative Adversarial Networks para 1, Generative adversarial networks(GAN) was introduced by [8]. It is a unsupervised learning technique inspired by the minimax theorem [34], in which the generator network and the discriminator network try to outdo each other. The training itself alternates between both networks. In the original paper, the generator learns to generate images close to real images, and the discriminator learns to distinguish between the generated image and the real image from the dataset. In a steady state, the discriminator should predict if an image from the generator network is generated or not with 50% probability.
[media_image3.png]
Page 1534, 3.2. Human pose prediction GAN (HP-GAN), para 1, para 2 lines 1-6, and para 3 lines 1-6:
Figure 2 shows the high level diagram of our proposed GAN network, called HP-GAN, for Human Pose prediction. HP-GAN combines features from WGAN-GP [10], from GAN [8], and from sequence-specific optimization, in The ”Generator” block shown in Figure 2 is the sequence-to-sequence network defined in Figure 1. As described earlier, it takes as input previous poses and a z vector, and produces a sequence of human poses. The z vector is a 128-dimensional float vector drawn from a uniform or Gaussian probability distribution. ”Future poses” are the ground truth future poses from the dataset, and ”Prior poses” are their corresponding previous poses.
Prior poses and future poses are concatenated together to form a real pose sequence. Similarly, prior poses and generated poses are concatenated together to form a fake pose sequence. Both real and fake sequences are used for the critic WGAN-GP loss and for the discriminator GAN loss
The critic network is a three-layer fully connected feedforward network that outputs a single value. This value is unbounded and is used for the WGAN-GP loss. This WGAN-GP loss is the same loss as in [10]. To train the generator to produce a realistic human pose, we add two additional losses to the WGAN-GP loss. [wherein the formatted inference result is inputted, together with the input data, to an input layer to which the input data is inputted,]).
Makropoulos and Barsoum are considered to be analogous to the claimed invention because they are in the same field of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Makropoulos to incorporate the teachings of Barsoum to input the generated data into the input layer. Doing so would allow the critic network to train the generator to produce realistic human poses (Barsoum, Page 1535, 3.2. Human pose prediction GAN (HP-GAN), para 4: The critic network is a three-layer fully connected feedforward network that outputs a single value. This value is unbounded and is used for the WGAN-GP loss. This WGAN-GP loss is the same loss as in [10]. To train the generator to produce a realistic human pose, we add two additional losses to the WGAN-GP loss. The first one is the consistency, or pose gradient loss, which focuses on smoothing the sequence of predicted poses. The second loss is the bone loss, which focuses on reducing the changes to the bone lengths between the predicted skeleton and the ground truth. The details of each of the losses are described below.).
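For illustration only, the real/fake sequence construction that Barsoum describes (concatenating prior poses with ground-truth future poses or with generated poses) can be sketched as follows; the names are hypothetical and this is not Barsoum's implementation:

```python
def make_sequences(prior_poses, future_poses, generated_poses):
    """Build the real and fake pose sequences fed to the critic/discriminator.

    The real sequence concatenates prior poses with ground-truth future poses;
    the fake sequence concatenates the same prior poses with generated poses.
    """
    real = prior_poses + future_poses
    fake = prior_poses + generated_poses
    return real, fake

prior = [[0.0, 0.1], [0.2, 0.3]]   # previous poses (toy two-value "poses")
future = [[0.4, 0.5]]              # ground-truth future poses
generated = [[0.9, 0.8]]           # generator output
real_seq, fake_seq = make_sequences(prior, future, generated)
```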
Regarding claim 5, Makropoulos in view of Lee and Barsoum teaches the learning device according to claim 4.
Makropoulos and Lee are combined with the same rationale as in claim 1.
Makropoulos and Barsoum are combined with the same rationale as in claim 4.
Barsoum further teaches wherein the at least one processor is configured to execute the instructions to format the probabilistic inference result into a data format necessary for input to the input layer or to the intermediate layer (Barsoum Page 1535 3.2. Human pose prediction GAN (HP-GAN), The critic network is a three-layer fully connected feedforward network that outputs a single value. This value is unbounded and is used for the WGAN-GP loss. This WGAN-GP loss is the same loss as in [10]. To train the generator to produce a realistic human pose, we add two additional losses to the WGAN-GP loss. The first one is the consistency, or pose gradient loss, which focuses on smoothing the sequence of predicted poses. The second loss is the bone loss, which focuses on reducing the changes to the bone lengths between the predicted skeleton and the ground truth. The details of each of the losses are described below [probabilistic inference result into a data format necessary for input to the input layer].).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Makropoulos in view of Lee and further in view of Hinterstoisser, Stefan, et al., "On pre-trained image features and synthetic images for deep learning." Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018 ("Hinterstoisser").
Regarding claim 6, Makropoulos in view of Lee teaches the learning device according to claim 1.
Makropoulos and Lee are combined with the same rationale as in claim 1.
Makropoulos does not explicitly teach wherein the at least one processor is further configured to execute the instructions to apply, to data used for training of an already-trained model whose inference result is to be corrected by the correction learning model, an augmentation that is not used in the training, to thereby generate the input data and the correct answer data corresponding to the input data.
However, Hinterstoisser teaches wherein the at least one processor is further configured to execute the instructions to apply, to data used for training of an already-trained model whose inference result is to be corrected by the correction learning model, an augmentation that is not used in the training, to thereby generate the input data and the correct answer data corresponding to the input data (Hinterstoisser page 4 3 Method, In this section, we will present our simple synthetic data generation pipeline and describe how we change existing state-of-the-art object detectors to enable them to learn from synthetic data. In this context, we will focus on object instance detection. Throughout this paper, we will mainly consider Faster-RCNN [2] since it demonstrated the best detection performance among a whole family of object detectors as shown in [1]. However, in order to show the generality of our approach, we will also present additional quantitative and qualitative results of other detectors (RFCN [4] and Mask-RCNN [19]) in Section 4.7.
Page 5 Fig. 3.
Page 5 3.1 Synthetic Data Generation Pipeline para 2, This principle is an important assumption for our synthetic data generation pipeline, shown in Fig. 3. For each object, we start by generating a large set of poses uniformly covering the pose space in which we want to be able to detect the corresponding object. As in [30], we generate rotations by recursively dividing an icosahedron, the largest convex regular polyhedron. We substitute each triangle into four almost equilateral triangles, and iterate several times. The vertices of the resulting polyhedron give us then the two out-of-plane rotation angles for the sampled pose with respect to the coordinate center. In addition to these two out-of-plane rotations, we also use equally sampled in-plane rotations. Furthermore, we sample the scale logarithmically to guarantee an approximate linear change in pixel coverage of the reprojected object between consecutive scale levels.
3.2 Freezing a Pre-Trained Feature Extractor para 2 lines 4-6 and para 3,
As discussed in the introduction, for the feature extractor, we use frozen weights pre-learned on real images, to enable training the remaining part of the architecture on synthetic images only.
In practice, we use Google's publicly available open-source version [1] of Faster-RCNN and RFCN, and our own implementation of Mask-RCNN. The 'frozen' parts are taken according to [1], by training InceptionResnet and Resnet101 on a classification task on the ImageNet-CLs dataset. We freeze InceptionResnet (v2) after the repeated use of block17 and right before layer Mixed_7a, and Resnet101 after block3. All other remaining parts of the networks are not 'frozen', meaning their weights are free to adapt when we train the detector on synthetic images. [to apply, to data used for training of an already-trained model whose inference result is to be corrected by the correction learning model, an augmentation that is not used in the training,].).
Makropoulos and Hinterstoisser are considered to be analogous to the claimed invention because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Makropoulos to incorporate the teachings of Hinterstoisser to augment data for an already-trained model. Doing so would allow the model to detect objects from all possible viewpoints (Hinterstoisser page 14 Conclusion para 2, Our experiments suggest that simple rendering is sufficient to achieve good performances and that complicated scene composition does not seem necessary. Training from rendered 3D CAD models allows us to detect objects from all possible viewpoints which makes the need for a real data generation and expensive manual labeling pipeline redundant.).
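As a purely illustrative aside (not part of the prior-art record), the freezing scheme quoted from Hinterstoisser, in which pre-trained feature-extractor weights are held fixed while the remaining weights adapt to synthetic data, can be sketched in a minimal two-stage linear model; all names, shapes, and the gradient-step form here are assumptions for the sketch:

```python
import numpy as np

def train_step(x, y, W_feat, W_head, lr=0.1):
    """One gradient step on a two-stage linear model.
    W_feat (the pre-trained 'feature extractor') is frozen;
    only W_head adapts to the (synthetic) training data."""
    feats = x @ W_feat                      # frozen feature computation
    err = feats @ W_head - y                # prediction error
    grad_head = feats.T @ err / len(x)      # gradient w.r.t. head only
    return W_feat, W_head - lr * grad_head  # W_feat returned unchanged

# Toy data standing in for synthetic training images.
rng = np.random.default_rng(1)
x = rng.standard_normal((8, 4))
y = rng.standard_normal((8, 1))
W_feat = rng.standard_normal((4, 3))
W_head = np.zeros((3, 1))
W_feat2, W_head2 = train_step(x, y, W_feat, W_head)
```

The frozen weights pass through the step untouched, mirroring the quoted design in which only the non-frozen parts of the detector learn from synthetic images.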
Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Makropoulos in view of Lee and further in view of Nowozin et al. (US 2019/0392587) ("Nowozin").
Regarding claim 7, Makropoulos in view of Lee teaches the learning device according to claim 1.
Makropoulos and Lee are combined with the same rationale as in claim 1.
Makropoulos does not explicitly teach wherein the already-trained model whose inference result is to be corrected by the correction learning model is trained with labels based on separate name definition in which feature points in a symmetrical relation are separately labeled,
and wherein the correction learning model is trained with labels based on same name definition in which the feature points in the symmetrical relation are labeled as a same label, and
wherein the at least one processor is configured to execute the instructions to generate the formatted inference result labeled based on the same name definition into which the probabilistic inference result labeled based on the separate name definition is converted.
However, Nowozin teaches wherein the already-trained model whose inference result is to be corrected by the correction learning model is trained with labels based on separate name definition in which feature points in a symmetrical relation are separately labeled, and wherein the correction learning model is trained with labels based on same name definition in which the feature points in the symmetrical relation are labeled as a same label (Nowozin para 0029, The conditional variational autoencoder 20 of FIG. 2 comprises a pair of connected networks, an encoder network 202 and a decoder network 206, which are trained neural networks. The encoder network 202 takes input data 201, and compresses it into a smaller, dense representation. The conditional variational autoencoder 20 encoding outputs two vectors: a vector of mean 203 and a vector of standard deviation 204. The sampler 205 is able to draw multiple samples, a single sample or a mean as a single sample from the mean vector 203 and the standard deviation vector 204. The decoder network 206 receives the input 2D data and also the output of the sampler 205. Each of the samples is then decoded, or decompressed, by the decoder network 206 before being outputted 207 from the conditional variational autoencoder 20.
Para 0030, The conditional variational autoencoder 20 of FIG. 2 is arranged to receive a plurality of 2D coordinates of feature points, for example of a body, hand, mechanical device, etc., and an encoding of their visibility and predict 2D coordinates of any non-visible feature points of the object. The blocks of FIG. 2 and their function are discussed below [trained with labels based on separate name definition]
Para 0031, The Output 2D data block 207 represents the output of the conditional variational autoencoder 20 where the 2D coordinates of undetected (or occluded) feature points y={y′ ∈ ℝ} are predicted.
Para 0036, In some cases, only a partial set of all feature points are observed. This could be due to occlusions or due to a failure of a feature detection system. In such a case, the visibility mask, v, encodes which feature points of an object are detected and which are not detected. Through the use of the visibility mask at the combine data block 208, the conditional variational autoencoder 20 will predict locations of the features which are not detected and combine them with the locations of the features that are detected in a single data set.
Para 0038 line 1-6, FIG. 3a illustrates an exemplary input presented at Input 2D Data block 201, whereby the first row are feature identifiers for the feature points in an image pre-processed to extract visible feature information. The pre-processing was to detect five feature points in the image: nose, l_arm, r_arm, l_leg and r_leg [in which feature points in a symmetrical relation are separately labeled,].
Para 0039, FIG. 3b illustrates an exemplary output presented at Output 2D Data block 207, for the input illustrated in FIG. 3a. The conditional variational autoencoder 20 predicted 2D location data of y 3 for the r_arm feature point [wherein the correction learning model is trained with labels based on same name definition in which the feature points in the symmetrical relation are labeled as a same label,]),
and wherein the at least one processor is configured to execute the instructions to generate the formatted inference result labeled based on the same name definition into which the probabilistic inference result labeled based on the separate name definition is converted (Nowozin para 0036, In some cases, only a partial set of all feature points are observed. This could be due to occlusions or due to a failure of a feature detection system. In such a case, the visibility mask, v, encodes which feature points of an object are detected and which are not detected. Through the use of the visibility mask at the combine data block 208, the conditional variational autoencoder 20 will predict locations of the features which are not detected and combine them with the locations of the features that are detected in a single data set.
Para 0040, FIG. 3c illustrates the result of the Combine Data block 208 whereby the predicted 2D location data presented at Output 2D Data block 207 is combined with the input presented at Input 2D Data block 201 illustrated in FIG. 3a and the 2D location data predicted by the conditional variational autoencoder 20 replaces the r_arm feature point 2D location data input presented at Input 2D Data block 201.).
Makropoulos and Nowozin are considered to be analogous to the claimed invention because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Makropoulos to incorporate the teachings of Nowozin to generate formatted predictions based on the labels. Doing so would predict the locations of the features which are not detected and combine them with the features that are detected (Nowozin Para 0006, A system to predict a location of a feature point of an articulated object from a plurality of data points relating to the articulated object of which some possess and some are missing 2D location data. The data points are input into a machine learning model that is trained to predict 2D location data for each feature point of the articulated object that was missing location data.
Para 0036, In some cases, only a partial set of all feature points are observed. This could be due to occlusions or due to a failure of a feature detection system. In such a case, the visibility mask, v, encodes which feature points of an object are detected and which are not detected. Through the use of the visibility mask at the combine data block 208, the conditional variational autoencoder 20 will predict locations of the features which are not detected and combine them with the locations of the features that are detected in a single data set.).
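For illustration only (and not part of the prosecution record), the visibility-mask combine step quoted from Nowozin above can be sketched as follows; the array layout, feature-point ordering, and function name are assumptions for the sketch, not Nowozin's disclosed implementation:

```python
import numpy as np

def combine(detected, predicted, visible):
    """Combine-data step: keep the detected 2D coordinates where the
    visibility mask is True, fill in predicted coordinates elsewhere."""
    mask = np.asarray(visible)[:, None]  # (J, 1) broadcasts over x/y
    return np.where(mask, detected, predicted)

# 5 feature points (nose, l_arm, r_arm, l_leg, r_leg); r_arm undetected.
detected = np.array([[0.5, 0.1], [0.2, 0.4], [0.0, 0.0],
                     [0.4, 0.9], [0.6, 0.9]])
predicted = np.array([[0.5, 0.1], [0.2, 0.4], [0.8, 0.4],
                      [0.4, 0.9], [0.6, 0.9]])
visible = [True, True, False, True, True]
result = combine(detected, predicted, visible)
```

The output is a single data set in which the undetected r_arm location comes from the model's prediction while every detected location passes through unchanged, matching the function attributed to the Combine Data block 208.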
Regarding claim 8, Makropoulos in view of Lee and Nowozin teaches the learning device according to claim 7.
Makropoulos and Lee are combined with the same rationale as in claim 1.
Makropoulos and Nowozin are combined with the same rationale as in claim 7.
Nowozin further teaches wherein the at least one processor is configured to execute the instructions to generate, as the input data, a reversed image obtained by reversing an image used for training the already-trained model which learned to extract feature points of an object having a symmetry shown in the image,
and to generate correct answer data corresponding to the reversed image from the correct answer data corresponding to the image,
wherein the at least one processor is configured to execute the instructions to train the correction learning model based on the formatted inference result, the reversed image, and the correct answer data corresponding to the reversed image (Nowozin para 0036, In some cases, only a partial set of all feature points are observed. This could be due to occlusions or due to a failure of a feature detection system. In such a case, the visibility mask, v, encodes which feature points of an object are detected and which are not detected. Through the use of the visibility mask at the combine data block 208, the conditional variational autoencoder 20 will predict locations of the features which are not detected and combine them with the locations of the features that are detected in a single data set [and to generate correct answer data corresponding to the reversed image from the correct answer data corresponding to the image,].
Para 0041, FIG. 4 illustrates a method for training of the conditional variational autoencoder 20 to predict feature point locations of bodies. A RGB video stream of a population of different people performing a set of articulated object arrangements in front of a camera is recorded. The detected set of feature points of each person is assigned to one of the training data 440, validation data 450 or test data 460, which are subsequently used to train the parameters θ, ψ and φ by maximizing the evidence lower bound objective function. The data in each set is augmented by removing 2D coordinates of some of the feature points at random 441, 451, 461. Optionally, each data set 442, 452, 462 can be further augmented by applying a rigid body transformation to the recorded articulated object arrangements and the addition of noise, thus generating views of the articulated object arrangements from different angles and locations thereby increasing the size of the training set 442, validation set 452 and test set 462 [to generate, as the input data, a reversed image obtained by reversing an image used for training the already-trained model which learned to extract feature points of an object having a symmetry shown in the image,] (i.e. the articulated object can be arranged from a different angle that can include reversing).
Para 0042, The training objective for the conditional variational autoencoder 20 model is ELBOcvAE of Equation 2. Deep learning optimizers suitable for the machine learning task include stochastic gradient descent, adam and smorms3. The training set 442 is used to obtain estimates for the parameters θ, ψ and φ of the conditional variational autoencoder 20. Training may be performed for several iterations of the optimizer through the training set 442, known as "epochs", to produce a plurality of alternative models. Each of the plurality of alternative models may then be evaluated using the validation set 452. The validation accuracy is used to choose between multiple alternative models, for example, the alternative models will vary as a result of choices, for example due to varying a number of layers in a neural network. To assess the accuracy of the conditional variational autoencoder 20, a separate validation and test step are further employed. The model with the greatest accuracy after being tested on the validation set 452 is selected and verified on the test set 462 to obtain a final performance estimate [wherein the at least one processor is configured to execute the instructions to train the correction learning model based on the formatted inference result, the reversed image, and the correct answer data corresponding to the reversed image.]).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALFREDO CAMPOS whose telephone number is (571) 272-4504. The examiner can normally be reached Monday - Friday, 7:00 am - 4:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J. Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ALFREDO CAMPOS/Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129