DETAILED ACTION
This action is responsive to the Application filed on 3/11/2026. Claims 1-25 are pending in the case. Claims 1, 11 and 20 are independent claims.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 03/11/2026 have been fully considered but they are not persuasive.
With respect to the rejection under 35 U.S.C. 101:
Applicant quotes claim 1 and asserts that the claim cannot be performed in the human mind and as such is not directed to a mental process.
Examiner disagrees. Applicant’s analysis does not conform to the guidance in the MPEP. An assertion that the claim as a whole cannot be performed in the mind does not demonstrate that the claim is not directed to a mental process. Step 2A Prong One asks whether the claim “recites” a mental process. Step 2A Prong Two then asks whether the claim includes additional elements (which need not themselves be mental processes) that integrate the judicial exception into a practical application, such that the claim is not directed to the exception. For example, a claim may recite a mental process such as “determine… a classification of motion” and, in combination, recite that the determination is performed by a neural network (i.e., an additional element). The conclusion under Step 2A Prong Two is that using a neural network to perform the recited abstract idea amounts to adding the words “apply it,” as described in MPEP 2106.05(f). Therefore, the claim is found to be directed to a judicial exception despite not being performable entirely in the human mind.
Therefore, the rejection is maintained.
With respect to the rejection under prior art:
Applicant argues that Du does not teach the amended claim.
Examiner notes that Du is not relied upon for any of the limitations in the updated rejection.
Applicant makes no comment on the previously cited art El-Ghaish, except to request that the combination of Du in view of El-Ghaish be reconsidered and to assert that the claims are allowable in light of the deficiencies of Du.
Examiner disagrees. The rejection has been updated in view of El-Ghaish further in view of Zhang.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-25 are rejected under 35 U.S.C. 101 because the claims are directed to an abstract idea without significantly more.
Regarding Claim 1/11/20
Under step 1, claim 1 is directed to a method, which is directed to a process, one of the statutory categories. Under step 1, claim 11 is directed to one or more processors, which is directed to a machine, one of the statutory categories. Under step 1, claim 20 is directed to a non-transitory computer-readable storage medium, which is directed to a product of manufacture, one of the statutory categories.
Under Step 2A Prong 1, the claim(s) recites the following limitations which are considered mental evaluations “update at least one classification corresponding to one or more types of motion of an object by… to determine, for each of two or more images, a classification of motion of one or more objects at least partially depicted in the two or more images… comparing the classification of at least one object determined for a first image of the two or more images with a classification of the at least one object determined for a second image of the two or more images… to correct errors in at least one of the classifications based, at least in part, on the comparison.”
The human mind is capable of updating a classification, which is a mere label about abstract data, as well as making corrections, determinations, and comparisons about the data and/or images.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claims recite the additional element(s) “using one or more neural networks… circuitry to use one or more computer processes … medium having stored thereon a set of instructions, which if performed by one or more processors, causes the one or more processors to use one or more neural networks… use one or more computer processes,” which amount to mere instructions to apply a computer technology to an abstract idea, see MPEP 2106.05(f) consideration (2).
Accordingly, the recited additional elements, when taken alone or in combination, do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. For the same reason, under Step 2B, they do not amount to significantly more than the judicial exception.
Regarding Claim 2/12
The claim is dependent upon a rejected claim. The claim recites more abstract ideas: “to identify motion of the one or more objects by predicting labels for the one or more objects utilizing time-series regression.” Under Step 2A Prong 1, these limitations correspond to a mental evaluation.
The claim recites the following additional element(s), in addition to those already identified in the parent claim: (“using the one or more neural networks”), which are mere instructions to implement an abstract idea on a computer or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recited additional elements neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 3
The claim is dependent upon a rejected claim. The claim recites more abstract ideas: “to generate one or more labels corresponding to the one or more objects in each frame of the video… to correct errors in the generated one or more labels by indicating which of the generated one or more labels is correct.” Under Step 2A Prong 1, these limitations correspond to a mental evaluation.
The claim recites the following additional element(s), in addition to those already identified in the parent claim: (“using the one or more neural networks”), which are mere instructions to implement an abstract idea on a computer or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recited additional elements neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 4/14
The claim is dependent upon a rejected claim. The claim recites the following additional element(s), in addition to those already identified in the parent claim: (“wherein the one or more neural networks comprise a one-dimensional (1D) convolutional neural network (CNN).”), which generally links the use of the judicial exception to a particular technological environment or field of use, see MPEP 2106.05(h). The recited additional elements neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 5
The claim is dependent upon a rejected claim. The claim recites additional details which further describe the previously recited abstract idea: “wherein the classification of motion of one or more objects at least partially depicted in the two or more images comprise motion information about the one or more objects” Under Step 2A Prong 1, these limitations correspond to a mental evaluation.
The claim does not recite any more additional elements beyond those identified in the parent claim. These additional elements do not integrate the abstract idea into a practical application nor provide significantly more.
Regarding Claim 6/15/22
The claim is dependent upon a rejected claim. The claim recites more abstract ideas: “to perform error correction on one or more labels determined from the classification of the at least one object determined for the first image and the classification of the at least one object determined for the second image using ground truth labels corresponding to the one or more objects” Under Step 2A Prong 1, these limitations correspond to a mental evaluation.
The claim recites the following additional element(s), in addition to those already identified in the parent claim: (“the one or more neural networks… wherein the set of instructions, which if performed by the one or more processors, cause the one or more processors to use the one or more neural networks”), which are mere instructions to implement an abstract idea on a computer or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recited additional elements neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 7
The claim is dependent upon a rejected claim. The claim recites more abstract ideas: “to perform error correction on one or more labels determined from the classification of the at least one object determined for the first image and the classification of the at least one object determined for the second image by creating and returning error-corrected labels” Under Step 2A Prong 1, these limitations correspond to a mental evaluation.
The claim recites the following additional element(s), in addition to those already identified in the parent claim: (“the one or more neural networks”), which are mere instructions to implement an abstract idea on a computer or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recited additional elements neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 8/16
The claim is dependent upon a rejected claim. The claim recites more abstract ideas: “to perform error correction on one or more labels of a different set of the one or more objects.” Under Step 2A Prong 1, these limitations correspond to a mental evaluation.
The claim recites the following additional element(s), in addition to those already identified in the parent claim: (“the one or more neural networks”), which are mere instructions to implement an abstract idea on a computer or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recited additional elements neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 9/19
The claim is dependent upon a rejected claim. The claim recites additional details which further describe the previously recited abstract idea: “wherein the two or more images comprise a plurality of time-associated motion capture (mocap) data points.”, “the two or more images include mocap data.” Under Step 2A Prong 1, these limitations correspond to a mental evaluation.
The claim does not recite any more additional elements beyond those identified in the parent claim. These additional elements do not integrate the abstract idea into a practical application nor provide significantly more.
Regarding Claim 10/17/18
The claim is dependent upon a rejected claim. The claim recites additional details which further describe the previously recited abstract idea: “wherein the at least one of the classifications comprise one or more predicted labels that correspond to a phase value for each instance of mocap data… wherein the classifications comprise phase value information about the one or more objects. … to generate a predicted phase value to each image of the two or more images” Under Step 2A Prong 1, these limitations correspond to a mental evaluation.
The claim does not recite any more additional elements beyond those identified in the parent claim. These additional elements do not integrate the abstract idea into a practical application nor provide significantly more.
Regarding Claim 13
The claim is dependent upon a rejected claim. The claim recites the following additional element(s), in addition to those already identified in the parent claim: (“train the one or more neural networks using one or more labels from the at least one of the classifications and one or more ground truth labels corresponding to the one or more objects.”), which generally links the use of the judicial exception to a particular technological environment or field of use. Because no limitation provides specifics or details regarding how the training is performed, stating that labels of object motion are used merely links the judicial exception to a field of use, see MPEP 2106.05(h). The recited additional elements neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 15
The claim is dependent upon a rejected claim. The claim recites more abstract ideas: “to perform error correction by analyzing one or more labels of the at least one of the classifications” Under Step 2A Prong 1, these limitations correspond to a mental evaluation.
The claim recites the following additional element(s), in addition to those already identified in the parent claim: (“wherein the circuitry is to use the one or more neural networks”), which are mere instructions to implement an abstract idea on a computer or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recited additional elements neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 21
The claim is dependent upon a rejected claim. The claim recites more abstract ideas: “to generate one or more labels in each of the first image and the second image and in the two or more images to indicate which part of the motion is being performed.” Under Step 2A Prong 1, these limitations correspond to a mental evaluation.
The claim recites the following additional element(s), in addition to those already identified in the parent claim: (“wherein the set of instructions, which if performed by the one or more processors, cause the one or more processors to use the one or more neural networks”), which are mere instructions to implement an abstract idea on a computer or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recited additional elements neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 22
The claim is dependent upon a rejected claim. The claim recites more abstract ideas: “to perform error correction on an identification of the motion of the one or more objects by using one or more labels generated by the one or more neural networks.” Under Step 2A Prong 1, these limitations correspond to a mental evaluation.
The claim recites the following additional element(s), in addition to those already identified in the parent claim: (“wherein the set of instructions, which if performed by the one or more processors, cause the one or more processors to use the one or more neural networks”), which are mere instructions to implement an abstract idea on a computer or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recited additional elements neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 23
The claim is dependent upon a rejected claim. The claim recites more abstract ideas: “to indicate whether an identification of the motion of one or more objects by the one or more neural networks is correct” Under Step 2A Prong 1, these limitations correspond to a mental evaluation.
The claim recites the following additional element(s), in addition to those already identified in the parent claim: (“wherein the set of instructions, which if performed by the one or more processors, cause the one or more processors to use the one or more neural networks”), which are mere instructions to implement an abstract idea on a computer or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recited additional elements neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 24
The claim is dependent upon a rejected claim. The claim recites additional details which further describe the previously recited abstract idea: “wherein the two or more images are included in a single video” Under Step 2A Prong 1, these limitations correspond to a mental evaluation.
The claim does not recite any more additional elements beyond those identified in the parent claim. These additional elements do not integrate the abstract idea into a practical application nor provide significantly more.
Regarding Claim 25
The claim is dependent upon a rejected claim. The claim recites additional details which further describe the previously recited abstract idea: “wherein the at least one of the classifications comprise motion data about the one or more objects in the two or more images.” Under Step 2A Prong 1, these limitations correspond to a mental evaluation.
The claim does not recite any more additional elements beyond those identified in the parent claim. These additional elements do not integrate the abstract idea into a practical application nor provide significantly more.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-25 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Applicant has not pointed out where the new (or amended) claim is supported, nor does there appear to be a written description of the claim limitation. The disclosure makes no mention of
“comparing the classification of at least one object determined for a first image of the two or more images with a classification of the at least one object determined for a second image of the two or more images;
and using the one or more neural networks to correct errors in at least one of the classifications based, at least in part, on the comparison.” (claims 1, 11 and 20).
Claims 2-10, 12-19 and 21-25 are rejected by virtue of their dependency.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1-5, 7-12, 14-25 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by El-Ghaish et al. “Human Action Recognition Using A Multi-Modal Hybrid Deep Learning Model”
Claim 1/11/20
El-Ghaish teaches, A method comprising: using one or more computer processes to update at least one classification corresponding to one or more types of motion of an object by: (Figure 1 pg 4 caption “Upon integrating pose with part shapes, our model can discriminate between the two actions.” Pg 4-5 Section 3 “The proposed model consists of two sub-models:… The outputs of the two sub-models are combined to make the final classification decision” the final classification is an updated classification of the types of actions or motion of an object.) [claim 11] One or more processors, comprising: circuitry to use one or more computer processes … [claim 20] A non-transitory computer-readable storage medium having stored thereon a set of instructions, which if performed by one or more processors, causes the one or more processors to: use one or more computer processes (pg 8 “Experiments are conducted on a workstation with an Intel (R) Xeon(R) CPU E5-2699”) using one or more neural networks to determine, for each of two or more images, a classification of motion of one or more objects at least partially depicted in the two or more images; (pg 7 section 3.2.1 “Then, the target body parts, which are the face, left hand, right hand, left foot, and right foot, in this paper, are cropped from the image… Figure 2(b) illustrates the architecture of the CL2D sub-model, which is used with each body part to represent its shape
[Image: media_image1.png]
” as shown in the figure, each of a plurality of image subframes is used by the FE2D model to form a classification of partially depicted motion.) comparing the classification of at least one object determined for a first image of the two or more images with a classification of the at least one object determined for a second image of the two or more images; (pg 5 “The outputs of the two sub-models are combined to make the final classification decision” the final classification serves as a comparison of the classifications from each of the component sub-models for the plurality of images.) and using the one or more neural networks to correct errors in at least one of the classifications based, at least in part, on the comparison. (pg 9, Section 4.2 “To overcome this problem, we deployed fine-tuning by training each sub-model separately. The first sub-model is trained as an independent model over the 3D skeleton data until no improvement in validation accuracy is observed. Then, we saved the model weights that achieved the highest validation accuracy… To train the combined model, we loaded the saved weights of the two sub-models, froze the first sub-model from further training, and left the second sub-model opened to continue training with the merging part” the model is progressively trained based on the classifications and the comparison of each constituent part in the merging part, which corrects the total errors in classification and results in a more accurate, error-reduced model.)
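For illustration only, the staged training quoted above (train each sub-model separately, reload the saved weights, freeze the first sub-model, and continue training the second sub-model with the merging part) can be sketched as follows. This is a minimal sketch and not El-Ghaish's implementation: the `Param` class, the `sgd_step` helper, the parameter names, and the gradient values are all hypothetical, and plain gradient descent is an assumed optimizer.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """A single scalar weight with a flag saying whether training may change it."""
    value: float
    trainable: bool = True


def sgd_step(params, grads, lr=0.5):
    """Apply one gradient-descent update, skipping frozen parameters."""
    for name, param in params.items():
        if param.trainable:
            param.value -= lr * grads[name]


# Hypothetical saved weights, standing in for the best checkpoint of each part.
params = {
    "skeleton_submodel.w": Param(1.0),
    "shape_submodel.w": Param(1.0),
    "merge.w": Param(1.0),
}

# Freeze the first sub-model; the second sub-model and merging part stay open.
params["skeleton_submodel.w"].trainable = False

# Hypothetical gradients from one pass over the combined model.
grads = {"skeleton_submodel.w": 1.0, "shape_submodel.w": 1.0, "merge.w": 2.0}
sgd_step(params, grads)
```

In this sketch the frozen weight keeps its saved value while the other two weights move, which is the behavior the quoted passage describes at a much larger scale.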
Claim 2/12
El-Ghaish teaches claim 1/11
El-Ghaish teaches, using the one or more neural networks to identify motion of one or more objects by predicting labels for the one or more objects utilizing time-series regression…[from claim 12] the at least one of the classifications comprise predicted labels utilizing time-series regression. (pg 3 “CNN can be combined with LSTM to perform classification, bringing the best of the two worlds together: automatic features extraction by the CNN, and the capturing of the temporal dependencies in sequences of these features by the LSTM…The model benefits from the presence of 3D skeleton data and image data to capture the two main aspects of an action: the motion and part shape” classification of motion and part shape corresponds to identification of the motion of one or more objects by predicting labels. Capturing temporal dependencies via an LSTM is an example of utilizing time-series regression because it uses prior moments in time in the series to explain the variance of one variable with another. Examiner notes that specification para. 0116 indicates that any of a list of models, including LSTMs, is suitable for time-series regression. Specification para. 0123 appears to note that a function which maps a series of temporal inputs to a series of temporal outputs performs off-line regression.)
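As a rough illustration of the reading applied above, in which a function maps a series of temporal inputs to a series of temporal outputs and relies on prior moments in the series, consider the sketch below. It is not an LSTM and is not taken from El-Ghaish or the specification; the `recurrent_regression` name, the `decay` parameter, and the exponential-smoothing update are assumptions chosen only to show an output at time t depending on earlier inputs.

```python
def recurrent_regression(inputs, decay=0.5):
    """Map a time series of inputs to a time series of outputs.

    Each output blends the current input with a running state that
    summarizes all earlier inputs, so prior time steps influence later ones.
    """
    state = 0.0
    outputs = []
    for x in inputs:
        state = decay * state + (1.0 - decay) * x
        outputs.append(state)
    return outputs


# Hypothetical three-step input series; one output is produced per time step.
series = [0.0, 2.0, 4.0]
predicted = recurrent_regression(series)
```

An LSTM replaces the single running state with learned gates, but the input-series to output-series shape of the mapping is the same.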
Claim 3
El-Ghaish teaches claim 1
El-Ghaish teaches, using the one or more neural networks to generate one or more labels corresponding to the one or more objects (figure 2
[Image: media_image2.png]
multiple classifications or labels for the one or more objects are generated via the neural network) using the one or more neural networks to correct errors in the generated one or more labels by indicating which of the generated one or more labels is correct. (pg 3 “Table 1 shows the results of nineteen methods/models that were tested in both the CS and CV scenarios…. The best scores that were achieved in both CV and CS scenarios, which are 62.99 % and 70.27 %” the scores are an indication of the correct labels.)
Claim 4/14
El-Ghaish teaches claim 1/11
El-Ghaish teaches, wherein the one or more neural networks comprise a one-dimensional (1D) convolutional neural network (CNN). (pg 7 “The overall architecture of CL1D is shown in Figure 2(a). The architecture consists of two stages: 1D Feature Extraction (FE1D ) and Classification… The feature extraction stage, FE1D, uses three consecutive convolutional layers with numbers of filters 32, 48, and 64, respectively, all of size 3×1” the CNN filters are one-dimensional, thus a 1D CNN.)
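To make concrete what a filter of size 3×1 does, the sketch below slides a three-tap kernel along a single axis and tracks the output shape through three stacked layers with 32, 48, and 64 filters. It is a sketch under stated assumptions rather than the cited architecture: stride 1, no padding, and no pooling are assumed, and the helper names `conv1d_valid` and `stacked_output_shapes` are hypothetical.

```python
def conv1d_valid(signal, kernel):
    """Slide a 1D kernel along a 1D signal (stride 1, no padding).

    As in most deep learning frameworks, the kernel is not flipped,
    so this is strictly a cross-correlation.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def stacked_output_shapes(length, filters=(32, 48, 64), kernel_size=3):
    """Return (channels, length) after each layer of a stack of 1D convolutions."""
    shapes = []
    for n_filters in filters:
        length = length - kernel_size + 1  # each unpadded layer trims k - 1 steps
        shapes.append((n_filters, length))
    return shapes


# Hypothetical five-step signal and a simple difference kernel.
response = conv1d_valid([1, 2, 3, 4, 5], [1, 0, -1])

# Hypothetical input of 16 time steps passed through the three layers.
shapes = stacked_output_shapes(16)
```

The kernel moves only along the time axis, which is the sense in which the filters are one-dimensional.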
Claim 5
El-Ghaish teaches claim 1
El-Ghaish teaches, wherein the classification of motion of one or more objects at least partially depicted in the two or more images comprise motion information about the one or more objects (pg 4 figure 1 caption “One running example of two actions, Reading and Clapping,… Upon integrating pose with part shapes, our model can discriminate between the two actions.” The actions of the classified images comprise information about the motion of the objects, i.e., head movement for clapping vs. reading.)
Claim 7
El-Ghaish teaches claim 1
El-Ghaish teaches, further comprising using the one or more neural networks to perform error correction on one or more labels determined from the classification of the at least one object determined for the first image and the classification of the at least one object determined for the second image (figure 2
[Image: media_image2.png]
multiple classifications or labels for the one or more objects are generated via the neural network; pg 9 “The first sub-model is trained… until no improvement in validation accuracy is observed” training until no improvement is, by definition, error correction on labels.) by creating and returning error-corrected labels. (pg 3 “Table 1 shows the results of nineteen methods/models that were tested in both the CS and CV scenarios…. The best scores that were achieved in both CV and CS scenarios, which are 62.99 % and 70.27 %” increasing accuracy scores during training indicate errors being corrected as training progresses.)
Claim 8/16
El-Ghaish teaches claim 1/11
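El-Ghaish teaches, further comprising using the one or more neural networks to perform error correction on one or more labels of a different set of the one or more objects. (figure 2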
[Image: media_image2.png]
multiple classifications or labels for the one or more objects are generated via the neural network; pg 8 “This dataset consists of 56880 samples of 60 action classes” pg 9 “The first sub-model is trained… until no improvement in validation accuracy is observed” training until no improvement is, by definition, error correction on labels. This training is performed on a dataset of a plurality of sets of samples or objects.)
Claim 9/19
El-Ghaish teaches claim 1/11
El-Ghaish teaches, wherein the two or more images comprise a plurality of time-associated motion capture (mocap) data points. …[from claim 19] wherein the two or more images include mocap data. (pg 2 Introduction “The human body pose can be captured using Motion Capture (MoCap) systems… Nowadays, there is a great focus on body pose data collected using inexpensive depth sensors, such as the Kinect sensor which provide both RGB and depth images in addition to the 3D body skeleton” pg 5 “Figure 1 shows samples of two different actions, in which the poses are similar at times t, t +1, and t +2 while the body part shapes (the hands in this case) can differentiate between them” pg 6 “The 3D skeleton is an excellent source of information about the body pose and motion over time” pg 7 Section 3.1.2 “The first sub-model receives randomly chosen frames from N segments, as explained above. For the skeleton produced by the Kinect v2, which has 25 joints, only 16 joints are selected” the frames are of motion capture data captured by the Kinect. As explained, the skeletal frame data are associated with a time index in the sequence.)
Claim 10
El-Ghaish teaches claim 1
El-Ghaish teaches, the at least one of the classifications comprise one or more predicted labels that correspond to a phase value for each instance of mocap data. (pg 8 “The output of each Flatten layer of the feature extraction phase of all parts are concatenated together to generate one feature vector that represents the body shape in each action.” The body shape in each action is a classification of the phase of the objects in the motion captured data in an action)
Claim 15
El-Ghaish teaches claim 11
El-Ghaish teaches, wherein the circuitry is to use the one or more neural networks to perform error correction by analyzing one or more labels of the at least one of the classifications
(pg 8 “This dataset consists of 56880 samples of 60 action classes” pg 9 “The first sub-model is trained… until no improvement in validation accuracy is observed” the training amounts to analyzing the labels.)
Claim 17
El-Ghaish teaches claim 11
El-Ghaish teaches, wherein the at least one of the classifications comprise phase value information about the one or more objects. (pg 8 “The output of each Flatten layer of the feature extraction phase of all parts are concatenated together to generate one feature vector that represents the body shape in each action.” The body shape in each action is a classification of the phase value information of the objects in the motion captured data in an action.)
Claim 18
El-Ghaish teaches claim 11
El-Ghaish teaches, wherein the circuitry is to use the one or more neural networks to generate a predicted phase value to each image of the two or more images. (pg 8 “The output of each Flatten layer of the feature extraction phase of all parts are concatenated together to generate one feature vector that represents the body shape in each action.” The body shape in each action is a predicted classification of the phase value of the objects in the motion captured data in an action.)
Claim 21
El-Ghaish teaches claim 20
El-Ghaish teaches, wherein the set of instructions, which if performed by the one or more processors, cause the one or more processors to use the one or more neural networks to generate one or more labels in each of the first image and the second image and in the two or more images to indicate which part of the motion is being performed. (pg 8 “The output of each Flatten layer of the feature extraction phase of all parts are concatenated together to generate one feature vector that represents the body shape in each action.” The body shape in each action is a predicted classification of the concatenated parts in the motion captured data in an action. The location of the body parts indicates which part of the motion is performed.)
Claim 22
El-Ghaish teaches claim 20
El-Ghaish teaches, wherein the set of instructions, which if performed by the one or more processors, cause the one or more processors to use the one or more neural networks to perform error correction on an identification of the motion of the one or more objects by using one or more labels generated by the one or more neural networks. (pg 5 “The outputs of the two sub-models are combined to make the final classification decision” the initial output serves as labels generated by the one or more neural networks, which result in a final identification of the motion being the final classification decision. Pg 9 “to continue training with the merging part” the model is continuing training which is error correction on an identification.)
Claim 23
El-Ghaish teaches claim 20
El-Ghaish teaches, wherein the set of instructions, which if performed by the one or more processors, cause the one or more processors to use the one or more neural networks to indicate whether an identification of the motion of the one or more objects by the one or more neural networks is correct. (pg 3 “Table 1 shows the results of nineteen methods/models that were tested in both the CS and CV scenarios…. The best scores that were achieved in both CV and CS scenarios, which are 62.99 % and 70.27 %” these assessments of validation accuracy are an indication of whether the model identification of motion is correct.)
Claim 24
El-Ghaish teaches claim 20
El-Ghaish teaches, wherein the one or more images are included in a single video. (pg 4 Figure 1 caption “One running example of two actions… where the skeleton poses are relatively similar at times t, t +1, and t+2 while the body shapes are different. Upon integrating pose with part shapes, our model can discriminate between the two actions.” Pg 4 “In our proposed model, we make use of the two notions above. First, our model uses both 3D skeleton data and image data” the model takes images from a single video over time as shown in the figure.)
Claim 25
El-Ghaish teaches claim 20
El-Ghaish teaches, wherein the at least one of the classifications comprise motion data about the one or more objects in the two or more images. (pg 4 “we benefit from our knowledge of the 3D skeleton data to focus only on the body parts such as face, left hand, right hand, left foot, and right foot, in the images because these parts typically include most of the information needed to recognize the performed action that is not captured in the skeleton data… The two sub-models benefit from the power of the CNN to extract the spatial dependencies of the input stream, which is adjacent joints in the case of the skeleton, and adjacent pixels in the case of the images” the body parts contain the information needed which captures the motion of the objects, such as spatial dependencies)
Claim Rejections - 35 U.S.C. § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 6 and 13 are rejected under 35 U.S.C. § 103 as being unpatentable over Stickland further in view of El-Ghaish, further in view of Zhang “real-time action recognition with enhanced motion vector cnns”.
Claim 6
El-Ghaish teaches claim 1
El-Ghaish teaches, using (pg 9 Section 4.2 “The first sub-model is trained as an independent model over the 3D skeleton data until no improvement in validation accuracy…we saved only the weights of this model that achieved 35% as a validation accuracy. To train the combined model,” the errors are corrected as validation accuracy improves based on the error correction of the classified labels.)
Zhang when addressing action recognition in image frames teaches, using ground truth labels corresponding to the one or more objects. (pg 6 Section 4.3 “In the third strategy, we combine the initialization with teacher’s parameters and supervision transfer to further enhance the performance of student’s net. We first train the teacher OF-CNN with optical flow fields. Then we initialize the student MV-CNN with OF-CNN’s parameters. After this, we train the student net with supervision signal from both the teacher and the ground truth”)
One would have been motivated to make this combination because both El-Ghaish and Zhang utilize neural networks for action recognition. Further, Zhang notes that, to learn robust features, combining both supervision signals “allows us to combine the merits of two previous strategies and further boost the generalization ability” (Zhang, Section 4.3).
Claim 13
El-Ghaish teaches claim 11
El-Ghaish teaches, wherein the circuitry is further to train the one or more neural networks using one or more labels from the at least one of the classifications (Figure 2: multiple classifications or labels for the one or more objects are generated via the neural network. pg 8 “This dataset consists of 56880 samples of 60 action classes” pg 9 “The first sub-model is trained… until no improvement in validation accuracy is observed” training until no improvement by definition is error correction on labels. This training is performed on a dataset of a plurality of sets of samples or objects)
El-Ghaish does not explicitly teach, and one or more ground truth labels corresponding to the one or more objects.
Zhang when addressing action recognition in image frames teaches, and one or more ground truth labels corresponding to the one or more objects. (pg 6 Section 4.3 “In the third strategy, we combine the initialization with teacher’s parameters and supervision transfer to further enhance the performance of student’s net. We first train the teacher OF-CNN with optical flow fields. Then we initialize the student MV-CNN with OF-CNN’s parameters. After this, we train the student net with supervision signal from both the teacher and the ground truth”)
One would have been motivated to make this combination because both El-Ghaish and Zhang utilize neural networks for action recognition. Further, Zhang notes that, to learn robust features, combining both supervision signals “allows us to combine the merits of two previous strategies and further boost the generalization ability” (Zhang, Section 4.3).
Conclusion
Prior art:
Li et al. “recognizing unseen actions in a domain-adapted embedding space” describes the use of a CNN to process features of videos which are labeled separately by an MLP.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached M-F 7:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.R.G./
Examiner, Art Unit 2122
/KAKALI CHAKI/ Supervisory Patent Examiner, Art Unit 2122