DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2. This is in response to the applicant's reply filed on 09/09/2025. In that reply, claims 1, 5-6, and 11-12 were amended, and claims 13-14 were newly added. Accordingly, claims 1-14 are pending and under examination. Claims 1, 11, and 12 are in independent form.
3. The rejections of the claims under 35 U.S.C. 101 have been withdrawn in view of applicant’s amendments.
Claim Rejections - 35 USC § 103
4. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
5. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
6. Claims 1-3, 5, and 11-14 are rejected under 35 U.S.C. 103 as being unpatentable over Yamato et al. (WO2018/163555, hereinafter "Yamato"; its English-language version is hereinafter "Yamato-Eng") in view of Cheng (CN111259699, hereinafter "Cheng"). A machine-translated English version (hereinafter "Cheng-Eng") of document CN111259699 is provided by the examiner with this Office action.
Regarding claim 1, Yamato (citations are to Yamato-Eng, except for the figures; the same applies below) discloses an intention detection device (the human behavior recognition system/method, see figs.1-3) comprising: at least one memory configured to store instructions; at least one processor (for these hardware-related features, see figs.1-2) configured to execute the instructions to:
generate preprocessed data associated with a human and a relevant object by processing a detection signal outputted by a sensor (see 10 of fig.3 and para.4, lines 27-28: “The image acquisition unit 10 acquires image data D1 of an image (here, a moving image) generated by the imaging device 200.” Wherein “a human” and “a relevant object” can be human B1 and bed B2 (or chair B3) shown in fig.1);
identify a motion pattern of the human (see 30 of fig.3 and pg.6, lines 21-26: "the human body feature extraction unit 30 outputs the human posture feature data D3 shown in the extracted image... Here, "person's posture characteristics" are extracted from the characteristics of the posture of the human body such as walking and sitting.") and a relation between the human and the object based on the preprocessed data (see 40 of fig.3 and pg.8, lines 25-36: "the peripheral feature extraction unit 40 outputs the extracted peripheral feature data D4 to the peripheral feature filter unit 50... More preferably, the "peripheral feature" includes information indicating a positional relationship between the peripheral object and each part of the human body (for example, a positional relationship with the human hand).");
detect at least one of an activity, a gesture or a predicted step regarding the human based on the identified motion pattern and the identified relation to integrate lexical descriptions of the at least one of the activity, the gesture or the predicted step (see 60 of fig.3 and pg.12, lines 17-20: “the behavior/action determination unit 60 determines the action class of the person shown in the image based on the time series data of the human posture feature data D3 and the filtered peripheral feature data D4a [extracted from the peripheral object]”; wherein the behavior classes are described by the words of “getting up from the chair” and the words of “sitting on the chair”, respectively; see pg.12, lines 17-31),
and detect an intention of the human based on the integrated lexical descriptions (ibid.; wherein the person's behavior is described by the words of "getting up from the chair" and the words of "sitting on the chair", respectively; see pg.12, lines 17-31).
As explained above, the only difference is that Yamato does not explicitly disclose "to control a robot using a driving signal generated based on the detected intention" as recited by claim 1. However, in the same field of endeavor, Cheng (see Cheng-Eng, pg.8, lines 31-37) teaches: "an application scene of the intelligent nursing robot in this embodiment, based on gesture recognition process recognizing the video target is in the care of the hand put in the state of the chair armrest is, pattern recognition process based on myoelectric signal identified is nursing target muscle contraction process, judging whether the nursing by motion prediction model of human target the next action is intended for hand supporting body getting up or stand up, then it can be driving intelligent auxiliary nursing robot provides the corresponding action." It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Cheng into the teachings of Yamato and to control an intelligent nursing robot based on the detected intention of the person, such as "getting up from the chair", in order to assist the person. The suggestion or motivation for doing so would have been to perform human behavior recognition based on both human motion information, including moving images or video, and human body surface electromyogram signals, as taught by Cheng (see Abstract). Therefore, the claim is unpatentable over Yamato in view of Cheng.
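For illustration only, the following minimal sketch (which appears in neither reference; the names classifier.predict and robot.drive are hypothetical placeholders) shows how the mapped teachings fit together: a behavior class is determined from posture and peripheral (object-relation) features as in Yamato, and the resulting lexical description is treated as the detected intention used to generate a driving signal for an assistive robot as in Cheng.

    # Python sketch (illustrative only; function and object names are hypothetical)
    INTENT_TO_COMMAND = {
        "getting up from the chair": "approach_and_support",  # assist the person (cf. Cheng)
        "sitting on the chair": "stand_by",
    }

    def control_robot(posture_features, peripheral_features, classifier, robot):
        # Yamato: determine the behavior class (lexical description) from the
        # time-series posture features and the filtered peripheral features.
        behavior = classifier.predict(posture_features, peripheral_features)
        # Cheng: treat the recognized behavior as the person's intention and
        # drive the intelligent nursing robot accordingly.
        command = INTENT_TO_COMMAND.get(behavior, "stand_by")
        robot.drive(command)
        return command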
Regarding claim 2, the combination of Yamato and Cheng discloses the intention detection device according to claim 1, wherein the at least one processor is configured to execute the instructions to perform classification of the motion pattern and the relation by unsupervised or semi-supervised learning (Yamato-Eng, see "action/behavior discriminating 60" by the "learning process 70" in fig.3 and pg.5, lines 4-9: "The behavior discriminating unit 60 [after learning process 70 based on the teacher data D6] discriminates the behavior class of the person shown in the image based on the posture feature data D3 and the filtered peripheral feature data D4a, and outputs the result data D5.").
Regarding claim 3, the combination of Yamato and Cheng discloses the intention detection device according to claim 2, wherein the at least one processor is configured to execute the instructions to map the motion pattern and the relation belonging to the same class to the same lexical description through the classification (Yamato-Eng, wherein the action classes include “the action of “[the human] getting up from the chair”” and “the action of “[the human] sitting on the chair””; see pg.12, lines 25-31).
Regarding claim 5, the combination of Yamato and Cheng discloses the intention detection device according to claim 1, wherein the at least one processor is configured to execute the instructions to generate a dynamic variation signal from the preprocessed data (Yamato-Eng, see the feature vectors generated in unit 61 of fig.8; see pg.13, lines 15-39) and partition and normalize the preprocessed data by detecting characteristic time-instants based on the dynamic variation signal to identify the motion pattern (Yamato-Eng, see the normalization unit 62 and the action class determination unit 64 in fig.8; see pg.14, line 28—pg.15, line 9).
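For illustration only, a minimal sketch (not taken from Yamato; the function name and segment length are assumptions) of one way to realize the mapped limitation: a dynamic variation signal is computed from the preprocessed per-frame features, its peaks serve as characteristic time-instants, and each resulting partition is normalized to a fixed temporal length.

    # Python sketch (illustrative only)
    import numpy as np
    from scipy.signal import find_peaks

    def partition_and_normalize(features, num_samples=32):
        # features: (T, D) array of preprocessed per-frame feature vectors.
        # Dynamic variation signal: magnitude of frame-to-frame change.
        variation = np.linalg.norm(np.diff(features, axis=0), axis=1)
        # Characteristic time-instants: peaks of the variation signal.
        instants, _ = find_peaks(variation)
        bounds = [0, *instants.tolist(), len(features) - 1]
        segments = []
        for a, b in zip(bounds[:-1], bounds[1:]):
            seg = features[a:b + 1]
            # Normalize each partition to a fixed length by linear interpolation.
            idx = np.linspace(0, len(seg) - 1, num_samples)
            segments.append(np.stack(
                [np.interp(idx, np.arange(len(seg)), seg[:, d])
                 for d in range(seg.shape[1])], axis=1))
        return segments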
Regarding claims 11 and 12, each is a variation of claim 1; accordingly, each is interpreted and rejected for the reasons set forth in the rejection of claim 1.
Regarding claim 13, the combination of Yamato and Cheng discloses the intention detection device according to claim 1, further comprising: a memory which stores a library regarding motion patterns, wherein each entry of the library at least indicates a class of the motion patterns, class-specific criterion for determining if a motion pattern belongs to the class, and associated lexical description with respect to the class (Yamato-Eng, see pg.16, lines 1-5: "In addition, the learning unit 70 uses, for example, time series data of human posture features and peripheral features, and teacher data in which a correct behavior class is associated, and the intermediate layer 61 and the all coupling unit 63 of the behavior determination unit 60. Network parameters (e.g., weighting factor, bias)". See pg.12, lines 25-31, wherein the behavior classes are described by the words of "getting up from the chair" and the words of "sitting on the chair", respectively).
Regarding claim 14, the combination of Yamato and Cheng discloses the intention detection device according to claim 1, wherein the motion pattern / object relation identifier detects one or more motion primitives based on the preprocessed data and identifies the motion pattern which includes at least one motion primitive of the one or more motion primitives (Yamato-Eng, see pg.12, lines 25-31, wherein the behavior classes are described by the words of “getting up from the chair” and the words of “sitting on the chair”, respectively).
7. Claims 4 and 6-10 are rejected under 35 U.S.C. 103 as being unpatentable over Yamato in view of Cheng and further in view of Niebles et al. ("Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words", 2009, hereinafter "Niebles").
Regarding claim 4, the combination of Yamato and Cheng discloses the claimed invention except for "the least one processor is configured to execute the instructions to gradually enhance a library through the unsupervised or semi-supervised learning" as recited in the claim. However, in the same field of endeavor, that is, in the field of human action classification in video, Niebles teaches unsupervised learning of human action classification, trained and tested using three datasets in order to enhance the robustness of the system (see Sec. 4, para. 1). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Niebles into the combination of Yamato and Cheng and to use additional datasets, as taught by Niebles, to train the system of Yamato. The suggestion or motivation for doing so would have been to enhance the robustness of the system. Therefore, the claim is unpatentable.
Regarding claim 6, the combination of Yamato, Cheng, and Niebles discloses the claimed limitations, wherein the at least one processor is configured to execute the instructions to identify a lexical description of the motion pattern and the relation (Yamato, see pg.12, lines 25-31, wherein the action classes include "the action of "[the human] getting up from the chair"" and "the action of "[the human] sitting on the chair""), and the at least one processor is configured to execute the instructions to convert the lexical description to data in a numerical format to detect the at least one of the activity, the gesture or the predicted step (Niebles, see Sec. 3.2, para.1: "In order to learn the vocabulary of spatial-temporal words, we consider the set of descriptors corresponding to all detected spatial-temporal interest points in the training data. This vocabulary (or codebook) is constructed by clustering using the k-means algorithm and Euclidean distance as the clustering metric. The center of each resulting cluster is defined to be a spatial-temporal word (or codeword). Thus, each detected interest point can be assigned a unique cluster membership, i.e., a spatial-temporal word, such that a video can be represented as a collection of spatial-temporal words from the codebook.").
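For illustration only, a minimal sketch of the codebook construction quoted above from Niebles (the variable names and vocabulary size are assumptions, not Niebles' code): descriptors of detected spatio-temporal interest points are clustered with k-means, each cluster center is a spatial-temporal word, and a video is then represented as a collection of such words.

    # Python sketch (illustrative only)
    import numpy as np
    from sklearn.cluster import KMeans

    def build_codebook(training_descriptors, vocab_size=250):
        # Cluster all spatio-temporal interest-point descriptors from the
        # training data using k-means with Euclidean distance; each cluster
        # center is a "spatial-temporal word" (codeword).
        return KMeans(n_clusters=vocab_size, n_init=10).fit(training_descriptors)

    def represent_video(codebook, video_descriptors):
        # Assign each detected interest point a unique cluster membership, so
        # the video becomes a collection (here, a histogram) of codewords.
        words = codebook.predict(video_descriptors)
        return np.bincount(words, minlength=codebook.n_clusters)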
Regarding claim 7, the combination of Yamato, Cheng, and Niebles discloses the intention detection device according to claim 6, wherein the at least one processor is configured to execute the instructions to perform nonlinear dynamic processing (Yamato, see block 61a of fig.8, wherein block 61a processes the nonlinear input vectors obtained at different times, namely, time t and t-1) and nonlinear static processing (Yamato, see blocks 61j, 61k and 61i of fig.8, wherein all these blocks process the nonlinear input vectors obtained at the same time t-3) of the data in the numerical format to detect the at least one of the activity, the gesture or the predicted step (Yamato, see the last block, action class determination unit 64, of fig.8).
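For illustration only, a minimal sketch of the distinction relied upon above (the functions and weights are hypothetical and do not reproduce Yamato's fig.8): "nonlinear dynamic processing" combines inputs from different times (t and t-1), whereas "nonlinear static processing" operates only on the input of a single time instant.

    # Python sketch (illustrative only)
    import numpy as np

    def nonlinear_dynamic_step(x_t, h_prev, W_x, W_h):
        # Dynamic: the output at time t depends on vectors from different
        # times (the current input x_t and the previous state h_prev).
        return np.tanh(W_x @ x_t + W_h @ h_prev)

    def nonlinear_static_layer(x_t, W, b):
        # Static: the output depends only on the input at a single time instant.
        return np.tanh(W @ x_t + b)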
Regarding claim 8, the combination of Yamato, Cheng, and Niebles discloses the intention detection device according to claim 7, wherein the at least one processor is configured to execute the instructions to perform second nonlinear static processing (Yamato, see blocks 61a, 61b and 61c of fig.8, wherein all these blocks process the nonlinear input vectors obtained at the same time t) of data derived from the nonlinear dynamic processing and the nonlinear static processing to detect the at least one of the activity, the gesture or the predicted step (Yamato, see the last block, action class determination unit 64, of fig.8).
Regarding claim 9, the combination of Yamato, Cheng, and Niebles discloses the intention detection device according to claim 8, wherein the at least one processor is configured to execute the instructions to perform the second nonlinear static processing further based on a dynamic variation signal generated from the preprocessed data and timing information regarding the motion pattern (Yamato, see input vector A(t) of fig.8 which is a dynamic variation signal shown by D4 and/or D4a in fig.7).
Regarding claim 10, the combination of Yamato and Cheng does not disclose: if a lexical description of the motion pattern is unknown, search a motion pattern library for the lexical description and evaluate consistency of the lexical description over time. However, in the same field of endeavor, that is, in the field of human action classification in video, Niebles teaches "In our implementation, we rely on the codebook to handle scale changes and camera motions. As long as the newly observed local features do not contain patterns of scale change and camera motion that are extremely different from those observed in the data used to form the codebook, we expect that similar local features will be assigned to consistent memberships on the codebook" (see page 305, left column, the cited portion of the last paragraph). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Niebles into the combination of Yamato and Cheng and to use an unsupervised learning approach to classify human actions in videos as taught by Niebles. The suggestion or motivation for doing so would have been "detect[ing] relevant activities in surveillance video, summarizing and indexing video sequences, organizing a digital video library according to relevant actions, etc."; cf. Sec. 1, para. 2 of Niebles. Therefore, the claim is unpatentable.
Response to Arguments
8. Applicant's arguments regarding claim 1 filed on 09/09/2025 have been considered but are moot in view of the new ground(s) of rejection. Specifically, as explained in the rejection of the claim, the combination of Yamato and Cheng discloses, suggests, or renders obvious the argued limitations. The examiner therefore maintains the rejections.
Conclusion
9. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
10. Any inquiry concerning this communication or earlier communications from the examiner should be directed to RUIPING LI whose telephone number is (571)270-3376. The examiner can normally be reached 8:30 am to 5:30 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, HENOK SHIFERAW can be reached on (571)272-4637. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RUIPING LI/Primary Examiner, Ph.D., Art Unit 2676