DETAILED ACTION
Claim 3 has been cancelled.
Claims 12-21 have been added.
Claims 1, 2, and 4-21 are currently pending.
Claim 10 is no longer being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, due to Applicant’s amendment.
The previous rejection of claim 10 under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph, is withdrawn due to Applicant’s amendment.
The previous rejections of claims 1, 2, 5-8, and 10 under 35 U.S.C. 103 are withdrawn due to Applicant’s amendment.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 11 and 19-21 are rejected under 35 U.S.C. 103 as being unpatentable over Kadav et al., US Publication 2022/0101007 (hereafter “Kadav”), in view of Chan et al., US Publication 2022/0147736 (hereafter “Chan”).
Referring to claim 11, Kadav discloses a motion recognition method comprising:
extracting spatial features from time-series image data obtained by shooting a target object (paragraph 73, At block 1001, extract feature maps and frame-level representations from a video stream by using a convolutional neural network (CNN));
reshaping the image data from which the spatial features are extracted to a type of time-series image data according to the following equation:
I_seq(B, Seq, dim0) = reshape(X(B×Seq, dim0))
where I_seq(B, Seq, dim0) is time-series image data (paragraph 24, System 100 describes existing work where a deep learning network 104 receives a video stream 102 to localize output 106), X(B×Seq, dim0) is image data from which spatial features are extracted (paragraph 77, At block 1009, employ video representation learning and recognition from the objects and image context to locate a target object within the video stream), and dim0 is a dimension of image data from which spatial features are extracted (paragraph 30, Regarding object detection and representation, the exemplary embodiments collapse the spatial dimensions into 1 dimension and combine the batch dimension with the temporal dimension for the feature map f);
integrating the time-series image data from which the spatial features are extracted (paragraph 25, A multi-hop transformer 118 is then employed that uses reasoning to trace back from an intermediate step where the object was fully visible towards the end of the video to fully locate the object or output 120);
extracting temporal features from the integrated time-series data (paragraph 26, Positional time encodings 215 and resolution encodings 220 are learned and summed up with feature maps 225 from the CNN 210); and
recognizing motions of the target object based on the extracted temporal features (paragraph 26, Tracking 235 is then performed by applying a Hungarian algorithm to match objects between every two consecutive frames. The N object tracks and the 1 track of image features from the CNN 210 are added with the learned positional time encoding 215 to form the memory input to the proposed Multi-hop Transformer 250, which further accepts a video query and produces the latent representation of the video).
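For illustration of the reshape limitation mapped above: the recited equation simply recovers separate batch (B) and sequence (Seq) axes from spatial features computed with those axes flattened together. The following is a minimal sketch assuming PyTorch and arbitrary placeholder dimensions; it is not taken from Kadav or from the instant application.

import torch

# Placeholder dimensions for illustration only.
B, Seq, dim0 = 4, 16, 512          # batch size, frames per sequence, spatial feature size

# X(B×Seq, dim0): per-frame spatial features with the batch and temporal
# dimensions collapsed into one axis (e.g., flattened CNN outputs).
X = torch.randn(B * Seq, dim0)

# I_seq(B, Seq, dim0) = reshape(X(B×Seq, dim0)): recover the time-series layout
# so that Seq ordered frames are grouped per batch element.
I_seq = X.reshape(B, Seq, dim0)

print(I_seq.shape)                 # torch.Size([4, 16, 512])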
While Kadav discloses integrating the time-series image data, Kadav does not expressly disclose integrating the time-series image data and time-series key point data of the target object.
Chan discloses a step of integrating the time-series image data and time-series key point data of the target object (paragraph 50, Process 600 subsequently combines a sequence of extracted skeleton figures of the detected person extracted from the sequence of video images to form a skeleton sequence of the detected person which depicts a continuous motion of the detected person (step 606)).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to integrate time-series key point data of the target object, as taught by Chan, with the time-series image data of Kadav when extracting temporal features. The motivation for doing so would have been to increase the accuracy of detecting movement within images. Therefore, it would have been obvious to combine Chan with Kadav to obtain the invention as specified in claim 11.
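As a purely illustrative sketch of the combination articulated above (not a disclosure of either reference), time-series image features and time-series key point data can be integrated by per-frame concatenation before temporal feature extraction. The dimensions and the fusion choice below are assumptions.

import torch

B, Seq, dim0 = 4, 16, 512              # illustrative sizes
num_keypoints, kp_dim = 17, 3          # e.g., (x, y, confidence) per key point

image_feats = torch.randn(B, Seq, dim0)                       # time-series image features
keypoint_feats = torch.randn(B, Seq, num_keypoints * kp_dim)  # time-series key point data

# One conventional integration: concatenate the two modalities frame by frame,
# producing a single time-series tensor for downstream temporal modeling.
integrated = torch.cat([image_feats, keypoint_feats], dim=-1)  # (B, Seq, dim0 + 51)
print(integrated.shape)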
Referring to claim 19, Kadav discloses wherein the extracting temporal features comprises extracting the temporal features from the integrated time-series data by using a transformer encoder (paragraph 25, A multi-hop transformer 118 is then employed that uses reasoning to trace back from an intermediate step where the object was fully visible towards the end of the video to fully locate the object or output 120).
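For context, a generic transformer encoder applied over the integrated time-series produces per-frame temporal features. The sketch below uses PyTorch’s stock nn.TransformerEncoder only as a stand-in with assumed hyperparameters; it does not reproduce Kadav’s multi-hop transformer.

import torch
import torch.nn as nn

B, Seq, d_model = 4, 16, 512

# Stand-in temporal encoder (assumed hyperparameters).
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
temporal_encoder = nn.TransformerEncoder(layer, num_layers=2)

integrated = torch.randn(B, Seq, d_model)      # integrated time-series data
temporal_feats = temporal_encoder(integrated)  # (B, Seq, d_model) temporal features
print(temporal_feats.shape)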
Referring to claim 20, Chan discloses adding an index and position information of each key point to the time-series key point data (paragraph 31, In some embodiments, each extracted keypoint in the set of extract keypoints is defined by: a keypoint index corresponding to a particular body joint; either a set of 2D X and Y-coordinates in a 2D plane, or a set of 3D X-, Y-, and Z-coordinates in a 3D space; and a probability value for the predicted body joint of that keypoint).
Referring to claim 21, Chan discloses wherein the adding comprises:
generating an index of each key point through input embedding (paragraph 22, Cloud server 104 is further configured to re-organize the received skeleton sequences, including indexing the received skeleton sequences based on one or more data attributes. These data attributes that can be used to index the received skeleton sequences can include, but are not limited to: people IDs, camera IDs, group IDs (e.g., people that belong to different monitoring groups), and recording timestamps); and
generating position information of each key point through positional encoding (paragraph 31, In some embodiments, each extracted keypoint in the set of extract keypoints is defined by: a keypoint index corresponding to a particular body joint; either a set of 2D X and Y-coordinates in a 2D plane, or a set of 3D X-, Y-, and Z-coordinates in a 3D space; and a probability value for the predicted body joint of that keypoint).
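As an illustration of the index and position limitations of claims 20 and 21 only, a learned input embedding can supply an index code per key point and a fixed sinusoidal positional encoding can supply position information. The sizes and the encoding scheme below are assumptions, not Chan’s disclosure.

import math
import torch
import torch.nn as nn

num_keypoints, d_model, Seq = 17, 64, 16

# Input embedding: one learned vector per key point index (cf. claim 21, "input embedding").
index_embedding = nn.Embedding(num_keypoints, d_model)
indices = torch.arange(num_keypoints)               # key point indices 0..16
index_codes = index_embedding(indices)              # (17, d_model)

# Positional encoding: fixed sinusoidal codes over the time axis (cf. claim 21, "positional encoding").
pos = torch.arange(Seq, dtype=torch.float32).unsqueeze(1)                 # (Seq, 1)
div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                * (-math.log(10000.0) / d_model))                         # (d_model/2,)
pos_enc = torch.zeros(Seq, d_model)
pos_enc[:, 0::2] = torch.sin(pos * div)
pos_enc[:, 1::2] = torch.cos(pos * div)

# Add the index code and the position code to time-series key point features.
kp_feats = torch.randn(Seq, num_keypoints, d_model)
kp_feats = kp_feats + index_codes.unsqueeze(0) + pos_enc.unsqueeze(1)
print(kp_feats.shape)                               # torch.Size([16, 17, 64])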
Allowable Subject Matter
Claims 1, 2, 4-10, and 12-18 are allowed.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PETER K HUNTSINGER whose telephone number is (571)272-7435. The examiner can normally be reached Monday - Friday 8:30 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benny Q Tieu can be reached at 571-272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PETER K HUNTSINGER/Primary Examiner, Art Unit 2682