DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant’s amendments to the specification, filed 01 August 2025, with respect to paragraph 0031 have been fully considered and are entered.
Response to Arguments
Applicant’s arguments, filed 01 August 2025, with respect to the rejection of claims 6-10 and 13-17 under 35 U.S.C. 112(b) have been fully considered and are persuasive. The rejection of claims 6-10 and 13-17 under 35 U.S.C. 112(b) has been withdrawn.
Applicant’s arguments, filed 01 August 2025, with respect to the rejection of claims 1-17 under 35 U.S.C. 101 have been fully considered and are persuasive. The rejection of claims 1-17 under 35 U.S.C. 101 has been withdrawn.
Applicant’s arguments, filed 01 August 2025, with respect to the rejection of claims 1, 4, and 11 under 35 U.S.C. 102(a)(1) have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection under 35 U.S.C. 103 is made in view of Holohan (U.S. Publ. No. 20200285844 A1).
Applicant’s arguments, filed 01 August 2025, with respect to the rejection of claims 8-10 under 35 U.S.C. 103 have been fully considered and are persuasive. The rejection of claims 8-10 under 35 U.S.C. 103 has been withdrawn.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over Bhanu et al. (US Publication No. 20200394413 A1) in view of Holohan (U.S. Publ. No. 20200285844 A1).
Regarding Claim 1, Bhanu teaches a method (Reference “method”, see Specification paragraph 0002) for identifying critical points of motion (Reference “keypoint”, see Specification paragraph 0057, where a keypoint may be interpreted as a critical point) in a video (Reference “video sequence”, see Specification paragraph 0057), comprising: using a feature extraction neural network trained to identify features among a plurality of images of movements (Reference “features”, “3D-CNN”, and “tracklets”, see Specification paragraph 0083, where the 3D-CNN is a neural network used to extract features from tracklets, which are series of images), wherein the features are associated with a critical point of movement (Reference “keypoint”, see Specification paragraph 0057, where the keypoints are associated with frames of a soccer player in a dribbling action), wherein the feature extraction neural network is trained using engineered labels to generate a custom neural network (Reference “labelled”, see Specification paragraph 0114, where the engineering specifications of the label system are described, including human body keypoints, bounding boxes, and ground truth masks; these are used to train a neural network, generating a customized neural network) specifically designed for extracting high-level features that distinguish critical motion points (Reference “keypoints of soccer players”, see Specification paragraph 0057, where the resulting neural network analyzes every frame of soccer players using pose information to obtain keypoints of soccer players, specifically the one dribbling the soccer ball, which reads as the critical motion point in the frames); receiving a video sequence including a plurality of video frames comprising a motion captured by a camera (Reference “video camera”, see Specification paragraph 0113, where the video camera is operated by a camera operator who pans and zooms during a soccer match); identifying, by the feature extraction neural network, individual frames from the plurality of video frames (Reference “DEI” and “Mask R-CNN”, see Specification paragraph 0076, where Mask R-CNN is a neural network used as an image registration method on the video sequence), wherein the identified individual frames include a known critical point in the captured motion based on identified features associated with a critical point of movement (Reference “discriminator”, see Specification paragraph 0067, where the discriminator is a neural network that takes an image and is trained to identify labels for images; also note Reference “OpenPose”, “DEI”, and “dribble energy image”, see Specification paragraphs 0057 and 0059, further describing individual frame registration in a player’s dribble motion), wherein the feature extraction neural network converts two-dimensional images into compressed representations optimal for temporal training (Reference “spatial-temporal information”, see Specification paragraph 0059, where the frames of a video sequence are encoded as spatial-temporal information in a single image, specifically to improve training of a neural network); and displaying the identified frames which show the critical points in the captured motion (Reference “sequences” in 300, 302, 304 of Figure 3 and “registrations” in 400 of Figure 4; note both figures show different registrations and sequences generated and how they can be displayed).
However, Bhanu fails to disclose wherein the critical points are specific points in the motion that are more critical to generating an outcome of success from the movement than other points. Instead, Holohan teaches wherein the critical points are specific points in the motion that are more critical to generating an outcome of success from the movement than other points (Reference “phases” and “most important aspects of jump shot”, see Specification paragraph 0049, where the five phases described as the most important aspects of making a jump shot are set forth. Further, paragraph 0060 describes the specific frame associated with each of these critical points in the motion, such as the frame showing the crest of the jump for the release phase. The features of each of these phases or critical points, and what causes them to be attributed to a successful jump shot or jump shooter, are described in paragraph 0063, for example the desired quick release of a jump shooter). The motivation for using critical points which are specific points important to creating a success is also described by Holohan (see Specification paragraph 0063), where, for example, the system can recommend drills to create a quicker release, which would in turn decrease the likelihood that a defender could react to such a shot. Further, the adaptability of Holohan’s invention is also described (see Specification paragraph 0096) as extending easily to various activities, including golf swings, running, or other physical activities. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bhanu in view of Holohan to specifically use critical points that are associated with an outcome of success.
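For illustration only, the following is a minimal Python sketch of collapsing a frame sequence into a single compressed two-dimensional representation, in the spirit of the spatial-temporal encoding discussed in the Bhanu citations above. The weighting scheme and all names are illustrative assumptions and do not reproduce Bhanu’s dribble energy image computation.

```python
import numpy as np

def spatial_temporal_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a (T, H, W) stack of grayscale frames into one 2D image.

    Later frames receive larger weights, so a single array retains a
    coarse record of where motion occurred and in what order (loosely
    analogous to the dribble energy image concept cited above).
    """
    t = frames.shape[0]
    weights = np.linspace(1.0 / t, 1.0, t)                       # ramp: oldest -> newest
    motion = np.abs(np.diff(frames.astype(np.float32), axis=0))  # inter-frame deltas
    encoded = (motion * weights[1:, None, None]).max(axis=0)     # keep strongest weighted motion
    return encoded / (encoded.max() + 1e-8)                      # normalize to [0, 1]

# Usage: ten synthetic 64x64 frames containing a square moving left to right
demo = np.zeros((10, 64, 64), dtype=np.float32)
for i in range(10):
    demo[i, 20:30, i * 5:i * 5 + 10] = 1.0
print(spatial_temporal_image(demo).shape)  # (64, 64)
```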
Claims 2 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Bhanu et al. (US Publication No. 20200394413 A1) in view of Holohan (U.S. Publ. No. 20200285844 A1) further in view of Kadav (U.S. Publ. No. 20210082144 A1).
Regarding Claim 2, Bhanu teaches the method of claim 1, further comprising a temporal neural network (Reference “spatial-temporal information”, see Specification paragraph 0084, where the CNN processes the DEI containing spatial-temporal information), but fails to teach training a temporal neural network based on input from the feature extraction neural network. Instead, Kadav teaches training (Reference “Training”, see Specification paragraph 0078, which describes the annotated or labelled videos input to the system as training data sets) a temporal neural network (Reference “temporal matching”, see Figure 3 and the second instance of Figure 5 [Note: this second Figure 5 likely should have been labelled Figure 6 when Kadav was filed], as well as Specification paragraph 0114, where the transformer network performing the temporal matching is a neural network) based on an input of the feature extraction neural network (Reference “keypoint estimation”, see Specification paragraph 0110, where the aforementioned keypoint estimations are generated by the network). Kadav also teaches the motivation to modify Bhanu by saving resources over purely visual approaches (Reference “Pose Entailment”, see Specification paragraph 0053) and making the network less “susceptible to unwanted variations such as lighting”. Therefore, it would have been obvious to one of ordinary skill before the effective filing date to modify Bhanu with a temporal neural network as taught by Kadav.
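For illustration only, the following is a minimal Python sketch of a temporal neural network operating on per-frame keypoint estimations produced by a feature extractor, in the spirit of Kadav’s transformer-based temporal matching. The architecture, dimensions, and names are illustrative assumptions, not Kadav’s actual network.

```python
import torch
import torch.nn as nn

class TemporalKeypointNet(nn.Module):
    """Toy temporal model over per-frame keypoint vectors.

    Input:  (batch, T, K*2) keypoint coordinates from a feature extractor.
    Output: (batch, T) per-frame score, e.g. "is this a critical frame?".
    """
    def __init__(self, num_keypoints: int = 17, dim: int = 64):
        super().__init__()
        self.embed = nn.Linear(num_keypoints * 2, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, 1)

    def forward(self, kp: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(kp))  # mix information across timesteps
        return self.head(h).squeeze(-1)   # one score per frame

# Usage: batch of 2 clips, 30 frames, 17 (x, y) keypoints each
scores = TemporalKeypointNet()(torch.randn(2, 30, 34))
print(scores.shape)  # torch.Size([2, 30])
```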
Regarding Claim 3, Bhanu teaches the method of claim 2 as disclosed above.
However, Bhanu fails to teach converting an output of the temporal neural network model into time stamps of the plurality of video frames. Instead, Kadav teaches converting an output of the temporal neural network model into time stamps of the plurality of video frames (Reference “timestamp”, see Specification paragraph 008 and Figure 2, where the timestamp for each pair of poses is described). Kadav also teaches the motivation to modify Bhanu when discussing the need to match frames with only the temporally closest frames (see Specification paragraphs 0062-0063, where the last segmentation tokens in the transformer network are described; note the distance between timesteps can be set in a range from 1, which is the closest timestep in temporal distance, or, as mentioned, to another constant such as 4 to overcome “irregular framerates”). Kadav also teaches the motivation to modify Bhanu by saving resources over purely visual approaches (Reference “Pose Entailment”, see Specification paragraph 0053) and making the network less “susceptible to unwanted variations such as lighting”. Therefore, it would have been obvious to one of ordinary skill before the effective filing date to modify Bhanu to convert the temporal neural network output into time stamps as taught by Kadav.
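For illustration only, the following is a minimal Python sketch of converting a temporal model’s per-frame outputs into timestamps, as recited in claim 3, assuming a constant framerate. The threshold value and names are illustrative assumptions.

```python
def frame_scores_to_timestamps(scores, fps=30.0, threshold=0.8):
    """Map per-frame model scores to timestamps of flagged frames.

    In a constant-framerate video, frame index i corresponds to the
    timestamp i / fps seconds; a variable-framerate container would
    require the per-frame presentation timestamps instead.
    """
    return [i / fps for i, s in enumerate(scores) if s >= threshold]

print(frame_scores_to_timestamps([0.1, 0.9, 0.3, 0.95], fps=25.0))
# [0.04, 0.12]
```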
Claims 4-7 and 11-17 are rejected under 35 U.S.C. 103 as being unpatentable over Bhanu et al. (US Publication No. 20200394413 A1) in view of Kadav (U.S. Publ. No. 20210082144 A1).
Regarding Claim 4, Bhanu discloses A method of automated identification of frames for video analysis of critical motion (as noted above in the rejection of Claim 1, frames at keypoints of soccer dribbling actions are identified by a method) comprising: receiving a raw sequence of video data (as noted above in the rejection of Claim 1, the disclosed system and method receive video from a camera); receiving selections of the raw sequence of video data, wherein the selections include manual entered labels (see Specification paragraph 0158, where these images are annotated by five experts to categorize players with and without the ball, and the annotations are used as input training data); automatically converting the manual entered labels to machine readable labels (Reference “bounding box”, “keypoint”, and “affine transformation”, see Specification paragraphs 0114-0116, where bounding boxes and keypoints label the images and are sorted into sets; also note the original image and its bounding box receive a transformation in the method, which clearly shows a digital or machine readable representation has been made in order to achieve this computer method), wherein the converting includes determining optimal statistical ranges through image similarity scoring (Reference “player’s joint model’s”, see Specification paragraph 0070, where the mean of a joint model structure is described, which in turn is based upon the image registration techniques described, such as using image sequences of the players’ dribbling to determine the hip joint area; see Specification paragraphs 0060 and 0061 for further details of this image registration) to determine which images (Reference “Dribbling style classification”, see Specification paragraph 0076, where each frame is processed by a Mask R-CNN to classify different dribbling styles; these styles, such as the stepover or chop, are shown in Table 1) should be counted as keyframes for training (Reference “training”, see Specification paragraph 0078, describing selection of frames for training a GAN); and training a neural network feature extractor based on the machine readable labels (Reference “Training Set”, see Specification paragraph 0165 and Table 5, where the previously mentioned annotated data of players with and without the ball are used to train a CNN, and see Table 6 and Figure 22, where the accuracies of many different neural networks, also trained with annotated data, in predicting the player with the ball are shown; similarly, see Specification paragraph 0070, describing the training of a GAN using the previously converted temporal information) to identify video frames that include critical points of motion (Reference “R-CNN”, see Specification paragraph 0055, where the R-CNN localizes and segments the player performing the dribbling skill from each frame), wherein the neural network feature extractor is configured to convert two-dimensional images into compressed representations optimal for temporal training (Reference “encoding” and “temporal”, see Specification paragraph 0059, or as described previously in the rejection of claim 1, where the images are encoded into spatial-temporal information which specifically allows for better or more optimal training), and wherein the neural network feature extractor incorporates a custom loss function for self-evaluation and optimal maximum learning (Reference “loss function”, see Specification paragraph 0070, where a loss function is described which helps refine the data mappings that have been learned; further loss functions are described in paragraphs 0097-0100, where classification, regression, and localization losses used in evaluation and learning are also described).
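For illustration only, the following is a minimal Python sketch of converting manually entered annotations of the kind described above (expert labels of players with and without the ball, plus bounding boxes) into machine readable records. The record schema and all names are illustrative assumptions and do not reproduce Bhanu’s label format.

```python
import json

def to_machine_readable(annotations):
    """Convert free-form expert annotations into machine readable records.

    `annotations` is assumed to be a list of dicts such as
    {"frame": 12, "player": "A", "has_ball": True, "box": (x, y, w, h)}.
    """
    records = []
    for i, a in enumerate(annotations):
        x, y, w, h = a["box"]
        records.append({
            "id": i,
            "frame": a["frame"],
            "category": "player_with_ball" if a["has_ball"] else "player",
            "bbox": [x, y, w, h],  # pixel units, top-left origin
        })
    return json.dumps(records)

print(to_machine_readable(
    [{"frame": 12, "player": "A", "has_ball": True, "box": (10, 20, 40, 80)}]))
```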
However, Bhanu does not necessarily disclose wherein the raw sequence of video data includes frames with unique timestamps. Instead, Kadav discloses wherein the raw sequence of video data includes frames with unique timestamps (Reference “timestamp”, see Specification paragraph 0070, describing the timestep of 1 between poses). Further, it is noted that timestamps in video data are already widely implemented in many well-known and common video formats, such as MP4 or MPEG, which are by definition time-based media files in which a timestamp is associated with each frame of video data. Therefore, under KSR Rationale A, this is a combination of prior art elements according to known methods to yield predictable results. That is, the video data Bhanu encodes into spatial-temporal information likely already includes frames with unique timestamps. Given the limited number of video formats available and the backing of the ISO standard for formats such as MP4 or MPEG, KSR Rationale E, “Obvious to Try”, also clearly applies, in that frames with unique timestamps would already be included in the most popular file formats. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bhanu in view of Kadav to include timestamps in video data.
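For illustration only, the following is a minimal Python sketch showing that common container formats already expose a per-frame timestamp, here read through OpenCV’s CAP_PROP_POS_MSEC property. The file name is hypothetical, and the exact timestamp semantics can vary slightly by decoding backend.

```python
import cv2  # pip install opencv-python

def frame_timestamps(path: str) -> list[float]:
    """Return the presentation timestamp (seconds) of each decoded frame.

    Container formats such as MP4/MPEG carry a timestamp per frame;
    OpenCV exposes the current position via CAP_PROP_POS_MSEC.
    """
    cap = cv2.VideoCapture(path)
    stamps = []
    while True:
        ok, _frame = cap.read()
        if not ok:
            break
        stamps.append(cap.get(cv2.CAP_PROP_POS_MSEC) / 1000.0)
    cap.release()
    return stamps

# Usage (hypothetical file): frame_timestamps("match_clip.mp4")
```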
Claim 5 is rejected due to similar limitations as stated above in rejection of Claim 2.
Regarding Claim 6, Bhanu discloses the method of claim 4 as disclosed above.
However, Bhanu fails to disclose extracting a selected video frame for each unique timestamp; labeling the extracted video frames; grouping remaining unlabeled video frames; and calculating a temporal distance between labeled video frames and unlabeled video frames. Instead, Kadav discloses extracting a selected video frame for each unique timestamp (Reference “pose” and “timestep”, see Specification paragraphs 0062-0063, where the extracted poses are labelled by their respective timesteps); labeling the extracted video frames (see Specification paragraph 0078, where the labelling and annotation of the frames and poses in the data set fed into the matching networks are described); grouping remaining unlabeled video frames (Reference “premise” and “hypothesis”, see Specification paragraph 0055, where the poses are divided into premise and hypothesis pairs, with a premise by definition being the basis of inference and a hypothesis being the unlabeled frame to be matched); and calculating a temporal distance between labeled video frames and unlabeled video frames (Reference “TOKS”, “temporal based oks”, and “temporal matching”, see Specification paragraph 0055, where the timesteps or distances are known as inputs, and, as mentioned in paragraph 0070’s description of Figure 2, the temporal distance for each pair is shown on the segment axis). Kadav also teaches the motivation to modify Bhanu by saving resources over purely visual approaches (Reference “Pose Entailment”, see Specification paragraph 0053) and making the network less “susceptible to unwanted variations such as lighting”. Therefore, it would have been obvious to one of ordinary skill before the effective filing date to modify Bhanu with a temporal neural network as taught by Kadav.
Regarding Claim 7, Bhanu teaches the method of claim 6, but fails to disclose associating unlabeled video frames with a labeled video frame based on a shortest temporal distance between each unlabeled video frame to the labeled video frames. Instead, Kadav discloses associating unlabeled video frames with a labeled video frame based on a shortest temporal distance between each unlabeled video frame to the labeled video frames (see Specification paragraphs 0062-0063, where the last segmentation tokens in the transformer network are described; note the distance between timesteps can be set in a range from 1, which is the closest timestep in temporal distance, or, as mentioned, to another constant such as 4 to overcome “irregular framerates”). Kadav also teaches the motivation to modify Bhanu by saving resources over purely visual approaches (Reference “Pose Entailment”, see Specification paragraph 0053) and making the network less “susceptible to unwanted variations such as lighting”. Therefore, it would have been obvious to one of ordinary skill before the effective filing date to modify Bhanu to associate unlabeled frames with the temporally closest labeled frames as taught by Kadav.
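For illustration only, the following is a minimal Python sketch of associating each unlabeled frame with the temporally closest labeled frame, as recited in claim 7. The dictionary-based representation and function name are illustrative assumptions rather than Kadav’s implementation.

```python
def associate_with_nearest_label(labeled: dict[float, str],
                                 unlabeled: list[float]) -> dict[float, str]:
    """Assign each unlabeled frame timestamp the label of the temporally
    closest labeled frame (smallest absolute timestamp difference)."""
    return {
        t: labeled[min(labeled, key=lambda lt: abs(lt - t))]
        for t in unlabeled
    }

labels = {0.0: "setup", 1.2: "release"}
print(associate_with_nearest_label(labels, [0.4, 1.0, 1.5]))
# {0.4: 'setup', 1.0: 'release', 1.5: 'release'}
```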
Claim 11 is rejected due to similar limitations as stated above in Claim 4. Additionally, Bhanu discloses a computer program product (Reference “software”, see Specification paragraph 0180, where the system is implemented as software that reads as a computer program product), the computer program product comprising: one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media (Reference “software” and “processing system memory”, see Specification paragraph 0180, where a processing system memory is a computer readable storage media and the software is stored upon it).
Claim 12 is rejected due to similar limitations as stated above in Claims 5 and 11.
Claim 13 is rejected due to similar limitations as stated above in Claims 6 and 11.
Claim 14 is rejected due to similar limitations as stated above in Claims 7 and 11.
Regarding Claim 15, Bhanu teaches the method of claim 14, but fails to teach wherein the program instructions further comprise generating a similarity score for each labeled video frame and its associated unlabeled video frames based on an image comparison of content in the labeled video frame and the labeled video frame's associated unlabeled video frames. Instead, Kadav discloses generating a similarity score (Reference “similarity score”, see the previously cited Figure 3 and second instance of Figure 5, describing the similarity score generated) for each labeled video frame and its associated unlabeled video frames based on an image comparison of content in the labeled video frame and the labeled video frame's associated unlabeled video frames (as previously mentioned in the rejection of Claim 6, the pairs of a “premise” and a “hypothesis” read as labeled and unlabeled video frames being matched and paired together, or in other words associated with each other). Kadav also teaches the motivation to modify Bhanu by saving resources over purely visual approaches (Reference “Pose Entailment”, see Specification paragraph 0053) and making the network less “susceptible to unwanted variations such as lighting”. Therefore, it would have been obvious to one of ordinary skill before the effective filing date to modify Bhanu to generate such a similarity score as taught by Kadav.
Regarding Claim 16, Bhanu teaches the method of claim 15, but fails to teach wherein the program instructions further comprise labeling one of the unlabeled video frames in the event the similarity score for the labeled video frame and the labeled video frame's associated unlabeled video frames falls within a threshold range of values. Instead, Kadav teaches labeling one of the unlabeled video frames in the event the similarity score for the labeled video frame and the labeled video frame's associated unlabeled video frames falls within a threshold range of values (Reference “ID”, “Match Score”, and “Similarity score”, see Figure 3, where the ID assignment chooses the maximum from a range of values; also note the similarity score as discussed in the rejection of claim 15). Kadav also teaches the motivation to modify Bhanu by saving resources over purely visual approaches (Reference “Pose Entailment”, see Specification paragraph 0053) and making the network less “susceptible to unwanted variations such as lighting”. Therefore, it would have been obvious to one of ordinary skill before the effective filing date to modify Bhanu to label frames based on such a similarity score as taught by Kadav.
Regarding Claim 17, Bhanu teaches the method of claim 16, but fails to teach wherein the program instructions further comprise using the unlabeled video frames that has been labeled in training the neural network feature extractor to identify video frames that include critical points of motion. Instead, Kadav teaches using the unlabeled video frames that has been labeled in training the neural network feature extractor to identify video frames that include critical points of motion (see Figure 3 and note the previous rejections of claims 15 and 16 discussing the labelled frames, but also note that in Figures 3 and 5 all frames identified are keypoints of a human in motion, which read as critical points of motion). Kadav also teaches the motivation to modify Bhanu by saving resources over purely visual approaches (Reference “Pose Entailment”, see Specification paragraph 0053) and making the network less “susceptible to unwanted variations such as lighting”. Therefore, it would have been obvious to one of ordinary skill before the effective filing date to modify Bhanu to use the newly labeled frames in training as taught by Kadav.
Allowable Subject Matter
Claims 8-10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Regarding Claim 8, Bhanu in view of Kadav teaches the method of Claim 7 as shown above, but fails to teach generating a similarity score for each labeled video frame and its associated unlabeled video frames based on an image comparison of content in the labeled video frame and the labeled video frame's associated unlabeled video frames, wherein the similarity score is calculated using one of a normalized root mean squared error algorithm or a structural similarity index algorithm, wherein the method further comprises: collecting similarity scores across all videos in a training dataset; removing outliers from the collected similarity scores; determining a median value from the similarity scores after removing the outliers; and using the median value as a threshold, wherein unlabeled video frames with similarity scores less than or equal to the median value are labeled as keyframes, and unlabeled video frames with similarity scores greater than the median value are placed in a no-label category. The closest teaching to this claim is found in Specification paragraph 0080 of Kadav, where normalized probabilities and average precisions are described in relation to scores of the athlete keypoints tracked. However, this fails to teach the limitations of claim 8 and the claims it depends upon as a whole. Therefore, claim 8 contains allowable subject matter.
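For illustration of the claim 8 limitations summarized above, the following is a minimal Python sketch of a structural-similarity scoring pipeline with outlier removal and a median threshold, using scikit-image’s structural_similarity. The IQR outlier rule and all function names are illustrative assumptions; the claim does not specify an outlier method, and this sketch is not Applicant’s implementation.

```python
import numpy as np
from skimage.metrics import structural_similarity  # pip install scikit-image

def median_threshold(scores: np.ndarray) -> float:
    """Drop IQR outliers, then return the median as the keyframe threshold."""
    q1, q3 = np.percentile(scores, [25, 75])
    iqr = q3 - q1
    kept = scores[(scores >= q1 - 1.5 * iqr) & (scores <= q3 + 1.5 * iqr)]
    return float(np.median(kept))

def is_keyframe(labeled_img, candidate_img, threshold):
    """Per claim 8, a frame whose similarity score is less than or equal
    to the threshold (i.e., sufficiently different from the labeled
    frame) is labeled a keyframe."""
    score = structural_similarity(labeled_img, candidate_img, data_range=1.0)
    return score <= threshold

# Usage with synthetic data
rng = np.random.default_rng(0)
collected = rng.uniform(0.2, 0.9, size=100)   # scores across a training set
thr = median_threshold(collected)
a, b = rng.random((32, 32)), rng.random((32, 32))
print(thr, is_keyframe(a, b, thr))
```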
Claims 9 and 10 are also objected to as being dependent upon Claim 8, with no further rejections.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER JOHN RODGERS whose telephone number is (703)756-1993. The examiner can normally be reached 5:30AM to 2:30PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Villecco can be reached on (571) 272-7319. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ALEXANDER JOHN RODGERS/Examiner, Art Unit 2661
/KATHLEEN M BROUGHTON/
Primary Examiner, Art Unit 2661