Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on February 10, 2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1 – 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being incomplete for omitting essential steps, such omission amounting to a gap between the steps. See MPEP § 2172.01.
Claim 1 recites the limitations “executing appearance modeling of the nodes” and “executing motion modeling of the nodes”. As claimed, it is unclear what “appearance modeling” and “motion modeling” are intended to mean. As understood by the examiner, a modeling step in the context of image processing generally involves generating or modifying an image or 3D space to include the related information. The claimed step, however, does not appear to perform this function, as it relies on self-attention layers within a graph transformer network and no form of output is detailed. As such, the function of this step is left ambiguous because the term “appearance modeling” is unduly broad. The examiner is unsure whether this step is intended to generate or modify an existing image with information determined by the GTN or whether this step performs some other function.
The same rationale applies to the term “motion modeling”. As written, the claim does not make clear how motion modeling is performed or what it entails with respect to the applicant’s invention. The examiner is unable to determine whether the motion modeling generates or tracks movement of a node specifically within an image or whether it pertains to a different function. As such, the term “motion modeling” is also unduly broad.
Claims 2 - 20 are rejected for the same rationale. Additionally, without understanding the bounds of the claimed terms “appearance modeling” and “motion modeling”, the bounds of the dependent claims cannot be understood.
Additionally, with respect to claims 1, 11, and 20, it is not clear whether the motion modeling is performed on “the nodes” after the appearance modeling and possible alteration of said nodes, or whether the motion modeling is performed on “the nodes” as generated in the step of “maintaining a graph comprising nodes and weighted edges between at least a portion of the nodes.”
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1 – 5, 7 – 15, and 17 – 19 are rejected under 35 U.S.C. 103 as being unpatentable over Ramezani et al (U.S. Patent No. 11361449 B2, hereinafter “Ramezani”) in view of Liao et al (U.S. Patent Publication No. 2024/0242462 A1, hereinafter “Liao”).
Regarding claim 1, Ramezani teaches a method of three-dimensional object tracking across cameras, comprising, by a computer system:
receiving detection outcomes generated by a three-dimensional object detector from a plurality of synchronized camera inputs (Col. 4, Lines 45 – 48: Referring to FIG. 1A, a sensor system 10 can capture scenes S1, S2, S3 of the environment 12 at times t1, t2, and t3 respectively, and generate a sequence of images I1, I2, I3, etc.; Col 4, Lines 54 – 57: In some implementations… each point has certain coordinates in a 3D space… ; Col. 4, Lines 61 – 64: After the sensor system 10 generates the images I1, I2, I3, etc., an object detector 12 can detect features F1, F2, F3, and F4 in image I1, features F’1, F’2, F’3, and F’4 in image I2, and features F’’1, F’’2, F’’3, and F’’4 in image I3.);
responsive to the receiving, maintaining a graph comprising nodes and weighted edges between at least a portion of the nodes (Col. 5, Lines 11 – 25: As illustrated in FIG. 1A, the message passing graph 50 generates layer 52A for the image I1 generated at time t1, layer 52B for the image I2 generated at time t2 and layer 52C for the image I3 generated at time t3. In general, the message passing graph 50 can include any suitable number of layers, each with any suitable number of features (however, as discussed below, the multi-object tracker 14 can apply a rolling window to the message passing graph 50 so as to limit the number of layers used at any one time). Each of the layers 52A-C includes feature nodes corresponding to the features the object detector 12 identified in the corresponding image (represented by oval shapes). The feature nodes are interconnected via edges which, in this example implementation, also include edge nodes (represented by rectangular shapes); Examiner’s note: There is no indication of a relationship between the graph and the detected objects from the limitation above. Examiner recommends reciting a relationship between the maintained graph and the object detections of the two limitations. The claim as written gives no reason to assume they are related in any sense.);
executing appearance modeling of the nodes via (Col. 3, Lines 55 – 58: The system then can implement message passing in accordance with graph neural networks message propagation techniques, so that the graph operates as a message passing graph (with two classes of nodes, feature nodes and edge nodes).; Col. 7, Lines 6 – 16: As indicated above, each layer of the graph 50 or 70 represents a timestep, i.e., corresponds to a different instance in time. The data association can be understood as a graph problem in which a connection between observations in time is an edge. The multi-object tracker 14 implements a neural network that learns from examples how to make these connections, and further learns from examples whether an observation is true (e.g., the connection between nodes d30 and d11) or false (e.g., resulting in the node d21 not being connected to any other nodes at other times, and thus becoming a false positive feature node).); and
executing motion modeling of the nodes (Col. 6, Lines 15 – 23: In particular, feature nodes d10, d20 and d30 can be considered predicted true positive detection nodes connected to other predicted true positive detection nodes (e.g., d30 to d11) in later-in-time layers. The predicted true positive detection nodes are connected across layers in a pairwise manner via finalized edges 80, generated after data association. Thus, the multi-object tracker 14 determines where the same object is in the imagery collected at time t and in the imagery collected at time t+1.).
Ramezani does not explicitly teach executing appearance modeling of the nodes via a self-attention layer of a graph transformer network.
However, Liao does teach executing appearance modeling of the nodes via a self-attention layer of a graph transformer network (¶ 0070: Graph node data 118 may have any suitable data structure that indicates a likelihood for one or more nodes of the node graph represented by node graph data is a game focus and/or includes a sporting object.; ¶ 0080: Processing continues at operation 1103, where a graph node classification model is applied to the sets of features of the node graph to detect a game focus region of the scene… In some embodiments, the graph node classification model is a graph attentional network.; Examiner’s note: Examiner is interpreting appearance modeling to mean any feature, location, or physical characteristic calculations of the nodes in light of the indefinite rejection.);
Additionally, Liao teaches executing motion modeling of the nodes (¶ 0060: The operations for a player and region combination are illustrated with respect to player 810 and region 804. As shown, player 810 is moving in a movement direction 831. Movement direction 831 (nip) for player 810 may be detected using player detection and temporal tracking using any suitable technique or techniques.; Examiner’s note: Examiner is interpreting motion modeling to mean any motion characteristic calculations of the nodes in light of the indefinite rejection.).
Ramezani and Liao are considered to be analogous art as both pertain to multi-camera, multi-object tracking. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network for object detection (as taught by Ramezani) with the game focus estimation in team sports for immersive video (as taught by Liao). The motivation for this combination of references would be that the system of Liao attains and refines single-camera information by associating all single-camera information so as to improve accuracy (see ¶ 0043).
This motivation for the combination of Ramezani and Liao is supported by KSR exemplary rationale (G): some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. See MPEP § 2141(III).
Regarding claim 2, the Ramezani and Liao combination teaches the method of claim 1.
Additionally, Liao teaches wherein the nodes represent tracked objects comprising at least one of appearance features or motion features (¶ 0027: Such raw data is transformed to graph node classification model input data, which may include a node graph and a set of features for each node of the node graph.; ¶ 0028: For each node, node features (e.g., features corresponding to American football and designed to provide accurate and robust classification by the graph node classification model such as a GCN or GNN) are generated, and the node graph and feature sets are provided to or fed to a pretrained graph node classification model such as DeepGCN to perform a node classification.).
Regarding claim 3, the Ramezani and Liao combination teaches the method of claim 1.
Additionally, Ramezani teaches wherein the weighted edges are computed based at least in part on node similarity (Col. 6, Lines 15 – 23: In particular, feature nodes d10, d20 and d30 can be considered predicted true positive detection nodes connected to other predicted true positive detection nodes (e.g., d30 to d11) in later-in-time layers. The predicted true positive detection nodes are connected across layers in a pairwise manner via finalized edges 80, generated after data association.).
Regarding claim 4, the Ramezani and Liao combination teaches the method of claim 3.
Additionally, Ramezani and Liao teach wherein the node similarity is computed based on at least one of appearance similarity or location similarity between the tracked objects (Ramezani Col. 6, Lines 15 – 23: In particular, feature nodes d10, d20 and d30 can be considered predicted true positive detection nodes connected to other predicted true positive detection nodes (e.g., d30 to d11) in later-in-time layers. The predicted true positive detection nodes are connected across layers in a pairwise manner via finalized edges 80, generated after data association.; Liao ¶ 0045: Notably, herein a node is defined based on data within a selected region and an edge connects two nodes when the regions have a shared boundary therebetween.).
Regarding claim 5, the Ramezani and Liao combination teaches the method of claim 1.
Additionally, Liao teaches wherein the appearance modeling yields resultant appearance-modeling data (¶ 0070: Graph node data 118 may have any suitable data structure that indicates a likelihood for one or more nodes of the node graph represented by node graph data is a game focus and/or includes a sporting object.; ¶ 0080: Processing continues at operation 1103, where a graph node classification model is applied to the sets of features of the node graph to detect a game focus region of the scene… In some embodiments, the graph node classification model is a graph attentional network.; Examiner’s note: Examiner is interpreting appearance modeling to mean any feature, location, or physical characteristic calculations of the nodes in light of the indefinite rejection.).
Regarding claim 7, the Ramezani and Liao combination teaches the method of claim 1.
Additionally, Liao teaches wherein the motion modeling yields resultant motion-modeling data (¶ 0060: The operations for a player and region combination are illustrated with respect to player 810 and region 804. As shown, player 810 is moving in a movement direction 831. Movement direction 831 (nip) for player 810 may be detected using player detection and temporal tracking using any suitable technique or techniques.; Examiner’s note: Examiner is interpreting motion modeling to mean any motion characteristic calculations of the nodes in light of the indefinite rejection.).
Regarding claim 8, the Ramezani and Liao combination teaches the method of claim 1.
Additionally, Ramezani teaches comprising post-processing the resultant motion-modeling data via motion propagation and node merging (Col. 6, Lines 29 – 35: Features d1t, d2t . . . d-Nt+T within the rolling window 72 can be considered active detection nodes, with at least some of the active detection nodes interconnected by active edges 82, generated with learned feature representation. Unlike the nodes in the earlier-in-time layers 74, the values (e.g., GRU outputs) of the nodes and edges continue to change while these nodes are within the rolling window 72.; Col. 7, Lines 11 – 16: The multi-object tracker 14 implements a neural network that learns from examples how to make these connections, and further learns from examples whether an observation is true (e.g., the connection between nodes d30 and d10) or false (e.g., resulting in the node d21 not being connected to any other nodes at other times, and thus becoming a false positive feature node).).
Regarding claim 9, the Ramezani and Liao combination teaches the method of claim 8.
Additionally, Ramezani teaches wherein the post-processing comprises adding a node to the graph via link prediction (Col. 8, Line 6 – 16: update graph( ): This function is called after every timestep, to add new nodes (detections) and corresponding edges to the end of the currently active part of the graph (e.g., the part of the graph within a sliding time window, as discussed below), and fix parameters of and exclude further changes to the oldest set of nodes and edges from the currently active part of the graph, as the sliding time window no longer includes their layer. This essentially moves the sliding time window one step forward.).
Regarding claim 10, the Ramezani and Liao combination teaches the method of claim 8.
Additionally, Ramezani teaches wherein the post-processing comprises removing a node from the graph via link prediction (Col. 8, Lines 17 – 22: prune graph( ): This function removes low probability edges and nodes from the currently active part of the graph using a user specified threshold. This function can be called whenever memory/compute requirements exceed what is permissible (e.g., exceed a predetermined value(s) for either memory or processing resources).).
Regarding claim 11, claim 11 has been analyzed with regard to claim 1 and is rejected for the same reasons of obviousness as used above as well as in accordance with Ramezani’s further teaching on:
Memory (Col. 2, Lines 14 – 17: In another embodiment, a non-transitory computer-readable medium stores thereon instructions executable by one or more processors to implement a multi-object tracking architecture.);
At least one processor coupled to the memory and configured to implement a method (Col. 2, Lines 14 – 17: In another embodiment, a non-transitory computer-readable medium stores thereon instructions executable by one or more processors to implement a multi-object tracking architecture.)…
Regarding claim 12, claim 12 has been analyzed with regard to respective claim 2 and is rejected for the same reasons of obviousness as used above.
Regarding claim 13, claim 13 has been analyzed with regard to respective claim 3 and is rejected for the same reasons of obviousness as used above.
Regarding claim 14, claim 14 has been analyzed with regard to respective claim 4 and is rejected for the same reasons of obviousness as used above.
Regarding claim 15, claim 15 has been analyzed with regard to respective claim 5 and is rejected for the same reasons of obviousness as used above.
Regarding claim 17, claim 17 has been analyzed with regard to respective claim 7 and is rejected for the same reasons of obviousness as used above.
Regarding claim 18, claim 18 has been analyzed with regard to respective claim 8 and is rejected for the same reasons of obviousness as used above.
Regarding claim 19, the Ramezani and Liao combination teaches the system of claim 18.
Additionally, Ramezani teaches wherein the post-processing comprises at least one of adding a node to the graph or removing a node from the graph via link prediction (Col. 8, Lines 6 – 16: update graph( ): This function is called after every timestep, to add new nodes (detections) and corresponding edges to the end of the currently active part of the graph (e.g., the part of the graph within a sliding time window, as discussed below), and fix parameters of and exclude further changes to the oldest set of nodes and edges from the currently active part of the graph, as the sliding time window no longer includes their layer. This essentially moves the sliding time window one step forward.; Col. 8, Lines 17 – 22: prune graph( ): This function removes low probability edges and nodes from the currently active part of the graph using a user specified threshold. This function can be called whenever memory/compute requirements exceed what is permissible (e.g., exceed a predetermined value(s) for either memory or processing resources).).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Liao et al (U.S. Patent Publication No. 2023/0377335 A1) teaches a system for key person recognition in multi-camera immersive video attained for a scene including detecting predefined person formations in the scene based on an arrangement of the persons in the scene, generating a feature vector for each person in the detected formation, and applying a classifier to the feature vectors to indicate one or more key persons in the scene.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW JONES whose telephone number is (703)756-4573. The examiner can normally be reached Monday - Friday 8:00-5:00 EST, off Every Other Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella can be reached at (571) 272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANDREW B. JONES/Examiner, Art Unit 2667
/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667