DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 17 is objected to because of the following informalities:
In Claim 17, “The system of claim 10, wherein the computer program further causes the hardware processor to an output from a visual branch to an output of a location branch to combine the visual and location information” should read as “The system of claim 10, wherein the computer program further causes the hardware processor to add an output from a visual branch to an output of a location branch to combine the visual and location information.”
Appropriate correction is required.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-3, 5, 8, 10-12, 14, and 17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Quach et al. (“DyGLIP: A Dynamic Graph Model with Link Prediction for Accurate Multi-Camera Multiple Object Tracking”).
Regarding Claim 1, Quach teaches a method for tracking movement, comprising:
Introduction, pg. 1: “Multi-Camera Multiple Object Tracking (MC-MOT) plays an essential role in computer vision due to its potential in many real-world applications such as self-driving cars, crowd behavior analysis, anomaly detection, etc.”
performing person detection in frames from multiple video streams to identify detection images;
3.1 Problem Formulation, pg. 3: “Similar to prior MC-MOT methods [16, 26], the task of tracking in each camera is assumed to be performed by an off-the-shelf single-camera MOT tracker. We choose DeepSORT [41] in this work, but it can be simply replaced by any other MOT trackers. At time step t, we obtain a set of local tracklets L(t) c = {l(t)} provided by each single-camera MOT tracker, where each l(t) … feature vector.”
Explanation: DyGLIP assumes per-camera tracking/detection. DeepSORT includes person detection per frame.
combining visual and location information from the detection images to generate scores for pairs of detection images across the multiple video streams and across frames of respective video streams;
3.2 Dynamic Graph Formulation, pg. 3: “We denote f(v) as the feature vector associate with a node v ∈ V(t)…”
3.3.1 Graph Structural Attention Layer, pg. 4: “In other words, the structural attention layer (SAL) takes the concatenation of node embeddings or features, i.e., f(v) ∈ RDF, and its camera positional encoding, i.e., cv ∈ RDC, as the input, ev = {f(v)||cv} ∈ RDE, where DE =DF +DC.”
3.4 Link Prediction, pg. 4: “Given transformed features of a pair of nodes (e(t) vi and e(t) vj), we compute the features or measurement that represent the similarity between those two nodes, and then it will be used as input for the classifier… The higher the score is, the more likely the two nodes are linked.”
Explanation: Re-ID features correspond to the visual appearance information in 3.2, and the authors explicitly combine visual features (ReID) and camera positional encoding (location context) as shown in 3.3.1. The excerpt from 3.4 Link Prediction corresponds to generating scores for pairs of detection images across multiple cameras over time.
generating a pairwise detection graph using the detection images as nodes and the scores as weighted edges (Fig. 2 (shown below));
[media_image1.png: Quach, Fig. 2]
3.2 Dynamic Graph Formulation, pg. 3: “At a particular time step t, we construct a graph G(t) = (V(t), E(t)), where the vertex set V(t) contains all the tracklets tracked up to time t…Given two nodes vi and vj, an edge exists that links the two vertices if these two tracklets represent the same object.”
3.4 Link Prediction, pg. 4: “The classifier provides a probability score s ∈ [0,1]. The higher the score is, the more likely the two nodes are linked.”
Explanation: The detection images correspond to nodes, the pairwise scores correspond to edges, and scores represent the edge weights. Figure 2 visually illustrates nodes and predicted links forming connected components.
and changing from a current view of the multiple video streams to a next view of the multiple video streams, responsive to a determination that a score between consecutive frames of the view is below a threshold value and that a score between coincident frames of the current view and the next view is above the threshold value (Fig. 6 (shown below)).
[media_image2.png: Quach, Fig. 6]
Abstract: “Compared to existing methods, our new model offers several advantages, including better feature representations and the ability to recover from lost tracks during camera transitions.”
3.4 Link Prediction, pg. 4: “The classifier provides a probability score s ∈ [0,1]. The higher the score is, the more likely the two nodes are linked.”
3.5 Model Learning, pg. 5: “We first use a binary cross-entropy loss function to enforce nodes within a connected component to have similar feature embeddings.”
4.3 Ablation Studies, pg. 6: “More specifically, this section aims to demonstrate the following appealing properties of the proposed method: (1) Better feature representations, even in severe changes in lighting conditions between the cameras; and (2) Recovery of correct representations for objects that lost tracks during camera transitions.”
Explanation: These excerpts show threshold-based linking: if the score is above the threshold, the nodes are linked; if the score is below the threshold, they are not. This enables switching the association across cameras. Figure 6 shows correction of identity after a camera transition.
Regarding Claim 2, Quach teaches the method of claim 1, further comprising synchronizing the multiple video streams to identify temporal correspondences between frames of the multiple video streams (Fig. 2 (shown above)).
3.3.2 Graph Temporal Attention Layer, pg. 4: “A temporal attention layer (TAL) is designed to capture the temporal evolution scheme in terms of links between nodes in a set of dynamic graphs…In particular, for each node v ∈ V(t), we combine timestamp position encoding (pv ∈ RDH) with the output from the SAL to obtain an order-aware sequence of input features for TAL as, xv = {h1 v + p1 v, h2 v + p2 v, ···, hW v + pW v}, where W is the temporal window size, i.e., W = 3 in our experiments, for each node v ∈ V.”
Explanation: DyGLIP explicitly models temporal relationships across frames and cameras. Figure 2 shows frames at t-W, t-1, and t, demonstrating synchronization of temporal information across cameras.
Regarding Claim 3, Quach teaches the method of claim 1, further comprising extracting the visual information based on a visual similarity between detection images (Fig. 4 (shown below)).
[media_image3.png: Quach, Fig. 4]
3.2 Dynamic Graph Formulation, pg. 3: “From our experiments, we choose f(v) to be the re-id feature of the tracklet associated with node v.”
3.4 Link Prediction, pg. 4: “Given transformed features of a pair of nodes (e(t) vi and e(t) vj), we compute the features or measurement that represent the similarity between those two nodes, and then it will be used as input for the classifier…We try with two different measurements and classifiers in Section 4.3, i.e., cosine distance by computing dot product of two feature vectors with Sigmoid as classifier…”
Explanation: These excerpts show similarity scoring between visual features. Figure 4 shows clustering of visual features (ReID vs. attentional features).
Regarding Claim 5, Quach teaches the method of claim 1, wherein generating the pairwise detection graph includes determining edges between detection images from different frames of a same video stream and determining edges between detection images from different video streams at corresponding times (Fig. 2 (shown above)).
3.2 Dynamic Graph Formulation, pg. 3: “At a particular time step t, we construct a graph G(t) = (V(t), E(t)), where the vertex set V(t) contains all the tracklets tracked up to time t…Given two nodes vi and vj, an edge exists that links the two vertices if these two tracklets represent the same object.”
Explanation: These excerpts show dynamic graph construction and edge definition. The nodes include tracklets across frames (temporal) and across cameras (multi-stream). Figure 2 explicitly illustrates nodes from multiple cameras and links across time and across cameras.
Regarding Claim 8, Quach teaches the method of claim 1, wherein combining the visual and location information includes adding an output from a visual branch to an output of a location branch.
3.3.1 Graph Structural Attention Layer, pg. 4: “In other words, the structural attention layer (SAL) takes the concatenation of node embeddings or features, i.e., f(v) ∈ RDF, and its camera positional encoding, i.e., cv ∈ RDC, as the input, ev = {f(v)||cv} ∈ RDE, where DE =DF +DC.”
Explanation: This explicitly describes combining visual features (ReID) and location features (camera encoding) through a branch fusion mechanism.
Regarding Claim 10, Quach teaches all of the limitations as discussed in the consideration of claim 1 above. Quach further teaches a hardware processor and memory that perform the same steps as claim 1.
4.2 Experimental Setup, pg. 6: “The attention module is implemented in Tensorflow [1].”
Explanation: An implementation in Tensorflow necessarily executes on a hardware processor with memory.
Regarding Claim 11, Quach teaches the system of claim 10, and additional limitations are met as in the consideration of claim 2 above.
Regarding Claim 12, Quach teaches the system of claim 10, and additional limitations are met as in the consideration of claim 3 above.
Regarding Claim 14, Quach teaches the system of claim 10, and additional limitations are met as in the consideration of claim 5 above.
Regarding Claim 17, Quach teaches the system of claim 10, and additional limitations are met as in the consideration of claim 8 above.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Quach et al. in view of He et al. (“Multi-Target Multi-Camera Tracking by Tracklet-to-Target Assignment”).
Regarding Claim 4, Quach teaches the method of claim 1, but fails to teach that it further comprises extracting the location information based on a projection of two-dimensional coordinates into a three-dimensional environment for the detection images and determining a distance between the projected coordinates. While Quach teaches multi-camera tracking and spatial relationships between tracked objects, Quach does not teach projecting 2D coordinates into a 3D environment and determining a distance between projected coordinates.
However, He demonstrates the projection of 2D coordinates into another spatial environment, the use of projected coordinates for spatial reasoning, and location-based comparison, stating that “for a local tracklet TL i (u), we obtain its reference plane projection by projecting foot point of each bounding box into the global coordinate system of the reference plane, where the reference plane locations of the foot points are calculated by using the homography correspondence between camera I and the reference plane…when two local tracklets u and v are from the same camera (i.e., i=j), we calculate their similarity score according to their appearance and motion similarities in the image plane…” (D. Similarity Measure of Local Tracklets, pg. 7).
Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to include projection-based location extraction as taught by He. He explicitly identifies why projection and spatial reasoning are needed, stating that “dramatic variations in visual appearance and ambient environment caused by different viewpoints from different cameras make the cross-camera local tracklet matching extremely difficult…” (Introduction, pg. 2). Because cross-camera tracking is difficult due to viewpoint variation, a person of ordinary skill in the art would have been motivated to project 2D detections into a common spatial coordinate system and compare distances in order to improve association accuracy and achieve better spatial consistency across cameras. This represents the use of a known technique to improve similar systems and combining prior art elements according to known methods to yield predictable results.
Regarding Claim 13, Quach teaches the system of claim 10, and additional limitations are met as in the consideration of claim 4 above.
Claims 6-7, 15-16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Quach et al. in view of Ross (US2022022006A1).
Regarding Claim 6, Quach teaches the method of claim 1, but fails to teach that it further comprises performing an action that includes generating a report for a healthcare professional for decision-making related to a patient’s treatment, based on tracked movement of the patient.
However, Ross teaches generating notifications and actionable information for healthcare professionals based on tracked patient movement, stating that “in one important aspect of the invention, the patient's movement to or from an area can also trigger secondary actions…by way of example but not limitation, the charge nurse can get a text message when a new patient arrives on the unit; and/or a surgeon can receive a message when his patient is in the OR…” (paragraph [0059]). Ross also explains that location data can be used by healthcare personnel to monitor patients, stating that “an “inpatient” physician could see a live list of all of their patients' locations” (paragraph [0060]).
Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the tracking system of Quach to generate reports or notifications for healthcare professionals based on tracked patient movement as taught by Ross. Ross explains that monitoring patient movement allows clinical staff to be informed when certain events occur, such as patient arrival or movement between hospital units (see paragraph [0059] above). A person of ordinary skill in the art would have recognized that incorporating such reporting functionality into a multi-camera tracking system would improve the usefulness of the tracking information for clinical workflow and decision-making. Applying Ross’s reporting functionality to the tracking framework of Quach would have been a predictable use of known techniques to improve an existing system.
Regarding Claim 7, Quach teaches the method of claim 1, but fails to teach that the current view includes a detected person within a healthcare facility and wherein the multiple video streams are generated by video cameras within the healthcare facility. While Quach teaches detecting persons in video frames and using multiple video streams from multiple cameras, Quach does not teach that the detected person is within a healthcare facility.
However, Ross explicitly teaches tracking people within a healthcare facility environment, stating that “in accordance with the present invention, and looking now at FIGS. 1 and 2, a wearable wireless tracker module 20, such as a Wi-Fi module, is used to individually tag a patient 30 and track their movements within a hospital environment” (paragraph [0056]). Ross further explains that the system tracks individuals throughout a healthcare facility, stating that “the present invention comprises the provision and use of a novel smart/integrated indoor positioning system (IPS) for tracking the location of a patient within a healthcare environment” (paragraph [0041]).
Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to apply the multi-camera tracking system of Quach within the healthcare environment described by Ross. Ross explains that hospitals need improved systems to determine patient locations, stating that “patients are constantly being moved about for diagnostic tests and surgeries/interventions during a hospitalization… for all hospital staff, a patient's current location is usually shared via word of mouth despite its obvious drawbacks” (paragraphs [0005] and [0008]). A person of ordinary skill in the art would have been motivated to use multi-camera techniques such as those taught by Quach to monitor individuals within healthcare facilities in order to improve tracking and monitoring of patient movement within hospitals. Implementing the known multi-camera tracking system of Quach in the known healthcare monitoring environment of Ross would have been a predictable application of prior art elements according to their established functions.
Regarding Claim 15, Quach teaches the system of claim 10, and additional limitations are met as in the consideration of claim 6 above.
Regarding Claim 16, Quach teaches the system of claim 10, and additional limitations are met as in the consideration of claim 7 above.
Regarding Claim 19, it is rejected for the reasons discussed above with respect to claims 1, 6, and 7, the limitations of which are substantially reproduced in claim 19.
Regarding Claim 20, Quach in view of Ross teaches the method of claim 19, and additional limitations are met as in the consideration of claim 5 above.
Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Quach et al. in view of Veas et al. (“Techniques for View Transition in Multi-Camera Outdoor Environments”).
Regarding Claim 9, Quach teaches the method of claim 1, but fails to teach that changing from the current view to the next view includes changing a display on a user interface device to display the next view on a screen. While Quach teaches transitioning between views and a multi-camera system context, Quach does not teach changing a display on a user interface device and displaying the next view on a screen.
However, Veas teaches UI-based display transitions between camera views. Veas teaches displaying video streams on a user device, stating that “mobile computers and wireless video transmission can enable a user in an outdoor environment to observe video feeds from multiple cameras deployed in the user’s surroundings” (Introduction, pg. 1). Veas further teaches changing displays to show different views, stating that “each of them initially shows the local camera’s video…a button press (Figure 3(A)) takes the user to a transition view presenting the spatial relationship between local and remote camera…a second button press transitions to the remote view” (Viewpoint transitions in multi-camera environments, pg. 3). Veas further teaches UI-based switching between views, stating that “using the techniques, users can browse the video stream from either the local or the remote camera, or they can smoothly move to a view where both videos are visible” (Fig. 2 caption, pg. 4). Lastly, Veas teaches explicit display interaction, stating that “while being in the field rather than in the office, the view shown on the display device is inconsistent with what the observer perceives directly” (View discrepancy, pg. 3).
Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to include UI-based display switching as taught by Veas. Veas identifies the need to visualize and transition between camera views, stating that “different techniques exist to visualize the incoming video streams… the techniques allow mobile users to gain extra information about the surroundings, the objects and the actors in the environment by observing a site from different perspectives” (Abstract). Because multi-camera systems require visualization and navigation between views, a person of ordinary skill in the art would have been motivated to apply UI-driven transitions between camera views using display updates in order to enable users to view tracked objects across cameras and improve usability and situational awareness. This represents the use of a known technique to improve similar systems and combining prior art elements according to known methods to yield predictable results.
Regarding Claim 18, Quach teaches the system of claim 10, and additional limitations are met as in the consideration of claim 9 above.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Fang et al. (US 20230237801 A1) teaches methods, devices, apparatuses, computing platforms, and articles related to providing multi-camera or multi-view person or object association in continuous frames. Fang would also support a rejection under 35 U.S.C. 102 for claims 1-3, 5, 8, 10-12, 14, and 17.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM ADU-JAMFI whose telephone number is (571)272-9298. The examiner can normally be reached M-T 8:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Bee can be reached at (571) 270-5183. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/WILLIAM ADU-JAMFI/Examiner, Art Unit 2677
/ANDREW W BEE/Supervisory Patent Examiner, Art Unit 2677