Prosecution Insights
Last updated: April 19, 2026
Application No. 18/618,974

MULTI-SUBJECT MULTI-CAMERA TRACKING FOR HIGH-DENSITY ENVIRONMENTS

Status: Non-Final Office Action (§102)
Filed: Mar 27, 2024
Examiner: GARCIA, SANTIAGO
Art Unit: 2673
Tech Center: 2600 — Communications
Assignee: Nvidia Corporation
OA Round: 1 (Non-Final)

Grant Probability: 88% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 5m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 88% (895 granted / 1015 resolved; +26.2% vs TC avg), above average
Interview Lift: +12.8% across resolved cases with an interview (moderate lift)
Avg Prosecution: 2y 5m typical timeline; 21 applications currently pending
Career History: 1036 total applications across all art units
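The figures above are simple ratios. As a minimal sketch of how they appear to combine, assuming (hypothetically, since the dashboard does not state its method) that the interview lift is applied as a relative, multiplicative boost to the career allow rate, which matches the jump from 88% to 99% shown:

```python
# Career allow rate: 895 grants out of 1015 resolved applications.
granted, resolved = 895, 1015
allow_rate = granted / resolved
print(f"Career allow rate: {allow_rate:.1%}")      # ~88.2%

# Assumption: treat the +12.8% interview lift as a relative
# (multiplicative) boost to the base rate, capped at 100%.
interview_lift = 0.128
with_interview = min(allow_rate * (1 + interview_lift), 1.0)
print(f"With interview:    {with_interview:.1%}")  # ~99.5%
```

Under this assumption the result rounds to the 99% headline figure; an additive interpretation (88% + 12.8 points) would give roughly the same number, so either reading is consistent with the display.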

Statute-Specific Performance

§101: 7.6% (-32.4% vs TC avg)
§103: 60.2% (+20.2% vs TC avg)
§102: 18.7% (-21.3% vs TC avg)
§112: 2.3% (-37.7% vs TC avg)
Tech Center averages are estimates; based on career data from 1015 resolved cases.
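The "vs TC avg" deltas let the Tech Center baseline be backed out for each statute, assuming delta = examiner rate minus TC average (a plausible reading, not stated by the dashboard):

```python
# Statute-specific rates and their deltas vs the Tech Center
# average, as reported above: {statute: (examiner %, delta %)}.
stats = {
    "§101": (7.6, -32.4),
    "§103": (60.2, +20.2),
    "§102": (18.7, -21.3),
    "§112": (2.3, -37.7),
}

# Assuming delta = examiner rate - TC average, recover the baseline.
tc_avgs = {s: round(rate - delta, 1) for s, (rate, delta) in stats.items()}
for statute, avg in tc_avgs.items():
    print(f"{statute}: TC average {avg}%")  # each recovers 40.0%
```

Every statute recovers the same 40.0% baseline, which suggests the tool compares each statute's share against a single composite Tech Center figure rather than per-statute averages.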

Office Action (§102)
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 04/19/2024 has been considered by the examiner.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Terekhov (US 2018/0218515).

As per claims 1, 15, and 20, Terekhov teaches one or more processors, and a system and method comprising processing circuitry to: cluster a plurality of representations corresponding to a behavior of one or more subjects within an environment based on streaming data corresponding to a first time frame to generate one or more clusters based at least on a similarity between representations of the plurality of representations (Terekhov, ¶[0036] “a human that is classified as belonging to a specific category; a human plus a specific kind of object; a human plus a specific kind of environment; a group of humans; a group of humans identified or named; a group of humans classified as belonging to a specific category; a group of human beings plus a specific kind of object; a group of human beings plus a specific kind of environment; any relationship between one or more humans, or parts of humans, and one or more objects; any relationship between one or more humans, or parts of humans, and one or more environments; any object which is linked to another object.” The classifying represents a similarity between representations of the plurality of representations; ¶[0131] “An event is a set of data, which may represent the activity of a track, group or detection for a specific moment in time or specific time period or specific set of time frames.” represents corresponding to a time frame; and ¶[0056] “complex events describe, classify or infer a person's behavior or intent or needs” represents representations corresponding to a behavior), wherein the streaming data includes behavior data of one or more subjects within the environment generated using a plurality of optical sensors (Terekhov, ¶[0073] “the computer vision process is implemented by a computer vision engine localized in a sensor or SoC or GPU,” ¶[0074] “the computer vision process is implemented by a computer vision engine localized in a camera or other device including a sensor, or in a hub or gateway connected to that device, or in a remote server, or distributed across any permutation of these,” and ¶[0075] “the computer vision process is implemented by a computer vision engine that is localized in one or more of the following: (a) an edge layer that processes raw sensor data.” This represents the environment generated using a plurality of optical sensors, since these sensors provide vision and are therefore optical; see also ¶[0107] “A method is provided for analyzing a captured image from a scene from one or several image sensors”); determine, based at least on trajectory tracking data for the one or more subjects, whether individual clusters of the one or more clusters are associated with a respective prior behavior state of one or more first behavior states (Terekhov, ¶[0019]-[0031] “if the trajectory or pose of several humans, as defined in a track record for each human, is sufficiently similar, then a track record for a group consisting of those humans, is automatically created.” The automatically created track record represents a prior behavior state by keeping a record of such behavior); assign one or more second behavior states to the one or more clusters based on at least one of the individual clusters of the one or more clusters not having an associated behavior state from the one or more first behavior states (Terekhov, ¶[0034]-[0040] “an event is created when a new track record is made” and ¶[0041] “an event is created when a new track record is made for a higher-level object because one or more track records for lower level objects are created,” i.e., by creating a new track record); and update at least one of the one or more first behavior states or the one or more second behavior states based on the plurality of representations to generate updated behavior states (Terekhov, ¶[0040] “an event is created when a new track record is made.” This represents updating the new behavior, which then gets recorded).

As per claim 2, Terekhov teaches the one or more processors of claim 1, wherein the one or more second behavior states are assigned to the one or more clusters using a matching algorithm, and wherein the processing circuitry is further to: initialize a new behavior state for at least one cluster of the one or more clusters based on the at least one cluster not being assigned to at least one of the one or more second behavior states by the matching algorithm (Terekhov, ¶[0107] “A detection algorithm is applied to the image in order to detect one or several objects and/or one or several object features. A grouping algorithm is then applied in order to group the one or several detections (or ‘detects’) into a group or a cluster. A tracking algorithm is then used to track the detections or group of detections and to generate one or several ‘Track Records’, wherein a ‘Track Record’ (TR) represents one or several events based on the behavior of the detections or group of detections.” This represents assigning second behavior states to the clusters using a matching algorithm and initializing a new behavior state for a cluster not assigned by that algorithm, as a new behavior or feature gets recorded in a new record by the algorithm).

As per claim 3, Terekhov teaches the one or more processors of claim 1, wherein the behavior data comprises one or both of appearance data and spatiotemporal data represented by the plurality of representations (Terekhov, ¶[0114] “One or more of the following may be recorded within a TR: trajectory, acceleration, pose, gesture, appearance, disappearance, and identity.” This includes appearance data).

As per claim 4, Terekhov teaches the one or more processors of claim 1, wherein the processing circuitry is further to: generate the plurality of representations based on processing individual data feeds from at least one individual sensor of the plurality of optical sensors (Terekhov, ¶[0167] “The image sensor module may receive a video stream and analyses the video on a frame-by-frame basis, and may subsequently continuously create events that can then be pushed into other smart or connected devices as specific commands.” This would be an individual feed, as seen in Fig. 1).
As per claims 5 and 16, Terekhov teaches the one or more processors of claim 1, wherein the processing circuitry is further to: generate the plurality of representations using a machine learning model trained to detect one or more characteristics representing the one or more subjects based on the streaming data (Terekhov, ¶[004] “Other techniques are based on machine learning techniques such as Convolutional Neural Networks (CNN). However, such techniques involve training a system with a large number of examples based on the objects that the system needs to analyses.” This represents generating the plurality of representations using a machine learning model trained to detect one or more characteristics representing the one or more subjects based on the streaming data provided by the sensors).

As per claim 6, Terekhov teaches the one or more processors of claim 1, wherein the streaming data comprises synchronized optical image streaming data that includes individual image feeds from the plurality of optical sensors, wherein at least two optical sensors of the plurality of optical sensors are synchronized to capture the individual image feeds at the same time (Terekhov, ¶[0118] “One or several image sensors may be used to survey a scene. The sensors may include one or more of the following sensors: sensors operating in the visible spectra, sensors operating in infra-red spectra, thermal sensors, ultra sonic sensors, sensors operating in non-visible spectra and sensors for acceleration or movement detection.” Several of the sensors listed in ¶[0118] are considered optical sensors, as they operate by detecting light or electromagnetic radiation within or near the visible spectrum, and each of these sensors would individually stream).
As per claim 7, Terekhov teaches the one or more processors of claim 1, wherein the processing circuitry is further to: map the plurality of representations to a global image coordinate system based at least on camera calibration parameters associated with the plurality of optical sensors (Terekhov, ¶[0171] “Evacuation systems: by understanding precisely locations and people's presence.” This represents a global image coordinate system, by having the precise locations).

As per claim 8, Terekhov teaches the one or more processors of claim 1, wherein the plurality of representations individually represent one or both of appearance data and spatiotemporal data for a respective subject of the one or more subjects (Terekhov, ¶[0114] “One or more of the following may be recorded within a TR: trajectory, acceleration, pose, gesture, appearance, disappearance, and identity.” This includes appearance data).

As per claim 9, Terekhov teaches the one or more processors of claim 1, wherein the processing circuitry is further to: associate individual behavior states from at least one of the one or more first behavior states or the one or more second behavior states with a global identifier (ID) (Terekhov, ¶[0067] “the computer vision process is used to monitor the behavior of one or more persons.” Individual behaviors get monitored).
As per claims 10 and 17, Terekhov teaches the one or more processors of claim 1, wherein the processing circuitry is further to cluster the plurality of representations based on a hierarchical clustering process that performs operations to: determine a subject prediction number based on a first clustering process that clusters the plurality of representations based at least on a similarity of representations from the plurality of representations; and apply, to the plurality of representations, a second clustering process that is constrained to cluster the plurality of representations based at least on the subject prediction number (Terekhov, ¶[0146] “A procedure may be used to predict an area for the hand position with a high probability. From the knowledge that a human body has a limited number of angles of freedom, by drawing lines and angles near the joints of the human body, it is possible to predict the shape, size and area for the hand positions. The right hand position is therefore predicted by drawing the lines 1204 and 1205, whereas the left hand is predicted by drawing the lines 1206 and 1207. Some corrections when drawing the lines may be needed. The shape of an area may be determined by either tracing a curve, circle, or eclipse using the trajectory of the hand, or by using a circle sector area or an ellipse sector area.” This represents the prediction number, by predicting an area for the hand position with a high probability, for example).
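Claims 10 and 17 describe a two-pass hierarchical process: a first, similarity-driven pass estimates how many subjects are present, and a second pass is then constrained to that predicted number. A stdlib-only sketch of the first pass, using hypothetical 1-D feature values and a hypothetical gap threshold as a stand-in for any similarity-based clustering (not the applicant's actual method):

```python
def predict_subject_count(values, threshold):
    """First-pass clustering: sort the representations and start a
    new cluster wherever the gap between neighbors exceeds the
    similarity threshold. The cluster count is the predicted number
    of subjects, which would then constrain the second pass (e.g. a
    k-means run with k fixed to this count)."""
    values = sorted(values)
    count = 1
    for a, b in zip(values, values[1:]):
        if b - a > threshold:
            count += 1
    return count

# Hypothetical 1-D representation values for six detections.
reps = [0.1, 0.2, 0.15, 5.0, 5.1, 9.8]
k = predict_subject_count(reps, threshold=1.0)
print(k)  # three well-separated groups -> predicted subject count 3
```

Real multi-camera trackers would cluster high-dimensional appearance embeddings rather than scalars, but the two-stage structure (estimate k, then cluster constrained to k) is the same.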
As per claims 11 and 18, Terekhov teaches the one or more processors of claim 2, wherein the matching algorithm comprises at least one of: an iterative matching combinatorial optimization algorithm; an algorithm that solves an assignment problem by matching agents to tasks; a Hungarian matching algorithm; a Kuhn-Munkres algorithm; or a Munkres assignment algorithm (Terekhov, ¶[0107] “A detection algorithm is applied to the image in order to detect one or several objects and/or one or several object features.” This is equivalent to “an iterative matching combinatorial optimization algorithm”).

As per claim 12, Terekhov teaches the one or more processors of claim 1, wherein the processing circuitry is further to apply a linear programming algorithm to implement overlapping behavior suppression to avoid associating multiple clusters of the one or more clusters with more than one of the one or more subjects (Terekhov, ¶[0107] “A grouping algorithm is then applied in order to group the one or several detections (or ‘detects’) into a group or a cluster.” This represents applying a linear programming algorithm to implement overlapping behavior suppression).

As per claim 13, Terekhov teaches the one or more processors of claim 1, wherein the processing circuitry is further to cause a display of a computer vision-based view of the one or more subjects for at least a portion of the environment based at least on the updated behavior states (Terekhov, ¶[0071] “the computer vision process outputs no video or still images for display on a monitor or screen and from which an individual can be identified.”).
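Claims 11 and 18 recite the Hungarian (Kuhn-Munkres) algorithm, i.e. solving the assignment problem of matching clusters to behavior states at minimum total cost. As an illustration of that problem (not the applicant's implementation), here is a stdlib-only brute-force solver over a hypothetical cost matrix; a production system would use an O(n³) Hungarian implementation such as `scipy.optimize.linear_sum_assignment`:

```python
from itertools import permutations

def min_cost_assignment(cost):
    """Brute-force assignment: match each cluster (row) to a distinct
    behavior state (column) so the total matching cost is minimized.
    Exponential in n, so only suitable for tiny examples."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost

# Hypothetical 3x3 cost matrix: cost[i][j] could be the embedding
# distance between cluster i and behavior state j.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
assignment, total = min_cost_assignment(cost)
print(assignment, total)  # (1, 0, 2) with total cost 5
```

Any cluster left without a state under the optimal assignment is exactly the case claim 2 handles by initializing a new behavior state.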
As per claims 14 and 19, Terekhov teaches the one or more processors of claim 1, wherein the one or more processors are comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for three-dimensional assets; a system for performing deep learning operations; a system for performing remote operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system implementing one or more language models; a system implementing one or more large language models (LLMs); a system for generating synthetic data; a system for generating synthetic data using AI; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources (Terekhov, ¶[003] “The ability to understand people's presence and behaviors offers a plethora of applications in a wide range of domain such as in home automation.” This is a control system for an autonomous or semi-autonomous machine, by controlling the home automation, as applicant recites “at least one of”).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SANTIAGO GARCIA, whose telephone number is (571) 270-5182. The examiner can normally be reached Monday-Friday, 9:30am-5:30pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool.
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Chineyere Wills-Burns, can be reached at (571) 272-9752. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SANTIAGO GARCIA/
Primary Examiner, Art Unit 2673
/SG/

Prosecution Timeline

Mar 27, 2024: Application Filed
Feb 02, 2026: Non-Final Rejection, §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12599912: "Method for controlling and/or regulating the feed of material to be processed to a crushing and/or screening plant of a material processing device" (granted Apr 14, 2026; 2y 5m to grant)
Patent 12598596: "Channel selection based on multi-hop neighboring-access-point feedback" (granted Apr 07, 2026; 2y 5m to grant)
Patent 12587818: "Device and role based user authentication" (granted Mar 24, 2026; 2y 5m to grant)
Patent 12574708: "Communication for user equipment groups" (granted Mar 10, 2026; 2y 5m to grant)
Patent 12574764: "Client cooperative troubleshooting" (granted Mar 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 88%
With Interview: 99% (+12.8%)
Median Time to Grant: 2y 5m
PTA Risk: Low
Based on 1015 resolved cases by this examiner; grant probability derived from the career allow rate.
