Prosecution Insights
Last updated: April 19, 2026
Application No. 18/494,442

ARTIFICIAL INTELLIGENCE ENABLED DISTANCE EVENT DETECTION USING IMAGE ANALYSIS

Final Rejection (§103)
Filed: Oct 25, 2023
Examiner: ABDI, AMARA
Art Unit: 2668
Tech Center: 2600 — Communications
Assignee: Micron Technology, Inc.
OA Round: 2 (Final)
Grant Probability: 83% (Favorable)
OA Rounds: 3-4
To Grant: 2y 7m
With Interview: 76%

Examiner Intelligence

Career Allow Rate: 83% — above average (677 granted / 816 resolved; +21.0% vs TC avg)
Interview Lift: −7.5% (minimal) — based on resolved cases with interview
Avg Prosecution: 2y 7m (typical timeline); 33 applications currently pending
Total Applications: 849 across all art units (career history)

Statute-Specific Performance

§101: 9.8% (−30.2% vs TC avg)
§103: 60.7% (+20.7% vs TC avg)
§102: 10.2% (−29.8% vs TC avg)
§112: 10.0% (−30.0% vs TC avg)
Tech Center averages are estimates • Based on career data from 816 resolved cases

Office Action

§103
Notice of Pre-AIA or AIA Status The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . Response to Amendment Applicant’s response to the last office action, filed February 23, 2026 has been entered and made of record. Claims 1, 8, 10, and 20 have been amended. By this amendment, claims 1-24 are now pending in this application. Response to Arguments Applicant’s arguments with respect to claims 1-24 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Claim Rejections - 35 USC § 103 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1, 5-6, and 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Adam, (US-PGPUB 2021/0357649) in view of Nord et al, (US-PGPUB 20210034865) In regards to claim 1, Adam discloses a system, (100 in Fig. 1), comprising: an ingestion component, (Par. 0051, a camera 107 may further include a processor, which can instruct the transceiver to transmit video feeds to an external or internal computing device 140, [i.e., the processor 202 in Fig. 2 of the internal/external computing device 140 corresponds to the ingestion component]), including: one or more cameras configured to capture image frames, (see at least: Par. 0050-0051, video cameras 107 are arranged to capture video of various areas 110, where each of the cameras 107 may provide a constant video feed of one or more of the areas 110); and a first one or more computing components, (Par. 0051, the processor of camera 107), configured to: obtain image frames from the one or more cameras, wherein each computing component, from the first one or more computing components, is associated with a respective camera from the one or more cameras, (see at least: Fig. 1, Par. 0051, the processor of the camera 107 can instruct the transceiver to transmit video feeds to an external or internal computing device 140, [i.e., obtaining image frames from the one or more cameras, and each computing component …. is associated with a respective camera from the one or more cameras, “the processor of camera 107 implicitly obtains video feed of one or more of the areas 110, from the one or more cameras 107, and the processor of the camera 107, is implicitly associated with the at least one camera 107]); an inferencing component, (140 in Figs. 1-2, Par. 
0068 [i.e., AI or machine learning social distance measurements]), including: a second one or more computing components, (202 in Fig. 2), configured to: obtain the image frames from the ingestion component, (see at least: Fig. 2 and/or Fig. 5, and Pars. 0068-69 and/or Pars. 0085-0086, the object count determiner component 104 in Fig. 2 and/or facial recognition component 406 in Fig. 5 meets this limitation); and provide the image frames to a computing device 140, (see at least: Par. 0062, the video feeds captured by cameras 107 are passed to a computing device 140); and the computing device 140, (the computing device 140 comprises a processor 202) configured to: detect objects in the image frames using an artificial intelligence object detection model, (see at least: Par. 0021, the facial recognition component is configured to compare properties of the obtained image of the person of interest to properties of the one or more video frames in the video feed in which the person of interest is identified; and from Par.0052, the system 100 can access a database or other data store of images and use image processing algorithms, facial recognition techniques, and machine learning techniques on a set of images in order to establish what objects in the video frames are likely to represent a person 130, [i.e., detect objects in the image frames, “implicitly by detecting objects that constitute a person 130 from the video feed”, using an artificial intelligence object detection model, “using facial recognition component and machine learning techniques”]); and provide modified image frames that include an indication of detected objects depicted in the modified image frames, (see at least: Par. 0052, bounding boxes are placed about identified new objects, and boxes whose dimensions fall below or above certain thresholds or boxes whose position in the video feed changes above a certain threshold can be used to eliminate non-human object; and from Par. 0054, the system 100 includes a video processing pipeline that identifies the bounds of each person in a frame using for example, overlaying box bounding technology onto image frames of the video feed, [i.e., providing modified image frames, “implicit by using overlaying box bounding technology onto image frames of the video feed, (thereafter referenced bounding box image frames)”, that include an indication of detected objects depicted in the modified image frames, “bounding each person in a frame constitute an indication of detected objects depicted in the modified image frames”]); a post-processing component, (140 in Fig. 1), including: a third one or more computing components, (102 in Fig. 2), configured to: obtain the modified image frames, (see at least: Par. 0054-0055, the system 100 includes a video processing pipeline that identifies the bounds of each person in a frame using for example, overlaying box bounding technology onto image frames of the video feed, [i.e., obtaining the modified image frames, overlaying box bounding technology onto image frames of the video feed for each person in a frame”]); compute, for a modified image frame that includes indications of two or more objects, one or more distances between two objects included in the two or more objects based on respective indications associated with the two objects, (see at least: Par. 
0054-0055, the algorithm used for distance calculation will convert the Euclidian distance between the outer bounds of the bounding boxes about each person 130 and neighboring persons into real, physical distance measurements, by converting the Euclidian distance between the outer bounds of the bounding boxes about each person 130 and neighboring persons into real physical distance measurements, [i.e., computing, for a modified image frame that includes indications of two or more objects, “the bounding box image frames that implicitly indicates one or more persons in a frame”, one or more distances between two objects included in the two or more objects, “implicitly computing real physical distance measurements between two persons”, based on respective indications associated with the two objects, “using the size of a person bounding box to contribute to the distance calculation between the two persons”]); and detect a violation based on a distance, from the one or more distances, satisfying a threshold, (see at least: Par. 0056, the distance determiner component 102 may utilize this information, (the size of the bounding box of the person), to generate the distance alert 112, such as generating a social distance violation alert if the physical distance between two objects violates a minimum distance threshold condition-- e.g., within six feet, [i.e., detecting a violation based on a distance, “generating a social distance violation alert”, from the one or more distances, satisfying a threshold, “if the physical distance between two objects violates a minimum distance threshold condition, (e.g., the physical distance between two objects less than six feet)]). However, while disclosing that the system 100 uses the machine learning technique, (Par. 0052); Adam does not expressly disclose that the image frames are provided to a graphics processing component for object detection; and the graphics processing component comprising a graphics processing unit (GPU) separate from the one or more second computing components. Nord et al discloses that the image frames are provided to a graphics processing component for object detection; and the graphics processing component comprising a graphics processing unit (GPU) separate from the one or more second computing components, (see at least: Par. 0076, the object-detection system 110 may receive and store data reflecting the respective one or more objects of interest; and from Par. 0087, each object-detection model 113a-b is configured to receive a given image of a scanned scene; and from Par. 0091, GPU 114 of the object-detection system 110 may include one or more graphics processing units that are individually and/or collectively configured, perhaps along with the processor 111, to train object-detection models 113a-b and/or execute such trained object-detection models 113a-b, [i.e., the image frames are provided to a graphics processing component for object detection, “implicit by receiving a given image of a scanned scene by each object-detection model 113a-b, which each object-detection model 113a-b are trained by GPU 114 for implicitly detecting one or more objects in the scene”, and the graphics processing component comprising a graphics processing unit (GPU), (114 in Fig. 1), separate from the one or more second computing components, (processor 111), “the GPU 114 is implicitly separate from processor 111”]). Adam and Nord are combinable because they are both concerned with bounding boxes-based objects detection. 
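The distance logic that this mapping attributes to Adam reduces to a short computation: take the pixel separation between the outer bounds of two bounding boxes, convert it to a physical distance with a calibration ratio, and flag a violation when the result falls under the minimum-distance threshold. The sketch below is an illustrative reconstruction only, not code from the application or from Adam or Nord; the six-foot threshold follows Adam's example, while the pixels-per-foot value and the function names are assumptions.

```python
from itertools import combinations

def box_gap_px(box_a, box_b):
    """Pixel distance between the outer bounds of two (x1, y1, x2, y2) boxes."""
    dx = max(box_b[0] - box_a[2], box_a[0] - box_b[2], 0)  # horizontal gap, 0 if overlapping
    dy = max(box_b[1] - box_a[3], box_a[1] - box_b[3], 0)  # vertical gap, 0 if overlapping
    return (dx ** 2 + dy ** 2) ** 0.5

def detect_violations(boxes, pixels_per_foot=40.0, min_distance_ft=6.0):
    """Flag pairs of detected persons whose converted physical separation is below threshold."""
    violations = []
    for (i, a), (j, b) in combinations(enumerate(boxes), 2):
        distance_ft = box_gap_px(a, b) / pixels_per_foot  # ratio-based pixel-to-physical conversion
        if distance_ft < min_distance_ft:
            violations.append((i, j, distance_ft))
    return violations
```

A fixed pixels-per-foot ratio is the simplest possible calibration; Adam's Par. 0053 describes deriving such ratios from reference objects of known size and known pixel-distance conversions.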
Therefore, it would have been obvious to a person of ordinary skill in the art, to modify Adam, to add the graphics processing component 114 that is separate from processor 111, as though by Nord, to the Adam’s learning machine, in order to train one or more object-detection models, (Nord, Par. 0091), to identify objects of interest within an image of a scene, (Nord, Par. 0003). In regards to claim 5, the combine teaching Adam and Nord as whole discloses the limitations of claim 1. Adam further discloses wherein the indication of detected objects includes bounding boxes, (Adam, see at least: Par. 0052, bounding boxes are placed about identified new objects), and In the other hand, Nord discloses wherein the graphics processing component, to provide the modified image frames, is configured to: insert a bounding box around each detected object depicted in the image frames to generate the modified image frames, (see at least: Par. 0091, the GPU 114 trains the object-detection models 113a-b and/or execute such trained object-detection models 113a-b, and from Par. 0087, each object-detection model 113a-b is configured to receive a given image of a scanned scene, evaluate the given image for one or more objects of interest, and then generate one or more sets of object-detection conclusions for the given image, including … a localization description of the given perceived object of interest within the given image (e.g., the dimensions and/or location of an appropriate bounding box for the given perceived object of interest …); [i.e., insert a bounding box around each detected object depicted in the image frames, “implicit by defining a bounding box for the perceived object of interest within the image, based on training execute such trained object-detection models 113a-b by GPU 114”, to generate the modified image frames, “implicitly generating images of humans with respective bounding boxes outlining the human objects”]). In regards to claim 6, the combine teaching Adam and Nord as whole discloses the limitations of claim 5. Adam further discloses wherein the third one or more computing components, to compute the one or more distances, are configured to: compute the one or more distances between the two objects based on a distance between respective bounding boxes of the two objects as depicted in the modified image frame, (see at least: Par. 0054-0055, the system 100 includes a video processing pipeline that identifies the bounds of each person in a frame using for example, overlaying box bounding technology onto image frames of the video feed, and the algorithm used for distance calculation will convert the Euclidian distance between the outer bounds of the bounding boxes about each person 130 and neighboring persons into real, physical distance measurements, [i.e., computing distances between the two objects based on a distance between respective bounding boxes of the two objects as depicted in the modified image frame, “convert the Euclidian distance between the outer bounds of the bounding boxes about each person 130 and neighboring persons into real, physical distance measurements”]). In regards to claim 8, the combine teaching Adam and Nord as whole discloses the limitations of claim 1. 
Adam further discloses wherein the information associated with the violation includes at least one of: an indication of a camera, from the one or more cameras, that captured image data used to detect the violation, a time associated with the violation, a date associated with the violation, a location associated with the violation, or a duration of the violation, (see at least: Par. 0070, the video feeds include video frame information that permits time-stamp identification of the status of an area at a particular time; and from Par. 0083, set time period, “time associated with the violation”). In regards to claim 9, the combine teaching Adam and Nord as whole discloses the limitations of claim 1. Adam further discloses wherein the third one or more computing components, to detect the violation, are configured to: determine a quantity of modified image frames, over a time window, that are associated with distances, from the one or more distances, that satisfy the threshold; and detect the violation based on the quantity satisfying a violation threshold, (Par. 0054, video processing pipeline that identifies the bounds of each person in a frame using for example, overlaying box bounding technology onto image frames of the video feed, and this information is passed on to the next stage in the video processing pipeline for calculation of the distance between each person and its nearest neighbor; and from Par. 0056, the object count determiner component 104 may implement a presentation layer to convey information in the distance alert 112, (i.e., a social distance violation alert if the physical distance between two objects violates a minimum distance threshold condition), such that the object count alert 114 that identifies a number of people in a zone, or a maximum occupancy alert if the object count meets or violates a maximum occupancy threshold. For example, the maximum occupancy threshold may be four persons, [i.e., determining a quantity of modified image frames, over a time window, “identifying a number of people implicitly using the bounding boxes in a zone over set period of time”, that are associated with distances, from the one or more distances, that satisfy the threshold, “implicit by conveying information in the distance alert 112, (if the physical distance between two people violates a minimum distance threshold condition), by the object count determiner component 104 that counts the number of people over the set period of time”; and detect the violation based on the quantity satisfying a violation threshold, “maximum occupancy alert if the object count meets or violates a maximum occupancy threshold”). Claims 2-3 are rejected under 35 U.S.C. 103 as being unpatentable over Adam and Nord, as applied to claim 1 above; and further in view of Bendtson, (US-PGPUB 20210337133) In regards to claim 2, the combine teaching Adam and Nord as whole discloses the limitations of claim 1. Adam further discloses a monitoring component including: a fourth one or more computing components configured to: obtain, from the post-processing component, information associated with the violation and image data associated with the violation, (see at least: Par. 0057, the system 100 would determine that the people 130 at area 110A are violating social distancing rules; and from Fig. 6, and Par. 
0093-0096, generate a list of each identified and unidentified persons including their name and a captured image of each identified and unidentified persons, based on the captured images, [i.e., the third-party, such as law enforcement, implicitly obtains information associated with the violation, “list of each identified and unidentified persons violating social distancing rules”, and image data associated with the violation, “captured image of each identified and unidentified persons “]; and provide a user interface for display that includes indications of violations, including the violation, locations associated with respective violations, and a frequency of violations associated with respective locations, (Par. 0083, searches for additional persons that contacted a POI 430A for a set time period … for the previous 24 hours; and from Par. 0091-0092, identifying what persons came into contact with a POI, as well one or more video frames in the one or more video feeds in which the person of interest is identified using facial recognition; and from Par. 0093-0096, transmit the list, including name and a captured image of the persons, that violating social distancing rules, to appropriate authority such as law enforcement in charge of the surveilled area, [i.e., implicitly providing a user interface for display that includes indications of violations, including the violation, “violating social distancing rules”]). The combine teaching Adam and Nord as whole does not expressly disclose provide a user interface for display that includes locations associated with respective violations, and a frequency of violations associated with respective locations. However, Bendtson discloses providing a user interface for display that includes locations associated with respective violations, (see at least: Par. 0004, the video analytics software typically attach metadata to the video stream indicating a time and position in the frame where the objects or activity have been detected. Therefore, a heatmap can be displayed based on the metadata indicating object or activity detection, [i.e., providing a user interface for display that includes locations associated with respective violations, “the color scale of the heatmap indicate the cumulative amount of the activity detected over the time period, and can be displayed based on the metadata, to indicate the time and the position in the frame where the objects or activity have been detected]); and a frequency of violations associated with respective locations, (see at least: Par. 0003, 0036, a user using a user device 160, “mobile device”, can select to display a heatmap of activity over a specified period of time, such as the heatmap representing video data can be displayed overlaid onto an image from a video camera, and will use a color scale to represent certain objects or activity detected in the video data from that camera over a specified period of time, where the color scale indicates the cumulative amount of the activity detected over the time period. [i.e., providing a user interface, “user device”, for display that includes a frequency of violations associated with respective locations, “implicit by using the color scale to represent certain objects or activity detected in the video data, indicating the cumulative amount of the activity detected over the time period, and based on the metadata, indicating the position in the frame where the objects or activity have been detected]). 
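The heatmap behavior attributed to Bendtson amounts to accumulating detection positions into a spatial frequency map over a chosen period and rendering the counts on a color scale. The following is a minimal sketch under that reading; the grid resolution, the (x, y, timestamp) record format, and the function name are illustrative assumptions rather than details specified by Bendtson.

```python
import numpy as np

def violation_heatmap(violation_records, frame_w, frame_h, grid_cols=32, grid_rows=18):
    """Count violations per coarse grid cell; the counts back a color-scaled overlay."""
    counts = np.zeros((grid_rows, grid_cols), dtype=int)
    for x, y, _timestamp in violation_records:  # position in frame + time of detection
        col = min(int(x / frame_w * grid_cols), grid_cols - 1)
        row = min(int(y / frame_h * grid_rows), grid_rows - 1)
        counts[row, col] += 1
    return counts  # higher counts = more frequent violations at that location
```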
Adam, Nord, and Bendtson are combinable because they are all concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art, to modify the combine teaching Adam and Nord, to use the color scale to represent certain objects or activity detected in the video data, as though by Bendtson, in order to indicate the cumulative amount of the activity detected over the time period, (Par. 0003). In regards to claim 3, the combine teaching Adam, Nord, and Bendtson as whole discloses the limitations of claim 2. Adam further discloses wherein the user interface includes an indication of violations over time, (see at least: Par. 0083, the system 400 searches the video feeds gathered by cameras 407 for additional persons that contacted a POI 430A for a set time period; and from Par. 0093-0096, transmit the list, including name and a captured image of the persons, that violating social distancing rules, to appropriate authority such as law enforcement in charge of the surveilled area, [i.e., wherein the user interface, “appropriate authority”, includes an indication of violations associated with the respective locations over time, “persons violating social distancing rules for a set time period”]). In the other hand, Bendtson discloses wherein the user interface includes an indication of violations associated with the respective locations over time, (see at least: Par. 0004, the video analytics software typically attach metadata to the video stream indicating a time and position in the frame where the objects or activity have been detected. Therefore, a heatmap can be displayed based on the metadata indicating object or activity detection, [i.e., the user interface, “user device”, includes an indication of violations associated with the respective locations over time, “implicit by attaching metadata to the video stream indicating a time and position in the frame where the objects or activity have been detected). Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Adam, and Nord, as applied to claim 1 above; and further in view of Jones, SR. et al, (US-PGPUB 20220076554), (thereafter referenced Jones) The combine teaching Adam and Nord as whole discloses the limitations of the claim 1. The combine teaching Adam and Nord as whole does not expressly disclose wherein the third one or more computing components are further configured to: process the modified image frames to transform the modified image frames from an angled perspective view to a top-down view, and wherein the third one or more computing components, to compute the one or more distances, are configured to: compute the one or more distances using the top-down views of the modified image frames. Jones discloses wherein the third one or more computing components are further configured to: process the modified image frames to transform the modified image frames from an angled perspective view to a top-down view, (see at least: Par. 0006, pipeline is computing the transform, (calibration), that morphs the perspective view of the input video into a bird's-eye (top-down) view, involving selecting four points in the perspective view and mapping them to the corners of a rectangle in the bird's-eye view. Further, Par. 
0007-0008, the pipeline involves applying a human detector to the perspective views to draw a bounding box around each person, and given the bounding box for each person now, we estimate their (x, y) location in the bird's-eye view, [i.e., process the modified image frames to transform the modified image frames from an angled perspective view to a top-down view, “given the bounding box for each person now, we estimate their (x, y) location in the bird's-eye view”]); and wherein the third one or more computing components, to compute the one or more distances, are configured to: compute the one or more distances using the top-down views of the modified image frames, (see at least: Par. 0008, applying the transformation for the ground plane from the calibration step to the bottom-center point of each person's bounding box, resulting in their position in the bird's eye view and compute the bird's eye view distance between every pair of people and scale the distances by the scaling factor estimated from calibration, to highlight people whose distance is below the minimum acceptable distance, [i.e., compute the one or more distances using the top-down views of the modified image frames, “compute the bird's eye view distance between every pair of people, using the bounding box in the bird's eye view”]). Adam, Nord, and Jones are combinable because they are both concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art, to modify the combine teaching Adam and Nord, to apply the transform, (calibration), of the perspective view of the input video into a bird's-eye (top-down) view, as though by Jones, to the Adam bounding boxes around people, in order to compute the bird's eye view distance between every pair of people and highlight people whose distance is below the minimum acceptable distance, (Jones, Par. 0008). Claims 10-11, 13-14, 18, 20-22, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Adam, (US-PGPUB 2021/0357649), in view of Nord et al, (US-PGPUB 20210034865); and further in view of Jones, SR. et al, (US-PGPUB 20220076554), (thereafter referenced Jones) In regards to claim 10, Adam discloses a method, comprising: obtaining, by an image processing system and from one or more cameras, a stream of image frames, (see at least: Par. 0050-0051, video cameras 107 are arranged to capture video of various areas 110, where each of the cameras 107 may provide a constant video feed of one or more of the areas 110, [i.e., implicitly obtaining stream of image frames by one or more video cameras 107]); detecting, by machine learning techniques on a set of images in order to establish what objects in the video frames are likely to represent a person 130, [i.e., detecting objects in the image frames, “implicit by detecting the objects that constitute a person 130 from the video feed”, using an artificial intelligence object detection model, “using facial component and machine learning techniques”]); and generating, by the image processing system, one or more modified images of the one or more image frames, the one or more modified images including indications of detected objects depicted in the one or more image frames, (see at least: Par. 0052, bounding boxes are placed about identified new objects, and boxes whose dimensions fall below or above certain thresholds or boxes whose position in the video feed changes above a certain threshold can be used to eliminate non-human object; and from Par. 
0054, the system 100 includes a video processing pipeline that identifies the bounds of each person in a frame using for example, overlaying box bounding technology onto image frames of the video feed, [i.e., generating, by the image processing system, one or more modified images of the one or more image frames, one or more modified images, “implicit by using overlaying box bounding technology onto image frames of the video feed, (thereafter referenced bounding box image frames)”, the one or more modified images including indications of detected objects depicted in the one or more image frames, “bounding each person in a frame constitute an indication of detected objects depicted in the modified image frames”]); calculating, by the image processing system, distances between one or more pairs of objects detected in the one or more modified images, the distances being calculated using the indications outer bounds of the bounding boxes about each person 130 and neighboring persons into real, physical distance measurements, by converting the Euclidian distance between the outer bounds of the bounding boxes about each person 130 and neighboring persons into real physical distance measurements, [i.e., calculating, by the image processing system, distances between one or more pairs of objects detected in the one or more modified images, “real physical distance measurements”, the distances being calculated using the indications, “bounding boxes of each person 130 and neighboring persons”]); detecting, by the image processing system, one or more events based on one or more distances, from the distances, satisfying a threshold, (see at least: Par. 0056, the distance determiner component 102 and/or the object count determiner component 104 may implement a presentation layer to convey information in the distance alert 112, such as, but not limited to, an average distance between all people in a zone, a closest distance of any person to another person, or a social distance violation alert if the physical distance between two objects violates a minimum distance threshold condition—e.g., within six feet, [i.e., detecting, by the image processing system, one or more events, “social distance violation”, based on one or more distances, “one or more physical distances between two objects” from the distances, satisfying a threshold, “violates a minimum distance threshold condition—e.g., within six feet”]); and providing, by the image processing system, a user interface for display that indicates the one or more events detected based on the stream of image frames, (see at least: Par. 0076, when calculated distance not satisfying the distance threshold upon comparing each calculated distance to the distance threshold, the processor may be configured to generate an alert that is a notification transmitted to appropriate authorities over a wireless network (e.g., a security guard, the police), it can be an audio or visual alert generated and broadcast locally by a visual display, [i.e., providing, by the image processing system, a user interface, “appropriate authorities”, for display that indicates the one or more events detected based on the stream of image frames, “generate a visual alert by a visual display”]). 
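Both the claim 4 mapping above and the claim 10 mapping that follows lean on Jones's four-point calibration: pick four points in the camera's perspective view, map them to the corners of a rectangle in a bird's-eye view, project the bottom-center of each bounding box through that transform, and measure distances in the top-down plane. A minimal sketch of that procedure follows; the use of OpenCV, the specific reference coordinates, and the feet-per-unit scale factor are assumptions for illustration, not details taken from Jones.

```python
import cv2
import numpy as np

# Four reference points in the perspective view mapped to rectangle corners in the
# bird's-eye view (placeholder values; in practice these come from calibration).
persp_pts = np.float32([[420, 310], [880, 300], [1100, 700], [180, 720]])
topdown_pts = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])
H = cv2.getPerspectiveTransform(persp_pts, topdown_pts)

def topdown_points(boxes):
    """Project the bottom-center of each (x1, y1, x2, y2) box into the bird's-eye view."""
    feet = np.float32([[[(x1 + x2) / 2.0, y2]] for x1, _y1, x2, y2 in boxes])
    return cv2.perspectiveTransform(feet, H).reshape(-1, 2)

def pairwise_distances_ft(boxes, feet_per_unit=0.02):
    """Pairwise top-down distances, scaled from bird's-eye units to feet."""
    pts = topdown_points(boxes)
    diffs = pts[:, None, :] - pts[None, :, :]
    return np.hypot(diffs[..., 0], diffs[..., 1]) * feet_per_unit
```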
Adam does not expressly disclose obtaining, by an inferencing component of the image processing system, one or more image frames included in the stream of image frames, wherein the inferencing component comprising one or more computing components; detecting, by a graphic processing component of the image processing system and using an object detection model, one or more objects depicted in one or more image frames included in the stream of image frames, wherein the graphics processing unit comprises a graphics processing unit (GPU) separate from the one or more computing components; processing, by the image processing system, the one or more modified images to transform a perspective of the one or more modified images to a uniform view; and calculating, by the image processing system, distances between one or more pairs of objects detected in the one or more modified images, the distances being calculated using the uniform view. Nord discloses obtaining, by an inferencing component of the image processing system, one or more image frames included in the stream of image frames, wherein the inferencing component comprising one or more computing components, (see at least: Par. 0080, a detection device 122 may transmit image data for the captured one or more images to the object-detection system 110 via one or more of the links 130, [i.e., obtaining, by an inferencing component, (110 in Fig. 1, “i.e., the machine learning models 113a and/or 113b”), of the image processing system, (100 in Fig. 1), one or more image frames included in the stream of image frames, “the object-detection system 110 implicitly receives or obtained the image data, included in the stream of image frames, transmitted from the detection device 122, wherein the inferencing component, (110 in Figs 1-2), comprising one or more computing components, (111 in Fig. 2)]). Nord further discloses detecting, by a graphic processing component of the image processing system and using an object detection model, one or more objects depicted in one or more image frames included in the stream of image frames, wherein the graphics processing unit comprises a graphics processing unit (GPU) separate from the one or more computing components, (see at least: Par. 0091, GPU 114 of the object-detection system 110 may include one or more graphics processing units that are individually and/or collectively configured, perhaps along with the processor 111, to train object-detection models 113a-b and/or execute such trained object-detection models 113a-b, [i.e., detecting, by a graphic processing component, (114 in Fig. 2), of the image processing system, (100 in Fig. 1), and using an object detection model, (113a and/or 113b in Fig. 2), one or more objects depicted in one or more image frames included in the stream of image frames, “each object-detection model 113a-b are trained by GPU 114 for implicitly detecting one or more objects in the scene”, wherein the graphics processing unit comprises a graphics processing unit (GPU), (GPU 114 in Fig. 1), separate from the one or more computing components, (111 in Fig. 1), “as shown in Fig. 2, the GPU 114 of the object-detection system 110 is separate from the processor 111”]). Adam and Nord are combinable because they are both concerned with bounding boxes-based objects detection. 
Therefore, it would have been obvious to a person of ordinary skill in the art, to modify Adam, to add the graphics processing component 114 that is separate from processor 111, as though by Nord, to the Adam’s learning machine, in order to train one or more object-detection models, (Nord, Par. 0091), to identify objects of interest within an image of a scene, (Nord, Par. 0003). The combine teaching Adam and Nord as whole does not expressly disclose processing, by the image processing system, the one or more modified images to transform a perspective of the one or more modified images to a uniform view; and calculating, by the image processing system, distances between one or more pairs of objects detected in the one or more modified images, the distances being calculated using the uniform view. However, Jones discloses processing, by the image processing system, the one or more modified images to transform a perspective of the one or more modified images to a uniform view, (see at least: Par. 0006, pipeline is computing the transform, (calibration), that morphs the perspective view of the input video into a bird's-eye (top-down) view, involving selecting four points in the perspective view and mapping them to the corners of a rectangle in the bird's-eye view. Further, Par. 0007-0008, the pipeline involves applying a human detector to the perspective views to draw a bounding box around each person, and given the bounding box for each person now, we estimate their (x, y) location in the bird's-eye view, [i.e., processing, by the image processing system, the one or more modified images, “bounding box for each person”, to transform a perspective of the one or more modified images to a uniform view, “morphing the perspective view of the input video into a bird's-eye, using the points of corners bounding boxes”. Note that a bird's-eye view corresponds to the uniform view]); and calculating, by the image processing system, distances between one or more pairs of objects detected in the one or more modified images, the distances being calculated using the uniform view, (see at least: Par. 0008, applying the transformation for the ground plane from the calibration step to the bottom-center point of each person's bounding box, resulting in their position in the bird's eye view and compute the bird's eye view distance between every pair of people and scale the distances by the scaling factor estimated from calibration, to highlight people whose distance is below the minimum acceptable distance, [i.e., calculating distances between one or more pairs of objects, “compute the bird's eye view distance between every pair of people”, detected in the one or more modified images, “implicitly detected in the bounding box in the bird's eye view”, the distances being calculated using the uniform view, “the bird's eye view distance”]). Adam, Nord, and Jones are combinable because they are both concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art, to modify the combine teaching Adam and Nord, to apply the transform, (calibration), of the perspective view of the input video into a bird's-eye (top-down) view, as though by Jones, to the Adam bounding boxes around people, in order to compute the bird's eye view distance between every pair of people and highlight people whose distance is below the minimum acceptable distance, (Jones, Par. 0008). In regards to claim 11, the combine teaching Adam, Nord, and Jones as whole discloses the limitations of claim 10. 
Adam further discloses wherein the image processing system is associated with a decoupled cloud-based system architecture, (see at least: Par. 0062, the computing device 140 is a cloud-based structure and may be located remote from the environment 111, “remote or decoupled cloud-based system”). In regards to claim 13, the combine teaching Adam, Nord, and Jones as whole discloses the limitations of claim 10. Jones further discloses wherein processing the one or more modified images comprises: obtaining, for a view of a camera of the one or more cameras, a set of transform reference points associated with transforming the view to the uniform view, (see at least: Par. 0006, as the input frames are monocular (taken from a single camera), the simplest calibration method involves selecting four points in the perspective view and mapping them to the corners of a rectangle in the bird's-eye view, [i.e., obtaining, for a view of a camera of the one or more cameras, a set of transform reference points associated with transforming the view to the uniform view, “selecting four points in the perspective view and mapping them to the corners of a rectangle in the bird's-eye view”]); and transforming modified images, from the one or more modified images, that are associated with the camera to the uniform view using the set of transform reference points, (see at least: Par. 0007-0008, applying a human detector to the perspective views to draw a bounding box around each student; and apply said transformation to the bottom-center point of each person's bounding box, resulting in their position in the bird's eye view, [i.e., transforming modified images, from the one or more modified images, that are associated with the camera to the uniform view, “apply said transformation to the bottom-center point of each person's bounding box”, using the set of transform reference points, “implicitly using the points at corners of a rectangle in the bird's-eye view”]). In regards to claim 14, the combine teaching Adam, Nord, and Jones as whole discloses the limitations of claim 10. Jones further discloses wherein the uniform view is a top-down view, (see at least: Par. 0007-0008, “the bird's-eye view”, corresponds to the top-down view). In regards to claim 18, the combine teaching Adam, Nord, and Jones as whole discloses the limitations of claim 10. Adam further discloses wherein the one or more objects include at least one of: a person, a vehicle, a machine, or a device, (Adam, see at least: Fig. 1, Par. 0050, “one or more persons 130”. Regarding claim 20, claim 20 recites substantially similar limitations as set forth in claim 10. As such, claim 20 is rejected for at least similar rational. The Examiner further acknowledged the following additional limitation(s): “an apparatus”. However, Adam discloses the “apparatus”, (Adam, see at least: Fig. 1, and Par. 0050, “system 100”). Note that the bounding boxes, in claim 20, corresponds to the “one or more modified images”, in claim 10. In regards to claim 21, the combine teaching Adam, Nord, and Jones as whole discloses the limitations of claim 20. Adam further discloses means for providing a report indicating the information associated with the event, wherein the information includes at least one of: an indication of a camera, from the set of cameras, that captured image data used to detect the event, a time associated with the event, a date associated with the event, a location associated with the event, or a duration of the event, (see at least: Par. 
0075-0076, when the calculated distance is not satisfying the distance threshold upon comparing each calculated distance to the distance threshold, the processor may be configured to generate an alert that is a visual alert generated and broadcast locally to an appropriate authority, by a visual display, [i.e., means for providing a report indicating the information associated with the event, “implicit by transmitting an alert that is a notification to appropriate authorities”, wherein the information includes at least one of: … a date associated with the event, “distance not satisfying the distance threshold, indicating a distance violation”, …]). In regards to claim 22, the combine teaching Adam, Nord, and Jones as whole discloses the limitations of claim 20. Adam further discloses wherein the means for calculating the distances between two or more objects depicted in the one or more images comprises: means for converting the pixel distances between the respective bounding boxes associated with the two or more objects using a ratio value that is based on a measurement of a reference object included in the one or more images, (Par. 0053, an image processing algorithm is run against the frame with the reference objects included, and generates a data file providing ratios that can be used by a run time solution to convert pixel distance to real, physical distance, wherein the reference objects may be images of persons having known heights and having known pixel distance conversions at particular distances from the cameras 107; and from Par. 0054, algorithm used for distance calculation will convert the Euclidian distance between the outer bounds of the bounding boxes about each person 130 and neighboring persons into real, physical distance measurements, as discussed herein, [i.e., means for converting the pixel distances between the respective bounding boxes associated with the two or more objects, “computing a pixel distance between an outer bounds of the first bounding box associated with a first object, of the two objects, and an outer bounds of the second bounding box associated with a second object of the two objects”, using a ratio value that is based on a measurement of a reference object included in the one or more images, “based on ratio provided by the generated data that implicitly have images of POIs ”]). In regards to claim 25, the combine teaching Adam, Nord, and Jones as whole discloses the limitations of claim 20. Adam further discloses wherein the means for detecting the one or more objects includes an artificial intelligence object detection model, (Adam, see at least: Par. 0021, the facial recognition component is configured to compare properties of the obtained image of the person of interest to properties of the one or more video frames in the video feed in which the person of interest is identified; and from Par.0052, the system 100 can access a database or other data store of images and use image processing algorithms, facial recognition techniques, and machine learning techniques on a set of images in order to establish what objects in the video frames are likely to represent a person 130, [i.e., detecting the one or more objects, “implicitly by detecting objects that constitute a person 130 from the video feed”, including an artificial intelligence object detection model, “using facial recognition component and machine learning techniques”]). Claim 12 is rejected under 35 U.S.C. 
103 as being unpatentable over Adam, Nord, and Jones, as applied to claim 10 above; and further in view of Farber, (US-PGPUB 20220019810) The combine teaching Adam, Nord, and Jones as whole discloses the limitations of claim 10 The combine teaching Adam, Nord, and Jones as whole does not expressly disclose wherein detecting the one or more events comprises: detecting a percentage of image frames, from the stream of image frames, over a time window that are associated with detected events; and detecting an event based on the percentage of the image frames satisfying an event threshold. Farber discloses wherein detecting the one or more events comprises: detecting a percentage of image frames, from the stream of image frames, over a time window that are associated with detected events, (see at least: Par. 0054, the camera 110 may tag or otherwise associate metadata with at least one image frame captured by the camera 110, where the metadata may include, as examples, the time the image was captured, the duration of a recorded event, and the pixel change percentage between image frames, [i.e., detecting a percentage of image frames, from the stream of image frames, “implicit by detecting the pixel change percentage between image frames”, over a time window that are associated with detected events, “the duration of a recorded event”]); and detecting an event based on the percentage of the image frames satisfying an event threshold, (see at least: Par. 0054-0055, the camera 110 may determine the presence and position of objects within the images by running an object detection algorithm on the image frames. Further, Par. 0055, the analysis 504 may include extracting specific features from the pixels of the images captured by the camera 100 and metadata from the images; and from Par. 0056-0057, system 100 may then analyze 506 extracted metadata to determine which condition is present within the images; and may determine which features, “features from the pixels of the images”, are present in at least 80% of the images associated with a condition, [i.e., detecting an event, “implicit by features from the pixels of the images associated with a condition”, based on the percentage of the image frames satisfying an event threshold, “features from the pixels of the images associated with a condition, that are present in at least 80% of the images associated with a condition’]). Adam, Nord, Jones, and Farber are combinable because they are all concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art, to modify the combine teaching Adam, Nord, and Jones, to use the percentage of pixel change between image frames, as though by Farber, in order to determine the presence and position of objects within the images; and track the position of any detected objects across the image frames, (Farber, Par. 0054) Claims 19 and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Adam, Nord, and Jones, as applied to claims 10 and 20 above; and further in view of Bendtson et al, (US-PGPUB 20210337133) In regards to claim 19, the combine teaching Adam, Nord, and Jones as whole discloses the limitations of claim 10. Adam further discloses wherein the user interface includes information associated with events, including the one or more events, captured by all cameras included in the one or more cameras, (see at least: Par. 
0057, the system 100 would determine that the people 130 at area 110A are violating social distancing rules, based on monitoring an environment, using the video stream; and from Par. 0076, when calculated distance not satisfying the distance threshold upon comparing each calculated distance to the distance threshold, the processor may be configured to generate an alert that is a visual alert generated and broadcast locally to an appropriate authorities, by a visual display, [i.e., wherein the user interface includes information associated with events, including the one or more events, “violating social distancing rules”, captured by all cameras included in the one or more cameras, “implicit by monitoring the environment using the video feed”]). The combine teaching Adam, Nord, and Jones as whole does not expressly disclose wherein the user interface includes an indication of a frequency of events over time for respective cameras included in the one or more cameras. However, Bendtson discloses wherein the user interface includes a color scale indicating a frequency of events, including the events, associated with respective cameras, from the set of cameras, over time, (see at least: Par. 0003, 0036, a user using a user device 160, “mobile device”, can select to display a heatmap of activity over a specified period of time, such as the heatmap representing video data can be displayed overlaid onto an image from a video camera, and will use a color scale to represent certain objects or activity detected in the video data from that camera over a specified period of time, where the color scale indicates the cumulative amount of the activity detected over the time period, [i.e., wherein the user interface, “user device”, includes an indication of a frequency of events over time for respective cameras included in the one or more cameras, ““using a color scale to represent certain objects or activity detected in the video data, to indicate the cumulative amount of the activity detected over the time period”]). Adam, Nord, Jones, and Bendtson are combinable because they are all concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art, to modify the combine teaching Adam, Nord, and Jones, to use the color scale to represent certain objects or activity detected in the video data, as though by Bendtson, in order to indicate the cumulative amount of the activity detected over the time period, (Par. 0003). In regards to claim 23, the combine teaching Adam, Nord, and Jones as whole discloses the limitations of claim 20. The combine teaching Adam, Nord, and Jones as whole does not expressly disclose wherein the user interface includes a color scale indicating a frequency of events, including the events, associated with respective cameras, from the set of cameras, over time. However, Bendtson discloses wherein the user interface includes a color scale indicating a frequency of events, including the events, associated with respective cameras, from the set of cameras, over time, (see at least: Par. 
0003, 0036, a user using a user device 160, “mobile device”, can select to display a heatmap of activity over a specified period of time, such as the heatmap representing video data can be displayed overlaid onto an image from a video camera, and will use a color scale to represent certain objects or activity detected in the video data from that camera over a specified period of time, where the color scale indicates the cumulative amount of the activity detected over the time period, [i.e., the user interface, “user device”, includes a color scale indicating a frequency of events, including the events, associated with respective cameras, from the set of cameras, over time, “using a color scale to represent certain objects or activity detected in the video data, to indicate the cumulative amount of the activity detected over the time period”]). Adam, Nord, Jones, and Bendtson are combinable because they are all concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art, to modify the combine teaching Adam, Nord, and Jones, to use the color scale to represent certain objects or activity detected in the video data, as though by Bendtson, in order to indicate the cumulative amount of the activity detected over the time period, (Par. 0003). In regards to claim 24, the combine teaching Adam, Nord, and Jones as whole discloses the limitations of claim 20. The combine teaching Adam, Nord, and Jones as whole does not expressly disclose wherein the user interface includes an indication of a frequency of events, including the event, over time and with respect to locations corresponding to respective cameras from the set of cameras. However, Bendtson discloses wherein the user interface includes an indication of a frequency of events, including the event, over time, (see at least: Par. 0003, 0036, “see the rejection of claim 23 above for more details), and with respect to locations corresponding to respective cameras from the set of cameras, (see at least: Par. 0004, the video analytics software typically attach metadata to the video stream indicating a time and position in the frame where the objects or activity have been detected. Therefore, a heatmap can be displayed based on the metadata indicating object or activity detection, [i.e., the user interface includes an indication of a frequency of events, “the color scale indicates the cumulative amount of the activity detected over the time period”, and with respect to locations corresponding to respective cameras from the set of cameras, “the color scale of the heatmap indicate the cumulative amount of the activity detected over the time period, and can be displayed based on the metadata, to indicate the time and the position in the frame where the objects or activity have been detected”]) Adam, Nord, Jones, and Bendtson are combinable because they are all concerned with object tracking. Therefore, it would have been obvious to a person of ordinary skill in the art, to modify the combine teaching Adam, Nord, and Jones, to display the metadata within the heatmap, as though by Bendtson, in order to indicate the time and the position in the frame where the objects or activity have been detected, (Bendtson, Par. 0004) Allowable Subject Matter Claims 7, and 15-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. 
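Looking back at the claim 9 and claim 12 mappings, both turn on aggregating per-frame distance checks over a window rather than alerting on a single frame: count, or take the fraction of, recent frames whose distances violate the threshold and report an event only when that quantity crosses a second, violation-level threshold. The sketch below is illustrative only; the window length is arbitrary and the 0.8 fraction simply echoes the 80% example quoted from Farber.

```python
from collections import deque

class WindowedViolationDetector:
    """Report an event only when enough recent frames contain a threshold violation."""

    def __init__(self, window_frames=150, min_fraction=0.8):
        self.window = deque(maxlen=window_frames)
        self.min_fraction = min_fraction

    def update(self, frame_has_violation):
        """Record one frame's result; return True once the windowed rate crosses the bar."""
        self.window.append(bool(frame_has_violation))
        if len(self.window) < self.window.maxlen:
            return False  # not enough history yet to evaluate the window
        return sum(self.window) / len(self.window) >= self.min_fraction
```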
With respect to claim 7, the prior art of record, alone or in reasonable combination, does not teach or suggest, the underlined following limitation(s), (in consideration of the claim as a whole): “computing a vertical pixel distance between a first reference point of a first bounding box associated with a first object, of the two objects, and a second reference point of a second bounding box associated with a second object of the two objects; computing a horizontal pixel distance between the first reference point and the second reference point; and modifying, using a distance ratio, the vertical pixel distance to a vertical distance and the horizontal pixel distance to a horizontal distance; and compute the distance based on the vertical distance and the horizontal distance”. The relevant prior art of record, Adam, (US-PGPUB 2021/0357649), discloses a third one or more computing components, (102 in Fig. 2), configured to: obtain the modified image frames, (see at least: Par. 0054-0055, the system 100 includes a video processing pipeline that identifies the bounds of each person in a frame using for example, overlaying box bounding technology onto image frames of the video feed, [i.e., obtaining the modified image frames, overlaying box bounding technology onto image frames of the video feed for each person in a frame”]); compute, for a modified image frame that includes indications of two or more objects, one or more distances between two objects included in the two or more objects based on respective indications associated with the two objects, (see at least: Par. 0054- 0055, the algorithm used for distance calculation utilizes a size of a person bounding box to contribute to the distance calculation, by converting the Euclidian distance between the outer bounds of the bounding boxes about each person 130 and neighboring persons into real physical distance measurements, [i.e., computing, for a modified image frame that includes indications of two or more objects, “the bounding box image frames that implicitly indicates one or more persons in a frame”, one or more distances between two objects included in the two or more objects, “implicitly computing real physical distance measurements between two persons”, based on respective indications associated with the two objects, “using the size of a person bounding box to contribute to the distance calculation between the two persons”]); and detect a violation based on a distance, from the one or more distances, satisfying a threshold, (see at least: Par. 0056, the distance determiner component 102 may utilize this information, (the size of the bounding box of the person), to generate the distance alert 112, such as generating a social distance violation alert if the physical distance between two objects violates a minimum distance threshold condition-- e.g., within six feet, [i.e., detecting a violation based on a distance, “generating a social distance violation alert”, from the one or more distances, satisfying a threshold, “if the physical distance between two objects violates a minimum distance threshold condition, (e.g., the physical distance between two objects less than six feet)]). Adam further discloses wherein the third one or more computing components, to compute the one or more distances, are configured to: modify, using a distance ratio, the pixel distance to a real, physical distance, (Par. 
Adam further discloses wherein the third one or more computing components, to compute the one or more distances, are configured to: modify, using a distance ratio, the pixel distance to a real, physical distance, (Par. 0053, an image processing algorithm is run against the frame with the reference objects included, and generates a data file providing ratios that can be used by a run-time solution to convert pixel distance to real, physical distance, wherein the reference objects may be images of persons having known heights and having known pixel distance conversions at particular distances from the cameras 107; and from Par. 0054, the algorithm used for distance calculation will convert the Euclidean distance between the outer bounds of the bounding boxes about each person 130 and neighboring persons into real, physical distance measurements, as discussed herein, [i.e., implicitly computing a pixel distance between an outer bounds of the first bounding box associated with a first object, of the two objects, and an outer bounds of the second bounding box associated with a second object of the two objects; computing the pixel distance between the first person and the second person; and modifying, using a distance ratio, the pixel distance to a real, physical distance]). However, while disclosing computing the pixel distance between the first person and the second person, and modifying, using a distance ratio, the pixel distance to a real, physical distance, (Par. 0053); Adam fails to teach or suggest, either alone or in combination with the other cited references, computing a vertical pixel distance between a first reference point of a first bounding box associated with a first object, of the two objects, and a second reference point of a second bounding box associated with a second object of the two objects; computing a horizontal pixel distance between the first reference point and the second reference point; and modifying, using a distance ratio, the vertical pixel distance to a vertical distance and the horizontal pixel distance to a horizontal distance; and compute the distance based on the vertical distance and the horizontal distance. A further prior art of record, Jones et al, (US-PGPUB 20220076554), discloses wherein the third one or more computing components are further configured to: process the modified image frames to transform the modified image frames from an angled perspective view to a top-down view, (see at least: Par. 0006, the pipeline computes the transform, (calibration), that morphs the perspective view of the input video into a bird's-eye (top-down) view, involving selecting four points in the perspective view and mapping them to the corners of a rectangle in the bird's-eye view; further, in Par. 0007-0008, the pipeline involves applying a human detector to the perspective views to draw a bounding box around each person, and given the bounding box for each person, we estimate their (x, y) location in the bird's-eye view, [i.e., process the modified image frames to transform the modified image frames from an angled perspective view to a top-down view, “given the bounding box for each person now, we estimate their (x, y) location in the bird's-eye view”]); and wherein the third one or more computing components, to compute the one or more distances, are configured to: compute the one or more distances using the top-down views of the modified image frames, (see at least: Par.
0008, applying the transformation for the ground plane from the calibration step to the bottom-center point of each person's bounding box, resulting in their position in the bird's-eye view, and computing the bird's-eye-view distance between every pair of people and scaling the distances by the scaling factor estimated from calibration, to highlight people whose distance is below the minimum acceptable distance, [i.e., compute the one or more distances using the top-down views of the modified image frames, “compute the bird's eye view distance between every pair of people, using the bounding box in the bird's eye view”]). However, while disclosing drawing a bounding box around each person, and selecting four points in the perspective view and mapping them to the corners of a rectangle in the bird's-eye view, (Par. 0006-0008); Jones fails to teach or suggest, either alone or in combination with the other cited references, a vertical pixel distance and a horizontal pixel distance, and computing the distance based on the vertical distance and the horizontal distance. With respect to claim 15, the prior art of record, alone or in reasonable combination, does not teach or suggest the following underlined limitation(s), (in consideration of the claim as a whole): calculating, for a pair of objects from the one or more pairs of objects, a first pixel distance between a first indication of a first object depicted in a modified image and a second indication of a second object depicted in the modified image; calculating a second pixel distance between the first indication and the second indication; modifying, using a ratio value, the first pixel distance to a first actual distance and the second pixel distance to a second actual distance; and calculating a distance between the first object and the second object based on the first actual distance and the second actual distance. The relevant prior art of record, Adam, (US-PGPUB 2021/0357649), discloses that an image processing algorithm is run against the frame with the reference objects included, and generates a data file providing ratios that can be used by a run-time solution to convert pixel distance to real, physical distance, wherein the reference objects may be images of persons having known heights and having known pixel distance conversions at particular distances from the cameras 107, (Par. 0053); and the algorithm used for distance calculation will convert the Euclidean distance between the outer bounds of the bounding boxes about each person 130 and neighboring persons into real, physical distance measurements, as discussed herein, (Par. 0054), [i.e., implicitly computing a pixel distance between an outer bounds of the first bounding box associated with a first object, of the two objects, and an outer bounds of the second bounding box associated with a second object of the two objects; and modifying, using a distance ratio, the pixel distance to a real, physical distance].
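The distance ratio that the cited Par. 0053 relies on can be pictured as a feet-per-pixel conversion derived from a reference person of known height visible in the frame, which a run-time solution then applies to measured pixel distances. The sketch below is a hypothetical illustration of that kind of calibration, not Adam's algorithm; the known height and the bounding-box values are made up.

```python
# Hypothetical sketch of deriving a pixel-to-physical "distance ratio" from a
# reference object of known height, in the spirit of the cited Par. 0053.

def feet_per_pixel(reference_box: tuple[int, int, int, int], known_height_ft: float) -> float:
    """Estimate the conversion ratio from a reference person's bounding box."""
    _, y_top, _, y_bottom = reference_box
    height_px = y_bottom - y_top
    return known_height_ft / height_px

# A 5.8 ft tall reference person spans 290 pixels in this (assumed) frame.
ratio = feet_per_pixel((500, 110, 560, 400), known_height_ft=5.8)
print(f"{ratio:.3f} ft/px")          # 0.020 ft per pixel
print(f"{300 * ratio:.1f} ft")       # a 300 px separation converts to ~6.0 ft
```

In practice such ratios vary with distance from the camera, which is why the reference discusses conversions "at particular distances from the cameras"; a single constant is used here only to keep the sketch short.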
However, while disclosing computing the pixel distance between the first person and the second person, and modifying, using a distance ratio, the pixel distance to a real, physical distance, (Par. 0053); Adam fails to teach or suggest, either alone or in combination with the other cited references, calculating, for a pair of objects from the one or more pairs of objects, a first pixel distance between a first indication of a first object depicted in a modified image and a second indication of a second object depicted in the modified image; calculating a second pixel distance between the first indication and the second indication; modifying, using a ratio value, the first pixel distance to a first actual distance and the second pixel distance to a second actual distance; and calculating a distance between the first object and the second object based on the first actual distance and the second actual distance. A further prior art of record, Jones et al, (US-PGPUB 20220076554), discloses processing the modified image frames to transform the modified image frames from an angled perspective view to a top-down view, (see at least: Par. 0006, the pipeline computes the transform, (calibration), that morphs the perspective view of the input video into a bird's-eye (top-down) view, involving selecting four points in the perspective view and mapping them to the corners of a rectangle in the bird's-eye view; further, in Par. 0007-0008, the pipeline involves applying a human detector to the perspective views to draw a bounding box around each person, and given the bounding box for each person, we estimate their (x, y) location in the bird's-eye view, [i.e., process the modified image frames to transform the modified image frames from an angled perspective view to a top-down view, “given the bounding box for each person now, we estimate their (x, y) location in the bird's-eye view”]); and computing the one or more distances using the top-down views of the modified image frames, (see at least: Par. 0008, applying the transformation for the ground plane from the calibration step to the bottom-center point of each person's bounding box, resulting in their position in the bird's-eye view, and computing the bird's-eye-view distance between every pair of people and scaling the distances by the scaling factor estimated from calibration, to highlight people whose distance is below the minimum acceptable distance, [i.e., compute the one or more distances using the top-down views of the modified image frames, “compute the bird's eye view distance between every pair of people, using the bounding box in the bird's eye view”]). However, while disclosing drawing a bounding box around each person, and selecting four points in the perspective view and mapping them to the corners of a rectangle in the bird's-eye view, (Par. 0006-0008); Jones fails to teach or suggest, either alone or in combination with the other cited references, calculating, for a pair of objects from the one or more pairs of objects, a first pixel distance between a first indication of a first object depicted in a modified image and a second indication of a second object depicted in the modified image; calculating a second pixel distance between the first indication and the second indication; modifying, using a ratio value, the first pixel distance to a first actual distance and the second pixel distance to a second actual distance; and calculating a distance between the first object and the second object based on the first actual distance and the second actual distance. Regarding claims 16-17, claims 16-17 are allowable in view of their dependency from claim 10.
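The Jones pipeline cited above (Par. 0006-0008) can be pictured end to end: four user-selected ground-plane points fix a perspective-to-top-down homography, each person's bounding-box bottom-center is mapped through it, and pairwise distances in the top-down view are scaled to physical units and compared with a minimum. The sketch below is a hypothetical reconstruction of that general idea, not Jones's code; the point coordinates, scale factor, and threshold are assumptions.

```python
# Hypothetical sketch of a four-point perspective-to-top-down (bird's-eye)
# transform followed by pairwise distance checks, in the spirit of the cited
# Jones pipeline. Not code from the record; all values are illustrative.
from itertools import combinations
import math
import numpy as np

def homography_from_points(src: list[tuple[float, float]],
                           dst: list[tuple[float, float]]) -> np.ndarray:
    """Solve the 3x3 homography mapping four src points onto four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y]); rhs.append(v)
    h = np.linalg.solve(np.array(rows, float), np.array(rhs, float))
    return np.append(h, 1.0).reshape(3, 3)

def to_top_down(H: np.ndarray, point: tuple[float, float]) -> tuple[float, float]:
    """Map a perspective-view point (e.g., a box bottom-center) into the top-down view."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return (x / w, y / w)

# Four ground-plane points in the camera view mapped to a rectangle (calibration).
H = homography_from_points(
    src=[(300, 700), (980, 700), (860, 420), (420, 420)],
    dst=[(0, 0), (100, 0), (100, 100), (0, 100)],
)
FEET_PER_UNIT = 0.2          # assumed scaling factor from calibration
MIN_DISTANCE_FT = 6.0

bottom_centers = {"p1": (500, 650), "p2": (540, 640), "p3": (900, 500)}
positions = {k: to_top_down(H, p) for k, p in bottom_centers.items()}
for a, b in combinations(positions, 2):
    dist_ft = math.dist(positions[a], positions[b]) * FEET_PER_UNIT
    if dist_ft < MIN_DISTANCE_FT:
        print(f"highlight {a}-{b}: {dist_ft:.1f} ft")
```

Note how this differs from the allowable limitations: distances are taken directly between mapped points in the top-down view, rather than by separately converting a vertical and a horizontal (or first and second) pixel distance and composing the result.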
Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMARA ABDI whose telephone number is (571)272-0273. The examiner can normally be reached 9:00am-5:30pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Vu Le, can be reached at (571) 272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AMARA ABDI/
Primary Examiner, Art Unit 2668
03/18/2026

Prosecution Timeline

Oct 25, 2023
Application Filed
Nov 22, 2025
Non-Final Rejection — §103
Feb 02, 2026
Interview Requested
Feb 19, 2026
Examiner Interview Summary
Feb 19, 2026
Applicant Interview (Telephonic)
Feb 23, 2026
Response Filed
Mar 18, 2026
Final Rejection — §103
Apr 14, 2026
Interview Requested

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602822
METHOD DEVICE AND STORAGE MEDIUM FOR BACK-END OPTIMIZATION OF SIMULTANEOUS LOCALIZATION AND MAPPING
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12597252
METHOD OF TRACKING OBJECTS
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12576595
SYSTEMS AND METHODS FOR IMPROVED VOLUMETRIC ADDITIVE MANUFACTURING
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12574469
VIDEO SURVEILLANCE SYSTEM, VIDEO PROCESSING APPARATUS, VIDEO PROCESSING METHOD, AND VIDEO PROCESSING PROGRAM
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12563154
VIDEO SURVEILLANCE SYSTEM, VIDEO PROCESSING APPARATUS, VIDEO PROCESSING METHOD, AND VIDEO PROCESSING PROGRAM
Granted Feb 24, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4
Expected OA Rounds
83%
Grant Probability
76%
With Interview (-7.5%)
2y 7m
Median Time to Grant
Moderate
PTA Risk
Based on 816 resolved cases by this examiner. Grant probability derived from career allow rate.
