Prosecution Insights
Last updated: April 19, 2026
Application No. 18/081,049

MACHINE LEARNING-BASED VIDEO ANALYTICS USING CAMERAS WITH DIFFERENT FRAME RATES

Status: Final Rejection (§103)
Filed: Dec 14, 2022
Examiner: SATCHER, DION JOHN
Art Unit: 2676
Tech Center: 2600 (Communications)
Assignee: Cisco Technology Inc.
OA Round: 4 (Final)
Grant Probability: 85% (Favorable)
Expected OA Rounds: 5-6
Time to Grant: 3y 0m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 85%, above average (33 granted / 39 resolved; +22.6% vs Tech Center average)
Interview Lift: +14.2%, a moderate lift (measured across resolved cases with an interview vs without)
Typical Timeline: 3y 0m average prosecution; 29 applications currently pending
Career History: 68 total applications across all art units

Statute-Specific Performance

§101: 14.2% (-25.8% vs TC avg)
§103: 61.9% (+21.9% vs TC avg)
§102: 15.1% (-24.9% vs TC avg)
§112: 8.3% (-31.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 39 resolved cases.

Office Action

Final Rejection under §103, mailed Mar 13, 2026.
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 08/11/2025 has been entered.

Response to Amendment

Applicant's amendments filed on 02/02/2026 have been entered and made of record. Claim status:

Currently pending claim(s): 1, 3, 6–11, 13 and 16–26
Independent claim(s): 1, 11 and 20
Amended claim(s): 1, 6, 7, 9, 11, 16, 17, 19 and 20
Cancelled claim(s): 2, 4, 5, 12, 14 and 15
New claim(s): 21–26

Response to Applicant's Arguments

This Office action is responsive to Applicant's arguments/remarks made in an amendment received on 02/02/2026. Applicant's arguments and the amendment filed on 02/02/2026 with respect to the rejection of independent claims 1, 11 and 20 under 35 U.S.C. 103 have been fully considered but are not found to be persuasive (see pages 9–11); the rejection under 35 U.S.C. 103 therefore still applies.

Applicant argues, in summary, that the applied prior art (Kerst and Nakai) does not disclose or suggest (see pages 9 and 10):

"by a device using a first processor of a video processing unit, first video data from a first camera using machine learning model, the processing of the first video data including making a machine learning inference related to detection or classification of at least one of an object, event, or behavior depicted in the first video data;"

The Examiner respectfully disagrees with Applicant's line of reasoning. The Examiner has thoroughly reviewed Applicant's arguments but respectfully believes that the cited references reasonably and properly meet the claimed limitations. Kerst teaches using a first and a second camera to process first and second video. See Kerst, ¶ [0028]: "Moving object detection logic 1004 receives images from one or more of the cameras 201, 202, 203n, 204n over time and detects changes in the images from one image to the next." The Examiner interprets the processing as receiving images using the first and second cameras; each camera needs a processor in order to record and receive those images. Kerst also teaches processing the first video from the first camera using a machine learning model to recognize an object or determine its motion. See Kerst, ¶ [0028]: "In another example, moving object detection logic 1004 utilizes machine learning logic 1006 to detect moving objects within the image. Further, moving object detection logic 1004, in some examples, can identify the type of the object (e.g., a moving bird, plane, drone, etc)."
Applicant further argues, in summary, that the applied prior art does not disclose or suggest (see pages 9 and 10):

"processing, by device using a second processor of the video processing unit, second video data from a second camera, the second video data being associated with a lower frame rate and a higher resolution than the first video data, the processing of the second video data including improving image quality of the second video data"

The Examiner respectfully disagrees. Applicant argues that Kerst does not teach a second video with high resolution and low frame rate, but the Examiner is not relying on Kerst for that limitation; Nakai is relied upon to teach a second camera that produces a second video in high resolution at a low frame rate. See Nakai, ¶ [0049]: "The present embodiment in the aforementioned manner uses the camera 201a of low resolution and high framerate and the camera 201b of high resolution and low framerate." Applicant also argues that Kerst and Nakai do not teach processing the second video data to improve its quality. That new limitation has changed the scope of claim 1, and the Examiner is incorporating a new reference, Dinh, to teach processing a second video from a second camera to improve the quality of the video. See Dinh, ¶ [0149]: "second raw image data obtained by using the second camera module may be noise-reduced by using the second learning model."

Applicant further argues, in summary, that the applied prior art does not disclose or suggest (see pages 9 and 10):

"performing, by the device, a mapping of the machine learning inference about the first video data to the second video data, wherein the mapping comprises mapping coordinates associated with the machine learning inference to corresponding coordinates of the second video data"

The Examiner respectfully disagrees. Kerst teaches mapping the object detected by the machine learning logic to the objects detected by other cameras. See Kerst, ¶ [0029]: "Object mapping logic 1010 can determine the location of the objects in various ways. For instance, object mapping logic 1010 can receive calibration information relative to cameras 201, 202, 203n, 204n and identify each pixel as an angular and/or distal location relative to the camera(s). Thus, the detected pixel change and pixel location of the object in the image (or sets of images from multiple cameras) can be converted into a geolocation (e.g., latitude, longitude, altitude/elevation, etc.) of the object." Applicant argues that Kerst does not teach mapping from the first camera's machine learning inference to a second camera's video that is high resolution and low frame rate; the Examiner relies on Nakai for the second video with high resolution and low frame rate, while Kerst teaches mapping the pixel changes detected by the object motion model to another camera. The Examiner is incorporating Dinh for claim 1 to teach processing a second video from a second camera to improve the quality of the video. The Examiner maintains that Kerst and Nakai teach the limitations as presented and as rejected for claims 11 and 20.
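The limitation at the center of this dispute reduces to projecting a detection from the first camera's pixel grid onto the second camera's frames. Below is a minimal sketch of that mapping, assuming the two cameras share a field of view and differ only in resolution and frame rate; a fielded system would instead use per-camera calibration of the kind Kerst's ¶ [0029] describes, and every name in the sketch is illustrative rather than taken from the application or the cited art.

```python
from dataclasses import dataclass

@dataclass
class Inference:
    """A detection from the low-resolution, high-frame-rate stream."""
    t: float                                   # timestamp in seconds
    box: tuple[float, float, float, float]     # (x1, y1, x2, y2) in stream-1 pixels
    label: str

def map_inference(inf: Inference,
                  res1: tuple[int, int],       # (width, height) of stream 1
                  res2: tuple[int, int],       # (width, height) of stream 2
                  fps2: float) -> tuple[int, tuple[int, int, int, int]]:
    """Map a stream-1 inference to stream-2 coordinates.

    Returns the index of the nearest stream-2 frame in time and the
    bounding box rescaled into stream-2 pixel coordinates.
    """
    sx = res2[0] / res1[0]                     # horizontal scale factor
    sy = res2[1] / res1[1]                     # vertical scale factor
    x1, y1, x2, y2 = inf.box
    box2 = (round(x1 * sx), round(y1 * sy), round(x2 * sx), round(y2 * sy))
    frame2 = round(inf.t * fps2)               # nearest high-res frame (stream 2 is slower)
    return frame2, box2

# Example: a detection at t=1.00 s in a 640x360 @ 30 fps stream, mapped into
# a 3840x2160 @ 5 fps stream.
det = Inference(t=1.00, box=(100, 50, 160, 110), label="drone")
frame_idx, box = map_inference(det, res1=(640, 360), res2=(3840, 2160), fps2=5.0)
print(frame_idx, box)  # -> 5 (600, 300, 960, 660)
```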
Additional citations and/or modified citations may be present to more concisely address limitations.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.

Claim(s) 1, 6–9, 17 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Kerst et al. (US 20210409655 A1, hereafter "Kerst") in view of Nakai (US 2021/0274088 A1, hereafter "Nakai") further in view of Dinh et al. (US 20220132001 A1, hereafter "Dinh").

Regarding claim 1, Kerst teaches a method comprising: processing, by a device using a first processor of a video processing unit, first video data from a first camera using machine learning model (see Kerst, ¶ [0094]: "In one embodiment, object recognition occurs by machine learning, in which data relative to the environment being surveyed by cameras 802 and 804 is implemented to recognize stationary objects and determine their location"), the processing of the first video data including making a machine learning inference related to detection or classification of at least one of an object, event, or behavior depicted in the first video data (see Kerst, ¶ [0094], cited above; ¶ [0028]: "Moving object detection logic 1004 receives images from one or more of the cameras 201, 202, 203n, 204n over time and detects changes in the images from one image to the next, …, In another example, moving object detection logic 1004 utilizes machine learning logic 1006 to detect moving objects within the image. Further, moving object detection logic 1004, in some examples, can identify the type of the object (e.g., a moving bird, plane, drone, etc)." Note: object recognition is an inference or classification of the object that is performed on at least one of the cameras); processing, by device using a second processor of the video processing unit, second video data from a second camera (see Kerst, ¶ [0025]: "FIG. 2 is a block diagram showing an example environment with which embodiments of the present invention are particularly applicable. As shown, environment 200 includes camera 201 and camera 202." ¶ [0028], cited above), [the second video data being associated with a lower frame rate and a higher resolution than the first video data, the processing of the second video data including improving image quality of the second video data]; performing, by the device, a mapping of the machine learning inference about the first video data to the second video data (see Kerst, ¶ [0029]: "Object mapping logic 1010 can determine the location of the objects in various ways"), wherein the mapping comprises mapping coordinates associated with the machine learning inference to corresponding coordinates of the second video data (see Kerst, ¶ [0029]: "For instance, object mapping logic 1010 can receive calibration information relative to cameras 201, 202, 203n, 204n and identify each pixel as an angular and/or distal location relative to the camera(s). Thus, the detected pixel change and pixel location of the object in the image (or sets of images from multiple cameras) can be converted into a geolocation (e.g., latitude, longitude, altitude/elevation, etc.) of the object. Based on the size of the object in the image or the overlapping of multiple geolocation estimations from the different cameras, the absolute geolocation of the object can be determined." Note: the Examiner interprets the mapping as determining the absolute position based on the relative position of the object from both cameras); and providing, by the device, an indication of the mapping for display (see Kerst, ¶ [0033]: "Display logic 1018 receives data from various components of image detection mapping system 1000 and displays interfaces on a display device 1102." ¶ [0003]: "The object detection system also includes display logic configured to display the location of the object." ¶ [0105]: "As shown in FIG. 10B, a target detail interface is displayed, in which the images from each camera corresponding to a detected object may be simultaneously shown. In this example, interface 2000 shows detected object 2002 from the FOV of a first and second camera." See also [FIG. 10B], 2002. Note: the display logic displays the location (absolute position) of the object on both cameras).

However, Kerst fails to teach the second video data being associated with a lower frame rate and a higher resolution than the first video data, the processing of the second video data including improving image quality of the second video data. Nakai, working in the same field of endeavor, teaches the second video data being associated with a lower frame rate and a higher resolution than the first video data (see Nakai, ¶ [0049]: "The present embodiment in the aforementioned manner uses the camera 201a of low resolution and high framerate and the camera 201b of high resolution and low framerate"). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kerst with the second video data being associated with a lower frame rate and a higher resolution than the first video data based on the method of Nakai's reference.
The motivation would have been to lessen the burden of large amounts of image data at the high framerate and high resolution (see Nakai, ¶ [0050]). However, Kerst and Nakai fail to teach the processing of the second video data including improving image quality of the second video data. Dinh, working in the same field of endeavor, teaches the processing of the second video data including improving image quality of the second video data (see Dinh, ¶ [0149]: "second raw image data obtained by using the second camera module may be noise-reduced by using the second learning model"). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kerst with this processing based on the method of Dinh's reference. The motivation would have been enabling faster image processing and providing higher quality images (see Dinh, ¶ [0009]). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Dinh and Nakai with Kerst to obtain the invention as specified in claim 1.

Regarding claim 6, Kerst teaches the method as in claim 1, wherein performing the mapping comprises: mapping coordinates output by the machine learning model relative to the video data from the first camera (see Kerst, ¶ [0027]: "In one example, still object detection logic 1002 utilizes machine learning logic 1006 to detect still or substantially motionless objects within the images." Note: object detection differs from object recognition in that it outputs the position of the object) to coordinates of the second video data from the second camera (see Kerst, ¶ [0029], cited above for claim 1).

Regarding claim 7, Kerst in view of Nakai further in view of Dinh teaches the method as in claim 1, [wherein processing the second video data from the second camera comprises: performing rescaling, noise reduction, de-skewing, thresholding, or a morphological operation on the second video data from the second camera]. However, Kerst and Nakai fail to teach this limitation. Dinh, working in the same field of endeavor, teaches performing rescaling, noise reduction, de-skewing, thresholding, or a morphological operation on the second video data from the second camera (see Dinh, ¶ [0149]: "second raw image data obtained by using the second camera module may be noise-reduced by using the second learning model"). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kerst with this processing based on the method of Dinh's reference. The motivation would have been enabling faster image processing and providing higher quality images (see Dinh, ¶ [0009]). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Dinh with Nakai and Kerst to obtain the invention as specified in claim 7.

Regarding claim 8, Kerst teaches the method as in claim 1, wherein the device comprises the first camera and the second camera (see Kerst, ¶ [0025], cited above for claim 1).

Regarding claim 9, Kerst teaches the method as in claim 1, wherein the providing includes overlaying the indication on the second video data from the second camera that has been processed by the device (see Kerst, ¶ [0105]: "As shown in FIG. 10B, a target detail interface is displayed, in which the images from each camera corresponding to a detected object may be simultaneously shown, …, As shown, a box is highlighted around detected object 2002 for emphasis. However, other indicators of detected object 2002 may be utilized as well. Interface 2000 further includes location panel 2010, which displays latitude, longitude, speed, bearing, and alert region, in this example. In other examples, other locational parameters may be displayed as well (e.g., altitude). Additionally, bearing and range to detected object 2002 from a selected point may also be displayed, such as from a marker placed on the map, an incident responder, a camera position, etc." See also [FIG. 10B]).

Regarding claims 17 and 23, these claims are rejected on the same basis as claim 7; the arguments presented above for claim 7 are equally applicable, and the remaining limitations, which parallel claim 7, are not repeated herein but are incorporated by reference.
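Claim 7's alternatives are all standard image operations. Below is a minimal sketch of each, using OpenCV; the claim requires only one of the listed operations, and note that Dinh's cited teaching is a learned noise-reduction model, for which the classical non-local-means call below is merely a stand-in. The skew angle and parameter values are illustrative assumptions.

```python
import cv2
import numpy as np

def enhance_frame(frame: np.ndarray) -> np.ndarray:
    """Illustrate the operations recited in claim 7 on one BGR video frame."""
    # Rescaling: upsample 2x with bicubic interpolation.
    up = cv2.resize(frame, None, fx=2.0, fy=2.0, interpolation=cv2.INTER_CUBIC)

    # Noise reduction (classical non-local means; Dinh uses a learned model).
    gray = cv2.cvtColor(up, cv2.COLOR_BGR2GRAY)
    denoised = cv2.fastNlMeansDenoising(gray, h=10)

    # De-skewing: rotate about the center by an estimated skew angle.
    skew_deg = 2.0  # placeholder; a real system would estimate this per frame
    h, w = denoised.shape
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), skew_deg, 1.0)
    deskewed = cv2.warpAffine(denoised, rot, (w, h))

    # Thresholding: Otsu's method picks the split point automatically.
    _, binary = cv2.threshold(deskewed, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Morphological operation: opening removes small speckle noise.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    return cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
```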
Claim(s) 3 is rejected under 35 U.S.C. 103 as being unpatentable over Kerst in view of Nakai, further in view of Dinh, and further in view of Hildreth et al. (US 7058204 B2, hereafter "Hildreth").

Regarding claim 3, Kerst in view of Nakai further in view of Dinh teaches the method as in claim 1, [wherein the mapping is based in part on a physical distance between the first camera and the second camera]. However, Kerst, Nakai and Dinh fail to teach wherein the mapping is based in part on a physical distance between the first camera and the second camera. Hildreth, working in the same field of endeavor, teaches this limitation (see Hildreth, [Col. 15, ln. 31–37]: "Continuing with the description of combination module 312, a reference vector 907, as illustrated in FIG. 9B, is defined such that it passes through the positions of both cameras 101 and 102 on the reference plane where the reference plane is defined such that the axis of rotation of the cameras define the normal of the reference plane." See also [FIG. 9B]). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kerst with this mapping based on the method of Hildreth's reference. The suggestion/motivation would have been to accurately track an object of interest (see Hildreth, [Col. 1, ln. 17–58]). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Hildreth with Kerst, Nakai and Dinh to obtain the invention as specified in claim 3.

Claim(s) 10 is rejected under 35 U.S.C. 103 as being unpatentable over Kerst in view of Nakai, further in view of Dinh, and further in view of Esquivel et al. (US 2021/0118180 A1, hereafter "Esquivel").

Regarding claim 10, Kerst in view of Nakai and further in view of Dinh teaches the method as in claim 1, [wherein the machine learning model comprises a neural network]. However, Kerst, Nakai and Dinh fail to teach wherein the machine learning model comprises a neural network. Esquivel, working in the same field of endeavor, teaches this limitation (see Esquivel, ¶ [0031]: "the example object identifier 204 of the example multicamera calibration controller 110 illustrated in FIG. 2 implements a machine learning model (e.g., linear regression, logistic regression, deep neural network (DNN), convolutional neural network (CNN), and/or multi-stage CNN)"). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kerst with a neural network machine learning model based on the method of Esquivel's reference. The suggestion/motivation would have been to decrease the complexity and increase the accuracy of calibration for use in the mapping (see Esquivel, ¶ [0002], ¶¶ [0016]–[0017]). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Therefore, it would have been obvious to combine Esquivel with Kerst, Nakai and Dinh to obtain the invention as specified in claim 10.

Claim(s) 11, 16, 18, 19, 20, 22, 24 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Kerst in view of Nakai.

Regarding claim 11, Kerst teaches an apparatus, comprising: [a network interface to communicate with a computer network]; a video processing unit coupled to the network, the video processing unit including a first processor and a second processor (see Kerst, ¶ [0033]: "Display logic 1018 receives data from various components of image detection mapping system 1000 and displays interfaces on a display device 1102. Of course, image detection mapping system 1000 may include one or more processor(s)/controller(s) as indicated by block 1020, as well as other items indicated by block 1100"); and [a memory configured to store instructions that, when executed by the video processing unit, configure the video processing unit to]: process, using the first processor, the first video data from a first camera using a machine learning model (see Kerst, ¶ [0094], cited above for claim 1), the processing of the first video data including making an inference related to at least one of an object, event, or behavior depicted in the first video data (see Kerst, ¶¶ [0094] and [0028], cited above for claim 1. Note: object recognition is an inference or classification of the object that is performed on at least one of the cameras); process, using the second processor, second video data from a second camera (see Kerst, ¶¶ [0025] and [0028], cited above for claim 1), [the second video data being associated with a lower frame rate and a higher resolution than the first video data]; perform a mapping of the inference about the first video data to the second video data, wherein the mapping comprises mapping coordinates associated with the inference to corresponding coordinates of the second video data (see Kerst, ¶ [0029], cited above for claim 1. Note: the Examiner interprets the mapping as determining the absolute position based on the relative position of the object from both cameras); and provide the second video data and an indication of the mapping for display (see Kerst, ¶¶ [0033], [0003] and [0105], cited above for claim 1. Note: the display logic displays the location (absolute position) of the object on both cameras).

However, Kerst fails to teach a network interface to communicate with a computer network; a memory configured to store instructions that, when executed by the video processing unit, configure the video processing unit to perform the recited operations; and the second video data being associated with a lower frame rate and a higher resolution than the first video data. Nakai, working in the same field of endeavor, teaches: a network interface to communicate with a computer network (see Nakai, ¶ [0029]: "The communication unit 208, for example, is a local area network (LAN) card"); a memory configured to store instructions that, when executed by the video processing unit, configure the video processing unit (see Nakai, ¶ [0022]: "As illustrated in FIG. 2, …, a memory 205"); and the second video data being associated with a lower frame rate and a higher resolution than the first video data (see Nakai, ¶ [0049]: "The present embodiment in the aforementioned manner uses the camera 201a of low resolution and high framerate and the camera 201b of high resolution and low framerate"). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kerst with these features based on the method of Nakai's reference. The motivation would have been to lessen the burden of large amounts of image data at the high framerate and high resolution (see Nakai, ¶ [0050]). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Nakai with Kerst to obtain the invention as specified in claim 11.

Regarding claim 16, Kerst teaches the apparatus as in claim 11, wherein the video processing unit is configured to perform the mapping by: mapping coordinates output by the machine learning model relative to the first video data from the first camera to coordinates of the second video data from the second camera (see Kerst, ¶ [0029], cited above for claim 1).

Regarding claim 18, Kerst teaches the apparatus as in claim 11, wherein the apparatus comprises the first camera and the second camera (see Kerst, ¶ [0025], cited above for claim 1).

Regarding claim 19, Kerst teaches the apparatus as in claim 11, wherein the video processing unit is configured to provide the second video data and the indication by overlaying the indication on the second video data that has been processed by the apparatus (see Kerst, ¶ [0105], cited above for claim 9. See also [FIG. 10B]).

Regarding claim 20, claim 20 is rejected on the same basis as claim 11; the arguments presented above for claim 11 are equally applicable, and the remaining limitations, which parallel claim 11, are not repeated herein but are incorporated by reference. Furthermore, Kerst teaches a tangible, non-transitory, computer-readable medium storing program instructions that cause a device to execute a process (see Kerst, [FIG. 2], PROCESSOR(S)/CONTROLLER(S) 1020).

Regarding claim 22, claim 22 is rejected on the same basis as claim 6; the arguments presented above for claim 6 are equally applicable, and the remaining limitations are not repeated herein but are incorporated by reference.
Regarding claim 24, claim 24 is rejected on the same basis as claim 18; the arguments presented above for claim 18 are equally applicable, and the remaining limitations are not repeated herein but are incorporated by reference.

Regarding claim 25, claim 25 is rejected on the same basis as claim 19; the arguments presented above for claim 19 are equally applicable, and the remaining limitations are not repeated herein but are incorporated by reference.

Claim(s) 13 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Kerst in view of Nakai, further in view of Hildreth.

Regarding claim 13, Kerst in view of Nakai teaches the apparatus as in claim 11, [wherein the mapping is based in part on a physical distance between the first camera and the second camera]. However, Kerst and Nakai fail to teach wherein the mapping is based in part on a physical distance between the first camera and the second camera. Hildreth, working in the same field of endeavor, teaches this limitation (see Hildreth, [Col. 15, ln. 31–37], quoted above for claim 3. See also [FIG. 9B]). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kerst with this mapping based on the method of Hildreth's reference. The suggestion/motivation would have been to accurately track an object of interest (see Hildreth, [Col. 1, ln. 17–58]). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Hildreth with Kerst and Nakai to obtain the invention as specified in claim 13.

Regarding claim 21, claim 21 is rejected on the same basis as claim 3; the arguments presented above for claim 3 are equally applicable, and the remaining limitations are not repeated herein but are incorporated by reference.

Claim(s) 26 is rejected under 35 U.S.C. 103 as being unpatentable over Kerst in view of Nakai, further in view of Esquivel.

Regarding claim 26, Kerst in view of Nakai teaches the tangible, non-transitory, computer-readable medium as in claim 20, [wherein the machine learning model comprises a neural network]. However, Kerst and Nakai fail to teach wherein the machine learning model comprises a neural network.
Esquivel, working in the same field of endeavor, teaches wherein the machine learning model comprises a neural network (see Esquivel, ¶ [0031], quoted above for claim 10). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kerst with a neural network machine learning model based on the method of Esquivel's reference. The suggestion/motivation would have been to decrease the complexity and increase the accuracy of calibration for use in the mapping (see Esquivel, ¶ [0002], ¶¶ [0016]–[0017]). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Esquivel with Kerst and Nakai to obtain the invention as specified in claim 26.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Jang et al. (US 20220166521 A1) teaches: "the camera 140 may be a device for capturing an external environment, and in detail, may include a first camera 140A and a second camera 140B. Here, the camera 140 may be a low frame rate camera for capturing the outside of the vehicle 10 and receiving information on a low frame rate generated by the counterpart vehicle 20 (and an external device). The second camera 140B may be a high frame rate camera for receiving information on a high frame rate generated by the counterpart vehicle 20. Hereinafter, the camera 140 will be described in terms of the first camera 140A for capturing the outside and receiving information on a low frame rate."

Zhu et al. (CN 113099146 A) claims a video generation method, device and related apparatus, applicable to cameras, smartphones and the like, for improving video quality. The method comprises: obtaining first video data collected by a first camera in a first time period, the first video data comprising multiple first video frames; obtaining image data collected by a second camera in the same time period, the image data comprising one or more images; adjusting the resolution of each first video frame to obtain second video data; and, based on the image data, performing image fusion on one or more of the second video frames to obtain third video data. The invention can be applied to multiple technical fields of intelligent video processing and can more intelligently and accurately improve the resolution of video.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DION J SATCHER, whose telephone number is (703) 756-5849. The examiner can normally be reached Monday through Thursday, 5:30 am to 2:30 pm, and Friday, 5:30 am to 9:30 am PST.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Henok Shiferaw, can be reached at (571) 272-4637. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/DION J SATCHER/
Patent Examiner, Art Unit 2676

/Henok Shiferaw/
Supervisory Patent Examiner, Art Unit 2676
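Stepping back from the individual references, the rejected independent claim describes a concrete two-processor pipeline: infer on the fast low-resolution stream, enhance the slow high-resolution stream, map the inference across, and provide an indication for display. A minimal end-to-end sketch of those four steps follows, with stub functions standing in for the claimed operations; nothing here is taken from the application or the cited art.

```python
# Toy stand-ins: stream 1 is low-res/high-fps, stream 2 is high-res/low-fps.
def infer(frame):
    """First processor: pretend ML inference returning a label and a box in stream-1 pixels."""
    return ("object", (10, 10, 30, 30))

def enhance(frame):
    """Second processor: pretend image-quality improvement (identity here)."""
    return frame

def map_coords(box, res1, res2):
    """Scale a stream-1 box into stream-2 pixel coordinates."""
    sx, sy = res2[0] / res1[0], res2[1] / res1[1]
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

def pipeline(stream1, stream2, res1, res2, fps1, fps2):
    """Claim 1's steps in order: infer, enhance, map, then provide an indication."""
    for i, frame1 in enumerate(stream1):
        label, box = infer(frame1)
        j = min(round(i * fps2 / fps1), len(stream2) - 1)  # nearest slow frame
        frame2 = enhance(stream2[j])
        box2 = map_coords(box, res1, res2)
        print(f"frame {j}: overlay {label} at {box2}")     # indication for display

pipeline(stream1=[0] * 6, stream2=[0] * 2,
         res1=(64, 64), res2=(256, 256), fps1=30, fps2=10)
```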

Prosecution Timeline

Dec 14, 2022: Application Filed
Nov 27, 2024: Non-Final Rejection (§103)
Jan 07, 2025: Interview Requested
Jan 29, 2025: Applicant Interview (Telephonic)
Jan 29, 2025: Examiner Interview Summary
Mar 03, 2025: Response Filed
Apr 04, 2025: Final Rejection (§103)
Jul 02, 2025: Interview Requested
Jul 09, 2025: Examiner Interview Summary
Jul 09, 2025: Applicant Interview (Telephonic)
Aug 11, 2025: Request for Continued Examination
Aug 12, 2025: Response after Non-Final Action
Sep 29, 2025: Non-Final Rejection (§103)
Jan 21, 2026: Interview Requested
Feb 02, 2026: Response Filed
Mar 13, 2026: Final Rejection (§103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586218: MOTION ESTIMATION WITH ANATOMICAL INTEGRITY (granted Mar 24, 2026; 2y 5m to grant)
Patent 12579787: INSTRUMENT RECOGNITION METHOD BASED ON IMPROVED U2 NETWORK (granted Mar 17, 2026; 2y 5m to grant)
Patent 12573066: Depth Estimation Using a Single Near-Infrared Camera and Dot Illuminator (granted Mar 10, 2026; 2y 5m to grant)
Patent 12555263: SYSTEMS AND METHODS FOR TWO-STAGE OBJECTION DETECTION (granted Feb 17, 2026; 2y 5m to grant)
Patent 12548140: DETERMINING PROCESS DEVIATIONS THROUGH VIDEO ANALYSIS (granted Feb 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 85%
With Interview: 99% (+14.2%)
Median Time to Grant: 3y 0m
PTA Risk: High
Based on 39 resolved cases by this examiner. Grant probability is derived from the career allow rate.
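The headline figures line up under the simplest reading of that note, which is an assumption here since the page does not state its formula: the allow rate is granted over resolved, and the with-interview figure adds the interview lift on top. A quick check of the arithmetic:

```python
granted, resolved = 33, 39
allow_rate = granted / resolved           # 0.846 -> displayed as 85%
interview_lift = 0.142                    # the +14.2% lift shown for interviews
with_interview = allow_rate + interview_lift
print(f"{allow_rate:.1%}  {with_interview:.1%}")  # 84.6%  98.8% -> shown as 85% / 99%
```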

Free tier: 3 strategy analyses per month