Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
The United States Patent & Trademark Office appreciates the application submitted by the inventor/assignee. The Office has reviewed the application and provides the following comments below.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05/06/2024 has been considered and is attached.
Claim Status
Claims 1, 4-10, 16-18 and 20 are rejected under 35 U.S.C. § 102 as being anticipated by Filho.
Claims 2-3 and 11-15 are rejected under 35 U.S.C. § 103 as being unpatentable over Filho in view of Hoch.
1st Claim Interpretation (Contingent Limitation)
Under MPEP 2111.04, claim scope is not limited by claim language that suggests or makes optional but does not require steps to be performed, or by claim language that does not limit a claim to a particular structure. The broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) precedent are not met.
For example, assume a method claim requires step A if a first condition happens and step B if a second condition happens. If the claimed invention may be practiced without either the first or second condition happening, then neither step A nor step B is required by the broadest reasonable interpretation of the claim.
In this case, Claim 3 recites “when” and then lists two paths/methods: “receiving the attentive image … when a face of the detected person is capturable” (first path) and “when the face of the detected person is not capturable … extracting a cropped portion of the pre-attentive image” (second path). Considering the limitation “outputting the attentive image” from claim 1, the first path is required, making the second path optional. While citations have been provided for completeness and rapid prosecution, only the first path is required. Applicant’s comments and/or amendments relating to this issue are invited to clarify the claim language and the prosecution history.
2nd Claim Interpretation (Contingent Limitation)
Under MPEP 2111.04, Claim scope is not limited by claim language that suggests or makes optional but does not require steps to be performed, or by claim language that does not limit a claim to a particular structure. However, "[t]he broadest reasonable interpretation of a system claim having structure that performs a function, which only needs to occur if a condition precedent is met, still requires structure for performing the function should the condition occur." Ex parte Schulhauser, Appeal 2013-007847 (PTAB April 28, 2016).
In this case, Claim 12 is a system/apparatus claim reciting “when” and then listing two paths/methods: “receiving the attentive image … when a face of the detected person is capturable” (first path) and “when the face of the detected person is not capturable … extracting a cropped portion of the pre-attentive image” (second path). Considering the limitation “outputting the attentive image” from claim 10, the first path is required, making the second path optional. While citations have been provided for completeness and rapid prosecution, the prior art must only teach the structure that performs the function of the contingent step along with the other recited claim limitations. Applicant’s comments and/or amendments relating to this issue are invited to clarify the claim language and the prosecution history.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 4-10, 16-18 and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Filho et al. (Perroni Filho, Hélio, et al. "Attentive Sensing for Long-Range Face Recognition." WACV (Workshops), https://openreview.net/forum?id=UuUrJJTBlG, published 12/31/2022, hereinafter Filho).
CLAIM 1
In regards to Claim 1, Filho teaches a computer-implemented method (Filho, Page 615, section 4. Vision Pipeline: “RGB and depth images are acquired by a set of bridge nodes running on the onboard computer, then relayed to the external server”) for capturing high resolution images of a scene for analysis of one or more persons, the scene comprising the one or more persons to be analyzed (Filho, Abstract: “we propose and evaluate a novel attentive sensing solution. Panoramic low-resolution pre-attentive sensing is provided by an array of wide-angle cameras, while attentive sensing is achieved with a high-resolution, narrow field-of-view camera and a mirror-based gaze deflection system”; see figures 1-2 and section 4. Vision Pipeline. Filho teaches a method to detect people in a wide-angle image and then track the detected people with a high resolution narrow-field camera), the method comprising:
[media_image1.png: annotated excerpt from Filho]
receiving one or more pre-attentive images, the one or more pre-attentive images capturing the scene; (Filho, page 614, section Pre-Attentive sensing: “we are currently running the RealSense cameras at 640×360 resolution, yielding a pre-attentive panoramic resolution of 2560×360 pixels”; see modified figure 3 (a) below)
detecting one or more persons in the one or more pre-attentive images; (Filho, Page 615, section 4. Vision Pipeline: “… from the four pre-attentive RGBD cameras. RGB and depth images are acquired by a set of bridge nodes running on the onboard computer, then relayed to the external server. Person detection is performed on RGB images using the YOLOv4 object detector”; See figure 4)
determining a feature vector and a geo-location for at least one of the detected persons in the pre-attentive image, the geo-location comprising an azimuthal location; (Filho, Page 615, section 4. Vision Pipeline: “For each detected person, the corresponding image crop is extracted, and a feature vector encoding their appearance is computed by a wide residual network [32]. The detection is also geo-located … we geo-locate the person at the azimuth of the bounding box centroid at a distance determined by the mean of the depth returns within the bounding box”)
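For illustration, the geo-location scheme quoted above can be sketched in Python. This is a hypothetical sketch only, not code from Filho or the application; the function name, image width, and FoV parameters are assumptions:

```python
def geolocate(bbox, depth_img, hfov_deg=90.0, img_w=640):
    """Geo-locate a detected person as (azimuth, distance): the azimuth of
    the bounding-box centroid mapped through the camera's horizontal FoV,
    and a distance given by the mean of the valid depth returns inside
    the bounding box (zero depth values are treated as invalid)."""
    x0, y0, x1, y1 = bbox
    cx = (x0 + x1) / 2.0                          # centroid column
    azimuth = (cx / img_w - 0.5) * hfov_deg       # degrees off boresight
    returns = [depth_img[r][c]
               for r in range(y0, y1) for c in range(x0, x1)
               if depth_img[r][c] > 0]
    distance = sum(returns) / len(returns) if returns else None
    return azimuth, distance
```

A centered bounding box thus maps to an azimuth of zero, and a box with no valid depth returns yields no distance estimate (the fallback for that case is addressed in the rejection of claim 8).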
matching the feature vector and geo-location to a previously detected person for tracking of such detected person (Filho, Page 615, section 4. Vision Pipeline: “After being computed, geo-located detections (composed of a feature vector and ground plane coordinates) are submitted to a tracker. Detections are paired to existing tracks through the Hungarian algorithm [17], based on a metric that combines appearance similarity (defined as the cosine distance between feature vectors) and Euclidean distance in ground plane coordinate”), and where there is no match, initializing a tracking of a new person; (Filho, Page 615, section 4. Vision Pipeline: “…Any detection not assigned to an existing track is used to initialize a new track…”)
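The detection-to-track pairing quoted above (appearance similarity combined with ground-plane distance) can likewise be sketched. Filho uses the Hungarian algorithm; the brute-force assignment below yields the same optimal pairing for small track counts and is purely illustrative (the data-structure keys, weight, and gate value are assumptions):

```python
import math
from itertools import permutations

def cost(det, trk, w=0.5):
    """Combined metric: cosine distance between feature vectors plus
    Euclidean distance in ground-plane coordinates."""
    dot = sum(a * b for a, b in zip(det["feat"], trk["feat"]))
    na = math.sqrt(sum(a * a for a in det["feat"]))
    nb = math.sqrt(sum(b * b for b in trk["feat"]))
    cos_dist = 1.0 - dot / (na * nb)
    eucl = math.dist(det["pos"], trk["pos"])
    return w * cos_dist + (1.0 - w) * eucl

def match(dets, trks, gate=1.0):
    """Pair detections to existing tracks by minimising total combined
    cost; any detection left unmatched seeds a new track.  Sketch
    assumes len(dets) <= len(trks)."""
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(trks)), len(dets)):
        c = sum(cost(dets[i], trks[j]) for i, j in enumerate(perm))
        if c < best_cost:
            best, best_cost = perm, c
    if best is None:
        return [], list(range(len(dets)))
    pairs = [(i, j) for i, j in enumerate(best) if cost(dets[i], trks[j]) <= gate]
    matched = {i for i, _ in pairs}
    new_tracks = [i for i in range(len(dets)) if i not in matched]
    return pairs, new_tracks
```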
receiving an attentive image that captures the tracked person (Filho, Page 615, section 4. Vision Pipeline: “3. A ROS bridge node for capturing attentive still frames”; see modified figure 3 (b) below) by directing gaze at the azimuthal location (Filho, Page 615, section 4. Vision Pipeline: “2. A servo controller that rotates the mirror to deflect attentive gaze to a person of interest”; Page 616, see reconstructed text below),
[media_image2.png: annotated excerpt from Filho]
[media_image3.png: annotated excerpt from Filho]
the attentive image comprising a smaller field-of-view (FoV) than the one or more pre-attentive images; (Filho, Page 614, section 3. Hardware Platform: “Pre-Attentive sensing … With nearly 90° horizontal FoV they collectively provide a panoramic pre-attentive FoV … Attentive sensing … yielding an 8 deg horizontal FoV”. Filho teaches a pre-attentive camera system that outputs a panoramic FOV and an attentive camera system that outputs an 8-degree FOV)
and outputting the attentive image. (Filho, Page 615, left col, last paragraph: “Pre-attentive and attentive frames are streamed over a dedicated Wi-Fi link to an external server that runs person detection, face detection and face recognition algorithms”. The Examiner interprets “outputting an image” as generating/producing/exporting image data, or sending image data from one device to another)
CLAIM 4
In regards to Claim 4, Filho teaches the method of Claim 1. In addition, Filho teaches the one or more pre-attentive images comprise a panoramic view of the scene. (Filho, page 614, section Pre-Attentive sensing: “we are currently running the RealSense cameras at 640×360 resolution, yielding a pre-attentive panoramic resolution of 2560×360 pixels”; see modified figure 3 (a) in the rejection of claim 1)
CLAIM 5
[media_image4.png: annotated excerpt from Filho]
In regards to Claim 5, Filho teaches the method of Claim 1. In addition, Filho teaches recognizing the detected person by matching a vector associated with the detected person to vectors in a data store (Filho, page 616, right col: “Our face recognition module is built on deepFace [25] … We provide access to a gallery of reference face images associated with unique IDs”, see reconstructed text below, see step 2-4), and outputting a positive recognition where the vector is matched to the data store (Filho, page 616, right col, see reconstructed text below, step 5), and outputting a negative recognition otherwise. (Filho, page 616, right col, see reconstructed text below, step 6)
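For illustration, the gallery-based recognition mapped above (matching a probe vector against reference vectors associated with unique IDs, outputting a positive or negative recognition) can be sketched as follows. This is a hypothetical sketch, not Filho's deepFace-based module; the threshold and names are assumptions:

```python
import math

def recognize(probe, gallery, threshold=0.6):
    """Match a probe face vector against a gallery of reference vectors
    keyed by unique ID.  Output a positive recognition with the best ID
    when the best cosine similarity clears the threshold, and a negative
    recognition otherwise."""
    def cos_sim(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(y * y for y in b)))
    best_id, best_sim = None, -1.0
    for uid, ref in gallery.items():
        s = cos_sim(probe, ref)
        if s > best_sim:
            best_id, best_sim = uid, s
    if best_sim >= threshold:
        return ("positive", best_id)
    return ("negative", None)
```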
CLAIM 6
[media_image5.png: annotated excerpt from Filho]
In regards to Claim 6, Filho teaches the method of Claim 1. In addition, Filho teaches the one or more pre-attentive images comprises depth information (Filho, Page 614, section 3. Hardware Platform: “Pre-Attentive sensing is provided by four RealSense D455 RGBD 1280×720 pixels cameras”; Page 615, section 4. Vision Pipeline: “…from the four pre-attentive RGBD cameras. RGB and depth images are acquired”), and wherein receiving the attentive image further comprises pre-focusing using a focus determined from the depth information. (Filho, page 616, left col, see reconstructed text and figure 4 below; Page 620, section 9: “Updating to an SLR camera that allows more fully-programmable focus would also allow us to focus the camera as we rotate the mirror”)
CLAIM 7
In regards to Claim 7, Filho teaches the method of Claim 6. In addition, Filho teaches the tracking uses ground-plane coordinates. (Filho, page 615, section 4. Vision Pipeline: “1. A multi-camera RGBD tracking pipeline providing pre-attentive panoramic human tracking in ground-plane coordinates;” See modified figure 4 in the rejection of claim 6)
CLAIM 8
[media_image6.png: annotated excerpt from Filho]
In regards to Claim 8, Filho teaches the method of Claim 7. In addition, Filho teaches detection of the one or more persons comprises a bounding box around the detected person, wherein where depth information is available within the bounding box and a central tendency of the depth information is less than a predetermined distance (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below, red highlight. The Examiner notes Filho’s “mean of depth returns” corresponds to “a central tendency of the depth information”), geo-location of the detected person comprises an azimuth of a spatial parameter of the bounding box at a distance determined by a central tendency of the depth information within the bounding box (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below, blue highlight. The Examiner notes Filho’s “bounding box centroid” corresponds to “a spatial parameter of the bounding box”),
[media_image7.png: annotated excerpt from Filho]
and wherein where depth information is not available within the bounding box or the central tendency of the depth information exceeds the predetermined distance (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below, red highlight.), geo-location of the detected person comprises back-projecting a center of the bottom of the bounding box to a ground plane. (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below, blue highlight.)
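The two-branch geo-location mapped in this claim can be sketched as a single conditional. This is a hypothetical sketch under assumed names; `backproject` stands in for a camera-model callback (pixel coordinates to ground-plane coordinates) that is not specified here:

```python
def geolocate_conditional(bbox, depths, backproject, max_range=8.0):
    """Two-branch geo-location: when depth returns exist within the box
    and their mean (a central tendency) is within range, geo-locate at
    the azimuth of the box centroid at the mean depth; otherwise
    back-project the bottom-centre of the box onto the ground plane."""
    valid = [d for d in depths if d > 0]
    mean_d = sum(valid) / len(valid) if valid else None
    x0, y0, x1, y1 = bbox
    if mean_d is not None and mean_d < max_range:
        cx = (x0 + x1) / 2.0          # azimuth taken from centroid column
        return ("depth", cx, mean_d)
    bottom_centre = ((x0 + x1) / 2.0, y1)
    return ("backprojected", *backproject(bottom_centre))
```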
CLAIM 9
[media_image8.png: annotated excerpt from Filho]
In regards to Claim 9, Filho teaches the method of Claim 8. In addition, Filho teaches matching the feature vector and the geo-location to the previously detected person comprises using a metric that combines a distance measure between feature vectors and Euclidean distance in ground plane coordinates. (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below)
CLAIM 10
In regards to Claim 10, Filho teaches a system for capturing high resolution images of a scene for analysis of one or more persons, the scene comprising the one or more persons to be analyzed (Filho, Abstract: “we propose and evaluate a novel attentive sensing solution. Panoramic low-resolution pre-attentive sensing is provided by an array of wide-angle cameras, while attentive sensing is achieved with a high-resolution, narrow field-of-view camera and a mirror-based gaze deflection system”; see figures 1-2 and section 4. Vision Pipeline. Filho teaches a method to detect people in a wide-angle image and then track the detected people with a high resolution narrow-field camera), the system comprising one or more processors in communication with a data storage (Filho, Page 615, section 4. Vision Pipeline: “RGB and depth images are acquired by a set of bridge nodes running on the onboard computer, then relayed to the external server”. The Examiner notes a computer implies a processor, memory, and a storage device), the system in communication with one or more pre-attentive cameras (Filho, page 614, section 3. Hardware Platform: “Pre-Attentive sensing is provided by four RealSense D455 RGBD 1280×720 pixels cameras …”) and an attentive camera (Filho, Page 614, section 3. Hardware Platform: “Attentive sensing is provided by a Sony Alpha 7C 3840×2160 pixel camera with a Sony E PZ 18-200mm powered zoom lens …”), the one or more processors, using instructions stored on the data storage, are configured to execute:
a pre-attentive module to receive one or more pre-attentive images from the one or more pre-attentive cameras, the one or more pre-attentive images capturing the scene (Filho, page 614, section Pre-Attentive sensing: “we are currently running the RealSense cameras at 640×360 resolution, yielding a pre-attentive panoramic resolution of 2560×360 pixels”; see modified figure 3 (a) below)
[media_image1.png: annotated excerpt from Filho]
, to detect one or more persons in the one or more pre-attentive images (Filho, Page 615, section 4. Vision Pipeline: “… from the four pre-attentive RGBD cameras. RGB and depth images are acquired by a set of bridge nodes running on the onboard computer, then relayed to the external server. Person detection is performed on RGB images using the YOLOv4 object detector”; See figure 4), to determine a feature vector and a geo-location for at least one of the detected persons in the one or more pre-attentive images, the geo-location comprising an azimuthal location (Filho, Page 615, section 4. Vision Pipeline: “For each detected person, the corresponding image crop is extracted, and a feature vector encoding their appearance is computed by a wide residual network [32]. The detection is also geo-located … we geo-locate the person at the azimuth of the bounding box centroid at a distance determined by the mean of the depth returns within the bounding box”), and to match the feature vector and geo-location to a previously detected person for tracking of such detected person (Filho, Page 615, section 4. Vision Pipeline: “After being computed, geo-located detections (composed of a feature vector and ground plane coordinates) are submitted to a tracker. Detections are paired to existing tracks through the Hungarian algorithm [17], based on a metric that combines appearance similarity (defined as the cosine distance between feature vectors) and Euclidean distance in ground plane coordinate”), and where there is no match, initializing a tracking of a new person; (Filho, Page 615, section 4. Vision Pipeline: “…Any detection not assigned to an existing track is used to initialize a new track…”)
[media_image2.png: annotated excerpt from Filho]
an attentive module to receive an attentive image that captures the tracked person (Filho, Page 614, section 3. Hardware Platform: “Attentive sensing is provided by a Sony Alpha 7C 3840×2160 pixel camera with a Sony E PZ 18-200mm powered zoom lens …”; Page 615, section 4. Vision Pipeline: “3. A ROS bridge node for capturing attentive still frames”; see modified figure 3 (b) below) by directing a gaze of the attentive camera at the azimuthal location (Filho, Page 615, section 4. Vision Pipeline: “2. A servo controller that rotates the mirror to deflect attentive gaze to a person of interest”; Page 616, see reconstructed text below),
[media_image3.png: annotated excerpt from Filho]
the attentive image comprising a smaller field-of-view (FoV) than the one or more pre-attentive images (Filho, Page 614, section 3. Hardware Platform: “Pre-Attentive sensing … With nearly 90° horizontal FoV they collectively provide a panoramic pre-attentive FoV … Attentive sensing … yielding an 8 deg horizontal FoV”. Filho teaches a pre-attentive camera system that outputs a panoramic FOV and an attentive camera system that outputs an 8-degree FOV);
and an output module to output the attentive image. (Filho, Page 615, left col, last paragraph: “The servo motor and Sony camera are controlled by an onboard computer through USB connections. Pre-attentive and attentive frames are streamed over a dedicated Wi-Fi link to an external server”. The Examiner interprets “outputting an image” as generating/producing/exporting image data, or sending image data from one device to another)
CLAIM 16
In regards to Claim 16, Filho teaches a device for capturing high resolution images of a scene for analysis of one or more persons, the scene comprising the one or more persons to be analyzed (Filho, Abstract: “we propose and evaluate a novel attentive sensing solution. Panoramic low-resolution pre-attentive sensing is provided by an array of wide-angle cameras, while attentive sensing is achieved with a high-resolution, narrow field-of-view camera and a mirror-based gaze deflection system”; see figures 1-2 and section 4. Vision Pipeline. Filho teaches a system to detect people in a wide-angle image and then track the detected people with a high resolution narrow-field camera), the device comprising:
[media_image1.png: annotated excerpt from Filho]
one or more pre-attentive cameras positioned to capture one or more pre-attentive images that combined provide a panoramic view of the scene (Filho, page 614, section Pre-Attentive sensing: “we are currently running the RealSense cameras at 640×360 resolution, yielding a pre-attentive panoramic resolution of 2560×360 pixels”; see modified figure 3 (a) below);
a mirror mounted on a controllable structure to direct a gaze of the mirror to a specified area of the panoramic view, the specified area comprising the person to be analyzed (Filho, page 615, left col: “Attentive gaze control is accomplished by a mirror mounted at a 45 deg angle on a servo motor coaxial with the attentive optic axis. Rotation of the motor shifts the azimuth of the attentive FoV, providing attentive resolution in any direction of interest identified from the pre-attentive stream.”; page 615, section 4: “2. A servo controller that rotates the mirror to deflect attentive gaze to a person of interest”);
an attentive camera directed towards the mirror to capture an attentive image comprising the directed gaze of the mirror of the scene (Filho, Page 614, section 3. Hardware Platform: “Attentive sensing is provided by a Sony Alpha 7C 3840×2160 pixel camera with a Sony E PZ 18-200mm powered zoom lens …”; see FIG. 1), the attentive image comprising a smaller field-of-view (FoV) than the combination of the one or more pre-attentive images. (Filho, Page 614, section 3. Hardware Platform: “Pre-Attentive sensing … With nearly 90° horizontal FoV they collectively provide a panoramic pre-attentive FoV … Attentive sensing … yielding an 8 deg horizontal FoV”. Filho teaches a pre-attentive camera system that outputs a panoramic FOV and an attentive camera system that outputs an 8-degree FOV)
CLAIM 17
In regards to Claim 17, Filho teaches the device of Claim 16. In addition, Filho teaches the mirror is positioned at an oblique angle relative to horizontal (Filho, page 615, left col: “Attentive gaze control is accomplished by a mirror mounted at a 45 deg angle on a servo motor coaxial with the attentive optic axis”; see Fig. 1 below), and wherein the attentive camera is directed towards the mirror and positioned above or below the mirror. (See Fig. 1 below; the attentive camera is mounted below the mirror, with the camera’s lens directed toward the mirror)
[media_image9.png: annotated Figure 1 from Filho]
CLAIM 18
In regards to Claim 18, Filho teaches the device of Claim 17. In addition, Filho teaches the panoramic view comprises a 360-degree panoramic view (Filho, page 614, section 3: “Pre-Attentive sensing is provided by four RealSense D455 RGBD 1280×720 pixels cameras mounted horizontally at 90 deg intervals. With nearly 90° horizontal FoV they collectively provide a panoramic pre-attentive FoV”. The Examiner notes a setup of four cameras with a 90-degree horizontal FoV, mounted perpendicular to each other (see FIG. 1 above), captures a 360-degree view) and wherein the controllable structure comprises a motor that provides 360-degree rotation of the mirror to permit 360-degree azimuthal fixations over the panoramic view. (Filho, page 615, left col: “Attentive gaze control is accomplished by a mirror mounted at a 45 deg angle on a servo motor coaxial with the attentive optic axis. Rotation of the motor shifts the azimuth of the attentive FoV, providing attentive resolution in any direction of interest identified from the pre-attentive stream”)
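For illustration, 360-degree azimuthal fixation by mirror rotation reduces to a shortest-path servo command. This is a hypothetical sketch (the function name and sign convention are assumptions, not Filho's controller):

```python
def mirror_command(target_az_deg, current_az_deg):
    """Shortest-path rotation (in degrees, positive = counter-clockwise)
    for a servo that gives the 45-degree mirror full 360-degree
    azimuthal coverage, wrapping correctly across the 0/360 boundary."""
    return (target_az_deg - current_az_deg + 180.0) % 360.0 - 180.0
```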
CLAIM 20
[media_image2.png: annotated excerpt from Filho]
In regards to Claim 20, Filho teaches the device of Claim 16. In addition, Filho teaches the one or more pre-attentive images are used to detect one or more persons in the one or more pre-attentive images and used to determine a feature vector and a geo-location for at least one of the detected persons in the pre-attentive image (Filho, Page 615, section 4. Vision Pipeline: “For each detected person, the corresponding image crop is extracted, and a feature vector encoding their appearance is computed by a wide residual network [32]. The detection is also geo-located … we geo-locate the person at the azimuth of the bounding box centroid at a distance determined by the mean of the depth returns within the bounding box”), the geo-location comprising an azimuthal location, wherein the motor directs the gaze of the mirror to the azimuthal location, and wherein the attentive image captured by the attentive camera comprises an image of the azimuthal location. (Filho, Page 615, section 4. Vision Pipeline: “2. A servo controller that rotates the mirror to deflect attentive gaze to a person of interest”; Page 616, see reconstructed text below)
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2-3 and 11-15 are rejected under 35 U.S.C. 103 as being unpatentable over Filho in view of Hoch et al. (US-20210136292-A1, hereinafter Hoch).
CLAIM 2
[media_image10.png: annotated excerpt from Filho]
In regards to Claim 2, Filho teaches the method of claim 1. In addition, Filho teaches identifying the detected person comprises performing facial analysis (Filho, page 616, right col, see reconstructed text below),
[media_image2.png: annotated excerpt from Filho]
and wherein receiving the attentive image that captures the detected person by directing gaze at the azimuthal location is performed (Filho, Page 615, section 4. Vision Pipeline: “2. A servo controller that rotates the mirror to deflect attentive gaze to a person of interest”; Page 616, see reconstructed text below)
Filho does not explicitly disclose when a face of the detected person is capturable within the field of view of the attentive image.
Hoch is in the same field of art, multi-camera facial recognition systems. Further, Hoch teaches when a face of the detected person is capturable within the field of view of the attentive image. (Hoch, ¶ [0080]: “the second sensor is capturing an image of the object at a quality level above a threshold … the hardware of the second sensor enables obtaining the quality level above the threshold when at least a zoom of 20X is used and the object is entirely within the image”. Hoch teaches an image sensor with a narrow FOV, and a method to capture an image of an object only when the object is entirely within the FoV of the image)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Filho by incorporating the conditional image-capture behavior taught by Hoch, to make an image sensor that only captures images of a certain quality. One of ordinary skill in the art would be motivated to combine the references since, among its several aspects, the present invention recognizes a need to ensure the quality of a captured facial image is sufficient for the identification process (Hoch, ¶ [0056]: “The resolution of the second sensor may be sufficiently high for the target application, but the object may be moving too fast for the capability of the second sensor to capture a clear and/or high quality image…An insufficiently resolution and/or insufficient zoom may result in inability to perform the target application, for example, inability to correctly identify who the person is”).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
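The Hoch-style condition relied on above (attempt the attentive capture only when the face is entirely within the attentive frame) can be sketched as a simple gate. This is a hypothetical illustration; the function name, bounding-box convention, and frame dimensions are assumptions, not Hoch's implementation:

```python
def face_capturable(face_bbox, frame_w, frame_h):
    """Return True only when the face bounding box (x0, y0, x1, y1)
    lies entirely within the attentive frame, i.e. the face is
    capturable within the attentive field of view."""
    x0, y0, x1, y1 = face_bbox
    return 0 <= x0 and 0 <= y0 and x1 <= frame_w and y1 <= frame_h
```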
CLAIM 3
In regards to Claim 3, the combination of Filho and Hoch teaches the method of Claim 2. In addition, the combination of Filho and Hoch teaches when the face of the detected person is not capturable within the field of view of the attentive image, the method further comprising extracting a cropped portion of the pre-attentive image associated with a facial region of the detected person. (See 1st claim interpretation. See rejection of claim 2; the combination of Filho and Hoch teaches the first path)
CLAIM 11
In regards to Claim 11, Filho teaches the system of claim 10. In addition, Filho teaches identifying the detected person comprises performing facial analysis (Filho, page 616, right col, see reconstructed text below),
[media_image10.png: annotated excerpt from Filho]
[media_image2.png: annotated excerpt from Filho]
and wherein receiving the attentive image that captures the detected person by directing gaze at the azimuthal location is performed (Filho, Page 615, section 4. Vision Pipeline: “2. A servo controller that rotates the mirror to deflect attentive gaze to a person of interest”; Page 616, see reconstructed text below)
Filho does not explicitly disclose when a face of the detected person is capturable within the field of view of the attentive image.
Hoch is in the same field of art, multi-camera facial recognition systems. Further, Hoch teaches when a face of the detected person is capturable within the field of view of the attentive image. (Hoch, ¶ [0080]: “the second sensor is capturing an image of the object at a quality level above a threshold … the hardware of the second sensor enables obtaining the quality level above the threshold when at least a zoom of 20X is used and the object is entirely within the image”. Hoch teaches an image sensor with a narrow FOV, and a method to capture an image of an object only when the object is entirely within the FoV of the image)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Filho by incorporating the conditional image-capture behavior taught by Hoch, to make an image sensor that only captures images of a certain quality. One of ordinary skill in the art would be motivated to combine the references since, among its several aspects, the present invention recognizes a need to ensure the quality of a captured facial image is sufficient for the identification process (Hoch, ¶ [0056]: “The resolution of the second sensor may be sufficiently high for the target application, but the object may be moving too fast for the capability of the second sensor to capture a clear and/or high quality image…An insufficiently resolution and/or insufficient zoom may result in inability to perform the target application, for example, inability to correctly identify who the person is”).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
CLAIM 12
In regards to Claim 12, the combination of Filho and Hoch teaches the system of Claim 11. In addition, the combination of Filho and Hoch teaches when the face of the detected person is not capturable within the field of view of the attentive image, the method further comprising extracting a cropped portion of the pre-attentive image associated with a facial region of the detected person. (See 2nd claim interpretation. See rejection of claim 11; the combination of Filho and Hoch teaches the first path. Filho, page 615, section 4, see reconstructed text below, and Figure 11. Filho discloses that his system supports the function of extracting an image crop of a facial region)
[media_image11.png: annotated excerpt from Filho]
CLAIM 14
In regards to Claim 14, the combination of Filho and Hoch teaches the system of Claim 12. In addition, the combination of Filho and Hoch teaches the tracking uses ground-plane coordinates (Filho, page 615, section 4. Vision Pipeline: “1. A multi-camera RGBD tracking pipeline providing pre-attentive panoramic human tracking in ground-plane coordinates;” See modified Figure 4 in the rejection of Claim 6), and wherein detection of the one or more persons comprises a bounding box around the detected person, wherein where depth information is available within the bounding box and a central tendency of the depth information is less than a predetermined distance (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below, red highlight. The Examiner notes Filho’s “mean of depth returns” corresponds to “a central tendency of the depth information”), geo-location of the detected person comprises an azimuth of a spatial parameter of the bounding box at a distance determined by a central tendency of the depth information within the bounding box (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below, blue highlight. The Examiner notes Filho’s “bounding box centroid” corresponds to “a spatial parameter of the bounding box”), and wherein where depth information is not available within the bounding box or the central tendency of the depth information exceeds the predetermined distance (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below, red highlight), geo-location of the detected person comprises back-projecting a center of the bottom of the bounding box to a ground plane (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below, blue highlight).
[media_image6.png: reconstructed text from Filho, page 615, section 4. Vision Pipeline]
[media_image7.png: reconstructed text from Filho, page 615, section 4. Vision Pipeline]
CLAIM 13
In regards to Claim 13, the combination of Filho and Hoch teaches the system of Claim 11. In addition, the combination of Filho and Hoch teaches focusing is performed approximately in parallel with directing the gaze of the attentive camera at the azimuthal location. (Filho, page 620, section 9, fourth paragraph: “Updating to an SLR camera that allows more fully-programmable focus would also allow us to focus the camera as we rotate the mirror, based upon the estimated distance to the target face…”)
CLAIM 15
In regards to Claim 15, the combination of Filho and Hoch teaches the system of Claim 13. In addition, the combination of Filho and Hoch teaches matching the feature vector and the geo-location to the previously detected person comprises using a metric that combines a distance measure between feature vectors and Euclidean distance in ground plane coordinates. (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below)
[media_image8.png: reconstructed text from Filho, page 615, section 4. Vision Pipeline]
CLAIM 19
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Filho in view of Li et al. (Li, Ruijin, et al., "High-Precision Automatic Calibration Modeling of Point Light Source Tracking Systems," Sensors 21.7, published 2021, hereinafter Li).
In regards to Claim 19, Filho teaches the device of Claim 17.
Filho does not explicitly disclose the controllable structure further comprises a second motor to control an elevation of the gaze of the mirror.
Li is in the same field of art, mirror-based tracking systems. Further, Li teaches the controllable structure further comprises a second motor to control an elevation of the gaze of the mirror. (Li, page 3, section 2.1, see annotated text below. Li teaches a system with two motors, one to rotate the mirror about the azimuth axis and one to rotate the mirror about the pitch axis)
[media_image12.png: annotated text from Li, page 3, section 2.1]
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Filho by incorporating the second mirror-control motor taught by Li, to make a system that can rotate the mirror about both axes. One of ordinary skill in the art would be motivated to combine the references since, among its several aspects, the present invention recognizes there is a need to incorporate a mechanism to control the gaze elevation (Filho, page 620, section 9 Future Work: “Another limitation of our system is that only the azimuth of gaze is controlled; the gaze elevation is fixed to horizontal. Incorporating even a small degree of gaze elevation control would allow the system to continue to function for small children as well as adults who are sitting or lying down.”).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Pertinent Arts
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Trajcevski et al. (Trajcevski, Aleksander, et al. "Sensorimotor System Design of Socially Intelligent Robots." ICPR Workshops (4), https://openreview.net/forum?id=7JuE0ncHMH, published 12/31/2021) teaches a prior version of the system in Filho.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NHUT HUY (JEREMY) PHAM whose telephone number is (703)756-5797. The examiner can normally be reached Mo - Fr. 8:30am - 6pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, O'Neal Mistry can be reached on (313)446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NHUT HUY PHAM/Examiner, Art Unit 2674
/ONEAL R MISTRY/Supervisory Patent Examiner, Art Unit 2674