Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
The United States Patent & Trademark Office appreciates the application submitted by the inventor/assignee. The Office has reviewed the application and provides the following comments below.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05/06/2024 has been considered and is attached.
Claim Status
Claims 1, 4-10, 16-18 and 20 are rejected under 35 U.S.C. § 102 as being anticipated by Filho.
Claims 2-3 and 11-15 are rejected under 35 U.S.C. § 103 as being unpatentable over Filho in view of Hoch.
1st Claim Interpretation (Contingent Limitation)
Under MPEP 2111.04, claim scope is not limited by claim language that suggests or makes optional but does not require steps to be performed, or by claim language that does not limit a claim to a particular structure. The broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) precedent are not met.
For example, assume a method claim requires step A if a first condition happens and step B if a second condition happens. If the claimed invention may be practiced without either the first or second condition happening, then neither step A nor step B is required by the broadest reasonable interpretation of the claim.
In this case, Claim 3 recites “when” and then lists two paths/methods: “receiving the attentive image … when a face of the detected person is capturable” (first path) and “when the face of the detected person is not capturable … extracting a cropped portion of the pre-attentive image” (second path). Considering the limitation “outputting the attentive image” from claim 1, the first path is required, making the second path optional. While citations have been provided for completeness and rapid prosecution, only the first path is required. Applicant’s comments and/or amendments relating to this issue are invited to clarify the claim language and the prosecution history.
2nd Claim Interpretation (Contingent Limitation)
Under MPEP 2111.04, Claim scope is not limited by claim language that suggests or makes optional but does not require steps to be performed, or by claim language that does not limit a claim to a particular structure. However, "[t]he broadest reasonable interpretation of a system claim having structure that performs a function, which only needs to occur if a condition precedent is met, still requires structure for performing the function should the condition occur." Ex parte Schulhauser, Appeal 2013-007847 (PTAB April 28, 2016).
In this case, Claim 12 is a system/apparatus claim reciting “when” and then listing two paths/methods: “receiving the attentive image … when a face of the detected person is capturable” (first path) and “when the face of the detected person is not capturable … extracting a cropped portion of the pre-attentive image” (second path). Considering the limitation “outputting the attentive image” from claim 10, the first path is required, making the second path optional. While citations have been provided for completeness and rapid prosecution, the prior art must only teach the structure that performs the function of the contingent step along with the other recited claim limitations. Applicant’s comments and/or amendments relating to this issue are invited to clarify the claim language and the prosecution history.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 4-10, 16-18 and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Filho et al. (Perroni Filho, Hélio, et al. "Attentive Sensing for Long-Range Face Recognition." WACV (Workshops), https://openreview.net/forum?id=UuUrJJTBlG, published 12/31/2022, hereinafter Filho).
CLAIM 1
In regards to Claim 1, Filho teaches a computer-implemented method (Filho, Page 615, section 4. Vision Pipeline: “RGB and depth images are acquired by a set of bridge nodes running on the onboard computer, then relayed to the external server”) for capturing high resolution images of a scene for analysis of one or more persons, the scene comprising the one or more persons to be analyzed (Filho, Abstract: “we propose and evaluate a novel attentive sensing solution. Panoramic low-resolution pre-attentive sensing is provided by an array of wide-angle cameras, while attentive sensing is achieved with a high-resolution, narrow field-of-view camera and a mirror-based gaze deflection system”; see figures 1-2 and section 4. Vision Pipeline. Filho teaches a method to detect people in a wide-angle image and then track the detected people with a high resolution narrow-field camera), the method comprising:
[media_image1.png: annotated excerpt from Filho]
receiving one or more pre-attentive images, the one or more pre-attentive images capturing the scene; (Filho, page 614, section Pre-Attentive sensing: “we are currently running the RealSense cameras at 640×360 resolution, yielding a pre-attentive panoramic resolution of 2560×360 pixels”; see modified figure 3 (a) below)
detecting one or more persons in the one or more pre-attentive images; (Filho, Page 615, section 4. Vision Pipeline: “… from the four pre-attentive RGBD cameras. RGB and depth images are acquired by a set of bridge nodes running on the onboard computer, then relayed to the external server. Person detection is performed on RGB images using the YOLOv4 object detector”; See figure 4)
determining a feature vector and a geo-location for at least one of the detected persons in the pre-attentive image, the geo-location comprising an azimuthal location; (Filho, Page 615, section 4. Vision Pipeline: “For each detected person, the corresponding image crop is extracted, and a feature vector encoding their appearance is computed by a wide residual network [32]. The detection is also geo-located … we geo-locate the person at the azimuth of the bounding box centroid at a distance determined by the mean of the depth returns within the bounding box”)
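For illustration, the geo-location scheme quoted above can be sketched in Python. This is a hypothetical sketch only, not code from Filho or the application; the function name, image width, and FoV parameters are assumptions:

```python
def geolocate(bbox, depth_img, hfov_deg=90.0, img_w=640):
    """Geo-locate a detected person as (azimuth, distance): the azimuth of
    the bounding-box centroid mapped through the camera's horizontal FoV,
    and a distance given by the mean of the valid depth returns inside
    the bounding box (zero depth values are treated as invalid)."""
    x0, y0, x1, y1 = bbox
    cx = (x0 + x1) / 2.0                          # centroid column
    azimuth = (cx / img_w - 0.5) * hfov_deg       # degrees off boresight
    returns = [depth_img[r][c]
               for r in range(y0, y1) for c in range(x0, x1)
               if depth_img[r][c] > 0]
    distance = sum(returns) / len(returns) if returns else None
    return azimuth, distance
```

A centered bounding box thus maps to an azimuth of zero, and a box with no valid depth returns yields no distance estimate (the fallback for that case is addressed in the rejection of claim 8).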
matching the feature vector and geo-location to a previously detected person for tracking of such detected person (Filho, Page 615, section 4. Vision Pipeline: “After being computed, geo-located detections (composed of a feature vector and ground plane coordinates) are submitted to a tracker. Detections are paired to existing tracks through the Hungarian algorithm [17], based on a metric that combines appearance similarity (defined as the cosine distance between feature vectors) and Euclidean distance in ground plane coordinate”), and where there is no match, initializing a tracking of a new person; (Filho, Page 615, section 4. Vision Pipeline: “…Any detection not assigned to an existing track is used to initialize a new track…”)
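The detection-to-track pairing quoted above (appearance similarity combined with ground-plane distance) can likewise be sketched. Filho uses the Hungarian algorithm; the brute-force assignment below yields the same optimal pairing for small track counts and is purely illustrative (the data-structure keys, weight, and gate value are assumptions):

```python
import math
from itertools import permutations

def cost(det, trk, w=0.5):
    """Combined metric: cosine distance between feature vectors plus
    Euclidean distance in ground-plane coordinates."""
    dot = sum(a * b for a, b in zip(det["feat"], trk["feat"]))
    na = math.sqrt(sum(a * a for a in det["feat"]))
    nb = math.sqrt(sum(b * b for b in trk["feat"]))
    cos_dist = 1.0 - dot / (na * nb)
    eucl = math.dist(det["pos"], trk["pos"])
    return w * cos_dist + (1.0 - w) * eucl

def match(dets, trks, gate=1.0):
    """Pair detections to existing tracks by minimising total combined
    cost; any detection left unmatched seeds a new track.  Sketch
    assumes len(dets) <= len(trks)."""
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(trks)), len(dets)):
        c = sum(cost(dets[i], trks[j]) for i, j in enumerate(perm))
        if c < best_cost:
            best, best_cost = perm, c
    if best is None:
        return [], list(range(len(dets)))
    pairs = [(i, j) for i, j in enumerate(best) if cost(dets[i], trks[j]) <= gate]
    matched = {i for i, _ in pairs}
    new_tracks = [i for i in range(len(dets)) if i not in matched]
    return pairs, new_tracks
```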
receiving an attentive image that captures the tracked person (Filho, Page 615, section 4. Vision Pipeline: “3. A ROS bridge node for capturing attentive still frames”; see modified figure 3 (b) below) by directing gaze at the azimuthal location (Filho, Page 615, section 4. Vision Pipeline: “2. A servo controller that rotates the mirror to deflect attentive gaze to a person of interest”; Page 616, see reconstructed text below),
[media_image2.png: annotated excerpt from Filho]
[media_image3.png: annotated excerpt from Filho]
the attentive image comprising a smaller field-of-view (FoV) than the one or more pre-attentive images; (Filho, Page 614, section 3. Hardware Platform: “Pre-Attentive sensing … With nearly 90° horizontal FoV they collectively provide a panoramic pre-attentive FoV … Attentive sensing … yielding an 8 deg horizontal FoV”. Filho teaches a pre-attentive camera system that outputs a panoramic FOV and an attentive camera system that outputs an 8-degree FOV)
and outputting the attentive image. (Filho, Page 615, left col, last paragraph: “Pre-attentive and attentive frames are streamed over a dedicated Wi-Fi link to an external server that runs person detection, face detection and face recognition algorithms”. The Examiner interprets “outputting an image” as generating/producing/exporting image data, or sending image data from one device to another)
CLAIM 4
In regards to Claim 4, Filho teaches the method of Claim 1. In addition, Filho teaches the one or more pre-attentive images comprise a panoramic view of the scene. (Filho, page 614, section Pre-Attentive sensing: “we are currently running the RealSense cameras at 640×360 resolution, yielding a pre-attentive panoramic resolution of 2560×360 pixels”; see modified figure 3 (a) in the rejection of claim 1)
CLAIM 5
[media_image4.png: annotated excerpt from Filho]
In regards to Claim 5, Filho teaches the method of Claim 1. In addition, Filho teaches recognizing the detected person by matching a vector associated with the detected person to vectors in a data store (Filho, page 616, right col: “Our face recognition module is built on deepFace [25] … We provide access to a gallery of reference face images associated with unique IDs”, see reconstructed text below, see step 2-4), and outputting a positive recognition where the vector is matched to the data store (Filho, page 616, right col, see reconstructed text below, step 5), and outputting a negative recognition otherwise. (Filho, page 616, right col, see reconstructed text below, step 6)
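For illustration, the gallery-based recognition mapped above (matching a probe vector against reference vectors associated with unique IDs, outputting a positive or negative recognition) can be sketched as follows. This is a hypothetical sketch, not Filho's deepFace-based module; the threshold and names are assumptions:

```python
import math

def recognize(probe, gallery, threshold=0.6):
    """Match a probe face vector against a gallery of reference vectors
    keyed by unique ID.  Output a positive recognition with the best ID
    when the best cosine similarity clears the threshold, and a negative
    recognition otherwise."""
    def cos_sim(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(y * y for y in b)))
    best_id, best_sim = None, -1.0
    for uid, ref in gallery.items():
        s = cos_sim(probe, ref)
        if s > best_sim:
            best_id, best_sim = uid, s
    if best_sim >= threshold:
        return ("positive", best_id)
    return ("negative", None)
```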
CLAIM 6
[media_image5.png: annotated excerpt from Filho]
In regards to Claim 6, Filho teaches the method of Claim 1. In addition, Filho teaches the one or more pre-attentive images comprises depth information (Filho, Page 614, section 3. Hardware Platform: “Pre-Attentive sensing is provided by four RealSense D455 RGBD 1280×720 pixels cameras”; Page 615, section 4. Vision Pipeline: “…from the four pre-attentive RGBD cameras. RGB and depth images are acquired”), and wherein receiving the attentive image further comprises pre-focusing using a focus determined from the depth information. (Filho, page 616, left col, see reconstructed text and figure 4 below; Page 620, section 9: “Updating to an SLR camera that allows more fully-programmable focus would also allow us to focus the camera as we rotate the mirror”)
CLAIM 7
In regards to Claim 7, Filho teaches the method of Claim 6. In addition, Filho teaches the tracking uses ground-plane coordinates. (Filho, page 615, section 4. Vision Pipeline: “1. A multi-camera RGBD tracking pipeline providing pre-attentive panoramic human tracking in ground-plane coordinates;” See modified figure 4 in the rejection of claim 6)
CLAIM 8
[media_image6.png: annotated excerpt from Filho]
In regards to Claim 8, Filho teaches the method of Claim 7. In addition, Filho teaches detection of the one or more persons comprises a bounding box around the detected person, wherein where depth information is available within the bounding box and a central tendency of the depth information is less than a predetermined distance (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below, red highlight. The Examiner notes Filho’s “mean of depth returns” corresponds to “a central tendency of the depth information”), geo-location of the detected person comprises an azimuth of a spatial parameter of the bounding box at a distance determined by a central tendency of the depth information within the bounding box (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below, blue highlight. The Examiner notes Filho’s “bounding box centroid” corresponds to “a spatial parameter of the bounding box”),
[media_image7.png: annotated excerpt from Filho]
and wherein where depth information is not available within the bounding box or the central tendency of the depth information exceeds the predetermined distance (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below, red highlight.), geo-location of the detected person comprises back-projecting a center of the bottom of the bounding box to a ground plane. (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below, blue highlight.)
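The two-branch geo-location mapped in this claim can be sketched as a single conditional. This is a hypothetical sketch under assumed names; `backproject` stands in for a camera-model callback (pixel coordinates to ground-plane coordinates) that is not specified here:

```python
def geolocate_conditional(bbox, depths, backproject, max_range=8.0):
    """Two-branch geo-location: when depth returns exist within the box
    and their mean (a central tendency) is within range, geo-locate at
    the azimuth of the box centroid at the mean depth; otherwise
    back-project the bottom-centre of the box onto the ground plane."""
    valid = [d for d in depths if d > 0]
    mean_d = sum(valid) / len(valid) if valid else None
    x0, y0, x1, y1 = bbox
    if mean_d is not None and mean_d < max_range:
        cx = (x0 + x1) / 2.0          # azimuth taken from centroid column
        return ("depth", cx, mean_d)
    bottom_centre = ((x0 + x1) / 2.0, y1)
    return ("backprojected", *backproject(bottom_centre))
```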
CLAIM 9
[media_image8.png: annotated excerpt from Filho]
In regards to Claim 9, Filho teaches the method of Claim 8. In addition, Filho teaches matching the feature vector and the geo-location to the previously detected person comprises using a metric that combines a distance measure between feature vectors and Euclidean distance in ground plane coordinates. (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below)
CLAIM 10
In regards to Claim 10, Filho teaches a system for capturing high resolution images of a scene for analysis of one or more persons, the scene comprising the one or more persons to be analyzed (Filho, Abstract: “we propose and evaluate a novel attentive sensing solution. Panoramic low-resolution pre-attentive sensing is provided by an array of wide-angle cameras, while attentive sensing is achieved with a high-resolution, narrow field-of-view camera and a mirror-based gaze deflection system”; see figures 1-2 and section 4. Vision Pipeline. Filho teaches a method to detect people in a wide-angle image and then track the detected people with a high resolution narrow-field camera), the system comprising one or more processors in communication with a data storage (Filho, Page 615, section 4. Vision Pipeline: “RGB and depth images are acquired by a set of bridge nodes running on the onboard computer, then relayed to the external server”. The Examiner notes a computer implies a processor, memory, and a storage device), the system in communication with one or more pre-attentive cameras (Filho, page 614, section 3. Hardware Platform: “Pre-Attentive sensing is provided by four RealSense D455 RGBD 1280×720 pixels cameras …”) and an attentive camera (Filho, Page 614, section 3. Hardware Platform: “Attentive sensing is provided by a Sony Alpha 7C 3840×2160 pixel camera with a Sony E PZ 18-200mm powered zoom lens …”), the one or more processors, using instructions stored on the data storage, are configured to execute:
a pre-attentive module to receive one or more pre-attentive images from the one or more pre-attentive cameras, the one or more pre-attentive images capturing the scene (Filho, page 614, section Pre-Attentive sensing: “we are currently running the RealSense cameras at 640×360 resolution, yielding a pre-attentive panoramic resolution of 2560×360 pixels”; see modified figure 3 (a) below)
[media_image1.png: annotated excerpt from Filho]
, to detect one or more persons in the one or more pre-attentive images (Filho, Page 615, section 4. Vision Pipeline: “… from the four pre-attentive RGBD cameras. RGB and depth images are acquired by a set of bridge nodes running on the onboard computer, then relayed to the external server. Person detection is performed on RGB images using the YOLOv4 object detector”; See figure 4), to determine a feature vector and a geo-location for at least one of the detected persons in the one or more pre-attentive images, the geo-location comprising an azimuthal location (Filho, Page 615, section 4. Vision Pipeline: “For each detected person, the corresponding image crop is extracted, and a feature vector encoding their appearance is computed by a wide residual network [32]. The detection is also geo-located … we geo-locate the person at the azimuth of the bounding box centroid at a distance determined by the mean of the depth returns within the bounding box”), and to match the feature vector and geo-location to a previously detected person for tracking of such detected person (Filho, Page 615, section 4. Vision Pipeline: “After being computed, geo-located detections (composed of a feature vector and ground plane coordinates) are submitted to a tracker. Detections are paired to existing tracks through the Hungarian algorithm [17], based on a metric that combines appearance similarity (defined as the cosine distance between feature vectors) and Euclidean distance in ground plane coordinate”), and where there is no match, initializing a tracking of a new person; (Filho, Page 615, section 4. Vision Pipeline: “…Any detection not assigned to an existing track is used to initialize a new track…”)
[media_image2.png: annotated excerpt from Filho]
an attentive module to receive an attentive image that captures the tracked person (Filho, Page 614, section 3. Hardware Platform: “Attentive sensing is provided by a Sony Alpha 7C 3840×2160 pixel camera with a Sony E PZ 18-200mm powered zoom lens …”; Page 615, section 4. Vision Pipeline: “3. A ROS bridge node for capturing attentive still frames”; see modified figure 3 (b) below) by directing a gaze of the attentive camera at the azimuthal location (Filho, Page 615, section 4. Vision Pipeline: “2. A servo controller that rotates the mirror to deflect attentive gaze to a person of interest”; Page 616, see reconstructed text below),
[media_image3.png: annotated excerpt from Filho]
the attentive image comprising a smaller field-of-view (FoV) than the one or more pre-attentive images (Filho, Page 614, section 3. Hardware Platform: “Pre-Attentive sensing … With nearly 90° horizontal FoV they collectively provide a panoramic pre-attentive FoV … Attentive sensing … yielding an 8 deg horizontal FoV”. Filho teaches a pre-attentive camera system that outputs a panoramic FOV and an attentive camera system that outputs an 8-degree FOV);
and an output module to output the attentive image. (Filho, Page 615, left col, last paragraph: “The servo motor and Sony camera are controlled by an onboard computer through USB connections. Pre-attentive and attentive frames are streamed over a dedicated Wi-Fi link to an external server”. The Examiner interprets “outputting an image” as generating/producing/exporting image data, or sending image data from one device to another)
CLAIM 16
In regards to Claim 16, Filho teaches a device for capturing high resolution images of a scene for analysis of one or more persons, the scene comprising the one or more persons to be analyzed (Filho, Abstract: “we propose and evaluate a novel attentive sensing solution. Panoramic low-resolution pre-attentive sensing is provided by an array of wide-angle cameras, while attentive sensing is achieved with a high-resolution, narrow field-of-view camera and a mirror-based gaze deflection system”; see figures 1-2 and section 4. Vision Pipeline. Filho teaches a system to detect people in a wide-angle image and then track the detected people with a high resolution narrow-field camera), the device comprising:
[media_image1.png: annotated excerpt from Filho]
one or more pre-attentive cameras positioned to capture one or more pre-attentive images that combined provide a panoramic view of the scene (Filho, page 614, section Pre-Attentive sensing: “we are currently running the RealSense cameras at 640×360 resolution, yielding a pre-attentive panoramic resolution of 2560×360 pixels”; see modified figure 3 (a) below);
a mirror mounted on a controllable structure to direct a gaze of the mirror to a specified area of the panoramic view, the specified area comprising the person to be analyzed (Filho, page 615, left col: “Attentive gaze control is accomplished by a mirror mounted at a 45 deg angle on a servo motor coaxial with the attentive optic axis. Rotation of the motor shifts the azimuth of the attentive FoV, providing attentive resolution in any direction of interest identified from the pre-attentive stream.”; page 615, section 4: “2. A servo controller that rotates the mirror to deflect attentive gaze to a person of interest”);
an attentive camera directed towards the mirror to capture an attentive image comprising the directed gaze of the mirror of the scene (Filho, Page 614, section 3. Hardware Platform: “Attentive sensing is provided by a Sony Alpha 7C 3840×2160 pixel camera with a Sony E PZ 18-200mm powered zoom lens …”; see FIG. 1), the attentive image comprising a smaller field-of-view (FoV) than the combination of the one or more pre-attentive images. (Filho, Page 614, section 3. Hardware Platform: “Pre-Attentive sensing … With nearly 90° horizontal FoV they collectively provide a panoramic pre-attentive FoV … Attentive sensing … yielding an 8 deg horizontal FoV”. Filho teaches a pre-attentive camera system that outputs a panoramic FOV and an attentive camera system that outputs an 8-degree FOV)
CLAIM 17
In regards to Claim 17, Filho teaches the device of Claim 16. In addition, Filho teaches the mirror is positioned at an oblique angle relative to horizontal (Filho, page 615, left col: “Attentive gaze control is accomplished by a mirror mounted at a 45 deg angle on a servo motor coaxial with the attentive optic axis”; see Fig. 1 below), and wherein the attentive camera is directed towards the mirror and positioned above or below the mirror. (See Fig. 1 below; the attentive camera is mounted below the mirror, with the camera’s lens directed toward the mirror)
[media_image9.png: annotated Figure 1 from Filho]
CLAIM 18
In regards to Claim 18, Filho teaches the device of Claim 17. In addition, Filho teaches the panoramic view comprises a 360-degree panoramic view (Filho, page 614, section 3: “Pre-Attentive sensing is provided by four RealSense D455 RGBD 1280×720 pixels cameras mounted horizontally at 90 deg intervals. With nearly 90° horizontal FoV they collectively provide a panoramic pre-attentive FoV”. The Examiner notes a setup of four cameras with a 90-degree horizontal FoV, mounted perpendicular to each other (see FIG. 1 above), captures a 360-degree view) and wherein the controllable structure comprises a motor that provides 360-degree rotation of the mirror to permit 360-degree azimuthal fixations over the panoramic view. (Filho, page 615, left col: “Attentive gaze control is accomplished by a mirror mounted at a 45 deg angle on a servo motor coaxial with the attentive optic axis. Rotation of the motor shifts the azimuth of the attentive FoV, providing attentive resolution in any direction of interest identified from the pre-attentive stream”)
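For illustration, 360-degree azimuthal fixation by mirror rotation reduces to a shortest-path servo command. This is a hypothetical sketch (the function name and sign convention are assumptions, not Filho's controller):

```python
def mirror_command(target_az_deg, current_az_deg):
    """Shortest-path rotation (in degrees, positive = counter-clockwise)
    for a servo that gives the 45-degree mirror full 360-degree
    azimuthal coverage, wrapping correctly across the 0/360 boundary."""
    return (target_az_deg - current_az_deg + 180.0) % 360.0 - 180.0
```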
CLAIM 20
[media_image2.png: annotated excerpt from Filho]
In regards to Claim 20, Filho teaches the device of Claim 16. In addition, Filho teaches the one or more pre-attentive images are used to detect one or more persons in the one or more pre-attentive images and used to determine a feature vector and a geo-location for at least one of the detected persons in the pre-attentive image (Filho, Page 615, section 4. Vision Pipeline: “For each detected person, the corresponding image crop is extracted, and a feature vector encoding their appearance is computed by a wide residual network [32]. The detection is also geo-located … we geo-locate the person at the azimuth of the bounding box centroid at a distance determined by the mean of the depth returns within the bounding box”), the geo-location comprising an azimuthal location, wherein the motor directs the gaze of the mirror to the azimuthal location, and wherein the attentive image captured by the attentive camera comprises an image of the azimuthal location. (Filho, Page 615, section 4. Vision Pipeline: “2. A servo controller that rotates the mirror to deflect attentive gaze to a person of interest”; Page 616, see reconstructed text below)
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2-3 and 11-15 are rejected under 35 U.S.C. 103 as being unpatentable over Filho in view of Hoch et al. (US-20210136292-A1, hereinafter Hoch).
CLAIM 2
[media_image10.png: annotated excerpt from Filho]
In regards to Claim 2, Filho teaches the method of claim 1. In addition, Filho teaches identifying the detected person comprises performing facial analysis (Filho, page 616, right col, see reconstructed text below),
[media_image2.png: annotated excerpt from Filho]
and wherein receiving the attentive image that captures the detected person by directing gaze at the azimuthal location is performed (Filho, Page 615, section 4. Vision Pipeline: “2. A servo controller that rotates the mirror to deflect attentive gaze to a person of interest”; Page 616, see reconstructed text below)
Filho does not explicitly disclose when a face of the detected person is capturable within the field of view of the attentive image.
Hoch is in the same field of art, multi-camera facial recognition systems. Further, Hoch teaches when a face of the detected person is capturable within the field of view of the attentive image. (Hoch, ¶ [0080]: “the second sensor is capturing an image of the object at a quality level above a threshold … the hardware of the second sensor enables obtaining the quality level above the threshold when at least a zoom of 20X is used and the object is entirely within the image”. Hoch teaches an image sensor with a narrow FOV, and a method to capture an image of an object only when the object is entirely within the FoV of the image)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Filho by incorporating the conditional image-capture behavior taught by Hoch, to make an image sensor that only captures images of a certain quality. One of ordinary skill in the art would be motivated to combine the references since, among its several aspects, the present invention recognizes a need to ensure the quality of a captured facial image is sufficient for the identification process (Hoch, ¶ [0056]: “The resolution of the second sensor may be sufficiently high for the target application, but the object may be moving too fast for the capability of the second sensor to capture a clear and/or high quality image…An insufficiently resolution and/or insufficient zoom may result in inability to perform the target application, for example, inability to correctly identify who the person is”).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
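The Hoch-style condition relied on above (attempt the attentive capture only when the face is entirely within the attentive frame) can be sketched as a simple gate. This is a hypothetical illustration; the function name, bounding-box convention, and frame dimensions are assumptions, not Hoch's implementation:

```python
def face_capturable(face_bbox, frame_w, frame_h):
    """Return True only when the face bounding box (x0, y0, x1, y1)
    lies entirely within the attentive frame, i.e. the face is
    capturable within the attentive field of view."""
    x0, y0, x1, y1 = face_bbox
    return 0 <= x0 and 0 <= y0 and x1 <= frame_w and y1 <= frame_h
```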
CLAIM 3
In regards to Claim 3, the combination of Filho and Hoch teaches the method of Claim 2. In addition, the combination of Filho and Hoch teaches when the face of the detected person is not capturable within the field of view of the attentive image, the method further comprising extracting a cropped portion of the pre-attentive image associated with a facial region of the detected person. (See 1st claim interpretation. See rejection of claim 2; the combination of Filho and Hoch teaches the first path)
CLAIM 11
In regards to Claim 11, Filho teaches the system of claim 10. In addition, Filho teaches identifying the detected person comprises performing facial analysis (Filho, page 616, right col, see reconstructed text below),
[media_image10.png: annotated excerpt from Filho]
[media_image2.png: annotated excerpt from Filho]
and wherein receiving the attentive image that captures the detected person by directing gaze at the azimuthal location is performed (Filho, Page 615, section 4. Vision Pipeline: “2. A servo controller that rotates the mirror to deflect attentive gaze to a person of interest”; Page 616, see reconstructed text below)
Filho does not explicitly disclose when a face of the detected person is capturable within the field of view of the attentive image.
Hoch is in the same field of art, multi-camera facial recognition systems. Further, Hoch teaches when a face of the detected person is capturable within the field of view of the attentive image. (Hoch, ¶ [0080]: “the second sensor is capturing an image of the object at a quality level above a threshold … the hardware of the second sensor enables obtaining the quality level above the threshold when at least a zoom of 20X is used and the object is entirely within the image”. Hoch teaches an image sensor with a narrow FOV, and a method to capture an image of an object only when the object is entirely within the FoV of the image)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Filho by incorporating the conditional image-capture behavior taught by Hoch, to make an image sensor that only captures images of a certain quality. One of ordinary skill in the art would be motivated to combine the references since, among its several aspects, the present invention recognizes a need to ensure the quality of a captured facial image is sufficient for the identification process (Hoch, ¶ [0056]: “The resolution of the second sensor may be sufficiently high for the target application, but the object may be moving too fast for the capability of the second sensor to capture a clear and/or high quality image…An insufficiently resolution and/or insufficient zoom may result in inability to perform the target application, for example, inability to correctly identify who the person is”).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
CLAIM 12
In regards to Claim 12, the combination of Filho and Hoch teaches the system of Claim 11. In addition, the combination of Filho and Hoch teaches when the face of the detected person is not capturable within the field of view of the attentive image, the method further comprising extracting a cropped portion of the pre-attentive image associated with a facial region of the detected person. (See 2nd claim interpretation. See rejection of claim 11; the combination of Filho and Hoch teaches the first path. Filho, page 615, section 4, see reconstructed text below, and Figure 11. Filho discloses that his system supports the function of extracting an image crop of a facial region)
[media_image11.png: annotated excerpt from Filho]
CLAIM 14
In regards to Claim 14, the combination of Filho and Hoch teaches the system of Claim 12. In addition, the combination of Filho and Hoch teaches the tracking uses ground-plane coordinates (Filho, page 615, section 4. Vision Pipeline: “1. A multi-camera RGBD tracking pipeline providing pre-attentive panoramic human tracking in ground-plane coordinates;” See modified Figure 4 in the rejection of Claim 6), and wherein detection of the one or more persons comprises a bounding box around the detected person, wherein where depth information is available within the bounding box and a central tendency of the depth information is less than a predetermined distance (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below, red highlight. The Examiner notes Filho’s “mean of depth returns” corresponds to “a central tendency of the depth information”), geo-location of the detected person comprises an azimuth of a spatial parameter of the bounding box at a distance determined by a central tendency of the depth information within the bounding box (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below, blue highlight. The Examiner notes Filho’s “bounding box centroid” corresponds to “a spatial parameter of the bounding box”), and wherein where depth information is not available within the bounding box or the central tendency of the depth information exceeds the predetermined distance (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below, red highlight), geo-location of the detected person comprises back-projecting a center of the bottom of the bounding box to a ground plane (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below, blue highlight).
[media_image6.png: reconstructed text from Filho, page 615, section 4. Vision Pipeline]
[media_image7.png: reconstructed text from Filho, page 615, section 4. Vision Pipeline]
CLAIM 13
In regards to Claim 13, the combination of Filho and Hoch teaches the system of Claim 11. In addition, the combination of Filho and Hoch teaches focusing is performed approximately in parallel with directing the gaze of the attentive camera at the azimuthal location. (Filho, page 620, section 9, fourth paragraph: “Updating to an SLR camera that allows more fully-programmable focus would also allow us to focus the camera as we rotate the mirror, based upon the estimated distance to the target face…”)
CLAIM 15
In regards to Claim 15, the combination of Filho and Hoch teaches the system of Claim 13. In addition, the combination of Filho and Hoch teaches matching the feature vector and the geo-location to the previously detected person comprises using a metric that combines a distance measure between feature vectors and Euclidean distance in ground plane coordinates. (Filho, page 615, section 4. Vision Pipeline, see reconstructed text below)
[media_image8.png: reconstructed text from Filho, page 615, section 4. Vision Pipeline]
CLAIM 19
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Filho in view of Li et al. (Li, Ruijin, et al., "High-Precision Automatic Calibration Modeling of Point Light Source Tracking Systems," Sensors 21.7, published 2021, hereinafter Li).
In regards to Claim 19, Filho teaches the device of Claim 17.
Filho does not explicitly disclose the controllable structure further comprises a second motor to control an elevation of the gaze of the mirror.
Li is in the same field of art, mirror-based tracking systems. Further, Li teaches the controllable structure further comprises a second motor to control an elevation of the gaze of the mirror. (Li, page 3, section 2.1, see annotated text below. Li teaches a system with two motors, one to rotate the mirror about the azimuth axis and one to rotate the mirror about the pitch axis)
[media_image12.png: annotated text from Li, page 3, section 2.1]
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Filho by incorporating the second mirror-control motor taught by Li, to make a system that can rotate the mirror about both axes. One of ordinary skill in the art would be motivated to combine the references since, among its several aspects, the present invention recognizes there is a need to incorporate a mechanism to control the gaze elevation (Filho, page 620, section 9 Future Work: “Another limitation of our system is that only the azimuth of gaze is controlled; the gaze elevation is fixed to horizontal. Incorporating even a small degree of gaze elevation control would allow the system to continue to function for small children as well as adults who are sitting or lying down.”).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Pertinent Arts
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Trajcevski et al. (Trajcevski, Aleksander, et al. "Sensorimotor System Design of Socially Intelligent Robots." ICPR Workshops (4), https://openreview.net/forum?id=7JuE0ncHMH, published 12/31/2021) teaches a prior version of the system in Filho.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NHUT HUY (JEREMY) PHAM whose telephone number is (703)756-5797. The examiner can normally be reached Mo - Fr. 8:30am - 6pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, O'Neal Mistry can be reached on (313)446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NHUT HUY PHAM/Examiner, Art Unit 2674
/ONEAL R MISTRY/Supervisory Patent Examiner, Art Unit 2674