DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 5 January 2026 has been entered.
Response to Arguments
Applicant’s arguments with respect to the claim(s) have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-6, 9-18, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over VanBlon (US 2015/0235641 A1) in view of Border (US 2008/0218612 A1), and further in view of Krupka (US 2019/0341055 A1).
Claim 1, VanBlon teaches a method for processing a voice query (see Fig. 3), comprising:
instructing a user device to activate a camera functionality of a camera (non-audible input, e.g., visual cues captured by a camera; paragraph 0036) in response to detecting a voice query (non-audible sensor such as a camera is operated while providing voice input; paragraph 0018);
causing the camera to capture, in multiple modes (different policies that require further non-audible input; paragraphs 0040-0041), a series of frames of the environment from where the voice query is originating (camera captures movements of the user; paragraph 0018);
classifying a portion of the voice query as an ambiguous portion (ambiguity may be identified in interpreting voice input; paragraph 0043 and step 303 of Fig. 3);
transmitting a request for a supplemental data related to the voice query (when ambiguity is detected, non-audible input is accessed; step 304; Fig. 3), wherein the supplemental data relates to the portion of the voice query that was classified as ambiguous and wherein the supplemental data is based on the identified subject within the series of captured frames (non-audible input may be lip or mouth movements; paragraphs 0044 and 0018); and
resolving the ambiguous portion based on processing the supplemental data (the voice input is re-interpreted based on the non-audible input; see step 305 of Fig. 3),
but VanBlon is silent regarding wherein the method causes the camera to capture simultaneously, in a first mode and a second mode, a series of frames of an environment from where the voice query is originating, wherein:
the first mode comprises of at least one of a first view, a first lens, or a first zoom level, and
the second mode comprises of at least one of a second view, a second lens, or a second zoom level and wherein:
the first view is different from the second view, the first lens is different from the second lens, and the first zoom level is different from the second zoom level.
Border teaches a method for causing a camera to capture simultaneously (a first image stage and second image stage are used to simultaneously capture images; paragraph 0077), in a first mode and a second mode (first and second still images at different focus distances; paragraph 0077), a series of frames of an environment (first and second images of the scene are captured; paragraph 0069 and Fig. 3), wherein:
the first mode comprises of at least one of a first view, a first lens, or a first zoom level (first image at a first focus distance using a first zoom lens; paragraph 0077), and
the second mode comprises of at least one of a second view, a second lens, or a second zoom level (second image at a second focus distance using a second zoom lens; paragraph 0077) and wherein:
the first view is different from the second view, the first lens is different from the second lens, and the first zoom level is different from the second zoom level (first and second focus distance, respectively; paragraph 0077).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have used the teaching of Border with that of VanBlon in order to provide improved and extended rangefinding capability (see paragraph 0025 of Border).
VanBlon in view of Border is silent regarding identifying a subject within the environment based on a comparison between the voice query and a voice profile associated with the subject, wherein the subject is a source of the voice query.
Krupka teaches wherein a subject is identified within an environment (determining an identity of a human speaker; paragraph 0070) based on a comparison between a voice query and a voice profile associated with the subject (audio samples representing utterances from the human speaker are matched with a voiceprint; paragraph 0070), wherein the subject is a source of the voice query (identity of the human speaker is established; paragraph 0070 and Fig. 14).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have used the teaching of Krupka with that of the cited prior art in order to provide a training procedure capable of continually improving the function of a face locating system (see paragraph 0030 of Krupka).
Claim 2, VanBlon further teaches wherein:
the request for the supplemental data comprises a request for a specific feature extracted from at least one of the captured frames during an utterance of the voice query (non-audible input may be lip or mouth movements; paragraph 0044).
Claim 3, VanBlon further teaches: determining a location of the subject with respect to the environment (input is detected based on device environment; paragraph 0041).
Claim 4, VanBlon further teaches:
the supplemental data comprises metadata associated with a gesture performed by the subject during utterance of the voice query (non-audible input may be a gesture of the user; paragraph 0018).
Claim 5, VanBlon further teaches wherein the gesture comprises at least one of:
a hand movement, a facial expression, and a head movement (see paragraph 0018).
Claim 6, Border further teaches wherein each of the first zoom level and the second zoom level comprises at least one of a zoom-in level or a zoom-out level (“first focus distance” and “second focus distance;” paragraph 0077).
Claim 7, Krupka further teaches a camera for identifying a subject, based on an attribute of the subject, from a plurality of subjects in the environment (lip/mouth movement detection is used to determine the human speaker; paragraph 0068).
Claim 8, Krupka further teaches, wherein the attribute of the subject comprises at least one of a physical quality of the subject, or the location of the subject (lip/mouth movement detection is used to determine the human speaker; paragraph 0068).
Claim 9, Border further teaches wherein each of the first lens and the second lens comprises at least one of a standard lens, a wide-angle lens, a fish-eye lens or a telephoto lens (“first zoom lens;” paragraph 0077).
Claim 10, Border further teaches selecting at least one of the first mode and the second mode based on the requested specific feature (processor selects either sensor output from the first sensor or second sensor based on user selection; paragraphs 0129 and 0135 and Fig. 19).
Claim 11, Border further teaches wherein each of the first view and the second view comprises at least one of a wide-angle view, a fish-eye view, a telephoto view, or a first person gaze (“first zoom lens” and “second zoom lens;” paragraph 0077).
Claim 12, Border further teaches wherein the first mode corresponds to a first camera and the second mode corresponds to a second camera (“first image sensor” and “second image sensor;” paragraph 0077).
Claims 13-19 and 21 are analyzed and rejected as system claims comprising control circuitry to perform the methods of claims 1-7 and 9, respectively.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892 attached.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHIAWEI A CHEN whose telephone number is (571)270-1707. The examiner can normally be reached Mon-Fri 12:00pm - 9:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sinh Tran, can be reached at (571)272-7564. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit:
https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CHIAWEI CHEN/Primary Examiner, Art Unit 2637