Last updated: May 29, 2026
Application No. 18/332,012
VEHICLE INTERFACE CONTROL

Non-Final OA §103
Filed: Jun 09, 2023
Examiner: KAZEMINEZHAD, FARZAD
Art Unit: 2653
Tech Center: 2600 (Communications)
Assignee: Ford Global Technologies LLC
OA Round: 2 (Non-Final)
Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 71% grant rate with a +67.7% interview lift. Since an interview has already been tried, the recommendation is a written response with narrowed claims based on precedent claim evolution patterns.
Based on 536 resolved cases, 2023–2026
Examiner Intelligence

KAZEMINEZHAD, FARZAD
Career Allowance Rate: 71% (above average), 380 granted / 536 resolved, +8.9% vs TC avg
Interview Lift: +67.7% (strong), comparing resolved cases with and without an interview
Typical timeline: 3y 6m average prosecution, 19 currently pending
Career history: 560 total applications across all art units
Statute-Specific Performance

§101: 1.4% (-38.6% vs TC avg)
§103: 64.5% (+24.5% vs TC avg)
§102: 7.5% (-32.5% vs TC avg)
§112: 3.5% (-36.5% vs TC avg)
The comparison baseline is the Tech Center average estimate. Based on career data from 536 resolved cases.
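The figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it recomputes the career allowance rate from the granted and resolved counts, and it assumes (this is an assumption, not something the page states) that each "vs TC avg" value is a percentage-point difference from the Tech Center average estimate. The variable and function names are hypothetical.

```python
# Figures copied from the examiner statistics above.
granted, resolved = 380, 536
allowance_rate = round(100 * granted / resolved, 1)  # percent

# statute -> (examiner rate in %, reported delta vs TC avg)
statute_stats = {
    "101": (1.4, -38.6),
    "103": (64.5, 24.5),
    "102": (7.5, -32.5),
    "112": (3.5, -36.5),
}


def implied_baseline(rate: float, delta: float) -> float:
    # ASSUMPTION: delta is a percentage-point difference, so that
    # baseline = examiner rate - delta.
    return round(rate - delta, 1)


baselines = {s: implied_baseline(r, d) for s, (r, d) in statute_stats.items()}

print(allowance_rate)  # 70.9, which the page rounds to 71%
print(baselines)       # every statute implies the same 40.0 baseline
```

Under that reading, all four statute rows point to a single Tech Center baseline estimate of 40%, which suggests the deltas are measured against one common estimate rather than per-statute averages.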
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the office action from 8/13/2025, the applicant has submitted an amendment, filed 10/21/2025, amending claims 1, 3, 5, 7, 9, 17, and 20 and cancelling claims 2 and 4, while arguing to traverse the prior art rejections. Applicant’s arguments have been fully considered but are moot with respect to the new grounds of rejection, further in view of Van Wiemeersch et al. (US 2020/0018976) and Hatten et al. (US 2012/0190324), mandated by the latest amendments and for the reasons explained in the response to arguments.
Response to Arguments
On page 8, the first two paragraphs provide a broad overview of the latest amendments and the last office action.
The arguments pertaining to claim 1 (pages 8-9) are directed to the latest amendments. 
Please visit the new office action for further details.
As regards claim 7, it is argued: “Van Wiemeersch shows the speakers 118 mounted to an underside of a dashboard …Claim 7 is thus allowable”.
Nowhere does Van Wiemeersch recite “speakers” being on the “underside” of (or anywhere else associated with) the “dashboard”. The figure shows them to be adjacent to the seats. Nonetheless, that is merely a design choice and has no impact on the operation of the claim limitations.
Regarding claim 19, it is argued that in Bromand “the results of the noise cancellation are not played by the speaker”.
Respectfully, if the “noise” were completely “cancelled”, there would be nothing left to “play[]”. So if the so-called “cancellation audio” is required to be played, it is something that is NOT completely cancelled, and Bromand’s “reduc[ed]” level “ambient audio data” that is “record[ed]” (to be played back) maps to that.
Claims 21 and 22 are new claims.
Please visit the new office action for further details.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 5-7, 9, 14-15, 17, 20-21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Van Wiemeersch et al. (US 2020/0018976), and further in view of SCHARTNER ANDREAS (CN 113157080).

Regarding claim 1, Van Wiemeersch et al. do teach a system for a vehicle (“vehicle” in the Title, Abstract, Fig. 1)
comprising:
a plurality of microphones focused on respective designated locations of a plurality of designated locations with respect to the vehicle (¶ 0034 sentence 3 referring to Fig. 1: “The microphones 120” (a first microphone of a plurality of microphones) “capture the speech 302 of the passenger 112” (focused on “passenger” (a first person) “position” (a first designated location (¶ 0051 line 8)) with respect to the vehicle shown in Fig. 1); ¶ 0032 sentence 2: “the microphones 120” (a second microphone of the plurality of microphones) “capture the speech 302 of the operator 110” (focused on the “operator” (a second person in the driver’s (second designated) location shown)));
a plurality of cameras with respective fields of view encompassing respective designated locations of the plurality of designated locations (¶ 0051 lines 6+: “the camera 122” (a first camera of a plurality of cameras (i.e., the one close to the “passenger” “112”)) “monitors” (with a field of view encompassing) “the passenger 112 to enable the HUD controller 128 to detect a position” (the first designated location) “of the passenger 112 relative to the passenger seat 108”; ¶ 0025 lines 12+: “(4) gesture-detection of the operator 110” (focused on the second designated location is) “based upon image(s) captured by the cameras 122” (a second camera (i.e., the one directed at the “operator” on the upper right of Fig. 1)) of the plurality of cameras with field of view encompassing the “operator” (the second designated location));
a plurality of output devices directed to respective designated locations of the plurality of designated locations with respect to the vehicle (¶ 0035 last sentence: “Additionally, or alternatively, the HUD controller 128 audibly presents the translated speech of the passenger 112 to the operator 110” (the second person in the second designated position (i.e., “operator”  “seat” (¶ 0024 lines 8-9) in the vehicle receiving audio) “via the speakers 118” (from a first output device (e.g., the speaker closest to the operator on the lower right of the Fig. 1) among a plurality of output devices) “of the vehicle 100 (e.g., when the vehicle 100 is in motion)”; ¶ 0022 sentence 3: “the speakers 118” (a second “speaker” (output device) of the plurality of speakers directed to the) “emit audio (e.g., instructions, directions, entertainment, and/or other information) to the operator 110 and/or the passenger 112” (“passenger” (the first designated location))); and
a computer communicatively coupled to the microphones, the cameras, and the output devices (¶ 0055 sentence 1: “The vehicle data bus 516” (a computer) “communicatively couples” (communicatively coupled to) “the speakers 118” (the output devices) “the microphones 120” (the microphones) “the on-board computing platform 502, the displays 506, the communication module 506, the communication module 508, the cameras 510” (and the cameras));
the computer being programmed to:
receive a selection of a first designated location and a second designated location from the plurality of designated locations (¶ 0041 sentence 2: “For example, the HUD controller 128 enables the passenger 112 to select” (receive a selection of) “a POI” (a first and/or a second designated location) “for which information is presented via the POI interface”; “POI”= “point of interest” (¶ 0038 sentence 1)),
a first microphone of the plurality of microphones being focused on the first designated location (¶ 0034 sentence 3 referring to Fig. 1: “The microphones 120” (a first microphone of a plurality of microphones (e.g., the one on the upper right of Fig. 1)) “capture” (is selected to) “the speech 302 of the passenger 112” (focus on “passenger” (a first person) “position” (a first designated location (¶ 0051 line 8)) with respect to the vehicle shown in Fig. 1 from among the “position” of the “passenger” (first designated location) and “operator” (the second designated location))),
a first camera of the plurality of cameras having a field of view encompassing the first designated location (¶ 0051 lines 6+: “the camera 122” (a first camera of a plurality of cameras (i.e., the one close to the “passenger” “112”)) “monitors” (with a field of view encompassing) “the passenger 112 to enable the HUD controller 128 to detect a position” (the first designated location) “of the passenger 112 relative to the passenger seat 108”);
a first output device of the plurality of output devices being directed to the second designated location (¶ 0035 last sentence: “Additionally, or alternatively, the HUD controller 128 audibly presents the translated speech of the passenger 112 to the operator 110” (the second person in the second designated position (i.e., “operator”  “seat” (¶ 0024 lines 8-9) in the vehicle receiving audio) “via the speakers 118” (receives audio from a first output device (e.g., the speaker closest to the operator on the lower left of the Fig. 1) among a plurality of output devices) “of the vehicle 100 (e.g., when the vehicle 100 is in motion)”);
generate first text in a first language based on input audio data from the first microphone 
translate the first text to second text in a second language (¶ 0035 sentence 1: “the HUD controller 128 translates” (translate) “the” “text” (the first text) “of the passenger” (of the first person) “which is in English, to the preferred language of the operator 110, which is Spanish” (into a second language)); and
instruct the first output device to output the second text (¶ 0035 last sentence: “the HUD controller 128 audibly presents the translated speech” (output the second text) “of the passenger 112 to the operator 110 via the speakers 118” (using e.g., the first  output device)).
Van Wiemeersch et al. do not specifically disclose:
Generate first text in a first language based on input audio data from the video data from the first camera.
ANDREAS does teach:
Generate first text in a first language based on input audio data from the video data from the first camera (Abstract lines 2+: “receiving an image of a vehicle passenger” “extracting lip movement” (using “video image” (video data) of a “vehicle” “occupant” (e.g. first person) obtained from a “camera” (¶ n0009 lines 1+)) “recognizing the lip motion” “as written text” (to generate text in first language); ¶ n0006: “extracting lip movements” “from the image”; ¶ n0009: “a camera for acquiring images of a vehicle occupant”; i.e., “images” (which comprise of “lip” “movements”) are obtained from “camera” (e.g. a first camera)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “lip reading” “recognition algorithm” for “speech recognition” of ANDREAS (¶ n0020) into the “speech into text” of Van Wiemeersch et al., since doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would further help “improve reliability of” “recognition” as disclosed in ANDREAS ¶ n0020 line 3.
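As a reading aid only, the data flow recited in the final limitations of claim 1 (generate first text from microphone and camera input, translate it, and send it to the output device directed at the second designated location) can be sketched as follows. Nothing here is taken from the application or from the cited references: every name is hypothetical, the transcripts are passed in as plain strings rather than produced by real speech-to-text or lip-reading models, and the fusion rule (prefer audio, fall back to video) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DesignatedLocation:
    """Hypothetical record for a seat or zone and its occupant's language."""
    name: str
    language: str


def generate_first_text(audio_text: str, video_text: str) -> str:
    # ASSUMED fusion rule (not from the record): use the audio-based
    # transcript when one exists, otherwise the lip-reading transcript.
    return audio_text or video_text


def relay(
    first: DesignatedLocation,
    second: DesignatedLocation,
    audio_text: str,
    video_text: str,
    translate: Callable[[str, str, str], str],
    outputs: Dict[str, List[str]],
) -> str:
    """Generate first text, translate it, and route it to the second location."""
    first_text = generate_first_text(audio_text, video_text)
    second_text = translate(first_text, first.language, second.language)
    # Only the output device directed at the second location receives the text.
    outputs[second.name].append(second_text)
    return second_text


# Toy stand-in for a translation service (hypothetical phrase table).
PHRASES = {("en", "es", "turn left"): "gire a la izquierda"}


def toy_translate(text: str, src: str, dst: str) -> str:
    return PHRASES.get((src, dst, text), text)


passenger = DesignatedLocation("passenger_seat", "en")
operator = DesignatedLocation("driver_seat", "es")
outputs: Dict[str, List[str]] = {passenger.name: [], operator.name: []}

# The microphone yields nothing here, so the lip-reading transcript is used.
result = relay(passenger, operator, "", "turn left", toy_translate, outputs)
print(result)
```

In this example the audio transcript is empty, so the video-based text supplies the first text, which is the scenario the combination with ANDREAS is said to address.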

Regarding claim 5, Van Wiemeersch et al. do teach the system of claim 1, wherein the first designated location is in a passenger compartment of the vehicle in a seat of the vehicle (¶ 0051 lines 6+: “the camera 122” “monitors” “the passenger 112 to enable the HUD controller 128 to detect a position” (the first designated location) “of the passenger 112 relative to the passenger seat 108” (is in a passenger compartment and seat of the vehicle)), 
the first microphone is positioned to receive audio from a first person sitting in the seat, and the field of view of the first camera encompasses the seat (¶ 0051 lines 6+: “the camera 122” (the first camera) “monitors” (with a field of view encompassing) “the passenger 112 to enable the HUD controller 128 to detect a position” (the first designated location) “of the passenger 112 relative to the passenger seat 108”(encompassing the passenger seat); ¶ 0034 sentence 3: “The microphones 120” (the first microphone) “capture the speech 302 of the passenger 112” (to receive audio from the first person who happens to be in “passenger seat” (sitting in the seat)).

Regarding claim 6, Van Wiemeersch et al. do teach the system of claim 1, wherein the second designated location is in a passenger compartment of the vehicle (¶ 0024 lines 8-9: “driver’s seat 106” (the second designated location is in a passenger compartment of the vehicle) “for the operator” (for the second person)).

Regarding claim 7, Van Wiemeersch et al. do teach the system of claim 6, wherein the first output device is a speaker, the second designated location is in a seat of the vehicle, and the speaker is mounted to the seat (¶ 0035 last sentence: “Additionally, or alternatively, the HUD controller 128 audibly presents the translated speech of the passenger 112 to the operator 110” (the second person in a second designated position (i.e., “operator”  “seat” (¶ 0024 lines 8-9) in the vehicle receiving audio) “via the speakers 118” (from e.g. the first output device which is a speaker) “of the vehicle 100 (e.g., when the vehicle 100 is in motion)” (which as Fig. 1 shows is mounted in driver’s seat)).

Regarding claim 9, Van Wiemeersch et al. do teach a computer comprising a processor and a memory, the memory storing instructions executable by the processor (¶ 0056 sentence 2: “The flowchart of FIG. 6 is representative of machine readable instructions that are stored in memory (such as the memory 520 of FIG. 5) and include one or more programs which, when executed by a processor (such as the processor 518 of FIG. 5), cause the vehicle 100 to implement the example HUD controller 128 of FIGS. 1 and 5”)
to:
receive a selection of a first designated location and a second designated location from a plurality of designated locations with respect to a vehicle (¶ 0041 sentence 2: “For example, the HUD controller 128 enables the passenger 112 to select” (receive a selection of) “a POI” (a first and/or a second designated location) “for which information is presented via the POI interface”; “POI”= “point of interest” (¶ 0038 sentence 1) with respect to the vehicle shown in Fig. 1)),
a first microphone of the plurality of microphones being focused on the first designated location (¶ 0034 sentence 3 referring to Fig. 1: “The microphones 120” (a first microphone of a plurality of microphones (e.g., the one on the upper right of Fig. 1)) “capture” (is selected to) “the speech 302 of the passenger 112” (focus on “passenger” (a first person) “position” (a first designated location (¶ 0051 line 8)) with respect to the vehicle shown in Fig. 1 from among the “position” of the “passenger” (first designated location) and “operator” (the second designated location))),
the microphones being focused on the respective designated locations (¶ 0034 sentence 3 referring to Fig. 1: “The microphones 120” (a first microphone of a plurality of microphones) “capture the speech 302 of the passenger 112” (focused on “passenger” (a first person) “position” (a first designated location (¶ 0051 line 8)) with respect to the vehicle shown in Fig. 1); ¶ 0032 sentence 2: “the microphones 120” (a second microphone of the plurality of microphones) “capture the speech 302 of the operator 110” (focused on the “operator” (a second person in the driver’s (second designated) location shown ));
a first camera of the plurality of cameras having a field of view encompassing the first designated location (¶ 0051 lines 6+: “the camera 122” (a first camera of a plurality of cameras (i.e., the one close to the “passenger” “112”)) “monitors” (with a field of view encompassing) “the passenger 112 to enable the HUD controller 128 to detect a position” (the first designated location) “of the passenger 112 relative to the passenger seat 108”);
the cameras having respective fields of view encompassing the respective designated locations (¶ 0051 lines 6+: “the camera 122” (a first camera of a plurality of cameras (i.e., the one close to the “passenger” “112”)) “monitors” (with a field of view encompassing) “the passenger 112 to enable the HUD controller 128 to detect a position” (the first designated location) “of the passenger 112 relative to the passenger seat 108”; ¶ 0025 lines 12+: “(4) gesture-detection of the operator 110” (focused on the second designated location is) “based upon image(s) captured by the cameras 122” (a second camera (i.e., the one directed at the “operator” on the upper right of Fig. 1)) of the plurality of cameras with field of view encompassing the “operator” (the second designated location));

a first output device of the plurality of output devices being directed to the second designated location (¶ 0035 last sentence: “Additionally, or alternatively, the HUD controller 128 audibly presents the translated speech of the passenger 112 to the operator 110” (the second person in the second designated position (i.e., “operator”  “seat” (¶ 0024 lines 8-9) in the vehicle receiving audio) “via the speakers 118” (receives audio from a first output device (e.g., the speaker closest to the operator on the lower left of the Fig. 1) among a plurality of output devices) “of the vehicle 100 (e.g., when the vehicle 100 is in motion)”);
the output devices being directed to the respective designated locations (¶ 0035 last sentence: “Additionally, or alternatively, the HUD controller 128 audibly presents the translated speech of the passenger 112 to the operator 110” (the second person in the second designated position (i.e., “operator”  “seat” (¶ 0024 lines 8-9) in the vehicle receiving audio) “via the speakers 118” (from a first output device (e.g., the speaker closest to the operator on the lower right of the Fig. 1) among a plurality of output devices) “of the vehicle 100 (e.g., when the vehicle 100 is in motion)”; ¶ 0022 sentence 3: “the speakers 118” (a second “speaker” (output device) of the plurality of speakers directed to the) “emit audio (e.g., instructions, directions, entertainment, and/or other information) to the operator 110 and/or the passenger 112” (“passenger” (the first designated location))); and
generate first text in a first language based on input audio data from the first microphone 
translate the first text to second text in a second language (¶ 0035 sentence 1: “the HUD controller 128 translates” (translate) “the” “text” (the first text) “of the passenger” (of the first person) “which is in English, to the preferred language of the operator 110, which is Spanish” (into a second language); and
instruct the first output device to output the second text (¶ 0035 last sentence: “the HUD controller 128 audibly presents the translated speech” (output the second text) “of the passenger 112 to the operator 110 via the speakers 118” (using the output device)). 
Van Wiemeersch et al. do not specifically disclose:
Generate first text in a first language based on input audio data from video data from the first camera.
ANDREAS does teach:
Generate first text in a first language based on input audio data from video data from the first camera (Abstract lines 2+: “receiving an image of a vehicle passenger” “extracting lip movement” (using “video image” (video data) of a “vehicle” “occupant” (e.g. first person) obtained from a “camera” (¶ n0009 lines 1+)) “recognizing the lip motion” “as written text” (to generate text in first language); ¶ n0006: “extracting lip movements” “from the image”; ¶ n0009: “a camera for acquiring images of a vehicle occupant”; i.e., “images” (which comprise of “lip” “movements”) are obtained from “camera”).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “lip reading” “recognition algorithm” for “speech recognition” of ANDREAS (¶ n0020) into the “speech into text” of Van Wiemeersch et al., since doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would further help “improve reliability of” “recognition” as disclosed in ANDREAS ¶ n0020 line 3.

Regarding claim 14, Van Wiemeersch et al. do teach the computer of claim 9, wherein the instructions to generate the first text include instructions to execute a speech-to-text algorithm on the input audio data (¶ 0034 sentence 3: “the microphones 120 capture the speech 302 of the passenger 112 to enable the HUD controller 128” (a speech to text algorithm based module) “to translate the speech into text” (to generate the first text)).

Regarding claim 15, Van Wiemeersch et al. do not specifically disclose the computer of claim 9, wherein the instructions to generate the first text include instructions to execute a lip-reading algorithm on the video data.
ANDREAS does teach the computer of claim 9, wherein the instructions to generate the first text include instructions to execute a lip-reading algorithm on the video data (Abstract lines 3-5: “recognizing the lip motion as a written text though a lip language recognition algorithm” (a lip -reading algorithm investigates “lip motion” (a video data) to do lip motion to text conversion)).
For obviousness to combine Van Wiemeersch et al. and ANDREAS see claim 9.

Regarding claim 17, Van Wiemeersch et al. do teach the computer of claim 9, wherein the first output device is a speaker, and the instructions further include instructions to generate output audio data from the second text, and to instruct the speaker to play the output audio data (¶ 0035 last sentence: “Additionally, or alternatively, the HUD controller 128 audibly presents” (generate and play the output audio data) “the translated speech”(of data associated with the second text) “of the passenger 112 to the operator 110” “via the speakers 118” (from e.g., the first output device which is a speaker) “of the vehicle 100 (e.g., when the vehicle 100 is in motion)”).

Regarding claim 20, Van Wiemeersch et al. do teach a method (Title, Abstract)
comprising:
receive a selection of a first designated location and a second designated location from a plurality of designated locations with respect to a vehicle (¶ 0041 sentence 2: “For example, the HUD controller 128 enables the passenger 112 to select” (receive a selection of) “a POI” (a first and/or a second designated location) “for which information is presented via the POI interface”; “POI”= “point of interest” (¶ 0038 sentence 1) with respect to the vehicle shown in Fig. 1)),
a first microphone of the plurality of microphones being focused on the first designated location (¶ 0034 sentence 3 referring to Fig. 1: “The microphones 120” (a first microphone of a plurality of microphones (e.g., the one on the upper right of Fig. 1)) “capture” (is selected to) “the speech 302 of the passenger 112” (focus on “passenger” (a first person) “position” (a first designated location (¶ 0051 line 8)) with respect to the vehicle shown in Fig. 1 from among the “position” of the “passenger” (first designated location) and “operator” (the second designated location))),
the microphones being focused on the respective designated locations (¶ 0034 sentence 3 referring to Fig. 1: “The microphones 120” (a first microphone of a plurality of microphones) “capture the speech 302 of the passenger 112” (focused on “passenger” (a first person) “position” (a first designated location (¶ 0051 line 8)) with respect to the vehicle shown in Fig. 1); ¶ 0032 sentence 2: “the microphones 120” (a second microphone of the plurality of microphones) “capture the speech 302 of the operator 110” (focused on the “operator” (a second person in the driver’s (second designated) location shown ));
a first camera of the plurality of cameras having a field of view encompassing the first designated location (¶ 0051 lines 6+: “the camera 122” (a first camera of a plurality of cameras (i.e., the one close to the “passenger” “112”)) “monitors” (with a field of view encompassing) “the passenger 112 to enable the HUD controller 128 to detect a position” (the first designated location) “of the passenger 112 relative to the passenger seat 108”);
the cameras having respective fields of view encompassing the respective designated locations (¶ 0051 lines 6+: “the camera 122” (a first camera of a plurality of cameras (i.e., the one close to the “passenger” “112”)) “monitors” (with a field of view encompassing) “the passenger 112 to enable the HUD controller 128 to detect a position” (the first designated location) “of the passenger 112 relative to the passenger seat 108”; ¶ 0025 lines 12+: “(4) gesture-detection of the operator 110” (focused on the second designated location is) “based upon image(s) captured by the cameras 122” (a second camera (i.e., the one directed at the “operator” on the upper right of Fig. 1)) of the plurality of cameras with field of view encompassing the “operator” (the second designated location));

a first output device of the plurality of output devices directed to the second designated location (¶ 0035 last sentence: “Additionally, or alternatively, the HUD controller 128 audibly presents the translated speech of the passenger 112 to the operator 110” (the second person in the second designated position (i.e., “operator”  “seat” (¶ 0024 lines 8-9) in the vehicle receiving audio) “via the speakers 118” (receives audio from a first output device (e.g., the speaker closest to the operator on the lower left of the Fig. 1) among a plurality of output devices) “of the vehicle 100 (e.g., when the vehicle 100 is in motion)”);
the output devices being directed to the respective designated locations (¶ 0035 last sentence: “Additionally, or alternatively, the HUD controller 128 audibly presents the translated speech of the passenger 112 to the operator 110” (the second person in the second designated position (i.e., “operator”  “seat” (¶ 0024 lines 8-9) in the vehicle receiving audio) “via the speakers 118” (from a first output device (e.g., the speaker closest to the operator on the lower right of the Fig. 1) among a plurality of output devices) “of the vehicle 100 (e.g., when the vehicle 100 is in motion)”; ¶ 0022 sentence 3: “the speakers 118” (a second “speaker” (output device) of the plurality of speakers directed to the) “emit audio (e.g., instructions, directions, entertainment, and/or other information) to the operator 110 and/or the passenger 112” (“passenger” (the first designated location))); 

generating first text in a first language based on input audio data from the first microphone 
translating the first text to second text in a second language (¶ 0035 sentence 1: “the HUD controller 128 translates” (translate) “the” “text” (the first text) “of the passenger” (of the first person) “which is in English, to the preferred language of the operator 110, which is Spanish” (into a second language); and
instructing the first output device to output the second text (¶ 0035 last sentence: “the HUD controller 128 audibly presents the translated speech” (output the second text) “of the passenger 112 to the operator 110 via the speakers 118” (using e.g., the first output device)). 
Van Wiemeersch et al. do not specifically disclose:
Generate first text in a first language based on input audio data from video data from the first camera.
ANDREAS does teach:
Generate first text in a first language based on input audio data from video data from the first camera (Abstract lines 2+: “receiving an image of a vehicle passenger” “extracting lip movement” (using “video image” (video data) of a “vehicle” “occupant” (e.g. first person) obtained from a “camera” (¶ n0009 lines 1+)) “recognizing the lip motion” “as written text” (to generate text in first language); ¶ n0006: “extracting lip movements” “from the image”; ¶ n0009: “a camera for acquiring images of a vehicle occupant”; i.e., “images” (which comprise of “lip” “movements”) are obtained from “camera”).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “lip reading” “recognition algorithm” for “speech recognition” of ANDREAS (¶ n0020) into the “speech into text” of Van Wiemeersch et al., since doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would further help “improve reliability of” “recognition” as disclosed in ANDREAS ¶ n0020 line 3.

Regarding claim 21, Van Wiemeersch et al. do teach the computer of claim 9, wherein the instructions further include instructions to determine the first language by identifying the first language stored in a profile of a first person in the first designated location (¶ 0057 sentence 4: “ the HUD controller 128 collects information regarding the preferred languages” (determine the language) “of the operator 110 and/or the passenger 112” (of the first person associated with the first designated location) “if the selected mode is the language mode, collects information regarding nearby POI(s) and/or a user profile” (stored in a profile of the first person) “of the passenger 112”).


Claim(s) 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Van Wiemeersch et al. in view of ANDREAS, and further in view of SEBASTIAN et al. (DE 112015006350).

Regarding claim 8, Van Wiemeersch et al. in view of ANDREAS do not specifically disclose the system of claim 1, wherein the second designated location is exterior to and adjacent to the vehicle.
SEBASTIAN et al. do teach the system of claim 1, wherein the second designated location is exterior to and adjacent to the vehicle (¶ 0022 lines 4+: “receiving speech information from a speech interface” (a “microphone” in a second designated location (¶ 0012)) “the speech interface being configured to input the speech information from a user outside” (exterior to) “the vehicle” (and adjacent to the vehicle); ¶ 0012: “the voice interface includes at least one microphone”).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the functions of the “vehicle” “speech interface” of SEBASTIAN et al. into the “vehicle” of Van Wiemeersch et al. in view of ANDREAS, since doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable Van Wiemeersch et al. in view of ANDREAS to “provide more convenience for the user”, as disclosed in SEBASTIAN et al. ¶ 0074 last sentence, by allowing “the user” to interact from outside his vehicle in addition to from inside.

Claim(s) 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Van Wiemeersch et al. in view of ANDREAS, and SEBASTIAN et al. (DE 112015006350), and further in view of Won et al. (EP 3666579 A1).
Regarding claim 3, Van Wiemeersch et al. in view of ANDREAS do not specifically disclose the system of claim 1, wherein the first designated location is exterior to and adjacent to the vehicle.
SEBASTIAN et al. do teach the system of claim 1, wherein the first designated location is exterior to and adjacent to the vehicle (¶ 0022 lines 4+: “receiving speech information from a speech interface” (a “microphone” in a first designated location (¶ 0012)) “the speech interface being configured to input the speech information from a user outside” (exterior to) “the vehicle” “performing speech recognition on the received speech information” “and outputting the result information” (e.g., “text information” (¶ 0048)) “to the user outside the vehicle through an output device” (and adjacent to the vehicle); ¶ 0012: “the voice interface includes at least one microphone”).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the functions of the “vehicle” “speech interface” of SEBASTIAN et al. into the “vehicle” of Van Wiemeersch et al. in view of ANDREAS, since doing so would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable Van Wiemeersch et al. in view of ANDREAS to “provide more convenience for the user”, as disclosed in SEBASTIAN et al. ¶ 0074 last sentence, by allowing “the user” to interact from outside his vehicle in addition to from inside.
Van Wiemeersch et al. in view of ANDREAS, and further in view of SEBASTIAN et al. do teach that the first microphone is mounted to an exterior of the vehicle (SEBASTIAN et al.: ¶ 0022 lines 4+: “receiving speech information from a speech interface” (a “microphone” in a first designated location (¶ 0012)) “the speech interface being configured to input the speech information from a user outside” (is exterior to) “the vehicle” (to the vehicle); ¶ 0012: “the voice interface includes at least one microphone”; ¶ 0042 last sentence: “the voice interface” (i.e., the “microphone”) “may also be located on the outside” (is mounted on) “of the vehicle” (the exterior of the vehicle)).
Van Wiemeersch et al. in view of ANDREAS, and further in view of SEBASTIAN et al. do not specifically disclose that the field of view of the camera encompasses an area outside the vehicle.
Won et al. do teach that the field of view of the camera encompasses an area outside the vehicle (“(57)” abstract lines 5+: “a second camera” (a camera) “for capturing an image of the outside” (field of view which encompasses an area outside of) “of the transportation means” (the vehicle)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the function of the “second camera” “outside” of the vehicle of Won et al. into the “vehicle” of Van Wiemeersch et al. (in Van Wiemeersch et al. in view of ANDREAS, and further in view of SEBASTIAN et al.), which would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable “outside” information to be “provided with convenience in driving even when driving a vehicle in an area using a [foreign] language” to the driver as disclosed in Won et al. ¶ 0020 lines 4-6.

Claim(s) 10-13, 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Van Wiemeersch et al. in view of ANDREAS, and further in view of CROXFORD et al. (US 2022/0066207).
Regarding claim 10, Van Wiemeersch et al. in view of ANDREAS do teach the computer of claim 9, wherein the instructions to generate the first text include instructions to generate audio-based text based on the input audio data, generate video-based text based on the video data (Van Wiemeersch: ¶ 0034 sentence 3: “The microphone 120” “capture the speech 302” (based on input audio data from the microphone) “of the passenger 112 to enable the HUD controller 128 to translate the speech into text” (generate audio-based text in a first language i.e., “English” (¶ 0035 sentence 1)); ANDREAS: Abstract lines 2+: “receiving an image of a vehicle passenger” “extracting lip movement” (using “video image” (video data) of a “vehicle” “occupant” (e.g. first person) obtained from a “camera” (¶ n0009 lines 1+)) “recognizing the lip motion” “as written text” (to generate video based text in first language); ¶ n0006: “extracting lip movements” “from the image”; ¶ n0009: “a camera for acquiring images of a vehicle occupant”; i.e., “images” (which comprise “lip” “movements”) are obtained from “camera”).
Van Wiemeersch et al. in view of ANDREAS do not specifically disclose:
and combine the audio-based text and the video-based text into the first text.
CROXFORD et al. do teach:
and combine the audio-based text and the video-based text into the first text (¶ 0063 sentence 3: “The text from the speech recognition” (audio based text) “and the text from the lip-reading recognition” (and video based text) “can be combined” (are combined into a first text) “to improve overall accuracy of the XR system 100”).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “speech recognition” and “lip-reading recognition” of CROXFORD et al. into the “speech recognition” of Van Wiemeersch et al. and the “lip language recognition” algorithm of ANDREAS, respectively, which would enable the combined systems and their associated methods to perform in combination as they do separately and would further serve “to improve overall accuracy” of the text recognitions as disclosed in CROXFORD et al. ¶ 0063 sentence 3.

Regarding claim 11, Van Wiemeersch et al. in view of ANDREAS do not specifically disclose the computer of claim 10, wherein the instructions further include instructions to generate an audio-based confidence level of the audio-based text and a video-based confidence level of the video-based text, and the instructions to combine the audio-based text and the video-based text into the first text include instructions to combine the audio-based text and the video-based text into the first text based on the audio-based confidence level and the video-based confidence level.
CROXFORD et al. do teach the computer of claim 10, wherein the instructions further include instructions to generate an audio-based confidence level of the audio-based text and a video-based confidence level of the video-based text, and the instructions to combine the audio-based text and the video-based text into the first text include instructions to combine the audio-based text and the video-based text into the first text based on the audio-based confidence level and the video-based confidence level (¶ 0063 sentence 3: “The text from the speech recognition” (audio based text) “and the text from the lip-reading recognition” (and video based text) “can be combined” (include instructions to combine) “to improve overall accuracy of the XR system 100. This combination may involve comparing confidence levels” (include confidence levels) “of words detected by the speech recognition” (of audio based text) “program and by the lip-reading program” (as well as confidence levels of video based text)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “speech recognition” and “lip-reading recognition” of CROXFORD et al. into the “speech recognition” of Van Wiemeersch et al. and the “lip language recognition” algorithm of ANDREAS, respectively, which would enable the combined systems and their associated methods to perform in combination as they do separately and would further serve “to improve overall accuracy” of the text recognitions as disclosed in CROXFORD et al. ¶ 0063 sentence 3.

Regarding claim 12, Van Wiemeersch et al. in view of ANDREAS do not specifically disclose the computer of claim 11, wherein the audio-based text includes a sequence of audio-based words, the video-based text includes a sequence of video-based words, the audio-based confidence level includes a sequence of audio-based confidence values for the respective audio-based words, and the video-based confidence level includes a sequence of video-based confidence values for the respective video-based words.
CROXFORD et al. do teach the computer of claim 11, wherein the audio-based text includes a sequence of audio-based words, the video-based text includes a sequence of video-based words, the audio-based confidence level includes a sequence of audio-based confidence values for the respective audio-based words, and the video-based confidence level includes a sequence of video-based confidence values for the respective video-based words (¶ 0063 sentence 3: “The text from the speech recognition” (audio based text) “and the text from the lip-reading recognition” (and video based text) “can be combined” “to improve overall accuracy of the XR system 100. This combination may involve comparing confidence levels” (include confidence levels) “of words” (of a sequence of audio based words) “detected by the speech recognition” (of audio based text) “program and by the lip-reading program” (as well as confidence levels or values of a sequence of respective video based words)).
For the obviousness of combining Van Wiemeersch et al. in view of ANDREAS with CROXFORD et al., see claim 11.

Regarding claim 13, Van Wiemeersch et al. in view of ANDREAS do not specifically disclose the computer of claim 12, wherein the instructions to combine the audio-based text and the video-based text into the first text includes instructions to select a word for the first text from either the audio-based words or the video-based words according to which of the respective audio-based confidence value or video-based confidence value is greater.
CROXFORD et al. do teach the computer of claim 12, wherein the instructions to combine the audio-based text and the video-based text into the first text includes instructions to select a word for the first text from either the audio-based words or the video-based words according to which of the respective audio-based confidence value or video-based confidence value is greater (¶ 0063 lines 9+: “This combination may involve comparing confidence levels of words detected by the speech recognition program and by the lip-reading program and, in a case where the words detected don't match” (instructions used) “picking” (for selecting) “the word” (a word) “with the higher confidence level” (whose confidence value is “higher” (greater))).
For the obviousness of combining Van Wiemeersch et al. in view of ANDREAS with CROXFORD et al., see claim 11.
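
For illustration only, the word-by-word selection recited in claims 11-13 and mapped to CROXFORD et al. ¶ 0063 can be sketched as follows. This is a minimal sketch and not a method taken from any cited reference: the `Word` type, the `combine_transcripts` function, the assumption that the two word sequences are already aligned one-to-one, and the tie-break in favor of the audio-based word are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word with a hypothetical confidence value in [0, 1]."""
    text: str
    confidence: float


def combine_transcripts(audio_words, video_words):
    """Build the first text from two word sequences.

    Assumption: both sequences are already aligned position by position.
    Where the words differ, the word with the greater confidence value is
    selected; ties go to the audio-based word (an arbitrary choice here).
    """
    if len(audio_words) != len(video_words):
        raise ValueError("this sketch assumes aligned sequences of equal length")
    selected = []
    for audio, video in zip(audio_words, video_words):
        if audio.text == video.text:
            selected.append(audio.text)
        elif audio.confidence >= video.confidence:
            selected.append(audio.text)
        else:
            selected.append(video.text)
    return " ".join(selected)


# Hypothetical speech-to-text and lip-reading outputs.
audio = [Word("open", 0.90), Word("the", 0.80), Word("door", 0.40)]
video = [Word("open", 0.70), Word("the", 0.60), Word("drawer", 0.75)]
first_text = combine_transcripts(audio, video)
```

In this example the first two words match, and the third is taken from the video-based sequence because its confidence value (0.75) is greater than the audio-based value (0.40).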

Regarding claim 16, Van Wiemeersch et al. in view of ANDREAS do teach the computer of claim 15, wherein the lip-reading algorithm outputs video-based text, and the instructions to generate the first text include instructions to execute a speech-to-text algorithm on the input audio data to output audio-based text (ANDREAS: Abstract lines 2+: “receiving an image of a vehicle passenger” “extracting lip movement” (using “video image” (video data) of a “vehicle” “occupant” (e.g. first person) obtained from a “camera” (¶ n0009 lines 1+)) “recognizing the lip motion” “as written text” (to generate video based text in first language); ¶ n0006: “extracting lip movements” “from the image”; ¶ n0009: “a camera for acquiring images of a vehicle occupant”; i.e., “images” (which comprise “lip” “movements”) are obtained from “camera”; Van Wiemeersch: ¶ 0034 sentence 3: “The microphone 120” “capture the speech 302” (based on input audio data from the microphone) “of the passenger 112 to enable the HUD controller 128 to translate the speech into text” (generate audio based text in a first language i.e., “English” (¶ 0035 sentence 1))).
Van Wiemeersch et al. in view of ANDREAS do not specifically disclose:
and combine the audio-based text and the video-based text into the first text.
CROXFORD et al. do teach:
and combine the audio-based text and the video-based text into the first text (¶ 0063 sentence 3: “The text from the speech recognition” (audio based text) “and the text from the lip-reading recognition” (and video based text) “can be combined” (are combined into a first text) “to improve overall accuracy of the XR system 100”).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “speech recognition” and “lip-reading recognition” of CROXFORD et al. into the “speech recognition” of Van Wiemeersch et al. and the “lip language recognition” algorithm of ANDREAS, respectively, which would enable the combined systems and their associated methods to perform in combination as they do separately and would further serve “to improve overall accuracy” of the text recognitions as disclosed in CROXFORD et al. ¶ 0063 sentence 3.

Claim(s) 18-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Van Wiemeersch et al. in view of ANDREAS, and further in view of Bromand et al. (EP 3644315 A1).
Regarding claim 18, Van Wiemeersch et al. in view of ANDREAS do not specifically disclose the computer of claim 17, wherein the instructions further include instructions to generate cancellation audio data based on the input audio data, and instruct the speaker to play the cancellation audio data.
Bromand et al. do teach the computer of claim 17, wherein the instructions further include instructions to generate cancellation audio data based on the input audio data, and instruct the speaker to play the cancellation audio data (¶ 0153 sentence 1: “At operation 406, audio cancellation is performed on the sound system” (generate cancellation audio) “while the sound system 102 is in operation” (based on an input audio data which results in finally playing audio left after the cancellation or cancellation audio data)).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “Audio Cancellation Engine” of Bromand et al. into the “speech recognition” of Van Wiemeersch et al. (in Van Wiemeersch et al. in view of ANDREAS), which would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable Van Wiemeersch et al. in view of ANDREAS to “cancel undesired audio data” as disclosed in Bromand et al. ¶ 0153 line 4 in doing its speech recognition.

Regarding claim 19, Van Wiemeersch et al. in view of ANDREAS do not specifically disclose the computer of claim 18, wherein the instructions further include instructions to instruct the speaker to play the output audio data and the cancellation audio data simultaneously.
Bromand et al. do teach the computer of claim 18, wherein the instructions further include instructions to instruct the speaker to play the output audio data and the cancellation audio data simultaneously (¶ 0002 lines 21-25: “In order for the voice enabled device to understand the voice command from the user, it is desirable to accurately cancel or reduce from the recording the ambient audio data” (after the audio cancellation operation, the “voice command” (cancellation audio data) is played simultaneously along with the “reduc[ed]” “ambient audio data” (output audio data))).
For the obviousness of combining Van Wiemeersch et al. in view of ANDREAS with Bromand et al., see claim 18.

Claim(s) 22 is/are rejected under 35 U.S.C. 103 as being unpatentable over Van Wiemeersch et al. in view of ANDREAS, and further in view of Hatton et al. (US 2012/0190324).
Regarding claim 22, Van Wiemeersch et al. do not specifically disclose the computer of claim 9, wherein the instructions further include instructions to determine the second language based on a location of the vehicle with respect to a geofenced area.
Hatton et al. do teach the computer of claim 9, wherein the instructions further include instructions to determine the second language based on a location of the vehicle with respect to a geofenced area (¶ 0016: “The illustrative method further includes determining if vehicle GPS coordinates” (using a geofenced data in a vehicle) “are available for use in a language determination” (to determine e.g. a second language associated with an instant vehicle location) “and using at least one of: an MCC code, GPS coordinates, or the non-availability of both the MCC code and the GPS coordinates to determine an appropriate language for an emergency message”).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “language determination” of the “vehicle” based on “GPS” “coordinates” of Hatton et al. into the vehicle of Van Wiemeersch et al. (in Van Wiemeersch et al. in view of ANDREAS), which would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable the vehicle of Van Wiemeersch et al. to “determine [ambient] language boundaries” as disclosed in Hatton et al. Abstract line 6.
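
For illustration only, determining a language from the vehicle location relative to a geofenced area, as recited in claim 22, can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Hatton et al. or the application: the rectangular `Geofence` type, the `second_language` function, the example coordinates, and the default language are all hypothetical. The fallback to a default when no coordinates are available only loosely mirrors the “non-availability” case quoted from Hatton et al. ¶ 0016.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Geofence:
    """Hypothetical rectangular geofence in decimal degrees."""
    name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float
    language: str

    def contains(self, lat: float, lon: float) -> bool:
        """Return True when the point lies inside or on the boundary."""
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def second_language(location: Optional[Tuple[float, float]],
                    fences: Sequence[Geofence],
                    default: str = "en") -> str:
    """Pick a language from the first geofence containing the location.

    Falls back to the default when coordinates are unavailable or when
    the location is outside every geofence.
    """
    if location is None:
        return default
    lat, lon = location
    for fence in fences:
        if fence.contains(lat, lon):
            return fence.language
    return default


# Hypothetical geofence and vehicle positions.
fences = [Geofence("region_a", 45.0, 46.0, -74.0, -73.0, "fr")]
inside = second_language((45.5, -73.6), fences)
outside = second_language((40.0, -80.0), fences)
unavailable = second_language(None, fences)
```

A production system would more likely use polygonal boundaries and a geodesic library; the rectangle is used here only to keep the sketch self-contained.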


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARZAD KAZEMINEZHAD whose telephone number is (571)270-5860. The examiner can normally be reached 10:30 am to 11:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Paras D Shah can be reached at (571) 270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Farzad Kazeminezhad/
Art Unit 2653
November 29th 2025.
Prosecution Timeline

Jun 09, 2023: Application Filed
Aug 13, 2025: Non-Final Rejection mailed — §103
Oct 08, 2025: Interview Requested
Oct 20, 2025: Examiner Interview Summary
Oct 20, 2025: Applicant Interview (Telephonic)
Oct 21, 2025: Response Filed
Dec 03, 2025: Final Rejection mailed — §103
Jan 27, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

17/819,177 (Patent 12626713): DYNAMIC VOICE NULLFORMER BY DYNAMICALLY ADJUSTING AMOUNTS IN A MIXTURE OF A FIRST AND A SECOND VOICE BEAMFORMER. Granted May 12, 2026; 3y 9m to grant.
17/891,676 (Patent 12625672): SERVER AND ELECTRONIC DEVICE FOR PROCESSING USER UTTERANCE AND OPERATING METHOD THEREOF BY SELECTING AMONG A PLURALITY OF ELECTRONIC DEVICES ONE DEVICE BASED ON A SUM OF SCORES. Granted May 12, 2026; 3y 8m to grant.
18/962,575 (Patent 12626071): GENERATING SUMMARIES OF TEXTS USING LARGE LANGUAGE MODELS. Granted May 12, 2026; 1y 5m to grant.
18/490,047 (Patent 12608565): MULTIMODAL TEXT-TO-TEXT NEURAL MACHINE TRANSLATION USING NOISE AND DOMAIN ADAPTERS AND TRAINING NOISE ADAPTERS WHILE A DOMAIN ADAPTER IS FROZEN. Granted Apr 21, 2026; 2y 6m to grant.
18/319,946 (Patent 12603080): GAZE-BASED AND AUGMENTED AUTOMATIC INTERPRETATION METHOD AND SYSTEM. Granted Apr 14, 2026; 2y 11m to grant.
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 71%
With Interview (+67.7%): 99%
Median Time to Grant: 3y 6m (~6m remaining)
PTA Risk: Moderate
Based on 536 resolved cases by this examiner. Grant probability derived from career allowance rate.