Last updated: May 29, 2026
Application No. 18/723,638
METHOD AND APPARATUS FOR SYSTEM COMMAND VERIFICATION

Non-Final OA §103
Filed: Jun 24, 2024
Priority: Dec 23, 2021 (provisional 63/293,184 +1 more)
Examiner: JONES, HEATHER RAE
Art Unit: 2481
Tech Center: 2400 (Computer Networks)
Assignee: Stoneridge Electronics AB
OA Round: 2 (Non-Final)
Interview Optional

Interview lift of +5.6% is below the 15.0% threshold, so a written response is recommended.
Based on 750 resolved cases, 2023–2026
Examiner Intelligence

JONES, HEATHER RAE
Career Allowance Rate: 68% (above average), 514 granted / 750 resolved, +10.5% vs TC avg
Interview Lift: +5.6% (moderate), comparing resolved cases with and without an interview
Typical timeline: 3y 5m avg prosecution, 15 currently pending
Career history: 775 total applications across all art units
Statute-Specific Performance

§101: 0.4% (-39.6% vs TC avg)
§103: 80.7% (+40.7% vs TC avg)
§102: 13.5% (-26.5% vs TC avg)
§112: 0.1% (-39.9% vs TC avg)
Based on career data from 750 resolved cases
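The figures above can be cross-checked with a few lines of arithmetic. The sketch below is an illustration only and is not part of the source data: it recomputes the allowance rate from 514 granted out of 750 resolved, backs out the Tech Center average implied by each statute delta, and applies the 15.0% interview-lift threshold. All function and variable names are hypothetical, and treating a lift equal to the threshold as sufficient is an assumption.

```python
# Illustrative cross-check of the dashboard figures; all names are hypothetical.

GRANTED = 514
RESOLVED = 750
INTERVIEW_LIFT = 5.6    # percentage points, as reported
LIFT_THRESHOLD = 15.0   # percentage points, as reported

# (examiner rate %, reported delta vs TC avg %) per statute
STATUTES = {
    "101": (0.4, -39.6),
    "103": (80.7, 40.7),
    "102": (13.5, -26.5),
    "112": (0.1, -39.9),
}


def allowance_rate(granted, resolved):
    """Career allowance rate as a percentage."""
    return 100.0 * granted / resolved


def implied_tc_average(rate, delta):
    """Tech Center average implied by a rate and its reported delta."""
    return round(rate - delta, 1)


def recommend_interview(lift, threshold):
    """True when the lift meets the threshold (assumed to be inclusive)."""
    return lift >= threshold


rate = allowance_rate(GRANTED, RESOLVED)
tc_avgs = {s: implied_tc_average(r, d) for s, (r, d) in STATUTES.items()}

print(f"allowance rate: {rate:.1f}%")
print("implied TC averages:", tc_avgs)
print("interview recommended:", recommend_interview(INTERVIEW_LIFT, LIFT_THRESHOLD))
```

With these inputs the allowance rate works out to about 68.5%, and each statute delta implies the same Tech Center estimate of 40.0%.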
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments with respect to claims 20-43 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 20-31, 33-40, and 42 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (U.S. Patent Application Publication 2014/0214424) in view of Lee et al. (U.S. Patent Application Publication 2020/0241824) in view of Kaja et al. (U.S. Patent Application Publication 2020/0135190).
Regarding claim 20, Wang et al. discloses a commercial vehicle (Fig. 1 – vehicle 108) comprising: a vehicle cab including a camera system configured to provide a video feed of at least one field of view, the at least one field of view including at least one of a vehicle operator seat and an expected head position of a vehicle operator in the vehicle operator seat (Fig. 1 – imaging device 104; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0024] – imaging device 104 might include a camera sensor mounted at the rearview mirror position – in such an example, a rearview mirror mounted camera sensor may be able to capture the view of all occupants in the vehicle); at least one microphone configured to record an audio feed of sounds within the vehicle cab (Fig. 1 – microphone device 106; paragraph [0025] – microphone device 106 may be configured to capture audio data from one or more occupants 110); and a controller (Fig. 1 – IVI system 100) configured to: receive the recorded sounds and the video feed (Fig. 2; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108; paragraph [0029] – process 200 may begin at block 202, “RECEIVE AUDIO DATA”, where audio data may be received – for example, the received audio data may include spoken input from one or more occupants of a vehicle; paragraph [0030] – processing may continue from operation 202 to operation 204, “RECEIVE VISUAL DATA”, where visual data may be received – for example, the received visual data may include video of the one or more occupants of the vehicle); identify, based on the received recorded sounds, at least one voice command, wherein the at least one voice is from a source (Fig. 2; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data); use the video feed to determine whether a vehicle operator is the source of the at least one voice command (Figs. 2 and 3; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one another and/or communication with control system 308; paragraph [0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined); and based on the determination indicating that the vehicle operator is the source, implement the voice command (Figs. 2 and 3; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one another and/or communication with control system 308; paragraph [0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined (the driver is one of the occupants of the vehicle); paragraph [0056] – when a face is detected and recognized, control system 308 might adapt any response to a command to adjust the response based at least in part on the identity of the recognized occupant). However, Wang et al. fails to disclose a plurality of displays within the vehicle cab; a controller configured to: identify at least one voice command to adjust what is shown on one of the plurality of displays; determine whether the vehicle operator is gazing at one of the plurality of displays; based on the determination, and indicating that the vehicle operator is gazing at a particular one of the plurality of displays, implement the voice command to adjust what is shown on the particular one of the plurality of displays; and based on the determination indicating that the vehicle operator is not the source, prevent implementation of the voice command.
Referring to the Lee et al. reference, Lee et al. discloses a vehicle comprising: a plurality of displays within the vehicle cab (Figs. 3A, 3B, and 4; paragraph [0007] – a display system in a vehicle, the system comprising: one or more displays, at least one sensor configured to determine a user’s line of sight, a controller configured to determine an active zone and a non-active zone of the one or more displays based on the user’s line of sight, wherein the one or more displays are configured to operate the active zone at an enhanced level as compared to the non-active zone; paragraph [0008] – the display may be configured to display a single-screen display or a multi-screen display – that is, in a multi-screen display, the single physical screen can include multiple displays that are managed as separate logical displays; paragraph [0032] – a spoken command, such as “zoom in”, may be programmed as a voice command to activate the magnification of the active zone, while maintaining line of sight on the content); a controller configured to: identify at least one voice command to adjust what is shown on one of the plurality of displays (Figs. 3A, 3B, and 4; paragraph [0007] – a display system in a vehicle, the system comprising: one or more displays, at least one sensor configured to determine a user’s line of sight, a controller configured to determine an active zone and a non-active zone of the one or more displays based on the user’s line of sight, wherein the one or more displays are configured to operate the active zone at an enhanced level as compared to the non-active zone; paragraph [0008] – the display may be configured to display a single-screen display or a multi-screen display – that is, in a multi-screen display, the single physical screen can include multiple displays that are managed as separate logical displays; paragraph [0032] – a spoken command, such as “zoom in”, may be programmed as a voice command to activate the magnification of the active zone, while maintaining line of sight on the content); determine whether the vehicle operator is gazing at one of the plurality of displays (Figs. 3A, 3B, and 4; paragraph [0007] – a display system in a vehicle, the system comprising: one or more displays, at least one sensor configured to determine a user’s line of sight, a controller configured to determine an active zone and a non-active zone of the one or more displays based on the user’s line of sight, wherein the one or more displays are configured to operate the active zone at an enhanced level as compared to the non-active zone; paragraph [0008] – the display may be configured to display a single-screen display or a multi-screen display – that is, in a multi-screen display, the single physical screen can include multiple displays that are managed as separate logical displays; paragraph [0032] – a spoken command, such as “zoom in”, may be programmed as a voice command to activate the magnification of the active zone, while maintaining line of sight on the content); and based on the determination, and indicating that the vehicle operator is gazing at a particular one of the plurality of displays, implement the voice command to adjust what is shown on the particular one of the plurality of displays (Figs. 3A, 3B, and 4; paragraph [0007] – a display system in a vehicle, the system comprising: one or more displays, at least one sensor configured to determine a user’s line of sight, a controller configured to determine an active zone and a non-active zone of the one or more displays based on the user’s line of sight, wherein the one or more displays are configured to operate the active zone at an enhanced level as compared to the non-active zone; paragraph [0008] – the display may be configured to display a single-screen display or a multi-screen display – that is, in a multi-screen display, the single physical screen can include multiple displays that are managed as separate logical displays; paragraph [0032] – a spoken command, such as “zoom in”, may be programmed as a voice command to activate the magnification of the active zone, while maintaining line of sight on the content).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have implemented, in the vehicle disclosed by Wang et al., the voice command to adjust what is shown on a particular one of the plurality of displays based on the determination indicating that the vehicle operator is gazing at that display, as disclosed by Lee et al., in order to reduce driver distractions. However, Wang et al. in view of Lee et al. fails to disclose: based on the determination indicating that the vehicle operator is not the source, prevent implementation of the voice command.
Referring to the Kaja et al. reference, Kaja et al. discloses a vehicle comprising: use the video feed to determine whether a vehicle operator is the source of the at least one voice command (paragraph [0024] – the operations of the process 200 may be applied to various situations – for instance, multiple occupants/users may share a ride in the vehicle 102 – among those users, a driver sitting on the driver’s seat may be an authorized user for certain voice commands such as playing messages or load emails, and passenger sitting at the rear-right seat is an unauthorized user for such commands – for instance, when the computing platform 104 detects a voice command such as “play my message”, it is important to determine the source/identity of the user who made such command and verify if the user is authorized to do so before executing it for privacy and security concerns; paragraph [0025] – when multiple users are present in the vehicle cabin, facial recognition alone may be insufficient for identification purposes as it may still unclear which user made the voice command – therefore, knowing the location of the user who made the voice command may be helpful in this situation - for instance, if the “play my message” voice command is made by the unauthorized passenger sitting on the rear-right seat, the computing platform 104 may detects his/her location via the microphone 126 and only perform facial recognition on the image for the rear-right seat passenger - in this case, the computing platform 104 may decline to execute the voice command responsive to the authentication fails, even if the authorized driver for such voice command is also in the image captured via the camera 118 - however, if the command is made by the driver, the authentication will succeed under the same principle and the computing platform 104 may proceed to execute the voice command); based on the determination indicating that the vehicle operator is the source, implement the voice command (paragraph [0024] – the operations of the process 200 may be applied to various situations – for instance, multiple occupants/users may share a ride in the vehicle 102 – among those users, a driver sitting on the driver’s seat may be an authorized user for certain voice commands such as playing messages or load emails, and passenger sitting at the rear-right seat is an unauthorized user for such commands – for instance, when the computing platform 104 detects a voice command such as “play my message”, it is important to determine the source/identity of the user who made such command and verify if the user is authorized to do so before executing it for privacy and security concerns; paragraph [0025] – when multiple users are present in the vehicle cabin, facial recognition alone may be insufficient for identification purposes as it may still unclear which user made the voice command – therefore, knowing the location of the user who made the voice command may be helpful in this situation - for instance, if the “play my message” voice command is made by the unauthorized passenger sitting on the rear-right seat, the computing platform 104 may detects his/her location via the microphone 126 and only perform facial recognition on the image for the rear-right seat passenger - in this case, the computing platform 104 may decline to execute the voice command responsive to the authentication fails, even if the authorized driver for such voice command is also in the image captured via the camera 118 - however, if the command is made by the driver, the authentication will succeed under the same principle and the computing platform 104 may proceed to execute the voice command); and based on the determination indicating that the vehicle operator is not the source, prevent implementation of the voice command (paragraph [0024] – the operations of the process 200 may be applied to various situations – for instance, multiple occupants/users may share a ride in the vehicle 102 – among those users, a driver sitting on the driver’s seat may be an authorized user for certain voice commands such as playing messages or load emails, and passenger sitting at the rear-right seat is an unauthorized user for such commands – for instance, when the computing platform 104 detects a voice command such as “play my message”, it is important to determine the source/identity of the user who made such command and verify if the user is authorized to do so before executing it for privacy and security concerns; paragraph [0025] – when multiple users are present in the vehicle cabin, facial recognition alone may be insufficient for identification purposes as it may still unclear which user made the voice command – therefore, knowing the location of the user who made the voice command may be helpful in this situation - for instance, if the “play my message” voice command is made by the unauthorized passenger sitting on the rear-right seat, the computing platform 104 may detects his/her location via the microphone 126 and only perform facial recognition on the image for the rear-right seat passenger - in this case, the computing platform 104 may decline to execute the voice command responsive to the authentication fails, even if the authorized driver for such voice command is also in the image captured via the camera 118 - however, if the command is made by the driver, the authentication will succeed under the same principle and the computing platform 104 may proceed to execute the voice command).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have determined whether the vehicle occupant was an authorized user for such voice commands, as disclosed by Kaja et al., in the vehicle disclosed by Wang et al. in view of Lee et al., in order to eliminate security and safety concerns of the vehicle occupants.
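The controller logic recited in claim 20, as mapped above across the three references, can be sketched as a short decision function. This is an illustrative reading of the claim language only, not an implementation from the application or from any cited reference. The `Observation` type, the `handle_command` function, and the outcome strings are hypothetical, and the behavior when the operator is not gazing at any display is an assumption because the quoted claim text does not address it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Observation:
    """Hypothetical per-command inputs derived from the audio and video feeds."""
    command: Optional[str]        # recognized display command, or None
    operator_is_source: bool      # result of the video-based source check
    gazed_display: Optional[str]  # display the operator is looking at, if any


def handle_command(obs: Observation, displays: Dict[str, str]) -> str:
    """Apply the gating described for claim 20 and report the outcome."""
    if obs.command is None:
        return "no-command"
    if not obs.operator_is_source:
        # Limitation mapped to Kaja et al.: block commands from non-operators.
        return "prevented"
    if obs.gazed_display not in displays:
        # Not addressed by the quoted claim text; leaving every display
        # unchanged is an assumption of this sketch.
        return "no-target"
    # Limitation mapped to Lee et al.: act on the display being gazed at.
    displays[obs.gazed_display] = obs.command
    return "implemented"


displays = {"left": "idle", "center": "idle"}
print(handle_command(Observation("zoom in", True, "center"), displays))
print(handle_command(Observation("zoom in", False, "center"), displays))
```

The ordering of the checks matters in this sketch: the source check runs before the gaze check, so a command from a non-operator is prevented regardless of where the operator is looking.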
Regarding claim 21, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claim 20 including that wherein the controller is configured to: detect at least one of a physical action and a physical positioning of the vehicle operator; and base the determination of whether the vehicle operator is the source on whether the at least one voice command corresponds to said at least one of a physical action and a physical positioning (Wang et al.: Figs. 1-3; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one another and/or communication with control system 308; paragraph [0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined; Kaja et al.: paragraph [0024] – the operations of the process 200 may be applied to various situations – for instance, multiple occupants/users may share a ride in the vehicle 102 – among those users, a driver sitting on the driver’s seat may be an authorized user for certain voice commands such as playing messages or load emails, and passenger sitting at the rear-right seat is an unauthorized user for such commands – for instance, when the computing platform 104 detects a voice command such as “play my message”, it is important to determine the source/identity of the user who made such command and verify if the user is authorized to do so before executing it for privacy and security concerns; paragraph [0025] – when multiple users are present in the vehicle cabin, facial recognition alone may be insufficient for identification purposes as it may still unclear which user made the voice command – therefore, knowing the location of the user who made the voice command may be helpful in this situation - for instance, if the “play my message” voice command is made by the unauthorized passenger sitting on the rear-right seat, the computing platform 104 may detects his/her location via the microphone 126 and only perform facial recognition on the image for the rear-right seat passenger - in this case, the computing platform 104 may decline to execute the voice command responsive to the authentication fails, even if the authorized driver for such voice command is also in the image captured via the camera 118 - however, if the command is made by the driver, the authentication will succeed under the same principle and the computing platform 104 may proceed to execute the voice command).
Regarding claim 22, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claims 20 and 21 including that wherein the determination is based on the physical action, and the physical action includes at least one of lip movement, a predefined gesture, a change in head position, a change in gaze direction, a posture alteration, and facial movement (Wang et al.: Figs. 1-3; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one another and/or communication with control system 308; paragraph [0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined; paragraph [0067] – the challenges in lip localization and tracking exist in several aspects - for example, deformable object models may be complex, some face poses and/or lip shapes may not be well known or well-studied, illumination conditions may be subject to frequent change, backgrounds may be complex and/or may be subject to frequent change, lip movement together with head movement may change position frequently or in an unpredicted manner, and/or other factors, such as self-occlusion (head movement changes one’s posture)).
Regarding claim 23, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claims 20 and 21 including that wherein the determination is based on the physical action, the physical action is lip movement, and the controller is configured to determine that the vehicle operator is the source based on lip movement of the vehicle operator corresponding to the at least one voice command (Wang et al.: Figs. 1-4; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one another and/or communication with control system 308; paragraph [0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraphs [0040] and [0041] – speech recognition module 302 identifies the most likely match for what was said, speech recognition module may return what is recognized as an initial text string; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined; paragraph [0067] – the challenges in lip localization and tracking exist in several aspects - for example, deformable object models may be complex, some face poses and/or lip shapes may not be well known or well-studied, illumination conditions may be subject to frequent change, backgrounds may be complex and/or may be subject to frequent change, lip movement together with head movement may change position frequently or in an unpredicted manner, and/or other factors, such as self-occlusion; paragraph [0072] - lip tracking process 400 may include tracking lip contour construction 414 results on motion pictures as lips 402 move - for example, video data image 420 illustrates lip tracking process 400 tracking lip contour construction 414 results as lips 402 close - similarly, video data image 422 illustrates lip tracking process 400 tracking lip contour construction 414 results as lips 402 close - by tracking lip contour construction 414, lip tracking process 400 may be able to tell if a vehicle occupant is speaking or not).
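The idea of lip movement "corresponding to" a voice command in claim 23 can be illustrated with a simple overlap measure between per-frame lip motion and per-frame voice activity. The sketch below is a hypothetical simplification and does not reproduce the lip tracking process 400 of Wang et al. or any method in the application. The function names, the normalized lip-opening inputs, and both thresholds (`min_delta`, `min_score`) are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional


def lip_activity(openings: List[float], min_delta: float = 0.05) -> List[bool]:
    """Mark frame transitions where lip opening changed by more than min_delta."""
    return [abs(b - a) > min_delta for a, b in zip(openings, openings[1:])]


def overlap_score(lip_flags: List[bool], voice_flags: List[bool]) -> float:
    """Fraction of voiced frames in which the lips were also moving."""
    during_voice = [lip for lip, voiced in zip(lip_flags, voice_flags) if voiced]
    return sum(during_voice) / len(during_voice) if during_voice else 0.0


def likely_speaker(
    openings_by_occupant: Dict[str, List[float]],
    voice_flags: List[bool],
    min_score: float = 0.5,
) -> Optional[str]:
    """Return the occupant whose lip motion best overlaps the voiced frames."""
    # lip_activity yields one flag per frame transition, so drop the first
    # voice flag to keep the two sequences aligned.
    scores = {
        name: overlap_score(lip_activity(openings), voice_flags[1:])
        for name, openings in openings_by_occupant.items()
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None


frames = {
    "driver": [0.1, 0.4, 0.1, 0.5, 0.1],     # lips opening and closing
    "passenger": [0.2, 0.2, 0.2, 0.2, 0.2],  # lips still
}
voice = [False, True, True, True, True]
print(likely_speaker(frames, voice))
```

A controller built along these lines would treat the operator as the source only when `likely_speaker` returns the operator; returning `None` when no occupant clears `min_score` keeps ambiguous audio from being attributed to anyone.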
Regarding claim 24, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claims 20 and 21 including that wherein the determination of whether the vehicle operator is the source is based on the physical positioning, and the physical positioning includes a posture or gaze direction of the vehicle operator (Wang et al.: Figs. 1-3; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one another and/or communication with control system 308; paragraph 
[0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined; paragraph [0067] – the challenges in lip localization and tracking exist in several aspects - for example, deformable object models may be complex, some face poses and/or lip shapes may not be well known or well-studied, illumination conditions may be subject to frequent change, backgrounds may be complex and/or may be subject to frequent change, lip movement together with head movement may change position frequently or in an unpredicted manner, and/or other factors, such as self-occlusion (head movement changes one’s posture)).  
Regarding claim 25, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claims 20 and 21 including that wherein the controller is configured to determine which of a plurality of vehicle systems the at least one voice command corresponds to based on the at least one of the physical action and the physical positioning (Wang et al.: Figs. 1-3; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one another and/or communication with control system 308; 
paragraph [0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined; paragraph [0067] – the challenges in lip localization and tracking exist in several aspects - for example, deformable object models may be complex, some face poses and/or lip shapes may not be well known or well-studied, illumination conditions may be subject to frequent change, backgrounds may be complex and/or may be subject to frequent change, lip movement together with head movement may change position frequently or in an unpredicted manner, and/or other factors, such as self-occlusion (head movements changes one’s posture); Kaja et al.: paragraph [0024] – the operations of the process 200 may be applied to various situations – for instance, multiple occupants/users may share a ride in the vehicle 102 – among those users, a driver sitting on the driver’s seat may be an authorized user for certain voice commands such as playing messages or load emails, and passenger sitting at the rear-right seat is an unauthorized user for such commands – for instance, when the computing platform 104 detects a voice command such as “play my message”, it is important to determine the source/identity of the user who made such command and verify if the user is authorized to do so before executing it for privacy and security concerns; paragraph [0025] – when multiple users are present in the vehicle cabin, facial recognition alone may be insufficient for identification purposes as it may still unclear which user made the voice command – therefore, knowing the location of the user who made the voice command may be helpful in this situation - for 
instance, if the “play my message” voice command is made by the unauthorized passenger sitting on the rear-right seat, the computing platform 104 may detects his/her location via the microphone 126 and only perform facial recognition on the image for the rear-right seat passenger - in this case, the computing platform 104 may decline to execute the voice command responsive to the authentication fails, even if the authorized driver for such voice command is also in the image captured via the camera 118 - however, if the command is made by the driver, the authentication will succeed under the same principle and the computing platform 104 may proceed to execute the voice command).  
Regarding claim 26, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claim 20 including that wherein the camera system includes a single camera and wherein the at least one field of view includes each of the vehicle operator seat and at least one vehicle passenger seat (Wang et al.: Fig. 1 – imaging device 104; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0024] – imaging device 104 might include a camera sensor mounted at the rearview mirror position – in such an example, a rearview mirror mounted camera sensor may be able to capture the view of all occupants in the vehicle).  
Regarding claim 27, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claim 20 including that wherein the camera system includes at least two cameras, a first camera defining a field of view including the vehicle operator and a second camera defining a field of view including at least one vehicle passenger seat (Kaja et al.: Fig. 1; paragraph [0016] – the computing platform 104 may further drive or otherwise communicate with one or more interior cameras 118 configured to capture video images of vehicle occupants inside the cabin by way of the video controller 114 – multiple cameras may be used to capture facial images of occupants in different rows).  
Regarding claim 28, Wang et al. discloses a system for a commercial vehicle controller (Fig. 1 - IVI system 100) comprising: at least one audio input configured to connect to an audio sensor and at least one video input (Fig. 1 – imaging device 104 – microphone device 106; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0024] – imaging device 104 might include a camera sensor mounted at the rearview mirror position – in such an example, a rearview mirror mounted camera sensor may be able to capture the view of all occupants in the vehicle; paragraph [0025] – microphone device 106 may be configured to capture audio data from one or more occupants 110); and a processor operatively connected to memory and configured to: identify at least one voice command based on audio received at the audio input (Fig. 
2; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data); use a video feed received from the at least one video input to determine whether a vehicle operator is a source of the at least one voice command (Figs. 
2 and 3; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one another and/or communication with control system 308; paragraph [0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined); and based on the determination indicating that the vehicle operator is the source, implement the at least one voice command (Figs. 
2 and 3; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one another and/or communication with control system 308; paragraph [0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined (the driver is one of the occupants of the vehicle); paragraph [0056] – when a face is detected and recognized, control system 308 might adapt any response to a command to adjust the response based at least in part on the identity of the recognized occupant). However, Wang et al. 
fails to disclose wherein the voice command is a command to adjust what is shown on one of a plurality of displays; determine which one of a plurality of displays within the vehicle cab the vehicle operator is gazing at; based on the determination and indicating that the vehicle operator was gazing at a particular one of the plurality of displays in conjunction with the at least one voice command, implement the at least one voice command to adjust what is shown on the particular one of the plurality of displays; and based on the determination indicating that the vehicle operator is not the source, prevent implementation of the voice command.
Referring to the Lee et al. reference, Lee et al. discloses a system for a vehicle comprising: wherein the voice command is a command to adjust what is shown on one of a plurality of displays (Figs. 3A, 3B, and 4; paragraph [0007] – a display system in a vehicle, the system comprising: one or more displays, at least one sensor configured to determine a user’s line of sight, a controller configured to determine an active zone and a non-active zone of the one or more displays based on the user’s line of sight, wherein the one or more displays are configured to operate the active zone at an enhanced level as compared to the non-active zone; paragraph [0008] – the display may be configured to display a single-screen display or a multi-screen display – that is, in a multi-screen display, the single physical screen can include multiple displays that are managed as separate logical displays; paragraph [0032] – a spoken command, such as “zoom in”, may be programmed as a voice command to activate the magnification of the active zone, while maintaining line of sight on the content); determine which one of a plurality of displays within the vehicle cab the vehicle operator is gazing at (Figs. 
3A, 3B, and 4; paragraph [0007] – a display system in a vehicle, the system comprising: one or more displays, at least one sensor configured to determine a user’s line of sight, a controller configured to determine an active zone and a non-active zone of the one or more displays based on the user’s line of sight, wherein the one or more displays are configured to operate the active zone at an enhanced level as compared to the non-active zone; paragraph [0008] – the display may be configured to display a single-screen display or a multi-screen display – that is, in a multi-screen display, the single physical screen can include multiple displays that are managed as separate logical displays; paragraph [0032] – a spoken command, such as “zoom in”, may be programmed as a voice command to activate the magnification of the active zone, while maintaining line of sight on the content); based on the determination and indicating that the vehicle operator was gazing at a particular one of the plurality of displays in conjunction with the at least one voice command, implement the at least one voice command to adjust what is shown on the particular one of the plurality of displays (Figs. 
3A, 3B, and 4; paragraph [0007] – a display system in a vehicle, the system comprising: one or more displays, at least one sensor configured to determine a user’s line of sight, a controller configured to determine an active zone and a non-active zone of the one or more displays based on the user’s line of sight, wherein the one or more displays are configured to operate the active zone at an enhanced level as compared to the non-active zone; paragraph [0008] – the display may be configured to display a single-screen display or a multi-screen display – that is, in a multi-screen display, the single physical screen can include multiple displays that are managed as separate logical displays; paragraph [0032] – a spoken command, such as “zoom in”, may be programmed as a voice command to activate the magnification of the active zone, while maintaining line of sight on the content).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have implemented, based on a determination indicating that the vehicle operator is gazing at a particular one of the plurality of displays, the voice command to adjust what is shown on the particular one of the plurality of displays as disclosed by Lee et al. in the system disclosed by Wang et al. in order to reduce driver distractions.  However, Wang et al. in view of Lee et al. fails to disclose, based on the determination indicating that the vehicle operator is not the source, preventing implementation of the voice command.
Referring to the Kaja et al. reference, Kaja et al. discloses a system for a vehicle controller comprising: use the video feed to determine whether a vehicle operator is the source of the at least one voice command (paragraph [0024] – the operations of the process 200 may be applied to various situations – for instance, multiple occupants/users may share a ride in the vehicle 102 – among those users, a driver sitting on the driver’s seat may be an authorized user for certain voice commands such as playing messages or load emails, and passenger sitting at the rear-right seat is an unauthorized user for such commands – for instance, when the computing platform 104 detects a voice command such as “play my message”, it is important to determine the source/identity of the user who made such command and verify if the user is authorized to do so before executing it for privacy and security concerns; paragraph [0025] – when multiple users are present in the vehicle cabin, facial recognition alone may be insufficient for identification purposes as it may still unclear which user made the voice command – therefore, knowing the location of the user who made the voice command may be helpful in this situation - for instance, if the “play my message” voice command is made by the unauthorized passenger sitting on the rear-right seat, the computing platform 104 may detects his/her location via the microphone 126 and only perform facial recognition on the image for the rear-right seat passenger - in this case, the computing platform 104 may decline to execute the voice command responsive to the authentication fails, even if the authorized driver for such voice command is also in the image captured via the camera 118 - however, if the command is made by the driver, the authentication will succeed under the same principle and the computing platform 104 may proceed to execute the voice command); based on the determination indicating that the vehicle operator is the source, implement 
the voice command (paragraph [0024] – the operations of the process 200 may be applied to various situations – for instance, multiple occupants/users may share a ride in the vehicle 102 – among those users, a driver sitting on the driver’s seat may be an authorized user for certain voice commands such as playing messages or load emails, and passenger sitting at the rear-right seat is an unauthorized user for such commands – for instance, when the computing platform 104 detects a voice command such as “play my message”, it is important to determine the source/identity of the user who made such command and verify if the user is authorized to do so before executing it for privacy and security concerns; paragraph [0025] – when multiple users are present in the vehicle cabin, facial recognition alone may be insufficient for identification purposes as it may still unclear which user made the voice command – therefore, knowing the location of the user who made the voice command may be helpful in this situation - for instance, if the “play my message” voice command is made by the unauthorized passenger sitting on the rear-right seat, the computing platform 104 may detects his/her location via the microphone 126 and only perform facial recognition on the image for the rear-right seat passenger - in this case, the computing platform 104 may decline to execute the voice command responsive to the authentication fails, even if the authorized driver for such voice command is also in the image captured via the camera 118 - however, if the command is made by the driver, the authentication will succeed under the same principle and the computing platform 104 may proceed to execute the voice command); and based on the determination indicating that the vehicle operator is not the source, prevent implementation of the voice command (paragraph [0024] – the operations of the process 200 may be applied to various situations – for instance, multiple occupants/users may share a ride in the 
vehicle 102 – among those users, a driver sitting on the driver’s seat may be an authorized user for certain voice commands such as playing messages or load emails, and passenger sitting at the rear-right seat is an unauthorized user for such commands – for instance, when the computing platform 104 detects a voice command such as “play my message”, it is important to determine the source/identity of the user who made such command and verify if the user is authorized to do so before executing it for privacy and security concerns; paragraph [0025] – when multiple users are present in the vehicle cabin, facial recognition alone may be insufficient for identification purposes as it may still unclear which user made the voice command – therefore, knowing the location of the user who made the voice command may be helpful in this situation - for instance, if the “play my message” voice command is made by the unauthorized passenger sitting on the rear-right seat, the computing platform 104 may detects his/her location via the microphone 126 and only perform facial recognition on the image for the rear-right seat passenger - in this case, the computing platform 104 may decline to execute the voice command responsive to the authentication fails, even if the authorized driver for such voice command is also in the image captured via the camera 118 - however, if the command is made by the driver, the authentication will succeed under the same principle and the computing platform 104 may proceed to execute the voice command).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have determined whether the vehicle occupant was an authorized user for such voice commands as disclosed by Kaja et al. in the system disclosed by Wang et al. in view of Lee et al. in order to eliminate security and safety concerns of the vehicle occupants.
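For illustration only, the combined behavior at issue in claim 28 (implement a display command on the display the operator is gazing at when the operator is the source, and prevent implementation otherwise) can be sketched as below. This is a minimal sketch under stated assumptions and does not reproduce any cited reference or the application: the `Cab` class, the function `handle_display_command`, the display identifiers, and the `"driver"` label are hypothetical, and the speaker and gaze inputs are assumed to come from upstream video analysis.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Cab:
    # Maps a hypothetical display id to the content it currently shows.
    displays: Dict[str, str] = field(default_factory=dict)


def handle_display_command(cab: Cab,
                           speaker: Optional[str],
                           gazed_display: Optional[str],
                           new_content: str,
                           operator: str = "driver") -> bool:
    """Apply a display command only if the operator spoke it.

    Returns True when the command was implemented, False when it was prevented.
    """
    if speaker != operator:
        return False  # operator is not the source: prevent implementation
    if gazed_display not in cab.displays:
        return False  # no known display under the operator's gaze
    # Adjust only the display the operator was gazing at.
    cab.displays[gazed_display] = new_content
    return True


cab = Cab(displays={"left_mirror": "wide view", "center": "map"})
applied = handle_display_command(cab, "driver", "left_mirror", "zoomed view")
blocked = handle_display_command(cab, "front_passenger", "center", "messages")
```

Returning a boolean rather than raising keeps the refusal path explicit, which mirrors the claim's separate "implement" and "prevent" branches.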
Regarding claim 29, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claim 28 including that wherein the controller is configured to base the determination of whether the vehicle operator is the source on whether a physical action of the vehicle operator performed during the at least one voice command corresponds to the at least one voice command (Wang et al.: Figs. 1-3; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one another and/or 
communication with control system 308; paragraph [0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined; Kaja et al.: paragraph [0024] – the operations of the process 200 may be applied to various situations – for instance, multiple occupants/users may share a ride in the vehicle 102 – among those users, a driver sitting on the driver’s seat may be an authorized user for certain voice commands such as playing messages or load emails, and passenger sitting at the rear-right seat is an unauthorized user for such commands – for instance, when the computing platform 104 detects a voice command such as “play my message”, it is important to determine the source/identity of the user who made such command and verify if the user is authorized to do so before executing it for privacy and security concerns; paragraph [0025] – when multiple users are present in the vehicle cabin, facial recognition alone may be insufficient for identification purposes as it may still unclear which user made the voice command – therefore, knowing the location of the user who made the voice command may be helpful in this situation - for instance, if the “play my message” voice command is made by the unauthorized passenger sitting on the rear-right seat, the computing platform 104 may detects his/her location via the microphone 126 and only perform facial recognition on the image for the rear-right seat passenger - in this case, the computing platform 104 may decline to execute the voice command responsive to the authentication fails, even if the authorized driver for such voice command is also in the image captured via the camera 
118 - however, if the command is made by the driver, the authentication will succeed under the same principle and the computing platform 104 may proceed to execute the voice command).  
Regarding claim 30, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claims 28 and 29 including that wherein the physical action includes at least one of lip movement, a predefined gesture, a change in head position, a change in gaze direction, a posture alteration, and facial movement (Wang et al.: Figs. 1-3; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one another and/or communication with control system 308; paragraph [0036] – in 
addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined; paragraph [0067] – the challenges in lip localization and tracking exist in several aspects - for example, deformable object models may be complex, some face poses and/or lip shapes may not be well known or well-studied, illumination conditions may be subject to frequent change, backgrounds may be complex and/or may be subject to frequent change, lip movement together with head movement may change position frequently or in an unpredicted manner, and/or other factors, such as self-occlusion (head movement changes one’s posture)).
Regarding claim 31, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claims 28-30 including that wherein the physical action is the lip movement, and the controller is configured to convert the lip movement to words, and base the determination of whether the vehicle operator is the source based on whether the words match the at least one voice command (Wang et al.: Figs. 1-4; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one another and/or
communication with control system 308; paragraph [0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraphs [0040] and [0041] – speech recognition module 302 identifies the most likely match for what was said, speech recognition module may return what is recognized as an initial text string; [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined; paragraph [0067] – the challenges in lip localization and tracking exist in several aspects - for example, deformable object models may be complex, some face poses and/or lip shapes may not be well known or well-studied, illumination conditions may be subject to frequent change, backgrounds may be complex and/or may be subject to frequent change, lip movement together with head movement may change position frequently or in an unpredicted manner, and/or other factors, such as self-occlusion; paragraph [0072] - lip tracking process 400 may include tracking lip contour construction 414 results on motion pictures as lips 402 move - for example, video data image 420 illustrates lip tracking process 400 tracking lip contour construction 414 results as lips 402 close - similarly, video data image 422 illustrates lip tracking process 400 tracking lip contour construction 414 results as lips 402 close - by tracking lip contour construction 414, lip tracking process 400 may be able to tell if a vehicle occupant is speaking or not).  
Regarding claim 33, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claims 28 and 29 including that wherein the physical action includes at least one of a change in gaze direction and a change in posture, and wherein the controller is configured to determine which of a plurality of vehicle systems the at least one voice command corresponds to based on the at least one of the change in gaze direction and the change in posture (Wang et al.: Figs. 1-3; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip 
tracking module 306 may be capable of communication with one another and/or communication with control system 308; paragraph [0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined; paragraph [0067] – the challenges in lip localization and tracking exist in several aspects - for example, deformable object models may be complex, some face poses and/or lip shapes may not be well known or well-studied, illumination conditions may be subject to frequent change, backgrounds may be complex and/or may be subject to frequent change, lip movement together with head movement may change position frequently or in an unpredicted manner, and/or other factors, such as self-occlusion (head movements change one’s posture)).
Regarding claim 34, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claim 28 including that wherein: the video feed includes at least one audio input configured to connect to an audio sensor and at least one video input (Wang et al.: Fig. 1 – imaging device 104 – microphone device 106; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0024] – imaging device 104 might include a camera sensor mounted at the rearview mirror position – in such an example, a rearview mirror mounted camera sensor may be able to capture the view of all occupants in the vehicle; paragraph [0025] – microphone device 106 may be configured to capture audio data from one or more occupants 110; Kaja et al: Fig. 1; paragraph [0016] – the computing platform 104 may further drive or otherwise communicate with one or more interior cameras 118 configured to capture video images of vehicle occupants inside the cabin by way of the video controller 114 – multiple cameras may be used to capture facial images of occupants in different rows; paragraph [0017] – the microphone 126 may be provided with direction-detecting features to detect the direction and/or location of the audio input – the microphone 126 may be a single microphone assembly or may include multiple microphone assemblies); the at least one video input includes at least two video inputs; and a first of the at least two video inputs is configured to receive a video feed defining a field of view including an expected position of the vehicle operator (Wang et al.: Fig. 
1 – imaging device 104 – microphone device 106; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0024] – imaging device 104 might include a camera sensor mounted at the rearview mirror position – in such an example, a rearview mirror mounted camera sensor may be able to capture the view of all occupants in the vehicle; paragraph [0025] – microphone device 106 may be configured to capture audio data from one or more occupants 110; Kaja et al: Fig. 1; paragraph [0016] – the computing platform 104 may further drive or otherwise communicate with one or more interior cameras 118 configured to capture video images of vehicle occupants inside the cabin by way of the video controller 114 – multiple cameras may be used to capture facial images of occupants in different rows; paragraph [0017] – the microphone 126 may be provided with direction-detecting features to detect the direction and/or location of the audio input – the microphone 126 may be a single microphone assembly or may include multiple microphone assemblies). 
Regarding claim 35, Wang et al. discloses a method for verifying an audio command in a commercial vehicle comprising: receiving an audio feed and at least one video feed at a controller (Fig. 1 – imaging device 104 – microphone device 106; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0024] – imaging device 104 might include a camera sensor mounted at the rearview mirror position – in such an example, a rearview mirror mounted camera sensor may be able to capture the view of all occupants in the vehicle; paragraph [0025] – microphone device 106 may be configured to capture audio data from one or more occupants 110); detecting an audio command within the audio feed using the controller (Fig. 2; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data); using the video feed to determine whether a vehicle operator is a source of the audio command (Figs. 
2 and 3; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one another and/or communication with control system 308; paragraph [0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined); based on the determination indicating that the vehicle operator is the source, implementing the audio command (Figs.
2 and 3; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one another and/or communication with control system 308; paragraph [0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined (the driver is one of the occupants of the vehicle); paragraph [0056] – when a face is detected and recognized, control system 308 might adapt any response to a command to adjust the response based at least in part on the identity of the recognized occupant).  However, Wang et al.
fails to disclose determining whether a vehicle operator of the commercial vehicle is gazing at one of a plurality of displays within the commercial vehicle; based on the determination indicating that the vehicle operator was gazing at a particular one of the plurality of displays in conjunction with the at least one voice command, implementing the audio command to adjust what is shown on the particular one of the plurality of displays; and based on the determination indicating that the vehicle operator is not the source, preventing implementation of the voice command.
Referring to the Lee et al. reference, Lee et al. discloses a method for audio commands in a vehicle comprising: determining whether a vehicle operator of the commercial vehicle is gazing at one of a plurality of displays within the commercial vehicle (Figs. 3A, 3B, and 4; paragraph [0007] – a display system in a vehicle, the system comprising: one or more displays, at least one sensor configured to determine a user’s line of sight, a controller configured to determine an active zone and a non-active zone of the one or more displays based on the user’s line of sight, wherein the one or more displays are configured to operate the active zone at an enhanced level as compared to the non-active zone; paragraph [0008] – the display may be configured to display a single-screen display or a multi-screen display – that is, in a multi-screen display, the single physical screen can include multiple displays that are managed as separate logical displays; paragraph [0032] – a spoken command, such as “zoom in”, may be programmed as a voice command to activate the magnification of the active zone, while maintaining line of sight on the content); and based on the determination indicating that the vehicle operator was gazing at a particular one of the plurality of displays in conjunction with the at least one voice command, implementing the audio command to adjust what is shown on the particular one of the plurality of displays (Figs.
3A, 3B, and 4; paragraph [0007] – a display system in a vehicle, the system comprising: one or more displays, at least one sensor configured to determine a user’s line of sight, a controller configured to determine an active zone and a non-active zone of the one or more displays based on the user’s line of sight, wherein the one or more displays are configured to operate the active zone at an enhanced level as compared to the non-active zone; paragraph [0008] – the display may be configured to display a single-screen display or a multi-screen display – that is, in a multi-screen display, the single physical screen can include multiple displays that are managed as separate logical displays; paragraph [0032] – a spoken command, such as “zoom in”, may be programmed as a voice command to activate the magnification of the active zone, while maintaining line of sight on the content).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have, based on the determination indicating that the vehicle operator is gazing at a particular one of the plurality of displays, implemented the voice command to adjust what is shown on the particular one of the plurality of displays, as disclosed by Lee et al., in the method disclosed by Wang et al. in order to reduce driver distractions.  However, Wang et al. in view of Lee et al. fails to disclose, based on the determination indicating that the vehicle operator is not the source, preventing implementation of the voice command.
Referring to the Kaja et al. reference, Kaja et al. discloses a method for verifying an audio command in a vehicle comprising: use the video feed to determine whether a vehicle operator is the source of an audio command (paragraph [0024] – the operations of the process 200 may be applied to various situations – for instance, multiple occupants/users may share a ride in the vehicle 102 – among those users, a driver sitting on the driver’s seat may be an authorized user for certain voice commands such as playing messages or load emails, and passenger sitting at the rear-right seat is an unauthorized user for such commands – for instance, when the computing platform 104 detects a voice command such as “play my message”, it is important to determine the source/identity of the user who made such command and verify if the user is authorized to do so before executing it for privacy and security concerns; paragraph [0025] – when multiple users are present in the vehicle cabin, facial recognition alone may be insufficient for identification purposes as it may still unclear which user made the voice command – therefore, knowing the location of the user who made the voice command may be helpful in this situation - for instance, if the “play my message” voice command is made by the unauthorized passenger sitting on the rear-right seat, the computing platform 104 may detects his/her location via the microphone 126 and only perform facial recognition on the image for the rear-right seat passenger - in this case, the computing platform 104 may decline to execute the voice command responsive to the authentication fails, even if the authorized driver for such voice command is also in the image captured via the camera 118 - however, if the command is made by the driver, the authentication will succeed under the same principle and the computing platform 104 may proceed to execute the voice command); based on the determination indicating that the vehicle operator is the source, 
implement the audio command (paragraph [0024] – the operations of the process 200 may be applied to various situations – for instance, multiple occupants/users may share a ride in the vehicle 102 – among those users, a driver sitting on the driver’s seat may be an authorized user for certain voice commands such as playing messages or load emails, and passenger sitting at the rear-right seat is an unauthorized user for such commands – for instance, when the computing platform 104 detects a voice command such as “play my message”, it is important to determine the source/identity of the user who made such command and verify if the user is authorized to do so before executing it for privacy and security concerns; paragraph [0025] – when multiple users are present in the vehicle cabin, facial recognition alone may be insufficient for identification purposes as it may still unclear which user made the voice command – therefore, knowing the location of the user who made the voice command may be helpful in this situation - for instance, if the “play my message” voice command is made by the unauthorized passenger sitting on the rear-right seat, the computing platform 104 may detects his/her location via the microphone 126 and only perform facial recognition on the image for the rear-right seat passenger - in this case, the computing platform 104 may decline to execute the voice command responsive to the authentication fails, even if the authorized driver for such voice command is also in the image captured via the camera 118 - however, if the command is made by the driver, the authentication will succeed under the same principle and the computing platform 104 may proceed to execute the voice command); and based on the determination indicating that the vehicle operator is not the source, prevent implementation of the audio command (paragraph [0024] – the operations of the process 200 may be applied to various situations – for instance, multiple occupants/users may share a 
ride in the vehicle 102 – among those users, a driver sitting on the driver’s seat may be an authorized user for certain voice commands such as playing messages or load emails, and passenger sitting at the rear-right seat is an unauthorized user for such commands – for instance, when the computing platform 104 detects a voice command such as “play my message”, it is important to determine the source/identity of the user who made such command and verify if the user is authorized to do so before executing it for privacy and security concerns; paragraph [0025] – when multiple users are present in the vehicle cabin, facial recognition alone may be insufficient for identification purposes as it may still unclear which user made the voice command – therefore, knowing the location of the user who made the voice command may be helpful in this situation - for instance, if the “play my message” voice command is made by the unauthorized passenger sitting on the rear-right seat, the computing platform 104 may detects his/her location via the microphone 126 and only perform facial recognition on the image for the rear-right seat passenger - in this case, the computing platform 104 may decline to execute the voice command responsive to the authentication fails, even if the authorized driver for such voice command is also in the image captured via the camera 118 - however, if the command is made by the driver, the authentication will succeed under the same principle and the computing platform 104 may proceed to execute the voice command).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have determined whether the vehicle occupant was an authorized user for such audio commands, as disclosed by Kaja et al., in the method disclosed by Wang et al. in view of Lee et al. in order to eliminate security and safety concerns of the vehicle occupants.
Regarding claim 36, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claim 35 including that the method further comprises detecting at least one of a physical positioning and a physical action of the vehicle operator; wherein said using the video feed to determine whether a vehicle operator is the source of the audio command is based on said at least one of a physical positioning and a physical action of the vehicle operator (Wang et al.: Figs. 1-3; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and
lip tracking module 306 may be capable of communication with one another and/or communication with control system 308; paragraph [0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined; Kaja et al.: paragraph [0024] – the operations of the process 200 may be applied to various situations – for instance, multiple occupants/users may share a ride in the vehicle 102 – among those users, a driver sitting on the driver’s seat may be an authorized user for certain voice commands such as playing messages or load emails, and passenger sitting at the rear-right seat is an unauthorized user for such commands – for instance, when the computing platform 104 detects a voice command such as “play my message”, it is important to determine the source/identity of the user who made such command and verify if the user is authorized to do so before executing it for privacy and security concerns; paragraph [0025] – when multiple users are present in the vehicle cabin, facial recognition alone may be insufficient for identification purposes as it may still unclear which user made the voice command – therefore, knowing the location of the user who made the voice command may be helpful in this situation - for instance, if the “play my message” voice command is made by the unauthorized passenger sitting on the rear-right seat, the computing platform 104 may detects his/her location via the microphone 126 and only perform facial recognition on the image for the rear-right seat passenger - in this case, the computing platform 104 may decline to execute the voice command responsive to the authentication fails, even if the
authorized driver for such voice command is also in the image captured via the camera 118 - however, if the command is made by the driver, the authentication will succeed under the same principle and the computing platform 104 may proceed to execute the voice command). 
Regarding claim 37, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claims 35 and 36 including that wherein: the determination of whether the vehicle operator is the source is based on the physical action; and the physical action comprises lip movement in the video feed (Wang et al.: Figs. 1-3; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one another and/or communication with control system 308; paragraph [0036] – in addition to acoustic
signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined; paragraph [0067] – the challenges in lip localization and tracking exist in several aspects - for example, deformable object models may be complex, some face poses and/or lip shapes may not be well known or well-studied, illumination conditions may be subject to frequent change, backgrounds may be complex and/or may be subject to frequent change, lip movement together with head movement may change position frequently or in an unpredicted manner, and/or other factors, such as self-occlusion (head movements change one’s posture)).
Regarding claim 38, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claims 35-37 including that wherein said using the video feed to determine whether the vehicle operator is the source comprises: converting the lip movement to words; and basing the determination of whether the vehicle operator is the source on whether the words correspond to the audio command (Wang et al.: Figs. 1-4; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one
another and/or communication with control system 308; paragraph [0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraphs [0040] and [0041] – speech recognition module 302 identifies the most likely match for what was said, speech recognition module may return what is recognized as an initial text string; [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined; paragraph [0067] – the challenges in lip localization and tracking exist in several aspects - for example, deformable object models may be complex, some face poses and/or lip shapes may not be well known or well-studied, illumination conditions may be subject to frequent change, backgrounds may be complex and/or may be subject to frequent change, lip movement together with head movement may change position frequently or in an unpredicted manner, and/or other factors, such as self-occlusion; paragraph [0072] - lip tracking process 400 may include tracking lip contour construction 414 results on motion pictures as lips 402 move - for example, video data image 420 illustrates lip tracking process 400 tracking lip contour construction 414 results as lips 402 close - similarly, video data image 422 illustrates lip tracking process 400 tracking lip contour construction 414 results as lips 402 close - by tracking lip contour construction 414, lip tracking process 400 may be able to tell if a vehicle occupant is speaking or not). 
Regarding claim 39, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claims 35 and 36 including that wherein: the determination of whether the vehicle operator is the source is based on the physical action; and the physical action comprises at least one of a change in gaze direction and a change in posture of the vehicle operator (Wang et al.: Figs. 1-3; paragraph [0022] – imaging device 104 may be configured to capture visual data from one or more occupants 110 of vehicle 108 – for example, imaging device 104 may be configured to capture visual data from a driver 112, a front seat passenger 114, from one or more rear seat passenger 116, the like, and/or combinations thereof; paragraph [0026] – for example, IVI system 100 may receive audio data from microphone device 106 and/or visual data from imaging device 104 from one or more occupants 110 of vehicle 108 – a determination may be made regarding which of the one or more occupants 110 of vehicle 108 to associate with the received audio data based at least in part on the received visual data; paragraph [0031] – processing may continue from operation 204 to operation 206, “DETERMINE WHICH OF THE ONE OR MORE OCCUPANTS OF THE VEHICLE TO ASSOCIATE WITH THE RECEIVED AUDIO DATA”, where which one of the one or more occupants of the vehicle to associate with the received audio data may be determined – for example, which of the one or more occupants of the vehicle to associate with the received audio data may be determined based at least in part on the received visual data; paragraph [0035] – IVI system 100 may include a speech recognition module 302, a face detection module 304, a lip tracking module 306, a control system 308, the like, and/or combinations thereof – as illustrated, speech recognition module 302, face detection module 304, and lip tracking module 306 may be capable of communication with one another and/or communication with control system 308; paragraph [0036] – in addition to acoustic signal processing techniques to recognize what command a driver or passenger is issuing, process 300, also may employ visual information processing techniques such as face detection and lip tracking; paragraph [0053] – processing may continue from operation 320 to operation 322, “DETERMINE WHO IS SPEAKING”, where which of the one or more occupants of the vehicle is speaking may be determined; paragraph [0067] – the challenges in lip localization and tracking exist in several aspects - for example, deformable object models may be complex, some face poses and/or lip shapes may not be well known or well-studied, illumination conditions may be subject to frequent change, backgrounds may be complex and/or may be subject to frequent change, lip movement together with head movement may change position frequently or in an unpredicted manner, and/or other factors, such as self-occlusion (head movements change one’s posture)).
Regarding claim 40, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claim 20 including that wherein the command is a command to zoom what is depicted on the particular one of the plurality of displays (Lee et al.: Figs. 3A, 3B, and 4; paragraph [0007] – a display system in a vehicle, the system comprising: one or more displays, at least one sensor configured to determine a user’s line of sight, a controller configured to determine an active zone and a non-active zone of the one or more displays based on the user’s line of sight, wherein the one or more displays are configured to operate the active zone at an enhanced level as compared to the non-active zone; paragraph [0008] – the display may be configured to display a single-screen display or a multi-screen display – that is, in a multi-screen display, the single physical screen can include multiple displays that are managed as separate logical displays; paragraph [0032] – a spoken command, such as “zoom in”, may be programmed as a voice command to activate the magnification of the active zone, while maintaining line of sight on the content).
Regarding claim 42, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claim 35 including that wherein the command is a command to zoom what is depicted on the particular one of the plurality of displays (Lee et al.: Figs. 3A, 3B, and 4; paragraph [0007] – a display system in a vehicle, the system comprising: one or more displays, at least one sensor configured to determine a user’s line of sight, a controller configured to determine an active zone and a non-active zone of the one or more displays based on the user’s line of sight, wherein the one or more displays are configured to operate the active zone at an enhanced level as compared to the non-active zone; paragraph [0008] – the display may be configured to display a single-screen display or a multi-screen display – that is, in a multi-screen display, the single physical screen can include multiple displays that are managed as separate logical displays; paragraph [0032] – a spoken command, such as “zoom in”, may be programmed as a voice command to activate the magnification of the active zone, while maintaining line of sight on the content).
Claim 32 is rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. in view of Kaja et al. as applied to claims 28-31 above, and further in view of Bhattacharya et al. (U.S. Patent Application Publication 2021/0347328).
Regarding claim 32, Wang et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claims 28-31, but fails to disclose wherein the controller is configured to convert lip movements in the video feed to words using a neural network based on machine learning training of the controller from a data set of subtitled film footage.
Referring to the Bhattacharya et al. reference, Bhattacharya et al. discloses a vehicle controller comprising: wherein the controller is configured to convert lip movements in the video feed to words using a neural network based on machine learning training of the controller from a data set (paragraph [0025] - the processing circuitry may authenticate the occupant to authorize a vehicular operation command based on at least one of the first sensor data and the second sensor data - in some embodiments, the processing circuitry may implement a machine learning model to authenticate the occupant - the machine learning model may be a neural network that is trained with a data set including various multimodal data for respective occupants to learn specific audio authentication and vehicular command preferences - in some embodiments, the multimodal data for training may be historical multimodal data of the vehicle; paragraph [0032]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have had the controller be configured to convert lip movements in the video feed to words using a neural network based on machine learning training of the controller from a data set as disclosed by Bhattacharya et al. in the vehicle controller disclosed by Wang et al. in view of Kaja et al. in order to efficiently determine the voice commands with more accuracy.  However, Wang et al. in view of Kaja et al. in view of Bhattacharya et al. still fails to disclose that the data set used for training the neural network was a data set of subtitled film footage.  Official Notice is taken that both the concepts and advantages of using a data set of subtitled film footage for training the neural network are well known in the art.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have used a data set of subtitled film footage for training the neural network in the vehicle controller disclosed by Wang et al. in view of Kaja et al. in view of Bhattacharya et al. in order to properly train the neural network with the most relevant data.
Claims 41 and 43 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. in view of Lee et al. in view of Kaja et al. as applied to claims 20 and 35 above, and further in view of Corrodi et al. (U.S. Patent Application Publication 2022/0292841).
Regarding claim 41, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claim 20, but fails to disclose wherein the command is a command to pan what is depicted on the particular one of the plurality of displays.
Referring to the Corrodi et al. reference, Corrodi et al. discloses a vehicle comprising: wherein the audio command is a command to pan what is depicted on the particular one of the plurality of displays (paragraph [0061] – the ECU 30 is operable to detect video adjustment commands from a vehicle occupant – the adjustment may include adjusting a field of view of the video feed on a particular display 18 (e.g., a cropping adjustment) and/or increasing or decreasing a magnification of the video feed, for example; paragraph [0062] – the command can be received through the user interface 38, as a spoken command through the microphone 42 (e.g., “zoom in”, “pan left”, etc.)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have had the audio command be a command to pan what is depicted on the particular one of the plurality of displays as disclosed by Corrodi et al. in the vehicle disclosed by Wang et al. in view of Lee et al. in view of Kaja et al. in order to adjust the field of view on the display.  
Regarding claim 43, Wang et al. in view of Lee et al. in view of Kaja et al. discloses all of the limitations as previously discussed with respect to claim 35, but fails to disclose wherein the command is a command to pan what is depicted on the particular one of the plurality of displays.
Referring to the Corrodi et al. reference, Corrodi et al. discloses a method comprising: wherein the audio command is a command to pan what is depicted on the particular one of the plurality of displays (paragraph [0061] – the ECU 30 is operable to detect video adjustment commands from a vehicle occupant – the adjustment may include adjusting a field of view of the video feed on a particular display 18 (e.g., a cropping adjustment) and/or increasing or decreasing a magnification of the video feed, for example; paragraph [0062] – the command can be received through the user interface 38, as a spoken command through the microphone 42 (e.g., “zoom in”, “pan left”, etc.)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have had the audio command be a command to pan what is depicted on the particular one of the plurality of displays as disclosed by Corrodi et al. in the method disclosed by Wang et al. in view of Lee et al. in view of Kaja et al. in order to adjust the field of view on the display.  

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HEATHER R JONES whose telephone number is (571)272-7368. The examiner can normally be reached Mon. - Fri.: 9:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, William Vaughn can be reached at (571)272-3922. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HEATHER R JONES/ Primary Examiner, Art Unit 2481
December 12, 2025
Prosecution Timeline

Jun 24, 2024: Application Filed
Jun 05, 2025: Non-Final Rejection mailed (§103)
Sep 05, 2025: Response Filed
Dec 17, 2025: Final Rejection mailed (§103)
Feb 17, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology (based on the 5 most recent grants):

17/708,465, Patent 12634550, VIDEO SYSTEM: 4y 1m to grant, granted May 19, 2026
18/937,860, Patent 12632497, SUMMARY GENERATION BASED ON TRIP: 1y 6m to grant, granted May 19, 2026
18/908,099, Patent 12627894, HEAD-WEARABLE VISUAL RECOGNITION APPARATUS: 1y 7m to grant, granted May 12, 2026
18/216,190, Patent 12587718, SYSTEM AND METHOD FOR PROVIDING DESCRIPTIVE VIDEO: 2y 8m to grant, granted Mar 24, 2026
18/764,211, Patent 12568179, PERSONALIZED VIDEOS FEATURING MULTIPLE PERSONS: 1y 8m to grant, granted Mar 03, 2026
Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 68%
With Interview (+5.6%): 74%
Median Time to Grant: 3y 5m (~1y 6m remaining)
PTA Risk: Moderate

Based on 750 resolved cases by this examiner. Grant probability derived from career allowance rate.