DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on December 23, 2025, has been entered.
Claims 1-11 and 17 have been amended. Claims 1-20 remain pending.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claims 1, 11, and 17 are directed to a vehicle instruction formation method, device, and computer-readable medium for speech instruction control in a vehicle. The claims recite limitations for: obtaining target audio data collected in a vehicle cabin; obtaining a first-type instruction based on the target audio data collected in the vehicle cabin; obtaining lip motion information of in-vehicle members at N positions in a vehicle cabin of the vehicle in a target time period, wherein the lip motion information of the in-vehicle members is obtained when the first-type instruction is recognized from the target audio data, and the target time period corresponds to the first-type instruction in the audio data; matching the first-type instruction with the lip motion information of the in-vehicle members at the N positions in the vehicle cabin; obtaining a target position of the N positions based on a matching result between the lip motion information of the in-vehicle members at the N positions and the first-type instruction, wherein the target position is a position of a target member, of the members at the N positions in the vehicle cabin, whose lip motion information matches the first-type instruction in the matching result; and sending indication information indicating to execute the first-type instruction on a target adjustable component in the target position in the vehicle cabin.
The limitation for “obtaining target audio…” is a data-gathering step that can be achieved by a person listening to riders speaking in a vehicle. The limitation for “obtaining a first-type instruction…” is a data-gathering step that can be achieved by a person listening for instructions or requests spoken by the riders. The limitation for “obtaining lip motion information…” is a data-gathering step that can be achieved by a person listening to people speak while observing the mouth movements of persons at the different positions in the vehicle. The limitation for “the first-type instruction being recognized…” can be achieved by the person hearing what the people are saying and recognizing any instructions spoken within a time period or interval. The limitation for “matching…” can be achieved by the person correlating the speech heard with the observed mouth movements at the different seat positions. The limitation for “obtaining a target position…” can be achieved by the person observing who speaks and where that speaker is located in the vehicle. The limitation for “sending indication… to execute…” can be achieved by the person announcing to the passengers that the uttered command or request will be carried out on an adjustable component at the observed speaker's position. The recited limitations are therefore directed to a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind but for the recitation of the generic control device, memory, processor, medium, and generic computer components. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claims recite an abstract idea.
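For illustration only, the recited sequence of steps can be sketched as follows; every name, data structure, and score in the sketch is a hypothetical assumption and is not drawn from the claims, the specification, or the cited art:

    from dataclasses import dataclass

    @dataclass
    class Instruction:
        text: str     # e.g., "open a window"
        start: float  # target time period within the audio data (seconds)
        end: float

    def match_degree(instruction: Instruction, lip_activity: list[float]) -> float:
        # Placeholder matching degree: fraction of observed lip activity that
        # falls within the instruction's time period. A real matcher would
        # compare visemes to recognized phonemes; this stub only illustrates
        # the recited matching step.
        inside = [t for t in lip_activity if instruction.start <= t <= instruction.end]
        return len(inside) / max(len(lip_activity), 1)

    def target_position(instruction: Instruction,
                        lip_activity_by_position: dict[int, list[float]]) -> int:
        # Match the first-type instruction against lip motion at each of the
        # N positions and take the best-matching position as the target.
        degrees = {pos: match_degree(instruction, ts)
                   for pos, ts in lip_activity_by_position.items()}
        return max(degrees, key=degrees.get)

    # Usage: lip-activity timestamps per seat; only seat 2's activity overlaps
    # the instruction's time period, so seat 2 is selected as the target position.
    cmd = Instruction("open a window", start=1.0, end=2.0)
    print(target_position(cmd, {0: [0.1, 0.3], 1: [3.5, 3.9], 2: [1.1, 1.4, 1.8]}))  # -> 2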
This judicial exception is not integrated into a practical application because the recited generic control device, memory, processor, medium, and generic computer components amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea and are not patent eligible.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, as indicated with respect to integration of the abstract idea into a practical application, the additional elements of the generic control device, memory, processor, medium, and generic computer components used to perform the various steps amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. The claims are not patent eligible.
Dependent claims 2-6, 8-10, 12-15, and 18-20 do not integrate the judicial exception into a practical application and do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The limitations of the dependent claims are directed to mental-process and/or pen-and-paper steps of organizing or manipulating functions and commands for gathering data, observing lip motions, processing speech data, and observing speakers in a vehicle.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zurek et al. (US Patent Application Publication No. 2019/0341044), hereinafter Zurek, in view of Joh (US Patent Application Publication No. 2021/0166683).
Zurek teaches a method and apparatus for using image data to aid voice recognition. Regarding claim 1, Zurek teaches a method performed by a control device for speech instruction control of a vehicle, comprising: obtaining target audio data collected in the vehicle cabin [para 0053 -- The device also receives 704 a first acoustic signal, e.g., speech that is captured by one or both microphones 110, 112 of device 102, from the first individual that includes voice data for that individual. Likewise, the device 102 receives 706 a similar acoustic signal from the second individual, which includes voice data]; obtaining a first-type instruction [para 0053 -- The device also receives 704 a first acoustic signal, e.g., speech that is captured by one or both microphones 110, 112 of device 102, from the first individual that includes voice data for that individual. Likewise, the device 102 receives 706 a similar acoustic signal from the second individual, which includes voice data]; obtaining lip motion information [para 0056 -- image data of detected lip movement] of persons located in N positions in a vehicle cabin of the vehicle in a target time period [para 0039 -- forward-facing camera monitoring positions of passengers], wherein the first-type instruction is obtained based on target audio data collected in the vehicle cabin, the lip motion information of the persons is obtained when the first-type instruction is recognized from the target audio data, and the target time period corresponds to the first-type instruction in the audio data [para 0053-0056]; matching the first-type instruction with the lip motion information of the persons in the N positions in the vehicle cabin [para 0057 -- the device 102 associates the first voice data with the first individual by correlating the voice data to the first individual's lip movement. From the image data, the device 102 identifies lip movement that indicates speech. From the acoustic signal, the device 102 identifies the beginning of the voice data and its duration. The device 102 then determines that the first individual began speaking when the voice data was first received and continued speaking until the voice data within the first acoustic signal ended. If the voice data and the lip movement of only one individual are synchronized, then the device 102 associates that individual with the voice data. Using one or more of the aforementioned association methods, the device 102 similarly determines 710 that voice data in the second acoustic signal (or second voice data) originated from the second individual]; obtaining a target position based on a matching result between the lip motion information of the persons in the N positions and the first-type instruction, wherein the target position is a position of a target person in the vehicle cabin whose lip motion information matches the first-type instruction in the matching result [para 0057 -- the device 102 associates the first voice data with the first individual by correlating the voice data to the first individual's lip movement. From the image data, the device 102 identifies lip movement that indicates speech. From the acoustic signal, the device 102 identifies the beginning of the voice data and its duration. The device 102 then determines that the first individual began speaking when the voice data was first received and continued speaking until the voice data within the first acoustic signal ended. If the voice data and the lip movement of only one individual are synchronized, then the device 102 associates that individual with the voice data. Using one or more of the aforementioned association methods, the device 102 similarly determines 710 that voice data in the second acoustic signal (or second voice data) originated from the second individual]; and sending an indication for executing the first-type instruction [para 0029; 0035; 0051-0052; 0058; 0065-0067 -- The voice recognition module 206 includes the elements needed to process voice data by recognizing words. Voice recognition, as used herein, refers to the ability of hardware and/or software to interpret speech. In one embodiment, processing voice data includes converting speech to text. This type of processing is used, for example, when one is dictating an e-mail. In another embodiment, processing voice data includes identifying commands from speech]. Zurek teaches that the system can be utilized in a vehicle environment with multiple speakers [para 0036-0039], but fails to teach the details of processing and executing commands for in-vehicle members on target adjustable components at the target positions. In a similar field of endeavor, Joh teaches a vehicle control apparatus and method using speech recognition, and provides for detecting, recognizing, and performing commands for hands-free speech control based on the location and the recognized speech of the vehicle member [para 0044; 0056-0059] for controlling adjustable components [para 0074-0082 -- controlling actions of air conditioning, heating, ventilation seats, windows, or the like based on the location of a passenger… When a driver utters: “open a window”, the processor 180 opens the window positioned on the left side of the driver seat. When a passenger at the rear seat of the driver seat enters a voice command saying: “open a window”, the processor 180 opens the window of the left side at the rear seat]. Joh teaches that the invention is advantageous in preventing incorrect services that the vehicle passengers do not want [para 0005]. One having ordinary skill in the art before the effective filing date of the claimed invention would have recognized the advantages of implementing the recognizing and performing of commands based on the location and recognized speech of the vehicle member, as suggested by Joh, in the system of Zurek, for the purpose of preventing incorrect services that the vehicle passengers do not want, as suggested by Joh, and thereby improving and enhancing the user's interaction with the system.
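To make concrete the position-dependent execution characterized for Joh above (the “open a window” example), the following is a hypothetical sketch only; the seat labels and component mapping are assumptions, not Joh's disclosed implementation:

    # Hypothetical sketch of position-dependent command execution in the
    # manner characterized for Joh above: the same utterance operates the
    # adjustable component nearest the recognized speaker's seat. The seat
    # labels and component names are illustrative assumptions.
    WINDOW_BY_SEAT = {
        "driver": "front-left window",
        "front passenger": "front-right window",
        "rear left": "rear-left window",
        "rear right": "rear-right window",
    }

    def execute(command: str, speaker_seat: str) -> str:
        if command == "open a window":
            return "opening the " + WINDOW_BY_SEAT[speaker_seat]
        return "unsupported command: " + command

    print(execute("open a window", "driver"))     # opening the front-left window
    print(execute("open a window", "rear left"))  # opening the rear-left window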
Regarding claim 2, the combination of Zurek and Joh teaches that obtaining the first-type instruction and the lip motion information comprises: obtaining the target audio data in the vehicle cabin; obtaining image data in the vehicle cabin when recognizing that the target audio data comprises the first-type instruction; and extracting the lip motion information of the persons in the N positions in the vehicle cabin from the image data in the vehicle cabin [para 0039; 0057-0061 -- the device 102 determines from lip movement identified in captured image data which individual of a group of individuals is speaking. The device 102 further determines, using facial-recognition techniques, for example, the identity of the speaker from the captured image data. Based on the speaker's identity, the device 102 selects a voice-recognition database to use for performing voice recognition while the identified speaker is speaking. The identified speaker, for instance, might have created a speech-recognition database during a previous speech-recognition training session with the device 102].
Regarding claim 3, the combination of Zurek and Joh teaches that extracting the lip motion information of the persons in the N positions comprises: extracting the lip motion information of the persons in the N positions in the vehicle cabin from the image data in the vehicle cabin when recognizing that multiple persons are in the vehicle cabin [para 0039; 0055-0058 -- the device 102 determines from lip movement identified in captured image data which individual of a group of individuals is speaking. The device 102 further determines, using facial-recognition techniques, for example, the identity of the speaker from the captured image data. Based on the speaker's identity, the device 102 selects a voice-recognition database to use for performing voice recognition while the identified speaker is speaking. The identified speaker, for instance, might have created a speech-recognition database during a previous speech-recognition training session with the device 102].
Regarding claim 4, the combination of Zurek and Joh teaches that obtaining the target position comprises: obtaining, based on the first-type instruction and the lip motion information of the N persons located in the vehicle cabin, a matching degree between the lip motion information of the person in each of the N positions and the instruction information; identifying the target person as the person corresponding to the lip motion information with the highest matching degree; and using a position of the target person as the target position [para 0039; 0057-0061 -- correlates speech with lip movement data and facial recognition for positional information of speakers].
Regarding claim 5, the combination of Zurek and Joh teaches the first-type instruction is a speech waveform sequence extracted from the audio data or text instruction information recognized based on the audio data [para 0056-0061 – voice recognition].
Regarding claim 6, the combination of Zurek and Joh teaches that the lip motion information of the persons in the N positions in the vehicle cabin comprises image sequences of lip motion of the persons in the N positions in the vehicle cabin in the target time period [para 0039; 0057-0061 -- correlates speech with lip movement data and facial recognition for positional information of speakers; Joh para 0056-0059].
Regarding claim 7, the combination of Zurek and Joh teaches: generating a correspondence between the lip motion information of the persons in the vehicle cabin and the N positions, wherein the step of using a position of the target person as the target position comprises: determining, based on the correspondence between the lip motion information of the persons in the vehicle cabin and the N positions and based on position data provided by a sensor in the vehicle cabin, the position of the target person as the target position [para 0039; 0055-0061 -- correlates speech with lip movement data and facial recognition for positional information of speakers; Joh para 0056-0059].
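For claim 7 as characterized above, a minimal hypothetical sketch of resolving the target position from a lip-motion/position correspondence together with in-cabin seat-sensor data follows; all names and structures are assumptions for illustration:

    # Hypothetical sketch for claim 7 as characterized above: the target
    # position is resolved from (a) a correspondence between lip-motion
    # tracks and cabin positions and (b) occupancy data from an in-cabin
    # seat sensor. All names and structures are illustrative assumptions.
    from typing import Optional

    def resolve_target_position(best_track: int,
                                track_to_seat: dict[int, str],
                                occupied_seats: set[str]) -> Optional[str]:
        # Map the best-matching lip-motion track to its seat, then confirm
        # the seat is occupied according to the seat sensor.
        seat = track_to_seat.get(best_track)
        return seat if seat in occupied_seats else None

    print(resolve_target_position(7, {7: "rear left", 9: "driver"},
                                  {"driver", "rear left"}))  # rear left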
Regarding claim 8, the combination of Zurek and Joh teaches generating a correspondence between the lip motion information of the persons in the N positions and identities of the persons in the N positions, wherein the step of identifying the target person comprises: obtaining the target lip motion information with the highest matching degree; and determining the target person based on the correspondence between the lip motion information of the persons in the N positions and the identities of the persons in the N positions [para 0039; 0055-0061 -- correlates speech with lip movement data and facial recognition for positional information of speakers; Joh para 0056-0059].
Regarding claim 9, the combination of Zurek and Joh teaches wherein the audio data in the vehicle cabin is obtained based on data collected by a plurality of microphones in the vehicle cabin; or the audio data in the vehicle cabin is obtained based on audio data collected by a microphone in a specified position area in the vehicle cabin [para 0021; 0028; 0053; 0055; Joh para 0056-0059].
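For claim 9 as characterized above, a hypothetical sketch of the two recited audio-sourcing options (a plurality of cabin microphones versus a microphone in a specified position area) follows; the channel names and the simple average mix are assumptions for illustration:

    # Hypothetical sketch of claim 9's two audio-sourcing options: either
    # mix data collected by several cabin microphones, or take the channel
    # from a microphone in one specified position area.
    from typing import Optional

    def cabin_audio(channels: dict[str, list[float]],
                    specified_position: Optional[str] = None) -> list[float]:
        if specified_position is not None:
            return channels[specified_position]  # single specified-area mic
        mics = list(channels.values())           # average across all mics
        return [sum(samples) / len(mics) for samples in zip(*mics)]

    chans = {"front": [0.2, 0.4], "rear": [0.0, 0.2]}
    print(cabin_audio(chans))           # [0.1, 0.3] -- mixed from all microphones
    print(cabin_audio(chans, "rear"))   # [0.0, 0.2] -- specified position only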
Regarding claim 10, the combination of Zurek and Joh teaches the first-type instruction is a control instruction in the vehicle cabin [para 0029; 0035; 0051-0052; 0058; 0065-0067 --The voice recognition module 206 includes the elements needed to process voice data by recognizing words. Voice recognition, as used herein, refers to the ability of hardware and/or software to interpret speech. In one embodiment, processing voice data includes converting speech to text. This type of processing is used, for example, when one is dictating an e-mail. In another embodiment, processing voice data includes identifying commands from speech].
Claims 11-16 and 17-20 are rejected under a rationale similar to that applied to claims 1-10 above.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGELA A ARMSTRONG whose telephone number is (571)272-7598. The examiner can normally be reached M,T,TH,F 11:30-8:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANGELA A ARMSTRONG/Primary Examiner, Art Unit 2659