Prosecution Insights
Last updated: April 19, 2026
Application No. 18/512,252

APPARATUS FOR VOICE RECOGNITION AND METHOD THEREOF

Non-Final OA §103
Filed
Nov 17, 2023
Examiner
ZHU, RICHARD Z
Art Unit
2654
Tech Center
2600 — Communications
Assignee
Kia Corporation
OA Round
3 (Non-Final)
69%
Grant Probability
Favorable
3-4
OA Rounds
3y 2m
To Grant
85%
With Interview

Examiner Intelligence

Grants 69% — above average
69%
Career Allow Rate
498 granted / 718 resolved
+7.4% vs TC avg
Strong +15% interview lift
+15.4%
Interview Lift
resolved cases with interview
Typical timeline
3y 2m
Avg Prosecution
32 currently pending
Career history
750
Total Applications
across all art units

Statute-Specific Performance

§101
16.0%
-24.0% vs TC avg
§103
54.5%
+14.5% vs TC avg
§102
19.7%
-20.3% vs TC avg
§112
4.2%
-35.8% vs TC avg
Black line = Tech Center average estimate • Based on career data from 718 resolved cases

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 03/10/2026 has been entered.

Status of the Claims

Claims 1-20 are pending.

Response to Applicant's Arguments

Applicant argues that "Arizmendi does not disclose separating multiple independent partial intents, deleting duplicate partial intents, and recombining the remaining partial intents to generate a final intent, as recited in amended independent claim 1" and that, "unlike Arizmendi, where later partial results sequentially replace actions derived from earlier partial results, claim 1 receives a plurality of partial intents as inputs and generates a final intent by rearranging and combining such partial intents. Applicant respectfully asserts that Arizmendi discloses no such subject matter. Therefore, Applicant respectfully asserts that Arizmendi fails to disclose the above-recited features of amended independent claim 1."

In view of the amendments to claims 1 and 11, the anticipation rejection under 35 U.S.C. 102 has been withdrawn. Upon further search and consideration, a new combination of references is set forth below.

Claim Rejections - 35 USC § 103

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 103 that form the basis for the rejections under this section made in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 8-11, 13, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Arizmendi et al. (US 2014/0156268 A1) in view of Yoon et al. (US 2021/0166687 A1).

Regarding Claims 1 and 11, Arizmendi discloses a voice recognition apparatus (Fig. 1) comprising: a microphone configured to extract an utterance of a user (¶25, input device 190 as a microphone for speech); a memory configured to store a scenario matching intent extracted from the utterance (¶22, system memory storing software modules; i.e., per ¶15, a software module to implement a POMDP-based dialog manager that keeps a probability distribution over user states and allows a number of current and past dialog and recognition features to be used when considering the meaning of a recognition result); and a processor (¶22, processor 120 executing software modules) configured to: search for the scenario based on the utterance (¶35, integrating incremental speech recognition results with a partially observable Markov decision process ("POMDP") dialog manager ("DM"); the DM tracks a probability distribution over multiple hidden dialog states / belief states by using confidence scores associated with respective incremental speech recognition results to update the hidden dialog states / belief states; the DM can determine a belief in a particular dialog state or an action to take in response to a likely
dialog state such that a belief state generated from an incremental result incorporates all of the contextual information available to the system from the start of the dialog until the moment of that incremental result); perform a voice recognition function (¶35, integrating incremental speech recognition results with the POMDP DM requires using an incremental automatic speech recognition ("IASR") module 202 per ¶¶28-29, which analyzes speech input and provides incremental speech recognition results / incremental textual transcriptions of the speech input as output); extract intent from the utterance (¶29, SLU module 204 can receive the transcribed input and can use a natural language understanding model to analyze the group of words included in the transcribed input to derive a meaning from the input); separate the intent into partial intents by using separators (¶30, SLU 204 processing partial results from IASR 202; ¶36, recognized text from the incremental speech recognition result may be incomplete and may end in the middle of a sentence or even a word; i.e., SLU 204 separately generates partial intents for respective partial ASR results ending in the middle of a sentence or word); generate a final intent by combining partial intents, such that duplicate partial intents are deleted depending on definitions of the separators (¶29, dialog manager DM 206 receives the meaning of the speech input from SLU 204 and determines an action; ¶34, Table I, based on a determination (i.e., definition) that the fourth action indicated by a first partial intent and the fifth action indicated by a second partial intent are the same (i.e., duplicates), reject the fifth action; in view of ¶¶16-17, if an action is rejected (e.g., the duplicate fifth action), then the copy of the dialog manager state (per ¶19, a belief state incorporating contextual information available to the system from the beginning of the dialog) corresponding to the rejected action is discarded (i.e., deleted)), wherein each of the partial intents indicates at least one of an action, a target, or an entity included in the first intent or the second intent (¶29 and ¶34, Table I, the DM module determines that the SLU meaning of incremental transcription "Mckeesport" indicates the fourth action "Ok, Mckeesport…" while the SLU meaning of incremental transcription "Mckeesport center" indicates the fifth action "Ok, Mckeesport…").

Arizmendi does not teach extracting a first intent from a first utterance and a second intent from a second utterance, separating the first intent into first partial intents and the second intent into second partial intents by using separators, and rearranging the first partial intents of the first intent and the second partial intents of the second intent.

Yoon discloses a voice recognition apparatus performing a voice recognition function (Fig. 1, server 200 including a processor per ¶54; ¶59, the processor converts user voice into text data through an ASR model), extracting a first intent from a first utterance (¶58, the processor divides user voice in units of sentences to identify a user's intent included in each sentence), extracting a second intent from a second utterance (¶58, the processor divides user voice in units of sentences to identify a user's intent included in each sentence), and rearranging the first intent and the second intent (¶67, distinguish the user's voice in units of sentences #1 to #5, where each sentence includes an intent and the processor first identifies the priority of each sentence / intent in the order #2, #5, #1, #3, #4; ¶¶68-69, rearrange the first priority based on intent information into the order #2, #1, #5, #3, #4).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to divide user utterances into units of sentences to extract a first intent from a first sentence / utterance and a second intent from a second sentence / utterance, and to rearrange the first intent and the second intent so that same or similar intents become adjacent in the priority order (Yoon, ¶68). The established function of Arizmendi provides incremental speech recognition results ending in the middle of a sentence (Arizmendi, ¶36). Therefore, when the user utterance is divided into units of sentences (Yoon, ¶58) to extract the first intent from the first sentence / utterance and the second intent from the second sentence / utterance, incremental speech recognition results ending in the middle of each sentence are provided to SLU module 204 to separate the first intent into first partial intents and the second intent into second partial intents by using the separators (Arizmendi, ¶31). The predictable result of the combination of Arizmendi and Yoon would rearrange the first partial intents of the first intent and the second partial intents of the second intent based on the first intent and the second intent being the same or similar (Yoon, ¶68), wherein each of the first partial intents and the second partial intents indicates at least an action according to the established function of Arizmendi (Arizmendi, ¶34, Table I).
Regarding Claims 3 and 13, Arizmendi discloses wherein the processor is further configured to assign one separator among the separators to each one of or any combination of the action, the target, and the entity (¶16, IIM obtains a potential action for a partial result; e.g., ¶30, obtain a potential action for "call mom"; ¶34, obtain potential actions for respective partial results "Mckee" and "Mckeesport"), included in each of the first intent and the second intent (¶30, SLU 204 processes partial results from incremental ASR 202 and provides results to IIM 212; per ¶34 and Table I, provide SLU processing results of "Mckee" and "Mckeesport" to the IIM to obtain respective potential actions).

Regarding Claims 8 and 18, Arizmendi as modified by Yoon discloses wherein the processor is configured to rearrange the first partial intents derived from the first utterance and the second partial intents derived from the second utterance in a reverse order of utterances (Yoon, ¶¶68-69, rearrange the first priority #2, #5, #1, #3, #4 based on intent information into the order #2, #1, #5, #3, #4; i.e., #5 and #1 have been reversed in the priority order).

Regarding Claims 9 and 19, Arizmendi discloses wherein the processor is configured to generate the final intent by deleting actions other than a most preceding action among the first partial intents and the second partial intents (¶34 and Table I, discard or reject the actions corresponding to partial results "Yew" and "Ridge", as well as the action for "Mckee" since it is an incomplete result per ¶33; execute the revised action for "Mckeesport").
Regarding Claims 10 and 20, Arizmendi discloses wherein the processor is configured to generate the final intent by deleting particular partial intents that match deleted partial intents, from among the first partial intents and the second partial intents (¶34, the fifth action for the fifth partial "Mckeesport center" is rejected since it is the same as the third partial "Mckee", which was discarded as being an incomplete result of the fourth partial "Mckeesport").

Claims 2, 4-7, 12, and 14-17 are rejected under 35 U.S.C. 103 as being unpatentable over Arizmendi et al. (US 2014/0156268 A1) and Yoon et al. (US 2021/0166687 A1) as applied to claims 1 and 11, in view of Suleman et al. (US 2015/0039292 A1).

Regarding Claims 2 and 12, Arizmendi does not disclose generating the final intent based on the first utterance and the second utterance in response to one of or any combination of an action, a target, or an entity being missing from the second utterance obtained after the first utterance.

Suleman teaches a dialog device for determining intent data and associated entity data from user queries (¶¶39-40, dialogue driver 306 receives user query 302 for processing to determine that a particular user query likely relates to a particular command; ¶¶43-44, evaluate the user query to determine that it is an entity-type query) and generating a final intent based on a first utterance and a second utterance in response to one of or any combination of the action, the target, or the entity being missing from the second utterance obtained after the first utterance (¶74, given a first user query "Find me a flight from Calgary" and a subsequent query "Change that to New York", determine which word in "Find me a flight from Calgary" is referenced by the pronoun "that" in "Change that to New York"; i.e., "Change that to New York" is missing a command / intent, such that entity extraction is performed on the second user query in order to perform the command intended by the user corresponding to the "Find me a flight" intent / command according to "Change Calgary to New York").

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to generate the final intent based on the first utterance and the second utterance in response to one of or any combination of an action, a target, or an entity being missing from the second utterance obtained after the first utterance, in order to perform the command intended by the user (Suleman, ¶74).

Regarding Claims 4 and 14, Arizmendi discloses wherein the processor is further configured to: assign a first separator to the first partial intents, wherein the first separator points to the action or the target included among the first partial intents extracted from the first intent (¶30, SLU 204 processing partial results from IASR 202 and providing results to IIM 212; ¶¶31-32, IIM 302 copies the current state of the dialog manager, provides the copied temporary instance of the dialog manager with an incremental speech recognition result, and inspects the action that the copied dialog manager would take); and assign a second separator to the second partial intents included among the second partial intents extracted from the second intent (¶32, provide a second copy of the original dialog manager with a new incremental speech recognition result 304 and determine if the second copy takes an action that advances the dialog), wherein the second separator points to the action or the target being opposite to the first partial intents to which the first separator was assigned (¶31, when inspecting the action that the copied dialog manager with the incremental speech recognition result 304 would take, determine that the action does not sufficiently advance the dialog, which would require re-asking the same question; ¶32, make a second copy of the original dialog manager with a new incremental speech recognition result that is a revision of the previous incremental speech recognition result, and evaluate or determine that the second copy takes an action that advances the dialog; i.e., the first action does not advance the dialog, being opposite to the second action that does advance the dialog).

Regarding Claims 5 and 15, Arizmendi discloses wherein the processor is further configured to generate the final intent by deleting a particular first partial intent and a particular second partial intent that are the same as each other, in response to the particular first partial intent to which the first separator is assigned being the same as the particular second partial intent to which the second separator is assigned (¶34, begin to execute the action generated by the third partial "Mckee", and the fourth partial "Mckeesport" revises the action; per ¶33, an incomplete incremental speech recognition result (e.g., "Mckee") can be discarded; see also ¶34, the fifth partial "Mckeesport Center", whose corresponding fifth action is rejected since it is the same; in view of ¶¶16-17, if an action is rejected (e.g., the duplicate fifth action), then the copy of the dialog manager state (per ¶19, a belief state incorporating contextual information available to the system from the beginning of the dialog) corresponding to the rejected action is discarded (i.e., deleted)).
Regarding Claims 6 and 16, Arizmendi discloses wherein the processor is further configured to: assign a third separator to a first entity included among the first partial intents and assign a fourth separator to a second entity included among the second partial intents, wherein the fourth separator negates the first entity to which the third separator was assigned (¶32, if the second copy of the dialog manager (which is associated with the new ISR result that is a revision; e.g., entity "Austin" in the previous ISR result was revised to entity "Boston" in the revised ISR result) takes an action that advances the dialog and is different from the action generated by the first copy, then terminate the first action, discard the first copy of the dialog manager, initiate the second action, and the second copy assumes the position of the first copy; i.e., when generating a copy of the dialog manager for each respective incremental speech recognition result (a first separator for the first ISR per ¶31 and a second separator for the new ISR per ¶32), generate a first position (third separator) for the copied state of the dialog manager and corresponding ISR (entity "Austin") and a second position (fourth separator) for the copied state of the dialog manager and corresponding revised ISR (entity "Boston"); the copied state of the dialog manager and corresponding revised ISR "Boston" takes the position of the copied state of the dialog manager and corresponding ISR "Austin").
Regarding Claims 7 and 17, Arizmendi discloses wherein the processor is further configured to generate the final intent by deleting the entities that are the same as each other (in another example, see ¶34 and Table I, reject incremental speech recognition result "Mckeesport center" since it is the same as incremental speech recognition result "Mckeesport"; in view of ¶¶16-17, if an action is rejected (e.g., the duplicate fifth action), then the copy of the dialog manager state (per ¶19, a belief state incorporating contextual information available to the system from the beginning of the dialog) corresponding to the rejected action is discarded (i.e., deleted)), in response to the entity to which the third separator is assigned being the same as the entity to which the fourth separator is assigned (¶34 and Table I, the third separator being the position assigned to the copied dialog manager state associated with incremental speech recognition result "Mckeesport" and the fourth separator being the position assigned to the copied dialog manager state associated with incremental speech recognition result "Mckeesport center", which refers to the same entity "Mckeesport").

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to examiner Richard Z. Zhu, whose telephone number is 571-270-1587, or the examiner's supervisor, Hai Phan, whose telephone number is 571-272-6338. Examiner Richard Zhu can normally be reached M-Th, 07:30-17:00.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/RICHARD Z ZHU/
Primary Examiner, Art Unit 2654
03/20/2026
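For readers less familiar with the claim language, the behavior the examiner attributes to amended claim 1 (separating each extracted intent into partial intents using separators, deleting duplicate partial intents, and recombining the remainder into a final intent) can be sketched roughly as follows. The separator token, the intent strings, and the precedence rule are hypothetical illustrations, not the applicant's actual implementation:

```python
# Hypothetical sketch of the claimed intent pipeline; the "/" separator
# and the example intent strings are illustrative, not from the application.
def separate(intent: str, sep: str = "/") -> list[str]:
    """Split an extracted intent into partial intents using a separator."""
    return [p.strip() for p in intent.split(sep) if p.strip()]

def combine(first: list[str], second: list[str]) -> str:
    """Rearrange the partial intents, delete duplicates, and recombine."""
    final = []
    for part in second + first:   # later utterance takes precedence (assumed)
        if part not in final:     # delete duplicate partial intents
            final.append(part)
    return " / ".join(final)

first = separate("action:navigate / target:map / entity:Mckeesport")
second = separate("action:navigate / entity:Mckeesport Center")
print(combine(first, second))
```

The precedence choice (later utterance first) mirrors the examiner's reading of Arizmendi, where a later partial result revises an earlier action, but it is only one of several plausible orderings.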

Prosecution Timeline

Nov 17, 2023
Application Filed
Jul 12, 2025
Non-Final Rejection — §103
Oct 09, 2025
Response Filed
Jan 08, 2026
Final Rejection — §103
Mar 10, 2026
Request for Continued Examination
Mar 12, 2026
Response after Non-Final Action
Mar 20, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592228
SPEECH INTERACTION METHOD AND APPARATUS, COMPUTER READABLE STORAGE MEDIUM, AND ELECTRONIC DEVICE
2y 5m to grant Granted Mar 31, 2026
Patent 12592222
APPARATUSES, COMPUTER PROGRAM PRODUCTS, AND COMPUTER-IMPLEMENTED METHODS FOR ADAPTING SPEECH RECOGNITION CONFIDENCE SCORES BASED ON EXPECTED RESPONSE
2y 5m to grant Granted Mar 31, 2026
Patent 12586574
ELECTRONIC DEVICE FOR PROCESSING UTTERANCE, OPERATING METHOD THEREOF, AND STORAGE MEDIUM
2y 5m to grant Granted Mar 24, 2026
Patent 12579978
NETWORKED DEVICES, SYSTEMS, & METHODS FOR INTELLIGENTLY DEACTIVATING WAKE-WORD ENGINES
2y 5m to grant Granted Mar 17, 2026
Patent 12572739
GENERATING MACHINE INTERPRETABLE DECOMPOSABLE MODELS FROM REQUIREMENTS TEXT
2y 5m to grant Granted Mar 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
69%
Grant Probability
85%
With Interview (+15.4%)
3y 2m
Median Time to Grant
High
PTA Risk
Based on 718 resolved cases by this examiner. Grant probability derived from career allow rate.
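Since the note above says the grant probability is derived from the career allow rate, a minimal sketch of that arithmetic follows, using the counts shown on this page; the additive interview adjustment is an assumption about how the dashboard combines the figures:

```python
# Counts shown elsewhere on this page.
granted = 498          # applications granted by this examiner
resolved = 718         # total resolved applications
interview_lift = 15.4  # percentage-point lift with an interview (assumed additive)

allow_rate = 100 * granted / resolved
print(f"Career allow rate: {allow_rate:.1f}%")   # displayed on the page as 69%

with_interview = allow_rate + interview_lift
print(f"With interview: {with_interview:.0f}%")  # displayed as 85%
```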
