DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Khemka et al. (US 2023/0409615 A1) ("Khemka") in view of Evermann et al. (US 2019/0179890 A1) ("Evermann").
Regarding claims 1, 9 and 17, Khemka discloses a device, a non-transitory medium and method, “identifying, using an automated speech recognition (ASR) system, at least one named entity hypothesis from at least one audio input;” by receiving audio and performing ASR to generate one-best and n-best hypotheses, followed by NLU extracting slots/entities (Khemka Fig. 2; ¶¶ [0050]–[0056]; ¶¶ [0110]–[0116]); “providing, using the ASR system, the identified at least one named entity hypothesis to a large language model (LLM);” by passing ASR/NLU outputs to transformer-based models (e.g., T5) for dialogue state tracking and slot/value prediction using example-guided inputs (Khemka ¶¶ [0158]–[0176]; Fig. 15); “generating a prompt using an automated prompt generator;” by programmatically constructing question templates Q(s→q) and concatenating retrieved in-context examples (TransferQA formatting) that serve as prompts for the model (Khemka ¶¶ [0174]–[0176]; Fig. 15); “processing, using the LLM, the identified at least one named entity hypothesis and the prompt to generate updated named entity recognition data;” by feeding the template prompt plus examples and dialogue history into T5 to output slot values/updated entity recognition (Khemka ¶¶ [0169]–[0176]; Fig. 15).
Khemka does not specifically disclose “providing the updated named entity recognition data back to the ASR system.”
However, Evermann discloses "the natural language processor… re-processes the text string… to search for word matches… [and] can determine that the text actually refers to… [correct entity]… and feed corrected tokens back into the recognition pipeline" (Evermann ¶0014; ¶0135; Fig. 5).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Evermann's feedback of corrected named-entity tokens into Khemka's ASR/NLU pipeline to improve recognition accuracy. Both references address the same problem — errors in speech-to-text transcription — and Evermann's technique predictably improves ASR performance by reintegrating corrected entity data.
Regarding claims 2, 10 and 18, Khemka discloses the device, non-transitory medium and method of claims 1, 9 and 17, further comprising updating the ASR system with the updated named entity recognition data to enhance named entity recognition accuracy (i.e., model training, on-device/federated updates, and adapting models with data (Khemka ¶0061; ¶0114–¶0115; ¶0179–¶0181)).
It should also be noted that Evermann discloses a system that updates its speech recognition models with corrected named entities to improve future recognition accuracy (Evermann ¶0015; ¶0136). To the extent this feature is not fully taught by Khemka alone, it would have been obvious to integrate Evermann's update mechanism into Khemka's pipeline for the same predictable accuracy gains.
Regarding claims 3, 11 and 19, Khemka discloses the device, non-transitory medium and method of claims 2, 10 and 18, further comprising: accessing a set of audio samples of named entities collected from users of a voice assistance system, wherein each audio sample is annotated with a text transcript of the named entity and a corresponding category that the named entity belongs to of a plurality of categories (i.e., Khemka discloses user-collected named-entity audio samples, annotated with transcripts and categories (Khemka ¶0110–¶0116)); for each audio sample in the set of audio samples: generating the prompt including the named entity based on the corresponding category (i.e., Khemka's DST/TransferQA prompt generation with slot/category context (Khemka ¶0174–¶0176)); and providing the prompt as input to the LLM (Khemka ¶0174–¶0176), wherein the updated named entity recognition data includes a plurality of possible commands including the named entity based on the corresponding category (i.e., Khemka's LLM outputs slot values/dialogue state (Khemka ¶0176)); and training, based on the plurality of possible commands generated by the LLM, at least one of a language model or a text-to-speech (TTS) model of the ASR system (i.e., Khemka's training/fine-tuning of language/TTS models with generated examples (Khemka ¶0179–¶0181; ¶0102–¶0106)).
Regarding claims 4 and 12, Khemka discloses the device and method of claims 3 and 11, wherein the plurality of categories includes at least one of: an application name; a name of a person; a name of a television program; a name of a movie; a name of an electronic device; a name of a place; a name of a radio station; a name of a podcast; a name of a genre; a name of a business; a name of a sports team; or a name of a song (i.e., Khemka discloses many domains/slots and lists of entity types (music, movies, contacts, places, etc.), and teaches domain/slot vocabularies that encompass these categories (Khemka ¶0046; ¶0116; Fig. 3C)).
Regarding claims 5 and 13, Khemka discloses the device and method of claims 3 and 11, further comprising: creating a base model trained using the set of audio samples of named entities collected from the users of the voice assistance system; and periodically updating the ASR system and/or the LLM based on the base model (i.e., building base models from user data, on-device and federated training, and periodically updating models including ASR/NLU components (Khemka ¶0061; ¶0114–¶0115; ¶0179–¶0181)).
Regarding claims 6, 14 and 20, Khemka discloses the device, medium and method of claims 1, 9 and 17, further comprising: providing the prompt generated using the automated prompt generator to a text-to-speech (TTS) model and synthesizing, using the TTS model, an audio sample based on the prompt (i.e., Khemka explicitly describes a TTS component that generates synthesized speech from text and discusses generated examples for model fine-tuning (Khemka ¶0102–¶0106; ¶0179–¶0181)).
Khemka discloses general model training but does not explicitly teach training the ASR model using TTS-synthesized audio derived from LLM prompts.
Evermann teaches generating augmented training data (including TTS-synthesized audio) for improving recognition of named entities and using such synthesized samples for ASR training/augmentation (Evermann ¶0015; ¶0136).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Evermann's data-augmentation technique using TTS samples to Khemka's prompt/TTS flows in order to improve ASR training coverage for named entities recognized or produced by the LLM, and to improve overall ASR performance.
Regarding claims 7 and 15, Khemka discloses the device and method of claims 1 and 9, wherein the ASR system and the LLM are executed on a same electronic device (i.e., Khemka discloses on-device ASR and portions of NLU, and acknowledges that some processing (ASR, parts of NLU, and the LLM) may execute on the device (Khemka ¶0047–¶0053; Fig. 2)).
Regarding claims 8 and 16, Khemka discloses the device and method of claims 7 and 15, further comprising: providing user information stored on the same electronic device to the LLM; and processing, using the LLM, the identified at least one named entity hypothesis to generate the updated named entity recognition data using the user information (i.e., Khemka discloses using local user data and personalized lexicons/gazetteers, and applying such context in NLU/LLM processing (Khemka ¶0046; ¶0061; ¶0110–¶0116)).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PIERRE LOUIS DESIR whose telephone number is (571)272-7799. The examiner can normally be reached Monday-Friday 9AM-5:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PIERRE LOUIS DESIR/ Supervisory Patent Examiner, Art Unit 2659