Prosecution Insights
Last updated: April 19, 2026
Application No. 17/939,805

SPEECH RECOGNITION FOR MULTIPLE USERS USING SPEECH PROFILE COMBINATION

Final Rejection: §101, §103
Filed: Sep 07, 2022
Examiner: CHUNG, DANIEL WONSUK
Art Unit: 2659
Tech Center: 2600 — Communications
Assignee: Apple Inc.
OA Round: 4 (Final)
Grant Probability: 54% (Moderate)
Expected OA Rounds: 5-6
Expected Time to Grant: 2y 10m
Grant Probability With Interview: 92%

Examiner Intelligence

Career Allow Rate: 54% (24 granted / 44 resolved; -7.5% vs TC avg)
Interview Lift: +37.5% for resolved cases with interview (strong)
Typical Timeline: 2y 10m average prosecution; 33 applications currently pending
Career History: 77 total applications across all art units

Statute-Specific Performance

§101: 25.2% (-14.8% vs TC avg)
§103: 52.3% (+12.3% vs TC avg)
§102: 17.3% (-22.7% vs TC avg)
§112: 5.2% (-34.8% vs TC avg)
Comparisons are against a Tech Center average estimate, based on career data from 44 resolved cases.

Office Action

Rejections: §101, §103
DETAILED ACTION

This communication is in response to the Amendments and Arguments filed on 11/24/2025. Claims 1-20 are pending and have been examined. All previous objections/rejections not mentioned in this Office Action have been withdrawn by the examiner.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments

Regarding the Applicant's arguments for the rejections under 35 U.S.C. § 101, applicant has amended independent claims 1, 19, and 20 to include "in accordance with a determination that the first word satisfies a usage frequency threshold, adding the first word to the first respective speech profile, wherein the usage frequency threshold corresponds to a usage frequency of the first word with a first application". Applicant asserts that the amended limitations cover a particular solution to a problem and improve upon conventional systems by increasing speech recognition accuracy through improving the relevancy of stored speech profiles. Examiner respectfully disagrees.

An important consideration in determining whether a claim improves technology or a technical field is the extent to which the claim covers a particular solution to a problem or a particular way to achieve a desired outcome, as opposed to merely claiming the idea of a solution or outcome. MPEP 2106.05(a). Here, the amended claim language recites only an idea of a solution and fails to recite details of how a solution to a problem is accomplished. For example, the amended limitation can be interpreted as adding a word to a first speech profile based on a usage frequency threshold. The claim does not recite how the word in the speech profile increases speech recognition accuracy. Applicant asserts in their remarks that a speech profile based on "application-specific language" can make devices more efficient by increasing speech recognition accuracy.
The claim, as currently recited, is missing the details of how the addition of specific words to the speech profile achieves the desired outcome of increased speech recognition accuracy.

Regarding the Applicant's arguments for the rejections under 35 U.S.C. § 103, applicant has amended independent claims 1, 19, and 20. Hence, the Applicant's arguments are moot in view of the new grounds of rejection, and new references have been applied.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-8, 10-14, and 16-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claims 1, 19, and 20, the limitations of "receiving a first speech input including a first word, wherein the first word is not included in a first respective speech profile; in accordance with a determination that the first word satisfies a usage frequency threshold, adding the first word to the first respective speech profile, wherein the usage frequency threshold corresponds to a usage frequency of the first word with a first application; receiving a second speech input from a first user; in response to receiving the second speech input, obtaining a combined speech profile from a plurality of speech profiles, wherein the plurality of speech profiles includes the first respective speech profile; interpreting the second speech input based on the combined speech profile to obtain a plurality of speech recognition results, wherein the plurality of speech recognition results includes a first speech recognition result corresponding to a first speech profile of the plurality of speech profiles, wherein the first speech profile corresponds to a first user, and a second speech recognition result corresponding to a second speech profile of the plurality of speech profiles, wherein the second speech profile corresponds to a second user different from the first user; and selecting, from the plurality of speech recognition results, a respective speech recognition result based on an identified voice profile", as drafted, are processes that, under the broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components. More specifically, the claims recite the mental process of a human listening to speech including unique dialect words, determining the usage frequency of the dialect words and adding the words to a list in the mind, listening to speech and interpreting the speech input based on an identified subset of dialects of users to produce speech recognition results based on the dialects, and selecting the speech recognition result that most accurately corresponds to the dialect of the speech.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Accordingly, the claims recite an abstract idea.

This judicial exception is not integrated into a practical application because the recitation of a device in claim 1 and a non-transitory computer-readable storage medium in claim 20 refers to generalized computer components, based upon the claim interpretation wherein the structure is interpreted using P0030-P0042 of the specification. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using generalized computer components to listen to speech including unique dialect words, determine the usage frequency of the dialect words and add the words to a list in the mind, listen to speech and interpret the speech input based on an identified subset of dialects of users to produce speech recognition results based on the dialects, and select the speech recognition result that most accurately corresponds to the dialect of the speech amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claims are not patent eligible.

With respect to claim 2, the claim recites "identifying, from the combined speech profile, a second user-specific word from a first respective speech profile of the plurality of speech profiles, wherein the combined speech profile includes the first user-specific word", which reads on a human thinking of user-specific words that different people use. No additional limitations are present.

With respect to claim 3, the claim recites "identifying a second word from a third respective speech profile of the plurality of speech profiles" and "identifying a first weight associated with the first user-specific word; and identifying a second weight associated with the second word, wherein the second weight is less than the first weight", which reads on a human identifying words from speech in the mind where specific words from one person are considered more identifiable than specific words from another person. No additional limitations are present.
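To make the dispute concrete, the limitations at issue can be read onto a short program like the following. This is a minimal, hypothetical sketch: the class, function names, scoring heuristic, and default threshold are all illustrative assumptions, not taken from the application's specification or the cited art.

```python
from collections import Counter

class SpeechProfile:
    """Per-user vocabulary profile (hypothetical structure)."""
    def __init__(self, user):
        self.user = user
        self.words = set()

def add_if_frequent(profile, word, app_usage, threshold=3):
    # "in accordance with a determination that the first word satisfies a
    # usage frequency threshold, adding the first word to the first
    # respective speech profile" -- the threshold value is an assumption.
    if word not in profile.words and app_usage[word] >= threshold:
        profile.words.add(word)

def interpret(speech_words, combined_profiles):
    # One candidate result per profile in the combined profile, scored by
    # how many input words that profile already contains.
    return [(p.user, sum(w in p.words for w in speech_words))
            for p in combined_profiles]

def select(results, identified_voice_profile):
    # "selecting ... a respective speech recognition result based on an
    # identified voice profile"; falls back to the best-scoring result.
    for user, _score in results:
        if user == identified_voice_profile:
            return user
    return max(results, key=lambda r: r[1])[0]
```

Whether the claim language itself recites this level of detail, as opposed to only the outcome, is precisely the point of contention between applicant and examiner.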
With respect to claim 4, the claim recites "wherein the first user-specific word corresponds to at least one of an object stored on the device or an object stored in association with a user profile", which reads on a human thinking of a specific word that a person uses. No additional limitations are present.

With respect to claim 5, the claim recites "identifying a reference word associated with a usage exceeding a threshold usage, wherein the first user-specific word corresponds to the reference word", which reads on a human identifying a specific word that a person uses when the person uses the word multiple times. No additional limitations are present.

With respect to claim 6, the claim recites "while interpreting the second speech input, determining the identified voice profile based on characteristics of the second speech input", which reads on a human identifying a voice from speech according to the characteristics of the speech in the mind. No additional limitations are present.

With respect to claim 7, the claim recites "wherein determining the voice profile based on characteristics of the second speech input includes comparing the characteristics of the second speech input to each voice profile of a plurality of voice profiles", which reads on a human identifying a voice from speech in the mind according to the characteristics of the speech compared to other voices. No additional limitations are present.

With respect to claim 8, the claim recites "determining a word from the second speech input", "identifying a first word, within a second respective speech profile of the plurality of speech profiles, corresponding to the determined word", and "identifying a second word, within a third respective speech profile of the plurality of speech profiles, corresponding to the determined word", which reads on a human identifying, in the mind, specific words that different people use in speech. No additional limitations are present.
With respect to claim 10, the claim recites "identifying, from the plurality of speech recognition results, a first particular speech recognition result associated with a weight value exceeding a threshold weight" and "in accordance with a determination that the first particular speech recognition result includes a user-specific word that does not correspond to the identified voice profile, selecting the respective speech recognition result based on availability of a general speech recognition result", which reads on a human identifying a specific word that is not used by identified people and utilizing general knowledge to identify text from speech in the mind. No additional limitations are present.

With respect to claim 11, the claim recites "determining whether the plurality of speech recognition results includes a second particular speech recognition result that does not include a user-specific word, wherein the second particular speech recognition result is associated with a weight that exceeds the threshold weight", "in accordance with a determination that the plurality of speech recognition results includes the second particular speech recognition result, selecting the second particular speech recognition result as the respective speech recognition result", and "in accordance with a determination that the plurality of speech recognition results does not include the second particular speech recognition result, selecting the first particular speech recognition result as the respective speech recognition result", which reads on a human selecting, in the mind, a particular speech recognition result that does not include a specific word that a person uses. No additional limitations are present.
With respect to claim 12, the claim recites "in accordance with a determination that the first particular speech recognition result includes predefined content, modifying the first particular speech recognition result" and "providing the modified first particular speech recognition result as the respective speech recognition result", which reads on a human choosing a particular speech recognition result according to predefined content in the mind. No additional limitations are present.

With respect to claim 13, the claim recites "determining a plurality of words from the second speech input", "comparing the plurality of words to a respective plurality of words included in the combined speech profile", and "selecting the respective speech recognition result based on the comparison", which reads on a human selecting a speech recognition result based on specific words that a person uses. No additional limitations are present.

With respect to claim 14, the claim recites "in response to selecting the respective speech recognition result, removing, from the electronic device, the combined speech profile", which reads on a human ignoring different dialects in the mind once the mind acknowledges the speech input is of a specific dialect. No additional limitations are present.

With respect to claim 16, the claim recites "receiving a third speech input from a respective user, wherein the third speech input includes a respective word" and "in accordance with a determination that one or more criteria are met, adding, to a speech profile corresponding to the respective user, the respective word", which reads on a human adding to the mind a specific word that a person uses. No additional limitations are present.
With respect to claim 17, the claim recites "wherein the one or more criteria includes a criterion that the respective word is not included in the speech profile corresponding to the respective user", which reads on a human adding to the mind a specific word that a person uses if it was not previously acknowledged. No additional limitations are present.

With respect to claim 18, the claim recites "wherein the one or more criteria includes a criterion that the respective word was previously received from the respective user at least a threshold number of times", which reads on a human adding to the mind a specific word that a person uses a specific number of times. No additional limitations are present.

These claims do not remedy the failure to integrate the judicial exception into a practical application and further fail to include additional elements that are sufficient to amount to significantly more than the judicial exception.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9, 13, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Klose et al. (U.S. PG Pub. No. 2016/0372117), hereinafter Klose, in view of Endo et al. (U.S. Patent No. 7,228,275), hereinafter Endo, and in further view of Wan et al. (U.S. PG Pub. No. 2023/0035947), hereinafter Wan.
Regarding claims 1, 19, and 20, Klose teaches:

(Claim 1) An electronic device, comprising: (P0033, Functions explained herein below may be implemented using individual hardware circuitry, using software functioning in conjunction with a programmed microprocessor or a general purpose computer.)

(Claim 1) one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: (P0033, Embodied in a device (e.g., the speech recognition unit described herein below), a computer processor and a memory coupled to a processor, wherein the memory is encoded with one or more programs that perform the methods disclosed herein when executed by the processor.)

(Claim 19) A computer-implemented method, comprising: (P0013, A method of performing speech recognition.)

(Claim 19) an electronic device with one or more processors and memory: (P0033, Embodied in a device (e.g., the speech recognition unit described herein below), a computer processor and a memory coupled to a processor, wherein the memory is encoded with one or more programs that perform the methods disclosed herein when executed by the processor.)

(Claim 20) A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of a first electronic device, cause the first electronic device to: (P0033, Embodied in a device (e.g., the speech recognition unit described herein below), a computer processor and a memory coupled to a processor, wherein the memory is encoded with one or more programs that perform the methods disclosed herein when executed by the processor.)
receiving a first speech input including a first word, wherein the first word is not included in a first respective speech profile; (P0013, performing speech recognition on the speech signal using the selected user profile.; P0039, The receiving component is configured to receive, in step 5202, a speech signal spoken by a user.)

receiving a second speech input from a user; (P0039, The receiving component is configured to receive, in step 5202, a speech signal spoken by a user.)

in response to receiving the second speech input, obtaining a combined speech profile from a plurality of speech profiles, wherein the plurality of speech profiles includes the first respective speech profile; (P0020, Speaker recognition may be performed on a first portion of the speech signal. [Receiving the speech input.]; P0042, A plurality of speech recognition user profiles may be downloaded from the remote source beforehand and the selected user profile may be selected among the plurality of downloaded speech recognition user profiles accordingly. [Combined speech profile is defined broadly in the specification and includes "combined speech profile may include a speech profile for user". Specification P0249.])

wherein the plurality of speech recognition results includes: a first speech recognition result corresponding to a first speech profile of the plurality of speech profiles, wherein the first speech profile corresponds to a first user, and (P0040, The speaker recognition component is configured to perform speaker recognition, in step 5204, on the received speech signal to identify the user from the speech signal. In other words, the speaker recognition component identifies the user who is actually speaking.; P0041, A speech recognition user profile which is associated with the identified user.)
a second speech recognition result corresponding to a second speech profile of the plurality of speech profiles, wherein the second speech profile corresponds to a second user different from the first user; and (P0040, The speaker recognition component is configured to perform speaker recognition, in step 5204, on the received speech signal to identify the user from the speech signal. In other words, the speaker recognition component identifies the user who is actually speaking.; P0041, A speech recognition user profile which is associated with the identified user.)

Klose does not specifically teach: receiving a first speech input including a first word, wherein the first word is not included in a first respective speech profile; in accordance with a determination that the first word satisfies a usage frequency threshold, adding the first word to the first respective speech profile, wherein the usage frequency threshold corresponds to a usage frequency of the first word with a first application; interpreting the second speech input based on the combined speech profile to obtain a plurality of speech recognition results; wherein the plurality of speech recognition results includes: a first speech recognition result corresponding to a first speech profile of the plurality of speech profiles, wherein the first speech profile corresponds to a first user, and a second speech recognition result corresponding to a second speech profile of the plurality of speech profiles, wherein the second speech profile corresponds to a second user different from the first user; and selecting, from the plurality of speech recognition results, a respective speech recognition result based on an identified voice profile.

Endo, however, teaches: interpreting the second speech input based on the combined speech profile to obtain a plurality of speech recognition results: (Col. 2, Lines 28-33, The present invention provides a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts and associated confidence scores and a decision module selecting one of the speech texts based upon their associated confidence scores.)

wherein the plurality of speech recognition results includes: a first speech recognition result corresponding to a first speech profile of the plurality of speech profiles, wherein the first speech profile corresponds to a first user, and (Col. 2, Lines 28-33, The present invention provides a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts and associated confidence scores and a decision module selecting one of the speech texts based upon their associated confidence scores.)

a second speech recognition result corresponding to a second speech profile of the plurality of speech profiles, wherein the second speech profile corresponds to a second user different from the first user; and (Col. 2, Lines 28-33, The present invention provides a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts and associated confidence scores and a decision module selecting one of the speech texts based upon their associated confidence scores.)

selecting, from the plurality of speech recognition results, a respective speech recognition result based on an identified voice profile. (Col. 8, Lines 22-24, The method of FIG. 4 selects the recognized speech text with the highest raw confidence score as the speech recognition result.)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to obtain a plurality of speech recognition results from speech input.
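Endo's decision module, as cited above, reduces to picking the hypothesis with the highest confidence score. A rough, hypothetical sketch (the dictionary keys and sample values are illustrative assumptions, not Endo's actual data structures):

```python
def select_result(candidates):
    """Pick the recognized text with the highest raw confidence score,
    in the manner of Endo's decision module (minimal sketch)."""
    return max(candidates, key=lambda c: c["confidence"])["text"]

# Each recognizer (here, one per speech profile) emits a text plus a score.
candidates = [
    {"profile": "user_a", "text": "turn the television on", "confidence": 80},
    {"profile": "user_b", "text": "turn the telephone on", "confidence": 60},
]
best = select_result(candidates)  # the 80-confidence hypothesis wins
```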
It would have been obvious to combine the references because utilizing multiple speech recognizers and selecting the best one takes advantage of the strengths, while complementing the weaknesses, of each speech recognizer due to the accent and identity of the speaker. (Endo Col. 3, Lines 3-16)

Klose in view of Endo does not specifically teach: receiving a first speech input including a first word, wherein the first word is not included in a first respective speech profile; in accordance with a determination that the first word satisfies a usage frequency threshold, adding the first word to the first respective speech profile, wherein the usage frequency threshold corresponds to a usage frequency of the first word with a first application.

Wan, however, teaches: receiving a first speech input including a first word, wherein the first word is not included in a first respective speech profile; (P0155, The customized language model and the keyword for the preset scenario are obtained based on the shared text. The hot word list for the preset scenario is updated based on the keyword to obtain the new hot word list. Speech recognition for the preset scenario is performed with the customized language model and the new hot word list.)

in accordance with a determination that the first word satisfies a usage frequency threshold, adding the first word to the first respective speech profile, wherein the usage frequency threshold corresponds to a usage frequency of the first word with a first application; (P0103, In step S603, a keyword is determined based on word frequencies of phrases and a word frequency threshold. The word frequency of a phrase represents the number of occurrences of the phrase in the phrase set or the sentence set.; P0117, Keywords are filtered by calculating language model scores of sentences including homonyms of the keywords. The keyword associated with a language model score higher than its homonym(s) is added to the hot word list. The keyword takes effect immediately when the keyword is added to the hot word list.; P0082, Speech recognition for the preset scenario is performed with the customized language model and the new hot word list.)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to add a word to a speech profile based on usage frequency. It would have been obvious to combine the references because a speech recognition model needs to account for the usage scenario for higher speech recognition accuracy. (Wan P0004)

Regarding claim 2, Klose in view of Endo and in further view of Wan teaches claim 1. Klose further teaches: identifying, from the combined speech profile, a first user-specific word from a second respective speech profile of the plurality of speech profiles, wherein the combined speech profile includes the first user-specific word. (P0040, The speaker recognition component is configured to perform speaker recognition, in step 5204, on the received speech signal to identify the user from the speech signal. In other words, the speaker recognition component identifies the user who is actually speaking. Speaker recognition may be performed based on speaker recognition data (e.g., generated in an enrollment phase in advance, as described above), wherein the speaker recognition data may include voice and/or pronunciation characteristics of the user, a vocabulary characteristic to the user (e.g., characteristic words, phrases and/or other often used formulations).; P0025, More particularly, speaker recognition may be performed based on a vocabulary limited to terms relating to the cabin or situations around the cabin of the aircraft, e.g., the vocabulary may be limited to words, phrases and/or formulations used in typical cabin announcements.)

Regarding claim 3, Klose in view of Endo and in further view of Wan teaches claim 2.
Klose further teaches: identifying a second word from a third respective speech profile of the plurality of speech profiles; and (P0040, The speaker recognition component is configured to perform speaker recognition, in step 5204, on the received speech signal to identify the user from the speech signal. In other words, the speaker recognition component identifies the user who is actually speaking. Speaker recognition may be performed based on speaker recognition data (e.g., generated in an enrollment phase in advance, as described above), wherein the speaker recognition data may include voice and/or pronunciation characteristics of the user, a vocabulary characteristic to the user (e.g., characteristic words, phrases and/or other often used formulations).; P0025, More particularly, speaker recognition may be performed based on a vocabulary limited to terms relating to the cabin or situations around the cabin of the aircraft, e.g., the vocabulary may be limited to words, phrases and/or formulations used in typical cabin announcements.)

Klose does not specifically teach: identifying a first weight associated with the first user-specific word; and identifying a second weight associated with the second word, wherein the second weight is less than the first weight.

Endo, however, teaches: identifying a first weight associated with the first user-specific word; and identifying a second weight associated with the second word, wherein the second weight is less than the first weight. (Col. 5, Lines 56-60, For example, one speech recognizer 204 may output its speech recognition result as slot-value pairs, such as <device=“television”: confidence score=80> and <action=“on”: confidence score=60>.)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to identify confidence scores for words in the recognition results.
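The slot-value pairs Endo describes amount to per-word weights that can be compared directly, which is how the examiner maps them onto the claimed first/second weight comparison. A hypothetical sketch (field names and weight values are illustrative assumptions):

```python
# Slot-value pairs with per-slot confidence scores, in the style Endo
# describes (<device="television": confidence score=80>).
first = {"word": "television", "weight": 80}   # first user-specific word
second = {"word": "radio", "weight": 55}       # second word, lower weight

def heavier(a, b):
    """Return the entry whose weight dominates -- a sketch of the claimed
    'wherein the second weight is less than the first weight' comparison."""
    return a if a["weight"] >= b["weight"] else b
```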
It would have been obvious to combine the references because utilizing multiple speech recognizers and selecting the best one takes advantage of the strengths, while complementing the weaknesses, of each speech recognizer due to the accent and identity of the speaker. (Endo Col. 3, Lines 3-16)

Regarding claim 4, Klose in view of Endo and in further view of Wan teaches claim 2. Klose further teaches: wherein the first user-specific word corresponds to at least one of an object stored on the electronic device or an object stored in association with a user profile. (P0042, The selected user profile may be stored in a respective storage device accessible by the speech recognition unit and, thus, the user profile may be applied immediately upon selection.; P0043, The user profile … may include voice and/or pronunciation characteristics of the user, a vocabulary characteristic to the user (e.g., characteristic words, phrases and/or other often used formulations). … The user profile is thus used to implement a speaker-dependent speech recognition technique, as described above.; P0025, Speaker recognition may be performed based on a vocabulary.)

Regarding claim 5, Klose in view of Endo and in further view of Wan teaches claim 2. Klose further teaches: identifying a reference word associated with a usage exceeding a threshold usage, wherein the first user-specific word corresponds to the reference word. (P0043, The user profile (e.g., generated in a user-specific training in advance) may include voice and/or pronunciation characteristics of the user, a vocabulary characteristic to the user (e.g., characteristic words, phrases and/or other often used formulations), as well as probabilities of occurrences of words, phrases and/or formulations in the language commonly used by the user.)

Regarding claim 6, Klose in view of Endo and in further view of Wan teaches claim 1.
Klose further teaches: while interpreting the second speech input, determining the identified voice profile based on characteristics of the second speech input. (P0024, Speaker recognition may be performed based on speaker recognition data (e.g., generated in an enrollment phase, as described above). The speaker recognition data may include at least one of voice characteristics of the user, pronunciation characteristics of the user, a vocabulary characteristic to the user.)

Regarding claim 7, Klose in view of Endo and in further view of Wan teaches claim 6. Klose further teaches: wherein determining the identified voice profile based on characteristics of the second speech input includes comparing the characteristics of the second speech input to each voice profile of a plurality of voice profiles. (P0024, Speaker recognition may be performed based on speaker recognition data (e.g., generated in an enrollment phase, as described above). The speaker recognition data may include at least one of voice characteristics of the user, pronunciation characteristics of the user, a vocabulary characteristic to the user.; P0023, The selected user profile is selected among the plurality of speech recognition user profiles.)

Regarding claim 8, Klose in view of Endo and in further view of Wan teaches claim 1. Klose further teaches: determining a word from the second speech input; (P0017, Speech recognition may include converting the speech signal into a text message.)

identifying a first word, within a second respective speech profile of the plurality of speech profiles, corresponding to the determined word; and (P0025, More particularly, speaker recognition may be performed based on a vocabulary limited to terms relating to the cabin or situations around the cabin of the aircraft, e.g., the vocabulary may be limited to words, phrases and/or formulations used in typical cabin announcements.)
identifying a second word, within a third respective speech profile of the plurality of speech profiles, corresponding to the determined word. (P0025, More particularly, speaker recognition may be performed based on a vocabulary limited to terms relating to the cabin or situations around the cabin of the aircraft, e.g., the vocabulary may be limited to words, phrases and/or formulations used in typical cabin announcements.) Regarding claim 9 Klose in view of Endo and further view of Wan teach claim 1. Klose further teaches: establishing wireless communication with a second electronic device; and (P0042, In case the selected user profile is not available to the speech recognition unit, it may be downloaded from a remote source, such as a remote server hosted by an airline which provides user profiles of airline crew members for download.) in response to establishing wireless communication with the second electronic device, receiving a speech profile from the second electronic device, wherein the plurality of speech profiles includes the received speech profile. (P0042, In case the selected user profile is not available to the speech recognition unit, it may be downloaded from a remote source, such as a remote server hosted by an airline which provides user profiles of airline crew members for download.) Regarding claim 13 Klose in view of Endo and further view of Wan teach claim 1. Klose does not specifically teach: determining a plurality of words from the second speech input; comparing the plurality of words to a respective plurality of words included in the combined speech profile; and selecting the respective speech recognition result based on the comparison. Endo, however, teaches: determining a plurality of words from the speech input; (Col. 
2, Lines 28-33, The present invention provides a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts and associated confidence scores and a decision module selecting one of the speech texts based upon their associated confidence scores.) comparing the plurality of words to a respective plurality of words included in the combined speech profile; and (Col. 2, Lines 28-33, The present invention provides a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts and associated confidence scores and a decision module selecting one of the speech texts based upon their associated confidence scores.) selecting the respective speech recognition result based on the comparison. (Col. 2, Lines 28-33, The present invention provides a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts and associated confidence scores and a decision module selecting one of the speech texts based upon their associated confidence scores.; Col. 2, Lines 45-48, The decision module selects either the first speech text or the second speech text as the output speech text depending upon which of the first and second confidence scores is higher.) It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention to compare words to words included in the combined speech profile where a selection is made on the speech recognition result based on the comparison. It would have been obvious to combine the references because utilizing multiple speech recognizer and selecting the best takes advantage of the strengths, while complementing the weaknesses, of each speech recognizer due to accent and identify of the speaker. (Endo Col. 
3, Lines 3-16) Regarding claim 16 Klose in view of Endo and further view of Wan teach claim 1. Klose further teaches: receiving a second speech input from a respective user, wherein the second speech input includes a respective word; and (P0043, The user profile (e.g., generated in a user-specific training in advance) may include voice and/or pronunciation characteristics of the user, a vocabulary characteristic to the user (e.g., characteristic words, phrases and/or other often used formulations), as well as probabilities of occurrences of words, phrases and/or formulations in the language commonly used by the user.) in accordance with a determination that one or more criteria are met, adding, to a speech profile corresponding to the respective user, the respective word. (P0043, The user profile (e.g., generated in a user-specific training in advance) may include voice and/or pronunciation characteristics of the user, a vocabulary characteristic to the user (e.g., characteristic words, phrases and/or other often used formulations), as well as probabilities of occurrences of words, phrases and/or formulations in the language commonly used by the user.) Regarding claim 17 Klose in view of Endo and further view of Wan teach claim 16. Klose further teaches: wherein the one or more criteria includes a criterion that the respective word is not included in the speech profile corresponding to the respective user. (P0043, The user profile (e.g., generated in a user-specific training in advance) may include voice and/or pronunciation characteristics of the user, a vocabulary characteristic to the user (e.g., characteristic words, phrases and/or other often used formulations), as well as probabilities of occurrences of words, phrases and/or formulations in the language commonly used by the user.) Regarding claim 18 Klose in view of Endo and further view of Wan teach claim 16. 
Klose further teaches: wherein the one or more criteria includes a criterion that the respective word was previously received from the respective user at least a threshold number of times. (P0043, The user profile (e.g., generated in a user-specific training in advance) may include voice and/or pronunciation characteristics of the user, a vocabulary characteristic to the user (e.g., characteristic words, phrases and/or other often used formulations), as well as probabilities of occurrences of words, phrases and/or formulations in the language commonly used by the user.)

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Klose in view of Endo, in view of Wan, and further in view of Coifman et al. (U.S. PG Pub No. 20110153620).

Regarding claim 10, Klose in view of Endo and further in view of Wan teach claim 1. Klose does not specifically teach: identifying, from the plurality of speech recognition results, a first particular speech recognition result associated with a weight value exceeding a threshold weight; and in accordance with a determination that the first particular speech recognition result includes a user-specific word that does not correspond to the identified voice profile, selecting the respective speech recognition result based on availability of a general speech recognition result. Endo, however, teaches: identifying, from the plurality of speech recognition results, a first particular speech recognition result associated with a weight value exceeding a threshold weight; and (Fig. 4, Perform multiple speech recognition. Confidence score exceeding a threshold.) It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to identify speech recognition results where the weight exceeds a threshold. It would have been obvious to combine the references because utilizing a speech recognition result that exceeds a threshold ensures that the confidence score is reliable. (Endo, Col. 8, Lines 31-34)

Klose in view of Endo and further in view of Wan does not specifically teach: in accordance with a determination that the first particular speech recognition result includes a user-specific word that does not correspond to the identified voice profile, selecting the respective speech recognition result based on availability of a general speech recognition result. Coifman, however, teaches: in accordance with a determination that the first particular speech recognition result includes a user-specific word that does not correspond to the identified voice profile, selecting the respective speech recognition result based on availability of a general speech recognition result. (Fig. 7, Evaluating speech input against text strings in selected databases -> (no) -> Evaluating speech input against text strings in generalized / base vocabulary database.) It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to utilize a general speech recognition result when user-specific words do not correspond to the identified voice profile. It would have been obvious to combine the references because the generalized dictionary acts as a default fallback in the case where the evaluation criteria are not met. (Coifman, P0064)

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Klose in view of Endo, in view of Wan, and further in view of Nemoto (U.S. PG Pub No. 20010039492).

Regarding claim 14, Klose in view of Endo and further in view of Wan teach claim 1. Klose in view of Endo and further in view of Wan does not specifically teach: in response to selecting the respective speech recognition result, removing, from the electronic device, the combined speech profile. Nemoto, however, teaches: in response to selecting the respective speech recognition result, removing, from the electronic device, the combined speech profile. (P0018, The present invention removes a speech element array corresponding to a reading that the person did not use in the first response in the conversation.) It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to remove a combined speech profile that was not selected. It would have been obvious to combine the references because the same person can maintain the same reading consistently in one conversation, so utilizing other speech elements would lower the recognition probability for subsequent user responses. (Nemoto, P0018)

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Klose in view of Endo, in view of Wan, and further in view of Singh et al. (U.S. PG Pub No. 20200160860).

Regarding claim 15, Klose in view of Endo and further in view of Wan teach claim 1. Klose in view of Endo and further in view of Wan does not specifically teach: detecting a disconnection of wireless communication between a second electronic device and the electronic device, wherein the plurality of speech profiles includes a fourth respective speech profile received from the second electronic device; and in response to detecting the disconnection of wireless communication between the second electronic device and the electronic device, removing the fourth respective speech profile from the plurality of speech profiles.

Singh, however, teaches: detecting a disconnection of wireless communication between a second electronic device and the electronic device, wherein the plurality of speech profiles includes a fourth respective speech profile received from the second electronic device; and (P0153, Vehicle may establish the connection with the second device using a wireless link.; P0159, Determine that the connection is no longer active, which indicates that the connection is terminated, such as when the second device is out of range.; P0036, The system may obtain contact data associated with the second device and associate that contact data with the first device, for example with a profile associated with the first device. This profile may be a device profile, such as a profile associated with a vehicle, or a user profile.) in response to detecting the disconnection of wireless communication between the second electronic device and the electronic device, removing the fourth respective speech profile from the plurality of speech profiles. (P0161, In response to receiving the signal indicating termination of the connection, the system and/or the communications system may delete second device contact data associated with the vehicle profile. For example, the system and/or the communications system may determine the second device contact data, or any information associated with the second device contact data, and may remove the data from the vehicle profile or in some way disassociate the data from the vehicle profile.) It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to remove a speech profile from the first device when disconnected from the second device. It would have been obvious to combine the references because the removal allows the first device to only use the speech profiles when the second device is connected to the first device. (Singh, P0047)

Allowable Subject Matter

Claims 11 and 12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Claim 11 depends on claim 10 and extends claim 10 with added limitations. Claim 12 depends on claim 11. More specifically, none of the prior art, either alone or in combination, teaches or makes obvious the combination of limitations of “determining whether the plurality of speech recognition results includes a second particular speech recognition result that does not include a user-specific word, wherein the second particular speech recognition result is associated with a weight that exceeds the threshold weight”, “in accordance with a determination that the plurality of speech recognition results includes the second particular speech recognition result, selecting the second particular speech recognition result as the respective speech recognition result”, and “in accordance with a determination that the plurality of speech recognition results does not include the second particular speech recognition result, selecting the first particular speech recognition result as the respective speech recognition result”.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL WONSUK CHUNG, whose telephone number is (571) 272-1345. The examiner can normally be reached Monday - Friday, 7am-4pm (PT).

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, PIERRE-LOUIS DESIR, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DANIEL W CHUNG/
Examiner, Art Unit 2659

/PIERRE LOUIS DESIR/
Supervisory Patent Examiner, Art Unit 2659
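The mechanism this action repeatedly cites from Endo (multiple recognizers each emitting recognized text with a confidence score, a decision module selecting the text whose score is highest and exceeds a threshold) together with Coifman's fall-back to a generalized vocabulary can be sketched roughly as below. This is an illustrative sketch only; all names, signatures, and values are hypothetical and are not part of the cited references or the record.

```python
from dataclasses import dataclass

@dataclass
class RecognitionResult:
    text: str
    confidence: float  # score the decision module compares

def select_result(results, threshold=0.5, general_fallback=None):
    """Return the highest-confidence result whose score exceeds the
    threshold (cf. Endo, Fig. 4); if none qualifies, fall back to the
    general recognizer's result (cf. Coifman, Fig. 7)."""
    qualifying = [r for r in results if r.confidence > threshold]
    if qualifying:
        return max(qualifying, key=lambda r: r.confidence)
    return general_fallback

# Hypothetical run: two profile-specific recognizers plus a general fallback.
profile_results = [
    RecognitionResult("call Ann", 0.62),
    RecognitionResult("call Anne", 0.81),
]
general = RecognitionResult("call and", 0.40)

print(select_result(profile_results, threshold=0.5, general_fallback=general).text)
# -> call Anne  (highest qualifying score)
print(select_result(profile_results, threshold=0.9, general_fallback=general).text)
# -> call and  (no profile result qualifies, so the general result is used)
```

The point of contention in claim 10 maps onto the second branch: when no profile-specific result qualifies, the generalized vocabulary acts as the default, which is the fallback behavior the examiner attributes to Coifman.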

Prosecution Timeline

Sep 07, 2022
Application Filed
Apr 03, 2024
Response after Non-Final Action
Apr 22, 2025
Non-Final Rejection — §101, §103
May 29, 2025
Examiner Interview Summary
May 29, 2025
Applicant Interview (Telephonic)
May 30, 2025
Response Filed
Aug 10, 2025
Final Rejection — §101, §103
Aug 28, 2025
Applicant Interview (Telephonic)
Aug 28, 2025
Examiner Interview Summary
Aug 29, 2025
Request for Continued Examination
Sep 03, 2025
Response after Non-Final Action
Sep 05, 2025
Non-Final Rejection — §101, §103
Oct 21, 2025
Applicant Interview (Telephonic)
Oct 22, 2025
Examiner Interview Summary
Nov 24, 2025
Response Filed
Mar 19, 2026
Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579471
DATA AUGMENTATION AND BATCH BALANCING METHODS TO ENHANCE NEGATION AND FAIRNESS
2y 5m to grant • Granted Mar 17, 2026
Patent 12493892
METHOD AND SYSTEM FOR EXTRACTING CONTEXTUAL PRODUCT FEATURE MODEL FROM REQUIREMENTS SPECIFICATION DOCUMENTS
2y 5m to grant • Granted Dec 09, 2025
Patent 12400078
INTERPRETABLE EMBEDDINGS
2y 5m to grant • Granted Aug 26, 2025
Patent 12387000
PRIVACY-PRESERVING AVATAR VOICE TRANSMISSION
2y 5m to grant • Granted Aug 12, 2025
Patent 12380875
SPEECH SYNTHESIS WITH FOREIGN FRAGMENTS
2y 5m to grant • Granted Aug 05, 2025
Based on this examiner's 5 most recent grants.


Prosecution Projections

5-6
Expected OA Rounds
54%
Grant Probability
92%
With Interview (+37.5%)
2y 10m
Median Time to Grant
High
PTA Risk
Based on 44 resolved cases by this examiner. Grant probability derived from career allow rate.
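The headline projections appear to combine as follows, assuming the interview lift is additive in percentage points. That additivity is an inference from the displayed figures (54% baseline, +37.5 point lift, 92% with interview), not a documented formula of this report.

```python
# Illustrative arithmetic for the projection panel; values are the
# report's displayed figures, the additive model is an assumption.
career_allow_rate = 54.0  # % (24 granted / 44 resolved is roughly 54.5%)
interview_lift = 37.5     # percentage points

with_interview = min(career_allow_rate + interview_lift, 100.0)
print(round(with_interview))
# -> 92
```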
