DETAILED ACTION
This communication is in response to the Amendments and Arguments filed on 11/24/2025.
Claims 1-20 are pending and have been examined.
All previous objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendments
Regarding the Applicant’s arguments for the rejections under 35 U.S.C. § 101, Applicant has amended independent claims 1, 19, and 20 to include “in accordance with a determination that the first word satisfies a usage frequency threshold, adding the first word to the first respective speech profile, wherein the usage frequency threshold corresponds to a usage frequency of the first word with a first application”. Applicant asserts that the amended limitations cover a particular solution to a problem and improve upon conventional systems by increasing speech recognition accuracy through improved relevancy of stored speech profiles. Examiner respectfully disagrees. An important consideration in determining whether a claim improves technology or a technical field is the extent to which the claim covers a particular solution to a problem or a particular way to achieve a desired outcome, as opposed to merely claiming the idea of a solution or outcome. MPEP 2106.05(a). Here, the amended claim language recites only the idea of a solution and fails to recite details of how the solution to the problem is accomplished. For example, the amended limitation can be interpreted as adding a word to a first speech profile based on a usage frequency threshold. The claim does not recite how the word in the speech profile increases speech recognition accuracy. Applicant asserts in the remarks that a speech profile based on “application-specific language” can make devices more efficient by increasing speech recognition accuracy. The claim, as currently recited, is missing the details of how the addition of specific words to the speech profile achieves the desired outcome of increased speech recognition accuracy.
Regarding the Applicant’s arguments for the rejections under 35 U.S.C. § 103, Applicant has amended independent claims 1, 19, and 20. Hence, the Applicant’s arguments are moot in view of the new grounds of rejection, and new references have been applied.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-8, 10-14, and 16-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claims 1, 19, and 20, the limitations of “receiving a first speech input including a first word, wherein the first word is not included in a first respective speech profile in accordance with a determination that the first word satisfies a usage frequency threshold, adding the first word to the first respective speech profile, wherein the usage frequency threshold corresponds to a usage frequency of the first word with a first application receiving a second speech input from a first user; in response to receiving the second speech input, obtaining a combined speech profile from a plurality of speech profiles, wherein the plurality of speech profiles includes the first respective speech profile; interpreting the second speech input based on the combined speech profile to obtain a plurality of speech recognition results; wherein the plurality of speech recognition results includes a first speech recognition result corresponding to a first speech profile of the plurality of speech profiles, wherein the first speech profile corresponds to a first user, and a second speech recognition result corresponding to a second speech profile of the plurality of speech profiles, wherein the second speech profile corresponds to a second user different from the first user; and selecting, from the plurality of speech recognition results, a respective speech recognition result based on an identified voice profile”, as drafted, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components.
More specifically, the claims recite the mental process of a human listening to speech including unique dialect words, determining the usage frequency of the dialect words, and adding the words to a list in the mind; listening to speech and interpreting the speech input based on an identified subset of user dialects to produce speech recognition results based on the dialects; and selecting the speech recognition result that most accurately corresponds to the dialect of the speech. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the --Mental Processes-- grouping of abstract ideas. Accordingly, the claims recite an abstract idea.
This judicial exception is not integrated into a practical application because the recitation of a device in claim 1 and a non-transitory computer-readable storage medium in claim 20 amounts to generalized computer components, based upon the claim interpretation wherein the structure is interpreted using P0030-P0042 of the specification. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using generalized computer components to listen to speech including unique dialect words, determine the usage frequency of the dialect words and add the words to a list in the mind, interpret the speech input based on an identified subset of user dialects to produce speech recognition results based on the dialects, and select the speech recognition result that most accurately corresponds to the dialect of the speech amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claims are not patent eligible.
With respect to claim 2, the claim recites “identifying, from the combined speech profile, a second user-specific word from a first respective speech profile of the plurality of speech profiles, wherein the combined speech profile includes the first user-specific word”, which reads on a human thinking of user-specific words that different people use. No additional limitations are present.
With respect to claim 3, the claim recites “identifying a second word from a third respective speech profile of the plurality of speech profiles” and “identifying a first weight associated with the first user-specific word; and identifying a second weight associated with the second word, wherein the second weight is less than the first weight”, which reads on a human identifying words from speech in the mind where specific words from one person are considered more identifiable than specific words from another person. No additional limitations are present.
With respect to claim 4, the claim recites “wherein the first user-specific word corresponds to at least one of an object stored on the device or an object stored in association with a user profile”, which reads on a human thinking of a specific word that a person uses. No additional limitations are present.
With respect to claim 5, the claim recites “identifying a reference word associated with a usage exceeding a threshold usage, wherein the first user-specific word corresponds to the reference word”, which reads on a human identifying a specific word that a person uses when the person uses the word multiple times. No additional limitations are present.
With respect to claim 6, the claim recites “while interpreting the second speech input, determining the identified voice profile based on characteristics of the second speech input”, which reads on a human identifying a voice from speech according to the characteristics of the speech in the mind. No additional limitations are present.
With respect to claim 7, the claim recites “wherein determining the voice profile based on characteristics of the second speech input includes comparing the characteristics of the second speech input to each voice profile of a plurality of voice profiles”, which reads on a human identifying a voice from speech according to the characteristics of the speech in the mind compared to other voices. No additional limitations are present.
With respect to claim 8, the claim recites “determining a word from the second speech input”, “identifying a first word, within a second respective speech profile of the plurality of speech profiles, corresponding to the determined word”, and “identifying a second word, within a third respective speech profile of the plurality of speech profiles, corresponding to the determined word”, which reads on a human identifying, in the mind, specific words that different people use in speech. No additional limitations are present.
With respect to claim 10, the claim recites “identifying, from the plurality of speech recognition results, a first particular speech recognition result associated with a weight value exceeding a threshold weight” and “in accordance with a determination that the first particular speech recognition result includes a user-specific word that does not correspond to the identified voice profile, selecting the respective speech recognition result based on availability of a general speech recognition result”, which reads on a human identifying a specific word that is not used by identified people and utilizing general knowledge to identify text from speech in the mind. No additional limitations are present.
With respect to claim 11, the claim recites “determining whether the plurality of speech recognition results includes a second particular speech recognition result that does not include a user-specific word, wherein the second particular speech recognition result is associated with a weight that exceeds the threshold weight”, “in accordance with a determination that the plurality of speech recognition results includes the second particular speech recognition result, selecting the second particular speech recognition result as the respective speech recognition result”, and “in accordance with a determination that the plurality of speech recognition results does not include the second particular speech recognition result, selecting the first particular speech recognition result as the respective speech recognition result”, which reads on a human selecting a particular speech recognition result in the mind where the speech recognition result does not include a specific word that a person uses. No additional limitations are present.
With respect to claim 12, the claim recites “in accordance with a determination that the first particular speech recognition result includes predefined content, modifying the first particular speech recognition result” and “providing the modified first particular speech recognition result as the respective speech recognition result”, which reads on a human choosing a particular speech recognition result according to predefined context in the mind. No additional limitations are present.
With respect to claim 13, the claim recites “determining a plurality of words from the second speech input”, “comparing the plurality of words to a respective plurality of words included in the combined speech profile”, and “selecting the respective speech recognition result based on the comparison”, which reads on a human selecting a speech recognition result based on specific word that a person uses. No additional limitations are present.
With respect to claim 14, the claim recites “in response to selecting the respective speech recognition result, removing, from the electronic device, the combined speech profile”, which reads on a human ignoring different dialects in the mind once the mind acknowledges the speech input is of a specific dialect. No additional limitations are present.
With respect to claim 16, the claim recites “receiving a third speech input from a respective user, wherein the third speech input includes a respective word” and “in accordance with a determination that one or more criteria are met, adding, to a speech profile corresponding to the respective user, the respective word”, which reads on a human adding to the mind a specific word that a person uses. No additional limitations are present.
With respect to claim 17, the claim recites “wherein the one or more criteria includes a criterion that the respective word is not included in the speech profile corresponding to the respective user”, which reads on a human adding to the mind a specific word that a person uses if it was not previously acknowledged. No additional limitations are present.
With respect to claim 18, the claim recites “wherein the one or more criteria includes a criterion that the respective word was previously received from the respective user at least a threshold number of times”, which reads on a human adding to the mind a specific word that a person uses a specific number of times. No additional limitations are present.
These dependent claims likewise do not integrate the judicial exception into a practical application and fail to include additional elements that are sufficient to amount to significantly more than the judicial exception.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-9, 13, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Klose et al. (U.S. PG Pub No. 20160372117), hereinafter Klose, in view of Endo et al. (U.S. Patent No. 7228275), hereinafter Endo, and in further view of Wan et al. (U.S. PG Pub No. 20230035947), hereinafter Wan.
Regarding claims 1, 19, and 20, Klose teaches:
(Claim 1) An electronic device, comprising: (P0033, Functions explained herein below may be implemented using individual hardware circuitry, using software functioning in conjunction with a programmed microprocessor or a general purpose computer.)
(Claim 1) one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: (P0033, Embodied in a device (e.g., the speech recognition unit described herein below), a computer processor and a memory coupled to a processor, wherein the memory is encoded with one or more programs that perform the methods disclosed herein when executed by the processor.)
(Claim 19) A computer-implemented method, comprising: (P0013, A method of performing speech recognition.)
(Claim 19) an electronic device with one or more processors and memory: (P0033, Embodied in a device (e.g., the speech recognition unit described herein below), a computer processor and a memory coupled to a processor, wherein the memory is encoded with one or more programs that perform the methods disclosed herein when executed by the processor.)
(Claim 20) A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of a first electronic device, cause the first electronic device to: (P0033, Embodied in a device (e.g., the speech recognition unit described herein below), a computer processor and a memory coupled to a processor, wherein the memory is encoded with one or more programs that perform the methods disclosed herein when executed by the processor.)
receiving a first speech input including a first word, wherein the first word is not included in a first respective speech profile; (P0013, performing speech recognition on the speech signal using the selected user profile.; P0039, The receiving component is configured to receive, in step 5202, a speech signal spoken by a user.)
receiving a second speech input from a user; (P0039, The receiving component is configured to receive, in step 5202, a speech signal spoken by a user.)
in response to receiving the second speech input, obtaining a combined speech profile from a plurality of speech profiles, wherein the plurality of speech profiles includes the first respective speech profile; (P0020, Speaker recognition may be performed on a first portion of the speech signal. [Receiving the speech input.]; P0042, A plurality of speech recognition user profiles may be downloaded from the remote source beforehand and the selected user profile may be selected among the plurality of downloaded speech recognition user profiles accordingly. [Combined speech profile is defined broadly in the specification and includes “combined speech profile may include a speech profile for user”. Specification P0249.])
wherein the plurality of speech recognition results includes: a first speech recognition result corresponding to a first speech profile of the plurality of speech profiles, wherein the first speech profile corresponds to a first user, and (P0040, The speaker recognition component is configured to perform speaker recognition, in step 5204, on the received speech signal to identify the user from the speech signal. In other words, the speaker recognition component identifies the user who is actually speaking.; P0041, A speech recognition user profile which is associated with the identified user.)
a second speech recognition result corresponding to a second speech profile of the plurality of speech profiles, wherein the second speech profile corresponds to a second user different from the first user; and (P0040, The speaker recognition component is configured to perform speaker recognition, in step 5204, on the received speech signal to identify the user from the speech signal. In other words, the speaker recognition component identifies the user who is actually speaking.; P0041, A speech recognition user profile which is associated with the identified user.)
Klose does not specifically teach:
receiving a first speech input including a first word, wherein the first word is not included in a first respective speech profile;
in accordance with a determination that the first word satisfies a usage frequency threshold, adding the first word to the first respective speech profile, wherein the usage frequency threshold corresponds to a usage frequency of the first word with a first application;
interpreting the second speech input based on the combined speech profile to obtain a plurality of speech recognition results;
wherein the plurality of speech recognition results includes: a first speech recognition result corresponding to a first speech profile of the plurality of speech profiles, wherein the first speech profile corresponds to a first user, and
a second speech recognition result corresponding to a second speech profile of the plurality of speech profiles, wherein the second speech profile corresponds to a second user different from the first user; and
selecting, from the plurality of speech recognition results, a respective speech recognition result based on an identified voice profile.
Endo, however, teaches:
interpreting the second speech input based on the combined speech profile to obtain a plurality of speech recognition results; (Col. 2, Lines 28-33, The present invention provides a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts and associated confidence scores and a decision module selecting one of the speech texts based upon their associated confidence scores.)
wherein the plurality of speech recognition results includes: a first speech recognition result corresponding to a first speech profile of the plurality of speech profiles, wherein the first speech profile corresponds to a first user, and (Col. 2, Lines 28-33, The present invention provides a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts and associated confidence scores and a decision module selecting one of the speech texts based upon their associated confidence scores.)
a second speech recognition result corresponding to a second speech profile of the plurality of speech profiles, wherein the second speech profile corresponds to a second user different from the first user; and (Col. 2, Lines 28-33, The present invention provides a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts and associated confidence scores and a decision module selecting one of the speech texts based upon their associated confidence scores.)
selecting, from the plurality of speech recognition results, a respective speech recognition result based on an identified voice profile. (Col. 8, Lines 22-24, The method of FIG. 4 selects the recognized speech text with the highest raw confidence score as the speech recognition result.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to obtain a plurality of speech recognition results from a speech input. It would have been obvious to combine the references because utilizing multiple speech recognizers and selecting the best result takes advantage of the strengths, while complementing the weaknesses, of each speech recognizer with respect to the accent and identity of the speaker. (Endo Col. 3, Lines 3-16)
Klose in view of Endo does not specifically teach:
receiving a first speech input including a first word, wherein the first word is not included in a first respective speech profile;
in accordance with a determination that the first word satisfies a usage frequency threshold, adding the first word to the first respective speech profile, wherein the usage frequency threshold corresponds to a usage frequency of the first word with a first application;
Wan, however, teaches:
receiving a first speech input including a first word, wherein the first word is not included in a first respective speech profile; (P0155, The customized language model and the keyword for the preset scenario are obtained based on the shared text. The hot word list for the preset scenario is updated based on the keyword to obtain the new hot word list. Speech recognition for the preset scenario is performed with the customized language model and the new hot word list.)
in accordance with a determination that the first word satisfies a usage frequency threshold, adding the first word to the first respective speech profile, wherein the usage frequency threshold corresponds to a usage frequency of the first word with a first application; (P0103, In step S603, a keyword is determined based on word frequencies of phrases and a word frequency threshold. The word frequency of a phrase represents the number of occurrences of the phrase in the phrase set or the sentence set.; P0117, Keywords are filtered by calculating language model scores of sentences including homonyms of the keywords. The keyword associated with a language model score higher than its homonym(s) is added to the hot word list. The keyword takes effect immediately when the keyword is added to the hot word list.; P0082, Speech recognition for the preset scenario is performed with the customized language model and the new hot word list.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to add a word to a speech profile based on usage frequency. It would have been obvious to combine the references because a speech recognition model needs to account for the usage scenario to achieve higher speech recognition accuracy. (Wan P0004)
Regarding claim 2, Klose in view of Endo and in further view of Wan teaches claim 1.
Klose further teaches:
identifying, from the combined speech profile, a first user-specific word from a second respective speech profile of the plurality of speech profiles, wherein the combined speech profile includes the first user-specific word. (P0040, The speaker recognition component is configured to perform speaker recognition, in step 5204, on the received speech signal to identify the user from the speech signal. In other words, the speaker recognition component identifies the user who is actually speaking. Speaker recognition may be performed based on speaker recognition data (e.g., generated in an enrollment phase in advance, as described above), wherein the speaker recognition data may include voice and/or pronunciation characteristics of the user, a vocabulary characteristic to the user (e.g., characteristic words, phrases and/or other often used formulations).; P0025, More particularly, speaker recognition may be performed based on a vocabulary limited to terms relating to the cabin or situations around the cabin of the aircraft, e.g., the vocabulary may be limited to words, phrases and/or formulations used in typical cabin announcements.)
Regarding claim 3, Klose in view of Endo and in further view of Wan teaches claim 2.
Klose further teaches:
identifying a second word from a third respective speech profile of the plurality of speech profiles; and (P0040, The speaker recognition component is configured to perform speaker recognition, in step 5204, on the received speech signal to identify the user from the speech signal. In other words, the speaker recognition component identifies the user who is actually speaking. Speaker recognition may be performed based on speaker recognition data (e.g., generated in an enrollment phase in advance, as described above), wherein the speaker recognition data may include voice and/or pronunciation characteristics of the user, a vocabulary characteristic to the user (e.g., characteristic words, phrases and/or other often used formulations).; P0025, More particularly, speaker recognition may be performed based on a vocabulary limited to terms relating to the cabin or situations around the cabin of the aircraft, e.g., the vocabulary may be limited to words, phrases and/or formulations used in typical cabin announcements.)
Klose does not specifically teach:
identifying a first weight associated with the first user-specific word; and identifying a second weight associated with the second word, wherein the second weight is less than the first weight.
Endo, however, teaches:
identifying a first weight associated with the first user-specific word; and identifying a second weight associated with the second word, wherein the second weight is less than the first weight. (Col. 5, Lines 56-60, For example, one speech recognizer 204 may output its speech recognition result as slot-value pairs, such as <device=“television”: confidence score=80>and <action=“on”: confidence score=60>.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to identify confidence scores for words in the recognition results. It would have been obvious to combine the references because utilizing multiple speech recognizers and selecting the best result takes advantage of the strengths, while complementing the weaknesses, of each speech recognizer with respect to the accent and identity of the speaker. (Endo Col. 3, Lines 3-16)
Regarding claim 4, Klose in view of Endo and in further view of Wan teaches claim 2.
Klose further teaches:
wherein the first user-specific word corresponds to at least one of an object stored on the electronic device or an object stored in association with a user profile. (P0042, The selected user profile may be stored in a respective storage device accessible by the speech recognition unit and, thus, the user profile may be applied immediately upon selection.; P0043, The user profile … may include voice and/or pronunciation characteristics of the user, a vocabulary characteristic to the user (e.g., characteristic words, phrases and/or other often used formulations). … The user profile is thus used to implement a speaker-dependent speech recognition technique, as described above.; P0025, Speaker recognition may be performed based on a vocabulary.)
Regarding claim 5, Klose in view of Endo and in further view of Wan teaches claim 2.
Klose further teaches:
identifying a reference word associated with a usage exceeding a threshold usage, wherein the first user-specific word corresponds to the reference word. (P0043, The user profile (e.g., generated in a user-specific training in advance) may include voice and/or pronunciation characteristics of the user, a vocabulary characteristic to the user (e.g., characteristic words, phrases and/or other often used formulations), as well as probabilities of occurrences of words, phrases and/or formulations in the language commonly used by the user.)
Regarding claim 6, Klose in view of Endo and in further view of Wan teaches claim 1.
Klose further teaches:
while interpreting the second speech input, determining the identified voice profile based on characteristics of the second speech input. (P0024, Speaker recognition may be performed based on speaker recognition data (e.g., generated in an enrollment phase, as described above). The speaker recognition data may include at least one of voice characteristics of the user, pronunciation characteristics of the user, a vocabulary characteristic to the user.)
Regarding claim 7, Klose in view of Endo and in further view of Wan teaches claim 6.
Klose further teaches:
wherein determining the identified voice profile based on characteristics of the second speech input includes comparing the characteristics of the second speech input to each voice profile of a plurality of voice profiles. (P0024, Speaker recognition may be performed based on speaker recognition data (e.g., generated in an enrollment phase, as described above). The speaker recognition data may include at least one of voice characteristics of the user, pronunciation characteristics of the user, a vocabulary characteristic to the user.; P0023, The selected user profile is selected among the plurality of speech recognition user profiles.)
Regarding claim 8, Klose in view of Endo and in further view of Wan teaches claim 1.
Klose further teaches:
determining a word from the second speech input; (P0017, Speech recognition may include converting the speech signal into a text message.)
identifying a first word, within a second respective speech profile of the plurality of speech profiles, corresponding to the determined word; and (P0025, More particularly, speaker recognition may be performed based on a vocabulary limited to terms relating to the cabin or situations around the cabin of the aircraft, e.g., the vocabulary may be limited to words, phrases and/or formulations used in typical cabin announcements.)
identifying a second word, within a third respective speech profile of the plurality of speech profiles, corresponding to the determined word. (P0025, More particularly, speaker recognition may be performed based on a vocabulary limited to terms relating to the cabin or situations around the cabin of the aircraft, e.g., the vocabulary may be limited to words, phrases and/or formulations used in typical cabin announcements.)
Regarding claim 9, Klose in view of Endo and further in view of Wan teaches claim 1.
Klose further teaches:
establishing wireless communication with a second electronic device; and (P0042, In case the selected user profile is not available to the speech recognition unit, it may be downloaded from a remote source, such as a remote server hosted by an airline which provides user profiles of airline crew members for download.)
in response to establishing wireless communication with the second electronic device, receiving a speech profile from the second electronic device, wherein the plurality of speech profiles includes the received speech profile. (P0042, In case the selected user profile is not available to the speech recognition unit, it may be downloaded from a remote source, such as a remote server hosted by an airline which provides user profiles of airline crew members for download.)
Regarding claim 13, Klose in view of Endo and further in view of Wan teaches claim 1.
Klose does not specifically teach:
determining a plurality of words from the second speech input;
comparing the plurality of words to a respective plurality of words included in the combined speech profile; and
selecting the respective speech recognition result based on the comparison.
Endo, however, teaches:
determining a plurality of words from the speech input; (Col. 2, Lines 28-33, The present invention provides a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts and associated confidence scores and a decision module selecting one of the speech texts based upon their associated confidence scores.)
comparing the plurality of words to a respective plurality of words included in the combined speech profile; and (Col. 2, Lines 28-33, The present invention provides a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts and associated confidence scores and a decision module selecting one of the speech texts based upon their associated confidence scores.)
selecting the respective speech recognition result based on the comparison. (Col. 2, Lines 28-33, The present invention provides a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts and associated confidence scores and a decision module selecting one of the speech texts based upon their associated confidence scores.; Col. 2, Lines 45-48, The decision module selects either the first speech text or the second speech text as the output speech text depending upon which of the first and second confidence scores is higher.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to compare words to words included in the combined speech profile, where the speech recognition result is selected based on the comparison. It would have been obvious to combine the references because utilizing multiple speech recognizers and selecting the best one takes advantage of the strengths, while complementing the weaknesses, of each speech recognizer with respect to the accent and identity of the speaker. (Endo Col. 3, Lines 3-16)
Regarding claim 16, Klose in view of Endo and further in view of Wan teaches claim 1.
Klose further teaches:
receiving a second speech input from a respective user, wherein the second speech input includes a respective word; and (P0043, The user profile (e.g., generated in a user-specific training in advance) may include voice and/or pronunciation characteristics of the user, a vocabulary characteristic to the user (e.g., characteristic words, phrases and/or other often used formulations), as well as probabilities of occurrences of words, phrases and/or formulations in the language commonly used by the user.)
in accordance with a determination that one or more criteria are met, adding, to a speech profile corresponding to the respective user, the respective word. (P0043, The user profile (e.g., generated in a user-specific training in advance) may include voice and/or pronunciation characteristics of the user, a vocabulary characteristic to the user (e.g., characteristic words, phrases and/or other often used formulations), as well as probabilities of occurrences of words, phrases and/or formulations in the language commonly used by the user.)
Regarding claim 17, Klose in view of Endo and further in view of Wan teaches claim 16.
Klose further teaches:
wherein the one or more criteria includes a criterion that the respective word is not included in the speech profile corresponding to the respective user. (P0043, The user profile (e.g., generated in a user-specific training in advance) may include voice and/or pronunciation characteristics of the user, a vocabulary characteristic to the user (e.g., characteristic words, phrases and/or other often used formulations), as well as probabilities of occurrences of words, phrases and/or formulations in the language commonly used by the user.)
Regarding claim 18, Klose in view of Endo and further in view of Wan teaches claim 16.
Klose further teaches:
wherein the one or more criteria includes a criterion that the respective word was previously received from the respective user at least a threshold number of times. (P0043, The user profile (e.g., generated in a user-specific training in advance) may include voice and/or pronunciation characteristics of the user, a vocabulary characteristic to the user (e.g., characteristic words, phrases and/or other often used formulations), as well as probabilities of occurrences of words, phrases and/or formulations in the language commonly used by the user.)
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Klose in view of Endo, in view of Wan, and further in view of Coifman et al. (U.S. PG Pub No. 20110153620).
Regarding claim 10, Klose in view of Endo and further in view of Wan teaches claim 1.
Klose does not specifically teach:
identifying, from the plurality of speech recognition results, a first particular speech recognition result associated with a weight value exceeding a threshold weight; and
in accordance with a determination that the first particular speech recognition result includes a user-specific word that does not correspond to the identified voice profile, selecting the respective speech recognition result based on availability of a general speech recognition result.
Endo, however, teaches:
identifying, from the plurality of speech recognition results, a first particular speech recognition result associated with a weight value exceeding a threshold weight; and (Fig. 4, Perform multiple speech recognition. Confidence score exceeding a threshold.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to identify speech recognition results where the weight exceeds a threshold. It would have been obvious to combine the references because utilizing a speech recognition result that exceeds a threshold ensures that the confidence score is reliable. (Endo Col. 8, Lines 31-34)
Klose in view of Endo and further in view of Wan does not specifically teach:
in accordance with a determination that the first particular speech recognition result includes a user-specific word that does not correspond to the identified voice profile, selecting the respective speech recognition result based on availability of a general speech recognition result.
Coifman, however, teaches:
in accordance with a determination that the first particular speech recognition result includes a user-specific word that does not correspond to the identified voice profile, selecting the respective speech recognition result based on availability of a general speech recognition result. (Fig. 7, Evaluating speech input against text strings in selected databases -> (no) -> Evaluating speech input against text strings in generalized / base vocabulary database.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to utilize a general speech recognition result when user-specific words do not correspond to the identified voice profile. It would have been obvious to combine the references because the generalized dictionary acts as a default fallback in the case where the evaluation criteria are not met. (Coifman, P0064)
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Klose in view of Endo, in view of Wan, and further in view of Nemoto (U.S. PG Pub No. 20010039492).
Regarding claim 14, Klose in view of Endo and further in view of Wan teaches claim 1.
Klose in view of Endo and further in view of Wan does not specifically teach:
in response to selecting the respective speech recognition result, removing, from the electronic device, the combined speech profile.
Nemoto, however, teaches:
in response to selecting the respective speech recognition result, removing, from the electronic device, the combined speech profile. (P0018, The present invention removes a speech element array corresponding to a reading that the person did not use in the first response in the conversation.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to remove a combined speech profile that was not selected. It would have been obvious to combine the references because the same person can be expected to maintain the same reading consistently in one conversation, so utilizing other speech elements would lower the recognition probability for subsequent user responses. (Nemoto P0018)
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Klose in view of Endo, in view of Wan, and further in view of Singh et al. (U.S. PG Pub No. 20200160860).
Regarding claim 15, Klose in view of Endo and further in view of Wan teaches claim 1.
Klose in view of Endo and further in view of Wan does not specifically teach:
detecting a disconnection of wireless communication between a second electronic device and the electronic device, wherein the plurality of speech profiles includes a fourth respective speech profile received from the second electronic device; and
in response to detecting the disconnection of wireless communication between the second electronic device and the electronic device, removing the fourth respective speech profile from the plurality of speech profiles.
Singh, however, teaches:
detecting a disconnection of wireless communication between a second electronic device and the electronic device, wherein the plurality of speech profiles includes a fourth respective speech profile received from the second electronic device; and (P0153, Vehicle may establish the connection with the second device using a wireless link.; P0159, Determine that the connection is no longer active, which indicates that the connection is terminated, such as when the second device is out of range.; P0036, The system may obtain contact data associated with the second device and associate that contact data with the first device, for example with a profile associated with the first device. This profile may be a device profile, such as a profile associated with a vehicle, or a user profile.)
in response to detecting the disconnection of wireless communication between the second electronic device and the electronic device, removing the fourth respective speech profile from the plurality of speech profiles. (P0161, In response to receiving the signal indicating termination of the connection, the system and/or the communications system may delete second device contact data associated with the vehicle profile. For example, the system and/or the communications system may determine the second device contact data, or any information associated with the second device contact data, and may remove the data from the vehicle profile or in some way disassociate the data from the vehicle profile.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to remove a speech profile from the first device when it is disconnected from the second device. It would have been obvious to combine the references because the removal allows the first device to use the speech profiles only when the second device is connected to the first device. (Singh P0047)
Allowable Subject Matter
Claims 11 and 12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Claim 11 depends on claim 10 and extends claim 10 with added limitations. Claim 12 depends on claim 11. More specifically, none of the prior art, either alone or in combination, teaches or makes obvious the combination of limitations of “determining whether the plurality of speech recognition results includes a second particular speech recognition result that does not include a user-specific word, wherein the second particular speech recognition result is associated with a weight that exceeds the threshold weight”, “in accordance with a determination that the plurality of speech recognition results includes the second particular speech recognition result, selecting the second particular speech recognition result as the respective speech recognition result”, and “in accordance with a determination that the plurality of speech recognition results does not include the second particular speech recognition result, selecting the first particular speech recognition result as the respective speech recognition result”.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL WONSUK CHUNG whose telephone number is (571)272-1345. The examiner can normally be reached Monday - Friday, 7am - 4pm (PT).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, PIERRE-LOUIS DESIR can be reached at (571)272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DANIEL W CHUNG/Examiner, Art Unit 2659
/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659