DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the Non-final Office Action mailed on 9/16/2025, Applicant filed an amendment on 12/10/2025. In this reply, despite the advice given to Applicant's representative and the consideration given to potential amendments to claim 1 during the interview held on 11/18/2025, Applicant ultimately decided not to amend any of the independent claims to advance prosecution.
Instead, Applicant has argued that the prior art of record fails to teach identifying an external device for outputting a user speech for learning from among a plurality of external devices based upon the environment information "while the electronic device is operating in a learning mode" (see Remarks, Pages 17-18), a point discussed and explained in the interview granted for the current round of prosecution. Applicant's arguments have been fully considered but are not found to be persuasive for the reasons noted in the Response to Arguments section.
In response to the amended, more specific title of the invention (Remarks, page 13), the objection directed towards a non-descriptive title of the invention has been withdrawn.
Applicant argues that the 35 U.S.C. 112(b) indefiniteness issues have been corrected via the instant amendments to claims 4, 11, and 17 (Remarks, Page 13).
In response, in view of these corrections, the 35 U.S.C. 112(b) rejections are moot and have been withdrawn.
Response to Arguments
With respect to independent Claim 1, Applicant argues that Kim (U.S. PG Publication: 2021/0158819 A1) fails to teach "while the electronic device is operating in a learning mode, identify an external device for outputting a user speech for learning from among a plurality of external devices based on the environment information that is among a plurality of environment information stored in the memory" because: i) Kim "appears to already know which external device will output the utterance," ii) the device is not identified from among a plurality of external devices, and iii) the identification is not based on the environment information that was obtained around the electronic device while the user speech is received and according to a result of the speech recognition. Applicant closes the arguments with an additional point that Kim does not disclose any identification of an external device to output a user speech for learning based on the environment information, whether Kim is in a learning mode or not (Remarks, Pages 17-18).
In response, it should first be noted that the learning operations taught by Kim are in fact carried out in a learning mode. Specifically, a learning mode in Kim is triggered when a "misrecognized" or "unrecognized" user utterance is identified (Paragraphs 0101-0102 and 0110 noting that this situation triggers "learning to adjust" speech recognition) where for such a situation, "unrecognized or misrecognition situation information" is stored (Paragraph 0121).
Turning now to Applicant's argument i) that Kim does not teach an identification of an "external device for outputting a user speech for learning" because Kim appears to know which external device will output the utterance, Applicant is directed towards paragraphs 0112-0113 and 0121 of Kim. In particular, paragraph 0112 describes that the electronic device having a processor operates in an environment where a "plurality of external devices" is "provided" and that the processor can decide which of these external devices are requested to perform "output of the learning utterance," and, per paragraph 0113, can even identify/decide upon a sequence in which the external devices are used to output the "learning utterance."
Note that paragraphs 0121-0122 discuss how the processor of the electronic device can decide/identify which external device will output a learning noise or the learning utterance. Since the processor decides which external devices will output the learning utterance, and optionally in what order such an output will be performed, with the whole operation being triggered in a learning mode by an "unrecognized or misrecognized" situation so as to improve speech recognition, Applicant's argument i) is not found to be persuasive.
Turning now to Applicant's argument ii) that the device is not identified from among a plurality of external devices, Applicant is directed towards the "plurality of external devices" that are provided in Paragraph 0112 for either learning utterance output and/or learning noise output depending on the request determination at the electronic device (see also Paragraphs 0113 and 0121). Thus, since the electronic device operates in an environment including a plurality of external devices (see the multiple devices shown in Fig. 1) and determines which ones of those devices to request for learning utterance output, Applicant's argument ii) is not found to be persuasive.
Turning now to Applicant's argument iii) that the identification of the device is not based on the environment information that was obtained around the electronic device, the position of record on this aspect of the limitation was explained in the conducted interview. In particular, during examination the claims are given their broadest reasonable interpretation (BRI) in light of the specification. Note that the instant claims as drafted are broad with respect to the form that the environment information may take and how the external device is identified in view of that broadly recited "environment information." The claim does not explain how the environment information is used to identify a particular external device because the claim only generically recites that the identification is somehow "based on" that information.
In Kim (as explained in the Non-final Office Action and the conducted interview), the "environmental noise received together with the user utterance" as part of the "unrecognized or misrecognized situation" was mapped to the claimed "environment information" (see Paragraph 0121). In the simulation/re-creation of the misrecognition or unrecognition situation, then, Kim's electronic device identifies and assigns particular external devices to play back either stored learning speech or a learning environmental noise. Paragraph 0121 notes that the processor can decide upon "various combinations" of learning outputs (where paragraph 0113 notes that devices can be assigned in a specific order), where among these outputs is environmental information/noise. Here, for example, the processor can identify itself as outputting the noise and decide that an external device should output the learning utterance, among other configurations (see also Paragraph 0122). Since the electronic device in Kim needs to recreate a misrecognition scenario in which learning noise and speech are output, the device not outputting the noise is assigned the learning utterance. Accordingly, the identification of the device for learning utterance output is "based on" the "environment information" under the BRI, and Applicant's argument iii) is not found to be persuasive.
The art rejections of the remaining independent claim and the dependent claims were traversed for reasons similar to those presented for claim 1 (Remarks, Page 18). In regard to such arguments, see the response directed towards claim 1.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-7, 9, and 13-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kim (U.S. PG Publication: 2021/0158819 A1).
With respect to Claim 1, Kim discloses:
An electronic device comprising:
a speaker (audio output interface in the form of "at least one speaker," Paragraph 0061);
a microphone (audio receiver implemented with a microphone, Paragraph 0060);
a communication interface (Paragraph 0053- "electronic device...includes a communication interface");
a memory storing at least one instruction (storage device including software, Paragraph 0059); and
one or more processors connected to the speaker, the microphone, the communication interface, and the memory (processor, Paragraph 0059; see also Fig. 2, Element 39 showing a connection to the speaker, microphone, and storage), wherein the one or more processors are configured to:
while the electronic device operates in a speech recognition mode, perform speech recognition by inputting a user speech signal from the microphone into a speech recognition model, the user speech signal corresponding to a user speech received through the microphone (in a voice/speech recognition mode, the processor performs voice recognition processing on the user utterance received through the audio receiver/microphone using "voice recognition models," Paragraphs 0043, 0046, 0060, and 0064),
obtain environment information around the electronic device while the user speech is received according to a result of the speech recognition (device obtains "environmental noise received together with the user utterance" responsive to a speech recognition result in the form of an "unrecognized or misrecognized situation," Paragraph 0121),
store the obtained environment information in the memory (Paragraph 0121- "processor...may store the...situation information" that includes the "environmental noise received together with the user utterance;" See also Paragraphs 0059-0060 and 0072 discussing the collection and storage of noise sounds from the external environment),
while the electronic device is operating in a learning mode (execution of a “learning mode,” Paragraph 0128), identify an external device for outputting a user speech for learning from among a plurality of external devices based on the environment information that is among a plurality of environment information stored in the memory (selection of a particular external device for outputting a learning utterance, Paragraphs 0112-0113 and 0121),
control the communication interface to transmit a command to the external device for controlling the output of the user speech for learning (request or transmit of information for the "learning utterance output" via an external device, Paragraphs 0112-0113, 0121, and 0128; see also Paragraphs 0053 (describing communication between an electronic device and an external device via the communication interface) and 0133).
With respect to Claim 2, Kim further discloses:
The electronic device of claim 1, wherein the one or more processors, based on receiving a user speech for learning signal from the microphone, the user speech for learning signal corresponding to the user speech for learning outputted by the external device and received by the microphone, train the speech recognition model with respect to the received user speech for learning signal (learning noise is performed "in connection with the operation S72 of Fig. 7" wherein this step involves step S71 that is the output of a "learning utterance" triggered by a request from the processor to train recognition characteristics of a voice assistant comprising the voice recognition models, Paragraphs 0043, 0096-0097, 0121, 0130, and 0133).
With respect to Claim 3, Kim further discloses:
The electronic device of claim 1, wherein the one or more processors are further configured to: obtain information on a place where the user speech is uttered while the user speech signal is received in the speech recognition mode, the environment information including the information on the place where the user speech is uttered (environmental/place context information as a part of the "misrecognized situation" during a voice recognition attempt that is obtained and stored, Paragraph 0121), and identify the external device for outputting the user speech for learning from among the plurality of external devices based on the information on the place where the user speech is uttered (identification of a particular external electronic device to output the learning noise based on the environmental context of the misrecognized situation, Paragraphs 0112-0113, 0121, 0128, and 0133).
With respect to Claim 4, Kim further discloses:
The electronic device of claim 2, wherein the one or more processors are further configured to:
obtain operation information of the external device while the user speech signal is received in the speech recognition mode, the environment information including the operation information (obtaining and storing "misrecognized situation information" including operating machine/artificial noise associated with an external device, Paragraphs 0121, 0128, and 0133; see also a list of external devices including sound producing devices such as a media player, washing machine, etc. in Paragraph 0044),
transmit, to the external device, a command for operating the external device that corresponds to the operation information of the external device (transmitted activation request to that external device to reproduce learning noise, Paragraphs 0121, 0128, and 0133), and
train the speech recognition model when the external device is operated according to the command for operating the external device and the user speech for learning signal is received (training of the voice assistant recognition model responsive to the transmitted request, Paragraphs 0043, 0121, 0128, and 0133).
With respect to Claim 5, Kim further discloses:
The electronic device of claim 4, wherein the one or more processors are further configured to: while the external device is operating according to the command for operating the external device, receive an other noise signal corresponding to an other noise that has occurred in the external device, and train the speech recognition model based on the received user speech signal and the received other noise signal (learning data is received by the processor for training that includes the learning noise associated with the external device to train the voice assistant recognition model "based on learning noise" that is “in conjunction” with the learning utterance, Paragraphs 0043, 0108, 0112-0113, 0121, and 0133).
With respect to Claim 6, Kim further discloses:
The electronic device of claim 1, wherein the one or more processors are further configured to:
in the learning mode, identify an other noise signal corresponding to an other noise that is around the electronic device while the user speech signal is received based on the environment information (identification of context information including "environmental noise received together with the user utterance," Paragraph 0121, wherein such environmental noise is identified in a learning mode, see Paragraphs 0128 and 0133),
control the communication interface to transmit a command to the external device for controlling the external device to output the other noise (Paragraph 0121- "the processor 39 may reproduce the unrecognized or misrecognized situation by causing the external device 22 to output the environmental noise as the second learning noise through the output request of the learning utterance or the transmitted information;" see also Paragraphs 0053, 0128, and 0133), and
during an outputting of the other noise by the external device according to the command to the external device for controlling the external device to output the other noise, receive the user speech for learning signal and the other noise signal, and train the speech recognition model based on the received user speech for learning and the other noise (learning noise is performed "in connection with the operation S72 of Fig. 7" wherein this step involves step S71 that is the output of a "learning utterance" triggered by a request from the processor to train recognition characteristics of a voice assistant comprising the voice recognition models, Paragraphs 0043, 0096-0097, 0121, 0130, and 0133).
With respect to Claim 7, Kim further discloses:
The electronic device of claim 1, further comprising: a sensor (various forms of sensors- camera, direction sensor, distance sensor, etc. discussed in Paragraph 0124),
wherein the one or more processors are further configured to: obtain object information located around the electronic device based on sensing data of the sensor while the user speech signal is received in the speech recognition mode, the environment information including the object information (the sensor obtains object information in the form of user presence (via audio and/or images) that continues during the recognition mode that causes the misrecognition situation in the home and constitutes context (note that user utterance audio may indicate presence), Paragraphs 0062, 0121, 0124-0126, and 0128),
based on receiving new sensing data from the sensor, identify whether object information according to the new sensing data corresponds to the object information included in the environment information (identifying a change from user presence to absence based on the sensor data, Paragraphs 0062, 0124-0126, and 0128),
based on the object information according to the new sensing data corresponding to the object information included in the environment information, enter the learning mode (enter “learning mode” based on the new change in sensing data, Paragraphs 0124-0126 and 0128), and
while operating in the learning mode, control the communication interface to transmit a command to the external device for controlling the output of the user speech for learning to the external device (Paragraph 0121- "causing the external device 22 to output the environmental noise as the second learning noise through the output request of the learning utterance or the transmitted information” in a learning mode, Paragraphs 0128 and 0133).
With respect to Claim 9, Kim further discloses:
The electronic device of claim 2, wherein each of the plurality of environment information includes a confidence score about the result of the speech recognition, and based on the confidence score being greater than or equal to a threshold value, includes a text corresponding to the result of the speech recognition (misrecognition situation including a "probability value or a reliability value" along with the output value in the form of text, Paragraph 0075; and receiving text data as a result of the voice recognition that is implied to be based on a "predetermined threshold" related to said reliability or probability (i.e., sufficient likelihood that the user uttered that particular text; note also that increasing reliability implies that a sufficient reliability, at or above a predetermined threshold, is sought) that may be adjusted to produce said output in the form of text, Paragraphs 0040, 0046, 0051, and 0079).
With respect to Claim 13, Kim further discloses:
The electronic device of claim 1, wherein the one or more processors are further configured to, based on a preset event being identified, enter the electronic device to the learning mode (preset events trigger the learning mode, such as a misrecognition, a specific time, or user absence, among other options, Paragraphs 0121 and 0128).
Claim 14 recites an embodiment of the invention directed to a method practiced by the apparatus of claim 1, and thus, is rejected under similar rationale.
Claims 15-20 contain subject matter respectively similar to Claims 2-7, and thus, are rejected under similar rationale.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of Bromand (U.S. PG Publication: 2019/0318722 A1).
With respect to Claim 8, Kim teaches the system for reproducing a misrecognition situation to train a speech recognition model using a training user utterance emitted from an external device as applied to claim 2. Kim does not teach that the learning signal/speech is generated via an obtained text-to-speech (TTS) model as set forth in claim 8. Bromand, however, discloses:
The electronic device of claim 2, wherein the one or more processors are further configured to: while operating in the speech recognition mode, generate a text-to-speech (TTS) model based on the user speech received through the microphone, and obtain the user speech for learning signal based on the TTS model (training device that emits a training utterance using a text-to-speech synthesizer wherein the synthesizer model is trained (e.g., "trained neural network") based on "natural samples" obtained from recorded user speech, Paragraphs 0024, 0036, 0038, 0040-0041, and 0045).
Kim and Bromand are analogous art because they are from a similar field of endeavor in simulating speech recognition tests for model training. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to utilize the TTS model for generating a test utterance as taught by Bromand for the learning speech in Kim to provide a predictable result of more efficient testing/training by decreasing the number of natural samples required and increasing the diversity and robustness of the trained model (Bromand, Paragraph 0032).
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of Lee (U.S. PG Publication: 2019/0392818 A1).
With respect to Claim 10, Kim teaches the system for reproducing a misrecognition situation to train a speech recognition model using a training user utterance emitted from an external device and confidence scoring of speech recognition results as applied to claim 9. Kim does not teach the selection of misrecognition environment information for entries having confidence less than a threshold and confidence scores equal to or greater than a threshold, where the latter is then used as learning speech, as set forth in claim 10. Lee, however, discloses:
The electronic device of claim 9, wherein the one or more processors are further configured to: based on the electronic device entering the learning mode, identify any of the plurality of environment information stored in the memory of which the confidence score is less than the threshold value, obtain at least one of the plurality of environment information having the confidence score equal to or greater than the threshold value and having a similarity equal to or greater than a threshold similarity to the environment information, and obtain the user speech for learning signal based on the text included in the at least one of the plurality of environment information having the confidence score equal to or greater than the threshold value (in a training mode- measuring confidence levels of learnable audio data for speech recognition related to similar/matching speech data (Paragraphs 0176-0177 and 0206) wherein different threshold values classify learnable data into learning data (when a confidence value related to similarity is greater than a reference confidence level/threshold) or adaptation data (when a confidence value is less than a reference confidence level) (Paragraphs 0205-0210) and wherein the adaptation data reflects "a practical environment," Paragraphs 0007 and 0226).
Kim and Lee are analogous art because they are from a similar field of endeavor in environmental adaptation for speech recognition. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to utilize the data sample classification taught by Lee in the learning data selection for external devices taught by Kim to provide a predictable result of a classification metric for candidate samples that can allow for improvement of speech recognition performance in a real environment (Lee, Paragraph 0007).
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of Pogue, et al. (U.S. Patent: 9,799,329).
With respect to Claim 11, Kim teaches the system for reproducing a misrecognition situation to train a speech recognition model using a training user utterance emitted from an external device as applied to claim 2. Kim does not teach the increase of repetition number information included in the environment information based on obtained environment information corresponding to stored environment information. Pogue, however, discloses:
The electronic device of claim 2, wherein the one or more processors are further configured to, based on the obtained environment information corresponding to the environment information stored in the memory, increase repetition number information included in the environment information (incrementing a particular environmental sound signature to increase a count (Col. 3, Lines 3-19, an example of a microwave beep as a type of environmental sound in a speech recognition environment that is tracked); particularly see Col. 8, Lines 50-62: After determining that the generated signature matches the stored signature, the process 300, at 312, increments the number of times that the environmental sound has been captured or "heard" within the environment. That is, if the sound has previously been heard two times within the environment, then the process 300 increments this number to indicate that the sound has now been heard three times).
Kim and Pogue are analogous art because they are from a similar field of endeavor in environmental noise adaptation for speech recognition. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to utilize the environmental noise signature tracking taught by Pogue in the environmental noise learning sound processing taught by Kim to provide a predictable result of better identifying the sounds that reoccur within a particular environment (Pogue, Col. 1, Lines 50-55) so that speech recognition can adjust to such prevalent occurrences.
Allowable Subject Matter
Claim 12 would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
With respect to Claim 12, the prior art of record taken individually or in combination fails to explicitly teach or fairly suggest the system of claim 1 where based on the repetition number information included in the environment information being equal to or greater than a threshold number of times, obtain at least one of the plurality of environment information having a confidence score of the result of the speech recognition being equal to or greater than a threshold value, and having a similarity equal to or greater than a threshold similarity, obtain the user speech for learning signal based on text included in the at least one environment information, and wherein as the repetition number information increases, the threshold similarity is reduced.
Most pertinent prior art:
Kim (U.S. PG Publication: 2021/0158819 A1) teaches a system that identifies a misrecognition environment situation in speech recognition in a voice assistant and trains a speech recognition model using learning speech and learning noise from external devices (see Paragraphs 0043, 0112-0113, 0121, 0128, and 0133). Kim does not consider repetition number information for a correspondence between obtained and stored environment information being greater than a threshold, wherein as the repetition number information increases, the threshold similarity used to obtain the plurality of environment information (i.e., the information having a confidence score greater than or equal to the threshold value) is reduced.
While Pogue, et al. (U.S. Patent: 9,799,329) does track the repetition of environmental sounds, such as those of a microwave or other electronic devices, when operating a speech recognizer, Pogue does not adjust a threshold similarity by which to obtain environment information used in training a speech recognition model. Instead, Pogue uses such repetitive/recurring information to filter noise (see Col. 3, Lines 3-19 and Col. 8, Lines 50-62).
Thus, the prior art of record fails to explicitly teach or fairly suggest the invention set forth in claim 12.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES S WOZNIAK whose telephone number is (571)272-7632. The examiner can normally be reached 7-3, off alternate Fridays.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant may use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
JAMES S. WOZNIAK
Primary Examiner
Art Unit 2655
/JAMES S WOZNIAK/Primary Examiner, Art Unit 2655