DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 09/29/2025 has been entered.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 3, 4, 6, 13-16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lee U.S. PAP 2020/0074993 A1 in view of Kwak U.S. PAP 2012/0173244 A1, further in view of DiMascio U.S. Patent No. 11,164,562 B2.
Regarding claim 1 Lee teaches an artificial intelligence (AI) apparatus for recognizing speech (artificial intelligence system, see par. [0005]) comprising:
a memory (memory 160) configured to store an intention mapping table including a mapping item that maps text and intention information (recognition result provision part, see par. [0129]);
a processor configured to: receive first speech data, perform speech recognition to determine first intention information corresponding to the first speech data using the intention mapping table and a natural language processing (NLP) engine (the processor 120 may extract a plurality of utterance intentions from a user voice “Now, my stomach is” that a user uttered so far, see par. [0143]),
receive second speech data (in case an utterance intention does not coincide with the previously determined utterance intention, the processor 120 may provide response information corresponding to an utterance intention of a user determined based on an additional word, instead of the previous response information, see par. [0083]),
However, Lee does not teach: wherein the second speech data is speech data re-uttered by a user based on the determination that the first speech recognition for the first speech data has failed; performing speech recognition to determine second intention information corresponding to the second speech data; based on the second intention information being generated by the NLP engine based on the second speech data, adding a first mapping item mapping the first text and the second intention information to the intention mapping table; and determining that speech recognition for the first speech data has failed based on the NLP engine failing to generate the first intention information and first text converted from the first speech data not being located in the intention mapping table.
In the same field of endeavor Kwak teaches a voice command recognition apparatus and method capable of figuring out the intention of a voice command input through a voice dialog interface, see abstract. Another example of a voice command recognition model is a statistical dialog model, in which the recognition result of a voice command is assumed as a probability having a plurality of possibilities, and the optimum response is determined based on the probabilities. Different from the rule based dialog model, in the statistical dialog model, all possible dialogs do not need to be constructed individually, and a recognition error is subject to a confirmation process such that the intention of a command may be determined, thereby constructing a potentially more stable dialog model, see par. [0007]. The command intention determining unit may comprise a command intention probability distribution update unit configured to update the command intention probability distribution corresponding to the voice command of the user, an error determining unit configured to determine the error in recognizing the voice command or configured to determine an error in figuring out the command intention through an updated command intention probability distribution, and a re-input requesting response generating unit configured to request re-input of the voice command if the error determining unit determines that an error occurs (the second speech data is speech data re-uttered by a user in a situation where the speech recognition for the first speech data has failed, wherein the situation where the speech recognition for the first speech data has failed includes: the NLP engine fails to generate the first intention information), see par. [0021].
The command intention determining unit may further comprise a confirmation request unit, wherein the command intention selecting unit is further configured to determine whether a plurality of command intentions have a probability that exceeds the threshold value, and the confirmation request unit requests the user for a confirmation of the selected command intention, in response to the plurality of command intentions exceeding the threshold value, see par. [0024]. Kwak also teaches generating a sub-dialog to confirm the intention of a user and outputting the generated sub-dialog. If the channel EBS does not exist in the application rule, the user intention checking unit 137 may generate a response indicating "sorry, but EBS is not supported. Please select another channel." (first text not included in the intention mapping table), see par. [0065].
It would have been obvious to one of ordinary skill in the art to combine the Lee invention with the teachings of Kwak for the benefit of constructing a potentially more stable dialog model, see par. [0007].
However Lee in view of Kwak does not teach the intention mapping table further including a counter value corresponding to each mapping item.
In the same field of endeavor DiMascio teaches a method for entity-level clarification in conversation services. The method includes receiving a conversation services training example set, building an entity usage map using the conversation services training example set, receiving a user utterance, and, responsive to receiving the user utterance, generating a clarification response using the entity usage map, see abstract. The method also includes providing the clarification response to a user, see col. 1 lines 24-31. DiMascio further teaches incrementing a clarification response counter (a counter value corresponding to each mapping item), see claim 1.
It would have been obvious to one of ordinary skill in the art to combine the Lee in view of Kwak invention with the teachings of DiMascio for the benefit of generating a clarification response using the entity usage map, see abstract.
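For illustration only, the limitations addressed in the rejection of claim 1 can be sketched in code. The following Python sketch is hypothetical: the names `IntentionMapper`, `MappingItem`, and `failed_text`, and the choice to increment the counter on each table hit, are assumptions made for this example and are not taken from the application or from Lee, Kwak, or DiMascio. It shows a failure determination (the NLP engine generates no intention and the text is not in the table) followed by a re-utterance whose intention is mapped back to the failed text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class MappingItem:
    """One row of the hypothetical intention mapping table."""
    intention: str
    counter: int = 1  # assumption: counter value kept per mapping item


class IntentionMapper:
    """Hypothetical sketch of the claimed table-plus-NLP-engine flow."""

    def __init__(self, nlp_engine: Callable[[str], Optional[str]]):
        self.nlp_engine = nlp_engine              # returns None when it fails
        self.table: Dict[str, MappingItem] = {}   # text -> mapping item
        self.failed_text: Optional[str] = None    # text of the last failure

    def recognize(self, text: str) -> Optional[str]:
        nlp_intention = self.nlp_engine(text)
        item = self.table.get(text)

        # Failure: the engine generated nothing AND the text is not in the table.
        if nlp_intention is None and item is None:
            self.failed_text = text
            return None

        if nlp_intention is not None:
            # Re-utterance after a failure: map the failed text to the
            # intention the engine generated for the second utterance.
            if self.failed_text is not None:
                self.table[self.failed_text] = MappingItem(nlp_intention)
                self.failed_text = None
            return nlp_intention

        # Engine failed but the table has the text: use the stored mapping.
        item.counter += 1
        self.failed_text = None
        return item.intention
```

In this sketch a mapping item is added only when the engine itself generates the second intention, which mirrors the condition recited for claim 6; a table hit alone never creates a new item.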
Regarding claim 3 Lee teaches the AI apparatus of claim 2, wherein the processor is configured to: perform a control corresponding to the generated second intention information when speech recognition for the second speech data has succeeded (initiating an operation for preparing execution of an application for performing an operation corresponding to an utterance intention having the highest reliability among the plurality of utterance intentions, see par. [0020]).
Regarding claim 4 Lee teaches the AI apparatus of claim 1, wherein the second speech is received within a predetermined time period after the first speech data has been received (the electronic device 100 may determine an utterance intention in real time on the basis of a user voice input so far, even before an utterance of a user is completed, see par. [0185]).
Regarding claim 6 Lee teaches the AI apparatus of claim 1, wherein the processor is configured to add the first mapping item to the intention mapping table mapping the first text and the second intention information only when the second intention information is generated using the NLP engine (The model update part 122-5 may make a data recognition model updated based on evaluation of a recognition result provided by the recognition result provision part 122-4. For example, the model update part 122-5 may provide a recognition result provided by the recognition result provision part 122-4 to the model learning part 141-4, and thereby make the model learning part 141-4 update a data recognition model, see par. [0131]).
Regarding claim 13 Lee teaches the AI apparatus of claim 1, wherein the intention mapping table further includes a recommended application, wherein the processor is configured to suggest a recommended application corresponding to the second intention information, determine that speech recognition for the second speech data has succeeded when the recommended application is executed, and add the first mapping item including the recommended application to the intention mapping table (preparing execution of an application for performing an operation corresponding to an utterance intention having the highest reliability among the plurality of utterance intentions, see pars. [0012]-[0013]).
Regarding claim 14 Lee teaches a method for recognizing speech (a method for obtaining an utterance intention of a user, see abstract) comprising:
receiving first speech data (receiving a user voice uttered by a user, see par. [0010]);
performing speech recognition to determine first intention information corresponding to the first speech data using an intention mapping table including a mapping item that maps text and intention information and a natural language processing (NLP) engine (obtain an utterance intention of a user on the basis of at least one word included in the user voice while the user voice is being input, see par. [0010]);
receiving second speech data (in case an utterance intention does not coincide with the previously determined utterance intention, the processor 120 may provide response information corresponding to an utterance intention of a user determined based on an additional word, see par. [0083]);
performing speech recognition to determine second intention information corresponding to the second speech data (response information corresponding to an utterance intention of a user determined based on an additional word, see par. [0083]);
However, Lee does not teach: wherein the second speech data is speech data re-uttered by a user based on the determination that the first speech recognition for the first speech data has failed; performing speech recognition to determine second intention information corresponding to the second speech data; based on the second intention information being generated by the NLP engine based on the second speech data, adding a first mapping item mapping the first text and the second intention information to the intention mapping table; and determining that speech recognition for the first speech data has failed based on the NLP engine failing to generate the first intention information and first text converted from the first speech data not being located in the intention mapping table.
In the same field of endeavor Kwak teaches a voice command recognition apparatus and method capable of figuring out the intention of a voice command input through a voice dialog interface, see abstract. Another example of a voice command recognition model is a statistical dialog model, in which the recognition result of a voice command is assumed as a probability having a plurality of possibilities, and the optimum response is determined based on the probabilities. Different from the rule based dialog model, in the statistical dialog model, all possible dialogs do not need to be constructed individually, and a recognition error is subject to a confirmation process such that the intention of a command may be determined, thereby constructing a potentially more stable dialog model, see par. [0007]. The command intention determining unit may comprise a command intention probability distribution update unit configured to update the command intention probability distribution corresponding to the voice command of the user, an error determining unit configured to determine the error in recognizing the voice command or configured to determine an error in figuring out the command intention through an updated command intention probability distribution, and a re-input requesting response generating unit configured to request re-input of the voice command if the error determining unit determines that an error occurs (the second speech data is speech data re-uttered by a user in a situation where the speech recognition for the first speech data has failed, wherein the situation where the speech recognition for the first speech data has failed includes: the NLP engine fails to generate the first intention information), see par. [0021].
The command intention determining unit may further comprise a confirmation request unit, wherein the command intention selecting unit is further configured to determine whether a plurality of command intentions have a probability that exceeds the threshold value, and the confirmation request unit requests the user for a confirmation of the selected command intention, in response to the plurality of command intentions exceeding the threshold value, see par. [0024]. Kwak also teaches generating a sub-dialog to confirm the intention of a user and outputting the generated sub-dialog. If the channel EBS does not exist in the application rule, the user intention checking unit 137 may generate a response indicating "sorry, but EBS is not supported. Please select another channel." (first text not included in the intention mapping table), see par. [0065].
It would have been obvious to one of ordinary skill in the art to combine the Lee invention with the teachings of Kwak for the benefit of constructing a potentially more stable dialog model, see par. [0007].
However Lee in view of Kwak does not teach the intention mapping table further including a counter value corresponding to each mapping item.
In the same field of endeavor DiMascio teaches a method for entity-level clarification in conversation services. The method includes receiving a conversation services training example set, building an entity usage map using the conversation services training example set, receiving a user utterance, and, responsive to receiving the user utterance, generating a clarification response using the entity usage map, see abstract. The method also includes providing the clarification response to a user, see col. 1 lines 24-31. DiMascio further teaches incrementing a clarification response counter (a counter value corresponding to each mapping item), see claim 1.
It would have been obvious to one of ordinary skill in the art to combine the Lee in view of Kwak invention with the teachings of DiMascio for the benefit of generating a clarification response using the entity usage map, see abstract.
Regarding claim 15 Lee teaches a recording medium recording a method of recognizing speech (a non-transitory computer-readable recording medium, see abstract), the method comprising:
receiving first speech data (receiving a user voice uttered by a user, see par. [0010]);
performing speech recognition to determine first intention information corresponding to the first speech data using an intention mapping table including a mapping item that maps text and intention information and a natural language processing (NLP) engine (obtain an utterance intention of a user on the basis of at least one word included in the user voice while the user voice is being input, see par. [0010]);
receiving second speech data (in case an utterance intention does not coincide with the previously determined utterance intention, the processor 120 may provide response information corresponding to an utterance intention of a user determined based on an additional word, see par. [0083]);
performing speech recognition to determine second intention information corresponding to the second speech data (response information corresponding to an utterance intention of a user determined based on an additional word, see par. [0083]).
However, Lee does not teach: wherein the second speech data is speech data re-uttered by a user based on the determination that the first speech recognition for the first speech data has failed; performing speech recognition to determine second intention information corresponding to the second speech data; based on the second intention information being generated by the NLP engine based on the second speech data, adding a first mapping item mapping the first text and the second intention information to the intention mapping table; and determining that speech recognition for the first speech data has failed based on the NLP engine failing to generate the first intention information and first text converted from the first speech data not being located in the intention mapping table.
In the same field of endeavor Kwak teaches a voice command recognition apparatus and method capable of figuring out the intention of a voice command input through a voice dialog interface, see abstract. Another example of a voice command recognition model is a statistical dialog model, in which the recognition result of a voice command is assumed as a probability having a plurality of possibilities, and the optimum response is determined based on the probabilities. Different from the rule based dialog model, in the statistical dialog model, all possible dialogs do not need to be constructed individually, and a recognition error is subject to a confirmation process such that the intention of a command may be determined, thereby constructing a potentially more stable dialog model, see par. [0007]. The command intention determining unit may comprise a command intention probability distribution update unit configured to update the command intention probability distribution corresponding to the voice command of the user, an error determining unit configured to determine the error in recognizing the voice command or configured to determine an error in figuring out the command intention through an updated command intention probability distribution, and a re-input requesting response generating unit configured to request re-input of the voice command if the error determining unit determines that an error occurs (the second speech data is speech data re-uttered by a user in a situation where the speech recognition for the first speech data has failed, wherein the situation where the speech recognition for the first speech data has failed includes: the NLP engine fails to generate the first intention information), see par. [0021].
The command intention determining unit may further comprise a confirmation request unit, wherein the command intention selecting unit is further configured to determine whether a plurality of command intentions have a probability that exceeds the threshold value, and the confirmation request unit requests the user for a confirmation of the selected command intention, in response to the plurality of command intentions exceeding the threshold value, see par. [0024]. Kwak also teaches generating a sub-dialog to confirm the intention of a user and outputting the generated sub-dialog. If the channel EBS does not exist in the application rule, the user intention checking unit 137 may generate a response indicating "sorry, but EBS is not supported. Please select another channel." (first text not included in the intention mapping table), see par. [0065].
It would have been obvious to one of ordinary skill in the art to combine the Lee invention with the teachings of Kwak for the benefit of constructing a potentially more stable dialog model, see par. [0007].
However Lee in view of Kwak does not teach the intention mapping table further including a counter value corresponding to each mapping item.
In the same field of endeavor DiMascio teaches a method for entity-level clarification in conversation services. The method includes receiving a conversation services training example set, building an entity usage map using the conversation services training example set, receiving a user utterance, and, responsive to receiving the user utterance, generating a clarification response using the entity usage map, see abstract. The method also includes providing the clarification response to a user, see col. 1 lines 24-31. DiMascio further teaches incrementing a clarification response counter (a counter value corresponding to each mapping item), see claim 1.
It would have been obvious to one of ordinary skill in the art to combine the Lee in view of Kwak invention with the teachings of DiMascio for the benefit of generating a clarification response using the entity usage map, see abstract.
Claim(s) 2, 7-12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lee U.S. PAP 2020/0074993 A1, in view of Kwak U.S. PAP 2012/0173244 A1, in view of DiMascio U.S. Patent No. 11,164,562 B2 further in view of Jang U.S. PAP 2010/0179812 A1.
Regarding claim 2 Lee in view of Kwak in view of DiMascio teaches the AI apparatus of claim 1, wherein the processor is configured to:
further determine that the speech recognition for the first speech data has failed based on reliability of generated first intention information being lower than a first reference reliability value (obtain reliability of a plurality of utterance intentions on the basis of the at least one word input, and based on an utterance intention having reliability equal to or greater than a predetermined value, see par. [0011]; utterance intention does not coincide with the previously determined utterance intention, see par. [0083]).
However Lee in view of Kwak in view of DiMascio does not teach determining that the speech recognition for the second speech data has failed based on reliability of the generated second intention information being lower than a second reference reliability value.
In the same field of endeavor Jang teaches methods and devices for learning voice command instructions, which have erred previously, through learning about repeated mistakes, see par. [0003]. Jang teaches that the voice command recognition learning unit 220 may perform automated user-adaptable voice command recognition by calculating a similarity between a correct/incorrect pattern and the input voice command instructions in attempting to recognize the user's voice command and utilizing a difference between voice command instructions, which have been previously classified as incorrect before learning, and correct voice command instructions when re-learning a phonemic model of voice command recognition, see par. [0038].
It would have been obvious to one of ordinary skill in the art to combine the Lee in view of Kwak in view of DiMascio invention with the teachings of Jang for the benefit of learning voice command instructions, which have erred previously, through learning about repeated mistakes, see par. [0003].
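For illustration only, the reliability comparisons recited in claim 2 can be expressed as a single check applied with two different reference values. This is a hypothetical sketch: the function name `recognition_failed` and the numeric values `FIRST_REFERENCE` and `SECOND_REFERENCE` are assumptions for this example and appear in neither the application nor the cited references.

```python
from typing import Optional

# Assumed reference reliability values; the claim does not give numbers.
FIRST_REFERENCE = 0.6
SECOND_REFERENCE = 0.8


def recognition_failed(intention: Optional[str],
                       reliability: float,
                       reference: float) -> bool:
    """Return True when recognition is treated as failed.

    Failure means either no intention was generated at all, or the
    generated intention's reliability is lower than the reference value.
    """
    return intention is None or reliability < reference


# First utterance is checked against the first reference value,
# the re-uttered speech against the second.
first_failed = recognition_failed("PLAY_MUSIC", 0.55, FIRST_REFERENCE)
second_failed = recognition_failed("PLAY_MUSIC", 0.85, SECOND_REFERENCE)
```

A reliability exactly equal to the reference value is not a failure here, because the claim language recites "lower than".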
Regarding claim 7 Lee in view of Kwak in view of DiMascio does not teach the AI apparatus of claim 1, wherein the counter for the first mapping is set to 1 when the first mapping item is added.
In the same field of endeavor Jang teaches methods and devices for learning voice command instructions, which have erred previously, through learning about repeated mistakes, see par. [0003]. Jang teaches that when a voice command is learned, first, the signal processing apparatus 200 determines whether or not the total number of times of attempts of voice command recognition is 1 (S401), see par. [0057].
It would have been obvious to one of ordinary skill in the art to combine the Lee in view of Kwak invention with the teachings of Jang for the benefit of learning voice command instructions, which have erred previously, through learning about repeated mistakes, see par. [0003].
Regarding claim 8 Jang teaches the AI apparatus of claim 7, wherein the processor is further configured to determine the second intention information using a second mapping item in the intention mapping table which maps the second intention information and the second text converted from the second speech data, and based on the second mapping item having a counter greater than or equal to a first reference counter value (If it is determined that the total number of times of attempts of voice command recognition is 1 (YES in S401), the signal processing apparatus 200 learns voice command feature values of a successful attempt using the Acoustic model (S414). On the other hand, if it is determined that the total number of times of attempts of voice command recognition is not 1 (NO in S401), the signal processing apparatus 200 calculates the similarity between voice command feature values in an unsuccessful attempt and voice command feature values in a successful attempt in a result of the voice command recognition (S402), see par. [0058]).
Regarding claim 9 Jang teaches the AI apparatus of claim 8, wherein the processor is configured to recommend the second intention information of the second mapping item as the first intention information when the counter of the second mapping item is not greater than the second reference counter value (determines whether or not the voice command feature values in the unsuccessful attempt are similar to the voice command feature values in the successful attempt, see par. [0059]), and determine the recommended second intention information as the first intention information when receiving a user's explicit consent or implied consent for the recommended intention information, and wherein the second reference counter value is greater than the first reference counter value (The signal processing apparatus 200 determines whether or not the user cancels the corresponding operation (S305). If it is determined that the user does not cancel the corresponding operation, the signal processing apparatus 200 stores the voice command recognition result (S306), see par. [0050]).
Regarding claim 10 Jang teaches the AI apparatus of claim 9, wherein the processor is configured to determine the second intention information of the second mapping item as the first intention information when the counter value of the second mapping item is greater than a second reference counter value (the signal processing apparatus 200 learns the voice command feature values in the unsuccessful attempt using the Acoustic model, see par. [0059]).
Regarding claim 11 Jang teaches the AI apparatus of claim 10, wherein the processor is configured to generate training data corresponding to the second mapping item and update the NLP engine using the generated training data when the counter of the second mapping item is greater than a third counter value, and wherein the third reference counter value is greater than the second reference counter value (the signal processing apparatus 200 adds a word having the voice command feature values in the unsuccessful attempt to a voice command recognition object of the Lexicon model, see par. [0059]).
Regarding claim 12 Lee teaches the AI apparatus of claim 11, wherein the processor is configured to delete the second mapping item from the intention mapping table after the training data has been generated (may update response information by a method of replacing the response information screen such that response information corresponding to the newly determined utterance intention is provided, see par. [0157]).
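For illustration only, the tiered counter comparisons recited in claims 8 through 11 can be sketched as a single decision function. The sketch is hypothetical: the names `Action` and `action_for_counter` and the default reference values 2, 5, and 10 are assumptions for this example and are not drawn from the application or the cited references. The only constraints taken from the claim language are the ordering of the three reference counter values and the direction of each comparison.

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"        # counter below the first reference value
    RECOMMEND = "recommend"  # suggest the intention and wait for user consent
    APPLY = "apply"          # use the mapped intention directly
    TRAIN = "train"          # generate training data and update the NLP engine


def action_for_counter(counter: int,
                       first_ref: int = 2,
                       second_ref: int = 5,
                       third_ref: int = 10) -> Action:
    """Pick the handling tier for a mapping item from its counter value."""
    # The claims recite second > first and third > second.
    if not (first_ref < second_ref < third_ref):
        raise ValueError("reference counter values must be strictly increasing")
    if counter < first_ref:
        return Action.IGNORE
    if counter <= second_ref:   # "not greater than" the second reference value
        return Action.RECOMMEND
    if counter <= third_ref:    # greater than second, not greater than third
        return Action.APPLY
    return Action.TRAIN         # greater than the third reference value
```

Under claim 12, a caller that receives `Action.TRAIN` would also delete the mapping item from the table once the training data has been generated; that step is left out of the sketch because it depends on how the table is stored.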
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Pertinent prior art available on form 892.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Ortiz-Sanchez whose telephone number is (571)270-3711. The examiner can normally be reached Monday-Friday, 9AM-6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL ORTIZ-SANCHEZ/ Primary Examiner, Art Unit 2656