DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the submission filed November 16, 2023. Claims 1-20 are pending.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 11/16/2023, 5/30/2024, and 3/4/2025 are being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 6, 10-13, 16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Shin et al. (US Patent Application Publication No. 2019/0385592), hereinafter Shin, in view of Wang et al. (CN 103559537, English translation), hereinafter Wang.
Regarding claims 1 and 11, Shin teaches a device (with memory and processor) and method [Fig. 1, Fig. 2, para 0041] comprising instructions and steps to: obtain a voice signal based on a text to speech (TTS) model (310), wherein the voice signal corresponds to an input text [para 0133 -- the speech synthesis model 310 may learn text data of the first training data and output speech data (first speech data) based on speech corresponding to the text]; and, based on identifying that the voice signal includes an error, identify an error part of the voice signal which includes the identified error [para 0136-0137; 0142-0143; 0147 -- the speech recognition model 320 may learn speech recognition based on the first speech data; the first speech data may be data obtained from the speech synthesis model 310; the first speech recognition result may be text data; controller 330 may control to change the parameter of the speech synthesis model 310 based on the first speech recognition result obtained from the speech recognition model 320]. Shin fails to specifically teach identifying an activity of each of a plurality of nodes related to the error part, and modifying at least one node among the plurality of nodes based on the identified activity of the at least one node. In a similar field of endeavor, Wang teaches template matching based on error back propagation, providing for determining an output layer node error and a hidden node error from a model with a plurality of neurons, nodes, and weights; calculating an error rate for the nodes according to the template; and adjusting the weights of nodes having an error rate greater than or equal to a threshold value [para 0010; 0030-0039].
One having ordinary skill in the art before the effective filing date of the claimed invention would have recognized the advantages of implementing the node error detection and weight modifications suggested by Wang in the system of Shin; the results would have been predictable and would provide an improved TTS model, thereby increasing system performance and improving the user's experience.
Regarding claims 2 and 12, the combination of Shin and Wang teaches reducing a weight related to the at least one node [Wang’s node weight adjustments – para 0030-0039].
Regarding claims 3 and 13, the combination of Shin and Wang teaches replacing the at least one node with at least one pre-stored node, wherein the at least one pre-stored node is stored in the memory and corresponds to text corresponding to the error part [Wang para 0035 – invalid nodes are deleted, where replacing invalid nodes with known optimum nodes is an obvious step requiring only routine skill in the art].
Regarding claims 6 and 16, the combination of Shin and Wang teaches that an automatic speech recognition (ASR) model is stored in the memory [Shin's speech recognition model 320], and that the memory stores instructions configured to, when executed by the at least one processor, cause the electronic device to: obtain text which is a result of applying the ASR model to the voice signal [para 0136-0137; 0142-0143; 0147 -- the first speech recognition result may be text data]; and, based on identifying that the text includes a part which is different from the input text, identify the part which is different from the input text as the error part [para 0136-0143; 0147 -- may obtain an error value between the first speech recognition result and the second speech recognition result by comparing the second speech recognition result with the first speech recognition result].
Regarding claims 10 and 19, the combination of Shin and Wang teaches controlling the communication module to transmit to a server information related to the error part and the modification of the at least one node, receiving, through the communication module, a modified TTS model from the server, and updating the TTS model stored in the memory based on the modified TTS model [Shin's AI server processing, including a learning processor, model storage, and a communication unit for data transmission and reception – para 0042; 0065-0067].
Regarding claim 20, Shin teaches a device (with memory and processor) and method [Fig. 1, Fig. 2, para 0041] comprising instructions and steps to: obtain a voice signal based on a text to speech (TTS) model (310), wherein the voice signal corresponds to an input text [para 0133 -- the speech synthesis model 310 may learn text data of the first training data and output speech data (first speech data) based on speech corresponding to the text]; and, based on identifying that the voice signal includes an error, identify an error part of the voice signal which includes the identified error [para 0136-0137; 0142-0143; 0147 -- the speech recognition model 320 may learn speech recognition based on the first speech data; the first speech data may be data obtained from the speech synthesis model 310; the first speech recognition result may be text data; controller 330 may control to change the parameter of the speech synthesis model 310 based on the first speech recognition result obtained from the speech recognition model 320]. Shin fails to specifically teach identifying an activity of each of a plurality of nodes related to the error part, and reducing a weight related to at least one node among the plurality of nodes based on the identified activity of the at least one node. In a similar field of endeavor, Wang teaches template matching based on error back propagation, providing for determining an output layer node error and a hidden node error from a model with a plurality of neurons, nodes, and weights; calculating an error rate for the nodes according to the template; and adjusting the weights of nodes having an error rate greater than or equal to a threshold value [para 0010; 0030-0039].
One having ordinary skill in the art before the effective filing date of the claimed invention would have recognized the advantages of implementing the node error detection and weight modifications suggested by Wang in the system of Shin; the results would have been predictable and would provide an improved TTS model, thereby increasing system performance and improving the user's experience.
Claims 4-5, 7, 14-15 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Shin in view of Wang and further in view of Won et al (KR 10-2386635 – English translation), hereinafter Won.
Regarding claims 4 and 14, Shin and Wang fail to specifically teach identifying that the voice signal includes at least one phoneme having a length equal to or greater than a preset length, and identifying a part of the voice signal corresponding to the at least one phoneme as the error part. Won teaches that the evaluation items of speech synthesis data may include the presence of errors for each analysis unit included in the speech synthesis data, the type of error, the degree of error, and the location of the analysis unit determined to be an error in the sentence corresponding to the speech synthesis data [para 0058]. One having ordinary skill in the art before the effective filing date of the claimed invention would have recognized the advantages of implementing the speech synthesis evaluation techniques suggested by Won in the system of Shin/Wang; the results would have been predictable and would determine the specific locations and identities of errors so as to more clearly determine which portions of the model should be adjusted, thereby providing an improved TTS model, increasing system performance, and improving the user's experience.
Regarding claims 5 and 15, the combination of Shin, Wang and Won teaches, based on identifying that the voice signal includes a waveform part having an abnormal waveform, identifying the waveform part as the error part [Won's speech synthesis evaluation – para 0058].
Regarding claims 7 and 17, the combination of Shin, Wang and Won teaches displaying the input text on the display, and identifying the error part based on a user input received through the display, wherein the user input comprises selection of a portion of the input text [Won's user interface and checkbox features – para 0077].
Claims 8-9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Shin in view of Wang and further in view of Yun et al (KR 10-2018-0039371 – English translation), hereinafter Yun.
Regarding claims 8-9 and 18, Shin and Wang fail to teach identifying a sentence structure of the input text; obtaining, based on the sentence structure, at least one character string; obtaining a character string voice signal resulting from inputting the at least one character string into the TTS model; and identifying, based on the character string voice signal, whether the error part has been modified, wherein the obtaining of the at least one character string comprises changing a text before or after a portion of the input text corresponding to the error part. Yun teaches performing speech synthesis by receiving text as an input from a user; when determining that there is an error in the text during speech recognition, providing and displaying recognition error content and an error correction request on an interface unit; and, when an agent unit is provided with a response to the error correction request, reflecting the response, correcting the error, and then providing error-corrected text [para 0051-0054]. One having ordinary skill in the art before the effective filing date of the claimed invention would have recognized the advantages of implementing the correction of errors using user interactions as suggested by Yun in the system of Shin/Wang, so as to allow for correcting an error of an automatic interpretation through interaction with a user; the results would have been predictable and would provide a more user-friendly system, thereby enhancing and improving the user's experience.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Petrochuk (US Patent No. 12,051,428) discloses training a speech synthesis neural network model that generates the speech waveform, determines loss, where loss is used with a back-propagation process (or a different technique for performing a similar feedback process of modifying weights) to update the weights of the model.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGELA A ARMSTRONG whose telephone number is (571)272-7598. The examiner can normally be reached M,T,TH,F 11:30-8:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANGELA A ARMSTRONG/Primary Examiner, Art Unit 2659