DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
2. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant’s submission filed on 04/29/2026 has been entered.
Response to Arguments/Amendments
3. With respect to the claim rejections under 35 U.S.C. § 102/103, Applicant argues on page 1 of the Remarks that “Applicant respectfully submits that the claims as presently amended include the distinguishing features that formed the basis of the previously indicated Allowable Subject Matter while avoiding the written description issue identified in the prior Office Action. Support for the amended claims can be found at least in paragraphs [0053], [0057], [0086], [0089], and [0092] of the present application.
The features of the amended claims are as follows:
1. Generating pronouncing time point information representing pronouncing time points of the voices pronounced by the speakers based on the voice signals.
2. Determining an output order of the translation results based on the pronouncing time point information so that the output order of the translation results and a pronouncing order of the voices are the same.
3. When pronouncing sections of the voices of different speakers overlap each other at least partly, the output circuit sequentially outputs the translation results in accordance with the determined output order without overlapping the translation results with each other.
4. The translation results include voice signals related to voices obtained by translating the voices.
Accordingly, even when the speakers pronounce the voices in an overlapping manner, not only can the voices of the respective speakers be accurately recognized and translated, but also the translation results can be sequentially output without overlapping each other, thereby enabling smooth communication between the speakers. See paragraphs [0010] and [0092] of the present application.”
In response, Examiner respectfully notes that the amended claim recites
“determine an output order of the translation results based on the pronouncing time point information so that the output order of the translation results and a pronouncing order of the voices are the same,
wherein the output circuit is configured to sequentially output the translation results in accordance with the determined output order without overlapping the translation results with each other when pronouncing sections of the voices of different speakers overlap each other at least partly, and”
The claim thus recites outputting the translation results sequentially, in accordance with the determined output order and without overlap, when pronouncing sections of the voices of different speakers overlap each other at least partly.
In other words, the claim combines three features: a) the translation results are output sequentially (i.e., one after the other, without overlapping); b) the output order of the translation results is based on the pronouncing time points (i.e., the output order of the translation results and the pronouncing order of the voices are the same); and c) the pronouncing sections of the voices of different speakers overlap each other at least partly.
Logically, this scenario is impossible.
When speakers speak at the same time at least partly, the pronouncing time points of the speakers’ utterances are the same at least partly. In this case, if the output order of the translation results and the pronouncing order of the voices are the same, the translation results would have to overlap at least partly.
Paragraph [0010] discloses “[0010] The voice processing device according to embodiments of the present disclosure can generate the translation results for the voices of the speakers, and output the translation results in accordance with the output order determined based on the pronouncing time points of the voices of the speakers. Accordingly, the voice processing device has the effect of being able to accurately recognize and translate the voices of the speakers even if the speakers overlappingly pronounce the voices, and to smoothly perform communications between the speakers by sequentially outputting the translations of the speakers.” This paragraph discloses that the output order of the translation results is based on the pronouncing time points (i.e., the output order of the translation results and the pronouncing order of the voices are the same).
Paragraph [0053] discloses “[0053] Further, according to embodiments, the voice processing device 100 may judge pronouncing time points of the voices of the respective speakers SPK1 to SPK4 by using the separated voice signals, and generate and store pronouncing time point information representing the pronouncing time points.” This paragraph discloses generating and storing pronouncing time point information representing the pronouncing time points.
Paragraph [0057] discloses “[0057] In this case, although the pronouncing section of the voice "AAA" and the pronouncing section of the voice "BBB" may overlap each other at least partly, the voice processing device 100 according to embodiments of the present disclosure may generate the first separated voice signal related to the voice "AAA" and the second separated voice signal related to the voice "BBB".” This paragraph discloses generating separated voice signals related to the voices. It also discloses that pronouncing sections of the voices of different speakers may overlap each other at least partly.
Paragraph [0086] discloses “[0086] The voice processing device 100 may determine the output order of the translation results for the voices based on the pronouncing time points of the voices of the speakers SPK1 to SPK4. According to embodiments, the voice processing device 100 may generate pronouncing time point information representing pronouncing time points of the voices based on the voice signals related to the voices. The voice processing device 100 may determine the output order for the voices based on the pronouncing time point information, and output the translation results in accordance with the determined output order.” This paragraph discloses that the translation results are output in accordance with the determined output order and the output order is determined based on the pronouncing time point information.
Paragraph [0089] discloses “[0089] For example, the voice processing device 100 may determine the output order of the translation results so as to be the same as the pronouncing order of the voices, and output the translation results for the voices in accordance with the determined output order. That is, by the voice processing device 100, the translation result for the first pronounced voice may be first output.” This paragraph discloses that the output order of the translation results is based on the pronouncing time points (i.e., the output order of the translation results and the pronouncing order of the voices are the same).
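For illustration only, the ordering described in paragraphs [0086] and [0089] can be sketched as follows. This is a minimal sketch and not the applicant's implementation: the names `TranslationResult`, `start_time`, and `determine_output_order` are hypothetical, and the sketch assumes that the pronouncing time point of a voice is the time at which the voice begins. It shows only the ordering step and does not address how overlapping pronouncing sections are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationResult:
    speaker: str       # e.g. "SPK1" (label taken from the specification)
    start_time: float  # hypothetical pronouncing time point, in seconds
    text: str          # translated content (placeholder strings below)


def determine_output_order(results):
    """Return the results sorted by pronouncing time point, earliest first."""
    return sorted(results, key=lambda r: r.start_time)


# Two results that arrive out of pronouncing order.
results = [
    TranslationResult("SPK2", 1.4, "translation of BBB"),
    TranslationResult("SPK1", 0.0, "translation of AAA"),
]
ordered = determine_output_order(results)
```

Under these assumptions, the translation result for the first pronounced voice (SPK1) is placed first in `ordered`.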
Paragraph [0092] discloses “[0092] The voice processing device 100 according to embodiments of the present disclosure may determine the source languages and the target languages in accordance with the voice source positions of the voices of the speakers SPK1 to SPK4, translate the voices of the speakers SPK1 to SPK4 in accordance with the determined source languages and target languages, and output the translation results. In this case, the translation results may be output in accordance with the output order that is determined in accordance with the pronouncing time points of the voices of the speakers SPK1 to SPK4. Accordingly, not only the voices of the speakers can be accurately recognized and translated even if the speakers SPK1 to SPK4 overlappingly pronounce the voices, but also the translations of the speakers SPK1 to SPK4 can be sequentially output, so that the communications between the speakers SPK1 to SPK4 can be smoothly performed.” First, the paragraph discloses that the translation results are output in accordance with the output order that is determined in accordance with the pronouncing time points of the voices of the speakers SPK1 to SPK4. This means that the translation result for the first pronounced voice is output first and the translation result for the second pronounced voice is output second. Second, the paragraph discloses that the translations of the speakers SPK1 to SPK4 can be sequentially output even if the speakers SPK1 to SPK4 overlappingly pronounce the voices. Examiner acknowledges that the translations can be sequentially output in that case. However, the paragraph does not disclose that the translations of the speakers SPK1 to SPK4 are sequentially output in an order based on the pronouncing time points of the voices in response to the speakers overlappingly pronouncing the voices at least partly.
There is no support for the amended claim. More specifically, there is no support for the following features in combination: a) outputting the translation results sequentially (i.e., one after the other, without overlapping), b) ordering the translation results based on the pronouncing time points (i.e., the output order of the translation results and the pronouncing order of the voices are the same), and c) pronouncing sections of the voices of different speakers overlapping each other at least partly.
Overall, Examiner does not agree that “Support for the amended claims can be found at least in paragraphs [0053], [0057], [0086], [0089], and [0092] of the present application.”
Examiner agrees that none of the cited references, individually or in combination, discloses or suggests the subject matter of the amended independent claims. Thus, the 102/103 rejections have been withdrawn.
Claim Rejections - 35 USC § 112
4. The following is a quotation of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
5. Claims 1, 3-6, 8-9, 11-12, 14 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 1 recites
“determine an output order of the translation results based on the pronouncing time point information so that the output order of the translation results and a pronouncing order of the voices are the same,
wherein the output circuit is configured to sequentially output the translation results in accordance with the determined output order without overlapping the translation results with each other when pronouncing sections of the voices of different speakers overlap each other at least partly, and”
Claim 9 recites features similar to those of Claim 1.
Paragraph [0010] discloses “[0010] The voice processing device according to embodiments of the present disclosure can generate the translation results for the voices of the speakers, and output the translation results in accordance with the output order determined based on the pronouncing time points of the voices of the speakers. Accordingly, the voice processing device has the effect of being able to accurately recognize and translate the voices of the speakers even if the speakers overlappingly pronounce the voices, and to smoothly perform communications between the speakers by sequentially outputting the translations of the speakers.” This paragraph discloses that the output order of the translation results is based on the pronouncing time points (i.e., the output order of the translation results and the pronouncing order of the voices are the same).
Paragraph [0053] discloses “[0053] Further, according to embodiments, the voice processing device 100 may judge pronouncing time points of the voices of the respective speakers SPK1 to SPK4 by using the separated voice signals, and generate and store pronouncing time point information representing the pronouncing time points.” This paragraph discloses generating and storing pronouncing time point information representing the pronouncing time points.
Paragraph [0057] discloses “[0057] In this case, although the pronouncing section of the voice "AAA" and the pronouncing section of the voice "BBB" may overlap each other at least partly, the voice processing device 100 according to embodiments of the present disclosure may generate the first separated voice signal related to the voice "AAA" and the second separated voice signal related to the voice "BBB".” This paragraph discloses generating separated voice signals related to the voices. It also discloses that pronouncing sections of the voices of different speakers may overlap each other at least partly.
Paragraph [0086] discloses “[0086] The voice processing device 100 may determine the output order of the translation results for the voices based on the pronouncing time points of the voices of the speakers SPK1 to SPK4. According to embodiments, the voice processing device 100 may generate pronouncing time point information representing pronouncing time points of the voices based on the voice signals related to the voices. The voice processing device 100 may determine the output order for the voices based on the pronouncing time point information, and output the translation results in accordance with the determined output order.” This paragraph discloses that the translation results are output in accordance with the determined output order and the output order is determined based on the pronouncing time point information.
Paragraph [0089] discloses “[0089] For example, the voice processing device 100 may determine the output order of the translation results so as to be the same as the pronouncing order of the voices, and output the translation results for the voices in accordance with the determined output order. That is, by the voice processing device 100, the translation result for the first pronounced voice may be first output.” This paragraph discloses that the output order of the translation results is based on the pronouncing time points (i.e., the output order of the translation results and the pronouncing order of the voices are the same).
Paragraph [0092] discloses “[0092] The voice processing device 100 according to embodiments of the present disclosure may determine the source languages and the target languages in accordance with the voice source positions of the voices of the speakers SPK1 to SPK4, translate the voices of the speakers SPK1 to SPK4 in accordance with the determined source languages and target languages, and output the translation results. In this case, the translation results may be output in accordance with the output order that is determined in accordance with the pronouncing time points of the voices of the speakers SPK1 to SPK4. Accordingly, not only the voices of the speakers can be accurately recognized and translated even if the speakers SPK1 to SPK4 overlappingly pronounce the voices, but also the translations of the speakers SPK1 to SPK4 can be sequentially output, so that the communications between the speakers SPK1 to SPK4 can be smoothly performed.” First, the paragraph discloses that the translation results are output in accordance with the output order that is determined in accordance with the pronouncing time points of the voices of the speakers SPK1 to SPK4. This means that the translation result for the first pronounced voice is output first and the translation result for the second pronounced voice is output second. Second, the paragraph discloses that the translations of the speakers SPK1 to SPK4 can be sequentially output even if the speakers SPK1 to SPK4 overlappingly pronounce the voices. Examiner acknowledges that the translations can be sequentially output in that case. However, the paragraph does not disclose that the translations of the speakers SPK1 to SPK4 are sequentially output in an order based on the pronouncing time points of the voices in response to the speakers overlappingly pronouncing the voices.
There is no support for the amended claim. More specifically, there is no support for the following features in combination: a) outputting the translation results sequentially (i.e., one after the other, without overlapping), b) ordering the translation results based on the pronouncing time points (i.e., the output order of the translation results and the pronouncing order of the voices are the same), and c) pronouncing sections of the voices of different speakers overlapping each other at least partly.
Claims 3-6, 8, 11-12, 14 are rejected on the same ground by virtue of their dependency.
Allowable Subject Matter
6. Claims 1, 3-6, 8-9, 11-12, 14 are allowed in view of the prior art of record. However, the claims stand rejected under 112(a), and for the application to pass to allowance this rejection needs to be overcome. Any amendment made to overcome the 112(a) rejection that results in a change in scope will require further search and/or consideration in order to determine its allowability.
The following is a statement of reasons for the indication of allowable subject matter: the prior art(s) taken alone or in combination fail(s) to teach the following element(s) in combination with the other recited elements in the claim(s).
“determine an output order of the translation results based on the pronouncing time point information so that the output order of the translation results and a pronouncing order of the voices are the same,
wherein the output circuit is configured to sequentially output the translation results in accordance with the determined output order without overlapping the translation results with each other when pronouncing sections of the voices of different speakers overlap each other at least partly, and” as recited in Claim 1.
Claim 9 recites features similar to those of Claim 1.
Nakadai et al. (US 2015/0154957 A1.) In this reference, Nakadai et al. disclose a method and a system for separating the voices based on the voice source positions (Nakadai et al. Fig. 16 element 21 Sound source localizing unit, element 22 sound source separating unit, [0062] In case of sound signals from the plurality of speakers, the speech recognizing unit 13 distinguishes the speakers and recognizes the speech details for each distinguished speaker, [0134] The sound source localizing unit 21 estimates an azimuth of a sound source on the basis of an input signal input from the sound signal acquiring unit 12 and outputs azimuth information indicating the estimated azimuth and sound signals of N channels to the sound source separating unit 22. The azimuth estimated by the sound source localizing unit 21 is, for example, a direction in the horizontal plane with respect to the direction of a predetermined microphone out of the N microphones from the point of the center of gravity of the positions of the N microphones of the sound collecting unit 11. For example, the sound source localizing unit 21 estimates the azimuth using a generalized singular-value decomposition-multiple signal classification (GSVD-MUSIC) method, [0137] the sound source separating unit 22 may calculate a sound feature quantity for each sound signal of N channels and may separate the sound signals into the sound signals by speakers on the basis of the calculated sound feature quantity and the azimuth information input from the sound source localizing unit 21), and generate translation results for the voices by using the separated voice signals (Nakadai et al. [0158] The language displayed in an image presented to each speaker may be based on a language selected in advance from a menu. For example, when the speaker Sp1 selects Japanese as the language from the menu, the translation unit 24 may translate the speech uttered in French by another speaker and may display the translation result in the first character presentation image 322C. Accordingly, even when another speaker utters speech in French, English, or Chinese, the conversation support apparatus 1A may display the speech pieces of other speakers in Japanese in the fourth character presentation image 352C in FIG. 18, [0112] The images 524A to 524C of the characters obtained by recognizing the speech of the second speaker Sp2 are displayed in the first character presentation image 522. As shown in FIG. 14, the images 524A to 524C are sequentially displayed from the deep side to the near side of the image display unit 15 in the first speaker Sp1. The images 534A to 534D of the characters obtained by recognizing the speech of the first speaker Sp1 are displayed in the second character presentation image 532. As shown in FIG. 14, the images 534A to 534D are sequentially displayed from the deep side to the near side of the image display unit 15 in the second speaker Sp2. In FIG. 14, the uttering order is, for example as follows, image 534A, image 524A, image 534B, image 524B, image 534C, image 524C, and image 534D. See paragraphs [0111], [0140] and Fig. 14.) Nakadai et al. display the translation results sequentially in response to the speakers pronouncing the voices sequentially. Nakadai et al. do not teach determining an output order of the translation results based on the pronouncing time point information so that the output order of the translation results and a pronouncing order of the voices are the same, and sequentially outputting the translation results in accordance with the determined output order without overlapping the translation results with each other when pronouncing sections of the voices of different speakers overlap each other at least partly. Thus, Nakadai et al. fail to teach and/or suggest the allowable subject matter.
Sakamoto et al. (US 2013/0211818 A1.) In this reference, Sakamoto et al. disclose a method and a system for outputting the translation results sequentially when the speakers pronounce the voices in an overlapping manner, by selecting a priority speaker and adjusting a timing in order to output the translated results sequentially (Sakamoto et al. Fig. 6 elements S4 Speech durations are overlapped? Yes, element S7-S11, [0076] In S1 through S3 on FIG. 6, the processes translate each speech and generate each synthesized speech. In S4 on FIG. 6, the unit 105 determines whether the first and the second speech durations overlap each other. In this case, the speech 705 and the speech 706 overlap and the process of S4 thus moves to S7.) In response to the speakers overlappingly pronouncing the voices, Sakamoto et al. have to adjust a timing for outputting the translation results sequentially (i.e., outputting the translation results in a sequence). However, Sakamoto et al. do not teach determining an output order of the translation results based on the pronouncing time point information so that the output order of the translation results and a pronouncing order of the voices are the same, and sequentially outputting the translation results in accordance with the determined output order without overlapping the translation results with each other when pronouncing sections of the voices of different speakers overlap each other at least partly. Thus, Sakamoto et al. fail to teach and/or suggest the allowable subject matter.
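For illustration only, the kind of determination made at step S4 of Sakamoto et al. (whether two speech durations overlap) can be sketched as an interval test. This is a minimal sketch under stated assumptions and not the procedure of Sakamoto et al.: the function name `durations_overlap` and the representation of a speech duration as a (start, end) pair in seconds are hypothetical.

```python
def durations_overlap(first, second):
    """Return True if two (start, end) durations share any span of time.

    Durations are treated as half-open intervals [start, end), so a speech
    that begins exactly when another ends is not counted as overlapping.
    """
    first_start, first_end = first
    second_start, second_end = second
    return first_start < second_end and second_start < first_end


# Hypothetical durations: the second speech begins before the first ends.
overlapping = durations_overlap((0.0, 2.0), (1.5, 3.0))
# Hypothetical durations: the second speech begins as the first ends.
back_to_back = durations_overlap((0.0, 1.0), (1.0, 2.0))
```

Under these assumptions, `overlapping` is `True` and `back_to_back` is `False`.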
Gu et al. (US 2022/0329693 A1.) In this reference, Gu et al. disclose a method and a system for displaying the transcription and the translation result sequentially (Gu et al. [0007] According to a first aspect, an embodiment of this application provides a translation method, where the method is applicable to a first electronic device, and the method includes: The first electronic device establishes a call connection to a second electronic device and displays a call interface of the first electronic device; then the first electronic device receives a first operation of a first user; in response to the first operation, the first electronic device switches from displaying the call interface to displaying a translation interface; then the first electronic device receives a first speech of the first user in a first language and sends the first speech to the second electronic device; in response to the first speech, the translation interface of the first electronic device sequentially displays at least a first text and a second text, where the first text is obtained by recognizing the first speech, and the second text is obtained by translating the first speech into a target language; and when the translation interface displays the second text, the first electronic device sends a machine speech in the target language to the second electronic device, where the machine speech in the target language is obtained by translating the first speech into the target language. According to this method, a machine speech and a text that are in a target language can be synchronized, so as to avoid a problem that a user has completed reading a post-translation text but a machine speech has not been received yet.) Gu et al. display the first text and the second text in sequence. The first text is the transcription of the speech and the second text is obtained by translating the speech into a target language. However, Gu et al. do not teach determining an output order of the translation results based on the pronouncing time point information so that the output order of the translation results and a pronouncing order of the voices are the same, and sequentially outputting the translation results in accordance with the determined output order without overlapping the translation results with each other when pronouncing sections of the voices of different speakers overlap each other at least partly. Thus, Gu et al. fail to teach and/or suggest the allowable subject matter.
Conclusion
7. The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. See PTO-892.
a. Aue et al. (US 2015/0347399 A1.) In this reference, Aue et al. disclose a method and a system for generating, separately from the translation of the source user's speech, a further translation of the target user's speech in the source language to be transmitted to the source user.
b. Murthy et al. (US 2016/0350286 A1.) In this reference, Murthy et al. disclose a method and a system for translating different languages in the vehicle.
c. Ochiai et al. (US 2023/0067132 A1.) In this reference, Ochiai et al. disclose a method and a system for extracting a separated signal from a mixed speech signal by the beamformer.
8. Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429. The examiner can normally be reached Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C. Flanders can be reached on 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/THUYKHANH LE/Primary Examiner, Art Unit 2655