DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
1. Regarding the objections to claims 1 and 21, Applicant has amended each claim to address the minor informalities. Accordingly, the objections have been withdrawn.
2. Regarding the rejection of claims 1-10 and 21-30 under 35 U.S.C. § 101, Applicant's arguments filed 02/05/2026 have been fully considered but they are not persuasive.
Applicant argues on pg. 10 for reconsideration of the claims under 35 U.S.C. § 101 in view of the amendments to claims 1 and 21. After reconsideration of the claims, the Examiner maintains the rejection of claims 1-10 and 21-30 under § 101 as still being directed to an abstract idea without significantly more. Under Step 2A Prong 1, the claims still recite limitations that can be performed as mental processes, which fall under the abstract idea category. Specifically, a person is capable of performing the following in the mind with the aid of pen and paper: listening to speech in a first language, writing down a non-abridged translation, determining a degree of summarization based on this translation, generating a new abridged translation, and displaying the translation to the user. Under the Step 2A Prong 2 and Step 2B analyses, the additional limitations, when viewed in combination with the claim as a whole, neither integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception. The amended claims add a text summarization machine learning model which is merely “used” to generate the abridged translation. As recited, this model amounts to merely carrying out a process which can be performed mentally with pen and paper (e.g. text summarization) using a model recited as a generic computer component. Thus, the claim is eligible under neither Step 2A Prong 2 nor Step 2B.
Hence, Applicant’s arguments are not persuasive.
3. Regarding the rejections under 35 U.S.C. § 102 and 35 U.S.C. § 103, Applicant's arguments filed 02/05/2026 have been fully considered but they are not persuasive.
Applicant argues on pg. 11 of the Remarks that Sonoo fails to disclose “determining a degree of summarization based on one or more properties of the audio data and/or based on one or more properties of the non-abridged translated speech data”. The Examiner respectfully disagrees with this argument. Under the BRI of the claim, Sonoo discloses this step. Specifically, Sonoo discloses a step of determining a fluency metric of a particular input and non-abridged translated sentence, which is calculated based on a length/number of words in the input and translated sentences, with more concise/less abridged translations assigned higher fluency scores (Figs. 6-7, paras. 0043 and 0047). This step reads on the BRI of “determining a degree of summarization based on one or more properties of the audio data and/or based on one or more properties of the non-abridged translated speech data”. Hence, Applicant’s arguments regarding the above are not persuasive.
Regarding the argument that Sonoo does not specifically disclose “generating, based on processing both the degree of summarization and the non-abridged translated speech data using a text summarization machine learning model, abridged translated speech data…”, Applicant’s arguments have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in this argument.
Claim Objections
4. Claims 7 and 27 are objected to because of the following informalities:
Claim 7: “determining a degree of summarization” should instead be “determining the degree of summarization”, as antecedent basis is provided for this term in claim 1.
Claim 27: “determine a degree of summarization” should instead be “determine the degree of summarization”, as antecedent basis is provided for this term in claim 21.
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
5. Claims 1-10 and 21-32 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1, “A method” is recited, which is directed to one of the four statutory categories of invention (process) (Step 1: YES). However, the claim limitations, under their broadest reasonable interpretation, recite mental processes which fall into the category of abstract idea (Step 2A Prong 1: YES).
The following limitations, under their broadest reasonable interpretation, recite mental processes:
determining that a person is speaking to a user in a first language that is different from a second language of the user: a person listens to another person speaking in a first language different from a user language
generating, based on audio data characterizing first language speech from the person speaking to the user, non-abridged translated speech data that characterizes a non-abridged second language translation of the first language speech: a person writes down a non-abridged translation of what they hear using pen and paper
determining a degree of summarization based on one or more properties of the audio data and/or based on one or more properties of the non-abridged translated speech data: a person decides by how much they should summarize a text based on the audio or non-abridged translation (e.g. see that a translation is very long, so decide they need to summarize to ten words or less)
generating, based on processing both the degree of summarization and the non-abridged translated speech data…, abridged translated speech data that characterizes an abridged version of the second language translation: a person writes down abridged speech data using the non-abridged translation and the degree of summarization (e.g. writes a new translation in ten words or less)
causing, based on the non-abridged translated speech data, … visually render the non-abridged second language translation: a person writes down the non-abridged translation on paper and presents the translation to the user
causing, based on the abridged translated speech data, second language audio to be rendered … wherein the second language audio includes synthesized speech of the abridged version of the second language translation: a person speaks the abridged translation to the user.
Claim 1 does not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). The only additional limitations are “A method implemented by one or more processors”, “a display interface of a computing device”, “an audio interface of the computing device or an additional computing device”, and “using a text summarization machine learning model”. These limitations are recited at a high level of generality and amount to mere instructions to implement the judicial exception using a generic computer. Even when viewed in combination, mere instructions to implement the judicial exception using a generic computer do not integrate the judicial exception as they do not impose any meaningful limits on practicing the abstract idea. Accordingly, the claim is directed to an abstract idea (Step 2A: YES).
Claim 1 does not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above, the only additional limitations amount to mere instructions to implement the judicial exception using a generic computer. Even when viewed in combination, mere instructions to implement the judicial exception using a generic computer do not amount to significantly more than the judicial exception as they do not provide an inventive concept. Therefore, claim 1 is not patent eligible.
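The order of the inquiries applied to claim 1 above can be sketched as a simple decision procedure. This is only an illustrative sketch: the `ClaimFindings` fields and the `eligibility` function are hypothetical names, and the boolean findings are inputs supplied by the analysis itself, not something the code determines.

```python
from dataclasses import dataclass


@dataclass
class ClaimFindings:
    """Hypothetical container for the finding made at each step."""
    statutory_category: bool     # Step 1: process, machine, manufacture, or composition of matter
    recites_exception: bool      # Step 2A Prong 1: claim recites a judicial exception
    practical_application: bool  # Step 2A Prong 2: exception integrated into a practical application
    significantly_more: bool     # Step 2B: additional elements amount to significantly more


def eligibility(f: ClaimFindings) -> str:
    """Walk the steps in order and report where the analysis ends."""
    if not f.statutory_category:
        return "ineligible (Step 1: NO)"
    if not f.recites_exception:
        return "eligible (Step 2A Prong 1: NO)"
    if f.practical_application:
        return "eligible (Step 2A Prong 2: YES)"
    if f.significantly_more:
        return "eligible (Step 2B: YES)"
    return "ineligible (Step 2B: NO)"


# Findings for claim 1 as stated in this action.
claim_1 = ClaimFindings(True, True, False, False)
print(eligibility(claim_1))  # ineligible (Step 2B: NO)
```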
Regarding dependent claims 2-10, “The method” is recited, which is directed to one of the four statutory categories of invention (process) (Step 1: YES). However, the claim limitations, under their broadest reasonable interpretation, recite further mental processes which fall into the category of abstract idea (Step 2A Prong 1: YES).
The following limitations, under their broadest reasonable interpretation, recite further mental processes:
Claim 2:
wherein causing … to render the second language audio is performed simultaneously to the display interface of the computing device rendering the non-abridged second language translation: a person speaks the abridged translation while providing the written non-abridged translation to the user at the same time.
Claim 2 contains the additional limitation “causing the audio interface to render”, which amounts to mere instructions to implement the judicial exception using a generic computer.
Claim 3:
wherein causing the … to render the non-abridged second language translation includes scrolling the non-abridged second language translation at the display interface at a rate that is based on a determined rate in which the person is speaking to the user: a person provides the written translation to the user at the same pace as the speech.
Claim 3 contains the additional limitation “causing the display interface to render”, which amounts to mere instructions to implement the judicial exception using a generic computer.
Claim 4:
determining, …, that the user is directing their gaze towards the display interface of the computing device, wherein … render the non-abridged second language translation is performed in response to determining that the user is directing their gaze towards …: a person provides the written non-abridged translation to the user in response to detecting a user’s gaze.
Claim 4 contains the additional limitations “based on image data captured by a camera of the computing device or the additional computing device”, “the display interface of the computing device”, “causing the display interface to render”, which amount to mere instructions to implement the judicial exception using a generic computer.
Claim 5:
performing a disfluency removal process on the non-abridged translated speech data for identifying and removing disfluencies in the second language translation, wherein the abridged translated speech data is generated based on a version of the second language translation with the disfluencies removed: a person creates abridged translated speech by listening to the speaker and removing words they recognize as disfluencies (e.g. ‘um’, ‘uh’).
Claim 5 contains no additional limitations.
Claim 6:
determining a target length for the abridged version of the translation based on a detected rate with which the person is speaking in the first language, wherein a duration of rendering the second language audio is based on the target length: a person decides how long or short to make the abridged translation based on how fast the speaker is speaking.
Claim 6 contains no additional limitations.
Claim 7:
determining a degree of summarization for the abridged version of the translation based on a detected rate with which the person is speaking in the first language, wherein natural language content embodied in the second language audio is based on the degree of summarization: a person decides a degree of summarization based on how fast a speaker is speaking, and decides what words to include in the second language audio based on the degree of summarization.
Claim 7 contains no additional limitations.
Claim 8:
comparing the detected rate in which the person is speaking in the first language to one or more threshold values in furtherance of determining the degree of summarization for the abridged version, wherein the degree of summarization is greater for a higher detected rate of speaking relative to a lower detected rate of speaking: a person compares how fast a speaker is speaking to a threshold speed, and determines a degree of summarization.
Claim 8 contains no additional limitations.
Claim 9:
wherein the degree of summarization is based on an estimated total number of phonemes, characters, and/or words in the abridged version of the second translation relative to the non-abridged second translation: a person uses a number of phonemes, characters, or words for the non-abridged and abridged translations to determine the degree of summarization.
Claim 9 contains no additional limitations.
Claim 10:
processing the non-abridged translated speech … to generate abridged sentence text from unabridged sentence text characterized by the non-abridged translated speech data, wherein the abridged sentence data characterizes a summarization of the unabridged sentence text: a person summarizes an unabridged sentence to generate an abridged sentence.
Claim 10 contains the additional limitation “using one or more large language models (LLMs)”, which amounts to mere instructions to implement the judicial exception using a generic computer.
Claims 2-10 do not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). The only additional limitations are mere instructions to implement the judicial exception using a generic computer. Even when viewed in combination, mere instructions to implement the judicial exception using a generic computer do not integrate the judicial exception as they do not impose any meaningful limits on practicing the abstract idea. Accordingly, the claims are directed to an abstract idea (Step 2A: YES).
Claims 2-10 do not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above, the only additional limitations amount to mere instructions to implement the judicial exception using a generic computer. Even when viewed in combination, mere instructions to implement the judicial exception using a generic computer do not amount to significantly more than the judicial exception as they do not provide an inventive concept. Therefore, claims 2-10 are not patent eligible.
Regarding claim 21, “A system” is recited, which is directed to one of the four statutory categories of invention (machine) (Step 1: YES). However, the claim, under its broadest reasonable interpretation, recites limitations similar to those in method claim 1, as discussed above, and thus also recites mental processes which fall into the category of abstract idea (Step 2A Prong 1: YES) (see above analysis for claim 1).
Claim 21 does not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). The only additional limitations are “A system comprising: one or more processors; and memory storing instructions that, when executed by the at least one or more processors, cause the at least one processor or more processors to:”, “a display interface of a computing device”, “an audio interface of the computing device or an additional computing device”, and “using a text summarization machine learning model”. These limitations are recited at a high level of generality and amount to mere instructions to implement the judicial exception using a generic computer. Even when viewed in combination, mere instructions to implement the judicial exception using a generic computer do not integrate the judicial exception as they do not impose any meaningful limits on practicing the abstract idea. Accordingly, the claim is directed to an abstract idea (Step 2A: YES).
Claim 21 does not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above, the only additional limitations amount to mere instructions to implement the judicial exception using a generic computer. Even when viewed in combination, mere instructions to implement the judicial exception using a generic computer do not amount to significantly more than the judicial exception as they do not provide an inventive concept. Therefore, claim 21 is not patent eligible.
Regarding dependent claims 22-32, “The system” is recited, which is directed to one of the four statutory categories of invention (machine) (Step 1: YES). However, claims 22-30, under their broadest reasonable interpretation, recite limitations similar to those in claims 2-10 respectively, and thus also recite further mental processes which fall into the category of abstract idea (Step 2A Prong 1: YES) (see analysis for claims 2-10). Further, claims 31 and 32, under their broadest reasonable interpretation, recite limitations which contain further mental processes and/or generic computer components:
Claim 31:
Claim 31 recites “wherein the text summarization machine learning model is a large language model (LLM)”, which amounts to mere instructions to implement the judicial exception using a generic computer.
Claim 32:
processing…text that is descriptive of the degree of summarization and the translated speech: a person generates a new translation by reading text instructions (e.g. reads text “translate {sentence} into a new sentence with ten words or less”)
Claim 32 contains the additional limitation “using the text summarization machine learning model”, which amounts to mere instructions to implement the judicial exception using a generic computer.
Claims 22-32 do not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). The only additional limitations are mere instructions to implement the judicial exception using a generic computer. Even when viewed in combination, mere instructions to implement the judicial exception using a generic computer do not integrate the judicial exception as they do not impose any meaningful limits on practicing the abstract idea. Accordingly, the claims are directed to an abstract idea (Step 2A: YES).
Claims 22-32 do not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above, the only additional limitations amount to mere instructions to implement the judicial exception using a generic computer. Even when viewed in combination, mere instructions to implement the judicial exception using a generic computer do not amount to significantly more than the judicial exception as they do not provide an inventive concept. Therefore, claims 22-32 are not patent eligible.
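The example text given above for claim 32 can be assembled from a sentence and a word limit as follows. This is a minimal sketch: `build_prompt` and its parameters are hypothetical, the sample sentence is illustrative, and the wording of the template is taken only from the Examiner's example, not from the claims or the specification.

```python
def build_prompt(sentence: str, max_words: int) -> str:
    """Combine the translated sentence with a textual degree of summarization."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    return f'translate "{sentence}" into a new sentence with {max_words} words or less'


prompt = build_prompt("We gathered in order to discuss a new project.", 10)
print(prompt)
```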
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
6. Claims 1-2, 5-9, 21-22, and 25-29 are rejected under 35 U.S.C. 103 as being unpatentable over Sonoo & Sumita (US 2017/0091177 A1, hereinafter Sonoo) in view of Marcu (US 2026/0065898 A1).
Regarding claim 1, Sonoo discloses A method (Fig. 11) implemented by one or more processors (para. 0068 “The functions of the translator 101, the controller 102, the evaluator 103, the speech synthesizer 105, the speech recognizer 1001, the controller 1002, the condition designator 1201 and the controller 1202 in the above embodiments may be implemented by a processor coupled with a memory. For example, the memory may stores instructions for executing the functions and the processor may read the instructions from the memory and execute the instructions.”), the method comprising: determining that a person is speaking to a user in a first language that is different from a second language of the user (para. 0022 “Certain embodiments described herein are described with respect to a translation example in which a first language corresponding to an original language is set to Japanese and a second language corresponding to a target language is set to English.”; para. 0050 “First, the speech recognizer 1001 receives the input speech and generates the input text that is a recognition result of the input speech and the time information”); generating, based on audio data characterizing first language speech from the person speaking to the user (para. 0051 “Next, the translation generator 106 in the translator 101 (refer FIG. 1 for details) receives the input text and generates the translation result (step S1102)”), non-abridged translated speech data that characterizes a non-abridged second language translation of the first language speech (Fig. 8, 804; para. 0044 “The controller 102 selects the translated sentence 802 that has the highest evaluation value for adequacy among a plurality of translated sentences…”); determining a degree of summarization based on one or more properties of the audio data and/or based on one or more properties of the non-abridged translated speech data (para. 0047 “The machine translation apparatus 100 further includes a speech recognizer 1001 that receives input speech and outputs input text as recognition result and time information (for example, start time and end time of speech) of the input speech.”; para. 0043 “As features 702 for model training, it can utilize a number of characters of input sentence and translated sentence, a number of words of input sentence and translated sentence, a part of speech information of input sentence and translated sentence, phrasing information of input sentence and translated sentence, N-gram information of input sentence and translated sentence, a reproduction time of synthesized speech and intonation information of speech-synthesized translated sentence and so on. By referring the evaluation model 701, the evaluator 103 calculates evaluation values for any translation result. The example in FIG. 7 indicates that evaluation values of adequacy 5 and fluency 3 are calculated…”)…; generating…abridged translated speech data that characterizes an abridged version of the second language translation (Fig. 8, 805; para. 0044 “And, the controller 102 selects the translated sentence 803 that has the highest evaluation value for fluency other than the translated sentence 802, and outputs it in a form of synthesized speech 805 via the speech synthesizer with synchronization.”); causing, based on the non-abridged translated speech data, a display interface of a computing device to visually render the non-abridged second language translation (Fig. 8, non-abridged translation 802 on display 104 as 804; para. 0044 “The controller 102 selects the translated sentence 802 that has the highest evaluation value for adequacy among a plurality of translated sentences, and displays it in a display area 804 via the display 104.”); and causing, based on the abridged translated speech data, second language audio to be rendered via an audio interface of the computing device or an additional computing device, wherein the second language audio includes synthesized speech of the abridged version of the second language translation (Fig. 8, synthesized speech 805 generated for abridged translation 803; para. 0044 “And, the controller 102 selects the translated sentence 803 that has the highest evaluation value for fluency other than the translated sentence 802, and outputs it in a form of synthesized speech 805 via the speech synthesizer with synchronization.”).
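The selection described in the cited para. 0044 of Sonoo (display the candidate with the highest adequacy, then synthesize the highest-fluency candidate among the rest) can be sketched as follows. This is an illustrative sketch, not Sonoo's implementation: the `Candidate` type, the `select_outputs` function, the sample sentences, and the scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    text: str
    adequacy: int  # evaluation value for adequacy
    fluency: int   # evaluation value for fluency


def select_outputs(candidates: List[Candidate]) -> Tuple[Candidate, Candidate]:
    """Return (candidate to display, candidate to synthesize as speech)."""
    if len(candidates) < 2:
        raise ValueError("need at least two translated sentences")
    displayed = max(candidates, key=lambda c: c.adequacy)
    # The spoken sentence is chosen from the candidates other than the displayed one.
    remaining = [c for c in candidates if c is not displayed]
    spoken = max(remaining, key=lambda c: c.fluency)
    return displayed, spoken


# Hypothetical candidates and scores, for illustration only.
candidates = [
    Candidate("We gathered in order to discuss a new project.", adequacy=5, fluency=3),
    Candidate("We will discuss the new project.", adequacy=4, fluency=5),
]
displayed, spoken = select_outputs(candidates)
print("display:", displayed.text)
print("speech: ", spoken.text)
```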
Sonoo teaches generating abridged translated speech data and non-abridged translated speech data (see above claim mapping). However, Sonoo does not specifically disclose [generating], based on processing both the degree of summarization and the non-abridged [translated] speech data using a text summarization machine learning model, [abridged [translated] speech data…].
Marcu teaches generating, based on processing both the degree of summarization and the non-abridged … speech data using a text summarization machine learning model, abridged … speech data… (Marcu teaches a Natural Language Condenser model (155, 520, 530, 540, 560) which generates a condensed ASR data 510 from initial uncondensed ASR data 610 by processing both the non-abridged speech data 610 (fed to encoder 520) and a degree of summarization (verbosity output of 560), see Fig. 5; para. 0042 “The natural language condenser component 155 may receive one or more transcriptions of speech from the ASR component 150 and condense the text. The natural language condenser component 155 may include one or more neural network models configured to rewrite received input text to generate condensed text. The natural language condenser component 155 may, for example, remove redundancies, self-corrections, non-verbal speech hesitations such as “ah” and “um,” and/or rewrite sentences to reduce verbosity while retaining semantic meaning... In some implementations, the natural language condenser component 155 may be incorporated into the NMT component 170 such that the NMT component 170 can output transcriptions in the target language that are length-adjusted to remove unhelpful words/syllables and/or to match a length of the source text and/or speech.”; para. 0067 “At inference time, a verbosity control component 560 may prepend a verbosity value to the source text; for example, based on the context data 515 and/or the sentiment data 1555/1575. The natural language condenser component 155 may thus favor translations that match the verbosity value (e.g., rank them higher than possible translations that may be shorter/longer but have a similar score with regard to semantic meaning). In some implementations, the verbosity value may be provided to the encoder 520, the decoder 540, or both the encoder 520 and decoder 540.”).
Sonoo and Marcu are considered to be analogous to the claimed invention as both are in the same field of machine translation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sonoo to incorporate the teachings of Marcu in order to specifically generate, based on processing both the degree of summarization and the non-abridged translated speech data using a text summarization machine learning model, abridged translated speech data. Doing so would be beneficial, as this would allow for translated speech which is shortened while preserving meaning, reducing latency and allowing more “open air” time where the device is not actively outputting speech (Marcu, para. 0042).
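The cited para. 0067 of Marcu describes prepending a verbosity value to the source text before it reaches the model. A minimal sketch of that input preparation is given below, assuming a hypothetical tag format (`<verbosity=N>`) and a hypothetical helper `prepend_verbosity`; the cited passage does not give the actual format, and no model is invoked here.

```python
def prepend_verbosity(source_text: str, verbosity: int) -> str:
    """Prefix the source text with a verbosity control tag (hypothetical format)."""
    if verbosity < 0:
        raise ValueError("verbosity must be non-negative")
    return f"<verbosity={verbosity}> {source_text.strip()}"


model_input = prepend_verbosity("  um, we will, ah, discuss the new project  ", 2)
print(model_input)  # <verbosity=2> um, we will, ah, discuss the new project
```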
Regarding claim 2, Sonoo in view of Marcu discloses wherein causing the audio interface to render the second language audio is performed simultaneously to the display interface of the computing device rendering the non-abridged second language translation (Fig. 8; para. 0044 “The controller 102 selects the translated sentence 802 that has the highest evaluation value for adequacy among a plurality of translated sentences, and displays it in a display area 804 via the display 104. And, the controller 102 selects the translated sentence 803 that has the highest evaluation value for fluency other than the translated sentence 802, and outputs it in a form of synthesized speech 805 via the speech synthesizer with synchronization. In this way, for the input text 801, it can output a translation result that is more fluent and easy to listen to as speech information and a translation result that is more accurate as character information.”).
Regarding claim 5, Sonoo in view of Marcu discloses wherein generating the abridged translated speech data includes: performing a disfluency removal process on the non-abridged translated speech data for identifying and removing disfluencies in the second language translation (para. 0052 “Next, the translation editor 107 detects the post editing model 108. If the post editing model 108 is available (Yes in steps S1104), the translation editor 107 generates a new translation result by applying post-editing to the translation result generated by the translation generator 106, and backs to step S1103 (step S1105).”), wherein the abridged translated speech data is generated based on a version of the second language translation with the disfluencies removed (para. 0042 “For the translated sentence 502, the translation editor 107 applies the post editing model 108 and obtains a translated sentence 503 [We will discuss the new project.] that is a result of post editing by replacing a phrase (partial character string) corresponding to [gathered in order to] with another character [will] and by replacing [a] with [the].”; para. 0044 “And, the controller 102 selects the translated sentence 803 that has the highest evaluation value for fluency other than the translated sentence 802, and outputs it in a form of synthesized speech 805…”).
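For illustration only, identifying and removing disfluencies such as “um” and “uh” can be sketched with a rule-based filter. This is a stand-in under stated assumptions, not the technique of either reference: Sonoo applies a post editing model and Marcu describes neural network models, whereas the filler list and the `remove_disfluencies` function below are hypothetical.

```python
import re

# Hypothetical filler inventory; a real system would need a language-specific list.
FILLERS = ("um", "uh", "ah")

# Match a standalone filler word together with any commas that set it off.
_FILLER_RE = re.compile(r",?\s*\b(?:" + "|".join(FILLERS) + r")\b,?", re.IGNORECASE)


def remove_disfluencies(text: str) -> str:
    """Delete standalone filler words and normalize the remaining whitespace."""
    cleaned = _FILLER_RE.sub("", text)
    return " ".join(cleaned.split())


print(remove_disfluencies("Um, we will, uh, discuss the new project."))
# we will discuss the new project.
```

Words that merely contain a filler (for example “umbrella”) are left untouched because the pattern requires word boundaries on both sides.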
Regarding claim 6, Sonoo in view of Marcu discloses wherein generating the abridged speech data includes: determining a target length for the abridged version of the translation based on a detected rate with which the person is speaking in the first language (para. 0047 “The machine translation apparatus 100 further includes a speech recognizer 1001 that receives input speech and outputs input text as recognition result and time information (for example, start time and end time of speech) of the input speech.”; para. 0048 “Moreover, the controller 1002 outputs translation results to the display 104 and the speech synthesizer 105 based on evaluation values and the time information.”), wherein a duration of rendering the second language audio is based on the target length (para. 0054 “Next, the controller 1002 calculates a time difference (time interval) from the last input speech by using the time information. If the time difference is equal to or more than a threshold (Yes in step S1107), it performs a judgment based on a second condition for speech synthesis and outputs one of the translation results that satisfy the second condition to the speech synthesizer 105. The speech synthesizer 105 synthesizes speech of the translation result (step S1109). For example, the second condition for speech synthesis is such as whether evaluation value for fluency is the maximum.”).
Regarding claim 7, Sonoo in view of Marcu discloses wherein generating the abridged speech data includes: determining a degree of summarization for the abridged version of the translation based on a detected rate with which the person is speaking in the first language (para. 0047 “The machine translation apparatus 100 further includes a speech recognizer 1001 that receives input speech and outputs input text as recognition result and time information (for example, start time and end time of speech) of the input speech.”; para. 0043 “As features 702 for model training, it can utilize a number of characters of input sentence and translated sentence, a number of words of input sentence and translated sentence, a part of speech information of input sentence and translated sentence, phrasing information of input sentence and translated sentence, N-gram information of input sentence and translated sentence, a reproduction time of synthesized speech and intonation information of speech-synthesized translated sentence and so on. By referring the evaluation model 701, the evaluator 103 calculates evaluation values for any translation result. The example in FIG. 7 indicates that evaluation values of adequacy 5 and fluency 3 are calculated…”), wherein natural language content embodied in the second language audio is based on the degree of summarization (para. 0044 “And, the controller 102 selects the translated sentence 803 that has the highest evaluation value for fluency other than the translated sentence 802, and outputs it in a form of synthesized speech 805 via the speech synthesizer with synchronization. In this way, for the input text 801, it can output a translation result that is more fluent and easy to listen to as speech information…”).
Regarding claim 8, Sonoo in view of Marcu discloses wherein determining the degree of summarization for the abridged version of the translation includes: comparing the detected rate in which the person is speaking in the first language to one or more threshold values in furtherance of determining the degree of summarization for the abridged version (para. 0054 “Next, the controller 1002 calculates a time difference (time interval) from the last input speech by using the time information. If the time difference is equal to or more than a threshold (Yes in step S1107), it performs a judgment based on a second condition for speech synthesis and outputs one of the translation results that satisfy the second condition to the speech synthesizer 105. The speech synthesizer 105 synthesizes speech of the translation result (step S1109). For example, the second condition for speech synthesis is such as whether evaluation value for fluency is the maximum.”), wherein the degree of summarization is greater for a higher detected rate of speaking relative to a lower detected rate of speaking (para. 0006 “Moreover, there is a technique that detects a time difference between an utterance time of a speaker and a reproduction time of synthesized speech of translation result text, and performs retranslation by replacing translation of different words having the same meaning, and reduces the time difference by outputting translation result that is appropriate for speech synthesis.”).
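For illustration only, the threshold comparison recited in claim 8 can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Sonoo, Marcu, or the application: the threshold values, the words-per-minute unit, and the name `degree_of_summarization` are all hypothetical. It shows only the recited relationship that a higher detected rate of speaking maps to a greater degree of summarization.

```python
from bisect import bisect_right

# Hypothetical thresholds in words per minute (assumption, not from any cited reference).
RATE_THRESHOLDS_WPM = (120.0, 160.0, 200.0)


def degree_of_summarization(rate_wpm: float) -> int:
    """Map a detected speaking rate to a degree of summarization.

    Returns 0 (no abridgement) when the rate is below every threshold, and
    increases by one for each threshold that the rate meets or exceeds.
    """
    if rate_wpm < 0:
        raise ValueError("speaking rate must be non-negative")
    # bisect_right counts the thresholds that are <= rate_wpm.
    return bisect_right(RATE_THRESHOLDS_WPM, rate_wpm)
```

Under these assumed thresholds, a rate of 100 words per minute yields degree 0 and a rate of 250 words per minute yields degree 3, so the mapping never decreases as the detected rate increases.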
Regarding claim 9, Sonoo in view of Marcu discloses wherein the degree of summarization is based on an estimated total number of phonemes, characters, and/or words in the abridged version of the second translation relative to the non-abridged second translation (para. 0043 “FIG. 7 illustrates one example for calculating evaluation values for a translation result. First, it constructs an evaluation model 701 that inputs input sentences and translated sentences from the evaluation data 600 and outputs evaluation values. For model training, for example, it can utilize widely known machine learning techniques such as Multi-class Support Vector Machine (Multi-class SVM). As features 702 for model training, it can utilize a number of characters of input sentence and translated sentence, a number of words of input sentence and translated sentence, a part of speech information of input sentence and translated sentence, phrasing information of input sentence and translated sentence, N-gram information of input sentence and translated sentence, a reproduction time of synthesized speech and intonation information of speech-synthesized translated sentence and so on. By referring the evaluation model 701, the evaluator 103 calculates evaluation values for any translation result.”).
Regarding claim 21, claim 21 is a system claim with limitations similar to method claim 1, and is thus rejected under similar rationale.
Additionally, Sonoo discloses A system (Fig. 10, 100; para. 0047 “FIG. 10 illustrates a functional block diagram of a machine translation apparatus 100…”) comprising: one or more processors (para. 0068 “The functions of the translator 101, the controller 102, the evaluator 103, the speech synthesizer 105, the speech recognizer 1001, the controller 1002, the condition designator 1201 and the controller 1202 in the above embodiments may be implemented by a processor coupled with a memory.”); and memory storing instructions that, when executed by the one or more processors, cause the one or more processors to (para. 0068 “For example, the memory may stores instructions for executing the functions and the processor may read the instructions from the memory and execute the instructions.”).
Regarding claim 22, claim 22 is rejected for analogous reasons to claim 2.
Regarding claim 25, claim 25 is rejected for analogous reasons to claim 5.
Regarding claim 26, claim 26 is rejected for analogous reasons to claim 6.
Regarding claim 27, claim 27 is rejected for analogous reasons to claim 7.
Regarding claim 28, claim 28 is rejected for analogous reasons to claim 8.
Regarding claim 29, claim 29 is rejected for analogous reasons to claim 9.
7. Claims 3 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Sonoo in view of Marcu, and further in view of Sadkin et al. (US 2016/0062970 A1, hereinafter Sadkin).
Regarding claim 3, Sonoo in view of Marcu does not specifically disclose wherein causing the display interface to render the non-abridged second language translation includes scrolling the non-abridged second language translation at the display interface at a rate that is based on a determined rate in which the person is speaking.
Sadkin teaches wherein causing the display interface to render the non-abridged second language translation includes scrolling the non-abridged second language translation at the display interface at a rate that is based on a determined rate in which the person is speaking (para. 0094 “The system then determines whether the central portion of the text artifact displayed on the screen of the computerized device is behind the position of the speaker (or the location of the hypothesis match in the text artifact) 602. If the central displayed section is behind the position of the speaker then the system accelerates the scrolling 604. The system then determines whether the central displayed section is far behind the speaker's position 606. If the central displayed section is far behind the position of the speaker then the system accelerates the rate progressively and advances the text artifact 608. If the central displayed section is not far behind the position of the speaker then the system accelerates the scrolling incrementally and advances the text artifact 610. If the central displayed section is not behind the position of the speaker, but is instead ahead of the speaker's position then the system decelerates the scrolling 612. The system then determines whether the central displayed section is far ahead of the speaker's position 614. If the system determines that the central displayed section is not far ahead of the speaker's position then the system decelerates the scrolling incrementally and advances the text artifact 616. If the system determines that the central displayed section is far ahead of the speaker's position then the system decelerates progressively and stops the scrolling of the text artifact 618.”).
Sonoo, Marcu, and Sadkin are considered to be analogous to the claimed invention as they are all in the same field of natural language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sonoo in view of Marcu to incorporate the teachings of Sadkin such that causing the display interface to render the non-abridged second language translation includes scrolling the non-abridged second language translation at the display interface at a rate that is based on a determined rate in which the person is speaking. Doing so would be beneficial, as this would allow the scrolling of the non-abridged second language translation to dynamically match the pace of the speaker (Sadkin, para. 0093).
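For illustration only, the scroll-rate adjustment described in the quoted passage of Sadkin (para. 0094) can be sketched as below. This is a simplified sketch and not Sadkin's implementation: the function name, the position and rate units, and the `far`, `step`, and `factor` parameters are hypothetical, and the "decelerates progressively and stops" branch is collapsed into an immediate stop.

```python
def adjust_scroll_rate(
    rate: float,
    display_pos: float,
    speaker_pos: float,
    far: float = 5.0,     # hypothetical distance (in lines) treated as "far"
    step: float = 0.1,    # hypothetical incremental rate change
    factor: float = 1.5,  # hypothetical progressive acceleration factor
) -> float:
    """Return a new scroll rate given where the display is relative to the speaker."""
    offset = display_pos - speaker_pos  # negative: display is behind the speaker
    if offset < 0:
        if -offset > far:
            # Far behind the speaker: accelerate progressively.
            return (rate + step) * factor
        # Slightly behind the speaker: accelerate incrementally.
        return rate + step
    if offset > 0:
        if offset > far:
            # Far ahead of the speaker: simplified here to stopping the scroll.
            return 0.0
        # Slightly ahead of the speaker: decelerate incrementally.
        return max(0.0, rate - step)
    # Display is aligned with the speaker: keep the current rate.
    return rate
```

Calling this function once per display update with the latest speaker position would keep the displayed text tracking the speaker's pace, which is the benefit relied upon in the rationale above.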
Regarding claim 23, claim 23 is rejected for analogous reasons to claim 3.
8. Claims 4 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Sonoo in view of Marcu, and further in view of Ota et al. (US 2017/0277257 A1, hereinafter Ota).
Regarding claim 4, Sonoo in view of Marcu does not specifically disclose determining, based on image data captured by a camera of the computing device or the additional computing device, that the user is directing their gaze towards the display interface of the computing device, wherein causing the display interface to render the non-abridged second language translation is performed in response to determining that the user is directing their gaze towards the display interface of the computing device.
Ota teaches determining, based on image data captured by a camera of the computing device or the additional computing device (para. 0017 “FIG. 1 is an HMD 100, according to an embodiment. The HMD 100 includes a display surface 102, a camera array 104, and processing circuitry (not shown).”; para. 0019 “An inward-facing camera array (not shown) may be used to track eye movement and determine directionality of eye gaze. Gaze detection may be performed using a non-contact, optical method to determine eye motion.”), that the user is directing their gaze towards the display interface of the computing device (para. 0022 “The user's eye gaze direction 306 is determined by the HMD 302, such as with inward facing cameras or other mechanisms. Based on the eye gaze direction 306, a subset of directional microphones is activated. The HMD 302 may incorporate a number of directional microphones that substantially cover the range of the user's vision (e.g., approximately 180 degrees in front of the user 300). … Once the sound is received, additional processes may be used to translate speech, display speech (e.g., for translation or to assist hearing impaired people), amplify sound, or the like, of the sound source corresponding to the eye gaze direction 306 (e.g., party 304A).”), wherein causing the display interface to render the non-abridged second language translation is performed in response to determining that the user is directing their gaze towards the display interface of the computing device (para. 0023 “From the user's perspective and continuing the example illustrated in FIG. 3, the user is looking at the person to the user's right. In response, the HMD 302 displays speech-recognized text in a dialog box 400. In the example illustrated in FIG. 4, the dialog box 400 is positioned proximate to the person speaking. Proximate in this context refers to the position of the overlaid graphics that include the text, in the augmented reality presentation. 
The dialog box 400 is presented close to the real-world object (e.g., person), so that the user is given an intuitive user interface showing which person's speech is being provided. This is further assisted with the triangle portion 402 of the dialog box 400. It is understood that other presentation formats may be used to provide an intuitive interface, such as thought bubbles, a line, scrolling text, or the like.”).
Sonoo, Marcu, and Ota are considered to be analogous to the claimed invention as they are all in the same field of natural language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sonoo in view of Marcu to incorporate the teachings of Ota in order to determine, based on image data captured by a camera of the computing device or the additional computing device, that the user is directing their gaze towards the display interface of the computing device, wherein causing the display interface to render the non-abridged second language translation is performed in response to determining that the user is directing their gaze towards the display interface of the computing device. Doing so would be beneficial, as this would provide a translation to the user that is understandable (Ota, para. 0031 “Alternatively, the spoken version or text presentation may be a translation from the speaker's language to a language that the user understands.”) and intuitive (Ota, para. 0023 “The dialog box 400 is presented close to the real-world object (e.g., person), so that the user is given an intuitive user interface showing which person's speech is being provided. This is further assisted with the triangle portion 402 of the dialog box 400. It is understood that other presentation formats may be used to provide an intuitive interface, such as thought bubbles, a line, scrolling text, or the like.”).
Regarding claim 24, claim 24 is rejected for analogous reasons to claim 4.
9. Claims 10 and 30-31 are rejected under 35 U.S.C. 103 as being unpatentable over Sonoo in view of Marcu, and further in view of Zhu et al. (US 2025/0111133 A1, hereinafter Zhu).
Regarding claim 10, Sonoo in view of Marcu discloses processing the non-abridged translated speech data using one or more…language models (para. 0026 “The translation editor 107 receives the translation result from the translation generator 106 and generates a new translation result by post-editing a part of the machine translation result by utilizing the post editing model 108 that includes editing rule sets of the second language. Moreover, the translation editor 107 may utilize different kinds of post editing models, and generates one translation result with post editing for one post editing model. As for the post editing models and the post editing process, the translation editor 106 can apply statistical post editing that performs statistical translation by utilizing, for example, the original language as machine-translated sentence and the target language as reference translation.”) to generate abridged sentence text from unabridged sentence text characterized by the non-abridged translated speech data (para. 0042 “For the translated sentence 502, the translation editor 107 applies the post editing model 108 and obtains a translated sentence 503 [We will discuss the new project.] that is a result of post editing by replacing a phrase (partial character string) corresponding to [gathered in order to] with another character [will] and by replacing [a] with [the].”), wherein the abridged sentence data characterizes a summarization of the unabridged sentence text (para. 0044 “And, the controller 102 selects the translated sentence 803 that has the highest evaluation value for fluency other than the translated sentence 802, and outputs it in a form of synthesized speech 805 via the speech synthesizer with synchronization.”).
Sonoo in view of Marcu does not specifically disclose [processing the non-abridged translated speech data using one or more] large language models (LLMs) [to generate abridged sentence text…].
Zhu teaches [processing the non-abridged translated speech data using one or more] large language models (LLMs) [to generate abridged sentence text…] (para. 0038 “The summarizer model 330 can include a neural network (NN), such as a large language model (LM)…”; para. 0031 “The summarizer 222 receives the transcript 108 and provides a transcript summary 236 consistent with a user-provided parameter, such as a topic 226, speaker 228, readability 230, granularity 232, or a combination thereof.”; para. 0036 “The readability 230 specifies how much processing is performed in making the summary 236 read less like a transcript and more like a book. The raw transcript 108 is typically hard to consume because it is long, disjointed, not a fluent as written text, includes filler words (e.g., “ummmm”, “uhhhh”, “like”, “yeah”, “you know”, or the like that are present but do not add to the discourse), or a combination thereof, The readability 230 can be specified in a number of ways, The user 112 can select a level of readability 230 in which a higher (or lower if negative logic is used) level indicates a more fluent summary 236. A more fluent summary means filler words are removed and the transcript 108 is segmented by topic.”).
Sonoo, Marcu, and Zhu are considered to be analogous to the claimed invention as they are all in the same field of natural language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sonoo in view of Marcu to incorporate the teachings of Zhu in order to process the non-abridged translated speech data using one or more large language models (LLMs). Doing so would be beneficial, as this would provide a more fluent translation (Zhu, para. 0036 “The user 112 can select a level of readability 230 in which a higher (or lower if negative logic is used) level indicates a more fluent summary 236. A more fluent summary means filler words are removed and the transcript 108 is segmented by topic.”).
Regarding claim 30, claim 30 is rejected for analogous reasons to claim 10.
Regarding claim 31, Sonoo in view of Marcu does not specifically disclose wherein the text summarization machine learning model is a large language model (LLM).
Zhu teaches wherein the text summarization machine learning model is a large language model (LLM) (para. 0038 “The summarizer model 330 can include a neural network (NN), such as a large language model (LM)…”; para. 0031 “The summarizer 222 receives the transcript 108 and provides a transcript summary 236 consistent with a user-provided parameter, such as a topic 226, speaker 228, readability 230, granularity 232, or a combination thereof.”; para. 0036 “The readability 230 specifies how much processing is performed in making the summary 236 read less like a transcript and more like a book. The raw transcript 108 is typically hard to consume because it is long, disjointed, not a fluent as written text, includes filler words (e.g., “ummmm”, “uhhhh”, “like”, “yeah”, “you know”, or the like that are present but do not add to the discourse), or a combination thereof, The readability 230 can be specified in a number of ways, The user 112 can select a level of readability 230 in which a higher (or lower if negative logic is used) level indicates a more fluent summary 236. A more fluent summary means filler words are removed and the transcript 108 is segmented by topic.”).
Sonoo, Marcu, and Zhu are considered to be analogous to the claimed invention as they are all in the same field of natural language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sonoo in view of Marcu to incorporate the teachings of Zhu in order to specifically have the text summarization machine learning model be a large language model (LLM). Doing so would be beneficial for the same rationale given above for claim 10.
10. Claim 32 is rejected under 35 U.S.C. 103 as being unpatentable over Sonoo in view of Marcu, and further in view of NPL Document (GitHub page) “5 Levels of Summarization: Novice to Expert”, hereinafter NPL1.
Regarding claim 32, Sonoo in view of Marcu discloses translated speech data (see above mapping for claim 1). However, Sonoo in view of Marcu does not specifically disclose processing, using the text summarization machine learning model, text that is descriptive of the degree of summarization and the translated speech data.
NPL1 teaches processing, using the text summarization machine learning model, text that is descriptive of the degree of summarization and the …speech data (pg. 3, “In [17]” “template = “Please write a one sentence summary of the following text {essay}…””; prompt input to text summarization machine learning model (LLM, see pg. 1, 1st para.)).
Sonoo, Marcu, and NPL1 are considered to be analogous to the claimed invention as they are all in the same field of natural language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sonoo in view of Marcu to incorporate the teachings of NPL1 in order to specifically process, using the text summarization machine learning model, text that is descriptive of the degree of summarization and the speech data. Doing so would be beneficial, as this would provide instructions for a large language model to perform text summarization across several different lengths of input text, ranging from a couple of sentences to entire books (NPL1, pg. 1, introduction).
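For illustration only, the combination of "text that is descriptive of the degree of summarization" with the translated speech data can be sketched as a prompt template in the style of the NPL1 excerpt cited above. This is a minimal sketch under stated assumptions: the degree labels, the dictionary `DEGREE_DESCRIPTIONS`, and the function `build_summarization_prompt` are hypothetical and appear in neither NPL1 nor the application, and no model is invoked.

```python
# Hypothetical mapping from a degree of summarization to descriptive text
# (assumption; only the "one sentence summary" wording echoes the NPL1 excerpt).
DEGREE_DESCRIPTIONS = {
    1: "a one paragraph summary",
    2: "a one sentence summary",
}


def build_summarization_prompt(degree: int, translated_text: str) -> str:
    """Combine degree-descriptive text with translated text into one prompt string."""
    if degree not in DEGREE_DESCRIPTIONS:
        raise ValueError(f"unsupported degree of summarization: {degree}")
    description = DEGREE_DESCRIPTIONS[degree]
    return f"Please write {description} of the following text:\n\n{translated_text}"
```

The resulting string would then be supplied as input to whatever text summarization model is used, so that the same model input carries both the desired degree of summarization and the text to be abridged.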
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Yu et al. (US 2024/0161731 A1): translating speech, determining whether to summarize or omit speech input, determining degree of summarization (see Fig. 5)
Doggett (US 2020/0042601 A1): checking translation to determine if translation satisfies subtitle parameters, such as timing requirements, and in response to failing, using a text modifier to shorten translation (see Fig. 2 and para. 0024-0025)
Kamatani & Sakamoto (US 2016/0314116 A1): input machine translation result, calculate an abridgement time and word number, and generate an abridged sentence (Fig. 3)
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CODY DOUGLAS HUTCHESON whose telephone number is (703)756-1601. The examiner can normally be reached M-F 8:00AM-5:00PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CODY DOUGLAS HUTCHESON/Examiner, Art Unit 2659
/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659