Prosecution Insights
Last updated: April 19, 2026
Application No. 18/492,635

PROBABILISTIC MULTI-PARTY AUDIO TRANSLATION

Status: Final Rejection — §103
Filed: Oct 23, 2023
Examiner: MEIS, JON CHRISTOPHER
Art Unit: 2654
Tech Center: 2600 — Communications
Assignee: Mass Luminosity Inc.
OA Round: 2 (Final)

Grant Probability: 46% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 3y 0m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 46% — grants 46% of resolved cases (10 granted / 22 resolved; -16.5% vs TC avg)
Interview Lift: +59.0% — strong lift in allow rate for resolved cases with an interview vs. without
Avg Prosecution: 3y 0m (typical timeline); 30 applications currently pending
Total Applications: 52 across all art units (career history)

Statute-Specific Performance

§101: 24.9% (-15.1% vs TC avg)
§103: 49.7% (+9.7% vs TC avg)
§102: 12.9% (-27.1% vs TC avg)
§112: 10.6% (-29.4% vs TC avg)

Tech Center averages are estimates • Based on career data from 22 resolved cases

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claims 1-20 are pending. Claims 1, 11, and 20 are independent. This Application was published as US 20240220737. Apparent priority is 28 December 2022. The instant Application is directed to a method of streaming translation. Applicant's amendments and arguments are considered but are either unpersuasive or moot in view of the new grounds of rejection necessitated by the amendments to the Claims. This action is Final.

Response to Amendment

Applicant's amendments to the claims have overcome the objection to claim 20.

Response to Arguments

35 USC 101

Applicant's arguments have been fully considered and are persuasive. Particularly, Examiner agrees that rejecting translation variants based on precise timing thresholds could not be practically performed in the mind. Therefore, the rejection is withdrawn.

35 USC 103

Applicant's arguments with respect to 35 USC 103 have been considered but are not persuasive in regards to the Fantinuoli reference. Applicant argues that Fantinuoli discloses compression that occurs at the preprocessing stage, not based on similarity scoring or rejecting variants, and that Fantinuoli does not disclose a mechanism for evaluating multiple translation candidates, applying a similarity score, or rejecting a variant when a threshold is exceeded.

In response to applicant's argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., evaluating multiple translation candidates) are not all recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

Examiner argues that Kim is relied upon for the limitation of applying a similarity score, and therefore this limitation does not need to be taught by Fantinuoli. Examiner argues that Fantinuoli does teach rejecting a variant when a time threshold is exceeded. See: "For example, the processor may adjust the compression level upwards based on a word/time ratio of the speech segments being high enough to cause a latency above a predetermined threshold between the audible speech and a translation." Col 10; 7-11 — this reads on exceeding a time threshold. See further: "The sentence compressor model 150 may compress the speech segments based on the content of the speech by applying the sentence compressor machine learning model on the speech segments. Compressing the speech segments may include removing oralities such as "um," "like," and "uh." Compressing the speech segments may include shortening and/or simplifying the speech segments. For example, a long, complex word in a speech segment may be replaced with a shorter, less complex word." Col 6; 7-17 — replacing a longer word with a shorter word reads on rejecting a translation variant.

In regards to arguments that Fantinuoli discloses compression as a pre-processing stage, Fantinuoli discloses: "In other embodiments, the orchestrator model 140 may continuously update the word/time ratio of the speech transcript as the automatic speech recognition 120 generates, in real-time, the speech transcript." Col 6; 60-63. Therefore, the rejection is maintained.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-3, 5-6, 8-13, 15-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng et al. ("Opportunistic Decoding with Timely Correction for Simultaneous Translation") in view of Kim et al. (US 20110153309 A1) and Fantinuoli (US 11704507 B1).

[Image: Zheng, Fig. 2]

Regarding claim 1, Zheng discloses:

1. A method, comprising: receiving input text of a communication session; (Fig. 2 shows input text at the top of the figure. Section 1 mentions "international conferences" which reads on a communication session.)

generating translation data by processing the input text with a prediction model (Fig. 2 shows that two words are predicted at each step. See also "As shown in Fig. 1, our proposed method always decodes more words than the original policy at each step to catch up with the speaker and reduce the latency." bottom of pg. 1 to top of pg. 2. See also pg. 2, section "Correction with Beam Search." Beam search reads on a prediction model.)

and a translation model; (Fig. 2 shows the translated words in English. See pg. 4, section 5 - "Datasets and Implementations" for the translation model details.)

processing the translation data and enunciation data with a sentence similarity model to generate a similarity score; ("At step t + 1, when encoder obtains more information from x≤g(t) to x≤g(t+1), the decoder is capable to generate more appropriate candidates and may revise and replace the previous outputs from opportunistic decoding." pg. 2, Section 3 - "Timely Correction" — Zheng discloses processing the translation and enunciation data to see if a more appropriate candidate is available. Zheng does not explicitly disclose sentence similarity or a similarity score.)

and presenting the enunciation data based on the similarity score (Fig. 2 shows presentation of enunciation data. See also "When there is a disagreement, our model always uses the hypothesis from later step to replace the previous commits.")

and based on rejecting a translation variant for exceeding a time threshold. (not explicitly disclosed by Zheng)

Zheng does not explicitly disclose sentence similarity or a similarity score, or presenting the enunciation data based on the similarity score or a time threshold.

Kim discloses: processing the translation data and enunciation data with a sentence similarity model to generate a similarity score; ("[0018] The similarity calculating unit 120 considers the confidence score for each word processed by the voice recognizing unit 100 and compares the various elements extracted by the language processing unit 110 with various elements stored in the translated sentence DB 150 to calculate the similarity therebetween. ...")

and presenting the enunciation data based on the similarity score ("[0019] The similarity calculation result by Equation (1) is expressed in the form of probability. A threshold value is set and it is determined whether the calculated similarity is higher than the threshold value. If the calculated similarity is higher than the threshold value, class information of the second-language sentence corresponding to the first-language sentence selected from the translated sentence DB 150 is translated and the translated result is transferred to the voice synthesizing unit 140 without passing through the sentence translating unit 130. On the other hand, if the calculated similarity is lower than the threshold value, user selection is requested or the first-language sentence (i.e., the voice recognition result) is transferred to the sentence translating unit 130. ...")

Zheng and Kim are considered analogous art to the claimed invention because they disclose methods for translation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Zheng with the sentence similarity calculating unit of Kim in order to use the originally generated sentence if it meets a threshold. Doing so would have been beneficial so that the audience is not overwhelmed by modifications. (Zheng pg. 2, para 1.)

Kim does not explicitly disclose presenting data based on a time threshold. Fantinuoli discloses: presenting enunciation data based on rejecting a translation variant for exceeding a time threshold. ("For example, the processor may adjust the compression level upwards based on a word/time ratio of the speech segments being high enough to cause a latency above a predetermined threshold between the audible speech and a translation." Col 10; 7-11 — this reads on exceeding a time threshold. See further: "The sentence compressor model 150 may compress the speech segments based on the content of the speech by applying the sentence compressor machine learning model on the speech segments. Compressing the speech segments may include removing oralities such as "um," "like," and "uh." Compressing the speech segments may include shortening and/or simplifying the speech segments. For example, a long, complex word in a speech segment may be replaced with a shorter, less complex word." Col 6; 7-17 — replacing a longer word with a shorter word reads on rejecting a translation variant.)

Zheng, Kim, and Fantinuoli are considered analogous art to the claimed invention because they disclose methods for translation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Zheng in view of Kim with the time threshold and compression level disclosed by Fantinuoli. Doing so would have been beneficial because "Reducing latency allows conversation participants to engage in natural conversation without waiting for the translation to occur." (Fantinuoli Col 1; 62-64)

Regarding claim 2, Zheng and Kim do not disclose the additional limitations. Fantinuoli discloses:

2. The method of claim 1, further comprising: presenting the enunciation data using the time threshold identifying when the enunciation data for the input text is to be presented. ("For example, the processor may adjust the compression level upwards based on a word/time ratio of the speech segments being high enough to cause a latency above a predetermined threshold between the audible speech and a translation." Col 10; 7-11 — adjusting the compression level identifies when the data is presented.)

Zheng, Kim, and Fantinuoli are considered analogous art to the claimed invention because they disclose methods for translation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Zheng in view of Kim with the word/time threshold and compression level disclosed by Fantinuoli. Doing so would have been beneficial because "Reducing latency allows conversation participants to engage in natural conversation without waiting for the translation to occur." (Fantinuoli Col 1; 62-64)

Regarding claim 3, Zheng and Kim do not disclose the additional limitations. Fantinuoli discloses:

3. The method of claim 1, further comprising: determining a playback rate using the time threshold and a time value of the input text; and presenting the enunciation data using the playback rate. ("The speed of the text to speech model may be a speed at which the text to speech model generates the audible translated speech based on the translation. A faster speed may be perceived as a faster speaking pace and a slower speed may be perceived as a slower speaking pace. The processor may adjust the speed based on a latency of the audible translated speech relative to the audible speech. In some embodiments, the processor may adjust the speed based on the word/time ratio of the speech segments to reduce the latency of the audible translated speech relative to the audible speech." Col 10; 40-50 — see claim 2 regarding the threshold) See claim 2 for motivation statement.

Regarding claim 5, Zheng discloses:

5. The method of claim 1, further comprising: processing the input text with the translation model; and processing output from the translation model with the prediction model to generate the translation data. ("When the opportunistic decoding window is w at decoding step t, we define the beam search over w + 1 (include the original output) as follows: ... where next_b^(n+w)(·) performs a beam search with n + w steps, and generates y′_t as the outputs which include both original and opportunistic decoded words. n represents the length of y_t." pg. 3, Section 3 "Correction with Beam Search" — the beam search uses the output translation to predict next words.)

Regarding claim 6, Zheng does not explicitly disclose the additional limitations. Kim discloses:

6. The method of claim 1, further comprising: presenting the enunciation data as synthesized audio in an audio stream of a live media stream of the communication session. ("[0021] The voice synthesizing unit 140 receives the second-language sentence from the similarity calculating unit 120 or the second-language sentence from the sentence translating unit 130, synthesizes the prestored voice data mapping to the received second-language sentence, and outputs the synthesized voice data in the form of analog signals.")

Zheng and Kim are considered analogous art to the claimed invention because they disclose methods for translation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further modified the method of Zheng in view of Kim with the voice synthesizing unit taught by Kim. Doing so would have been beneficial so that the user could hear the translated audio.

Regarding claim 8, Zheng discloses:

8. The method of claim 1, further comprising: adjusting the enunciation data with the translation data when the similarity score satisfies a similarity threshold. ("At the same time, it also employs a timely correction mechanism to review the extra outputs from previous steps with more source context, and revises these outputs with current preference when there is a disagreement." pg. 2, para 1)

Zheng discloses that the enunciation data is always replaced with the translation if there is a disagreement. Zheng does not disclose adjusting it based on the similarity score satisfying a threshold. Kim discloses:

8. The method of claim 1, further comprising: adjusting the enunciation data with the translation data when the similarity score satisfies a similarity threshold. ("[0019] The similarity calculation result by Equation (1) is expressed in the form of probability. A threshold value is set and it is determined whether the calculated similarity is higher than the threshold value. If the calculated similarity is higher than the threshold value, class information of the second-language sentence corresponding to the first-language sentence selected from the translated sentence DB 150 is translated and the translated result is transferred to the voice synthesizing unit 140 without passing through the sentence translating unit 130. On the other hand, if the calculated similarity is lower than the threshold value, user selection is requested or the first-language sentence (i.e., the voice recognition result) is transferred to the sentence translating unit 130. ...") See claim 1 for motivation statement.

Regarding claim 9, Zheng discloses:

9. The method of claim 1, further comprising: presenting a correction from the enunciation data after adjustment of the enunciation data when the similarity score satisfies a similarity threshold. ("At the same time, it also employs a timely correction mechanism to review the extra outputs from previous steps with more source context, and revises these outputs with current preference when there is a disagreement." pg. 2, para 1)

Zheng does not disclose: the similarity score satisfies a threshold. Kim discloses:

9. The method of claim 1, further comprising: presenting a correction from the enunciation data after adjustment of the enunciation data when the similarity score satisfies a similarity threshold. ("[0019] The similarity calculation result by Equation (1) is expressed in the form of probability. A threshold value is set and it is determined whether the calculated similarity is higher than the threshold value. If the calculated similarity is higher than the threshold value, class information of the second-language sentence corresponding to the first-language sentence selected from the translated sentence DB 150 is translated and the translated result is transferred to the voice synthesizing unit 140 without passing through the sentence translating unit 130. On the other hand, if the calculated similarity is lower than the threshold value, user selection is requested or the first-language sentence (i.e., the voice recognition result) is transferred to the sentence translating unit 130. ...") See claim 1 for motivation statement.

Regarding claim 10, Zheng discloses:

10. The method of claim 1, further comprising: presenting a correction from the enunciation data after adjustment of the enunciation data and adjustment of a playback rate. ("At the same time, it also employs a timely correction mechanism to review the extra outputs from previous steps with more source context, and revises these outputs with current preference when there is a disagreement." pg. 2, para 1 — Zheng discloses a simultaneous translation system; therefore any correction would continue to be presented after the playback rate has been adjusted as detailed in claim 3.)

Claim 11 is a system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale. Additionally, "at least one processor; and an application" of the Claim are taught by Zheng ("Transformer based wait-k model" pg. 4, section 5 implies use of a processor).

Claim 12 is a system claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.

Claim 13 is a system claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.

Claim 15 is a system claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.

Claim 16 is a system claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.

Claim 18 is a system claim with limitations corresponding to the limitations of Claim 8 and is rejected under similar rationale.

Claim 19 is a system claim with limitations corresponding to the limitations of Claim 9 and is rejected under similar rationale.

Claim 20 is a computer readable medium claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale. Additionally, "A non-transitory computer readable medium comprising instructions executable by a computer processor" of the Claim is taught by Zheng ("Transformer based wait-k model" pg. 4, section 5 implies use of a computer which has memory).

Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng in view of Kim and Fantinuoli as applied in claim 1 above, in further view of Hamid et al. (US 20120253785 A1).

Regarding claim 4, Zheng discloses:

4. The method of claim 1, further comprising: processing the input text with the prediction model; (see claim 1) and processing output from the prediction model with the translation model to generate the translation data. (not explicitly disclosed)

Zheng, Kim, and Fantinuoli do not disclose a prediction model which outputs to the translation model. Hamid discloses: processing the input text with the prediction model; and processing output from the prediction model with the translation model to generate the translation data. ("[0033] ... An example language model may be generated based on determining the probability that a sequence would occur in a natural language conversation or natural language document based on observing a large amount of text in the language. Thus, the language model may statistically predict a "next" word or phrase in a sequence of words associated with a natural language that forms the basis of the language model. The language model probability value may be based on information included in a language model repository 144, which may be configured to store information obtained based on a large corpus of documents by analyzing contexts of words in documents that have been translated from a source language to a target language. Based on such analyses, the language model probability value may predict a particular word or phrase that would be expected next in a sequence of object words in a source language. According to an example embodiment, a ranked listing of suggested translations may be obtained.")

Zheng, Kim, Fantinuoli, and Hamid are considered analogous art to the claimed invention because they disclose methods for translation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Zheng in view of Kim and Fantinuoli with the language model disclosed by Hamid. Doing so would have been beneficial in order to use the information included in a large parallel corpus of documents to provide the highest ranking candidate translation. (Hamid [0034])

Claim 14 is a system claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.

Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng in view of Kim and Fantinuoli as applied in claim 1 above, in further view of Potkonjak (US 20100324894 A1).

Regarding claim 7, Zheng, Kim, and Fantinuoli do not disclose the additional limitations. Potkonjak discloses:

7. The method of claim 1, further comprising: presenting the enunciation data as subtitle text in a video stream of a live media stream of the communication session. ("[0029] The constraints 192 and objective functions 194 can be specified for use in capturing lectures or other presentations using single or multiple distributed microphones. The constraints 192 and objective functions 194 can be specified for use in the operation of call centers where one or more of the processing stages 120-170 may be applied to voice signals generated by call center personnel. The constraints 192 and objective functions 194 can be specified for use in the operation of call centers where one or more of the processing stages 120-170 may be applied to voice signals generated by call center customers. Text generated by the V2T processing stage 130 or the T2T processing stage 140 may be displayed to a speaker or a listener. Such a display of text may support closed caption applications or other services for captions, subtitles, or the hearing impaired.")

Zheng, Kim, Fantinuoli, and Potkonjak are considered analogous art to the claimed invention because they disclose methods for translation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Zheng in view of Kim and Fantinuoli to output subtitles as disclosed by Potkonjak. Doing so would have been beneficial so that hearing impaired users could read the output. (Potkonjak [0029])

Claim 17 is a system claim with limitations corresponding to the limitations of Claim 7 and is rejected under similar rationale.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JON C MEIS whose telephone number is (703)756-1566. The examiner can normally be reached Monday - Thursday, 8:30 am - 5:30 pm EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Hai Phan, can be reached at 571-272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JON CHRISTOPHER MEIS/
Examiner, Art Unit 2654

/HAI PHAN/
Supervisory Patent Examiner, Art Unit 2654
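The §103 dispute turns on claim 1's two gates: a similarity score decides whether a revised translation replaces the committed output, and a variant is rejected when presenting it would exceed a time threshold. A minimal sketch of that decision flow follows; it is illustrative only, and the function names, the token-overlap scoring heuristic, and the latency model are assumptions for the example, not taken from the application or the cited references.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    text: str
    est_latency_s: float  # estimated delay if this variant were presented

def similarity(a: str, b: str) -> float:
    """Toy token-overlap (Jaccard) score standing in for a sentence similarity model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def select_enunciation(committed: str, variants: list[Variant],
                       sim_threshold: float = 0.6,
                       time_threshold_s: float = 2.0) -> str:
    """Keep the committed output unless a timely, meaningfully different variant arrives.

    - A variant whose estimated latency exceeds time_threshold_s is rejected outright
      (the "rejecting a translation variant for exceeding a time threshold" limitation).
    - A variant highly similar to the committed text is ignored, so listeners are not
      shown churn that changes nothing (the similarity-score gate).
    """
    for v in variants:
        if v.est_latency_s > time_threshold_s:
            continue  # rejected: would arrive too late to present
        if similarity(committed, v.text) < sim_threshold:
            return v.text  # timely and meaningfully different: revise the output
    return committed

# Usage: the late variant is rejected; the timely, different one replaces the commit.
out = select_enunciation(
    "he said hello to group",
    [Variant("he greeted the entire group warmly", 3.5),  # exceeds time threshold
     Variant("he greeted the whole group", 1.2)],         # timely, dissimilar enough
)
print(out)
```

The ordering of the two checks is a design choice in this sketch: rejecting on latency first means a late variant never reaches the similarity gate, which matches the claim's framing of presentation "based on rejecting a translation variant for exceeding a time threshold."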

Prosecution Timeline

Oct 23, 2023: Application Filed
Sep 16, 2025: Non-Final Rejection — §103
Dec 19, 2025: Applicant Interview (Telephonic)
Dec 19, 2025: Examiner Interview Summary
Dec 22, 2025: Response Filed
Mar 19, 2026: Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603087 — VOICE RECOGNITION USING ACCELEROMETERS FOR SENSING BONE CONDUCTION — Granted Apr 14, 2026 (2y 5m to grant)
Patent 12579975 — Detecting Unintended Memorization in Language-Model-Fused ASR Systems — Granted Mar 17, 2026 (2y 5m to grant)
Patent 12482487 — MULTI-SCALE SPEAKER DIARIZATION FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS — Granted Nov 25, 2025 (2y 5m to grant)
Patent 12475312 — FOREIGN LANGUAGE PHRASES LEARNING SYSTEM BASED ON BASIC SENTENCE PATTERN UNIT DECOMPOSITION — Granted Nov 18, 2025 (2y 5m to grant)
Patent 12430329 — TRANSFORMING NATURAL LANGUAGE TO STRUCTURED QUERY LANGUAGE BASED ON MULTI-TASK LEARNING AND JOINT TRAINING — Granted Sep 30, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 46%
With Interview: 99% (+59.0%)
Median Time to Grant: 3y 0m
PTA Risk: Moderate

Based on 22 resolved cases by this examiner. Grant probability derived from career allow rate.
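The projection figures follow from simple arithmetic on the examiner's career data above. A sketch of the presumed derivation, with two stated assumptions: the "+59.0%" interview lift is read as percentage points rather than a relative multiplier, and the dashboard's 46% appears to round or weight the raw 10/22 ≈ 45.5% allow rate.

```python
# Figures taken from the examiner career data shown above.
granted, resolved = 10, 22
career_allow = granted / resolved   # raw career allow rate ≈ 45.5%
                                    # (dashboard shows 46%, presumably after rounding/weighting)

with_interview = 0.99               # dashboard: grant probability with interview
lift_points = 0.59                  # dashboard: "+59.0%" lift, read here as percentage points
without_interview = with_interview - lift_points  # implied no-interview baseline ≈ 40%

print(f"career allow rate:          {career_allow:.1%}")
print(f"implied rate w/o interview: {without_interview:.0%}")
```

Under this reading, the lift compares resolved cases with an interview (99%) against those without (an implied ~40%), which is consistent with the blended 46% career rate sitting between the two.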
