Prosecution Insights
Last updated: April 19, 2026
Application No. 18/651,312

SYSTEM AND METHOD FOR MULTILINGUAL SPEECH-TO-SPEECH TRANSLATION WITH SPEECH REFINEMENT USING COMBINED MACHINE LEARNING MODELS

Non-Final OA: §101, §103, §112

Filed: Apr 30, 2024
Examiner: SHAIKH, ZEESHAN MAHMOOD
Art Unit: 2658
Tech Center: 2600 (Communications)
Assignee: Sanas AI Inc.
OA Round: 1 (Non-Final)

Grant Probability: 52% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 3y 2m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 52% (grants 52% of resolved cases; 16 granted / 31 resolved; -10.4% vs TC avg)
Interview Lift: strong, +55.0% in resolved cases with interview
Avg Prosecution: 3y 2m typical timeline (32 applications currently pending)
Career History: 63 total applications across all art units
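The headline figures above follow directly from the underlying counts. A minimal sketch of one plausible derivation (the rounding convention and the lift definition are assumptions; the per-group interview counts are not shown in the panel, so the example arguments to `interview_lift` are purely illustrative):

```python
# Counts shown in the Examiner Intelligence panel.
granted, resolved = 16, 31
allow_rate = granted / resolved  # ~0.516, displayed as 52%

def interview_lift(rate_with, rate_without):
    """Difference in allow rate, in percentage points.

    One plausible reading of the dashboard's "+55.0% interview
    lift": allow rate among resolved cases with an interview minus
    the rate among those without.
    """
    return (rate_with - rate_without) * 100

print(round(allow_rate * 100))  # prints 52
```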

Statute-Specific Performance

§101: 25.7% (-14.3% vs TC avg)
§103: 45.8% (+5.8% vs TC avg)
§102: 17.3% (-22.7% vs TC avg)
§112: 5.8% (-34.2% vs TC avg)
Tech Center averages are estimates. Based on career data from 31 resolved cases.

Office Action

Rejections: §101, §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 4 and 15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. The claims recite, “Find all the languages present in this code, and return it as a JSON array…”. The abbreviation “JSON” should first be spelled out in word format before the abbreviation is used. Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Independent claims 1 and 12 recite “obtain a text input, wherein the text input is associated with one or more languages”, “generate a customized prompt for a selected large language model (LLM), wherein the customized prompt concatenates a system instruction, an output language indication, and an input content, wherein the customized prompt is dynamically generated for an idiomatic translation of the text input”, “pass the customized prompt in the selected LLM to generate a translation output, wherein the translation output is an idiomatic translation”, and “present the translation output”.

The limitation of obtaining text in multiple languages, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “a network interface”, “a user interface”, “a memory”, and “one or more processors”, nothing in the claim precludes the step from practically being performed in the mind. For example, “obtaining” in the context of this claim encompasses receiving text, which a human can do in the mind.

Next, the limitation of generating a prompt, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting the elements listed above, nothing in the claim precludes the step from practically being performed in the mind. For example, “generating” in the context of this claim encompasses developing instructions, which a human can do in the mind or with a pen and paper.

Next, the limitation of passing a prompt to generate an idiomatic translation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting the elements listed above, “passing” in the context of this claim encompasses delivering instructions to produce a relevant translation, which a human can do in the mind or with a pen and paper.

Lastly, the limitation of presenting a translation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting the elements listed above, “presenting” in the context of this claim encompasses displaying a translation, which a human can do with a pen and paper.

The judicial exception is not integrated into a practical application. In particular, the claim only recites the additional elements of using “a network interface”, “a user interface”, “a memory”, and “one or more processors” to perform the recited limitations. These elements are recited at a high level of generality, such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional elements of using “a network interface”, “a user interface”, “a memory”, and “one or more processors” to perform the recited limitations amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. The claim is not patent eligible.
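For concreteness, the prompt-construction limitation recited above can be sketched as follows. This is a minimal illustration only: the function name, delimiter layout, and sample strings are assumptions, not taken from the application, which recites only that the prompt concatenates a system instruction, an output language indication, and an input content.

```python
def build_prompt(system_instruction: str, output_language: str, input_content: str) -> str:
    """Concatenate the three recited parts into one LLM prompt.

    The exact delimiters and line layout here are an assumption;
    the claim recites only the concatenation of the three parts.
    """
    return (
        f"{system_instruction}\n"
        f"Output language: {output_language}\n"
        f"Input: {input_content}"
    )

prompt = build_prompt(
    "Detect every language in the input and produce a polished, idiomatic translation.",
    "English",
    "C'est la vie, mon ami.",
)
print(prompt.splitlines()[1])  # prints "Output language: English"
```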
Dependent claims 2-11 and 13-20 are also rejected for the same reasons provided for independent claims 1 and 12 above. The dependent claims, including their further recited limitations, do not integrate the abstract idea into a practical application, and the additional elements, taken individually and in combination, do not contribute to an inventive concept. In other words, the dependent claims are directed to an abstract idea without significantly more.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claims 1-3, 5-6, 12-14, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Kochura et al., US 20180165275 A1 (hereinafter Kochura), in view of Waibel et al., US 20110307241 A1 (hereinafter Waibel).

Regarding independent claims 1 and 12, Kochura teaches a method/apparatus comprising: a network interface that connects the apparatus to a communication network (FIG. 1, 105); a user interface that obtains one or more user inputs from one or more users and presents an output result to the one or more users (FIG. 1, 180, 184, 186, 188); a memory; and one or more processors coupled to one or more memory units, the one or more processors configured to (FIG. 1, 112, 114, 116) obtain a text input, wherein the text input is associated with one or more languages (FIG. 2, 202; [0046] “the electronic medium may be a direct text communication between two entities”; [0036] “in an electronic communication between two entities, such as a communication between mobile phone (180) and tablet (184), two or more languages may be embedded therein with one or more idioms present”); generate a translation output, wherein the translation output is an idiomatic translation ([0038] “return a translation of the idiom with respect to the expression within the communication”); and present the translation output ([0038] “The returned idiom translation is presented on the visual display”).

Kochura fails to teach: generate a customized prompt for a selected large language model (LLM), wherein the customized prompt concatenates a system instruction, an output language indication, and an input content, wherein the customized prompt is dynamically generated for an idiomatic translation of the text input; and pass the customized prompt in the selected LLM to generate a translation output. However, Waibel teaches generating a customized prompt for a selected large language model (LLM), wherein the customized prompt concatenates a system instruction, an output language indication, and an input content, wherein the customized prompt is dynamically generated for an idiomatic translation of the text input, and passing the customized prompt in the selected LLM ([0012] “The system then prompts the user to verify the description, and updates the utterance and the user verified description in a first machine translation module associated with the first language”). Kochura and Waibel are considered analogous to the claimed invention because both are in the same field of translation systems.
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the techniques for identification and translation of idioms, and explanations of idioms, of Kochura with the technique of generating a prompt containing instructions for a language model taught by Waibel, in order to improve speech-to-speech translation systems for cross-lingual communication (see Waibel [0003]).

Regarding claims 2 and 13, Kochura in view of Waibel teaches all of the limitations of claims 1 and 12, upon which claims 2 and 13 depend. Additionally, Kochura teaches wherein the system instruction contains one or more elements comprising a direct instruction for multilingual detection for the input, a direct instruction for output text format, and a translation indication customized for the polished (idiomatic) translation ([0074] “Computer readable program instructions for carrying out operations of the present embodiments may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions….”).

Regarding claims 3 and 14, Kochura in view of Waibel teaches all of the limitations of claims 2 and 13, upon which claims 3 and 14 depend. Additionally, Kochura teaches wherein the translation indication is further customized to indicate a polished translation (FIG. 3, 314; [0050]; the examiner interprets the second language idiom as the polished translation).

Regarding claims 5 and 16, Kochura in view of Waibel teaches all of the limitations of claims 1 and 12, upon which claims 5 and 16 depend. Additionally, Waibel teaches wherein the one or more processors are further configured to process a voice speech by one or more users into the text input, and wherein the voice speech is transcribed into the text input by a selected speech-to-text model (FIG. 1; ASR modules 2 and 9, [0044]; [0052] “The resulting text of the user's speech is displayed via the GUI on the device screen 13 at step 28”).

Regarding claims 6 and 17, Kochura in view of Waibel teaches all of the limitations of claims 1 and 12, upon which claims 6 and 17 depend. Additionally, Waibel teaches wherein the translation output is presented as a text output, a speech output, or a combination of text and speech output ([0052] “The resulting text of the user's speech is displayed via the GUI on the device screen 13 at step 28”).

Claims 7-9 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Kochura in view of Waibel, as shown above in claim 1, and further in view of Sagie (US 20140365200 A1).

Regarding claims 7 and 18, Kochura in view of Waibel teaches all of the limitations of claims 1 and 12, upon which claims 7 and 18 depend. Kochura in view of Waibel fails to teach performing an LLM selection procedure to select an LLM among a group of candidate LLMs as the selected LLM. However, Sagie teaches performing an LLM selection procedure to select an LLM among a group of candidate LLMs as the selected LLM ([0011] “applying the plurality of speech recognition engines includes utilization of a language model or a modifier that is selected in accordance with a translation profile”). Kochura in view of Waibel and Sagie are considered analogous to the claimed invention because all are in the same field of translation systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the translation techniques of Kochura in view of Waibel with the technique of language model selection taught by Sagie in order to improve automatic speech translation (see Sagie [0001]).

Regarding claim 8, Kochura in view of Waibel in view of Sagie teaches all of the limitations of claim 7, upon which claim 8 depends.
Additionally, Sagie teaches wherein the LLM selection procedure uses an LLM selection prompt instructing each candidate LLM to perform the idiomatic translation ([0144] “The user may be prompted to indicate (e.g., by operating a control) which of several candidate transcripts or translations is preferred, or to correct a candidate transcript or translation”).

Regarding claim 9, Kochura in view of Waibel in view of Sagie teaches all of the limitations of claim 7, upon which claim 9 depends. Additionally, Sagie teaches wherein the LLM selection procedure uses a predefined set of text input texts ([0049] “The electronic speech signal may be received by translation processor 16 via input channel 15”; the examiner interprets a channel to have a predefined threshold limit).

Regarding claim 19, Kochura in view of Waibel in view of Sagie teaches all of the limitations of claim 18, upon which claim 19 depends. Additionally, Sagie teaches wherein the LLM selection procedure uses an LLM selection prompt instructing each candidate LLM to perform the idiomatic translation and a predefined set of text input texts ([0144] “The user may be prompted to indicate (e.g., by operating a control) which of several candidate transcripts or translations is preferred, or to correct a candidate transcript or translation”; [0049] “The electronic speech signal may be received by translation processor 16 via input channel 15”; the examiner interprets a channel to have a predefined threshold limit).

Claims 10-11 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kochura in view of Waibel, as shown above in claim 1, and further in view of Woo (US 20240393942 A1).

Regarding claim 10, Kochura in view of Waibel teaches all of the limitations of claim 1, upon which claim 10 depends. Kochura in view of Waibel fails to teach obtaining a reference input, wherein the text input is generated based on the reference input.
However, Woo teaches obtaining a reference input, wherein the text input is generated based on the reference input ([0059] “The “user language keypad list” may be stored in a variable in the form of an array, in a file with a specific name, or in a specific table in a database. As exemplified in DRAWING 7, the Environment Setting Manager program of the character input interface references the languages stored in the user language list”). Kochura in view of Waibel and Woo are considered analogous to the claimed invention because all are in the same field of translation systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the translation techniques of Kochura in view of Waibel with the technique of using a file-name reference input taught by Woo, in order to better support efficient and convenient language activities of users by integrating and providing services related to the input and output of natural language, such as text input, speech recognition, machine translation, and speech synthesis, using an extended keypad on any mobile computing device consisting of input, computation, and output, including a personal computer, a smartphone, a tablet PC, or a smartwatch, and in holographic or augmented reality simulating the same (see Woo [0001]).

Regarding claim 11, Kochura in view of Waibel in view of Woo teaches all of the limitations of claim 10, upon which claim 11 depends. Additionally, Woo teaches wherein the reference input is a file name ([0059] “The “user language keypad list” may be stored in a variable in the form of an array, in a file with a specific name, or in a specific table in a database. As exemplified in DRAWING 7, the Environment Setting Manager program of the character input interface references the languages stored in the user language list”).

Regarding claim 20, Kochura in view of Waibel teaches all of the limitations of claim 12, upon which claim 20 depends. Kochura in view of Waibel fails to teach obtaining a reference input, wherein the text input is generated based on the reference input, and wherein the reference input is a file name. However, Woo teaches obtaining a reference input, wherein the text input is generated based on the reference input, and wherein the reference input is a file name ([0059]). Kochura in view of Waibel and Woo are considered analogous to the claimed invention because all are in the same field of translation systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the translation techniques of Kochura in view of Waibel with the technique of using a file-name reference input taught by Woo, for the same reasons given for claim 10 above (see Woo [0001]).

Allowable Subject Matter

Claims 4 and 15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Koyama et al. (US 5541838 A) teaches a translation machine provided with an idiom registering capability. If a header standing for an idiom contains two or more variable parts, these parts are represented by representative symbols. With this representing method, the idiom can be easily registered and retrieved, which shortens the registering time, reduces the translating toil, and prevents an increase in information storage capacity. The translation machine includes an input unit, a storing unit, an output unit, an idiom registering unit, and a translating unit for performing a routine translating operation. The idiom registering unit operates to register an idiom header with two or more representative symbols standing for words or word trains sharing a predetermined attribute.

Horvitz (US 20060293893 A1) teaches an architecture that interacts with a user, or users of different tongues, to enhance speech translation. A recognized concept or situation is sensed and/or converged upon, and disambiguated with mixed-initiative user interaction with a device to provide simplified inferences about user communication goals in working with others who speak another language. Reasoning is applied about communication goals based on the concept or situation at the current focus of attention, or the probability distribution over the likely focus of attention, and the user or the user's conversational partner is provided with appropriately triaged choices and images, text, and/or speech translations for review or perception. The inferences can also process an utterance or other input from a user as part of the evidence in reasoning about a concept, situation, or goals, and/or disambiguating the latter. The system's best understanding of the question, need, or intention at the crux of the communication can be echoed back to the user for confirmation.
Context-sensitive focusing of recognition and information gathering components can be provided based on the listening, and can employ words recognized from prior or current user utterances to further focus the inference.

Sharifi et al. (US 20240202469 A1) teaches automatically translating a customized automated assistant from a first language to a new language, so that the automated assistant can interpret spoken utterances in the new language and respond to such spoken utterances in the new language. For example, a customized automated assistant can be configured for use in a first language through the developer(s) providing input(s) that are in the first language, and thereafter automatically translated to a distinct second language for which no developer input is provided. The deployment of the customized automated assistant for utilization with the second language can be selective. For example, it can be selective in that it is only automatically deployed and/or is only suggested for deployment in response to determining that one or more objective criteria, which indicate accuracy and/or robustness of the second language translation of the customized automated assistant, are satisfied.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZEESHAN SHAIKH, whose telephone number is (703) 756-1730. The examiner can normally be reached Monday-Friday, 7:30 AM-5:00 PM. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/ZEESHAN MAHMOOD SHAIKH/
Examiner, Art Unit 2658

/RICHEMOND DORVIL/
Supervisory Patent Examiner, Art Unit 2658
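The LLM selection procedure at issue in claims 7-9 and 18-19 (selecting one LLM among candidates using a selection prompt and a predefined set of test inputs) could be sketched as below. This is a minimal illustration only: the scoring function, stub translators, and all names are assumptions, not drawn from the application or the cited art.

```python
def select_llm(candidates, test_inputs, score_fn):
    """Return the name of the candidate whose outputs over a
    predefined set of test inputs score highest."""
    best_name, best_total = None, float("-inf")
    for name, translate in candidates.items():
        total = sum(score_fn(translate(text)) for text in test_inputs)
        if total > best_total:
            best_name, best_total = name, total
    return best_name

# Stub "LLMs" standing in for real model calls.
candidates = {
    "model_a": lambda text: text + " (idiomatic)",
    "model_b": lambda text: text,
}
test_inputs = ["break a leg", "piece of cake"]

# Toy scoring: longer output means the stub actually rewrote the input.
chosen = select_llm(candidates, test_inputs, score_fn=len)
```

In a real implementation the scoring function would compare each candidate's translations of the predefined inputs against reference translations or a quality metric, rather than output length.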

Prosecution Timeline

Apr 30, 2024: Application Filed
Dec 20, 2025: Non-Final Rejection under §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579373: SYSTEM AND METHOD FOR SYNTHETIC TEXT GENERATION TO SOLVE CLASS IMBALANCE IN COMPLAINT IDENTIFICATION (granted Mar 17, 2026; 2y 5m to grant)
Patent 12555575: Wakeup Indicator Monitoring Method, Apparatus and Electronic Device (granted Feb 17, 2026; 2y 5m to grant)
Patent 12518090: LOGICAL ROLE DETERMINATION OF CLAUSES IN CONDITIONAL CONSTRUCTIONS OF NATURAL LANGUAGE (granted Jan 06, 2026; 2y 5m to grant)
Patent 12511318: MULTI-SYSTEM-BASED INTELLIGENT QUESTION ANSWERING METHOD AND APPARATUS, AND DEVICE (granted Dec 30, 2025; 2y 5m to grant)
Patent 12512088: METHOD AND SYSTEM FOR USER-INTERFACE ADAPTATION OF TEXT-TO-SPEECH SYNTHESIS (granted Dec 30, 2025; 2y 5m to grant)

Study what changed to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 52%
With Interview: 99% (+55.0%)
Median Time to Grant: 3y 2m
PTA Risk: Low
Based on 31 resolved cases by this examiner. Grant probability derived from career allow rate.
