DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claims are directed to an abstract idea without significantly more.
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (an abstract idea) and does not include additional elements that amount to significantly more than the judicial exception.
Step 1
Claim 1 is directed toward a “method,” which is a process and thus falls within a statutory category of 35 U.S.C. 101 under the most recent eligibility guidelines.
Step 2A, Prong 1
Claim 1 recites instructions for “extracting, by the semantic feature extractor, a source semantic unit sequence of source language audio and a target semantic unit sequence of target language audio, wherein the source language audio corresponds to the target language audio”; “adjusting a first decoder of the plurality of decoders based on the source semantic unit sequence and the target semantic unit sequence”; and “adjusting a second decoder of the plurality of decoders based on the source semantic unit sequence, the target semantic unit sequence, a source acoustic unit sequence of the source language audio, and a target acoustic unit sequence of the target language audio, wherein the semantic feature extractor remains unchanged during the adjustment of the first decoder and the second decoder.” These limitations collectively recite the collection, evaluation, and translation of information, including language evaluation and translation. As characterized by USPTO guidance and case law, such activities fall within the abstract-idea groupings of mental processes (e.g., observations, evaluations, and judgments that could be performed in the human mind or with pen and paper) and organizing/transmitting information. Reference can be made to the latest patent eligibility guidelines. Accordingly, claim 1 recites an abstract idea.
Step 2A, Prong 2
The claim is implemented on a general-purpose computer with “decoders.” These are generic computer components performing their well-understood, routine, and conventional functions of storing and executing instructions, receiving requests, and sending content.
The claim does not recite any specific improvement to computer functionality (e.g., a particular translation algorithm, model architecture, data structure, memory organization, caching mechanism, latency-reduction technique, or network protocol that improves the operation of the computer or network). Nor does it effect a transformation of a physical article or use the abstract idea in any other manner that imposes a meaningful limit on the claim’s scope. Therefore, the claim does not integrate the abstract idea into a practical application under Step 2A, Prong 2.
Step 2B
Beyond the abstract idea, the additional elements are the generic computer performing conventional functions. Implementing the abstract idea on generic computer components does not amount to significantly more. (Alice, 573 U.S. at 223–24).
The ordered combination of limitations mirrors the abstract idea itself performed using routine computer operations. There is no recited unconventional hardware, no technical improvement to the functioning of the computer itself, and no unconventional arrangement of known components.
Accordingly, claim 1 does not include an “inventive concept” sufficient to transform the abstract idea into a patent-eligible application.
Therefore, claim 1 is directed to an abstract idea and does not recite additional elements that integrate the exception into a practical application or amount to significantly more than the exception itself. Claim 1 is therefore rejected under 35 U.S.C. § 101. Dependent claims 2-8 do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered both individually and as an ordered combination, the additional elements do not amount to significantly more than the abstract idea.
Independent claim 9 recites the steps of a method for speech translation, wherein the method is performed by the speech translation model generated according to claim 1, the speech translation model comprises a semantic feature extractor and a plurality of decoders, and the method comprises: “generating a predicted target semantic unit sequence based on a given source semantic unit sequence of given source language audio”; and “generating a predicted acoustic unit sequence based on the given source semantic unit sequence of the given source language audio, the predicted target semantic unit sequence, and a given source acoustic unit sequence.” All of the steps could be performed by a human, including by applying a translation algorithm. These limitations collectively recite the collection, evaluation, and translation of information, including language evaluation and translation. As characterized by USPTO guidance and case law, such activities fall within the abstract-idea groupings of mental processes (e.g., observations, evaluations, and judgments that could be performed in the human mind or with pen and paper) and organizing/transmitting information. Reference can be made to the latest patent eligibility guidelines.
Accordingly, claim 9 recites an abstract idea. Claims 10-12 are dependent claims and do not contain subject matter that can overcome the rejection of the independent claim. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to integration of the abstract idea into a practical application, the additional element of using a generic computing device to perform the determining and data gathering steps amounts to no more than mere instructions to apply the exception using a generic computer. The current specification, in paragraph [0072], clearly specifies that “… [0072] FIG. 11 illustrates a block diagram of an electronic device 1100 according to some embodiments of the disclosure. The device 1100 may be a device or apparatus described in the embodiments of the disclosure. As shown in FIG. 11, the device 1100 includes a central processing unit (CPU) and/or a graphics processing unit (GPU) 1101, which may perform various appropriate actions and processes according to computer program instructions stored in a read-only memory (ROM) 1102 or computer program instructions loaded from a storage unit 1108 into a random access memory (RAM) 1103. Various programs and data required for the operation of the device 1100 may also be stored in the RAM 1103. The CPU/GPU 1101, the ROM 1102, and the RAM 1103 are connected with one another through a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104. Although not shown in FIG. 11, the device 1100 may also include a coprocessor.” The additional elements have been considered both individually and as an ordered combination in the significantly-more consideration. The inclusion of the computer, or the memory and controller, to perform the selecting and generating steps amounts to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using a generic computing device cannot provide an inventive concept. Therefore, claims 13 and 20 as drafted are not patent eligible.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements when considered both individually and as an ordered combination do not amount to significantly more than the abstract idea.
Thus, taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation. Independent claim 13 is therefore not drawn to eligible subject matter, as it is directed to an abstract idea without significantly more. Claims 14-18 are dependent claims and do not contain subject matter that can overcome the rejection of independent claim 13. Claim 20 is directed toward a non-transitory computer readable medium with instructions to implement the method of claim 1 and is rejected under similar rationale.
All dependent claims when analyzed as a whole are held to be patent ineligible under 35 U.S.C. §101 because any additional recited limitations fail to establish that the claims are not directed to an abstract idea for the same reasons already recited for the independent claims.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Li et al. (US 2024/0028841 A1).
As per claims 1, 9, 13, and 20, Li et al. teach a method/device/non-transitory computer readable medium to implement the method for generating a speech translation model, wherein the speech translation model comprises a semantic feature extractor and a plurality of decoders, and the method comprises (abstract):
extracting, by the semantic feature extractor, a source semantic unit sequence of source language audio and a target semantic unit sequence of target language audio, wherein the source language audio corresponds to the target language audio (0005, 0031, 0033);
adjusting a first decoder of the plurality of decoders based on the source semantic unit sequence and the target semantic unit sequence (0068, 0071, 0005, 0038); and
adjusting a second decoder of the plurality of decoders based on the source semantic unit sequence, the target semantic unit sequence, a source acoustic unit sequence of the source language audio, and a target acoustic unit sequence of the target language audio, wherein the semantic feature extractor remains unchanged during the adjustment of the first decoder and the second decoder (0068, 0071, 0006, 0009, 0038-0039, 0049).
As per claims 2 and 14, Li et al. teach the method/device according to claims 1 and 13, wherein adjusting the first decoder of the plurality of decoders comprises: obtaining a first prompt sequence by combining the source semantic unit sequence, the target semantic unit sequence, and task information, wherein the task information at least specifies a language type of the source language audio and a language type of the target language audio (0038, 0041); and adjusting the first decoder based on the first prompt sequence (0068, 0071, 0006, 0009, 0038-0039, 0049).
As per claims 3 and 15, Li et al. teach the method/device according to claims 2 and 14, wherein adjusting the second decoder of the plurality of decoders comprises: obtaining a second prompt sequence by combining the source semantic unit sequence, the target semantic unit sequence, the source acoustic unit sequence, and the target acoustic unit sequence; and adjusting the second decoder based on the second prompt sequence (0068, 0071, 0006, 0009, 0038-0039, 0049).
As per claims 4 and 16, Li et al. teach the method/device according to claims 2 and 14, further comprising: obtaining a compressed source semantic unit sequence and a compressed target semantic unit sequence by compressing the source semantic unit sequence and the target semantic unit sequence; and adjusting the first decoder by utilizing the compressed source semantic unit sequence, the compressed target semantic unit sequence, and the task information (0068, 0071, 0006, 0009, 0038-0039, 0049).
As per claims 5 and 17, Li et al. teach the method/device according to claims 4 and 16, further comprising: obtaining a source timing value sequence and a target timing value sequence by compressing the source semantic unit sequence and the target semantic unit sequence, wherein the source timing value sequence and the target timing value sequence are associated with a pattern of the compression (0038, 0072); and adjusting a third decoder of the plurality of decoders by utilizing the compressed source semantic unit sequence, the compressed target semantic unit sequence, the source timing value sequence, and the target timing value sequence (0068, 0071, 0006, 0009, 0038-0039, 0049).
As per claims 6 and 18, Li et al. teach the method/device according to claims 1 and 14, wherein the semantic feature extractor comprises any one of an unsupervised model and a cluster model (0005, 0031, 0033, 0028).
As per claims 7 and 19, Li et al. teach the method/device according to claims 2 and 14, further comprising: adjusting the first decoder using multi-task learning, wherein the multi-task learning comprises at least one of the following: a speech recognition task; a text translation task; or a speech-to-speech conversion task (0038, 0046-0047).
As per claim 8, Li et al. teach the method according to claim 1, wherein at least one of the source language audio and the target language audio comprises an unwritten language, and the unwritten language has no handwritten text (0005, 0025-0026, 0031-0033).
As per claim 9, Li et al. teach a method for speech translation, wherein the method is performed by the speech translation model generated according to claim 1, the speech translation model comprises a semantic feature extractor and a plurality of decoders, and the method comprises: generating a predicted target semantic unit sequence based on a given source semantic unit sequence of given source language audio; and generating a predicted acoustic unit sequence based on the given source semantic unit sequence of the given source language audio, the predicted target semantic unit sequence, and a given source acoustic unit sequence (0026, 0035, 0031, 0038-0039, 0063-0064, 0068, 0071).
As per claim 10, Li et al. teach the method according to claim 9, wherein generating the predicted target semantic unit sequence comprises: obtaining a first predicted prompt sequence by combining the given source semantic unit sequence and task information (0068, 0071, 0038, 0046-0047, 0006, 0009, 0038-0039, 0049); and generating the predicted target semantic unit sequence by inputting the first predicted prompt sequence into the first decoder of the plurality of decoders (0026, 0035, 0063-0064).
As per claim 11, Li et al. teach the method according to claim 10, wherein generating the predicted acoustic unit sequence comprises: obtaining a second predicted prompt sequence by combining the given source semantic unit sequence, the predicted target semantic unit sequence, and the given source acoustic unit sequence; and generating the predicted acoustic unit sequence by inputting the second predicted prompt sequence into the second decoder of the plurality of decoders (0026, 0035, 0063-0064).
As per claim 12, Li et al. teach the method according to claim 9, further comprising: generating predicted target language audio based on the predicted acoustic unit sequence (0026, 0035, 0063-0064).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.
The following cited prior art may be relevant, alone or in combination, to Applicant's invention.
Tunstall-Pedoe et al. (US 12,518,107 B2) teach a computer implemented method for the automated analysis or use of data, comprising: learning new information and representing the new information in a structured, machine-readable representation, in which representation of data comprises semantic nodes and passages; and in which each semantic node represents an entity and is represented by an identifier; and each passage is either (i) a semantic node or (ii) a combination of semantic nodes; and where machine-readable meaning comes from choice of semantic nodes and how they are combined and ordered as passages; in which the representation of data uses a shared syntax that applies to semantic nodes and passages that represent factual statements, query statements and reasoning statements, wherein the syntax is an unambiguous syntax comprising nesting of structured representations of data to a depth; and storing the structured representation of data in a non-transitory storage medium and automatically processing it.
Dong et al. (US 2025/0061888 A1) teach a model training method and apparatus, a speech-to-speech translation method and apparatus, and a medium. The method includes: obtaining a speech recognition sample and a real speech-to-speech translation sample; generating a pseudo-labeled speech-to-speech translation sample based on the speech recognition sample; and training a speech-to-speech translation model based on the pseudo-labeled speech-to-speech translation sample and the real speech-to-speech translation sample. Therefore, the model training precision can be improved.
Chien et al. (US 2018/0166069 A1) teach storing a speech recognition model including speech-units and basic components of acoustic models, wherein each of the speech-units includes at least one state and each state corresponds to one of the basic components of acoustic models; receiving first and second speech signals; obtaining a speech-unit sequence of a native/non-native vocabulary from a speech-analysis and unit-expansion module; recognizing the first speech signal according to the speech recognition model and the speech-unit sequence of the native/non-native vocabulary and further outputting a recognition result; and selecting an optimal component from the basic components of acoustic models according to the speech recognition model, the second speech signal, and the word corresponding to the second speech signal, and further updating the speech-units according to the best basic component of acoustic model.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VIJAY B CHAWAN, whose telephone number is (571) 272-7601. The examiner can normally be reached 7-5, Monday through Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/VIJAY B CHAWAN/Primary Examiner, Art Unit 2658