Prosecution Insights
Last updated: April 19, 2026
Application No. 16/992,489

SYSTEM AND METHOD USING CLOUD STRUCTURES IN REAL TIME SPEECH AND TRANSLATION INVOLVING MULTIPLE LANGUAGES, CONTEXT SETTING, AND TRANSCRIPTING FEATURES

Final Rejection §103
Filed: Aug 13, 2020
Examiner: JACKSON, JAKIEDA R
Art Unit: 2657
Tech Center: 2600 — Communications
Assignee: Wordly Inc.
OA Round: 8 (Final)
Grant Probability: 74% (Favorable)
Expected OA Rounds: 9-10
Time to Grant: 3y 0m
With Interview: 89%

Examiner Intelligence

Career Allow Rate: 74% — above average (669 granted / 905 resolved; +11.9% vs TC avg)
Interview Lift: +15.4% among resolved cases with interview
Typical Timeline: 3y 0m average prosecution; 35 applications currently pending
Career History: 940 total applications across all art units

Statute-Specific Performance

§101: 25.8% (-14.2% vs TC avg)
§103: 42.5% (+2.5% vs TC avg)
§102: 21.8% (-18.2% vs TC avg)
§112: 3.5% (-36.5% vs TC avg)
Comparisons are against the Tech Center average estimate. Based on career data from 905 resolved cases.
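Assuming the dashboard derives its headline numbers in the straightforward way (an inference on our part; the methodology note only says grant probability is derived from the career allow rate), the figures reconcile from the career counts:

```python
# Reproduce the headline examiner statistics from the career counts shown above.
granted, resolved = 669, 905

allow_rate = granted / resolved  # career allow rate
print(f"Career allow rate: {allow_rate:.1%}")  # 73.9%, displayed as 74%

# Reported lift for resolved cases that included an examiner interview.
interview_lift = 0.154
print(f"With interview: {allow_rate + interview_lift:.1%}")  # 89.3%, displayed as 89%
```

The "+15.4%" lift appears to be an additive percentage-point adjustment, since 73.9% + 15.4% rounds to the displayed 89%.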

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

In response to the Office Action mailed June 3, 2025, applicant submitted an amendment filed on December 3, 2025, in which the applicant amended the claims and requested reconsideration.

Response to Arguments

Applicant argues that the cited prior art fails to teach the claims as amended. However, Diamant teaches translating using artificial intelligence (p. 0023, 0033, 0109). Applicant's arguments have therefore been considered, but are not persuasive.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 9-11, 14-16, 18-19, 21, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Diamant et al. (PGPUB 2019/0341050), hereinafter Diamant, in view of Lin et al. (PGPUB 2014/0358516), hereinafter Lin.

Regarding claims 1 and 9, Diamant discloses a system and method (hereinafter a system) for using cloud structures in real-time speech and translation involving multiple languages, comprising: a computer server including a processor and a memory for storing an application that configures the computer server to perform functions (application; p. 0155-0158) including: receiving audio content in a first spoken language from a first user's speaking device (speaker speaking Chinese; p. 0061-0062); transcribing the audio content into transcribed audio content in the first spoken language (p. 0023-0024, 0060-0064); receiving a first language preference from a first user's client device and a second language preference from a second client device, the first and second language preferences differing from the spoken language and from each other (preferred language; p. 0061-0062); and a first translation engine configured to translate by artificial intelligence (p. 0023, 0033, 0109) and to receive a digital transmission of the transcribed audio content and the first language preference (translate and send data; p. 0061-0062). Diamant does not, however, specifically teach: receiving a first language preference from a first client device associated with a second spoken language, wherein the first language preference associated with the second spoken language differs from the first spoken language; receiving a second language preference from a second client device associated with a third spoken language, wherein the second language preference associated with the third spoken language differs from the first spoken language and the second spoken language; a first translation engine configured to receive a digital transmission of the transcribed audio content and the first language preference; a second translation engine configured to receive a digital transmission of the transcribed audio content and the second language preference; wherein the application further configures the computer server to receive the first and second translated transcribed content from the first and second translation engines, to send the first translated transcribed content to the first client device, to send the second translated transcribed content to the second client device, and to develop a context for the audio content in order to improve transcription and translation of the audio content from the first speaking device.

Lin discloses a system comprising: receiving a second user's language preference from a second user's client device associated with a second spoken language, wherein the second user's language preference associated with the second spoken language differs from the first spoken language (users 102 and 108 speak different languages; p. 0019); receiving a third user's language preference from a third user's client device associated with a third spoken language (third language), wherein the third user's language preference associated with the third spoken language differs from the first spoken language and the second spoken language (p. 0046-0047); a first translation engine configured to receive a digital transmission of the transcribed audio content and the second user's language preference, wherein the first translation engine is cloud-based and configured to translate the transcribed audio content from the first spoken language into a second user's translated transcribed content (fig. 2, element 208, and fig. 3; p. 0033-0034, 0040-0044); and a second translation engine configured to receive a digital transmission of the transcribed audio content and the third user's language preference, wherein the second translation engine is cloud-based and configured to translate the transcribed audio content from the first spoken language into a third user's translated transcribed content (fig. 2, element 212, and fig. 3; p. 0035-0036, 0040-0044); wherein the application further configures the computer server to receive the second user's translated transcribed content and the third user's translated transcribed content, to send the second user's translated transcribed content to the second user's client device, and to send the third user's translated transcribed content to the third user's client device (figs. 2-3 with p. 0033-0036, 0040-0044); and to develop a first user's context from audio content of the first user, to develop a second user's context from audio content of the second user, and to blend the first user's context with the second user's context to generate a blended context (fig. 3) that improves transcription and translation for all users. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method as described above to provide a flexible, convenient system.

Regarding claims 2 and 14, Diamant discloses a system wherein the application further configures the computer server to carry the context forward across content provided by additional speaking devices and spoken languages beyond the first spoken language (context; p. 0077).

Regarding claim 3, Diamant discloses a system wherein the application further configures the computer server to maintain a running transcript of the spoken audio content and permits client devices to submit annotations to the transcript (annotations; p. 0124).

Regarding claim 4, Diamant discloses a system wherein the submitted annotations at least one of summarize, explain, add to, and question portions of transcripts highlighted by the annotation (summarize aspects of the conference transcription; p. 0024, 0082, 0124).

Regarding claim 5, Diamant discloses a system wherein the application further configures the computer server to selectively rely on a third translation engine to supplement translation actions of the first translation engine (add data; p. 0082, 0124).
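The claims 1 and 9 mapping above describes a multi-engine pipeline: transcribe the speaker's audio once, then route the transcript to a separate translation engine for each distinct client language preference while developing a shared context. A minimal sketch of that shape (all class and function names, and the in-memory context list, are illustrative assumptions, not taken from the application or the cited references):

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    # Running context developed from received audio (claims 1/9 recite
    # developing and blending a context to improve later translations).
    context: list[str] = field(default_factory=list)

    def transcribe(self, audio: str, source_lang: str) -> str:
        # Stand-in for a real speech-to-text call.
        return f"[{source_lang} transcript of {audio!r}]"

    def translate(self, transcript: str, target_lang: str) -> str:
        # Stand-in for a cloud-based translation engine; one engine
        # instance per client language preference.
        self.context.append(transcript)  # develop context as content arrives
        return f"[{target_lang} translation of {transcript}]"

    def handle_utterance(self, audio: str, source_lang: str,
                         preferences: dict[str, str]) -> dict[str, str]:
        transcript = self.transcribe(audio, source_lang)
        # Route the same transcript to a per-preference engine, skipping
        # clients whose preference matches the spoken language.
        return {client: self.translate(transcript, lang)
                for client, lang in preferences.items()
                if lang != source_lang}

server = Server()
out = server.handle_utterance("hello", "en", {"client1": "fr", "client2": "de"})
```

Here `client1` receives the French translation and `client2` the German one, each produced from the single shared English transcript.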
Regarding claim 10, Diamant discloses a system wherein actions of establishing and adjusting the context are based on factors comprising at least one of subject matter of the third user's translated transcribed audio, settings in which the portions are spoken, audiences of the portions including at least one client device requesting translation into the third language, and cultural considerations of users of the at least one client device (population, distinct characteristics, accents; p. 0059-0062).

Regarding claim 11, Diamant discloses a system wherein the factors further include cultural and linguistic nuances associated with translation of the first language to the third language and translation of the second language to the third language (population, distinct characteristics, accents; p. 0059).

Regarding claim 15, it is interpreted and rejected for similar reasons as set forth above in claims 1 and 9. In addition, Lin discloses a system for using cloud structures in real-time speech and translation involving multiple languages and transcript development, comprising: a translation engine configured to receive a call (translation; p. 0033-0037); and receiving at least one tag in the translated transcribed content placed by the client device (tags; p. 0023-0025), wherein the tag is associated with a portion of the translated transcribed content; receiving error commentary associated with the tag, wherein the error commentary alleges an error in the tagged portion of the translated transcribed content; and correcting the tagged portion of the translated transcribed content in the transcript in accordance with the error commentary (correcting error translation; p. 0031).

Regarding claim 16, Diamant discloses a system wherein the application further configures the computer server to verify the commentary prior to correcting the portion in the transcript (edit transcript; p. 0111).

Regarding claim 18, Diamant discloses a system wherein the error alleged concerns at least one of translation, contextual issues, and idiomatic issues (correct transcript; p. 0111).

Regarding claim 19, it is interpreted and rejected for similar reasons as set forth above. In addition, Lin discloses a system wherein the application further configures the computer server to call up a third translation engine in the cloud structure that translates the transcribed audio content into a fourth user's translated transcribed content in a fourth spoken language differing from the first, second, and third spoken languages (different/multiple languages; p. 0019, 0033-0035, 0048).

Regarding claim 21, it is interpreted and rejected for similar reasons as set forth above. In addition, Lin discloses a system further comprising: generating second user's translated audio content in the second user's language preference from the third user's translated transcribed content (figs. 2-3); and generating third user's translated audio content in the third user's language preference from the third user's translated transcribed content (figs. 2-3).

Regarding claim 24, it is interpreted and rejected for similar reasons as set forth above. In addition, Diamant discloses transcribed data (p. 0023-0024, 0060-0064). Furthermore, Lin discloses a system further comprising: receiving a request, from another client device, for translation into a fourth spoken language differing from the first, second, and third spoken languages (different/multiple languages; p. 0019, 0033-0035, 0048); translating, by using a third translation engine in the cloud structure (fig. 2), the first audio content into a fourth user's translated audio content of text in the fourth spoken language differing from the first, second, and third spoken languages (different/multiple languages; p. 0019, 0033-0035, 0048); and translating, by using a fourth translation engine in the cloud structure, the second audio content into more of the fourth user's translated audio content of text in the fourth spoken language differing from the first, second, and third spoken languages (different/multiple languages; p. 0019, 0033-0035, 0048).

Claims 6-8, 17, 20, and 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Diamant in view of Lin and in further view of Thomson et al. (PGPUB 2020/0175987), hereinafter Thomson.

Regarding claim 6, Diamant in view of Lin discloses a system as described above, but does not specifically teach a system wherein the application further configures the computer server to selectively blend translated content provided by the first translation engine with translated content provided by the second translation engine. Thomson discloses a system wherein the processor receives audio content comprising human speech spoken in a first language (language translation; p. 1587); translates the content into a second language (p. 1587); displays the translated content in a transcript displayed on a client device viewable by a user speaking the second language (display; p. 0097); receives at least one tag in the translated content placed by the client device, the tag associated with a portion of the content (annotations; p. 1064); receives commentary associated with the tag, the commentary alleging an error in the portion of the content (spelling and language translation errors; p. 0175, 1013); and corrects the portion of the content in the transcript in accordance with the commentary (correcting spelling and language translation errors; p. 0175, 1013), wherein the application verifies the commentary prior to correcting the portion in the transcript (verify; p. 0897-0899, 0682, 0935); wherein the error alleged concerns at least one of translation, contextual issues, and idiomatic issues (correcting spelling and language translation; p. 0175, 1013); wherein the application sends the audio content to at least a first cloud-based translation engine for the translation (p. 0205, 1013-1016, 1691, 1701); and wherein the application configures the computer server to selectively blend translated content provided by the first translation engine with translated content provided by the second translation engine (fusing multiple transcriptions; p. 0097, 0140-0160, 1704), to provide a more accurate transcription. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method as described above to provide a finalized, summarized, and accurate translation.

Regarding claim 7, it is interpreted and rejected for similar reasons as set forth above. In addition, Thomson discloses a system wherein the application further configures the computer server to selectively blend translated content based on factors comprising at least one of the first spoken language and the first and second language preferences, subject matter of the content, voice characteristics of the spoken audio content, demonstrated listening abilities and attention levels of users of the second user's client device, and technical quality of transmission (quality; p. 1023-1031).

Regarding claim 8, it is interpreted and rejected for similar reasons as set forth above in claim 6. In addition, Diamant discloses a system wherein the application further configures the computer server to dynamically build a model of translation based on at least one of the factors, on locations of users of the client devices, and on observed attributes of the translation engines (location; p. 0023).
Furthermore, Thomson discloses a system wherein the application dynamically builds a model of translation based on at least one of the factors, on locations of users of the client devices, and on observed attributes of the translation engines (p. 0510, 0735, 1094).

Regarding claim 17, Diamant discloses a system wherein users of a plurality of client devices hearing the audio content and reading the transcript additionally provide summaries (summarize aspects of the conference transcription; p. 0024, 0082, 0124) and annotations (annotations; p. 0124). In addition, Thomson discloses a system wherein users of a plurality of client devices hearing the audio content and reading the first or second transcript additionally provide summaries (summarize; p. 0168, 0521, 0590), annotations (annotations; p. 1064), and highlighting to the first or second transcript (p. 0161, 0492, 0720-0721), to assist with clearly identifying data.

Regarding claim 20, it is interpreted and rejected for similar reasons as set forth above. In addition, Thomson discloses a system wherein the application further sends the audio content to a second cloud-based translation engine for the translation and selectively blends translated content provided by the first translation engine and the second translation engine with translated content provided by the third translation engine (fusing multiple transcriptions with different engines; p. 0097, 0140-0160, 1704).

Regarding claim 22, it is interpreted and rejected for similar reasons as set forth above. In addition, Thomson discloses a system wherein the second user's translated audio content is generated by the second user's client device (figure 1 with first language/acoustic mode; p. 1552); and the third user's translated audio content is generated by the third user's client device (figure 1 with second language/acoustic mode; p. 1552).

Regarding claim 23, it is interpreted and rejected for similar reasons as set forth above. In addition, Thomson discloses a system wherein the second user's translated audio content is generated by the first translation engine (figure 1 with first language/acoustic mode; p. 1552); and the third user's translated audio content is generated by the second translation engine (figure 1 with first language/acoustic mode; p. 1552).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAKIEDA R JACKSON, whose telephone number is (571) 272-7619. The examiner can normally be reached Mon-Fri, 6:30a-2:30p. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel Washburn, can be reached at 571.272.5551.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JAKIEDA R JACKSON/
Primary Examiner, Art Unit 2657
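Claims 6 and 20 turn on Thomson's "fusing multiple transcriptions" teaching: candidate outputs from several engines are selectively combined. One simple way to picture such a blend (the per-segment confidence scores and the max-score rule are invented for this illustration; the actual fusion logic is not in the record) is a per-segment vote:

```python
# Illustrative blend of per-segment outputs from two translation engines:
# for each segment, keep the candidate with the higher (hypothetical)
# confidence score. This mirrors the "selectively blend" limitation only
# in spirit.
engine_a = [("seg1", "Hello everyone", 0.92), ("seg2", "lets begin", 0.61)]
engine_b = [("seg1", "Hi all", 0.85), ("seg2", "let's begin", 0.88)]

blended = [max(a, b, key=lambda cand: cand[2])  # higher confidence wins
           for a, b in zip(engine_a, engine_b)]
```

With these scores, the blend keeps engine A's first segment and engine B's second, yielding "Hello everyone" / "let's begin".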

Prosecution Timeline

Aug 13, 2020: Application Filed
Mar 12, 2022: Non-Final Rejection — §103
Apr 06, 2022: Response after Non-Final Action
Aug 16, 2022: Response Filed
Dec 14, 2022: Final Rejection — §103
Apr 20, 2023: Request for Continued Examination
Apr 26, 2023: Response after Non-Final Action
May 09, 2023: Non-Final Rejection — §103
Jul 12, 2023: Applicant Interview (Telephonic)
Jul 12, 2023: Examiner Interview Summary
Aug 15, 2023: Response Filed
Nov 01, 2023: Final Rejection — §103
May 07, 2024: Request for Continued Examination
May 10, 2024: Response after Non-Final Action
May 18, 2024: Non-Final Rejection — §103
Nov 20, 2024: Applicant Interview (Telephonic)
Nov 22, 2024: Examiner Interview Summary
Nov 23, 2024: Response Filed
Jan 17, 2025: Final Rejection — §103
May 23, 2025: Request for Continued Examination
May 27, 2025: Response after Non-Final Action
May 30, 2025: Non-Final Rejection — §103
Dec 03, 2025: Response Filed
Mar 05, 2026: Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603079: PROVIDING A REPOSITORY OF AUDIO FILES HAVING PRONUNCIATIONS FOR TEXT STRINGS TO PROVIDE TO A SPEECH SYNTHESIZER
Granted Apr 14, 2026 (2y 5m to grant)

Patent 12603088: TRAINING A DEVICE SPECIFIC ACOUSTIC MODEL
Granted Apr 14, 2026 (2y 5m to grant)

Patent 12598092: SYSTEMS, METHODS, AND APPARATUS FOR NOTIFYING A TRANSCRIBING AND TRANSLATING SYSTEM OF SWITCHING BETWEEN SPOKEN LANGUAGES
Granted Apr 07, 2026 (2y 5m to grant)

Patent 12597427: CONFIGURABLE NATURAL LANGUAGE OUTPUT
Granted Apr 07, 2026 (2y 5m to grant)

Patent 12597418: AUDIO SIGNAL PROCESSING DEVICE AND METHOD FOR SYNCHRONIZING SPEECH AND TEXT BY USING MACHINE LEARNING MODEL
Granted Apr 07, 2026 (2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 9-10
Grant Probability: 74%
With Interview: 89% (+15.4%)
Median Time to Grant: 3y 0m
PTA Risk: High
Based on 905 resolved cases by this examiner. Grant probability is derived from the career allow rate.
