DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to the Amendment filed on 12/30/2025. Claims 11-28 are pending in the case.
Applicant Response
In Applicant’s response dated 12/30/2025, Applicant amended claims 11, 22 and 23 and argued against all objections and rejections previously set forth in the Office Action dated 09/30/2025.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 11 and 22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 11 and Claim 22:
Claim 11 recites “invoking, at the thin client device, the client device configured to receive audio data and transmit the audio data over a communication link to the remotely hosted speech to text application”.
This limitation is indefinite because it is unclear to Examiner whether the “client device” is the same as the previously recited “thin client device” or whether the claim requires a second, separate device. Examiner notes that if the “client device” is intended to be an application, i.e., “a client device APP,” then the claim is still unclear as to the difference between the “client device” and the “thin client device.” The claim is also inconsistent with the specification, which describes a client-side application executing on the thin client device rather than the invocation of a separate device. In addition, “the client device” lacks clear antecedent basis because the claim introduces both a “thin client device” and a “client device” without clarifying the relationship between them. The metes and bounds of this limitation would likewise not be clear to one of ordinary skill in the art; thus, the scope of the limitation cannot be determined by Examiner. For purposes of examination, Examiner will interpret this limitation as: “invoking, at the thin client device, the client device APP configured to receive audio data and transmit the audio data over a communication link to the remotely hosted speech to text application”. Claim 22 recites the same limitation and is rejected under 112(b) using the same rationale.
Continued Examination under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/30/2025 has been entered.
Examiner Comments
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 11-28 are rejected under 35 U.S.C. 103 as being unpatentable over Sung (Pub. No.: US 20170372703 A1, Pub. Date: December 28, 2017) in view of Gao (Pub. No.: US 20160351194 A1, Pub. Date: December 1, 2016), further in view of Yanagihara (Pub. No.: US 20120166192 A1, Pub. Date: June 28, 2012), and further in view of Yellin (Pub. No.: US 20210329089 A1, Pub. Date: October 21, 2021).
Regarding independent Claim 11,
Sung teaches a method to allow a thin client using dictation to provide dictation functionality for a primary application through a client device APP (see Sung: Fig.1, [0045], “The user device 102 (a thin client e.g. a mobile phone, smart phone, personal digital assistant (PDA), music player, e-book reader, tablet computer, a wearable computing device, laptop computer, desktop computer, or other portable or stationary computing device) detects the spoken input and records audio data (dictation) that represents the voice command 108.”, i.e. the recording of the audio data by the client device is the dictation functionality), the method comprising:
providing a primary application operating on the thin client device (see Sung: Fig.1, [0044], “The user 102 makes a request to a digital assistant using the user device 104”, i.e. the digital assistant is a primary application operating on the thin client device), wherein the primary application is configured to receive data from the client device APP (see Sung: Fig.1, [0044], “The user 102 may make the request to digital assistant functionality accessed through or provided by the user device 104.”, i.e. the digital assistant is the primary application that receives the data/command.)
invoking, at the thin client device, the client device APP configured to receive audio data (see Sung: Fig.2, [0045], “user device 104 receives a user request from the user 102. The user 102 may make the request to digital assistant functionality accessed through or provided by the user device 104. The user 102 may invoke the digital assistant in any multiple ways, such as speaking a hot word, pressing an on-screen button, pressing, and holding a "home" button, performing a gesture. The user may make the request through any appropriate type of user input, such as typed input or voice input. In the illustrated example, the user 102 speaks a voice command 108, "Set a reminder for tomorrow at 4:00 pm." The user device 102 detects the spoken input and records audio data that represents the voice command 108.”), and transmit the audio data over an internet based communication link to the remotely hosted speech to text application (see Sung: Fig.2, [0046], “user device 104 sends (transmits) data indicating the user request 115 to the server system 110. For example, when the request is made as a voice input, the user device 104 can provide audio data for the user's utterance. The audio data can be an audio waveform recorded by the user device 102, a compressed form of the audio information, or information derived or extracted from recorded audio, such as data indicating speech features such as mel frequency coefficients.” … [0036], “The network 106 can include public and/or private networks and can include the Internet.” … [0071], “The client device 102 can use an on-device buffer when it needs to execute actions using a server, but has no internet connection at the time of the user's conversation.”)
determining, by the client device APP on the thin client device, whether the internet to transmit the audio data is available to allow communication of the audio data to the remotely hosted speech to text application (see Sung: Fig.3, [0088], “factors may be considered in determining whether to perform an action synchronously or asynchronously. For example, a device may determine that an action that is classified as appropriate for synchronous execution. However, the device may determine that the action involves communication with a server, and that network connectivity is temporarily disconnected or that the server is currently responding slowly or is unavailable.”)
if the communication link to the remotely hosted speech to text application is available (see Sung: Fig.1, [0037] “A first network round-trip may be required for a client device to send speech data to a server for speech recognition and then receive a transcription of the speech data. Once the client device processes the transcribed text, a second network round-trip may then be required for a local application of the client device to communicate with a back-end application server and receive confirmation from the application server.”)
transmitting the audio data to the remotely hosted speech to text application wherein the remotely hosted speech to text application is configured to convert the audio data to textual data (see Sung: Fig.1, [0047], “the server system 110 interprets the user request 115 to determine what action the user 102 has requested to be performed. The server system 110 can also determine other details about how the action should be performed. The server system 110 includes a request interpreter module 120 that analyzes the request. In some implementations, the request interpreter module 120 obtains text representing the user request.”)
subsequently to determining whether the internet is available, and only if the communication link to the remotely hosted speech to text application is determined to not be available (see Sung: Fig.1, [0088], “the device may determine that the action involves communication with a server, and that network connectivity is temporarily disconnected or that the server is currently responding slowly or is unavailable.” … [0071], “The client device 102 can use an on-device buffer when it needs to execute actions using a server, but has no internet connection at the time of the user's conversation.” … [0072], “the client device 104 uses buffering of requests to manage network connectivity outages and delays or unavailability of application servers 112.”)
generating, on the thin client device, an audio data file (see Sung: Fig.2, [0013], “a text-to-speech system is used to generate audio data comprising synthesized speech.” … [0045], “The user device 102 detects the spoken input and records audio data that represents the voice command 108,” i.e., the queued commands are audio data files that are stored in the thin client device for execution at a later time),
wherein the audio data file is configured to store, subsequent to shutdown of the primary application, until restoration of the internet, audio data generated by a user and received by the thin client device (see Sung: Fig.2, [0043], “the asynchronous nature of processing can allow a device to cache (store) interactions or deal with low connectivity. A queue of commands may be created at a device and then be sent for later execution. A device that lacks connectivity to a server can still receive commands and store them, then send them to a server for processing once connectivity is restored.” … [0071], “The client device 102 can use an on-device buffer when it needs to execute actions using a server, but has no internet connection at the time of the user's conversation.” … [0072], “the client device 104 uses buffering of requests to manage network connectivity outages and delays or unavailability of application servers 112.”)
generating, on the thin client device, a context file (see Sung: Fig.2, [0046], “In stage (B), the user device 104 sends data indicating the user request 115 to the server system 110. For example, when the request is made as a voice input, the user device 104 can provide audio data for the user's utterance. The audio data can be an audio waveform recorded by the user device 102, a compressed form of the audio information, or information derived or extracted from recorded audio, such as data indicating speech features such as mel-frequency coefficients.”), […]
monitoring, at the thin client device, for re-establishment of the internet to the remotely hosted speech to text application and transmitting the audio data from the audio data file to the remotely hosted speech to text application wherein the remotely hosted speech to text application is configured to convert the audio data from the audio data file to textual data (see Sung: Fig.2, [0072], “requested action may be designated as being most appropriate for synchronous execution, upon determining that connectivity with an application server needed to perform the action is not available, the client device 104 may store data causing the action to be performed at a later time. For example, the task may be scheduled, placed in a buffer of tasks to be completed, set to occur in response to connectivity being restored, and/or set to be retried at a certain time period. The client device 104 can use a multi-threaded or multi-process technique to receive and fulfill other user requests in the meantime.”),
receiving the textual data generated by the remotely hosted speech to text application (see Sung: Fig.3, [0059], “The server system 110 includes a request interpreter module 120 that analyzes the request. In some implementations, the request interpreter module 120 obtains text representing the user request. For voice requests, the request interpreter module 120 may obtain a transcription for received audio from an automated speech recognizer, which may be provided by the server system 110 or another system.”)
As shown above, Sung teaches or suggests that synchronous and asynchronous speech to text transcription requested by a user may be performed by one or more client devices (thin client devices), or by a combination of a server system and one or more client devices, when network connectivity is temporarily disconnected or the server is responding slowly or is unavailable.
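For clarity of the record only, Examiner provides the following illustrative sketch of the synchronous/asynchronous dictation flow mapped above: transmit the audio data when the communication link is available, otherwise place it in an on-device buffer and drain the buffer once connectivity is restored (cf. Sung [0071]-[0072]). This sketch is not code from Sung or from the claims; all names (DictationClient, link_up, transcribe) are hypothetical.

    from collections import deque

    class DictationClient:
        """Hypothetical client-side dictation flow (illustration only)."""

        def __init__(self, link_up, transcribe):
            self.link_up = link_up        # callable: is the communication link available?
            self.transcribe = transcribe  # callable: remotely hosted speech-to-text service
            self.buffer = deque()         # on-device buffer of audio awaiting transmission

        def submit(self, audio: bytes):
            # Synchronous path: the link is up, so transmit immediately.
            if self.link_up():
                return self.transcribe(audio)
            # Asynchronous path: no connectivity, so store for later execution.
            self.buffer.append(audio)
            return None

        def on_connectivity_restored(self):
            # Drain the buffer once the link to the server is re-established.
            results = []
            while self.buffer:
                results.append(self.transcribe(self.buffer.popleft()))
            return results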
However, Sung does not teach the method wherein:
if the internet to the remotely hosted speech to text application is determined to not be available, generating, on the thin client device, a context file wherein the context file is configured to store, subsequent to shutdown of the primary application until reception of textual data from the audio data is received, data, commands, or data and commands such that the context file comprises a command to launch the primary application and navigate to a text entry field for which the audio data was generated in a background of the thin client device;
generating, on the thin client device, a context file that comprise command to navigate to a text entry field for which the audio data was generated in a background of the thin client device;
subsequent to receiving the textual data, launching the primary application and navigating to the text entry field using the data, commands, or data and command stored in the context file, and
populating the text entry field with the textual data by replacing the temporary data.
Gao teaches the method wherein:
if the internet to the remotely hosted speech to text application is determined to not be available (see Gao: Fig.6, [0050], “Due to the potential for connectivity issues with such devices, as well as the general latency that may be experienced even when connectivity issues are not present, it may also be desirable in some instances to incorporate local or offline processing functionality, including both voice to text and semantic processing functionality, within a voice-enabled electronic device”), generating, on the thin client device, a context file wherein the context file is configured to store, subsequent to shutdown of the primary application until reception of textual data from the audio data is received, data, commands, or data and commands (see Gao: Fig. 3, [0046], “a voice processing routine 100 that may be executed by voice-enabled device 52 to handle a voice input. Routine 100 begins in block 102 by receiving voice input, e.g., in the form of a digital audio signal. In this implementation, an initial attempt is made to forward the voice input to the online search service (block 104). If unsuccessful, e.g., due to the lack of connectivity or the lack of a response from the online search service, block 106 passes control to block 108 to convert the voice input to text tokens (block 108, e.g., using module 64 of FIG. 2), parse the text tokens (block 110, e.g., using module 68 of FIG. 2), and build an action from the parsed text (block 112, e.g., using module 72 of FIG. 2)”), such that the context file comprises a command to launch the primary application (see Gao: Fig. 3, [0046], “build an action from the parsed text (block 112, e.g., using module 72 of FIG. 2). The resulting action is then used to perform client-side rendering and synchronization (block 114, e.g., using module 62 of FIG. 2), and processing of the voice input is complete.”), […]
Because Sung and Gao are in the same or similar field of endeavor, speech to text transcription and performing an action on a user interface accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the teaching of Sung to include a context file that includes a command to launch the application in the background of the thin client device, as taught by Gao. After modification of Sung, the synchronous and asynchronous speech to text transcription that generates actions on the client device user interface can also facilitate the client device application launching an application required by the speech to text transcription command, as taught by Gao. One would have been motivated to make such a combination to improve the efficiency of using an electronic device by allowing a speech command to launch an application.
Yanagihara teaches the method wherein:
generating, on the thin client device, a context file that comprises a command to navigate to a text entry field for which the audio data was generated in a background of the thin client device (see Yanagihara: Fig.7, [0065], “The textual representation, at state 740, is presented to the user. For example, a presentation engine (e.g., the presentation engine 460 of FIG. 4) can present the textual representation to the user using a display (e.g., the input window 130).”),
navigating to the text entry field using the data, commands, or data and commands stored in the context file (see Yanagihara: Fig.7, [0065], “The textual representation, at state 740, is presented to the user. For example, a presentation engine (e.g., the presentation engine 460 of FIG. 4) can present the textual representation to the user using a display (e.g., the input window 130).”), and
populating the text entry field with the textual data by replacing the temporary data (see Yanagihara: Fig.7, [0065], “The textual representation, at state 740, is presented to the user. For example, a presentation engine (e.g., the presentation engine 460 of FIG. 4) can present the textual representation to the user using a display (e.g., the input window 130). In some implementations, the mobile device can receive edits on the displayed text. For example, a user can use a virtual keyboard (e.g., the virtual keyboard 140 of FIG. 1) to revised the displayed text. Based on the received edits, the mobile device can correct the displayed text.”)
Because Sung, Gao, and Yanagihara are in the same or similar field of endeavor, speech to text transcription and performing an action on a user interface accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the method of Sung to include a method of storing, in the context file, data, commands, or data and commands such that on execution, the thin client device can navigate to a text entry field for which the audio data was generated, as taught by Yanagihara. After modification of Sung, the synchronous and asynchronous speech to text transcription that generates actions on the client device user interface can also facilitate the client device application navigating to a text entry field to enter the text data generated by the speech to text transcription, as taught by Yanagihara. One would have been motivated to make such a combination to provide users an easier, more efficient, and time saving document processing and data entry application by effectively generating textual data from the speech data.
As shown above, Sung, Gao, and Yanagihara teach or suggest the limitations of Claim 11 addressed above. Yanagihara teaches providing an application, such as the user interface 110, that is used to compose a text message, such as a text message for an electronic mail (email) application, a short message service (SMS) application, a word processing application, a data entry application, and/or an instant message (IM) application, among many others. Sung teaches processing user speech requests asynchronously or synchronously by a combination of a server system and one or more client devices.
However, Sung, Gao, and Yanagihara do not teach or suggest the method wherein:
when the communication link to the remotely hosted application is not available, the context file includes a command to launch the primary application in the background of the thin client device.
However, Yellin teaches or discloses the method wherein, when the internet to the remotely hosted application is not available (see Yellin: Fig.21, [1019], “app code that is used to download content can be identified offline and used for prefetch.”, i.e. the app code is used to launch content when there is no communication with the remote server), the context file includes a command to launch the primary application in the background of the thin client device (see Yellin: Fig.21, [0960], “the code to refresh the app is not provided by the app, but rather the app might be launched in the background based on a set of instructions defined in advance for the app (e.g., defined for a software agent in the device, possibly residing in the OS). Alternatively, a refresh might be accomplished by killing the process running the app and issuing a launch request for the app (as if the user had clicked on the app) or for the in-app content (as if the user had clicked on the in-app content link)—with a key difference being that the refresh takes place in the background.”)
Because Sung, Gao, Yanagihara, and Yellin are in the same or similar field of endeavor, speech to text transcription and performing an action on a user interface accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the teaching of Sung to include a context file that includes a command to launch the application in the background of the thin client device, as taught by Yellin. After modification of Sung, the synchronous and asynchronous speech to text transcription that generates actions on the client device user interface can also facilitate the client device application launching an application required by the speech to text transcription command, as taught by Yellin. One would have been motivated to make such a combination to improve the efficiency of using an electronic device by allowing a speech command to launch an application.
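For clarity of the record only, Examiner provides the following illustrative sketch of one possible form of the claimed “context file” as interpreted in the combination above: it stores a command to launch the primary application in the background, the text entry field to navigate to, and the temporary data to be replaced once the remote transcription is received. All names and the JSON layout are hypothetical and appear in neither the claims nor the cited references.

    import json

    def write_context_file(path, app_id, field_id, audio_path, temp_text):
        """Persist context so it survives shutdown of the primary application."""
        context = {
            "launch_app": app_id,          # command to launch the primary application
            "launch_in_background": True,  # launch occurs in the device background
            "target_field": field_id,      # text entry field the audio was dictated into
            "audio_file": audio_path,      # buffered audio data file awaiting transcription
            "temporary_text": temp_text,   # placeholder from a local speech-to-text pass
        }
        with open(path, "w") as f:
            json.dump(context, f)

    def apply_transcription(path, textual_data, launch, navigate, populate):
        """On receipt of the remote transcription, replay the stored context."""
        with open(path) as f:
            context = json.load(f)
        launch(context["launch_app"], background=context["launch_in_background"])
        navigate(context["target_field"])
        # Populate the field, replacing the temporary data with the textual data.
        populate(context["target_field"], textual_data)

Under this reading, the launch, navigate, and populate callables stand in for the device's own application-control mechanisms, which the cited references describe only at a functional level.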
Regarding Claim 12,
Sung, Gao, Yanagihara, and Yellin teach all the limitations of Claim 11. Yanagihara teaches the method comprising inserting temporary data into the text entry field (see Yanagihara: Fig.7, [0065], “The textual representation, at state 740, is presented to the user. For example, a presentation engine (e.g., the presentation engine 460 of FIG. 4) can present the textual representation to the user using a display (e.g., the input window 130).”)
One would have been motivated to combine Sung, Gao, Yanagihara, and Yellin, before the effective filing date of the invention, because it provides the benefit whereby a speech-to-text application may be improved and made more efficient by incorporating offline or local processing functionality for processing voice inputs (Sung-7076, [0003]).
Regarding Claim 13,
Sung, Gao, Yanagihara, and Yellin teach all the limitations of Claim 11. Yanagihara teaches the method wherein the text entry field is an editable tab in a graphical user interface (see Yanagihara: Fig.1B, [0017], “The editing interface 110 (editable tab) can support speech input from the user. For example, the mobile device 100 can receive speech through a microphone 160. In some implementations, the editing interface 110 can display text derived from the received speech using the input window 130”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to further modify (refer to claim 11) the teaching of Sung to include an editable tab in a graphical user interface in the data entry field, as taught by Yanagihara. One would have been motivated to make such a combination to provide users with a quicker, more effective, and time saving document processing mechanism to enter text entry data in an application from the speech data.
Regarding Claim 14,
Sung, Gao, Yanagihara, and Yellin teach all the limitations of Claim 11. Yanagihara further teaches the method wherein the text entry field is a word document (see Yanagihara: Fig.1B, [0017], “user can use the editing interface 110 to compose a text message, such as a text message for an electronic mail (email) application, a short message service (SMS) application, a word processing application, a data entry application, and/or an instant message (IM) application, among many others.”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to further modify (refer to claim 11) the teaching of Sung to include a text entry field of a word processing document, as taught by Yanagihara. One would have been motivated to make such a combination to provide users with a quicker, more effective, and time saving document processing mechanism to enter text entry data in an application from the speech data.
Regarding Claim 15,
Sung, Gao, Yanagihara, and Yellin teach all the limitations of Claim 11. Sung further teaches the method wherein the context file comprises metadata appended to the audio data file (see Sung: Fig.1, [0083], “server system 110 uses additional information to determine whether a requested action should be performed synchronously or asynchronously. For example, the user device 104 can send context information (metadata) indicating its current context. This context information may include, for example, data indicating items visible on a display of the user device 104, data indicating applications installed or running on the user device 104.”)
Regarding Claim 16,
Sung, Gao, Yanagihara, and Yellin teach all the limitations of Claim 11. Sung further teaches the method wherein the context file is transmitted with the audio data to the remotely hosted speech to text application (see Sung: Fig.1, [0047], “the server system 110 interprets the user request 115 to determine what action the user 102 has requested to be performed. The server system 110 can also determine other details about how the action should be performed. The server system 110 includes a request interpreter module 120 that analyzes the request. In some implementations, the request interpreter module 120 obtains text representing the user request.”) and wherein the context file is received by the client device APP with the textual data generated by the remotely hosted speech to text application (see Sung: Fig.1, [0059], “the content of the confirmation message can be generated based on the requested action. For example, for the action of setting a reminder, message text 132 can be generated such as “Okay, I'll set the reminder.” The message can be provided in any appropriate form, such as text data, audio data, or both. The server system 110 can use a text-to-speech module 124 to generate audio data 134 that includes a synthesized utterance of the message text.”)
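For clarity of the record only, Examiner provides the following illustrative sketch of one way context metadata could be appended to the audio data file, transmitted with it, and recovered alongside the returned textual data, consistent with the mappings of Claims 15 and 16 above. The length-suffixed trailer format is a hypothetical assumption, not a format disclosed by Sung.

    import json
    import struct

    def append_context(audio: bytes, context: dict) -> bytes:
        """Append JSON context metadata to the audio bytes as a trailer."""
        meta = json.dumps(context).encode("utf-8")
        # Trailer layout: audio | metadata | 4-byte big-endian metadata length.
        return audio + meta + struct.pack(">I", len(meta))

    def split_context(blob: bytes):
        """Recover the audio bytes and the context metadata from the trailer."""
        (meta_len,) = struct.unpack(">I", blob[-4:])
        meta = json.loads(blob[-4 - meta_len:-4].decode("utf-8"))
        return blob[:-4 - meta_len], meta

On the return path, the same context metadata would accompany the textual data so that the client device APP can restore its state; the cited references describe this round trip only at a functional level.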
Regarding Claim 17,
Sung, Gao, Yanagihara, and Yellin teach all the limitations of Claim 11. Sung further teaches the method wherein receiving, at the thin client device, comprises receiving an executable file (see Sung: Fig.3, [0063], “The action is caused to be performed asynchronously to the user request (310). The execution of the action can be decoupled from the user's conversation with the digital assistant, allowing other requests to the digital assistant to be received and processed independently and in parallel to the first request.”)
Regarding Claim 18,
Sung, Gao, Yanagihara, and Yellin teach all the limitations of Claim 12. Sung further teaches the method wherein the audio data is converted into temporary data by an alternative speech to text application executing on the thin client device (see Sung: Fig.2, [0097], “Based on determining that the second action is not classified as an action to be performed asynchronously to the second user request, the second action can be caused to be performed synchronously with respect to the user request. Confirmation can be provided to the client device after synchronous execution has completed.”)
Regarding Claim 19,
Sung, Gao, Yanagihara, and Yellin teach all the limitations of Claim 11. Yellin further teaches receiving, at the thin client device, user credentials to associate the thin client device with a user (see Yellin: [1015], “require login credentials, app-specific login procedures, and/or personalized apps (i.e. apps with different content for different users). Instead of asking the user to somehow pass his or her login credentials to the cloud so that a crawler could crawl the relevant content, the whole crawling mechanism could be avoided and the system simply uses the app itself (that already holds the user credentials) to get the needed content. Such an approach, avoids both a cloud-based crawler and the need to pass the user log in credentials to the cloud. In effect, each mobile device effectively implements its own device-based personal crawler, which is based on the app's own fetch mechanism (hence there is no difficulty in handling app-specific login procedures, user login credentials, and/or personalized apps).”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to further modify (refer to claim 11) the teaching of Sung to receive, at the thin client device, user credentials to associate the thin client device with a user, as taught by Yellin. One would have been motivated to make such a combination to provide users with a quicker, more effective, and time saving document processing mechanism to enter text entry data in an application from the speech data.
Regarding Claim 20,
Sung, Gao, Yanagihara, and Yellin teach all the limitations of Claim 19. Sung further teaches the method wherein receiving the textual data generated by the remotely hosted speech to text application comprises receiving the textual data at the thin client device (see Sung: Fig.1, [0061], “In stage (F), the server system 110 sends the confirmation message 136 to the user device 104. The user device 104 then outputs the confirmation to the user 102. In the illustrated example, the confirmation message 136 includes the audio data 134, and the user device 104 outputs audio with the synthesized speech of “Okay, I'll set the reminder.” The confirmation message 136 may include the message text 130, and the user device 104 may additionally or alternatively display the message text 130 to provide confirmation to the user 102.”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to further modify (refer to claim 11) the teaching of Sung to receive, at the thin client device, user credentials to associate the thin client device with a user, as taught by Yellin. One would have been motivated to make such a combination to provide users with safe and secure information transfer in an application from the speech data.
Regarding Claim 21,
Sung, Gao, Yanagihara, and Yellin teach all the limitations of Claim 19. Yellin further teaches the method wherein receiving the textual data generated by the remotely hosted speech to text application comprises receiving the textual data at a device associated with the user credentials (see Yellin: [1015], “require login credentials, app-specific login procedures, and/or personalized apps (i.e. apps with different content for different users). Instead of asking the user to somehow pass his or her login credentials to the cloud so that a crawler could crawl the relevant content, the whole crawling mechanism could be avoided and the system simply uses the app itself (that already holds the user credentials) to get the needed content.”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to further modify (refer to claim 11) the teaching of Sung to receive, at the thin client device, user credentials to associate the thin client device with a user, as taught by Yellin. One would have been motivated to make such a combination to provide users with safe and secure information transfer in an application from the speech data.
Regarding independent claim 22,
Claim 22 is directed to a method claim and has similar or identical claim limitations to claim 11; it is rejected with the same rationale. Examiner notes that Claim 22 recites additional claim limitations that are taught by the combination of Sung, Gao, Yanagihara, and Yellin.
Yellin teaches the method comprising entering, at the thin client device, user credentials to associate a user with the thin client device (see Yellin: [1015], “require login credentials, app-specific login procedures, and/or personalized apps (i.e. apps with different content for different users). Instead of asking the user to somehow pass his or her login credentials to the cloud so that a crawler could crawl the relevant content, the whole crawling mechanism could be avoided and the system simply uses the app itself (that already holds the user credentials) to get the needed content.”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to further modify (refer to claim 11) the teaching of Sung to receive, at the thin client device, user credentials to associate the thin client device with a user, as taught by Yellin. One would have been motivated to make such a combination to provide users with safe and secure information transfer in an application from the speech data.
Yanagihara further teaches populating the text entry field with the textual data received at the user device associated with the user credentials (see Yanagihara: Fig.7, [0065], “The textual representation, at state 740, is presented to the user. For example, a presentation engine (e.g., the presentation engine 460 of FIG. 4) can present the textual representation to the user using a display (e.g., the input window 130). In some implementations, the mobile device can receive edits on the displayed text. For example, a user can use a virtual keyboard (e.g., the virtual keyboard 140 of FIG. 1) to revised the displayed text. Based on the received edits, the mobile device can correct the displayed text.”)
Because Sung and Yanagihara are in the same or similar field of endeavor, speech to text transcription and performing an action on a user interface accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the method of Sung to include a method of storing, in the context file, data, commands, or data and commands such that on execution, the thin client device can navigate to a text entry field for which the audio data was generated, as taught by Yanagihara. After modification of Sung, the synchronous and asynchronous speech to text transcription that generates actions on the client device user interface can also facilitate the client device application navigating to a text entry field to enter the text data generated by the speech to text transcription, as taught by Yanagihara. One would have been motivated to make such a combination to provide users an easier, more efficient, and time saving document processing and data entry application by effectively generating textual data from the speech data.
Regarding Claim 23,
Sung, Gao, Yanagihara, and Yellin teach all the limitations of Claim 22. Sung-7076 teaches populating the text entry field with temporary data if the internet to the remotely hosted speech to text application is not available and, wherein, populating the text entry field with textual data comprises replacing the temporary data with textual data (see Sung-7076: Fig.3, [0045], “for example, illustrates a voice processing routine 100 that may be executed by voice-enabled device 52 to handle a voice input. Routine 100 begins in block 102 by receiving voice input, e.g., in the form of a digital audio signal. In this implementation, an initial attempt is made to forward the voice input to the online search service (block 104). If unsuccessful, e.g., due to the lack of connectivity or the lack of a response from the online search service, block 106 passes control to block 108 to convert the voice input to text tokens (block 108, e.g., using module 64 of FIG. 2), parse the text tokens (block 110, e.g., using module 68 of FIG. 2), and build an action from the parsed text (block 112, e.g., using module 72 of FIG. 2). The resulting action is then used to perform client-side rendering and synchronization (block 114, e.g., using module 62 of FIG. 2), and processing of the voice input is complete.”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to further modify (refer to claim 11) the teaching of Sung to include populating the text entry field with temporary data if the communication link to the remotely hosted speech to text application is not available, as taught by Sung-7076. One would have been motivated to make such a combination to provide users with a quicker, more effective, and time saving document processing mechanism to enter text entry data in an application from the speech data.
Regarding Claim 24,
Sung, Gao, Yanagihara, and Yellin teach all the limitations of Claim 23. Gao teaches generating the temporary data using an alternative speech to text application (see Gao: Fig.8, [0073], “Block 244 then determines whether there is sufficient available storage space in the voice to text model, e.g., in an amount of storage space allocated to the model. If so, control passes to block 246 to dynamically update the voice to text model to recognize the list of relevant context sensitive entities, e.g., by training the model, incorporating paths associated with the entities into the model, or in other manners. Routine 240 is then complete.”) It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to further modify (refer to claim 11) the teaching of Sung to include generating the temporary data using an alternative speech to text application, as taught by Gao. One would have been motivated to make such a combination to provide users with a quicker, more effective, and time saving document processing mechanism to enter text entry data in an application from the speech data.
Regarding Claim 25,
Sung, Gao, Yanagihara, and Yellin teach all the limitations of Claim 22. Yellin further teaches the method wherein the user device associated with the user credentials is the thin client device (see Yellin: Fig.7, [0259], “the identity of the party using the enhanced prefetch functionality and requiring permission to access content on behalf of the user might also be passed to the OS-MCD Agent, which can pass it to the Content Source (e.g., using a function callback made available by the App through the Prefetcher API).”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to further modify (refer to claim 11) the teaching of Sung to include the user device associated with the user credentials being the thin client device, as taught by Yellin. One would have been motivated to make such a combination to provide users with a quicker, more effective, and time saving document processing mechanism to enter text entry data in an application from the speech data.
Regarding Claim 26,
Sung, Gao, Yanagihara, and Yellin teach all the limitations of Claim 22. Sung further teaches the method wherein the context file is transmitted to the remotely hosted speech to text application along with the audio data file (see Sung: Fig.2, [0046], “the user device 104 sends (transmits) data indicating the user request 115 to the server system 110. For example, when the request is made as a voice input, the user device 104 can provide audio data for the user's utterance. The audio data can be an audio waveform recorded by the user device 102, a compressed form of the audio information, or information derived or extracted from recorded audio, such as data indicating speech features such as mel frequency coefficients.”) and, wherein, the context file is received along with the textual data generated by the remotely hosted speech to text application (see Sung: Fig.1, [0047], “the server system 110 interprets the user request 115 to determine what action the user 102 has requested to be performed. The server system 110 can also determine other details about how the action should be performed. The server system 110 includes a request interpreter module 120 that analyzes the request. In some implementations, the request interpreter module 120 obtains text representing the user request.”)
Regarding Claim 27,
Sung, Gao, Yanagihara, and Yellin teach all the limitations of Claim 26. Sung further teaches the method wherein the context file comprises an executable file (see Sung: Fig.3, [0063], “The action is caused to be performed asynchronously to the user request (310). The execution of the action can be decoupled from the user's conversation with the digital assistant, allowing other requests to the digital assistant to be received and processed independently and in parallel to the first request.”)
Regarding Claim 28,
Sung, Gao, Yanagihara, and Yellin teach all the limitations of Claim 27. Yellin further teaches the method comprising executing the executable file such that the application is invoked in a background (see Yellin: Fig.21, [0960], “the code to refresh the app is not provided by the app, but rather the app might be launched in the background based on a set of instructions defined in advance for the app (e.g., defined for a software agent in the device, possibly residing in the OS). Alternatively, a refresh might be accomplished by killing the process running the app and issuing a launch request for the app (as if the user had clicked on the app) or for the in-app content (as if the user had clicked on the in-app content link)—with a key difference being that the refresh takes place in the background.”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to further modify (refer to claim 11) the teaching of Sung to execute the executable file such that the application is invoked in a background, as taught by Yellin. One would have been motivated to make such a combination to provide users with a quicker, more effective, and time saving document processing mechanism to enter text entry data in an application from the speech data.
Response to Arguments
Claim Rejections - 35 U.S.C. § 112(b)
The rejection of the claims as being indefinite under 35 U.S.C. § 112(b) is maintained, as set forth above, in view of Applicant's amendment.
Claim Rejections - 35 U.S.C. § 103
Applicant's prior art arguments with respect to the currently amended independent claims and the dependent claims have been fully considered but are moot in view of the new grounds of rejection presented above. Applicant is respectfully referred to the complete rejections presented above and to the newly cited portions of the references previously relied upon. Examiner further notes that Applicant's arguments are mere allegations that the cited art does not teach the limitations of the independent claims as amended and do not explicitly show any deficiency in the previously cited art of record with respect to the newly recited limitations.
Thus, Examiner respectfully reasserts that the combination of Sung, in view of Gao, Yanagihara, and Yellin sufficiently teaches all the limitations recited in the independent claims, as amended, and therefore claims 11-28 are still rejected under 35 U.S.C. 103 as being unpatentable over Sung, in view of Gao, Yanagihara, and Yellin.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20120022866 A1 (Ballinger; Brandon M.)
Title: Language Model Selection For Speech-to-Text Conversion
Description: This document relates to systems and techniques for multi-modal input into an electronic device and conversion of spoken input to text.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZELALEM W SHALU whose telephone number is (571) 272-3003. The examiner can normally be reached M-F, 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Cesar Paula can be reached at (571) 272-4128. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Zelalem Shalu
Examiner
Art Unit 2145
/Zelalem Shalu/Examiner, Art Unit 2145
/CESAR B PAULA/Supervisory Patent Examiner, Art Unit 2145