Prosecution Insights
Last updated: April 19, 2026
Application No. 18/599,600

CASCADED SPEECH RECOGNITION FOR ENHANCED PRIVACY

Non-Final OA: §101, §103, §112
Filed: Mar 08, 2024
Examiner: WOZNIAK, JAMES S
Art Unit: 2655
Tech Center: 2600 — Communications
Assignee: Adeia Guides Inc.
OA Round: 1 (Non-Final)
Grant Probability: 59% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 3y 7m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 59% (227 granted / 385 resolved; -3.0% vs TC average)
Interview Lift: +40.1% (allow rate on resolved cases with an interview vs. without)
Avg Prosecution: 3y 7m typical timeline; 42 applications currently pending
Total Applications: 427 across all art units (career history)
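The headline numbers in this panel can be reproduced from raw counts. A minimal Python sketch, assuming the interview lift is reported in percentage points (59% career rate plus 40.1 points is roughly the 99% with-interview figure); the panel reports only the aggregate lift, so the with-interview rate below is taken from the panel rather than from underlying counts:

```python
def allow_rate(granted: int, resolved: int) -> float:
    """Share of resolved cases that ended in a grant."""
    return granted / resolved

def interview_lift(rate_with: float, rate_without: float) -> float:
    """Lift in percentage points: allow rate for resolved cases that had an
    examiner interview minus the rate for those that did not."""
    return 100 * (rate_with - rate_without)

# Career allow rate from the panel's own counts: 227 granted / 385 resolved.
career = allow_rate(227, 385)        # ~0.59, i.e., 59%
lift = interview_lift(0.99, career)  # ~40 points, consistent with the +40.1% shown
```

This is only a sanity check on how the dashboard's figures relate to one another, not the vendor's actual methodology.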

Statute-Specific Performance

§101: 18.1% (-21.9% vs TC avg)
§103: 40.1% (+0.1% vs TC avg)
§102: 18.4% (-21.6% vs TC avg)
§112: 16.1% (-23.9% vs TC avg)
Tech Center averages are estimates. Based on career data from 385 resolved cases.

Office Action

§101, §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 7-8 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claim 7, line 3 features a second instance of "a text-to-speech converter" when the term was originally introduced in line 2 of this claim. Thus, it is unclear whether Applicant is attempting to introduce a second instance of the term or whether this limitation should find antecedence in the earlier recitation. For the purposes of claim interpretation, in the interest of compact prosecution, "a text-to-speech converter" will be construed as --the text-to-speech converter--. Claim 8 features a similar antecedent basis issue, but with respect to a "speech-to-text" converter that is construed as being preceded by --the-- instead of "a" for claim interpretation, and is likewise rejected under 35 U.S.C. 112(b).

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 for being directed towards a patent-ineligible mental process under the broadest reasonable interpretation (BRI). Independent Claims 1 and 15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite a process that, as drafted under its broadest reasonable interpretation, covers performance of the limitations as a process of organizing human behavior (i.e., managing personal behavior or relationships or interactions between people) in a division of labor based upon confidential information, but for the recitation of exchanging information across a network and generic computer software/components. For example, under the BRI, the process/functionality of claims 1 and 15 could be construed as organizing human behavior by:
- generating, by the target service, a first voice response in relation to the first user voice input (a human customer agent speaks a reply back to a customer, e.g., relating to account, health appointment, or order information);
- receiving, from the user device, a second user voice input (a human transcriber and/or manager can listen to speech from a customer);
- generating, by the target service, a second user text input in relation to the second user voice input, wherein generating the second user text input comprises: generating a plurality of second user voice input segments based on the second user voice input; transmitting each respective second user voice input segment of the plurality of second user voice input segments to a different speech-to-text converter, wherein the different speech-to-text converters generate a plurality of second user text input segments; and combining the plurality of second user text input segments to generate the second user text input (a manager of a transcription service listens to the speech and then assigns jobs to various human transcribers for manually merging into a complete transcription); and
- generating, by the target service, a second voice response in relation to the second user text input (the manager verbally replies to the customer, e.g., letting them know that the transcription process has been completed).

This judicial exception is not integrated into a practical application. Outside of the identified abstract idea, the claimed invention only recites processing and input/output circuitry and generic computer devices, which amount to no more than mere instructions to implement an otherwise abstract idea using generic computer components, and the exchange of information across a network, which amounts to mere data gathering. These additional components are used for their ordinary purposes (e.g., exchanging information and automating an otherwise human behavior in communication) and have not been invented or improved by the applicant as currently claimed.

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The above-identified additional generic computer components are no more than mere instructions to apply the exception using generic computer components that are well-known, routine, and conventional, as evidenced by Bancorp Services v. Sun Life (Fed. Cir. 2012) and Alice Corp. v. CLS Bank (2014). Moreover, transmission over a network is well-known per Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); and buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network). Accordingly, claims 1 and 15 are not directed towards patent-eligible subject matter under 35 U.S.C. 101.

The remaining dependent claims fail to add patent-eligible subject matter to their respective parent claims:
- Claims 2 and 16 involve a manager listening to an initial request to start the transcription service.
- Claims 3 and 17 narrow the type of data in the request that can be considered and understood by a human.
- Claims 4, 12-13, and 18 involve a manager producing the response and having a number of different agents speak back a complete response (e.g., based upon the topic of the reply, such as health information and prescription information) in a sequence, along with generic computer components and the exchange of information over a network.
- Claims 5 and 19 narrow the type of information used for segmentation that can be considered and understood by a human.
- Claims 6 and 20 add the exchange of additional sequencing parameters over a network, where an information sequence can be considered by a human in the human activity.
- Claim 7 involves a manager mentally deciding whether a service can act as a transcriber and, if not, contracting the transcription job to a different service, along with generic computer software to automate the manual transcription process.
- Claim 8 involves a process similar to claim 7, but regards the assigned agent available to talk about the response (e.g., a health insurance representative).
- Claim 9 narrows the process of the data gathering and exchange of information over a computer network.
- Claim 10 involves data gathering and then manually dividing voice based thereupon.
- Claim 11 involves a human listening to voice for pauses or silence.
- Claim 14 features the dividing of a transcription job addressed in claim 1 with an additional data gathering step specifying transcribers.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 8-10, and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Di Fabbrizio, et al. (U.S. PG Publication 2021/0158811 A1) in view of Ganong, III et al. (U.S. PG Publication 2023/0394169 A1).

With respect to Claim 1, Di Fabbrizio discloses: A method comprising: receiving, by a target service, a first user voice input (target service (i.e., tenant/entity subsystem) receives audio data for a user utterance, Paragraphs 0036, 0038 and 0072-0073; Fig. 3, Elements 130A, 130B); generating, by the target service, a first voice response in relation to the first user voice input (text-to-speech synthesis engine of the tenant subsystem generates a "verbalized...response" to a "user's query," Paragraphs 0040 and 0051); transmitting the first voice response to a user device, wherein the first voice response is transmitted to the user device via a connection established by a voice assistant service between the user device and the target service (DPP server/Orchestrator acts as a proxy between the tenant subsystems and a client device and provides the verbalized response to a user, Paragraphs 0034, 0078, and 0102-0104; see also the network connection to a client device in Fig. 1 allowing the transmission and delivery of a voice output to a user, e.g., regarding an order status); receiving, from the user device, a second user voice input (target service (i.e., tenant/entity subsystem) receives audio data for a user utterance from the client device, Paragraphs 0032, 0036, 0038 and 0072-0073; Fig. 3, Elements 130A, 130B; note that user utterances are part of a dialog session that includes multiple user turns in a "dialog sequence", Paragraphs 0023, 0033, 0039 and 0082); generating, by the target service, a second user text input in relation to the second user voice input (use of ASR by the target tenant service to generate a textual output corresponding to a user utterance, Paragraphs 0036 and 0038); generating, by the target service, a second voice response in relation to the second user text input (text-to-speech synthesis engine of the tenant subsystem generates a "verbalized...response" to a "user's query," Paragraphs 0040 and 0051; note that user utterances are part of a dialog session that includes multiple system response turns in a "dialog sequence", Paragraphs 0023, 0033, 0039 and 0082); and transmitting the second voice response to the user device (DPP server/Orchestrator acts as a proxy between the tenant subsystems and a client device and provides the verbalized response to a user, Paragraphs 0029, 0034, 0078, and 0102-0104; see also the network connection to a client device in Fig. 1 allowing the transmission and delivery of a voice output to a user, e.g., regarding an order status; note that user utterances are part of a dialog session that includes multiple system response turns in a "dialog sequence", Paragraphs 0023, 0033, 0039 and 0082).

Although Di Fabbrizio teaches a dialog system method similar to the claimed invention to process and reply to a user's utterance in dialog turns, Di Fabbrizio does not teach the piecewise segment-based voice transcription set forth in claim 1 to generate the "second user text input."

Ganong, however, discloses: generating a plurality of second user voice input segments based on the second user voice input ("input speech signal maybe split 1006 into the one or more sensitive content portions and the one or more non-sensitive content portions based upon, at least in part, the one or more splitting points," Paragraph 0074; Fig. 11, Element 1110); transmitting each respective second user voice input segment of the plurality of second user voice input segments to a different speech-to-text converter, wherein the different speech-to-text converters generate a plurality of second user text input segments (sending the portions of the speech splitter output to different ASR transcription systems to generate corresponding transcriptions, see Fig. 11, Elements 1110, 1112, 1114, 1120, and 1128; see also Paragraphs 0085 and 0090-0092); and combining the plurality of second user text input segments to generate the second user text input (combiner that operates "to form a combined transcription (e.g., combined transcription 1134) representative of all of the content (i.e., sensitive and non-sensitive portions) of input speech signal," Paragraph 0096; Fig. 11, Elements 1132 and 1134).

Di Fabbrizio and Ganong are analogous art because they are from a similar field of endeavor in network-based speech recognition services. Thus, it would have been obvious to one of ordinary skill before the effective filing date to utilize the splitting/combining approach to speech transcription taught by Ganong to transcribe certain speech inputs in the dialog system of Di Fabbrizio to provide a predictable result of accurately transcribing speech while better protecting sensitive/confidential information (Ganong, Paragraphs 0013-0014).
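The split-transcribe-combine flow that the rejection maps onto claim 1 is easy to see in code. A minimal Python sketch, assuming a byte-offset splitter and callable speech-to-text converters; every name here is a hypothetical stand-in, not an API from any cited reference:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    index: int    # position of the segment within the original utterance
    audio: bytes  # raw audio for this segment

def split_at(audio: bytes, points: list[int]) -> list[Segment]:
    """Split a voice input at candidate locations (here, byte offsets)."""
    bounds = [0, *points, len(audio)]
    return [Segment(i, audio[a:b])
            for i, (a, b) in enumerate(zip(bounds, bounds[1:]))]

def transcribe_segmented(audio: bytes, points: list[int], converters) -> str:
    """Send each segment to a DIFFERENT speech-to-text converter, so no single
    converter ever sees the whole (potentially sensitive) utterance, then
    recombine the partial transcripts in their original order."""
    partials = [(seg.index, stt(seg.audio))
                for seg, stt in zip(split_at(audio, points), converters)]
    return " ".join(text for _, text in sorted(partials))
```

With two stub converters, `transcribe_segmented(audio, [6], [stt_a, stt_b])` hands the first six bytes to one service and the remainder to another before merging, which is the privacy property the combination rationale relies on.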
With respect to Claim 2, Di Fabbrizio further discloses: The method of claim 1, further comprising: receiving, by the target service from the voice assistant service, a request to initiate a conversation between the user device and the target service, wherein the first user voice input comprises the request to initiate the conversation (initiating user query indicative of an "objective that the user seeks to accomplish in cooperation with the tenant" (e.g., starting a dialog with that particular target service/tenant) that is then received by that tenant module for ASR, Paragraphs 0038, 0072-0073 and 0095).

With respect to Claim 8, Di Fabbrizio further discloses: The method of claim 1, further comprising: determining whether the target service comprises a speech-to-text converter; and in response to determining that the target service does not comprise a speech-to-text converter, generating, by the target service, the second user text input using the different speech-to-text converters (determining whether a tenant service has a corresponding ASR engine in a "tenant profile"; otherwise a different, generic (e.g., publicly available) engine is selected and used for textual output, Paragraphs 0073 and 0096).

With respect to Claim 9, Ganong further discloses: The method of claim 1, further comprising: determining whether a request to enable enhanced privacy for the connection between the user device and the target service has been received; and in response to determining that the request to enable enhanced privacy has been received: transmitting each of the plurality of second user voice input segments to the different speech-to-text converters (request is received as specific rules/examples/categories for identifying PII/PHI that, when detected, utilizes multiple speech-to-text transcribers, Paragraphs 0079, 0082, and 0091-0092).

With respect to Claim 10, Ganong further discloses: The method of claim 1, further comprising: receiving, from the user device, (1) the second user voice input (user speech input, Paragraphs 0023 and 0076) and (2) one or more candidate locations in the second user voice input for segmentation (user interface for providing rules, examples, etc. for identifying sensitive content to split speech, Paragraph 0079); and generating the plurality of second user voice input segments by segmenting the second user voice input based on the one or more candidate locations in the second user voice input for segmentation received from the user device (such information is used in the segmentation process for transcription, Paragraphs 0079-0080).

With respect to Claim 14, Ganong further discloses: The method of claim 1, further comprising: receiving, from the user device, an indication of a set of speech-to-text converters (indication provided as the type of term/segment in the utterance qualifying as PII and/or PHI, Paragraphs 0077-0078 and 0082); and transmitting each respective second user voice input segment of the plurality of second user voice input segments to a different speech-to-text converter of the set of speech-to-text converters (the preceding indications are used to select and send speech segments to different transcription processes, Paragraphs 0091-0092; Fig. 11, Elements 1110, 1120, and 1128).

Claim 15 is directed towards an embodiment variation of a system carrying out the functionality of method claim 1, and thus is rejected under similar rationale. Moreover, Di Fabbrizio teaches control circuitry in the form of a processor (Fig. 1, Element 124) as well as input/output circuitry in the form of a communications module (Fig. 1, Element 122 and Paragraphs 0031 and 0034). Claim 16 contains subject matter similar to Claim 2, and thus is rejected under similar rationale.

Claims 3 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Di Fabbrizio, et al. in view of Ganong, III et al. and further in view of Jones, et al. (U.S. PG Publication 2022/0238120 A1).

With respect to Claim 3, Di Fabbrizio in view of Ganong teaches the method for a spoken dialog system utilizing multiple tenant services and corresponding submodules along with piecewise speech transcription as applied to Claim 1. Di Fabbrizio in view of Ganong does not teach that the first user voice input comprises a wake phrase and a target service identifier, and wherein the target service is identified based on the target service identifier in the first user voice input. Jones, however, discloses that a voice input can comprise a wake word in the form of a phrase, where the voice input also contains an identification of "a particular voice service to process the request" to send the voice input to that particular voice service associated with that type of command (Paragraphs 0028 and 0142). Di Fabbrizio, Ganong, and Jones are analogous art because they are from a similar field of endeavor in network-based speech recognition services. Thus, it would have been obvious to one of ordinary skill before the effective filing date to utilize the wake-word processing taught by Jones in the spoken dialog system of Di Fabbrizio in view of Ganong to provide a predictable result of better ensuring that spoken audio is intended for a device and/or allowing a device to operate in a low-power mode (by only having to listen for a wake word initially). Claim 17 contains subject matter similar to Claim 3, and thus is rejected under similar rationale.

Claims 4, 6, 12-13, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Di Fabbrizio, et al. in view of Ganong, III et al. and further in view of Yae (U.S. PG Publication 2025/0166599 A1).

With respect to Claim 4, Di Fabbrizio in view of Ganong teaches the method for a spoken dialog system utilizing multiple tenant services and corresponding submodules along with piecewise speech transcription as applied to Claim 1. While Di Fabbrizio further teaches determining a first text response for text-to-speech synthesis (determining "textual responses" for TTS into verbalized responses, Paragraph 0040), Di Fabbrizio in view of Ganong fails to teach the segmentation of the text to be provided to multiple text-to-speech synthesizers as set forth in claim 4. Yae, however, discloses: segmenting the first text response into a plurality of first text response segments (sequentially segmenting a text input, Paragraphs 0009, 0037 and 0040); transmitting each respective first text response segment of the plurality of first text response segments to a different text-to-speech converter, wherein the different text-to-speech converters generate a plurality of first voice response prompts (sending text segments to different TTS engines to generate a spoken output, Paragraphs 0045-0046 and 0048; Fig. 1, Elements 2 and 13); and combining the plurality of first voice response prompts to generate the first voice response (merging of the sound segments to generate an output sound, Paragraph 0050). Di Fabbrizio, Ganong, and Yae are analogous art because they are from a similar field of endeavor in interactive speech processing services. Thus, it would have been obvious to one of ordinary skill before the effective filing date to utilize the multiple text-to-speech synthesizer arrangement taught by Yae for the generation of voice responses in the dialog system of Di Fabbrizio in view of Ganong to provide a predictable result of implementing text-to-speech synthesis for different types of text inputs for an improved virtual assistant (Yae, Paragraphs 0005-0007).
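The mirror-image arrangement attributed to Yae for claims 4 and 6 (segment the text response, synthesize each segment with a different engine, merge the voice prompts using ordering parameters) can be sketched the same way. The sentence-level splitter and engine callables below are hypothetical illustrations, not any reference's actual interface:

```python
def synthesize_segmented(text: str, tts_engines) -> bytes:
    """Split a text response into segments, synthesize each segment with a
    DIFFERENT text-to-speech engine, and merge the resulting voice prompts.
    The ordering index carried with each prompt stands in for claim 6's
    'output parameters' that keep continuity between prompts."""
    segments = text.split(". ")  # naive sentence segmentation, for illustration
    prompts = [(order, tts(seg))
               for order, (seg, tts) in enumerate(zip(segments, tts_engines))]
    return b"".join(audio for _, audio in sorted(prompts))
```

As in the speech-to-text direction, no single engine receives the full response, which is how the combination is said to limit exposure of sensitive content.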
With respect to Claim 6, Yae further discloses: The method of claim 4, wherein transmitting each of the plurality of first text response segments to the different text-to-speech converters to generate the plurality of first voice response prompts further comprises: transmitting output parameters along with the plurality of first text response segments, wherein the output parameters enable continuity between the plurality of first voice response prompts (sent information to be used in the merging process once speech is generated to produce a speech sequence including chronological ordering, Paragraph 0050).

Claim 12 contains subject matter similar to Claim 4, and thus is rejected under similar rationale, with reference to the fact that multiple system turns are part of an ongoing dialog with a user in Di Fabbrizio, as noted in the claim 1 rejection.

With respect to Claim 13, Ganong and Yae further disclose: The method of claim 12, wherein a number of different text-to-speech converters used in generating the second voice response differs from a number of different text-to-speech converters used in generating the first voice response (the number of synthesizers is selected based upon the number of different content types, Paragraph 0045; the concept of selecting a number of recognizers in the opposite operation (i.e., speech-to-text) is also taught by Ganong at Paragraphs 0091-0092, where the number of services used is based upon the number of different PII and/or PHI instances; thus, combining these concepts (types of information in Yae leading to different synthesizers and types of information related to PII/PHI in Ganong) results in the invention set forth in claim 13 when taken in combination with Di Fabbrizio).

Claim 18 contains subject matter similar to Claim 4, and thus is rejected under similar rationale. Claim 20 contains subject matter similar to Claim 6, and thus is rejected under similar rationale.

Claims 5 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Di Fabbrizio, et al. in view of Ganong, III et al., in view of Yae, and further in view of Buford, et al. (U.S. PG Publication 2019/0104124 A1).

With respect to Claim 5, Di Fabbrizio in view of Ganong and further in view of Yae teaches the method for a spoken dialog system utilizing multiple tenant services and corresponding submodules along with piecewise speech synthesis for non-overlapping text segments as applied to Claim 4. The combination of Di Fabbrizio in view of Ganong and further in view of Yae fails to teach that the text for synthesis is segmented based on portions of "sensitive information." Buford, however, discloses: identifying sensitive information in the first text response, the sensitive information comprising a first portion and a second portion that do not overlap ("identify restricted data, such as PII content" for marking, Paragraphs 0061 and 0070); and segmenting the first text response such that the first portion of the sensitive information is included in a first segment of the plurality of first text response segments, and the second portion of the sensitive information is included in a second segment of the plurality of first text response segments (streams are "segmented" by breaking apart PII to make such information less recognizable, Paragraphs 0033, 0053, and 0075). Di Fabbrizio, Ganong, Yae, and Buford are analogous art because they are from a similar field of endeavor in interactive speech processing services. Thus, it would have been obvious to one of ordinary skill before the effective filing date to utilize the PII-based segmentation taught by Buford in the multiple text-to-speech synthesizer arrangement taught by Di Fabbrizio in view of Ganong and further in view of Yae to provide a predictable result of limiting how much PII is in an information stream to prevent recognition (Buford, Paragraph 0075). Claim 19 contains subject matter similar to Claim 5, and thus is rejected under similar rationale.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Di Fabbrizio, et al. in view of Ganong, III et al., in view of Yae, and further in view of Harb, et al. (U.S. Patent 8,571,863).

With respect to Claim 7, Di Fabbrizio in view of Ganong and further in view of Yae teaches the method for a spoken dialog system utilizing multiple tenant services and corresponding submodules along with piecewise speech synthesis for non-overlapping text segments as applied to Claim 4. Although Di Fabbrizio also discloses that a "tenant profile" makes associations with particular TTS synthesis engines (Paragraph 0077), Di Fabbrizio in view of Ganong and further in view of Yae does not teach determining whether a text-to-speech converter is comprised at a target service and, in response, transmitting the text to the different converter for synthesis. Harb, however, discloses checking whether a specific text-to-speech capable component comprising a device is connected and, if not, uploading text to another converter for synthesis (Col. 4, Line 58 - Col. 5, Line 4). Di Fabbrizio, Ganong, Yae, and Harb are analogous art because they are from a similar field of endeavor in interactive speech processing services. Thus, it would have been obvious to one of ordinary skill before the effective filing date to utilize the synthesizer checking procedure taught by Harb in the dialog system taught by Di Fabbrizio in view of Ganong and further in view of Yae to provide a predictable result of better preventing service errors or latency when a dedicated synthesizer is not connected.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Di Fabbrizio, et al. in view of Ganong, III et al. and further in view of Segalis, et al. (U.S. PG Publication 2017/0359464 A1).

With respect to Claim 11, Di Fabbrizio in view of Ganong teaches the method for a spoken dialog system utilizing multiple tenant services and corresponding submodules along with piecewise speech transcription based upon provided segmentation information as applied to Claim 10. Di Fabbrizio in view of Ganong does not teach that such segmentation is provided via voice activity or pause detection as recited in claim 11. Segalis, however, teaches speech endpointing that provides later-utilized metadata in the form of endpoints of speech segments for speech-to-text transcription ("converts the audio data parsed by the speech endpoint detector") based upon voice activity/pause (i.e., silence) detection (Paragraphs 0067-0071). Di Fabbrizio, Ganong, and Segalis are analogous art because they are from a similar field of endeavor in interactive speech processing services. Thus, it would have been obvious to one of ordinary skill before the effective filing date to utilize the speech endpointing for speech-to-text conversion taught by Segalis as an initial segmentation in the segmentation approach taught by Di Fabbrizio in view of Ganong to provide a predictable result in the form of separating distinct statements in an utterance sequence to further identify confidential and non-confidential portions.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
- Reyes, et al. (U.S. PG Publication 2025/0106321 A1) teaches an orchestration module between a user terminal and various services to carry out different dialog operations (see swim lane diagrams in Figs. 4A-4B; Paragraphs 0075-0078).
- Jeong, et al. (U.S. Patent 11,289,083) teaches segmenting text sentence elements into different pieces for synthesis in parallel prior to merging (Fig. 5, Elements 510, 530, 540, and 550).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES S. WOZNIAK, whose telephone number is (571) 272-7632. The examiner can normally be reached 7-3, off alternate Fridays. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant may use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JAMES S WOZNIAK/
Primary Examiner, Art Unit 2655

Prosecution Timeline

Mar 08, 2024
Application Filed
Oct 22, 2025
Non-Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597422: SPEAKING PRACTICE SYSTEM WITH RELIABLE PRONUNCIATION EVALUATION (granted Apr 07, 2026; 2y 5m to grant)
Patent 12586569: Knowledge Distillation with Domain Mismatch For Speech Recognition (granted Mar 24, 2026; 2y 5m to grant)
Patent 12511476: CONCEPT-CONDITIONED AND PRETRAINED LANGUAGE MODELS BASED ON TIME SERIES TO FREE-FORM TEXT DESCRIPTION GENERATION (granted Dec 30, 2025; 2y 5m to grant)
Patent 12512100: AUTOMATED SEGMENTATION AND TRANSCRIPTION OF UNLABELED AUDIO SPEECH CORPUS (granted Dec 30, 2025; 2y 5m to grant)
Patent 12475882: METHOD AND SYSTEM FOR AUTOMATIC SPEECH RECOGNITION (ASR) USING MULTI-TASK LEARNED (MTL) EMBEDDINGS (granted Nov 18, 2025; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 59%
With Interview: 99% (+40.1%)
Median Time to Grant: 3y 7m
PTA Risk: Low
Based on 385 resolved cases by this examiner. Grant probability derived from career allow rate.
