DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1-20 are pending in this application.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit http://www.uspto.gov/forms/. The filing date of the application will determine what form should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-9 and 11-20 are rejected on the ground of nonstatutory double patenting over claims 1-16 of U.S. Patent No. 12,062,373. Although the claims at issue are not identical, they are not patentably distinct from each other because adding inherent and/or unnecessary limitations or steps and rearranging the claims would be within the level of one of ordinary skill in the art. It is well settled that the insertion of an element, e.g., “forwarding a separate copy of an audio file to each of multiple transcription services,” and its function is an obvious expedient if the remaining elements perform the same functions as before. In re Karlson, 136 USPQ 184 (CCPA 1963). See also Ex parte Rainu, 168 USPQ 375 (Bd. App. 1969). Insertion of a reference element or step whose function is not needed would be obvious to one of ordinary skill in the art.
Instant Application No. 18/793,589
U.S. Patent No. 12,062,373
1. A method for generating a single transcript from multiple transcripts that are independently generated by multiple transcription services for an audio file, the method comprising:
acquiring the multiple transcripts by obtaining a separate transcript from each of the multiple transcription services;
producing, based on an analysis of the multiple transcripts, a tuple for each word uttered in the audio file, so as to create a series of tuples that are populated into a data structure,
wherein each tuple includes
(i) multiple fields in which interpretations of a corresponding word across the multiple transcripts are populated in a predetermined order, and
(ii) a field in which it is indicated whether the interpretations of the corresponding word are identical; and
deriving the single transcript based on an analysis of the data structure.
2. The method of claim 1, further comprising:
identifying at least one tuple for which the field indicates that the interpretations of the corresponding word are not identical; and
for each identified tuple,
establishing a given one of the interpretations as a correct interpretation for the corresponding word.
3. The method of claim 1, wherein the interpretations are ordered in terms of likelihood of being correct as determined based on historical accuracy of the multiple transcription services.
4. The method of claim 1, wherein the interpretations are ordered such that identical interpretations are arranged in adjacent fields.
5. The method of claim 1, wherein each of the multiple transcription services is associated with a different one of the multiple fields, such that the interpretations by each transcription service are populated into a same field across the series of tuples.
6. The method of claim 1, further comprising:
posting, to an interface, the single transcript in such a manner that words for which the corresponding tuple indicates that the interpretations across the multiple transcripts are not identical are visually distinguishable from words for which the corresponding tuple indicates that the interpretations across the multiple transcripts are identical.
7. The method of claim 6, wherein through the interface, a user is able to replace the words for which the corresponding tuple indicates that the interpretations across the multiple transcripts are not identical by selecting an alternative interpretation.
8. The method of claim 7, wherein for each of the words for which the corresponding tuple indicates that the interpretations across the multiple transcripts are not identical, one or more alternative interpretations are positioned within the single transcript.
9. The method of claim 8, further comprising:
receiving input that is indicative of a selection, made through the interface, of the alternative interpretation from among the one or more alternative interpretations presented for a given word within the single transcript; and
in response to said receiving,
replacing the given word with the alternative interpretation within the single transcript; and
removing the one or more alternative interpretations presented for the given word.
11. The method of claim 1, wherein each tuple further comprises information regarding relation, if any, of the corresponding word to one or more other words uttered in the audio file.
1. A non-transitory computer-readable medium with instructions stored thereon that, when executed by a processor of a computing device, cause the computing device to perform operations comprising:
forwarding a separate copy of an audio file to each of multiple transcription services via a corresponding application programming interface;
acquiring multiple transcripts by obtaining a separate transcript from each of the multiple transcription services via the corresponding application programming interface;
producing, based on an analysis of the multiple transcripts, a tuple for each word uttered in the audio file, so as to create a series of tuples that are populated into a data structure,
wherein each tuple comprises
(i) multiple fields in which interpretations of the corresponding word across the multiple transcripts are populated in a predetermined order, and
(ii) a field in which it is indicated whether the interpretations of the corresponding word are identical; and
causing display of a master transcript that is derived from the data structure,
wherein the master transcript is displayed in such a manner that words for which the corresponding tuples indicate that the interpretations are not identical are visually distinguishable from words for which the corresponding tuples indicate that the interpretations are identical, and
wherein the master transcript allows replacement of the words for which the corresponding tuples indicate that the interpretations are not identical with a suggested replacement embedded within the master transcript.
2. The non-transitory computer-readable medium of claim 1, wherein each tuple further comprises information regarding part of speech of the corresponding word.
3. The non-transitory computer-readable medium of claim 1, wherein each tuple further comprises information regarding relation of the corresponding word to one or more other words uttered in the audio file.
4. The non-transitory computer-readable medium of claim 1, wherein the suggested replacement includes all interpretations of the corresponding word by the multiple transcription services.
5. The non-transitory computer-readable medium of claim 1, wherein each tuple is representative of a sequence of the interpretations of the corresponding word in a predetermined order.
6. The non-transitory computer-readable medium of claim 5, wherein the interpretations are ordered in terms of likelihood of being correct as determined based on historical accuracy of the corresponding transcription service.
Claims 1, 2, 4, 6, 7, 12, 15 and 20 are rejected on the ground of nonstatutory double patenting over claims 1-3 of U.S. Patent No. 12,062,373. Although the claims at issue are not identical, they are not patentably distinct from each other because adding inherent and/or unnecessary limitations or steps and rearranging the claims would be within the level of one of ordinary skill in the art. It is well settled that the insertion of an element, e.g., “receiving input indicative of a selection of an audio file,” and its function is an obvious expedient if the remaining elements perform the same functions as before. In re Karlson, 136 USPQ 184 (CCPA 1963). See also Ex parte Rainu, 168 USPQ 375 (Bd. App. 1969). Insertion of a reference element or step whose function is not needed would be obvious to one of ordinary skill in the art.
Instant Application No. 18/793,589
U.S. Patent No. 12,062,373
1. A method for generating a single transcript from multiple transcripts that are independently generated by multiple transcription services for an audio file, the method comprising:
acquiring the multiple transcripts by obtaining a separate transcript from each of the multiple transcription services;
producing, based on an analysis of the multiple transcripts, a tuple for each word uttered in the audio file, so as to create a series of tuples that are populated into a data structure,
wherein each tuple includes
(i) multiple fields in which interpretations of a corresponding word across the multiple transcripts are populated in a predetermined order, and
(ii) a field in which it is indicated whether the interpretations of the corresponding word are identical; and
deriving the single transcript based on an analysis of the data structure.
2. The method of claim 1, further comprising:
identifying at least one tuple for which the field indicates that the interpretations of the corresponding word are not identical; and
for each identified tuple,
establishing a given one of the interpretations as a correct interpretation for the corresponding word.
4. The method of claim 1, wherein the interpretations are ordered such that identical interpretations are arranged in adjacent fields.
6. The method of claim 1, further comprising:
posting, to an interface, the single transcript in such a manner that words for which the corresponding tuple indicates that the interpretations across the multiple transcripts are not identical are visually distinguishable from words for which the corresponding tuple indicates that the interpretations across the multiple transcripts are identical.
7. The method of claim 6, wherein through the interface, a user is able to replace the words for which the corresponding tuple indicates that the interpretations across the multiple transcripts are not identical by selecting an alternative interpretation.
12. A non-transitory medium with instructions stored thereon that, when executed by a processor of a computing device, cause the computing device to perform operations comprising:
obtaining a data structure that includes a series of entries arranged in temporal order,
wherein each entry includes (i) a plurality of fields that include a plurality of interpretations of a corresponding word across a plurality of transcripts prepared for an audio file,
(ii) a first field in which it is indicated whether the plurality of interpretations of the corresponding word are identical, and
(iii) a second field that includes information regarding relation, if any, of the corresponding word to another word in the audio file; and
posting, to an interface, a single transcript in such a manner that words for which the corresponding entry indicates that the plurality of interpretations across the plurality of transcripts are not identical are visually distinguishable from words for which the corresponding entry indicates that the plurality of interpretations across the plurality of transcripts are identical.
15. The non-transitory medium of claim 12, wherein the operations further comprise:
identifying a discrepancy by examining the data structure to identify a conflicting translation between a first transcript of the plurality of transcripts and a second transcript of the plurality of transcripts,
wherein the conflicting translation corresponds to a portion of the audio file for which the first transcript has a first interpretation and the second transcript has a second interpretation; and
applying, to the first and second interpretations, a computer-implemented model that addresses the discrepancy by identifying an appropriate translation for the conflicting translation from among the first and second interpretations.
20. The method of claim 18, further comprising:
receiving input that is indicative of a request to generate the single transcript for the audio file; and
forwarding the first and second copies of the audio file to the first and second transcription services via respective application programming interfaces.
1. A method performed by a processor included in a computing device, the method comprising:
receiving input indicative of a selection of an audio file;
retrieving the audio file from a storage medium;
forwarding a first copy of the audio file to a first transcription service via a first application programming interface, and
a second copy of the audio file to a second transcription service via a second application programming interface;
receiving
a first transcript from the first transcription service via the first application programming interface, and
a second transcript from the second transcription service via the second application programming interface;
producing, based on an analysis of the first and second transcripts, a tuple for each word uttered in the audio file, so as to create a series of tuples that are populated into a data structure,
wherein each tuple includes a field in which it is indicated whether interpretations of a corresponding word across the first and second transcripts are identical;
identifying a discrepancy by examining the data structure to identify a conflicting translation between the first and second transcripts,
wherein the conflicting translation corresponds to a portion of the audio file for which the first transcription service had a first interpretation and the second transcription service had a second interpretation;
applying, to the first and second interpretations, a computer-implemented model that addresses the discrepancy by identifying an appropriate translation for the conflicting translation from among the first and second interpretations,
wherein upon being applied to the first and second interpretations, the computer-implemented model analyzes grammar or sentence structure, of the first interpretation and surrounding words and of the second interpretation and the surrounding words, to identify the appropriate translation;
generating a master transcript, with the appropriate translation identified by the computer-implemented model, based on the data structure; and
causing display of the master transcript in such a manner that (i) the appropriate translation is visually distinguishable from a remainder of the master transcript and (ii) an alternative translation is positioned adjacent to the appropriate translation in line with the master transcript,
wherein the alternative translation is whichever of the first and second interpretations is not identified as the appropriate translation.
2. The method of claim 1, further comprising:
receiving second input indicative of the alternative translation; and
replacing, in response to receiving the second input, the appropriate translation with the alternative translation in the master transcript.
3. The method of claim 1, wherein each tuple further includes a pair of fields in which the interpretations of the corresponding word across the first and second transcripts are populated in a predetermined order.
4. The method of claim 1, wherein the storage medium is accessible to the processor via a network.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 2A, Prong One: The independent claim 1 recites “acquiring the multiple transcripts by obtaining a separate transcript from each of the multiple transcription services; producing, based on an analysis of the multiple transcripts, a tuple for each word uttered in the audio file, so as to create a series of tuples that are populated into a data structure, wherein each tuple includes (i) multiple fields in which interpretations of a corresponding word across the multiple transcripts are populated in a predetermined order, and (ii) a field in which it is indicated whether the interpretations of the corresponding word are identical; and deriving the single transcript based on an analysis of the data structure”.
Claims 1, 12 and 18 recite obtaining audio, obtaining and comparing multiple versions of a transcription, and determining the final version.
These limitations, under their broadest reasonable interpretation, cover concepts that can practically be performed in the human mind:
Transcribing speech into text is the conversion of verbal content to written form—a task humans routinely perform mentally or with conventional tools.
Comparing the transcripts and determining the final version are evaluation, decision-making, and judgment steps that are mental processes.
Accordingly, the claims are directed to the judicial exception of a mental process.
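For illustration of the generic, result-oriented character of the recited steps, the following sketch shows how they can be performed with ordinary data handling. The sketch is purely illustrative and is not of record: the function names and the majority-vote conflict resolution are assumptions of the illustration, not limitations of the claims.

```python
# Illustrative only: building a "tuple per word" data structure from
# multiple transcripts and deriving a single transcript from it.
from collections import Counter

def make_tuples(transcripts):
    """Build one tuple per word position: the interpretation from each
    transcription service in a fixed order, plus a final field flagging
    whether all interpretations are identical."""
    tuples = []
    for interpretations in zip(*transcripts):  # align word-by-word
        identical = len(set(interpretations)) == 1
        tuples.append((*interpretations, identical))
    return tuples

def derive_transcript(tuples):
    """Derive a single transcript: keep identical words as-is and
    resolve conflicts by a simple majority vote (an assumed policy)."""
    words = []
    for *interps, identical in tuples:
        words.append(interps[0] if identical
                     else Counter(interps).most_common(1)[0][0])
    return " ".join(words)

t1 = ["the", "cat", "sat"]
t2 = ["the", "cat", "sat"]
t3 = ["the", "kat", "sat"]
print(derive_transcript(make_tuples([t1, t2, t3])))  # prints "the cat sat"
```

As the sketch suggests, each recited step (acquiring, comparing, flagging identity, and selecting a final version) can be carried out with generic data structures and routine comparison, consistent with the characterization above.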
Step 2A, Prong Two: This judicial exception is not integrated into a practical application. The computer is recited at a high level of generality (i.e., as a generic computer performing generic computer functions) such that it amounts to no more than mere instructions to apply the exception using a generic computer. Accordingly, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claims do not recite an inventive concept that transforms the mental process into patent-eligible subject matter.
The claims add generic, well-understood computer components (medium and processor) and broadly recite use of “speech recognition algorithm” without describing any specific, unconventional structure, algorithmic detail, data structure, or system architecture that provides a concrete technical improvement in computer functionality.
Applying Alice step two and relevant Federal Circuit precedent:
The recitation of conventional computer components (memory and processor) performing routine functions does not supply an inventive concept.
The claims recite high-level, result-oriented steps (e.g., “acquire,” “produce,” “derive,” “obtain,” “post”) that describe mental processes rather than specific technical means for performing those processes.
Because the claims lack limitations that tie the mental-process steps to a particular way of achieving a technological improvement (for example, a novel model architecture, specialized data representation, unique training regimen that yields demonstrable technical performance gains, a specialized streaming/decoding pipeline that reduces latency by a quantifiable amount, or hardware/software co-design), the additional elements do not transform the mental processes into significantly more.
With respect to claims 12 and 18, these claims are similar to claim 1 and additionally recite a “processor,” a “memory,” and a “non-transitory medium.” The processor and memory are recited at a high level of generality (i.e., as generic computer components performing generic computer functions) such that they amount to no more than mere instructions to apply the exception using generic computer components. These additional elements do not integrate the judicial exception into a practical application and are not sufficient to amount to significantly more than the judicial exception.
Therefore, claims 1, 12 and 18 fail to recite an inventive concept sufficient to transform the judicial exception into patent-eligible subject matter.
With respect to dependent claims 2-11, 13-17 and 19-20, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Conclusion
Claims 1-20 are rejected under 35 U.S.C. § 101 as being directed to a judicial exception (mental processes) and failing to recite additional elements that amount to significantly more than the judicial exception.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Thomson et al. (US Pub. No. 2020/0175961).
Regarding claim 1, Thomson discloses a method for generating a single transcript from multiple transcripts that are independently generated by multiple transcription services for an audio file, the method comprising:
acquiring the multiple transcripts by obtaining a separate transcript from each of the multiple transcription services (Figs. 4, 6, 12-14 and [0166][0341] receiving audio and generating transcript separately from two or more ASR systems);
producing, based on an analysis of the multiple transcripts, a tuple for each word uttered in the audio file, so as to create a series of tuples that are populated into a data structure, wherein each tuple includes (i) multiple fields in which interpretations of a corresponding word across the multiple transcripts are populated in a predetermined order, and (ii) a field in which it is indicated whether the interpretations of the corresponding word are identical (Figs. 6, 12-14 and [0348][0352]-[0366] aligning tokens in each hypothesis in order with timestamps and inserting them into a row of a spreadsheet or database, with matching words from each hypothesis arranged in the same column so that as many identical tokens as possible lie in each token group); and
deriving the single transcript based on an analysis of the data structure (Figs. 13 and 14, [0343][0399][0519] outputting transcription after comparing the aligned transcriptions).
Regarding claim 2, Thomson discloses the method of claim 1, and Thomson further discloses:
identifying at least one tuple for which the field indicates that the interpretations of the corresponding word are not identical; and for each identified tuple, establishing a given one of the interpretations as a correct interpretation for the corresponding word ([0401]-[0407] an example of Error map which is displayed with word misalignment; [0474][0475] “the detector 1720 … identifying key words, and/or phrases … provide an indication of the identified key words and/or phrases in the transcription that may be adjusted and the type of adjustment”).
Regarding claim 3, Thomson discloses the method of claim 1, and Thomson further discloses:
wherein the interpretations are ordered in terms of likelihood of being correct as determined based on historical accuracy of the multiple transcription services ([0171][0229] ASR system uses historical accuracy).
Regarding claim 4, Thomson discloses the method of claim 1, and Thomson further discloses:
wherein the interpretations are ordered such that identical interpretations are arranged in adjacent fields ([0365] tokens which are interpreted identically are aligned in adjacent column).
Regarding claim 5, Thomson discloses the method of claim 1, and Thomson further discloses:
wherein each of the multiple transcription services is associated with a different one of the multiple fields, such that the interpretations by each transcription service are populated into a same field across the series of tuples ([0357]-[0360][0365] each of the multiple transcription services generates a different one of the fields).
Regarding claim 6, Thomson discloses the method of claim 1, and Thomson further discloses:
posting, to an interface, the single transcript in such a manner that words for which the corresponding tuple indicates that the interpretations across the multiple transcripts are not identical are visually distinguishable from words for which the corresponding tuple indicates that the interpretations across the multiple transcripts are identical ([0401]-[0407] an example of Error map which is displayed with word misalignment; [0474][0475] “the detector 1720 … identifying key words, and/or phrases … provide an indication of the identified key words and/or phrases in the transcription that may be adjusted and the type of adjustment”; Fig. 18, [0507]-[0514][0679]-[0688] presenting the transcriptions to a captioning agent to correct the errors).
Regarding claim 7, Thomson discloses the method of claim 6, and Thomson further discloses:
wherein through the interface, a user is able to replace the words for which the corresponding tuple indicates that the interpretations across the multiple transcripts are not identical by selecting an alternative interpretation ([0679]-[0688] a captioning agent and editing/changing the transcription to correct the errors).
Regarding claim 8, Thomson discloses the method of claim 7, and Thomson further discloses:
wherein for each of the words for which the corresponding tuple indicates that the interpretations across the multiple transcripts are not identical, one or more alternative interpretations are positioned within the single transcript ([0347][0348] providing multiple options such as {Cathy, Kathy, Kathie} and a CA may edit).
Regarding claim 9, Thomson discloses the method of claim 8, and Thomson further discloses:
receiving input that is indicative of a selection, made through the interface, of the alternative interpretation from among the one or more alternative interpretations presented for a given word within the single transcript; and in response to said receiving, replacing the given word with the alternative interpretation within the single transcript; and removing the one or more alternative interpretations presented for the given word (Figs. 18, 45 and 55, [0507]-[0514][0679]-[0688][1039]-[1047] estimating one or more transcription segments based on CA feedback or input to a CA activity monitor and presenting the transcriptions to a captioning agent for editing to correct the errors).
Regarding claim 10, Thomson discloses the method of claim 1, and Thomson further discloses:
wherein each tuple further comprises information regarding part of speech of the corresponding word ([0257][0258] considering additional information using an n-gram language model and performing syntax checks).
Regarding claim 11, Thomson discloses the method of claim 1, and Thomson further discloses:
wherein each tuple further comprises information regarding relation, if any, of the corresponding word to one or more other words uttered in the audio file ([0257][0258] analyzing the multiple hypotheses and considering additional information using an n-gram language model).
Regarding claim 12, Thomson discloses a non-transitory medium with instructions stored thereon that, when executed by a processor of a computing device, cause the computing device to perform operations comprising:
obtaining a data structure that includes a series of entries arranged in temporal order, wherein each entry includes (i) a plurality of fields that include a plurality of interpretations of a corresponding word across a plurality of transcripts prepared for an audio file, (ii) a first field in which it is indicated whether the plurality of interpretations of the corresponding word are identical, and (iii) a second field that includes information regarding relation, if any, of the corresponding word to another word in the audio file (Figs. 4, 6, 12-14 and [0348][0352]-[0366] aligning tokens in each hypothesis in order with timestamps and inserting them into a row of a spreadsheet or database, with matching words from each hypothesis arranged in the same column so that as many identical tokens as possible lie in each token group); and
posting, to an interface, a single transcript in such a manner that words for which the corresponding entry indicates that the plurality of interpretations across the plurality of transcripts are not identical are visually distinguishable from words for which the corresponding entry indicates that the plurality of interpretations across the plurality of transcripts are identical ([0401]-[0407] an example of Error map which is displayed with word misalignment; [0474][0475] “the detector 1720 … identifying key words, and/or phrases … provide an indication of the identified key words and/or phrases in the transcription that may be adjusted and the type of adjustment”; Fig. 18, [0507]-[0514][0679]-[0688] presenting the transcriptions to a captioning agent to correct the errors; Figs. 13 and 14, [0343][0399][0519] outputting transcription after comparing the aligned transcriptions).
Regarding claim 13, Thomson discloses the non-transitory medium of claim 12, and Thomson further discloses:
wherein each transcript of the plurality of transcripts is provided by a different one of a plurality of transcription services (Figs. 4, 6, 12-14 and [0166][0341] receiving audio and generating transcript separately from two or more ASR systems; [0357]-[0360][0365] each of the multiple transcription services generates a different transcription).
Regarding claim 14, Thomson discloses the non-transitory medium of claim 12, and Thomson further discloses:
wherein the plurality of transcripts includes at least one transcript that is obtained from a transcription service and at least one transcript that is generated by applying a speech recognition algorithm to the audio file (Figs. 4, 6, 12-14 and [0166][0341] receiving audio and generating transcript separately from two or more ASR systems).
Regarding claim 15, Thomson discloses the non-transitory medium of claim 12, and Thomson further discloses:
wherein the operations further comprise: identifying a discrepancy by examining the data structure to identify a conflicting translation between a first transcript of the plurality of transcripts and a second transcript of the plurality of transcripts, wherein the conflicting translation corresponds to a portion of the audio file for which the first transcript has a first interpretation and the second transcript has a second interpretation; and applying, to the first and second interpretations, a computer-implemented model that addresses the discrepancy by identifying an appropriate translation for the conflicting translation from among the first and second interpretations ([0401]-[0407] an example of Error map which is displayed with word misalignment; [0474][0475] “the detector 1720 … identifying key words, and/or phrases … provide an indication of the identified key words and/or phrases in the transcription that may be adjusted and the type of adjustment”).
Regarding claim 16, Thomson discloses the non-transitory medium of claim 15, and Thomson further discloses:
wherein upon being applied to the first and second interpretations, the computer-implemented model analyzes grammar or sentence structure, of the first interpretation and surrounding words and of the second interpretation and the surrounding words, to identify the appropriate translation ([0257][0258] considering additional information using an n-gram language model and performing syntax checks).
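The cited n-gram approach of claim 16 can be sketched as follows (a toy example with a hypothetical corpus; the real model in Thomson is a trained language model): a candidate interpretation is scored by how often it co-occurs with the surrounding words, and the higher-scoring candidate is selected.

```python
# Toy bigram language model; hypothetical corpus, for illustration only.
from collections import Counter

corpus = "the quick brown fox jumps over the lazy dog the quick brown fox".split()
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_score(prev_word, candidate, next_word):
    # A candidate fits its context better when both of its bigrams are frequent.
    return bigrams[(prev_word, candidate)] + bigrams[(candidate, next_word)]

def choose(prev_word, candidates, next_word):
    # Select the interpretation whose surrounding-word context scores highest.
    return max(candidates, key=lambda w: bigram_score(prev_word, w, next_word))

best = choose("the", ["quick", "quack"], "brown")  # "quick"
```

Here "quick" wins because both "the quick" and "quick brown" occur in the corpus, while "quack" occurs in neither context, mirroring the claim's use of surrounding words to identify the appropriate translation.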
Regarding claim 17, Thomson discloses the non-transitory medium of claim 15, and Thomson further discloses:
wherein the operations further comprise: adjusting the single transcript posted to the interface by replacing the conflicting translation with the appropriate translation (Figs. 18, 45, and 55, [0507]-[0514][0679]-[0688][1039]-[1047] estimating one or more transcription segments based on CA feedback or input to a CA activity monitor and presenting the transcriptions to a captioning agent to correct the errors).
Regarding claim 18, Thomson discloses a method performed by a computer program executing on a computing device, the method comprising:
receiving, from a first transcription service, a first transcript of words determined to be uttered in a first copy of an audio file; receiving, from a second transcription service, a second transcript of words determined to be uttered in a second copy of the audio file (Figs. 4, 6, 12-14 and [0166][0341] receiving audio and generating transcript separately from two or more ASR systems);
producing a tuple for each word uttered in the audio file, so as to create a series of tuples, wherein each tuple comprises multiple fields in which interpretations of a corresponding word across the first and second transcripts are populated (Figs. 6, 12-14 and [0348][0352]-[0366] aligning tokens in each hypothesis in order with timestamps and inserting them into a row of a spreadsheet or database, with matching words from each hypothesis arranged in the same column so that as many identical tokens as possible lie in each token group); and
deriving a single transcript for the audio file based on the series of tuples (Figs. 13 and 14, [0343][0399][0519] outputting transcription after comparing the aligned transcriptions).
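The tuple-production and derivation steps of claim 18 can be sketched as follows (hypothetical names; the example assumes the two transcripts are already aligned word-for-word, whereas Thomson's token-group alignment would also insert gaps for missing words):

```python
# Illustrative sketch only; assumes word-for-word alignment is already done.
from itertools import zip_longest

def to_tuples(first_transcript, second_transcript):
    """Produce one tuple per word, with a field for each transcript's
    interpretation of that word."""
    return list(zip_longest(first_transcript.split(),
                            second_transcript.split(),
                            fillvalue=None))

def derive_single(tuples, prefer=0):
    """Naive derivation: take the preferred service's word when the
    interpretations conflict, falling back to the other when absent."""
    return " ".join(t[prefer] if t[prefer] is not None else t[1 - prefer]
                    for t in tuples)

tuples = to_tuples("please call me back", "please call me hack")
# [('please', 'please'), ('call', 'call'), ('me', 'me'), ('back', 'hack')]
single = derive_single(tuples)  # "please call me back"
```

A real derivation step would resolve conflicts with a model (as in claims 15-16) rather than a fixed preference, but the tuple structure itself is the same.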
Regarding claim 19, Thomson discloses the method of claim 18, and Thomson further discloses:
causing display of the single transcript on an interface, wherein for each word for which the interpretations across the first and second transcripts are not identical, at least one suggested replacement is positioned adjacent that word on the interface, and that word is replaceable within the single transcript via selection of a suggested replacement (Figs. 18, 45, and 55, [0507]-[0514][0679]-[0688][1039]-[1047] estimating one or more transcription segments based on CA feedback or input to a CA activity monitor and presenting the transcriptions to a captioning agent to correct the errors).
Regarding claim 20, Thomson discloses the method of claim 18, and Thomson further discloses:
receiving input that is indicative of a request to generate the single transcript for the audio file; and forwarding the first and second copies of the audio file to the first and second transcription services via respective application programming interfaces (Figs. 4, 6, 12-14 and [0166][0341] receiving audio and generating transcript separately from two or more ASR systems).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEONG-AH A. SHIN whose telephone number is (571)272-5933. The examiner can normally be reached 9 AM-3 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SEONG-AH A SHIN/ Primary Examiner, Art Unit 2659