Prosecution Insights
Last updated: May 29, 2026
Application No. 17/904,975

METHOD FOR OUTPUTTING VOICE TRANSCRIPT, VOICE TRANSCRIPT GENERATING SYSTEM, AND COMPUTER-PROGRAM PRODUCT

Non-Final OA §102 §103
Filed
May 04, 2023
Priority
Oct 28, 2021 — nonprovisional of PCT/CN2021/127147
Examiner
LELAND III, EDWIN S
Art Unit
2654
Tech Center
2600 — Communications
Assignee
BOE TECHNOLOGY GROUP CO., LTD.
OA Round
3 (Non-Final)
75%
Grant Probability
Favorable
3-4
OA Rounds
0m
Est. Remaining
74%
With Interview

Examiner Intelligence

Grants 75% — above average
75%
Career Allowance Rate
342 granted / 456 resolved
+13.0% vs TC avg
-0.6%
Interview Lift
Minimal lift (about -1%), comparing resolved cases with and without an interview
Typical timeline
2y 5m
Avg Prosecution
14 currently pending
Career history
471
Total Applications
across all art units

Statute-Specific Performance

§101
11.1%
-28.9% vs TC avg
§103
67.9%
+27.9% vs TC avg
§102
10.9%
-29.1% vs TC avg
§112
5.6%
-34.4% vs TC avg
Black line = Tech Center average estimate • Based on career data from 456 resolved cases
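The four deltas above can be checked against the rates they accompany. The following is a minimal sketch, assuming "vs TC avg" means a simple percentage-point difference (rate minus baseline); the page does not state how the delta is computed, and the name `implied_baseline` is hypothetical.

```python
# Figures copied from the Statute-Specific Performance panel above.
# Assumption: "vs TC avg" is a percentage-point difference (rate - baseline).
panel = {
    "§101": (11.1, -28.9),
    "§103": (67.9, +27.9),
    "§102": (10.9, -29.1),
    "§112": (5.6, -34.4),
}


def implied_baseline(rate, delta):
    """Back out the Tech Center estimate implied by a rate and its delta."""
    return round(rate - delta, 1)


baselines = {
    statute: implied_baseline(rate, delta)
    for statute, (rate, delta) in panel.items()
}

for statute, baseline in baselines.items():
    print(f"{statute}: implied Tech Center average estimate {baseline}%")
```

Under that assumption every statute backs out to the same baseline, which suggests the panel compares each rate against one flat Tech Center estimate rather than a per-statute average.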

Office Action

§102 §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 4/9/2026 has been entered.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 2/19/2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Status of Claims

Claims 1-2 and 5-22 are pending in this application.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5-9, 11, and 13-22 are rejected under 35 U.S.C. 103 as being unpatentable over Degraye et al. (U.S. Patent Application Publication 2020/0211561) in view of Kukde et al. (U.S. Patent Application Publication 2023/00054495).

As per claims 1, 21 and 22, Degraye et al. discloses: A voice transcript generating system (Figure 10 and paragraph [0141]), comprising: one or more processors (Figure 10, item 1004 and paragraph [0141]) configured to: transmit the candidate audio stream from a terminal device to a server (Paragraphs [0031], [0076], [0085] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients); extract candidate voiceprint feature information from a candidate audio stream (Figure 7, item 720 and Paragraphs [0058-0059], [0061] & [0103] – voiceprint information is extracted from the audio stream and matched to mature pre-existing voiceprints); perform voice recognition on the candidate audio stream to generate a candidate voice transcript (Figure 7, item 724 and paragraphs [0105-0107] - a meeting transcription is generated); compare the candidate voiceprint feature information with target voiceprint feature information of at least one target subject (Figure 7, item 720 and Paragraphs [0058-0059], [0061] & [0103] – voiceprint information is extracted from the audio stream and matched to mature
pre-existing voiceprints); upon determination that the candidate voiceprint feature information matches with target voiceprint feature information of a target subject, store the candidate voice transcript and a target identifier for the target subject, the target identifier corresponding to the target voiceprint feature information of the target subject (Figure 7, items 724 & 726 and paragraphs [0105-0110] - a meeting transcription is generated with linked identifiers (i.e. names)); reiterating steps of transmitting, extracting, performing voice recognition, comparing, and storing for at least one additional candidate audio stream (Paragraphs [0063-0065] – the process is repeated for each audio segment); and integrating a plurality of candidate audio streams associated with a same target identifier into an integrated audio stream associated with the same target identifier (Paragraphs [0064-65] – the segments belonging to each different voiceprint are integrated prior to transcription); wherein performing voice recognition on the candidate audio stream comprises performing voice recognition on the integrated audio stream associated with the same target identifier to generate a meeting record or a meeting summary for a same target subject (Paragraphs [0064-65] – the segments belonging to each different voiceprint are integrated prior to transcription, then transcribed using that voiceprint to improve accuracy).

Degraye et al. fails to explicitly disclose but Kukde et al. in the same field of endeavor teaches: the candidate audio stream transmitted from the terminal device to the server is a fragment of an original candidate audio stream; and the original candidate audio stream comprises the candidate audio stream and at least one interval audio stream that is not transmitted to the server (Figure 6, item 604 and paragraphs [0042-0043], [0046], [0060-0062], [0077] – non-speech audio segments are removed prior to speaker identification which may be done by the client device before sending the audio to the server and does not affect the voice recognition process).

It would be obvious for a person having ordinary skill in the art at the effective filing date of the invention to modify the method of Degraye et al. with the non-speech deletion of Kukde et al. because it is a case of combining prior art elements according to known methods to yield predictable results. Pause and non-speech audio deletion is a well-known technique in the art that simplifies transcription production and speaker identification.

Claim 1 is directed to the method of using the system of claim 21, so is rejected for similar reasons. Claim 22 is directed to a computer readable medium containing instructions to cause a process to act as the system of claim 21, so is rejected for similar reasons.

As per claim 2, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 1 above. Degraye et al.
in the combination further discloses: extracting the target voiceprint feature information of the target subject from a voice sample (Figure 7, items 706-710 and paragraphs [0094-0098] – reference voiceprints are created); and storing the target voiceprint feature information of the target subject, the target identifier for the target subject, and correspondence between the target voiceprint feature information of the target subject and the target identifier (Figure 7, items 720 & 722 and Paragraphs [0103-0104] – all of these items are stored in memory in order to identify the speaker).

As per claim 5, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 1 above. Degraye et al. in the combination further discloses: reiterating steps of extracting, performing voice recognition, comparing, and storing for at least one additional candidate audio stream; and integrating a plurality of candidate voice transcripts associated with a same target identifier into a meeting record for a same target subject (Paragraphs [0064-65] & [0067-0068] – the segments belonging to each different voiceprint are integrated prior to transcription, then transcribed using that voiceprint to improve accuracy. All of the transcribed segments linked to an identifier may be retrieved (i.e., integrated) using the meeting analysis engine).

As per claim 6, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 1 above. Degraye et al. in the combination further discloses: wherein steps of extracting, performing voice recognition, comparing, and storing are performed by a terminal device (Paragraph [0031]).

As per claim 7, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 1 above. Degraye et al. in the combination further discloses: comprising transmitting the candidate audio stream from a terminal device to a server; wherein steps of extracting and performing voice recognition are performed by the server (Paragraphs [0031], [0076], [0084-0086] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients).

As per claim 8, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 7 above. Degraye et al. in the combination further discloses: comparing the candidate voiceprint feature information with the target voiceprint feature information of at least one target subject is performed by the server; the target voiceprint feature information of the target subject, the target identifier for the target subject, and the correspondence between the target voiceprint feature information of the target subject and the target identifier are stored on the server; and the candidate voice transcript is stored on the server (Paragraphs [0031], [0076], [0084-0086] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients).

As per claim 9, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 8 above. Degraye et al. in the combination further discloses: transmitting the candidate voice transcript and the target identifier from the server to the terminal device, upon determination that the candidate voiceprint feature information matches with the target voiceprint feature information of the target subject (Figures 3 & 4 and paragraphs [0076-0082] – the GUI showing the identified transcript is shown to the participants (i.e. on terminal devices) where the participants provide feedback about accuracy).

As per claim 11, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 7 above. Degraye et al.
in the combination further discloses: comparing the candidate voiceprint feature information with the target voiceprint feature information of at least one target subject is performed by the terminal device (Figure 7, item 720 and Paragraphs [0058-0059], [0061] & [0103] – voiceprint information is extracted from the audio stream and matched to mature pre-existing voiceprints); the target voiceprint feature information of the target subject, the target identifier for the target subject, and the correspondence between the target voiceprint feature information of the target subject and the target identifier are stored on the terminal device; the candidate voice transcript is stored on the terminal device (Figure 7, items 724 & 726 and paragraphs [0105-0110] - a meeting transcription is generated with linked identifiers (i.e. names)); and the method further comprises transmitting the candidate voiceprint feature information and the candidate voice transcript from the server to the terminal device (Paragraphs [0031], [0076], [0085] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients).

As per claim 13, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 2 above. Degraye et al. in the combination further discloses: the target voiceprint feature information of the target subject, the target identifier for the target subject, and correspondence between the target voiceprint feature information of the target subject and the target identifier are stored on the terminal device; extracting the target voiceprint feature information of the target subject is performed by the server; the method further comprises: transmitting the voice sample of the target subject from the terminal device to the server; transmitting the target identifier for the target subject from the terminal device to the server; and transmitting the target voiceprint feature information of the target subject from the server to the terminal device (Paragraphs [0031], [0076], [0085] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients).

As per claim 14, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 1 above. Degraye et al. in the combination further discloses: steps of extracting, comparing, and storing are performed by a terminal device; step of performing voice recognition is performed by a server; the method further comprising: transmitting the candidate audio stream from the terminal device to the server; and transmitting the candidate voice transcript from the server to the terminal device (Paragraphs [0031], [0076], [0085] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients).

As per claim 15, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 14 above. Degraye et al.
in the combination further discloses: the candidate audio stream is transmitted from the terminal device to the server upon determination that the candidate voiceprint feature information matches with target voiceprint feature information of a target subject; and the server transmits the candidate voice transcript and the target identifier to the terminal device (Paragraphs [0031], [0076], [0085] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients).

As per claim 16, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 1 above. Degraye et al. in the combination further discloses: transmitting the candidate audio stream from a terminal device to a server; wherein step of extracting is performed by the server; step of performing voice recognition and storing are performed by the terminal device (Paragraphs [0031], [0076], [0085] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients).

As per claim 17, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 16 above. Degraye et al. in the combination further discloses: comparing the candidate voiceprint feature information with the target voiceprint feature information of at least one target subject is performed by the server; and the target voiceprint feature information of the target subject, the target identifier for the target subject, and the correspondence between the target voiceprint feature information of the target subject and the target identifier are stored on the server (Paragraphs [0031], [0076], [0085] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients).

As per claim 18, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 17 above. Degraye et al. in the combination further discloses: transmitting a signal from the server to the terminal device indicating that the candidate voiceprint feature information matches with target voiceprint feature information of a target subject; and transmitting a target identifier for the target subject from the server to the terminal device, the target identifier corresponding to the target voiceprint feature information of the target subject (Figures 3 & 4 and paragraphs [0076-0082] – the GUI showing the identified transcript is shown to the participants (i.e. on terminal devices) where the participants provide feedback about accuracy).

As per claim 19, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 16 above. Degraye et al. in the combination further discloses: comparing the candidate voiceprint feature information with the target voiceprint feature information of at least one target subject is performed by the terminal device; the target voiceprint feature information of the target subject, the target identifier for the target subject, and the correspondence between the target voiceprint feature information of the target subject and the target identifier are stored on the terminal device; and the method further comprises transmitting the candidate voiceprint feature information from the server to the terminal device (Figures 3 & 4 and paragraphs [0076-0082] – the GUI showing the identified transcript is shown to the participants (i.e. on terminal devices) where the participants provide feedback about accuracy).

As per claim 20, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 16 above. Kukde et al.
in the combination further discloses: the candidate audio stream transmitted from the terminal device to the server is a fragment of an original candidate audio stream; the original candidate audio stream comprises the candidate audio stream and at least one interval audio stream that is not transmitted to the server; and performing voice recognition on the candidate audio stream comprises performing voice recognition on the original candidate audio stream (Figure 6, item 604 and paragraphs [0042-0043], [0046], [0060-0062], [0077] – non-speech audio segments are removed prior to speaker identification which may be done by the client device before sending the audio to the server and does not affect the voice recognition process).

Claims 10 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Degraye et al. (U.S. Patent Application Publication 2020/0211561) and Kukde et al. (U.S. Patent Application Publication 2023/00054495) in view of Yang et al. (U.S. Patent Application Publication 2021/0134301).

As per claims 10 and 12, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claims 8 and 11 above. The combination fails to explicitly disclose but Yang et al. in the same field of endeavor teaches: discarding the candidate voice transcript by the server, upon determination that the candidate voiceprint feature information does not match with target voiceprint feature information of any target subject (Paragraph [0333] – speech data that is not verified as being from a registered user is deleted).

It would be obvious for a person having ordinary skill in the art at the effective filing date of the invention to modify the method of Degraye et al. and Kukde et al. with the unverified speech deletion of Yang et al. because it is a case of simple substitution of one known element for another to obtain predictable results. Degraye et al. keeps the unverified speech data and creates a voiceprint for the unverified user, while Yang et al. deletes the unverified speech data instead of creating a voiceprint for the unverified user, which is a simple substitution that will obtain predictable results.

Response to Arguments

Applicant’s arguments, see page 9, filed 4/9/2026, with respect to the rejections of claims 1-2 and 5-22 under 35 U.S.C. 112 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn.

Applicant’s arguments, see pages 9-12, filed 4/9/2026, with respect to the rejection of claims 1-2, 5-9, 11, 13-19 and 21-22 under 35 U.S.C. 102 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Kukde et al.

Examiner Notes

The Examiner cites particular columns and line numbers in the references as applied to the claims above for the convenience of the Applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the Applicant fully considers the references in its entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or as disclosed by the Examiner.

Communications via Internet e-mail are at the discretion of the applicant and require written authorization. Should the Applicant wish to communicate via e-mail, including the following paragraph in their response will allow the Examiner to do so: “Recognizing that Internet communications are not secure, I hereby authorize the USPTO to communicate with me concerning any subject matter of this application by electronic mail. I understand that a copy of these communications will be made of record in the application file.” Should e-mail communication be desired, the Examiner can be reached at Edwin.Leland@USPTO.gov

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWIN S LELAND III whose telephone number is (571)270-5678. The examiner can normally be reached 8:00 - 5:00 M-F.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hai Phan can be reached at 571-272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/EDWIN S LELAND III/Primary Examiner, Art Unit 2654
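For readers mapping the rejection to the claim language, the steps recited across claims 1, 10 and 20 (drop untransmitted interval fragments, extract and compare voiceprints, integrate streams sharing a target identifier, recognize the integrated stream, discard unmatched material) can be sketched roughly as follows. This is an illustrative toy only, not the applicant's implementation or anything disclosed in the cited references: the vector voiceprints, the cosine comparison, the 0.9 threshold, and every function and field name are assumptions made for the example.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_identifier(candidate, targets, threshold=0.9):
    """Return the identifier of the best target at or above threshold, else None."""
    best_id, best_score = None, threshold
    for identifier, target in targets.items():
        score = cosine(candidate, target)
        if score >= best_score:
            best_id, best_score = identifier, score
    return best_id


def build_records(fragments, targets, extract, transcribe):
    """Group matched fragments by target identifier, then transcribe each group."""
    grouped = defaultdict(list)
    for fragment in fragments:
        if fragment["is_interval"]:
            continue  # interval fragment: never sent on for processing
        identifier = match_identifier(extract(fragment), targets)
        if identifier is None:
            continue  # no matching target subject: discarded
        grouped[identifier].append(fragment)
    # Recognition runs once per identifier, on the integrated fragments.
    return {ident: transcribe(frags) for ident, frags in grouped.items()}


# Hypothetical enrolled target voiceprints and incoming fragments.
targets = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
fragments = [
    {"voiceprint": [0.98, 0.05], "text": "hello", "is_interval": False},
    {"voiceprint": [0.0, 0.0], "text": "", "is_interval": True},
    {"voiceprint": [0.03, 0.99], "text": "hi there", "is_interval": False},
    {"voiceprint": [0.99, 0.02], "text": "let us begin", "is_interval": False},
    {"voiceprint": [0.7, 0.7], "text": "unknown speaker", "is_interval": False},
]

# Stand-ins for a real feature extractor and speech recognizer.
records = build_records(
    fragments,
    targets,
    extract=lambda fragment: fragment["voiceprint"],
    transcribe=lambda frags: " ".join(f["text"] for f in frags),
)
print(records)
```

The grouping step runs before transcription, which mirrors the "integrated audio stream" limitation the examiner maps to Degraye et al.; the two `continue` branches correspond to the limitations mapped to Kukde et al. and Yang et al. respectively.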

Prosecution Timeline

May 04, 2023
Application Filed
Sep 23, 2025
Non-Final Rejection mailed — §102, §103
Dec 19, 2025
Response Filed
Jan 09, 2026
Final Rejection mailed — §102, §103
Mar 03, 2026
Response after Non-Final Action
Apr 09, 2026
Request for Continued Examination
Apr 15, 2026
Response after Non-Final Action
Apr 15, 2026
Non-Final Rejection (signed) — §102, §103 (current)
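The "0m" remaining estimate shown elsewhere on this page is consistent with the dates in the timeline. The following is a rough sketch, assuming the estimate is simply the examiner's 2y 5m average prosecution time minus the time elapsed since filing, floored at zero; the page does not document its method, and `whole_months_between` is a hypothetical helper.

```python
from datetime import date

# Dates and average taken from this page.
filed = date(2023, 5, 4)
current_action = date(2026, 4, 15)
avg_prosecution_months = 2 * 12 + 5  # "2y 5m"


def whole_months_between(start, end):
    """Count complete calendar months from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


elapsed = whole_months_between(filed, current_action)
# Assumption: the remaining estimate cannot go negative.
remaining = max(0, avg_prosecution_months - elapsed)

print(f"elapsed: {elapsed // 12}y {elapsed % 12}m, estimated remaining: {remaining}m")
```

Under that assumption the application has already been pending longer than this examiner's average, so the remaining estimate bottoms out at zero.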

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12608548
METHODS AND SYSTEMS FOR PARSING A MIX OF FEATURES AND INSTRUCTIONS INTO A PROMPT
2y 11m to grant Granted Apr 21, 2026
Patent 12596869
DETECTING ARTIFICIAL INTELLIGENCE GENERATED TEXT
1y 11m to grant Granted Apr 07, 2026
Patent 12591602
TRAINING MACHINE LEARNING BASED NATURAL LANGUAGE PROCESSING FOR SPECIALTY JARGON
3y 6m to grant Granted Mar 31, 2026
Patent 12579370
MULTILINGUAL CHATBOT
3y 3m to grant Granted Mar 17, 2026
Patent 12579986
Systems and Methods for Distinguishing Between Human Speech and Machine Generated Speech
2y 0m to grant Granted Mar 17, 2026
Based on the 5 most recent grants.

Prosecution Projections

3-4
Expected OA Rounds
75%
Grant Probability
74%
With Interview (-0.6%)
2y 5m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 456 resolved cases by this examiner. Grant probability derived from career allowance rate.
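The headline figures can be reproduced from the counts given on this page. The following is a minimal sketch, assuming the grant probability is the raw career allowance rate and that the interview lift is applied additively in percentage points; neither assumption is confirmed by the page.

```python
# Counts and lift copied from this page.
granted = 342
resolved = 456
interview_lift_points = -0.6

# Career allowance rate as a percentage of resolved cases.
allowance_pct = granted / resolved * 100

# Assumption: the lift is added to the rate in percentage points.
with_interview_pct = allowance_pct + interview_lift_points

print(f"Grant probability: {round(allowance_pct)}%")
print(f"With interview: {round(with_interview_pct)}%")
```

Both rounded values agree with the 75% and 74% shown above.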