Last updated: May 29, 2026

Application No. 17/904,975

METHOD FOR OUTPUTTING VOICE TRANSCRIPT, VOICE TRANSCRIPT GENERATING SYSTEM, AND COMPUTER-PROGRAM PRODUCT

Non-Final OA §102 §103

Filed

May 04, 2023

Priority

Oct 28, 2021 — nonprovisional of PCTCN2021127147

Examiner

LELAND III, EDWIN S

Art Unit

2654

Tech Center

2600 — Communications

Assignee

BOE TECHNOLOGY GROUP CO., LTD.

OA Round

3 (Non-Final)

Interview Optional

Interview lift: -0.6%, which is below the 15.0% threshold. A written response is recommended.

Based on 456 resolved cases, 2023–2026

Examiner Intelligence

LELAND III, EDWIN S

Grants 75% — above average

Career Allowance Rate

342 granted / 456 resolved

+13.0% vs TC avg

-0.6% (minimal)

Interview Lift

Typical timeline

2y 5m

Avg Prosecution

14 currently pending

Career history

471

Total Applications

across all art units

Statute-Specific Performance

§101

11.1%

-28.9% vs TC avg

§103

67.9%

+27.9% vs TC avg

§102

10.9%

-29.1% vs TC avg

§112

5.6%

-34.4% vs TC avg

Based on career data from 456 resolved cases

Office Action

§102 §103

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 4/9/2026 has been entered.
 
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 2/19/2023 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Status of Claims
Claims 1-2 and 5-22 are pending in this application.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 5-9, 11, and 13-22 are rejected under 35 U.S.C. 103 as being unpatentable over Degraye et al. (U.S. Patent Application Publication 2020/0211561) in view of Kukde et al. (U.S. Patent Application Publication 2023/00054495).
As per claims 1, 21 and 22, Degraye et al. discloses:
A voice transcript generating system (Figure 10 and paragraph [0141]), comprising: 
one or more processors (Figure 10, item 1004 and paragraph [0141]) configured to: 
transmit the candidate audio stream from a terminal device to a server (Paragraphs [0031], [0076], [0085] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients);
extract candidate voiceprint feature information from a candidate audio stream (Figure 7, item 720 and Paragraphs [0058-0059], [0061] & [0103] – voiceprint information is extracted from the audio stream and matched to mature pre-existing voiceprints); 
perform voice recognition on the candidate audio stream to generate a candidate voice transcript (Figure 7, item 724 and paragraphs [0105-0107] -  a meeting transcription is generated); 
compare the candidate voiceprint feature information with target voiceprint feature information of at least one target subject (Figure 7, item 720 and Paragraphs [0058-0059], [0061] & [0103] – voiceprint information is extracted from the audio stream and matched to mature pre-existing voiceprints);  
upon determination that the candidate voiceprint feature information matches with target voiceprint feature information of a target subject, store the candidate voice transcript and a target identifier for the target subject, the target identifier corresponding to the target voiceprint feature information of the target subject (Figure 7, items 724 & 726 and paragraphs [0105-0110] - a meeting transcription is generated with linked identifiers (i.e. names));
reiterating steps of transmitting, extracting, performing voice recognition, comparing, and storing for at least one additional candidate audio stream (Paragraphs [0063-0065] – the process is repeated for each audio segment); and
integrating a plurality of candidate audio streams associated with a same target identifier into an integrated audio stream associated with the same target identifier (Paragraphs [0064-65] – the segments belonging to each different voiceprint are integrated prior to transcription);
wherein performing voice recognition on the candidate audio stream comprises performing voice recognition on the integrated audio stream associated with the same target identifier to generate a meeting record or a meeting summary for a same target subject (Paragraphs [0064-65] – the segments belonging to each different voiceprint are integrated prior to transcription, then transcribed using that voiceprint to improve accuracy).
Degraye et al. fails to explicitly disclose but Kukde et al. in the same field of endeavor teaches:
the candidate audio stream transmitted from the terminal device to the server is a fragment of an original candidate audio stream; and the original candidate audio stream comprises the candidate audio stream and at least one interval audio stream that is not transmitted to the server (Figure 6, item 604 and paragraphs [0042-0043], [0046], [0060-0062], [0077] – non-speech audio segments are removed prior to speaker identification, which may be done by the client device before sending the audio to the server and does not affect the voice recognition process).
It would be obvious for a person having ordinary skill in the art at the effective filing date of the invention to modify the method of Degraye et al. with the non-speech deletion of Kukde et al. because it is a case of combining prior art elements according to known methods to yield predictable results. Pause and non-speech audio deletion is a well-known technique in the art that simplifies transcription production and speaker identification.
Claim 1 is directed to the method of using the system of claim 21, so is rejected for similar reasons.
Claim 22 is directed to a computer readable medium containing instructions to cause a processor to act as the system of claim 21, so is rejected for similar reasons.

As per claim 2, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 1 above. Degraye et al. in the combination further discloses:
extracting the target voiceprint feature information of the target subject from a voice sample (Figure 7, items 706-710 and paragraphs [0094-0098] – reference voiceprints are created); and
storing the target voiceprint feature information of the target subject, the target identifier for the target subject, and correspondence between the target voiceprint feature information of the target subject and the target identifier (Figure 7, items 720 & 722 and Paragraphs [0103-0104] – all of these items are stored in memory in order to identify the speaker).
  
As per claim 5, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 1 above. Degraye et al. in the combination further discloses:
reiterating steps of extracting, performing voice recognition, comparing, and storing for at least one additional candidate audio stream; and integrating a plurality of candidate voice transcripts associated with a same target identifier into a meeting record for a same target subject (Paragraphs [0064-65] & [0067-0068] – the segments belonging to each different voiceprint are integrated prior to transcription, then transcribed using that voiceprint to improve accuracy. All of the transcribed segments linked to an identifier may be retrieved (i.e., integrated) using the meeting analysis engine).

As per claim 6, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 1 above. Degraye et al. in the combination further discloses:
wherein steps of extracting, performing voice recognition, comparing, and storing are performed by a terminal device (Paragraph [0031]).

As per claim 7, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 1 above. Degraye et al. in the combination further discloses:
comprising transmitting the candidate audio stream from a terminal device to a server; wherein steps of extracting and performing voice recognition are performed by the server (Paragraphs [0031], [0076], [0084-0086] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients).

As per claim 8, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 7 above. Degraye et al. in the combination further discloses:
comparing the candidate voiceprint feature information with the target voiceprint feature information of at least one target subject is performed by the server; the target voiceprint feature information of the target subject, the target identifier for the target subject, and the correspondence between the target voiceprint feature information of the target subject and the target identifier are stored on the server; and the candidate voice transcript is stored on the server (Paragraphs [0031], [0076], [0084-0086] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients).

As per claim 9, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 8 above. Degraye et al. in the combination further discloses:
transmitting the candidate voice transcript and the target identifier from the server to the terminal device, upon determination that the candidate voiceprint feature information matches with the target voiceprint feature information of the target subject (Figures 3 & 4 and paragraphs [0076-0082] – the GUI showing the identified transcript is shown to the participants (i.e. on terminal devices) where the participants provide feedback about accuracy).

As per claim 11, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 7 above. Degraye et al. in the combination further discloses:
comparing the candidate voiceprint feature information with the target voiceprint feature information of at least one target subject is performed by the terminal device (Figure 7, item 720 and Paragraphs [0058-0059], [0061] & [0103] – voiceprint information is extracted from the audio stream and matched to mature pre-existing voiceprints); the target voiceprint feature information of the target subject, the target identifier for the target subject, and the correspondence between the target voiceprint feature information of the target subject and the target identifier are stored on the terminal device; the candidate voice transcript is stored on the terminal device (Figure 7, items 724 & 726 and paragraphs [0105-0110] -  a meeting transcription is generated with linked identifiers (i.e. names)); and the method further comprises transmitting the candidate voiceprint feature information and the candidate voice transcript from the server to the terminal device (Paragraphs [0031], [0076], [0085] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients).

As per claim 13, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 2 above. Degraye et al. in the combination further discloses:
the target voiceprint feature information of the target subject, the target identifier for the target subject, and correspondence between the target voiceprint feature information of the target subject and the target identifier are stored on the terminal device; extracting the target voiceprint feature information of the target subject is performed by the server; the method further comprises: transmitting the voice sample of the target subject from the terminal device to the server; transmitting the target identifier for the target subject from the terminal device to the server; and transmitting the target voiceprint feature information of the target subject from the server to the terminal device (Paragraphs [0031], [0076], [0085] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients).

As per claim 14, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 1 above. Degraye et al. in the combination further discloses:
steps of extracting, comparing, and storing are performed by a terminal device; step of performing voice recognition is performed by a server; the method further comprising: transmitting the candidate audio stream from the terminal device to the server; and transmitting the candidate voice transcript from the server to the terminal device (Paragraphs [0031], [0076], [0085] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients).

As per claim 15, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 14 above. Degraye et al. in the combination further discloses:
the candidate audio stream is transmitted from the terminal device to the server upon determination that the candidate voiceprint feature information matches with target voiceprint feature information of a target subject; and the server transmits the candidate voice transcript and the target identifier to the terminal device (Paragraphs [0031], [0076], [0085] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients).

As per claim 16, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 1 above. Degraye et al. in the combination further discloses:
transmitting the candidate audio stream from a terminal device to a server; wherein step of extracting is performed by the server; step of performing voice recognition and storing are performed by the terminal device (Paragraphs [0031], [0076], [0085] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients).

As per claim 17, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 16 above. Degraye et al. in the combination further discloses:
comparing the candidate voiceprint feature information with the target voiceprint feature information of at least one target subject is performed by the server; and the target voiceprint feature information of the target subject, the target identifier for the target subject, and the correspondence between the target voiceprint feature information of the target subject and the target identifier are stored on the server (Paragraphs [0031], [0076], [0085] & [0150] – the functionality of the system may be divided between different computers, to include servers and clients).

As per claim 18, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 17 above. Degraye et al. in the combination further discloses:
transmitting a signal from the server to the terminal device indicating that the candidate voiceprint feature information matches with target voiceprint feature information of a target subject; and transmitting a target identifier for the target subject from the server to the terminal device, the target identifier corresponding to the target voiceprint feature information of the target subject (Figures 3 & 4 and paragraphs [0076-0082] – the GUI showing the identified transcript is shown to the participants (i.e. on terminal devices) where the participants provide feedback about accuracy).

As per claim 19, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 16 above. Degraye et al. in the combination further discloses:
comparing the candidate voiceprint feature information with the target voiceprint feature information of at least one target subject is performed by the terminal device; the target voiceprint feature information of the target subject, the target identifier for the target subject, and the correspondence between the target voiceprint feature information of the target subject and the target identifier are stored on the terminal device; and the method further comprises transmitting the candidate voiceprint feature information from the server to the terminal device (Figures 3 & 4 and paragraphs [0076-0082] – the GUI showing the identified transcript is shown to the participants (i.e. on terminal devices) where the participants provide feedback about accuracy).

As per claim 20, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claim 16 above. Kukde et al. in the combination further discloses:
the candidate audio stream transmitted from the terminal device to the server is a fragment of an original candidate audio stream; the original candidate audio stream comprises the candidate audio stream and at least one interval audio stream that is not transmitted to the server; and performing voice recognition on the candidate audio stream comprises performing voice recognition on the original candidate audio stream (Figure 6, item 604 and paragraphs [0042-0043], [0046], [0060-0062], [0077] – non-speech audio segments are removed prior to speaker identification, which may be done by the client device before sending the audio to the server and does not affect the voice recognition process).

Claims 10 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Degraye et al. (U.S. Patent Application Publication 2020/0211561) and Kukde et al. (U.S. Patent Application Publication 2023/00054495) in view of Yang et al. (U.S. Patent Application Publication 2021/0134301).
As per claims 10 and 12, the combination of Degraye et al. and Kukde et al. discloses all of the limitations of claims 8 and 11 above. The combination fails to explicitly disclose but Yang et al. in the same field of endeavor teaches:
discarding the candidate voice transcript by the server, upon determination that the candidate voiceprint feature information does not match with target voiceprint feature information of any target subject (Paragraph [0333] – speech data that is not verified as being from a registered user is deleted).
It would be obvious for a person having ordinary skill in the art at the effective filing date of the invention to modify the method of Degraye et al. and Kukde et al. with the unverified speech deletion of Yang et al. because it is a case of simple substitution of one known element for another to obtain predictable results. Degraye et al. keeps the unverified speech data and creates a voiceprint for the unverified user, while Yang et al. deletes the unverified speech data instead of creating a voiceprint for the unverified user, which is a simple substitution that will obtain predictable results.

Response to Arguments
Applicant’s arguments, see page 9, filed 4/9/2026, with respect to the rejections of claims 1-2 and 5-22 under 35 U.S.C. 112 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  
Applicant’s arguments, see pages 9-12, filed 4/9/2026, with respect to the rejection of claims 1-2, 5-9, 11, 13-19 and 21-22 under 35 U.S.C. 102 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of Kukde et al.

Examiner Notes
The Examiner cites particular columns and line numbers in the references as applied to the claims above for the convenience of the Applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the Applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or as disclosed by the Examiner.
Communications via Internet e-mail are at the discretion of the applicant and require written authorization. Should the Applicant wish to communicate via e-mail, including the following paragraph in their response will allow the Examiner to do so:
“Recognizing that Internet communications are not secure, I hereby authorize the USPTO to communicate with me concerning any subject matter of this application by electronic mail. I understand that a copy of these communications will be made of record in the application file.”
Should e-mail communication be desired, the Examiner can be reached at Edwin.Leland@USPTO.gov

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWIN S LELAND III whose telephone number is (571)270-5678. The examiner can normally be reached 8:00 - 5:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hai Phan, can be reached at 571-272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/EDWIN S LELAND III/Primary Examiner, Art Unit 2654

Prosecution Timeline

May 04, 2023

Application Filed

Sep 23, 2025

Non-Final Rejection mailed — §102, §103

Dec 19, 2025

Response Filed

Jan 09, 2026

Final Rejection mailed — §102, §103

Mar 03, 2026

Response after Non-Final Action

Apr 09, 2026

Request for Continued Examination

Apr 15, 2026

Response after Non-Final Action

Apr 15, 2026

Non-Final Rejection (signed) — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/315,789

Patent 12608548

METHODS AND SYSTEMS FOR PARSING A MIX OF FEATURES AND INSTRUCTIONS INTO A PROMPT

2y 11m to grant Granted Apr 21, 2026

18/654,795

Patent 12596869

DETECTING ARTIFICIAL INTELLIGENCE GENERATED TEXT

1y 11m to grant Granted Apr 07, 2026

17/936,873

Patent 12591602

TRAINING MACHINE LEARNING BASED NATURAL LANGUAGE PROCESSING FOR SPECIALTY JARGON

3y 6m to grant Granted Mar 31, 2026

17/993,063

Patent 12579370

MULTILINGUAL CHATBOT

3y 3m to grant Granted Mar 17, 2026

18/602,835

Patent 12579986

Systems and Methods for Distinguishing Between Human Speech and Machine Generated Speech

2y 0m to grant Granted Mar 17, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4

Expected OA Rounds

75%

Grant Probability

74%

With Interview (-0.6%)

2y 5m (~0m remaining)

Median Time to Grant

High

PTA Risk

Based on 456 resolved cases by this examiner. Grant probability derived from career allowance rate.