DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. Claims 1, 5, 11, and 15 are directed to the abstract idea of organizing human activity. The claims encompass a human performing each of the limitations recited in the independent and dependent claims for recognizing speech in a multi-speaker environment. This can be accomplished by two humans talking to each other, for example, in a client-agent relationship; based on the voice and its semantic content, one can provide a personalized service according to that relationship. The claims do not provide any details of how the process of speech recognition is improved in a multi-speaker environment. Further, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the claims are (i) mere instructions to implement the idea on a computer, and/or (ii) recitations of generic computer structure that serves to perform generic computer functions that are well-understood, routine, and conventional activities previously known to the pertinent industry. Viewed as a whole, these additional claim elements do not provide meaningful limitations that transform the abstract idea into a patent-eligible application such that the claims amount to significantly more than the abstract idea itself. There is further no improvement to the computing device. Therefore, the claims are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.
The dependent claims further recite an abstract idea performable by a human and do not amount to significantly more than the abstract idea, as they provide no steps beyond what is conventionally known in speech recognition.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-3 and 11-13 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Liu et al. (CN 108074576).
Claims 1 and 11,
Liu teaches a speech recognizing method in a multi-speaker environment performed by a computing system, comprising: analyzing a first conversation voice of a first speaker and a second speaker inputted through a microphone included in the computing system to determine a relation type between the first speaker and the second speaker ([Figs. 1-2] microphone; obtaining voice data from an interrogation scene including a questioner and a respondent (rendered "quesman" and "quesmen" in the machine translation), i.e., the first and second speakers, and classifying their relation (questioner or respondent) based on extracted role-identification features);
receiving an input of a first conversation voice uttered by the first speaker or the second speaker ([Figs. 1-2] voice data), and determining a role of the speaker in the relation type through semantic analysis of the inputted conversation voice ([Fig. 2] semantic feature analysis of the voice data);
extracting a voice feature of the inputted first conversation voice, and determining the extracted voice feature as a voice feature of a speaker of the determined role ([Fig. 2] extracting the role-identification feature, including a voice feature, from the voice data, and classifying the relation based on the extracted role-identification features);
receiving an input of a second conversation voice uttered by the first speaker or the second speaker ([Fig. 2] voice data), and determining a role in the relation type of the speaker of the second conversation voice by using a voice feature extracted from the second conversation voice ([Fig. 2] determining whether the speaker is the questioner or a respondent based on the role-identification feature); and
determining a personalized service corresponding to the second conversation voice by using a role of the speaker of the second conversation voice in the relation type ([Fig. 2] realizing the automatic identification of the two roles of questioner and respondent so as to provide effective auxiliary information for voice transcription ("voice transliterate") during the interrogation scene).
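For clarity of this mapping, the claimed flow of claims 1 and 11 can be illustrated with the minimal Python sketch below. Every name, feature, and heuristic in it is a hypothetical stand-in chosen for illustration; it is not asserted to be Liu's disclosed model.

    # Hypothetical sketch of the claimed multi-speaker flow.
    # All names and heuristics are illustrative stand-ins, not Liu's model.
    from dataclasses import dataclass

    @dataclass
    class Utterance:
        voice_features: list   # e.g., pitch/MFCC-style voice features
        text: str              # transcript of the utterance

    def determine_relation_type(first: Utterance, second: Utterance) -> str:
        # Stand-in classifier: Liu's interrogation scene yields one relation.
        return "questioner/respondent"

    def determine_role(utt: Utterance, relation_type: str) -> str:
        # Stand-in semantic analysis: a trailing question mark suggests the
        # questioner role; anything else is treated as the respondent.
        return "questioner" if utt.text.rstrip().endswith("?") else "respondent"

    def personalized_service(role: str) -> str:
        # Final claimed step: map the determined role to a service.
        return {"questioner": "transcription aid",
                "respondent": "assistance prompts"}[role]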
Claims 2 and 12,
Liu further teaches the speech recognizing method in a multi-speaker environment performed by a computing system of Claim 1, further comprising: displaying on a screen a script obtained as a result of STT (Speech-to-Text) processing of an utterance by the first speaker or the second speaker ([Content of the Invention] [Figs. 1-2] the term "voice transliterate" in this reference means displaying recognized speech as text on a screen);
receiving an optional input for a third conversation voice uttered by the second speaker included in the script ([Content of the Invention] [Figs. 1-2] the feature extraction module 302 extracts the role-identification characteristic of each analysis unit from the voice data, where each analysis unit of the plurality of analysis units comprises one speaker; multiple conversations can be received and processed); and
determining a role of the second speaker in the relation type according to the information input for the third conversation voice ([Fig. 2] questioner or respondent, where the respondent may be one or more people).
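A hedged illustration of this display-and-tag step follows; the screen call and data shapes are assumptions made for illustration, not interfaces disclosed by Liu.

    # Hypothetical sketch: show an STT script and let a user input tag a
    # selected utterance with a role. print() stands in for a screen view.
    def display_script(script):
        # script: list of (speaker_id, text) pairs from STT processing
        for i, (speaker_id, text) in enumerate(script):
            print(f"{i}: [{speaker_id}] {text}")

    def assign_role_from_selection(script, selected_index, role):
        # The optional user input selects one utterance; its speaker is
        # then assigned the chosen role in the relation type.
        speaker_id, _ = script[selected_index]
        return {speaker_id: role}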
Claims 3 and 13,
Liu further teaches the speech recognizing method in a multi-speaker environment performed by a computing system of Claim 2, further comprising: identifying that an utterance of the first speaker or an utterance of the second speaker is not received for more than a threshold period of time ([Content of the Invention] start/end point detection);
inputting the script into a first artificial neural network and outputting, as voice, a text-to-speech (TTS) voice generated by the first artificial neural network based on an output of the first artificial neural network ([Content of the Invention] [Fig. 2] input of the transcribed script into its speaker-role identification model (DNN/RNN/CNN/SVM structure)).
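To make the timing limitation concrete, the sketch below pairs a silence-threshold check with a TTS call; the threshold value and the tts_model object are assumed for illustration only.

    import time

    SILENCE_THRESHOLD_S = 5.0  # hypothetical value; the claim recites only
                               # "more than a threshold period of time"

    def utterance_silence_exceeded(last_utterance_ts, now=None):
        # True when neither speaker has been heard for the threshold period.
        now = time.time() if now is None else now
        return (now - last_utterance_ts) > SILENCE_THRESHOLD_S

    def speak_script(script_text, tts_model):
        # tts_model stands in for the claimed "first artificial neural
        # network" and is assumed to return synthesized audio for the text.
        return tts_model.synthesize(script_text)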
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (CN 108074576) in view of Ying (CN 102800315).
Claims 4 and 14,
Liu teaches all the limitations in claim 1. The difference between the prior art and the claimed invention is that Liu does not explicitly teach identifying a first speech command corresponding to an operation associated with the seat in the fourth conversation voice uttered by the first speaker or the second speaker; and further performing an operation corresponding to the first speech command for a seat occupied by the speaker of the fourth conversation voice.
Ying teaches identifying a first speech command corresponding to an operation associated with the seat in the fourth conversation voice uttered by the first speaker or the second speaker; and further performing an operation corresponding to the first speech command for a seat occupied by the speaker of the fourth conversation voice ([0025] (1) a microphone is mounted on each seat of the automobile, and each microphone is set with an authority (permission); (2) a voice command is obtained via the microphone and identified; (3) the voice command received by the microphone is confirmed; (4) the corresponding voice command is executed according to the authority of the microphone that received it).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Liu with the teachings of Ying by modifying the speaker-role separating method and system under an interrogation scene as taught by Liu to include identifying a first speech command corresponding to an operation associated with the seat in the fourth conversation voice uttered by the first speaker or the second speaker, and performing an operation corresponding to the first speech command for a seat occupied by the speaker of the fourth conversation voice, as taught by Ying, for the benefit of executing the corresponding voice command according to the authority of the microphone (Ying [0015]).
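Ying's per-seat authority scheme, as mapped above, can be illustrated as follows; the seat names, permission sets, and operations are hypothetical examples rather than Ying's disclosed configuration.

    # Hypothetical sketch of per-seat microphone authority (cf. Ying [0025]).
    # Seat names and permission entries are illustrative assumptions.
    SEAT_PERMISSIONS = {
        "driver": {"seat_adjust", "navigation", "media"},
        "front_passenger": {"seat_adjust", "media"},
    }

    def execute_seat_command(seat, command):
        # A seat-associated command runs only for the seat whose microphone
        # captured it, and only if that seat's authority permits it.
        if command in SEAT_PERMISSIONS.get(seat, set()):
            print(f"executing '{command}' for seat '{seat}'")
            return True
        return False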
Claims 5-8 and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (CN 108074576) in view of Ying (CN 102800315).
Claims 5 and 15,
Liu teaches a speech recognizing method in a multi-speaker environment performed by a computing system, comprising: determining a relation type between the first speaker and the second speaker by analyzing a conversation voice of the first speaker and the second speaker inputted through a microphone included in the computing system ([Figs. 1-2] microphone; obtaining voice data from an interrogation scene including a questioner and a respondent (first and second speakers) and classifying their relation (questioner or respondent) based on extracted role-identification features);
Liu does not explicitly teach determining a list of speech commands allowed for the second speaker by using the relation type; and disregarding at least some of the speech commands uttered by the second speaker by using a list of speech commands allowed for the second speaker.
Ying teaches determining a list of speech commands allowed for the second speaker by using the relation type ([Fig. 1] [Steps 110-160] mounting a microphone on each seat and setting permissions (authority) for the microphone; executing a corresponding voice command according to the permissions of the microphone receiving the voice command); and
disregarding at least some of the speech commands uttered by the second speaker by using a list of speech commands allowed for the second speaker ([Fig. 1] [Steps 110-160] the relevant voice command is carried out according to the authority of the microphone receiving the voice command, and a corresponding voice command is executed according to that permission; commands not permitted by the speaker's permission (authority) are disregarded).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Liu with the teachings of Ying by modifying the speaker-role separating method and system under an interrogation scene as taught by Liu to include determining a list of speech commands allowed for the second speaker by using the relation type, and disregarding at least some of the speech commands uttered by the second speaker by using a list of speech commands allowed for the second speaker, as taught by Ying, for the benefit of executing the corresponding voice command according to the authority of the microphone (Ying [0015]).
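The combined teaching for claims 5 and 15 amounts to selecting an allowed-command list from the relation type and filtering utterances against it, as the hedged sketch below illustrates; all relation types and list entries are assumed for illustration.

    # Hypothetical sketch combining Liu (relation type) with Ying (command
    # permissions). The table entries are illustrative assumptions only.
    ALLOWED_BY_RELATION = {
        ("driver", "passenger"): {"media", "climate"},
        ("parent", "child"): {"media"},
    }

    def filter_command(relation, command):
        # Commands outside the allowed list are disregarded (None); a UI
        # layer could display a no-permission alarm here (claims 6 and 16).
        allowed = ALLOWED_BY_RELATION.get(relation, set())
        return command if command in allowed else None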
Claims 6 and 16,
Ying further teaches, in the speech recognizing method in a multi-speaker environment performed by a computing system of Claim 5, that
disregarding at least some of the speech commands uttered by the second speaker by using a list of speech commands allowed for the second speaker further comprises displaying on a screen an alarm indicating that the second speaker has no permission for the second speech command if the second speech command uttered by the second speaker is disregarded ([0034] [Step 140] when a voice command is not recognized, the vehicular system reports the error and prompts the user to re-enter the voice command via the microphone that obtained the user's voice command).
Claims 7 and 17,
Ying further teaches, in the speech recognizing method in a multi-speaker environment performed by a computing system of Claim 5,
identifying a third speech command corresponding to an operation associated with the seat from a fifth conversation voice uttered by the first speaker or the second speaker; and performing an operation corresponding to the third speech command for a seat occupied by the speaker of the fifth conversation voice ([Fig. 1] [Steps 110-160] a microphone is installed on each seat of the automobile and is provided with an authority; the relevant voice command is carried out according to the authority of the microphone receiving it; seat-associated operations (e.g., seat adjustment, air vent control) are executed based on that seat's microphone authority).
Claims 8 and 18,
Ying further teaches, in the speech recognizing method in a multi-speaker environment performed by a computing system of Claim 5,
defining a list of speech commands which are allowed for each relation type between the first speaker and the second speaker based on user input ([Fig. 1] [Steps 110-160] [Step 110] when a user at the front passenger seat or a back-row seat of the vehicle sends a voice command associated with music or broadcast, the command can be executed; template matching against a list of voice commands is used for identification).
Claims 9-10 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (CN 108074576) in view of Ying (CN 102800315) and further in view of Fry (US 20190057693).
Claims 9 and 19,
Ying further teaches, in the speech recognizing method in a multi-speaker environment performed by a computing system of Claim 5,
receiving a sixth conversation voice uttered by the first speaker or the second speaker; identifying that the sixth conversation voice corresponds to the fourth speech command ([Fig. 1] [Steps 110-160] voice command), but that the sixth conversation voice does not comprise a first parameter necessary to perform the operation corresponding to the fourth speech command; outputting a query for the first parameter as a voice ([Fig. 1] [Steps 110-160] prompt); and performing an operation corresponding to the fourth speech command by using the first parameter in response to receiving, from a speaker of the sixth conversation voice, a seventh conversation voice including the first parameter ([0034] [Step 140] when a voice command is not recognized, the vehicular system reports the error and prompts the user to re-enter the voice command via the microphone that obtained the user's voice command).
The difference between the prior art and the claimed invention is that neither Liu nor Ying explicitly teaches a TTS voice for querying.
Fry teaches TTS voice for querying ([0024] a Text-To-Speech (TTS) prompt module may be used to synthesize voice alone or for prompts provided in textual form via prompt module 144).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Liu and Ying with the teachings of Fry by modifying the speaker-role separating method and system under an interrogation scene as taught by Liu to include a TTS voice for querying, as taught by Fry, for the benefit of rendering correct and appropriate prompts (Fry [0024]).
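The claimed query-and-fill behavior, where a recognized command missing a required parameter triggers a spoken prompt, can be sketched as follows; the command table and the tts and listen objects are hypothetical stand-ins, with only the TTS prompting itself drawn from Fry [0024].

    # Hypothetical sketch: query a missing command parameter by TTS voice
    # (cf. Fry [0024]) and fill it from the speaker's next utterance.
    REQUIRED_PARAMS = {"set_temperature": ["degrees"]}  # illustrative only

    def run_command(command, params, tts, listen):
        for name in REQUIRED_PARAMS.get(command, []):
            while name not in params:
                tts.synthesize(f"What value for {name}?")  # spoken query
                params[name] = listen()  # next conversation voice supplies it
        print(f"performing {command} with {params}")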
Claims 10 and 20,
Liu further teaches the speech recognizing method in a multi-speaker environment performed by a computing system of Claim 9, further comprising: converting the value of the first parameter based on the role of the speaker of the seventh conversation voice in the relation type ([Fig. 2] the role-identification feature comprises a speaker type feature, a voice feature, and a semantic feature; the speaker role corresponding to the current analysis unit is determined according to the model output, i.e., whether the speaker is the questioner or a respondent is determined based on the role-identification feature).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Church et al. (US 10,147,438) – Role modeling in call centers and work centers.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHREYANS A PATEL whose telephone number is (571)270-0689. The examiner can normally be reached Monday-Friday 8am-5pm PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
SHREYANS A. PATEL
Primary Examiner
Art Unit 2653
/SHREYANS A PATEL/ Examiner, Art Unit 2659