Prosecution Insights
Last updated: April 19, 2026
Application No. 19/326,569

DISTINGUISHING USER SPEECH FROM BACKGROUND SPEECH IN SPEECH-DENSE ENVIRONMENTS

Non-Final OA (§101, §103)
Filed
Sep 11, 2025
Examiner
YANG, QIAN
Art Unit
2677
Tech Center
2600 — Communications
Assignee
Vocollect Inc.
OA Round
1 (Non-Final)
Grant Probability: 74% (Favorable)
OA Rounds: 1-2
To Grant: 2y 7m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 74% (709 granted / 963 resolved), +11.6% vs TC avg, above average
Interview Lift: +31.3% among resolved cases with interview
Typical Timeline: 2y 7m avg prosecution; 34 currently pending
Career History: 997 total applications across all art units

Statute-Specific Performance

§101: 15.3% (-24.7% vs TC avg)
§103: 48.3% (+8.3% vs TC avg)
§102: 21.2% (-18.8% vs TC avg)
§112: 11.1% (-28.9% vs TC avg)
Tech Center averages are estimates • Based on career data from 963 resolved cases
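As a sanity check, each "vs TC avg" figure above is the examiner's per-statute allow rate minus a single Tech Center baseline. The 40.0% baseline below is inferred from the displayed deltas, not taken from USPTO data directly.

```python
# Recompute the per-statute deltas shown in the table above.
examiner_allow = {"101": 15.3, "103": 48.3, "102": 21.2, "112": 11.1}
tc_average = 40.0  # assumption: the common baseline implied by all four deltas

# Delta = examiner allow rate minus Tech Center average, in percentage points.
deltas = {statute: round(rate - tc_average, 1)
          for statute, rate in examiner_allow.items()}
```

Each value in `deltas` reproduces the corresponding "vs TC avg" figure in the table, which suggests the dashboard uses one shared baseline for all four statutes.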

Office Action

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 13 – 18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Claims 13 – 18 claim a computer program product, which can be interpreted as software or a carrier wave signal and therefore fails to fall within a statutory category of invention. It is not a process occurring as a result of executing the software, a machine programmed to operate in accordance with the software, nor a manufacture structurally and functionally interconnected with the software in a manner which enables the software to act as a computer component and realize its functionality. It is also clearly not directed to a composition of matter. Therefore, it is non-statutory under 35 U.S.C. 101.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 2, 7 – 8 and 13 – 14 are rejected under 35 U.S.C. 103 as being unpatentable over Braho et al. (US Patent Application Publication 2014/0278391), hereinafter referred to as Braho, in view of Commons (US Patent 8,775,341).

Regarding claim 7, Braho discloses a speech recognition device (Figs. 1 – 3) comprising: a microphone (Fig. 1, #120 a-b); at least one processor (Fig. 3, microprocessor 302); and at least one memory (Fig. 3, ROM/RAM 306 and 308) including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the speech recognition device (SRD) to at least: receive, at the microphone, an audio input (Fig. 5, 504, [0110]), wherein the audio input comprises at least one of a speech of at least one user and a background sound in an environment of the at least one user ([0013, 0050, 0081, 0114 - 0118]), and wherein at least a portion of the audio input corresponds to a task being executed by the at least one user ([0005 - 0007, 0013]); determine a presence of a background sound in the audio input based on an algorithm, wherein the algorithm is processed based on at least one of a plurality of audio speech samples and a plurality of background sound samples (Fig. 5, steps 514 – 518, [0114 - 0118]); in an instance in which the background sound is absent from the audio input, generate at least one of words and phrases related to the task in a workflow (Fig. 5, 518, [0119], “This may advantageously limit the information being sent to be information which has been classified (i.e., determined to likely be) as speech rather than noise”; Fig. 5, 520 - 522, [0120 - 0122], “digitized audio to recognize speech”; [0162], “outputs recognized text”); and in an instance in which the audio input comprises the background sound, filter out the background sound from the audio input (Fig. 8, steps 810 – 813, [0147 – 0150]; Fig. 9, steps 912 – 916, [0161 – 0162], reject/block background sound).

However, Braho fails to explicitly disclose determining a presence of a background sound based on a neural network, wherein the neural network is trained based on at least one of a plurality of audio speech samples and a plurality of background sound samples. In a similar field of endeavor, Commons discloses a system that uses a neural network to detect a voice message (col. 45, lines 19 - 37), including determining a presence of a background sound based on a neural network trained on at least one of a plurality of audio speech samples and a plurality of background sound samples (col. 45, lines 19 - 37). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Braho so that determining a presence of a background sound is based on a neural network trained on at least one of a plurality of audio speech samples and a plurality of background sound samples. The motivation for doing so is that the device of Braho can be made more powerful and advanced with artificial intelligence.

Regarding claim 8 (which depends on claim 7), Braho discloses the device wherein the at least one processor is configured to receive the plurality of audio speech samples and the plurality of background sound samples, wherein the plurality of audio speech samples corresponds to speech of a plurality of users ([0007], “a plurality of users each wearing respective portable computer systems and headsets interface with the central or server computer system. This approach allows the user(s) to provide spoken or voice input to the voice driven system, including commands and/or information”; also [0110]), and wherein the plurality of background sound samples corresponds to the background sound in the environment of the plurality of users ([0007], “a plurality of users each wearing respective portable computer systems and headsets interface with the central or server computer system”; [0008], “conversations which are not intended as input”).

Regarding claims 1 – 2, they correspond to claims 7 – 8, respectively, and are therefore interpreted and rejected for the same reasons set forth for claims 7 – 8. Regarding claims 13 – 14, they likewise correspond to claims 7 – 8 and are rejected for the same reasons.

Claims 3, 9 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Braho in view of Commons, and further in view of Faisman et al. (US Patent Application Publication 2008/0319743), hereinafter referred to as Faisman.

Regarding claim 9 (which depends on claim 7), Braho fails to explicitly disclose the device wherein the at least one processor is configured to: generate a first transcript of each speech sample of the plurality of audio speech samples; and generate a second transcript of a set of background sound samples of the plurality of background sound samples, wherein the set of background sound samples includes speech of one or more users in the environment. In a similar field of endeavor, Faisman discloses an ASR-aided transcription system (abstract) that generates transcripts of each speech sample of the plurality of audio speech samples (Fig. 1, [0013 – 0018]). Braho discloses processing each speech sample of the plurality of audio speech samples, and processing a set of background sound samples of the plurality of background sound samples, wherein the set of background sound samples includes speech of one or more users in the environment ([0007 – 0008]). There was some teaching, suggestion, or motivation, either in the references themselves or in the knowledge generally available to one of ordinary skill in the art, to modify Braho and Faisman or to combine the references' teachings, and there was a reasonable expectation of success in doing so to achieve the claimed limitations. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Braho to generate a first transcript of each speech sample of the plurality of audio speech samples and a second transcript of a set of background sound samples of the plurality of background sound samples, wherein the set of background sound samples includes speech of one or more users in the environment. The motivation for doing so is that all conversation can be recorded and logged, which is beneficial for later review.

Regarding claims 3 and 15, they correspond to claim 9 and are therefore interpreted and rejected for the same reasons set forth for claim 9.

Claims 4, 10 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Braho in view of Commons, further in view of Faisman and Sak et al. (“LEARNING ACOUSTIC FRAME LABELING FOR SPEECH RECOGNITION WITH RECURRENT NEURAL NETWORKS”, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)), hereinafter referred to as Sak.

Regarding claim 10 (which depends on claim 9), Braho fails to explicitly disclose the device wherein the at least one processor is configured to train the neural network based at least on the first transcript and the second transcript. In a similar field of endeavor, Sak discloses a method for acoustic modeling (abstract) in which a neural network is trained based at least on transcripts (sections 1, 2.2, 2.3, 4.1). There was some teaching, suggestion, or motivation, either in the references themselves or in the knowledge generally available to one of ordinary skill in the art, to modify Braho and Sak or to combine the references' teachings, and there was a reasonable expectation of success in doing so to achieve the claimed limitations. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Braho to train the neural network based at least on the first transcript and the second transcript. The motivation for doing so is that training can be more precise and effective with labeled transcripts.

Regarding claims 4 and 16, they correspond to claim 10 and are therefore interpreted and rejected for the same reasons set forth for claim 10.

Claims 5 – 6, 11 – 12 and 17 – 18 are rejected under 35 U.S.C. 103 as being unpatentable over Braho in view of Commons, further in view of Faisman and Ramalho et al. (US Patent 8,600,750), hereinafter referred to as Ramalho.

Regarding claim 11 (which depends on claim 9), Braho fails to explicitly disclose the device wherein the at least one processor is configured to determine sound characterization for one or more words based on the first transcript and the second transcript. In a similar field of endeavor, Ramalho discloses an ASR system (abstract) that determines sound characterization for one or more words based on the first transcript and the second transcript (col. 1, line 47 to col. 2, line 3; col. 4, lines 29 - 43). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Braho to determine sound characterization for one or more words based on the first transcript and the second transcript. The motivation for doing so is that the application of Braho can be extended to identify a specific user, making the system more advanced.

Regarding claim 12 (which depends on claim 11), Braho discloses the device wherein the at least one processor is configured to determine a rejection threshold based on the sound classifier, wherein the rejection threshold is utilized to at least accept or reject the audio input ([0143 – 0150]). However, Braho fails to explicitly disclose wherein the sound classifier is the sound characterization. In a similar field of endeavor, Ramalho discloses an ASR system (abstract) that classifies the sound characterization to group different people (col. 2, line 52 to col. 3, line 20). Substituting sound classification with sound characterization was known in the art; one of ordinary skill could have substituted one known element for the other, and the results of the substitution would have been predictable. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Braho to determine a rejection threshold based on the sound characterization. The motivation for doing so is that the application of Braho can be extended to identify a specific user, making the system more advanced.

Regarding claims 5 – 6, they correspond to claims 11 – 12, respectively, and are therefore interpreted and rejected for the same reasons set forth for claims 11 – 12. Regarding claims 17 – 18, they likewise correspond to claims 11 – 12 and are rejected for the same reasons.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to QIAN YANG, whose telephone number is (571) 270-7239. The examiner can normally be reached Monday-Thursday, 8am-6pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Bee, can be reached at 571-270-5183. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR; status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/QIAN YANG/
Primary Examiner, Art Unit 2677
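For orientation, the limitation at the heart of the §103 combination (determining the presence of background sound with a model trained on labeled speech and background samples) can be sketched as a toy classifier. Everything below is an illustrative assumption: the zero-crossing-rate feature, all function names, and the synthetic data are stand-ins, not the applicant's, Braho's, or Commons' actual implementation, which the Office Action maps to a neural network.

```python
# Toy speech-vs-background discriminator, "trained" on labeled samples.
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / max(len(frame) - 1, 1)

def train_threshold(speech_frames, background_frames):
    """Place a decision boundary midway between the class means.
    Voiced speech typically has a lower zero-crossing rate than broadband noise."""
    mean = lambda xs: sum(xs) / len(xs)
    speech_zcr = mean([zero_crossing_rate(f) for f in speech_frames])
    background_zcr = mean([zero_crossing_rate(f) for f in background_frames])
    return (speech_zcr + background_zcr) / 2

def contains_background(frame, threshold):
    """Flag a frame whose zero-crossing rate looks more like noise than speech."""
    return zero_crossing_rate(frame) > threshold

# Synthetic stand-ins: a low-frequency "speech" tone vs. a high-frequency "noise" tone.
speech = [[math.sin(2 * math.pi * 3 * t / 100) for t in range(100)]]
noise = [[math.sin(2 * math.pi * 40 * t / 100) for t in range(100)]]
threshold = train_threshold(speech, noise)
```

The claimed invention (and the Commons reference) replaces the hand-picked feature and midpoint threshold with a neural network learned from the same two classes of samples; the training-on-labeled-samples structure is the part this sketch illustrates.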

Prosecution Timeline

Sep 11, 2025
Application Filed
Feb 09, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12598273
Camera Platform Incorporating Schedule and Stature
2y 5m to grant • Granted Apr 07, 2026
Patent 12586560
ELECTRONIC APPARATUS, TERMINAL APPARATUS AND CONTROLLING METHOD THEREOF
2y 5m to grant • Granted Mar 24, 2026
Patent 12586239
SMART IMAGE PROCESSING METHOD AND DEVICE USING SAME
2y 5m to grant • Granted Mar 24, 2026
Patent 12579432
METHODS AND APPARATUS FOR AUTOMATED SPECIMEN CHARACTERIZATION USING DIAGNOSTIC ANALYSIS SYSTEM WITH CONTINUOUS PERFORMANCE BASED TRAINING
2y 5m to grant • Granted Mar 17, 2026
Patent 12579686
Mixed Depth Object Detection
2y 5m to grant • Granted Mar 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 74%
With Interview (+31.3%): 99%
Median Time to Grant: 2y 7m
PTA Risk: Low
Based on 963 resolved cases by this examiner. Grant probability derived from career allow rate.
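The headline figures can be reproduced from the career counts cited above. The exact model behind the with-interview number is not disclosed, so the last step below is shown as a plain additive-and-capped assumption for illustration only.

```python
# Reproduce the projection figures from the examiner's career counts.
granted, resolved = 709, 963

career_allow_rate = granted / resolved              # about 0.736
grant_probability = round(career_allow_rate * 100)  # matches the 74% shown

# Assumption: the dashboard's +31.3-point interview lift is applied
# additively and capped at 99%; the real model may differ.
interview_lift = 31.3
with_interview = min(grant_probability + interview_lift, 99)
```

Under that assumption, 74% plus 31.3 points exceeds the cap, which would explain the displayed 99% with-interview figure.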
