Prosecution Insights
Last updated: April 19, 2026
Application No. 19/326,569

DISTINGUISHING USER SPEECH FROM BACKGROUND SPEECH IN SPEECH-DENSE ENVIRONMENTS

Non-Final OA (§101, §103)
Filed
Sep 11, 2025
Examiner
YANG, QIAN
Art Unit
2677
Tech Center
2600 — Communications
Assignee
Vocollect Inc.
OA Round
1 (Non-Final)
Grant Probability: 74% (Favorable)
OA Rounds: 1-2
To Grant: 2y 7m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 74% (709 granted / 963 resolved), +11.6% vs TC avg, above average
Interview Lift: +31.3% among resolved cases with interview
Typical Timeline: 2y 7m avg prosecution; 34 currently pending
Career History: 997 total applications across all art units

Statute-Specific Performance

§101: 15.3% (-24.7% vs TC avg)
§103: 48.3% (+8.3% vs TC avg)
§102: 21.2% (-18.8% vs TC avg)
§112: 11.1% (-28.9% vs TC avg)
Tech Center averages are estimates • Based on career data from 963 resolved cases
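As a sanity check, each "vs TC avg" figure above is the examiner's per-statute allow rate minus a single Tech Center baseline. The 40.0% baseline below is inferred from the displayed deltas, not taken from USPTO data directly.

```python
# Recompute the per-statute deltas shown in the table above.
examiner_allow = {"101": 15.3, "103": 48.3, "102": 21.2, "112": 11.1}
tc_average = 40.0  # assumption: the common baseline implied by all four deltas

# Delta = examiner allow rate minus Tech Center average, in percentage points.
deltas = {statute: round(rate - tc_average, 1)
          for statute, rate in examiner_allow.items()}
```

Each value in `deltas` reproduces the corresponding "vs TC avg" figure in the table, which suggests the dashboard uses one shared baseline for all four statutes.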

Office Action

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 13 – 18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Claims 13 – 18 claim a computer program product, which can be interpreted as software or a carrier wave signal and therefore fails to fall within a statutory category of invention. It is not a process occurring as a result of executing the software, a machine programmed to operate in accordance with the software, nor a manufacture structurally and functionally interconnected with the software in a manner which enables the software to act as a computer component and realize its functionality. It is also clearly not directed to a composition of matter. Therefore, it is non-statutory under 35 U.S.C. 101.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 2, 7 – 8 and 13 – 14 are rejected under 35 U.S.C. 103 as being unpatentable over Braho et al. (US Patent Application Publication 2014/0278391), hereinafter referred to as Braho, in view of Commons (US Patent 8,775,341).

Regarding claim 7, Braho discloses a speech recognition device (Figs. 1 – 3) comprising: a microphone (Fig. 1, #120 a-b); at least one processor (Fig. 3, microprocessor 302); and at least one memory (Fig. 3, ROM/RAM 306 and 308) including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the speech recognition device (SRD) to at least: receive, at the microphone, an audio input (Fig. 5, 504, [0110]), wherein the audio input comprises at least one of a speech of at least one user and a background sound in an environment of the at least one user ([0013, 0050, 0081, 0114 - 0118]), and wherein at least a portion of the audio input corresponds to a task being executed by the at least one user ([0005 - 0007, 0013]); determine a presence of a background sound in the audio input based on an algorithm, wherein the algorithm is processed based on at least one of a plurality of audio speech samples and a plurality of background sound samples (Fig. 5, steps 514 – 518, [0114 - 0118]); in an instance in which the background sound is absent from the audio input, generate at least one of words and phrases related to the task in a workflow (Fig. 5, 518, [0119], “This may advantageously limit the information being sent to be information which has been classified (i.e., determined to likely be) as speech rather than noise”; Fig. 5, 520 - 522, [0120 - 0122], “digitized audio to recognize speech”; [0162], “outputs recognized text”); and in an instance in which the audio input comprises the background sound, filter out the background sound from the audio input (Fig. 8, steps 810 – 813, [0147 – 0150]; Fig. 9, steps 912 – 916, [0161 – 0162], reject/block background sound).

However, Braho fails to explicitly disclose determining a presence of a background sound based on a neural network, wherein the neural network is trained based on at least one of a plurality of audio speech samples and a plurality of background sound samples. In a similar field of endeavor, Commons discloses a system that uses a neural network to detect a voice message (col. 45, lines 19 - 37), including determining a presence of a background sound based on a neural network trained on at least one of a plurality of audio speech samples and a plurality of background sound samples (col. 45, lines 19 - 37). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Braho so that determining a presence of a background sound is based on a neural network trained on at least one of a plurality of audio speech samples and a plurality of background sound samples. The motivation for doing so is that the device of Braho can be made more powerful and advanced with artificial intelligence.

Regarding claim 8 (which depends on claim 7), Braho discloses the device wherein the at least one processor is configured to receive the plurality of audio speech samples and the plurality of background sound samples, wherein the plurality of audio speech samples corresponds to speech of a plurality of users ([0007], “a plurality of users each wearing respective portable computer systems and headsets interface with the central or server computer system. This approach allows the user(s) to provide spoken or voice input to the voice driven system, including commands and/or information”; also [0110]), and wherein the plurality of background sound samples corresponds to the background sound in the environment of the plurality of users ([0007], “a plurality of users each wearing respective portable computer systems and headsets interface with the central or server computer system”; [0008], “conversations which are not intended as input”).

Regarding claims 1 – 2, they correspond to claims 7 – 8, respectively, and are therefore interpreted and rejected for the same reasons set forth for claims 7 – 8. Regarding claims 13 – 14, they likewise correspond to claims 7 – 8 and are rejected for the same reasons.

Claims 3, 9 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Braho in view of Commons, and further in view of Faisman et al. (US Patent Application Publication 2008/0319743), hereinafter referred to as Faisman.

Regarding claim 9 (which depends on claim 7), Braho fails to explicitly disclose the device wherein the at least one processor is configured to: generate a first transcript of each speech sample of the plurality of audio speech samples; and generate a second transcript of a set of background sound samples of the plurality of background sound samples, wherein the set of background sound samples includes speech of one or more users in the environment. In a similar field of endeavor, Faisman discloses an ASR-aided transcription system (abstract) that generates transcripts of each speech sample of the plurality of audio speech samples (Fig. 1, [0013 – 0018]). Braho discloses processing each speech sample of the plurality of audio speech samples, and processing a set of background sound samples of the plurality of background sound samples, wherein the set of background sound samples includes speech of one or more users in the environment ([0007 – 0008]). There was some teaching, suggestion, or motivation, either in the references themselves or in the knowledge generally available to one of ordinary skill in the art, to modify Braho and Faisman or to combine the references' teachings, and there was a reasonable expectation of success in doing so to achieve the claimed limitations. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Braho to generate a first transcript of each speech sample of the plurality of audio speech samples and a second transcript of a set of background sound samples of the plurality of background sound samples, wherein the set of background sound samples includes speech of one or more users in the environment. The motivation for doing so is that all conversation can be recorded and logged, which is beneficial for later review.

Regarding claims 3 and 15, they correspond to claim 9 and are therefore interpreted and rejected for the same reasons set forth for claim 9.

Claims 4, 10 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Braho in view of Commons, further in view of Faisman and Sak et al. (“LEARNING ACOUSTIC FRAME LABELING FOR SPEECH RECOGNITION WITH RECURRENT NEURAL NETWORKS”, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)), hereinafter referred to as Sak.

Regarding claim 10 (which depends on claim 9), Braho fails to explicitly disclose the device wherein the at least one processor is configured to train the neural network based at least on the first transcript and the second transcript. In a similar field of endeavor, Sak discloses a method for acoustic modeling (abstract) in which a neural network is trained based at least on transcripts (sections 1, 2.2, 2.3, 4.1). There was some teaching, suggestion, or motivation, either in the references themselves or in the knowledge generally available to one of ordinary skill in the art, to modify Braho and Sak or to combine the references' teachings, and there was a reasonable expectation of success in doing so to achieve the claimed limitations. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Braho to train the neural network based at least on the first transcript and the second transcript. The motivation for doing so is that training can be more precise and effective with labeled transcripts.

Regarding claims 4 and 16, they correspond to claim 10 and are therefore interpreted and rejected for the same reasons set forth for claim 10.

Claims 5 – 6, 11 – 12 and 17 – 18 are rejected under 35 U.S.C. 103 as being unpatentable over Braho in view of Commons, further in view of Faisman and Ramalho et al. (US Patent 8,600,750), hereinafter referred to as Ramalho.

Regarding claim 11 (which depends on claim 9), Braho fails to explicitly disclose the device wherein the at least one processor is configured to determine sound characterization for one or more words based on the first transcript and the second transcript. In a similar field of endeavor, Ramalho discloses an ASR system (abstract) that determines sound characterization for one or more words based on the first transcript and the second transcript (col. 1, line 47 to col. 2, line 3; col. 4, lines 29 - 43). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Braho to determine sound characterization for one or more words based on the first transcript and the second transcript. The motivation for doing so is that the application of Braho can be extended to identify a specific user, making the system more advanced.

Regarding claim 12 (which depends on claim 11), Braho discloses the device wherein the at least one processor is configured to determine a rejection threshold based on the sound classifier, wherein the rejection threshold is utilized to at least accept or reject the audio input ([0143 – 0150]). However, Braho fails to explicitly disclose wherein the sound classifier is the sound characterization. In a similar field of endeavor, Ramalho discloses an ASR system (abstract) that classifies the sound characterization to group different people (col. 2, line 52 to col. 3, line 20). Substituting sound classification with sound characterization was known in the art; one of ordinary skill could have substituted one known element for the other, and the results of the substitution would have been predictable. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Braho to determine a rejection threshold based on the sound characterization. The motivation for doing so is that the application of Braho can be extended to identify a specific user, making the system more advanced.

Regarding claims 5 – 6, they correspond to claims 11 – 12, respectively, and are therefore interpreted and rejected for the same reasons set forth for claims 11 – 12. Regarding claims 17 – 18, they likewise correspond to claims 11 – 12 and are rejected for the same reasons.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to QIAN YANG, whose telephone number is (571) 270-7239. The examiner can normally be reached Monday-Thursday, 8am-6pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Bee, can be reached at 571-270-5183. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR; status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/QIAN YANG/
Primary Examiner, Art Unit 2677
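For orientation, the limitation at the heart of the §103 combination (determining the presence of background sound with a model trained on labeled speech and background samples) can be sketched as a toy classifier. Everything below is an illustrative assumption: the zero-crossing-rate feature, all function names, and the synthetic data are stand-ins, not the applicant's, Braho's, or Commons' actual implementation, which the Office Action maps to a neural network.

```python
# Toy speech-vs-background discriminator, "trained" on labeled samples.
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / max(len(frame) - 1, 1)

def train_threshold(speech_frames, background_frames):
    """Place a decision boundary midway between the class means.
    Voiced speech typically has a lower zero-crossing rate than broadband noise."""
    mean = lambda xs: sum(xs) / len(xs)
    speech_zcr = mean([zero_crossing_rate(f) for f in speech_frames])
    background_zcr = mean([zero_crossing_rate(f) for f in background_frames])
    return (speech_zcr + background_zcr) / 2

def contains_background(frame, threshold):
    """Flag a frame whose zero-crossing rate looks more like noise than speech."""
    return zero_crossing_rate(frame) > threshold

# Synthetic stand-ins: a low-frequency "speech" tone vs. a high-frequency "noise" tone.
speech = [[math.sin(2 * math.pi * 3 * t / 100) for t in range(100)]]
noise = [[math.sin(2 * math.pi * 40 * t / 100) for t in range(100)]]
threshold = train_threshold(speech, noise)
```

The claimed invention (and the Commons reference) replaces the hand-picked feature and midpoint threshold with a neural network learned from the same two classes of samples; the training-on-labeled-samples structure is the part this sketch illustrates.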

Prosecution Timeline

Sep 11, 2025
Application Filed
Feb 09, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12598273
Camera Platform Incorporating Schedule and Stature
2y 5m to grant • Granted Apr 07, 2026
Patent 12586560
ELECTRONIC APPARATUS, TERMINAL APPARATUS AND CONTROLLING METHOD THEREOF
2y 5m to grant • Granted Mar 24, 2026
Patent 12586239
SMART IMAGE PROCESSING METHOD AND DEVICE USING SAME
2y 5m to grant • Granted Mar 24, 2026
Patent 12579432
METHODS AND APPARATUS FOR AUTOMATED SPECIMEN CHARACTERIZATION USING DIAGNOSTIC ANALYSIS SYSTEM WITH CONTINUOUS PERFORMANCE BASED TRAINING
2y 5m to grant • Granted Mar 17, 2026
Patent 12579686
Mixed Depth Object Detection
2y 5m to grant • Granted Mar 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 74%
With Interview (+31.3%): 99%
Median Time to Grant: 2y 7m
PTA Risk: Low
Based on 963 resolved cases by this examiner. Grant probability derived from career allow rate.
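The headline figures can be reproduced from the career counts cited above. The exact model behind the with-interview number is not disclosed, so the last step below is shown as a plain additive-and-capped assumption for illustration only.

```python
# Reproduce the projection figures from the examiner's career counts.
granted, resolved = 709, 963

career_allow_rate = granted / resolved              # about 0.736
grant_probability = round(career_allow_rate * 100)  # matches the 74% shown

# Assumption: the dashboard's +31.3-point interview lift is applied
# additively and capped at 99%; the real model may differ.
interview_lift = 31.3
with_interview = min(grant_probability + interview_lift, 99)
```

Under that assumption, 74% plus 31.3 points exceeds the cap, which would explain the displayed 99% with-interview figure.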
