Prosecution Insights
Last updated: April 19, 2026
Application No. 18/478,759

UNIFIED AUDIO SUPPRESSION MODEL

Non-Final OA: §102, §103, §112

Filed: Sep 29, 2023
Examiner: FLANDERS, ANDREW C
Art Unit: 2655
Tech Center: 2600 — Communications
Assignee: Amazon Technologies, Inc.
OA Round: 2 (Non-Final)

Grant Probability: 74% (Favorable)
Expected OA Rounds: 2-3
Time to Grant: 3y 3m
With Interview: 88%

Examiner Intelligence

Career Allow Rate: 74%, above average (574 granted / 775 resolved; +12.1% vs TC avg)
Interview Lift: +14.0% (moderate), measured on resolved cases with vs. without an interview
Avg Prosecution: 3y 3m typical timeline; 9 applications currently pending
Career History: 784 total applications across all art units

Statute-Specific Performance

§101: 10.3% (-29.7% vs TC avg)
§103: 38.7% (-1.3% vs TC avg)
§102: 31.6% (-8.4% vs TC avg)
§112: 8.3% (-31.7% vs TC avg)
Based on career data from 775 resolved cases; the Tech Center average used for comparison is an estimate.
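How to read the deltas: each statute rate is compared against a Tech Center average estimate, so that baseline can be recovered as the examiner's rate minus the delta. A minimal sketch of the arithmetic (plain Python, nothing from the tool itself):

```python
# Reconstructing the Tech Center average estimate implied by each delta:
# tc_avg = examiner_rate - delta (both in percentage points).
rates = {
    "§101": (10.3, -29.7),
    "§103": (38.7, -1.3),
    "§102": (31.6, -8.4),
    "§112": (8.3, -31.7),
}
for statute, (rate, delta) in rates.items():
    print(f"{statute}: implied TC avg = {rate - delta:.1f}%")
# Every statute resolves to 40.0%, i.e. the comparison appears to use a
# single TC-wide baseline rather than per-statute averages.
```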

Office Action

Rejections: §102, §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant’s arguments with respect to the claim(s) have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 8-13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 8 contains limitations that are directed to two statutory classes, e.g. a process and an apparatus. Claim 8 recites “a method for enhancing audio of a communication application, the method comprising: memory that stores computer-executable instructions...” The preamble and remainder of the claim are directed to a method, while the body includes system elements of the memory. It is unclear how a method can contain a memory. Please refer to MPEP 2173.05(p) regarding Product and Process in the same claim for further information. It is likely applicant intends this claim to read similar to related independent claim 14, which provides the medium along with the instructions, but the method only occurs when executed. For the purposes of expediting examination, the claim will be understood in this manner.

Claims 9-13 depend upon claim 8 and are rejected under the same grounds, as they inherit the deficiencies of claim 8 by virtue of their dependence.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1, 2, 4-9, 11-15, and 17-20 is/are rejected under 35 U.S.C. 102(a)(1) and 35 U.S.C. 102(a)(2) as being anticipated by Nighman et al. (hereinafter Nighman, U.S. Patent Application Publication 2023/0115674).

Regarding Claim 1, Nighman discloses: A system for enhancing teleconference application audio (e.g. teleconference system; abstract, entire doc; for improving/enhancing audio signals; [0011], [0022], [0033], [0073], [0089], [0113]), the system comprising: memory that stores computer-executable instructions (e.g. software module residing in various memories/disks or other storage [0228]); and a processor in communication with the memory (e.g. storage medium coupled to a processor; [0228]), wherein the computer-executable instructions, when executed by the processor, cause the processor (e.g. software module executed by a processor; [0228]) to: obtain a voice sample of a user (e.g. voice biometrics engine 330 can perform enrollment and verification phases to record and extract a number of features from a voice print [0118]); map the voice sample to an identifier associated with the user (e.g. voice biometrics engine 330 assigns unique acoustic speech signatures to each speaker; [0118]); receive an audio mixture detected by an audio sensor (e.g. receiving audio inputs; [0074], [0098]; voice activity detector 320 identifies speech sources in the input signal; see also that the microphone(s) 140 detect sounds in the environment, convert the sounds to digital audio signals, and stream the audio signals to the processing core; [0084]); receive a selection via a teleconference application that identifies a portion of the audio mixture to suppress (e.g. user input; [0080]; user interface input data; [0086], [0087]; interaction with a GUI to input information; [0158], [0159]; initialization data 372 provided by manual input [0158], note 372 includes participant info; Fig. 3G; participant info includes information such as main speaker, authorized speaker, etc.; [0161]; note differentiating a primary person speaking from a noise source such as other talkers; [0089]; note that a noise source can be any noise source such as speech from another talker; [0090]; noise suppressor 345 implements custom noise suppression depending on the source; [0130]; talker-based personalization [0113]; and apply audio signal processing to the source separated data streams 322, 324, 326, where some or all of the signal processing operations can be customized, optimized, and/or personalized based on information relating to the source [0127]; see further EQ in [0129]; In essence, an administrator enters participant info through a user interface, the participant info corresponds to authorized speakers, and the system will then suppress other sources of speakers); modify a representation of the audio mixture to include a flag that corresponds to the selection (e.g. insert flags in the point speech source stream 322, where each flag indicates a biometrically identified talker associated with a source in the enhanced point speech source stream; [0119]); and apply the modified representation of the audio mixture as an input into a machine learning model (e.g. identify and process flags in the stream corresponding to the talker that uttered the speech content [0121]; note that any of the engines in the speech source processor can apply machine learning or AI algorithms to tune or adapt; [0124]), wherein application of the modified representation of the audio mixture as the input to the machine learning model causes the machine learning model (e.g. implementing machine learning to improve recognition of unique speakers; [0124]; note that any of the blocks, including noise suppressor 345, can implement machine learning or AI algorithms; [0132]) to one of: suppress a background noise of the audio mixture (e.g. train AI models (e.g., neural network-based models) to identify and suppress noises... suppress noise specific to the deployed environment; [0134]; see also suppression of background noises; [0177]-[0179]) or suppress all noise of the audio mixture except a voice identified by the user identifier (e.g. custom automatic gain control on a speaker-by-speaker basis; [0128]; see further the process of Fig. 7, and note output whether to include desired sources, such as one or more speech sources, and to exclude or attenuate other sources such as noise; [0144]; Thus, in some arrangements one speech source is included/amplified and the others excluded or attenuated).

Regarding Claim 2, in addition to the elements stated above regarding claim 1, Nighman further discloses: wherein the modified representation of the audio mixture includes the user identifier, the audio mixture, and the flag (e.g. voice print/acoustic speech signature for talkers; [0118]; and inserted flags in the (audio) stream; [0118]-[0121]).

Regarding Claim 4, in addition to the elements stated above regarding claim 1, Nighman further discloses: wherein suppressing the background noise of the audio mixture comprises preserving a second voice of a second user from being suppressed (e.g. train AI models (e.g., neural network-based models) to identify and suppress noises... suppress noise specific to the deployed environment; [0134]; see also suppression of background noises; [0177]-[0179]; further, as noted above, in some arrangements one or more speech sources are included/amplified and the others excluded or attenuated).

Regarding Claim 5, in addition to the elements stated above regarding claim 1, Nighman further discloses: wherein the machine learning model is trained on combined training data that comprises a first training data item (e.g. machine learning adaptively trains and tunes the algorithm to the particular deployed environment in response to training data; [0103], [0115]), wherein the first training data item includes a combination of a first type of clean speech data and a first type of background noise data (e.g. training data includes publicly available corpuses including a data set of human labeled [“clean”] sound events and curated noise samples [0103], [0115]), and wherein the first type of clean speech data is identified as a target output (e.g. any of the blocks can train on data on the fly or at a later time; [0115]; note enrollment and verification phases to record and extract voice prints [0118]; these are used to identify user speech to output, or “target output”).

Regarding Claim 6, in addition to the elements stated above regarding claim 1, Nighman further discloses: wherein the machine learning model is trained on the voice sample of the user during an enrollment phase (e.g. any of the blocks can train on data on the fly or at a later time; [0115]; note enrollment and verification phases to record and extract voice prints [0118]; these are used to identify user speech to output, or “target output”).

Regarding Claim 7, in addition to the elements stated above regarding claim 1, Nighman further discloses: receive, during a teleconference session in which the selection is received (e.g. note that the audio processing engine automatically adjusts configuration as the nature of audio sources changes; [0215]; in other words, as different users are speaking, different noises appear, etc., the configuration or exclusion/attenuation of various sources can change), a second selection via the teleconference application that identifies a second portion of the audio mixture to suppress that is different than the portion of the audio mixture; and cause the second portion of the audio mixture to be suppressed (e.g. custom automatic gain control on a speaker-by-speaker basis; [0128]; see further the process of Fig. 7, and note output whether to include desired sources, such as one or more speech sources, and to exclude or attenuate other sources such as noise; [0144]; see also the portions referred to above in the rejection of claim 1 related to the selection; In essence, an administrator enters participant info through a user interface, the participant info corresponds to authorized speakers, and the system will then suppress other sources of speakers for the adjusted configuration of the changing audio sources).

Regarding Claim 8, Nighman discloses: A method for enhancing audio of a communication application (e.g. teleconference system; abstract, entire doc; for improving/enhancing audio signals; [0011], [0022], [0033], [0073], [0089], [0113]), the method comprising: memory that stores computer-executable instructions (e.g. software module residing in various memories/disks or other storage [0228]); obtaining a voice sample of a user (e.g. voice biometrics engine 330 can perform enrollment and verification phases to record and extract a number of features from a voice print [0118]); mapping the voice sample to an identifier associated with the user (e.g. voice biometrics engine 330 assigns unique acoustic speech signatures to each speaker; [0118]); receiving an audio mixture detected by an audio sensor (e.g. receiving audio inputs; [0074], [0098]; voice activity detector 320 identifies speech sources in the input signal; see also that the microphone(s) 140 detect sounds in the environment, convert the sounds to digital audio signals, and stream the audio signals to the processing core; [0084]); receive a selection via a teleconference application that identifies a portion of the audio mixture to enhance (e.g. user input; [0080]; user interface input data; [0086], [0087]; interaction with a GUI to input information; [0158], [0159]; initialization data 372 provided by manual input [0158], note 372 includes participant info; Fig. 3G; participant info includes information such as main speaker, authorized speaker, etc.; [0161]; note differentiating a primary person speaking from a noise source such as other talkers; [0089]; note that a noise source can be any noise source such as speech from another talker; [0090]; noise suppressor 345 implements custom noise suppression depending on the source; [0130]; talker-based personalization [0113]; and apply audio signal processing to the source separated data streams 322, 324, 326, where some or all of the signal processing operations can be customized, optimized, and/or personalized based on information relating to the source [0127]; see further EQ in [0129]; In essence, an administrator enters participant info through a user interface, the participant info corresponds to authorized speakers, and the system will then suppress other sources of speakers); modifying a representation of the audio mixture to include a flag that corresponds to the selection (e.g. insert flags in the point speech source stream 322, where each flag indicates a biometrically identified talker associated with a source in the enhanced point speech source stream; [0119]); and apply the modified representation of the audio mixture as an input into a machine learning model (e.g. identify and process flags in the stream corresponding to the talker that uttered the speech content [0121]; note that any of the engines in the speech source processor can apply machine learning or AI algorithms to tune or adapt; [0124]), wherein application of the modified representation of the audio mixture as the input to the machine learning model causes the machine learning model (e.g. implementing machine learning to improve recognition of unique speakers; [0124]; note that any of the blocks, including noise suppressor 345, can implement machine learning or AI algorithms; [0132]) to enhance a portion of the audio mixture corresponding to the selection (e.g. custom automatic gain control on a speaker-by-speaker basis; [0128]; see further the process of Fig. 7, and note output whether to include desired sources, such as one or more speech sources, and to exclude or attenuate other sources such as noise; [0144]; Thus, in some arrangements one speech source is included/amplified and the others excluded or attenuated; in the alternative, consider training AI models (e.g., neural network-based models) to identify and suppress noises... suppress noise specific to the deployed environment; [0134]; see also suppression of background noises; [0177]-[0179]).

Regarding Claim 9, claim 9 is directed to the method claim that corresponds to the system claimed in claim 2 and is rejected under the same grounds.

Regarding Claim 11, in addition to the elements stated above regarding claim 8, Nighman further discloses: wherein the portion of the audio mixture includes a background noise of the audio mixture or all noise of the audio mixture except a voice identified by the user identifier (e.g. Each of the input signals include a mixed source combined signal component S_comb, an echo component E, and a noise component N; [0196]; note further custom automatic gain control on a speaker-by-speaker basis; [0128]; see further the process of Fig. 7, and note output whether to include desired sources, such as one or more speech sources, and to exclude or attenuate other sources such as noise; [0144]; Thus, in some arrangements one speech source is included/amplified and the others excluded or attenuated).
Regarding Claim 12, claim 12 is directed to the method claim that corresponds to the system claimed in claim 4 and is rejected under the same grounds.

Regarding Claim 13, claim 13 is directed to the method claim that corresponds to the system claimed in claim 6 and is rejected under the same grounds.

Regarding Claim 14, claim 14 is directed to the computer-readable medium claim that corresponds to the system claimed in claim 6 and the method claimed in claim 1 and is rejected under the same grounds.

Regarding Claim 15, claim 15 is directed to the computer-readable medium claim that corresponds to the system claimed in claim 2 and is rejected under the same grounds.

Regarding Claim 17, claim 17 is directed to the computer-readable medium claim that corresponds to the system claimed in claim 6 and is rejected under the same grounds.

Regarding Claim 18, claim 18 is directed to the computer-readable medium claim that corresponds to the method claimed in claim 11 and is rejected under the same grounds.

Regarding Claim 19, in addition to the elements stated above regarding claim 14, Nighman further discloses: wherein the computer-executable instructions, when executed, further cause the computer system to suppress a second voice of a second user (e.g. custom automatic gain control on a speaker-by-speaker basis; [0128]; see further the process of Fig. 7, and note output whether to include desired sources, such as one or more speech sources, and to exclude or attenuate other sources such as noise; [0144]; Thus, in some arrangements one speech source is included/amplified and the others excluded or attenuated; in the alternative, consider training AI models (e.g., neural network-based models) to identify and suppress noises... suppress noise specific to the deployed environment; [0134]; see also suppression of background noises; [0177]-[0179]).

Regarding Claim 20, claim 20 is directed to the computer-readable medium claim that corresponds to the system claimed in claim 4 and is rejected under the same grounds.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claim(s) 3, 10 and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Nighman et al. (hereinafter Nighman, U.S. Patent Application Publication 2023/0115674).

Regarding Claim 3, in addition to the elements stated above regarding claim 1, Nighman further discloses: wherein the flag is indicating whether the selection corresponds to background noise suppression or all noise suppression except the voice identified by the user identifier (e.g. flags corresponding to the speaker; [0118]-[0121]; and custom automatic gain control on a speaker-by-speaker basis; [0128]; see further the process of Fig. 7, and note output whether to include desired sources, such as one or more speech sources, and to exclude or attenuate other sources such as noise; [0144]; Thus, in some arrangements one speech source is included/amplified and the others excluded or attenuated). Nighman fails to explicitly disclose that the flag is a binary bit. In a related field of endeavor (e.g. enhancing of audio, including using noise suppression, in a networked conference environment), Liu details using a neural network to process features of audio and derive and then output a binary flag to provide an indicator (see Figs. 2 and 3 and [0047]). Modifying the flags disclosed by Nighman to operate as binary flags as disclosed by Liu further makes obvious: the flag is a binary bit (e.g. Nighman’s flag, now configured to be a binary flag as disclosed by Liu’s Figs. 2 and 3 and [0047]). It would have been obvious to one of ordinary skill in the art at the time the invention was filed to apply the teachings of Liu to the system of Nighman. Doing so would have provided users of Nighman’s system with the techniques of the sound enhancement system 200 of Liu, which provides the advantage of performing high-fidelity audio processing and improves sound quality for both music and voice signals; see Liu [0035]. Further, given the substantial overlap of Nighman and Liu, e.g. both are directed to audio conferencing applications, improving audio quality, using neural networks/machine learning, and suppressing noise, integration of the various teachings from one to the other would have been seen as predictable to one of ordinary skill in the art.

Regarding Claim 10, claim 10 is directed to the method claim that corresponds to the system claimed in claim 3 and is rejected under the same grounds.

Regarding Claim 16, claim 16 is directed to the computer-readable medium claim that corresponds to the system claimed in claim 3 and is rejected under the same grounds.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Andrew C Flanders whose telephone number is (571)272-7516. The examiner can normally be reached M-F 8:30-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ANDREW C FLANDERS/
Supervisory Patent Examiner, Art Unit 2655
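For orientation, the independent-claim flow that the rejection maps onto Nighman (enroll a voice sample, attach a user identifier and a selection flag to the audio mixture's representation, then let the flag steer an ML suppression model between "suppress background noise" and "suppress all but the enrolled voice") can be sketched in a few lines. This is a hedged illustration only: every name below (ModifiedMixture, enroll_voice, DemoSuppressionModel) is hypothetical, and nothing here is taken from the application, Nighman, or Liu; the binary flag mirrors the claim 3 limitation addressed under §103.

```python
# Hypothetical sketch of the claim 1 / claim 3 flow as characterized in
# this office action. All names are illustrative stand-ins.
from dataclasses import dataclass

import numpy as np


@dataclass
class ModifiedMixture:
    """Claim 2's modified representation: user identifier, audio, and flag."""
    user_id: str
    audio: np.ndarray
    suppress_all_but_user: int  # claim 3's binary bit: 0 = background noise
                                # only, 1 = everything except the user's voice


def enroll_voice(voice_sample: np.ndarray) -> str:
    """Map a voice sample to a user identifier (the enrollment step)."""
    # Stand-in for voice biometrics; a real system would extract speaker
    # features rather than hash raw samples.
    return f"user-{hash(voice_sample.tobytes()) & 0xFFFF:04x}"


class DemoSuppressionModel:
    """Toy stand-in for the trained ML model; real behavior is learned."""

    def suppress_background(self, audio: np.ndarray) -> np.ndarray:
        return audio * 0.5  # placeholder: attenuate everything

    def isolate_voice(self, audio: np.ndarray, user_id: str) -> np.ndarray:
        return audio        # placeholder: pass the flagged voice through


def run_suppression(model, item: ModifiedMixture) -> np.ndarray:
    """Apply the modified representation as the model input; the flag
    selects between the two claimed behaviors."""
    if item.suppress_all_but_user:
        return model.isolate_voice(item.audio, item.user_id)
    return model.suppress_background(item.audio)


# Usage: enroll, flag the mixture per the user's selection, run the model.
user_id = enroll_voice(np.zeros(16000, dtype=np.float32))
mixture = ModifiedMixture(user_id, np.ones(16000, dtype=np.float32), 1)
cleaned = run_suppression(DemoSuppressionModel(), mixture)
```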

Prosecution Timeline

Sep 29, 2023
Application Filed
Aug 18, 2025
Non-Final Rejection — §102, §103, §112
Nov 11, 2025
Response Filed
Feb 25, 2026
Non-Final Rejection — §102, §103, §112
Mar 31, 2026
Examiner Interview Summary
Mar 31, 2026
Applicant Interview (Telephonic)

Precedent Cases

Applications granted by the same examiner in similar technology

Patent 12562160
ARBITRATION BETWEEN AUTOMATED ASSISTANT DEVICES BASED ON INTERACTION CUES
2y 5m to grant • Granted Feb 24, 2026

Patent 12547835
AUTOMATIC EXTRACTION OF SEMANTICALLY SIMILAR QUESTION TOPICS
2y 5m to grant • Granted Feb 10, 2026

Patent 12512089
TESTING CASCADED DEEP LEARNING PIPELINES COMPRISING A SPEECH-TO-TEXT MODEL AND A TEXT INTENT CLASSIFIER
2y 5m to grant • Granted Dec 30, 2025

Patent 12394416
DETECTING NEAR MATCHES TO A HOTWORD OR PHRASE
2y 5m to grant • Granted Aug 19, 2025

Patent 11328007
GENERATING A DOMAIN-SPECIFIC PHRASAL DICTIONARY
2y 5m to grant • Granted May 10, 2022
Based on this examiner's 5 most recent grants; study what changed in each prosecution before allowance.


Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 74%
With Interview: 88% (+14.0%)
Median Time to Grant: 3y 3m
PTA Risk: Moderate
Based on 775 resolved cases by this examiner. Grant probability derived from career allow rate.
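These projections follow directly from the career data quoted above, assuming (per the note) that grant probability equals the career allow rate and that the interview lift adds 14 percentage points. A quick check of that arithmetic:

```python
# Reproducing the projection arithmetic from the stated career data.
granted, resolved = 574, 775
allow_rate = granted / resolved               # ~0.7406 -> displayed as 74%
with_interview = allow_rate + 0.14            # +14.0 points -> ~0.8806, 88%

print(f"Grant probability: {allow_rate:.0%}")      # 74%
print(f"With interview:    {with_interview:.0%}")  # 88%
```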
