Prosecution Insights
Last updated: April 19, 2026
Application No. 18/189,764

Systems and Methods for Audio Preparation and Delivery

Status: Final Rejection (§103)
Filed: Mar 24, 2023
Examiner: PULLIAS, JESSE SCOTT
Art Unit: 2655
Tech Center: 2600 (Communications)
Assignee: Super Hi-Fi, LLC
OA Round: 4 (Final)

Grant Probability: 83% (Favorable)
Expected OA Rounds: 5-6
Time to Grant: 2y 8m
Grant Probability with Interview: 96%

Examiner Intelligence

Career Allow Rate: 83%, above average (873 granted / 1052 resolved; +21.0% vs TC avg)
Interview Lift: +13.0%, a moderate lift, measured across resolved cases with an interview versus without
Typical Timeline: 2y 8m average prosecution; 47 applications currently pending
Career History: 1099 total applications across all art units
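
The headline rate follows directly from the counts shown above: allow rate is simply grants divided by resolved cases. A one-line check in Python against the displayed figures:

```python
# Career allow rate from the dashboard figures above.
granted, resolved = 873, 1052
print(f"{granted / resolved:.1%}")  # 83.0%, matching the displayed 83%
```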

Statute-Specific Performance

§101: 15.0% (-25.0% vs TC avg)
§103: 50.4% (+10.4% vs TC avg)
§102: 19.7% (-20.3% vs TC avg)
§112: 4.9% (-35.1% vs TC avg)
Tech Center averages are estimates. Based on career data from 1052 resolved cases.

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

This office action is in response to correspondence of 02/25/26 regarding application 18/189,764, in which claims 1, 14, and 21 were amended. Claims 1, 4-10, 14, and 16-26 are pending and have been considered.

Response to Arguments

Applicant's arguments on pages 14-16 regarding the 35 U.S.C. 103 rejections based on Nighman, Zhang, and Jasinski are moot in view of the new grounds for rejection, based in part on the newly discovered reference to Mitcheltree et al. ("SerumRNN: Step by Step Audio VST Effect Programming", arXiv:2104.03876v1 [cs.SD], 8 Apr 2021), which discloses analyzing audio features with machine learning to determine a sequence of effects to apply and their parameters. The new grounds for rejection based in part on the newly discovered reference to Mitcheltree are necessitated by Applicant's claim amendments.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 5, 8, 9, 14, 16, 17, 20-23, 25, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Nighman et al. (US 20230115674) in view of Zhang et al. (US 20170330579), in further view of Mitcheltree et al. ("SerumRNN: Step by Step Audio VST Effect Programming", arXiv:2104.03876v1 [cs.SD], 8 Apr 2021).

Consider claim 1: Nighman discloses an audio preparation and delivery system (a source processing system for delivering enhanced audio, [0074], [0138]), comprising: a controller having at least one processor and a memory, wherein the at least one processor executes program instructions stored in the memory so as to carry out operations (one or more processors 112 and memory 123, [0075], which execute software, [0227]), the operations comprising: receiving source audio (microphones receive speech from four people, [0091], Fig. 2B, having speech segments such as in Fig. 5, [0181]); analyzing the source audio (classifying speech signals by source using machine learning, [0114]); dynamically arranging, based on the analysis and using a trained machine learning model, a processing chain, wherein the processing chain comprises a plurality of audio processing modules (applying audio signal processing, which includes a chain of AGC, EQ, Noise Suppressor, and DR Compressor, dynamically configured using a trained machine learning algorithm for custom processing profiles based on the source classification, to the source separated data streams, [0127-0132], Fig. 3E); adjusting, by way of the processing chain, at least a portion of the source audio (applying audio signal processing, which includes the custom processing profile for each source, to the source separated data streams, [0127-0132]); and providing output audio based on the adjusted portion of source audio (output audio streams are deployed to output devices and loudspeakers, [0142]-[0145], Fig. 3F).

Nighman does not specifically mention analyzing the source audio based on a predetermined audio quality standard, or output audio that meets or exceeds the predetermined quality standard. Zhang discloses analyzing a source audio based on a predetermined audio quality standard (analyzing a processing quality of the audio data, step S103, Figure 1, [0037-0038]); and wherein the output audio meets or exceeds the predetermined quality standard (optimizing the audio processing if the quality does not reach a preset quality standard, step S104, Figure 1, [0039-0040], for example, output signal-to-noise ratio, [0053]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Nighman by analyzing the source audio based on a predetermined audio quality standard, with output audio that meets or exceeds the predetermined quality standard, in order to improve audio quality of voice calls and audio chats, as suggested by Zhang ([0003]). Doing so would have led to predictable results of improving end-user satisfaction with internet communications, as suggested by Zhang ([0003]). The references cited are analogous art in the same field of audio processing.

Nighman and Zhang do not specifically mention that the trained machine learning model is trained to determine an order of the plurality of audio processing modules based on the analysis, or that dynamically arranging the processing chain comprises determining an order of the plurality of audio processing modules using the trained machine learning model. Mitcheltree discloses a trained machine learning model trained to determine an order of the plurality of audio processing modules based on the analysis, wherein dynamically arranging the processing chain comprises determining an order of the plurality of audio processing modules using the trained machine learning model (the effect selection model is trained on unique sequences consisting of one to five steps of applied effects, Section 3.3 "Training", page 7; the Effect Selection model dynamically determines which effect should be applied next using Mel spectrograms, MFCCs, and one-hot vectors representing the sequence of previously applied effects, Section 3.2 "Effect Selection Model", pages 6-7).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Nighman and Zhang such that the trained machine learning model is trained to determine an order of the plurality of audio processing modules based on the analysis, and dynamically arranging the processing chain comprises determining an order of the plurality of audio processing modules using the trained machine learning model, in order to assist sound designers in the music industry with the difficult task of sound design, as suggested by Mitcheltree (Section 1, page 1). Doing so would have predictably assisted sound designers with a task that has a very steep learning curve, often requires years of experience, and offers limited educational tools, as suggested by Mitcheltree (Section 1, page 1). The references cited are analogous art in the same field of audio processing.

Consider claim 14: Nighman discloses a method of adjusting source audio (EQ adjusts the volume of different frequency bands for each audio source, [0129]), the method comprising: receiving source audio (microphones receive speech from four people, [0091], Fig. 2B, having speech segments such as in Fig. 5, [0181]); analyzing the source audio (classifying speech signals by source using machine learning, [0114]); a plurality of audio processing modules to at least partially define a processing chain (applying audio signal processing, which includes a chain of AGC, EQ, Noise Suppressor, and DR Compressor, dynamically configured using a trained machine learning algorithm for custom processing profiles based on the source classification, to the source separated data streams, [0127-0132], Fig. 3E); adjusting, by way of the processing chain, at least a portion of the source audio (applying audio signal processing, which includes the custom processing profile for each source, to the source separated data streams, [0127-0132]); and providing output audio based on the adjusted portion of source audio (output audio streams are deployed to output devices and loudspeakers, [0142]-[0145], Fig. 3F).

Nighman does not specifically mention analyzing the source audio based on a predetermined audio quality standard, or output audio that meets or exceeds the predetermined quality standard. Zhang discloses analyzing a source audio based on a predetermined audio quality standard (analyzing a processing quality of the audio data, step S103, Figure 1, [0037-0038]); and wherein the output audio meets or exceeds the predetermined quality standard (optimizing the audio processing if the quality does not reach a preset quality standard, step S104, Figure 1, [0039-0040], for example, output signal-to-noise ratio, [0053]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Nighman by analyzing the source audio based on a predetermined audio quality standard, with output audio that meets or exceeds the predetermined quality standard, for reasons similar to those for claim 1.

Nighman and Zhang do not specifically mention dynamically ordering, based on the analysis and using a trained machine learning model, a plurality of audio processing modules, wherein the trained machine learning model is trained to determine an order of the plurality of audio processing modules based on the analysis.
Mitcheltree discloses dynamically ordering, based on the analysis and using a trained machine learning model, a plurality of audio processing modules, wherein the trained machine learning model is trained to determine an order of the plurality of audio processing modules based on the analysis (the effect selection model is trained on unique sequences consisting of one to five steps of applied effects, Section 3.3 "Training", page 7; the Effect Selection model dynamically determines which effect should be applied next using Mel spectrograms, MFCCs, and one-hot vectors representing the sequence of previously applied effects, Section 3.2 "Effect Selection Model", pages 6-7). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Nighman and Zhang by dynamically ordering, based on the analysis and using a trained machine learning model, a plurality of audio processing modules, wherein the trained machine learning model is trained to determine an order of the plurality of audio processing modules based on the analysis, for reasons similar to those for claim 1.

Consider claim 21: Nighman discloses an audio preparation and delivery system (a source processing system for delivering enhanced audio, [0074], [0138]), comprising: a controller having at least one processor and a memory, wherein the at least one processor executes program instructions stored in the memory so as to carry out operations (one or more processors 112 and memory 123, [0075], which execute software, [0227]), the operations comprising: receiving source audio (microphones receive speech from four people, [0091], Fig. 2B, having speech segments such as in Fig. 5, [0181]); analyzing the source audio (classifying speech signals by source using machine learning, [0114]); dynamically arranging, based on the analysis and using a trained machine learning model, a processing chain, wherein the processing chain comprises a plurality of audio processing modules (applying audio signal processing, which includes a chain of AGC, EQ, Noise Suppressor, and DR Compressor, dynamically configured using a trained machine learning algorithm for custom processing profiles based on the source classification, to the source separated data streams, [0127-0132], Fig. 3E); adjusting, by way of the processing chain, at least a portion of the source audio (applying audio signal processing, which includes the custom processing profile for each source, to the source separated data streams, [0127-0132]); and providing output audio based on the adjusted portion of source audio (output audio streams are deployed to output devices and loudspeakers, [0142]-[0145], Fig. 3F).

Nighman does not specifically mention analyzing the source audio based on a predetermined audio quality standard, or output audio that meets or exceeds the predetermined quality standard. Zhang discloses analyzing a source audio based on a predetermined audio quality standard (analyzing a processing quality of the audio data, step S103, Figure 1, [0037-0038]); and wherein the output audio meets or exceeds the predetermined quality standard (optimizing the audio processing if the quality does not reach a preset quality standard, step S104, Figure 1, [0039-0040], for example, output signal-to-noise ratio, [0053]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Nighman by analyzing the source audio based on a predetermined audio quality standard, with output audio that meets or exceeds the predetermined quality standard, in order to improve audio quality of voice calls and audio chats, as suggested by Zhang ([0003]). Doing so would have led to predictable results of improving end-user satisfaction with internet communications, as suggested by Zhang ([0003]). The references cited are analogous art in the same field of audio processing.

Nighman and Zhang do not specifically mention wherein the trained machine learning model is trained to determine an order of the plurality of audio processing modules based on the analysis, and wherein dynamically arranging the processing chain comprises selecting one or more audio processing modules for inclusion in the plurality of audio processing modules using the trained machine learning model. Mitcheltree discloses a trained machine learning model trained to determine an order of the plurality of audio processing modules based on the analysis, wherein dynamically arranging the processing chain comprises selecting one or more audio processing modules for inclusion in the plurality of audio processing modules using the trained machine learning model (the effect selection model is trained on unique sequences consisting of one to five steps of applied effects, Section 3.3 "Training", page 7; the Effect Selection model dynamically determines which effect should be applied next using Mel spectrograms, MFCCs, and one-hot vectors representing the sequence of previously applied effects, Section 3.2 "Effect Selection Model", pages 6-7). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Nighman and Zhang such that the trained machine learning model is trained to determine an order of the plurality of audio processing modules based on the analysis, and wherein dynamically arranging the processing chain comprises selecting one or more audio processing modules for inclusion in the plurality of audio processing modules using the trained machine learning model, for reasons similar to those for claim 1.

Consider claim 4: Nighman discloses wherein at least a portion of the audio processing modules are configured to apply a trained machine learning model to adjust the portion of source audio (classifying speech signals by source using machine learning, [0114]; recognizing unique speakers using a trained biometric algorithm, [0124]-[0125]; applying audio signal processing, which includes a custom equalization profile, to the source separated data streams, [0127-0129]).
Consider claim 5: Nighman discloses the plurality of audio processing modules further comprises at least one of: a noise reduction module (filtering or reducing noise content, [0090]); a timbre management module (EQ 344 applies settings for the specific timbre of speakers, [0129]); a de-essing module; a plosive reduction module; a voice profiling module (custom voice profiles for EQ, [0130]); a dynamic compression module (talker-based compression, [0113]); a silence trimming module; an adaptive limiting module; a speaker extraction module (extracting the sound sources, [0097], by separating speakers, [0101]); a selective excitation module; a channel selection module (mixer/control engine, Fig. 3F, [0142-0143]); a breath reduction module; an artifact reduction module; a gain optimization module (automatic gain controller, [0127]); a spectral reconstruction module; a spectral equalizer module (EQ 344 adjusts the volume of different frequency bands, i.e. the frequency spectrum, [0129]); a spatial audio module (spatial filtering, [0084]); an upsampling module; a reverb module; a de-reverb module (reducing reverberation, [0098]); a de-clipping module; a de-muxing module; and a batch processing module (noting that the claim language requires "at least one of").

Consider claim 8: Nighman discloses one or more pre-processing modules, wherein the pre-processing modules comprise at least one of: a file format conversion module; a text-to-speech module; a speech-to-text module (speech-to-text, [0123]); an annotation module (part-of-speech tagging, [0123]); a mono-to-stereo conversion module; a stereo-to-mono conversion module; a multi-track-to-stereo conversion module; a source audio file generation module; a voice analysis/profiling module; a noise profiling module (recognizing noises such as diffuse fan or air conditioner noise, [0105]); and a diarization module (noting the claim language requires "at least one of").

Consider claim 9: Nighman discloses the audio preparation and delivery system comprises at least one of: a private cloud computing server system or a public cloud computing server (cloud services and storage connected to the processing core, [0076], [0077]), wherein the private cloud computing server system and the public cloud computing server comprise distributed cloud data storage and distributed cloud computing capacity (cloud services, i.e. computing capacity, and storage connected to, i.e. distributed from, the processing core, [0076], [0077]).

Consider claim 16: Nighman discloses wherein at least a portion of the audio processing modules are configured to apply a trained machine learning model to adjust the portion of source audio (classifying speech signals by source using machine learning, [0114]; recognizing unique speakers using a trained biometric algorithm, [0124]-[0125]; applying audio signal processing, which includes a custom equalization profile, to the source separated data streams, [0127-0129]).
Consider claim 17: Nighman discloses the plurality of audio processing modules further comprises at least one of: a noise reduction module (filtering or reducing noise content, [0090]); a timbre management module (EQ 344 applies settings for the specific timbre of speakers, [0129]); a de-essing module; a plosive reduction module; a voice profiling module (custom voice profiles for EQ, [0130]); a dynamic compression module (talker-based compression, [0113]); a silence trimming module; an adaptive limiting module; a speaker extraction module (extracting the sound sources, [0097], by separating speakers, [0101]); a selective excitation module; a channel selection module (mixer/control engine, Fig. 3F, [0142-0143]); a breath reduction module; an artifact reduction module; a gain optimization module (automatic gain controller, [0127]); a spectral reconstruction module; a spectral equalizer module (EQ 344 adjusts the volume of different frequency bands, i.e. the frequency spectrum, [0129]); a spatial audio module (spatial filtering, [0084]); an upsampling module; a reverb module; a de-reverb module (reducing reverberation, [0098]); a de-clipping module; a de-muxing module; and a batch processing module (noting that the claim language requires "at least one of").

Consider claim 20: Nighman discloses one or more pre-processing modules, wherein the pre-processing modules comprise at least one of: a file format conversion module; a text-to-speech module; a speech-to-text module (speech-to-text, [0123]); an annotation module (part-of-speech tagging, [0123]); a mono-to-stereo conversion module; a stereo-to-mono conversion module; a multi-track-to-stereo conversion module; a source audio file generation module; a voice analysis/profiling module; a noise profiling module (recognizing noises such as diffuse fan or air conditioner noise, [0105]); and a diarization module (noting the claim language requires "at least one of").

Consider claim 22: Nighman discloses the plurality of audio processing modules further comprises at least one of: a noise reduction module (filtering or reducing noise content, [0090]); a timbre management module (EQ 344 applies settings for the specific timbre of speakers, [0129]); a de-essing module; a plosive reduction module; a voice profiling module (custom voice profiles for EQ, [0130]); a dynamic compression module (talker-based compression, [0113]); a silence trimming module; an adaptive limiting module; a speaker extraction module (extracting the sound sources, [0097], by separating speakers, [0101]); a selective excitation module; a channel selection module (mixer/control engine, Fig. 3F, [0142-0143]); a breath reduction module; an artifact reduction module; a gain optimization module (automatic gain controller, [0127]); a spectral reconstruction module; a spectral equalizer module (EQ 344 adjusts the volume of different frequency bands, i.e. the frequency spectrum, [0129]); a spatial audio module (spatial filtering, [0084]); an upsampling module; a reverb module; a de-reverb module (reducing reverberation, [0098]); a de-clipping module; a de-muxing module; and a batch processing module (noting that the claim language requires "at least one of").
Consider claim 23: Nighman discloses wherein at least one of the plurality of audio processing modules is configured to apply a trained machine learning model to adjust the portion of source audio (classifying speech signals by source using machine learning, [0114]; recognizing unique speakers using a trained biometric algorithm, [0124]-[0125]; applying audio signal processing, which includes a custom equalization profile, to the source separated data streams, [0127-0129]).

Consider claim 25: Nighman and Zhang do not disclose, but Mitcheltree discloses, dynamically arranging the processing chain comprising selecting one or more audio processing modules for inclusion in the plurality of audio processing modules using the trained machine learning model (the Effect Selection model dynamically determines, i.e. selects for inclusion, which effect should be applied next using Mel spectrograms, MFCCs, and one-hot vectors representing the sequence of previously applied effects, Section 3.2 "Effect Selection Model", pages 6-7). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Nighman and Zhang such that dynamically arranging the processing chain comprises selecting one or more audio processing modules for inclusion in the plurality of audio processing modules using the trained machine learning model, for reasons similar to those for claim 1.

Consider claim 26: Nighman and Zhang do not disclose, but Mitcheltree discloses, dynamically selecting, based on the analysis and using a trained machine learning model, an audio processing module for inclusion in the plurality of audio processing modules (the Effect Selection model dynamically determines, i.e. selects for inclusion, which effect should be applied next using Mel spectrograms, MFCCs, and one-hot vectors representing the sequence of previously applied effects, Section 3.2 "Effect Selection Model", pages 6-7). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Nighman and Zhang by dynamically selecting, based on the analysis and using a trained machine learning model, an audio processing module for inclusion in the plurality of audio processing modules, for reasons similar to those for claim 1.

Claims 6, 18, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Nighman et al. (US 20230115674) in view of Zhang et al. (US 20170330579), in further view of Mitcheltree et al. ("SerumRNN: Step by Step Audio VST Effect Programming", arXiv:2104.03876v1 [cs.SD], 8 Apr 2021), in further view of Horton et al. (US 20200411013).

Consider claim 6: Nighman discloses the plurality of audio processing modules comprises a diarization module, wherein the diarization module is configured to: determine, based on the vocal portion, a plurality of distinct speakers (first talker speech and second talker speech S1 and S4, Fig. 5, [0177]); annotate portions of the vocal portion that represent the respective distinct speakers (see audio stream labels, Fig. 5, [0177]); and provide diary metadata, wherein the diary metadata comprises information indicative of the distinct speakers of the annotated portions of the vocal portion (metadata indicating source speakers, [0175], Fig. 5). Nighman, Zhang, and Mitcheltree do not specifically mention providing a speaker-specific audio file for each distinct speaker.
Horton discloses providing a speaker-specific audio file for each distinct speaker (generate audio files specific to individual speakers in the audio data, [0045]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Nighman, Zhang, and Mitcheltree by providing a speaker-specific audio file for each distinct speaker in order to improve caller identification, as suggested by Horton ([0002]), predictably improving monitoring of calls in secure facilities, as suggested by Horton ([0002]). The references cited are analogous art in the same field of speaker identification.

Consider claim 18: Nighman discloses the plurality of audio processing modules comprises a diarization module, wherein the diarization module is configured to: determine, based on the vocal portion, a plurality of distinct speakers (first talker speech and second talker speech S1 and S4, Fig. 5, [0177]); annotate portions of the vocal portion that represent the respective distinct speakers (see audio stream labels, Fig. 5, [0177]); and provide diary metadata, wherein the diary metadata comprises information indicative of the distinct speakers of the annotated portions of the vocal portion (metadata indicating source speakers, [0175], Fig. 5). Nighman, Zhang, and Mitcheltree do not specifically mention providing a speaker-specific audio file for each distinct speaker. Horton discloses providing a speaker-specific audio file for each distinct speaker (generate audio files specific to individual speakers in the audio data, [0045]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Nighman, Zhang, and Mitcheltree by providing a speaker-specific audio file for each distinct speaker for reasons similar to those for claim 6.

Consider claim 24: Nighman discloses the plurality of audio processing modules comprises a diarization module, wherein the diarization module is configured to: determine, based on the vocal portion, a plurality of distinct speakers (first talker speech and second talker speech S1 and S4, Fig. 5, [0177]); annotate portions of the vocal portion that represent the respective distinct speakers (see audio stream labels, Fig. 5, [0177]); and provide diary metadata, wherein the diary metadata comprises information indicative of the distinct speakers of the annotated portions of the vocal portion (metadata indicating source speakers, [0175], Fig. 5). Nighman, Zhang, and Mitcheltree do not specifically mention providing a speaker-specific audio file for each distinct speaker. Horton discloses providing a speaker-specific audio file for each distinct speaker (generate audio files specific to individual speakers in the audio data, [0045]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Nighman, Zhang, and Mitcheltree by providing a speaker-specific audio file for each distinct speaker for reasons similar to those for claim 6.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Nighman et al. (US 20230115674) in view of Zhang et al. (US 20170330579), in further view of Mitcheltree et al. ("SerumRNN: Step by Step Audio VST Effect Programming", arXiv:2104.03876v1 [cs.SD], 8 Apr 2021), in further view of Khoury et al. (US 20210326421).
Consider claim 10: Nighman, Zhang, and Mitcheltree do not disclose, but Khoury discloses, the trained machine learning model comprising at least one of: a convolutional neural network (CNN), a long short-term memory (LSTM) algorithm, or a WaveNet (trained CNN, [0118], [0119], noting the claim language requires "at least one of"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Nighman, Zhang, and Mitcheltree such that the trained machine learning model comprises at least one of: a convolutional neural network (CNN), a long short-term memory (LSTM) algorithm, or a WaveNet in order to increase reliability of speaker identification, as suggested by Khoury ([0008]), predictably improving accuracy and security, as suggested by Khoury ([0008]). The references cited are analogous art in the same field of speaker identification.

Claims 7 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Nighman et al. (US 20230115674) in view of Zhang et al. (US 20170330579), in further view of Mitcheltree et al. ("SerumRNN: Step by Step Audio VST Effect Programming", arXiv:2104.03876v1 [cs.SD], 8 Apr 2021), in further view of Horton et al. (US 20200411013), in further view of Aggarwal et al. (US 11895371).

Consider claim 7: Nighman discloses adjusting the at least a portion of the source audio (applying audio signal processing, which includes the custom equalization profile for each source, to the source separated data streams, [0127-0129]). Nighman, Zhang, and Mitcheltree do not specifically mention speaker-specific audio files. Horton discloses speaker-specific audio files (generate audio files specific to individual speakers in the audio data, [0045]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Nighman, Zhang, and Mitcheltree by including speaker-specific audio files in order to improve caller identification, as suggested by Horton ([0002]), predictably improving monitoring of calls in secure facilities, as suggested by Horton ([0002]). Nighman, Zhang, Mitcheltree, and Horton do not specifically mention smoothing a perimeter portion of each speaker-specific audio file, or adjusting each speaker-specific audio file separately, wherein the output audio comprises a reassembled version of each adjusted speaker-specific audio file. Aggarwal discloses smoothing a perimeter portion of each file (smoothing by modifying start and end times of the clips, Col 19 lines 49-63); and adjusting each file separately, wherein the output audio comprises a reassembled version of each adjusted file (smoothing by modifying start and end times of the clips, the output including audio clips such as "You can't handle the truth!", Col 19 lines 49-63; output as a media segment, Col 29 lines 55-65).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Nighman, Zhang, Mitcheltree, and Horton by smoothing a perimeter portion, as in Aggarwal, of each speaker-specific audio file of Horton, and adjusting, as in Aggarwal, each speaker-specific audio file of Horton separately, wherein the combination results in output audio comprising a reassembled version of each adjusted speaker-specific audio file, in order to handle the vast assortment of media content provided by service providers, as suggested by Aggarwal (Col 1 lines 6-24), predictably reducing required resources, time, costs, and computing resources associated with generating media segments, as suggested by Aggarwal (Col 1 lines 6-24). The references cited are analogous art in the same field of speech enhancement.

Consider claim 19: Nighman discloses adjusting the at least a portion of the source audio (applying audio signal processing, which includes the custom equalization profile for each source, to the source separated data streams, [0127-0129]). Nighman, Zhang, and Mitcheltree do not specifically mention speaker-specific audio files. Horton discloses speaker-specific audio files (generate audio files specific to individual speakers in the audio data, [0045]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Nighman, Zhang, and Mitcheltree by including speaker-specific audio files for reasons similar to those for claim 7. Nighman, Zhang, Mitcheltree, and Horton do not specifically mention smoothing a perimeter portion of each speaker-specific audio file, or adjusting each speaker-specific audio file separately, wherein the output audio comprises a reassembled version of each adjusted speaker-specific audio file. Aggarwal discloses smoothing a perimeter portion of each file (smoothing by modifying start and end times of the clips, Col 19 lines 49-63); and adjusting each file separately, wherein the output audio comprises a reassembled version of each adjusted file (smoothing by modifying start and end times of the clips, the output including audio clips such as "You can't handle the truth!", Col 19 lines 49-63; output as a media segment, Col 29 lines 55-65). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Nighman, Zhang, Mitcheltree, and Horton by smoothing a perimeter portion, as in Aggarwal, of each speaker-specific audio file of Horton, and adjusting, as in Aggarwal, each speaker-specific audio file of Horton separately, wherein the combination results in output audio comprising a reassembled version of each adjusted speaker-specific audio file, for reasons similar to those for claim 7.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias, whose telephone number is 571/270-5135. The examiner can normally be reached M-F, 8:00 AM - 4:30 PM. The examiner's fax number is 571/270-6135.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at 571/272-7516.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Jesse S Pullias/
Primary Examiner, Art Unit 2655
03/09/26
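
The newly cited Mitcheltree reference carries the weight of this rejection: a model that, given features of the current audio and a one-hot record of previously applied effects, selects the next effect in the chain. For orientation, here is a minimal, hypothetical sketch of that selection loop in Python. The module names, feature extraction, and linear scoring function are stand-ins invented for illustration; they are not code from Mitcheltree, Nighman, or the application.

```python
import numpy as np

# Hypothetical module names; the cited art describes chains such as AGC, EQ,
# noise suppression, and compression (Nighman) or VST effects (Mitcheltree).
MODULES = ["noise_reduction", "equalizer", "compressor", "de_reverb", "limiter"]

def extract_features(audio):
    """Toy stand-in for Mel-spectrogram/MFCC features: log power in
    eight coarse frequency bands."""
    spectrum = np.abs(np.fft.rfft(audio)) ** 2
    bands = np.array_split(spectrum, 8)
    return np.log1p(np.array([band.mean() for band in bands]))

def score_next_effect(features, history, weights):
    """Stand-in for a trained effect-selection model: a single linear layer
    over the concatenated audio features and effect history, producing one
    score per candidate module."""
    return weights @ np.concatenate([features, history])

def arrange_chain(audio, weights, max_steps=5):
    """Iteratively pick the next module, mirroring the one-to-five-step
    effect sequences the office action attributes to Mitcheltree."""
    history = np.zeros(len(MODULES))  # one-hot record of applied effects
    chain = []
    for _ in range(max_steps):
        scores = score_next_effect(extract_features(audio), history, weights)
        scores[history > 0] = -np.inf  # do not apply the same module twice
        idx = int(np.argmax(scores))
        chain.append(MODULES[idx])
        history[idx] = 1.0
        # A real system would apply MODULES[idx] to `audio` here and
        # re-extract features before selecting the next effect.
    return chain

# Demo with random weights standing in for trained parameters.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)  # one second of noise at 16 kHz
weights = rng.standard_normal((len(MODULES), 8 + len(MODULES)))
print(arrange_chain(audio, weights))
```

The inner loop is what the amended claims call "dynamically arranging" the processing chain: module selection depends on both the evolving audio features and the sequence applied so far, as opposed to a fixed AGC-EQ-compressor ordering.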

Prosecution Timeline

Mar 24, 2023: Application Filed
Dec 06, 2024: Non-Final Rejection (§103)
Apr 11, 2025: Response Filed
May 02, 2025: Final Rejection (§103)
May 20, 2025: Applicant Interview (Telephonic)
May 20, 2025: Examiner Interview Summary
Aug 06, 2025: Request for Continued Examination
Aug 07, 2025: Response after Non-Final Action
Sep 12, 2025: Non-Final Rejection (§103)
Feb 25, 2026: Response Filed
Mar 09, 2026: Final Rejection (§103)
Apr 14, 2026: Examiner Interview Summary
Apr 14, 2026: Applicant Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596885: Automatically Labeling Items using a Machine-Trained Language Model (2y 5m to grant; granted Apr 07, 2026)
Patent 12573378: Speech Tendency Classification (2y 5m to grant; granted Mar 10, 2026)
Patent 12572740: Multi-Language Document Field Extraction (2y 5m to grant; granted Mar 10, 2026)
Patent 12566929: Combining Data Selection and Reward Functions for Tuning Large Language Models Using Reinforcement Learning (2y 5m to grant; granted Mar 03, 2026)
Patent 12536389: Translation System (2y 5m to grant; granted Jan 27, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 83%
With Interview: 96% (+13.0%)
Median Time to Grant: 2y 8m
PTA Risk: High
Based on 1052 resolved cases by this examiner. Grant probability derived from career allow rate.
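
The "With Interview" projection appears to combine the career allow rate with the interview lift additively; the displayed numbers are consistent with that simple model, though the tool's actual methodology is not disclosed, so treat this as an assumption:

```python
# Assumed additive model: career allow rate plus interview lift.
base_grant_probability = 0.83  # career allow rate
interview_lift = 0.13          # lift observed in resolved cases with interview
print(f"{base_grant_probability + interview_lift:.0%}")  # 96%, matching the projection
```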
