DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-15, 29-32 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-7, 9, 10, 13-15, 29-32 is/are rejected under 35 U.S.C. 103 as being unpatentable over Woodruff U.S. PAP 2019/0206417 A1, in view of Edwards U.S. PAP 2013/0322645 A1.
Regarding claim 1 Woodruff teaches an audio signal processing apparatus comprising one or more processors and one or more memories storing instructions that are operable, when executed by the one or more processors (audio separation system 600 may include a processor 610, a memory 620, one or more acoustic sensors 630, an audio processing system 640, and an output device 650, see par. [0045]), to cause the audio signal processing apparatus to:
input an audio signal sample associated with at least one audio capture device to an audio source feature separation model that is configured to generate one or more isolate source audio features from the audio signal sample (an audio input 418 containing audio signals of various sound categories can be fed to a feature extraction module 460 to generate signal features in the frequency domain, see par. [0036]);
input the isolate source audio features subset to an audio generation model that is configured to generate a target source generated audio sample (The deep neural network 450 (trained during the offline process described above) receives the signal features and generates a set of time-varying filters 470 (i.e., frequency masks). The time-varying filters 470 filter the audio signals to generate separated audio signals 480 of various sound content categories, see par. [0036]);
and output the target source generated audio sample to one or more audio output devices (The time-varying filters 470 filter the audio signals to generate separated audio signals 480 of various sound content categories, which may include the target signal (e.g., an audio signal of a target sound content category such as speech), see par. [0036]).
However, Woodruff does not teach determining, based on the one or more isolate source audio features, an isolate source audio features subset comprising those isolate source audio features of the one or more isolate source audio features associated with a target source.
In the same field of endeavor, Edwards teaches receiving an audio signal that includes audio from several audio subsets having different source characteristics and originating from different sources. The audio signal is separated into several segments of audio data. The segments are compared to audio fingerprints. A second segment corresponding to a second subset of the several subsets of audio data is separated from a first segment corresponding to a first subset of the several subsets of the audio signal, based on the comparison of the segments to the audio fingerprints, see abstract. The separate sources of audio data within the audio signal can thereby be isolated effectively, allowing the digital data to be analyzed to isolate particular patterns and desirable masks to be applied, by separating the second segment corresponding to the second subset from the first segment corresponding to the first subset, based on the comparison of the segments to the audio fingerprints, see par. [0004].
It would have been obvious to one of ordinary skill in the art to combine the Woodruff invention with the teachings of Edwards in order to identify patterns within an audio source, because there is a need for tools that can analyze digital data to isolate particular patterns so that desirable masks and/or filters can be applied to the digital data, see par. [0004].
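By way of illustration only, the following sketch outlines one possible arrangement of the combination articulated above: a mask-based separation stage of the kind described by Woodruff, followed by a fingerprint-style comparison of the kind described by Edwards to select the target-source subset. The sketch is not taken from either reference; every function name, interface, and parameter below is a hypothetical assumption.

# Illustrative sketch only; not taken from Woodruff or Edwards. "mask_model" and
# "target_fingerprint" are hypothetical stand-ins for a trained separation model
# and a stored fingerprint of the target source (length frame_len // 2 + 1).
import numpy as np

def separate_and_select_target(audio, mask_model, target_fingerprint,
                               frame_len=512, hop=256):
    """Separate a mixture into per-category signals, then keep the best target match."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    window = np.hanning(frame_len)

    # Frequency-domain features (a magnitude STFT stands in for the feature
    # extraction described in Woodruff par. [0036]).
    spec = np.stack([np.fft.rfft(window * audio[i * hop:i * hop + frame_len])
                     for i in range(n_frames)])                   # (frames, bins)

    # Assumed model interface: returns one time-varying mask (values in [0, 1],
    # same shape as the spectrogram) per sound content category.
    masks = mask_model(np.abs(spec))                               # {category: mask}

    separated = {}
    for category, mask in masks.items():
        masked = mask * spec
        out = np.zeros(len(audio))
        for i in range(n_frames):                                  # overlap-add resynthesis
            out[i * hop:i * hop + frame_len] += np.fft.irfft(masked[i], n=frame_len)
        separated[category] = out

    def signature(x):
        # Coarse spectral signature used as a stand-in "fingerprint" for an
        # Edwards-style comparison of segments to stored fingerprints.
        frames = [np.abs(np.fft.rfft(window * x[i * hop:i * hop + frame_len]))
                  for i in range(n_frames)]
        mag = np.mean(frames, axis=0)
        return mag / (np.linalg.norm(mag) + 1e-12)

    # Keep the separated signal whose signature best matches the target fingerprint.
    scores = {c: float(signature(s) @ target_fingerprint) for c, s in separated.items()}
    best = max(scores, key=scores.get)
    return separated[best], best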
Regarding claim 2 Woodruff teaches the audio signal processing apparatus of claim 1, wherein each isolate source audio feature is associated with a targeted portion of the audio signal sample and the audio generation model is configured to generate the target source generated audio sample from the one or more isolate source audio features (time segments of live audio 418 that contain only one of the pre-defined content categories can be identified, see par. [0037]).
Regarding claim 3 Woodruff teaches the audio signal processing apparatus of claim 1, the one or more memories storing instructions that are operable, when executed by the one or more processors, to further cause the audio signal processing apparatus to:
input the one or more isolate source audio features and one or more isolate source audio components to an audio source feature classification model configured to classify the one or more isolate source audio features into one or more audio source categories (one type of output of a disclosed audio separation system can be multiple channels of audio streams for different content categories, see par. [0021]);
and based on determining that the one or more audio source categories are associated with a target source, input the one or more audio source categories to the audio generation model, wherein the audio generation model is configured to generate the target source generated audio sample based at least in part on one or more of the one or more audio source categories of the one or more isolate source audio features or isolate source audio components (At step 230, the audio separation system separates the audio signal into a plurality of content-based (i.e., category specific) audio signals by applying the time-varying filters to the audio signal. Each of the content-based (i.e., category specific) audio signals contains content of a corresponding sound content category among the plurality of sound content categories for which the system has been trained see par. [0028]).
Regarding claim 4 Woodruff teaches the audio signal processing apparatus of claim 3, wherein the one or more audio source categories include at least a desired or targeted audio source category and undesired audio source category (the disclosed technology is capable of separating the speech content (e.g., from both talkers) from the music content (e.g., from the entire jazz trio) and from other ambient sounds, see par. [0016]).
Regarding claim 5 Woodruff teaches the audio signal processing apparatus of claim 3, wherein one or more of the isolate source audio feature or the isolate source audio component is classified into the one or more audio source categories based at least in part on at least one of an associated time domain signal, frequency domain signal, signal coordinate, signal class, or signal confidence (Each of the time-varying filters corresponds to one of a plurality of sound content categories, see par. [0027]).
Regarding claim 6 Woodruff teaches the audio signal processing apparatus of claim 3, wherein the desired or targeted audio source category includes one or more identified individual speakers (the environment 100 may include at least one individual talker who is speaking, see par. [0017]).
Regarding claim 7 Woodruff teaches the audio signal processing apparatus of claim 1, the one or more memories storing instructions that are operable, when executed by the one or more processors, to further cause the audio signal processing apparatus to:
input the one or more isolate source audio features and one or more isolate source audio components to an audio source feature classification model configured to classify the one or more isolate source audio features and the one or more isolate source audio components into one or more audio source categories (one type of output of a disclosed audio separation system can be multiple channels of audio streams for different content categories, see par. [0021]);
and based on determining that the one or more audio source categories are associated with a target source, input the one or more audio source categories to the audio generation model, wherein the audio generation model is configured to generate the target source generated audio sample based at least in part on one or more of the one or more audio source categories of the one or more isolate source audio features and isolate source audio components (At least one type of output of a disclosed audio separation system can be multiple channels of audio streams for different content categories, see par. [0021]).
Regarding claim 9 Woodruff teaches the audio signal processing apparatus of claim 1, wherein: the audio source feature separation model is configured to perform one or more audio processing operations on the one or more target audio source components to generate a target audio source component feature set for the one or more target audio source components, and the audio generation model is configured to process the target source component feature set to generate the target source generated audio sample (The disclosed audio separation technology performs feature extraction on the frequency domain representation of the audio signal. The extracted signal features are used as inputs to at least one deep neural network. The neural network may run in real time as the audio signal is captured and received. The neural network receives a new set of features for each new time frame and generates one or more filters (i.e., time-frequency masks) for that time frame, see par. [0020]).
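By way of illustration only, the frame-by-frame (real-time) operation described in the cited passage can be sketched as follows; the feature extractor and mask network below are hypothetical placeholders rather than Woodruff's actual components.

# Minimal sketch of per-frame mask generation; "feature_fn" and "mask_net" are
# hypothetical placeholders, not Woodruff's actual components.
import numpy as np

def stream_separate(frames, feature_fn, mask_net):
    """Yield a masked frequency-domain frame for each incoming time-domain frame."""
    for frame in frames:
        spectrum = np.fft.rfft(frame)      # frequency-domain representation of the frame
        features = feature_fn(spectrum)    # features extracted for this time frame
        mask = mask_net(features)          # time-frequency mask generated for this frame
        yield mask * spectrum              # mask applied to the current frame only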
Regarding claim 10 Woodruff teaches the audio signal processing apparatus of claim 1, wherein: the audio source feature separation model is configured to perform one or more audio processing operations on the one or more target audio source components to generate one or more partial target source generated audio samples for the one or more target audio source components, and the audio generation model is configured to provide the one or more partial target source generated audio samples to a set of target selection layers that are configured to process one or more of the partial target source generated audio samples to generate the target source generated audio sample (time segments of live audio 418 that contain only one of the pre-defined content categories can be identified in various ways, and these time segments can be used to refine the model coefficients so as to more closely align the deep neural network 450 for the particular online device and/or environment. In a passive approach example, the time segments are identified by comparing the estimated content signals 480 to the input audio mixture 418, see par. [0037]).
Regarding claim 13 Woodruff teaches the audio signal processing apparatus of claim 1, wherein the audio generation model is further configured to perform one or more audio signal processing techniques including one or more of automatic gain control or audio filtering to generate the target source (a value of 0.5 for a given frequency or frequency range would cause the signal amplitude for that frequency or frequency range to be reduced by half, see par. [0042]; the neural network of the audio separation system generates a plurality of time-varying filters in a frequency domain using the signal features as inputs of the neural network. Each of the time-varying filters corresponds to one of a plurality of sound content categories, see par. [0042]).
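As a purely numeric illustration of the cited mask behavior, a mask value of 0.5 at a frequency bin halves the amplitude at that bin; the bin values in the toy example below are invented.

# Toy illustration of the cited 0.5 mask value; the spectrum values are invented.
import numpy as np

spectrum = np.array([2.0 + 0.0j, 4.0 + 0.0j, 8.0 + 0.0j])   # three toy frequency bins
mask     = np.array([1.0,        0.5,        0.0])           # per-bin gain values
filtered = mask * spectrum
print(np.abs(filtered))   # [2. 2. 0.]: the 0.5 bin is halved, the 0.0 bin is removed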
Regarding claim 14 Woodruff teaches the audio signal processing apparatus of claim 1, wherein the audio source feature separation model is configured to generate the one or more isolate source audio features from the audio signal sample or to generate the one or more isolate source audio features from one or more isolate source audio components generated based on the audio signal sample (At step 235, the audio separation system outputs the content-based audio signals, possibly along with spatial information of sound sources that emit sounds of the sound content categories, see par. [0029]).
Regarding claim 15 Woodruff teaches a computer program product comprising at least one non-transitory computer readable storage medium having computer-readable program code portions stored thereon that, when executed by at least one processor (Memory 620 (for example, non-transitory computer readable storage medium) stores, at least in part, instructions and data for execution by processor 610 and/or the audio processing system 640, see par. [0046]), cause an apparatus to:
input an audio signal sample associated with at least one audio capture device to an audio source feature separation model that is configured to generate one or more isolate source audio features from the audio signal sample (an audio input 418 containing audio signals of various sound categories can be fed to a feature extraction module 460 to generate signal features in the frequency domain, see par. [0036]);
input the isolate source audio features subset to an audio generation model that is configured to generate a target source generated audio sample (The deep neural network 450 (trained during the offline process described above) receives the signal features and generates a set of time-varying filters 470 (i.e., frequency masks). The time-varying filters 470 filter the audio signals to generate separated audio signals 480 of various sound content categories, see par. [0036]);
and output the target source generated audio sample to one or more audio output devices (The time-varying filters 470 filter the audio signals to generate separated audio signals 480 of various sound content categories, which may include the target signal (e.g., an audio signal of a target sound content category such as speech), see par. [0036]).
However, Woodruff does not teach determining, based on the one or more isolate source audio features, an isolate source audio features subset comprising those isolate source audio features of the one or more isolate source audio features associated with a target source.
In the same field of endeavor, Edwards teaches receiving an audio signal that includes audio from several audio subsets having different source characteristics and originating from different sources. The audio signal is separated into several segments of audio data. The segments are compared to audio fingerprints. A second segment corresponding to a second subset of the several subsets of audio data is separated from a first segment corresponding to a first subset of the several subsets of the audio signal, based on the comparison of the segments to the audio fingerprints, see abstract. The separate sources of audio data within the audio signal can thereby be isolated effectively, allowing the digital data to be analyzed to isolate particular patterns and desirable masks to be applied, by separating the second segment corresponding to the second subset from the first segment corresponding to the first subset, based on the comparison of the segments to the audio fingerprints, see par. [0004].
It would have been obvious to one of ordinary skill in the art to combine the Woodruff invention with the teachings of Edwards in order to identify patterns within an audio source, because there is a need for tools that can analyze digital data to isolate particular patterns so that desirable masks and/or filters can be applied to the digital data, see par. [0004].
Regarding claim 29 Woodruff teaches a method, comprising:
inputting an audio signal sample associated with at least one audio capture device to an audio source feature separation model that is configured to generate one or more isolate source audio features from the audio signal sample (an audio input 418 containing audio signals of various sound categories can be fed to a feature extraction module 460 to generate signal features in the frequency domain, see par. [0036]);
inputting the isolate source audio features subset to an audio generation model that is configured to generate a target source generated audio sample (The deep neural network 450 (trained during the offline process described above) receives the signal features and generates a set of time-varying filters 470 (i.e., frequency masks). The time-varying filters 470 filter the audio signals to generate separated audio signals 480 of various sound content categories, see par. [0036]);
and outputting the target source generated audio sample to one or more audio output devices (The time-varying filters 470 filter the audio signals to generate separated audio signals 480 of various sound content categories, which may include the target signal (e.g., an audio signal of a target sound content category such as speech), see par. [0036]).
Regarding claim 30 Woodruff teaches the method of claim 29, wherein each isolate source audio feature is associated with a targeted portion of the audio signal sample and the audio generation model is configured to generate the target source generated audio sample from the one or more isolate source audio features (time segments of live audio 418 that contain only one of the pre-defined content categories can be identified, see par. [0037]).
However, Woodruff does not teach determining, based on the one or more isolate source audio features, an isolate source audio features subset comprising those isolate source audio features of the one or more isolate source audio features associated with a target source.
In the same field of endeavor, Edwards teaches receiving an audio signal that includes audio from several audio subsets having different source characteristics and originating from different sources. The audio signal is separated into several segments of audio data. The segments are compared to audio fingerprints. A second segment corresponding to a second subset of the several subsets of audio data is separated from a first segment corresponding to a first subset of the several subsets of the audio signal, based on the comparison of the segments to the audio fingerprints, see abstract. The separate sources of audio data within the audio signal can thereby be isolated effectively, allowing the digital data to be analyzed to isolate particular patterns and desirable masks to be applied, by separating the second segment corresponding to the second subset from the first segment corresponding to the first subset, based on the comparison of the segments to the audio fingerprints, see par. [0004].
It would have been obvious to one of ordinary skill in the art to combine the Woodruff invention with the teachings of Edwards in order to identify patterns within an audio source, because there is a need for tools that can analyze digital data to isolate particular patterns so that desirable masks and/or filters can be applied to the digital data, see par. [0004].
Regarding claim 31 Woodruff teaches the method of claim 29, further comprising:
inputting the one or more isolate source audio features and one or more isolate source audio components to an audio source feature classification model configured to classify the one or more isolate source audio features into one or more audio source categories (one type of output of a disclosed audio separation system can be multiple channels of audio streams for different content categories, see par. [0021]);
and based on determining that the one or more audio source categories are associated with a target source, inputting the one or more audio source categories to the audio generation model, wherein the audio generation model is configured to generate the target source generated audio sample based at least in part on one or more of the one or more audio source categories of the one or more isolate source audio features or isolate source audio components (At step 230, the audio separation system separates the audio signal into a plurality of content-based (i.e., category specific) audio signals by applying the time-varying filters to the audio signal. Each of the content-based (i.e., category specific) audio signals contains content of a corresponding sound content category among the plurality of sound content categories for which the system has been trained, see par. [0028]).
Regarding claim 32 Woodruff teaches the method of claim 31, wherein the one or more audio source categories include at least a desired or targeted audio source category and undesired audio source category (the disclosed technology is capable of separating the speech content (e.g., from both talkers) from the music content (e.g., from the entire jazz trio) and from other ambient sounds, see par. [0016]).
Claim(s) 8 and 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Woodruff U.S. PAP 2019/0206417 A1, in view of Edwards U.S. PAP 2013/0322645 A1, further in view of Wang, “Using Non-invertible Data Transformations to Build Adversarial-Robust Neural Networks”.
Regarding claim 8 Woodruff in view of Edwards does not teach the audio signal processing apparatus of claim 1, wherein the audio source feature separation model comprises a non-invertible transform layer configured to perform one or more feature extraction or non-invertible transform operations on the audio signal sample.
In a similar field of endeavor, Wang teaches a unifying framework for protecting deep neural models using a non-invertible data transformation, developing two adversary-resilient architectures utilizing both linear and nonlinear dimensionality reduction, see abstract. Wang builds adversary-resilient DNN architectures using non-invertible dimension reduction methods. Related research commonly falls into the area of adversarial machine learning or dimensionality reduction methods, and Wang introduces several state-of-the-art adversarial machine learning technologies, which can be categorized as either data augmentation or DNN model complexity enhancement, see section VII.
It would have been obvious to one of ordinary skill in the art to combine the invention of Woodruff in view of Edwards with the teachings of Wang for the benefit of enhancing the DNN models, see section VII.
Regarding claim 11 Woodruff in view of Edwards does not teach the audio signal processing apparatus of claim 1, wherein the one or more isolate source audio features is in a non-invertible domain.
In a similar field of endeavor, Wang teaches a unifying framework for protecting deep neural models using a non-invertible data transformation, developing two adversary-resilient architectures utilizing both linear and nonlinear dimensionality reduction, see abstract. Wang builds adversary-resilient DNN architectures using non-invertible dimension reduction methods. Related research commonly falls into the area of adversarial machine learning or dimensionality reduction methods, and Wang introduces several state-of-the-art adversarial machine learning technologies, which can be categorized as either data augmentation or DNN model complexity enhancement, see section VII.
It would have been obvious to one of ordinary skill in the art to combine the invention of Woodruff in view of Edwards with the teachings of Wang for the benefit of enhancing the DNN models, see section VII.
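For orientation only, a minimal sketch of a non-invertible transform in the sense relied upon from Wang (dimensionality reduction) follows; the projection, dimensions, and function names are hypothetical and do not reproduce Wang's specific architecture.

# Minimal sketch of a non-invertible feature transform (dimensionality reduction);
# not Wang's specific architecture. The sizes and projection are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
in_dim, out_dim = 257, 64                              # out_dim < in_dim, so the mapping
projection = rng.standard_normal((out_dim, in_dim))    # cannot be uniquely inverted

def non_invertible_features(magnitudes):
    """Project per-frame magnitude features into a lower-dimensional space."""
    return projection @ magnitudes                     # information is discarded here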
Claim(s) 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Woodruff U.S. PAP 2019/0206417 A1 in view of Edwards U.S. PAP 2013/0322645 A1, further in view of Lashkari U.S. PAP .
Regarding claim 12 Woodruff in view of Edwards does not teach the audio signal processing apparatus of claim 1, wherein one or more of the isolate source audio features or an isolate source audio component is classified as far end audio signals and the audio generation model excludes the far end audio signals when generating the target source.
In a similar field of endeavor, Lashkari teaches an AEC system 112 that uses the input speaker signals to remove echoes from the input microphone signal to generate an output audio signal. An adaptive filter is controlled through filter coefficients to decouple the far-end signal of the speakers from a near-end signal of the microphone, see par. [0003].
It would have been obvious to one of ordinary skill in the art to combine the invention of Woodruff in view of Edwards with the teachings of Lashkari for the benefit of removing echoes from input microphone signals, see par. [0003].
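For context only, a generic normalized-LMS echo-cancellation sketch follows; it illustrates the adaptive-filter decoupling of far-end and near-end signals described in the cited paragraph but is not Lashkari's implementation, and the filter length and step size are assumed values.

# Generic NLMS echo-canceller sketch; not Lashkari's implementation. Filter length
# (taps) and step size (mu) are assumed values.
import numpy as np

def nlms_echo_cancel(far_end, mic, taps=128, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated far-end echo from the microphone signal."""
    w = np.zeros(taps)                            # adaptive filter coefficients
    out = np.zeros(len(mic))
    for n in range(taps, len(mic)):
        x = far_end[n - taps:n][::-1]             # most recent far-end (speaker) samples
        echo_estimate = w @ x                     # estimated echo at this sample
        e = mic[n] - echo_estimate                # residual approximates the near-end signal
        w += (mu / (x @ x + eps)) * e * x         # NLMS coefficient update
        out[n] = e
    return out                                    # microphone signal with the echo reduced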
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Ortiz-Sanchez whose telephone number is (571)270-3711. The examiner can normally be reached Monday- Friday 9AM-6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL ORTIZ-SANCHEZ/Primary Examiner, Art Unit 2656