Prosecution Insights
Last updated: April 19, 2026
Application No. 16/952,398

SOUND ANOMALY DETECTION USING DATA AUGMENTATION

Final Rejection — §101, §102, §103

Filed: Nov 19, 2020
Examiner: SMITH, KEVIN LEE
Art Unit: 2122
Tech Center: 2100 — Computer Architecture & Software
Assignee: International Business Machines Corporation
OA Round: 6 (Final)

Grant Probability: 37% (At Risk)
Projected OA Rounds: 7-8
Projected Time to Grant: 4y 8m
Grant Probability with Interview: 55%

Examiner Intelligence

Career Allow Rate: 37% (49 granted / 134 resolved; -18.4% vs TC avg) — grants only 37% of cases
Interview Lift: +18.0% across resolved cases with interview
Typical Timeline: 4y 8m average prosecution; 45 applications currently pending
Career History: 179 total applications across all art units

Statute-Specific Performance

§101: 30.7% (-9.3% vs TC avg)
§103: 36.4% (-3.6% vs TC avg)
§102: 10.1% (-29.9% vs TC avg)
§112: 17.3% (-22.7% vs TC avg)

Tech Center averages are estimates • Based on career data from 134 resolved cases

Office Action

Rejections: §101, §102, §103
DETAILED ACTION

1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

2. Applicant's submission filed on 01 December 2025 [hereinafter Response] has been entered, where: Claims 3 and 16 have been amended. Claims 1-25 are pending. Claims 1-25 are rejected.

Information Disclosure Statement

3. An information disclosure statement was submitted on 21 January 2026. The submission complies with the provisions of 37 CFR 1.97. Accordingly, the Examiner considered the information disclosure statement.

Claim Rejections - 35 U.S.C. § 101

4. 35 U.S.C. § 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

5. Claims 1-25 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to an abstract idea without significantly more.

Claim 1 recites a “method,” which is a process and is thus one of the statutory categories of patentable subject matter. (35 U.S.C. § 101). However, under Step 2A Prong One, the claim recites the limitations of “performing multiple forms of data augmentation on a sample waveform to generate a plurality of data augmentation samples,” and “determining an anomaly score by averaging outputs of the neural network model for the data augmentation samples.” These limitations recite a mental process, which includes observations, evaluations, judgments, and opinions, (MPEP § 2106.04(a)(2) subsection III), and thus are one of the groupings of abstract ideas. (MPEP § 2106.04(a)(2)). Thus, claim 1 recites an abstract idea.

Under Step 2A Prong Two, the abstract idea of claim 1 is not integrated into a practical application, because the additional elements beyond the identified judicial exception recited in the claim are “a computer-implemented method,” where instructions to apply the abstract idea on generic computer components (i.e., computer-implemented) do not integrate an abstract idea into a practical application. (MPEP § 2106.05(f)). The claim also recites “training a neural network model to identify a form of data augmentation . . . ,” which is an additional element. In this instance, the training of the neural network model is mere instructions to implement the abstract idea on a generic computer component, (MPEP § 2106.05(f)), which does not serve to integrate the abstract idea into a practical application. Also, the claim recites “classifying the data augmentation samples with the neural network model,” which is an additional element of “applying” the neural network to implement the judicial exception, (MPEP § 2106.05(f)), that does not integrate the abstract idea into a practical application. Also, the claim recites the additional element of “performing a corrective action responsive to the anomaly score,” which as a corrective action being “notification,” is the insignificant extra-solution activity of mere data gathering, (MPEP § 2106.05(g)), that does not integrate the abstract idea into a practical application. Therefore, claim 1 is directed to the abstract idea.

Finally, under Step 2B, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself.
The additional elements beyond the identified judicial exception recited in the claim are “a computer-implemented method,” where instructions to apply the abstract idea on generic computer components (i.e., computer-implemented) do not amount to significantly more than the abstract idea. (MPEP § 2106.05(f)). The claim also recites “training a neural network model to identify a form of data augmentation . . . ,” which is an additional element. In this instance, the training of the neural network model is mere instructions to implement the abstract idea on a generic computer component, (MPEP § 2106.05(f)), which does not amount to significantly more than the abstract idea. Also, the claim recites “classifying the data augmentation samples with the neural network model,” which is an additional element of “applying” the neural network with the judicial exception, (MPEP § 2106.05(f)), that does not amount to significantly more than the abstract idea. Also, the claim recites the additional element of “performing a corrective action responsive to the anomaly score,” which as a corrective action being “notification,” is the well-understood, routine, and conventional activity of transmitting data over a network, (MPEP § 2106.05(d) sub II.i), that does not amount to significantly more than the abstract idea. Therefore, claim 1 is subject-matter ineligible.

Claim 11 recites a “method,” which is a process and is thus one of the statutory categories of patentable subject matter. (35 U.S.C. § 101). Under Step 2A Prong One, the claim, however, recites limitations of “performing multiple forms of data augmentation on a sample waveform, including differing types of data augmentation and differing degrees of each type of data augmentation, to generate a plurality of data augmentation samples,” “segmenting the data augmentation samples into respective sets of segments, separated from one another by a hop size,” and “determining an anomaly score by averaging outputs of the neural network model for the data augmentation sample segments.” These limitations recite a mental process, which includes observations, evaluations, judgments, and opinions, (MPEP § 2106.04(a)(2) subsection III), and thus are one of the groupings of abstract ideas. (MPEP § 2106.04(a)(2)). Thus, claim 11 recites an abstract idea.

Under Step 2A Prong Two, the abstract idea of claim 11 is not integrated into a practical application, because the additional elements beyond the identified judicial exception recited in the claim are “a computer-implemented method,” where instructions to apply the abstract idea on generic computer components (i.e., computer-implemented) do not integrate an abstract idea into a practical application. (MPEP § 2106.05(f)). The claim also recites “training a neural network to identify a form of data augmentation . . . ,” which is an additional element. In this instance, the training of the neural network model is mere instructions to implement the abstract idea on a generic computer component, (MPEP § 2106.05(f)), which does not serve to integrate the abstract idea into a practical application. Also, the claim recites “classifying the data augmentation sample segments with the neural network model to identify a form of data augmentation that has been performed on each of the segments,” which is an additional element of “applying” the neural network model with the judicial exception, (MPEP § 2106.05(f)), that does not integrate the abstract idea into a practical application.
Also, the claim recites the additional element of “performing a corrective action responsive to the anomaly score,” which as a corrective action being “notification,” is the insignificant extra-solution activity of mere data gathering, (MPEP § 2106.05(g)), that does not integrate the abstract idea into a practical application. Therefore, claim 11 is directed to the abstract idea.

Finally, under Step 2B, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. The additional elements beyond the identified judicial exception recited in the claim are “a computer-implemented method,” where instructions to apply the abstract idea on generic computer components (i.e., computer-implemented) do not amount to significantly more than the abstract idea. (MPEP § 2106.05(f)). The claim also recites “training a neural network to identify a form of data augmentation . . . ,” which is an additional element. In this instance, the training of the neural network model is mere instructions to implement the abstract idea on a generic computer component, (MPEP § 2106.05(f)), which does not amount to significantly more than the abstract idea. Also, the claim recites “classifying the data augmentation samples with the neural network model . . . ,” which is an additional element of “applying” the neural network with the judicial exception, (MPEP § 2106.05(f)), that does not amount to significantly more than the abstract idea. Also, the claim recites the additional element of “performing a corrective action responsive to the anomaly score,” which as a corrective action being “notification,” is the well-understood, routine, and conventional activity of transmitting data over a network, (MPEP § 2106.05(d) sub II.i), that does not amount to significantly more than the abstract idea. Therefore, claim 11 is subject-matter ineligible.

Claim 13 recites “a non-transitory computer readable storage medium,” which is a product and is thus one of the statutory categories of patentable subject matter. (35 U.S.C. § 101). However, under Step 2A Prong One, the claim recites the limitations of “perform multiple forms of data augmentation on a sample waveform to generate a plurality of data augmentation samples,” and “determine an anomaly score by averaging outputs of the neural network model for the data augmentation samples.” These limitations recite a mental process, which includes observations, evaluations, judgments, and opinions, (MPEP § 2106.04(a)(2) subsection III), and thus are one of the groupings of abstract ideas. (MPEP § 2106.04(a)(2)). Thus, claim 13 recites an abstract idea.

Under Step 2A Prong Two, the abstract idea of claim 13 is not integrated into a practical application, because the additional elements beyond the identified judicial exception recited in the claim are “a non-transitory computer readable storage medium,” and “a computer,” where instructions to apply the abstract idea on generic computer components (i.e., “a non-transitory computer readable storage medium,” and “a computer”) do not integrate an abstract idea into a practical application. (MPEP § 2106.05(f)). The claim also recites “train a neural network model to identify a form of data augmentation . . . ,” which is an additional element. In this instance, the training of the neural network model is mere instructions to implement the abstract idea on a generic computer component, (MPEP § 2106.05(f)), which does not serve to integrate the abstract idea into a practical application.
Also, the claim recites “classify the data augmentation samples with the neural network model,” which is an additional element of “applying” the neural network with the judicial exception, (MPEP § 2106.05(f)), that does not integrate the abstract idea into a practical application. Also, the claim recites the additional element of “performing a corrective action responsive to the anomaly score,” which as a corrective action being “notification,” is the insignificant extra-solution activity of mere data gathering, (MPEP § 2106.05(g)), that does not integrate the abstract idea into a practical application. Therefore, claim 13 is directed to the abstract idea.

Finally, under Step 2B, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. The additional elements beyond the identified judicial exception recited in the claim are “a non-transitory computer readable storage medium,” and “a computer,” where instructions to apply the abstract idea on generic computer components (i.e., “a non-transitory computer readable storage medium,” and “a computer”) do not amount to significantly more than the abstract idea. (MPEP § 2106.05(f)). The claim also recites “train a neural network model to identify a form of data augmentation . . . ,” which is an additional element. In this instance, the training of the neural network model is mere instructions to implement the abstract idea on a generic computer component, (MPEP § 2106.05(f)), which does not amount to significantly more than the abstract idea. Also, the claim recites “classify the data augmentation samples with the neural network model,” which is an additional element of “applying” the neural network with the judicial exception, (MPEP § 2106.05(f)), that does not amount to significantly more than the abstract idea. Also, the claim recites the additional element of “perform a corrective action responsive to the anomaly score,” which as a corrective action being “notification,” is the well-understood, routine, and conventional activity of transmitting data over a network, (MPEP § 2106.05(d) sub II.i), that does not amount to significantly more than the abstract idea. Therefore, claim 13 is subject-matter ineligible.

Claim 14 recites a “system,” which is a machine and is thus one of the statutory categories of patentable subject matter. (35 U.S.C. § 101). The claim, under Step 2A Prong One, however, recites “. . . performs multiple forms of data augmentation on a sample waveform . . . ,” and “. . . that determines an anomaly score by averaging outputs of the neural network model for the data augmentation samples.” These limitations recite a mental process, which includes observations, evaluations, judgments, and opinions, (MPEP § 2106.04(a)(2) subsection III), and thus are one of the groupings of abstract ideas. (MPEP § 2106.04(a)(2)). Thus, claim 14 recites an abstract idea.
Under Step 2A Prong Two, the abstract idea of claim 14 is not integrated into a practical application, because the additional elements beyond the identified judicial exception recited in the claim are “a system,” “a hardware processor,” “a memory that stores computer code,” “a neural network model,” “a model trainer,” “a data augmenter,” and “an anomaly detector,” where instructions to apply the abstract idea on generic computer components (i.e., “a system,” “a hardware processor,” “a memory that stores computer code,” “a neural network model,” “a model trainer,” “a data augmenter,” and “an anomaly detector”) do not integrate an abstract idea into a practical application. (MPEP § 2106.05(f)). The claim recites the additional element of “a model trainer that trains a neural network model.” In this instance, the training of the neural network model is mere instructions to implement the abstract idea on a generic computer component, (MPEP § 2106.05(f)), which does not serve to integrate the abstract idea into a practical application. The claim also recites the “neural network model that identifies a form of data augmentation,” and “wherein the neural network model classifies the data augmentation samples,” which are each additional elements of “applying” the neural network with the judicial exception, (MPEP § 2106.05(f)), that do not integrate the abstract idea into a practical application. Also, the claim recites the additional element of “. . . performs a corrective action responsive to the anomaly score,” which as a corrective action being “notification,” is the insignificant extra-solution activity of mere data gathering, (MPEP § 2106.05(g)), that does not integrate the abstract idea into a practical application. Therefore, claim 14 is directed to the abstract idea.

Finally, under Step 2B, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. The additional elements beyond the identified judicial exception recited in the claim are “a system,” “a hardware processor,” “a memory that stores computer code,” “a neural network model,” “a model trainer,” “a data augmenter,” and “an anomaly detector,” where instructions to apply the abstract idea on generic computer components (i.e., “a system,” “a hardware processor,” “a memory that stores computer code,” “a neural network model,” “a model trainer,” “a data augmenter,” and “an anomaly detector”) do not amount to significantly more than the abstract idea. (MPEP § 2106.05(f)). The claim recites the additional element of “a model trainer that trains a neural network model.” In this instance, the training of the neural network model is mere instructions to implement the abstract idea on a generic computer component, (MPEP § 2106.05(f)), which does not amount to significantly more than the abstract idea. The claim also recites the “neural network model that identifies a form of data augmentation,” and “wherein the neural network model classifies the data augmentation samples,” which are each additional elements of “applying” the neural network with the judicial exception, (MPEP § 2106.05(f)), that do not amount to significantly more than the abstract idea.
Also, the claim recites the additional element of “. . . performs a corrective action responsive to the anomaly score,” which as a corrective action being “notification,” is the well-understood, routine, and conventional activity of transmitting data over a network, (MPEP § 2106.05(d) sub II.i), that does not amount to significantly more than the abstract idea. Therefore, claim 14 is subject-matter ineligible.

Claim 23 recites a “system,” which is a machine and is thus one of the statutory categories of patentable subject matter. (35 U.S.C. § 101). However, under Step 2A Prong One, the claim recites “. . . performs multiple forms of data augmentation on a sample waveform including differing types of data augmentation and differing degrees of each type of data augmentation, to generate a plurality of data augmentation samples, and that segments the data augmentation samples into respective sets of segments, separated from one another by a hop size,” and “. . . that determines an anomaly score by averaging outputs of the neural network model for the data augmentation samples.” These limitations recite a mental process, which includes observations, evaluations, judgments, and opinions, (MPEP § 2106.04(a)(2) subsection III), and thus are one of the groupings of abstract ideas. (MPEP § 2106.04(a)(2)). Thus, claim 23 recites an abstract idea.

Under Step 2A Prong Two, the abstract idea of claim 23 is not integrated into a practical application, because the additional elements beyond the identified judicial exception recited in the claim are “a system,” “a hardware processor,” “a memory that stores computer code,” “a neural network model,” “a model trainer,” “a data augmenter,” and “an anomaly detector,” where instructions to apply the abstract idea on generic computer components (i.e., “a system,” “a hardware processor,” “a memory that stores computer code,” “a neural network model,” “a model trainer,” “a data augmenter,” and “an anomaly detector”) do not integrate an abstract idea into a practical application. (MPEP § 2106.05(f)). The claim recites the additional element of “a model trainer that trains a neural network model.” In this instance, the training of the neural network model is mere instructions to implement the abstract idea on a generic computer component, (MPEP § 2106.05(f)), which does not serve to integrate the abstract idea into a practical application. The claim also recites more details or specifics of the abstract idea of “applying” the neural network, “wherein the neural network model classifies the data augmentation samples to identify a form of data augmentation that has been performed on each of the data augmentation sample segments,” which is still an additional element of “applying” the neural network with the judicial exception, (MPEP § 2106.05(f)), that does not integrate the abstract idea into a practical application. Also, the claim recites the additional element of “. . . performs a corrective action responsive to the anomaly score,” which as a corrective action being “notification,” is the insignificant extra-solution activity of mere data gathering, (MPEP § 2106.05(g)), that does not integrate the abstract idea into a practical application. Therefore, claim 23 is directed to the abstract idea.

Finally, under Step 2B, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself.
The additional elements beyond the identified judicial exception recited in the claim are “a system,” “a hardware processor,” “a memory that stores computer code,” “a neural network model,” “a model trainer,” “a data augmenter,” and “an anomaly detector,” where instructions to apply the abstract idea on generic computer components (i.e., “a system,” “a hardware processor,” “a memory that stores computer code,” “a neural network model,” “a model trainer,” “a data augmenter,” and “an anomaly detector”) do not amount to significantly more than the abstract idea. (MPEP § 2106.05(f)). The claim recites the additional element of “a model trainer that trains a neural network model.” In this instance, the training of the neural network model is mere instructions to implement the abstract idea on a generic computer component, (MPEP § 2106.05(f)), which does not amount to significantly more than the abstract idea. The claim also recites more details or specifics of the abstract idea of “applying” the neural network, “wherein the neural network model classifies the data augmentation samples to identify a form of data augmentation that has been performed on each of the data augmentation sample segments,” which is still an additional element of “applying” the neural network with the judicial exception, (MPEP § 2106.05(f)), that does not amount to significantly more than the abstract idea. Also, the claim recites the additional element of “. . . performs a corrective action responsive to the anomaly score,” which as a corrective action being “notification,” is the well-understood, routine, and conventional activity of transmitting data over a network, (MPEP § 2106.05(d) sub II.i), that does not amount to significantly more than the abstract idea. Therefore, claim 23 is subject-matter ineligible.

Claim 2 depends from claim 1. Claim 15 depends from claim 14. The claims recite limitations that are mental processes, (Claim 2: “further comprising segmenting the data augmentation samples into respective sets of segments, separated from one another by a hop size, before classifying the data augmentation samples;” Claim 15: “wherein the data augmenter segments the data augmentation samples into respective sets of segments, separated from one another by a hop size, before classifying the data augmentation samples;”), and accordingly, are one of the groupings of abstract ideas. (MPEP § 2106.04(a)(2)). None of the claims includes an additional element that integrates the abstract idea into a practical application because the claims do not impose any meaningful limits on practicing the abstract idea. Also, the claims do not include an additional element that amounts to “significantly more” than the abstract idea. Therefore, claims 2 and 15 are each subject-matter ineligible.

Claims 3, 4 and 8 depend directly or indirectly from claim 1. Claims 16, 17, and 21 depend directly or indirectly from claim 14. The claims recite more details or specifics of the additional elements of “classifying . . . with the neural network model,” and/or a training dataset of the neural network model.
(Claims 3 and 16: “wherein classifying the data augmentation samples includes classifying the sets of segments to identify a form of data augmentation that has been performed on each of the segments;” Claims 4 and 17: “wherein classifying the data augmentation samples includes determining a probability that each form of data augmentation has been performed on each of the data augmentation samples, each probability being determined as an average of probabilities that each segment of the set of segments corresponding to a given data augmentation sample has been subjected to the respective form of data augmentation;” Claims 8 and 21: “wherein training the neural network model includes performing the multiple forms of data augmentation on training waveforms in a training dataset”). Accordingly, the claims are more specific to the additional elements. The abstract idea of these claims is not integrated into a practical application, (see MPEP § 2106.04(d)), nor do the claims amount to significantly more than the abstract idea, (MPEP § 2106.05), because the claims recite no more than the abstract idea. Therefore, claims 3, 4, 8, 16, 17, and 21 are each subject-matter ineligible.

Claims 5-7 and 9 depend directly or indirectly from claim 1. Claim 12 depends from claim 11. Claims 18-20 depend directly or indirectly from claim 14. Claims 24 and 25 depend from claim 23. The claims recite more details or specifics of the abstract idea of “data augmentation,” (Claims 5, 12, 18 and 24: “wherein the multiple forms of data augmentation include one or more types of data augmentation selected from the group consisting of . . . ;” Claims 6 and 19: “wherein the multiple forms of data augmentation include differing degrees of a single type of data augmentation;” Claims 7 and 20: “wherein the multiple forms of data augmentation include at least two distinct types of data augmentation, each performed to at least three different degrees, to provide at least nine different forms of combined data augmentation;” Claims 9 and 25: “wherein the sample waveform is an audio waveform”), and accordingly, are merely more specific to the abstract idea. None of the claims includes an additional element that integrates the abstract idea into a practical application because the claims do not impose any meaningful limits on practicing the abstract idea. Also, the claims do not include an additional element that amounts to “significantly more” than the abstract idea. Therefore, claims 5-7, 9, 12, 18-20, 24, and 25 are each subject-matter ineligible.

Claim 10 depends from claim 1. Claim 22 depends from claim 14. The claims recite more details or specifics of the additional element of “performs a corrective action,” (Claim 10: “wherein the corrective action is selected from the group consisting of diverting a faulty product, halting equipment operation, and automatically adjusting operational parameters of a system to compensate for a detected anomaly;” Claim 22: “wherein the response function performs a corrective action selected from the group consisting of diverting a faulty product, halting equipment operation, and automatically adjusting operational parameters of a system to compensate for a detected anomaly”), which is merely more specific to the additional element. The abstract idea of these claims is not integrated into a practical application, (see MPEP § 2106.04(d)), nor do the claims amount to significantly more than the abstract idea, (MPEP § 2106.05), because the claims recite no more than the abstract idea. Accordingly, claims 10 and 22 are each subject-matter ineligible.
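To make the claimed method concrete before turning to the art rejections, here is a minimal sketch of the pipeline the independent claims recite: augment a sample waveform in multiple ways, classify each augmented sample with the trained model, average the outputs into an anomaly score, and act on the score. Every name here (model.predict, notify_operator, the 0.5 threshold) is a hypothetical illustration, not the Applicant's actual implementation.

```python
import numpy as np

def notify_operator(score):
    # Hypothetical corrective action: a simple notification.
    print(f"Anomaly detected (score={score:.3f})")

def detect_anomaly(model, waveform, augmentations, threshold=0.5):
    """Sketch of the claimed method: augment, classify, average, act."""
    # Perform multiple forms of data augmentation on the sample waveform.
    samples = [augment(waveform) for augment in augmentations]
    # Classify each augmented sample; model.predict is assumed to return
    # a scalar output for the augmentation-identification task.
    outputs = [model.predict(s) for s in samples]
    # Determine the anomaly score by averaging the model outputs.
    score = float(np.mean(outputs))
    # Perform a corrective action responsive to the anomaly score.
    if score > threshold:
        notify_operator(score)
    return score
```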
Claim Rejections - 35 U.S.C. § 103

6. The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

7. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. § 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

8. This application currently names joint inventors. In considering patentability of the claims the Examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the Examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.

9. Claims 1-5, 8, 9, 13-18, and 21 are rejected under 35 U.S.C. § 103 as being unpatentable over US Patent 12008457 to Isik et al. [hereinafter Isik] in view of Giri et al., “Unsupervised Anomalous Sound Detection using Self-Supervised Classification and Group Masked Autoencoder for Density Estimation,” DCASE 2020 (02 Nov 2020) [hereinafter Giri], and Henze et al., “AudioForesight: A Process Model for Audio Predictive Maintenance in Industrial Environments,” IEEE (2019) [hereinafter Henze].

Regarding claims 1, 13, and 14, Isik teaches [a] computer-implemented method for anomaly detection (Isik 15:5-7 teaches “various methods and techniques as described herein, including the application of self-supervised training for audio anomaly detection [(that is, a computer implemented method for anomaly detection)]”) of claim 1, a non-transitory computer readable storage medium (Isik 12:55-57 teaches “one or more processors executing program instructions stored on one or more computer-readable storage media coupled to the processor [(that is, a non-transitory computer-readable storage medium)]”) of claim 13, and a system for anomaly detection (Isik 12:5-8 teaches “embeddings for audio processing, according to some embodiments. Various different systems and devices may implement the various methods and techniques described below, either singly or working together [(that is, a system for anomaly detection)]”) of claim 14, comprising: training a neural network model to identify a form of data augmentation that has been performed on a waveform
(Isik, Fig. 5, teaches training a neural network model [Examiner annotations in dashed-line text boxes; annotated reproduction of Isik Fig. 5 omitted]. Isik 7:63-66 teaches “augmentations 330 may be applied to training data. In this way, specific failure modes may be addressed to improve performance of the convolutional neural network model. . . . As indicated at 332, the augmented audio may be provided to audio processing model training 316 for training a convolutional neural network [(that is, training a neural network model to identify a form of data augmentation that has been performed on a waveform)]”; Isik 15:4-7 teaches “memory 1020 may include program instructions 1025 . . . including the application of self-supervised training for audio anomaly detection [(that is, “anomaly detection” is to identify a form of data augmentation that has been performed on a waveform)] and data storage 1035, comprising various data accessible by program instructions 1025”); performing multiple forms of data augmentation on a sample waveform to generate a plurality of data augmentation samples (Isik 8:4-38 teaches “different augmentations may be implemented, including the examples below. For example, an augmentation stack may include one (or more) of: Equalization . . . , Clipping . . . , Level and Silence . . . , Band-limiting . . . , Reverberation [(that is, performing multiple forms of data augmentation on a sample waveform to generate a plurality of data augmentation samples)]”; Isik 7:63 thru 8:3 teaches “augmentations 330 [(that is, “augmentations,” being plural, is performing multiple forms of data augmentation)] may be applied to training data [(that is, to generate a plurality of data augmentation samples)]. In this way, specific failure modes may be addressed to improve performance of the convolutional neural network model, in various embodiments. As indicated at 332 [(that is, “augmented audio data 332” is a plurality of data augmentation samples)], the augmented audio may be provided to audio processing model training 316 for training a convolutional neural network”); classifying the data augmentation samples with the neural network model (Isik, Fig. 5, teaches a model deployment 213 [Examiner annotations in dashed-line text boxes; annotated image omitted]. Isik 6:4-17 teaches a “[m]achine learning service 210 may implement model deployment 213 which may support implementation of and/or various applications that include a trained machine learning model, such as a convolutional neural networks, as discussed above with regard to [FIG. 3]. Model deployment 213 may host various audio processing pipelines 214.
Audio processing pipelines 214 may be deployed on one or more nodes, which may, upon receipt of audio data [(that is, “audio data” is the data augmentation samples)] directed to the audio processing pipeline 214 (e.g., to a network endpoint or other resource identifier for machine learning service 210) to perform various audio processing tasks on the received audio data, such as audio classification 273a, audio enhancement 273b, audio source separation 273c, among others [(that is, “audio classification 273a, audio enhancement 273b, audio source separation 273c, etc.,” are classifying the data augmentation samples with the neural network model)]”); * * *

Though Isik teaches the feature of a machine learning service to train and deploy various machine learning models for audio processing tasks, Isik, however, does not explicitly teach – * * * determining an anomaly score by averaging outputs of the neural network model for the data augmentation samples; and * * *

But Giri teaches – * * * determining an anomaly score (Giri, left column of p. 2, “2.1. Group Masked Autoencoder (Group-MADE),” third paragraph, teaches “[t]he proposed Group MADE model is trained using negative log likelihood as cost function, using all the normal training data across all IDs for a specific machine. During inference we use the negative log likelihood as anomaly score for each test sample [(that is, determining an anomaly score)]”) by averaging outputs of the neural network model for the data augmentation samples (Giri, left column of p. 3, “2.3 Ensembling,” second paragraph, teaches we “transform the anomaly scores of each model into a standardized scale, before combining them. The standardization transformation for any given model is applied in a per-machine ID fashion, by computing the mean and variance of its anomaly scores over the training data for that machine ID. The anomaly scores are then transformed to have zero mean and unit variance over the training data of that machine ID. Standardized anomaly scores across different models are then combined using mean or max ensembling [(that is, “mean ensembling” is averaging outputs of the neural network model for the data augmentation samples)]”); and * * *

Isik and Giri are from the same or similar field of endeavor. Isik teaches audio processing of audio data, in which a result of processing the audio data through the convolutional neural network may be used to perform an audio processing task. Giri teaches detecting anomalous machine sounds in a test set providing an anomaly score. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify Isik pertaining to audio data classification via a neural network model to produce a result with the classification anomaly score of Giri. The motivation to do so is to produce “an ensemble of a novel density estimation based anomaly detector (Group Masked Autoencoder for Density Estimation (GMADE)) and self-supervised classification based anomaly detector.” (Giri, Abstract).
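As a rough illustration of the Giri ensembling passage the rejection relies on, per-machine-ID standardization followed by mean ensembling can be sketched as below. The statistics are computed over training-data scores, per Giri; the function names and array shapes are assumptions for illustration only.

```python
import numpy as np

def standardize(train_scores, test_scores):
    """Rescale one model's anomaly scores to zero mean and unit variance,
    using statistics computed over the training data for a machine ID."""
    mu, sigma = np.mean(train_scores), np.std(train_scores)
    return (np.asarray(test_scores) - mu) / sigma

def mean_ensemble(standardized_scores_per_model):
    """Combine standardized scores across models by averaging (Giri's
    "mean ensembling"); max ensembling would use .max(axis=0) instead."""
    return np.stack(standardized_scores_per_model).mean(axis=0)
```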
Though Isik and Giri teach the feature of performing various result action(s) based on the result of processing the audio data, the combination of Isik and Giri, however, does not explicitly teach – * * * and performing a corrective action responsive to the anomaly score.

But Henze teaches – * * * and performing a corrective action responsive to the anomaly score (Henze, right column of p. 353, “III. Audio Anomaly Detection & Classification Process Model,” third paragraph, teaches “[t]he classification provides a first suggestion about the issue with the industrial equipment. While this is a useful hint for the technician who has to perform the maintenance, it can also save a lot of time [(that is, the “hint of the technician,” via “predictive maintenance,” is performing a corrective action)]”; Henze, right column of p. 354, “IV. Case Study,” first full paragraph, teaches “1) Anomaly Detector. . . . We use a deep autoencoder to produce anomaly scores in order to differentiate between normal and abnormal behavior [(that is, responsive to the anomaly score)]”).

Isik, Giri and Henze are from the same or similar field of endeavor. Isik teaches audio processing of audio data, in which a result of processing the audio data through the convolutional neural network may be used to perform an audio processing task. Giri teaches detecting anomalous machine sounds in a test set providing an anomaly score. Henze teaches anomaly detection to predict abnormal behavior in comparison to the normal behavior observed after setting up the system, and subsequently, a classifier tries to identify the system error based on predefined failure classes. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Isik and Giri pertaining to audio data classification via a neural network model to produce a result via a classification anomaly score with the identifying of system errors, or anomalies, of Henze. The motivation to do so is because “[a]nalyzing the equipment conditions ensures the maximum interval between repairs and minimizing the number and costs of unscheduled outages [3] which offers an opportunity to cost effective maintenance [6].” (Henze, right column of p. 352, “I. Introduction,” first partial paragraph).

Regarding claims 2 and 15, the combination of Isik, Giri, and Henze teaches all of the limitations of claims 1 and 14, respectively, as described in detail above. Giri teaches – further comprising segmenting the data augmentation samples into respective sets of segments, separated from one another by a hop size, before classifying the data augmentation samples (Giri, right column of p. 2, “2.2.1. Classifier Architectures,” first paragraph, teaches “[f]or the classification task, we employ two different architectures; MobileNetV2 and ResNet-50. MobileNetV2 is introduced in [4] as a computationally efficient improvisation of convolutional neural networks for visual recognition tasks such as object detection, classification and semantic segmentation”; Giri, left column of p. 2, “2.1. Group Masked Autoencoder (Group-MADE),” fifth paragraph, teaches “[f]ollowing the baseline model, each input 10s file is split into frames [(that is, segmentation)] of length 64ms, with hop length [(that is, separated from one another by a hop size)] of 32ms between frames. 1024-FFT and 128 Mel bins are used to featurize each frame. 5 frames are concatenated, resulting in 5 x 128 = 640 dimensional input [(that is, segmenting the data augmentation samples into respective sets of segments, separated from one another by a hop size, before classifying the data augmentation samples)]”).
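The 64 ms frame / 32 ms hop segmentation in the cited Giri passage is ordinary overlapping framing; a minimal sketch follows (the frame and hop durations are the cited values, while the 16 kHz sample rate is an assumption for illustration):

```python
def segment(waveform, sr=16000, frame_ms=64, hop_ms=32):
    """Split a waveform into segments separated from one another by a
    hop size, as in Giri's framing (64 ms frames, 32 ms hop)."""
    frame = int(sr * frame_ms / 1000)  # samples per segment
    hop = int(sr * hop_ms / 1000)      # samples between segment starts
    return [waveform[i:i + frame]
            for i in range(0, len(waveform) - frame + 1, hop)]
```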
Regarding claims 3 and 16, the combination of Isik, Giri, and Henze teaches all of the limitations of claims 2 and 15, respectively, as described in detail above. Giri teaches – wherein classifying the data augmentation samples includes classifying a form of data augmentation that has been performed on each of the segments (Giri, right column of p. 2, “2.2.2. Inputs,” first paragraph, teaches “inputs to the classifiers are 64 x 128 images . . . [where each] input 10s file is split into frames of length 64 ms, with hop length of 32 ms between frames [(that is, “segmented input data” is classifying the data augmentation samples includes classifying the sets of segments to identify a form of data augmentation that has been performed on each of the segments)]”; Giri, left column of p. 2, “2.2. Self-Supervised Classification,” second paragraph, teaches “[s]elf-supervision using classification tasks has been previously used for detecting anomalies . . . . We employ a different strategy here. We leverage machine ID metadata, combined with different types of audio-inspired data augmentations to set up classification tasks [(that is, classifying a form of data augmentation)]”).

Regarding claims 4 and 17, the combination of Isik, Giri, and Henze teaches all of the limitations of claims 3 and 16, as described in detail above. Giri teaches – wherein classifying the data augmentation samples includes determining a probability that each form of data augmentation has been performed on each of the data augmentation samples, each probability being determined as an average of probabilities that each segment of the set of segments corresponding to a given data augmentation sample has been subjected to the respective form of data augmentation (Giri, Table 1, teaches experimental results of the classifiers [Examiner annotations in dashed-line text boxes; annotated reproduction of Giri Table 1 omitted]. Giri, right column of p. 3, “4. Results,” first paragraph, teaches “we only report results using the development set. In Table 1, we present AUC results [Examiner notes that this measurement assesses the general performance of a model across all possible operating points] and pAUC in parentheses [Examiner notes, the pAUC focuses on a specific, high-value region] for both the challenge baseline autoencoder model, and our 4 submissions for all 6 machines averaged across IDs [(that is, “averaged across IDs” is determining a probability that each form of data augmentation has been performed on each of the data augmentation samples, each probability being determined as an average of probabilities that each segment of the set of segments corresponding to a given data augmentation sample has been subjected to the respective form of data augmentation)]”).
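Claims 4 and 17 describe a per-sample probability obtained by averaging per-segment probabilities; here is a minimal sketch under the assumption that a hypothetical model.predict returns one probability per augmentation form for each segment:

```python
import numpy as np

def sample_probabilities(model, segments):
    """Probability that each form of data augmentation was performed on
    a sample, determined as the average of the probabilities for that
    sample's segments (per claims 4 and 17)."""
    per_segment = np.stack([model.predict(seg) for seg in segments])
    return per_segment.mean(axis=0)  # average across the segments
```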
Regarding claims 5 and 18, the combination of Isik, Giri, and Henze teaches all of the limitations of claims 1 and 14, respectively, as described in detail above. Isik teaches – wherein the multiple forms of data augmentation include one or more types of data augmentation selected from the group consisting of pitch shift, . . . low/high pass filters (Isik 8:4-17 teaches “in various embodiments, different augmentations may be implemented, including the examples below. For example, an augmentation stack may include one (or more) of: Equalization. Random high and low-shelf EQ filters. With center frequency chosen uniformly in logarithmic domain between 40 and 8000 Hz, gain between ±10 dB. Two random EQ bell-curves per datapoint, symmetric in log domain, with Q-value between 0.5 and 1.5; frequency chosen from the same interval as shelf EQ. Randomized and applied to both speech and noise separately [(that is, low/high pass filters)]. Pitch shifts. Random resampling with ±10% of the original sample rate [(that is, include one or more types of data augmentation selected from the group consisting of pitch shift, . . . , low/high pass filters)]” [Examiner construes that the phrase “one or more types of augmentation” controls in relation to the “group [of types] consisting of,” and accordingly, the broadest reasonable interpretation of the claim at a minimum calls for “one . . . type[] of data augmentation”]), . . . .

Regarding claims 8 and 21, the combination of Isik, Giri, and Henze teaches all of the limitations of claims 1 and 14, respectively, as described in detail above. Isik teaches – wherein training the neural network model includes performing the multiple forms of data augmentation on training waveforms in a training dataset (Isik, Fig. 5, teaches training a neural network model [Examiner annotations in dashed-line text boxes; annotated reproduction of Isik Fig. 5 omitted]. Isik 7:63-66 teaches “augmentations 330 may be applied to training data. In this way, specific failure modes may be addressed to improve performance of the convolutional neural network model. . . . As indicated at 332, the augmented audio may be provided to audio processing model training 316 for training a convolutional neural network [(that is, wherein training the neural network model includes performing the multiple forms of data augmentation on training waveforms in a training dataset)]”).

Regarding claim 9, the combination of Isik, Giri, and Henze teaches all of the limitations of claim 1, as described in detail above. Isik teaches – wherein the sample waveform is an audio waveform (Isik, Fig. 5 above, teaches “training audio data 302” is an audio waveform; also, Isik 3:16-20 teaches “Machine learning (ML) audio processing pipeline 110 may, in various embodiments, perform various audio processing tasks on received audio data 102. For example, as discussed above, audio processing tasks may include audio enhancement (e.g., speech enhancement), source audio separation (e.g., for speech audio, music audio, etc.), audio classification, and/or event detection/audio monitoring, among others.”).

10. Claims 6, 7, 19, and 20 are rejected under 35 U.S.C. § 103 as being unpatentable over US Patent 12008457 to Isik et al. [hereinafter Isik] in view of Giri et al., “Unsupervised Anomalous Sound Detection using Self-Supervised Classification and Group Masked Autoencoder for Density Estimation,” DCASE 2020 (02 Nov 2020) [hereinafter Giri], Henze et al., “AudioForesight: A Process Model for Audio Predictive Maintenance in Industrial Environments,” IEEE (2019) [hereinafter Henze], and Becker et al., “Acoustic Anomaly Detection in Additive Manufacturing with Long Short-Term Memory Neural Networks,” IEEE (April 2020) [hereinafter Becker].

Regarding claims 6 and 19, the combination of Isik, Giri, and Henze teaches all of the limitations of claims 1 and 14, respectively, as described in detail above. Though Isik, Giri, and Henze teach the features of performing various result action(s) based on the result of classifying audio data anomalies, the combination of Isik, Giri, and Henze, however, does not explicitly teach – wherein the multiple forms of data augmentation include differing degrees of a single type of data augmentation.
But Becker teaches – wherein the multiple forms of data augmentation include differing degrees of a single type of data augmentation (Becker, right column of p. 923, “III. Approach,” Figs. 3(a) – 3(g), teach “differing degrees” [reproduction of Becker Figs. 3(a) – 3(g) omitted]. Becker, right column of p. 923, “III. Approach,” first & second full paragraphs (referring to Figs. 3(a) – 3(g)), teaches [i]n time stretching, the signal is stretched or compressed in the direction of time while the pitch remains the same. Figure 3a shows the original audio signal. If a stretch with various factors k (that is, “various factors k” is include differing degrees of a single type of data augmentation) is applied to this signal, it is compressed (see Figure 3b) or stretched (see Figure 3c). The pitch of the individual samples is increased by a certain number k halftones shifted up or down. Figure 3d shows a shift of 10 halftones down, Figure 3e shows a shift of 10 halftones up).

Isik, Giri, Henze, and Becker are from the same or similar field of endeavor. Isik teaches audio processing of audio data, in which a result of processing the audio data through the convolutional neural network may be used to perform an audio processing task. Giri teaches detecting anomalous machine sounds in a test set providing an anomaly score. Henze teaches anomaly detection to predict abnormal behavior in comparison to the normal behavior observed after setting up the system, and subsequently, a classifier tries to identify the system error based on predefined failure classes. Becker teaches the training of neural network models using augmented audio waveforms for detecting waveform anomalies of a printing process. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Isik, Giri, and Henze pertaining to audio data classification via a neural network model to produce a result via a classification anomaly score with the anomaly detection models trained on augmented audio waveforms of Becker. The motivation to do so is because “[a]udio sensors and current methods of data processing and machine learning can be used to detect these errors and notify the user, allowing him to react accordingly. Critical errors can also be corrected immediately, preventing damage or wear to the printer, wasted material and time.” (Becker, right column of p. 921, “I. Introduction,” first paragraph).

Regarding claims 7 and 20, the combination of Isik, Giri, Henze and Becker teaches all of the limitations of claims 6 and 19, respectively, as described in detail above. Becker teaches – wherein the multiple forms of data augmentation include at least two distinct types of data augmentation (Becker, left column of p. 923, “III. Approach,” last partial paragraph, teaches [s]everal different ways of data augmentation were used like time stretching, pitch shifting, or amplifying (that is, “time stretching” and “pitch shifting” are include at least two distinct types of data augmentation)), each performed to at least three different degrees (Becker, right column of p. 923, “III. Approach,” first & second full paragraphs (referring to Figs. 3(a) – 3(g)), teaches [i]n time stretching, the signal is stretched or compressed in the direction of time while the pitch remains the same. Figure 3a shows the original audio signal.
If a stretch with various factors k is applied to this signal, it is compressed (see Figure 3b) or stretched (see Figure 3c) (that is, “time stretch (TS)” is “compressed,” “stretched,” and “neutral,” where each performed to at least three different degrees). The pitch of the individual samples is increased by a certain number k halftones shifted up or down. Figure 3d shows a shift of 10 halftones down, Figure 3e shows a shift of 10 halftones up (that is, “pitch shifting (PS)” is “10 halftones down,” “10 halftones up,” and “neutral,” where each performed to at least three different degrees)), to provide at least nine different forms of combined data augmentation (that is, Becker teaches degree combinations, where with all possible degrees the result is: (1) (PS 10 halftones down, TS compressed), (2) (PS neutral, TS compressed), (3) (PS 10 halftones up, TS compressed), (4) (PS 10 halftones down, TS neutral), (5) (PS 10 halftones up, TS neutral), (6) (PS 10 halftones down, TS stretched), (7) (PS neutral, TS stretched), (8) (PS 10 halftones up, TS stretched), and (9) (PS neutral, TS neutral), which is to provide at least nine different forms of combined data augmentation).
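The Examiner's nine-combination enumeration is just the Cartesian product of two augmentation types at three degrees each; a sketch with the degree labels drawn from the annotation above:

```python
from itertools import product

pitch_degrees = ("10 halftones down", "neutral", "10 halftones up")
stretch_degrees = ("compressed", "neutral", "stretched")

# Two distinct types of augmentation, each at three degrees,
# yield 3 x 3 = 9 combined forms of data augmentation.
combined_forms = list(product(pitch_degrees, stretch_degrees))
assert len(combined_forms) == 9
```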
11. Claims 10 and 22 are rejected under 35 U.S.C. § 103 as being unpatentable over US Patent 12008457 to Isik et al. [hereinafter Isik] in view of Giri et al., “Unsupervised Anomalous Sound Detection using Self-Supervised Classification and Group Masked Autoencoder for Density Estimation,” DCASE 2020 (02 Nov 2020) [hereinafter Giri], Henze et al., “AudioForesight: A Process Model for Audio Predictive Maintenance in Industrial Environments,” IEEE (2019) [hereinafter Henze], and US Published Application 20210334645 to Pardeshi et al. [hereinafter Pardeshi].

Regarding claims 10 and 22, the combination of Isik, Giri, and Henze teaches all of the limitations of claims 1 and 14, respectively, as described in detail above. Though Isik, Giri, and Henze teach the features of a framework for corrective maintenance by a technician, the combination of Isik, Giri, and Henze, however, does not explicitly teach – wherein the corrective action is selected from the group consisting of diverting a faulty product, halting equipment operation, and automatically adjusting operational parameters of a system to compensate for a detected anomaly.

But Pardeshi teaches – wherein the corrective action is selected from the group consisting of diverting a faulty product, halting equipment operation (Pardeshi ¶ 0064 teaches an [appropriate] action determined to be severe (or otherwise satisfying a relevant threshold or criterion) can result in content being paused (that is, “being paused” is halting equipment operation)), and automatically adjusting operational parameters of a system to compensate for a detected anomaly (Pardeshi ¶ 0356 teaches “AI-assisted annotations 3110 [(that is, “AI assisted” is automatically)] may then be used directly, or may be adjusted or fine-tuned using an annotation tool (e.g., by a researcher, a clinician, a doctor, a scientist, etc.), to generate ground truth data [(that is, “adjusted or fine-tuned” is automatically adjusting operational parameters of a system to compensate for a detected anomaly)]”). * * *

Isik, Giri, Henze, and Pardeshi are from the same or similar field of endeavor. Isik teaches audio processing of audio data, in which a result of processing the audio data through the convolutional neural network may be used to perform an audio processing task. Giri teaches detecting anomalous machine sounds in a test set providing an anomaly score. Henze teaches anomaly detection to predict abnormal behavior in comparison to the normal behavior observed after setting up the system, and subsequently, a classifier tries to identify the system error based on predefined failure classes. Pardeshi teaches an auto-encoder trained on various “normal” acoustic signals to detect “abnormal” events from audio segments. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Isik, Giri, and Henze pertaining to audio data classification via a neural network model to produce a result via a classification anomaly score for identifying of system errors, or anomalies, with the responsive action to a detected audio abnormality and model parameter adjustment / fine-tuning of Pardeshi. The motivation to do so is because, in an immersive environment, there may be occurrences or events that happen outside this immersive environment for which a user should at least be notified. (Pardeshi ¶ 0053).

12. Claims 11 and 12 are rejected under 35 U.S.C. § 103 as being unpatentable over Becker et al., “Acoustic Anomaly Detection in Additive Manufacturing with Long Short-Term Memory Neural Networks,” IEEE (April 2020) [hereinafter Becker], in view of US Published Application 20190354895 to Vasudevan et al. [hereinafter Vasudevan], Henze et al., “AudioForesight: A Process Model for Audio Predictive Maintenance in Industrial Environments,” IEEE (2019) [hereinafter Henze], and Giri et al., “Unsupervised Anomalous Sound Detection using Self-Supervised Classification and Group Masked Autoencoder for Density Estimation,” DCASE 2020 (02 Nov 2020) [hereinafter Giri].

Regarding claim 11, Becker teaches [a] computer-implemented method for anomaly detection (Becker, left column of p. 923, “III. Approach,” third paragraph, teaches detection can also be used to detect if the current state of the printing process is correct (that is, a method for anomaly detection)), comprising: training a neural network model (Becker, Table I, illustrates the trained classes [reproduction of Becker Table I omitted]. Becker, left column of p. 923, “III. Approach,” third paragraph, teaches detection can also be used to detect if the current state of the printing process is correct by detecting the fan noise, the printing noise or the movements of the z axis (that is, training a neural network)) . . . ; performing multiple forms of data augmentation on a sample waveform, including differing types of data augmentation and differing degrees of each type of data augmentation, to generate a plurality of data augmentation samples (Becker, right column of p. 923, “III. Approach,” last partial paragraph, teaches “[s]everal different ways of data augmentation were used like time stretching, pitch shifting or amplifying [(that is, “time stretching,” “pitch shifting,” and “amplifying” are performing multiple forms of data augmentation on a sample waveform, including differing types of data augmentation . . . to generate a plurality of data augmentation samples)]”; Becker, right column of p. 923, “III. Approach,” Figs. 3(a) – 3(g), teach “differing degrees” [reproduction of Becker Figs. 3(a) – 3(g) omitted]. Becker, right column of p. 923, “III. Approach,” first & second full paragraphs
3(a) – 3(g)), teaches [i]n time stretching, the signal is stretched or compressed in the direction of time while the pitch remains the same. Figure 3a shows the original audio signal. If a stretch with various factors k (that is, “various factors k” is “differing degrees of each type of data augmentation”)) is applied to this signal, it is compressed (see Figure 3b) or stretched (see Figure 3c). The pitch of the individual samples is shifted up or down by a certain number of k halftones. Figure 3d shows a shift of 10 halftones down, Figure 3e shows a shift of 10 halftones up (that is, differing degrees of each type of data augmentation)); * * * Though Becker teaches the use of audio waveform augmentation for detecting whether the current state of the printing process is correct based on input data, Becker does not explicitly teach that the training of the neural network model is directed to identify a form of data augmentation that has been performed on a waveform. But Vasudevan teaches – [training] . . . to identify a form of data augmentation (Vasudevan, Abstract, teaches training a machine learning model on the training data using the current data augmentation policy; Vasudevan ¶ 0100 teaches [f]or each current data augmentation policy, the system determines a quality measure of the current data augmentation policy using the machine learning model after it has been trained using the current data augmentation policy (708) (that is, via “a quality measure of the current data augmentation policy,” the model is trained to identify a form of data augmentation)) that has been performed on a waveform (Vasudevan ¶ 0047 teaches the machine learning model is configured to process a representation of an audio waveform (that is, the data input is that of a waveform, in which the “current data augmentation policy” is performed on a waveform)); * * * Becker and Vasudevan are from the same or similar field of endeavor. Becker teaches the training of neural network models using augmented audio waveforms for detecting waveform anomalies of a printing process. Vasudevan teaches identifying augmentations of a machine learning model trained using a current data augmentation policy, and determining the quality measure of a machine learning model trained using the current data augmentation policy applied in the training. It would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify Becker pertaining to models trained to detect anomalies from augmented audio waveforms with the identified current data augmentation policy of Vasudevan. The motivation to do so is to be able to increase the quantity and diversity of the training inputs used in training the machine learning model to result in greater prediction accuracy. (Vasudevan ¶ 0039). Though Becker and Vasudevan teach the use of audio waveform augmentation for training models for identifying augmentation in the form of anomalies, the combination of Becker and Vasudevan does not explicitly teach - * * * classifying the data augmentation sample segments with the neural network model to identify a form of data augmentation that has been performed on each of the segments; and * * * and performing a corrective action responsive to the anomaly score. But Henze teaches - classifying the data augmentation sample segments with the neural network model to identify a form of data augmentation that has been performed on each of the segments (Henze, Fig.
1, teaches an “Audio Anomaly Detection & Classification Process Model:” [image of Fig. 1] Henze, right column of p. 353, “III. Audio Anomaly Detection & Classification Process Model,” second & third paragraphs, teaches “3) Anomaly Detector: . . . In this state, the system uses an anomaly detection model, trained by the Model Trainer, to detect uncommon behaviors in the sound recordings. . . . If an Anomaly detected the Anomaly Classifier is triggered. 4) Anomaly Classifier: In the Anomaly Classifier state, the found anomaly is tried to be classified based on predefined anomaly classes. These predefined classes are domain specific and thus depend on the industrial equipment used. The classification provides a first suggestion about the issue with the industrial equipment. While this is a useful hint for the technician who has to perform the maintenance, it can also save a lot of time. This classification uses the classifier as trained in the Model Trainer [(that is, classifying the data augmentation samples with the neural network model)]”; Henze, left column of p. 356, “C. Case Study Results,” first paragraph, teaches “Using the collected data from the experimental procedure implemented in the case study, we trained the Anomaly Detector and Anomaly Classifier. In the following, we present the achieved results of the trained neural network models [(that is, classifying . . . with the neural network)]”); * * * and performing a corrective action responsive to the anomaly score (Henze, right column of p. 353, “III. Audio Anomaly Detection & Classification Process Model,” third paragraph, teaches “[t]he classification provides a first suggestion about the issue with the industrial equipment. While this is a useful hint for the technician who has to perform the maintenance, it can also save a lot of time [(that is, the “hint for the technician,” via “predictive maintenance,” is performing a corrective action)]”; Henze, right column of p. 354, “IV. Case Study,” first full paragraph, teaches “1) Anomaly Detector. . . . We use a deep autoencoder to produce anomaly scores in order to differentiate between normal and abnormal behavior [(that is, responsive to the anomaly score)]”). Becker, Vasudevan, and Henze are from the same or similar field of endeavor. Becker teaches the training of neural network models using augmented audio waveforms for detecting waveform anomalies of a printing process. Vasudevan teaches identifying augmentations of a machine learning model trained using a current data augmentation policy, and determining the quality measure of a machine learning model trained using the current data augmentation policy applied in the training. Henze teaches anomaly detection to predict abnormal behavior in comparison to the normal behavior observed after setting up the system, and subsequently, a classifier tries to identify the system error based on predefined failure classes. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Becker and Vasudevan pertaining to models trained to detect anomalies from augmented audio waveforms based on an augmentation policy with the identification of system errors, or anomalies, of Henze.
The motivation to do so is because “[a]nalyzing the equipment conditions ensures the maximum interval between repairs and minimizing the number and costs of unscheduled outages [3] which offers an opportunity to cost effective maintenance [6].” (Henze, right column of p. 352, “I. Introduction,” first partial paragraph). Though Becker, Vasudevan, and Henze teach the use of audio waveform augmentation for training models for identifying augmentation in the form of anomalies, the combination of Becker, Vasudevan, and Henze, however, does not explicitly teach – * * * segmenting the data augmentation samples into respective sets of segments, separated from one another by a hop size; * * * determining an anomaly score by averaging outputs of the neural network model for the data augmentation sample segments; and * * * But Giri teaches - segmenting the data augmentation samples into respective sets of segments, separated from one another by a hop size (Giri, right column of p. 2, “2.2.1. Classifier Architectures,” first paragraph, teaches “[f]or the classification task, we employ two different architectures; MobileNetV2 and ResNet-50. MobileNetV2 is introduced in [4] as a computationally efficient improvisation of convolutional neural networks for visual recognition tasks such as object detection, classification and semantic segmentation”; Giri, left column of p. 2, “2.1. Group Masked Autoencoder (Group-MADE),” fifth paragraph, teaches “[f]ollowing the baseline model, each input 10s file is split into frames [(that is, segmentation)] of length 64ms, with hop length [(that is, separated from one another by a hop size)] of 32ms between frames. 1024-FFT and 128 Mel bins are used to featurize each frame. 5 frames are concatenated, resulting in 5 x 128 = 640 dimensional input [(that is, segmenting the data augmentation samples into respective sets of segments, separated from one another by a hop size)]”); * * * and determining an anomaly score (Giri, left column of p. 2, “2.1. Group Masked Autoencoder (Group-MADE),” third paragraph, teaches “[t]he proposed Group MADE model is trained using negative log likelihood as cost function, using all the normal training data across all IDs for a specific machine. During inference we use the negative log likelihood as anomaly score for each test sample [(that is, determining an anomaly score)]”) by averaging outputs of the neural network model for the data augmentation sample segments (Giri, left column of p. 3, “2.3 Ensembling,” second paragraph, teaches we “transform the anomaly scores of each model into a standardized scale, before combining them. The standardization transformation for any given model is applied in a per-machine ID fashion, by computing the mean and variance of its anomaly scores over the training data for that machine ID. The anomaly scores are then transformed to have zero mean and unit variance over the training data of that machine ID. Standardized anomaly scores across different models are then combined using mean or max ensembling [(that is, “mean ensembling” is averaging outputs of the neural network model for the data augmentation samples)]”). Becker, Vasudevan, Henze, and Giri are from the same or similar field of endeavor. Becker teaches the training of neural network models using augmented audio waveforms for detecting waveform anomalies of a printing process.
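[Illustration, not part of the prosecution record: the Giri framing step quoted above splits each input into 64 ms frames at a 32 ms hop, featurizes each frame with a 1024-point FFT and 128 mel bins, and concatenates 5 frames into a 640-dimensional input. A minimal sketch assuming librosa and a 16 kHz sample rate, at which a 64 ms frame is exactly 1024 samples; the sample rate is an assumption, not taken from the quoted passage.]

import numpy as np
import librosa

def segment_features(y, sr=16000):
    """Frame y into 64 ms / 32 ms-hop log-mel features, 5 frames per segment."""
    hop_length = int(0.032 * sr)   # 32 ms hop between frames
    win_length = int(0.064 * sr)   # 64 ms frame length
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=1024, hop_length=hop_length,
        win_length=win_length, n_mels=128)
    log_mel = librosa.power_to_db(mel)              # shape: (128, n_frames)
    # Concatenate 5 consecutive frames -> 5 x 128 = 640-dimensional inputs.
    segments = [log_mel[:, i:i + 5].T.reshape(-1)
                for i in range(log_mel.shape[1] - 4)]
    return np.stack(segments)                       # shape: (n_segments, 640)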
Vasudevan teaches identifying augmentations of a machine learning model trained using a current data augmentation policy, and determining the quality measure of a machine learning model trained using the current data augmentation policy applied in the training. Henze teaches anomaly detection to predict abnormal behavior in comparison to the normal behavior observed after setting up the system, and subsequently, a classifier tries to identify the system error based on predefined failure classes. Giri teaches detecting anomalous machine sounds in a test set providing an anomaly score. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Becker, Vasudevan, and Henze pertaining to models trained to detect anomalies from augmented audio waveforms based on an augmentation policy for augmented data identification with the classification anomaly score of Giri. The motivation for doing so is to “detect audio recordings containing anomalous machine sounds in a test set, when the training dataset itself does not contain any examples of anomalies.” (Giri, Abstract). Regarding claim 12, the combination of Becker, Vasudevan, Henze, and Giri teaches all of the limitations of claim 11, as described in detail above. Becker teaches - wherein the multiple forms of data augmentation include one or more types of data augmentation selected from the group consisting of pitch shift, time stretch (Becker, left column of p. 923, “III. Approach,” last partial paragraph, teaches [s]everal different ways of data augmentation were used like time stretching, pitch shifting, or amplifying (that is, include one or more types of data augmentation selected from the group consisting of pitch shift, time stretch) [Examiner notes that the phrase “one or more types of augmentation” controls in relation to the “group [of types] consisting of,” and accordingly, the broadest reasonable interpretation of the claim at a minimum calls for “one . . . type[] of data augmentation”]), . . . . 13. Claims 23-25 are rejected under 35 U.S.C. § 103 as being unpatentable over Becker et al., “Acoustic Anomaly Detection in Additive Manufacturing with Long Short-Term Memory Neural Networks,” IEEE (April 2020) [hereinafter Becker] in view of Giri et al., “Unsupervised Anomalous Sound Detection using Self-Supervised Classification and Group Masked Autoencoder for Density Estimation,” DCASE 2020 (02 Nov 2020) [hereinafter Giri], and Henze et al., "AudioForesight: A Process Model for Audio Predictive Maintenance in Industrial Environments," IEEE (2019) [hereinafter Henze]. Regarding claim 23, Becker teaches [a] system for anomaly detection (Becker, left column of p. 922, “II. Related Work,” first full paragraph, teaches this paper discusses the tradeoff between computation time and detection rate (that is, “computation time and detection rate” pertain to a system)), comprising: a hardware processor; and a memory that stores computer program code which, when executed by the hardware processor, implements: a neural network model (Becker, left column of p. 923, “III. Approach,” last partial paragraph, teaches “to get a more general result with the neural network model”) that identifies a form of data augmentation (Becker, left column of p. 923, “III. Approach,” last partial paragraph, teaches [r]ecording data of errors during 3d prints is quite laborious as they do only occur occasionally.
Some of them can be forced, so there are ways to artificially generate errors. But as this is harmful for the printer and not an intended use we used data augmentation to get more data on one hand, but also to get a more general result with the neural network model. Several different ways of data augmentation were used like time stretching, pitch shifting or amplifying (that is, to identify a form of data augmentation)) that has been performed on a waveform (Becker, left column of p. 923, “III. Approach,” first partial paragraph, teaches recorded audio data was saved in the wav [(that is, Waveform Audio File format)] file format which doesn’t use a compression); a model trainer that trains the neural network model (Becker, Table I, illustrates the trained classes: [image of Table I] Becker, left column of p. 923, “III. Approach,” third paragraph, teaches detection can also be used to detect if the current state of the printing process is correct by detecting the fan noise, the printing noise or the movements of the z axis (that is, training a neural network)); a data augmenter that performs multiple forms of data augmentation on a sample waveform, including differing types of data augmentation and differing degrees of each type of data augmentation, to generate a plurality of data augmentation samples (Becker, right column of p. 923, “III. Approach,” last partial paragraph, teaches [s]everal different ways of data augmentation were used like time stretching, pitch shifting or amplifying (that is, “time stretching,” “pitch shifting,” and “amplifying” are performing multiple forms of data augmentation on a sample waveform, including differing types of data augmentation . . . to generate a plurality of data augmentation samples); Becker, right column of p. 923, “III. Approach,” Figs. 3(a) – 3(g), teach “differing degrees,” where: [image of Figs. 3(a) – 3(g)] Becker, right column of p. 923, “III. Approach,” first & second full paragraphs (referring to Figs. 3(a) – 3(g)), teaches [i]n time stretching, the signal is stretched or compressed in the direction of time while the pitch remains the same. Figure 3a shows the original audio signal. If a stretch with various factors k (that is, “various factors k” is “differing degrees of each type of data augmentation”)) is applied to this signal, it is compressed (see Figure 3b) or stretched (see Figure 3c). The pitch of the individual samples is shifted up or down by a certain number of k halftones. Figure 3d shows a shift of 10 halftones down, Figure 3e shows a shift of 10 halftones up (that is, differing degrees of each type of data augmentation)), . . . * * * Though Becker teaches the use of audio waveform augmentation for detecting if the current state of the printing process is correct, Becker, however, does not explicitly teach – * * * [a data augmenter] . . . and that segments the data augmentation samples into respective sets of segments, separated from one another by a hop size . . . ; an anomaly detector that determines an anomaly score by averaging outputs of the neural network model for the data augmentation samples; and * * * But Giri teaches - * * * [a data augmenter] . . . and that segments the data augmentation samples into respective sets of segments, separated from one another by a hop size (Giri, right column of p. 2, “2.2.1.
Classifier Architectures,” first paragraph, teaches “[f]or the classification task, we employ two different architectures; MobileNetV2 and ResNet-50. MobileNetV2 is introduced in [4] as a computationally efficient improvisation of convolutional neural networks for visual recognition tasks such as object detection, classification and semantic segmentation”; Giri, left column of p. 2, “2.1. Group Masked Autoencoder (Group-MADE),” fifth paragraph, teaches “[f]ollowing the baseline model, each input 10s file is split into frames [(that is, segmentation)] of length 64ms, with hop length [(that is, separated from one another by a hop size)] of 32ms between frames. 1024-FFT and 128 Mel bins are used to featurize each frame. 5 frames are concatenated, resulting in 5 x 128 = 640 dimensional input [(that is, [a data augmenter] . . . and that segments the data augmentation samples into respective sets of segments, separated from one another by a hop size)]”) . . . ; an anomaly detector that determines an anomaly score (Giri, left column of p. 2, “2.1. Group Masked Autoencoder (Group-MADE),” third paragraph, teaches “[t]he proposed Group MADE model is trained using negative log likelihood as cost function, using all the normal training data across all IDs for a specific machine. During inference we use the negative log likelihood as anomaly score for each test sample [(that is, determines an anomaly score)]”) based on the classification of the data augmentation samples (Giri, left column of p. 3, “2.3 Ensembling,” second paragraph, teaches we “transform the anomaly scores of each model into a standardized scale, before combining them. The standardization transformation for any given model is applied in a per-machine ID fashion, by computing the mean and variance of its anomaly scores over the training data for that machine ID. The anomaly scores are then transformed to have zero mean and unit variance over the training data of that machine ID. Standardized anomaly scores across different models are then combined using mean or max ensembling [(that is, “mean ensembling” is based on the classification of the data augmentation samples)]”); and * * * Becker and Giri are from the same or similar field of endeavor. Becker teaches the training of neural network models using augmented audio waveforms for detecting waveform anomalies of a printing process. Giri teaches detecting anomalous machine sounds in a test set providing an anomaly score. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify Becker pertaining to models trained to detect anomalies from augmented audio waveforms based on an augmentation policy for augmented data identification with the anomaly score of Giri. The motivation for doing so is to “detect audio recordings containing anomalous machine sounds in a test set, when the training dataset itself does not contain any examples of anomalies.” (Giri, Abstract). Though Becker and Giri teach the features of a framework for corrective maintenance by a technician, the combination of Becker and Giri, however, does not explicitly teach - * * * [a data augmenter] . . .
, wherein the neural network model classifies the data augmentation samples to identify a form of data augmentation that has been performed on each of the data augmentation sample segments; * * * and a response function that performs a corrective action responsive to the anomaly score. But Henze teaches - * * * [a data augmenter] . . . , wherein the neural network model classifies the data augmentation samples to identify a form of data augmentation that has been performed on each of the data augmentation sample segments (Henze, Fig. 1, teaches an “Audio Anomaly Detection & Classification Process Model:” [image of Fig. 1] Henze, right column of p. 353, “III. Audio Anomaly Detection & Classification Process Model,” second & third paragraphs, teaches “3) Anomaly Detector: . . . In this state, the system uses an anomaly detection model, trained by the Model Trainer, to detect uncommon behaviors in the sound recordings. . . . If an Anomaly detected the Anomaly Classifier is triggered. 4) Anomaly Classifier: In the Anomaly Classifier state, the found anomaly is tried to be classified based on predefined anomaly classes. These predefined classes are domain specific and thus depend on the industrial equipment used. The classification provides a first suggestion about the issue with the industrial equipment. While this is a useful hint for the technician who has to perform the maintenance, it can also save a lot of time. This classification uses the classifier as trained in the Model Trainer [(that is, classifying the data augmentation samples with the neural network model)]”; Henze, left column of p. 356, “C. Case Study Results,” first paragraph, teaches “Using the collected data from the experimental procedure implemented in the case study, we trained the Anomaly Detector and Anomaly Classifier. In the following, we present the achieved results of the trained neural network models [(that is, classifying . . . with the neural network)]”); * * * and a response function that performs a corrective action responsive to the anomaly score (Henze, right column of p. 353, “III. Audio Anomaly Detection & Classification Process Model,” third paragraph, teaches “[t]he classification provides a first suggestion about the issue with the industrial equipment. While this is a useful hint for the technician who has to perform the maintenance, it can also save a lot of time [(that is, the “hint for the technician,” via “predictive maintenance,” is performing a corrective action)]”; Henze, right column of p. 354, “IV. Case Study,” first full paragraph, teaches “1) Anomaly Detector. . . . We use a deep autoencoder to produce anomaly scores in order to differentiate between normal and abnormal behavior [(that is, responsive to the anomaly score)]”). Becker, Giri, and Henze are from the same or similar field of endeavor. Becker teaches the training of neural network models using augmented audio waveforms for detecting waveform anomalies of a printing process. Giri teaches detecting anomalous machine sounds in a test set providing an anomaly score. Henze teaches anomaly detection to predict abnormal behavior in comparison to the normal behavior observed after setting up the system, and subsequently, a classifier tries to identify the system error based on predefined failure classes.
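[Illustration, not part of the prosecution record: the Giri "mean ensembling" passage quoted in the rejection standardizes each model's anomaly scores to zero mean and unit variance over the training data before combining them. A minimal sketch; the per-machine-ID grouping described by Giri is omitted here for brevity.]

import numpy as np

def ensemble_scores(test_scores, train_scores, mode="mean"):
    """test_scores / train_scores: per-model lists of 1-D anomaly score arrays."""
    standardized = []
    for test, train in zip(test_scores, train_scores):
        mu, sigma = np.mean(train), np.std(train)
        standardized.append((np.asarray(test) - mu) / sigma)  # zero mean, unit variance
    stacked = np.stack(standardized)  # shape: (n_models, n_samples)
    # Combine standardized scores across models by mean or max ensembling.
    return stacked.mean(axis=0) if mode == "mean" else stacked.max(axis=0)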
Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Becker and Giri pertaining to models trained to detect anomalies from augmented audio waveforms implementing an anomaly score based on an ensemble approach with the identification of system errors, or anomalies, of Henze. The motivation to do so is because “[a]nalyzing the equipment conditions ensures the maximum interval between repairs and minimizing the number and costs of unscheduled outages [3] which offers an opportunity to cost effective maintenance [6].” (Henze, right column of p. 352, “I. Introduction,” first partial paragraph). Examiner notes that the terms "hardware processor" and "a memory that stores computer program code which, when executed by the hardware processor” recited in Applicant's claims are interpreted to be well-known hardware structures. Regarding claim 24, the combination of Becker, Giri, and Henze teaches all of the limitations of claim 23, as described in detail above. Becker teaches - wherein the differing types of data augmentation are selected from the group consisting of pitch shift, time stretch (Becker, left column of p. 923, “III. Approach,” last partial paragraph, teaches [s]everal different ways of data augmentation were used like time stretching, pitch shifting, or amplifying (that is, the differing types of data augmentation are selected from the group consisting of pitch shift, time stretch) [Examiner construes that the phrase “one or more types of augmentation” controls in relation to the “group [of types] consisting of,” and accordingly, the broadest reasonable interpretation of the claim at a minimum calls for “one . . . type[] of data augmentation”]), . . . . Regarding claim 25, the combination of Becker, Giri, and Henze teaches all of the limitations of claim 23, as described in detail above. Becker teaches - wherein the sample waveform is an audio waveform (Becker, left column of p. 923, “III. Approach,” first partial paragraph, teaches recorded audio data was saved in the wav [(that is, Waveform Audio File format)] file format which doesn’t use a compression (that is, the sample waveform is an audio waveform)). Response to Arguments 14. Examiner has fully considered Applicant’s arguments, and responds below accordingly. Claim Rejections under Section 101 15. Regarding Step 2A Prong Two, Applicant submits that “[t]he Examiner argues that the present specification presents its improvement as "a bare assertion of an improvement without the detail necessary to be apparent to a person of ordinary skill in the art." In support of this position, the Examiner cites paragraphs 7-8 of the present specification, which note the deficiencies of techniques such as reconstruction, feature learning, classification, and geometric transformation.” (Response at p. 11 (Applicant quoting Specification ¶¶ 0024-45)). Referring to the Specification, Applicant submits that, apart from paragraphs 0024-25, “[t]he remaining passages in the present specification provide even more detail as to how the improvement is implemented. Applicant therefore respectfully maintains that those having ordinary skill in the art would recognize the present specification as providing sufficient detail to make the improvement apparent. The claims reflect this improvement.
Notably, they recite training a model to identify a form of data augmentation, performing multiple forms of data augmentation on a sample waveform, classifying the data augmentation samples, and determining an anomaly score by averaging outputs of the neural network model for the data augmentation samples. Those having ordinary skill in the art would recognize that these steps reflect the improvement described in the present specification. As such, the present claims are directed to a practical application.” (Response at p. 13). Examiner Response: Examiner respectfully disagrees because the claim as a whole does not serve to integrate the abstract idea into a practical application. Under Step 2A Prong Two, an improvement may serve to integrate the abstract idea into a practical application when first, the specification is evaluated to determine if the disclosure provides sufficient details such that one of ordinary skill in the art would recognize the claimed invention as providing an improvement. Second, if sufficient details are present, the claims are evaluated to ensure that the claim itself reflects the disclosed improvement. (see MPEP § 2106.04(d)(1)). Under the first criterion, it is unclear to the Examiner whether the “disclosure provides sufficient details such that one of ordinary skill in the art would recognize the claimed invention as providing an improvement” because such improvements are set out in a conclusory manner. Applicant points to portions of the disclosure as representing an improvement, in that: Block 510 then determines an anomaly score for the new sample. For example, this score may be determined as: [equation image] where x is the new sample, Tj(x) is the output of performing the jth combination of data augmentation types and degrees on the new sample x, y(•) is the output of the classifier that is used to determine what type and degree of data augmentation was performed on the new data augmentation sample, and k is a total number of combinations of data augmentation types and degrees. In particular, the value of y(•) may be the averaged probability of the segments for the data augmentation sample. For example, following the illustration of FIG. 1, k may be 9. (Response at pp. 12-13 (Specification ¶ 0044)). In relation to the anomaly score, the Applicant’s disclosure does not provide sufficient details such that a person of ordinary skill in the art would recognize this form of anomaly score as providing an improvement. Generally, the Applicant’s disclosure submits that [a]n anomaly score can then be generated, on the basis of a confidence with which the augmented input data is classified to trained augmentation type classes. Anomalous sound data may generally have a lower confidence value and higher anomaly score than normal sound data. (Specification ¶ 0024). Specifically, as with the general anomaly score, an anomaly determination is made, where [o]nce the anomaly score for the new sample has been determined by block 510, block 512 uses the anomaly score to determine whether the new sample represents an anomaly. For example, this may include comparing the anomaly score to a threshold value, with above-threshold anomaly scores indicating that an anomaly has occurred, and with at- or below-threshold anomaly scores indicating that no anomaly has occurred. (Specification ¶ 0045). In other words, no indication is presented as to the improvement contributed by this form of anomaly score.
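[Illustration, not part of the prosecution record: as the Office Action characterizes Specification ¶¶ 0044-0045, the anomaly score aggregates the classifier outputs y(•) over the k augmentation combinations Tj(x) and is then compared to a threshold. The quoted equation survives only as an image in the record, so the one-minus-mean-confidence form below is an assumption, as are all names and the threshold value.]

import numpy as np

def anomaly_score(x, augmentations, classifier):
    """augmentations: k callables T_j; classifier(sample, j) returns the
    probability that augmentation j was applied to the given sample."""
    confidences = [classifier(T(x), j) for j, T in enumerate(augmentations)]
    return 1.0 - float(np.mean(confidences))  # low confidence -> high anomaly score

def is_anomaly(score, threshold=0.5):
    # Per the quoted passage, above-threshold scores indicate an anomaly.
    return score > threshold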
The Specification may be said to set forth “an improvement but in a conclusory manner (i.e., a bare assertion of an improvement without the detail necessary to be apparent to a person of ordinary skill in the art).” (MPEP § 2106.04(d)(1)). In this instance, the Examiner should not determine that the claim improves technology. Accordingly, whether the claims “reflect the improvement” is a moot point. However, under Step 2A Prong Two, the additional elements of a claim, when considered as a whole, may serve to integrate the abstract idea into a practical application. That is, the analysis “considers the claim as a whole. That is, the limitations containing the [abstract idea] as well as the additional elements in the claim besides the [abstract idea] need to be evaluated together to determine whether the claim integrates the [abstract idea] into a practical application.” (MPEP § 2106.04(d) sub III). The additional elements beyond the identified judicial exception recited in the claim are “a computer-implemented method,” where instructions to apply the abstract idea on generic computer components (i.e., computer-implemented) do not integrate an abstract idea into a practical application. (MPEP § 2106.05(f)). The claim also recites “training a neural network model to identify a form of data augmentation . . . ,” which is recited at a high level of generality, and is used to generally apply the abstract idea without placing any limits on how the trained neural network functions. The training of the neural network model is mere instructions to implement the abstract idea on a generic computer component, (MPEP § 2106.05(f)), which does not serve to integrate the abstract idea into a practical application. 16. Applicant submits that “claims 10 and 22 provide a practical application in another way, in the corrective action. [W]ith respect to the independent claims, the Examiner asserts that the corrective action could, broadly construed, include a report to a human being that represents insignificant extra-solution activity. With respect to claims 10 and 22, the Examiner argues that the claims ‘do not place any limits on how these respective actions are accomplished.’” (Response at p. 13). Applicant also points to claims 10 and 22, where claim 10 further recites “wherein the corrective action is selected from the group consisting of diverting a faulty product, halting equipment operation, and automatically adjusting operational parameters of a system to compensate for a detected anomaly.” (Response at p. 14 (see claim 10, lines 1-4)). These claims, however, do not place any limits on how these respective actions are accomplished. (see MPEP § 2106.05(f)). Turning to the Applicant’s disclosure, such actions are set out in a conclusory manner. For example: The responsive action can be used to quickly and automatically respond to any such anomaly, providing a rapid response to new circumstances. For example, anomalies may indicate a product defect in a factory, in which case the faulty product can be diverted from the factory line and can be repaired. Anomalies may also indicate an equipment fault, in which case the factory line may be halted, to repair the equipment and prevent further damage. In some cases, where the anomaly may be addressed automatically, the responsive action may adjust operational parameters of a system to compensate, such as increasing a cooling action when an overheating condition is detected. (Specification ¶ 0036).
A response function 616 is triggered by the detection of an anomaly. The response function 616 may include any appropriate action that corrects, reports, or otherwise addresses the detected anomaly. (Specification ¶ 0063). Neither the Specification nor the claims sets out specifics of how such actions are undertaken, nor any indication of an improvement having the detail necessary to be apparent to a person of ordinary skill in the art. Accordingly, as set out hereinabove, claims 1-25 are subject-matter ineligible. Claim Rejections under 35 U.S.C. § 103 17. Applicant submits that “[a]s an initial matter, Applicant respectfully notes that the grace period disclosure, "DETECTION OF ANOMALOUS SOUNDS FOR MACHINE CONDITION MONITORING USING CLASSIFICATION CONFIDENCE," was published on July 1, 2020. The Isik reference was filed on September 29, 2020, and the Giri reference was published on November 2, 2020, both after the grace period disclosure. The Isik and Giri references were published less than one year before the effective filing date of the present application and are cited for disclosures relating to subject matter that was previously disclosed in the grace period disclosure. Applicant therefore respectfully asserts that Isik and Giri are not valid prior art under the exception of 35 U.S.C. § 102(b)(1)(B).” (Response at pp. 15-16). Examiner’s Response: Examiner respectfully submits that the cited prior art of Giri and Isik are proper references under Section 102(a)(1), and that Applicant has not made a requisite showing as being excepted under Section 102(b)(1)(B). Under Section 102, “[a] person shall be entitled to a patent unless – the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.” (35 U.S.C. § 102(a)(1)). Also, under Section 102(a)(2), “the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.” AIA 35 U.S.C. 102(b)(1)(B) provides that a disclosure which would otherwise qualify as prior art under AIA 35 U.S.C. 102(a)(1) (patent, printed publication, public use, sale, or other means of public availability) is excepted as prior art if: (1) the disclosure was made one year or less before the effective filing date of the claimed invention; and (2) the subject matter disclosed had been previously publicly disclosed by the inventor, a joint inventor, or another who obtained the subject matter directly or indirectly from the inventor or joint inventor. Applicant, without basis, alleges that the Giri reference and the Isik reference are excepted as prior art to the instant application. According to 35 U.S.C. § 102(b)(1)(B), a disclosure that is made one year or less before the effective filing date of the claimed invention (November 19, 2020) shall not be prior art under 35 U.S.C. § 102(a)(1) if the subject matter disclosed [in the purported prior art reference] had, before such disclosure, been publicly disclosed by the inventor or a joint inventor. The earliest effective filing date of the Applicant’s invention is the filing date of the instant application, which is 19 November 2020. The Applicant’s Application Data Sheet does not claim priority to an earlier date.
The inventive entity of the instant application is: Tadanobu Inoue, Phongtharin Vinayavekhin, Shu Morikuni, Michiaki Tatsubori, and Ryuki Tachibana. The Applicant of the instant application is International Business Machines Corporation. The non-patent literature of the Giri reference is shown as published “2-3 November 2020, Tokyo, Japan” in connection with the “Detection and Classification of Acoustic Scenes and Events 2020” conference. Accordingly, the publication of the Giri reference predates the effective filing date of the instant application. Moreover, the authors “Ritwik Giri, Srikanth V. Tenneti, Fangzhou Cheng, Karim Helwani, Umut Isik, Arvindh Krishnaswamy” are employed by Amazon Web Services, Palo Alto, CA. The authors are not “the inventor, a joint inventor, or another who obtained the subject matter directly or indirectly from the inventor or joint inventor.” Accordingly, the Giri reference is not excepted and remains prior art under Section 102(a)(1). US Patent 12008457 to Isik et al. has an effective filing date of 29 September 2020 and an issue date of 11 June 2024, and accordingly is prior art under Section 102(a)(2). The inventive entity of Isik includes “Mehmet Umut Isik, Ritwik Giri, Neerad Dilip Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy,” and the applicant is Amazon Technologies. Because Isik was “effectively filed before the effective filing date of the claimed invention” and issued as a patent, Isik is a proper prior art reference under Section 102(a)(2). Applicant has not asserted an exception under Section 102(b)(2)(B) to the Isik prior art. Accordingly, without a proper showing by Applicant to the contrary, the cited prior art references of Giri and Isik are proper prior art under Section 102(a)(1) and/or Section 102(a)(2). 18. Applicant submits that “the cited art fails to disclose or suggest the recited claim features. Claim 1 recites, inter alia, "training a neural network model to identify a form of data augmentation that has been performed on a waveform." Claims 11, 13-14, and 23 recite analogous language. With respect to claims 1 and 13-14, the rejection asserts that Isik teaches this feature in its discussion of applying augmentations to training data. With respect to claims 11, 14, and 23, the rejection relies on Becker to teach this feature.” (Response at p. 16). Examiner’s Response: Applicant’s exemplar claim 1 recites: A computer-implemented method for anomaly detection, comprising: [(a)] training a neural network model to identify a form of data augmentation that has been performed on a waveform; [(b)] performing multiple forms of data augmentation on a sample waveform to generate a plurality of data augmentation samples; [(c)] classifying the data augmentation samples with the neural network model; [(d)] determining an anomaly score by averaging outputs of the neural network model for the data augmentation samples; and [(e)] performing a corrective action responsive to the anomaly score. (Claim 1, lines 1-10). Claim 8, which depends from claim 1, recites with regard to [(a)] “training a neural network:” wherein training the neural network model includes performing the multiple forms of data augmentation on training waveforms in a training dataset. (Claim 8, lines 1-2). However, Isik teaches that “augmentations 330 may be applied to training audio data 302.” (see above, Isik 7:63-66 & Fig. 5). The broadest reasonable interpretation of “training . . .
to identify” would include the use of augmented audio data to train the model, which is not inconsistent with Applicant’s disclosure. (MPEP § 2111). Moreover, dependent claim 8 further clarifies that “multiple forms of data augmentation on training waveforms” would be used. Accordingly, the Applicant’s claims cover the teachings of Isik relating to “training . . . to identify.” Applicant continues to argue against the references individually. For example, Applicant argues that “while Isik describes ‘audio classification’ as being an example of what a model [may] be trained to do, it provides no detail as to what information that classification is obtaining about a sample.” (Response at p. 16). Generally, a model is trained to predict that which it is trained with – augmented audio training data. In this instance, Isik, in Fig. 5 and accompanying text, provides such training through augmented audio data 332 derived from “training audio data 302.” The broadest reasonable interpretation of Applicant’s term “to identify a form of data augmentation” encompasses the teachings of Isik pertaining to “audio classification 271a.” Applicant attacks the references individually, alleging that Becker is “cited to show training generally,” that Vasudevan is “cited relating to a quality measure of a data augmentation policy,” and that Henze relates to the “classification of sample segments.” One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. Where a rejection of a claim is based on two or more references, a reply that is limited to what a subset of the applied references teaches or fails to teach, or that fails to address the combined teaching of the applied references, may be considered to be an argument that attacks the reference(s) individually, as is the case here with the cited prior art of Isik, Becker, Vasudevan, and Henze. MPEP § 2145.IV. Moreover, the rejections hereinabove clearly set forth which claim limitations are taught by each of the prior art references, and the reasons why it would be obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant's invention to combine their teachings, and Applicant has not explained why the cited prior art references cannot be combined in the manner set forth in the rejection. Applicant argues that “[t]he Becker reference is cited, with respect to claim 23, to address a model that identifies a form of data augmentation that has been performed on a waveform. The cited passage of Becker describes the use of different types of data augmentation to artificially generate errors that help to generalize the results of the model. However, the model that is being trained is not used to identify the form of data augmentation that was used, but is instead used to identify errors in a 3D printing process.” (Response at p. 17).
Claim 23 recites: * * * a neural network model that identifies a form of data augmentation that has been performed on a waveform; a model trainer that trains the neural network model; a data augmenter that performs multiple forms of data augmentation on a sample waveform, including differing types of data augmentation and differing degrees of each type of data augmentation, to generate a plurality of data augmentation samples, and that segments the data augmentation samples into respective sets of segments, separated from one another by a hop size, wherein the neural network model classifies the data augmentation samples to identify a form of data augmentation that has been performed on each of the data augmentation sample segments; an anomaly detector that determines an anomaly score by averaging outputs of the neural network model for the data augmentation samples; and a response function that performs a corrective action responsive to the anomaly score. (Claim 23, lines 5-18). The “neural network model that identifies a form of data augmentation that has been performed on a waveform” is not so limited as argued by Applicant. Accordingly, the broadest reasonable interpretation of a “model that identifies a form of data augmentation” covers the teachings of Becker, which is not inconsistent with the Applicant’s disclosure. (MPEP § 2111; see Specification ¶ 0036 (“Anomaly detection may be used for a variety of applications, such as in equipment fault detection, product defect detection, network intrusion detection, fraud detection, medical diagnosis, and earthquake detection.”)). Applicant attacks the references individually, alleging that Giri “never discloses nor suggests identifying a form of data augmentation that has been performed,” that the models at issue select from predefined anomaly classes which are "domain specific and thus depend on the industrial equipment used,” that Henze relates to the “classification of sample segments,” and that the Pardeshi reference “is not cited to address features relating to identifying a form of data augmentation.” One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. Where a rejection of a claim is based on two or more references, a reply that is limited to what a subset of the applied references teaches or fails to teach, or that fails to address the combined teaching of the applied references, may be considered to be an argument that attacks the reference(s) individually, as is the case here with the cited prior art of Becker, Giri, Henze, and Pardeshi. MPEP § 2145.IV. Moreover, the rejections hereinabove clearly set forth which claim limitations are taught by each of the prior art references, and the reasons why it would be obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant's invention to combine their teachings, and Applicant has not explained why the cited prior art references cannot be combined in the manner set forth in the rejection. Conclusion 19. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 20. The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure: (US Published Application 20230124877 to Fukuoka et al.) teaches that when an anomaly occurs during machining, a machine tool informs the outside world of this occurrence of anomaly, temporarily stops the machining, etc. Various methods may be used to detect occurrence of an anomaly, one of which is to detect occurrence of an anomaly on the basis of a sound produced during machining. This method uses a difference in sound between normal machining and anomalous machining to determine whether the machining is normal or anomalous. (US Published Application 20220394200 to Wang) teaches that when a machine in operation breaks down, it is possible to detect an anomaly of the machine occurred in the past from the recorded machine sound. The operation sound of the machine in the normal operation is recorded, and normal content is generated therefrom. Training data is generated based on the normal content and anomalous content acquired by adding an anomaly to the normal content. A learned model is generated from the generated training data. The anomaly of the machine occurred in the past is detected by using the learned model. (Muller et al., “Acoustic Anomaly Detection for Machine Sounds based on Image Transfer Learning,” arXiv (2020)) teaches to extract features using neural networks that were pretrained on the task of image classification. We then use these features to train a variety of anomaly detection models and show that this improves results compared to convolutional autoencoders in recordings of four different factory machines in noisy environments. In our setting, Gaussian Mixture Models and One-Class Support Vector Machines achieve the best anomaly detection performance. 21. Any inquiry concerning this communication or earlier communications from the Examiner should be directed to KEVIN L. SMITH whose telephone number is (571) 272-5964. Normally, the Examiner is available on Monday-Thursday 0730-1730. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, KAKALI CHAKI, can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov.
Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /K.L.S./ Examiner, Art Unit 2122 /KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122

Prosecution Timeline

Nov 19, 2020
Application Filed
Jun 30, 2023
Non-Final Rejection — §101, §102, §103
Sep 21, 2023
Interview Requested
Oct 06, 2023
Examiner Interview Summary
Oct 10, 2023
Response Filed
Feb 03, 2024
Final Rejection — §101, §102, §103
Apr 04, 2024
Interview Requested
Apr 11, 2024
Examiner Interview Summary
Apr 12, 2024
Response after Non-Final Action
May 01, 2024
Response after Non-Final Action
May 14, 2024
Response after Non-Final Action
May 14, 2024
Notice of Allowance
Jun 06, 2024
Response after Non-Final Action
Jul 15, 2024
Response after Non-Final Action
Jul 15, 2024
Response after Non-Final Action
Jul 18, 2024
Response after Non-Final Action
Jul 24, 2024
Response after Non-Final Action
Aug 21, 2024
Response after Non-Final Action
Nov 16, 2024
Non-Final Rejection — §101, §102, §103
Feb 06, 2025
Interview Requested
Feb 18, 2025
Examiner Interview Summary
Feb 18, 2025
Response Filed
May 20, 2025
Final Rejection — §101, §102, §103
Jul 21, 2025
Interview Requested
Jul 28, 2025
Response after Non-Final Action
Aug 06, 2025
Examiner Interview Summary
Sep 05, 2025
Non-Final Rejection — §101, §102, §103
Nov 13, 2025
Interview Requested
Nov 20, 2025
Examiner Interview Summary
Dec 01, 2025
Response Filed
Mar 10, 2026
Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591815
METHOD AND SYSTEM FOR UPDATING MACHINE LEARNING BASED CLASSIFIERS FOR RECONFIGURABLE SENSORS
2y 5m to grant Granted Mar 31, 2026
Patent 12585917
REINFORCEMENT LEARNING USING ADVANTAGE ESTIMATES
2y 5m to grant Granted Mar 24, 2026
Patent 12547759
PRIVACY PRESERVING MACHINE LEARNING MODEL TRAINING
2y 5m to grant Granted Feb 10, 2026
Patent 12530613
SYSTEMS AND METHODS FOR PERFORMING QUANTUM EVOLUTION IN QUANTUM COMPUTATION
2y 5m to grant Granted Jan 20, 2026
Patent 12518214
DISTRIBUTED MACHINE LEARNING SYSTEMS INCLUDING GENERATION OF SYNTHETIC DATA
2y 5m to grant Granted Jan 06, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

7-8
Expected OA Rounds
37%
Grant Probability
55%
With Interview (+18.0%)
4y 8m
Median Time to Grant
High
PTA Risk
Based on 134 resolved cases by this examiner. Grant probability derived from career allow rate.
