Prosecution Insights
Last updated: April 19, 2026
Application No. 17/856,652

SYSTEM AND METHOD FOR VIBROACOUSTIC DIAGNOSTIC AND CONDITION MONITORING OF A SYSTEM USING NEURAL NETWORKS

Final Rejection (§103, §112)
Filed: Jul 01, 2022
Examiner: LEE, HANA
Art Unit: 3662
Tech Center: 3600 (Transportation & Electronic Commerce)
Assignee: BOARD OF TRUSTEES OF MICHIGAN STATE UNIVERSITY
OA Round: 4 (Final)
Grant Probability: 60% (Moderate)
OA Rounds: 5-6
To Grant: 3y 0m
With Interview: 96%

Examiner Intelligence

Career Allow Rate: 60% (84 granted / 141 resolved; +7.6% vs TC avg)
Interview Lift: +36.6% (strong; allow rate in resolved cases with an interview vs without)
Avg Prosecution: 3y 0m typical timeline; 36 applications currently pending
Total Applications: 177 across all art units (career history)
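The headline allow rate follows directly from the counts above; a quick arithmetic check (assuming the dashboard rounds to the nearest whole percent):

```python
# Reproduce the career allow rate from the listed counts.
granted, resolved = 84, 141

allow_rate = granted / resolved   # fraction of resolved cases granted
print(f"{allow_rate:.1%}")        # 59.6%, displayed as 60%
```

On the same arithmetic, the +7.6% delta would imply a Tech Center average near 52%, though the dashboard does not state that figure directly.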

Statute-Specific Performance

§101: 12.6% (-27.4% vs TC avg)
§103: 48.8% (+8.8% vs TC avg)
§102: 14.2% (-25.8% vs TC avg)
§112: 22.1% (-17.9% vs TC avg)

Comparisons are against the Tech Center average estimate; based on career data from 141 resolved cases.

Office Action

Grounds: §103, §112
DETAILED ACTION

The amendments filed 10/28/2025 have been entered. Claims 8, 13-15, 17-18, and 22 have been amended, claims 1-7, 16, and 28-29 have been cancelled, and claims 31-39 have been added. Claims 8-15, 17-18, 20-27, and 30-38 remain pending in the application and are discussed on the merits below.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments filed 10/28/2025 have been fully considered but are moot because they are directed toward subject matter that has not been previously considered and have necessitated new grounds of rejection.

Response to Amendment

Regarding the objection to the specification, Applicant has amended the specification to overcome the objection; the objection to the specification has been withdrawn. Regarding the objection to the claims, Applicant has amended the claims to overcome the previously set forth objection, which has likewise been withdrawn. However, the amendments have necessitated new objections as outlined below. Regarding the rejections under 35 USC §103, the amendments to the claims have necessitated new grounds of rejection as outlined below.

Claim Objections

Claims 1 and 18 are objected to because of the following informalities: "a state prediction from the neural network model" should read "a state prediction from the two-stage convolutional neural network model". Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C.
112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 35-37 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claim 35 recites the limitation "the one-dimensional feature" in lines 2-3 and 5. There is insufficient antecedent basis for this limitation in the claim. Claim 36 recites the limitation "the MFCCs feature" in line 2. There is insufficient antecedent basis for this limitation in the claim. Claim 37 recites the limitation "the spectrogram feature" in line 2. There is insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.
Patentability shall not be negated by the manner in which the invention was made.

Claims 8-9 and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Bielby et al. (U.S. Patent Application Publication No. 2021/0049444 A1; hereinafter Bielby) in view of Anthony (U.S. Patent Application Publication No. 2018/0315260 A1), further in view of Sinitsyn et al. (U.S. Patent Application Publication No. 2017/0300818 A1; hereinafter Sinitsyn), and further in view of Park et al. ("Two-stage Polyphonic Sound Event Detection Based on Faster R-CNN-LSTM with Multi-token Connectionist Temporal Classification"; see reference U on PTO-892; hereinafter Park).

Regarding claim 8, Bielby discloses: A method for diagnostic and condition monitoring of a system (method and apparatus of predictive maintenance of automotive engines, see at least abstract), the method comprising: receiving data from one or more sensors (data collected by sensors 103, see at least [0041]), the data associated with the system (sensor data of vehicles, see at least [0046]); generating an audio feature based on the data (audio features extracted by processors, see at least [0244]); inputting the audio feature into a neural network model (audio features used as inputs to an Artificial Neural Network (ANN), see at least [0244]); and receiving one or more attribute predictions and a state prediction from the neural network model (ANN used to predict a diagnosis/maintenance service associated with sound patterns for a vehicle, see at least [0041] and [0221]; artificial neural network trained to recognize sound patterns of the engine, see at least [0219]).

Bielby does not disclose: receiving data from one or more mobile off-board sensors; a two-stage convolutional neural network; or wherein the one or more attribute predictions comprise at least one of: a fuel type, an engine configuration, a cylinder count, or an aspiration type.

However, Anthony teaches: receiving data from one or more mobile off-board sensors (receiving output from
sensors in a smartphone and using this information to develop a learning model that maps on-board diagnostic system outputs and auxiliary sensor outputs to an automotive fault condition, see at least [0008]; use of the onboard microphone, accelerometer, gyroscope, magnetometer, etc. on the smartphone as an audio sensor, see at least [0039]).

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the audio feature inputs into an artificial neural network for diagnostics as disclosed by Bielby by adding the smartphone sensors to gather audio data as taught by Anthony, with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification because it "eliminates reliance on static sensors that are hardwired to Onboard Diagnostic (OBD) systems. It also reduces the need to rely on the extent of a mechanic's personal knowledge, and may be especially helpful in managing driverless vehicle fleets." (see abstract).

Additionally, Sinitsyn teaches: wherein the one or more attribute predictions comprise at least one of: a fuel type, an engine configuration, a cylinder count, or an aspiration type (fuel type tags trained to predict a tag from audio of a motor vehicle to estimate pollution level, see at least [0017]-[0018]; the pollution estimation system may be configured to estimate different types of pollutants, see at least [0073]; tags give information on the type of motor and indicate a fuel type, see at least [0084]-[0085]).

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby and the smartphone audio data taught by Anthony by adding the fuel type tags taught by Sinitsyn with a reasonable expectation of success.
One of ordinary skill in the art would have been motivated to make this modification in order to "distinguish between the sounds of a car on diesel and one running on gasoline" (see [0020]).

Furthermore, Park teaches: a two-stage convolutional neural network (CNN) model (two-stage detection model using convolutional neural networks, see at least abstract), wherein a first stage of the two-stage CNN model receives the audio feature and produces the one or more attribute predictions (in the first stage, regions are selected using a Faster Regional CNN (R-CNN) and the proposed regions are passed to an attention-LSTM to classify the event), and wherein a second stage of the two-stage CNN receives the audio feature and the one or more attribute predictions and produces the state prediction (the second stage builds a CNN-LSTM based feature map; the proposed regions from the first stage are pooled over the CNN feature maps to determine an event, see at least Fig. 1 and subheadings 2.1.1. and 2.1.2. on pages 2-3).

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features and a neural network disclosed by Bielby, the smartphone audio data taught by Anthony, and the fuel type tags taught by Sinitsyn by adding the two-stage model taught by Park with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification to structure the model as a two-stage classifier because doing so would result in a higher F1 score (see abstract).

Regarding claim 9, the combination of Bielby, Anthony, Sinitsyn, and Park teaches the elements above, and Bielby further discloses: the system is a vehicle (vehicle 111, see at least [0041] and Fig. 1).
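The claimed two-stage dataflow (first stage: audio feature to attribute predictions; second stage: audio feature plus those predictions to a state prediction) can be sketched in Python. The stand-in functions, attribute values, and energy threshold below are illustrative placeholders, not the claimed CNN stages or Park's R-CNN/LSTM model:

```python
def stage_one(audio_feature):
    """Stand-in for the first stage: audio feature -> attribute predictions."""
    return {"fuel_type": "diesel", "cylinder_count": 4}  # hypothetical outputs

def stage_two(audio_feature, attributes):
    """Stand-in for the second stage: feature + attributes -> state prediction."""
    energy = sum(x * x for x in audio_feature)  # placeholder decision rule
    return "normal" if energy < 1.0 else "abnormal"

def two_stage_predict(audio_feature):
    attributes = stage_one(audio_feature)         # stage 1 output...
    state = stage_two(audio_feature, attributes)  # ...feeds stage 2 with the feature
    return attributes, state

attrs, state = two_stage_predict([0.1] * 16)  # low-energy input -> "normal"
```

The structural point matched against Park is only the wiring: the second stage consumes both the original feature and the first stage's predictions.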
Regarding claim 20, the combination of Bielby, Anthony, Sinitsyn, and Park teaches the elements above, but Bielby and Anthony do not teach: the fuel type is indicative of gasoline or diesel, wherein the engine configuration is indicative of a flat configuration, inline configuration, or Vee configuration, wherein the cylinder count is indicative of 2, 3, 4, 5, 6, or 8, and wherein the aspiration type is indicative of normal aspiration or turbocharge aspiration.

However, Sinitsyn teaches: the fuel type is indicative of gasoline or diesel (a fuel type distinguishing between gasoline and diesel fuel, see at least [0085]), wherein the engine configuration is indicative of a flat configuration, inline configuration, or Vee configuration, wherein the cylinder count is indicative of 2, 3, 4, 5, 6, or 8, and wherein the aspiration type is indicative of normal aspiration or turbocharge aspiration.

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby and the smartphone audio data taught by Anthony by adding the fuel type tags taught by Sinitsyn with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order "to estimate pollution levels" (see [0011]).

Regarding claim 21, the combination of Bielby, Anthony, Sinitsyn, and Park teaches the elements above, and Bielby further discloses: the state prediction is indicative of a normal state or an abnormal state (artificial neural network can detect abnormal conditions, see at least [0039]).
Regarding claim 22, Bielby discloses: A system for diagnostic and condition monitoring of a vehicle (apparatus of predictive maintenance of automotive engines, see at least abstract; vehicle 111, see at least [0041]), the system comprising: one or more sensors (one or more sensors 103, see at least [0041]); a memory (data storage device 101, see at least [0041]); and a processor (one or more processors 133, see at least [0058]) coupled to the one or more sensors and the memory (data storage device 101 configured to communicate with processor 133 and host interface 157 to receive sensor data, see at least [0091]-[0092]), the processor configured to: receive data from one or more sensors, the data associated with the monitored system (receive sensor data generated by sensors, see at least [0092]); generate an audio feature based on the data (audio features extracted by processors, see at least [0244]); input the audio feature into a neural network model (audio features used as inputs to an Artificial Neural Network (ANN), see at least [0244]); and receive one or more attribute predictions and a state prediction from the neural network model (ANN used to predict a diagnosis/maintenance service associated with sound patterns for a vehicle, see at least [0041] and [0221]; artificial neural network trained to recognize sound patterns of the engine, see at least [0219]).

Bielby does not disclose: mobile off-board sensors; wherein the one or more attribute predictions comprise at least one of: a fuel type, an engine configuration, a cylinder count, or an aspiration type; or a two-stage convolutional neural network (CNN) model.

However, Anthony teaches: one or more mobile off-board sensors (use of the onboard microphone, accelerometer, gyroscope, magnetometer, etc.
on the smartphone as an audio sensor, see at least [0039]); a processor coupled to the one or more mobile off-board sensors (the processor uses this information from the smartphone to develop a learning model, see at least [0008]); and receive data from one or more mobile off-board sensors (receiving output from sensors in a smartphone and using this information to develop a learning model that maps on-board diagnostic system outputs and auxiliary sensor outputs to an automotive fault condition, see at least [0008]).

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the audio feature inputs into an artificial neural network for diagnostics as disclosed by Bielby by adding the smartphone sensors to gather audio data as taught by Anthony, with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification because it "eliminates reliance on static sensors that are hardwired to Onboard Diagnostic (OBD) systems. It also reduces the need to rely on the extent of a mechanic's personal knowledge, and may be especially helpful in managing driverless vehicle fleets." (see abstract).
Additionally, Sinitsyn teaches: wherein the one or more attribute predictions comprise at least one of: a fuel type, an engine configuration, a cylinder count, or an aspiration type (fuel type tags trained to predict a tag from audio of a motor vehicle to estimate pollution level, see at least [0017]-[0018]; the pollution estimation system may be configured to estimate different types of pollutants, see at least [0073]; tags give information on the type of motor and indicate a fuel type, see at least [0084]-[0085]).

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby and the smartphone audio data taught by Anthony by adding the fuel type tags taught by Sinitsyn with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to "distinguish between the sounds of a car on diesel and one running on gasoline" (see [0020]).

Furthermore, Park teaches: a two-stage convolutional neural network (CNN) model (two-stage detection model using convolutional neural networks, see at least abstract) having a first stage that receives the audio feature and produces the one or more attribute predictions (in the first stage, regions are selected using a Faster Regional CNN (R-CNN) and the proposed regions are passed to an attention-LSTM to classify the event), and a second stage that receives the audio feature and the one or more attribute predictions and produces the state prediction (the second stage builds a CNN-LSTM based feature map; the proposed regions from the first stage are pooled over the CNN feature maps to determine an event, see at least Fig. 1 and subheadings 2.1.1. and 2.1.2.
on pages 2-3), or a single-stage combined CNN model that receives the audio feature and produces the one or more attribute predictions and the state prediction.

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features and a neural network disclosed by Bielby, the smartphone audio data taught by Anthony, and the fuel type tags taught by Sinitsyn by adding the two-stage model taught by Park with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification to structure the model as a two-stage classifier because doing so would result in a higher F1 score (see abstract).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, and Park as applied to claim 8 above, and further in view of Raguenet et al. (U.S. Patent Application Publication No. 2012/0318063 A1; hereinafter Raguenet).

Regarding claim 10, the combination of Bielby, Anthony, Sinitsyn, and Park teaches the elements above but does not teach: vibroacoustic data.

However, Raguenet teaches: the one or more mobile off-board sensors are configured to acquire vibroacoustic data (clamp used as a vibroacoustic diagnosis tool for automobile maintenance, see at least abstract).

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance taught by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the vibroacoustic diagnosis tool taught by Raguenet with a reasonable expectation of success.
One of ordinary skill in the art would have been motivated to make this modification in order "to provide a listening clamp for an acoustical vibration analysis tool, specifically a listening clamp which facilitates and improves, on the one hand, the acoustical vibration analysis performed during automotive maintenance or after-sales service operations, and on the other hand, the work of experts who can benefit jointly from a quick tool for investigating, analyzing and reporting, through the intermediary of registered sounds and videos" (see [0005]).

Claims 11-12 and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, and Park as applied to claims 8 and 22 above, and further in view of Sun et al. (U.S. Patent No. 11,302,329 B1; hereinafter Sun).

Regarding claim 11, the combination of Bielby, Anthony, Sinitsyn, and Park teaches the elements above but does not teach: the audio feature is a one-dimensional feature or a two-dimensional feature.

However, Sun teaches: the audio feature is a one-dimensional feature or a two-dimensional feature (audio feature data 352 may correspond to a one-dimensional vector and/or a two-dimensional feature map, see at least col. 18 lines 7-8).

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the dimensions taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order "to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event" (see col. 2 lines 23-25).
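The distinction at issue in claims 11-12 (a one-dimensional feature such as an FFT magnitude spectrum versus a two-dimensional feature such as a spectrogram) can be sketched with a naive DFT in pure Python. A real implementation would use an optimized FFT with windowing; the frame length here is an arbitrary choice:

```python
import math

def fft_feature(samples):
    """One-dimensional feature: magnitude spectrum via a naive DFT."""
    n = len(samples)
    return [
        math.hypot(
            sum(samples[t] * math.cos(2 * math.pi * k * t / n) for t in range(n)),
            sum(-samples[t] * math.sin(2 * math.pi * k * t / n) for t in range(n)),
        )
        for k in range(n // 2)
    ]

def spectrogram_feature(samples, frame=8):
    """Two-dimensional feature: per-frame spectra stacked over time."""
    frames = [samples[i:i + frame] for i in range(0, len(samples) - frame + 1, frame)]
    return [fft_feature(f) for f in frames]

# A tone at 2 cycles per 8-sample frame concentrates energy in frequency bin 2.
tone = [math.sin(2 * math.pi * 2 * t / 8) for t in range(32)]
spec = spectrogram_feature(tone)  # 4 frames x 4 frequency bins
```

The one-dimensional feature is a single vector per signal; the two-dimensional feature adds a time axis, which is what makes a spectrogram suitable input for a 2-D convolutional layer.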
Regarding claim 12, the combination of Bielby, Anthony, Sinitsyn, Park, and Sun teaches the elements above, but Bielby, Anthony, Sinitsyn, and Park do not teach: the one-dimensional feature is a Fast Fourier Transform (FFT) feature, a waveform feature, or a wavelets feature, and wherein the two-dimensional feature is a spectrogram feature or a Mel-frequency Cepstral Coefficients (MFCCs) feature.

However, Sun teaches: the one-dimensional feature is a Fast Fourier Transform (FFT) feature (process audio data to create acoustic feature data using a fast Fourier transform (FFT), see at least col. 6 lines 16-30), a waveform feature, or a wavelets feature, and wherein the two-dimensional feature is a spectrogram feature or a Mel-frequency Cepstral Coefficients (MFCCs) feature (acoustic feature data may represent Mel-frequency cepstrum coefficients (MFCC); acoustic feature data may include one or more vectors based on size, see at least col. 6 lines 39-67).

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby, the smartphone audio data taught by Anthony, and the fuel type tags taught by Sinitsyn by adding the acoustic feature transformations taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order "to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event" (see col. 2 lines 23-25).

Regarding claim 23, the combination of Bielby, Anthony, Sinitsyn, and Park teaches the elements above but does not teach: the audio feature is a one-dimensional feature or a two-dimensional feature.
However, Sun teaches: the audio feature is a one-dimensional feature or a two-dimensional feature (audio feature data 352 may correspond to a one-dimensional vector and/or a two-dimensional feature map, see at least col. 18 lines 7-8).

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the dimensions taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order "to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event" (see col. 2 lines 23-25).

Regarding claim 24, the combination of Bielby, Anthony, Sinitsyn, Park, and Sun teaches the elements above, but Bielby, Anthony, Sinitsyn, and Park do not teach: the one-dimensional feature is a Fast Fourier Transform (FFT) feature, a waveform feature, or a wavelets feature, and wherein the two-dimensional feature is a spectrogram feature or a Mel-frequency Cepstral Coefficients (MFCCs) feature.

However, Sun teaches: the one-dimensional feature is a Fast Fourier Transform (FFT) feature (process audio data to create acoustic feature data using a fast Fourier transform (FFT), see at least col. 6 lines 16-30), a waveform feature, or a wavelets feature, and wherein the two-dimensional feature is a spectrogram feature or a Mel-frequency Cepstral Coefficients (MFCCs) feature (acoustic feature data may represent Mel-frequency cepstrum coefficients (MFCC); acoustic feature data may include one or more vectors based on size, see at least col.
6 lines 39-67).

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the acoustic feature transformations taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order "to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event" (see col. 2 lines 23-25).

Claims 13, 15, 25, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, Park, and Sun as applied to claims 11 and 23 above, and further in view of Wieman et al. (U.S. Patent Application Publication No. 2021/0256386 A1; hereinafter Wieman).

Regarding claim 13, the combination of Bielby, Anthony, Sinitsyn, Park, and Sun teaches the elements above, but Bielby and Sinitsyn do not teach: the two-stage CNN model comprises a first layer with a convolutional kernel size of 80 for the one-dimensional feature, and wherein the two-stage CNN model comprises other layers with a convolutional kernel size of 3 for the one-dimensional feature.

However, Sun teaches: the two-stage CNN model comprises a first layer, and wherein the two-stage CNN model comprises other layers (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col.
8 line 29).

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order "to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event" (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art.

Furthermore, Wieman teaches: the neural network model comprises a first layer with a convolutional kernel size of 80 for the one-dimensional feature, and wherein the neural network model comprises other layers with a convolutional kernel size of 3 for the one-dimensional feature ("the initial convolution operation may have a relatively large kernel size as compared to the convolution operations used within the convolutional groups", see at least [0060]; a convolutional neural network layer may use a convolutional kernel with a size greater than 1, see at least [0104]).

*Examiner sets forth that both 80 and 3 are larger than 1; therefore, the claimed sizes fall under the condition taught by Wieman. Furthermore, Wieman teaches the initial operation being larger than the other layers.

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the two-stage model taught by Park, and the one or more layers taught by Sun by adding the kernel size taught by Wieman with a reasonable expectation of success.
One of ordinary skill in the art would have been motivated to make this modification in order to "improve training and parameter convergence" (see [0010]). Furthermore, it is well known to those having ordinary skill in the art that layers may have different kernel sizes.

Regarding claim 15, the combination of Bielby, Anthony, Sinitsyn, Park, and Sun teaches the elements above, but Bielby does not disclose: the two-stage CNN model comprises a first layer with a convolutional kernel size of 3 x 3 for the spectrogram feature.

However, Sun teaches: the two-stage CNN model comprises a first layer (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col. 8 line 29).

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order "to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event" (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art.
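The kernel-size limitations discussed for claim 13 (a wide first-layer kernel of size 80 over the one-dimensional feature, followed by layers with kernel size 3) amount to successive 1-D convolutions. A minimal valid-mode sketch, where the uniform averaging kernels are placeholders for learned weights:

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (cross-correlation, as CNN layers compute it)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

signal = [1.0] * 200                      # toy one-dimensional audio feature
first = conv1d(signal, [1.0 / 80] * 80)   # first layer: kernel size 80 -> length 121
second = conv1d(first, [1.0 / 3] * 3)     # later layer: kernel size 3 -> length 119
```

The wide first kernel gives each early output a large receptive field over the raw feature, while the narrow later kernels refine it cheaply, which matches the initial-layer-larger pattern the rejection attributes to Wieman.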
Additionally, Wieman teaches: the neural network model comprises a first layer with a convolutional kernel size of 3 x 3 (a first convolutional neural network layer may have a kernel size of 3x3, see at least [0074]) for the spectrogram feature (a plurality of audio frames collectively comprise a two-dimensional array that represents a spectrogram, see at least [0052]).

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the two-stage model taught by Park, and the one or more layers taught by Sun by adding the kernel size taught by Wieman with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to "improve training and parameter convergence" (see [0010]). Furthermore, it is well known to those having ordinary skill in the art that layers may have different kernel sizes.

Regarding claim 25, the combination of Bielby, Anthony, Sinitsyn, Park, and Sun teaches the elements above, but Bielby and Sinitsyn do not teach: the neural network model comprises a first layer with a convolutional kernel size of 80 for the one-dimensional feature, and wherein the neural network model comprises other layers with a convolutional kernel size of 3 for the one-dimensional feature.

However, Sun teaches: the neural network model comprises a first layer, and wherein the neural network model comprises other layers (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col.
8 line 29).

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order "to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event" (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art.

Furthermore, Wieman teaches: the neural network model comprises a first layer with a convolutional kernel size of 80 for the one-dimensional feature, and wherein the neural network model comprises other layers with a convolutional kernel size of 3 for the one-dimensional feature ("the initial convolution operation may have a relatively large kernel size as compared to the convolution operations used within the convolutional groups", see at least [0060]; a convolutional neural network layer may use a convolutional kernel with a size greater than 1, see at least [0104]).

*Examiner sets forth that both 80 and 3 are larger than 1; therefore, the claimed sizes fall under the condition taught by Wieman. Furthermore, Wieman teaches the initial operation being larger than the other layers.

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the two-stage model taught by Park, and the one or more layers taught by Sun by adding the kernel size taught by Wieman with a reasonable expectation of success.
One of ordinary skill in the art would have been motivated to make this modification in order to “improve training and parameter convergence” (see [0010]). Furthermore, it is well known to those having ordinary skill in the art that layers may have different kernel sizes. Regarding claim 27, the combination of Bielby, Anthony, Sinitsyn, Park, and Sun teaches the elements above but Bielby and Sinitsyn do not teach: the neural network model comprises a first layer with a convolutional kernel size of 3 x 3 for the spectrogram feature. However, Sun teaches: the neural network model comprises a first layer (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col. 8 line 29) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art. 
Additionally, Wieman teaches: the neural network model comprises a first layer with a convolutional kernel size of 3 x 3 (first convolutional neural network layer may have kernel size of 3x3, see at least [0074]) for the spectrogram feature (plurality of audio frames collectively comprise a two-dimensional array that represents a spectrogram, see at least [0052]) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the two-stage model taught by Park, and the one or more layers taught by Sun by adding the kernel size taught by Wieman with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to “improve training and parameter convergence” (see [0010]). Furthermore, it is well known to those having ordinary skill in the art that layers may have different kernel sizes. Claims 14 and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, Park, and Sun as applied to claims 11 and 23 above and further in view of Tek et al. (“Animal Sound Classification Using a Convolutional Neural Network”; see reference U on PTO-892; hereinafter Tek). Regarding claim 14, the combination of Bielby, Anthony, Sinitsyn, Park, and Sun teaches the elements above but Bielby and Sinitsyn do not teach: the two-stage CNN model comprises a first layer with a convolutional kernel size of 2 x 2 for the MFCCs feature. However, Sun teaches: the two-stage CNN model comprises a first layer (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col. 
8 line 29) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art. Additionally, Tek teaches: the neural network model comprises a first layer with a convolutional kernel size of 2 x 2 (convolution and max-pooling layers used 2x2 kernel size, see at least page 2 col. 1 paragraph 5 under “C. Network”) for the MFCCs feature (commonly used lossy representation system called Mel Frequency Cepstral Coefficient is used which models the shape of sound frequency spectrum, see at least page 2 col. 1 paragraph 4 under “B. Feature Extraction”) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the two-stage model taught by Park, and the one or more layers taught by Sun by adding the 2x2 kernel size taught by Tek with a reasonable expectation of success. Although Tek is directed toward classifying animal sounds, one of ordinary skill in the art would understand that the structure of classifying the sounds of animals using a neural network with the specific kernel size could carry over to classifying sounds of vehicles. 
One of ordinary skill in the art would have been motivated to make this modification because “CNN using log Mel-spectrograms performed best with the overall accuracy” in a study (see page 1, col. 1 paragraph 3). Regarding claim 26, the combination of Bielby, Anthony, Sinitsyn, Park, and Sun teaches the elements above but Bielby and Sinitsyn do not teach: the neural network model comprises a first layer with a convolutional kernel size of 2 x 2 for the MFCCs feature However, Sun teaches: the neural network model comprises a first layer (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col. 8 line 29) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art. Additionally, Tek teaches: the neural network model comprises a first layer with a convolutional kernel size of 2 x 2 (convolution and max-pooling layers used 2x2 kernel size, see at least page 2 col. 1 paragraph 5 under “C. Network”) for the MFCCs feature (commonly used lossy representation system called Mel Frequency Cepstral Coefficient is used which models the shape of sound frequency spectrum, see at least page 2 col. 1 paragraph 4 under “B. 
Feature Extraction”) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the two-stage model taught by Park, and the one or more layers taught by Sun by adding the 2x2 kernel size taught by Tek with a reasonable expectation of success. Although Tek is directed toward classifying animal sounds, one of ordinary skill in the art would understand that the structure of classifying the sounds of animals using a neural network with the specific kernel size could carry over to classifying sounds of vehicles. One of ordinary skill in the art would have been motivated to make this modification because “CNN using log Mel-spectrograms performed best with the overall accuracy” in a study (see page 1, col. 1 paragraph 3). Claims 17 and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, and Park as applied to claims 1 and 20 above and further in view of Wieman. Regarding claim 17, the combination of Bielby, Anthony, Sinitsyn, and Park teaches the elements above but Bielby and Sinitsyn do not teach: the second stage of the two-stage CNN model receives a concatenation of the audio feature and the one or more attribute predictions. However, Wieman teaches: the second stage of the two-stage CNN model receives a concatenation of the audio feature and the one or more attribute predictions (audio processing system 200 uses a concatenation component 245 to combine the output of feed-forward neural network layer 225 and data from skip connection 235, see at least [0062] and Fig. 
2) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the audio processing system taught by Wieman with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to “provide a mechanism to reduce a size of (input) audio data in one or more of frequency and time dimensions, e.g. to effectively extract features that may be fed to the recurrent neural network architecture for temporal modelling” which “may help to reduce a number of multiplications and thus allow for faster execution at run time” (see [0009]). Regarding claim 30, the combination of Bielby, Anthony, Sinitsyn, and Park teaches the elements above but does not teach: the neural network model comprises a combined convolutional neural network (CNN), wherein the combined CNN receives the audio feature and produces the one or more attribute predictions and the state prediction. However, Wieman teaches: the neural network model comprises a combined convolutional neural network (CNN) (combining data mappings arranged in parallel by a plurality of convolutional groups arranged in series, see at least [0022] and [0054] and references 120, 160-1, and 160-n of Fig. 
1), wherein the combined CNN receives the audio feature and produces the one or more attribute predictions and the state prediction (convolutional group comprises an input and an output for data, see at least [0049] and [0054]; given audio data 110, output probability vectors for plurality of possible sound units which can be used to determine a voice command, see at least [0054]) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the audio processing system taught by Wieman with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to “provide a mechanism to reduce a size of (input) audio data in one or more of frequency and time dimensions, e.g. to effectively extract features that may be fed to the recurrent neural network architecture for temporal modelling” which “may help to reduce a number of multiplications and thus allow for faster execution at run time” (see [0009]). Claims 18, 31, and 38-39 are rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, and Hu et al. (“A Two-Stage Approach to Device-Robust Acoustic Scene Classification”; see reference V on PTO-892; hereinafter Hu). 
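The second-stage input arrangement recited in claims 17 and 30 above amounts to joining the audio feature with the first-stage attribute predictions into one vector. A minimal sketch of that concatenation follows; the variable names, values, and vector sizes are purely illustrative assumptions, not taken from the claims or the cited references:

```python
# Hypothetical first-stage outputs; values and sizes are illustrative only.
audio_feature = [0.2, 0.7, 0.1, 0.4]   # e.g., a pooled audio embedding
attribute_preds = [0.9, 0.1]           # e.g., scores for two engine attributes

# The claimed arrangement: the second stage receives the concatenation
# of the audio feature and the attribute predictions, joined end to end.
second_stage_input = audio_feature + attribute_preds
print(second_stage_input)  # [0.2, 0.7, 0.1, 0.4, 0.9, 0.1]
```

The second-stage classifier would then operate on this combined vector, so its input width is the sum of the two feature lengths.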
Regarding claim 18, Bielby discloses: A method for diagnostic and condition monitoring of a system (method and apparatus of predictive maintenance of automotive engines, see at least abstract), the method comprising: receiving data from one or more sensors (data collected by sensors 103, see at least [0041]), the data associated with the system (sensor data of vehicles, see at least [0046]); generating an audio feature based on the data (audio features extracted by processors, see at least [0244]); inputting the audio feature into a neural network model (audio features used as inputs to Artificial Neural Network (ANN), see at least [0244]); and receiving one or more attribute predictions and a state prediction from the neural network model (ANN used to predict diagnosis/maintenance service associated with sound patterns for a vehicle, see at least [0041] and [0221]; artificial neural network trained to recognize sound patterns of the engine, see at least [0219]), wherein the single-stage combined CNN model receives the audio feature and produces the one or more attribute predictions and the state prediction (ANN used to predict diagnosis/maintenance service associated with sound patterns for a vehicle, see at least [0041] and [0221]; artificial neural network trained to recognize sound patterns of the engine, see at least [0219]) Bielby does not explicitly disclose: mobile off-board sensors a single-stage combined convolutional neural network (CNN) model wherein the one or more attribute predictions comprise at least one of: a fuel type, an engine configuration, a cylinder count, or an aspiration type. However, Anthony teaches: mobile off-board sensors (receiving output from sensors in a smartphone and uses this information to develop a learning model to map on-board diagnostic system outputs and auxiliary sensor outputs to an automotive fault condition, see at least [0008]; use onboard microphone, accelerometer, gyroscope, magnetometer etc. 
on the smartphone as audio sensor, see at least [0039]) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the audio feature inputs into an artificial neural network for diagnostics as disclosed by Bielby by adding the smartphone sensors to gather audio data as taught by Anthony with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification because it “eliminates reliance on static sensors that are hardwired to Onboard Diagnostic (OBD) systems. It also reduces the need to rely on the extent of a mechanic's personal knowledge, and may be especially helpful in managing driverless vehicle fleets.” (see abstract). Additionally, Sinitsyn teaches: wherein the one or more attribute predictions comprise at least one of: a fuel type, an engine configuration, a cylinder count, or an aspiration type (fuel type tags trained to predict tag from audio of motor vehicle to estimate pollution level, see at least [0017]-[0018]; pollution estimation system may be configured to estimate different types of pollutants, see at least [0073]; tags give information on the type of motor and indicate a fuel type, see at least [0084]-[0085]) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby and the smartphone audio data taught by Anthony by adding the fuel type tags taught by Sinitsyn with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to “distinguish between the sounds of a car on diesel and one running on gasoline” (see [0020]). Furthermore, Hu teaches: inputting the audio feature into a single-stage combined convolutional neural network (CNN) model (see fig. 
1 with the parallel “two-stage” system using CNN models, see also abstract) *Examiner sets forth that in Fig. 1 the “two-stage” models are in parallel which is considered a “single-stage combined convolutional neural network” according to Applicant’s specification paragraph [0158]. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features and neural networks disclosed by Bielby, the smartphone audio data taught by Anthony, and the fuel type tags taught by Sinitsyn by adding the structure of the parallel CNN classification system disclosed by Hu with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to attain “state-of-the-art accuracy” (see abstract). Regarding claim 31, the combination of Bielby, Anthony, Sinitsyn, and Hu teaches the elements above and Bielby further discloses: the system is a vehicle (vehicle 111, see at least [0041] and Fig. 1). Regarding claim 38, the combination of Bielby, Anthony, Sinitsyn, and Hu teaches the elements above but Bielby and Anthony do not teach: the fuel type is indicative of gasoline or diesel, wherein the engine configuration is indicative of flat configuration, inline configuration, or Vee configuration, wherein the cylinder count is indicative of 2, 3, 4, 5, 6, or 8, and wherein the aspiration type is indicative of normal aspiration or turbocharge aspiration. However, Sinitsyn teaches: the fuel type is indicative of gasoline or diesel (a fuel type distinguishing between gasoline and diesel fuel, see at least [0085]), wherein the engine configuration is indicative of flat configuration, inline configuration, or Vee configuration, wherein the cylinder count is indicative of 2, 3, 4, 5, 6, or 8, and wherein the aspiration type is indicative of normal aspiration or turbocharge aspiration It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby and the smartphone audio data taught by Anthony by adding the fuel type tags taught by Sinitsyn with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to estimate pollution levels” (see [0011]). Regarding claim 39, the combination of Bielby, Anthony, Sinitsyn, and Hu teaches the elements above and Bielby further discloses: the state prediction is indicative of a normal state or an abnormal state (artificial neural network can detect abnormal conditions, see at least [0039]) Claim 32 is rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, and Hu as applied to claim 18 above and further in view of Raguenet. 
Regarding claim 32, the combination of Bielby, Anthony, Sinitsyn, and Hu teaches the elements above but does not teach: vibroacoustic data. However, Raguenet teaches: the one or more off-board sensors are configured to acquire vibroacoustic data (clamp used as vibroacoustic diagnosis tool for automobile maintenance, see at least abstract) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance taught by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the structure of the parallel CNN classification system disclosed by Hu by adding the vibroacoustic diagnosis tool taught by Raguenet with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to provide a listening clamp for an acoustical vibration analysis tool, specifically a listening clamp which facilitates and improves, on the one hand, the acoustical vibration analysis performed during automotive maintenance or after-sales service operations, and on the other hand, the work of experts who can benefit jointly from a quick tool for investigating, analyzing and reporting, through the intermediary of registered sounds and videos” (see [0005]). Claims 33-34 are rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, and Hu as applied to claim 18 above and further in view of Sun. Regarding claim 33, the combination of Bielby, Anthony, Sinitsyn, and Hu teaches the elements above but does not teach: the audio feature is a one-dimensional feature or a two-dimensional feature. However, Sun teaches: the audio feature is a one-dimensional feature or a two-dimensional feature (audio feature data 352 may correspond to a one-dimensional vector and/or two-dimensional feature map, see at least col. 
18 lines 7-8) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the structure of the parallel CNN classification system disclosed by Hu by adding the dimensions taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Regarding claim 34, the combination of Bielby, Anthony, Sinitsyn, Hu, and Sun teaches the elements above but Bielby, Anthony, Sinitsyn, and Hu do not teach: wherein the one-dimensional feature is a Fast Fourier Transform (FFT) feature, a waveform feature, or a wavelets feature, and wherein the two-dimensional feature is a spectrogram feature or a Mel-frequency Cepstral Coefficients (MFCCs) feature However, Sun teaches: the one-dimensional feature is a Fast Fourier Transform (FFT) feature (process audio data to create acoustic feature data using fast Fourier transform (FFT), see at least col. 6 lines 16-30), a waveform feature, or a wavelets feature, and wherein the two-dimensional feature is a spectrogram feature or a Mel-frequency Cepstral Coefficients (MFCCs) feature (acoustic feature data may represent Mel-frequency cepstrum coefficients (MFCC), acoustic feature data may include one or more vectors based on size, see at least col. 
6 lines 39-67) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the parallel CNN classification system disclosed by Hu by adding the acoustic feature transformations taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Claims 35 and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, and Hu as applied to claim 18 above and further in view of Sun and Wieman. Regarding claim 35, the combination of Bielby, Anthony, Sinitsyn, and Hu teaches the elements above but Bielby and Sinitsyn do not teach: the single-stage combined CNN model comprises a first layer with a convolutional kernel size of 80 for the one-dimensional feature, and wherein the single-stage combined CNN model comprises other layers with a convolutional kernel size of 3 for the one-dimensional feature. However, Sun teaches: a first layer, and wherein the model comprises other layers (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col. 8 line 29) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the parallel CNN classification system disclosed by Hu by adding the one or more layers taught by Sun with a reasonable expectation of success. 
One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art. Furthermore, Wieman teaches: the single-stage combined CNN model comprises a first layer with a convolutional kernel size of 80 for the one-dimensional feature, and wherein the single-stage combined CNN model comprises other layers with a convolutional kernel size of 3 for the one-dimensional feature (“the initial convolution operation may have a relatively large kernel size as compared to the convolution operations used within the convolutional groups” see at least [0060]; convolutional neural network layer may use a convolutional kernel with a size greater than 1, see at least [0104]) *Examiner sets forth that both 80 and 3 are larger than 1; therefore, the sizes fall under the condition taught by Wieman. Furthermore, Wieman teaches the initial operation being larger than the other layers. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the parallel CNN classification system disclosed by Hu, and the one or more layers taught by Sun by adding the kernel size taught by Wieman with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to “improve training and parameter convergence” (see [0010]). Furthermore, it is well known to those having ordinary skill in the art that layers may have different kernel sizes. 
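To make the kernel-size arithmetic in claim 35 concrete (a wide first kernel of 80 over a one-dimensional feature, followed by kernel-size-3 layers), here is a minimal sketch of the resulting feature lengths. It assumes unpadded, stride-1 convolutions and a hypothetical 16,000-sample input; apart from the claimed kernel sizes, none of these numbers come from the record:

```python
def conv1d_output_len(n: int, kernel: int, stride: int = 1) -> int:
    """Output length of an unpadded ('valid') 1-D convolution."""
    return (n - kernel) // stride + 1

# Hypothetical one-dimensional audio feature: 16,000 samples (1 s at 16 kHz).
n = 16_000
after_first = conv1d_output_len(n, kernel=80)           # wide initial kernel
after_later = conv1d_output_len(after_first, kernel=3)  # narrower later kernel
print(after_first, after_later)  # 15921 15919
```

The wide initial kernel gives the first layer a large receptive field over the raw one-dimensional feature, while the later kernel-3 layers refine it cheaply, which is the design rationale Wieman's cited passages describe.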
Regarding claim 37, the combination of Bielby, Anthony, Sinitsyn, and Hu teaches the elements above but does not teach: the single-stage combined CNN model comprises a first layer with a convolutional kernel size of 3x3 for the spectrogram feature. However, Sun teaches: a first layer (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col. 8 line 29) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the parallel CNN classification system disclosed by Hu by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art. Additionally, Wieman teaches: the single-stage combined CNN model comprises a first layer with a convolutional kernel size of 3x3 (first convolutional neural network layer may have kernel size of 3x3, see at least [0074]) for the spectrogram feature (plurality of audio frames collectively comprise a two-dimensional array that represents a spectrogram, see at least [0052]) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the parallel CNN classification system disclosed by Hu, and the one or more layers taught by Sun by adding the kernel size taught by Wieman with a reasonable expectation of success. 
One of ordinary skill in the art would have been motivated to make this modification in order to “improve training and parameter convergence” (see [0010]). Furthermore, it is well known to those having ordinary skill in the art that layers may have different kernel sizes. Claim 36 is rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, and Hu as applied to claim 18 above and further in view of Sun and Tek. Regarding claim 36, the combination of Bielby, Anthony, Sinitsyn, and Hu teaches the elements above but does not teach: the single-stage combined CNN model comprises a first layer with a convolutional kernel size of 2x2 for the MFCCs feature. However, Sun teaches: a first layer (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col. 8 line 29) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the parallel CNN classification system disclosed by Hu by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art. Additionally, Tek teaches: the single-stage combined CNN model comprises a first layer with a convolutional kernel size of 2x2 (convolution and max-pooling layers used 2x2 kernel size, see at least page 2 col. 1 paragraph 5 under “C. 
Network”) for the MFCCs feature (commonly used lossy representation system called Mel Frequency Cepstral Coefficient is used which models the shape of sound frequency spectrum, see at least page 2 col. 1 paragraph 4 under “B. Feature Extraction”) It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the parallel CNN classification system disclosed by Hu, and the one or more layers taught by Sun by adding the 2x2 kernel size taught by Tek with a reasonable expectation of success. Although Tek is directed toward classifying animal sounds, one of ordinary skill in the art would understand that the structure of classifying the sounds of animals using a neural network with the specific kernel size could carry over to classifying sounds of vehicles. One of ordinary skill in the art would have been motivated to make this modification because “CNN using log Mel-spectrograms performed best with the overall accuracy” in a study (see page 1, col. 1 paragraph 3).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. 
In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HANA LEE whose telephone number is (571)272-5277. The examiner can normally be reached Monday-Friday: 7:30AM-4:30PM EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jelani Smith, can be reached at (571) 270-3969. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/H.L./Examiner, Art Unit 3662

/DALE W HILGENDORF/Primary Examiner, Art Unit 3662

Prosecution Timeline

Jul 01, 2022
Application Filed
Nov 20, 2024
Non-Final Rejection — §103, §112
Feb 26, 2025
Response Filed
Apr 16, 2025
Final Rejection — §103, §112
Jun 17, 2025
Response after Non-Final Action
Jun 30, 2025
Request for Continued Examination
Jul 04, 2025
Response after Non-Final Action
Jul 14, 2025
Non-Final Rejection — §103, §112
Oct 15, 2025
Interview Requested
Oct 21, 2025
Examiner Interview (Telephonic)
Oct 21, 2025
Examiner Interview Summary
Oct 28, 2025
Response Filed
Jan 16, 2026
Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12534067
SYSTEM AND METHOD FOR VEHICLE NAVIGATION
2y 5m to grant Granted Jan 27, 2026
Patent 12509078
VEHICLE CONTROL DEVICE
2y 5m to grant Granted Dec 30, 2025
Patent 12485990
DRIVER ASSISTANCE SYSTEM
2y 5m to grant Granted Dec 02, 2025
Patent 12453305
MOBILE ROBOT SYSTEM AND BOUNDARY INFORMATION GENERATION METHOD FOR MOBILE ROBOT SYSTEM
2y 5m to grant Granted Oct 28, 2025
Patent 12442161
WORK MACHINE
2y 5m to grant Granted Oct 14, 2025
Based on 5 most recent grants.


Prosecution Projections

5-6
Expected OA Rounds
60%
Grant Probability
96%
With Interview (+36.6%)
3y 0m
Median Time to Grant
High
PTA Risk
Based on 141 resolved cases by this examiner. Grant probability derived from career allow rate.
