DETAILED ACTION
The amendments filed 10/28/2025 have been entered. Claims 8, 13-15, 17-18, and 22 have been amended; claims 1-7, 16, and 28-29 have been cancelled; and claims 31-39 have been added. Claims 8-15, 17-18, 20-27, and 30-39 remain pending in the application and are discussed on the merits below.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments filed 10/28/2025 have been fully considered but are moot because they are directed toward subject matter that has not been previously considered; the amendments have necessitated new grounds of rejection, as outlined below.
Response to Amendment
Regarding the objection to the specification, Applicant has amended the specification to overcome the objection. The objection to the specification has been withdrawn.
Regarding the objection to the claims, Applicant has amended the claims to overcome the previously set forth objection. The previously set forth objection to the claims has been withdrawn. However, the amendments have necessitated new objections as outlined below.
Regarding the rejections under 35 USC §103, amendments made to the claims have necessitated new grounds of rejection as outlined below.
Claim Objections
Claims 8 and 22 are objected to because of the following informalities: “a state prediction from the neural network model” should read “a state prediction from the two-stage convolutional neural network model”. Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 35-37 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 35 recites the limitation "the one-dimensional feature" in lines 2-3 and 5. There is insufficient antecedent basis for this limitation in the claim.
Claim 36 recites the limitation "the MFCCs feature" in line 2. There is insufficient antecedent basis for this limitation in the claim.
Claim 37 recites the limitation "the spectrogram feature" in line 2. There is insufficient antecedent basis for this limitation in the claim.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 8-9 and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Bielby et al. (U.S. Patent Application Publication No. 2021/0049444 A1; hereinafter Bielby) in view of Anthony (U.S. Patent Application Publication No. 2018/0315260 A1) and further in view of Sinitsyn et al. (U.S. Patent Application Publication No. 2017/0300818 A1; hereinafter Sinitsyn) and Park et al. (“Two-stage Polyphonic Sound Event Detection Based on Faster R-CNN-LSTM with Multi-token Connectionist Temporal Classification”; see reference U on PTO-892; hereinafter Park).
Regarding claim 8, Bielby discloses:
A method for diagnostic and condition monitoring of a system (method and apparatus of predictive maintenance of automotive engines, see at least abstract), the method comprising: receiving data from one or more sensors (data collected by sensors 103, see at least [0041]), the data associated with the system (sensor data of vehicles, see at least [0046]);
generating an audio feature based on the data (audio features extracted by processors, see at least [0244]);
inputting the audio feature into a neural network model (audio features used as inputs to Artificial Neural Network (ANN), see at least [0244]); and
receiving one or more attribute predictions and a state prediction from the neural network model (ANN used to predict diagnosis/maintenance service associated with sound patterns for a vehicle, see at least [0041] and [0221]; artificial neural network trained to recognize sound patterns of the engine, see at least [0219])
Bielby does not disclose:
receiving data from one or more mobile off-board sensors
two-stage convolutional neural network
wherein the one or more attribute predictions comprise at least one of: a fuel type, an engine configuration, a cylinder count, or an aspiration type
However, Anthony teaches:
receiving data from one or more mobile off-board sensors (receiving output from sensors in a smartphone and uses this information to develop a learning model to map on-board diagnostic system outputs and auxiliary sensor outputs to an automotive fault condition, see at least [0008]; use onboard microphone, accelerometer, gyroscope, magnetometer etc. on the smartphone as audio sensor, see at least [0039])
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the audio feature inputs into an artificial neural network for diagnostics as disclosed by Bielby by adding the smartphone sensors to gather audio data as taught by Anthony with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification because it “eliminates reliance on static sensors that are hardwired to Onboard Diagnostic (OBD) systems. It also reduces the need to rely on the extent of a mechanic's personal knowledge, and may be especially helpful in managing driverless vehicle fleets.” (see abstract).
Additionally, Sinitsyn teaches:
wherein the one or more attribute predictions comprise at least one of: a fuel type, an engine configuration, a cylinder count, or an aspiration type (fuel type tags trained to predict tag from audio of motor vehicle to estimate pollution level, see at least [0017]-[0018]; pollution estimation system may be configured to estimate different types of pollutants, see at least [0073]; tags give information on the type of motor and indicate a fuel type, see at least [0084]-[0085])
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby and the smartphone audio data taught by Anthony by adding the fuel type tags taught by Sinitsyn with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to “distinguish between the sounds of a car on diesel and one running on gasoline” (see [0020]).
Furthermore, Park teaches:
two-stage convolutional neural network (CNN) model (two-stage detection model using convolutional Neural Networks, see at least abstract)
a first stage of the two-stage CNN model receives the audio feature and produces the one or more attribute predictions (in the first stage, regions are selected using a Faster Region-based CNN (R-CNN) and the proposed regions are passed to an attention-LSTM to classify the event), and
wherein a second stage of the two-stage CNN receives the audio feature and the one or more attribute predictions and produces the state prediction (the second stage comprises a CNN-LSTM based feature map; the proposed regions from the first stage are pooled into CNN feature maps to determine an event, see at least Fig. 1 and subheadings 2.1.1. and 2.1.2. on pages 2-3)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features and a neural network disclosed by Bielby, the smartphone audio data taught by Anthony, and the fuel type tags taught by Sinitsyn by adding the two-stage model taught by Park with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification because the structure of a two-stage model for classification would result in a higher F1 score (see abstract).
Regarding claim 9, the combination of Bielby, Anthony, Sinitsyn, and Park teaches the elements above and Bielby further discloses:
the system is a vehicle (vehicle 111, see at least [0041] and Fig. 1).
Regarding claim 20, the combination of Bielby, Anthony, Sinitsyn, and Park teaches the elements above but Bielby and Anthony do not teach:
the fuel type is indicative of gasoline or diesel, wherein the engine configuration is indicative of flat configuration, inline configuration, or Vee configuration, wherein the cylinder count is indicative of 2, 3, 4, 5, 6, or 8, and wherein the aspiration type is indicative of normal aspiration or turbocharge aspiration.
However, Sinitsyn teaches:
the fuel type is indicative of gasoline or diesel (a fuel type distinguishing between gasoline and diesel fuel, see at least [0085]), wherein the engine configuration is indicative of flat configuration, inline configuration, or Vee configuration, wherein the cylinder count is indicative of 2, 3, 4, 5, 6, or 8, and wherein the aspiration type is indicative of normal aspiration or turbocharge aspiration
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby and the smartphone audio data taught by Anthony by adding the fuel type tags taught by Sinitsyn with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to estimate pollution levels” (see [0011]).
Regarding claim 21, the combination of Bielby, Anthony, Sinitsyn, and Park teaches the elements above and Bielby further discloses:
the state prediction is indicative of a normal state or an abnormal state (artificial neural network can detect abnormal conditions, see at least [0039]).
Regarding claim 22, Bielby discloses:
A system for diagnostic and condition monitoring of a vehicle (apparatus of predictive maintenance of automotive engines, see at least abstract; vehicle 111, see at least [0041]), the system comprising:
one or more sensors (one or more sensors 103, see at least [0041]);
a memory (data storage device 101, see at least [0041]); and
a processor (one or more processors 133, see at least [0058]) coupled to the one or more sensors and the memory (data storage device 101 configured to communicate with processor 133 and host interface 157 to receive sensor data, see at least [0091]-[0092]), the processor configured to:
receive data from one or more sensors, the data associated with the monitored system (receive sensor data generated by sensors, see at least [0092]);
generate an audio feature based on the data (audio features extracted by processors, see at least [0244]);
input the audio feature into a neural network model (audio features used as inputs to Artificial Neural Network (ANN), see at least [0244]); and
receive one or more attribute predictions and a state prediction from the neural network model (ANN used to predict diagnosis/maintenance service associated with sound patterns for a vehicle, see at least [0041] and [0221]; artificial neural network trained to recognize sound patterns of the engine, see at least [0219])
Bielby does not disclose:
mobile off-board sensors
wherein the one or more attribute predictions comprise at least one of: a fuel type, an engine configuration, a cylinder count, or an aspiration type
a two-stage convolutional neural network (CNN) model
However, Anthony teaches:
one or more mobile off-board sensors (use onboard microphone, accelerometer, gyroscope, magnetometer etc. on the smartphone as audio sensor, see at least [0039])
a processor coupled to the one or more mobile off-board sensors (processor uses this information from the smartphone to develop a learning model, see at least [0008])
receive data from one or more mobile off-board sensors (receiving output from sensors in a smartphone and uses this information to develop a learning model to map on-board diagnostic system outputs and auxiliary sensor outputs to an automotive fault condition, see at least [0008])
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the audio feature inputs into an artificial neural network for diagnostics as disclosed by Bielby by adding the smartphone sensors to gather audio data as taught by Anthony with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification because it “eliminates reliance on static sensors that are hardwired to Onboard Diagnostic (OBD) systems. It also reduces the need to rely on the extent of a mechanic's personal knowledge, and may be especially helpful in managing driverless vehicle fleets.” (see abstract).
Additionally, Sinitsyn teaches:
wherein the one or more attribute predictions comprise at least one of: a fuel type, an engine configuration, a cylinder count, or an aspiration type (fuel type tags trained to predict tag from audio of motor vehicle to estimate pollution level, see at least [0017]-[0018]; pollution estimation system may be configured to estimate different types of pollutants, see at least [0073]; tags give information on the type of motor and indicate a fuel type, see at least [0084]-[0085])
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby and the smartphone audio data taught by Anthony by adding the fuel type tags taught by Sinitsyn with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to “distinguish between the sounds of a car on diesel and one running on gasoline” (see [0020]).
Furthermore, Park teaches:
two-stage convolutional neural network (CNN) model (two-stage detection model using convolutional neural networks, see at least abstract) having a first stage of the two-stage CNN model that receives the audio feature and produces the one or more attribute predictions (in the first stage, regions are selected using a Faster Region-based CNN (R-CNN) and the proposed regions are passed to an attention-LSTM to classify the event), and a second stage that receives the audio feature and the one or more attribute predictions and produces the state prediction (the second stage comprises a CNN-LSTM based feature map; the proposed regions from the first stage are pooled into CNN feature maps to determine an event, see at least Fig. 1 and subheadings 2.1.1. and 2.1.2. on pages 2-3), or a single-stage combined CNN model that receives the audio feature and produces the one or more attribute predictions and the state prediction
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features and a neural network disclosed by Bielby, the smartphone audio data taught by Anthony, and the fuel type tags taught by Sinitsyn by adding the two-stage model taught by Park with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification because the structure of a two-stage model for classification would result in a higher F1 score (see abstract).
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, and Park as applied to claim 8 above and further in view of Raguenet et al. (U.S. Patent Application Publication No. 2012/0318063 A1; hereinafter Raguenet).
Regarding claim 10, the combination of Bielby, Anthony, Sinitsyn, and Park teaches the elements above but does not teach:
vibroacoustic data
However, Raguenet teaches:
the one or more mobile off-board sensors are configured to acquire vibroacoustic data (clamp used as vibroacoustic diagnosis tool for automobile maintenance, see at least abstract)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance taught by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the vibroacoustic diagnosis tool taught by Raguenet with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to provide a listening clamp for an acoustical vibration analysis tool, specifically a listening clamp which facilitates and improves, on the one hand, the acoustical vibration analysis performed during automotive maintenance or after-sales service operations, and on the other hand, the work of experts who can benefit jointly from a quick tool for investigating, analyzing and reporting, through the intermediary of registered sounds and videos” (see [0005]).
Claims 11-12 and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, and Park as applied to claims 8 and 22 above and further in view of Sun et al. (U.S. Patent No. 11,302,329 B1; hereinafter Sun).
Regarding claim 11, the combination of Bielby, Anthony, Sinitsyn, and Park teaches the elements above but does not teach:
the audio feature is a one-dimensional feature or a two-dimensional feature
However, Sun teaches:
the audio feature is a one-dimensional feature or a two-dimensional feature (audio feature data 352 may correspond to a one-dimensional vector and/or two-dimensional feature map, see at least col. 18 lines 7-8)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the dimensions taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25).
Regarding claim 12, the combination of Bielby, Anthony, Sinitsyn, Park, and Sun teaches the elements above but Bielby, Anthony, Sinitsyn, and Park do not teach:
the one-dimensional feature is a Fast Fourier Transform (FFT) feature, a waveform feature, or a wavelets feature, and wherein the two-dimensional feature is a spectrogram feature or a Mel-frequency Cepstral Coefficients (MFCCs) feature.
However, Sun teaches:
the one-dimensional feature is a Fast Fourier Transform (FFT) feature (process audio data to create acoustic feature data using fast Fourier transform (FFT), see at least col. 6 lines 16-30), a waveform feature, or a wavelets feature, and wherein the two-dimensional feature is a spectrogram feature or a Mel-frequency Cepstral Coefficients (MFCCs) feature (acoustic feature data may represent Mel-frequency cepstrum coefficients (MFCC), acoustic feature data may include one or more vectors based on size, see at least col. 6 lines 39-67)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby, the smartphone audio data taught by Anthony, and the fuel type tags taught by Sinitsyn by adding the acoustic feature transformations taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25).
Regarding claim 23, the combination of Bielby, Anthony, Sinitsyn, and Park teaches the elements above but does not teach:
the audio feature is a one-dimensional feature or two-dimensional feature.
However, Sun teaches:
the audio feature is one-dimensional feature or two-dimensional feature (audio feature data 352 may correspond to a one-dimensional vector and/or two-dimensional feature map, see at least col. 18 lines 7-8)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the dimensions taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25).
Regarding claim 24, the combination of Bielby, Anthony, Sinitsyn, Park, and Sun teaches the elements above but Bielby, Anthony, Sinitsyn, and Park do not teach:
the one-dimensional feature is a Fast Fourier Transform (FFT) feature, a waveform feature, or a wavelets feature, and wherein the two-dimensional feature is a spectrogram feature or a Mel-frequency Cepstral Coefficients (MFCCs) feature.
However, Sun teaches:
the one-dimensional feature is a Fast Fourier Transform (FFT) feature (process audio data to create acoustic feature data using fast Fourier transform (FFT), see at least col. 6 lines 16-30), a waveform feature, or a wavelets feature, and wherein the two-dimensional feature is a spectrogram feature or a Mel-frequency Cepstral Coefficients (MFCCs) feature (acoustic feature data may represent Mel-frequency cepstrum coefficients (MFCC), acoustic feature data may include one or more vectors based on size, see at least col. 6 lines 39-67)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the acoustic feature transformations taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25).
Claims 13, 15, 25, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, Park, and Sun as applied to claims 11 and 23 above and further in view of Wieman et al. (U.S. Patent Application Publication No. 2021/0256386 A1; hereinafter Wieman).
Regarding claim 13, the combination of Bielby, Anthony, Sinitsyn, Park, and Sun teaches the elements above but Bielby and Sinitsyn do not teach:
the two-stage CNN model comprises a first layer with a convolutional kernel size of 80 for the one-dimensional feature, and wherein the two-stage CNN model comprises other layers with a convolutional kernel size of 3 for the one-dimensional feature
However, Sun teaches:
two-stage CNN model comprises a first layer, and wherein the two-stage CNN model comprises other layers (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col. 8 line 29)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art.
Furthermore, Wieman teaches:
the neural network model comprises a first layer with a convolutional kernel size of 80 for the one-dimensional feature, and wherein the neural network model comprises other layers with a convolutional kernel size of 3 for the one-dimensional feature (“the initial convolution operation may have a relatively large kernel size as compared to the convolution operations used within the convolutional groups,” see at least [0060]; a convolutional neural network layer may use a convolutional kernel with a size greater than 1, see at least [0104]). *Examiner sets forth that both 80 and 3 are larger than 1; therefore, the claimed sizes fall under the condition taught by Wieman. Furthermore, Wieman teaches the initial convolution operation having a larger kernel size than the other layers.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the two-stage model taught by Park, and the one or more layers taught by Sun by adding the kernel size taught by Wieman with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to “improve training and parameter convergence” (see [0010]). Furthermore, it is well known to those having ordinary skill in the art that layers may have different kernel sizes.
Regarding claim 15, the combination of Bielby, Anthony, Sinitsyn, Park, and Sun teaches the elements above but Bielby does not disclose:
the two-stage CNN model comprises a first layer with a convolutional kernel size of 3 x 3 for the spectrogram feature
However, Sun teaches:
the two-stage CNN model comprises a first layer (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col. 8 line 29)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art.
Additionally, Wieman teaches:
the neural network model comprises a first layer with a convolutional kernel size of 3 x 3 (first convolutional neural network layer may have kernel size of 3x3, see at least [0074]) for the spectrogram feature (plurality of audio frames collectively comprise a two-dimensional array that represents a spectrogram, see at least [0052])
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the two-stage model taught by Park, and the one or more layers taught by Sun by adding the kernel size taught by Wieman with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to “improve training and parameter convergence” (see [0010]). Furthermore, it is well known to those having ordinary skill in the art that layers may have different kernel sizes.
Regarding claim 25, the combination of Bielby, Anthony, Sinitsyn, Park, and Sun teaches the elements above but Bielby and Sinitsyn do not teach:
the neural network model comprises a first layer with a convolutional kernel size of 80 for the one-dimensional feature, and wherein the neural network model comprises other layers with a convolutional kernel size of 3 for the one-dimensional feature
However, Sun teaches:
neural network model comprises a first layer, and wherein the neural network model comprises other layers (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col. 8 line 29)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art.
Furthermore, Wieman teaches:
the neural network model comprises a first layer with a convolutional kernel size of 80 for the one-dimensional feature, and wherein the neural network model comprises other layers with a convolutional kernel size of 3 for the one-dimensional feature (“the initial convolution operation may have a relatively large kernel size as compared to the convolution operations used within the convolutional groups,” see at least [0060]; a convolutional neural network layer may use a convolutional kernel with a size greater than 1, see at least [0104]). *Examiner sets forth that both 80 and 3 are larger than 1; therefore, the claimed sizes fall under the condition taught by Wieman. Furthermore, Wieman teaches the initial convolution operation having a larger kernel size than the other layers.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the two-stage model taught by Park, and the one or more layers taught by Sun by adding the kernel size taught by Wieman with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to “improve training and parameter convergence” (see [0010]). Furthermore, it is well known to those having ordinary skill in the art that layers may have different kernel sizes.
Regarding claim 27, the combination of Bielby, Anthony, Sinitsyn, Park, and Sun teaches the elements above but Bielby and Sinitsyn do not teach:
the neural network model comprises a first layer with a convolutional kernel size of 3 x 3 for the spectrogram feature.
However, Sun teaches:
the neural network model comprises a first layer (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col. 8 line 29)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art.
Additionally, Wieman teaches:
the neural network model comprises a first layer with a convolutional kernel size of 3 x 3 (first convolutional neural network layer may have kernel size of 3x3, see at least [0074]) for the spectrogram feature (plurality of audio frames collectively comprise a two-dimensional array that represents a spectrogram, see at least [0052])
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the two-stage model taught by Park, and the one or more layers taught by Sun by adding the kernel size taught by Wieman with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to “improve training and parameter convergence” (see [0010]). Furthermore, it is well known to those having ordinary skill in the art that layers may have different kernel sizes.
Claims 14 and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, Park, and Sun as applied to claims 11 and 23 above and further in view of Tek et al. (“Animal Sound Classification Using a Convolutional Neural Network”; see reference U on PTO-892; hereinafter Tek).
Regarding claim 14, the combination of Bielby, Anthony, Sinitsyn, Park, and Sun teaches the elements above but Bielby and Sinitsyn do not teach:
the two-stage CNN model comprises a first layer with a convolutional kernel size of 2 x 2 for the MFCCs feature
However, Sun teaches:
the two-stage CNN model comprises a first layer (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col. 8 line 29)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art.
Additionally, Tek teaches:
the two-stage CNN model comprises a first layer with a convolutional kernel size of 2 x 2 (convolution and max-pooling layers used 2x2 kernel size, see at least page 2 col. 1 paragraph 5 under “C. Network”) for the MFCCs feature (commonly used lossy representation system called Mel Frequency Cepstral Coefficient is used which models the shape of sound frequency spectrum, see at least page 2 col. 1 paragraph 4 under “B. Feature Extraction”)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the two-stage model taught by Park, and the one or more layers taught by Sun by adding the 2x2 kernel size taught by Tek with a reasonable expectation of success. Although Tek is directed toward classifying animal sounds, one of ordinary skill in the art would understand that the structure of classifying the sounds of animals using a neural network with the specific kernel size could carry over to classifying sounds of vehicles. One of ordinary skill in the art would have been motivated to make this modification because “CNN using log Mel-spectrograms performed best with the overall accuracy” in a study (see page 1, col. 1 paragraph 3).
Regarding claim 26, the combination of Bielby, Anthony, Sinitsyn, Park, and Sun teaches the elements above but Bielby and Sinitsyn do not teach:
the neural network model comprises a first layer with a convolutional kernel size of 2 x 2 for the MFCCs feature
However, Sun teaches:
the neural network model comprises a first layer (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col. 8 line 29)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art.
Additionally, Tek teaches:
the neural network model comprises a first layer with a convolutional kernel size of 2 x 2 (convolution and max-pooling layers used 2x2 kernel size, see at least page 2 col. 1 paragraph 5 under “C. Network”) for the MFCCs feature (commonly used lossy representation system called Mel Frequency Cepstral Coefficient is used which models the shape of sound frequency spectrum, see at least page 2 col. 1 paragraph 4 under “B. Feature Extraction”)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the two-stage model taught by Park, and the one or more layers taught by Sun by adding the 2x2 kernel size taught by Tek with a reasonable expectation of success. Although Tek is directed toward classifying animal sounds, one of ordinary skill in the art would understand that the structure of classifying the sounds of animals using a neural network with the specific kernel size could carry over to classifying sounds of vehicles. One of ordinary skill in the art would have been motivated to make this modification because “CNN using log Mel-spectrograms performed best with the overall accuracy” in a study (see page 1, col. 1 paragraph 3).
Claims 17 and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, and Park as applied to claims 1 and 20 above and further in view of Wieman.
Regarding claim 17, the combination of Bielby, Anthony, Sinitsyn, and Park teaches the elements above but Bielby and Sinitsyn do not teach:
the second stage of the two-stage CNN model receives a concatenation of the audio feature and the one or more attribute predictions
However, Wieman teaches:
the second stage of the two-stage CNN model receives a concatenation of the audio feature and the one or more attribute predictions (audio processing system 200 uses a concatenation component 245 to combine the output of feed-forward neural network layer 225 and data from skip connection 235, see at least [0062] and Fig. 2)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the audio processing system taught by Wieman with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to “provide a mechanism to reduce a size of (input) audio data in one or more of frequency and time dimensions, e.g. to effectively extract features that may be fed to the recurrent neural network architecture for temporal modelling” which “may help to reduce a number of multiplications and thus allow for faster execution at run time” (see [0009]).
Regarding claim 30, the combination of Bielby, Anthony, Sinitsyn, and Park teaches the elements above but does not teach:
the neural network model comprises a combined convolutional neural network (CNN), wherein the combined CNN receives the audio feature and produces the one or more attribute predictions and the state prediction.
However, Wieman teaches:
the neural network model comprises a combined convolutional neural network (CNN) (combining data mappings arranged in parallel by a plurality of convolutional groups arranged in series, see at least [0022] and [0054] and references 120, 160-1, and 160-n of Fig. 1), wherein the combined CNN receives the audio feature and produces the one or more attribute predictions and the state prediction (convolutional group comprises and input and output for data, see at least [0049] and [0054]; given audio data 110, output probability vectors for plurality of possible sound units which can be used to determine a voice command, see at least [0054])
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the two-stage model taught by Park by adding the audio processing system taught by Wieman with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to “provide a mechanism to reduce a size of (input) audio data in one or more of frequency and time dimensions, e.g. to effectively extract features that may be fed to the recurrent neural network architecture for temporal modelling” which “may help to reduce a number of multiplications and thus allow for faster execution at run time” (see [0009]).
Claims 18, 31, and 38-39 are rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, and Hu et al. (“A Two-Stage Approach to Device-Robust Acoustic Scene Classification”; see reference V on PTO-892; hereinafter Hu).
Regarding claim 18, Bielby discloses:
A method for diagnostic and condition monitoring of a system (method and apparatus of predictive maintenance of automotive engines, see at least abstract), the method comprising: receiving data from one or more sensors (data collected by sensors 103, see at least [0041]), the data associated with the system (sensor data of vehicles, see at least [0046]);
generating an audio feature based on the data (audio features extracted by processors, see at least [0244]);
inputting the audio feature into a neural network model (audio features used as inputs to Artificial Neural Network (ANN), see at least [0244]); and
receiving one or more attribute predictions and a state prediction from the neural network model (ANN used to predict diagnosis/maintenance service associated with sound patterns for a vehicle, see at least [0041] and [0221]; artificial neural network trained to recognize sound patterns of the engine, see at least [0219]),
wherein the single-stage combined CNN model receives the audio feature and produces the one or more attribute predictions and the state prediction (ANN used to predict diagnosis/maintenance service associated with sound patterns for a vehicle, see at least [0041] and [0221]; artificial neural network trained to recognize sound patterns of the engine, see at least [0219])
Bielby does not explicitly disclose:
mobile off-board sensors
a single-stage combined convolutional neural network (CNN) model
wherein the one or more attribute predictions comprise at least one of: a fuel type, an engine configuration, a cylinder count, or an aspiration type,
However, Anthony teaches:
mobile off-board sensors (receiving output from sensors in a smartphone and uses this information to develop a learning model to map on-board diagnostic system outputs and auxiliary sensor outputs to an automotive fault condition, see at least [0008]; use onboard microphone, accelerometer, gyroscope, magnetometer etc. on the smartphone as audio sensor, see at least [0039])
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the audio feature inputs into an artificial neural network for diagnostics as disclosed by Bielby by adding the smartphone sensors to gather audio data as taught by Anthony with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification because it “eliminates reliance on static sensors that are hardwired to Onboard Diagnostic (OBD) systems. It also reduces the need to rely on the extent of a mechanic's personal knowledge, and may be especially helpful in managing driverless vehicle fleets.” (see abstract).
Additionally, Sinitsyn teaches:
wherein the one or more attribute predictions comprise at least one of: a fuel type, an engine configuration, a cylinder count, or an aspiration type (fuel type tags trained to predict tag from audio of motor vehicle to estimate pollution level, see at least [0017]-[0018]; pollution estimation system may be configured to estimate different types of pollutants, see at least [0073]; tags give information on the type of motor and indicate a fuel type, see at least [0084]-[0085])
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby and the smartphone audio data taught by Anthony by adding the fuel type tags taught by Sinitsyn with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to “distinguish between the sounds of a car on diesel and one running on gasoline” (see [0020]).
Furthermore, Hu teaches:
inputting the audio feature into a single-stage combined convolutional neural network (CNN) model (see fig. 1 with the parallel “two-stage” system using CNN models, see also abstract) *Examiner sets forth that in Fig. 1 the “two-stage” models are arranged in parallel, which is considered a “single-stage combined convolutional neural network” according to Applicant’s specification paragraph [0158].
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features and neural networks disclosed by Bielby, the smartphone audio data taught by Anthony, and the fuel type tags taught by Sinitsyn by adding the structure of the parallel CNN classification system disclosed by Hu with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to attain “state-of-the-art accuracy” (see abstract).
Regarding claim 31, the combination of Bielby, Anthony, Sinitsyn, and Hu teaches the elements above and Bielby further discloses:
the system is a vehicle (vehicle 111, see at least [0041] and Fig. 1)
Regarding claim 38, the combination of Bielby, Anthony, Sinitsyn, and Hu teaches the elements above but Bielby and Anthony do not teach:
the fuel type is indicative of gasoline or diesel, wherein the engine configuration is indicative of flat configuration, inline configuration, or Vee configuration, wherein the cylinder count is indicative of 2, 3, 4, 5, 6, or 8, and wherein the aspiration type is indicative of normal aspiration or turbocharge aspiration
However, Sinitsyn teaches:
the fuel type is indicative of gasoline or diesel (a fuel type distinguishing between gasoline and diesel fuel, see at least [0085]), wherein the engine configuration is indicative of flat configuration, inline configuration, or Vee configuration, wherein the cylinder count is indicative of 2, 3, 4, 5, 6, or 8, and wherein the aspiration type is indicative of normal aspiration or turbocharge aspiration
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby and the smartphone audio data taught by Anthony by adding the fuel type tags taught by Sinitsyn with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to estimate pollution levels” (see [0011]).
Regarding claim 39, the combination of Bielby, Anthony, Sinitsyn, and Hu teaches the elements above and Bielby further discloses:
the state prediction is indicative of a normal state or an abnormal state (artificial neural network can detect abnormal conditions, see at least [0039])
Claim 32 is rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, and Hu as applied to claim 18 above and further in view of Raguenet.
Regarding claim 32, the combination of Bielby, Anthony, Sinitsyn, and Hu teaches the elements above but does not teach:
vibroacoustic data
However, Raguenet teaches:
the one or more off-board sensors are configured to acquire vibroacoustic data (clamp used as vibroacoustic diagnosis tool for automobile maintenance, see at least abstract)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance taught by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the structure of the parallel CNN classification system disclosed by Hu by adding the vibroacoustic diagnosis tool taught by Raguenet with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to provide a listening clamp for an acoustical vibration analysis tool, specifically a listening clamp which facilitates and improves, on the one hand, the acoustical vibration analysis performed during automotive maintenance or after-sales service operations, and on the other hand, the work of experts who can benefit jointly from a quick tool for investigating, analyzing and reporting, through the intermediary of registered sounds and videos” (see [0005]).
Claims 33-34 are rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, and Hu as applied to claim 18 above and further in view of Sun.
Regarding claim 33, the combination of Bielby, Anthony, Sinitsyn, and Hu teaches the elements above but does not teach:
the audio feature is a one-dimensional feature or a two-dimensional feature
However, Sun teaches:
the audio feature is a one-dimensional feature or a two-dimensional feature (audio feature data 352 may correspond to a one-dimensional vector and/or two-dimensional feature map, see at least col. 18 lines 7-8)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the structure of the parallel CNN classification system disclosed by Hu by adding the dimensions taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25).
Regarding claim 34, the combination of Bielby, Anthony, Sinitsyn, Hu, and Sun teaches the elements above but Bielby, Anthony, Sinitsyn, and Hu do not teach:
wherein the one-dimensional feature is a Fast Fourier Transform (FFT) feature, a waveform feature, or a wavelets feature, and
wherein the two-dimensional feature is a spectrogram feature or a Mel-frequency Cepstral Coefficients (MFCCs) feature
However, Sun teaches:
the one-dimensional feature is a Fast Fourier Transform (FFT) feature (process audio data to create acoustic feature data using fast Fourier transform (FFT), see at least col. 6 lines 16-30), a waveform feature, or a wavelets feature, and wherein the two-dimensional feature is a spectrogram feature or a Mel-frequency Cepstral Coefficients (MFCCs) feature (acoustic feature data may represent Mel-frequency cepstrum coefficients (MFCC), acoustic feature data may include one or more vectors based on size, see at least col. 6 lines 39-67)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the predictive maintenance using audio features disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the parallel CNN classification system disclosed by Hu by adding the acoustic feature transformations taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25).
Claims 35 and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, and Hu as applied to claim 18 above and further in view of Sun and Wieman.
Regarding claim 35, the combination of Bielby, Anthony, Sinitsyn, and Hu teaches the elements above but Bielby and Sinitsyn do not teach:
the single-stage combined CNN model comprises a first layer with a convolutional kernel size of 80 for the one-dimensional feature, and
wherein the single-stage combined CNN model comprises other layers with a convolutional kernel size of 3 for the one-dimensional feature
However, Sun teaches:
a first layer, and the model comprises other layers (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col. 8 line 29)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the parallel CNN classification system disclosed by Hu by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art.
Furthermore, Wieman teaches:
the single-stage combined CNN model comprises a first layer with a convolutional kernel size of 80 for the one-dimensional feature, and wherein the single-stage combined CNN model comprises other layers with a convolutional kernel size of 3 for the one-dimensional feature (“the initial convolution operation may have a relatively large kernel size as compared to the convolution operations used within the convolutional groups” see at least [0060]; convolutional neural network layer may use a convolutional kernel with a size greater than 1, see at least [0104]) *Examiner sets forth that both 80 and 3 are larger than 1; therefore, both sizes fall under the condition taught by Wieman. Furthermore, Wieman teaches the initial operation having a larger kernel size than the other layers.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the parallel CNN classification system disclosed by Hu, and the one or more layers taught by Sun by adding the kernel size taught by Wieman with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to “improve training and parameter convergence” (see [0010]). Furthermore, it is well known to those having ordinary skill in the art that layers may have different kernel sizes.
Regarding claim 37, the combination of Bielby, Anthony, Sinitsyn, and Hu teaches the elements above but does not teach:
the single-stage combined CNN model comprises a first layer with a convolutional kernel size of 3x3 for the spectrogram feature
However, Sun teaches:
a first layer (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col. 8 line 29)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the parallel CNN classification system disclosed by Hu by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art.
Additionally, Wieman teaches:
the single-stage combined CNN model comprises a first layer with a convolutional kernel size of 3x3 (first convolutional neural network layer may have kernel size of 3x3, see at least [0074]) for the spectrogram feature (plurality of audio frames collectively comprise a two-dimensional array that represents a spectrogram, see at least [0052])
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the parallel CNN classification system disclosed by Hu, and the one or more layers taught by Sun by adding the kernel size taught by Wieman with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order to “improve training and parameter convergence” (see [0010]). Furthermore, it is well known to those having ordinary skill in the art that layers may have different kernel sizes.
Claim 36 is rejected under 35 U.S.C. 103 as being unpatentable over Bielby in view of Anthony, Sinitsyn, and Hu as applied to claim 18 above and further in view of Sun and Tek.
Regarding claim 36, the combination of Bielby, Anthony, Sinitsyn, and Hu teaches the elements above but does not teach:
the single-stage combined CNN model comprises a first layer with a convolutional kernel size of 2x2 for the MFCCs feature
However, Sun teaches:
a first layer (one or more recurrent layers, see at least col. 7 lines 19-20 and 46-47; neural networks, see at least col. 8 line 29)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, and the parallel CNN classification system disclosed by Hu by adding the one or more layers taught by Sun with a reasonable expectation of success. One of ordinary skill in the art would have been motivated to make this modification in order “to process audio data to determine if properties of the audio data correspond to properties associated with an acoustic event” (see col. 2 lines 23-25). Furthermore, the use of layers in a neural network is well known by those of ordinary skill in the art.
Additionally, Tek teaches:
the single-stage combined CNN model comprises a first layer with a convolutional kernel size of 2x2 (convolution and max-pooling layers used 2x2 kernel size, see at least page 2 col. 1 paragraph 5 under “C. Network”) for the MFCCs feature (commonly used lossy representation system called Mel Frequency Cepstral Coefficient is used which models the shape of sound frequency spectrum, see at least page 2 col. 1 paragraph 4 under “B. Feature Extraction”)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network disclosed by Bielby, the smartphone audio data taught by Anthony, the fuel type tags taught by Sinitsyn, the parallel CNN classification system disclosed by Hu, and the one or more layers taught by Sun by adding the 2x2 kernel size taught by Tek with a reasonable expectation of success. Although Tek is directed toward classifying animal sounds, one of ordinary skill in the art would understand that the structure of classifying the sounds of animals using a neural network with the specific kernel size could carry over to classifying sounds of vehicles. One of ordinary skill in the art would have been motivated to make this modification because “CNN using log Mel-spectrograms performed best with the overall accuracy” in a study (see page 1, col. 1 paragraph 3).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HANA LEE whose telephone number is (571)272-5277. The examiner can normally be reached Monday-Friday: 7:30AM-4:30PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jelani Smith can be reached at (571) 270-3969. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/H.L./Examiner, Art Unit 3662
/DALE W HILGENDORF/Primary Examiner, Art Unit 3662