DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments, see Remarks page 7, filed 02/12/2026, with respect to the rejections of claims 4, 11, and 18 under 35 U.S.C. 112(b) have been fully considered and are persuasive. The rejections of claims 4, 11, and 18 under 35 U.S.C. 112(b) have been withdrawn.
Applicant’s arguments, see Remarks pages 7-9, filed 02/12/2026, with respect to the rejection of amended claims 1, 8, and 15 under 35 U.S.C. 103 have been fully considered and are moot in view of the new grounds of rejection (detailed in the rejections below) necessitated by Applicant’s amendment to the claims.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 8-12, and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Yang et al. (Audio Classification of Accelerating Vehicles), hereinafter referenced as Yang, in view of Wang et al. (CN107730902A), hereinafter referenced as Wang, and Hohenacker (US2018174453A1).
Regarding claim 1, Yang discloses: A method for training a neural network to identify an electric vehicle based on audio (Yang: Figure 1; Abstract: “In this work, using careful feature extraction and various deep learning architectures, we demonstrate that vehicles can be effectively classified by recording vehicle acceleration with a cellphone.”), the method comprising:
generating video data from a camera, wherein the camera has a field of view of a roadway; generating audio data from a microphone, wherein the audio data is associated with vehicles traveling across the roadway (Yang: 3.1 Data Collection: “audio was recorded at an isolated stop sign. A cellphone (Samsung S9+) was fastened to a light-post directly adjacent to the stop sign, allowing direct vehicle visibility and clear audio recording of the accelerating vehicle. In total, 936 audio clips were extracted from 5 hrs of video, and divided into 7 classes and numerous manufacturers.”);
segmenting the audio data into a plurality of audio segments, wherein each audio segment has a start time and a finish time associated with that of a respective video segment (Yang: 3.2 Feature Extraction: “The audio track was directly imported into Audacity software for splitting and labeling, and no further audio processing was performed prior to analysis. Labeling was performed by matching the audio track to the simultaneously taken video.”; Wherein the audio data was segmented and labeled based on vehicle classification within video segments);
training a neural network to identify electric vehicles based on the audio segments and the labels of the respective video segments; and based on the training, outputting a trained neural network configured to identify electric vehicles based on audio (Yang: 3.1 Data Collection: “A cellphone (Samsung S9+) was fastened to a light-post directly adjacent to the stop sign, allowing direct vehicle visibility and clear audio recording of the accelerating vehicle. In total, 936 audio clips were extracted from 5 hrs of video, and divided into 7 classes and numerous manufacturers.”;
3.2 Feature Extraction: “Labeling was performed by matching the audio track to the simultaneously taken video.”;
4.1 Fully-connected Neural Networks (FCNN): “For the FCNN models, either no hidden layers or one hidden layer with 20 neurons (with ReLU as the activation function) were studied. Training examples with 678 features generated from the raw .wav file and FFT were sent to the model and batch gradient descent was used in the backpropagation process…In the output layer, we used softmax as the activation function to determine the result for the full 5-class and 7-class problems (Figure 5a).”).
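For illustration only, the following is a minimal sketch of the FCNN architecture Yang describes in Section 4.1: 678 input features (generated from the raw .wav file and FFT), one hidden layer of 20 ReLU neurons, a softmax output over the 7-class problem, and full-batch gradient descent. The sketch is in Python/PyTorch; any detail not stated by Yang (e.g., the learning rate) is an assumption.
```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(678, 20),  # one hidden layer with 20 neurons, per Yang 4.1
    nn.ReLU(),           # ReLU activation, per Yang 4.1
    nn.Linear(20, 7),    # logits for the full 7-class problem
)
# CrossEntropyLoss applies log-softmax internally, matching the softmax
# output layer described by Yang.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # assumed learning rate

def train_step(features, labels):
    """One full-batch gradient descent step over all training examples."""
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```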
Yang does not disclose expressly: segmenting the video data into a plurality of video segments, wherein each video segment has a start time and a finish time that corresponds to a respective vehicle traveling across the roadway within the field of view; based on the respective vehicle in each video segment, labeling each video segment with a label indicating the respective vehicle as either an electric vehicle (EV) or a non-electric internal combustion vehicle; and segmenting the audio data into a plurality of audio segments, wherein each audio segment has a start time and a finish time associated with that of a respective one of the video segments.
Wang discloses: a method comprising: generating video data from a camera, wherein the camera has a field of view of a roadway; generating audio data from a microphone, wherein the audio data is associated with vehicles traveling across the roadway (Wang: 0031: “This solution pre-sets a recording start area, a recording end area, and a license plate recognition area within the camera's field of view. When the camera captures a vehicle entering the recording start area, recording begins.”
0044: “the camera device may also include a camera unit, RF (Radio Frequency) circuitry, sensors, audio circuitry, WiFi module, and so on.”; Wherein the camera device, containing audio circuitry, implies that the video recording performed by the camera includes audio.);
segmenting the video data into a plurality of video segments, wherein each video segment has a start time and a finish time that corresponds to a respective vehicle traveling across the roadway within the field of view; and segmenting the audio data into a plurality of audio segments, wherein each audio segment has a start time and a finish time associated with that of a respective one of the video segments (Wang: 0041: “This invention provides a method for recording vehicle videos. Within the camera's field of view, a recording start area, a recording end area, and a license plate recognition area are preset. When the camera detects a vehicle entering the recording start area, recording begins…When the vehicle leaves the recording end area, the camera stops recording and stores the recorded vehicle video based on the license plate information.”; Wherein the recording of live video based on vehicle detection constitutes the segmentation of video and audio data.);
based on the respective vehicle in each video segment, labeling each video segment with a label indicating the respective vehicle’s license plate information (Wang: 0073: “when the vehicle enters the license plate recognition area, license plate recognition is performed and the recognized license plate information is output. When the vehicle leaves the recording end area, the camera stops recording and the recorded vehicle video is stored in association with the license plate information.”).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement the camera-based video recording triggered by vehicle detection, as taught by Wang, for the capturing of audio and video data disclosed by Yang. The suggestion/motivation for doing so would have been “This method eliminates the need for external detection equipment, effectively reducing costs.” (Wang: 0041; Wherein the vehicle detection and video segmenting are performed by the camera.). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
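For illustration only, the following is a minimal sketch of the event-driven segmentation Wang describes: a segment’s start time is set when a vehicle enters the recording start area and its finish time when the vehicle leaves the recording end area, with the audio segment inheriting the same start and finish times as its video segment. The detection events are taken as given, and all function and field names are hypothetical.
```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds into the recording (vehicle enters start area)
    finish: float  # seconds into the recording (vehicle leaves end area)

def segment_recording(enter_times, exit_times):
    """Pair enter/exit detection events into per-vehicle segments."""
    return [Segment(s, f) for s, f in zip(enter_times, exit_times)]

def clip(samples, rate, seg):
    """Cut an audio segment using the same start/finish times as the video,
    so each audio segment is time-aligned with its respective video segment."""
    return samples[int(seg.start * rate):int(seg.finish * rate)]
```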
Yang in view of Wang does not disclose expressly: based on the respective vehicle in each video segment, labeling each video segment with a label indicating the respective vehicle as either an electric vehicle (EV) or a non-electric internal combustion vehicle.
Hohenacker discloses: a method for monitoring the statuses of parking space areas based on an image capture system (Hohenacker: 0003-0006: “This object is satisfied by a method in accordance with claim 1 and in particular by a system composed of at least one street-lighting device, a camera system mounted at the street-lighting device, a recognition unit, a transmission unit and a mobile display unit, wherein the camera system is configured for delivering image indications from within parking space areas located within a parking space zone, and wherein the recognition unit is configured to…associate a respective occupation status in dependence on the image indications with the parking space areas, said occupation status marking whether a respective parking space area is free or occupied;”). Wherein, based on the respective vehicle in each image, each image is labeled with a label indicating the respective vehicle as either an electric vehicle (EV) or a non-electric internal combustion vehicle (Hohenacker: 0046: “The recognition unit in accordance with the invention can be configured to classify motor vehicles detected by the camera system using typical features and thus to distinguish electric models from models with an internal combustion engine.”).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to substitute the algorithms for detecting vehicles based on camera frame feature information disclosed by Yang in view of Wang with the algorithms for classifying vehicles as electric or internal combustion taught by Hohenacker. The suggestion/motivation for doing so would have been “The recognition unit in accordance with the invention can be configured to classify motor vehicles detected by the camera system using typical features and thus to distinguish electric models from models with an internal combustion engine” (Hohenacker: 0046; Wherein the classification of vehicles can be automated/combined with the detection of vehicles.). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Yang in view of Wang with Hohenacker to obtain the invention as specified in claim 1.
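For illustration only, the following is a minimal sketch of the combined labeling step mapped above: each video segment is classified as electric or internal combustion (per Hohenacker’s recognition unit), and the label is carried over to the time-aligned audio segment to form a training pair for the neural network of claim 1. The function names and the 0/1 label encoding are hypothetical, not taken from the cited references.
```python
def build_training_pairs(video_segments, audio_segments, classify_ev):
    """classify_ev(video_segment) -> 1 for electric, 0 for internal combustion
    (an assumed image-based classifier in the role of Hohenacker's
    recognition unit)."""
    pairs = []
    for video_seg, audio_seg in zip(video_segments, audio_segments):
        label = classify_ev(video_seg)    # label derived from the video segment
        pairs.append((audio_seg, label))  # audio segment inherits the video label
    return pairs
```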
Regarding claim 2, Yang in view of Wang and Hohenacker discloses: The method of claim 1, further comprising: associating each audio segment with a respective one of the labels (Wang: 0044: “the camera device may also include a camera unit, RF (Radio Frequency) circuitry, sensors, audio circuitry, WiFi module, and so on.”;
0073: “when the vehicle enters the license plate recognition area, license plate recognition is performed and the recognized license plate information is output. When the vehicle leaves the recording end area, the camera stops recording and the recorded vehicle video is stored in association with the license plate information.”; Wherein the labeling of the video captured by the camera constitutes the labeling of an audio segment with the label of its respective video segment.);
wherein the training includes training the neural network based on each audio segment and its respective label (Yang: 3.1 Data Collection: “A cellphone (Samsung S9+) was fastened to a light-post directly adjacent to the stop sign, allowing direct vehicle visibility and clear audio recording of the accelerating vehicle. In total, 936 audio clips were extracted from 5 hrs of video, and divided into 7 classes and numerous manufacturers.”;
4.1 Fully-connected Neural Networks (FCNN): “For the FCNN models, either no hidden layers or one hidden layer with 20 neurons (with ReLU as the activation function) were studied. Training examples with 678 features generated from the raw .wav file and FFT were sent to the model and batch gradient descent was used in the backpropagation process…In the output layer, we used softmax as the activation function to determine the result for the full 5-class and 7-class problems (Figure 5a).”).
Regarding claim 3, Yang in view of Wang and Hohenacker discloses: The method of claim 1, wherein the trained neural network is configured to identify electric vehicles based on audio and not video (Yang: Abstract: “Audio identification of vehicles is promising because it requires simple and cheap recording devices and much lower quantities of data than other technologies. Importantly, audio-based technologies don’t suffer from low visibility and are equally effective in low-light conditions. In this work, using careful feature extraction and various deep learning architectures, we demonstrate that vehicles can be effectively classified by recording vehicle acceleration with a cellphone.”).
Regarding claim 4, Yang in view of Wang and Hohenacker discloses: The method of claim 1, wherein the start time and the finish time of each audio segment are identical to the start time and finish time of its respective video segment (Wang: 0044: “the camera device may also include a camera unit, RF (Radio Frequency) circuitry, sensors, audio circuitry, WiFi module, and so on.”;
0073: “when the vehicle enters the license plate recognition area, license plate recognition is performed and the recognized license plate information is output. When the vehicle leaves the recording end area, the camera stops recording and the recorded vehicle video is stored in association with the license plate information.”; Wherein the audio, captured by the audio circuitry of the same camera device that records the video, starts and stops with the video recording, such that each audio segment shares the start time and finish time of its respective video segment.).
Regarding claim 5, Yang in view of Wang and Hohenacker discloses: The method of claim 1, wherein the microphone is installed adjacent to the camera (Wang: 0044: “the camera device may also include a camera unit, RF (Radio Frequency) circuitry, sensors, audio circuitry, WiFi module, and so on.”; Wherein the camera device including the audio circuitry constitutes a microphone installed adjacent to the camera).
As per claim 8, arguments made in rejecting claim 1 are analogous. Section 3.1 (Data Collection) of Yang discloses the collection of video and audio data via a cellphone. In addition, Section 5 (Results and Discussion) of Yang discloses the training of neural networks for audio classification, thus implying the disclosure of “A system for training a neural network…comprising: an image sensor…an audio sensor…and a processor in communication with the image sensor and the audio sensor.”
As per claim 9, arguments made in rejecting claim 2 are analogous.
As per claim 10, arguments made in rejecting claim 3 are analogous.
As per claim 11, arguments made in rejecting claim 4 are analogous.
As per claim 12, arguments made in rejecting claim 5 are analogous.
As per claim 15, arguments made in rejecting claim 1 are analogous. Section 3.1 (Data Collection) of Yang discloses the collection of video and audio data via a cellphone. In addition, Section 5 (Results and Discussion) of Yang discloses the training of neural networks for audio classification, thus implying the disclosure of “A non-transitory computer-readable storage medium storing executable instructions that, when executed by one or more processors, cause the processor to: generate video data from a camera…generate audio data from a microphone.”
As per claim 16, arguments made in rejecting claim 2 are analogous.
As per claim 17, arguments made in rejecting claim 3 are analogous.
As per claim 18, arguments made in rejecting claim 4 are analogous.
Claims 6-7, 13-14, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yang in view of Wang and Hohenacker, and further in view of Szegedy et al. (Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning), hereinafter referenced as Szegedy.
Regarding claim 6, Yang in view of Wang and Hohenacker discloses: The method of claim 1, wherein the start time and the finish time associated with each video segment are associated with the respective vehicle entering the field of view and exiting the field of view, respectively (Wang: 0073: “when the vehicle enters the license plate recognition area, license plate recognition is performed and the recognized license plate information is output. When the vehicle leaves the recording end area, the camera stops recording and the recorded vehicle video is stored in association with the license plate information.”; Wherein the start of recording when the vehicle enters the preset recording start area and the stop of recording when the vehicle leaves the recording end area constitute the start time and the finish time associated with the vehicle entering and exiting the field of view, respectively.).
Yang in view of Wang and Hohenacker does not disclose expressly: further comprising: executing an object detection and classification machine learning model to identify and classify the vehicles.
Szegedy discloses: deep learning convolutional networks, capable of executing object detection and classification tasks (Szegedy: Abstract: “Here we give clear empirical evidence that training with residual connections accelerates the training of Inception networks significantly. There is also some evidence of residual Inception networks outperforming similarly expensive Inception networks without residual connections by a thin margin. We also present several new streamlined architectures for both residual and non-residual Inception networks.”;
1. Introduction: “Since the 2012 ImageNet competition [11] winning entry by Krizhevsky et al [8], their network “AlexNet” has been successfully applied to a larger variety of computer vision tasks, for example to object-detection [4], segmentation [10], human pose estimation [17], video classification [7], object tracking [18], and super-resolution [3]. These examples are but a few of all the applications to which deep convolutional networks have been very successfully applied ever since. In this work we study the combination of the two most recent ideas: Residual connections introduced by He et al. in [5] and the latest revised version of the Inception architecture [15].”).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to substitute the algorithms for detecting and classifying vehicles disclosed by Yang in view of Wang and Hohenacker with the Inception-v4 model taught by Szegedy. The suggestion/motivation for doing so would have been “Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve very good performance at relatively low computational cost.” (Szegedy: Abstract). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Yang in view of Wang and Hohenacker with Szegedy to obtain the invention as specified in claim 6.
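For illustration only, the following is a minimal sketch of substituting a pretrained Inception-style network as the object detection and classification model of claim 6. The sketch assumes the third-party Python package timm, which provides an Inception-v4 implementation; the package choice, the 2-way classifier head (electric vs. internal combustion), and the dummy input are assumptions, not part of the cited references.
```python
import timm
import torch

# Load an ImageNet-pretrained Inception-v4 and replace the classifier head
# with a 2-way output (assumed encoding: electric vs. internal combustion).
model = timm.create_model("inception_v4", pretrained=True, num_classes=2)
model.eval()

with torch.no_grad():
    frame = torch.randn(1, 3, 299, 299)  # Inception-v4 expects 299x299 input
    logits = model(frame)                # classification scores for the frame
    label = logits.argmax(dim=1).item()  # 0 or 1 per the assumed head
```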
Regarding claim 7, Yang in view of Wang, Hohenacker, and Szegedy discloses: The method of claim 6, wherein the object detection and classification machine learning model generates the labels of each video segment based upon the classification of the vehicles (Hohenacker: 0046: “The recognition unit in accordance with the invention can be configured to classify motor vehicles detected by the camera system using typical features and thus to distinguish electric models from models with an internal combustion engine.”);
(Szegedy: Abstract: “We also present several new streamlined architectures for both residual and non-residual Inception networks. These variations improve the single-frame recognition performance on the ILSVRC 2012 classification task significantly.”).
As per claim 13, arguments made in rejecting claim 6 are analogous.
As per claim 14, arguments made in rejecting claim 7 are analogous.
As per claim 19, arguments made in rejecting claim 6 are analogous.
As per claim 20, arguments made in rejecting claim 7 are analogous.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANTHONY J RODRIGUEZ whose telephone number is (703) 756-5821. The examiner can normally be reached Monday-Friday, 10am-7pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached at (571) 272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANTHONY J RODRIGUEZ/Examiner, Art Unit 2672
/SUMATI LEFKOWITZ/Supervisory Patent Examiner, Art Unit 2672