DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 5-11, 13-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Mahnoosh (NPL “Active learning for classifying long-duration audio recordings of the environment”) in view of Gomez (US 2023/0173683).
With respect to claim 9 (similarly claims 1 and 17), Mahnoosh teaches a system (e.g. the pool-based active learning (AL) process and algorithm of Fig 1-2, section 2.1) for active machine learning for audio event detection and classification (see the abstract), comprising:
a memory storing a labeled training pool of audio samples (e.g. the memory storing the labelled data, Fig 1);
an audio event classifier trained using the labeled training pool (e.g. the classification model of Fig 1, section 2.5-2.6);
a reinforcement learning agent configured to select a batch of audio samples from an unlabeled pool for annotation (e.g. the reinforcement learning agent that selects and annotates a set of informative instances in Fig 1); and
a processor (e.g. the process inherently includes a processor) configured to:
calculate one or more environment states for each audio sample using outputs of the audio event classifier (e.g. query strategies that measure sampling criterion for each instance, section 2.2),
add an annotated batch of audio samples to the labeled training pool (e.g. add new labelled instances, Fig 1),
retrain the audio event classifier using an updated labeled training pool (e.g. retrain the classification model using the added/updated labelled training pool, see Fig 1-2 where the algorithm is repeated until a stopping criterion is met),
update the environment states using the retrained audio event classifier (e.g. update the environment states of section 2.2 using the retrained classification model, as suggested in Fig 1),
update an exploration-exploitation parameter of the reinforcement learning agent (e.g. the query strategies are updated after each iteration until a stopping criterion is met in the algorithm of Fig 2, sections 2.2 and 3.1-3.2),
retrain the reinforcement learning agent using the updated environment states, and detecting an audio event and classifying the audio event in response to the retrained reinforcement learning agent (e.g. sections 3.1-3.2 and Tables 3-5 disclose the results on Test sets 1 and 2, which include retraining the reinforcement learning agent using the updated environment states, and detecting an audio event and classifying the audio event in response to the retrained reinforcement learning agent).
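For illustration only, the pool-based active learning loop mapped above can be summarized in the following Python sketch; all names (train_classifier, compute_state, agent.select_batch, oracle.label) are hypothetical placeholders and are not drawn from Mahnoosh or from the claims.

def active_learning_loop(labeled_pool, unlabeled_pool, agent, oracle,
                         train_classifier, compute_state,
                         batch_size, budget):
    # Train the audio event classifier on the initial labeled pool (cf. Fig 1).
    classifier = train_classifier(labeled_pool)
    while budget > 0 and unlabeled_pool:
        # Compute an environment state per unlabeled sample from classifier outputs.
        states = [compute_state(classifier, x) for x in unlabeled_pool]
        # The agent selects a batch of informative instances for human annotation.
        batch_idx = agent.select_batch(states, batch_size)
        annotated = [(unlabeled_pool[i], oracle.label(unlabeled_pool[i]))
                     for i in batch_idx]
        labeled_pool.extend(annotated)            # add the annotated batch
        chosen = set(batch_idx)
        unlabeled_pool = [x for i, x in enumerate(unlabeled_pool) if i not in chosen]
        # Retrain on the updated labeled pool; states are refreshed next iteration.
        classifier = train_classifier(labeled_pool)
        budget -= len(batch_idx)
    return classifier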
Even though Mahnoosh teaches the annotated batch, Mahnoosh fails to teach calculating a reward for each audio sample in the annotated batch.
Gomez teaches calculating a reward for each audio sample in raw data (e.g. a system that extracts feature information from raw data and estimates a reward for each audio sample in the raw data, see Fig 7, [0128]-[0132]).
Mahnoosh and Gomez are analogous art because they both pertain to performing reinforcement learning. Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the process and algorithm of Mahnoosh with the teachings of Gomez in Fig 7 to include: retraining the reinforcement learning agent using the updated environment states and rewards, and detecting an audio event and classifying the audio event in response to the retrained reinforcement learning agent, as modified by the estimated reward in [0128]-[0132] of Gomez. The benefit of the modification would be to perform autonomous learning through interaction between the device and the human and to reduce the number of evaluations of the human required to obtain an optimal operation and, particularly, the number of mistakes (unexpected behaviors), Gomez [0025]-[0026].
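As a non-limiting sketch of the proposed combination, i.e., retraining the reinforcement learning agent using the updated environment states together with the per-sample rewards supplied by Gomez, one possible update step is shown below; the transition layout and the agent.update interface are assumptions for illustration only.

def retrain_agent(agent, transitions):
    # transitions: (state, action, reward, next_state) tuples, one per audio
    # sample in the annotated batch; next_state is computed after the audio
    # event classifier has been retrained on the updated labeled pool.
    for state, action, reward, next_state in transitions:
        agent.update(state, action, reward, next_state)   # hypothetical agent API
    return agent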
With respect to claim 10 (similarly claims 2 and 18), Mahnoosh teaches the system of claim 9, including the audio event classifier (i.e., the classification model of Fig 1).
However, Mahnoosh fails to teach wherein the audio event classifier is a deep learning model.
Gomez teaches an audio event classifier that is a deep learning model (e.g. a system that performs deep reinforcement learning using raw data in a comparative example, see Fig 6, [0032] and [0125]).
Mahnoosh and Gomez are analogous art because they both pertain to performing reinforcement learning. Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the process and algorithm of Mahnoosh with the teachings of Gomez in Fig 6 to include: wherein the audio event classifier is a deep learning model, as taught by Gomez. The benefit of the modification would be to perform autonomous learning through interaction between the device and the human and to reduce the number of evaluations of the human required to obtain an optimal operation and, particularly, the number of mistakes (unexpected behaviors), Gomez [0025]-[0026].
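For illustration of the "deep learning model" limitation, a minimal deep audio event classifier is sketched below in PyTorch; the architecture and layer sizes are arbitrary assumptions and are not taken from Gomez.

import torch.nn as nn

class AudioEventClassifier(nn.Module):
    # Minimal CNN over single-channel log-mel spectrogram patches.
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_classes)   # per-class logits

    def forward(self, x):                      # x: (batch, 1, mels, frames)
        return self.head(self.features(x).flatten(1))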
With respect to claim 11 (similarly claims 3 and 18), Mahnoosh in view of Gomez teaches the system of claim 9 wherein the reinforcement learning agent uses a deep Q-network algorithm (Gomez e.g. the reinforcement learning agent uses a deep Q-network algorithm, as suggested in Fig 6-7 [0125], [0128]-[0132]).
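A minimal deep Q-network sketch is provided below for illustration; the network sizes, the replay-buffer batch layout, and the separate target network are conventional assumptions rather than details taken from Gomez.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Small MLP mapping an environment state vector to Q-values over actions.
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, state):
        return self.net(state)

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    # batch: (states, actions, rewards, next_states) tensors from a replay buffer.
    states, actions, rewards, next_states = batch
    q = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * target_net(next_states).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()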
With respect to claim 13 (similarly claims 5 and 19), Mahnoosh in view of Gomez teaches the system of claim 9 wherein a reinforcement learning agent action space comprises a binary choice of requesting or not requesting an annotation for each audio sample (Mahnoosh e.g. the reinforcement learning agent action space comprises a binary choice of requesting or not requesting an annotation for each audio sample/set of informative instances, see Fig 1; see also the results of Table 4).
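The binary action space recited in claim 13 can be illustrated, purely as an assumption-laden sketch, by two discrete per-sample actions:

from enum import IntEnum

class AnnotationAction(IntEnum):
    # Per-sample action space of the reinforcement learning agent.
    SKIP = 0      # do not request an annotation for this audio sample
    REQUEST = 1   # request a human annotation for this audio sample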
With respect to claim 14 (similarly claims 6 and 20), Mahnoosh in view of Gomez teaches the system of claim 13 wherein the reward is positive if the reinforcement learning agent selected an audio sample for annotation that was misclassified by the audio event classifier (Mahnoosh e.g. the reward is positive if the reinforcement learning agent selected an audio sample/unlabelled instance for annotation that was misclassified by the audio event classifier/classification model of Fig 1, as modified by Gomez in Fig 7, [0128]-[0132]).
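Using the AnnotationAction sketch above, one illustrative reward consistent with claim 14 is positive when a requested annotation reveals a misclassification; the magnitudes below are assumptions, not values taken from either reference.

def annotation_reward(action, predicted_label, true_label):
    # Positive reward when the agent requested an annotation for a sample that
    # the current audio event classifier had misclassified.
    if action == AnnotationAction.REQUEST and predicted_label != true_label:
        return 1.0
    if action == AnnotationAction.REQUEST:
        return -0.1   # label budget spent on a sample already classified correctly
    return 0.0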
With respect to claim 15 (similarly claims 7 and 20), Mahnoosh in view of Gomez teaches the system of claim 9 wherein the processor is further configured to initialize a reinforcement learning agent policy using transfer learning from a related audio event detection task (Mahnoosh e.g. section 2.4, feature extraction, suggests initializing a reinforcement learning agent policy using transfer learning from a related audio event detection task).
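One conventional way to realize the transfer-learning initialization of claim 15, assuming PyTorch Q-networks as sketched above, is to copy weights trained on a related audio event detection task into the new agent's policy network; this is an illustrative assumption, not a teaching of Mahnoosh.

def init_policy_from_related_task(new_q_net, pretrained_q_net):
    # Transfer learning: start the new agent from a Q-network trained on a
    # related audio event detection task, then fine-tune on the current task.
    new_q_net.load_state_dict(pretrained_q_net.state_dict())
    return new_q_net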
With respect to claim 16 (similarly claim 8), Mahnoosh in view of Gomez teaches the system of claim 9 wherein the audio samples are represented as mel-frequency cepstral coefficients or log-mel spectrograms (Mahnoosh e.g. section 2.4 mentions the Fast Fourier Transform and Fourier coefficients).
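For reference, the representations recited in claim 16 are conventionally computed as follows; librosa is used here only as an illustrative library, and neither reference relies on it.

import numpy as np
import librosa

def audio_features(path, sr=16000):
    # Compute the two representations recited in claim 16: mel-frequency
    # cepstral coefficients and a log-mel spectrogram.
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    return mfcc, log_mel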
Claims 4, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Mahnoosh (NPL “Active learning for classifying long-duration audio recordings of the environment”) in view of Gomez (US 2023/0173683) and further in view of Sriram (US 2018/0336884).
With respect to claim 12 (similarly claims 4 and 19), Mahnoosh in view of Gomez teaches the system of claim 9 including the environment states.
However, Mahnoosh fails to teach wherein the environment states are determined from logit outputs of the audio event classifier concatenated with softmax or sigmoid outputs of the audio event classifier.
Sriram teaches states which are determined from logit outputs of the audio event classifier concatenated with softmax or sigmoid outputs of the audio event classifier (e.g. the logit output r_t^CF is eventually fed into a softmax layer 540 for the generation of a probability over outputs for model training, P̂(y_t | x, y_<t), Fig 5 [0054]; see also Fig 6 [0063]).
Mahnoosh and Sriram are analogous art because they both pertain to determining states. Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Mahnoosh with the teachings of Sriram to include: wherein the environment states are determined from logit outputs of the audio event classifier concatenated with softmax or sigmoid outputs of the audio event classifier, as suggested by Sriram in Fig 5 [0054]. The benefit of the modification would be to determine the output probability with greater precision.
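An illustrative construction of such an environment state, assuming a simple softmax over the classifier logits, is sketched below; the function name and the use of NumPy are assumptions for illustration, and a sigmoid could be substituted for multi-label classifiers.

import numpy as np

def environment_state(logits):
    # Concatenate the classifier's raw logits with their softmax probabilities
    # to form the state vector recited in claim 12.
    logits = np.asarray(logits, dtype=np.float64)
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return np.concatenate([logits, exp / exp.sum()])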
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to IBRAHIM SIDDO whose telephone number is (571) 272-4508. The examiner can normally be reached 9:00 AM-5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Akwasi Sarpong, can be reached at 571-270-3438. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/IBRAHIM SIDDO/Primary Examiner, Art Unit 2681