DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 5-11, 13-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Mahnoosh (NPL “Active learning for classifying long-duration audio recordings of the environment”) in view of Gomez (US 2023/0173683).
With respect to claim 9 (similarly claims 1 and 17), Mahnoosh teaches a system (e.g. the pool-based active learning (AL) process and algorithm of Fig 1-2, section 2.1) for active machine learning for audio event detection and classification (see the abstract), comprising:
a memory storing a labeled training pool of audio samples (e.g. the memory storing the labelled data, Fig 1);
an audio event classifier trained using the labeled training pool (e.g. the classification model of Fig 1, section 2.5-2.6);
a reinforcement learning agent configured to select a batch of audio samples from an unlabeled pool for annotation (e.g. the reinforcement learning agent that selects and annotates a set of informative instances in Fig 1); and
a processor (e.g. the process inherently includes a processor) configured to:
calculate one or more environment states for each audio sample using outputs of the audio event classifier (e.g. query strategies that measure sampling criterion for each instance, section 2.2),
add an annotated batch of audio samples to the labeled training pool (e.g. add new labelled instances, Fig 1),
retrain the audio event classifier using an updated labeled training pool (e.g. retrain the classification model using the added/updated labelled training pool, see Fig 1-2 where the algorithm is repeated until a stopping criterion is met),
update the environment states using the retrained audio event classifier (e.g. update the environment states of section 2.2 using the retrained classification model, as suggested in Fig 1),
update an exploration-exploitation parameter of the reinforcement learning agent (e.g. the query strategies are updated after each iteration until a stopping criterion is met in the algorithm of Fig 2, sections 2.2 and 3.1-3.2),
retrain the reinforcement learning agent using the updated environment states, and detecting an audio event and classifying the audio event in response to the retrained reinforcement learning agent (e.g. sections 3.1-3.2 and Tables 3-5 disclose the results on Test sets 1 and 2, which include retraining the reinforcement learning agent using the updated environment states, and detecting an audio event and classifying the audio event in response to the retrained reinforcement learning agent).
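For illustration only, the pool-based active learning loop mapped above can be summarized in the following Python sketch; all names (train_classifier, compute_state, agent.select_batch, oracle.label) are hypothetical placeholders and are not drawn from Mahnoosh or from the claims.

def active_learning_loop(labeled_pool, unlabeled_pool, agent, oracle,
                         train_classifier, compute_state,
                         batch_size, budget):
    # Train the audio event classifier on the initial labeled pool (cf. Fig 1).
    classifier = train_classifier(labeled_pool)
    while budget > 0 and unlabeled_pool:
        # Compute an environment state per unlabeled sample from classifier outputs.
        states = [compute_state(classifier, x) for x in unlabeled_pool]
        # The agent selects a batch of informative instances for human annotation.
        batch_idx = agent.select_batch(states, batch_size)
        annotated = [(unlabeled_pool[i], oracle.label(unlabeled_pool[i]))
                     for i in batch_idx]
        labeled_pool.extend(annotated)            # add the annotated batch
        chosen = set(batch_idx)
        unlabeled_pool = [x for i, x in enumerate(unlabeled_pool) if i not in chosen]
        # Retrain on the updated labeled pool; states are refreshed next iteration.
        classifier = train_classifier(labeled_pool)
        budget -= len(batch_idx)
    return classifier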
Even though Mahnoosh teaches the annotated batch, Mahnoosh fails to teach calculating a reward for each audio sample in the annotated batch.
Gomez teaches calculating a reward for each audio sample in raw data (e.g. a system that extracts feature information from raw data and estimates a reward for each audio sample in the raw data, see Fig 7, [0128]-[0132]).
Mahnoosh and Gomez are analogous art because they both pertain to performing reinforcement learning. Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the process and algorithm of Mahnoosh with the teachings of Gomez in Fig 7 to include: retraining the reinforcement learning agent using the updated environment states and rewards, and detecting an audio event and classifying the audio event in response to the retrained reinforcement learning agent, as modified by the estimated reward in [0128]-[0132] of Gomez. The benefit of the modification would be to perform autonomous learning through interaction between the device and the human and to reduce the number of evaluations of the human required to obtain an optimal operation and, particularly, the number of mistakes (unexpected behaviors), Gomez [0025]-[0026].
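As a non-limiting sketch of the proposed combination, i.e., retraining the reinforcement learning agent using the updated environment states together with the per-sample rewards supplied by Gomez, one possible update step is shown below; the transition layout and the agent.update interface are assumptions for illustration only.

def retrain_agent(agent, transitions):
    # transitions: (state, action, reward, next_state) tuples, one per audio
    # sample in the annotated batch; next_state is computed after the audio
    # event classifier has been retrained on the updated labeled pool.
    for state, action, reward, next_state in transitions:
        agent.update(state, action, reward, next_state)   # hypothetical agent API
    return agent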
With respect to claim 10 (similarly claims 2 and 18), Mahnoosh teaches the system of claim 9, including the audio event classifier (i.e., the classification model of Fig 1).
However, Mahnoosh fails to teach wherein the audio event classifier is a deep learning model.
Gomez teaches an audio event classifier that is a deep learning model (e.g. a system that performs deep reinforcement learning using raw data in a comparative example, see Fig 6, [0032] and [0125]).
Mahnoosh and Gomez are analogous art because they both pertain to performing reinforcement learning. Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the process and algorithm of Mahnoosh with the teachings of Gomez in Fig 6 to include: wherein the audio event classifier is a deep learning model, as taught by Gomez. The benefit of the modification would be to perform autonomous learning through interaction between the device and the human and to reduce the number of evaluations of the human required to obtain an optimal operation and, particularly, the number of mistakes (unexpected behaviors), Gomez [0025]-[0026].
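For illustration of the "deep learning model" limitation, a minimal deep audio event classifier is sketched below in PyTorch; the architecture and layer sizes are arbitrary assumptions and are not taken from Gomez.

import torch.nn as nn

class AudioEventClassifier(nn.Module):
    # Minimal CNN over single-channel log-mel spectrogram patches.
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_classes)   # per-class logits

    def forward(self, x):                      # x: (batch, 1, mels, frames)
        return self.head(self.features(x).flatten(1))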
With respect to claim 11 (similarly claims 3 and 18), Mahnoosh in view of Gomez teaches the system of claim 9 wherein the reinforcement learning agent uses a deep Q-network algorithm (Gomez e.g. the reinforcement learning agent uses a deep Q-network algorithm, as suggested in Fig 6-7 [0125], [0128]-[0132]).
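A minimal deep Q-network sketch is provided below for illustration; the network sizes, the replay-buffer batch layout, and the separate target network are conventional assumptions rather than details taken from Gomez.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Small MLP mapping an environment state vector to Q-values over actions.
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, state):
        return self.net(state)

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    # batch: (states, actions, rewards, next_states) tensors from a replay buffer.
    states, actions, rewards, next_states = batch
    q = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * target_net(next_states).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()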
With respect to claim 13 (similarly claims 5 and 19), Mahnoosh in view of Gomez teaches the system of claim 9 wherein a reinforcement learning agent action space comprises a binary choice of requesting or not requesting an annotation for each audio sample (Mahnoosh e.g. the reinforcement learning agent action space comprises a binary choice of requesting or not requesting an annotation for each audio sample/set of informative instances, see Fig 1; see also the results of Table 4).
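The binary action space recited in claim 13 can be illustrated, purely as an assumption-laden sketch, by two discrete per-sample actions:

from enum import IntEnum

class AnnotationAction(IntEnum):
    # Per-sample action space of the reinforcement learning agent.
    SKIP = 0      # do not request an annotation for this audio sample
    REQUEST = 1   # request a human annotation for this audio sample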
With respect to claim 14 (similarly claims 6 and 20), Mahnoosh in view of Gomez teaches the system of claim 13 wherein the reward is positive if the reinforcement learning agent selected an audio sample for annotation that was misclassified by the audio event classifier (Mahnoosh e.g. the reward is positive if the reinforcement learning agent selected an audio sample/unlabelled instance for annotation that was misclassified by the audio event classifier/classification model of Fig 1, as modified by Gomez in Fig 7, [0128]-[0132]).
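Using the AnnotationAction sketch above, one illustrative reward consistent with claim 14 is positive when a requested annotation reveals a misclassification; the magnitudes below are assumptions, not values taken from either reference.

def annotation_reward(action, predicted_label, true_label):
    # Positive reward when the agent requested an annotation for a sample that
    # the current audio event classifier had misclassified.
    if action == AnnotationAction.REQUEST and predicted_label != true_label:
        return 1.0
    if action == AnnotationAction.REQUEST:
        return -0.1   # label budget spent on a sample already classified correctly
    return 0.0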
With respect to claim 15 (similarly claims 7 and 20), Mahnoosh in view of Gomez teaches the system of claim 9 wherein the processor is further configured to initialize a reinforcement learning agent policy using transfer learning from a related audio event detection task (Mahnoosh e.g. section 2.4, feature extraction, suggests initializing a reinforcement learning agent policy using transfer learning from a related audio event detection task).
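One conventional way to realize the transfer-learning initialization of claim 15, assuming PyTorch Q-networks as sketched above, is to copy weights trained on a related audio event detection task into the new agent's policy network; this is an illustrative assumption, not a teaching of Mahnoosh.

def init_policy_from_related_task(new_q_net, pretrained_q_net):
    # Transfer learning: start the new agent from a Q-network trained on a
    # related audio event detection task, then fine-tune on the current task.
    new_q_net.load_state_dict(pretrained_q_net.state_dict())
    return new_q_net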
With respect to claim 16 (similarly claim 8), Mahnoosh in view of Gomez teaches the system of claim 9 wherein the audio samples are represented as mel-frequency cepstral coefficients or log-mel spectrograms (Mahnoosh e.g. section 2.4 mentions the Fast Fourier Transform and Fourier coefficients).
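For reference, the representations recited in claim 16 are conventionally computed as follows; librosa is used here only as an illustrative library, and neither reference relies on it.

import numpy as np
import librosa

def audio_features(path, sr=16000):
    # Compute the two representations recited in claim 16: mel-frequency
    # cepstral coefficients and a log-mel spectrogram.
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    return mfcc, log_mel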
Claims 4, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Mahnoosh (NPL “Active learning for classifying long-duration audio recordings of the environment”) in view of Gomez (US 2023/0173683) and further in view of Sriram (US 2018/0336884).
With respect to claim 12 (similarly claims 4 and 19), Mahnoosh in view of Gomez teaches the system of claim 9 including the environment states.
However, Mahnoosh fails to teach wherein the environment states are determined from logit outputs of the audio event classifier concatenated with softmax or sigmoid outputs of the audio event classifier.
Sriram teaches states which are determined from logit outputs of the audio event classifier concatenated with softmax or sigmoid outputs of the audio event classifier (e.g. the logit output r_t^CF is eventually fed into a softmax layer 540 for the generation of a probability over outputs for model training, P̂(y_t | x, y_<t), Fig 5 [0054]; see also Fig 6 [0063]).
Mahnoosh and Sriram are analogous art because they both pertain to determining states. Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Mahnoosh with the teachings of Sriram to include: wherein the environment states are determined from logit outputs of the audio event classifier concatenated with softmax or sigmoid outputs of the audio event classifier, as suggested by Sriram in Fig 5 [0054]. The benefit of the modification would be to determine the output probability with greater precision.
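An illustrative construction of such an environment state, assuming a simple softmax over the classifier logits, is sketched below; the function name and the use of NumPy are assumptions for illustration, and a sigmoid could be substituted for multi-label classifiers.

import numpy as np

def environment_state(logits):
    # Concatenate the classifier's raw logits with their softmax probabilities
    # to form the state vector recited in claim 12.
    logits = np.asarray(logits, dtype=np.float64)
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return np.concatenate([logits, exp / exp.sum()])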
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to IBRAHIM SIDDO whose telephone number is (571) 272-4508. The examiner can normally be reached 9:00 AM-5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Akwasi Sarpong, can be reached at 571-270-3438. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/IBRAHIM SIDDO/Primary Examiner, Art Unit 2681