DETAILED ACTION
1. This communication is in response to the Application filed on 7/11/2024. Claims 1-12 are pending and have been examined.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
2. Claims 1-12 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Independent claims 1, 11, and 12 recite an apparatus, a method, and a system, respectively, and thus fall within statutory categories.
Claims 1, 11, and 12 further recite “recognizing audio .. audio satisfying condition being detected .. notify a user ..” These limitations, as drafted, cover a mental process. More specifically, given the acquired audio, a human can mentally detect and recognize whether the audio satisfies a certain condition, and can notify a user in any way.
This judicial exception is not integrated into a practical application. In particular, independent claims 1, 11, and 12 recite additional elements of “an acquisition device .. audio processing apparatus ..”; however, these are considered general-purpose computing devices – see SPECIFICATION – Fig. 2. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.
Claims 1, 11, and 12 do not include additional elements that are sufficient to amount to significantly more than the judicial exception; the claimed limitations amount to insignificant extra-solution activity. The claims are not patent eligible.
With respect to claim 2, the claim further recites “when the recognizing audio satisfies the first condition, play notification sound to notify the user that the audio satisfying the condition has been detected; when the recognizing audio satisfies the second condition, present visual information to the user to notify the user that the audio satisfying the condition has been detected, and a priority order of notifying the user is lower when the notification condition satisfies the second condition than when the notification condition satisfies the first condition ..” where a human can mentally determine if the recognized audio satisfies the first or the second condition for the system to perform the following ready, well-known steps. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to claim 3, the claim further recites “when the recognizing audio satisfies the first condition, play the notification sound and then play the audio satisfying the condition ..” where a human can mentally determine if the recognized audio satisfies the first condition for the system to perform the following ready, well-known steps. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to claim 4, the claim further recites “when the recognizing audio satisfies the second condition, present a list to the user to present the visual information to the user, the list being pieces of information of the detected audio satisfying the condition ..” where a human can mentally determine if the recognized audio satisfies the second condition for the system to perform the following ready, well-known steps. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to claim 5, the claim further recites “when the recognizing audio satisfies the second condition, play the audio satisfying the condition in response to an input from the user ..” where a human can mentally determine if the recognized audio satisfies the second condition, and with another input (such as user’s utterance) for the system to perform the following ready, well-known steps. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to claim 6, the claim further recites “detect, as the audio satisfying the condition, audio whose feature matches the audio feature set in advance ..” where a human can mentally determine when the recognized audio satisfies the condition, if the audio feature (subject to BRI) matches the audio feature set in advance. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to claim 7, the claim further recites “detect, as the audio satisfying the condition, an utterance including a search word ..” where a human can mentally determine when the recognized audio satisfies the condition, i.e., whether the utterance includes a search (key) word. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to claim 8, the claim further recites “when the utterance satisfies the third condition, perform control with which, immediately after the search word is detected, notification sound is played and playback of the utterance is started ..” where a human can mentally determine if the recognized audio satisfies the third condition (such as detecting the keyword) for the system to perform the following ready, well-known steps. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to claim 9, the claim further recites “when the utterance satisfies the fourth condition, perform control with which, immediately after the utterance ends, notification sound is played and playback of the utterance is started ..” where a human can mentally determine if the recognized audio satisfies the fourth condition (subject to BRI) for the system to perform the following ready, well-known steps. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to claim 10, the claim further recites “perform control with which audio data of an utterance interval including the utterance is played, and the utterance interval is an interval for which the audio data continues without interruption for a set time ..” where a human can mentally determine if the audio interval ends for the system to perform the following ready, well-known steps such as audio play control. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
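For illustration only (not part of the record), the utterance-interval determination recited in claim 10 – audio data that continues without interruption for a set time – can be sketched as follows. The function name, the energy-threshold approach, and all parameter values are illustrative assumptions, not taken from the claims or the cited art.

```python
# Illustrative sketch: split frame-level energies into "utterance intervals"
# -- runs of activity separated by silence gaps lasting at least `max_gap`
# frames, mirroring "continues without interruption for a set time".

def utterance_intervals(frame_energies, threshold=0.1, max_gap=5):
    """Return (start, end) frame indices of intervals whose energy stays
    above `threshold`, allowing silent gaps shorter than `max_gap`."""
    intervals = []
    start = None   # start frame of the current interval, if any
    gap = 0        # length of the current run of silent frames
    for i, e in enumerate(frame_energies):
        if e >= threshold:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= max_gap:          # interruption long enough: close interval
                intervals.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:               # close a trailing interval
        intervals.append((start, len(frame_energies) - gap))
    return intervals
```

Under this sketch, two bursts of energy separated by a gap of `max_gap` or more frames yield two separate intervals; a shorter gap is treated as part of one continuing utterance.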
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
3. Claims 1-4, 6, 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Dufaux, et al. (10th European Signal Processing Conference, 2000; hereinafter DUFAUX) in view of Garcıa-Ruiz, et al. (AES, 2007; hereinafter Garcıa-Ruiz).
As per claim 1, DUFAUX (Title: Automatic sound detection and recognition for noisy environment) discloses “An audio processing apparatus comprising: a controller configured to: acquire a result of a process for recognizing audio; and in response to audio satisfying condition being detected based on the result of the process, [ notify a user ] that the audio satisfying the condition has been detected, in accordance with the condition corresponding to the recognizing audio (DUFAUX, [Abstract], automatic detection and recognition of impulsive sounds, such as glass breaks, human screams, gunshots, explosions or door slams; Fig. 1 <‘This sound is a phone ringing !’ reads on ‘audio satisfying condition being detected’ where ‘condition’ is subject to BRI such as phone ringing, and ‘Human Intervention’ reads on the subsequent action following ‘notify a user’>; [sec. 1, para 1], alarm triggering or validation .. to inform disabled and elderly persons affected in their hearing capabilities about relevant environmental sounds (warning signals, etc.)).”
DUFAUX does not explicitly disclose “(condition) .. notify a user ..” However, the feature is taught by Garcıa-Ruiz (Title: Towards Multimodal Interfaces for Intrusion Detection).
In the same field of endeavor, Garcıa-Ruiz teaches: [Abstract] “We describe a sonification prototype which generates different sounds according to a number of well-known network attacks” and [sec. 1, para 4] “for intrusion detection assisted by multimodal interfaces (i.e., visual, auditive, gustatory, olfactory, and tactile)” and [sec. 2,4, para 1] “Bodnar, Corbett and Nekrasovski [6] conducted an experiment that compared the efficiency and disruptiveness of visual, auditory, and olfactory information that was delivered by a multimodal messaging notification system.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Garcıa-Ruiz in the system taught by DUFAUX for employing multimodal means for user notification dependent on different “conditions” of signal or event detection and recognition.
As per claim 2 (dependent on claim 1), DUFAUX in view of Garcıa-Ruiz further discloses “when the recognizing audio satisfies the first condition, play notification sound to notify the user that the audio satisfying the condition has been detected; when the recognizing audio satisfies the second condition, present visual information to the user to notify the user that the audio satisfying the condition has been detected, and a priority order of notifying the user is lower when the notification condition satisfies the second condition than when the notification condition satisfies the first condition (DUFAUX, [Abstract], automatic detection and recognition of impulsive sounds, such as glass breaks, human screams, gunshots, explosions or door slams; <each reads on different condition which is subject to BRI>; [sec. 1, para 1], alarm triggering .. to inform disabled and elderly persons .. about relevant environmental sounds <read on ready mechanisms to play/present various notifications with any pre-set priority order>; Garcıa-Ruiz, [Abstract], generates different sounds according to a number of well-known network attacks; [sec. 1, para 4], visual, auditory, and olfactory information that was delivered by a multimodal messaging notification system).”
As per claim 3 (dependent on claim 2), DUFAUX in view of Garcıa-Ruiz further discloses “when the recognizing audio satisfies the first condition, play the notification sound and then play the audio satisfying the condition (DUFAUX, [sec. 1, para 1], alarm triggering .. to inform disabled and elderly persons .. about relevant environmental sounds <read on ready mechanisms to play any combination of sound notification and audio in any pre-set order>; Garcıa-Ruiz, [Abstract], generates different sounds according to a number of well-known network attacks; [sec. 1, para 4], visual, auditory, and olfactory information that was delivered by a multimodal messaging notification system).”
As per claim 4 (dependent on claim 2), DUFAUX in view of Garcıa-Ruiz further discloses “when the recognizing audio satisfies the second condition, present a list to the user to present the visual information to the user, the list being pieces of information of the detected audio satisfying the condition (DUFAUX, [sec. 1, para 1], alarm triggering .. to inform disabled and elderly persons .. about relevant environmental sounds <read on ready mechanisms to send notification with any information/content, where the information/content can be pre-set or selectable by user>; Garcıa-Ruiz, [sec. 1, para 4], visual, auditory, and olfactory information that was delivered by a multimodal messaging notification system).”
As per claim 6 (dependent on claim 1), DUFAUX in view of Garcıa-Ruiz further discloses “detect, as the audio satisfying the condition, audio whose feature matches the audio feature set in advance (DUFAUX, [Abstract], automatic detection and recognition of impulsive sounds, such as glass breaks, human screams, gunshots, explosions or door slams; Fig. 1 <‘This sound is a phone ringing !’ reads on ‘audio whose feature matches the audio feature set in advance’ as the well-known feature/template matching in sound recognition>).”
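For illustration only (not part of the record), the kind of “feature matches the audio feature set in advance” test characterized above as well-known feature/template matching can be sketched as below. The feature vectors, the cosine-similarity measure, and the threshold are illustrative assumptions.

```python
# Illustrative sketch: compare an extracted audio feature vector against
# templates stored in advance, reporting a match when cosine similarity
# meets an assumed threshold.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def matches_template(feature, templates, threshold=0.9):
    """Return the name of the first stored template whose cosine similarity
    with `feature` meets the threshold, or None if nothing matches."""
    for name, tmpl in templates.items():
        if cosine(feature, tmpl) >= threshold:
            return name
    return None
```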
Claims 11, 12 (similar in scope to claim 1) are rejected under the same rationale as applied above for claim 1. Claim 12 further recites “an acquisition device configured to acquire sound of surroundings ..” (see Fig. 1 – microphone and ringing phone).
4. Claims 5, 7-10 are rejected under 35 U.S.C. 103 as being unpatentable over DUFAUX and Garcıa-Ruiz, and further in view of Rose (Kluwer Academic Publishers 1996; hereinafter ROSE).
As per claim 5 (dependent on claim 2), DUFAUX in view of Garcıa-Ruiz further discloses “when the recognizing audio satisfies the second condition, [ play the audio satisfying the condition in response to an input from the user ] (Garcıa-Ruiz, [sec. 1, para 4], visual, auditory, and olfactory information that was delivered by a multimodal messaging notification system <read on a ready mechanism to play the audio under any satisfied condition>).”
DUFAUX in view of Garcıa-Ruiz does not explicitly disclose “play the audio satisfying the condition in response to an input from the user.” However, the feature is taught by ROSE (Title: WORD SPOTTING FROM CONTINUOUS SPEECH UTTERANCES).
In the same field of endeavor, ROSE teaches: [p. 309, para 2] “This allows for an error recovery strategy to be invoked by the application which may, for example, involve reprompting the user for additional utterances.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of ROSE in the system taught by DUFAUX and Garcıa-Ruiz for a ready mechanism to prompt the user for additional input for any purpose such as to play certain audio.
As per claim 7 (dependent on claim 1), DUFAUX in view of Garcıa-Ruiz further discloses “[ detect, as the audio satisfying the condition, an utterance including a search word ].”
DUFAUX in view of Garcıa-Ruiz does not explicitly disclose “detect .. an utterance including a search word.” However, the feature is taught by ROSE (Title: WORD SPOTTING FROM CONTINUOUS SPEECH UTTERANCES).
In the same field of endeavor, ROSE teaches: [Figure 1] “Block diagram of word spotter based on a continuous speech recognition model. A Viterbi beam search decoder takes as input a set of hidden Markov models for keywords and fillers. The Viterbi decoder produces a continuous stream of keywords and fillers along with a confidence measure that is used by a second stage hypothesis verification procedure to disambiguate the correctly detected keywords from the false alarms.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of ROSE in the system taught by DUFAUX and Garcıa-Ruiz for detecting utterance with keywords as a condition to be satisfied to notify a user.
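For illustration only (not part of the record), the keyword/filler decoding described by ROSE can be sketched with a toy two-state Viterbi decoder. The per-frame log-scores, the single switch penalty, and the 'K'/'F' labels are illustrative assumptions standing in for ROSE's keyword and filler hidden Markov models.

```python
# Illustrative sketch: a toy two-state Viterbi decoder in the spirit of
# keyword/filler word spotting.  Each frame carries a log-score for a
# "keyword" state and a "filler" state; the decoder returns the most likely
# state sequence, and runs labeled 'K' are the spotted keyword hits.

def viterbi_spot(frame_scores, switch_penalty=1.0):
    """frame_scores: list of (keyword_logp, filler_logp) per frame.
    Returns the best state sequence as a list of 'K'/'F' labels."""
    states = ('K', 'F')
    # best[s] = (log-prob of best path ending in state s, path so far)
    best = {s: (frame_scores[0][i], [s]) for i, s in enumerate(states)}
    for kw, fl in frame_scores[1:]:
        emit = {'K': kw, 'F': fl}
        new = {}
        for s in states:
            # staying in the same state is free; switching costs a penalty
            cands = [(best[p][0] - (switch_penalty if p != s else 0.0) + emit[s],
                      best[p][1] + [s]) for p in states]
            new[s] = max(cands)
        best = new
    return max(best.values())[1]
```

In a full word spotter the per-frame scores would come from keyword and filler HMM state likelihoods, and a second-stage confidence measure would prune false alarms, as ROSE describes.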
As per claim 8 (dependent on claim 7), DUFAUX in view of Garcıa-Ruiz and ROSE further discloses “wherein the condition includes a third condition; and the controller is configured to, when the utterance satisfies the third condition, perform control with which, immediately after the search word is detected, notification sound is played and playback of the utterance is started (ROSE, Figure 1 .. Block diagram of word spotter based on a continuous speech recognition model .. detected keywords <read on ‘a third condition’. Also read on a ready mechanism for any playback action to start for any audio notification and the utterance>; Garcıa-Ruiz, [sec. 1, para 4], visual, auditory, and olfactory information that was delivered by a multimodal messaging notification system).”
As per claim 9 (dependent on claim 8), DUFAUX in view of Garcıa-Ruiz and ROSE further discloses “wherein the condition includes a fourth condition; and the controller is configured to, when the utterance satisfies the fourth condition, perform control with which, immediately after the utterance ends, notification sound is played and playback of the utterance is started (ROSE, Figure 1 .. Block diagram of word spotter based on a continuous speech recognition model .. detected keywords <read on ‘a third/fourth condition’ which are both subject to BRI. Also read on a ready mechanism for any playback action to start at any time for any audio notification and the utterance>; Figure 6 .. The log path probabilities obtained for an utterance Y during Viterbi decoding. The time labels, ts and te, represent the start frames and end frames <read on a ready mechanism to determine ‘after the utterance ends’> for a hypothesized vocabulary word occurrence decoded by the recognizer).”
As per claim 10 (dependent on claim 8), DUFAUX in view of Garcıa-Ruiz and ROSE further discloses “perform control with which audio data of an utterance interval including the utterance is played, and the utterance interval is an interval for which the audio data continues without interruption for a set time (ROSE, Figure 6 .. The log path probabilities obtained for an utterance Y during Viterbi decoding. The time labels, ts and te, represent the start frames and end frames <read on ‘the utterance interval’> for a hypothesized vocabulary word occurrence decoded by the recognizer).”
Conclusion
5. Any inquiry concerning this communication or earlier communications from the examiner should be directed to FENG-TZER TZENG whose telephone number is 571-272-4609. The examiner can normally be reached on M-F (9:00-5:00). The fax phone number for the organization where this application or proceeding is assigned is 571-273-4609.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Paras Shah (SPE) can be reached on 571-270-1650.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/FENG-TZER TZENG/ 1/6/2026
Primary Examiner, Art Unit 2653