DETAILED ACTION
This office action is in response to Applicant’s Amendments/Request for Reconsideration, received on 01/20/2026. Claims 1, 3, 7-11, 14-17 have been amended. Claims 1-17 are pending and have been considered. The examiner would like to note that the claims have been amended to remove all terms previously invoking interpretation under 35 U.S.C. 112(f). As such, interpretation of the claims under 112(f) has been withdrawn.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed for the parent Application No. JP2021-124570, filed on 07/29/2021.
Response to Arguments
Applicant’s arguments, see pg. 10, filed 01/20/2026, with respect to “Specification Objection” have been fully considered and are persuasive. The objection to the title has been withdrawn.
Applicant’s arguments, see pg. 10, filed 01/20/2026, with respect to “Claim Objection” have been fully considered and are persuasive. The objection to claim 17 has been withdrawn.
Applicant’s arguments, see pgs. 11-15, filed 01/20/2026, with respect to the rejection(s) of independent claim(s) 1, 16, and 17 under 35 U.S.C. 102(a)(1) have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Kobayashi et al. (US-20220329936-A1), hereinafter Kobayashi. Kobayashi discloses “A sound collection and emission apparatus (10) emits, on the basis of an outside acoustic signal which emanates from a sound source outside an automobile (90) and arrives at the automobile (90), an inside acoustic signal which is an acoustic signal derived from the outside acoustic signal to inside the automobile (90). A sound collection unit (M1) collects the outside acoustic signal. A sound emission unit (S1) emits the inside acoustic signal. A danger sound detection unit (11) determines whether the outside acoustic signal has a feature representing a danger defined in advance. A control unit (12) performs control that emits the inside acoustic signal from the sound emission unit (S1) such that a driver of the automobile (90) is capable of perceiving the danger if the outside acoustic signal is determined to represent the danger.” (abstract). Specifically, Kobayashi discloses a “stationary component removal unit 117” for removing stationary noise ([0032]). See updated rejections below.
Applicant’s arguments, see pgs. 15-16, filed 01/20/2026, with respect to the rejection(s) of claim(s) 3 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Trent, further in view of Biswas. Trent was previously cited for several dependent claims, namely claim 6, which recites a “first frequency band is an ultrasonic band that has a highest sound pressure level” among other frequency bands with associated sound pressure levels; Applicant made no arguments against this citation. See updated rejection of claim 3 below.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 8, 10-12, 16-17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chowdhary et al. (US-20190227096-A1), hereinafter Chowdhary, in view of Kobayashi et al. (US-20220329936-A1), hereinafter Kobayashi.
Regarding claim 1, Chowdhary discloses: an information processing system comprising a terminal and a computer connected with each other via a network ([Fig. 8, Server 220 connected to system 100, including smart device 120, through network 230], [A computer tracks to a processing server, see processing unit 222; the system 100 (containing user device 120) tracks to a terminal in view of Fig. 7 of Chowdhary]),
wherein the terminal includes:
a sound collector that collects a sound ([Fig. 7, Microphone]);
a first processor ([Fig. 9, Processing unit 122]) that inputs sound information indicative of the collected sound to a first trained model ([Fig. 9, Feature Data Extracting Unit 330], [0113] Feature extracting unit 330 may be configured to generate feature data from the context data detected by sensors, [Wherein the sensor tracks to a microphone, see [0038]. Further, see the disclosed training model for transient event classification of [0189], indicating that it could be used to perform transient event identification]) to estimate whether the sound indicated by the sound information is steady sound or non-steady sound ([Fig. 9, Transient Event Classification Unit 350], [0117] transient event classification unit 350 may analyze, e.g., compare, the feature data with the corresponding classification parameters in the classification parameter library corresponding to the classes in the classification vectors, and determine which class in the vectors matches the detected transient event, [A determination of class indicates an estimation of the correct class, i.e. transient/continuous, see [0006]. Further, matching to a “detected transient event” indicates a required prior estimation of sound to be a transient event, see transient event detection unit 340]), and
outputs to the computer via the network the sound information estimated to indicate the non-steady sound as output sound information when the sound information is estimated to indicate the non-steady sound ([0117] classification may be determined based on feature data obtained from the raw context data…It is possible that the comparison results show that two mutually exclusive classes in a transient event classification vector, e.g., MTV, both have sufficiently high possibility values to match the detected transient event. Transient event classification unit 350 may output both classes for the meta-level processing to potentially remove the uncertainty thereby, [0125] Client input receiving unit 410 may be configured to receive from a local smart device 120 associated with sensors 110 the context data detected by sensors 110. The context data received from local smart devices 120 may be the raw context data or may be the context data pre-processed locally at the smart device 120, [Outputting context data, i.e. that used for classification, to server 220, i.e. the computer, connected to user device 120 through network 230 as previously disclosed, indicates the context data to be representative of sound information estimated to indicate non-steady state sound based on the transient classification determination performed on the user device 120, i.e. pre-processed context data]).
Chowdhary is not relied upon to disclose:
refrains from outputting to the computer via the network the sound information estimated to indicate the steady state sound when the sound information is estimated to indicate the steady sound.
Kobayashi is relied upon to disclose:
refrains from outputting to the computer via the network the sound information estimated to indicate the steady state sound when the sound information is estimated to indicate the steady sound ([0032] The stationary component removal unit 117 removes a stationary noise component from an outside acoustic signal after frequency conversion which is output by the frequency analysis unit 111… The inclusion of the stationary component removal unit 117 removes a stationary noise, such as a running sound of the vehicle in question, [Wherein the running sound of a vehicle in question would clearly be steady state; therefore, stationary noises are steady]).
Chowdhary and Kobayashi are considered analogous art within noise removal for purposes of sound classification. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chowdhary to incorporate the teachings of Kobayashi, because removing stationary noise improves audibility of outside sounds to a driver inside a vehicle, allowing easier detection of dangers likely represented in transient sounds (Kobayashi, [0006]-[0008], [0032]).
Chowdhary further discloses:
the computer includes:
a communication circuit that acquires the output sound information ([Fig. 10, Client Input Receiving Unit 410], [A client tracks to a terminal, i.e. user device]); and,
a second processor ([Fig. 13, Meta-Level Analysis 730]) that estimates an action of a person from a resulting output obtained by inputting the output sound information acquired by the communication circuit to a second trained model ([Fig. 13, Receive base-level event classifications], [See classification of transient events using a training model, [0189]]) indicative of a relevance between the output sound information and action information on an action of a person ([Fig. 13, Remove Unlikely Classifications 830], [0072] comparing the relevant feature data with the classification parameter of the class to obtain a value representing how likely the detected event belongs to this class of event, [0156] A spatial environment transient event of “dropping utensil” will nullify a spatial environment continuous event of “on street”, [0159] In example operation 840, the remaining concurrent event classifications may be combined, which may include combining transient event classifications of different vectors, in sub-operation 850. For example, “phone ring” as a transient event in spatial environment context or sound context may be combined with “sit-to-stand” motion transient event. The combination results will strengthen a context awareness result that a user is standing up to reach a phone and to answer an incoming phone call, [Wherein the feature data tracks to sound information and event classes track to action information on an action of a person. Removing unlikely classifications indicates a determination that those unlikely classes aren’t relevant. Kobayashi’s neural network 112 outputs classification probabilities based on a noise signal with stationary components removed, indicating that the exclusively transient signal of Kobayashi could be used as the multi-transient event signal of Chowdhary without a change in functionality to Chowdhary. The system of Chowdhary (Fig. 11) having individual paths for classifying transient/continuous events does not necessarily require the input signal to consist of those mixed sound types. The transient-exclusive signal of Kobayashi could be used as the output from the “filter raw data to remove noise 622” step of Chowdhary, as Kobayashi explicitly discloses removing stationary sounds to reduce noise]).
Regarding claim 8, Chowdhary in view of Kobayashi discloses: the information processing system according to claim 1.
Chowdhary further discloses:
wherein the second processor determines whether or not the resulting output from the second trained model is wrong ([0098] the state machine is reasonably robust to handle any spurious data received or incorrect decisions that may be taken on the base-level posteriorgrams, [A determination to handle incorrect decisions requires an identification of the incorrect decision to be handled. Further, the examiner understands the output from the second trained model to be “indicative of a relevance between the output sound information and action information…” (claim 1). It is unclear to the examiner how one is able to determine that a relevance is incorrect or wrong.]), and
the second processor retrains, after determining that the resulting output from the second trained model is correct, the second trained model using output sound information corresponding to the resulting output ([0154] Further, the base-level context awareness analysis results may be provided to the continuous event classification operations, 720M-3, 720S-3, 720SE-3, and the transient event classification operations, 720M-2, 720S-2, 720SE-2, through feedback loop 740 to be used in the classification operations. For example, a previous base-level continuous motion event classification in time window T.sub.0 may be used in determining a later base-level continuous motion event classification in time window 2T.sub.0 that follows time window T.sub.0, [Applying a feedback loop to transient sound classification operation 720S-2 indicates the feedback is used for retraining, wherein using a classification from one step for a later time step indicates the previous classification, i.e. resulting output, is correct]).
Regarding claim 10, Chowdhary in view of Kobayashi discloses: the information processing system according to claim 8.
Chowdhary further discloses:
wherein the second processor outputs, when receiving an input of the determination result information, the determination result information to the terminal via the network ([Fig. 8, Smart Device 120 Connected to Server 220 through Network 230], [0154] A previous base-level continuous motion event classification in time window T.sub.0 may also be used in determining a base-level transient motion event classification in time point T.sub.1 that follows time window T.sub.0 (See FIG. 4), [0108] a smart device 120 may communicate to server 220 to receive from the cloud based memory 230 at least partially the classification parameter library from server 220, [Taking previous classification information to be used for following classifications indicates the original classification to be input into the second processor, i.e. that which determines classifications, e.g. determination results. Further, the smart device, i.e. terminal, and the server, i.e. that containing the second processor, are defined to communicate through network 230, and the smart device receives a classification library, tracking to a determination result library; this indicates the second processor sending determination result information to the terminal via a network]).
Regarding claim 11, Chowdhary in view of Kobayashi discloses: the information processing system according to claim 1.
Chowdhary further discloses:
wherein the first processor retrains the first trained model by using the sound information estimated to indicate the steady sound by the first trained model ([0154] Further, the base-level context awareness analysis results may be provided to the continuous event classification operations, 720M-3, 720S-3, 720SE-3, and the transient event classification operations, 720M-2, 720S-2, 720SE-2, through feedback loop 740 to be used in the classification operations, [0155] meta-level context awareness analysis may also be provided back to the base-level event classification operations 720M-2, 720S-2, 720SE-2, 720M-3, 720S-3, 720SE-3, through the feedback loop 740, [Retraining the first trained model, i.e. that for determining sound state, tracks to using a feedback loop in the process of transient event classification operations. Further, as the first processor is defined to only “input sound information”, it is unclear to the examiner how an input element is responsible for training other than inputting information to the model to be “trained”. As such, retraining reasonably tracks to additional inputting]).
Regarding claim 12, Chowdhary in view of Kobayashi discloses: the information processing system according to claim 1.
Chowdhary further discloses:
wherein the sound information concerns sound information on an ambient sound in a space where the sound collector is disposed ([Fig. 8, Context 210 around User 212], [0108] sensors 110 of local system 100 are used to detect context data of context 210, [0138] transient event detection unit 440 may determine whether a transient event happens in context 210, [Detecting sound within a certain context around a user indicates the sound information to be ambient sound in the space where the sound collector, i.e. user 212 with device 100, is disposed]).
Regarding claim 16, Chowdhary discloses: an information processing method for use in an information processing system including a terminal and a computer connected to each other via a network ([Fig. 8, Server 220 connected to system 100, including smart device 120, through network 230], [A computer tracks to a processing server, see processing unit 222; the system 100 (containing user device 120) tracks to a terminal in view of Fig. 7 of Chowdhary]), comprising:
by the terminal:
collecting a sound ([Fig. 7, Microphone]);
inputting sound information indicative of the collected sound to a first trained model ([Fig. 9, Feature Data Extracting Unit 330], [0113] Feature extracting unit 330 may be configured to generate feature data from the context data detected by sensors, [Wherein the sensor tracks to a microphone, see [0038]. Further, see the disclosed training model for transient event classification of [0189], indicating that it could be used to perform transient event identification]) to estimate whether the sound indicated by the sound information is steady sound or non-steady sound ([Fig. 9, Transient Event Classification Unit 350], [0117] transient event classification unit 350 may analyze, e.g., compare, the feature data with the corresponding classification parameters in the classification parameter library corresponding to the classes in the classification vectors, and determine which class in the vectors matches the detected transient event, [A determination of class indicates an estimation of the correct class, i.e. transient/continuous, see [0006]. Further, matching to a “detected transient event” indicates a required prior estimation of sound to be a transient event, see transient event detection unit 340]); and,
outputting to the computer via the network the sound information estimated to indicate the non-steady sound as output sound information when the sound information is estimated to indicate the non-steady sound ([0117] classification may be determined based on feature data obtained from the raw context data…It is possible that the comparison results show that two mutually exclusive classes in a transient event classification vector, e.g., MTV, both have sufficiently high possibility values to match the detected transient event. Transient event classification unit 350 may output both classes for the meta-level processing to potentially remove the uncertainty thereby, [0125] Client input receiving unit 410 may be configured to receive from a local smart device 120 associated with sensors 110 the context data detected by sensors 110. The context data received from local smart devices 120 may be the raw context data or may be the context data pre-processed locally at the smart device 120, [Outputting context data, i.e. that used for classification, to server 220, i.e. the computer, connected to user device 120 through network 230 as previously disclosed, indicates the context data to be representative of sound information estimated to indicate non-steady state sound based on the transient classification determination performed on the user device 120, i.e. pre-processed context data]).
Chowdhary is not relied upon to disclose:
refraining from outputting to the computer via the network the sound information estimated to indicate the steady state sound when the sound information is estimated to indicate the steady sound.
Kobayashi is relied upon to disclose:
refraining from outputting to the computer via the network the sound information estimated to indicate the steady state sound when the sound information is estimated to indicate the steady sound ([0032] The stationary component removal unit 117 removes a stationary noise component from an outside acoustic signal after frequency conversion which is output by the frequency analysis unit 111… The inclusion of the stationary component removal unit 117 removes a stationary noise, such as a running sound of the vehicle in question, [Wherein the running sound of a vehicle in question would clearly be steady state; therefore, stationary noises are steady]).
Chowdhary and Kobayashi are considered analogous art within noise removal for purposes of sound classification. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chowdhary to incorporate the teachings of Kobayashi, because removing stationary noise improves audibility of outside sounds to a driver inside a vehicle, allowing easier detection of dangers likely represented in transient sounds (Kobayashi, [0006]-[0008], [0032]).
Chowdhary further discloses:
by computer:
acquiring the output sound information ([Fig. 10, Client Input Receiving Unit 410], [A client tracks to a terminal, i.e. user device]); and,
estimating an action of a person from a resulting output obtained by inputting the output sound information acquired by the acquisition part to a second trained model ([Fig. 13, Receive base-level event classifications], [See classification of transient events using a training model, [0189]]) indicative of a relevance between the output sound information and action information on an action of a person ([Fig. 13, Remove Unlikely Classifications 830], [0072] comparing the relevant feature data with the classification parameter of the class to obtain a value representing how likely the detected event belongs to this class of event, [0156] A spatial environment transient event of “dropping utensil” will nullify a spatial environment continuous event of “on street”, [0159] In example operation 840, the remaining concurrent event classifications may be combined, which may include combining transient event classifications of different vectors, in sub-operation 850. For example, “phone ring” as a transient event in spatial environment context or sound context may be combined with “sit-to-stand” motion transient event. The combination results will strengthen a context awareness result that a user is standing up to reach a phone and to answer an incoming phone call, [Wherein the feature data tracks to sound information and event classes track to action information on an action of a person. Removing unlikely classifications indicates a determination that those unlikely classes aren’t relevant. Kobayashi’s neural network 112 outputs classification probabilities based on a noise signal with stationary components removed, indicating that the exclusively transient signal of Kobayashi could be used as the multi-transient event signal of Chowdhary without a change in functionality to Chowdhary. The system of Chowdhary (Fig. 11) having individual paths for classifying transient/continuous events does not necessarily require the input signal to consist of those mixed sound types. The transient-exclusive signal of Kobayashi could be used as the output from the “filter raw data to remove noise 622” step of Chowdhary, as Kobayashi explicitly discloses removing stationary sounds to reduce noise]).
Regarding claim 17, Chowdhary discloses: a non-transitory computer readable recording medium storing an information processing program for use in an information processing system including a terminal and a computer connected to each other via a network ([Fig. 8, Server 220 connected to system 100, including smart device 120, through network 230], [A computer tracks to a processing server, see processing unit 222; the system 100 (containing user device 120) tracks to a terminal in view of Fig. 7 of Chowdhary]), the information processing program
causing the terminal to execute a process of:
collecting a sound ([Fig. 7, Microphone]);
inputting sound information indicative of the collected sound to a first trained model ([Fig. 9, Feature Data Extracting Unit 330], [0113] Feature extracting unit 330 may be configured to generate feature data from the context data detected by sensors, [Wherein the sensor tracks to a microphone, see [0038]. Further, see the disclosed training model for transient event classification of [0189], indicating that it could be used to perform transient event identification]) to estimate whether the sound indicated by the sound information is steady sound or non-steady sound ([Fig. 9, Transient Event Classification Unit 350], [0117] transient event classification unit 350 may analyze, e.g., compare, the feature data with the corresponding classification parameters in the classification parameter library corresponding to the classes in the classification vectors, and determine which class in the vectors matches the detected transient event, [A determination of class indicates an estimation of the correct class, i.e. transient/continuous, see [0006]. Further, matching to a “detected transient event” indicates a required prior estimation of sound to be a transient event, see transient event detection unit 340]); and,
outputting to the computer via the network the sound information estimated to indicate the non-steady sound as output sound information when the sound information is estimated to indicate the non-steady sound ([0117] classification may be determined based on feature data obtained from the raw context data…It is possible that the comparison results show that two mutually exclusive classes in a transient event classification vector, e.g., MTV, both have sufficiently high possibility values to match the detected transient event. Transient event classification unit 350 may output both classes for the meta-level processing to potentially remove the uncertainty thereby, [0125] Client input receiving unit 410 may be configured to receive from a local smart device 120 associated with sensors 110 the context data detected by sensors 110. The context data received from local smart devices 120 may be the raw context data or may be the context data pre-processed locally at the smart device 120, [Outputting context data, i.e. that used for classification, to server 220, i.e. the computer, connected to user device 120 through network 230 as previously disclosed, indicates the context data to be representative of sound information estimated to indicate non-steady state sound based on the transient classification determination performed on the user device 120, i.e. pre-processed context data]).
Chowdhary is not relied upon to disclose:
refraining from outputting to the computer via the network the sound information estimated to indicate the steady state sound when the sound information is estimated to indicate the steady sound.
Kobayashi is relied upon to disclose:
refraining from outputting to the computer via the network the sound information estimated to indicate the steady state sound when the sound information is estimated to indicate the steady sound ([0032] The stationary component removal unit 117 removes a stationary noise component from an outside acoustic signal after frequency conversion which is output by the frequency analysis unit 111… The inclusion of the stationary component removal unit 117 removes a stationary noise, such as a running sound of the vehicle in question, [Wherein the running sound of a vehicle in question would clearly be steady state; therefore, stationary noises are steady]).
Chowdhary and Kobayashi are considered analogous art within noise removal for purposes of sound classification. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chowdhary to incorporate the teachings of Kobayashi, because removing stationary noise improves audibility of outside sounds to a driver inside a vehicle, allowing easier detection of dangers likely represented in transient sounds (Kobayashi, [0006]-[0008], [0032]).
Chowdhary further discloses:
causing the computer to execute a process of:
acquiring the output sound information ([Fig. 10, Client Input Receiving Unit 410], [A client tracks to a terminal, i.e. user device]); and,
estimating an action of a person from a resulting output obtained by inputting the output sound information acquired by the acquisition part to a second trained model ([Fig. 13, Receive base-level event classifications], [See classification of transient events using a training model, [0189]]) indicative of a relevance between the output sound information and action information on an action of a person ([Fig. 13, Remove Unlikely Classifications 830], [0072] comparing the relevant feature data with the classification parameter of the class to obtain a value representing how likely the detected event belongs to this class of event, [0156] A spatial environment transient event of “dropping utensil” will nullify a spatial environment continuous event of “on street”, [0159] In example operation 840, the remaining concurrent event classifications may be combined, which may include combining transient event classifications of different vectors, in sub-operation 850. For example, “phone ring” as a transient event in spatial environment context or sound context may be combined with “sit-to-stand” motion transient event. The combination results will strengthen a context awareness result that a user is standing up to reach a phone and to answer an incoming phone call, [Wherein the feature data tracks to sound information and event classes track to action information on an action of a person. Removing unlikely classifications indicates a determination that those unlikely classes aren’t relevant. Kobayashi’s neural network 112 outputs classification probabilities based on a noise signal with stationary components removed, indicating that the exclusively transient signal of Kobayashi could be used as the multi-transient event signal of Chowdhary without a change in functionality to Chowdhary. The system of Chowdhary (Fig. 11) having individual paths for classifying transient/continuous events does not necessarily require the input signal to consist of those mixed sound types. The transient-exclusive signal of Kobayashi could be used as the output from the “filter raw data to remove noise 622” step of Chowdhary, as Kobayashi explicitly discloses removing stationary sounds to reduce noise]).
Claim(s) 2 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chowdhary in view of Kobayashi, further in view of Boudreau et al. (US-20210343309-A1), hereinafter Boudreau.
Regarding claim 2, Chowdhary in view of Kobayashi discloses: the information processing system according to claim 1.
Chowdhary in view of Kobayashi does not disclose:
wherein the output sound information is image information indicative of a spectrogram or frequency response of the sound collected by the sound collector.
Boudreau discloses:
wherein the output sound information is image information indicative of a spectrogram or frequency response of the sound collected by the sound collector ([0073] time-transient spectrogram image in a single image results in high value data to the image classification algorithm. The tonal and time-transient spectrogram image provides a fingerprint of the dominant features of a sound event, [A spectrogram of transient, i.e. non-steady, sound events for classification indicates the spectrogram to be image information indicative of sound information collected]).
Chowdhary, Kobayashi, and Boudreau are considered analogous art within transient sound event recognition/classification. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chowdhary in view of Kobayashi to incorporate the teachings of Boudreau, because of the novel way to remove background noise from sound events shown on spectrogram images, enhancing the contrast of sound events within these spectrograms (Boudreau, [0007]).
Claim(s) 3-6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chowdhary in view of Kobayashi, further in view of Trent, Jr. (US-20210272560-A1), hereinafter Trent, further in view of Biswas (US-20180358028-A1), hereinafter Biswas.
Regarding claim 3, Chowdhary in view of Kobayashi discloses: the information processing system according to claim 1.
Chowdhary in view of Kobayashi does not disclose:
wherein the first processor extracts, from the sound information estimated to indicate the non-steady sound, sound information in a first frequency band having a highest sound pressure level, and
converts the extracted sound information in the first frequency band to sound information in a second frequency band lower than the first frequency band, the converted sound information in the second frequency band being generated as the output sound information.
Trent discloses:
wherein the first processor extracts ([In view of the previously disclosed first processor of Chowdhary]), from the sound information estimated to indicate the non-steady sound ([In view of the previously disclosed non-steady state determination of Chowdhary]), sound information in a first frequency band having a highest sound pressure level ([0035] Subaudible tones may refer to any tones that may be at a frequency or amplitude (e.g., sound level), [0054] In some implementations, the subaudible tones may be infrasonic tones in a low frequency range (e.g., between 25-250 Hz), ultrasonic tones in a high frequency range (e.g., between 14-20 KHz), or at a sound level between 10-25 dB, [Defining two ranges for subaudible tones indicates two frequency bands, wherein one, i.e. the first, is representing an ultrasonic frequency which will inherently have a higher sound pressure than the low frequency band, as Trent defines sound pressure, i.e. level, to be directly related to frequency. In view of Applicant defining “an ultrasonic band that has a highest sound pressure level” (see [0026]), the ultrasonic tones of Trent satisfy this element]).
Chowdhary, Kobayashi, and Trent are considered analogous art within speech classification. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chowdhary in view of Kobayashi to incorporate the teachings of Trent, because of the novel way to detect if hardware has been compromised and/or a digital recording has been used to circumvent voice verification through subaudible tone analysis, improving speaker verification techniques (Trent, [0003]).
Chowdhary in view of Kobayashi, further in view of Trent does not disclose:
converts the extracted sound information in the first frequency band to sound information in a second frequency band lower than the first frequency band, the converted sound information in the second frequency band being generated as the output sound information.
Biswas discloses:
converts the extracted sound information in the first frequency band to sound information in a second frequency band lower than the first frequency band ([0065] At the decoder 502 of FIG. 5, after QMF analysis 504 of the decoded signal, the expansion process 506 is applied first, and the A-SPX operation 508 subsequently reproduces the higher subband samples from the expanded signal in the lower frequencies), the converted sound information in the second frequency band being generated as the output sound information ([Fig. 5, Audio Out 512], [Referring to Fig. 5, the output representing the higher subband samples in lower frequencies is clearly used for audio synthesis, indicating it to be generated as output sound information]).
Chowdhary, Kobayashi, Trent, and Biswas are considered analogous art within transient sound event classification. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chowdhary in view of Kobayashi, further in view of Trent to incorporate the teachings of Biswas, because of the novel way to develop a signal-dependent companding system which can adaptively apply companding based on the input signal content for improved discrimination between speech and tonal audio content (Biswas, [0009]).
Regarding claim 4, Chowdhary in view of Kobayashi, further in view of Trent, further in view of Biswas discloses: the information processing system according to claim 3.
Biswas further discloses:
wherein the output sound information has accompanying information indicative of a width of the first frequency band ([0045] Each QMF time-slot is equal to a stride and in each QMF time-slot there are 64 uniformly spaced subbands, [Defining the number of subbands, in view of the previously disclosed high-frequency band, i.e. those above 1kHz, indicates the combination of total subbands and frequency range is accompanying information indicative of a width of the first frequency band]).
Regarding claim 5, Chowdhary in view of Kobayashi, further in view of Trent, further in view of Biswas discloses: the information processing system according to claim 3.
Chowdhary further discloses:
wherein the second trained model has learned by machine learning a relevance between the sound information in the second frequency band and having the accompanying information and the action information ([Fig. 13, Remove Unlikely Classifications 830], [0072] comparing the relevant feature data with the classification parameter of the class to obtain a value representing how likely the detected event belongs to this class of event, [0156] A spatial environment transient event of “dropping utensil” will nullify a spatial environment continuous event of “on street”, [0159] In example operation 840, the remaining concurrent event classifications may be combined, which may include combining transient event classifications of different vectors, in sub-operation 850. For example, “phone ring” as a transient event in spatial environment context or sound context may be combined with “sit-to-stand” motion transient event. The combination results will strengthen a context awareness result that a user is standing up to reach a phone and to answer an incoming phone call, [Wherein the feature data tracks to sound information and event classes track to action information on an action of a person. Removing unlikely classifications indicates a determination that those unlikely classes aren’t relevant, based on the comparison of feature data to previously classified feature sets/events. Further, the feature data of Chowdhary could be limited to the second frequency band as previously disclosed in Biswas without a change in functionality to Chowdhary, as Chowdhary discloses high-frequency data processing for transient events, i.e. defining accompanying information, see [0042]-[0044]. Further still, consider the machine learning operations disclosed in [0093] of Chowdhary and the trained model of [0189]]).
Regarding claim 6, Chowdhary in view of Kobayashi, further in view of Trent, further in view of Biswas discloses: the information processing system according to claim 3.
Trent further discloses:
wherein the first frequency band is an ultrasonic band that has a highest sound pressure level among a plurality of predetermined frequency bands ([0035] Subaudible tones may refer to any tones that may be at a frequency or amplitude (e.g., sound level), [0054] In some implementations, the subaudible tones may be infrasonic tones in a low frequency range (e.g., between 25-250 Hz), ultrasonic tones in a high frequency range (e.g., between 14-20 KHz), or at a sound level between 10-25 dB, [Defining two ranges for subaudible tones indicates two frequency bands, wherein one, i.e. the first, is representing an ultrasonic frequency which will inherently have a higher sound pressure than the low frequency band, as Trent defines sound pressure, i.e. level, to be directly related to frequency]).
Claim(s) 9, 14-15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chowdhary in view of Kobayashi, further in view of Biswas.
Regarding claim 9, Chowdhary in view of Kobayashi discloses: the information processing system according to claim 8.
Chowdhary in view of Kobayashi does not disclose:
wherein the second processor inputs to a device a control signal for controlling the device according to the action information indicative of the action estimated by the second processor, and
determines that the resulting output is wrong when receiving from the device a cancellation order of the control indicated by the control signal.
Biswas discloses:
wherein the second processor inputs to a device a control signal for controlling the device according to the action information indicative of the action estimated by the second processor ([0070] The system includes a detection mechanism 405 to detect the peakness of a signal in order to help generate an appropriate control signal for the compander function, [0126] a second interface receiving a bitstream encoding a companding control mode from a controller classifying the input audio signal based on signal characteristics, and switching among a plurality of companding modes based on the classified input audio signal, [An interface responsible for receiving a bitstream indicates that interface to be representing a device, i.e. capable of processing the bitstream. Further, controlling the device based on signal classification, i.e. action information, indicates the bitstream to be representing the estimation of the second processor as previously defined in Chowdhary]), and
determines that the resulting output is wrong when receiving from the device a cancellation order of the control indicated by the control signal ([In view of the previous claim element, which discloses a device that receives a control signal as input for controlling the device, it is unclear to the examiner how the device is now responsible for sending control signals indicating controls. Further, it is unclear what is receiving the control signal if not the device previously cited for receiving control signals. Receiving a “cancellation” control signal from a device which is only defined to receive signals indicates the “receiving” of this element is performed by the same element and/or not possible, further indicating that the determining step defined here never occurs or inherently occurs when the device receives the control signal]).
Chowdhary, Kobayashi, and Biswas are considered analogous art within transient sound event classification. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chowdhary in view of Kobayashi to incorporate the teachings of Biswas, because of the novel way to develop a signal-dependent companding system which can adaptively apply companding based on the input signal content for improved discrimination between speech and tonal audio content (Biswas, [0009]).
Regarding claim 14, Chowdhary in view of Kobayashi discloses: the information processing system according to claim 1.
Chowdhary in view of Kobayashi does not disclose:
wherein the first processor extracts, from the sound information estimated to indicate the non-steady sound, sound information in a plurality of first frequency bands,
converts the extracted sound information in the first frequency bands to sound information in a second frequency band that is the lowest first frequency band among the first frequency bands, and
synthesizes the converted sound information pieces in the second frequency band, the synthesized sound information being generated as the output sound information.
Biswas discloses:
wherein the first processor extracts ([In view of the previously disclosed first processor of Chowdhary]), from the sound information estimated to indicate the non-steady sound ([In view of the previously disclosed non-steady sound determination of Chowdhary]), sound information in a plurality of first frequency bands ([0053] the high frequency portions (e.g., audio components above 6 kHz) of the audio signal could be coded with an advanced spectral extension (A-SPX) tool. Additionally it may be desirable to use only the signal above 1 kHz (or a similar frequency) to guide the noise-shaping. In such a case only those subbands in the range 1 kHz to 6 kHz may be used, [Defining subbands (plurality emphasized) indicates a plurality of frequency bands]),
converts the extracted sound information in the first frequency bands to sound information in a second frequency band that is the lowest first frequency band among the first frequency bands ([0065] At the decoder 502 of FIG. 5, after QMF analysis 504 of the decoded signal, the expansion process 506 is applied first, and the A-SPX operation 508 subsequently reproduces the higher subband samples from the expanded signal in the lower frequencies), and
synthesizes the converted sound information pieces in the second frequency band ([Fig. 5, QMF Synthesis 510], [In view of the system defined in Fig. 5, it is clear that the reproduced samples in lower frequencies at 508 are passed into the synthesis operation]), the synthesized sound information being generated as the output sound information ([Fig. 5, Audio Out 512], [In view of Fig. 5, it is clear that the synthesized audio is sent/generated as output]).
Chowdhary, Kobayashi, and Biswas are considered analogous art within transient sound event classification. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chowdhary in view of Kobayashi to incorporate the teachings of Biswas, because of the novel way to develop a signal-dependent companding system which can adaptively apply companding based on the input signal content for improved discrimination between speech and tonal audio content (Biswas, [0009]).
Regarding claim 15, Chowdhary in view of Kobayashi discloses: the information processing system according to claim 1.
Chowdhary in view of Kobayashi does not disclose:
wherein the first processor extracts, from the sound information estimated to indicate the non-steady sound, sound information in a first frequency band concerning the non-steady sound among a plurality of first frequency bands,
converts the extracted sound information in the first frequency band to sound information in a second frequency band that is the lowest first frequency band among the first frequency bands, and
synthesizes the converted sound information in the second frequency band, the synthesized sound information being generated as the output sound information.
Biswas discloses:
wherein the first processor extracts, from the sound information estimated to indicate the non-steady sound, sound information in a first frequency band concerning the non-steady sound among a plurality of first frequency bands ([0053] it may be desirable to use only the signal above 1 kHz (or a similar frequency) to guide the noise-shaping. In such a case only those subbands in the range 1 kHz to 6 kHz may be used, [0065] the envelope data for the higher frequencies may be extracted from the yet uncompressed subband samples, [The higher frequency portion itself inherently forms a band, i.e. a band of 1 kHz and up, in view of the defined plurality of subbands indicating any of these can represent a first frequency band, wherein Biswas also discloses transient sound classification, see [0062]]),
converts the extracted sound information in the first frequency band to sound information in a second frequency band that is the lowest first frequency band among the first frequency bands ([0065] At the decoder 502 of FIG. 5, after QMF analysis 504 of the decoded signal, the expansion process 506 is applied first, and the A-SPX operation 508 subsequently reproduces the higher subband samples from the expanded signal in the lower frequencies, [Reproducing high frequency samples in lower frequencies inherently indicates that the second frequency will be the lowest frequency band as compared to the original band being lowered]), and
synthesizes the converted sound information in the second frequency band ([Fig. 5, QMF Synthesis 510], [In view of the system defined in Fig. 5, it is clear that the reproduced samples in lower frequencies at 508 are passed into the synthesis operation]), the synthesized sound information being generated as the output sound information ([Fig. 5, Audio Out 512], [In view of Fig. 5, it is clear that the synthesized audio is sent/generated as output]).
Chowdhary, Kobayashi, and Biswas are considered analogous art within transient sound event classification. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chowdhary in view of Kobayashi to incorporate the teachings of Biswas, because of the novel way to develop a signal-dependent companding system which can adaptively apply companding based on the input signal content for improved discrimination between speech and tonal audio content (Biswas, [0009]).
Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chowdhary in view of Kobayashi, further in view of Fellers et al. (US-20160005413-A1), hereinafter Fellers.
Regarding claim 7, Chowdhary in view of Kobayashi discloses: the information processing system according to claim 1.
Chowdhary in view of Kobayashi does not disclose:
wherein the first processor estimates the sound indicated by the sound information to be the non-steady sound when an estimation error of the first trained model is not less than a threshold, and
changes the threshold such that a frequency of estimations of the non-steady sound is not greater than a reference frequency.
Fellers discloses:
wherein the first processor estimates the sound indicated by the sound information to be the non-steady sound when an estimation error of the first trained model is not less than a threshold ([0415] For example, if possible transient values range from 0 to 1, a range of transient values between 0.9 and 1 may correspond to a definite and/or a severe transient event, [Fellers defines a transient value for determining transient events based on a frequency-band-weighted logarithmic power (WLP), see equation of Fig. 11D, wherein the WLP is dependent upon frequency coefficient estimation, see Eq. 15. The transient value therefore represents an “estimation error”, i.e. an estimation of frequency weights/coefficients, not being less than a threshold, i.e. the transient value threshold of 0.9 for determining definite transient events (indicating low error in the transient estimation to be classified as “definite”), which are directly affected by the frequency coefficients, indicating a required coefficient estimation error threshold for determining transient values. Further, an estimation error of the first trained model, wherein the first trained model outputs an estimation representing “…whether the sound indicated by the sound information is steady sound or non-steady sound…”, indicates the estimation error to be a binary determination, i.e. steady/non-steady. Therefore, a transient threshold value represents the estimation error value, i.e. a binary determination based upon a comparison of the generated transient score to the threshold 0.9 transient value score for classification]), and
changes the threshold such that a frequency of estimations of the non-steady sound is not greater than a reference frequency ([Fig. 11D, Upper Threshold TH], [0436] if a raw transient value is greater than or equal to the upper threshold T.sub.H, the transient control value is set to its maximum value, [The transient control value tracks to the upper threshold of Fig. 11D]).
Chowdhary, Kobayashi, and Fellers are considered analogous art within transient sound event detection. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chowdhary in view of Kobayashi to incorporate the teachings of Fellers, because of the novel way to decorrelate audio from different channels without having to convert frequency coefficients, reducing the amount of required data and complexity of encoding/decoding algorithms (Fellers, [0004]-[0006]).
Claim(s) 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chowdhary in view of Kobayashi, further in view of Trent.
Regarding claim 13, Chowdhary in view of Kobayashi discloses: the information processing system according to claim 1.
Chowdhary in view of Kobayashi does not disclose:
wherein the sound information acquired by the sound collector includes a sound in an ultrasonic band.
Trent discloses:
wherein the sound information acquired by the sound collector includes a sound in an ultrasonic band ([0054] the processor 310 causes one or more subaudible tones to be generated by one or more speakers (710). As described with respect to FIG. 3, a subaudible tone may be any tone having a frequency or an amplitude that may not be easily or normally detected by human hearing. In some implementations, the subaudible tones may be infrasonic tones in a low frequency range (e.g., between 25-250 Hz), ultrasonic tones in a high frequency range (e.g., between 14-20 KHz)).
Chowdhary, Kobayashi, and Trent are considered analogous art within speech classification. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chowdhary in view of Kobayashi to incorporate the teachings of Trent, because of the novel way to detect if hardware has been compromised and/or a digital recording has been used to circumvent voice verification through subaudible tone analysis, improving speaker verification techniques (Trent, [0003]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Liu et al. (US-20210350822-A1) discloses “A system automatically controls an electronic device's audio by detecting an active sound source presence within an auditory detection space. The system transitions the electronic device to selectively output a desired sound when the active sound source presence is detected and detects sound in the auditory detection space. The system enhances sound and transforms it into electrical signals. The system converts the electrical signals into a digital signal and identifies active sound segments in the digital signals. The system attenuates noise components in the digital signals and locates the physical location of the active sound source. It adjusts an output automatically by muting a second sound source in a second detection space” (abstract). [0011] discloses removing continuous noise. See entire document.
Neumann et al. (US-20230260528-A1) discloses “The present document relates to a method of determining a perceptual impact of an amount of echo or reverberation in an degraded audio signal on a perceived quality thereof, wherein the degraded audio signal is received from an audio transmission system, wherein the degraded audio signal is obtained by conveying through said audio transmission system a reference audio signal such as to provide said degraded audio signal. The method includes performing a windowing operation on the degraded and reference audio signal by multiplying these with a window function to yield degraded and reference digital audio samples. Local estimates of an amount of echo or reverberation are determined on the basis of these samples” (abstract). [0077] discloses partially removing steady state noise from degraded audio signals. See entire document.
Shin et al. (US-20210327449-A1) discloses “An electronic device for speech recognition includes a multi-channel microphone array required for remote speech recognition. The electronic device improves efficiency and performance of speech recognition of the electronic device in a space where noise other than speech to be recognized exists. A control method includes receiving a plurality of audio signals output from a plurality of sources through a plurality of microphones and analyzing the audio signals and obtaining information on directions in which the audio signals are input and information on input times of the audio signals. A target source for speech recognition among the plurality of sources is determined on the basis of the obtained information on the directions in which the plurality of audio signals are input, and the obtained information on the input times of the plurality of audio signals, and an audio signal obtained from the determined target source is processed” (abstract). See entire document.
He et al. (CN-111968662-B) discloses “The disclosure relates to a processing method and device of an audio signal and a storage medium. The method comprises the following steps: acquiring a noise-carrying frequency power spectrum and a noise power spectrum to be processed; determining a first noise component according to the power characteristics of the noisy frequency power spectrum at each frequency point; the first noise component is a steady-state noise component contained in both the noisy frequency power spectrum and the noise power spectrum; subtracting the first noise component from the noisy frequency power spectrum to obtain a noisy frequency component; subtracting the first noise component from the noise power spectrum to obtain a second noise component; determining a frequency domain estimation signal according to the noisy frequency component and the second noise component; and performing time-frequency conversion based on the frequency domain estimation signal to obtain a noise reduction audio signal. Through the technical scheme of the embodiment of the disclosure, the common steady-state noise component is removed from the noise power spectrum of the noisy frequency power spectrum, and then the noise reduction treatment is carried out on the audio signal, so that the treatment deviation caused by the steady-state noise can be reduced, and the noise reduction effect is improved.” (abstract). He mentions removing steady state noise from power spectrums (see pg. 8 of English translation). See entire document.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THEODORE JOHN WITHEY whose telephone number is (703)756-1754. The examiner can normally be reached Monday - Friday, 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/THEODORE WITHEY/Examiner, Art Unit 2655 /ANDREW C FLANDERS/Supervisory Patent Examiner, Art Unit 2655