Response to Amendment
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 5-6, 9-12, 14-15, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 20240411507 A1) in view of Kemmerer et al. (US 20230229383 A1).
In regard to claim 1, Wang teaches a method for providing an environmental audio alert on a personal audio device by an electronic device connectable to the personal audio device (Wang, Fig. 1, Para. 50, the user interacts with the noise reduction headphones through a mobile phone. When an alarm sound indicating danger appears around the user, the noise reduction headphones can notify the user), the method comprising: determining head direction of a user wearing the personal audio device in an environment in a time frame, based on sensor data from at least one sensor of the electronic device (Wang, Fig. 5-6; Para. 152, the mobile phone locates the acoustic source of the alarm sound by using the audio information collected by the microphone of the left headphone of the noise reduction headphones and the audio information collected by the microphone of the right headphone to obtain the location information of the alarm sound relative to the user. The location information usually includes a horizontal direction angle θ of the alarm sound relative to the user. FIG. 5 shows an example of the horizontal direction angle θ of the alarm sound relative to the user (specifically, the center point of the user's head); Para. 156, the microphone of the mobile phone can also collect sounds in the external environment to obtain audio information. Therefore, in some other embodiments of this application, the mobile phone can obtain the audio information collected by the microphone array embedded in the mobile phone, and locate the acoustic source of the alarm sound by using the audio information collected by the microphone array to obtain the location information of the alarm sound relative to the mobile phone); detecting a plurality of audio events occurring in the environment in the time frame using circuitry configured to capture audio events (Wang, Para. 141-142, a mobile phone and noise reduction headphones are all provided with a microphone. 
Therefore, the microphone in the mobile phone or the noise reduction headphones may collect the sounds in the external environment to obtain the audio information. It should be further noted that after a headphone smart alarm sound function is enabled, when the noise reduction headphones are on, the mobile phone or the noise reduction headphones may collect the sounds in the external environment periodically or in real time to obtain the audio information); generating a spatial binaural audio alert by localizing a virtual sound source for the determined audio event with respect to the determined head direction of the user based on a determined direction of a source of the determined audio event (Wang, Para. 169 and 170, the mobile phone processes a standard alarm sound based on the location information of the alarm sound relative to the user to obtain 3D notification sounds; the standard alarm sound may be processed by using a 3D sound technology to obtain the alarm sound carrying the direction. After the alarm sound carrying the direction is output to the user, the user can feel the direction of the alarm sound); providing the generated spatial binaural audio alert to the personal audio device (Wang, Para. 196, the noise reduction headphones play the 3D notification sounds to notify the user that the alarm sound appears around and a safety problem exists).
Wang does not specifically teach determining a direction of a source of each of the detected plurality of audio events based on frequency spectrums related to the detected plurality of audio events; determining an audio event among the plurality of audio events based on the determined direction of the source of each of the detected plurality of audio events with respect to the determined head direction of the user.
However, Kemmerer teaches determining a direction of a source of each of the detected plurality of audio events based on frequency spectrums related to the detected plurality of audio events (Kemmerer, Fig. 4; Para. 122-123, At 630, filters are applied to the extracted features and to further classify events associated with the features of the input audio. For example, the feature extraction process may include filtering and transformation on the input audio signals (e.g. converting to a frequency domain “melspectrogram”; The output 654 provides direction of arrival estimation, such that the directions of the events detected, such as speech 641, sounds 642 associated with car horns, dog barking 643, and may be determined by determining an X-Y-Z coordinate of the origin of each event); determining an audio event among the plurality of audio events based on the determined direction of the source of each of the detected plurality of audio events with respect to the determined head direction of the user (Kemmerer, Fig. 5; Para. 107, the event determination algorithm may monitor and calculate an average magnitude (i.e., decibel level) of the background noise and consider incoming sounds that exceeds the average decibel level as event candidates; Para. 109, the event determination algorithm may determine the event based on the ambient sound by ruling out incidents of events based on the location of the event relative to the wearable device; Para. 115, The incidents 510 represent events that exceeds the threshold sound level and fall within the field of view 520. As a result, the incidents 510 are not reported to the user. The incidents 512 represent events that exceed the threshold sound level and fall outside the field of view 520. Furthermore, the incidents 512 are associated with a non-focused state of the user and therefore not reported to the user. 
The incidents 514 represent events that exceed the threshold sound level, that fall outside of the field of view 520, and that are associated with a focused state of the user. The incidents 514 are reported to the user, along with the associated location attribute, which may be presented as both a spatialized audio cue and by the visual representation 500).
Wang and Kemmerer are analogous art because they both pertain to providing information through personal audio devices.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the location of the event relative to the user's head in determining the event to be reported (as taught by Kemmerer), with the predictable result of providing spatialized feedback to the user of events that may merit attention.
In regard to claim 2, the combination of Wang and Kemmerer teaches the method of claim 1, wherein the head direction is determined while the user is on a call or listening to music using the personal audio device (Wang, Para. 66, The speaker 270A, also referred to as a “loudspeaker”, is configured to convert an audio electrical signal into a sound signal. The electronic device 200 may play music or answer a hands-free call through the speaker 270A), the personal audio device including around-the-ear, over-the-ear and in-ear headsets, headphones, earphones, earbuds, hearing aids, audio eyeglasses, head-worn audio devices, shoulder- or body-worn acoustic devices, during an activity of the user, the activity including sitting, walking, jogging, running, or any movement (Wang, Fig. 6, a user wears noise reduction headphones on the head and wears a smart watch on the wrist, and a mobile phone establishes a Bluetooth connection separately to the smart watch and the noise reduction headphones. In this application scenario, the noise reduction headphones can also exchange information with the smart watch through a connection channel such as the Bluetooth, so that when an alarm sound indicating danger appears around the user, the noise reduction headphones can notify the user).
In regard to claim 3, the combination of Wang and Kemmerer teaches the method of claim 1, wherein determining the head direction of the user in the time frame comprises: determining the time frame based on an initial time frame (Wang, Para. 168, the preset duration may be set based on an actual requirement. Because two consecutive alarm sounds need to be filtered out through step S404, the preset duration should not be too long. In an example, the preset duration may be set to 30 seconds) or computation time including maximum interaural time delay (ITD), time taken by an audio classification module, time taken by an audio direction determination module, and time taken by a binaural alert generator.
In regard to claim 5, the combination of Wang and Kemmerer teaches the method of claim 1, further comprising: determining maximum interaural time differences (ITD) and maximum interaural level differences (ILD) for the user for a detected audio event in the time frame to derive maximum angle deviation from the head direction and an activity of the user, wherein the maximum ITD is determined based on maximum of maximum ITD from previous time frames and ITD from the detected audio event and maximum ILD is determined based on maximum level of the detected audio event; generating a frequency spectrum of head related transfer function (HRTF) for the detected audio event based on the head direction of the user; extracting audio spectral features of the detected audio event using at least one of a discrete Fourier transform, a Mel filter bank, and Mel frequency cepstral coefficients (MFCC) (Wang, Para. 19, the processing the standard sound based on the first location information of the alarm sound to obtain the first sound includes: obtaining a head-related transfer function HRTF value corresponding to the first location information of the alarm sound; and performing Fourier transform on the standard sound, and multiplying the standard value by the HRTF value to obtain the first sound); and classifying the detected audio event as noise or significant audio using a convolution neural network on the extracted audio spectral features, historical audio spectral features from a spectral features database, and maximum ITD and maximum ILD (Wang, Fig. 4A, S402; Para. 113, The alarm sound detection model may be a basic network model such as a convolutional neural network (Convolutional Neural Network, CNN) or a long-short term memory (Long-Short Term Memory, LSTM); Para. 121, The training samples include samples that include alarm sounds and samples that do not include alarm sounds. In addition, whether a training sample includes an alarm sound is marked.
The alarm sound in the training sample may be, for example, a whistle of a vehicle. Certainly, to enable the alarm sound detection model to predict more types of alarm sounds through training, training samples including whistles of different types of motor vehicles such as cars and motorcycles and training samples including other alarm sounds such as alarm bells can be obtained; Para. 146, the alarm sound detection model has a function of predicting whether audio information that is input into the alarm sound detection model includes an alarm sound. Therefore, after the audio information in the external environment is obtained, whether the audio information includes the alarm sound may be detected by using the alarm sound detection model to obtain the detection result).
In regard to claim 6, the combination of Wang and Kemmerer teaches the method of claim 5, wherein the detected audio event is classified based on the environment and significance level of audio and in presence of more than one significant audio (Kemmerer, Para. 107, the event determination algorithm may monitor and calculate an average magnitude (i.e., decibel level) of the background noise and consider incoming sounds that exceed the average decibel level as event candidates), and a priority is given to the significant audio based on the direction of the audio with respect to the head direction of the user (Kemmerer, Fig. 5; Para. 109, the event determination algorithm may rule out incidents that are within a certain distance to the wearable device (e.g., classifying the incidents caused by the user, such as the user's own speech, etc.), incidents that are within a field of view or in a direction (such as in the front) of the wearable device (e.g., classifying the incidents visually noticeable by the user, such as a computer notification sound from a speaker in front of the user, someone speaking before the user, etc.), or incidents that may be too far away to require the user's attention (e.g., incidents of traffic noises outside residence perimeter, etc.); Para. 115, The incidents 514 represent events that exceed the threshold sound level, that fall outside of the field of view 520, and that are associated with a focused state of the user. The incidents 514 are reported to the user, along with the associated location attribute, which may be presented as both a spatialized audio cue and by the visual representation 500).
In regard to claim 9, the combination of Wang and Kemmerer teaches the method of claim 1, wherein the alert includes a gamma binaural audio alert or the alert includes a multimodal alert based on the user being equipped with a wearable device which includes at least one of a wristband, wristwatch, augmented reality glasses, smart glasses, ring, necklace, an accessory device, implanted in the user's body, embedded in clothing, or tattooed on the skin and provided to the user via two dimensional or three dimensional simulations (Wang, Para. 266, the smart watch sends a 3D notification sound to the left headphone of the noise reduction headphones and sends a 3D notification sound to the right headphone of the noise reduction headphones).
In regard to claim 10, the claim is interpreted and rejected for the same reasons as stated above in the rejection of claim 1.
In regard to claim 11, the claim is interpreted and rejected for the same reasons as stated above in the rejection of claim 2.
In regard to claim 12, the claim is interpreted and rejected for the same reasons as stated above in the rejection of claim 3.
In regard to claim 14, the claim is interpreted and rejected for the same reasons as stated above in the rejection of claim 5.
In regard to claim 15, the claim is interpreted and rejected for the same reasons as stated above in the rejection of claim 6.
In regard to claim 18, the claim is interpreted and rejected for the same reasons as stated above in the rejection of claim 9.
In regard to claim 19, the claim is interpreted and rejected for the same reasons as stated above in the rejection of claim 1.
In regard to claim 20, the claim is interpreted and rejected for the same reasons as stated above in the rejection of claim 2.
Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 20240411507 A1) in view of Kemmerer et al. (US 20230229383 A1) and further in view of Gordon et al. (US 20190281389 A1).
In regard to claim 4, the combination of Wang and Kemmerer does not specifically teach the method of claim 3, wherein determining the head direction comprises: receiving sensor data from a reference point as a reference for the sensor data, wherein the sensor data is received from a sensor block including the at least one sensor, the sensor block including at least one of a three-axis accelerometer, a three-axis gyroscope, and a three-axis magnetometer; calibrating the sensor data by monitoring difference between input and output sensor data of each sensor and adjusting the output sensor data to align with the input; and filtering and smoothing the calibrated sensor data to provide the head direction of the user.
However, the concept of using a three-axis accelerometer, a three-axis gyroscope, and a three-axis magnetometer to detect a direction of the user's head is well known in the art, as also taught by Gordon. Gordon teaches that the orientation tracking system can include a head-tracking or body-tracking system (e.g., an optical-based tracking system, accelerometer, magnetometer, gyroscope or radar) for detecting a direction in which the user 225 is facing, as well as movement of the user 225 and the personal audio device 10. Position tracking system 352 can also be configured to detect the orientation of the user 225, e.g., a direction of the user's head, or a change in the user's orientation such as a turning of the torso or an about-face movement (Para. 37); a combination of accelerometers, gyroscopes, and magnetometers (e.g., a three-axis accelerometer, a three-axis gyroscope, and a three-axis magnetometer), possibly in addition with a position tracking system 352, may be used to determine the direction that a user is facing or looking. The accelerometers, gyroscopes, magnetometers, and possibly the position tracking system 352, may be disposed in the device 10, which may be a head-worn device such as a headphone or an eyeglass. When a user is standing at an intersection in New York City, information from the position tracking system 352 can be used to track the location of the user, and information from a set of accelerometer, gyroscope, and magnetometer can be used to determine the direction of gaze of the user. The combination of the information can be used to deliver relevant audio content to the user. For example, if the user looks towards one particular direction from the intersection, one or more audio pins describing the points of interest along that direction may be delivered to the device 10 of the user (Para. 64).
Wang, Kemmerer and Gordon are analogous art because they all pertain to providing information through personal audio devices.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use a three-axis accelerometer, a three-axis gyroscope, and a three-axis magnetometer (as taught by Gordon), with the predictable result of detecting a direction of the user's head.
In regard to claim 13, the claim is interpreted and rejected for the same reasons as stated above in the rejection of claim 4.
Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 20240411507 A1) in view of Kemmerer et al. (US 20230229383 A1) and further in view of Magariyachi et al. (US 20190069118 A1).
In regard to claim 7, the combination of Wang and Kemmerer teaches the method of claim 1, wherein determining the direction of the source of each of the detected plurality of audio events comprises: identifying frequency spectrum of a head related transfer function (HRTF) for each of the plurality of audio events (Wang, Para. 181, obtaining an HRTF value of the left headphone and an HRTF value of the right headphone by using location information of the alarm sound relative to the user, where the two HRTF values correspond to the location information); generating horizontal plane directivity (HPD), head related impulse response (HRIR) from the HRTF frequency spectrum (Wang, Para. 182, The horizontal direction angle of the alarm sound relative to the user is used as a filtering factor to filter a plurality of HRTF values stored in the mobile phone to obtain the HRTF value of the left headphone and the HRTF value of the right headphone that match the horizontal direction angle of the alarm sound relative to the user; Para. 185, A head-related impulse response (Head Related Impulse Response, HRIR) is a time domain signal, and a head-related transfer function (Head-Response Transfer Function, HRTF) is a frequency domain signal corresponding to the HRIR); computing interaural time difference (ITD) and interaural level difference (ILD) for left and right ears of the user using the HRIR (Wang, Para. 184, the plurality of HRIR values are divided into a plurality of HRIR values for the left headphone and a plurality of HRIR values for the right headphone that are in a one-to-one correspondence with the HRIR values of the left headphone. HRIR values of one pair of left and right headphones correspond to one angle of one alarm sound relative to the user); and determining a direction of an environmental audio event producing source based on significant audio, the ITD and the ILD, the horizontal plane directivity (Wang, Para. 189-190, the location information of the alarm sound relative to the user includes a horizontal direction angle of the alarm sound relative to the user. The horizontal direction angle of the alarm sound relative to the user is used as a filtering factor to filter a plurality of HRIR values stored in the mobile phone to obtain the HRIR value of the left headphone and the HRIR value of the right headphone that match the horizontal direction angle of the alarm sound relative to the user; convolutional processing is separately performed between the standard alarm sound and the HRIR value of the left headphone and the HRIR value of the right headphone that correspond to the location information to obtain output signals of two ears, to be specific, a 3D notification sound of the left headphone and a 3D notification sound of the right headphone).
The combination of Wang and Kemmerer does not specifically teach generating pinna related transfer function (PRTF) from the HRTF frequency spectrum and spectral cues from the PRTF.
However, Magariyachi teaches generating pinna related transfer function (PRTF) from the HRTF frequency spectrum and spectral cues from the PRTF (Magariyachi, Para. 329, the head-related transfer functions are filters formed according to the diffraction and the reflection of the head, the pinna, and the like of the listener, and the head-related transfer functions vary among individual listeners. Therefore, optimizing the head-related transfer functions for individuals is important in binaural reproduction).
Wang, Kemmerer, and Magariyachi are analogous art because they all pertain to sound processing based on head direction of the user.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to generate head-related transfer functions according to the diffraction and the reflection of the head, the pinna, and the like of the listener, as taught by Magariyachi, in order to optimize the head-related transfer functions for individuals in binaural reproduction.
In regard to claim 16, the claim is interpreted and rejected for the same reasons as stated above in the rejection of claim 7.
Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 20240411507 A1) in view of Kemmerer et al. (US 20230229383 A1) and further in view of Armstrong (US 20220256300 A1).
In regard to claim 8, the combination of Wang and Kemmerer teaches the method of claim 1, wherein generating the spatial binaural audio alert comprises: regenerating interaural time difference (ITD) and interaural level difference (ILD) for the regenerated direction and head related transfer function (HRTF) interpolation, wherein the virtual sound source is localized for regenerating the direction of the source of the audio event with respect to the head direction of the user; determining a frequency of audio playing in the personal audio device and generating the spatial binaural audio alert based on the frequency of the audio and the HRTF (Wang, Para. 232, convolutional processing is separately performed between the standard alarm sound and the HRIR value of the left headphone and the HRIR value of the right headphone that correspond to the location information to obtain output signals of two ears, to be specific, a 3D notification sound of the left headphone and a 3D notification sound of the right headphone. Then, the 3D notification sound of the left headphone and the 3D notification sound of the right headphone are separately multiplied by the distance coefficient gain to obtain 3D notification sounds, carrying energy gains, of the left and right headphones).
The combination of Wang and Kemmerer does not specifically teach adding a delay in the spatial binaural audio alert based on the regenerated ITD.
However, Armstrong teaches adding a delay in the spatial binaural audio alert based on the regenerated ITD (Armstrong, Para. 26-27, HRTFs contain information such as time delay, level difference, and spectral response for audio generated with a particular location relative to a listener in a virtual environment. The HRTF is combined with a raw sound source (such as audio from a game or video) in order to generate output audio. The time delay information within an HRTF is indicative of the time taken for a sound to propagate from the sound source to the listener (this may also be referred to as time-of-arrival). In the case of an HRTF being provided for each of the listener's ears, this can lead to the definition of an interaural time delay (that is, the time delay between each ear receiving the sound) which may be a useful indication of the sound source direction for the listener).
Wang, Kemmerer, and Armstrong are analogous art because they all pertain to providing localized audio output.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to add a delay (as taught by Armstrong), providing a useful indication of the sound source direction for the listener.
In regard to claim 17, the claim is interpreted and rejected for the same reasons as stated above in the rejection of claim 8.
Response to Arguments
The claim rejection under 35 U.S.C. § 101 is withdrawn based on the claim amendments.
The amended claims are addressed above in the claim rejections.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHARMIN AKHTER whose telephone number is (571)272-9365. The examiner can normally be reached on Monday - Thursday 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Davetta W Goins, can be reached at (571) 272-2957. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHARMIN AKHTER/
Examiner, Art Unit 2689
/DAVETTA W GOINS/Supervisory Patent Examiner, Art Unit 2689