Prosecution Insights
Last updated: April 19, 2026
Application No. 18/597,658

INTELLIGENT AREA-BASED SOUND SOURCE SEPARATION

Final Rejection §103

Filed: Mar 06, 2024
Examiner: WITHEY, THEODORE JOHN
Art Unit: 2655
Tech Center: 2600 (Communications)
Assignee: Microsoft Technology Licensing, LLC
OA Round: 2 (Final)

Grant Probability: 44% (Moderate)
Predicted OA Rounds: 3-4
Predicted Time to Grant: 2y 11m
Grant Probability with Interview: 90%

Examiner Intelligence

Career Allow Rate: 44% of resolved cases (10 granted / 23 resolved; -18.5% vs TC avg)
Interview Lift: +46.9% for resolved cases with interview (strong)
Typical Timeline: 2y 11m average prosecution; 39 currently pending
Career History: 62 total applications across all art units

Statute-Specific Performance

§101: 22.0% (-18.0% vs TC avg)
§103: 48.6% (+8.6% vs TC avg)
§102: 17.1% (-22.9% vs TC avg)
§112: 12.0% (-28.0% vs TC avg)

Tech Center averages are estimates. Based on career data from 23 resolved cases.
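As a quick consistency check, all four deltas above follow from a single Tech Center average estimate of 40.0%. A minimal Python sketch (rates taken from the table; the 40.0% average is inferred from the deltas, not stated in the source) reproduces them:

```python
# Allowance rates by statute for this examiner (from the table above),
# compared against an inferred Tech Center average of 40.0% -- the one
# value consistent with all four listed deltas.
tc_avg = 40.0
rates = {"101": 22.0, "103": 48.6, "102": 17.1, "112": 12.0}
deltas = {s: round(r - tc_avg, 1) for s, r in rates.items()}
print(deltas)  # {'101': -18.0, '103': 8.6, '102': -22.9, '112': -28.0}
```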

Office Action (§103)
DETAILED ACTION

This office action is in response to Applicant's Amendment/Request for Reconsideration, received on 12/31/2025. Claims 1-2, 8-9, 14-16, and 19 have been amended. Claims 1-20 are pending and have been considered.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments, see pg. 8, filed 12/31/2025, with respect to the objections to claims have been fully considered and are persuasive. The objections of claims 8 and 14 have been withdrawn.

Applicant's arguments, see pgs. 8-10, filed 12/31/2025, with respect to "Rejections under 35 U.S.C. 101" have been fully considered and are persuasive. The rejections of claims 1-19 under 35 U.S.C. 101 have been withdrawn. The examiner notes that the amended independent claims have incorporated matter previously deemed to be eligible (see dependent claim 20), namely, adjusting/redefining a target area through performing a phase shift/parameter update. Further, the concept of generating time-frequency representations of audio to be updated through application of a phase shift is not a process which can reasonably be performed in the mind with or without the aid of pen and paper (Step 2A, NO).

Applicant's arguments, see pgs. 10-14, filed 12/31/2025, with respect to the rejection(s) of independent claim(s) 1, 15, and 19 (pgs. 11-12) under 35 U.S.C. 102(a)(1) have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Gao, further in view of Kitazawa (US-9712937-B2). Gao has been previously cited against dependent claims.
Kitazawa discloses "a sound pickup unit configured to pick up sound signals of a plurality of channels, a detector configured to detect a change in a relative positional relationship between a sound source and the sound pickup unit, a phase regulator configured to regulate a phase of the sound signal in accordance with the relative position change amount detected by the detector, a parameter estimator configured to estimate a variance and spatial correlation matrix of a sound source signal as sound source separation parameters with respect to the phase-regulated sound signal, and a sound source separator configured to generate a separation filter from the estimated parameters, and perform sound source separation" (abstract). See updated rejections below.

Applicant's arguments with respect to claim(s) 2-14, 16-18, and 20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-3, 6-7, 9-16, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jungmaier et al. (US-20190265345-A1), hereinafter Jungmaier, in view of Gao et al. (US-20250259639-A1), hereinafter Gao, further in view of Kitazawa (US-9712937-B2).

Regarding claim 1, Jungmaier discloses: a method of extracting sound from a coverage area having a system with at least one audio input device ([Fig. 1, Coverage Area 80, Microphone system 40], [0105] extract a signal from a specific direction, [0020] a microphone system 40, such as a microphone array), the method comprising: receiving speech signals from one or more audio sources in a plurality of audio sources within the coverage area ([Fig. 1, Facility 20], [0019] detect commands issued by persons 50 located in the facility 20 [In view of Fig. 1, wherein the persons 50 would clearly be producing speech signals, i.e. commands, and the facility is within the coverage area 80]); and, generating a time-frequency representation of the speech signals from the at least one audio input device ([0106] In frequency domain beamforming the microphone signal is e.g. separated into narrowband frequency bins using a short time Fourier transform (STFT) and the data in each bin is processed separately, [A Fourier transform representing frequency components over intervals of time indicates the transform to be a time-frequency representation]).

Jungmaier does not disclose: defining, by a machine learning model during an inference operation and based on the time-frequency representation, a target area that is a portion of the coverage area, the target area being definable by a trained criterion.

Gao discloses: defining, by a machine learning model during an inference operation and based on the time-frequency representation, a target area that is a portion of the coverage area ([0030] Improved multi-modal audio source channelization systems are configured to improve processing speed of denoising, echo removal, source separating, source localizing, beamforming operations, and/or to reduce the computational resources associated with applying machine learning models to such tasks, [Wherein the beamforming operation of Gao is dependent upon a target area of interest ([0065]), indicating the target area, i.e. beam(s), determination to be performed using machine learning models as part of the beamforming and/or denoising operations of Gao]).

Jungmaier and Gao are considered analogous art within speech source separation/cancellation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jungmaier to incorporate the teachings of Gao, because of the novel way to improve multi-modal audio source channelization systems through the reduction of computational resources required for these tasks by using machine learning models to perform said tasks (Gao, [0030]).
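The STFT-based time-frequency representation the action cites from Jungmaier [0106] can be sketched in a few lines. This is an illustrative NumPy version, not code from any cited reference; the frame length, hop size, sample rate, and two-channel test signal are all assumptions:

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Short-time Fourier transform: split x into overlapping windowed
    frames and take the real-input FFT of each frame, yielding a
    (time frames x frequency bins) time-frequency matrix."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i*hop : i*hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

# Hypothetical two-microphone capture of a 440 Hz tone with a small
# inter-channel phase offset; each channel gets its own STFT.
fs = 16000
t = np.arange(fs) / fs
mics = np.stack([np.sin(2*np.pi*440*t),
                 np.sin(2*np.pi*440*t + 0.3)])
tf = np.stack([stft(ch) for ch in mics])
print(tf.shape)  # (2, 124, 129): channels x time frames x frequency bins
```

With 62.5 Hz bins (16000/256), the 440 Hz tone lands in bin 7, which is what makes a representation like this usable for per-bin spatial processing.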
Jungmaier further discloses: redefining, during a processing operation, the target area into a redefined target area ([0059] as a function at least of the direction of the persons determined based on the radar system 30, the microphone system 40 steers one (or more) audio beam 100 in the corresponding direction so as to enhance the reception of the audio inputs generated by the person, [Steering a beam in a new direction (in the situation of a non-constant person direction function) indicates that new direction to be a new, redefined target area]).

Jungmaier in view of Gao does not disclose: the redefining including updating, in a time-frequency domain, phase parameters applied to the speech signals for the at least one audio input device based on updated sound-source information for the coverage area, the redefining reusing the time-frequency representation so as to avoid an additional inference event by the machine learning model such that negligible additional computational cost is incurred.

Kitazawa discloses: the redefining including updating, in a time-frequency domain, phase parameters applied to the speech signals for the at least one audio input device based on updated sound-source information for the coverage area ([Col. 4, Lines 34-40] The detected relative positional relationship between the sound pickup unit 1010 and sound source is output to the phase regulator 1060. The relative positional relationship herein mentioned is, for example, the direction (angle) of a sound source with respect to the sound pickup unit 1010. The phase regulator 1060 performs phase regulation on the input frequency spectrum), the redefining reusing the time-frequency representation so as to avoid an additional inference event by the machine learning model such that negligible additional computational cost is incurred ([As disclosed in the cited section above, regulating the phase of a frequency spectrum (indicating the spectrum has an innate phase prior to regulation) tracks to the regulation being a form of updating phase parameters applied to speech signals, wherein the relative positional relationship of a pickup and output tracks to sound-source information. Further, as there is only one spectrum being analyzed, this indicates "reusing" the time-frequency representation for regulation of the phase of the spectrum]).

Jungmaier, Gao, and Kitazawa are considered analogous art within sound source separation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jungmaier in view of Gao to incorporate the teachings of Kitazawa, because of the novel way to perform sound source separation even when the relative positions of a sound source and sound pickup device change through use of a phase regulator which regulates phases of received sound signals in accordance with determined position changes of transmitters/receivers, improving upon traditional sound source separation methods which cannot function in variable environments (Kitazawa, [Col. 2, Lines 40-67], [Col. 3, Lines 1-10]).
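The phase regulation attributed to Kitazawa, updating per-frequency phase parameters over an already-computed time-frequency representation rather than recomputing it, can be sketched as below. The two-microphone geometry, sample rate, and function name are illustrative assumptions; the point of the sketch is that only a small (channels x bins) phase table changes, so no new STFT or model inference is needed:

```python
import numpy as np

def apply_steering_phase(tf, mic_positions, angle_rad,
                         fs=16000, n_fft=256, c=343.0):
    """Re-steer an existing multichannel STFT toward a new source
    direction by multiplying each (channel, bin) by a per-frequency
    phase factor. `tf` has shape (channels, frames, bins) and is
    reused as-is: only O(channels * bins) phase parameters change."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0/fs)        # bin centers (Hz)
    delays = mic_positions * np.sin(angle_rad) / c  # per-mic delay (s)
    phases = np.exp(-2j * np.pi * np.outer(delays, freqs))  # (channels, bins)
    return tf * phases[:, None, :]                  # broadcast over frames

# Hypothetical 2-mic array (5 cm spacing) re-steered to 30 degrees.
rng = np.random.default_rng(0)
tf = rng.standard_normal((2, 10, 129)) + 1j * rng.standard_normal((2, 10, 129))
steered = apply_steering_phase(tf, np.array([0.0, 0.05]), np.deg2rad(30.0))
# Delay-and-sum extraction then reduces the aligned channels to one signal:
extracted = steered.mean(axis=0)  # (frames, bins)
```

A phase-only update preserves per-bin magnitudes, which is consistent with the idea of reusing the same representation for subsequent separation at negligible additional cost.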
Jungmaier further discloses: extracting the speech signals from the target area using the time-frequency representation and the redefined target area ([0105] a microphone array is used to form a spatial filter which can extract a signal from a specific direction and reduce the contamination of the signals from other directions, [Wherein the beam in the specific direction is the redefined target area in view of the previously disclosed steering, i.e. redefining, beams of Jungmaier]); and transmitting the speech signals to a receiver ([0020] The systems 30, 40 and the processing module 70 are operatively coupled together, [0040] each subsystem 90 includes an integrated circuit 92, one or more transmit antennas 94, one or more receive antennas 96, and a substrate 98. In some embodiments, the one or more transmit antennas 94 are arranged in a transmit antenna array, and/or the one or more receive antennas 96 are arranged in a receive antenna array, [Defining the subsystems to contain receivers and transmitters, wherein those subsystems are connected to the processing module 70 (see Fig. 2), indicating a transmission of the speech signals gathered by the subsystems to the processing module, i.e. a receiver]).

Regarding claim 2, Jungmaier in view of Gao, further in view of Kitazawa discloses: the method of claim 1.

Gao further discloses: wherein the defining the target area that is a portion of the coverage area is performed by the machine learning model ([0030] Improved multi-modal audio source channelization systems are configured to improve processing speed of denoising, echo removal, source separating, source localizing, beamforming operations, and/or to reduce the computational resources associated with applying machine learning models to such tasks, [Wherein the beamforming operation of Gao is dependent upon a target area of interest ([0065]), indicating the target area determination to be performed using machine learning models as part of the beamforming and/or denoising operations of Gao]); wherein the processing operation is during inference of the machine learning model ([0028] train and apply sophisticated machine learning models to both more efficiently and more effectively isolate targeted audio and ignore noise/defects from audio signal samples relative to manually configured techniques, [Defining trained models to be applied indicates the processing operation is applied during inference]).

Regarding claim 3, Jungmaier in view of Gao, further in view of Kitazawa discloses: the method of claim 1.

Jungmaier further discloses: wherein the trained criterion includes dimensions of the target area ([Fig. 1, Radius R of coverage area 80], [Defining a radius of a coverage area, wherein a target area can represent the entirety of the coverage area (see [0055] of the instant application), indicates the radius to be a dimension of the target area, wherein a circular target area also requires an angular dimension of 360 degrees]).

Regarding claim 6, Jungmaier in view of Gao, further in view of Kitazawa discloses: the method of claim 1.
Jungmaier further discloses: wherein the redefining the target area toward the speech signals during the processing operation includes aligning a target area centerline of the target area to an audio source center point of the one or more audio sources that are producing the speech signals ([Fig. 1, Coverage Area 80/Target Area 20], [Fig. 9A, Audio Beam 100, Person 50], [In view of the plurality of people 50 in the target area 20 of Fig. 1, determining to target a beam towards a specific person, as shown in Fig. 9A, indicates a redefining of the target area, i.e. beam 100, through aligning a target area centerline to an audio source center point, i.e. person 50, as the beam and person are centered about the same angle]).

Regarding claim 7, Jungmaier in view of Gao, further in view of Kitazawa discloses: the method of claim 1.

Jungmaier further discloses: wherein the receiving the speech signals from the one or more audio sources in the plurality of audio sources within the coverage area is performed via the at least one audio input device ([0057] the microphone system 40 is configured to generate one or more audio beam 100 (FIG. 9A) via each of which the microphone system is configured to receive audio inputs. These audio inputs are adapted to include the vocal commands mentioned above).

Regarding claim 9, Jungmaier in view of Gao, further in view of Kitazawa discloses: the method of claim 7.

Jungmaier further discloses: wherein the at least one audio input device includes a microphone array having a plurality of audio input devices ([0020] a microphone system 40, such as a microphone array of plural microphones).

Gao further discloses: wherein the microphone array is provided by a laptop computer ([0044] Multi-modal audio source channelization systems as discussed herein may be implemented as part of a digital signal processing apparatus, and/or as software, or a software plugin, that is configured for execution on a laptop, [0050] The multi-source audio signal sample may be generated by a single microphone, by an array microphone, [Multi-modal audio source channelization requiring multi-source audio for channelization indicates the microphone array is provided by the laptop implementing the channelization]).

Regarding claim 10, Jungmaier in view of Gao, further in view of Kitazawa discloses: the method of claim 1.

Jungmaier further discloses: mixing the extracted speech signals ([0105] a microphone array is used to form a spatial filter which can extract a signal from a specific direction and reduce the contamination of the signals from other directions, i.e. a source of interest may be selected while minimizing undesired interfering signals, [Reducing contamination from other signals indicates that they are also separated/extracted in order to be appropriately reduced; reduction reasonably tracks to a form of mixing]).

Regarding claim 11, Jungmaier in view of Gao, further in view of Kitazawa discloses: the method of claim 1.

Jungmaier further discloses: wherein the extracting the speech signals from the target area forms extracted speech signals ([0106] filtering the microphone signals and combining the outputs to extract (by constructive combining) the desired signal and reject (by destructive combining) interfering signals according to their spatial location, i.e. beamforming may separate sources with overlapping frequency content that originate at different spatial locations, [In view of the previously disclosed target area 20 and predefined threshold distance THD, which are used for determining whether sound sources are within a predefined threshold distance (spatial location) to be steered towards, i.e. extracted, indicating this threshold distance radius, i.e. 20 and/or THD of Fig. 1, to be representative of a target area]), the method further comprising mixing the extracted speech signals ([0105] a microphone array is used to form a spatial filter which can extract a signal from a specific direction and reduce the contamination of the signals from other directions, i.e. a source of interest may be selected while minimizing undesired interfering signals, [Reducing contamination from other signals indicates that they are also separated/extracted in order to be appropriately reduced; reduction reasonably tracks to a form of mixing]).

Regarding claim 12, Jungmaier in view of Gao, further in view of Kitazawa discloses: the method of claim 11.

Jungmaier further discloses: wherein the mixing includes at least one of masking and muting the extracted speech signals ([0105] a microphone array is used to form a spatial filter which can extract a signal from a specific direction and reduce the contamination of the signals from other directions, i.e. a source of interest may be selected while minimizing undesired interfering signals, which may also be called beamforming, [Reducing contamination from other signals indicates that they are also separated/extracted in order to be appropriately reduced. Reducing the contribution/contamination of a signal is equivalent to muting that signal. Further, the examiner would like to note that due to the disjunctive nature of the claim, a masking operation does not require a mapping]).

Regarding claim 13, Jungmaier in view of Gao, further in view of Kitazawa discloses: the method of claim 1.
Jungmaier further discloses: wherein the extracting the speech signals from the target area includes suppressing all speech signals from outside the target area ([0069] the microphone system 40 is configured to generate the audio beam for a person 50 as a function of at least the direction of the person relative to the apparatus only if the detected distance of the person 50 relative to the apparatus 10 is inferior or equal to the predetermined threshold distance THD, [Determining not to generate a beam for audio beyond the threshold distance indicates that speech signals outside the target area, i.e. the circle formed by the threshold radius, are suppressed. Determining not to collect noise from signals effectively suppresses those signals in view of the microphone array receiving audio]).

Regarding claim 14, Jungmaier in view of Gao, further in view of Kitazawa discloses: the method of claim 13.

Jungmaier further discloses: wherein the extracting the speech signals from the target area further includes suppressing at least one of interfering speech signals and background noise from the speech signals within the target area ([0062] a noise cancellation process of the audio inputs received via the audio beam 100 may be performed based on the audio inputs received by the audio beam and audio inputs received by another beam not defined as a function of the direction of the person, and which will therefore pick up background noise relative to the considered person, [0105] The generation of the audio beams per se may be implemented through any known process. In generating the audio beam, a microphone array is used to form a spatial filter which can extract a signal from a specific direction and reduce the contamination of the signals from other directions, [Reducing the effect of sounds from different beams on the main beam, wherein those other beams are representative of background noise, indicates suppression of interfering speech signals and/or background noise, as in the context of a multi-person speaking environment an interfering noise signal is reasonably understood to also be background noise. Further, the examiner would like to note that due to the disjunctive nature of the claim, both elements do not require a mapping]).

Regarding claim 15, Jungmaier discloses: a system for extracting speech signals from a coverage area ([Fig. 1, Coverage Area 80, People 50], [0105] extract a signal from a specific direction [Extracting signals corresponding to the locations of people indicates those signals to be speech]), the system comprising: one or more computing systems ([0146] general components and functionality that may be used to implement portions of the embodiment radar and audio system and/or an external computer), each of the computing systems having: a processor ([Fig. 14, CPU 1502]); a storage in communication with the processor ([Fig. 14, Memory 1504 connected to processor 1502 through bus 1508]); and, a plurality of audio input devices ([Fig. 1, Microphone system 40], [Wherein that microphone system is defined to be an array, i.e. a plurality of devices (see [0020])]), the one or more computing systems being configured to: receive speech signals from one or more audio sources in a plurality of audio sources within the coverage area ([Fig. 1, Facility 20], [0019] detect commands issued by persons 50 located in the facility 20 [In view of Fig. 1, wherein the persons 50 would clearly be producing speech signals, i.e. commands, and the facility is within the coverage area 80]); and, generate a time-frequency representation of the speech signals from the at least one audio input device ([0106] In frequency domain beamforming the microphone signal is e.g. separated into narrowband frequency bins using a short time Fourier transform (STFT) and the data in each bin is processed separately, [A Fourier transform representing frequency components over intervals of time indicates the transform to be a time-frequency representation]).

Jungmaier does not disclose: define, by a machine learning model during an inference operation and based on the time-frequency representation, a target area that is a portion of the coverage area, the target area being definable by a trained criterion.

Gao discloses: define, by a machine learning model during an inference operation and based on the time-frequency representation, a target area that is a portion of the coverage area ([0030] Improved multi-modal audio source channelization systems are configured to improve processing speed of denoising, echo removal, source separating, source localizing, beamforming operations, and/or to reduce the computational resources associated with applying machine learning models to such tasks, [Wherein the beamforming operation of Gao is dependent upon a target area of interest ([0065]), indicating the target area, i.e. beam(s), determination to be performed using machine learning models as part of the beamforming and/or denoising operations of Gao]).

Jungmaier and Gao are considered analogous art within speech source separation/cancellation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jungmaier to incorporate the teachings of Gao, because of the novel way to improve multi-modal audio source channelization systems through the reduction of computational resources required for these tasks by using machine learning models to perform said tasks (Gao, [0030]).

Jungmaier further discloses: redefine, during a processing operation, the target area into a redefined target area ([0059] as a function at least of the direction of the persons determined based on the radar system 30, the microphone system 40 steers one (or more) audio beam 100 in the corresponding direction so as to enhance the reception of the audio inputs generated by the person, [Steering a beam in a new direction (in the situation of a non-constant person direction function) indicates that new direction to be a new, redefined target area]).

Jungmaier in view of Gao does not disclose: the redefining including updating, in a time-frequency domain, phase parameters applied to the speech signals for the plurality of audio input devices based on updated sound-source information for the coverage area, the redefining reusing the time-frequency representation so as to avoid an additional inference event by the machine learning model such that negligible additional computational cost is incurred.

Kitazawa discloses: the redefining including updating, in a time-frequency domain, phase parameters applied to the speech signals for the plurality of audio input devices based on updated sound-source information for the coverage area ([Col. 4, Lines 34-40] The detected relative positional relationship between the sound pickup unit 1010 and sound source is output to the phase regulator 1060. The relative positional relationship herein mentioned is, for example, the direction (angle) of a sound source with respect to the sound pickup unit 1010. The phase regulator 1060 performs phase regulation on the input frequency spectrum), the redefining reusing the time-frequency representation so as to avoid an additional inference event by the machine learning model such that negligible additional computational cost is incurred ([As disclosed in the cited section above, regulating the phase of a frequency spectrum (indicating the spectrum has an innate phase prior to regulation) tracks to the regulation being a form of updating phase parameters applied to speech signals, wherein the relative positional relationship of a pickup and output tracks to sound-source information. Further, as there is only one spectrum being analyzed, this indicates "reusing" the time-frequency representation for regulation of the phase of the spectrum]).

Jungmaier, Gao, and Kitazawa are considered analogous art within sound source separation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jungmaier in view of Gao to incorporate the teachings of Kitazawa, because of the novel way to perform sound source separation even when the relative positions of a sound source and sound pickup device change through use of a phase regulator which regulates phases of received sound signals in accordance with determined position changes of transmitters/receivers, improving upon traditional sound source separation methods which cannot function in variable environments (Kitazawa, [Col. 2, Lines 40-67], [Col. 3, Lines 1-10]).

Jungmaier further discloses: extract the speech signals from the target area using the time-frequency representation and the redefined target area ([0105] a microphone array is used to form a spatial filter which can extract a signal from a specific direction and reduce the contamination of the signals from other directions, [Wherein the beam in the specific direction is the redefined target area in view of the previously disclosed steering, i.e. redefining, beams of Jungmaier]); and transmit the speech signals to a receiver ([0020] The systems 30, 40 and the processing module 70 are operatively coupled together, [0040] each subsystem 90 includes an integrated circuit 92, one or more transmit antennas 94, one or more receive antennas 96, and a substrate 98. In some embodiments, the one or more transmit antennas 94 are arranged in a transmit antenna array, and/or the one or more receive antennas 96 are arranged in a receive antenna array, [Defining the subsystems to contain receivers and transmitters, wherein those subsystems are connected to the processing module 70 (see Fig. 2), indicating a transmission of the speech signals gathered by the subsystems to the processing module, i.e. a receiver]).

Regarding claim 16, Jungmaier in view of Gao, further in view of Kitazawa discloses: the system of claim 15.
Gao further discloses: wherein the target area is defined via the machine learning model ([0030] Improved multi-modal audio source channelization systems are configured to improve processing speed of denoising, echo removal, source separating, source localizing, beamforming operations, and/or to reduce the computational resources associated with applying machine learning models to such tasks, [Wherein the beamforming operation of Gao is dependent upon a target area of interest ([0065]), indicating the target area determination to be performed using machine learning models as part of the beamforming and/or denoising operations of Gao]); wherein the processing operation is during inference of the machine learning model ([0028] train and apply sophisticated machine learning models to both more efficiently and more effectively isolate targeted audio and ignore noise/defects from audio signal samples relative to manually configured techniques, [Defining trained models to be applied indicates the processing operation is applied during inference]). Regarding claim 19, Jungmaier discloses: a non-transitory computer-readable medium comprising instructions that, when executed by at least on processor ([0146] CPU 1502 executes instructions in an executable program stored, for example in a non-transitory computer readable storage medium), cause the at least on processor to: receive speech signals from one or more audio sources in a plurality of audio sources within the coverage area ([Fig. 1, Facility 20], [0019] detect commands issued by persons 50 located in the facility 20 [In view of Fig. 1 wherein the persons 50 would clearly be producing speech signals, i.e. commands, and the facility is within the coverage area 80]); and, generate a time-frequency representation of the speech signals from the at least one audio input device ([0106] In frequency domain beamforming the microphone signal is e.g. 
separated into narrowband frequency bins using a short time Fourier transform (STFT) and the data in each bin is processed separately, [A fourier transform representing frequency components over intervals of time indicates the transform to be a time-frequency representation]). Jungmaier does not disclose: define, by a machine learning model during an inference operation and based on the time-frequency representation, a target area that is a portion of the coverage area, the target area being definable by a trained criterion. Gao discloses: define, by a machine learning model during an inference operation and based on the time-frequency representation, a target area that is a portion of the coverage area ([0030] Improved multi-modal audio source channelization systems are configured to improve processing speed of denoising, echo removal, source separating, source localizing, beamforming operations, and/or to reduce the computational resources associated with applying machine learning models to such tasks, [Wherein the beamforming operation of Gao is dependent upon a target area of interest ([0065]), indicating the target area, i.e. beam(s), determination to be performed using machine learning models as part of the beamforming and/or denoising operations of Gao]). Jungmaier and Gao are considered analogous art within speech source separation/cancellation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jungmaier to incorporate the teachings of Gao, because of the novel way to improve multi-modal audio source channelization system through the reduction of computational resources required for these tasks by using machine learning models to perform said tasks (Gao, [0030]). 
Jungmaier further discloses: redefine, during a processing operation, the target area into a redefined target area ([0059] as a function at least of the direction of the persons determined based on the radar system 30, the microphone system 40 steers one (or more) audio beam 100 in the corresponding direction so as to enhance the reception of the audio inputs generated by the person, [Steering a beam in a new direction (in the situation of a non-constant person direction function) indicates that new direction to be a new, redefined target area]). Jungmaier in view of Gao does not disclose: the redefining including updating, in a time-frequency domain, phase parameters applied to the speech signals for the plurality of audio input devices based on updated sound-source information for the coverage area, the redefining reusing the time-frequency representation so as to avoid an additional inference event by the machine learning model such that negligible additional computational cost is incurred. Kitazawa discloses: the redefining including updating, in a time-frequency domain, phase parameters applied to the speech signals for the plurality of audio input devices based on updated sound-source information for the coverage area ([Col. 4, Lines 34-40] The detected relative positional relationship between the sound pickup unit 1010 and sound source is output to the phase regulator 1060. The relative positional relationship herein mentioned is, for example, the direction (angle) of a sound source with respect to the sound pickup unit 1010.
The phase regulator 1060 performs phase regulation on the input frequency spectrum), the redefining reusing the time-frequency representation so as to avoid an additional inference event by the machine learning model such that negligible additional computational cost is incurred ([As disclosed in the cited section above, regulating the phase of a frequency spectrum (indicating the spectrum has an innate phase prior to regulation) tracks to the regulation being a form of updating phase parameters applied to speech signals, wherein the relative positional relationship of a pickup and output tracks to sound-source information. Further, as there is only one spectrum being analyzed, this indicates “reusing” the time-frequency representation for regulation of the phase of the spectrum]). Jungmaier, Gao, and Kitazawa are considered analogous art within sound source separation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jungmaier in view of Gao to incorporate the teachings of Kitazawa, because of the novel way to perform sound source separation even when the relative positions of a sound source and sound pickup device change through use of a phase regulator which regulates phases of received sound signals in accordance with determined position changes of transmitters/receivers, improving upon traditional sound source separation methods which cannot function in variable environments (Kitazawa, [Col. 2, Lines 40-67], [Col. 3, Lines 1-10]).
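The claimed "reuse the time-frequency representation" idea, redefining the target area by updating only phase parameters rather than recomputing anything upstream, can be sketched as frequency-domain delay-and-sum re-steering. The array geometry, constants, function name, and sign conventions below are assumptions for illustration; they are not taken from Jungmaier, Gao, or Kitazawa:

```python
import numpy as np

C = 343.0                                  # speed of sound (m/s), assumed
FS = 16000                                 # sample rate (Hz), assumed
MIC_X = np.array([0.0, 0.05, 0.10])        # hypothetical 3-mic linear array (m)
FREQS = np.fft.rfftfreq(256, d=1.0 / FS)   # STFT bin center frequencies (Hz)

def steer(stfts, theta):
    """Re-steer a beam toward look direction theta (radians) by updating
    only the per-channel, per-frequency phase weights. The precomputed
    STFTs (shape: n_mics x n_frames x n_bins) are reused unchanged."""
    delays = MIC_X * np.sin(theta) / C                         # per-mic delay (s)
    w = np.exp(2j * np.pi * FREQS[None, :] * delays[:, None])  # phase-only weights
    return (w[:, None, :] * stfts).sum(axis=0)                 # align and sum
```

Calling `steer` again with a new `theta` "redefines" the look direction at the cost of one complex multiply per bin; no new transform (and, in the claimed arrangement, no additional model inference) is required.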
Jungmaier further discloses: extract the speech signals from the target area using the time-frequency representation and the redefined target area ([0105] a microphone array is used to form a spatial filter which can extract a signal from a specific direction and reduce the contamination of the signals from other directions, [Wherein the beam in the specific direction is the redefined target area in view of the previously disclosed steering, i.e. redefining, beams of Jungmaier]); and transmit the speech signals to a receiver ([0020] The systems 30, 40 and the processing module 70 are operatively coupled together, [0040] each subsystem 90 includes an integrated circuit 92, one or more transmit antennas 94, one or more receive antennas 96, and a substrate 98. In some embodiments, the one or more transmit antennas 94 are arranged in a transmit antenna array, and/or the one or more receive antennas 96 are arranged in a receive antenna array, [Defining the speech subsystems to be containing receivers and transmitters, wherein those speech subsystems are connected to the processing module 70 (see Fig. 2), indicating a transmission of the speech signals gathered by the subsystems to the processing module, i.e. a receiver]).

Claim(s) 4-5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jungmaier in view of Gao, further in view of Kitazawa, further in view of Ying et al. (US-20230333205-A1), hereinafter Ying.

Regarding claim 4, Jungmaier in view of Gao, further in view of Kitazawa discloses: the method of claim 3. Jungmaier in view of Gao, further in view of Kitazawa does not disclose: wherein the dimensions are defined in polar coordinates such that the target area is in the form of a circular sector. Ying discloses: wherein the dimensions are defined in polar coordinates such that the target area is in the form of a circular sector ([Fig. 3], [Fig.
5], [0116] In the H-plane, a location of the radar is an origin, an x-axis is a polar axis, and coordinates of the object in the plane may be indicated as (r₁, α), where α indicates the azimuth. In the E-plane, a location of the radar may be an origin, a z-axis is a polar axis, and coordinates of the object may be indicated as (r₂, β), where β indicates the pitch angle, [In view of the target area clearly being a circular sector as shown in Fig. 5]). Jungmaier, Gao, Kitazawa, and Ying are considered analogous art within sound source localization. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jungmaier in view of Gao, further in view of Kitazawa to incorporate the teachings of Ying, because of the novel way to fuse a first angle with an incident angle to determine the location of a sound source relative to a microphone array regardless of whether or not the sound source is static or moving, improving the accuracy of voice data extraction (Ying, [0006]).

Regarding claim 5, Jungmaier in view of Gao, further in view of Kitazawa, further in view of Ying discloses: the method of claim 4. Jungmaier further discloses: wherein the receiving the speech signals from the one or more audio sources in the plurality of audio sources within the coverage area is performed via the at least one audio input device ([0057] the microphone system 40 is configured to generate one or more audio beam 100 (FIG. 9A) via each of which the microphone system is configured to receive audio inputs. These audio inputs are adapted to include the vocal commands mentioned above).

Claim(s) 8, 17-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jungmaier in view of Gao, further in view of Kitazawa, further in view of Poornachandran et al. (US-20160192073-A1), hereinafter Poornachandran.
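The polar-coordinate, circular-sector target area discussed above for claim 4 amounts to a simple membership test on a radius and an angular range. A minimal sketch, with the function name, argument names, and example ranges all being illustrative assumptions rather than anything from the cited references:

```python
import math

def in_sector(r, theta, r_max, theta_lo, theta_hi):
    """True if a source at polar coordinates (r, theta), measured from
    the array origin, lies inside a circular-sector target area bounded
    by radius r_max and the angular range [theta_lo, theta_hi] (radians)."""
    # Normalize the angle into (-pi, pi] before comparing bounds.
    theta = math.atan2(math.sin(theta), math.cos(theta))
    return 0.0 <= r <= r_max and theta_lo <= theta <= theta_hi

# A source 2 m away at 20 degrees azimuth lies inside a 5 m sector
# spanning -30 to +30 degrees; the same azimuth at 6 m does not.
inside = in_sector(2.0, math.radians(20), 5.0, math.radians(-30), math.radians(30))
outside = in_sector(6.0, math.radians(20), 5.0, math.radians(-30), math.radians(30))
```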
Regarding claim 8, Jungmaier in view of Gao, further in view of Kitazawa discloses: the method of claim 7. Jungmaier in view of Gao, further in view of Kitazawa does not disclose: wherein the redefining the target area toward the speech signals during the processing operation is performed via a phase shift to the at least one audio input device. Poornachandran discloses: wherein the redefining the target area toward the speech signals during the processing operation is performed via a phase shift to the at least one audio input device ([0015] the directionality of the microphone array 110 can be adjusted by shifting the phase of the received audio signals, [Shifting the directionality of a microphone array reasonably tracks to redefining a target area, i.e. that which the microphones are directed towards]). Jungmaier, Gao, Kitazawa, and Poornachandran are considered analogous art within speech source localization. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jungmaier in view of Gao, further in view of Kitazawa to incorporate the teachings of Poornachandran, because of the novel way to continuously monitor background noise, filtering irrelevant noises to improve the quality of monitored background noise for notifications or other “noise” sounds that the wearer will want to hear (Poornachandran, [0002]).

Regarding claim 17, Jungmaier in view of Gao, further in view of Kitazawa discloses: the system of claim 16. Jungmaier in view of Gao, further in view of Kitazawa does not disclose: wherein the target area is redefined by the machine learning model performing a phase shift to at least one audio input device in the plurality of audio input devices.
Poornachandran discloses: wherein the target area is redefined by the machine learning model performing a phase shift to at least one audio input device in the plurality of audio input devices ([0015] the directionality of the microphone array 110 can be adjusted by shifting the phase of the received audio signals, [Shifting the directionality of a microphone array reasonably tracks to redefining a target area, i.e. that which the microphones are directed towards]). Jungmaier, Gao, Kitazawa, and Poornachandran are considered analogous art within speech source localization. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jungmaier in view of Gao, further in view of Kitazawa to incorporate the teachings of Poornachandran, because of the novel way to continuously monitor background noise, filtering irrelevant noises to improve the quality of monitored background noise for notifications or other “noise” sounds that the wearer will want to hear (Poornachandran, [0002]).

Regarding claim 18, Jungmaier in view of Gao, further in view of Kitazawa, further in view of Poornachandran discloses: the system of claim 17. Jungmaier further discloses: wherein the target area is defined as a circular sector centered about the plurality of audio input devices ([Fig. 1, Facility 20, Microphone System 40]).
Poornachandran further discloses: wherein the phase shift is based on a phase difference between each audio input devices in the plurality of audio input devices ([0015] the directionality of the microphone array 110 can be adjusted by shifting the phase of the received audio signals and then adding the audio signals together, [Determining phase shifts in order for the signals to be added indicates each phase shift is based on a phase difference between audio devices in order to align the phases for the adding operation]), the phase difference being based on both an incident angle of each of the audio input devices in the plurality of audio input devices and a distance therebetween ([0015] processing the audio signals in such a way as to amplify certain components of the audio signal based on the relative position of the corresponding sound source… Processing the audio signals in this way creates a directional audio pattern such sounds received from some angles are more amplified compared to sounds received from other angles… Control over the directionality of the microphone array 110 will be determined, at least in part, by the number of microphones and their spatial arrangement on the electronic device 100, [Control of the microphone directionality, performed through phase shifting as previously disclosed, being dependent upon the spatial arrangement of microphones indicates an incident angle and distance are relevant parameters for phase shifting as they are both disclosed as factors for determining contribution/amplification to be applied to signals]).

Claim(s) 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jungmaier in view of Gao, further in view of Kitazawa, further in view of Markovich-Golan et al. (US-10096328-B1), hereinafter Markovich-Golan.

Regarding claim 20, Jungmaier in view of Gao, further in view of Kitazawa discloses: the non-transitory computer-readable medium of claim 19.
Jungmaier further discloses: wherein the speech signals are acquired by a plurality of audio input devices ([0057] the microphone system 40 is configured to generate one or more audio beam 100 (FIG. 9A) via each of which the microphone system is configured to receive audio inputs. These audio inputs are adapted to include the vocal commands mentioned above). Jungmaier in view of Gao, further in view of Kitazawa does not disclose: wherein the target area is adjusted by performing a phase shift on the speech signals using a short-time Fourier transform representation of the plurality of audio input devices. Markovich-Golan discloses: wherein the target area is adjusted by performing a phase shift on the speech signals using a short-time Fourier transform representation of the plurality of audio input devices ([Col. 3, Lines 15-25] These beamforming filters scale and phase shift the signals from each of the microphones. Beamformer circuit 108 is configured to apply those weights to the signals received from each of the microphones, to generate a signal y(k) which is an estimate of the speech signal s(k) through the steered beam 120. The application of beamforming weights has the effect of focusing the array 106 on the current position of the speech source 102 and reducing the impacts of the noise sources 104. The signal estimate y(k) is transformed back to the time-domain using an inverse short time Fourier transform (ISTFT), [Col. 3, Lines 35-40] audio signals received from the microphones are transformed to the short time Fourier transform (STFT), [Transforming “back” into the time-domain using an ISTFT indicates a required STFT transformation before performing the operations above. Estimating a signal through a steered beam indicates the steering is a target area adjustment]). Jungmaier, Gao, Kitazawa, and Markovich-Golan are considered analogous art within sound source localization. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jungmaier in view of Gao, further in view of Kitazawa to incorporate the teachings of Markovich-Golan, because of the novel way to apply beamforming weights to microphone arrays, enabling a beam to be steered to follow a moving speech source, resulting in improved quality of received speech in the presence of noise (Markovich-Golan, [Col. 1, Lines 55-67]).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Wichern et al. (US-20220101869-A1) discloses “The audio processing system includes a memory to store a neural network trained to process an audio mixture to output estimation of at least a subset of a set of audio sources present in the audio mixture.
The audio sources are subject to hierarchical constraints enforcing a parent-children hierarchy on the set of audio sources, such that a parent audio source includes a mixture of its one or multiple children audio sources. The subset includes a parent audio source and at least one of its children audio sources. The system further comprises a processor to process a received input audio mixture using the neural network to estimate the subset of audio sources and their mutual relationships according to the parent-children hierarchy. The system further includes an output interface configured to render the extracted audio sources and their mutual relationships” (abstract). See entire document.

Sainath et al. (US-20160322055-A1) discloses “Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined” (abstract). See entire document.

Hiroe et al.
(US-20120263315-A1) discloses “There is provided a sound signal processing device, in which an observation signal analysis unit receives multi-channels of sound-signals acquired by a sound-signal input unit and estimates a sound direction and a sound segment of a target sound to be extracted and a sound source extraction unit receives the sound direction and the sound segment of the target sound and extracts a sound-signal of the target sound. By applying short-time Fourier transform to the incoming multi-channel sound-signals this device generates an observation signal in the time-frequency domain and detects the sound direction and the sound segment of the target sound. Further, based on the sound direction and the sound segment of the target sound, this device generates a reference signal corresponding to a time envelope indicating changes of the target's sound volume in the time direction, and extracts the signal of the target sound, utilizing the reference signal” (abstract). See entire document.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to THEODORE JOHN WITHEY whose telephone number is (703)756-1754. The examiner can normally be reached Monday - Friday, 8am-5pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /THEODORE WITHEY/Examiner, Art Unit 2655 /ANDREW C FLANDERS/Supervisory Patent Examiner, Art Unit 2655

Prosecution Timeline

Mar 06, 2024
Application Filed
Sep 24, 2025
Non-Final Rejection — §103
Dec 18, 2025
Interview Requested
Dec 31, 2025
Response Filed
Mar 04, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591744
METHOD FOR TRAINING SEMANTIC REPRESENTATION MODEL, DEVICE AND STORAGE MEDIUM
2y 5m to grant Granted Mar 31, 2026
Patent 12536994
APPARATUS FOR CLASSIFYING SOUNDS BASED ON NEURAL CODE IN SPIKING NEURAL NETWORK AND METHOD THEREOF
2y 5m to grant Granted Jan 27, 2026
Patent 12475330
METHOD FOR IDENTIFYING NOISE SAMPLES, ELECTRONIC DEVICE, AND STORAGE MEDIUM
2y 5m to grant Granted Nov 18, 2025
Patent 12417759
SPEECH RECOGNITION USING CADENCE PATTERNS
2y 5m to grant Granted Sep 16, 2025
Patent 12412580
Sound Extraction System and Sound Extraction Method
2y 5m to grant Granted Sep 09, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
44%
Grant Probability
90%
With Interview (+46.9%)
2y 11m
Median Time to Grant
Moderate
PTA Risk
Based on 23 resolved cases by this examiner. Grant probability derived from career allow rate.
