Prosecution Insights
Last updated: April 19, 2026
Application No. 18/638,588

COORDINATION OF AUDIO DEVICES

Non-Final OA (§103, §DP)

Filed: Apr 17, 2024
Examiner: TESHALE, AKELAW
Art Unit: 2694
Tech Center: 2600 — Communications
Assignee: Dolby International AB
OA Round: 1 (Non-Final)

Grant Probability: 82% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 11m
With Interview: 98%

Examiner Intelligence

Career Allow Rate: 82% (above average; 687 granted / 834 resolved; +20.4% vs TC avg)
Interview Lift: +15.6% (strong; allow rate of resolved cases with an interview vs. without)
Typical Timeline: 2y 11m average prosecution; 33 applications currently pending
Career History: 867 total applications across all art units
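
To make the arithmetic behind these cards explicit, here is a minimal sketch of how the figures relate. The 687/834 counts and the +15.6% lift are taken from this page; the with/without-interview split is a hypothetical placeholder chosen only to be consistent with those totals.

```python
# Minimal sketch: reproduce the dashboard's headline examiner metrics.
# granted/resolved are from the page; the interview splits are hypothetical.
granted, resolved = 687, 834
allow_rate = granted / resolved
print(f"Career allow rate: {allow_rate:.0%}")           # -> 82%

# Hypothetical split, consistent with the totals above.
granted_iv, resolved_iv = 115, 120           # cases resolved with an interview
granted_no, resolved_no = 572, 714           # cases resolved without one
lift = granted_iv / resolved_iv - granted_no / resolved_no
print(f"Interview lift: {lift:+.1%}")                    # -> about +15.7%

# "With interview" projection as shown on the page: base rate plus lift, capped at 100%.
print(f"With interview: {min(allow_rate + 0.156, 1.0):.0%}")  # -> 98%
```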

Statute-Specific Performance

§101: 7.5% (-32.5% vs TC avg)
§103: 41.0% (+1.0% vs TC avg)
§102: 35.4% (-4.6% vs TC avg)
§112: 6.2% (-33.8% vs TC avg)
Tech Center average estimate shown for comparison • Based on career data from 834 resolved cases
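
The "vs TC avg" deltas are simple differences between the examiner's per-statute share and a Tech Center average. A small sketch below back-solves that average from the printed deltas (it comes out to a flat 40.0% for every statute, which suggests a single TC-wide estimate); the TC figures are inferred, not independently sourced.

```python
# Sketch: statute shares are from the page; TC averages are back-solved
# from the printed deltas (share - delta), not independently sourced.
examiner_share = {"§101": 0.075, "§103": 0.410, "§102": 0.354, "§112": 0.062}
printed_delta  = {"§101": -0.325, "§103": 0.010, "§102": -0.046, "§112": -0.338}

for statute, share in examiner_share.items():
    tc_avg = share - printed_delta[statute]   # 0.400 for every statute
    print(f"{statute}: {share:.1%} ({share - tc_avg:+.1%} vs TC avg; TC {tc_avg:.1%})")
```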

Office Action

§103, §DP
DETAILED ACTION

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claims 1-25 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1-24 of U.S. Patent No. 12,003,673 B2. Although the conflicting claims are not identical, they are not patentably distinct from each other because the claims in the continuation application are broader than those in the patent; the broad claims of the continuation application are rejected over the previously patented narrower claims.
For example, claim 1 of the present invention is the same as claim 1 of the U.S. patent except that it lacks the limitation “determining a closest loudspeaker-equipped audio device that is closest to the microphone location closest to the estimated current location of the person.” Therefore, claim 1 of the present invention is broader than patented claim 1.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Claims 1-25 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent No. 10,229,698 B1 to Chhetri in view of U.S. Pub. No. 2019/0124462 A1 to Lindahl.

Regarding claim 1, Chhetri teaches an audio session management method, comprising: receiving output signals from each microphone of a plurality of microphones in an audio environment, each microphone of the plurality of microphones residing in a microphone location of the audio environment, the output signals including signals corresponding to a current utterance of a person (column 3, lines 41-65; The first wireless loudspeaker 114a outputs first audio z.sub.1(n) 116a and the second wireless loudspeaker 114b outputs second audio z.sub.2(n) 116b in a room 10 (e.g., an environment), and portions of the output sounds are captured by a pair of microphones 118a and 118b as “echo” signals y.sub.1(n) 120a and y.sub.2(n) 120b (e.g., input audio data), which contain some of the reproduced sounds from the reference signals x.sub.1(n) 112a and x.sub.2(n) 112b, in addition to any additional sounds (e.g., speech) picked up by the microphones 118); determining one or more types of audio processing changes to apply to audio data being rendered to loudspeaker feed signals for the two or more audio devices, the audio processing changes having an effect of increasing a speech to echo ratio at one or more microphones of the plurality of microphones, wherein the one or more types of audio processing changes involve spectral modification (column 3, lines 41-65 and column 10, lines 24-44; The first wireless loudspeaker 114a outputs first audio z.sub.1(n) 116a and the second wireless loudspeaker 114b outputs second audio z.sub.2(n) 116b in a room 10 (e.g., an environment), and portions of the output sounds are captured by a pair of microphones 118a and 118b as “echo” signals y.sub.1(n) 120a and y.sub.2(n) 120b (e.g., input audio data), which contain some of the reproduced sounds from the reference signals x.sub.1(n) 112a and x.sub.2(n) 112b, in addition to any additional sounds (e.g., speech) picked up by the microphones 118. The echo signals y(n) 120 may be referred to as input audio data and may include a representation of the audible sound output by the loudspeakers 114 and/or a representation of speech input. In some examples, the echo signals y(n) 120 may be combined to generate combined echo signals y(n) 120 (e.g., combined input audio data)); and causing the one or more types of audio processing changes to be applied (column 3, lines 41-65 and column 5, lines 17-33; playback reference logic 103 may determine a frequency offset between the modified reference signals 112 and the echo signals 120 and may add/drop samples of the modified reference signals and/or the echo signals 120 to compensate for the frequency offset).

However, Chhetri does not teach determining, based on the output signals, one or more aspects of context information relating to the person, the context information including at least one of an estimated current location of the person or an estimated current proximity of the person to one or more microphone locations; selecting two or more audio devices of the audio environment based, at least in part, on the one or more aspects of the context information, the two or more audio devices each including at least one loudspeaker.

Lindahl discloses determining, based on the output signals, one or more aspects of context information relating to the person, the context information including at least one of an estimated current location of the person or an estimated current proximity of the person to one or more microphone locations (paragraphs [0034] and [0070]; decision logic detects a location of a speaker using at least one of a visual sensor or an audio sensor; and generates a microphone beam (from a plurality of microphone signals) in a direction of the location of the speaker to capture the speaker's speech); selecting two or more audio devices of the audio environment based, at least in part, on the one or more aspects of the context information, the two or more audio devices each including at least one loudspeaker (Abstract, paragraphs [0021], [0027], [0028] and [0031]; rendering mode selection is made by decision logic 8. The decision logic 8 may be implemented as a programmed processor, e.g., by sharing the rendering processor 7 or by the programming of a different processor, executing a program that based on certain inputs, makes a decision as to which sound rendering mode to use, for a given piece of sound program content that is being or is to be played back, in accordance with which the rendering processor 7 will drive the loudspeaker drivers 3 (during playback of the piece of sound program content to produce the desired beams). More generally, the selected sound rendering mode can be changed during the playback automatically based on, as explained further below, analysis performed by the decision logic 8).
At the time of the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to modify Chhetri's teaching with a feature of determining, based on the output signals, one or more aspects of context information relating to the person, the context information including at least one of an estimated current location of the person or an estimated current proximity of the person to one or more microphone locations, and selecting two or more audio devices of the audio environment based, at least in part, on the one or more aspects of the context information, the two or more audio devices each including at least one loudspeaker, as taught by Lindahl, in order to create around the listener a sound field whose spatial distribution more closely approximates that of the original recording environment (paragraph [0003], Lindahl).

Regarding claim 2, Chhetri teaches the audio session management method of claim 1, wherein at least one of the audio processing changes for a first audio device is different from an audio processing change for a second audio device (Fig. 1A and column 3, lines 41-65; the first wireless loudspeaker 114a outputs first audio z.sub.1(n) 116a and the second wireless loudspeaker 114b outputs second audio z.sub.2(n) 116b in a room 10 (e.g., an environment), and portions of the output sounds are captured by a pair of microphones 118a and 118b as “echo” signals y.sub.1(n) 120a and y.sub.2(n) 120b (e.g., input audio data), which contain some of the reproduced sounds from the reference signals x.sub.1(n) 112a and x.sub.2(n) 112b, in addition to any additional sounds (e.g., speech) picked up by the microphones 118).

Regarding claim 3, Chhetri teaches the audio session management method of claim 1, wherein the spectral modification involves reducing a level of audio data in a frequency band between 500 Hz and 3 kHz (column 18, lines 28-45; device 102 may determine an existence and/or location associated with the wireless loudspeakers using a frequency range (e.g., 1 kHz to 3 kHz), although the disclosure is not limited thereto. In some examples, the device 102 may determine an existence and location of the wireless loudspeakers using the fixed beamform outputs, may select a portion of the fixed beamform outputs as the target signals and may select a portion of adaptive beamform outputs corresponding to the wireless loudspeakers as the reference signals).

Regarding claim 4, Chhetri teaches the audio session management method of claim 1, wherein the one or more types of audio processing changes cause a reduction in loudspeaker reproduction level for the at least one loudspeaker of the two or more audio devices (column 11, line 65 through column 12, line 10; system 100 to improve steady-state error, reduce a sensitivity to local speech disturbance and improve a convergence rate of the MC-AEC 108a. For example, the step-size value μ may be increased when the error signal 128 increases (e.g., the echo signal 120 and the estimated echo signal 126 diverge) to increase a convergence rate and reduce a convergence period. Similarly, the step-size value μ may be decreased when the error signal 128 decreases (e.g., the echo signal 120 and the estimated echo signal 126 converge) to reduce a rate of change in the transfer functions and therefore more accurately estimate the estimated echo signal 126).
Regarding claim 5, Chhetri teaches the audio session management method of claim 1, wherein selecting two or more audio devices of the audio environment comprises selecting N loudspeaker-equipped audio devices of the audio environment, N being an integer greater than 2 (column 27, lines 12-34; when there is no signal activity in the look direction, the step-size may be increased to achieve a larger value so that weight adaptation continues normally. The step-size may be greater than 0, and may be limited to a maximum value. Thus, the system may be configured to determine when there is an active source (e.g., a speaking user) in the look-direction. The system may perform this determination with a frequency that depends on the adaptation step size).

Regarding claim 6, Chhetri teaches the audio session management method of claim 1, wherein selecting the two or more audio devices of the audio environment is based, at least in part, on an estimated current location of the person relative to at least one of a microphone location or a loudspeaker-equipped audio device location (column 16, lines 53-60; while beamforming alone may increase a signal-to-noise (SNR) ratio of an audio signal, combining known acoustic characteristics of an environment (e.g., a room impulse response (RIR)) and heuristic knowledge of previous beampattern lobe selection may provide an even better indication of a speaking user's likely location within the environment).

Regarding claim 7, Chhetri teaches the audio session management method of claim 1, wherein the one or more types of audio processing changes involve changing a rendering process to warp a rendering of audio signals away from the estimated current location of the person (column 16, lines 53-60; while beamforming alone may increase a signal-to-noise (SNR) ratio of an audio signal, combining known acoustic characteristics of an environment (e.g., a room impulse response (RIR)) and heuristic knowledge of previous beampattern lobe selection may provide an even better indication of a speaking user's likely location within the environment).

Regarding claim 8, Chhetri teaches the audio session management method of claim 1, wherein the one or more types of audio processing changes involve inserting at least one gap into at least one selected frequency band of an audio playback signal (column 2, lines 38-47; loudspeaker may be modified based on compression/decompression during wireless communication, resulting in a different signal being received by the loudspeaker than was sent to the loudspeaker. A third case is non-linear post-processing performed on the received signal by the loudspeaker prior to playing the received signal).

Regarding claim 9, Chhetri teaches the audio session management method of claim 1, wherein the one or more types of audio processing changes involve dynamic range compression (column 2, lines 38-47; loudspeaker may be modified based on compression/decompression during wireless communication, resulting in a different signal being received by the loudspeaker than was sent to the loudspeaker. A third case is non-linear post-processing performed on the received signal by the loudspeaker prior to playing the received signal).
Regarding claim 10, Chhetri teaches the audio session management method of claim 1, wherein selecting the two or more audio devices is based, at least in part, on a signal-to-echo ratio estimation for one or more microphone locations (column 16, lines 5-16; spatial selectivity by using beamforming allows for the rejection or attenuation of undesired signals outside of the beampattern. The increased selectivity of the beampattern improves signal-to-noise ratio for the audio signal. By improving the signal-to-noise ratio, the accuracy of speaker recognition performed on the audio signal is improved).

Regarding claim 11, Chhetri teaches the audio session management method of claim 10, wherein selecting the two or more audio devices is based, at least in part, on determining whether the signal-to-echo ratio estimation is less than or equal to a signal-to-echo ratio threshold (column 16, lines 5-16; spatial selectivity by using beamforming allows for the rejection or attenuation of undesired signals outside of the beampattern. The increased selectivity of the beampattern improves signal-to-noise ratio for the audio signal. By improving the signal-to-noise ratio, the accuracy of speaker recognition performed on the audio signal is improved).

Regarding claim 12, Chhetri teaches the audio session management method of claim 10, wherein determining the one or more types of audio processing changes is based on an optimization of a cost function that is based, at least in part, on the signal-to-echo ratio estimation (column 16, lines 5-16; spatial selectivity by using beamforming allows for the rejection or attenuation of undesired signals outside of the beampattern. The increased selectivity of the beampattern improves signal-to-noise ratio for the audio signal. By improving the signal-to-noise ratio, the accuracy of speaker recognition performed on the audio signal is improved).

Regarding claim 13, Chhetri teaches the audio session management method of claim 12, wherein the cost function is based, at least in part, on rendering performance.

Regarding claim 14, Chhetri teaches the audio session management method of claim 1, wherein selecting the two or more audio devices is based, at least in part, on a proximity estimation (paragraphs [0034] and [0070]; decision logic detects a location of a speaker using at least one of a visual sensor or an audio sensor; and generates a microphone beam (from a plurality of microphone signals) in a direction of the location of the speaker to capture the speaker's speech).

Regarding claim 15, Chhetri teaches the audio session management method of claim 1, further comprising: determining multiple current acoustic features from the output signals of each microphone; applying a classifier to the multiple current acoustic features, wherein applying the classifier involves applying a model trained on previously-determined acoustic features derived from a plurality of previous utterances made by the person in a plurality of user zones in the audio environment; and wherein determining one or more aspects of context information relating to the person involves determining, based at least in part on output from the classifier, an estimate of a user zone in which the person is currently located (paragraphs [0034] and [0070]; decision logic detects a location of a speaker using at least one of a visual sensor or an audio sensor; and generates a microphone beam (from a plurality of microphone signals) in a direction of the location of the speaker to capture the speaker's speech).
Regarding claim 16, Chhetri teaches the audio session management method of claim 15, wherein the estimate of the user zone is determined without reference to geometric locations of the plurality of microphones (paragraphs [0034] and [0070]; decision logic detects a location of a speaker using at least one of a visual sensor or an audio sensor; and generates a microphone beam (from a plurality of microphone signals) in a direction of the location of the speaker to capture the speaker's speech).

Regarding claim 17, Chhetri teaches the audio session management method of claim 15, wherein the current utterance and the previous utterances comprise wakeword utterances (column 12, lines 33-47; the device 102 may use the audio outputs 129 to perform speech recognition processing on the speech to determine a command and may execute the command. For example, the device 102 may determine that the speech corresponds to a command to play music and the device 102 may play music in response to receiving the speech).

Regarding claim 18, Chhetri teaches the audio session management method of claim 1, further comprising selecting at least one microphone according to the one or more aspects of the context information (column 16, lines 53-60; while beamforming alone may increase a signal-to-noise (SNR) ratio of an audio signal, combining known acoustic characteristics of an environment (e.g., a room impulse response (RIR)) and heuristic knowledge of previous beampattern lobe selection may provide an even better indication of a speaking user's likely location within the environment).

Regarding claim 19, Chhetri teaches the audio session management method of claim 1, wherein the one or more microphones reside in multiple audio devices of the audio environment (column 3, lines 41-65; The first wireless loudspeaker 114a outputs first audio z.sub.1(n) 116a and the second wireless loudspeaker 114b outputs second audio z.sub.2(n) 116b in a room 10 (e.g., an environment), and portions of the output sounds are captured by a pair of microphones 118a and 118b as “echo” signals y.sub.1(n) 120a and y.sub.2(n) 120b (e.g., input audio data), which contain some of the reproduced sounds from the reference signals x.sub.1(n) 112a and x.sub.2(n) 112b, in addition to any additional sounds (e.g., speech) picked up by the microphones 118).

Regarding claim 20, Chhetri teaches the audio session management method of claim 1, wherein the one or more microphones reside in a single audio device of the audio environment (column 3, lines 41-65; The first wireless loudspeaker 114a outputs first audio z.sub.1(n) 116a and the second wireless loudspeaker 114b outputs second audio z.sub.2(n) 116b in a room 10 (e.g., an environment), and portions of the output sounds are captured by a pair of microphones 118a and 118b as “echo” signals y.sub.1(n) 120a and y.sub.2(n) 120b (e.g., input audio data), which contain some of the reproduced sounds from the reference signals x.sub.1(n) 112a and x.sub.2(n) 112b, in addition to any additional sounds (e.g., speech) picked up by the microphones 118).
Regarding claim 21, Chhetri teaches the audio session management method of claim 1, wherein at least one of the one or more microphone locations corresponds to multiple microphones of a single audio device (column 3, lines 41-65; The first wireless loudspeaker 114a outputs first audio z.sub.1(n) 116a and the second wireless loudspeaker 114b outputs second audio z.sub.2(n) 116b in a room 10 (e.g., an environment), and portions of the output sounds are captured by a pair of microphones 118a and 118b as “echo” signals y.sub.1(n) 120a and y.sub.2(n) 120b (e.g., input audio data), which contain some of the reproduced sounds from the reference signals x.sub.1(n) 112a and x.sub.2(n) 112b, in addition to any additional sounds (e.g., speech) picked up by the microphones 118).

Regarding claim 22, Chhetri teaches one or more non-transitory media having software stored thereon, the software including instructions for controlling one or more devices to perform an audio session management method, the audio session management method comprising: receiving output signals from each microphone of a plurality of microphones in an audio environment, each microphone of the plurality of microphones residing in a microphone location of the audio environment, the output signals including signals corresponding to a current utterance of a person (column 3, lines 41-65; The first wireless loudspeaker 114a outputs first audio z.sub.1(n) 116a and the second wireless loudspeaker 114b outputs second audio z.sub.2(n) 116b in a room 10 (e.g., an environment), and portions of the output sounds are captured by a pair of microphones 118a and 118b as “echo” signals y.sub.1(n) 120a and y.sub.2(n) 120b (e.g., input audio data), which contain some of the reproduced sounds from the reference signals x.sub.1(n) 112a and x.sub.2(n) 112b, in addition to any additional sounds (e.g., speech) picked up by the microphones 118); determining one or more types of audio processing changes to apply to audio data being rendered to loudspeaker feed signals for the two or more audio devices, the audio processing changes having an effect of increasing a speech to echo ratio at one or more microphones of the plurality of microphones, wherein the one or more types of audio processing changes involve spectral modification (column 3, lines 41-65 and column 10, lines 24-44; The first wireless loudspeaker 114a outputs first audio z.sub.1(n) 116a and the second wireless loudspeaker 114b outputs second audio z.sub.2(n) 116b in a room 10 (e.g., an environment), and portions of the output sounds are captured by a pair of microphones 118a and 118b as “echo” signals y.sub.1(n) 120a and y.sub.2(n) 120b (e.g., input audio data), which contain some of the reproduced sounds from the reference signals x.sub.1(n) 112a and x.sub.2(n) 112b, in addition to any additional sounds (e.g., speech) picked up by the microphones 118. The echo signals y(n) 120 may be referred to as input audio data and may include a representation of the audible sound output by the loudspeakers 114 and/or a representation of speech input. In some examples, the echo signals y(n) 120 may be combined to generate combined echo signals y(n) 120 (e.g., combined input audio data)); and causing the one or more types of audio processing changes to be applied (column 3, lines 41-65 and column 5, lines 17-33; playback reference logic 103 may determine a frequency offset between the modified reference signals 112 and the echo signals 120 and may add/drop samples of the modified reference signals and/or the echo signals 120 to compensate for the frequency offset).

However, Chhetri does not teach determining, based on the output signals, one or more aspects of context information relating to the person, the context information including at least one of an estimated current location of the person or an estimated current proximity of the person to one or more microphone locations; selecting two or more audio devices of the audio environment based, at least in part, on the one or more aspects of the context information, the two or more audio devices each including at least one loudspeaker.

Lindahl discloses determining, based on the output signals, one or more aspects of context information relating to the person, the context information including at least one of an estimated current location of the person or an estimated current proximity of the person to one or more microphone locations (paragraphs [0034] and [0070]; decision logic detects a location of a speaker using at least one of a visual sensor or an audio sensor; and generates a microphone beam (from a plurality of microphone signals) in a direction of the location of the speaker to capture the speaker's speech); selecting two or more audio devices of the audio environment based, at least in part, on the one or more aspects of the context information, the two or more audio devices each including at least one loudspeaker (Abstract, paragraphs [0021], [0027], [0028] and [0031]; rendering mode selection is made by decision logic 8. The decision logic 8 may be implemented as a programmed processor, e.g., by sharing the rendering processor 7 or by the programming of a different processor, executing a program that based on certain inputs, makes a decision as to which sound rendering mode to use, for a given piece of sound program content that is being or is to be played back, in accordance with which the rendering processor 7 will drive the loudspeaker drivers 3 (during playback of the piece of sound program content to produce the desired beams). More generally, the selected sound rendering mode can be changed during the playback automatically based on, as explained further below, analysis performed by the decision logic 8).

At the time of the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to modify Chhetri's teaching with a feature of determining, based on the output signals, one or more aspects of context information relating to the person, the context information including at least one of an estimated current location of the person or an estimated current proximity of the person to one or more microphone locations, and selecting two or more audio devices of the audio environment based, at least in part, on the one or more aspects of the context information, the two or more audio devices each including at least one loudspeaker, as taught by Lindahl, in order to create around the listener a sound field whose spatial distribution more closely approximates that of the original recording environment (paragraph [0003], Lindahl).
Regarding claim 23, Chhetri teaches the one or more non-transitory media of claim 22, wherein at least one of the audio processing changes for a first audio device is different from an audio processing change for a second audio device (Fig. 1A and column 3, lines 41-65; the first wireless loudspeaker 114a outputs first audio z.sub.1(n) 116a and the second wireless loudspeaker 114b outputs second audio z.sub.2(n) 116b in a room 10 (e.g., an environment), and portions of the output sounds are captured by a pair of microphones 118a and 118b as “echo” signals y.sub.1(n) 120a and y.sub.2(n) 120b (e.g., input audio data), which contain some of the reproduced sounds from the reference signals x.sub.1(n) 112a and x.sub.2(n) 112b, in addition to any additional sounds (e.g., speech) picked up by the microphones 118).

Regarding claim 24, Chhetri teaches an apparatus, comprising: an interface system; and a control system configured to: receive, via the interface system, output signals from each microphone of a plurality of microphones in an audio environment, each microphone of the plurality of microphones residing in a microphone location of the audio environment, the output signals including signals corresponding to a current utterance of a person (column 3, lines 41-65; The first wireless loudspeaker 114a outputs first audio z.sub.1(n) 116a and the second wireless loudspeaker 114b outputs second audio z.sub.2(n) 116b in a room 10 (e.g., an environment), and portions of the output sounds are captured by a pair of microphones 118a and 118b as “echo” signals y.sub.1(n) 120a and y.sub.2(n) 120b (e.g., input audio data), which contain some of the reproduced sounds from the reference signals x.sub.1(n) 112a and x.sub.2(n) 112b, in addition to any additional sounds (e.g., speech) picked up by the microphones 118); determine one or more types of audio processing changes to apply to audio data being rendered to loudspeaker feed signals for the two or more audio devices, the audio processing changes having an effect of increasing a speech to echo ratio at one or more microphones of the plurality of microphones, wherein the one or more types of audio processing changes involve spectral modification (column 3, lines 41-65 and column 10, lines 24-44; The first wireless loudspeaker 114a outputs first audio z.sub.1(n) 116a and the second wireless loudspeaker 114b outputs second audio z.sub.2(n) 116b in a room 10 (e.g., an environment), and portions of the output sounds are captured by a pair of microphones 118a and 118b as “echo” signals y.sub.1(n) 120a and y.sub.2(n) 120b (e.g., input audio data), which contain some of the reproduced sounds from the reference signals x.sub.1(n) 112a and x.sub.2(n) 112b, in addition to any additional sounds (e.g., speech) picked up by the microphones 118. The echo signals y(n) 120 may be referred to as input audio data and may include a representation of the audible sound output by the loudspeakers 114 and/or a representation of speech input. In some examples, the echo signals y(n) 120 may be combined to generate combined echo signals y(n) 120 (e.g., combined input audio data)); and cause the one or more types of audio processing changes to be applied (column 3, lines 41-65 and column 5, lines 17-33; playback reference logic 103 may determine a frequency offset between the modified reference signals 112 and the echo signals 120 and may add/drop samples of the modified reference signals and/or the echo signals 120 to compensate for the frequency offset).
However, Chhetri does not teach determine, based on the output signals, one or more aspects of context information relating to the person, the context information including at least one of an estimated current location of the person or an estimated current proximity of the person to one or more microphone locations; select two or more audio devices of the audio environment based, at least in part, on the one or more aspects of the context information, the two or more audio devices each including at least one loudspeaker.

Lindahl discloses determine, based on the output signals, one or more aspects of context information relating to the person, the context information including at least one of an estimated current location of the person or an estimated current proximity of the person to one or more microphone locations (paragraphs [0034] and [0070]; decision logic detects a location of a speaker using at least one of a visual sensor or an audio sensor; and generates a microphone beam (from a plurality of microphone signals) in a direction of the location of the speaker to capture the speaker's speech); select two or more audio devices of the audio environment based, at least in part, on the one or more aspects of the context information, the two or more audio devices each including at least one loudspeaker (Abstract, paragraphs [0021], [0027], [0028] and [0031]; rendering mode selection is made by decision logic 8. The decision logic 8 may be implemented as a programmed processor, e.g., by sharing the rendering processor 7 or by the programming of a different processor, executing a program that based on certain inputs, makes a decision as to which sound rendering mode to use, for a given piece of sound program content that is being or is to be played back, in accordance with which the rendering processor 7 will drive the loudspeaker drivers 3 (during playback of the piece of sound program content to produce the desired beams). More generally, the selected sound rendering mode can be changed during the playback automatically based on, as explained further below, analysis performed by the decision logic 8).

At the time of the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to modify Chhetri's teaching with a feature of determining, based on the output signals, one or more aspects of context information relating to the person, the context information including at least one of an estimated current location of the person or an estimated current proximity of the person to one or more microphone locations, and selecting two or more audio devices of the audio environment based, at least in part, on the one or more aspects of the context information, the two or more audio devices each including at least one loudspeaker, as taught by Lindahl, in order to create around the listener a sound field whose spatial distribution more closely approximates that of the original recording environment (paragraph [0003], Lindahl).

Regarding claim 25, Chhetri teaches the apparatus of claim 24, wherein at least one of the audio processing changes for a first audio device is different from an audio processing change for a second audio device (Fig. 1A and column 3, lines 41-65; the first wireless loudspeaker 114a outputs first audio z.sub.1(n) 116a and the second wireless loudspeaker 114b outputs second audio z.sub.2(n) 116b in a room 10 (e.g., an environment), and portions of the output sounds are captured by a pair of microphones 118a and 118b as “echo” signals y.sub.1(n) 120a and y.sub.2(n) 120b (e.g., input audio data), which contain some of the reproduced sounds from the reference signals x.sub.1(n) 112a and x.sub.2(n) 112b, in addition to any additional sounds (e.g., speech) picked up by the microphones 118).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to AKELAW A TESHALE whose telephone number is (571) 270-5302. The examiner can normally be reached 9 am-6 pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, FAN TSANG, can be reached at (571) 272-7547. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

AKELAW TESHALE
Primary Examiner
Art Unit 2694

/AKELAW TESHALE/
Primary Examiner, Art Unit 2694
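
For orientation on the technology being mapped above: the independent claims turn on raising the speech-to-echo ratio (SER) at a microphone by spectrally modifying the playback feeds. The sketch below is a generic illustration of that idea under simplified assumptions (single-tone stand-in signals and a flat 10 dB duck in place of the claimed band-limited modification); it is not the applicant's, Chhetri's, or Lindahl's implementation.

```python
import numpy as np

# Toy signals: a talker's "speech" and loudspeaker echo landing on one mic.
fs = 16000
t = np.arange(fs) / fs
speech = 0.1 * np.sin(2 * np.pi * 1000 * t)        # stand-in speech
echo   = 0.5 * np.sin(2 * np.pi * 800 * t + 0.7)   # stand-in playback echo

def ser_db(s, e):
    """Speech-to-echo ratio in dB: ratio of mean signal powers."""
    return 10 * np.log10(np.mean(s ** 2) / np.mean(e ** 2))

# Spectral modification, reduced here to a broadband duck: attenuate the
# playback feed by 10 dB before rendering. A real system would confine the
# attenuation to a speech-dominant band (e.g., 500 Hz-3 kHz per claim 3).
duck_db = 10.0
echo_ducked = echo * 10 ** (-duck_db / 20)

print(f"SER before duck: {ser_db(speech, echo):.1f} dB")
print(f"SER after duck:  {ser_db(speech, echo_ducked):.1f} dB")  # +10 dB
```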

Prosecution Timeline

Apr 17, 2024
Application Filed
Feb 12, 2026
Non-Final Rejection — §103, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12598261
WIDEBAND DOUBLETALK DETECTION FOR OPTIMIZATION OF ACOUSTIC ECHO CANCELLATION
Granted Apr 07, 2026 • 2y 5m to grant
Patent 12598253
SYSTEMS AND METHODS FOR MEDIA ANALYSIS FOR CALL STATE DETECTION
Granted Apr 07, 2026 • 2y 5m to grant
Patent 12589700
HOLDING APPARATUS AND METHOD FOR HOLDING A MOBILE DEVICE
Granted Mar 31, 2026 • 2y 5m to grant
Patent 12574665
DATA PROCESSING METHOD, OUTDOOR UNIT, INDOOR UNIT AND COMPUTER-READABLE STORAGE MEDIUM
Granted Mar 10, 2026 • 2y 5m to grant
Patent 12563346
FLEXIBLE ELECTRONIC DEVICE AND METHOD FOR ADJUSTING SOUND OUTPUT THEREOF
Granted Feb 24, 2026 • 2y 5m to grant
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 82%
With Interview: 98% (+15.6%)
Median Time to Grant: 2y 11m
PTA Risk: Low

Based on 834 resolved cases by this examiner. Grant probability derived from career allow rate.
