Prosecution Insights
Last updated: April 19, 2026
Application No. 18/719,004

METHOD OF OPERATING AN AUDIO DEVICE SYSTEM AND AN AUDIO DEVICE SYSTEM

Non-Final OA (§103)

Filed: Jun 12, 2024
Examiner: GODBOLD, DOUGLAS
Art Unit: 2655
Tech Center: 2600 — Communications
Assignee: Widex A/S
OA Round: 1 (Non-Final)
Grant Probability: 83% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 10m
With Interview: 94%

Examiner Intelligence

Career Allow Rate: 83% (898 granted of 1,079 resolved; +21.2% vs TC average) — grants above average
Interview Lift: +10.5% (a moderate lift, measured across resolved cases with an interview)
Typical Timeline: 2y 10m average prosecution; 25 applications currently pending
Career History: 1,104 total applications across all art units

Statute-Specific Performance

§101: 15.0% (-25.0% vs TC avg)
§102: 19.6% (-20.4% vs TC avg)
§103: 46.3% (+6.3% vs TC avg)
§112: 8.6% (-31.4% vs TC avg)

Tech Center averages are estimates. Based on career data from 1,079 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This Office Action is in response to correspondence filed 12 June 2024 in reference to application 18/718004. Claims 1-12 are pending and have been examined.

Specification

The abstract of the disclosure is objected to because it is less than 50 words and fails to describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details. A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. See MPEP § 608.01(b).

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “a sound source signal separator,” and “a speech content comparator” in claim 11.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Aratsu et al. (US PAP 2014/0172426) in view of Pedersen et al. (US PAP 2022/0295191).

Consider claim 1, Aratsu teaches a method of operating an audio device system (abstract) comprising the steps of: a) providing a plurality of sound source signals each from a sound source of a present sound environment (0099, collecting sound signals from sources within environment); b) comparing the speech content of each of said plurality of sound source signals (0099, performing comparisons such as speech verification to detect speech signals); c) detecting, based on said comparison, at least one conversation signal (0099-0100, 0102, grouping speech signals by speaker, i.e. creating a conversation signal); d) enabling a user of the audio device to select a detected conversation signal (0107-0110, allowing users to select icons associated with each speech conversation signal); and e) providing an audio output, wherein the contribution to the audio output from the sound source signals not comprised in the selected conversation signal is suppressed compared to the contribution from the selected conversation signal (0107-110, users may select signals to mute or reduce, and select signals to enhance; also see figure 2c and 0114-15).

Aratsu does not specifically teach b) comparing the speech content of each of said plurality of sound source signals with at least one of the other of said plurality of sound source signals; or c) detecting, based on said comparison, at least one conversation signal comprising at least two sound source signals representing speakers participating in the same conversation. In the same field of selective speech enhancement, Pedersen teaches b) comparing the speech content of each of said plurality of sound source signals with at least one of the other of said plurality of sound source signals (0052-59, 0242-45, comparing start and stop times of speakers); and c) detecting, based on said comparison, at least one conversation signal comprising at least two sound source signals representing speakers participating in the same conversation (0052-59, 0242-45, comparing start and stop times of speakers to determine if they are in a conversation, combining signals based on determination). It would have been obvious to one of ordinary skill in the art at the time of effective filing to compare speech signals to determine distinct conversations as taught by Pedersen in the system of Aratsu in order for the system to more efficiently allow selection of relevant conversation data (Pedersen 0004-07).

Consider claim 2, Aratsu and Pedersen teach the method according to claim 1. Pedersen further teaches wherein the step of providing a plurality of sound source signals each from a sound source of a present sound environment comprises the further steps of: - using an encoder-decoder neural network that has been obtained by feeding a mixed audio signal comprising a plurality of speech signals and a plurality of noise signals to the neural network and subsequently train the neural network to provide only said plurality of speech signals (OPTIONAL LIMITATION); or - using a plurality of beam formers each adapted to point in a desired direction different from the other beam formers (Figures 3A-3B, 0257-64, using microphone arrays to detect areas with speech and isolating sources using beamforming techniques, i.e. spatial filtering). It would have been obvious to one of ordinary skill in the art at the time of effective filing to use the beamforming as taught by Pedersen in the system of Aratsu and Pedersen in order to allow for better separation of multiple sound sources.

Consider claim 3, Pedersen teaches the method according to claim 2, wherein the step of using a plurality of beam formers each adapted to point in a desired direction different from the other beam formers comprises the further step of: - determining that a beam former is pointing in a desired direction if speech is detected in the beam former output signal (0260, using VAD to determine if speech is in a location).

Consider claim 4, Aratsu teaches the method according to claim 1, wherein the step of enabling a user of the audio device to select a detected conversation signal is carried out by: - providing an audio output based on a first out of said at least one conversation signals (0107-110, signals may be muted or reduced, and select signals are enhanced; also see figure 2c and 0114-15); and - enabling the user to select a conversation signal by toggling between detected conversation signals in response to carrying out a predetermined interaction with the audio device system (0107-110, users may select signals to mute or reduce, and select signals to enhance; also see figure 2c and 0114-15, and selections may be altered by user).

Consider claim 5, Aratsu teaches the method according to claim 4, wherein the predetermined interaction is selected from at least one of: making a specific head movement, tapping an audio device of the audio device system, operating an audio device control means, speaking a control word and operating a graphical user interface of the audio device system (0107-110, users may select signals to mute or reduce, and select signals to enhance via GUI; see figures 2A-2C for example).
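The beamforming-plus-VAD arrangement the rejection maps to claims 2-3 can be pictured with a short sketch. This is a minimal illustration, not code from Aratsu or Pedersen: the 4-microphone geometry, the energy-based VAD stand-in, and every name and threshold below are assumptions.

```python
# Sketch of the claim 2/3 idea: steer several beamformers in different
# directions and keep only the directions whose output contains speech.
# Illustrative only -- the array geometry, the toy energy VAD, and all
# names/thresholds are assumptions, not taken from the cited references.
import numpy as np

FS = 16_000                                   # sample rate (Hz), assumed
C = 343.0                                     # speed of sound (m/s)
MIC_X = np.array([0.00, 0.02, 0.04, 0.06])    # linear 4-mic array (m), assumed

def delay_and_sum(frames: np.ndarray, angle_deg: float) -> np.ndarray:
    """Delay-and-sum beamformer steered toward angle_deg.
    frames has shape (n_mics, n_samples)."""
    delays = MIC_X * np.sin(np.deg2rad(angle_deg)) / C   # per-mic delay (s)
    shifts = np.round(delays * FS).astype(int)           # delay in samples
    aligned = [np.roll(ch, -s) for ch, s in zip(frames, shifts)]
    return np.mean(aligned, axis=0)

def crude_vad(signal: np.ndarray, threshold: float = 1e-3) -> bool:
    """Toy energy-based voice-activity check; a real system would use a
    trained VAD (cf. the rejection's cite to Pedersen para. 0260)."""
    return float(np.mean(signal ** 2)) > threshold

def speech_directions(frames: np.ndarray, angles=range(-90, 91, 15)):
    """Claim 3's test: a beamformer 'points in a desired direction' when
    speech is detected in its output. Returns {angle: beam_output}."""
    hits = {}
    for a in angles:
        out = delay_and_sum(frames, a)
        if crude_vad(out):
            hits[a] = out
    return hits
```

Each retained beam output would then serve as one of the "sound source signals" that the later claims compare.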
Consider claim 6, Pedersen teaches the method according to claim 1, wherein said step of comparing the speech content of each of said plurality of sound source signals with at least one of the other of said plurality of sound source signals comprises at least one of: i) assigning a numerical representation to at least some of the words comprised in each of said plurality of provided sound source signals and providing a word embedding similarity measure for estimating the similarity between each of said plurality of provided sound source signals (OPTIONAL LIMITATION); and ii) determining the timing of speech endings and speech onsets for each of said plurality of provided sound source signals (0052-59, 0242-45, comparing start and stop times of speakers to determine if they are in a conversation) and subsequently match sound source signals for which speech onset for one speech signal is within a predetermined duration after speech ending for another sound source signal (0052-59, 0242-45, comparing start and stop times of speakers that do not overlap to determine if they are in a conversation); and iii) assigning a numerical representation to at least one of syntactic and semantic information comprised in each of said plurality of provided sound source signals and providing at least one of a syntactic and a semantic similarity measure in order to estimate the similarity between each of said plurality of provided sound source signals (OPTIONAL LIMITATION).

Consider claim 7, Pedersen teaches the method according to claim 1, wherein the step of detecting, based on said comparison, at least one conversation signal comprising at least two sound source signals representing speakers participating in the same conversation comprises at least one of the steps of: i) detecting at least one conversation signal comprising at least two sound source signals having a word embedding similarity measure score that is above a first predetermined threshold (OPTIONAL LIMITATION); ii) detecting at least one conversation signal comprising at least two sound source signals for which one of said sound source signals has a speech onset within a predetermined duration after a speech ending of another of said sound source signals (0052-59, 0242-45, comparing start and stop times of speakers that do not overlap to determine if they are in a conversation); iii) detecting at least one conversation signal comprising at least two sound source signals having a semantic similarity measure score or a syntactic similarity measure score that is above a second or a third predetermined threshold (OPTIONAL LIMITATION); and iv) detecting at least one conversation signal comprising at least two sound source signals having a combined score that is above a fourth predetermined threshold, wherein the combined score is obtained by combining at least two of: the word embedding similarity measure score, the semantic similarity measure score, the syntactic similarity measure score, a sound pressure level score reflecting the strength of said at least two sound source signals and a previous participant score reflecting how often the speakers representing said at least two sound source signals have previously participated in a conversation with each other (OPTIONAL LIMITATION).
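The timing heuristic the rejection leans on for claims 6(ii) and 7(ii), treating two sources as one conversation when one source's onset falls shortly after the other's ending, reduces to a small grouping routine. A minimal sketch under assumed names and an assumed 1-second window (the claims leave the "predetermined duration" open):

```python
# Sketch of the onset/ending matching in claims 6(ii)/7(ii): two sources are
# grouped into one conversation when one source's speech onset falls within
# a predetermined window after the other's speech ending. The segment
# format, the 1.0 s window, and all names are assumptions for illustration.
from itertools import combinations

MAX_GAP_S = 1.0   # the claims' "predetermined duration", value assumed

def turns_interleave(segs_a, segs_b, max_gap=MAX_GAP_S):
    """True if some onset in one source follows an ending in the other
    within max_gap seconds. Segments are (onset, ending) tuples in seconds."""
    def onsets_follow(segs_x, segs_y):
        return any(0.0 <= on_x - end_y <= max_gap
                   for on_x, _ in segs_x for _, end_y in segs_y)
    return onsets_follow(segs_a, segs_b) or onsets_follow(segs_b, segs_a)

def group_conversations(sources):
    """Greedily merge sources whose turns interleave.
    sources maps a speaker id to a list of (onset, ending) segments."""
    groups = [{sid} for sid in sources]
    for a, b in combinations(sources, 2):
        if turns_interleave(sources[a], sources[b]):
            ga = next(g for g in groups if a in g)
            gb = next(g for g in groups if b in g)
            if ga is not gb:
                ga |= gb
                groups.remove(gb)
    return groups

# Speakers 1 and 2 alternate turns; speaker 3 talks over both, independently.
talk = {1: [(0.0, 2.0), (4.5, 6.0)],
        2: [(2.3, 4.2)],
        3: [(0.5, 5.0)]}
print(group_conversations(talk))   # -> [{1, 2}, {3}]
```

Claim 7(iv)'s combined score would simply weight this timing cue together with the embedding, semantic, level, and prior-participant scores before thresholding.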
Consider claim 8, Aratsu teaches the method according to claim 1, wherein the step of providing an audio output based on a selected conversation signal, wherein the contribution to the audio output from the sound source signals not comprised in the selected conversation signal is suppressed compared to the contribution from the conversation signal, comprises at least one of the steps of: - suppressing the contribution to the audio output from the sound source signals not comprised in the selected conversation signal such that the combined level is in the range between 3 and 24 dB or between 6 and 18 dB below the selected conversation signal level (OPTIONAL LIMITATION); - enabling the user to control the ratio between the conversation signal level and the combined level of the sound source signals not comprised in the selected conversation signal (0107-110, users may select signals to mute or reduce, and select signals to enhance; also see figure 2c and 0114-15, and selections may be altered by user, and thus may effectively control the ratio between enhanced and reduced signals).
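Claim 8's output stage, in which everything outside the selected conversation is held 6-18 dB (or 3-24 dB) below it, is ordinary gain staging. A minimal sketch; the 12 dB default and all names are assumptions rather than anything from the application or the cited references:

```python
# Sketch of claim 8's mixing step: the selected conversation passes through
# while all other sources are attenuated so they sit a fixed number of dB
# below it. The 12 dB default (inside the claimed 6-18 dB band) and the
# function/variable names are assumptions for illustration.
import numpy as np

def mix_output(selected: np.ndarray, others: list,
               suppression_db: float = 12.0) -> np.ndarray:
    """Sum all sources, scaling the non-selected ones down by suppression_db."""
    gain = 10.0 ** (-suppression_db / 20.0)   # dB attenuation -> linear amplitude
    out = selected.astype(float)
    for src in others:
        out = out + gain * np.asarray(src, dtype=float)
    return out

# Toy usage: a tone stands in for the chosen conversation, noise for the rest.
fs = 16_000
t = np.arange(fs) / fs
conversation = np.sin(2 * np.pi * 220 * t)
babble = 0.5 * np.random.randn(fs)
output = mix_output(conversation, [babble], suppression_db=12.0)
```

The claim's alternative, letting the user control the ratio, would amount to exposing suppression_db through the user interface recited in claim 11.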
Consider claim 9, Aratsu and Pedersen teach the method according to claim 1. Pedersen further teaches wherein the steps d) and c) are only carried out if an estimate of the sound quality of the provided plurality of sound source signals is above a predetermined fifth threshold (0021-27, voice detection used, which compares probabilities to voice/no-voice decision thresholds; also see 0260). It would have been obvious to one of ordinary skill in the art at the time of effective filing to use VAD as taught by Pedersen in the system of Aratsu and Pedersen in order to allow for better isolation of speech from noise.

Consider claim 10, Aratsu and Pedersen teach the method according to claim 1. Pedersen further teaches the further step of processing the plurality of sound source signals in order to compensate a hearing loss (0224-27, hearing aid system which compensates hearing loss). It would have been obvious to one of ordinary skill in the art at the time of effective filing to compensate for hearing loss as taught by Pedersen in the system of Aratsu and Pedersen in order to allow the system to be adapted to a user's particular hearing needs (Pedersen 0226).

Consider claim 11, Aratsu teaches an audio device system (abstract) comprising at least one audio device, wherein said at least one audio device comprises an acoustical-electrical input transducer block (0099, microphone) and an electrical-acoustical output transducer (0114, headphones), and wherein said audio device system further comprises: - a sound source signal separator adapted to receive an input signal from said acoustical-electrical input transducer block and to provide a plurality of sound source signals each representing a sound source of a present sound environment (0099, collecting sound signals from sources within environment); - a speech content comparator adapted to compare the speech content of each of said plurality of sound source signals (0099, performing comparisons such as speech verification to detect speech signals), and adapted to detect, based on said comparison, at least one conversation signal (0099-0100, 0102, grouping speech signals by speaker, i.e. creating a conversation signal); - a user interface (405) adapted to enable a user to select a detected conversation signal (0107-0110, allowing users to select icons associated with each speech conversation signal); - a digital signal processor adapted to process and combine the provided plurality of sound source signals in order to provide an output signal, wherein the contribution to the output signal from the sound source signals not comprised in the selected conversation signal is suppressed compared to the contribution from the conversation signal (0107-110, users may select signals to mute or reduce, and select signals to enhance; also see figure 2c and 0114-15); and - an electrical-acoustical output transducer (406) configured to receive the output signal and provide an audio output (0114, headphones).

Aratsu does not specifically teach comparing the speech content of each of said plurality of sound source signals with at least one of the other of said plurality of sound source signals, or detecting, based on said comparison, at least one conversation signal comprising at least two sound source signals representing speakers participating in the same conversation. In the same field of selective speech enhancement, Pedersen teaches comparing the speech content of each of said plurality of sound source signals with at least one of the other of said plurality of sound source signals (0052-59, 0242-45, comparing start and stop times of speakers) and detecting, based on said comparison, at least one conversation signal comprising at least two sound source signals representing speakers participating in the same conversation (0052-59, 0242-45, comparing start and stop times of speakers to determine if they are in a conversation, combining signals based on determination). It would have been obvious to one of ordinary skill in the art at the time of effective filing to compare speech signals to determine distinct conversations as taught by Pedersen in the system of Aratsu in order for the system to more efficiently allow selection of relevant conversation data (Pedersen 0004-07).

Claim 12 contains similar limitations as claim 10 and is therefore rejected for the same reasons.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Linton et al. (US Patent 11,257,510) and Sabin et al. (US PAP 2020/0128322) both teach conversation isolation systems.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DOUGLAS C GODBOLD whose telephone number is (571)270-1451. The examiner can normally be reached 6:30am-5pm Monday-Thursday. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DOUGLAS GODBOLD
Examiner, Art Unit 2655
/DOUGLAS GODBOLD/
Primary Examiner, Art Unit 2655

Prosecution Timeline

Jun 12, 2024 — Application Filed
Jan 14, 2026 — Non-Final Rejection, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585879 — ARTIFICIAL INTELLIGENCE ASSISTED NETWORK OPERATIONS REPORTING AND MANAGEMENT (granted Mar 24, 2026; 2y 5m to grant)
Patent 12579371 — USING MACHINE LEARNING TO GENERATE SEGMENTS FROM UNSTRUCTURED TEXT AND IDENTIFY SENTIMENTS FOR EACH SEGMENT (granted Mar 17, 2026; 2y 5m to grant)
Patent 12579372 — KEY PHRASE TOPIC ASSIGNMENT (granted Mar 17, 2026; 2y 5m to grant)
Patent 12579383 — VERIFYING TRANSLATIONS OF SOURCE TEXT IN A SOURCE LANGUAGE TO TARGET TEXT IN A TARGET LANGUAGE (granted Mar 17, 2026; 2y 5m to grant)
Patent 12572749 — COMPRESSING INFORMATION PROVIDED TO A MACHINE-TRAINED MODEL USING ABSTRACT TOKENS (granted Mar 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 83%
With Interview: 94% (the base rate plus the +10.5-point interview lift: 83% + 10.5 ≈ 94%)
Median Time to Grant: 2y 10m
PTA Risk: Low

Based on 1,079 resolved cases by this examiner. Grant probability is derived from the career allow rate (898 granted of 1,079 resolved ≈ 83%).
