Prosecution Insights
Last updated: April 19, 2026
Application No. 18/719,004

METHOD OF OPERATING AN AUDIO DEVICE SYSTEM AND AN AUDIO DEVICE SYSTEM

Non-Final OA (§103)

Filed: Jun 12, 2024
Examiner: GODBOLD, DOUGLAS
Art Unit: 2655
Tech Center: 2600 — Communications
Assignee: Widex A/S
OA Round: 1 (Non-Final)
Grant Probability: 83% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 10m
With Interview: 94%

Examiner Intelligence

Career Allow Rate: 83% (898 granted of 1,079 resolved; +21.2% vs TC average) — grants above average
Interview Lift: +10.5% (a moderate lift, measured across resolved cases with an interview)
Typical Timeline: 2y 10m average prosecution; 25 applications currently pending
Career History: 1,104 total applications across all art units

Statute-Specific Performance

§101: 15.0% (-25.0% vs TC avg)
§102: 19.6% (-20.4% vs TC avg)
§103: 46.3% (+6.3% vs TC avg)
§112: 8.6% (-31.4% vs TC avg)

Tech Center averages are estimates. Based on career data from 1,079 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This Office Action is in response to correspondence filed 12 June 2024 in reference to application 18/718004. Claims 1-12 are pending and have been examined.

Specification

The abstract of the disclosure is objected to because it is less than 50 words and fails to describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details. A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. See MPEP § 608.01(b).

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “a sound source signal separator,” and “a speech content comparator” in claim 11.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Aratsu et al. (US PAP 2014/0172426) in view of Pedersen et al. (US PAP 2022/0295191).

Consider claim 1, Aratsu teaches a method of operating an audio device system (abstract) comprising the steps of: a) providing a plurality of sound source signals each from a sound source of a present sound environment (0099, collecting sound signals from sources within environment); b) comparing the speech content of each of said plurality of sound source signals (0099, performing comparisons such as speech verification to detect speech signals); c) detecting, based on said comparison, at least one conversation signal (0099-0100, 0102, grouping speech signals by speaker, i.e. creating a conversation signal); d) enabling a user of the audio device to select a detected conversation signal (0107-0110, allowing users to select icons associated with each speech conversation signal); and e) providing an audio output, wherein the contribution to the audio output from the sound source signals not comprised in the selected conversation signal is suppressed compared to the contribution from the selected conversation signal (0107-110, users may select signals to mute or reduce, and select signals to enhance; also see figure 2c and 0114-15).

Aratsu does not specifically teach b) comparing the speech content of each of said plurality of sound source signals with at least one of the other of said plurality of sound source signals; or c) detecting, based on said comparison, at least one conversation signal comprising at least two sound source signals representing speakers participating in the same conversation. In the same field of selective speech enhancement, Pedersen teaches b) comparing the speech content of each of said plurality of sound source signals with at least one of the other of said plurality of sound source signals (0052-59, 0242-45, comparing start and stop times of speakers); and c) detecting, based on said comparison, at least one conversation signal comprising at least two sound source signals representing speakers participating in the same conversation (0052-59, 0242-45, comparing start and stop times of speakers to determine if they are in a conversation, combining signals based on determination). It would have been obvious to one of ordinary skill in the art at the time of effective filing to compare speech signals to determine distinct conversations as taught by Pedersen in the system of Aratsu in order for the system to more efficiently allow selection of relevant conversation data (Pedersen 0004-07).

Consider claim 2, Aratsu and Pedersen teach the method according to claim 1. Pedersen further teaches wherein the step of providing a plurality of sound source signals each from a sound source of a present sound environment comprises the further steps of: - using an encoder-decoder neural network that has been obtained by feeding a mixed audio signal comprising a plurality of speech signals and a plurality of noise signals to the neural network and subsequently train the neural network to provide only said plurality of speech signals (OPTIONAL LIMITATION); or - using a plurality of beam formers each adapted to point in a desired direction different from the other beam formers (Figures 3A-3B, 0257-64, using microphone arrays to detect areas with speech and isolating sources using beamforming techniques, i.e. spatial filtering). It would have been obvious to one of ordinary skill in the art at the time of effective filing to use the beamforming as taught by Pedersen in the system of Aratsu and Pedersen in order to allow for better separation of multiple sound sources.

Consider claim 3, Pedersen teaches the method according to claim 2, wherein the step of using a plurality of beam formers each adapted to point in a desired direction different from the other beam formers comprises the further step of: - determining that a beam former is pointing in a desired direction if speech is detected in the beam former output signal (0260, using VAD to determine if speech is in a location).

Consider claim 4, Aratsu teaches the method according to claim 1, wherein the step of enabling a user of the audio device to select a detected conversation signal is carried out by: - providing an audio output based on a first out of said at least one conversation signals (0107-110, signals may be muted or reduced, and select signals are enhanced; also see figure 2c and 0114-15); and - enabling the user to select a conversation signal by toggling between detected conversation signals in response to carrying out a predetermined interaction with the audio device system (0107-110, users may select signals to mute or reduce, and select signals to enhance; also see figure 2c and 0114-15, and selections may be altered by user).

Consider claim 5, Aratsu teaches the method according to claim 4, wherein the predetermined interaction is selected from at least one of: making a specific head movement, tapping an audio device of the audio device system, operating an audio device control means, speaking a control word and operating a graphical user interface of the audio device system (0107-110, users may select signals to mute or reduce, and select signals to enhance via GUI; see figures 2A-2C for example).
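The beamforming-plus-VAD arrangement the rejection maps to claims 2-3 can be pictured with a short sketch. This is a minimal illustration, not code from Aratsu or Pedersen: the 4-microphone geometry, the energy-based VAD stand-in, and every name and threshold below are assumptions.

```python
# Sketch of the claim 2/3 idea: steer several beamformers in different
# directions and keep only the directions whose output contains speech.
# Illustrative only -- the array geometry, the toy energy VAD, and all
# names/thresholds are assumptions, not taken from the cited references.
import numpy as np

FS = 16_000                                   # sample rate (Hz), assumed
C = 343.0                                     # speed of sound (m/s)
MIC_X = np.array([0.00, 0.02, 0.04, 0.06])    # linear 4-mic array (m), assumed

def delay_and_sum(frames: np.ndarray, angle_deg: float) -> np.ndarray:
    """Delay-and-sum beamformer steered toward angle_deg.
    frames has shape (n_mics, n_samples)."""
    delays = MIC_X * np.sin(np.deg2rad(angle_deg)) / C   # per-mic delay (s)
    shifts = np.round(delays * FS).astype(int)           # delay in samples
    aligned = [np.roll(ch, -s) for ch, s in zip(frames, shifts)]
    return np.mean(aligned, axis=0)

def crude_vad(signal: np.ndarray, threshold: float = 1e-3) -> bool:
    """Toy energy-based voice-activity check; a real system would use a
    trained VAD (cf. the rejection's cite to Pedersen para. 0260)."""
    return float(np.mean(signal ** 2)) > threshold

def speech_directions(frames: np.ndarray, angles=range(-90, 91, 15)):
    """Claim 3's test: a beamformer 'points in a desired direction' when
    speech is detected in its output. Returns {angle: beam_output}."""
    hits = {}
    for a in angles:
        out = delay_and_sum(frames, a)
        if crude_vad(out):
            hits[a] = out
    return hits
```

Each retained beam output would then serve as one of the "sound source signals" that the later claims compare.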
Consider claim 6, Pedersen teaches the method according to claim 1, wherein said step of comparing the speech content of each of said plurality of sound source signals with at least one of the other of said plurality of sound source signals comprises at least one of: i) assigning a numerical representation to at least some of the words comprised in each of said plurality of provided sound source signals and providing a word embedding similarity measure for estimating the similarity between each of said plurality of provided sound source signals (OPTIONAL LIMITATION); and ii) determining the timing of speech endings and speech onsets for each of said plurality of provided sound source signals (0052-59, 0242-45, comparing start and stop times of speakers to determine if they are in a conversation) and subsequently match sound source signals for which speech onset for one speech signal is within a predetermined duration after speech ending for another sound source signal (0052-59, 0242-45, comparing start and stop times of speakers that do not overlap to determine if they are in a conversation); and iii) assigning a numerical representation to at least one of syntactic and semantic information comprised in each of said plurality of provided sound source signals and providing at least one of a syntactic and a semantic similarity measure in order to estimate the similarity between each of said plurality of provided sound source signals (OPTIONAL LIMITATION).

Consider claim 7, Pedersen teaches the method according to claim 1, wherein the step of detecting, based on said comparison, at least one conversation signal comprising at least two sound source signals representing speakers participating in the same conversation comprises at least one of the steps of: i) detecting at least one conversation signal comprising at least two sound source signals having a word embedding similarity measure score that is above a first predetermined threshold (OPTIONAL LIMITATION); ii) detecting at least one conversation signal comprising at least two sound source signals for which one of said sound source signals has a speech onset within a predetermined duration after a speech ending of another of said sound source signals (0052-59, 0242-45, comparing start and stop times of speakers that do not overlap to determine if they are in a conversation); iii) detecting at least one conversation signal comprising at least two sound source signals having a semantic similarity measure score or a syntactic similarity measure score that is above a second or a third predetermined threshold (OPTIONAL LIMITATION); and iv) detecting at least one conversation signal comprising at least two sound source signals having a combined score that is above a fourth predetermined threshold, wherein the combined score is obtained by combining at least two of: the word embedding similarity measure score, the semantic similarity measure score, the syntactic similarity measure score, a sound pressure level score reflecting the strength of said at least two sound source signals and a previous participant score reflecting how often the speakers representing said at least two sound source signals have previously participated in a conversation with each other (OPTIONAL LIMITATION).
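The timing heuristic the rejection leans on for claims 6(ii) and 7(ii), treating two sources as one conversation when one source's onset falls shortly after the other's ending, reduces to a small grouping routine. A minimal sketch under assumed names and an assumed 1-second window (the claims leave the "predetermined duration" open):

```python
# Sketch of the onset/ending matching in claims 6(ii)/7(ii): two sources are
# grouped into one conversation when one source's speech onset falls within
# a predetermined window after the other's speech ending. The segment
# format, the 1.0 s window, and all names are assumptions for illustration.
from itertools import combinations

MAX_GAP_S = 1.0   # the claims' "predetermined duration", value assumed

def turns_interleave(segs_a, segs_b, max_gap=MAX_GAP_S):
    """True if some onset in one source follows an ending in the other
    within max_gap seconds. Segments are (onset, ending) tuples in seconds."""
    def onsets_follow(segs_x, segs_y):
        return any(0.0 <= on_x - end_y <= max_gap
                   for on_x, _ in segs_x for _, end_y in segs_y)
    return onsets_follow(segs_a, segs_b) or onsets_follow(segs_b, segs_a)

def group_conversations(sources):
    """Greedily merge sources whose turns interleave.
    sources maps a speaker id to a list of (onset, ending) segments."""
    groups = [{sid} for sid in sources]
    for a, b in combinations(sources, 2):
        if turns_interleave(sources[a], sources[b]):
            ga = next(g for g in groups if a in g)
            gb = next(g for g in groups if b in g)
            if ga is not gb:
                ga |= gb
                groups.remove(gb)
    return groups

# Speakers 1 and 2 alternate turns; speaker 3 talks over both, independently.
talk = {1: [(0.0, 2.0), (4.5, 6.0)],
        2: [(2.3, 4.2)],
        3: [(0.5, 5.0)]}
print(group_conversations(talk))   # -> [{1, 2}, {3}]
```

Claim 7(iv)'s combined score would simply weight this timing cue together with the embedding, semantic, level, and prior-participant scores before thresholding.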
Consider claim 8, Aratsu teaches the method according to claim 1, wherein the step of providing an audio output based on a selected conversation signal, wherein the contribution to the audio output from the sound source signals not comprised in the selected conversation signal is suppressed compared to the contribution from the conversation signal, comprises at least one of the steps of: - suppressing the contribution to the audio output from the sound source signals not comprised in the selected conversation signal such that the combined level is in the range between 3 and 24 dB or between 6 and 18 dB below the selected conversation signal level (OPTIONAL LIMITATION); - enabling the user to control the ratio between the conversation signal level and the combined level of the sound source signals not comprised in the selected conversation signal (0107-110, users may select signals to mute or reduce, and select signals to enhance; also see figure 2c and 0114-15, and selections may be altered by user, and thus may effectively control the ratio between enhanced and reduced signals).
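Claim 8's output stage, in which everything outside the selected conversation is held 6-18 dB (or 3-24 dB) below it, is ordinary gain staging. A minimal sketch; the 12 dB default and all names are assumptions rather than anything from the application or the cited references:

```python
# Sketch of claim 8's mixing step: the selected conversation passes through
# while all other sources are attenuated so they sit a fixed number of dB
# below it. The 12 dB default (inside the claimed 6-18 dB band) and the
# function/variable names are assumptions for illustration.
import numpy as np

def mix_output(selected: np.ndarray, others: list,
               suppression_db: float = 12.0) -> np.ndarray:
    """Sum all sources, scaling the non-selected ones down by suppression_db."""
    gain = 10.0 ** (-suppression_db / 20.0)   # dB attenuation -> linear amplitude
    out = selected.astype(float)
    for src in others:
        out = out + gain * np.asarray(src, dtype=float)
    return out

# Toy usage: a tone stands in for the chosen conversation, noise for the rest.
fs = 16_000
t = np.arange(fs) / fs
conversation = np.sin(2 * np.pi * 220 * t)
babble = 0.5 * np.random.randn(fs)
output = mix_output(conversation, [babble], suppression_db=12.0)
```

The claim's alternative, letting the user control the ratio, would amount to exposing suppression_db through the user interface recited in claim 11.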
Consider claim 9, Aratsu and Pedersen teach the method according to claim 1. Pedersen further teaches wherein the steps d) and c) are only carried out if an estimate of the sound quality of the provided plurality of sound source signals is above a predetermined fifth threshold (0021-27, voice detection used, which compares probabilities to voice/no-voice decision thresholds; also see 0260). It would have been obvious to one of ordinary skill in the art at the time of effective filing to use VAD as taught by Pedersen in the system of Aratsu and Pedersen in order to allow for better isolation of speech from noise.

Consider claim 10, Aratsu and Pedersen teach the method according to claim 1. Pedersen further teaches the further step of processing the plurality of sound source signals in order to compensate a hearing loss (0224-27, hearing aid system which compensates hearing loss). It would have been obvious to one of ordinary skill in the art at the time of effective filing to compensate for hearing loss as taught by Pedersen in the system of Aratsu and Pedersen in order to allow the system to be adapted to a user's particular hearing needs (Pedersen 0226).

Consider claim 11, Aratsu teaches an audio device system (abstract) comprising at least one audio device, wherein said at least one audio device comprises an acoustical-electrical input transducer block (0099, microphone) and an electrical-acoustical output transducer (0114, headphones), and wherein said audio device system further comprises: - a sound source signal separator adapted to receive an input signal from said acoustical-electrical input transducer block and to provide a plurality of sound source signals each representing a sound source of a present sound environment (0099, collecting sound signals from sources within environment); - a speech content comparator adapted to compare the speech content of each of said plurality of sound source signals (0099, performing comparisons such as speech verification to detect speech signals), and adapted to detect, based on said comparison, at least one conversation signal (0099-0100, 0102, grouping speech signals by speaker, i.e. creating a conversation signal); - a user interface (405) adapted to enable a user to select a detected conversation signal (0107-0110, allowing users to select icons associated with each speech conversation signal); - a digital signal processor adapted to process and combine the provided plurality of sound source signals in order to provide an output signal, wherein the contribution to the output signal from the sound source signals not comprised in the selected conversation signal is suppressed compared to the contribution from the conversation signal (0107-110, users may select signals to mute or reduce, and select signals to enhance; also see figure 2c and 0114-15); and - an electrical-acoustical output transducer (406) configured to receive the output signal and provide an audio output (0114, headphones).

Aratsu does not specifically teach comparing the speech content of each of said plurality of sound source signals with at least one of the other of said plurality of sound source signals, or detecting, based on said comparison, at least one conversation signal comprising at least two sound source signals representing speakers participating in the same conversation. In the same field of selective speech enhancement, Pedersen teaches comparing the speech content of each of said plurality of sound source signals with at least one of the other of said plurality of sound source signals (0052-59, 0242-45, comparing start and stop times of speakers) and detecting, based on said comparison, at least one conversation signal comprising at least two sound source signals representing speakers participating in the same conversation (0052-59, 0242-45, comparing start and stop times of speakers to determine if they are in a conversation, combining signals based on determination). It would have been obvious to one of ordinary skill in the art at the time of effective filing to compare speech signals to determine distinct conversations as taught by Pedersen in the system of Aratsu in order for the system to more efficiently allow selection of relevant conversation data (Pedersen 0004-07).

Claim 12 contains similar limitations as claim 10 and is therefore rejected for the same reasons.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Linton et al. (US Patent 11,257,510) and Sabin et al. (US PAP 2020/0128322) both teach conversation isolation systems.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DOUGLAS C GODBOLD whose telephone number is (571)270-1451. The examiner can normally be reached 6:30am-5pm Monday-Thursday. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DOUGLAS GODBOLD
Examiner, Art Unit 2655
/DOUGLAS GODBOLD/
Primary Examiner, Art Unit 2655

Prosecution Timeline

Jun 12, 2024 — Application Filed
Jan 14, 2026 — Non-Final Rejection, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585879 — ARTIFICIAL INTELLIGENCE ASSISTED NETWORK OPERATIONS REPORTING AND MANAGEMENT (granted Mar 24, 2026; 2y 5m to grant)
Patent 12579371 — USING MACHINE LEARNING TO GENERATE SEGMENTS FROM UNSTRUCTURED TEXT AND IDENTIFY SENTIMENTS FOR EACH SEGMENT (granted Mar 17, 2026; 2y 5m to grant)
Patent 12579372 — KEY PHRASE TOPIC ASSIGNMENT (granted Mar 17, 2026; 2y 5m to grant)
Patent 12579383 — VERIFYING TRANSLATIONS OF SOURCE TEXT IN A SOURCE LANGUAGE TO TARGET TEXT IN A TARGET LANGUAGE (granted Mar 17, 2026; 2y 5m to grant)
Patent 12572749 — COMPRESSING INFORMATION PROVIDED TO A MACHINE-TRAINED MODEL USING ABSTRACT TOKENS (granted Mar 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 83%
With Interview: 94% (the base rate plus the +10.5-point interview lift: 83% + 10.5 ≈ 94%)
Median Time to Grant: 2y 10m
PTA Risk: Low

Based on 1,079 resolved cases by this examiner. Grant probability is derived from the career allow rate (898 granted of 1,079 resolved ≈ 83%).
