Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Acknowledgement
Acknowledgement is made of applicant’s amendment filed on 2/25/2026. Applicant’s submission has been entered and made of record.
Status of the Claims
Claims 1-7, 9-19, and 21 are pending.
Response to Applicant’s Arguments
Claims 1-7 and 9-10 are patent eligible for the following reason:
Under Step 2A, Prong 2, the Supreme Court held that when a claim containing a mathematical formula (i.e., an abstract idea) implements or applies that formula in a structure or process which, when considered as a whole, is performing a function which the patent laws were designed to protect (e.g., transforming or reducing an article to a different state or thing), then the claim satisfies the requirements of §101. Diamond v. Diehr, 450 U.S. 175, 192 (1981); see MPEP 2106.04(d)(I) (“Implementing a judicial exception with, or using a judicial exception in conjunction with, a particular machine or manufacture that is integral to the claim, as discussed in MPEP 2106.05(b)”). See also Gottschalk v. Benson, 409 U.S. 63, 70 (1972) (“Transformation and reduction of an article "to a different state or thing" is the clue to the patentability of a process claim that does not include particular machines”).
In Diehr, the claims involved a method for curing rubber that used the Arrhenius equation: the actual temperature inside the mold was constantly measured, and the temperature measurements were fed into a computer that repeatedly recalculated the cure time for opening the press. Diehr, 450 U.S. at 178-79. Because the Supreme Court viewed the claims not as an attempt to patent a mathematical formula, but as directed to an industrial process for the molding of rubber products, the claims were statutory. Id. at 192-93.
The key here, as noted by the CAFC, is that the Supreme Court in Diehr looked to how the claims "used that equation in a process designed to solve a technological problem in `conventional industry practice.'" McRO, Inc. v. Bandai Namco Games America, Inc., 837 F.3d 1299, 1312 (Fed. Cir. 2016). When looked at as a whole, "the claims in Diehr were patent eligible because they improved an existing technological process, not because they were implemented on a computer." Id. at 1312-13.
In another example, in McRO, the CAFC noted that in the prior art method of generating morph weight sets with values between “0” and “1” for computer animation of facial expressions, the values were determined manually. McRO, 837 F.3d at 1304-05. The claimed improvement in McRO allowed computers to produce “accurate and realistic lip synchronization and facial expressions in animated characters” that previously could only be produced by human animators, through the automated use of rules, rather than artists, to set the morph weights and transitions between phonemes. Id. at 1313.
Specifically, the claims were directed to the incorporation of the claimed rules, not the use of the computer; the rules improved the existing technological process by allowing automation of further tasks in a way that goes beyond merely organizing existing information into a new form. Id. at 1314-15.
In other words, the claimed process used a combined order of specific rules that rendered information into a specific format that was then used and applied to create a sequence of synchronized, animated characters, and did so without pre-empting all processes for achieving automated lip-synchronization of 3-D characters. Id. at 1315. The CAFC therefore held that the ordered combination of claimed steps, using unconventional rules that relate sub-sequences of phonemes, timing, and morph weight sets, is patent eligible. Id. at 1302-03.
Amended Claim 1 recites a method comprising:
(1) prompting a user to input one or more words or phrases related to how others refer to the user;
(2) generating, using the input, data to detect the one or more words or phrases from a variety of sounds of speech input;
(3) detecting a sound in an environment while a wearable audio device is emitting audio at a volume;
(4) determining, using the data, that the sound detected in the environment passes a threshold of including the one or more words or phrases;
(5) lowering the volume of the audio emitted by the wearable audio device to a second volume;
(6) determining the user is engaging with the sound detected in the environment; and
(7) lowering the volume of the audio emitted by the wearable audio device to a third volume.
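For illustration only, the ordered combination of steps (4)-(7) can be sketched as a simple control routine. All names, attenuation factors, and threshold values below are hypothetical and are not drawn from the claims or the specification:

```python
# Hypothetical sketch of the two-stage volume adjustment recited in
# steps (4)-(7); the attenuation factors are illustrative placeholders.

def adjust_volume(detected_score: float, threshold: float,
                  user_engaging: bool, volume: float) -> float:
    """Return the playback volume after applying steps (4)-(7)."""
    SECOND_VOLUME_FACTOR = 0.5   # illustrative attenuation for step (5)
    THIRD_VOLUME_FACTOR = 0.2    # illustrative deeper attenuation for step (7)
    if detected_score >= threshold:              # step (4): sound passes the threshold
        volume = volume * SECOND_VOLUME_FACTOR   # step (5): lower to a second volume
        if user_engaging:                        # step (6): user engages with the sound
            volume = volume * THIRD_VOLUME_FACTOR  # step (7): lower to a third volume
    return volume
```

The sketch makes the claimed ordering visible: the second-volume reduction of step (5) is a precondition for the third-volume reduction of step (7), which occurs only upon the engagement determination of step (6).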
Much like the application of the Arrhenius equation in an industrial process for molding rubber products in Diehr, the combination of steps (1)-(7) applies the determinations of steps (4) and (6) to control how the volume of audio emitted by a wearable audio device is adjusted, such that the steps are directed to an application of the mathematical determinations of (4) and (6) to a specifically asserted technology; i.e., a particular means and method of adjusting the volume of the wearable audio device.
Just as McRO applied a combined order of specific rules to mathematically computed morph weights to render information into a specific format that created automated lip synchronization of 3-D characters, here the determinations of steps (4) and (6) are applied to adjust the volume of audio emitted by the wearable audio device in a way that goes beyond merely organizing existing information into a new form; i.e., steps (1)-(7) set forth a particular combination of steps to operate the volume setting of the wearable audio device and therefore are directed to controlling the wearable audio device to render / emit audio at respective volumes.
Therefore, claims 1-7 and 9-10 are directed to a particular means or method of controlling a wearable audio device to render / emit audio at particular volumes.
In response to “However, Haggai does not teach, suggest or disclose "detecting a sound in an environment while a wearable audio device is emitting audio at a volume[,] determining, using the data, that the sound detected in the environment passes a threshold of including the one or more words or phrases[,] lowering the volume of the audio emitted by the wearable audio device to a second volume[,] determining the user is engaging with the sound detected in the environment[,] and lowering the volume of the audio emitted by the wearable audio device to a third volume" as recited in Claim 1”.
In view of this amendment to claim 1, the anticipation rejection over Haggai has been withdrawn. Upon further search and consideration, a new combination of references is set forth in detail below.
In response to “However, Gelter and Haggai, alone or in combination, do not teach, suggest, or disclose "detect a sound in an environment with the at least one audio sensor while the wearable audio emitting device is emitting audio at a volume[,] determine, using the data, that the sound detected in the environment passes a threshold of including the one or more words or phrases[,] lowering the volume of the audio emitted by the wearable audio device to a second volume[,] determining the user is engaging with the sound detected in the environment[,] and lowering the volume of the audio emitted by the wearable audio device to a third volume" as recited in claim 12”.
In view of this amendment to claim 12, the rejections over Gelter and Haggai have been withdrawn. Upon further search and consideration, a new combination of references is set forth in detail below.
Claim Rejections - 35 USC § 103
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 103 that form the basis for the rejections under this section made in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 9-10, 12, 14-16, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Gelter (US 2016/0277858 A1) in view of Fadell et al. (US 10013999 B1) and Haggai et al. (US 2024/0236541 A9).
Regarding Claims 1 and 12, Gelter discloses a system (Figs 1A and 2, apparatus 10), comprising:
a device (¶45, another cellphone; Fig. 2, device 12 as a cell phone per ¶18) comprising:
an interface (¶21, user interface 30); and
at least one first processor (¶21, controller 20 for another cell phone) configured to prompt a user to input a desired audio trigger into the interface (¶45, audio listening device 12 receiving pre-recorded desired audio triggers from another device 12 / cell phone when the user enters the audio listening device into configuration mode per ¶40 and ¶42 to prompt the user to provide the desired audio trigger through user interface 30 of the other cellphone); and
a wearable audio device in communication with the device (Figs. 1A and 2, audio listening device with headphones 14; per ¶33, audio listening device 12 and headphones 14 integrated into a single device), the wearable audio device comprising:
at least one audio sensor (Fig. 2, microphone 16); and
at least one second processor (Fig. 2, controller 20 for the audio listening device 12) configured to:
generate, using the input, data to detect the desired audio trigger from a variety of sounds of speech input (¶28, classifier 22 / controller 20 initiates quick search / comparison of incoming ambient audio signal to stored desired audio trigger);
detect a sound in an environment while the wearable audio device is emitting audio at a volume (¶18, audio listening device 12 configured to transmit an audio signal to headphone 14 for audio playback for a user / listener in a listening mode; ¶21 and ¶28, microphone 16 receives an incoming ambient audio signal when the audio listening device 12 is in the listening mode (e.g., headphones 14 are transmitting audio data as received from the audio listening device 12 to the user));
determine, using the data, that the sound detected in the environment passes a threshold of including the desired audio trigger (¶21, audio listening device 12 notifies the headphones 14 when an incoming ambient audio signal matches a desired audio trigger during audio playback at the headphones 14; ¶30, classifier 22 performs a deep scan to further compare information on the incoming ambient audio signal to the stored desired audio trigger to determine if the incoming ambient audio signal indeed matches one of the stored desired audio triggers within a tolerance level for the desired audio trigger); and
lower the volume of the audio emitted by the wearable audio device to a second volume (¶33, headphones 14 comprise a receiver 42 configured to receive a notification signal from audio listening device 12 that the incoming ambient audio signal matches the stored desired audio trigger, and an attenuator 44 attenuates the audio data that is being played back to enable the user to hear the incoming ambient audio signal).
Gelter does not disclose determining the user is engaging with the sound detected in the environment and lowering the volume of the audio emitted by the wearable audio device to a third volume.
Fadell discloses a system comprising a device and a wearable audio device (Fig. 4, computing device 410 and wearable computing device 430) for
detecting a sound in an environment while the wearable audio device is emitting audio at a volume (Col 12, Rows 50-54, while playing back audio content by driving an audio output module of the wearable device with a first audio signal (per Abstract), the wearable device automatically recognizes when a user is engaging in a conversation and then ducks the audio content playback accordingly in real time; Col 13, Rows 30-40, at block 504, device 300 receives a second audio signal with first ambient noise including speech of the user and others around the user via a microphone of the wearable device, and performs spectral analysis of the first ambient noise to detect typical human speech patterns; e.g., Col 14, Rows 32-46),
determining that the sound detected in the environment passes a threshold of including one or more words / phrases (Col 13, Rows 40-44, the device determines that the signal-to-noise ratio of the first ambient noise is above a threshold ratio, indicating the noise is likely speech, and performs speech recognition),
lowering the volume of the audio emitted by the wearable audio device to a second volume (Col 13, Rows 59-62 and Col 14, Rows 54-56, initiate ducking / volume attenuation of the first audio signal in response to a determination that the first ambient noise is indicative of speech by the user), and
determining that the user is engaging with the sound detected in the environment (Col 15, Rows 23-29, while the first audio signal / audio content playback is ducked, detect second ambient noise comprising not only the user’s speech but also ambient speech of others in a subsequent portion of the second audio signal; i.e., the user is engaging in a conversation with others), and lowering the volume of the audio emitted by the wearable audio device to a third volume (Col 16, Rows 11-23, adjust the ducking of the first audio signal while the first audio signal is ducked by pausing the first audio signal or adjusting the degree of volume attenuation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to determine that the user is engaging with the sound detected in the environment and to lower the volume of the audio emitted by the wearable audio device to a third volume in order to adjust the nature of the ducking of the volume of the audio emitted by the wearable audio device based on conditions such as the duration of the conversation (Fadell, Col 16, Rows 11-23).
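As a purely illustrative aside, Fadell's determination that ambient noise is likely speech when its signal-to-noise ratio exceeds a threshold (Col 13, Rows 40-44) can be sketched as follows; the function name and the 10 dB threshold are hypothetical, not taken from the reference:

```python
import math

def is_likely_speech(signal_power: float, noise_power: float,
                     snr_threshold_db: float = 10.0) -> bool:
    """Treat ambient noise as likely speech when its SNR (in dB) is above
    a threshold ratio, as a sketch of the kind of test Fadell describes.
    The 10 dB default is an illustrative placeholder."""
    snr_db = 10.0 * math.log10(signal_power / noise_power)
    return snr_db >= snr_threshold_db
```

A reading of this kind would gate the more expensive speech-recognition step so that it runs only on audio that has already passed the cheap SNR check.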
Gelter does not disclose that the desired audio trigger is one or more words / phrases related to how others refer to the user.
Haggai discloses a method, comprising:
prompting a user to input one or more words or phrases related to how others refer to the user (¶23, computing device 302 can store user preferences and learn user preferences over time regarding which sounds a user wishes to respond to by generating a dedicated configuration for each user preference upon learning based on user inputs; in view of ¶13, users may customize which key phrases they would like to be notified of (e.g., their name – David, Mary, Mom, Dad) so that they may be aware of people who need their attention while busy listening to content via Bluetooth hearables);
generating, using the input, data to detect the one or more words or phrases from a variety of sounds of speech input (¶20, computing device 302 performs processing for event detection at block 320 to determine presence of a relevant sound relevant to the user based upon text triggers stored in memory); and
determining, using the data, that a sound detected in an environment passes a threshold of including the one or more words or phrases (¶25, computing device 302 filters incoming audio 402 and examines the input for keyword patterns (keyword spotting circuitry 404); ¶28, a Deep Neural Network performs the function of spotting a keyword to make a trigger decision to notify the end user).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to prompt a user to input a desired audio trigger corresponding to one or more words or phrases related to how others refer to the user in order to make the user aware of people who need their attention while busy listening to content via a wearable device (Haggai, ¶13; compare Gelter, ¶15 and ¶30, executing a speech recognition algorithm to detect a human voice to alert the user of activity outside of the headphones).
Regarding Claims 2 and 14, Gelter discloses wherein the at least one second processor is further configured to compare the data to reference data (¶36, perform quick scan and deep scan to determine if received incoming ambient audio signals match a stored desired audio trigger).
Regarding Claims 3 and 15, Gelter discloses wherein the reference data comprises a plurality of reference audio samples that include the one or more words or phrases (Gelter, ¶15 and ¶60, previously stored desired audio triggers include certain human voices and using speech recognition algorithm to detect the human voice comprising phoneme; compare Haggai, ¶28, using DNN to spot keywords).
Regarding Claims 4 and 16, Gelter discloses wherein the reference data is pre-obtained by a plurality of non-users (¶15, previously stored desired audio triggers such as doorbell, certain human voices, baby crying etc.).
Regarding Claims 9-10 and 21, Gelter discloses wherein the input is text or audio (¶21, record and store desired audio trigger; compare Haggai, ¶13, allowing user to customize key phrases they would like to be notified and ¶23, computing device 302 can store user preferences regarding which sounds a user wishes to respond to such as hearing a particular name or word).
Claims 5-7 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Gelter (US 2016/0277858 A1) in view of Fadell et al. (US 10013999 B1) and Haggai et al. (US 2024/0236541 A9) as applied to claims 1 and 12 above, and further in view of Kracun et al. (US 2022/0165277 A1).
Regarding Claims 5-7 and 17-19, Haggai does not disclose wherein the reference data comprises negative data that fails to include the one or more words or phrases.
Kracun discloses a speech-enabled device that initiates voice interaction based on one or more words or phrases (¶28 and ¶36, user device hotword detector configured to detect “Hey Google” in streaming audio) by comparing data generated to detect the one or more words or phrases from a variety of sounds of speech input to reference data comprising negative data that fails to include the one or more words or phrases (¶64, comparing subsequent audio data corresponding to another spoken utterance to classification results stored in memory including negative hotwords “Poodle”, “Noodle”, and “Doodle”), wherein the data is plotted against the reference data in a vector space to determine how closely the data matches the reference data versus the negative data (¶64, compute an evaluation embedding representation for the subsequent audio data characterizing the hotword event and access the memory to obtain the embedding representation of the corresponding negative hotwords), and determining that the sound detected in an environment passes a threshold (¶65, compare the computed evaluation embedding representation with all of the reference embeddings for each of the negative hotwords), wherein the threshold is a distance measured within the vector space that the sound detected in the environment includes the one or more words or phrases based on the plotted data (¶65, determine a similarity score between the reference embedding representation and the computed evaluation embedding representation, each similarity score being associated with a cosine distance between the evaluation embedding representation and the reference embedding representation; ¶66, compare the similarity score to a similarity threshold and classify the subsequent audio data as including the negative hotword when the similarity score satisfies the similarity score threshold).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to compare the data to reference data comprising negative data that fails to include the one or more words or phrases in order to suppress detection of the word or phrase event in subsequent audio data (Kracun, ¶66) and to prevent false detection of the one or more words or phrases (Kracun, ¶33; compare Gelter, ¶60, performing speech recognition algorithms to detect a human voice).
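For illustration, the vector-space comparison Kracun describes in ¶¶64-66 (cosine-distance similarity scores between an evaluation embedding and stored negative-hotword embeddings) can be sketched as below; the function names and the 0.8 similarity threshold are hypothetical placeholders, not values from the reference:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def is_negative_hotword(eval_emb, negative_refs, sim_threshold=0.8):
    """Classify an utterance as a negative hotword (e.g., "Poodle") when its
    evaluation embedding is within the similarity threshold of any stored
    negative reference embedding, suppressing a false trigger."""
    return any(cosine_similarity(eval_emb, ref) >= sim_threshold
               for ref in negative_refs)
```

Under this sketch, audio classified as a negative hotword would be suppressed rather than treated as a detection of the one or more words or phrases.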
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Gelter (US 2016/0277858 A1) in view of Fadell et al. (US 10013999 B1) and Haggai et al. (US 2024/0236541 A9) as applied to claim 12 above, and further in view of Chin et al. (US 2009/0094547 A1).
Regarding Claim 13, Gelter discloses wherein the at least one first processor is further configured to pre-record and store different audio samples that include the desired audio triggers prior to the data being generated by the at least one second processor (¶45, audio listening device 12 (i.e., controller 20) receives pre-recorded desired audio triggers for purposes of comparison to the incoming ambient audio signal and stores the same in memory 26), and, as modified by Haggai, the desired audio triggers include the one or more words or phrases (Haggai, ¶13).
Gelter and Haggai do not disclose the first processor is configured to synthesize multiple different audio samples that include the one or more words or phrases.
Chin discloses a processor configured to synthesize multiple different audio samples that include one or more words or phrases (¶¶28-29, a user’s name is stored as a text file, fed to a speech synthesizer to generate an audio signal incorporating the user’s name; per ¶27, different users 7, 8, 9 with voice files 71, 81, 91).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to generate the pre-recorded desired audio trigger by synthesizing multiple different audio samples that include the one or more words or phrases related to how others refer to the user in order to generate and link a speech file to the user’s identity (Chin, ¶8).
Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to examiner Richard Z. Zhu, whose telephone number is 571-270-1587, or to the examiner’s supervisor, Hai Phan, whose telephone number is 571-272-6338. Examiner Richard Zhu can normally be reached M-Th, 0730-1700.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RICHARD Z ZHU/Primary Examiner, Art Unit 2654 03/07/2026