Last updated: May 29, 2026
Application No. 18/622,449
Headphone Conversation Detect

Final Rejection §103
Filed
Mar 29, 2024
Priority
Apr 28, 2023 — provisional 63/499,180 +1 more
Examiner
WOZNIAK, JAMES S
Art Unit
2655
Tech Center
2600 — Communications
Assignee
Apple Inc.
OA Round
2 (Final)
Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 60% grant rate with a +39.4% interview lift. Because an interview has already been tried, a written response with narrowed claims based on precedent claim evolution patterns is recommended.
Based on 391 resolved cases, 2023–2026
Examiner Intelligence

WOZNIAK, JAMES S
Grants 60% of resolved cases
Career Allowance Rate
233 granted / 391 resolved
-2.4% vs TC avg
Interview Lift: +39.4% (resolved cases with an interview versus without)
Typical timeline
3y 8m
Avg Prosecution
21 currently pending
Career history
429
Total Applications
across all art units
Statute-Specific Performance

§101
7.2%
-32.8% vs TC avg
§103
82.5%
+42.5% vs TC avg
§102
5.8%
-34.2% vs TC avg
§112
4.2%
-35.8% vs TC avg
Tech Center averages are estimates. Based on career data from 391 resolved cases.
Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

In response to the Non-Final Office Action from 11/18/2025, Applicant filed an amendment on 2/10/2026. In this reply, Applicant has amended independent claim 1 to further specify that the OVAD is "configured to output voice activity of the wearer of the first headphone" and that the TVAD is "configured to output voice activity of another person." Independent claim 17 was similarly amended, and was also amended to include a limitation, similar to that of claim 1, regarding monitoring an idle duration with respect to an idle threshold. Applicant has also argued that the prior art of record fails to teach OVADs and TVADs that distinguish between the voice activities of a headphone wearer and another talker (Amendment, Pages 7-9). These arguments have been fully considered but are not found to be persuasive for the reasons noted in the Response to Arguments section.

Applicant argues that the grammar issues of claims 1 and 3 have been corrected via the instant amendment (Remarks, Page 7).  
In response to the amendments correcting the grammar issues in claims 1 and 3, the claim objections directed towards minor informalities have been withdrawn.

Applicant argues that the amendments to claims 13-15 and 20 resolve the indefiniteness rejections under 35 U.S.C. 112(b) (Remarks, Page 7).  
In response to the correction of the antecedent basis issues in these claims, the 35 U.S.C. 112(b) rejection is now moot and has been withdrawn.

In regards to the non-statutory double patenting rejection, Applicant indicates that such a rejection will be considered when the rejection is revisited in response to the instant amendment (Remarks, Page 7).
In response, it has been recognized that an amendment was filed on 3/30/2026 in the co-pending/reference application 18/622543 where conflicting claim 1 has been cancelled.  Accordingly, with no remaining conflicting claims, the non-statutory obviousness-type double patenting rejection is now moot and has been withdrawn.

Examiner Notes on Compact Prosecution

Starting on 2/23/2026, the Examiner conducted numerous phone conversations with the Applicant’s representative before leaving a final voicemail on 4/6/2026 that was not returned. These efforts were made in order to incorporate allowable subject matter from dependent claim 16 into the independent claims and to discuss the filing of an amendment or terminal disclaimer to obviate the non-statutory double patenting rejection in view of conflicting claim 1 in the co-pending/reference application 18/622543. After waiting for Applicant’s reply in the co-pending application, agreement was ultimately not reached because Applicant’s proposed amendment incorporating the subject matter of dependent claim 16 also added 4 new unexamined claims without a corresponding paid fee. Examiner requested an Examiner’s Amendment in which these new and unexamined claims were not included; however, Applicant’s representative had not replied to such requests as of the mailing of this Final Office Action. Applicant’s proposed amendments have been entered into the file wrapper.

Response to Arguments

With respect to independent Claim 1, Applicant argues that in the combination of Patel and Hook, Hook relates to how many people are involved in a conversation and the timings of their respective speech, but does not distinguish between the voice activities of a headphone wearer and another talker. Applicant further finds that the detectors and microphones are not OVADs and TVADs because they are assigned to physical zones in a vehicle cabin.  Next, Applicant argues that the conversation metric in Hook is the absolute value of the difference between speech and non-speech metrics and does not relate to detecting a simultaneous lack of activity and declaring a conversation end based upon an idle duration being longer than a threshold.  Accordingly, Applicant concludes that the prior art combination under 35 U.S.C. 103 is improper and should be withdrawn (Remarks, Pages 7-9).
In response to applicant's arguments against the references individually, one cannot show non-obviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Applicant is also reminded that "[a] person of ordinary skill in the art is also a person of ordinary creativity, not an automaton." KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 421, 82 USPQ2d 1385, 1397 (2007). "[I]n many cases a person of ordinary skill will be able to fit the teachings of multiple patents together like pieces of a puzzle." Id. at 420, 82 USPQ2d at 1397. Office personnel may also take into account "the inferences and creative steps that a person of ordinary skill in the art would employ." Id. at 418, 82 USPQ2d at 1396.
Turning now to the merits of the prior art rejection under 35 U.S.C. 103, Applicant is directed towards page 14 of the Non-final Office Action. There, the modification rendering the claimed invention obvious was described as Patel's relevant audio detection for pass-through transparency, including the detection of a conversation between participants (headset user 150 and external person 170), effectively setting up zones for speech detection. These zones, one for the headset wearer (150) and one for the external person (170), were then applied to the zones assigned via the teachings of Hook for conversation detection (Paragraphs 0075-77). Paragraph 0103 of Hook also discusses that an absence of speech of more than one person (e.g., the user and the external person) leads to determining an end of a conversation, while Paragraph 0077 of Hook describes a large absence of speech between people, thus implying some form of idle duration threshold. In this manner, Hook does describe a simultaneous lack of speech activity to declare the end of conversation with an implied duration (i.e., larger absence vs. shorter absences in normal conversation turn taking as described in Paragraph 0077). Accordingly, it is the combination of Patel and Hook that addresses the argued limitation, and Applicant’s arguments directed towards claim 1 are not found to be persuasive.
Applicant traverses the prior art rejection of independent claim 17 and the remaining dependent claims for reasons similar to Claim 1 (Remarks, Pages 9-10).  In regards to such arguments, see the response directed towards claim 1.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9 are rejected under 35 U.S.C. 103 as being unpatentable over Patel, et al. (U.S. PG Publication: 2021/0266655 A1) in view of Hook, et al. (U.S. PG Publication: 2021/0193162 A1).
With respect to Claim 1, Patel discloses:
A digital audio processor (Paragraph 0020) for use with a first headphone (Fig. 1, Element 150), the digital audio processor comprising:
a filter block that is to process one or more of a plurality of microphone signals in the first headphone, for producing a transparency audio signal (headset filter that includes a pass-through/transparency mode allowing external sounds/audio to be heard via the headset, Paragraphs 0018-0019, 0026, 0028, and 0037); and
a conversation detector that processes one or more of the plurality of microphone signals (audio classifier that is "configured to analyze portions of an input signal that corresponds to audio received by the one or more microphones to generate a classification result" pertaining to conversational "speech content," Paragraphs 0026, 0034 (discussing speech of another person external to the headphone wearer, Fig. 1, Elements 150, 160, and 170), 0037, 0041, 0042 (detection of headphone user speech in "speaking to the person 170"), 0046 (detected speech of the person 170 or the wearer of the headset 160), and 0048), to:
declare a conversation and in response activate a transparency mode of operation, when a wearer of the first headphone is conversing with or is about to converse with another talker who is in an ambient environment of the wearer (updating the mode of headset operation into pass through/transparency mode when it is determined that a headset user starts a conversation (i.e., about to converse) or is talking with another person in the environment (see Fig. 1, Elements 150, 160, and 170), Paragraphs 0019, 0026, 0034, 0039, 0041-0042, 0046, 0064),
	wherein in the transparency mode, the processor configures the filter block to activate the transparency audio signal, and routes the transparency audio signal to a speaker of the first headphone (transparency mode called "pass through" in the prior art that "enable external sounds to pass through to a wearer (e.g., the user 160) of the headset 150 when relevant audio is detected," Paragraphs 0019, 0026 (“enabling sound corresponding to the input signals to be output by the speakers of the headset 150”), 0039, 0048, and 0064), and
declare an end to the conversation based on processing one or more of the plurality of microphone signals, and in response deactivate the transparency mode (reset configuration to end the conversational pass through, which involves noise cancellation being once again enabled based upon a headset user audio input, Paragraphs 0038 and 0065).
Patel teaches headset wearer and external person zones, but does not teach wherein the conversation detector comprises an own voice activity detector, OVAD, configured to output voice activity of the wearer of the first headphone and a target voice activity detector, TVAD, configured to output voice activity of another person, monitors an idle duration in which the OVAD and the TVAD are both or simultaneously indicating no activity, and declares the end to the conversation in response to the idle duration being longer than an idle threshold.
Hook, however, recites:
wherein the conversation detector comprises an own voice activity detector, OVAD and a target voice activity detector, TVAD, monitors an idle duration in which the OVAD and the TVAD are both or simultaneously indicating no activity, and declares the end to the conversation in response to the idle duration being longer than an idle threshold (audio-based monitoring of a conversation between participants such as "between two people" that involves multiple microphones/sensors and speech detectors (i.e., an own/first detector and another participant/target speech detector) and identifies the end of a conversation based upon a large (i.e., implied threshold) "absence of speech" from both speakers, Paragraphs 0024-0025, 0030, 0053, 0074 (discussing different zones for speech detection), 0075-0077, and 0103 (end of conversation detection based upon silence)).
Patel and Hook are analogous art because they are from a similar field of endeavor in audio playback adjustment based upon voice activity detection.  Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the participant (i.e., headset user 150 and external person 170 in the case of Patel) speech detectors to identify an end to a conversation between two people as taught by Hook in the audio playback based upon relevant audio classification taught by Patel to provide a predictable result of reducing conversational effort by controlling audio playback for a duration of a conversation as a type of relevant audio (Hook, Abstract).
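For orientation, the idle-duration limitation at issue in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the method of the application, Patel, or Hook: it assumes the OVAD and TVAD each output one boolean flag per audio frame, and that a conversation is declared on own-voice activity (as recited in claim 7). The class name, the frame period, and the threshold value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConversationDetector:
    """Illustrative only: names and default values are assumptions."""

    idle_threshold_s: float = 5.0  # hypothetical idle threshold
    frame_s: float = 0.01          # hypothetical frame period
    in_conversation: bool = False
    idle_s: float = 0.0

    def step(self, ovad_active: bool, tvad_active: bool) -> bool:
        """Consume one frame of OVAD/TVAD flags and return whether the
        transparency mode would be active after this frame."""
        if not self.in_conversation:
            # Assumption: own-voice activity declares the conversation.
            if ovad_active:
                self.in_conversation = True
                self.idle_s = 0.0
            return self.in_conversation
        if ovad_active or tvad_active:
            # Either talker speaking resets the idle timer.
            self.idle_s = 0.0
        else:
            # Both detectors simultaneously indicate no activity.
            self.idle_s += self.frame_s
            if self.idle_s > self.idle_threshold_s:
                # Idle duration exceeded the threshold: declare the end.
                self.in_conversation = False
                self.idle_s = 0.0
        return self.in_conversation
```

In this sketch, short pauses during normal turn taking never reach the threshold because activity from either detector resets the timer; only a sustained simultaneous silence ends the conversation.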
With respect to Claim 2, Patel further discloses:
The processor of claim 1 wherein the TVAD comprises a machine learning (ML) model that is driven by one or more of the microphone signals and is being used for detecting voice activity of the another talker (audio classifier using "an artificial intelligence network," that detects speech of the other person 170 based upon a microphone input, Paragraphs 0026, 0031, 0043, 0046, and 0053 wherein the multiple speech detectors such as for the other person are taught by Hook as applied to claim 1).
With respect to Claim 3, Patel further discloses:
The processor of claim 1 wherein the OVAD comprises another machine learning ML model that is driven by one or more of the microphone signals and is used for detecting own voice activity of the wearer (audio classifier using "an artificial intelligence network," that detects speech of the headset wearer 160 based upon a microphone input, Paragraphs 0026, 0031, 0043, 0046, and 0053 wherein the multiple speech detectors such as for the other person are taught by Hook as applied to claim 1).
With respect to Claim 4, Patel further discloses:
The processor of claim 1 wherein the filter block is configured to produce the transparency audio signal as a conversation-focused transparency audio signal (pass through for relevant conversation audio including the speech from the headset wearer 160 and/or other person 170, Paragraphs 0026, 0046, 0064).
With respect to Claim 5, Patel further discloses:
The processor of claim 4 wherein when the transparency mode is deactivated, the conversation detector deactivates the conversation-focused transparency audio signal and activates a normal transparency audio signal that is routed to drive the speaker (conversation pass through deactivates and in response a normal audio playback to a headset speaker is resumed that can include a degree of transparency with variable noise suppression/cancellation, Paragraphs 0019, 0030, 0038, and 0048).
With respect to Claim 6, Patel further discloses:
The processor of claim 5, wherein the conversation detector configures the filter block to produce the normal transparency audio signal by processing one or more of the plurality of microphone signals (filter enablement/activation to process the microphone signals via reduction or cancellation, Paragraphs 0018, 0028, 0030, and 0037-0038).
With respect to Claim 7, Patel further discloses:
The processor of claim 1 wherein the conversation detector declares the conversation in response to the OVAD indicating speech activity (relevant audio criteria for when the headset "user starts speaking to the person 170," Paragraphs 0042 and 0046; Fig. 2 wherein the multiple speech detectors such as for the other person are taught by Hook as applied to claim 1).
With respect to Claim 8, Patel further discloses:
The processor of claim 1 wherein the conversation detector declares the conversation based on an automatic speech recognition machine learning model configured to receive as input the one or more microphone signals and provide an output that differentiates spoken voice syllables from other sounds (audio classifier that uses an "an artificial intelligence network" includes a "speech recognizer" that identifies/differentiates spoken voice sounds as pertaining to particular syllables as a classification output (e.g., contained in a keyword, name of a person, topic, etc.), Paragraphs 0031, 0041, 0043, and 0053).
With respect to Claim 9, Patel further discloses:
The processor of claim 1 wherein the filter block further comprises an acoustic noise cancellation (ANC) subsystem that produces an anti-noise signal based on processing one or more of the microphone signals (automatic noise cancellation utilizing a filter that produces an anti-noise signal (e.g., a noise reduced signal) based upon the external sounds received by the microphone, Paragraphs 0018-0019, 0028, and 0037), and the processor is to configure the filter block to deactivate the anti-noise signal, reduce selected frequency-dependent gains of the anti-noise signal, or reduce a scalar gain of the anti-noise signal, in response to the transparency mode being activated (deactivation of the filter/noise cancellation that produces the anti-noise signal responsive to activation of pass through, Paragraphs 0018-0020, 0026, 0028, and 0037).
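The three alternatives recited in claim 9 (deactivating the anti-noise signal, reducing selected frequency-dependent gains, or reducing a scalar gain) can be illustrated with a short Python sketch. It assumes the anti-noise path is represented by a list of per-band gains; the function name, the option labels, and the reduction factor are hypothetical and do not come from the application or from Patel.

```python
from typing import List, Sequence


def adjust_anti_noise(
    band_gains: Sequence[float],
    option: str,
    factor: float = 0.5,
    selected_bands: Sequence[int] = (),
) -> List[float]:
    """Return new per-band anti-noise gains once transparency is activated.

    option is one of "deactivate", "selected_bands", or "scalar"
    (hypothetical labels for the three claimed alternatives).
    """
    if option == "deactivate":
        # Anti-noise signal switched off entirely.
        return [0.0 for _ in band_gains]
    if option == "scalar":
        # One scalar gain reduction applied across all bands.
        return [g * factor for g in band_gains]
    if option == "selected_bands":
        # Only the chosen frequency bands are attenuated.
        chosen = set(selected_bands)
        return [g * factor if i in chosen else g
                for i, g in enumerate(band_gains)]
    raise ValueError(f"unknown option: {option}")
```

The rejection maps only the first alternative (deactivation) to Patel; the other two branches are included here solely to show how the claimed alternatives differ.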

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Patel, et al. in view of Hook, et al. and further in view of Bean, et al. (U.S. PG Publication: 2021/0409860 A1).
With respect to Claim 10, Patel in view of Hook discloses the headset utilizing conversation voice activity detection for passthrough/transparency activation, as applied to Claim 1.  Patel in view of Hook does not specifically describe the adaptive/adjustable transparency digital filter as set forth in claim 10.
Bean, however, discloses:
a transparency digital filter that filters one or more of the plurality of microphone signals to produce the transparency audio signal as a conversation-focused transparency audio signal (adaptive "hear-through" filter that filters multiple microphone signals, Paragraphs 0052-0053 and 0106), wherein the transparency digital filter is a time-varying filter that is updated or adapted in real-time or on a per audio frame basis by the processor based on the processor detecting a far-field speech in the plurality of microphone signals (the hear-through/transparency filter is adaptive over time based upon far-field/external microphone signals, Paragraphs 0042 (describing hear-through of external sounds), 0045, 0051-0057, and 0091 (describing alternate ANC mode); Fig. 6 showing frame-wise processing of the input "n" in x(n)).
Patel, Hook, and Bean are analogous art because they are from a similar field of audio adjustment based upon audio analysis. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the adaptive hear-through filter taught by Bean in the passthrough processing taught by Patel in view of Hook to provide a predictable result of better adjusting to varying listening, environment, and/or device conditions (Bean, Paragraph 0060).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Patel, et al. in view of Hook, et al. and further in view of Visser, et al. (U.S. PG Publication: 2023/0036986 A1).
With respect to Claim 11, Patel in view of Hook discloses the headset utilizing conversation voice activity detection for passthrough/transparency activation, as applied to Claim 1.  Patel in view of Hook does not specifically describe the transparency digital filter operating as part of a beamforming process as set forth in claim 11. Visser, however, discloses:
a transparency digital filter that filters the plurality of microphone signals to produce the transparency audio signal, wherein the transparency digital filter operates as part of a beamforming process that performs spatially selective sound pick up in an angular spread of less than 180 degrees in front of the wearer (transparency mode filter that plays an environmental audio event to a user using a beamformer that enables "a user to hear external sounds under specific circumstances" where the beamforming filter for pass through/transparency has an associated "audio zoom angle" that is pointed to the sound source (Fig. 4, Elements 180 and 182) and that is in front of a headset wearer (Fig. 4) such that the zoom angle would be less than 180 degrees which would correspond to the entire area in front of a user otherwise the beam would not "point" towards the source and would include other ambient noise, Paragraphs 0050, 0053, 0088, 0092-0093, 0099, 0114, 0121, 0123-0124, 0227, and 0281).
Patel, Hook, and Visser are analogous art because they are from a similar field of endeavor in relevant audio event detection.  Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the beamforming filter taught by Visser in the noise cancellation techniques used by the headset taught by Patel in view of Hook to provide a predictable result of better reducing noise from undesired sources (e.g., Fig. 4, Element 182) (Visser, Paragraph 0121).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Patel, et al. in view of Hook, et al. and further in view of Lu, et al. (U.S. PG Publication: 2021/0390972 A1).
With respect to Claim 12, Patel in view of Hook discloses the headset utilizing conversation voice activity detection for passthrough/transparency activation, as applied to Claim 1.  Patel in view of Hook does not specifically describe the own voice or sidetone digital filter as described in claim 12.  Lu, however, discloses:
the filter block comprises an own voice or sidetone digital filter that filters one or more of the plurality of microphone signal to produce an own voice or sidetone audio signal (sidetone filter (Fig. 1, Element 120) that produces an audio signal that replicates the voice of a user speaking to the microphone, Paragraphs 0016 and 0021),
where it is noted that Patel further discloses:
the conversation detector routes the own voice audio or sidetone signal to the speaker in the first headphone in response to and whenever detecting the wearer is talking but the conversation detector has not declared the conversation (relevant audio for triggering pass through that is related only to speech of the user for pass through to a headset speaker, Paragraphs 0026 and 0046; Fig. 2, Element 206; where Hook teaches the conversation detection between two participants as applied to claim 1).
Patel, Hook, and Lu are analogous art because they are from a similar field of endeavor in audio playback adjustment based upon voice activity detection.  Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the sidetone filter taught by Lu in the noise cancellation techniques (Patel, Paragraph 0028) taught by Patel in view of Hook in order to provide a predictable result of achieving a more noise-free self-voice signal (Lu, Paragraph 0021).

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Patel, et al. in view of Hook, et al. and further in view of Clemm (U.S. Patent: 6,865,162).
With respect to Claim 13, Patel in view of Hook discloses the headset utilizing conversation voice activity-based conversation detection on microphone inputs for relevant audio for passthrough/transparency activation, as applied to Claim 1.  Patel in view of Hook does not teach buffering the incoming audio signals that are received via the microphone(s) for the voice activity/conversation detection and then routing the past ambient audio for playback (i.e., the headset speaker in the case of Patel) while the activity detection is processing the buffered microphone signal to declare the conversation.  Clemm, however, discloses a buffer that captures an audio signal during the delay between the start of voice (i.e., conversation activity in the case of Patel in view of Hook) that is then played out when voice conversation is detected (Abstract; Col. 2, Line 29- Col. 3, Line 6 including where silence detection/conversation end is continually performed on incoming buffered signals).
Patel, Hook, and Clemm are analogous art because they are from a similar field of audio adjustment based upon speech detection. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the buffering taught by Clemm to account for the delay in the conversation/relevant audio detection in Patel in view of Hook in order to eliminate the clipping of important onset/starting audio that can occur in VAD-directed silence suppression (i.e., noise cancellation in Patel) (Clemm, Col. 2, Lines 20-21).
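The buffering rationale above (holding recent audio during the detection delay so that speech onset is not clipped) can be sketched in Python as a fixed-length pre-roll buffer. This is a simplified sketch and not Clemm's implementation: frames are opaque byte strings, the detector is reduced to one boolean flag per frame, and detection is never withdrawn once raised. The names `PreRollBuffer` and `play_out` and the default buffer length are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List


class PreRollBuffer:
    """Keeps only the most recent max_frames frames."""

    def __init__(self, max_frames: int) -> None:
        self._frames: Deque[bytes] = deque(maxlen=max_frames)

    def push(self, frame: bytes) -> None:
        # The oldest frame is dropped automatically when the buffer is full.
        self._frames.append(frame)

    def flush(self) -> List[bytes]:
        frames = list(self._frames)
        self._frames.clear()
        return frames


def play_out(
    frames: Iterable[bytes],
    detect_flags: Iterable[bool],
    max_frames: int = 3,
) -> List[bytes]:
    """Return the frames that would be routed to the speaker."""
    buf = PreRollBuffer(max_frames)
    out: List[bytes] = []
    detected = False
    for frame, flag in zip(frames, detect_flags):
        if detected:
            # After detection, audio passes straight through.
            out.append(frame)
            continue
        buf.push(frame)
        if flag:
            # Detection fires late: replay the buffered onset first.
            detected = True
            out.extend(buf.flush())
    return out
```

With a two-frame buffer and detection firing on the fourth of five frames, the output starts at the third frame, so the audio immediately preceding the detection is preserved rather than clipped.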

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Patel, et al. in view of Hook, et al. and further in view of Usher (U.S. PG Publication: 2019/0124436 A1).
With respect to Claim 14, Patel in view of Hook discloses the headset utilizing conversation voice activity-based conversation detection on microphone inputs for relevant audio for passthrough/transparency activation, as applied to Claim 1. Patel further discloses: process one or more of the plurality of microphone signals for detecting far-field speech, and so long as no far-field speech is detected the transparency mode remains deactivated, and then the transparency mode is activated in response to far-field speech being detected (pass through mode remains deactivated until relevant audio is detected, where the relevant audio includes far-field speech/speech from the person 170, Paragraphs 0039, 0041, and 0064; Fig. 2, Element 204). Patel very likely requires or at least implies the use of a buffer because captured microphone signals must be retained for relevant audio signal classification and pass through (Paragraphs 0024 and 0026), but since Patel in view of Hook does not provide an express teaching of an audio buffer, Usher has been provided. Usher discloses a buffer in an earphone ambient sound pass-through process for holding microphone audio for audio analysis (Abstract; Paragraphs 0041-0043).
Patel, Hook, and Usher are analogous art because they are from a similar field of audio adjustment based upon speech detection. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the buffering taught by Usher in the relevant audio signal classification and pass-through taught by Patel in view of Hook to provide a predictable result of a hardware element that can be relied upon to hold the captured audio data in Patel for analysis and pass through if necessary.

Claims 15 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Patel, et al. in view of Hook, et al. and further in view of Jing, et al. (U.S. PG Publication:  2011/0026722 A1).
With respect to Claim 15, Patel in view of Hook discloses the headset utilizing conversation voice activity-based conversation detection on microphone inputs for relevant audio for passthrough/transparency activation, as applied to Claim 1.
While Patel classifies relevant audio to enable a pass-through mode of headset operation, Patel does not teach the false trigger detection set forth in claim 15.
Jing, however, discloses:
a false trigger detector that prevents the transparency mode from being activated, in response to detecting a first false trigger sound while processing i) one or more of the plurality of microphone signals and ii) a bone conduction sensor signal of the first headphone (voice activity detection in a headset using a combination of an acoustic VAD (AVAD) and bone-conduction (i.e., vibrations in the face that includes skeletal structure when acoustics are produced in speech) vibration VAD (VVAD) wherein the sensors are correlated to detect and reject "false positives", Paragraphs 0084, 0090, 0098, 0103, 0118, and 0246; note that the teachings of Jing relate to non-relevant audio in Patel that would not cause an update of the headset of the speaker to a pass-through mode).
Patel, Hook, and Jing are analogous art because they are from a similar field of endeavor in audio playback devices utilizing audio classification. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the false positive detection additionally relying on a bone/vibration sensor taught by Jing in the relevant audio classification for pass-through enablement taught by Patel in view of Hook to provide a predictable result in the form of avoiding inadvertent triggering of a pass-through mode when there is actually no relevant audio present in Patel’s headset.
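The false-trigger rejection relied upon above can be illustrated with a short Python sketch that cross-checks an acoustic VAD flag against a vibration (bone-conduction) VAD flag. This is a simplified assumption for illustration and not Jing's algorithm: it treats an own-voice trigger as genuine only when both sensors agree, and it says nothing about triggers based on far-field speech. The function names are hypothetical.

```python
def is_false_trigger(avad_active: bool, vvad_active: bool) -> bool:
    """Acoustic detector fired, but no matching bone-conducted vibration."""
    return avad_active and not vvad_active


def allow_own_voice_trigger(avad_active: bool, vvad_active: bool) -> bool:
    """Permit activating transparency on an own-voice trigger only when
    the acoustic and vibration detectors agree that the wearer is speaking."""
    return avad_active and vvad_active
```

Under this assumption, a nearby sound that excites the microphones but not the bone conduction sensor is flagged as a false trigger and does not activate the transparency mode.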
With respect to Claim 17, Patel discloses:
A digital audio processor (Paragraph 0020) for use with a first headphone (Fig. 1, Element 150), the digital audio processor comprising:
a filter block that is to process one or more of a plurality of microphone signals in the first headphone, to produce a transparency audio signal (headset filter that includes a pass-through/transparency mode allowing external sounds/audio to be heard via the headset, Paragraphs 0018-0019, 0026, 0028, and 0037); and
a transparency mode of operation in which the filter block becomes configured to activate the transparency audio signal, and the processor routes the transparency audio signal to a speaker of the first headphone (transparency mode called "pass through" in the prior art that "enable external sounds to pass through to a wearer (e.g., the user 160) of the headset 150 when relevant audio is detected," Paragraphs 0019, 0026 (“enabling sound corresponding to the input signals to be output by the speakers of the headset 150”), 0039, 0048, and 0064);
a conversation detector that processes one or more of the plurality of microphone signals (audio classifier that is "configured to analyze portions of an input signal that corresponds to audio received by the one or more microphones to generate a classification result" pertaining to conversational "speech content," Paragraphs 0026, 0034 (discussing speech of another person external to the headphone wearer, Fig. 1, Elements 150, 160, and 170), 0037, 0041, 0042 (detection of headphone user speech in "speaking to the person 170"), 0046 (detected speech of the person 170 or the wearer of the headset 160), and 0048), to
declare a conversation and in response activate the transparency mode of operation, when a wearer of the first headphone is conversing with another talker who is in an ambient environment of the wearer (updating the mode of headset operation into pass through/transparency mode when it is determined that a headset user starts a conversation (i.e., about to converse) or is talking with another person in the environment (see Fig. 1, Elements 150, 160, and 170), Paragraphs 0019, 0026, 0034, 0039, 0041-0042, 0046, 0064).
Patel teaches headset wearer and external person zones, but does not teach wherein the conversation detector comprises an own voice activity detector, OVAD, configured to output voice activity of the wearer of the first headphone and a target voice activity detector, TVAD, configured to output voice activity of another person, monitors an idle duration in which the OVAD and the TVAD are both or simultaneously indicating no activity, and declares the end to the conversation in response to the idle duration being longer than an idle threshold.
Hook, however, recites:
wherein the conversation detector comprises an own voice activity detector, OVAD and a target voice activity detector, TVAD, monitors an idle duration in which the OVAD and the TVAD are both or simultaneously indicating no activity, and declares the end to the conversation in response to the idle duration being longer than an idle threshold (audio-based monitoring of a conversation between participants such as "between two people" that involves multiple microphones/sensors and speech detectors (i.e., an own/first detector and another participant/target speech detector) and identifies the end of a conversation based upon a large (i.e., implied threshold) "absence of speech" from both speakers, Paragraphs 0024-0025, 0030, 0053, 0074 (discussing different zones for speech detection), 0075-0077, and 0103 (end of conversation detection based upon silence)).
Patel and Hook are analogous art because they are from a similar field of endeavor in audio playback adjustment based upon voice activity detection.  Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the participant (i.e., headset user 160 and external person 170 in the case of Patel) speech detectors to identify an end to a conversation between two people as taught by Hook in the audio playback based upon relevant audio classification taught by Patel to provide a predictable result of reducing conversational effort by controlling audio playback for a duration of a conversation as a type of relevant audio (Hook, Abstract).
While Patel classifies relevant audio to enable a pass-through mode of headset operation, Patel in view of Hook does not teach the false trigger detection set forth in claim 17.
Jing, however, discloses:
a false trigger detector that prevents the transparency mode of operation from being activated, in response to detecting a first false trigger sound while processing i) one or more of the plurality of microphone signals and ii) a bone conduction sensor signal of the first headphone (voice activity detection in a headset using a combination of an acoustic VAD (AVAD) and bone-conduction (i.e., vibrations in the face that includes skeletal structure when acoustics are produced in speech) vibration VAD (VVAD) wherein the sensors are correlated to detect and reject "false positives", Paragraphs 0084, 0090, 0098, 0103, 0118, and 0246; note that the teachings of Jing relate to non-relevant audio in Patel that would not cause an update of the headset of the speaker to a pass-through mode).
Patel, Hook, and Jing are analogous art because they are from a similar field of endeavor in audio playback devices utilizing audio classification.  Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the false positive detection additionally relying on a bone/vibration sensor taught by Jing in the relevant audio classification for pass-through enablement taught by Patel in view of Hook to provide a predictable result in the form of avoiding inadvertent triggering of a pass-through mode when there is actually no relevant audio present in Patel’s headset.
With respect to Claim 18, Jing further discloses:
The processor of claim 17 wherein the first false trigger sound represents chewing, sneeze, cough, yawn, or burp by the wearer (chewing such as chewing gum, Paragraphs 0098 and 0111; headset, Paragraph 0118).
With respect to Claim 19, Jing further discloses:
The processor of claim 17 wherein the first false trigger sound represents loud breath, loud sigh, face scratch, walking, or running by the wearer (walking, Paragraphs 0018 and 0098).

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Patel, et al. in view of Hook in view of Jing, et al. and further in view of Ma, et al. (U.S. PG Publication:  2024/0038215 A1).
With respect to Claim 20, Patel in view of Hook and further in view of Jing discloses the headset utilizing conversation voice activity-based detection on microphone inputs for relevant audio for passthrough/transparency activation utilizing false trigger detection additionally using a bone-conduction/vibration sensor, as applied to Claim 17.  Patel in view of Hook and further in view of Jing does not teach that false triggers relate to a headphone wearer singing or humming a song to which they are listening and the manner of detecting such a trigger set forth in claim 20.  Ma, however, discloses:
the first false trigger sound represents the wearer singing or humming to a song to which they are listening and is being played back through a speaker of the first headphone, and the false trigger detector comprises a machine learning model (an ML model) configured to detect the first false trigger sound, as the wearer is singing or humming to the song, based on the following inputs to the ML model being simultaneously active in the first headphone: i) one or more of the plurality of microphone signals, ii) the bone conduction sensor signal of the first headphone, and iii) a user content audio signal that is driving a speaker of the first headphone to play back the song (detection of a non-voice/false trigger of a wearer of a headset as humming to "playing" music based upon a combination of microphone and bone-conduction sensors, and the playing music on the Bluetooth headset/headphones, Paragraphs 0045, 0053 (discussing a combination of bone-conduction sensors and microphones), 0233 (describing neural network/machine learning-based detection), and 0270; Fig. 2, Table Entry 23).
Patel, Hook, Jing, and Ma are analogous art because they are from a similar field of endeavor in audio playback devices utilizing audio classification.  Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the humming detection taught by Ma in the relevant audio classification for pass-through enablement taught by Patel in view of Hook and further in view of Jing to provide a predictable result in the form of allowing the headset user to better enjoy the music by allowing humming so that the wearer may continue to listen to the music without triggering pass-through related operations (Ma, Paragraph 0270).

Allowable Subject Matter

Claim 16 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter:
With respect to Claim 16, the prior art of record taken individually or as a combination fails to explicitly teach or fairly suggest the digital audio processor for use with a headphone passthrough/transparency control as set forth in claim 1 based on a conversation detector comprising an OVAD and TVAD and in which an idle duration in which the OVAD and the TVAD are both or simultaneously indicating no activity being longer than an idle threshold is the basis for declaring the end to a conversation so as to deactivate the transparency mode and wherein the conversation detector i) tracks a plurality of instances of an observed idle duration over several days, each instance being a length of time the transparency mode remains continuously inactive until activated, and ii) varies the idle threshold based on the observed idle duration.
Most pertinent prior art:
Patel, et al. (U.S. PG Publication:  2021/0266655 A1) is perhaps closest to the claimed invention in that Patel discloses a headset filter that enables a pass-through/transparency mode that allows external sounds/audio to be heard via a headset (Paragraphs 0018-0019, 0026, 0028, and 0037).  Importantly, pass-through in Patel is based upon relevant audio that is identified by analyzing portions of audio received by multiple microphones including a headphone wearer and external person having a conversation (Paragraphs 0026, 0034, 0037, 0041, 0042, and 0046).  While Patel declares an end to a conversation to deactivate transparency mode (Paragraphs 0038 and 0065), the end to conversation is based upon a user reset and not the idle duration of both or simultaneous OVAD or TVAD detectors.  Patel is also silent on the tracking of idle durations to vary the idle threshold based upon those observations.
Hook, et al. (U.S. PG Publication:  2021/0193162 A1) resolves some of the deficiencies of Patel.  Patel discloses monitoring multiple microphones to declare a conversation between two zones in the form of a headset wearer and external person.  Hook goes a step further by declaring zones for those people involved in a conversation (Paragraph 0075) and then declaring an end to a conversation based upon a large absence of speech from at least one of the speakers (see Paragraphs 0077 and 0103), thus implying an idle duration.  Hook, however, does not teach the varying of the idle duration for controlling reactivation of noise cancellation/deactivation of headphone transparency mode as set forth in claim 16.  While other prior art such as Dolenc et al. (U.S. 2017/0318374 A1; cited in the PTO-892 from 11/18/2025) teaches voice pass-through being deactivated in response to a predetermined time limit without voice being exceeded (paragraph 0022), Dolenc does not disclose how such a time limit is predetermined.
Thus, the prior art of record fails to explicitly teach or fairly suggest the invention set forth in claim 16.
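For readers mapping the claim language to behavior, the idle-duration logic discussed for claims 1 and 16 can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the applicant's implementation and not anything disclosed by Patel, Hook, or Dolenc. The class name, the frame length, the rule for declaring a conversation (both detectors heard within one uninterrupted stretch of speech), and the rule for varying the threshold (lengthen it when transparency tends to be reactivated soon after being turned off) are all hypothetical; the action does not specify any of them.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class ConversationDetector:
    """Hypothetical idle-timer sketch; all names and defaults are assumptions."""
    idle_threshold_s: float = 5.0   # assumed initial idle threshold
    frame_s: float = 0.1            # assumed time covered by one step() call
    transparency_on: bool = False
    observed_gaps_s: list = field(default_factory=list)  # claim 16 style history
    _idle_s: float = 0.0            # time with OVAD and TVAD both inactive
    _inactive_s: float = 0.0        # time transparency has stayed off
    _heard_own: bool = False
    _heard_target: bool = False

    def step(self, ovad: bool, tvad: bool) -> bool:
        """Consume one frame of detector flags; return the transparency state."""
        if self.transparency_on:
            # The idle duration grows only while both detectors report no activity.
            self._idle_s = 0.0 if (ovad or tvad) else self._idle_s + self.frame_s
            if self._idle_s > self.idle_threshold_s:
                # Idle duration exceeded the threshold: declare end of conversation.
                self.transparency_on = False
                self._idle_s = self._inactive_s = 0.0
                self._heard_own = self._heard_target = False
            return self.transparency_on

        self._inactive_s += self.frame_s
        if not (ovad or tvad):
            # Assumed rule: a silent frame resets the evidence gathered so far.
            self._heard_own = self._heard_target = False
        self._heard_own |= ovad
        self._heard_target |= tvad
        if self._heard_own and self._heard_target:
            # Declare a conversation and record how long transparency stayed off.
            self.observed_gaps_s.append(self._inactive_s)
            self.transparency_on = True
            self._idle_s = 0.0
        return self.transparency_on

    def adapt_threshold(self, quick_retrigger_s: float = 10.0) -> None:
        """Hypothetical rule for varying the idle threshold from observed gaps."""
        if len(self.observed_gaps_s) < 3:
            return
        if median(self.observed_gaps_s) < quick_retrigger_s:
            # Frequent quick reactivation suggests the threshold ended talks early.
            self.idle_threshold_s = min(self.idle_threshold_s * 1.5, 30.0)
```

In this sketch the history in `observed_gaps_s` would have to be persisted across sessions to cover "several days" as recited in claim 16; that storage is left out here.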

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES S WOZNIAK whose telephone number is (571)272-7632. The examiner can normally be reached 7-3, off alternate Fridays.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant may use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

JAMES S. WOZNIAK
Primary Examiner
Art Unit 2655



/JAMES S WOZNIAK/Primary Examiner, Art Unit 2655
Prosecution Timeline

Nov 15, 2024: Response after Non-Final Action
Nov 18, 2025: Non-Final Rejection mailed (§103)
Feb 10, 2026: Response Filed
Feb 10, 2026: Examiner Interview Summary
Feb 10, 2026: Applicant Interview (Telephonic)
Apr 10, 2026: Final Rejection mailed (§103)
May 20, 2026: Request for Continued Examination
May 22, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

Application No. | Patent No. | Title | Time to grant | Granted
18/585,204 | 12640139 | METHOD AND APPARATUS FOR IMPROVING PERFORMANCE OF ARTIFICIAL INTELLIGENCE MODEL USING SPEECH RECOGNITION RESULTS AS TEXT INPUT | 2y 3m | May 26, 2026
18/535,521 | 12609113 | NATURAL LANGUAGE PROCESSING SYSTEMS AND METHODS FOR INTENT CLASSIFICATION OF SPEECH TRANSCRIPTION | 2y 4m | Apr 21, 2026
18/544,354 | 12609106 | EMOTIVE TEXT-TO-SPEECH WITH AUTO DETECTION OF EMOTIONS | 2y 4m | Apr 21, 2026
18/399,876 | 12597422 | SPEAKING PRACTICE SYSTEM WITH RELIABLE PRONUNCIATION EVALUATION | 2y 3m | Apr 07, 2026
18/488,578 | 12586569 | Knowledge Distillation with Domain Mismatch For Speech Recognition | 2y 5m | Mar 24, 2026
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 60%
With Interview (+39.4%): 99%
Median Time to Grant: 3y 8m (~1y 6m remaining)
PTA Risk: Moderate
Based on 391 resolved cases by this examiner. Grant probability derived from career allowance rate.
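
The projected percentages appear to be consistent with simple arithmetic on the examiner statistics reported on this page (233 granted out of 391 resolved, and a +39.4% interview lift). The check below is a sketch only: it assumes the lift is added to the career allowance rate in percentage points and capped at 100%, which the page does not state, and the variable names are illustrative.

```python
# Figures taken from the examiner statistics on this page.
granted, resolved = 233, 391
interview_lift_pct = 39.4

# Career allowance rate as a percentage.
career_rate_pct = 100 * granted / resolved

# Assumption: the interview lift is additive in percentage points, capped at 100.
with_interview_pct = min(career_rate_pct + interview_lift_pct, 100.0)

print(f"{career_rate_pct:.1f}% -> {with_interview_pct:.1f}%")  # 59.6% -> 99.0%
```

Rounded to whole percentages, these match the 60% and 99% figures shown above.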