Prosecution Insights
Last updated: April 19, 2026
Application No. 17/782,113

ADJUSTING AUDIO AND NON-AUDIO FEATURES BASED ON NOISE METRICS AND SPEECH INTELLIGIBILITY METRICS

Status: Non-Final Office Action (§103, §112)
Filed: Jun 02, 2022
Examiner: ADESANYA, OLUJIMI A
Art Unit: 2658
Tech Center: 2600 — Communications
Assignee: Dolby Laboratories Licensing Corporation
OA Round: 9 (Non-Final)

Predictions: 66% grant probability (favorable); 9-10 OA rounds expected; 3y 6m to grant; 91% grant probability with interview.

Examiner Intelligence

Career allow rate: 66% (430 granted / 655 resolved), above average (+3.6% vs TC avg)
Interview lift: strong, +25.5% higher allowance among resolved cases with an interview
Typical timeline: 3y 6m average prosecution; 35 applications currently pending
Career history: 690 total applications across all art units

Statute-Specific Performance

§101: 19.3% (-20.7% vs TC avg)
§103: 40.6% (+0.6% vs TC avg)
§102: 17.7% (-22.3% vs TC avg)
§112: 12.9% (-27.1% vs TC avg)

Deltas are relative to the Tech Center average estimate; based on career data from 655 resolved cases.
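The headline figures above are simple arithmetic on the raw counts. A quick sketch that reproduces them (the Tech Center baseline is not shown directly, so it is inferred here from the stated +3.6% delta):

```python
# Recompute the dashboard's headline examiner statistics from the raw counts.
granted, resolved = 430, 655

career_allow_rate = 100 * granted / resolved
print(f"Career allow rate: {career_allow_rate:.1f}%")  # 65.6%, displayed as 66%

# The Tech Center baseline is not given directly; it is implied by the
# "+3.6% vs TC avg" delta reported above.
implied_tc_average = career_allow_rate - 3.6
print(f"Implied TC average: {implied_tc_average:.1f}%")  # 62.0%
```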

Office Action (§103, §112)
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 1/15/26 has been entered.

Response to Arguments

Applicant's arguments filed 12/19/25 have been fully considered, but they are not persuasive. Regarding the 35 U.S.C. 103 rejection of independent claims 1, 26 and 28 over Lemelson in view of D'Amelio and Schreiner and, as a result, of the claims dependent therefrom, Applicant argues that neither paragraph [0010] nor any other paragraph of D'Amelio discloses or suggests the limitation "determining whether to simplify some speech-based text, based, at least in part, on a determination that the speech intelligibility metric or the noise metric is below a threshold level wherein determining whether to simplify some speech-based text includes at least one of selecting between closed captioning or subtitles or including more or less text," as recited in the claims (Arguments, pg. 9-10).

Examiner respectfully disagrees. Instant claims 1, 26 and 28 recite "determining whether to simplify some speech-based text, based, at least in part, on a determination that the speech intelligibility metric or the noise metric is below a threshold level wherein determining whether to simplify some speech-based text includes at least one of selecting between closed captioning and subtitles or including more or less text."
D’Amelio discloses enabling audio-based display indicia activation when an audio or dialogue track of media playback drops below a predefined threshold (e.g., a decibel level or range), when the frequency of the audio meets a certain value, or based on noise level (para. [0025]-[0026]), where the audio-based display indicia include subtitles and/or closed captioning (para. [0011]), corresponding to the argued limitation "determining whether to simplify some speech-based text, based, at least in part, on a determination that the speech intelligibility metric or the noise metric is below a threshold level, wherein determining whether to simplify some speech-based text includes at least one of selecting between closed captioning and subtitles or including more or less text". As such, Examiner maintains that Lemelson in view of D’Amelio and Schreiner discloses the limitations of claims 1, 26 and 28.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-2, 8-9, 12-13, 15-23, 26-28, 31 and 33-39 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventors, at the time the application was filed, had possession of the claimed invention.

In particular, claims 1, 26 and 28 recite "wherein determining whether to simplify some speech-based text includes at least one of selecting between closed captioning and subtitles or including more or less text". Applicant's original disclosure (pg. 39, ln. 27-28) describes simplifying speech-based text as involving including more or less text or presenting subtitles instead of closed captioning, i.e., selecting between more or less text, subtitles, or closed captioning, but not selecting between more or less text or subtitles and closed captioning. The dependent claims are rejected based on their dependency.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C.
103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1. Claims 1, 2, 12, 13, 15-23, 26-28, 31 and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Lemelson et al., US 2005/0086058 A1 ("Lemelson"), in view of D’Amelio et al., US 2017/0134821 A1 ("D’Amelio"), and Schreiner et al., US 2010/0014692 A1 ("Schreiner").

Per Claim 1, Lemelson discloses a content stream processing method, comprising: receiving, by a control system and via an interface system, a content stream that includes video data and audio data corresponding to the video data (para. [0020]; The audio source 2 may be a television, radio, or any other source of an audio signal containing speech that may contain background noise…, para. [0023]; para. [0053]; method of selectively enhancing, while optionally eliminating, a particular component of an audio signal, para. [0039]; para. [0055]; para. [0057]; para. [0059]; The noise estimator 46 provides for determination of the noise content of the initial signal entering the noise cancellation apparatus 40… Most approaches find periodic components in the total (speech and noise) signal.…, para. [0068], audio signal as including multiple components); determining, by the control system, at least one of a noise metric or a speech intelligibility metric (The audio source 2 may be a television, radio, or any other source of an audio signal containing speech that may contain background noise interfering with a hearing impaired person's ability to resolve the speech…, para. [0023]; The adaptive filter section 22 of the speech enhancement system of FIG. 2 provides a circuit and a methodology of reducing background noise to improve intelligibility of the speech…, para. [0058]); performing, by the control system, a compensation process in response to at least one of the noise metric or the speech intelligibility metric, wherein performing the compensation process involves altering a processing of the audio data (para. [0067]-[0068]; equalization is used to provide a flat response but also provide a response that amplifies and attenuates the necessary frequencies, providing a hearing impaired individual with the proper frequency characteristics to compensate for the hearing impairment; for example, a person with high frequency hearing loss has the upper frequencies boosted…, para. [0076]) and wherein altering the processing of the audio data involves determining which audio objects will be rendered based, at least in part, on at least one of the noise metric or the speech intelligibility metric (fig. 3, element 46; method of selectively enhancing, while optionally eliminating, a particular component of an audio signal, para. [0039]; This can be accomplished by performing certain acts of enhancing the speech component of an audio presentation for the benefit of a hearing impaired person by compensation of the speech component of the audio presentation.…, para. [0040]; A selector switch 6 within the speech enhancement system 4 allows the speech enhancement system or circuitry to be bypassed when the speech enhancement unit 4 is not being used. The speech enhancement system 4 output is supplied to an audio amplifier 8… When the speech enhancement system 4 is turned off, the selector switch 6 directs the output of the audio source 2 directly to the amplifier…, para. [0055]; any audible noise in the system is unwanted and should be eliminated.…, para. [0059]; The noise estimator 46 provides for determination of the noise content of the initial signal entering the noise cancellation apparatus 40… Most approaches find periodic components in the total (speech and noise) signal.…, para. [0068]; para. [0076], audio signal as including audio objects (see original claim 16), components of audio signal as audio objects, enhancing the speech component of the audio presentation/signal while eliminating the noise component of the audio signal based on estimated noise); wherein the compensation process further comprises controlling a closed captioning system, a surtitling system or a subtitling system (para. [0021]); processing, by the control system, the video data (resolving speech transmissions emanating from a television…, para. [0012]; Television programs, live performances, the playback of prerecorded audio or video performances…, para. [0053]; para. [0057]); providing, by the control system, processed video data to at least one display device of an environment (para. [0012]; para. [0065]); rendering, by the control system, the audio data for reproduction via a set of audio reproduction transducers of the environment, to produce rendered audio signals (fig. 2, elements 8, 10; para. [0017]; para. [0055]); and providing, via the interface system, the rendered audio signals to at least some audio reproduction transducers of the set of audio reproduction transducers of the environment (fig. 2, elements 8, 10; para. [0017]; para. [0055]).

Lemelson does not explicitly disclose wherein controlling the closed captioning system, the surtitling system or the subtitling system involves determining whether to simplify some speech-based text, based, at least in part, on a determination that the speech intelligibility metric or the noise metric is below a threshold level, wherein determining whether to simplify some speech-based text includes at least one of selecting between closed captioning and subtitles or including more or less text. However, this feature is taught by D’Amelio (The content of the media playback may include existing subtitles and/or closed captioning capability to enable display of audio-based display indicia 108. As used herein, the term “audio-based display indicia” includes subtitles, closed captioning, and/or other text…, para. [0011]; para. [0024]; embodiments provided herein enable automatic audio-based display indicia activation when an audio or dialogue track of media playback drops below a predefined threshold (e.g., a decibel level or range) or when the audio of the media playback meets certain other criteria as defined by the user…, para. [0025]-[0026], playback audio characteristics and audio frequency of audio as example speech intelligibility, providing audio-based display indicia (i.e., closed captioning and/or subtitles) as simplifying text).

Lemelson in view of D’Amelio does not explicitly disclose wherein performing the compensation process involves altering a processing of the audio data based, at least in part, on the metadata. However, this feature is taught by Schreiner (An object manipulator individually manipulates objects using audio object based metadata referring to the individual audio objects to obtain manipulated audio objects…, Abstract; fig. 3A; The manipulated audio objects are mixed using an object mixer for finally obtaining an audio output signal having one or several channel signals depending on a specific rendering setup, Abstract; fig. 10; the object based metadata for the upper object manipulated by device 13a is just the information that this object is a "speech" object. The object based metadata for the other object processed by item 13b have information that this second object is a surround object., para. [0136]-[0138]; For clean audio applications, a target level for the speech object can be provided as well. Then, the surround object might be set to zero or almost to zero in order to heavily emphasize the speech object within the sound generated by a certain loudspeaker setup…, para. [0139]; the objects have to have a ranking in metadata saying that an object is important or less important…, para. [0157]; an importance level is transmitted as metadata to enable a reduction of less important signal components…, para. [0162]; para. [0165], amplified/emphasized objects as objects to be rendered, metadata as identifying important/less important objects/components).

It would have been obvious to one of ordinary skill in the art to combine the teachings of D’Amelio with the method of Lemelson in arriving at the missing feature of Lemelson, as well as to combine the teachings of Schreiner with the method of Lemelson in view of D’Amelio in arriving at the missing features of Lemelson in view of D’Amelio, because such combination would have resulted in a user not having to worry about turning on audio-based display indicia manually and missing, or having to repeat, portions of content being watched (D’Amelio, para. [0046]-[0048]), and in helping speech intelligibility for hearing-disabled people (Schreiner, para. [0151]; para. [0157]; para. [0162]).
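For readers mapping the combination onto an implementation, the threshold-gated caption activation the Examiner cites from D'Amelio (para. [0025]-[0026]) amounts to a simple decision rule. The sketch below is illustrative only; the function name, the dB thresholds, and the use of a speech-to-noise ratio as the intelligibility proxy are assumptions, not taken from the cited references:

```python
def should_show_captions(dialogue_level_db: float,
                         noise_level_db: float,
                         level_threshold_db: float = -30.0,
                         snr_threshold_db: float = 10.0) -> bool:
    """Enable audio-based display indicia (captions/subtitles) when the
    dialogue track drops below a predefined level, or when the estimated
    speech-to-noise ratio falls below a threshold (hypothetical defaults)."""
    if dialogue_level_db < level_threshold_db:
        return True  # dialogue too quiet on its own
    # Otherwise, check how well the dialogue stands out from the noise floor.
    return (dialogue_level_db - noise_level_db) < snr_threshold_db
```

A dialogue track that falls below the level threshold, or that is insufficiently separated from the noise floor, would trigger the display of captions or subtitles.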
Per Claim 2, Lemelson in view of D’Amelio and Schreiner discloses the method of claim 1. Lemelson discloses wherein the speech intelligibility metric is based, at least in part, on one or more of a speech transmission index (STI), a common intelligibility scale (CIS), C50, reverberance of the environment, a frequency response of the environment, playback characteristics of one or more audio reproduction transducers of the environment, or a level of environmental noise (para. [0012]; para. [0068]).

Per Claim 12, Lemelson in view of D’Amelio and Schreiner discloses the method of claim 1. D’Amelio discloses wherein controlling the closed captioning system, the surtitling system or the subtitling system involves determining whether to display text based, at least in part, on the noise metric or speech intelligibility metric (para. [0024]-[0025]).

Per Claim 13, Lemelson in view of D’Amelio and Schreiner discloses the method of claim 12. D’Amelio discloses wherein determining whether to display the text involves applying a first noise threshold to determine that the text will be displayed (para. [0025]-[0026]) and applying a second noise threshold to determine that the text will cease to be displayed (para. [0025]-[0027]; para. [0043]).

Per Claim 15, Lemelson in view of D’Amelio and Schreiner discloses the method of claim 1. Schreiner discloses wherein the audio data includes audio objects and audio object priority metadata and wherein altering the processing of the audio data involves selecting high-priority audio objects based on the audio object priority metadata and rendering the high-priority audio objects, but not rendering other audio objects (fig. 10; An object manipulator individually manipulates objects using audio object based metadata referring to the individual audio objects to obtain manipulated audio objects. The manipulated audio objects are mixed using an object mixer for finally obtaining an audio output signal having one or several channel signals depending on a specific rendering setup, Abstract; For clean audio applications, a target level for the speech object can be provided as well. Then, the surround object might be set to zero or almost to zero in order to heavily emphasize the speech object within the sound generated by a certain loudspeaker setup…, para. [0139]; the input audio stream is preferably divided into separate objects, where the objects have to have a ranking in metadata saying that an object is important or less important…, para. [0157]; For clean audio applications as illustrated in FIG. 11c, an importance level is transmitted as metadata to enable a reduction of less important signal components. Then, the other branch would correspond to the importance components, which are amplified, while the lower branch might correspond to the less important components, which can be attenuated…, para. [0162]; It may be sufficient to "mask out" signal components which are to be manipulated. This is similar to editing masks in image processing. Then, a generalized "object" is a superposition of several original objects, where this superposition includes a number of objects which is smaller than the total number of original objects. All objects are again added up at a final stage. There might be no interest in separated single objects, and for some objects, the level value may be set to 0, which is a high negative dB figure, when a certain object has to be removed completely…, para. [0165], metadata as identifying important/less important objects/components, amplified/emphasized speech objects as high-priority rendered objects, zeroed/completely removed objects as audio objects not rendered).
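The importance-ranked object selection the rejection reads out of Schreiner (paras. [0157], [0162], [0165]) reduces to filtering the object list on its priority metadata when conditions are degraded. A minimal sketch, with all names and the integer ranking scheme invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class AudioObject:
    name: str
    priority: int  # object-based metadata: importance ranking (hypothetical scale)

def objects_to_render(objects, priority_floor, intelligibility_degraded):
    """Render only high-priority objects (e.g., speech) when the noise or
    intelligibility metric indicates degraded conditions; otherwise render all.
    Dropped objects correspond to Schreiner's zeroed/removed components."""
    if not intelligibility_degraded:
        return list(objects)
    return [obj for obj in objects if obj.priority >= priority_floor]
```

For example, with a speech object ranked 10 and an ambience object ranked 2, a floor of 5 under degraded conditions keeps only the speech object.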
Per Claim 16, Lemelson in view of D’Amelio and Schreiner discloses the method of claim 1. Schreiner discloses wherein the audio data includes audio objects and altering the processing of the audio data involves changing a rendering location of one or more audio objects (the input audio stream is preferably divided into separate objects, where the objects have to have a ranking in metadata saying that an object is important or less important. Then, the level difference between them can be adjusted in accordance with the meta data or the object position can be relocated to increase intelligibility…, para. [0157]; para. [0162]).

Per Claim 17, Lemelson in view of D’Amelio and Schreiner discloses the method of claim 1. Lemelson discloses wherein altering the processing of the audio data involves applying one or more speech enhancement methods based, at least in part, on at least one of the noise metric or the speech intelligibility metric (para. [0039]-[0040]; para. [0068]).

Per Claim 18, Lemelson in view of D’Amelio and Schreiner discloses the method of claim 17. Lemelson discloses wherein the one or more speech enhancement methods include at least one of reducing a gain of non-speech audio or increasing a gain of audio (para. [0040]; para. [0068]; para. [0076]).

Per Claim 19, Lemelson in view of D’Amelio and Schreiner discloses the method of claim 1. Lemelson discloses wherein altering the processing of the audio data involves altering one or more of an upmixing process, a downmixing process, a virtual bass process, a bass distribution process, an equalization process, a crossover filter, a delay filter, a multiband limiter or a virtualization process based, at least in part, on at least one of the noise metric or the speech intelligibility metric (para. [0062]).

Per Claim 20, Lemelson in view of D’Amelio and Schreiner discloses the method of claim 1. Lemelson discloses transmitting the audio data from a first device to a second device (para. [0070]).
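The speech-enhancement gain adjustment discussed for claims 17-18 (reducing the gain of non-speech audio and/or increasing the gain of speech when noise warrants it) can likewise be sketched as a conditional pair of gain offsets. The normalized 0-1 noise metric, the threshold, and the 6 dB step are all hypothetical values chosen for illustration:

```python
def speech_enhancement_gains(noise_metric: float,
                             threshold: float = 0.5,
                             step_db: float = 6.0) -> tuple:
    """Return (speech_gain_db, non_speech_gain_db): boost the speech stem and
    duck the non-speech stem when the noise metric exceeds the threshold.
    The metric scale and step size are invented for this sketch."""
    if noise_metric > threshold:
        return step_db, -step_db
    return 0.0, 0.0
```

In a quiet room no adjustment is made; in a noisy one the speech stem is raised and everything else is lowered, which is the shape of the enhancement the Examiner maps onto Lemelson's equalization and noise-cancellation disclosure.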
Per Claim 21, Lemelson in view of D’Amelio and Schreiner discloses the method of claim 20. Lemelson discloses transmitting at least one of the noise metric, the speech intelligibility metric or echo reference data from the first device to the second device or from the second device to the first device, wherein the second device is a hearing aid, a personal sound amplification product, a cochlear implant or a headset (para. [0070]).

Per Claim 22, Lemelson in view of D’Amelio and Schreiner discloses the method of claim 21. Lemelson discloses: receiving, by a second device control system, second device microphone signals (Special headphones 80 are worn by the listener and connected the speech enhancement system. The headphones have microphones 82 in each ear piece and another microphone 84 located midway between the ear pieces…, para. [0070]-[0071]); receiving, by the second device control system, the audio data and at least one of the noise metric, the speech intelligibility metric or echo reference data (The external microphone picks up the ambient noise that is then processed by the adaptive filter to create "anti-noise" that is reproduced by the speakers in the headphone cups…, para. [0061]); determining, by the second device control system, one or more audio data gain settings and one or more second device microphone signal gain settings (para. [0071]; para. [0077]); applying, by the second device control system, the audio data gain settings to the audio data to produce gain-adjusted audio data (para. [0071]; para. [0077]); applying, by the second device control system, the second device microphone signal gain settings to the second device microphone signals to produce gain-adjusted second device microphone signals (para. [0071]; para. [0077]); mixing, by the second device control system, the gain-adjusted audio data and the gain-adjusted second device microphone signals to produce mixed second device audio data (para. [0071]; para. [0077]); providing, by the second device control system, the mixed second device audio data to one or more second device transducers (para. [0070]; This combined signal is fed to the audio amplifier 92 and supplied to the headphone speakers 86…, para. [0071]; para. [0077]); and reproducing the mixed second device audio data by the one or more second device transducers (para. [0069]-[0070]; This combined signal is fed to the audio amplifier 92 and supplied to the headphone speakers 86…, para. [0071]; para. [0077]).

Per Claim 23, Lemelson in view of D’Amelio and Schreiner discloses the method of claim 22. Lemelson discloses controlling, by the second device control system, the relative levels of the gain-adjusted audio data and the gain-adjusted second device microphone signals in the mixed second device audio data based, at least in part, on the noise metric (para. [0069]-[0071]; para. [0077]).

Per Claim 26, Lemelson in view of D’Amelio and Schreiner discloses an apparatus comprising: an interface system (para. [0020]; para. [0023]); and a control system (para. [0056]) configured to: receive, via the interface system, a content stream that includes video data and audio data corresponding to the video data (The audio source 2 may be a television, radio, or any other source of an audio signal containing speech that may contain background noise…, para. [0023]; para. [0053]; method of selectively enhancing, while optionally eliminating, a particular component of an audio signal, para. [0039]; para. [0055]; para. [0057]; para. [0059]; The noise estimator 46 provides for determination of the noise content of the initial signal entering the noise cancellation apparatus 40… Most approaches find periodic components in the total (speech and noise) signal.…, para.
[0068], audio signal as including multiple components); determine at least one of a noise metric or a speech intelligibility metric (The audio source 2 may be a television, radio, or any other source of an audio signal containing speech that may contain background noise interfering with a hearing impaired person's ability to resolve the speech…, para. [0023]; The adaptive filter section 22 of the speech enhancement system of FIG. 2 provides a circuit and a methodology of reducing background noise to improve intelligibility of the speech…, para. [0058]); perform a compensation process in response to at least one of the noise metric or the speech intelligibility metric, wherein performing the compensation process involves altering a processing of the audio data (para. [0067]-[0068]; equalization is used to provide a flat response but also provide a response that amplifies and attenuates the necessary frequencies, providing a hearing impaired individual with the proper frequency characteristics to compensate for the hearing impairment; for example, a person with high frequency hearing loss has the upper frequencies boosted…, para. [0076]) and wherein altering the processing of the audio data involves determining which audio objects will be rendered based, at least in part, on at least one of the noise metric or the speech intelligibility metric (fig. 3, element 46; method of selectively enhancing, while optionally eliminating, a particular component of an audio signal, para. [0039]; This can be accomplished by performing certain acts of enhancing the speech component of an audio presentation for the benefit of a hearing impaired person by compensation of the speech component of the audio presentation.…, para. [0040]; A selector switch 6 within the speech enhancement system 4 allows the speech enhancement system or circuitry to be bypassed when the speech enhancement unit 4 is not being used. The speech enhancement system 4 output is supplied to an audio amplifier 8… When the speech enhancement system 4 is turned off, the selector switch 6 directs the output of the audio source 2 directly to the amplifier…, para. [0055]; any audible noise in the system is unwanted and should be eliminated.…, para. [0059]; The noise estimator 46 provides for determination of the noise content of the initial signal entering the noise cancellation apparatus 40… Most approaches find periodic components in the total (speech and noise) signal.…, para. [0068]; para. [0076], audio signal as including audio objects (see original claim 16), components of audio signal as audio objects, enhancing the speech component of the audio presentation/signal while eliminating the noise component of the audio signal based on estimated noise); wherein the compensation process further comprises controlling a closed captioning system, a surtitling system or a subtitling system (para. [0021]); process the video data (resolving speech transmissions emanating from a television…, para. [0012]; Television programs, live performances, the playback of prerecorded audio or video performances…, para. [0053]; para. [0057]); provide the processed video data to at least one display device of an environment (para. [0012]; para. [0065]); render the audio data for reproduction via a set of audio reproduction transducers of the environment, to produce rendered audio signals (fig. 2, elements 8, 10; para. [0017]; para. [0055]); and provide, via the interface system, the rendered audio signals to at least some audio reproduction transducers of the set of audio reproduction transducers of the environment (fig. 2, elements 8, 10; para. [0017]; para. [0055]).

Lemelson does not explicitly disclose wherein controlling the closed captioning system, the surtitling system or the subtitling system involves determining whether to simplify some speech-based text, based, at least in part, on a determination that the speech intelligibility metric or the noise metric is below a threshold level, wherein determining whether to simplify some speech-based text includes at least one of selecting between closed captioning and subtitles or including more or less text. However, this feature is taught by D’Amelio (The content of the media playback may include existing subtitles and/or closed captioning capability to enable display of audio-based display indicia 108. As used herein, the term “audio-based display indicia” includes subtitles, closed captioning, and/or other text…, para. [0011]; para. [0024]; embodiments provided herein enable automatic audio-based display indicia activation when an audio or dialogue track of media playback drops below a predefined threshold (e.g., a decibel level or range) or when the audio of the media playback meets certain other criteria as defined by the user…, para. [0025]-[0026], playback audio characteristics and audio frequency of audio as example speech intelligibility, providing audio-based display indicia (i.e., closed captioning and/or subtitles) as simplifying text).

Lemelson in view of D’Amelio does not explicitly disclose wherein performing the compensation process involves altering a processing of the audio data based, at least in part, on the metadata. However, this feature is taught by Schreiner (An object manipulator individually manipulates objects using audio object based metadata referring to the individual audio objects to obtain manipulated audio objects…, Abstract; fig. 3A; The manipulated audio objects are mixed using an object mixer for finally obtaining an audio output signal having one or several channel signals depending on a specific rendering setup, Abstract; fig. 10; the object based metadata for the upper object manipulated by device 13a is just the information that this object is a "speech" object. The object based metadata for the other object processed by item 13b have information that this second object is a surround object., para. [0136]-[0138]; For clean audio applications, a target level for the speech object can be provided as well. Then, the surround object might be set to zero or almost to zero in order to heavily emphasize the speech object within the sound generated by a certain loudspeaker setup…, para. [0139]; the objects have to have a ranking in metadata saying that an object is important or less important…, para. [0157]; an importance level is transmitted as metadata to enable a reduction of less important signal components…, para. [0162]; para. [0165], amplified/emphasized objects as objects to be rendered, metadata as identifying important/less important objects/components).

It would have been obvious to one of ordinary skill in the art to combine the teachings of D’Amelio with the system of Lemelson in arriving at the missing feature of Lemelson, as well as to combine the teachings of Schreiner with the method of Lemelson in view of D’Amelio in arriving at the missing features of Lemelson in view of D’Amelio, because such combination would have resulted in a user not having to worry about turning on audio-based display indicia manually and missing, or having to repeat, portions of content being watched (D’Amelio, para. [0046]-[0048]), and in helping speech intelligibility for hearing-disabled people (Schreiner, para. [0151]; para. [0157]; para. [0162]).

Per Claim 27, Lemelson in view of D’Amelio and Schreiner discloses a system that includes the apparatus of claim 26 (see claim 26; Lemelson, Abstract). Lemelson discloses the system further comprising the set of audio reproduction transducers (fig. 2, elements 8, 10; Abstract; para. [0017]; para.
[0055]); Per Claim 28, Lemelson discloses to perform a method, the method comprising: receiving, by a control system and via an interface system, a content stream that includes video data and audio data corresponding to the video data (The audio source 2 may be a television, radio, or any other source of an audio signal containing speech that may contain background noise …, para. [0023]; para. [0053]; method of selectively enhancing, while optionally eliminating, a particular component of an audio signal, para. [0039]; para. [0055]; para. [0057]; para. [0059]; The noise estimator 46 provides for determination of the noise content of the initial signal entering the noise cancellation apparatus 40 Most approaches find periodic components in the total (speech and noise) signal.…, para. [0068], audio signal as including multiple components); determining, by the control system, at least one of a noise metric or a speech intelligibility metric (The audio source 2 may be a television, radio, or any other source of an audio signal containing speech that may contain background noise interfering with a hearing impaired person's ability to resolve the speech…, para. [0023]; The adaptive filter section 22 of the speech enhancement system of FIG. 2 provides a circuit and a methodology of reducing background noise to improve intelligibility of the speech…, para. [0058]); performing, by the control system, a compensation process in response to at least one of the noise metric or the speech intelligibility metric, wherein performing the compensation process involves altering a processing of the audio data (para. [0067]-[0068]; equalization is used to provide a flat response but also provide a response that amplifies and attenuates the necessary frequencies providing a hearing impaired individual with the proper frequency characteristics to compensate for the hearing impairment. For example, a person with high frequency hearing loss has the upper frequencies boosted …, para. 
[0076]) and wherein altering the processing of the audio data involves determining which audio objects will be rendered based, at least in part, on at least one of the noise metric or the speech intelligibility metric (fig. 3, element 46; method of selectively enhancing, while optionally eliminating, a particular component of an audio signal, para. [0039]; This can be accomplished by performing certain acts of enhancing the speech component of an audio presentation for the benefit of a hearing impaired person by compensation of the speech component of the audio presentation.…, para. [0040]; A selector switch 6 within the speech enhancement system 4 allows the speech enhancement system or circuitry to be bypassed when the speech enhancement unit 4 is not being used. The speech enhancement system 4 output is supplied to an audio amplifier 8…When the speech enhancement system 4 is turned off, the selector switch 6 directs the output of the audio source 2 directly to the amplifier…, para. [0055]; any audible noise in the system is unwanted and should be eliminated.…, para. [0059]; The noise estimator 46 provides for determination of the noise content of the initial signal entering the noise cancellation apparatus 40 Most approaches find periodic components in the total (speech and noise) signal.…, para. [0068]; para. [0076], audio signal as including audio objects (see original claim 16), components of audio signal as audio objects, enhancing speech component of audio presentation/signal while eliminating noise component of the audio signal based on estimated noise); wherein the compensation process further comprises controlling a closed captioning system, a surtitling system or a subtitling system (para. [0021]); processing, by the control system, the video data (resolving speech transmissions emanating from a television…, para. [0012]; Television programs, live performances, the playback of prerecorded audio or video performances…, para. [0053]; para.
[0057]); providing, by the control system, processed video data to at least one display device of an environment (para. [0012]; para. [0065]); rendering, by the control system, the audio data for reproduction via a set of audio reproduction transducers of the environment, to produce rendered audio signals (fig. 2, elements, 8, 10; para. [0017]; para. [0055]); providing, via the interface system, the rendered audio signals to at least some audio reproduction transducers of the set of audio reproduction transducers of the environment (fig. 2, elements 8, 10; para. [0017]; para. [0055]). Lemelson does not explicitly disclose one or more non-transitory media having software stored thereon, the software including instructions for controlling one or more devices to perform a method. However, at the time of the effective filing of the invention, it would have been obvious to one of ordinary skill in the art to implement “one or more non-transitory media having software stored thereon, the software including instructions for controlling one or more devices to perform the method” with the suggestion/motivation of preventing a complete overhaul/update of the existing system when new data is available for measurement, by easily removing previous data and introducing new data and preventing a recompilation of the entire computing system, thereby allowing for algorithms that can be used by multiple applications.
Lemelson does not explicitly disclose wherein controlling the closed captioning system, the surtitling system or the subtitling system involves determining whether to simplify some speech-based text, based, at least in part, on a determination that the speech intelligibility metric or the noise metric is below a threshold level, wherein determining whether to simplify some speech-based text includes at least one of selecting between closed captioning and subtitles or including more or less text. However, this feature is taught by D’Amelio (The content of the media playback may include existing subtitles and/or closed captioning capability to enable display of audio-based display indicia 108. As used herein, the term “audio-based display indicia” includes subtitles, closed captioning, and/or other text …, para. [0011]; para. [0024]; embodiments provided herein enable automatic audio-based display indicia activation when an audio or dialogue track of media playback drops below a predefined threshold (e.g., a decibel level or range) or when the audio of the media playback meets certain other criteria as defined by the user …, para. [0025]-[0026], playback audio characteristics and audio frequency of audio as example speech intelligibility, providing audio-based display indicia (i.e., closed captioning and/or subtitles) as simplifying text). Lemelson in view of D’Amelio does not explicitly disclose wherein performing the compensation process involves altering a processing of the audio data based, at least in part, on the metadata. However, this feature is taught by Schreiner (An object manipulator individually manipulates objects using audio object based metadata referring to the individual audio objects to obtain manipulated audio objects …, Abstract; fig. 3A; The manipulated audio objects are mixed using an object mixer for finally obtaining an audio output signal having one or several channel signals depending on a specific rendering setup, Abstract; fig.
10; the object based metadata for the upper object manipulated by device 13a is just the information that this object is a "speech" object. The object based metadata for the other object processed by item 13b have information that this second object is a surround object., para. [0136]-[0138]; For clean audio applications, a target level for the speech object can be provided as well. Then, the surround object might be set to zero or almost to zero in order to heavily emphasize the speech object within the sound generated by a certain loudspeaker setup…., para. [0139]; the objects have to have a ranking in metadata saying that an object is important or less important …, para. [0157]; an importance level is transmitted as metadata to enable a reduction of less important signal components…., para. [0162]; para. [0165], amplified/emphasized objects as objects to be rendered, metadata as identifying important/less important objects/components). It would have been obvious to one of ordinary skill in the art to combine the teachings of D’Amelio with the method of Lemelson in arriving at the missing feature of Lemelson, as well as to combine the teachings of Schreiner with the method of Lemelson in view of D’Amelio in arriving at the missing features of Lemelson in view of D’Amelio, because such combination would have resulted in ensuring that a user does not have to worry about turning on audio-based display indicia manually and missing or having to repeat portions of content being watched (D’Amelio, para. [0046]-[0048]) and in helping speech intelligibility for hearing-disabled people (Schreiner, para. [0151]; para. [0157]; para. [0162]).

Per Claim 31, Lemelson in view of D’Amelio and Schreiner discloses the method of claim 1, and Schreiner discloses wherein the audio data includes audio objects and audio object priority metadata (Abstract; the objects have to have a ranking in metadata saying that an object is important or less important…., para. [0157]; para. [0162]).
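The disputed limitation of claims 1, 26 and 28 can be sketched in a few lines of code. This is purely an editor's illustration: the function name, the 0-1 metric scale and the threshold value are all assumptions, and nothing here is drawn from the claims or the cited references.

```python
# Hypothetical illustration of the disputed limitation: decide whether to
# simplify speech-based text when the speech intelligibility metric or the
# noise metric falls below a threshold level. All names, scales, and the
# threshold are invented for illustration only.

def choose_text_mode(intelligibility: float, noise: float,
                     threshold: float = 0.5) -> dict:
    """Return a text-presentation decision for the compensation process."""
    # Simplify only when at least one metric is below the threshold level.
    if intelligibility >= threshold and noise >= threshold:
        return {"simplify": False, "mode": None, "verbosity": "full"}
    # "Simplifying" per the claim language: select between closed
    # captioning and subtitles, or include more or less text.
    mode = "closed_captioning" if intelligibility < noise else "subtitles"
    return {"simplify": True, "mode": mode, "verbosity": "reduced"}

print(choose_text_mode(0.3, 0.8))
# {'simplify': True, 'mode': 'closed_captioning', 'verbosity': 'reduced'}
```

The sketch simply makes the two-part structure of the limitation explicit: a threshold comparison gating the decision, followed by a selection among text-presentation options.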
Per Claim 33, Lemelson in view of D’Amelio and Schreiner discloses the method of claim 31, Schreiner discloses wherein altering the processing of the audio data involves selecting high-priority audio objects based on the audio object priority metadata and rendering the high-priority audio objects, but not rendering other audio objects (An object manipulator individually manipulates objects using audio object based metadata referring to the individual audio objects to obtain manipulated audio objects. The manipulated audio objects are mixed using an object mixer for finally obtaining an audio output signal having one or several channel signals depending on a specific rendering setup, Abstract; For clean audio applications, a target level for the speech object can be provided as well. Then, the surround object might be set to zero or almost to zero in order to heavily emphasize the speech object within the sound generated by a certain loudspeaker setup …, para. [0139]; the input audio stream is preferably divided into separate objects, where the objects have to have a ranking in metadata saying that an object is important or less important…., para. [0157]; For clean audio applications as illustrated in FIG. 11c, an importance level is transmitted as metadata to enable a reduction of less important signal components. Then, the other branch would correspond to the importance components, which are amplified while the lower branch might correspond to the less important components which can be attenuated…., para. [0162]; It may be sufficient to "mask out" signal components which are to be manipulated. This is similar to editing masks in image processing. Then, a generalized "object" is a superposition of several original objects, where this superposition includes a number of objects which is smaller than the total number of original objects. All objects are again added up at a final stage. 
There might be no interest in separated single objects, and for some objects, the level value may be set to 0, which is a high negative dB figure, when a certain object has to be removed completely …, para. [0165], metadata as identifying important/less important objects/components, amplified/emphasized speech objects as high priority rendered objects, zeroed/completely removed objects as audio objects not rendered).

2. Claims 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Lemelson in view of D’Amelio and Schreiner as applied to claim 1 above, and further in view of Momosaki et al US 2005/0038661 A1 (“Momosaki”).

Per Claim 8, Lemelson in view of D’Amelio and Schreiner discloses the method of claim 1. Lemelson in view of D’Amelio and Schreiner does not explicitly disclose wherein controlling the closed captioning system, the surtitling system or the subtitling system is based, at least in part, on at least one of a user’s hearing ability, the user’s language proficiency, the user’s eyesight or the user’s reading comprehension. However, this feature is taught by Momosaki (para. [0044]-[0045]; para. [0051]). It would have been obvious to one of ordinary skill in the art to combine the teachings of Momosaki with the method of Lemelson in view of D’Amelio and Schreiner in arriving at the missing feature of Lemelson in view of D’Amelio and Schreiner, because such combination would have resulted in appropriately and exactly delivering contents of speech to a viewer even in a situation in which it is hard to hear speech in audio (Momosaki, fig. 3A-3D; fig. 4; para. [0009]).
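The priority-metadata handling mapped to claims 31 and 33 above can likewise be sketched. This is an editor's illustration only; the shape of the priority metadata and every name are assumptions, not drawn from Schreiner.

```python
# Hypothetical sketch of priority-metadata-based rendering as discussed for
# claims 31 and 33: keep high-priority audio objects and set the gain of
# the rest to zero (i.e., do not render them). Field names are invented.

def select_rendered_objects(audio_objects, min_priority):
    """Return names of objects whose priority metadata meets the cutoff."""
    rendered = []
    for obj in audio_objects:
        # A zero gain corresponds to removing the object completely.
        gain = 1.0 if obj["priority"] >= min_priority else 0.0
        if gain > 0.0:
            rendered.append(obj["name"])
    return rendered

mix = [
    {"name": "speech", "priority": 2},    # important object
    {"name": "surround", "priority": 0},  # less important object
]
print(select_rendered_objects(mix, min_priority=1))  # ['speech']
```

The point of the sketch is the claim-33 distinction: high-priority objects are rendered while other objects are not, with the selection driven entirely by the priority metadata.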
Per Claim 9, Lemelson in view of D’Amelio, Schreiner and Momosaki discloses the method of claim 1. Lemelson in view of D’Amelio and Schreiner does not explicitly disclose wherein controlling the closed captioning system, the surtitling system or the subtitling system involves controlling at least one of a font or a font size based, at least in part, on the speech intelligibility metric. However, this feature is taught by Momosaki (para. [0036]). It would have been obvious to one of ordinary skill in the art to combine the teachings of Momosaki with the method of Lemelson in view of D’Amelio and Schreiner in arriving at the missing feature of Lemelson in view of D’Amelio and Schreiner, because such combination would have resulted in appropriately and exactly delivering contents of speech to a viewer even in a situation in which it is hard to hear speech in audio (Momosaki, fig. 3A-3D; fig. 4; para. [0009]).

3. Claims 34-39 are rejected under 35 U.S.C. 103 as being unpatentable over Lemelson in view of D’Amelio and Schreiner as applied to claims 1, 26 and 28, and further in view of Andersen et al US 2019/0222943 A1 (“Andersen”).

Per Claim 34, Lemelson in view of D’Amelio and Schreiner discloses the method of claim 1. Lemelson in view of D’Amelio and Schreiner does not explicitly disclose wherein the intelligibility metric is determined based on machine learning. However, this feature is taught by Andersen (para. [0004]; para. [0075]-[0077]). It would have been obvious to one of ordinary skill in the art to combine the teachings of Andersen with the method of Lemelson in view of D’Amelio and Schreiner in arriving at the missing feature of Lemelson in view of D’Amelio and Schreiner, because such combination would have resulted in providing an accurate estimate of intelligibility (Andersen, para. [0028]).
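The machine-learned intelligibility metric at issue in claims 34-39 can be illustrated with a toy scorer. This sketch is the editor's own: the logistic model, the per-band SNR features and every weight are invented for illustration, and none of it reflects Andersen's actual method.

```python
# Toy illustration of a learned speech intelligibility metric: a logistic
# model mapping per-band SNR features (in dB) to a score in [0, 1]. In
# practice such weights would be trained on speech content for which
# intelligibility is known; the values here are invented.
import math

def intelligibility_score(band_snrs_db, weights=None, bias=-2.0):
    """Squash a weighted sum of per-band SNRs through a sigmoid."""
    weights = weights if weights is not None else [0.1] * len(band_snrs_db)
    z = bias + sum(w * s for w, s in zip(weights, band_snrs_db))
    return 1.0 / (1.0 + math.exp(-z))

clean = intelligibility_score([20.0, 18.0, 15.0])   # high SNR -> near 1
noisy = intelligibility_score([-5.0, -8.0, -10.0])  # low SNR -> near 0
print(clean > 0.9, noisy < 0.1)  # True True
```

Such a score could then feed the threshold comparison recited in the independent claims, which is what ties the claim-34 limitation back to claim 1.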
Per Claim 35, Lemelson in view of D’Amelio, Schreiner and Andersen discloses the method of claim 34, and Andersen discloses wherein the machine learning includes a neural network that was trained on a set of speech content for which speech intelligibility is known (para. [0147]; para. [0152]; para. [0156]; para. [0181]).

Per Claim 36, Lemelson in view of D’Amelio and Schreiner discloses the apparatus of claim 26. Lemelson in view of D’Amelio and Schreiner does not explicitly disclose wherein the intelligibility metric is determined based on machine learning. However, this feature is taught by Andersen (para. [0004]; para. [0075]-[0077]). It would have been obvious to one of ordinary skill in the art to combine the teachings of Andersen with the apparatus of Lemelson in view of D’Amelio and Schreiner in arriving at the missing feature of Lemelson in view of D’Amelio and Schreiner, because such combination would have resulted in providing an accurate estimate of intelligibility (Andersen, para. [0028]).

Per Claim 37, Lemelson in view of D’Amelio, Schreiner and Andersen discloses the apparatus of claim 36, and Andersen discloses wherein the machine learning includes a neural network that was trained on a set of speech content for which speech intelligibility is known (para. [0147]; para. [0152]; para. [0156]; para. [0181]).

Per Claim 38, Lemelson in view of D’Amelio and Schreiner discloses the one or more non-transitory media of claim 28. Lemelson in view of D’Amelio and Schreiner does not explicitly disclose wherein the intelligibility metric is determined based on machine learning. However, this feature is taught by Andersen (para. [0004]; para.
[0075]-[0077]). It would have been obvious to one of ordinary skill in the art to combine the teachings of Andersen with the media of Lemelson in view of D’Amelio and Schreiner in arriving at the missing feature of Lemelson in view of D’Amelio and Schreiner, because such combination would have resulted in providing an accurate estimate of intelligibility (Andersen, para. [0028]).

Per Claim 39, Lemelson in view of D’Amelio, Schreiner and Andersen discloses the one or more non-transitory media of claim 38, and Andersen discloses wherein the machine learning includes a neural network that was trained on a set of speech content for which speech intelligibility is known (para. [0147]; para. [0152]; para. [0156]; para. [0181]).

Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO 892 form. Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUJIMI A ADESANYA whose telephone number is (571)270-3307. The examiner can normally be reached Monday-Friday 8:30-5:00pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached on 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /OLUJIMI A ADESANYA/Primary Examiner, Art Unit 2658

Prosecution Timeline

Jun 02, 2022
Application Filed
Jun 02, 2023
Non-Final Rejection — §103, §112
Sep 01, 2023
Response Filed
Sep 06, 2023
Final Rejection — §103, §112
Dec 06, 2023
Request for Continued Examination
Dec 12, 2023
Response after Non-Final Action
Jan 20, 2024
Non-Final Rejection — §103, §112
May 15, 2024
Interview Requested
May 20, 2024
Applicant Interview (Telephonic)
May 20, 2024
Examiner Interview Summary
May 22, 2024
Response Filed
May 28, 2024
Final Rejection — §103, §112
Jul 15, 2024
Response after Non-Final Action
Jul 29, 2024
Response after Non-Final Action
Aug 29, 2024
Request for Continued Examination
Sep 03, 2024
Response after Non-Final Action
Sep 10, 2024
Non-Final Rejection — §103, §112
Nov 18, 2024
Interview Requested
Nov 22, 2024
Applicant Interview (Telephonic)
Nov 22, 2024
Examiner Interview Summary
Dec 02, 2024
Response Filed
Jan 21, 2025
Final Rejection — §103, §112
Mar 13, 2025
Response after Non-Final Action
Apr 22, 2025
Request for Continued Examination
Apr 23, 2025
Response after Non-Final Action
May 09, 2025
Non-Final Rejection — §103, §112
Sep 11, 2025
Response Filed
Oct 18, 2025
Final Rejection — §103, §112
Dec 19, 2025
Response after Non-Final Action
Jan 15, 2026
Request for Continued Examination
Jan 29, 2026
Response after Non-Final Action
Feb 07, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591739
METHOD AND SYSTEM FOR DIACRITIZING ARABIC TEXT
2y 5m to grant Granted Mar 31, 2026
Patent 12585686
EVENT DETECTION AND CLASSIFICATION METHOD, APPARATUS, AND DEVICE
2y 5m to grant Granted Mar 24, 2026
Patent 12585481
METHOD AND ELECTRONIC DEVICE FOR PERFORMING TRANSLATION
2y 5m to grant Granted Mar 24, 2026
Patent 12578779
Multiple Stage Network Microphone Device with Reduced Power Consumption and Processing Load
2y 5m to grant Granted Mar 17, 2026
Patent 12579181
Synchronization of Sensor Network with Organization Ontology Hierarchy
2y 5m to grant Granted Mar 17, 2026
Based on 5 most recent grants.


Prosecution Projections

9-10
Expected OA Rounds
66%
Grant Probability
91%
With Interview (+25.5%)
3y 6m
Median Time to Grant
High
PTA Risk
Based on 655 resolved cases by this examiner. Grant probability derived from career allow rate.
