Prosecution Insights
Last updated: April 19, 2026
Application No. 17/995,902

AUTOMATED MIXING OF AUDIO DESCRIPTION

Final Rejection (§103)

Filed: Oct 10, 2022
Examiner: WITHEY, THEODORE JOHN
Art Unit: 2655
Tech Center: 2600 — Communications
Assignee: Dolby Laboratories Licensing Corporation
OA Round: 4 (Final)
Grant Probability: 44% (Moderate)
Projected OA Rounds: 5-6
Projected Time to Grant: 2y 11m
Grant Probability with Interview: 90%

Examiner Intelligence

Career Allow Rate: 44% (grants 44% of resolved cases; 10 granted / 23 resolved; -18.5% vs TC avg)
Interview Lift: +46.9% (strong lift for resolved cases with an interview vs. without)
Typical Timeline: 2y 11m average prosecution; 39 applications currently pending
Career History: 62 total applications across all art units

Statute-Specific Performance

§101: 22.0% (-18.0% vs TC avg)
§103: 48.6% (+8.6% vs TC avg)
§102: 17.1% (-22.9% vs TC avg)
§112: 12.0% (-28.0% vs TC avg)

Deltas are measured against the Tech Center average estimate • Based on career data from 23 resolved cases

Office Action

§103
DETAILED ACTION

This office action is in response to Applicant's amendment/request for reconsideration, received on 11/14/2025. Claims 1, 12, 23 have been amended. Claims 16-17 have been cancelled. Claims 24-25 have been added. Claims 1-15, 20, 22-25 are pending and have been considered.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments, see pg. 9, filed 11/14/2025, with respect to "Claim Objections" have been fully considered and are persuasive. The objection of claim 23 has been withdrawn.

Applicant's arguments filed 11/14/2025, see pgs. 9-10, with regard to "Rejections under 35 U.S.C. 103" have been fully considered but they are not persuasive. Applicant's representative asserts, "The rejections are respectfully traversed. Even assuming arguendo that a person unskilled in the audio mixing arts would consider 'loudness' to be synonymous with 'volume', it is respectfully submitted that one of ordinary skill in the audio mixing arts would not consider 'loudness' to be synonymous with 'volume' in the context of mixing audio description with audio content. Notwithstanding such traverse, claims 1 and 12 have been amended. For brevity, the following discussion refers to claim 1 (as amended), with similar discussion being applicable to claim 12 (as amended). Claim 1 (as amended) recites wherein calculating the long-term loudness of the audio object data includes performing at least one of a Leq (loudness equivalent continuous sound pressure level) loudness measurement process, a LKFS (loudness, K-weighted, relative to full scale) loudness measurement process, and a LUFS (loudness units relative to full scale) loudness measurement process. Support for this feature can be found in the specification at [0047] (disclosing Leq, LKFS and LUFS loudness measurement processes). This amendment clarifies the specifics of the loudness calculation to show that 'loudness' in the context of claim 1 (as amended) is not synonymous with 'volume'. It is respectfully submitted that one of ordinary skill in the art would not look to dictionary definitions for 'volume' regarding these terms."

In response, the examiner would like to refer to new sections of Jot. Specifically, [0043]-[0045] disclose loudness measures including LKFS and LUFS. In view of these definitions, the examiner respectfully asserts that it would have been obvious to apply the loudness measures as disclosed in Jot to the gain adjustment visualization as disclosed in Jot in view of Naik, further in view of Wang, as Jot explicitly discloses a loudness measure as required for the claims. Further, Naik explicitly discloses determining loudness values of media items, [0036], see Fig. 5A. Further still, though Wang exclusively refers to "volume", the examiner respectfully maintains the assertion that, functionally, volume and loudness are synonymous terms in consideration of Wang. Wang discloses a plot of volume by amplitude against time, Fig. 4. Naik discloses a plot of volume against time, Fig. 11. Associated with this figure of Naik, [0098] discloses "Initially, a primary media item 112 is played back, such as via a media player application executed on the device 10. As shown, the primary media item 112 is initially played back at a normal loudness, which may correspond to a full volume setting V…For instance, during the duck-in interval t.sub.AB (meaning from time t.sub.A to time t.sub.B), the loudness of the primary media item is gradually faded out until its loudness level is reduced to the ducked loudness level DL at time t.sub.B, at which point playback of the secondary media item 114 begins." In view of this disclosure of Naik creating a direct correlation between volume and loudness, i.e. disclosing the figure to be representing loudness when the y-axis is defined to be volume, it would be obvious to apply the volume of Wang to the loudness/volume of Naik, as they can be graphically represented the same way (the examiner respectfully asserts that the volume of a signal will have a direct influence on the loudness/amplitude) with an explicit relationship, indicating volume and loudness are synonymous and/or maintain a direct, positive relationship, i.e. as volume/loudness is changed, the other will be similarly changed. This provides a clear motivation to combine the volume of Wang with the volume/loudness of Naik and/or Jot, as they appear to the examiner to be interchangeable terms as defined in Naik. See updated rejections below.
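One way to make the asserted volume-to-loudness relationship concrete: a purely linear volume gain shifts any logarithmic loudness measure (Leq, LKFS, LUFS) by 20*log10(gain) dB, ignoring gating effects, so the two quantities move together monotonically. A minimal sketch, with the helper name invented for illustration (it appears nowhere in the record):

    import numpy as np

    def loudness_shift_db(volume_gain: float) -> float:
        """dB change in a logarithmic loudness measure when a signal is
        scaled by a linear volume gain (gating effects ignored). Doubling
        the amplitude adds about 6 dB; halving it subtracts about 6 dB,
        i.e. volume and loudness move in the same direction."""
        return 20.0 * np.log10(volume_gain)

    print(loudness_shift_db(2.0))   # ~ +6.02 dB
    print(loudness_shift_db(0.5))   # ~ -6.02 dB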
Information Disclosure Statement

The information disclosure statement(s) submitted on 01/24/2023 and 07/10/2025 is/are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-4, 9-15, 20, 22-24 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jot et al. (US-20170127212-A1), hereinafter Jot, in view of Naik et al. (US-20100211199-A1), hereinafter Naik, further in view of Wang (US-20160314802-A1).

Regarding claim 1, Jot discloses a computer-implemented method of audio processing (Abstract, adjusting a relationship between dialog and non-dialog signals [Adjusting a relationship, i.e. volume levels, between two signals tracks to audio processing]), the method comprising: receiving audio object data ([0032] The term "non-dialog" refers to any remaining or other portion of an audio program [i.e. background music]); wherein the audio object data includes a first plurality of audio objects ([0034] In an example, the audio signals 110 include at least a first object-based audio signal that includes a dialog signal, and a second object-based audio signal that includes a non-dialog signal. The encoder device 120 can be configured to read, add, or modify metadata 113 associated with one or more of the first and second object-based audio signals [One or more second object-based audio signals, i.e. non-dialog signals, tracks to a plurality of audio objects]).

Jot does not disclose: receiving audio description data. Naik discloses: receiving audio description data ([0067] secondary media data, such as voice feedback data, where, [0067] voice feedback data which may include spoken audio data or commentary). Jot and Naik are considered analogous art within dynamic audio mixing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jot to incorporate the teachings of Naik, because of the novel way to apply techniques for reading/modifying metadata associated with song information to be overlaid on top of (Naik, [0006]) to provide an improved user experience (Naik, [0007]). Furthermore, the action of performing long-term dialog balance using metadata is disclosed in [0055] of Jot. Thus, it would be predictable to apply these balances to the metadata defined in Naik, i.e. song information metadata, as this falls within a metadata calculation already being performed by Jot.

The combination of Jot and Naik, particularly the addition of Naik's audio description data and metadata now processed as detailed by Jot, further discloses: calculating a long-term loudness of the audio object data and a long-term loudness of the audio description data ([Fig. 11, 1110], where the global dialog/non-dialog loudnesses are defined: [0017] The long-term dialog balance can generally be associated with an entire duration of an audio program, and in such instances can be considered to be a "global" dialog [Now with consideration of the techniques disclosed in Jot as applied to the audio description signals disclosed in Naik in the resultant combination]), wherein calculating the long-term loudness of the audio object data includes performing at least one of a Leq (loudness equivalent continuous sound pressure level) loudness measurement process ([The examiner would like to note that, due to the disjunctive nature of the claim, this element does not require a mapping]), a LKFS (loudness, K-weighted, relative to full scale) loudness measurement process, and a LUFS (loudness units relative to full scale) loudness measurement process ([Fig. 2, Measure Loudness performed using LKFS], [0043] Following the regulations and recommendations, the long-term (or integrated) loudness measure of a digital audio program, expressed in LKFS (Loudness, K-weighted, relative to Full Scale) or LUFS (Loudness Units relative to Full Scale) can be calculated as: [see equation (1) and Fig. 2]); calculating a plurality of short-term loudnesses of the audio object data and a plurality of short-term loudnesses of the audio description data ([Fig. 11, 1120], [0074] At operation 1120, the method can include monitoring short-term loudness for object-based dialog and non-dialog signals [Applying the techniques disclosed in Jot to the audio description signals disclosed in Naik in the resultant combination]); reading a first plurality of mixing parameters that correspond to the audio object data ([0034] In an example, the encoder device 120 receives the audio signals 110 and adds respective metadata 113 to the audio signals 110. The metadata 113 can include, among other things, an indication of or information about the audio signal's source, type, genre, loudness, quietness, duration, noise characteristic, frequency content, spatial position, or other information [Adding metadata corresponding to non-dialog signals, i.e. audio object data, based on genre, type, spatial position, i.e. mixing parameters, indicates a first plurality of mixing parameters are read from input audio object data]); generating a second plurality of mixing parameters based on the first plurality of mixing parameters, the long-term loudness of the audio object data, the long-term loudness of the audio description data, the plurality of short-term loudnesses of the audio object data, and the plurality of short-term loudnesses of the audio description data ([Fig. 1, 130], [0037] The decoder device 130 can be configured to update or adjust a signal balance between two or more object-based audio signals. In an example, the processor circuit 135 receives a dialog balance setting 136, and then compares the dialog balance setting 136 with a detected or determined dialog balance of the object-based audio signals to be processed by the decoder device 130. If the relationship between the dialog balance setting 136 and the detected or determined dialog balance of the signals meets or exceeds a specified threshold, then the processor circuit 135 can update or adjust a loudness characteristic of one or more of the object-based audio signals 136 [Updating or adjusting the signal balance between two or more audio signals tracks to generation of a second set of mixing parameters, wherein the dialog balance, i.e. loudness, between the two signals is being taken into account to update the signal balance, and loudness can be both short and long term as previously defined. Now with consideration of the techniques disclosed in Jot as applied to the audio description signals disclosed in Naik in the resultant combination]); generating a gain adjustment visualization corresponding to the second plurality of mixing parameters, the audio object data and the audio description data ([Fig. 8], [Fig. 11, 1116], [0077] Resulting time-varying gain offsets g.sub.D(m) and g.sub.N(m) can then be determined at operation 1124, and then applied to corresponding object waveforms at operation 1116, where, [0071] g.sub.D(m) and g.sub.N(m), such as can be applied respectively to dialog objects and non-dialog objects [A time-varying gain indicates dependence on the audio data, which is also time-varying; calculating a gain indicates knowledge of volume, which indicates the second plurality of mixing parameters were used to gather the volume data. Now with consideration of the techniques disclosed in Jot as applied to the audio description signals disclosed in Naik in the resultant combination. Further, in view of Figure 8 representing a visualization of a gain adjustment (y-axis) dependent on a difference between dialog and non-dialog audio (x-axis) in view of the audio object of Jot and the audio description of Naik]); generating mixed audio object data by mixing the audio object data and the audio description data according to the second plurality of mixing parameters ([Fig. 1, 135, 112], [Metadata, input audio, and a dialog balance setting being sent to a processor that outputs audio signals indicates that the output is mixed between the dialog, i.e. audio description, and non-dialog, i.e. audio object, data. It would have been obvious to apply the techniques for generating mixed audio data disclosed in Jot to the audio description data disclosed in Naik]); wherein the mixed audio object data includes a second plurality of audio objects ([Fig. 1, 112], [0033] The system 100 further includes playback device(s) 150 that receive one or more output signals 112 from the decoder device 130); wherein the second plurality of audio objects correspond to the first plurality of audio objects mixed with the audio description data according to the second plurality of mixing parameters ([Fig. 1, 135], [Metadata, input audio, and a dialog balance setting being sent to a processor that outputs audio signals indicates that the output corresponds to a first plurality of audio objects, i.e. input signals, mixed with the audio description according to the second plurality of mixing parameters, i.e. the dialog balance setting 136. Now with consideration of the techniques disclosed in Jot as applied to the audio description signals disclosed in Naik in the resultant combination]).

Jot in view of Naik does not disclose: wherein the gain adjustment visualization shows a loudness of the audio object data, a loudness of the audio description data, and a gain to be applied for mixing. Wang discloses: wherein the gain adjustment visualization shows a loudness of the audio object data, a loudness of the audio description data, and a gain to be applied for mixing ([Fig. 4, Vt', V, gt, gl], [0073] The first curve represents a variation curve diagram of a time-varying smooth volume V.sub.t′ of an original voice signal as shown in FIG. 2; the second curve represents a variation curve diagram of a time-varying combined smooth volume of the original voice signal as shown in FIG. 2 [Wang discloses a visualization featuring two audio signals, in view of the audio object data of Jot and the audio description data of Naik, wherein the visualization also features gain curves which are determined based on the audio signals (see [Fig. 1, S105]) to be applied to the audio signals (see [Fig. 1, S106]), in view of the determined gain adjustment of Jot Fig. 8]). Jot, Naik, and Wang are considered analogous art within dynamic volume control of audio signals. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jot in view of Naik to incorporate the teachings of Wang, because of the novel way to control the rate of change of volume of signals using a smaller scope for improved output voice quality (Wang, [0075]).
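For reference, the kind of measurement the amended claim recites can be sketched as follows: a long-term (integrated) loudness pooled over the whole program versus short-term values measured per 20 ms frame, on a LUFS-style logarithmic scale. This is a deliberately simplified illustration rather than Jot's equation (1): the BS.1770 K-weighting pre-filter and gating are omitted, and the function names are ours.

    import numpy as np

    def frame_loudness_lufs(frames: np.ndarray) -> np.ndarray:
        """Per-frame ('short-term') loudness of mono audio frames, shape
        (n_frames, frame_len), on a simplified LUFS-style scale. A faithful
        LKFS/LUFS meter would first apply the BS.1770 K-weighting filter
        and gating; both are omitted here for brevity."""
        ms = np.mean(frames ** 2, axis=1)            # mean square per frame
        return -0.691 + 10.0 * np.log10(ms + 1e-12)

    def integrated_loudness_lufs(signal: np.ndarray, frame_len: int = 960) -> float:
        """'Long-term' loudness over the whole program: pool the mean square
        of all frames, then convert to the log scale once."""
        n = len(signal) // frame_len
        frames = signal[: n * frame_len].reshape(n, frame_len)
        ms = np.mean(frames ** 2, axis=1)
        return -0.691 + 10.0 * np.log10(np.mean(ms) + 1e-12)

    # 20 ms frames at 48 kHz (the 'short-term' granularity Jot describes)
    sr = 48_000
    t = np.arange(sr) / sr
    x = 0.25 * np.sin(2 * np.pi * 440 * t)           # 1 s test tone
    short_term = frame_loudness_lufs(x[: (sr // 960) * 960].reshape(-1, 960))
    print(integrated_loudness_lufs(x), short_term[:3])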
Regarding claim 2, Jot in view of Naik, further in view of Wang discloses: the method of claim 1. Jot further discloses: wherein the long-term loudness of the audio object data is calculated over multiple samples of the audio object data ([0040] In an example, a dialog balance can be deemed valid for a lesser duration than that of an entire audio program, in this example, the dialog balance characteristic can be considered a "long-term" characteristic [Jot discloses a "global balance" which represents loudness for an entire audio program; this indicates that long-term, defined in Jot as a lesser duration than the entire audio, includes multiple samples of a larger audio signal]); wherein the long-term loudness of the audio description data is calculated over multiple samples of the audio description data ([As Jot discloses dialog and non-dialog objects both having long-term loudness calculations performed (Fig. 11, 1112), it would have been obvious to apply the long-term loudness calculation to the secondary media data, i.e. dialog, disclosed in Naik]); wherein each of the plurality of short-term loudnesses of the audio object data is calculated over a single sample of the audio object data ([0040] Even lesser duration dialog balance characteristics, such as corresponding to about 20 milliseconds or less, can be considered a "short-term" characteristic [20 ms tracks to a sample or frame of audio data]); wherein each of the plurality of short-term loudnesses of the audio description data is calculated over a single sample of the audio description data ([As Jot discloses dialog and non-dialog objects both having short-term loudness calculations performed (Fig. 11, 1120), now with consideration of the techniques disclosed in Jot as applied to the audio description signals disclosed in Naik in the resultant combination]).

Regarding claim 3, Jot in view of Naik, further in view of Wang discloses: the method of claim 1. Jot further discloses: wherein the first plurality of mixing parameters is associated with one of a plurality of genres ([0041] In some embodiments, a user's preferred dialog salience setting can depend on, among other things, a content or genre of the corresponding audio program [A dialog salience, i.e. a mix, associated with or dependent on the genre of the audio]); wherein each of the plurality of genres is associated with a corresponding set of mixing parameters ([0038] In an example, the dialog balance setting 136 can be determined or influenced by a user preference that is input to the decoder device via a second input 133, by device information corresponding to the playback device(s) 150, by the genre information 114, or by other factors [Determining a dialog balance setting based on genre information indicates the genre information affects the mixing parameters]).

Regarding claim 4, Jot in view of Naik, further in view of Wang discloses: the method of claim 3. Jot further discloses: wherein the plurality of genres includes an action genre, a horror genre, a suspense genre, a news genre, a conversational genre, a sports genre, and a talk-show genre ([0041] Audio program genres can include various classes or types of audio, such as audio corresponding to a live sporting event, talk show [which are conversational in nature], advertisement, concert, movie [i.e. action, horror, suspense], TV episode, TV commercial, or other media [i.e. news genre; Naik discloses a news genre, [0036]]).
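A hypothetical sketch of the genre-keyed parameter sets that claims 3-4 recite; the parameter names echo the lookahead, ramp, and maximum delta parameters of claims 5-8, but every value below is invented for illustration (neither Jot nor Naik discloses numbers):

    # Hypothetical genre presets: each genre keyed to its own set of
    # mixing parameters, as claims 3-4 describe. Values are invented.
    GENRE_MIX_PRESETS = {
        "action":         {"max_delta_db": 12.0, "ramp_ms": 150, "lookahead_ms": 50},
        "horror":         {"max_delta_db": 15.0, "ramp_ms": 250, "lookahead_ms": 50},
        "suspense":       {"max_delta_db": 15.0, "ramp_ms": 300, "lookahead_ms": 50},
        "news":           {"max_delta_db": 6.0,  "ramp_ms": 100, "lookahead_ms": 50},
        "conversational": {"max_delta_db": 6.0,  "ramp_ms": 100, "lookahead_ms": 50},
        "sports":         {"max_delta_db": 9.0,  "ramp_ms": 200, "lookahead_ms": 50},
        "talk-show":      {"max_delta_db": 6.0,  "ramp_ms": 120, "lookahead_ms": 50},
    }

    def mixing_parameters_for(genre: str) -> dict:
        """Read the first plurality of mixing parameters for a genre:
        each genre is associated with its own corresponding set."""
        return GENRE_MIX_PRESETS[genre]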
Regarding claim 9, Jot in view of Naik, further in view of Wang discloses: the method of claim 1. Naik further discloses: receiving a user input to adjust the second plurality of mixing parameters, prior to generating the mixed audio object data ([0082] In the illustrated example, the secondary media items 114 may be voice feedback announcements, including an artist name 114a, a track name 114b, and an album name 114c. One or more of these announcements 114a, 114b, and 114c, may be played back as voice feedback in response to an event, and may be configured via a set of user preferences [Playback of artist, track, or album name, i.e. audio description data, with user preferences indicates that those preferences must be set before the output was generated in order to generate a meaningful output]); generating a revised gain adjustment visualization corresponding to the second plurality of mixing parameters having been adjusted according to the user input ([0082] The enhanced media item 110 further includes loudness data 116. The loudness data 116 may include loudness values for each of the primary media item 112 and the secondary media items 114a, 114b, and 114c [The enhanced media item as defined in Fig. 6 containing primary and secondary media data, which has previously been disclosed as user-configured, indicates the enhanced audio object has a gain adjustment visualization corresponding to the loudnesses, i.e. mixing parameters, according to user input. Now with consideration of the techniques disclosed in Jot as applied to the audio description signals disclosed in Naik in the resultant combination]); wherein the mixed audio object data is generated based on the second plurality of mixing parameters having been adjusted ([If a user input is a necessary step of this claim, as displayed in the first element, it is inherent that the mixed audio object data is generated based on the second plurality of mixing parameters having been adjusted, as the user adjusting these parameters to receive mixed audio object data is a required step of this method]).

Regarding claim 10, Jot in view of Naik, further in view of Wang discloses: the method of claim 1. Naik further discloses: receiving audio data, wherein the audio data does not include an audio object ([Fig. 8, 112, 114], [Primary/secondary media files which exclusively contain audio/speech data]); converting the audio data into the audio object data ([0082] For example, in one presently contemplated embodiment, respective loudness values may be stored in metadata tags of each primary and secondary media file [Taking audio files and storing loudness values associated with the files in metadata tags indicates the audio is in an object format as shown in Fig. 6 "Enhanced Media Item 110"]); after generating the mixed audio object data: converting the mixed audio object data to mixed audio data ([Fig. 8, 138], [0088] The mixer 134 may also be implemented via hardware and/or software, and may perform the function of combining two or more electronic signals (e.g., primary and secondary audio signals) into a composite output signal 138 [Combining two or more signals in a sound mixer 134 to output a mixed audio signal 138 indicates that the volume metadata was used to combine the sources into one output signal, dependent on volume metadata, i.e. no longer audio objects]); wherein the mixed audio data corresponds to the audio data mixed with the audio description data ([Primary, i.e. non-dialog, and secondary media files, i.e. audio description, as input resulting in one output indicates a mixed audio output signal corresponding to audio data mixed with audio description data. Now with consideration of the techniques disclosed in Jot as applied to the audio description signals disclosed in Naik in the resultant combination]).

Regarding claim 11, Jot discloses a non-transitory computer readable medium storing a computer program that, when executed by a processor, controls an apparatus to execute processing including the method of claim 1 ([0103] The software module can be contained in a massed, tangible, non-transitory computer-readable media that can be accessed by a computing device).
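The mixing step that claims 10-11 describe — per-frame gains applied to two object signals which are then summed into a single output — can be sketched as below, loosely following Jot's time-varying gain offsets g_D(m) and g_N(m). The interface is our assumption, not code from any cited reference:

    import numpy as np

    def mix_with_gain_offsets(dialog: np.ndarray, non_dialog: np.ndarray,
                              g_d_db: np.ndarray, g_n_db: np.ndarray,
                              frame_len: int = 960) -> np.ndarray:
        """Apply Jot-style per-frame gain offsets g_D(m), g_N(m) (in dB)
        to the description (dialog) and program (non-dialog) objects and
        sum them into one mixed signal. Assumes both signals span at least
        len(g_d_db) * frame_len samples. A sketch only: real object
        renderers also interpolate gains across frame boundaries to avoid
        zipper noise."""
        out = np.zeros_like(non_dialog)
        for m in range(len(g_d_db)):
            s = slice(m * frame_len, (m + 1) * frame_len)
            out[s] = (dialog[s] * 10 ** (g_d_db[m] / 20.0)
                      + non_dialog[s] * 10 ** (g_n_db[m] / 20.0))
        return out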
Regarding claim 12, Jot discloses an apparatus for audio processing (Abstract, adjusting a relationship between dialog and non-dialog signals), the apparatus comprising: a processor ([0099] Moreover, in some embodiments, acts or events can be performed concurrently, such as through multi-threaded processing, interrupt processing, or multiple processors or processor cores); wherein the processor is configured to control the apparatus to receive audio object data ([0035] The decoder device 130 can include a processor circuit 135 that is configured to read the metadata 113 from the recovered object-based audio signals).

Jot does not disclose: wherein the processor is configured to control the apparatus to receive audio description data. Naik discloses: wherein the processor is configured to control the apparatus to receive audio description data ([Fig. 2, 50], [0067] secondary media data, such as voice feedback data, where, [0067] voice feedback data which may include spoken audio data or commentary). Jot and Naik are considered analogous art within dynamic audio mixing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jot to incorporate the teachings of Naik, because of the novel way to apply techniques for reading/modifying metadata associated with song information to be overlaid on top of (Naik, [0006]) to provide an improved user experience (Naik, [0007]). Furthermore, the action of performing long-term dialog balance using metadata is disclosed in [0055] of Jot. Thus, it would be predictable to apply these balances to the metadata defined in Naik, i.e. song information metadata, as this falls within a metadata calculation already being performed by Jot.

The combination of Jot and Naik, particularly the addition of Naik's audio description data and metadata now processed as detailed by Jot, further discloses: wherein the processor is configured to control the apparatus to calculate a long-term loudness of the audio object data and a long-term loudness of the audio description data ([Fig. 11, 1110], where the global dialog/non-dialog loudnesses are defined: [0017] The long-term dialog balance can generally be associated with an entire duration of an audio program, and in such instances can be considered to be a "global" dialog [Now with consideration of the techniques disclosed in Jot as applied to the audio description signals disclosed in Naik in the resultant combination]), wherein calculating the long-term loudness of the audio object data includes performing at least one of a Leq (loudness equivalent continuous sound pressure level) loudness measurement process ([The examiner would like to note that, due to the disjunctive nature of the claim, this element does not require a mapping]), a LKFS (loudness, K-weighted, relative to full scale) loudness measurement process, and a LUFS (loudness units relative to full scale) loudness measurement process ([Fig. 2, Measure Loudness performed using LKFS], [0043] Following the regulations and recommendations, the long-term (or integrated) loudness measure of a digital audio program, expressed in LKFS (Loudness, K-weighted, relative to Full Scale) or LUFS (Loudness Units relative to Full Scale) can be calculated as: [see equation (1)]); wherein the processor is configured to control the apparatus to calculate a plurality of short-term loudnesses of the audio object data and a plurality of short-term loudnesses of the audio description data ([Fig. 11, 1120], [0074] At operation 1120, the method can include monitoring short-term loudness for object-based dialog and non-dialog signals [Applying the techniques disclosed in Jot to the audio description signals disclosed in Naik in the resultant combination]); wherein the processor is configured to control the apparatus to read a first plurality of mixing parameters that correspond to the audio object data ([0034] In an example, the encoder device 120 receives the audio signals 110 and adds respective metadata 113 to the audio signals 110. The metadata 113 can include, among other things, an indication of or information about the audio signal's source, type, genre, loudness, quietness, duration, noise characteristic, frequency content, spatial position, or other information [Adding metadata corresponding to non-dialog signals, i.e. audio object data, based on genre, type, spatial position, i.e. mixing parameters, indicates a first plurality of mixing parameters are read from input audio object data]); wherein the processor is configured to control the apparatus to generate a second plurality of mixing parameters based on the first plurality of mixing parameters, the long-term loudness of the audio object data, the long-term loudness of the audio description data, the plurality of short-term loudnesses of the audio object data, and the plurality of short-term loudnesses of the audio description data ([Fig. 1, 130], [0037] The decoder device 130 can be configured to update or adjust a signal balance between two or more object-based audio signals. In an example, the processor circuit 135 receives a dialog balance setting 136, and then compares the dialog balance setting 136 with a detected or determined dialog balance of the object-based audio signals to be processed by the decoder device 130. If the relationship between the dialog balance setting 136 and the detected or determined dialog balance of the signals meets or exceeds a specified threshold, then the processor circuit 135 can update or adjust a loudness characteristic of one or more of the object-based audio signals 136 [Updating or adjusting the signal balance between two or more audio signals tracks to generation of a second set of mixing parameters, wherein the dialog balance, i.e. loudness, between the two signals is being taken into account to update the signal balance, and loudness can be both short and long term as previously defined. Now with consideration of the techniques disclosed in Jot as applied to the audio description signals disclosed in Naik in the resultant combination]); wherein the processor is configured to control the apparatus to generate a gain adjustment visualization corresponding to the second plurality of mixing parameters, the audio object data and the audio description data ([Fig. 8], [Fig. 11, 1116], [0077] Resulting time-varying gain offsets g.sub.D(m) and g.sub.N(m) can then be determined at operation 1124, and then applied to corresponding object waveforms at operation 1116, where, [0071] g.sub.D(m) and g.sub.N(m), such as can be applied respectively to dialog objects and non-dialog objects [A time-varying gain indicates dependence on the audio data, which is also time-varying; calculating a gain indicates knowledge of volume, which indicates the second plurality of mixing parameters were used to gather the volume data. Now with consideration of the techniques disclosed in Jot as applied to the audio description signals disclosed in Naik in the resultant combination. Further, in view of Figure 8 representing a visualization of a gain adjustment (y-axis) dependent on a difference between dialog and non-dialog audio (x-axis) in view of the audio object of Jot and the audio description of Naik]); wherein the processor is configured to control the apparatus to generate mixed audio object data by mixing the audio object data and the audio description data according to the second plurality of mixing parameters ([Fig. 1, 135, 112], [Metadata, input audio, and a dialog balance setting being sent to a processor that outputs audio signals indicates that the output is mixed between the dialog, i.e. audio description, and non-dialog, i.e. audio object, data. It would have been obvious to apply the techniques for generating mixed audio data disclosed in Jot to the audio description data disclosed in Naik]); wherein the mixed audio object data includes a second plurality of audio objects ([Fig. 1, 112], [0033] The system 100 further includes playback device(s) 150 that receive one or more output signals 112 from the decoder device 130); wherein the second plurality of audio objects correspond to the first plurality of audio objects mixed with the audio description data according to the second plurality of mixing parameters ([Fig. 1, 135], [Metadata, input audio, and a dialog balance setting being sent to a processor that outputs audio signals indicates that the output corresponds to a first plurality of audio objects, i.e. input signals, mixed with the audio description according to the second plurality of mixing parameters, i.e. the dialog balance setting 136. It would have been obvious to apply the mixing method disclosed in Jot to the secondary media files disclosed in Naik]).

Jot in view of Naik does not disclose: wherein the gain adjustment visualization shows a loudness of the audio object data, a loudness of the audio description data, and a gain to be applied for mixing. Wang discloses: wherein the gain adjustment visualization shows a loudness of the audio object data, a loudness of the audio description data, and a gain to be applied for mixing ([Fig. 4, Vt', V, gt, gl], [0073] The first curve represents a variation curve diagram of a time-varying smooth volume V.sub.t′ of an original voice signal as shown in FIG. 2; the second curve represents a variation curve diagram of a time-varying combined smooth volume of the original voice signal as shown in FIG. 2 [Wang discloses a visualization featuring two audio signals, in view of the audio object data of Jot and the audio description data of Naik, wherein the visualization also features gain curves which are determined based on the audio signals (see [Fig. 1, S105]) to be applied to the audio signals (see [Fig. 1, S106]), in view of the determined gain adjustment of Jot Fig. 8]). Jot, Naik, and Wang are considered analogous art within dynamic volume control of audio signals. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jot in view of Naik to incorporate the teachings of Wang, because of the novel way to control the rate of change of volume of signals using a smaller scope for improved output voice quality (Wang, [0075]).

Regarding claim 13, Jot in view of Naik, further in view of Wang discloses: the apparatus of claim 12. Naik further discloses: a display that is configured to display the gain adjustment visualization ([Fig. 2, 24], [0044] The display 24 may also display various system indicators 26 that provide feedback to a user [i.e. a gain adjustment]).

Regarding claim 14, Jot in view of Naik, further in view of Wang discloses: the apparatus of claim 12. Jot further discloses: wherein the long-term loudness of the audio object data is calculated over multiple samples of the audio object data ([0040] In an example, a dialog balance can be deemed valid for a lesser duration than that of an entire audio program, in this example, the dialog balance characteristic can be considered a "long-term" characteristic [Jot discloses a "global balance" which represents loudness for an entire audio program; this indicates that long-term, defined in Jot as a lesser duration than the entire audio, includes multiple samples of a larger audio signal]); wherein the long-term loudness of the audio description data is calculated over multiple samples of the audio description data ([As Jot discloses dialog and non-dialog objects both having long-term loudness calculations performed (Fig. 11, 1112), it would have been obvious to apply the long-term loudness calculation to the secondary media data, i.e. dialog, disclosed in Naik]); wherein each of the plurality of short-term loudnesses of the audio object data is calculated over a single sample of the audio object data ([0040] Even lesser duration dialog balance characteristics, such as corresponding to about 20 milliseconds or less, can be considered a "short-term" characteristic [20 ms tracks to a sample or frame of audio data]); wherein each of the plurality of short-term loudnesses of the audio description data is calculated over a single sample of the audio description data ([As Jot discloses dialog and non-dialog objects both having short-term loudness calculations performed (Fig. 11, 1120), now with consideration of the techniques disclosed in Jot as applied to the audio description signals disclosed in Naik in the resultant combination]).

Regarding claim 15, Jot in view of Naik, further in view of Wang discloses: the apparatus of claim 12. Jot further discloses: wherein the first plurality of mixing parameters is associated with one of a plurality of genres ([0041] In some embodiments, a user's preferred dialog salience setting can depend on, among other things, a content or genre of the corresponding audio program [A dialog salience, i.e. a mix, associated with or dependent on the genre of the audio]); wherein each of the plurality of genres is associated with a corresponding set of mixing parameters ([0038] In an example, the dialog balance setting 136 can be determined or influenced by a user preference that is input to the decoder device via a second input 133, by device information corresponding to the playback device(s) 150, by the genre information 114, or by other factors [Determining a dialog balance setting based on genre information indicates the genre information affects the mixing parameters]).

Regarding claim 20, Jot in view of Naik, further in view of Wang discloses: the apparatus of claim 12. Naik further discloses: wherein the processor is configured to control the apparatus to receive a user input to adjust the second plurality of mixing parameters, prior to generating the mixed audio object data ([0082] In the illustrated example, the secondary media items 114 may be voice feedback announcements, including an artist name 114a, a track name 114b, and an album name 114c. One or more of these announcements 114a, 114b, and 114c, may be played back as voice feedback in response to an event, and may be configured via a set of user preferences [Playback of artist, track, or album name, i.e. audio description data, with user preferences indicates that those preferences must be set before the output was generated in order to generate a meaningful output]); wherein the processor is configured to control the apparatus to generate a revised gain adjustment visualization corresponding to the second plurality of mixing parameters having been adjusted according to the user input ([0082] The enhanced media item 110 further includes loudness data 116. The loudness data 116 may include loudness values for each of the primary media item 112 and the secondary media items 114a, 114b, and 114c [The enhanced media item as defined in Fig. 6 containing primary and secondary media data, which has previously been disclosed as user-configured, indicates the enhanced audio object has a gain adjustment visualization corresponding to the loudnesses, i.e. mixing parameters, according to user input. Now with consideration of the techniques disclosed in Jot as applied to the audio description signals disclosed in Naik in the resultant combination]); wherein the mixed audio object data is generated based on the second plurality of mixing parameters having been adjusted ([If a user input is a necessary step of this claim, as displayed in the first element, it is inherent that the mixed audio object data is generated based on the second plurality of mixing parameters having been adjusted, as the user adjusting these parameters to receive mixed audio object data is a required step of this method]).

Regarding claim 22, Jot in view of Naik, further in view of Wang discloses: the method of claim 1. Jot further discloses: wherein an audio engineer uses the gain adjustment visualization to evaluate gains of a proposed mix ([0042] A fixed dialog gain correction can be applied, if necessary, such as to match a user-specified dialog balance at the playback device(s) 150 [In view of the gain adjustment visualization of Fig. 8, a user-specified dialog balance indicates a certain level of evaluation necessary to adjust the balance and determine where the balance needs to be adjusted; user tracks to an audio engineer]).

Regarding claim 23, Jot in view of Naik, further in view of Wang discloses: the method of claim 1. Wang further discloses: wherein the gain to be applied corresponds to the gain to be applied to reduce the loudness of the audio object data in view of the loudness of the audio description data shown in the gain adjustment visualization ([Fig. 4, gT, gL], [Fig. 6, S602 "Control the Amplitude… according to gt", S613 "Control a volume"], [Controlling a volume based on gain, wherein the gain controlling a volume is dependent upon the loudness of audio object data and audio description data, as previously disclosed in claim 1 to be shown in Fig. 4 (indicating it to be "in view" of the loudness of audio description data), in view of the decreasing gain displayed in Fig. 4, indicating a decreasing gain will result in a reduced loudness of audio object data when the reducing gain is applied to the voice signal ([Fig. 1, S106]), i.e. voice signals track to audio object data in view of the audio object of Jot in view of Naik]).
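A sketch of the gain adjustment visualization the claims recite (and that claims 22-23 have an engineer evaluate): loudness of the audio object data, loudness of the audio description data, and the gain to be applied for mixing, plotted against time in the manner of Wang Fig. 4 and Naik Fig. 11. The layout is an assumption on our part; the cited references do not prescribe it.

    import matplotlib.pyplot as plt

    def plot_gain_adjustment(t, loud_obj_db, loud_desc_db, gain_db):
        """Plot the three curves the claims recite: loudness of the audio
        object data, loudness of the audio description data, and the gain
        to be applied for mixing, all against time. Sketch only."""
        fig, ax = plt.subplots()
        ax.plot(t, loud_obj_db, label="audio object loudness (dB)")
        ax.plot(t, loud_desc_db, label="audio description loudness (dB)")
        ax.plot(t, gain_db, "--", label="gain to apply (dB)")
        ax.set_xlabel("time (s)")
        ax.set_ylabel("level (dB)")
        ax.set_title("Proposed mix: gains for engineer review")
        ax.legend()
        return fig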
Regarding claim 24, Jot in view of Naik, further in view of Wang discloses: the method of claim 1. Naik further discloses: evaluating, by a user, the gain adjustment visualization, prior to generating the mixed audio object data ([Fig. 10, Feedback Event Detected 162], [Fig. 11, Volume of primary media 112], [0098] the volume setting V may be adjusted at will by the user. At time t.sub.A, a feedback event may be detected which may trigger the ducking of the primary media item 112 [In view of the flow of Fig. 10, receiving feedback based exclusively on the primary media file indicates the feedback is a user evaluation prior to generating mixed audio object data, wherein the adjustment changes the volume, i.e. gain adjustment, as visually represented in Fig. 11. There is no disclosure that the graphical depiction requires two audio tracks and could not be similarly generated for one audio track with an associated volume level, i.e. that which the user listens to before providing original feedback]); receiving a user input regarding the second plurality of mixing parameters responsive to the user evaluating the gain adjustment visualization ([Fig. 10, Identify Secondary Media Files to Play Based on Feedback Event], [Determining secondary media to be played through user feedback indicates the user feedback to be regarding a second plurality of mixing parameters, i.e. the default volume settings of the selected secondary media, in response to the user listening to the primary media file, i.e. represented through the gain adjustment visualization]); adjusting the second plurality of mixing parameters in response to the user input ([0099] once the secondary media file 114 is fully faded in and reaches the maximum loudness V, the desired relative loudness difference RLD between the primary 112 and secondary 114 media items is achieved [Fading in a secondary media file indicates the fading to be a plurality of mixing parameters based on a user input selecting the second media file containing an original mix without fade-in]); and generating a revised gain adjustment visualization corresponding to the second plurality of mixing parameters having been adjusted according to the user input ([Fig. 11, Volume of Secondary Media File 114], [Generating a gain/volume adjustment based on the second plurality of mixing parameters, i.e. ramping up/down the secondary/primary media tracks respectively, wherein the second media file is added through user input, indicating the revised gain adjustment visualization of Naik to be corresponding to the second plurality of mixing parameters as represented through the fading in/out of the primary/secondary tracks, in view of the gain adjustment visualization of Wang, which could take the signals of Naik and produce a visual without a change in functionality to Wang as the signals represented on both plots are the same]), wherein the mixed audio object data is generated based on the second plurality of mixing parameters having been adjusted ([0097] Once the loudness of the primary media item is reduced to the ducked level (DL), playback of the secondary media item occurs at step 170. For instance, the primary audio stream and the secondary media stream may be mixed by the mixer 134 to create a composite audio stream 138 in which the primary media item is played at the ducked loudness level (DL) and in which the secondary media item is played at its normal loudness. As indicated by the decision block 172, the playback of the secondary media item may continue (step 170) to completion. Once the playback of the secondary media item is completed, ducking of the primary media item ends and the primary media item may be ducked out, wherein the loudness of the primary media item is gradually increased back to its normal level [Generating an audio file combining the primary and secondary media items based on the volumes of the two media items indicates the output audio represented in Fig. 11 is generated based on the second plurality of mixing parameters having been adjusted, i.e. to account for the fading]).
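Naik's ducking behavior quoted above ([0097]-[0098]) amounts to a piecewise gain envelope: unity until the feedback event, a fade down to the ducked level DL while the description plays, then a fade back up. A minimal sketch, where the -12 dB ducked level and the function name are assumptions rather than values taken from Naik:

    import numpy as np

    def ducking_gain(n: int, sr: int, t_a: float, t_b: float,
                     t_c: float, t_d: float, duck_db: float = -12.0) -> np.ndarray:
        """Per-sample gain (dB) for Naik-style ducking: unity (0 dB) until
        t_A, ramp down to the ducked level DL over [t_A, t_B] while the
        secondary item plays, hold until t_C, ramp back up by t_D.
        duck_db = -12.0 is an assumed DL, not disclosed by Naik."""
        t = np.arange(n) / sr
        g = np.zeros(n)                                   # 0 dB = unity
        fade_in = (t >= t_a) & (t < t_b)                  # duck-in interval t_AB
        g[fade_in] = duck_db * (t[fade_in] - t_a) / (t_b - t_a)
        g[(t >= t_b) & (t < t_c)] = duck_db               # ducked hold
        fade_out = (t >= t_c) & (t < t_d)                 # duck-out interval
        g[fade_out] = duck_db * (1 - (t[fade_out] - t_c) / (t_d - t_c))
        return g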
Claim(s) 5-8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jot in view of Naik, further in view of Wang, further in view of Wang et al. (US-20210151082-A1), hereinafter Wang2.

Regarding claim 5, Jot in view of Naik, further in view of Wang discloses: the method of claim 1. Naik further discloses: wherein the first plurality of mixing parameters includes a maximum delta parameter ([Fig. 11, RLD], [0099] As shown in the graph 176, the secondary media item 114, which may be either a voice feedback or system feedback announcement, is faded in while the primary media item 112 continues to play at the ducked loudness level DL over the interval t.sub.BC, which defines the period of concurrent playback. Further, once the secondary media file 114 is fully faded in and reaches the maximum loudness V, the desired relative loudness difference RLD between the primary 112 and secondary 114 media items is achieved). Jot in view of Naik, further in view of Wang does not disclose: wherein the first plurality of mixing parameters includes a lookahead parameter and a ramp parameter. Wang2 discloses: wherein the first plurality of mixing parameters includes a lookahead parameter ([0099] It has been noticed that some neighboring sections of audio description may be very close to each other and may have a minimal time separation between them. In such cases, a default value used for a ducking process may cause releasing and attacking segments of audio to be very close to each other. This may create a noticeable and potentially annoying audio effect to a listener. To reduce this potentially annoying audio effect, in some embodiments, an audio description dialog merging module may be implemented. In operation, the gap between neighboring audio description segments is calculated. In the situation of two descriptions with a separation gap of less than 50 ms, the two segments are merged into one event [Merging two audio events to avoid noise caused by adaptive ducking between close events indicates that there is an ability to "lookahead" and stop adaptive mixing when necessary]); and a ramp parameter ([Fig. 3C], [0090] In contrast, in the time period or segment between T.sub.3 and T.sub.N−2 the soundtrack volume level has been gradually decreased or increased at the ends of the time period or segment). Jot, Naik, Wang, and Wang2 are considered analogous art within adaptive audio mixing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jot in view of Naik, further in view of Wang to incorporate the teachings of Wang2, because of the novel way to introduce both ramping and lookahead features to reduce the effect of annoying, negative sound impact on nearby signals being adaptively mixed (Wang2, [0099]).

Regarding claim 6, Jot in view of Naik, further in view of Wang, further in view of Wang2 discloses: the method of claim 5. Wang2 further discloses: wherein the lookahead parameter corresponds to maintaining a uniform gain adjustment during an audio pause in the audio description data ([0099] To reduce this potentially annoying audio effect, in some embodiments, an audio description dialog merging module may be implemented. In operation, the gap between neighboring audio description segments is calculated. In the situation of two descriptions with a separation gap of less than 50 ms, the two segments are merged into one event [Merging two audio events to avoid noise caused by adaptive ducking between close events indicates that there is an ability to "lookahead" and stop adaptive mixing when necessary to keep a gain that is sufficient for both events]).
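The merging behavior Wang2 [0099] describes (and which the examiner maps to the claimed lookahead parameter) reduces to bridging sub-50 ms gaps between audio description segments so the ducking gain is not released and re-attacked across a short pause. A minimal sketch under that reading; the function name and segment representation are our assumptions:

    def merge_close_segments(segments: list[tuple[float, float]],
                             min_gap_s: float = 0.050) -> list[tuple[float, float]]:
        """Merge neighboring audio description segments (start, end times
        in seconds) whose separation gap is under 50 ms into one ducking
        event, so the gain stays uniform across the short pause instead
        of releasing and re-attacking."""
        if not segments:
            return []
        merged = [segments[0]]
        for start, end in segments[1:]:
            last_start, last_end = merged[-1]
            if start - last_end < min_gap_s:
                merged[-1] = (last_start, end)   # bridge the gap
            else:
                merged.append((start, end))
        return merged

    print(merge_close_segments([(1.0, 2.0), (2.03, 3.0), (4.0, 5.0)]))
    # [(1.0, 3.0), (4.0, 5.0)] -- the 30 ms pause is bridged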
Regarding claim 7, Jot in view of Naik, further in view of Wang, further in view of Wang2 discloses: the method of claim 5. Wang2 further discloses: wherein the ramp parameter corresponds to a time period over which a gain adjustment is gradually applied ([Fig. 3C], [0090] In contrast, in the time period or segment between T.sub.3 and T.sub.N−2 the soundtrack volume level has been gradually decreased or increased at the ends of the time period or segment).

Regarding claim 8, Jot in view of Naik, further in view of Wang, further in view of Wang2 discloses: the method of claim 5. Naik further discloses: wherein the maximum delta parameter corresponds to a maximum loudness difference between a frame of the audio object data and a corresponding frame of the audio description data ([Fig. 11, RLD], [0099] As shown in the graph 176, the secondary media item 114, which may be either a voice feedback or system feedback announcement, is faded in while the primary media item 112 continues to play at the ducked loudness level DL over the interval t.sub.BC, which defines the period of concurrent playback. Further, once the secondary media file 114 is fully faded in and reaches the maximum loudness V, the desired relative loudness difference RLD between the primary 112 and secondary 114 media items is achieved [Determining the loudness of primary and secondary audio is a method that would have been obvious to apply to the audio objects disclosed in Jot]).
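Read together, claims 7-8 describe gain shaping of this sort: spread each gain change over a ramp interval, and cap how far a frame of the audio object data may sit below the corresponding frame of the audio description data. The sketch below is our interpretation of those two parameters, not code from any cited reference; the smoothing kernel and the dB bookkeeping are assumptions:

    import numpy as np

    def apply_ramp_and_max_delta(target_gain_db: np.ndarray,
                                 loud_obj_db: np.ndarray,
                                 loud_desc_db: np.ndarray,
                                 ramp_frames: int,
                                 max_delta_db: float) -> np.ndarray:
        """Shape per-frame duck gains (dB). Max delta: never let the
        object's post-gain loudness fall more than max_delta_db below the
        description's loudness in any frame. Ramp: smooth step changes so
        each gain adjustment is applied gradually over ramp_frames."""
        floor_db = loud_desc_db - max_delta_db
        capped = np.maximum(target_gain_db + loud_obj_db, floor_db) - loud_obj_db
        kernel = np.ones(ramp_frames) / ramp_frames       # moving average
        return np.convolve(capped, kernel, mode="same")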
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jot in view of Naik, further in view of Wang to incorporate the teachings of Marten, because of the novel way to track increases in program audio based on a comparison to a loudness level for activating closed captioning, improving the efficiency and effectiveness of closed captioning system without a user manually needing to turn on captions (Marten, [0003]-[0004]). Wang further discloses: evaluating the gain adjustment visualization ([0074] according to various inflection points of the fourth curve g.sub.t and the fifth curve g.sub.L and the variation trend of the inflection points, the volume of a voiced voice signal (signal in pitch period) corresponding to the inflection point is increased suddenly, the volume gain in the inflection point is declined compared to the volume gain at the prior moment. Moreover, it can be known according to the first inflection point of the fourth curve g.sub.t and the fifth curve g.sub.L, the time corresponding to each inflection point of the fourth curve g.sub.t is earlier than the time corresponding to each inflection point of the fifth curve g.sub.L; that is to say, the variation curve diagram of the volume gain g.sub.L determined according to the smooth volume V.sub.t′ at the moment t and the predetermined reference volume falls behind the variation curve diagram of the volume gain g.sub.t determined according to the combined smooth volume, [Identification of inflection points in a generated gain adjustment visualization indicates those inflection points to be based on an evaluation of the overall graph]). Naik further discloses: receiving the user input ([Fig. 10, Identify Secondary Media Files to Play Based on Feedback Event], [Determining secondary media to be played through user feedback indicates the user feedback to be regarding a second plurality of mixing parameters, i.e. the default volume settings of the selected secondary media, in response to the user listening to the primary media file, i.e. represented through the gain adjustment visualization]), adjusting the second plurality of mixing parameters ([0099] once the secondary media file 114 is fully faded in and reaches the maximum loudness V, the desired relative loudness difference RLD between the primary 112 and secondary 114 media items is achieved, [Fading in a secondary media file indicates the fading to be a plurality of mixing parameters based on a user input selecting the second media file containing an original mix without fade in]), and generating the revised gain adjustment visualization ([Fig. 11, Volume of Secondary Media File 114], [Generating a gain/volume adjustment based on the second plurality of mixing parameters, i.e. 
ramping up/down the secondary/primary media tracks respectively, wherein the second media file is added through user input, indicating the revised gain adjustment visualization of Naik to be corresponding to the second plurality of mixing parameters as represented through the fading in/out of the primary/secondary tracks in view of the gain adjustment visualization of Wang which could take the signals of Naik and produce a visual without a change in functionality to Wang as the signals represented on both plots are the same]), wherein the mixed audio object data is generated based on the second plurality of mixing parameters having been iteratively adjusted ([0097] Once the loudness of the primary media item is reduced to the ducked level (DL), playback of the secondary media item occurs at step 170. For instance, the primary audio stream and the secondary media stream may be mixed by the mixer 134 to create a composite audio stream 138 in which the primary media item is played at the ducked loudness level (DL) and in which the secondary media item is played at its normal loudness. As indicated by the decision block 172, the playback of the secondary media item may continue (step 170) to completion. Once the playback of the secondary media item is completed, ducking of the primary media item ends and the primary media item may be ducked out, wherein the loudness of the primary media item is gradually increased back to its normal level, [Generating an audio file combining the primary and secondary media items based on the volumes of the two media items indicates the output audio represent in Fig. 11 is generated based on the second plurality of mixing parameters having been adjusted, i.e. to account for the fading. Further, generating an overall signal with fading applied to a secondary media file indicates the fading to be performed through iterative adjustment, i.e. transforming the secondary media file, e.g. an original iteration, into that which has had fading applied, i.e. a second iteration of the secondary media file]). Conclusion Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Malak et al. (US-20150245153-A1) discloses “A method and apparatus for processing object-based audio signals is provided. The apparatus receives a plurality of object-based audio signals. Each object-based audio signal of the object-based audio signals includes audio waveform data and object metadata associated with the audio waveform data. 
Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Malak et al. (US-20150245153-A1) discloses "A method and apparatus for processing object-based audio signals is provided. The apparatus receives a plurality of object-based audio signals. Each object-based audio signal of the object-based audio signals includes audio waveform data and object metadata associated with the audio waveform data. The object metadata includes at least one of a loudness parameter or a power parameter associated with the audio waveform data. The apparatus determines a loudness metric based on the received object-based audio signals and based on the at least one of the loudness parameter or the power parameter for each object-based audio signal of the received object-based audio signals. In one configuration, the apparatus renders the received object-based audio signals to a set of output signals based on the determined loudness metric. In another configuration, the apparatus transmits (e.g., broadcast, file delivery, or streaming) the received object-based audio signals based on the determined loudness metric" (abstract). Specifically, [0066] discloses a listener updating or changing audio objects in a mix. See entire document.

Paulus et al. (US-20150348564-A1) discloses "A decoder for generating an audio output signal having one or more audio output channels is provided, having a receiving interface for receiving an audio input signal having a plurality of audio object signals, for receiving loudness information on the audio object signals, and for receiving rendering information indicating whether one or more of the audio object signals shall be amplified or attenuated, further having a signal processor for generating the one or more audio output channels of the audio output signal, configured to determine a loudness compensation value depending on the loudness information and depending on the rendering information, and configured to generate the one or more audio output channels of the audio output signal from the audio input signal depending on the rendering information and depending on the loudness compensation value. One or more by-pass audio object signals are employed for generating the audio output signal. Moreover, an encoder is provided" (abstract). Specifically, [0114] of Paulus discloses a user having the ability to increase the loudness of an audio object. Further, [0203] discloses user-end changes resulting in an immediate loudness estimation output. See entire document.

Mehta et al. (US-20250124933-A1) discloses "Methods for generating an object based audio program, renderable in a personalizable manner, and including a bed of speaker channels renderable in the absence of selection of other program content (e.g., to provide a default full range audio experience). Other embodiments include steps of delivering, decoding, and/or rendering such a program. Rendering of content of the bed, or of a selected mix of other content of the program, may provide an immersive experience. The program may include multiple object channels (e.g., object channels indicative of user-selectable and user-configurable objects), the bed of speaker channels, and other speaker channels. Another aspect is an audio processing unit (e.g., encoder or decoder) configured to perform, or which includes a buffer memory which stores at least one frame (or other segment) of an object based audio program (or bitstream thereof) generated in accordance with, any embodiment of the method" (abstract). Specifically, Mehta discloses users changing mixes of audio, [0189].
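Editor's note: the following is a rough illustration, ours and not from Malak or Paulus, of how a per-object loudness metric and a loudness compensation value of the kind these references describe might be computed. The energy-domain summation of per-object loudness values is a simplifying assumption; an actual LKFS/LUFS measurement involves K-weighting and gating that this sketch omits.

```python
import math

def combined_loudness_db(loudness_db):
    """Overall loudness of uncorrelated objects, approximated by summing
    the energies implied by per-object loudness values (LUFS-like dB)."""
    return 10.0 * math.log10(sum(10.0 ** (l / 10.0) for l in loudness_db))

def loudness_compensation_db(object_loudness_db, user_gain_db):
    """Gain (dB) that restores the original overall loudness after
    per-object user gains ("rendering information") are applied --
    in the spirit of Paulus's loudness compensation value."""
    before = combined_loudness_db(object_loudness_db)
    after = combined_loudness_db(
        [l + g for l, g in zip(object_loudness_db, user_gain_db)])
    return before - after

# Example: a listener boosts a dialogue object by 6 dB (cf. Paulus [0114],
# Malak [0066]); the compensation value attenuates the mix so the
# program's overall loudness metric is unchanged.
objects = [-27.0, -31.0, -35.0]   # per-object loudness parameters (assumed)
gains = [6.0, 0.0, 0.0]           # user rendering adjustments (assumed)
print(f"{loudness_compensation_db(objects, gains):+.2f} dB")  # -4.65 dB
```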
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THEODORE JOHN WITHEY, whose telephone number is (703) 756-1754. The examiner can normally be reached Monday through Friday, 8am-5pm.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/THEODORE WITHEY/
Examiner, Art Unit 2655

/ANDREW C FLANDERS/
Supervisory Patent Examiner, Art Unit 2655

Prosecution Timeline

Oct 10, 2022: Application Filed
Oct 28, 2024: Non-Final Rejection (§103)
Feb 04, 2025: Response Filed
Feb 24, 2025: Final Rejection (§103)
May 07, 2025: Response after Non-Final Action
Jun 06, 2025: Request for Continued Examination
Jun 09, 2025: Response after Non-Final Action
Aug 18, 2025: Non-Final Rejection (§103)
Nov 14, 2025: Response Filed
Dec 29, 2025: Final Rejection (§103)
Apr 02, 2026: Request for Continued Examination
Apr 13, 2026: Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591744: METHOD FOR TRAINING SEMANTIC REPRESENTATION MODEL, DEVICE AND STORAGE MEDIUM (granted Mar 31, 2026; 2y 5m to grant)
Patent 12536994: APPARATUS FOR CLASSIFYING SOUNDS BASED ON NEURAL CODE IN SPIKING NEURAL NETWORK AND METHOD THEREOF (granted Jan 27, 2026; 2y 5m to grant)
Patent 12475330: METHOD FOR IDENTIFYING NOISE SAMPLES, ELECTRONIC DEVICE, AND STORAGE MEDIUM (granted Nov 18, 2025; 2y 5m to grant)
Patent 12417759: SPEECH RECOGNITION USING CADENCE PATTERNS (granted Sep 16, 2025; 2y 5m to grant)
Patent 12412580: Sound Extraction System and Sound Extraction Method (granted Sep 09, 2025; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 44%
With Interview: 90% (+46.9% lift)
Median Time to Grant: 2y 11m
PTA Risk: High
Based on 23 resolved cases by this examiner. Grant probability derived from career allow rate.
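As a quick sanity check on how these figures relate (an assumption on our part, since the tool's model is not published), the interview lift appears to be additive in probability points on top of the career allow rate:

```python
allow_rate = 0.44        # career allow rate / baseline grant probability
interview_lift = 0.469   # reported interview lift, read as probability points
with_interview = min(allow_rate + interview_lift, 1.0)
print(f"{with_interview:.0%}")  # 91%, in line with the ~90% shown above
```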
