Prosecution Insights
Last updated: April 18, 2026
Application No. 18/340,808

Content System with Speech-Related Audio Content Replacement Feature

Non-Final Office Action (§102, §103)
Filed: Jun 23, 2023
Examiner: CHUNG, DANIEL WONSUK
Art Unit: 2659
Tech Center: 2600 (Communications)
Assignee: Roku Inc.
OA Round: 3 (Non-Final)
Grant Probability: 54% (Moderate)
OA Rounds: 3-4
Time to Grant: 2y 10m
With Interview: 92%

Examiner Intelligence

Career Allow Rate: 54% (24 granted / 44 resolved; -7.5% vs TC avg)
Interview Lift: +37.5% in resolved cases with interview
Typical Timeline: 2y 10m average prosecution; 33 currently pending
Career History: 77 total applications across all art units

Statute-Specific Performance

§101: 25.2% (-14.8% vs TC avg)
§103: 52.3% (+12.3% vs TC avg)
§102: 17.3% (-22.7% vs TC avg)
§112: 5.2% (-34.8% vs TC avg)
Tech Center averages are estimates. Based on career data from 44 resolved cases.

Office Action

§102, §103
DETAILED ACTION

This communication is in response to the Amendments and Arguments filed on 2/25/2026. Claims 1-10, 12-14, 19, and 21-26 are pending and have been examined. All previous objections/rejections not mentioned in this Office Action have been withdrawn by the examiner.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments

Applicant has amended independent claims 1, 14, and 19. Furthermore, applicant has added claims 21-26. Applicant has not made any remarks or explanation of the added limitations or claims. The added limitations and claims raise new grounds for rejection.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 2, 10, 12-14, and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Narayanan et al. (U.S. PG Pub No. 20220417659).

Regarding claims 1, 14, and 19, Narayanan teaches:

(Claim 1) A method comprising: (P0003, Systems, methods, and devices relating to audio correction.)

(Claim 14) A computing system configured for performing a set of acts comprising: (P0003, Systems, methods, and devices relating to audio correction.)
(Claim 19) A non-transitory computer-readable medium having stored thereon program instructions that upon execution by a computing system, cause performance of a set of acts comprising: (P0124, computer program instructions may also be stored in a computer-readable memory that may direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including computer-readable instructions for implementing the function specified in the flowchart block or blocks.)

obtaining media content; (P0022, The video distribution system may receive content from the content source.; P0020, The content source may comprise stored video content. … Content may comprise audio-only content.)

extracting from the obtained media content, audio content representing speech; providing to a speech-to-text (STT) model, the extracted audio content representing speech, and responsively receiving from the STT model, (i) generated corresponding speech text and (ii) generated corresponding speech metadata; (P0020, Content may comprise audio-only content, such as a radio broadcast, or content with both audio and video components.; P0042, The input audio content may be the audio component of audio/video content, such as a digital video stream or linear digital television programming.; P0053, Based on the speech recognized in the input audio content (e.g., the words indicated in spoken audio content in the input audio content) by the ASR and/or the voice profile associated with the spoken audio content in the input audio content (e.g., the speaker of the spoken audio content), a correction module may determine that a portion of the input audio content comprises spoken audio content that indicates an incorrect word (or multiple incorrect words) that should be corrected. For example, the correction module may determine that one of the words recognized by the ASR is an incorrect word. As indicated above, the start and stop times of the portion of the input audio content may generally coincide with the start and stop times in the input audio content of when the incorrect word was spoken, plus an optional short buffer on either side of the incorrect word.)

replacing the one or more words of the generated speech text with the received one or more corresponding replacement words, thereby generating modified speech text; (P0023, The audio correction module may generally determine that content comprises one or more incorrect words (e.g., spoken by a speaker) and automatically initiate steps to replace the incorrect(s) word in the content with corresponding correct word(s).)

providing to a text-to-speech (TTS) model, (i) the modified speech text and (ii) the generated corresponding speech metadata received from the STT model, and responsively receiving from the TTS model generated corresponding replacement audio content representing the modified speech; (P0026, The audio correction module may generate second spoken audio content indicating the correct word.; P0060, The start and end times, with respect to the input audio content, of the corrected spoken audio content may be determined. The start and end times may indicate where in the input audio content the corrected spoken audio content should be inserted.)

in the obtained media content, replacing the audio content representing speech with the generated replacement audio content representing speech, thereby generating modified media content; and (P0027, The audio correction module may replace the incorrect word in the portion of the content with the determined correct word by removing the spoken audio content indicating the incorrect word from the portion of the content. Based on the voice profile associated with the speaker, the audio correction module may generate second spoken audio content indicating the correct word. The audio correction module may mix (e.g., overlap and/or add) the second spoken audio content with the background audio content in the portion of the content. In this case, the background audio content may have not been isolated or separated out from the portion of the content. Mixing the second spoken audio content with the background audio content may restore the portion of the content, except that the incorrect word is replaced with the correct word.)

outputting for presentation the generated modified media content. (P0022, The video distribution system may generally effectuate video content delivery to the client devices.; P0031, A client device may be configured to receive video content and output the video content to a separate display device for consumer viewing.)

Regarding claim 2, Narayanan teaches claim 1. Narayanan further teaches: wherein the media content includes (i) a video content component and (ii) an audio content component, and wherein the audio content component includes (i) the audio content representing speech and (ii) non-speech related audio content. (P0033, FIG. 2 shows example content comprising video content and audio content (represented as an audio spectrogram). Although not distinguishable in the figure, the audio content comprises, to varying degrees, spoken audio content and background audio content. The spoken audio content may comprise speech from the person visible in the video content.)

Regarding claim 10, Narayanan teaches claim 1. Narayanan further teaches: wherein outputting for presentation, the generated modified media content comprises transmitting to a presentation device, media data representing the generated modified media content for display by the presentation device.
(P0022, The video distribution system may generally effectuate video content delivery to the client devices.; P0031, A client device may be configured to receive video content and output the video content to a separate display device for consumer viewing.)

Regarding claim 12, Narayanan teaches claim 1. Narayanan further teaches: wherein outputting for presentation, the generated modified media content comprises displaying the generated modified media content. (P0031, A client device may comprise any one of numerous types of devices configured to effectuate video playback and/or viewing.)

Regarding claim 13, Narayanan teaches claim 12. Narayanan further teaches: wherein displaying the generated modified media content comprises a television displaying the generated modified media content. (P0020, The content source may comprise video content intended for immediate or near-immediate broadcast, such as a live television video feed.; P0031, A client device may be configured to receive video content and output the video content to a separate display device for consumer viewing. For example, a client device may comprise a set-top box, such as a cable set-top box, a digital media player, or a gaming device.)

Claims 3-6, 22, 23, 25, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Narayanan in view of Cormack et al. (U.S. PG Pub No. 20120116773), hereinafter Cormack.

Regarding claim 3, Narayanan teaches claim 1. Narayanan further teaches: determining user profile data associated with a viewer of the media content; and (P0023, The voice profile module may determine the voice profile.)

Narayanan does not specifically teach: determining user profile data associated with a viewer of the media content; and using at least the one or more words of the generated speech text and the determined user profile data as a basis to select the one or more replacement words.

Cormack, however, teaches: determining user profile data associated with a viewer of the media content; and (P0029, Different lists of prohibited words are maintained for different viewers.) using at least the one or more words of the generated speech text and the determined user profile data as a basis to select the one or more replacement words. (P0029, The list of prohibited words might depend on a rating. For example, a first list of words might be used for a show having a “TV-Y7” rating and a second list might be used for a show having a “TV-MA” rating as established by the National Association of Broadcasters, the National Cable Television Association, and the Motion Picture Association of America.)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to use user profile data as a basis to select one or more replacement words. It would have been obvious to combine the references because a parent might not wish their child to hear objectionable words, depending on the age of the child and the appropriateness of the content. (Cormack P0029).

Regarding claim 4, Narayanan in view of Cormack teaches claim 3. Cormack further teaches: wherein the user profile data specifies age-related information about the viewer. (P0029, Different lists of prohibited words are maintained for different viewers. … The list of prohibited words might depend on a rating. For example, a first list of words might be used for a show having a “TV-Y7” rating and a second list might be used for a show having a “TV-MA” rating as established by the National Association of Broadcasters, the National Cable Television Association, and the Motion Picture Association of America.)

Regarding claim 5, Narayanan in view of Cormack teaches claim 3. Narayanan further teaches: wherein using at least the one or more words of the generated speech text and the determined user profile data as a basis to select the one or more replacement words comprises using mapping data to map at least the one or more words of the generated speech text and the determined user profile data to the one or more replacement words. (P0025, A machine-readable dictionary may comprise listings of incorrect words and correct words that may be cross-referenced to determine what correct word(s), if any, are associated with a given incorrect word. … Determining the correct word may be further based on the voice profile.)

Regarding claim 6, Narayanan in view of Cormack teaches claim 3. Narayanan further teaches: wherein using at least the one or more words of the generated speech text and the determined user profile data as a basis to select the one or more replacement words comprises using a trained model to map at least the one or more words of the generated speech text and the determined user profile data to the one or more replacement words. (P0048, The voice profile module and/or the feedback loop may comprise a machine learning model configured to determine a voice profile.; P0025, The correct word may be determined using a machine learning model configured to receive an incorrect word as an input and output an associated correct word. … Determining the correct word may be further based on the voice profile.)

Regarding claim 22, Narayanan in view of Cormack teaches claim 5. Narayanan further teaches: extracting from the obtained media content, audio content representing speech; (P0020, Content may comprise audio-only content, such as a radio broadcast, or content with both audio and video components.; P0042, The input audio content may be the audio component of audio/video content, such as a digital video stream or linear digital television programming.)
providing to a speech-to-text (STT) model, the extracted audio content representing speech, and responsively receiving from the STT model, (i) generated corresponding speech text and (ii) generated corresponding speech metadata; (P0053, Based on the speech recognized in the input audio content (e.g., the words indicated in spoken audio content in the input audio content) by the ASR and/or the voice profile associated with the spoken audio content in the input audio content (e.g., the speaker of the spoken audio content), a correction module may determine that a portion of the input audio content comprises spoken audio content that indicates an incorrect word (or multiple incorrect words) that should be corrected. For example, the correction module may determine that one of the words recognized by the ASR is an incorrect word. As indicated above, the start and stop times of the portion of the input audio content may generally coincide with the start and stop times in the input audio content of when the incorrect word was spoken, plus an optional short buffer on either side of the incorrect word.)

replacing the one or more words of the generated speech text with one or more corresponding replacement words, thereby generating modified speech text; (P0023, The audio correction module may generally determine that content comprises one or more incorrect words (e.g., spoken by a speaker) and automatically initiate steps to replace the incorrect(s) word in the content with corresponding correct word(s).)

providing to a text-to-speech (TTS) model, (i) the modified speech text and (ii) the generated corresponding speech metadata received from the STT model, and responsively receiving from the TTS model generated corresponding replacement audio content representing the modified speech; (P0026, The audio correction module may generate second spoken audio content indicating the correct word.; P0060, The start and end times, with respect to the input audio content, of the corrected spoken audio content may be determined. The start and end times may indicate where in the input audio content the corrected spoken audio content should be inserted.)

in the obtained media content, replacing the audio content representing speech with the generated replacement audio content representing speech, thereby generating modified media content; and (P0027, The audio correction module may replace the incorrect word in the portion of the content with the determined correct word by removing the spoken audio content indicating the incorrect word from the portion of the content. Based on the voice profile associated with the speaker, the audio correction module may generate second spoken audio content indicating the correct word. The audio correction module may mix (e.g., overlap and/or add) the second spoken audio content with the background audio content in the portion of the content. In this case, the background audio content may have not been isolated or separated out from the portion of the content. Mixing the second spoken audio content with the background audio content may restore the portion of the content, except that the incorrect word is replaced with the correct word.)

outputting for presentation the generated modified media content. (P0022, The video distribution system may generally effectuate video content delivery to the client devices.; P0031, A client device may be configured to receive video content and output the video content to a separate display device for consumer viewing.)

Narayanan does not specifically teach: determining that the obtained media content has a rating that is not permitted based on a selected rating mode; and responsive to determining that the obtained media content has the rating that is not permitted based on the selected rating mode, responsively:

Cormack, however, teaches: determining that the obtained media content has a rating that is not permitted based on a selected rating mode; and responsive to determining that the obtained media content has the rating that is not permitted based on the selected rating mode, responsively: (P0028, A user might enter or remove a particular word, select a content category (e.g., indicating that violent words should be prohibited), and/or select a content level (e.g., indicating that even mildly objectionable words should be prohibited) via a Graphical User Interface (GUI) and/or a remote control device.)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to select a rating mode. It would have been obvious to combine the references because a parent might not wish their child to hear objectionable words, depending on the age of the child and the appropriateness of the content. (Cormack P0029).

Regarding claim 23, Narayanan in view of Cormack teaches claim 5. Narayanan further teaches: extracting from the obtained media content, audio content representing speech; (P0020, Content may comprise audio-only content, such as a radio broadcast, or content with both audio and video components.; P0042, The input audio content may be the audio component of audio/video content, such as a digital video stream or linear digital television programming.)
providing to a speech-to-text (STT) model, the extracted audio content representing speech, and responsively receiving from the STT model, (i) generated corresponding speech text and (ii) generated corresponding speech metadata; (P0053, Based on the speech recognized in the input audio content (e.g., the words indicated in spoken audio content in the input audio content) by the ASR and/or the voice profile associated with the spoken audio content in the input audio content (e.g., the speaker of the spoken audio content), a correction module may determine that a portion of the input audio content comprises spoken audio content that indicates an incorrect word (or multiple incorrect words) that should be corrected. For example, the correction module may determine that one of the words recognized by the ASR is an incorrect word. As indicated above, the start and stop times of the portion of the input audio content may generally coincide with the start and stop times in the input audio content of when the incorrect word was spoken, plus an optional short buffer on either side of the incorrect word.)

replacing the one or more words of the generated speech text with one or more corresponding replacement words, thereby generating modified speech text; (P0023, The audio correction module may generally determine that content comprises one or more incorrect words (e.g., spoken by a speaker) and automatically initiate steps to replace the incorrect(s) word in the content with corresponding correct word(s).)

providing to a text-to-speech (TTS) model, (i) the modified speech text and (ii) the generated corresponding speech metadata received from the STT model, and responsively receiving from the TTS model generated corresponding replacement audio content representing the modified speech; (P0026, The audio correction module may generate second spoken audio content indicating the correct word.; P0060, The start and end times, with respect to the input audio content, of the corrected spoken audio content may be determined. The start and end times may indicate where in the input audio content the corrected spoken audio content should be inserted.)

in the obtained media content, replacing the audio content representing speech with the generated replacement audio content representing speech, thereby generating modified media content; and (P0027, The audio correction module may replace the incorrect word in the portion of the content with the determined correct word by removing the spoken audio content indicating the incorrect word from the portion of the content. Based on the voice profile associated with the speaker, the audio correction module may generate second spoken audio content indicating the correct word. The audio correction module may mix (e.g., overlap and/or add) the second spoken audio content with the background audio content in the portion of the content. In this case, the background audio content may have not been isolated or separated out from the portion of the content. Mixing the second spoken audio content with the background audio content may restore the portion of the content, except that the incorrect word is replaced with the correct word.)

outputting for presentation the generated modified media content. (P0022, The video distribution system may generally effectuate video content delivery to the client devices.; P0031, A client device may be configured to receive video content and output the video content to a separate display device for consumer viewing.)

Narayanan does not specifically teach: determining that the obtained media content has a rating that is not permitted based on a selected rating mode associated with a user age-range; determining that a user who is not in the user age-range is watching the obtained media content; and responsive to (i) determining that the obtained media content has a rating that is not permitted based on a selected rating mode associated with a user age-range and (ii) determining that the user who is not in the user age-range is watching the obtained media content, responsively:

Cormack, however, teaches: determining that the obtained media content has a rating that is not permitted based on a selected rating mode associated with a user age-range; determining that a user who is not in the user age-range is watching the obtained media content; and responsive to (i) determining that the obtained media content has a rating that is not permitted based on a selected rating mode associated with a user age-range and (ii) determining that the user who is not in the user age-range is watching the obtained media content, responsively: (P0029, Different lists of prohibited words are maintained for different viewers. … The list of prohibited words might depend on a rating. For example, a first list of words might be used for a show having a “TV-Y7” rating and a second list might be used for a show having a “TV-MA” rating as established by the National Association of Broadcasters, the National Cable Television Association, and the Motion Picture Association of America.; P0029, List of objectionable words that should be used when a child is viewing content (e.g., and the appropriate list might be selected based on a viewer access code).)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to determine a rating based on an age-range. It would have been obvious to combine the references because a parent might not wish their child to hear objectionable words, depending on the age of the child and the appropriateness of the content. (Cormack P0029).

Regarding claim 25, Narayanan teaches claim 1. Narayanan further teaches: wherein the generated corresponding speech metadata specifies a narration voice style of the extracted audio content representing speech. (P0096, The voice profile module may determine the voice profile based on the spoken audio content in the content portion that indicates the incorrect word (as opposed to background audio content in the content portion). The voice profile may be determined based on audio, vocal, and/or linguistic characteristics of the spoken audio content, such as audio spectral patterns, vocal frequency, vocal pitch, speaking speed, intonation, loudness, amplitude, speech patterns, speech cadence, the number and/or duration of utterances, the number and/or duration of breaks between utterances, etc.; P0097, The voice profile may have been identified or marked in the content, such as via a manifest file or metadata.)

Regarding claim 26, Narayanan teaches claim 1. Narayanan further teaches: wherein the generated corresponding speech metadata specifies a pitch of the extracted audio content representing speech. (P0096, The voice profile module may determine the voice profile based on the spoken audio content in the content portion that indicates the incorrect word (as opposed to background audio content in the content portion).
The voice profile may be determined based on audio, vocal, and/or linguistic characteristics of the spoken audio content, such as audio spectral patterns, vocal frequency, vocal pitch, speaking speed, intonation, loudness, amplitude, speech patterns, speech cadence, the number and/or duration of utterances, the number and/or duration of breaks between utterances, etc.; P0097, The voice profile may have been identified or marked in the content, such as via a manifest file or metadata.)

Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Narayanan in view of Yanamandra et al. (U.S. PG Pub No. 20240404503), hereinafter Yanamandra.

Regarding claim 7, Narayanan teaches claim 1. Narayanan does not specifically teach: determining a speaking duration of the one or more words of the generated speech text; and using at least the one or more words of the generated speech text and the determined speaking duration of the one or more words of the generated speech text as a basis to select the one or more replacement words.

Yanamandra, however, teaches: determining a speaking duration of the one or more words of the generated speech text; and (P0055, At block 514, the device may add the first speech signals into the dialogue using at least a portion of the time duration of the removed second speech signals.) using at least the one or more words of the generated speech text and the determined speaking duration of the one or more words of the generated speech text as a basis to select the one or more replacement words. (P0052, At block 508, the device may select a second utterance (e.g., the replacement utterance(s), the replacement utterance(s)) to replace the first utterance in the portion of the audio data. The second utterance may be from a list of non-course replacement words, based on machine learning trained to replace utterances with similar meaning utterances that fit within a same time duration, or may represent a brand as an advertisement.)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to determine a speaking duration of one or more words and use the speaking duration as a basis to select one or more replacement words. It would have been obvious to combine the references because finding replacement words that are similar in length to the replaced word avoids replacement audio that is significantly shorter or longer in time than the audio it replaces, and thus avoids jarring or confusing results. (Yanamandra P0001, P0020).

Regarding claim 8, Narayanan in view of Yanamandra teaches claim 7. Yanamandra further teaches: wherein using at least the one or more words of the generated speech text and the determined duration of the one or more words of the generated speech text as a basis to select the one or more replacement words comprises using mapping data to map at least the one or more words of the generated speech text and the determined duration of the one or more words of the generated speech text. (P0052, At block 508, the device may select a second utterance (e.g., the replacement utterance(s), the replacement utterance(s)) to replace the first utterance in the portion of the audio data. The second utterance may be from a list of non-course replacement words, based on machine learning trained to replace utterances with similar meaning utterances that fit within a same time duration, or may represent a brand as an advertisement.)

Regarding claim 9, Narayanan in view of Yanamandra teaches claim 7. Yanamandra further teaches: wherein using at least the one or more words of the generated speech text and the determined duration of the one or more words of the generated speech text as a basis to select the one or more replacement words comprises using a trained model to map at least the one or more words of the generated speech text and the determined duration of the one or more words of the generated speech text.
(P0052, At block 508, the device may select a second utterance (e.g., the replacement utterance(s), the replacement utterance(s)) to replace the first utterance in the portion of the audio data. The second utterance may be from a list of non-course replacement words, based on machine learning trained to replace utterances with similar meaning utterances that fit within a same time duration, or may represent a brand as an advertisement.) Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Narayanan in view of Cormack and further view of Kummer (U.S. PG Pub No. 20160042766). Regarding claim 21 Narayanan in view of Cormack teach claim 5. Narayanan in view of Cormack does not specifically teach: wherein the mapping data maps at least the one or more words of the generated speech text in a first language to the one or more replacement words in a second language that is different from the first language, wherein the extracted audio content representing speech is in the first language, and wherein the generated corresponding replacement audio content representing the modified speech is in the second language. Kummer, however, teaches: wherein the mapping data maps at least the one or more words of the generated speech text in a first language to the one or more replacement words in a second language that is different from the first language, wherein the extracted audio content representing speech is in the first language, and wherein the generated corresponding replacement audio content representing the modified speech is in the second language. (P0006, The media data may be referred to as “original” media data because it is provided with an audio portion in a first or “original” language. … Generate a set of replacement media data that includes replacement audio data in a second or “replacement” language.; P0024, The server generates replacement audio data to be included in replacement media data. 
For example, using the sample data attributes, along with metadata from the original data, the translation data and translation metadata, the server may identify certain words or sets of words in audio data according to indices or the like in translation metadata. The server may then modify the identified words or sets of words according to sample data attributes for an actor or other participant in the media data.)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to map words with replacement words in a second language. It would have been obvious to combine the references because mapping words with replacement words in a second language yields a predictable result of performing audio dubbing to replace a soundtrack in a first language with a soundtrack in a second language. (Kummer P0001).

Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over Narayanan in view of Candelore et al. (U.S. PG Pub No. 20060130121), hereinafter Candelore.

Regarding claim 24

Narayanan teaches claim 1. Narayanan further teaches: outputting for presentation the edited media content. (P0022, The video distribution system may generally effectuate video content delivery to the client devices.; P0031, A client device may be configured to receive video content and output the video content to a separate display device for consumer viewing.)
Narayanan does not specifically teach: after generating the modified media content, editing the generated modified media content, based on user input of the temporal positioning of the audio content representing the modified speech within the obtained media content; and

Candelore, however, teaches: after generating the modified media content, editing the generated modified media content, based on user input of the temporal positioning of the audio content representing the modified speech within the obtained media content; and

(P0047, Dynamic nonlinear editing system receives information from content storage units, which are configured to store video and audio content. Dynamic nonlinear editing system is adapted to sequence video and audio retrieved from content storage units, and then to establish their temporal relationships. The temporal relationships involve the arrangement of adjacent sequences of the same content type.)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to edit the media content based on temporal positioning. It would have been obvious to combine the references because editing allows content to be correctly located temporally within the transport stream so that primary and secondary audio can be contextually located adjacent to each other for synchronization of content. (Candelore P0054).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL WONSUK CHUNG, whose telephone number is (571)272-1345. The examiner can normally be reached Monday - Friday (7am-4pm) [PT]. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, PIERRE-LOUIS DESIR, can be reached at (571)272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DANIEL W CHUNG/
Examiner, Art Unit 2659

/PIERRE LOUIS DESIR/
Supervisory Patent Examiner, Art Unit 2659
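The rejections above turn on a duration-matching step: selecting a replacement word whose speaking duration fits the time slot of the word it replaces, so the dubbed-in audio is neither noticeably shorter nor longer than the original. A minimal sketch of that idea follows; the function name, candidate word list, and durations are illustrative assumptions, not drawn from the claims or the cited references:

```python
def select_replacement(duration_s, candidates):
    """Pick the candidate whose estimated speaking duration is closest
    to the duration of the word being replaced, so the replacement
    audio fits the original time slot."""
    return min(candidates, key=lambda c: abs(c[1] - duration_s))

# Hypothetical candidate words with estimated speaking durations (seconds)
candidates = [("darn", 0.30), ("shoot", 0.45), ("goodness", 0.80)]

best = select_replacement(0.42, candidates)
print(best)  # ("shoot", 0.45) — closest in duration to the 0.42 s slot
```

A production system, per the quoted Yanamandra passage, would instead use mapping data or a trained model that also accounts for meaning, but the selection criterion (fit within the same time duration) reduces to this kind of nearest-duration choice.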

Prosecution Timeline

Jun 23, 2023
Application Filed
May 31, 2025
Non-Final Rejection — §102, §103
Oct 07, 2025
Applicant Interview (Telephonic)
Oct 07, 2025
Examiner Interview Summary
Oct 08, 2025
Response Filed
Oct 19, 2025
Final Rejection — §102, §103
Feb 25, 2026
Request for Continued Examination
Feb 26, 2026
Response after Non-Final Action
Apr 04, 2026
Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579471
DATA AUGMENTATION AND BATCH BALANCING METHODS TO ENHANCE NEGATION AND FAIRNESS
2y 5m to grant Granted Mar 17, 2026
Patent 12493892
METHOD AND SYSTEM FOR EXTRACTING CONTEXTUAL PRODUCT FEATURE MODEL FROM REQUIREMENTS SPECIFICATION DOCUMENTS
2y 5m to grant Granted Dec 09, 2025
Patent 12400078
INTERPRETABLE EMBEDDINGS
2y 5m to grant Granted Aug 26, 2025
Patent 12387000
PRIVACY-PRESERVING AVATAR VOICE TRANSMISSION
2y 5m to grant Granted Aug 12, 2025
Patent 12380875
SPEECH SYNTHESIS WITH FOREIGN FRAGMENTS
2y 5m to grant Granted Aug 05, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
54%
Grant Probability
92%
With Interview (+37.5%)
2y 10m
Median Time to Grant
High
PTA Risk
Based on 44 resolved cases by this examiner. Grant probability derived from career allow rate.
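The 92% "with interview" figure appears to treat the interview lift as additive percentage points on the examiner's 54% career allow rate (54 + 37.5 = 91.5, displayed as 92%). A minimal sketch under that assumption; the function name and the cap at 100% are illustrative, not a documented formula of this dashboard:

```python
def grant_projection(base_rate_pct, interview_lift_pts):
    """Interview-adjusted grant probability, assuming the observed
    interview lift is additive in percentage points (capped at 100)."""
    return min(base_rate_pct + interview_lift_pts, 100.0)

# Examiner's career allow rate 54%, observed interview lift +37.5 points
print(grant_projection(54.0, 37.5))  # 91.5 -> displayed as 92%
```

Note this is a career-level correlation from 44 resolved cases, not a causal estimate for any single application.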
