Prosecution Insights
Last updated: April 19, 2026
Application No. 18/732,194

Video-Content System with Narrative-Based Video Content Generation Feature

Status: Final Rejection (§103)
Filed: Jun 03, 2024
Examiner: WU, YANNA
Art Unit: 2615
Tech Center: 2600 — Communications
Assignee: Roku Inc.
OA Round: 2 (Final)
Grant Probability: 81% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 4m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 81% (354 granted / 438 resolved) — above average, +18.8% vs TC avg
Interview Lift: +35.3% higher grant rate among resolved cases with an interview than without — a strong lift
Typical Timeline: 2y 4m average prosecution, 20 applications currently pending
Career History: 458 total applications across all art units
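The headline figures reduce to simple arithmetic on the career counts above. A quick reproduction in Python, under the assumptions that the "+18.8%" and "+35.3%" deltas are additive percentage points and that the with-interview figure is capped at 99% (the cap is our reading of the display, not stated on the page):

```python
# Reproducing the dashboard arithmetic. Assumption: "grant probability"
# equals the raw career allow rate (per the projections footnote), and
# the percentage deltas are additive percentage points.
granted, resolved = 354, 438

allow_rate = granted / resolved
print(f"career allow rate: {allow_rate:.1%}")    # 80.8% -> shown as 81%

tc_avg = allow_rate - 0.188                      # implied TC average
print(f"implied TC average: {tc_avg:.1%}")       # ~62.0%

with_interview = min(allow_rate + 0.353, 0.99)   # capped at 99%
print(f"with interview: {with_interview:.1%}")   # 99.0%
```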

Statute-Specific Performance

§101: 8.2% (-31.8% vs TC avg)
§103: 65.1% (+25.1% vs TC avg)
§102: 6.3% (-33.7% vs TC avg)
§112: 11.3% (-28.7% vs TC avg)
Tech Center averages are estimates • Based on career data from 438 resolved cases

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

DETAILED ACTION

This is in response to applicant's amendment/response filed on 02/19/2026, which has been entered and made of record. Claims 1, 16, and 20 are amended. Claims 1-20 are pending in the application.

Response to Arguments

The obviousness-type double patenting (ODP) rejection is withdrawn in view of the terminal disclaimer filed on 02/19/2026. Applicant's arguments regarding the claim rejections under § 103 have been considered but are not persuasive.

Applicant argues: [applicant's argument reproduced as an embedded image (media_image1.png); not transcribed]

Examiner disagrees: The amendment overcomes the references listed in the non-final rejection, but further search and consideration were required to assess allowability. In view of the new ground of rejection, the arguments are deemed moot.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5, 7-10, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 2021/0192824 A1) in view of Watanabe et al. (US 2024/0212228 A1), and further in view of Reale (US 2024/0370664 A1).

Regarding claim 1, Chen teaches:

A method comprising: obtaining input data, wherein the input data includes story description text ([0046], [0047], [0060]: inputting a message, plain text, travel description);

providing the obtained input data to a narration model and responsively receiving generated narrative text ([0047]-[0048], sequential motion parsing);

identifying, from among the generated narrative text, a subset of text, wherein identifying, from among the generated narrative text, the subset of text comprises identifying, from among the generated narrative text, the subset of text based on the identified subset of text satisfying one or more conditions of a condition set ([0052], "In an implementation, the features 620 may comprise keywords in the message 610. In this disclosure, 'word' is used for collectively referring to character, word, phrase, etc. in various language families. Herein, a 'keyword' may refer to one or more words for which one or more corresponding animations have been collected or created in the animation database 630. For example, a keyword 'glad' may correspond to at least a facial animation indicating grinning in the face part. For example, a keyword 'very surprised' may correspond to at least a facial animation indicating opening mouth and eyes largely in the face part, and may further correspond to a body animation indicating opening arms and hands in the body part."),

wherein a condition in the condition set is a condition that the subset of text itself has a predefined characteristic, wherein the condition set includes a condition that the subset of text includes a person's name, a proper noun, a location, an activity, an object, an animal, or an instance of punctuation ([0055], "In an implementation, the features 620 may comprise a pronoun in the message 610. The pronoun may be 'I', 'you', etc.");

providing the identified subset of text to an image generation model and responsively receiving generated images from the image generation model ([0052], "Herein, a 'keyword' may refer to one or more words for which one or more corresponding animations have been collected or created in the animation database 630. For example, a keyword 'glad' may correspond to at least a facial animation indicating grinning in the face part. For example, a keyword 'very surprised' may correspond to at least a facial animation indicating opening mouth and eyes largely in the face part, and may further correspond to a body animation indicating opening arms and hands in the body part."; [0075], "In an aspect, an emotion category of this message may be detected as 'angry', a continuous facial expression corresponding to the emotion 'angry' may be determined to be applied during this message.");

providing the generated images to an animation model and responsively receiving generated video segments ([0075], "In an aspect, an emotion category of this message may be detected as 'angry', a continuous facial expression corresponding to the emotion 'angry' may be determined to be applied during this message. Accordingly, an exemplary facial animation indicating frowning may be selected from the continuous facial expression subset in the facial animation set 632.");

combining at least the generated video segments to generate video content ([0075], "…Any or all of the above facial animations and body animations may be combined together to be applied for the message 610."); and

outputting, for presentation, the generated video content ([0091], "In an implementation, if the message 510 is in a text format, the process 500 may further comprise convert the message 510 into voice, and incorporate the voice into the video 562. Thus, the motions of the avatar and the audio may be presented together.").

However, Chen does not explicitly teach, but Watanabe teaches:

wherein the image generation model uses at least the identified subset of text to generate the generated images, such that the image generation model generates the generated images after receiving the identified subset of text ([0134], "The image generation assisting unit 113 of Example 3 can extract a keyword used for image generation by analyzing a text").

Chen teaches acquiring images from a database based on extracted keywords, but does not teach how the images in the database are generated in the first place. Watanabe teaches generating images based on extracted keywords and then storing them in the database.
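To make the mapped data flow concrete: claim 1 recites a linear pipeline from story text to video. Below is a minimal, runnable sketch of that pipeline; the three model functions are toy stand-ins invented for illustration and are not APIs from the application or from Chen, Watanabe, or Reale.

```python
# Toy sketch of the claim 1 pipeline (story text -> narrative -> text
# subsets -> images -> video segments -> combined video). All three
# "models" are invented stand-ins, not APIs from the cited references.
import string

def narration_model(story_text: str) -> str:
    # A real narration model would expand plot/setting/characters into
    # longer narrative text (the step Reale is cited for below).
    return f"Once upon a time, {story_text}"

def image_model(text_subset: str) -> str:
    return f"<image:{text_subset}>"          # stand-in generated image

def animation_model(image: str) -> str:
    return f"<segment:{image}>"              # stand-in video segment

def satisfies_condition(span: str) -> bool:
    # Toy condition set: a capitalized word (proper noun) or any
    # punctuation qualifies the span for image generation.
    return span[:1].isupper() or any(c in string.punctuation for c in span)

def generate_video(story_text: str) -> str:
    narrative = narration_model(story_text)
    subsets = [w for w in narrative.split() if satisfies_condition(w)]
    segments = [animation_model(image_model(s)) for s in subsets]
    return " + ".join(segments)              # "combining" the segments

print(generate_video("Ana walks her dog in Lisbon."))
```

Note that the sequential calls above trivially satisfy the amended timing limitation (images are generated only after the identified subset of text is received), which is the point Watanabe is cited for.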
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the teachings of Chen with the specific teachings of Watanabe to generate the images stored in the database, which can later be used for image retrieval in the method of Chen. The combination makes the method comprehensive, without missing steps.

However, Chen in view of Watanabe does not teach, but Reale teaches:

wherein the story description text indicates at least a story's plot, setting, and/or characters, and wherein the generated narrative text is a corresponding more detailed version of the story description text ([0046], "In another use case or implementation of the disclosure, a 'read and write with me'-type application program and associated system can permit the user to interactively create one or more characters by answering a series of prompt questions causing the system to build a profile of one or more characters including data points such as the character's name, physical description, where they live, what is important to the character, what goals the character has, etc. Cooperatively created characters can then be used in conjunction with the other use cases of the disclosure to generate a book, play, or video as a new fiction or non-fiction story or narrative."; [0068]-[0069], "Based at least in part on the content or output from the AI language model 226, the internal game command processor 304 loads data associated with a generated location, such as 'the north corridor'. Based at least in part on the generated location from the internal game command processor 304, the story database 302 can retrieve or otherwise provide previously stored human-authored content, such as 'You walk down the long corridor of the north wing of the 17th level of the Green Bunker. You reach a door that reads Chief of Department. As you walk forward, the scanner reads your irises and the door to the Chief of Department suite slides open to a sparely appointed waiting area. A voice instructs, "Take a seat if you wish." But before choosing to sit or stand, the door swings open and the Chief of Department beckons, "DL5, come in please."'").

Chen in view of Watanabe teaches a method of generating images/video based on text input, where the text input comes from a user prompt. Reale also teaches a method of generating images/video based on text information; in Reale's method, a user can provide text input, which is first used to generate a more complex/detailed story, and images/video can then be generated based on that story. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the teachings of Chen in view of Watanabe with the specific teachings of Reale to expand the user's ability to create a more detailed story and corresponding video.

Regarding claim 2, Chen in view of Watanabe and Reale teaches:

The method of claim 1, wherein obtaining the input data comprises: prompting, via a user interface, a user for the input data and responsively receiving, via the user interface, the input data (Chen, FIG. 1 and paragraph [0026], user input).
Regarding claim 5, Chen in view of Watanabe and Reale teaches:

The method of claim 1, wherein providing the identified subset of text to an image generation model and responsively receiving generated images comprises: (i) providing at least a first portion of the identified subset of text to the image generation model and responsively receiving a first generated image; and (ii) repeating (i) for additional portions of the identified subset of text, to generate additional images (Chen, [0060], "A continuous facial expression refers to a facial expression that may continue for a relatively long time, e.g., continuing during a sentence, continuing among more than one sentence, etc. The continuous facial expression may be associated with a message or a context of the message, and intends to reflect, e.g., a holistic emotion of the message or the context. For example, if a pleasant travel is described in one or more messages, a continuous facial expression corresponding to emotion 'happy' may be continuously presented in the face part during the one or more messages. The continuous facial expression subset may comprise a number of animations, e.g., animation a1-1, animation a1-2, etc., which correspond to various continuous facial expressions respectively. In an implementation, the animations in the continuous facial expression subset may correspond to facial expressions reflecting various emotions. For example, assuming that the animation a1-1 corresponds to a facial expression reflecting emotion 'happy', the animation a1-1 may indicate a facial motion of squinting and grinning in the face part.").

Regarding claim 7, Chen in view of Watanabe and Reale teaches:

The method of claim 1, wherein combining at least the generated video segments to generate the video content comprises: stitching the generated video segments together in sequence to generate the video content (Chen, FIG. 7 and corresponding paragraphs).

Regarding claim 8, Chen in view of Watanabe and Reale teaches:

The method of claim 1, wherein combining at least the generated video segments to generate the video content comprises: stitching the generated video segments with corresponding narrative speech together in sequence to generate the video content (Chen, FIG. 7 and corresponding paragraphs).

Regarding claim 9, Chen in view of Watanabe and Reale teaches:

The method of claim 1, wherein the condition is that the subset of text includes a person's name (Chen, [0055], "In an implementation, the features 620 may comprise a pronoun in the message 610. The pronoun may be 'I', 'you', etc." Chen gives an example of input text using a pronoun to indicate a person. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have used a person's name instead of a pronoun to indicate a person, with a reasonable expectation of success.).

Regarding claim 10, Chen in view of Watanabe and Reale teaches:

The method of claim 1, wherein the condition is that the subset of text includes a proper noun (Chen, [0055], "In an implementation, the features 620 may comprise a pronoun in the message 610. The pronoun may be 'I', 'you', etc.").
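Claims 9 and 10 (and claims 11-15, addressed further below) each pin the condition to a specific characteristic of the text subset. One deliberately naive way to realize such a condition set is a dictionary of per-category predicates; the sketch below covers a few of the categories. The gazetteer lists and regexes are invented examples, and a production system would more likely use a named-entity recognizer than lookups:

```python
# Toy condition set for claims 9-15. The gazetteer lists and regexes are
# invented examples; a production system would use NER, not lookups.
import re

NAMES     = {"Ana", "DL5"}                  # person's name (claim 9)
LOCATIONS = {"Lisbon", "north corridor"}    # location (claim 11)
ANIMALS   = {"dog", "horse"}                # animal (claim 14)

CONDITIONS = {
    "person_name": lambda s: any(n in s for n in NAMES),
    "proper_noun": lambda s: bool(re.search(r"\b[A-Z][a-z]+\b", s)),
    "location":    lambda s: any(loc in s for loc in LOCATIONS),
    "animal":      lambda s: any(a in s for a in ANIMALS),
    "punctuation": lambda s: bool(re.search(r"[,.!?;:]", s)),
}

def matched_conditions(subset: str) -> list[str]:
    """Return the names of every condition the text subset satisfies."""
    return [name for name, test in CONDITIONS.items() if test(subset)]

# Echoes Luo's example input from the claims 11-15 rejection below.
print(matched_conditions("on ancient road in the west wind a lean horse goes,"))
# -> ['animal', 'punctuation']
```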
Regarding claim 16, Chen in view of Watanabe and Reale teaches:

A non-transitory computer-readable medium having stored thereon program instructions that upon execution by a computing system, cause performance of a set of acts comprising (Chen, [0143], "The embodiments of the present disclosure may be embodied in a non-transitory computer-readable medium. The non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations of the methods for automatically generating motions of an avatar according to the embodiments of the present disclosure as mentioned above."). The rest of claim 16 recites limitations similar to those of claim 1 and is rejected accordingly. Claims 17-20 recite limitations similar to those of claims 7-8, 10, and 1, respectively, and are rejected accordingly.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Watanabe and Reale, and further in view of Davis (US 2013/0257877 A1).

Regarding claim 3, Chen in view of Watanabe and Reale teaches the method of claim 1. However, the combination does not teach, but Davis teaches:

wherein the input data further includes user profile data ([0057]-[0058], "In preferred embodiments, one or more users 232 may type a query using a keyboard or other input device 244. The user 232 may also provide interactive inputs via an input sensor 234 that is part of a computing/processing apparatus 228 associated with the avatar system…. the avatar presentation system 226 manipulates 118 an avatar 246 whose basic characteristics have been pre-transmitted and received so as to be available along with the personality and behavioral profile characteristics of the interrogated person 208, with the manipulation 118 being responsive to interactions with the avatar 246. The interactions, as previously mentioned, may be simple keyboard inputs, such as for example key strokes on a keyboard, button, mouse, or other input device 244, or the interactions may be other types of inputs. In preferred embodiments the avatar 246 is presented, animated, and manipulated so as to provide an interactive experience for one or more users 232 where the avatar simulates personality and behavioral profile characteristics based on at least a portion of the captured personality and behavioral profile characteristics corresponding to the target person 208.").

Chen in view of Watanabe and Reale teaches generating video based on user input. Davis teaches that one of the inputs can be user profile data. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the teachings of Chen in view of Watanabe and Reale with the specific teachings of Davis to allow generating video based on all kinds of user preferences.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Watanabe and Reale, and further in view of Viellescaze et al. (US 2004/0179043 A1).

Regarding claim 4, Chen in view of Watanabe and Reale teaches the method of claim 1. However, the combination does not teach, but Viellescaze teaches:

wherein the input data further includes a target length of narrative text ([0265], "The advantage of a fuzzy parameter is that it has a precise significance for the brain, irrespective of the content.
For example, it is desirable to know the length of a paragraph of a text, because if a paragraph is 'short' the agent will be made to move while it is reciting the paragraph.").

Chen in view of Watanabe and Reale teaches generating video based on user input. Viellescaze teaches that one of the inputs has a target length of the text. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the teachings of Chen in view of Watanabe and Reale with the specific teachings of Viellescaze so that a video segment can be generated while the system is reciting the input text, which improves system response time.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Watanabe and Reale, and further in view of Grover (US 2022/0141515 A1).

Regarding claim 6, Chen in view of Watanabe and Reale teaches the method of claim 5. However, the combination does not teach, but Grover teaches:

further comprising: generating a query fingerprint of the first generated image ([0055], "In an example implementation, as the content-distribution system 102 distributes various channels of content, a fingerprint-generation engine (not shown) at the content-distribution system 102 can generate digital reference fingerprint data representing the content respectively of each such channel and can provide that reference fingerprint data along with associated metadata, such as channel identification and frame time stamps, to the fingerprint-matching server 106");

comparing the generated query fingerprint with multiple reference fingerprints of corresponding reference images ([0056], "And the fingerprint-matching server 106 can compare that query fingerprint data with the reference fingerprint data representing various channels, in an effort to find a match.");

based on the comparing, detecting a match between the generated query fingerprint and at least one of the reference fingerprints ([0056], "And the fingerprint-matching server 106 can compare that query fingerprint data with the reference fingerprint data representing various channels, in an effort to find a match. Upon determining with sufficient certainty that the query fingerprint data matches the reference fingerprint data representing a particular channel (e.g., determining that at least a threshold degree of similarity exists between the query fingerprint data and the reference fingerprint data."); and

responsive to detecting the match, repeating (i) to generate a different image to replace the first generated image ([0063], "Upon determining that a particular modifiable-content segment, such as a particular ad, is present on a given channel, the fingerprint-matching server 106 can then work with each content-presentation device that is receiving that channel, to facilitate dynamic content modification as noted above. For instance, having determined that content-presentation device 104 is receiving that channel, the fingerprint-matching server 106 can work with the content-presentation device 104 to facilitate the dynamic content modification.").

Chen in view of Watanabe and Reale teaches generating images and video based on user input, where images corresponding to the input are retrieved from a database. Grover teaches using image fingerprints to determine which images to retrieve. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the teachings of Chen in view of Watanabe and Reale with the specific teachings of Grover to accurately and conveniently retrieve images.
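The claim 6 flow that Grover is cited for amounts to: fingerprint the newly generated image, compare it against reference fingerprints, and regenerate on a near-match. A self-contained sketch using an invented average-hash fingerprint and Hamming-distance comparison; neither Grover's fingerprinting nor the application's is specified at this level, so every detail below is illustrative:

```python
# Toy query-vs-reference fingerprint check for the claim 6 flow. The
# average-hash and threshold are invented; Grover's actual fingerprinting
# is not specified at this level of detail.

def avg_hash(pixels: list[list[int]]) -> int:
    """1 bit per pixel: brighter than the image mean -> 1, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p > mean)
    return bits

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def is_near_duplicate(query: int, refs: list[int], threshold: int = 2) -> bool:
    # A match here means step (i) of claim 5 is repeated to generate a
    # different image that replaces the first generated image.
    return any(hamming(query, ref) <= threshold for ref in refs)

query = avg_hash([[10, 200], [30, 220]])      # stand-in 2x2 query image
refs  = [avg_hash([[12, 198], [28, 224]])]    # stand-in reference image
print(is_near_duplicate(query, refs))         # True -> regenerate
```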
Claims 11-15 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Watanabe and Reale, and further in view of Luo et al. (US 2025/0284926 A1).

Regarding claims 11-15, Chen in view of Watanabe and Reale teaches the method of claim 1. However, the combination does not explicitly teach, but Luo teaches:

wherein the condition is that the subset of text includes a location; wherein the condition is that the subset of text includes an activity; wherein the condition is that the subset of text includes an object; wherein the condition is that the subset of text includes an animal; and wherein the condition is that the subset of text includes an instance of punctuation ([0224], "For example, as shown in FIG. 13, the user may perform input on the client, and input text may include 'on ancient road in the west wind a lean horse goes, ethereal, melodious, 3D, painting, and ancient'").

Chen in view of Watanabe and Reale teaches generating images based on user input. Luo teaches a specific example of user input that includes a location, an activity, an object, an animal, and an instance of punctuation. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the teachings of Chen in view of Watanabe and Reale with the specific teachings of Luo to enable the method to handle different input scenarios.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YANNA WU, whose telephone number is (571) 270-0725. The examiner can normally be reached Monday-Thursday, 8:00-5:30 ET. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Alicia Harrington, can be reached at (571) 272-2330. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/YANNA WU/
Primary Examiner, Art Unit 2615

Prosecution Timeline

Jun 03, 2024
Application Filed
Nov 18, 2025
Non-Final Rejection — §103
Feb 19, 2026
Applicant Interview (Telephonic)
Feb 19, 2026
Examiner Interview Summary
Feb 19, 2026
Response Filed
Mar 06, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602850
GENERATIVE AI VIRTUAL CLOTHING TRY-ON
Granted Apr 14, 2026 • 2y 5m to grant
Patent 12579664
EYE TRACKING METHOD, APPARATUS AND SENSOR FOR DETERMINING SENSING COVERAGE BASED ON EYE MODEL
Granted Mar 17, 2026 • 2y 5m to grant
Patent 12573106
INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD FOR PROCESSING OVERLAY IMAGES
Granted Mar 10, 2026 • 2y 5m to grant
Patent 12573108
HEAD-POSE AND GAZE REDIRECTION
Granted Mar 10, 2026 • 2y 5m to grant
Patent 12555187
CLIENT-SERVER MEDICAL IMAGE STACK RETRIEVAL AND DISPLAY
Granted Feb 17, 2026 • 2y 5m to grant
Study what changed in these applications to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 81%
With Interview: 99% (+35.3%)
Median Time to Grant: 2y 4m
PTA Risk: Moderate

Based on 438 resolved cases by this examiner. Grant probability derived from career allow rate.
