Prosecution Insights
Last updated: April 17, 2026
Application No. 18/132,974

Conversion of Text to Dynamic Video

Status: Final Rejection (§103)
Filed: Apr 11, 2023
Examiner: MA, MICHELLE HAU
Art Unit: 2617
Tech Center: 2600 — Communications
Assignee: unknown
OA Round: 2 (Final)
Grant Probability: 81% (favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 7m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 81% (17 granted / 21 resolved; +19.0% vs TC avg) — above average
Interview Lift: +36.4% (allowance among resolved cases with an interview vs. without)
Avg Prosecution: 2y 7m typical timeline; 35 applications currently pending
Total Applications: 56 across all art units (career history)
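
These headline figures are simple ratios over the examiner's resolved cases. As a rough sketch of how they could be derived from per-case disposition records (the field names and sample data below are hypothetical; the dashboard does not publish its exact formulas):

```python
# Hedged sketch: deriving the examiner metrics above from per-case
# disposition records. Field names and sample data are hypothetical.
from dataclasses import dataclass

@dataclass
class ResolvedCase:
    granted: bool        # allowed vs. abandoned/rejected at resolution
    had_interview: bool  # at least one examiner interview on record

def allow_rate(cases):
    return sum(c.granted for c in cases) / len(cases)

def interview_lift(cases):
    with_iv = [c for c in cases if c.had_interview]
    without_iv = [c for c in cases if not c.had_interview]
    return allow_rate(with_iv) - allow_rate(without_iv)

# 21 resolved cases, 17 granted -> 81.0% career allow rate, matching
# the figure shown above; the interview split here is invented.
cases = [ResolvedCase(granted=i < 17, had_interview=i % 3 == 0) for i in range(21)]
print(f"career allow rate: {allow_rate(cases):.1%}")
print(f"interview lift: {interview_lift(cases):+.1%}")
```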

Statute-Specific Performance

§101: 3.0% (-37.0% vs TC avg)
§103: 84.2% (+44.2% vs TC avg)
§102: 6.4% (-33.6% vs TC avg)
§112: 5.5% (-34.5% vs TC avg)

TC average estimate shown as the black line in the original chart; based on career data from 21 resolved cases.
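
The deltas above are plain differences against the Tech Center baseline. As a sanity check, the baseline the chart implies can be back-solved from each displayed pair; a minimal sketch, using only the values shown above:

```python
# Back-solve the implied Tech Center baseline from each displayed pair:
# baseline = examiner_rate - delta. All four statutes imply 40.0%,
# consistent with a single TC-average estimate (the chart's black line).
shown = {            # statute: (examiner %, delta vs TC avg %)
    "§101": (3.0, -37.0),
    "§103": (84.2, +44.2),
    "§102": (6.4, -33.6),
    "§112": (5.5, -34.5),
}
for statute, (rate, delta) in shown.items():
    print(f"{statute}: implied TC avg = {rate - delta:.1f}%")
```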

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

The amendment filed September 16, 2025 has been entered. Claims 1-16 remain pending in the application. Applicant’s amendments to the Specification and Claims have overcome each and every objection previously set forth in the Non-Final Office Action mailed June 5, 2025.

Response to Arguments

Applicant’s arguments, see Pages 10-12 of Remarks, filed September 16, 2025, with respect to claims 1-16, where the previously cited prior art does not teach the features of the newly proposed amendments to claims 1 and 6, have been fully considered and are persuasive. However, upon further consideration, a new ground(s) of rejection is made in view of Microsoft (Microsoft 1: Microsoft Word 2016 Part 2 Insert Tab; Microsoft 2: How to Insert Emoji in Microsoft Word Documents; Microsoft 3: Insert ASCII or Unicode Latin-based symbols and characters). See the 35 USC 103 rejections below.

Furthermore, applicant’s arguments, see Page 10 of Remarks, filed September 16, 2025, with respect to the rejections of claims 4-5 and 9-10 under 35 USC 102 and 103, where Schriber does not teach machine learning-based natural language processing, have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Young et al. (KR 20190118108 A). See the 35 USC 103 rejections below.

On the other hand, applicant’s arguments filed September 16, 2025, regarding the obviousness of machine-learning natural language processing, have been fully considered but they are not persuasive. In response to applicant’s argument, on Pages 10-11 of Remarks, that there is no teaching, suggestion, or motivation to replace natural language processing with machine-learning natural language processing, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art. See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988); In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992); and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007).

In this case, the primary reference, Schriber, teaches natural language processing (Paragraphs 0039, 0051 – “frontend analytics engine 306 may be utilized to extract metadata from the script. That is, frontend analytics engine 206 may be programmed with natural language processing functionality such that it can analyze the text of a script and determine the existence of meaningful language, such as language indicative of characters, actions, interactions between characters, dialog, etc… Visualization engine 208 utilizes a visual metaphor language for generating the visualizations. Scene metadata (e.g., characters, props, and actions) may be extracted from the script as described above, and used to generate a first visualization”), and the secondary reference, Young, teaches machine-learning natural language processing (Abstract, Paragraphs 0036-0037, 0066 – “an artificial intelligence (AI) system using an AI model learned in accordance with at least one of machine learning, neural networks, and a deep learning algorithm, and to application thereof…Once the text is acquired, the electronic device determines multiple key words from the acquired text (S220). And, when multiple key words are determined, the electronic device obtains multiple first illustrations corresponding to the multiple key words (S230). Specifically, the electronic device can input information and text about the design of the presentation video into a first artificial intelligence model learned by an artificial intelligence algorithm, thereby obtaining a plurality of first illustrations related to the text and corresponding to the design of the presentation video…The first artificial intelligence model (1210) can perform natural language processing, extract key words from text, and understand the meaning and relationships of each key word”). Young suggests a motivation for using machine-learning natural language processing instead of natural language processing: “Recently, artificial intelligence systems that achieve human-level intelligence are being used in various fields. Unlike existing rule-based smart systems, artificial intelligence systems are systems in which machines learn, make judgments, and become smarter on their own. As AI systems become more widely used, their recognition rates improve and their ability to understand user preferences more accurately increases. As a result, existing rule-based smart systems are gradually being replaced by deep learning-based AI systems” (Young: Paragraph 0003). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Schriber by incorporating the machine-learning natural language processing of Young.

Claim Objections

Claims 1-16 are objected to because of the following informalities:

Claims 1 and 6 recite the limitation “the group” in line 5. There is insufficient antecedent basis for this limitation in the claim. “The group” should read “a group”.

In claims 2 and 7, lines 1-3, “the method further comprising: the annotations are from a library of options” should read “wherein the annotations are from a library of options”.

In claims 12 and 15, line 2, perhaps “metadate” should read “metadata”.

Claims 2-5 and 14-16 are objected to due to their dependency on claim 1. Claims 7-13 are objected to due to their dependency on claim 6. Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2 and 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Schriber et al. (US 20190155829 A1) in view of Microsoft (Microsoft 1: Microsoft Word 2016 Part 2 Insert Tab; Microsoft 2: How to Insert Emoji in Microsoft Word Documents; Microsoft 3: Insert ASCII or Unicode Latin-based symbols and characters), hereinafter Schriber and Microsoft respectively.

Regarding claim 1, Schriber teaches a method for automatically converting text to dynamic video (Paragraphs 0005, 0008, 0087 – “a computer-implemented method comprises presenting a visual representation of a script…the script and the updated script are written in a natural language format…Actions and/or interactions are represented using animation…Camera view may be controlled by a user actuating the representation of camera 310A”; Note: the method converts a script, which is equivalent to text, into an animation, which is equivalent to a video. The user can interact with the video by controlling the camera view, which makes the video dynamic), the method comprising: accessing a screenplay; applying one or more non-textual annotations adjacent to text of the screenplay to create an annotated screenplay (Paragraphs 0036, 0056 – “one or more elements of a script are received…The script may be in a text format, and can be displayed on editing panel 204…persistant data library 220 may contain images of outfits associated with one or more characters, sketches (image and video), or any other relevant information or data that can be used to create and/or edit aspects of the script or visualization”; Note: the script is equivalent to the screenplay), the annotations being selected from the group consisting of an image (Paragraph 0056 – “persistant data library 220 may contain images of outfits associated with one or more characters, sketches (image and video), or any other relevant information or data that can be used to create and/or edit aspects of the script or visualization”), an animated image (Paragraph 0087 – “Characters and objects can be represented by corresponding 3D models/images. Actions and/or interactions are represented using animation”; Note: the images are animated), a video (Paragraph 0056 – “persistant data library 220 may contain images of outfits associated with one or more characters, sketches (image and video), or any other relevant information or data that can be used to create and/or edit aspects of the script or visualization”), a 3d object (Paragraph 0087 – “Characters and objects can be represented by corresponding 3D models/image”), audio (Fig. 3E, Paragraph 0075 – “Element 302C presents background audio to a user”; Note: Fig. 3E shows the audio visualization; see the screenshot of Fig. 3E below), and a symbol (Fig. 3K, Paragraph 0086 – “character 310B is represented as, e.g., “looking” to the right, while character 310D is represented as looking up towards character 310B. Element 310C, which as described above, may be a tent can be represented as having its opening/door positioned in the direction where character 310B is positioned in the scene”; Note: the 310-series elements are symbols; see the screenshot of Fig. 3K below); transforming the annotated screenplay to a sequencer (Paragraphs 0039, 0049 – “frontend analytics engine 306 may be utilized to extract metadata from the script. That is, frontend analytics engine 206 may be programmed with natural language processing functionality such that it can analyze the text of a script and determine the existence of meaningful language, such as language indicative of characters, actions, interactions between characters, dialog, etc…visualization engine 208 may be receiving, from frontend/backend analytics engines 206/212 and/or NLP core 224, information gleaned from the parsed metadata. Visualization engine 208 may use this information to generate the character timeline and/or 2D representation of the relevant aspects of the script”; Note: the script is transformed into metadata that provides the context/meaning of the script, which inherently includes the sequence of events, considering it can be used to create a timeline. Thus, the extracted metadata is equivalent to the sequencer); building a virtual world from the sequencer (Paragraphs 0049, 0051 – “visualization engine 208 may be receiving, from frontend/backend analytics engines 206/212 and/or NLP core 224, information gleaned from the parsed metadata. Visualization engine 208 may use this information to generate the character timeline and/or 2D representation of the relevant aspects of the script…Visualization engine 208 utilizes a visual metaphor language for generating the visualizations. Scene metadata (e.g., characters, props, and actions) may be extracted from the script as described above, and used to generate a first visualization”; Note: the visualization, which is equivalent to the virtual world, is built from the extracted metadata, which is equivalent to the sequencer. Additionally, it is inherent that the extracted metadata contains information related to the sequence of the events, as it can be used to create a timeline of the story. See the screenshots of Fig. 3K and Fig. 3L below, which show that the visualization is a virtual world); and rendering the virtual world into a video (Paragraphs 0048, 0087 – “some form of abstract or rendered visualization is created to represent a scene or aspect of the script…FIG. 3L illustrates an example of a 3D preview or pre-visualization of a scene. In this example, the 3D preview is represented as a birds-eye view. Characters and objects can be represented by corresponding 3D models/images. Actions and/or interactions are represented using animation”; Note: the visualization of the scene is equivalent to the virtual world, and the animation is equivalent to a video. See the screenshot of Fig. 3L below, which shows the virtual world).

[Screenshot of Fig. 3E (taken from Schriber)]
[Screenshot of Fig. 3K (taken from Schriber)]
[Screenshot of Fig. 3L (taken from Schriber)]

Schriber does not teach the “adjacent” portion of the limitation “applying one or more non-textual annotations adjacent to text of the screenplay to create an annotated screenplay”, nor does Schriber teach the annotations being selected from the group consisting of a pictogram, a logogram, an ideogram, an emoticon, an emoji, a glyph, a mark, a grapheme, a code point, and a typographical approximation.

However, Microsoft teaches applying one or more non-textual annotations adjacent to text (Microsoft 1: Image 4 on Page 2 – an image is applied adjacent to the text; see the screenshot of Image 4 below), the annotations being selected from the group consisting of a pictogram (Microsoft 1: Image 2 on Page 1 – there are pictogram options, such as the clover pictogram; the annotations are selected from the “Insert” tab of the document), a logogram (Microsoft 1: Image 2 on Page 1 – there are logogram options, such as © for “copyright”), an ideogram (Microsoft 1: Image 2 on Page 1 – there are ideogram options, such as “=” for “equal” or “equality”), an emoticon (Microsoft 2: Image on Page 1 – there is an option to select an emoticon), an emoji (Microsoft 2: Image on Page 1 – there is an option to select an emoji), a glyph (Microsoft 3: Page 9 – there are options to insert glyphs), a mark (Microsoft 1: Image 2 on Page 1 – there are various mark options, such as the arrows), a grapheme (Microsoft 3: Page 3 – the character map shows grapheme options such as “T”), a code point (Microsoft 3: Paragraph 5 on Page 2 – “To insert a Unicode character, type the character code, press ALT, and then press X”), and a typographical approximation (Microsoft 1: Image 3 on Page 2 – there are typographical approximation options, such as the different versions of “X”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Schriber to incorporate the teachings of Microsoft to have the text and non-textual annotations be adjacent so that they can supplement each other and provide a clear message to the user of what the text and annotation refer to. It also would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Schriber to incorporate the teachings of Microsoft to include a pictogram, a logogram, an ideogram, an emoticon, an emoji, a glyph, a mark, a grapheme, a code point, and a typographical approximation as options for annotations because any type of visualization related to the text may assist the user in better understanding the message or story.

[Screenshot of text adjacent to a non-textual annotation (taken from Microsoft 1)]
[Modified screenshot of annotations (taken from Microsoft 1)]
[Modified screenshot of annotations (taken from Microsoft 2)]
[Modified screenshot of annotations (taken from Microsoft 1)]
[Modified screenshot of annotations (taken from Microsoft 3)]

Regarding claim 2, Schriber in view of Microsoft teaches the method disclosed in claim 1.
Schriber further teaches wherein the annotations are from a library of options (Paragraphs 0008, 0054, 0056 – “the script and the updated script are written in a natural language format…As the user develops or edits the script and/or adjusts one or more aspects of the visualization, visualization engine 208 reflects the appropriate changes relative to the first visualization…persistant data library 220 may contain images of outfits associated with one or more characters, sketches (image and video), or any other relevant information or data that can be used to create and/or edit aspects of the script or visualization”; Note: the script, which is originally written in text, can be annotated by the user with images from the persistent data library).

Regarding claim 6, Schriber teaches a method for automatically converting text to video (Paragraphs 0005, 0008, 0087 – “a computer-implemented method comprises presenting a visual representation of a script…the script and the updated script are written in a natural language format…Actions and/or interactions are represented using animation…Camera view may be controlled by a user actuating the representation of camera 310A”; Note: the method converts a script, which is equivalent to text, into an animation, which is equivalent to a video), the method comprising: accessing a screenplay; applying one or more non-textual annotations adjacent to text of the screenplay to create an annotated screenplay (Paragraphs 0036, 0056 – “one or more elements of a script are received…The script may be in a text format, and can be displayed on editing panel 204…persistant data library 220 may contain images of outfits associated with one or more characters, sketches (image and video), or any other relevant information or data that can be used to create and/or edit aspects of the script or visualization”; Note: the script is equivalent to the screenplay), the annotations being selected from the group consisting of an image (Paragraph 0056 – “persistant data library 220 may contain images of outfits associated with one or more characters, sketches (image and video), or any other relevant information or data that can be used to create and/or edit aspects of the script or visualization”), an animated image (Paragraph 0087 – “Characters and objects can be represented by corresponding 3D models/images. Actions and/or interactions are represented using animation”; Note: the images are animated), a video (Paragraph 0056 – “persistant data library 220 may contain images of outfits associated with one or more characters, sketches (image and video), or any other relevant information or data that can be used to create and/or edit aspects of the script or visualization”), a 3d object (Paragraph 0087 – “Characters and objects can be represented by corresponding 3D models/image”), audio (Fig. 3E, Paragraph 0075 – “Element 302C presents background audio to a user”; Note: Fig. 3E shows the audio visualization; see the screenshot of Fig. 3E above), and a symbol (Fig. 3K, Paragraph 0086 – “character 310B is represented as, e.g., “looking” to the right, while character 310D is represented as looking up towards character 310B. Element 310C, which as described above, may be a tent can be represented as having its opening/door positioned in the direction where character 310B is positioned in the scene”; Note: the 310-series elements are symbols; see the screenshot of Fig. 3K above); transforming the annotated screenplay to a sequencer (Paragraphs 0039, 0049 – “frontend analytics engine 306 may be utilized to extract metadata from the script. That is, frontend analytics engine 206 may be programmed with natural language processing functionality such that it can analyze the text of a script and determine the existence of meaningful language, such as language indicative of characters, actions, interactions between characters, dialog, etc…visualization engine 208 may be receiving, from frontend/backend analytics engines 206/212 and/or NLP core 224, information gleaned from the parsed metadata. Visualization engine 208 may use this information to generate the character timeline and/or 2D representation of the relevant aspects of the script”; Note: the script is transformed into metadata that provides the context/meaning of the script, which inherently includes the sequence of events, considering it can be used to create a timeline. Thus, the extracted metadata is equivalent to the sequencer); building a virtual world from the sequencer (Paragraphs 0049, 0051 – “visualization engine 208 may be receiving, from frontend/backend analytics engines 206/212 and/or NLP core 224, information gleaned from the parsed metadata. Visualization engine 208 may use this information to generate the character timeline and/or 2D representation of the relevant aspects of the script…Visualization engine 208 utilizes a visual metaphor language for generating the visualizations. Scene metadata (e.g., characters, props, and actions) may be extracted from the script as described above, and used to generate a first visualization”; Note: the visualization, which is equivalent to the virtual world, is built from the extracted metadata, which is equivalent to the sequencer. Additionally, it is inherent that the extracted metadata contains information related to the sequence of the events, as it can be used to create a timeline of the story. See the screenshots of Fig. 3K and Fig. 3L above, which show that the visualization is a virtual world); and rendering the virtual world into a video (Paragraphs 0048, 0087 – “some form of abstract or rendered visualization is created to represent a scene or aspect of the script…FIG. 3L illustrates an example of a 3D preview or pre-visualization of a scene. In this example, the 3D preview is represented as a birds-eye view. Characters and objects can be represented by corresponding 3D models/images. Actions and/or interactions are represented using animation”; Note: the visualization of the scene is equivalent to the virtual world, and the animation is equivalent to a video. See the screenshot of Fig. 3L above, which shows the virtual world).

Schriber does not directly teach a method for automatically converting text to static video. Instead, Schriber teaches a method for automatically converting text to dynamic video (Paragraphs 0005, 0008, 0087 – “a computer-implemented method comprises presenting a visual representation of a script…the script and the updated script are written in a natural language format…Actions and/or interactions are represented using animation…Camera view may be controlled by a user actuating the representation of camera 310A”; Note: the method converts a script, which is equivalent to text, into an animation, which is equivalent to a video. The user can interact with the video by controlling the camera view, which makes the video dynamic).

However, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Schriber to convert text to static video because, if no interactions with the audience are necessary, then the text should be converted to a static video, rather than a dynamic video, to save processing and rendering time and resources. Furthermore, regarding videos, there is a finite number of video types: either they can be static or dynamic. According to the definition provided by paragraph 0016 of the specification, regarding static videos, “the content of the video does not change”, while for dynamic videos, “the content of the video changes, based on, for example, who is viewing or interacting with the video.” Therefore, if a video is not dynamic, then it must be static, and vice versa. One of ordinary skill in the art could have converted text to a static video with a reasonable expectation of success and would have done so for the benefit of simplicity, especially in the case where audience interaction is not desired. Therefore, it would have been obvious to try the solution of converting text to static video.

Furthermore, Schriber does not teach the “adjacent” portion of the limitation “applying one or more non-textual annotations adjacent to text of the screenplay to create an annotated screenplay”, nor does Schriber teach the annotations being selected from the group consisting of a pictogram, a logogram, an ideogram, an emoticon, an emoji, a glyph, a mark, a grapheme, a code point, and a typographical approximation.

However, Microsoft teaches applying one or more non-textual annotations adjacent to text (Microsoft 1: Image 4 on Page 2 – an image is applied adjacent to the text; see the screenshot of Image 4 above), the annotations being selected from the group consisting of a pictogram (Microsoft 1: Image 2 on Page 1 – there are pictogram options, such as the clover pictogram; the annotations are selected from the “Insert” tab of the document), a logogram (Microsoft 1: Image 2 on Page 1 – there are logogram options, such as © for “copyright”), an ideogram (Microsoft 1: Image 2 on Page 1 – there are ideogram options, such as “=” for “equal” or “equality”), an emoticon (Microsoft 2: Image on Page 1 – there is an option to select an emoticon), an emoji (Microsoft 2: Image on Page 1 – there is an option to select an emoji), a glyph (Microsoft 3: Page 9 – there are options to insert glyphs), a mark (Microsoft 1: Image 2 on Page 1 – there are various mark options, such as the arrows), a grapheme (Microsoft 3: Page 3 – the character map shows grapheme options such as “T”), a code point (Microsoft 3: Paragraph 5 on Page 2 – “To insert a Unicode character, type the character code, press ALT, and then press X”), and a typographical approximation (Microsoft 1: Image 3 on Page 2 – there are typographical approximation options, such as the different versions of “X”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Schriber to incorporate the teachings of Microsoft to have the text and non-textual annotations be adjacent so that they can supplement each other and provide a clear message to the user of what the text and annotation refer to.
It also would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Schriber to incorporate the teachings of Microsoft to include a pictogram, a logogram, an ideogram, an emoticon, an emoji, a glyph, a mark, a grapheme, a code point, and a typographical approximation as options for annotations because any type of visualization related to the text may assist the user in better understanding the message or story.

Regarding claim 7, Schriber in view of Microsoft teaches the method disclosed in claim 6. Schriber further teaches wherein the annotations are from a library of options (Paragraphs 0008, 0054, 0056 – “the script and the updated script are written in a natural language format…As the user develops or edits the script and/or adjusts one or more aspects of the visualization, visualization engine 208 reflects the appropriate changes relative to the first visualization…persistant data library 220 may contain images of outfits associated with one or more characters, sketches (image and video), or any other relevant information or data that can be used to create and/or edit aspects of the script or visualization”; Note: the script, which is originally written in text, can be annotated by the user with images from the persistent data library).

Claims 3 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Schriber in view of Microsoft and Frey et al. (US 12236514 B2), hereinafter Frey.

Regarding claim 3, Schriber in view of Microsoft teaches the method disclosed in claim 1. Schriber does not teach distributing to display dynamic videos generated during the rendering process. However, Frey teaches distributing to display dynamic videos generated during the rendering process (Col. 12, lines 33-36 and 43-52 – “exporter 136 can perform alpha blending of the images in static layer protocol buffer messages and the populated dynamic layers to produce the final rendering of video project file 202…VRP 130 provides the rendered video to DCDS 110 for presentation to user device 106 along with, or as reply 114. As described above, reply data 114 can indicate the video content as rendered by VRP 130 in addition the requested electronic document. Reply data 114 is transmitted by DCDS 110 to user device 106 in response to DCDS 110 receiving request 108 and determining, based on the received distribution parameters and user data indicated in request 108, that the distribution parameters are satisfied”; Note: a video with dynamic components is rendered, distributed, and displayed). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Schriber to incorporate the teachings of Frey to distribute and display dynamic videos generated during rendering because users would not be able to view the videos if they were not distributed or displayed. Additionally, videos are commonly generated during the rendering process, especially after editing or adding effects.

Regarding claim 8, Schriber in view of Microsoft teaches the method disclosed in claim 6. Schriber does not teach distributing to display dynamic videos generated during the rendering process. However, Frey teaches distributing to display dynamic videos generated during the rendering process (Col. 12, lines 33-36 and 43-52 – “exporter 136 can perform alpha blending of the images in static layer protocol buffer messages and the populated dynamic layers to produce the final rendering of video project file 202…VRP 130 provides the rendered video to DCDS 110 for presentation to user device 106 along with, or as reply 114. As described above, reply data 114 can indicate the video content as rendered by VRP 130 in addition the requested electronic document. Reply data 114 is transmitted by DCDS 110 to user device 106 in response to DCDS 110 receiving request 108 and determining, based on the received distribution parameters and user data indicated in request 108, that the distribution parameters are satisfied”; Note: a video with dynamic components is rendered, distributed, and displayed). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Schriber to incorporate the teachings of Frey to distribute and display dynamic videos generated during rendering because users would not be able to view the videos if they were not distributed or displayed. Additionally, videos are commonly generated during the rendering process, especially after editing or adding effects.

Claims 4-5, 9-11, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Schriber in view of Microsoft and Young et al. (KR 20190118108 A), hereinafter Young.

Regarding claim 4, Schriber in view of Microsoft teaches the method disclosed in claim 1. Schriber further teaches transforming the text of the screenplay into visual information and components (Paragraphs 0039, 0051 – “frontend analytics engine 306 may be utilized to extract metadata from the script. That is, frontend analytics engine 206 may be programmed with natural language processing functionality such that it can analyze the text of a script and determine the existence of meaningful language, such as language indicative of characters, actions, interactions between characters, dialog, etc… Visualization engine 208 utilizes a visual metaphor language for generating the visualizations. Scene metadata (e.g., characters, props, and actions) may be extracted from the script as described above, and used to generate a first visualization”). Schriber does not teach “utilizing machine learning” in the limitation “utilizing machine learning to transform text into meaningful visual information and components”. However, Young teaches utilizing machine learning to transform the text of the screenplay into visual information and components (Abstract, Paragraphs 0036-0037, 0066 – “an artificial intelligence (AI) system using an AI model learned in accordance with at least one of machine learning, neural networks, and a deep learning algorithm, and to application thereof…Once the text is acquired, the electronic device determines multiple key words from the acquired text (S220). And, when multiple key words are determined, the electronic device obtains multiple first illustrations corresponding to the multiple key words (S230). Specifically, the electronic device can input information and text about the design of the presentation video into a first artificial intelligence model learned by an artificial intelligence algorithm, thereby obtaining a plurality of first illustrations related to the text and corresponding to the design of the presentation video…The first artificial intelligence model (1210) can perform natural language processing, extract key words from text, and understand the meaning and relationships of each key word”; Note: the AI model is a natural language processing model that learns with machine learning. It identifies words and outputs illustrations related to the words). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Schriber to incorporate the teachings of Young to use machine learning natural language processing for the benefit of efficiency and accuracy that comes from the automation and knowledge of an ML model. Additionally, a person of ordinary skill in the art before the effective filing date of the claimed invention would have recognized that the natural language processor of Schriber could have been substituted for the machine learning natural language processor of Young because both serve the purpose of determining words in text and generating related visuals for the words. Furthermore, a person of ordinary skill in the art would have been able to carry out the substitution. Finally, the substitution achieves the predictable result of determining words in text and generating related visuals for the words. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the natural language processor of Schriber for the machine learning natural language processor of Young according to known methods to yield the predictable result of determining words in text and generating related visuals for the words.

Regarding claim 5, Schriber in view of Microsoft teaches the method disclosed in claim 1. Schriber further teaches utilizing natural language processors to determine words in the text of the screenplay that are entities to render in the video (Paragraphs 0039, 0051 – “frontend analytics engine 306 may be utilized to extract metadata from the script. That is, frontend analytics engine 206 may be programmed with natural language processing functionality such that it can analyze the text of a script and determine the existence of meaningful language, such as language indicative of characters, actions, interactions between characters, dialog, etc… Visualization engine 208 utilizes a visual metaphor language for generating the visualizations. Scene metadata (e.g., characters, props, and actions) may be extracted from the script as described above, and used to generate a first visualization”; Note: natural language processing is used to determine entities, like characters, from the text of a script). Schriber does not teach “utilizing machine learning natural language processors” in the limitation “utilizing machine learning natural language processors to determine words in text that are entities to render in video”. However, Young teaches utilizing machine learning natural language processors to determine words in text that are entities to render in video (Abstract, Paragraphs 0036-0037, 0066 – “an artificial intelligence (AI) system using an AI model learned in accordance with at least one of machine learning, neural networks, and a deep learning algorithm, and to application thereof…Once the text is acquired, the electronic device determines multiple key words from the acquired text (S220). And, when multiple key words are determined, the electronic device obtains multiple first illustrations corresponding to the multiple key words (S230). Specifically, the electronic device can input information and text about the design of the presentation video into a first artificial intelligence model learned by an artificial intelligence algorithm, thereby obtaining a plurality of first illustrations related to the text and corresponding to the design of the presentation video…The first artificial intelligence model (1210) can perform natural language processing, extract key words from text, and understand the meaning and relationships of each key word”; Note: the AI model is a natural language processing model that learns with machine learning. It identifies words and outputs illustrations to be put in a presentation video). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Schriber to incorporate the teachings of Young to use machine learning natural language processing for the benefit of efficiency and accuracy that comes from the automation and knowledge of an ML model. Additionally, a person of ordinary skill in the art before the effective filing date of the claimed invention would have recognized that the natural language processor of Schriber could have been substituted for the machine learning natural language processor of Young because both serve the purpose of determining words in text and generating related visuals for the words. Furthermore, a person of ordinary skill in the art would have been able to carry out the substitution. Finally, the substitution achieves the predictable result of determining words in text and generating related visuals for the words. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the natural language processor of Schriber for the machine learning natural language processor of Young according to known methods to yield the predictable result of determining words in text and generating related visuals for the words.

Regarding claim 9, Schriber in view of Microsoft teaches the method disclosed in claim 6. Schriber further teaches transforming the text of the screenplay into visual information and components (Paragraphs 0039, 0051 – “frontend analytics engine 306 may be utilized to extract metadata from the script. That is, frontend analytics engine 206 may be programmed with natural language processing functionality such that it can analyze the text of a script and determine the existence of meaningful language, such as language indicative of characters, actions, interactions between characters, dialog, etc… Visualization engine 208 utilizes a visual metaphor language for generating the visualizations. Scene metadata (e.g., characters, props, and actions) may be extracted from the script as described above, and used to generate a first visualization”). Schriber does not teach “utilizing machine learning” in the limitation “utilizing machine learning to transform text into meaningful visual information and components”. However, Young teaches utilizing machine learning to transform the text of the screenplay into visual information and components (Abstract, Paragraphs 0036-0037, 0066 – “an artificial intelligence (AI) system using an AI model learned in accordance with at least one of machine learning, neural networks, and a deep learning algorithm, and to application thereof…Once the text is acquired, the electronic device determines multiple key words from the acquired text (S220). And, when multiple key words are determined, the electronic device obtains multiple first illustrations corresponding to the multiple key words (S230). Specifically, the electronic device can input information and text about the design of the presentation video into a first artificial intelligence model learned by an artificial intelligence algorithm, thereby obtaining a plurality of first illustrations related to the text and corresponding to the design of the presentation video…The first artificial intelligence model (1210) can perform natural language processing, extract key words from text, and understand the meaning and relationships of each key word”; Note: the AI model is a natural language processing model that learns with machine learning. It identifies words and outputs illustrations related to the words). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Schriber to incorporate the teachings of Young to use machine learning natural language processing for the benefit of efficiency and accuracy that comes from the automation and knowledge of an ML model. Additionally, a person of ordinary skill in the art before the effective filing date of the claimed invention would have recognized that the natural language processor of Schriber could have been substituted for the machine learning natural language processor of Young because both serve the purpose of determining words in text and generating related visuals for the words. Furthermore, a person of ordinary skill in the art would have been able to carry out the substitution. Finally, the substitution achieves the predictable result of determining words in text and generating related visuals for the words. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the natural language processor of Schriber for the machine learning natural language processor of Young according to known methods to yield the predictable result of determining words in text and generating related visuals for the words.

Regarding claim 10, Schriber in view of Microsoft teaches the method disclosed in claim 6. Schriber further teaches utilizing natural language processors to determine words in the text of the screenplay that are entities to render in the video (Paragraphs 0039, 0051 – “frontend analytics engine 306 may be utilized to extract metadata from the script. That is, frontend analytics engine 206 may be programmed with natural language processing functionality such that it can analyze the text of a script and determine the existence of meaningful language, such as language indicative of characters, actions, interactions between characters, dialog, etc… Visualization engine 208 utilizes a visual metaphor language for generating the visualizations. Scene metadata (e.g., characters, props, and actions) may be extracted from the script as described above, and used to generate a first visualization”; Note: natural language processing is used to determine entities, like characters, from the text of a script). Schriber does not teach “utilizing machine learning natural language processors” in the limitation “utilizing machine learning natural language processors to determine words in text that are entities to render in video”. However, Young teaches utilizing machine learning natural language processors to determine words in text that are entities to render in video (Abstract, Paragraphs 0036-0037, 0066 – “an artificial intelligence (AI) system using an AI model learned in accordance with at least one of machine learning, neural networks, and a deep learning algorithm, and to application thereof…Once the text is acquired, the electronic device determines multiple key words from the acquired text (S220). And, when multiple key words are determined, the electronic device obtains multiple first illustrations corresponding to the multiple key words (S230). Specifically, the electronic device can input information and text about the design of the presentation video into a first artificial intelligence model learned by an artificial intelligence algorithm, thereby obtaining a plurality of first illustrations related to the text and corresponding to the design of the presentation video…The first artificial intelligence model (1210) can perform natural language processing, extract key words from text, and understand the meaning and relationships of each key word”; Note: the AI model is a natural language processing model that learns with machine learning. It identifies words and outputs illustrations to be put in a presentation video). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Schriber to incorporate the teachings of Young to use machine learning natural language processing for the benefit of efficiency and accuracy that comes from the automation and knowledge of an ML model. Additionally, a person of ordinary skill in the art before the effective filing date of the claimed invention would have recognized that the natural language processor of Schriber could have been substituted for the machine learning natural language processor of Young because both serve the purpose of determining words in text and generating related visuals for the words. Furthermore, a person of ordinary skill in the art would have been able to carry out the substitution. Finally, the substitution achieves the predictable result of determining words in text and generating related visuals for the words. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the natural language processor of Schriber for the machine learning natural language processor of Young according to known methods to yield the predictable result of determining words in text and generating related visuals for the words.

Regarding claim 11, Schriber in view of Microsoft teaches the method disclosed in claim 6. Schriber does not teach grouping together more than one of the annotations to represent one or more new annotations. However, Young teaches grouping together more than one of the annotations to represent one or more new annotations (Paragraphs 0044, 0056-0057 – “the electronic device can input text into an artificial intelligence model to obtain a plurality of first illustrations and obtain a second illustration that is a composite of the plurality of first illustrations as an illustration associated with the text. That is, a composite illustration that synthesizes multiple illustrations can be provided…Multiple illustrations can be categorized based on the relationship and context of the key words…Users can use the composited illustration as is, as in Figure 8, or can individually modify multiple illustrations within the composited illustration as desired (modifying size, graphic effects, placement, etc.) to create a new composited illustration”; Note: the annotation was previously taught by Schriber in the rejection of claim 1. Here, the illustration corresponds to the annotation). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Schriber to incorporate the teachings of Young to group annotations for the benefit of creating a visualization that would help increase understanding for the user. Annotations are grouped by relationship and context, which would assist in telling a story and/or making the visualization more aesthetic.

Regarding claim 14, Schriber in view of Microsoft teaches the method disclosed in claim 1. Schriber does not teach grouping together more than one of the annotations to represent one or more new annotations. However, Young teaches grouping together more than one of the annotations to represent one or more new annotations (Paragraphs 0044, 0056-0057 – “the electronic device can input text into an artificial intelligence model to obtain a plurality of first illustrations and obtain a second illustration that is a composite of the plurality of first illustrations as an illustration associated with the text. That is, a composite illustration that synthesizes multiple illustrations can be provided…Multiple illustrations can be categorized based on the relationship and context of the key words…Users can use the composited illustration as is, as in Figure 8, or can individually modify multiple illustrations within the composited illustration as desired (modifying size, graphic effects, placement, etc.) to create a new composited illustration”; Note: the annotation was previously taught by Schriber in the rejection of claim 1. Here, the illustration corresponds to the annotation). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Schriber to incorporate the teachings of Young to group annotations for the benefit of creating a visualization that would help increase understanding for the user. Annotations are grouped by relationship and context, which would assist in telling a story and/or making the visualization more aesthetic.
Read full office action
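
For orientation, the claim 1 that the rejection maps onto Schriber recites a five-step pipeline (access screenplay, annotate, sequence, build a virtual world, render). The sketch below is purely illustrative: every function name, the event schema, and the stub bodies are hypothetical placeholders, not the applicant's, Schriber's, or the examiner's implementation.

```python
# Illustrative outline of the claim 1 pipeline as characterized in the
# rejection above. All names and stub bodies are hypothetical; neither
# the application nor the cited art discloses this code.

def apply_annotations(screenplay: str) -> str:
    # Place non-textual annotations (image/audio/symbol references)
    # adjacent to the screenplay text. Stubbed as a pass-through.
    return screenplay

def extract_sequence(annotated: str) -> list[dict]:
    # "Transforming the annotated screenplay to a sequencer": derive
    # ordered scene metadata (characters, actions, dialog) from the text.
    return [{"order": i, "event": line}
            for i, line in enumerate(annotated.splitlines()) if line.strip()]

def build_virtual_world(sequence: list[dict]) -> dict:
    # Populate a scene representation (characters, props, actions)
    # from the ordered metadata.
    return {"scenes": sequence}

def render(world: dict) -> bytes:
    # Render the virtual world into video frames; stubbed.
    return f"{len(world['scenes'])} events rendered".encode()

def text_to_dynamic_video(screenplay: str) -> bytes:
    annotated = apply_annotations(screenplay)   # access + annotate the screenplay
    sequence = extract_sequence(annotated)      # annotated screenplay -> sequencer
    world = build_virtual_world(sequence)       # sequencer -> virtual world
    return render(world)                        # virtual world -> video

print(text_to_dynamic_video("INT. TENT - DAY\nALEX looks right."))
```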

Prosecution Timeline

Apr 11, 2023 — Application Filed
Jan 11, 2024 — Response after Non-Final Action
Jun 02, 2025 — Non-Final Rejection (§103)
Sep 16, 2025 — Response Filed
Nov 03, 2025 — Final Rejection (§103, current)

Precedent Cases

Applications granted by the same examiner involving similar technology

Patent 12602750 — DIFFERENTIABLE EMULATION OF NON-DIFFERENTIABLE IMAGE PROCESSING FOR ADJUSTABLE AND EXPLAINABLE NON-DESTRUCTIVE IMAGE AND VIDEO EDITING
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12597208 — BUILDING INFORMATION MODELING SYSTEMS AND METHODS
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12573217 — SERVER, METHOD AND COMPUTER PROGRAM FOR GENERATING SPATIAL MODEL FROM PANORAMIC IMAGE
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12561851 — HIGH-RESOLUTION IMAGE GENERATION USING DIFFUSION MODELS
Granted Feb 24, 2026 (2y 5m to grant)
Patent 12536734 — Dynamic Foveated Point Cloud Rendering System
Granted Jan 27, 2026 (2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 81%
With Interview: 99% (+36.4%)
Median Time to Grant: 2y 7m
PTA Risk: Moderate

Based on 21 resolved cases by this examiner; grant probability is derived from the career allow rate.
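
The note above states that the base grant probability is taken directly from the career allow rate. The with-interview figure appears to add the +36.4-point lift; the cap below is an assumption of ours to reconcile 81% plus 36.4 points with the displayed 99%, not a published formula.

```python
# Hedged sketch of how the projections above could be computed.
# BASE comes from the page (career allow rate, 17/21); the 99% cap
# is an assumption to match the displayed with-interview figure.
BASE = 0.81
INTERVIEW_LIFT = 0.364
CAP = 0.99  # assumed display cap

grant_prob = BASE
with_interview = min(BASE + INTERVIEW_LIFT, CAP)
print(f"grant probability: {grant_prob:.0%}")     # 81%
print(f"with interview:   {with_interview:.0%}")  # 99%
```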
