Prosecution Insights
Last updated: April 18, 2026
Application No. 18/498,380

SYSTEM AND METHOD FOR VISUAL CONTENT GENERATION AND ITERATION

Final Rejection (§102, §103)
Filed: Oct 31, 2023
Examiner: YOUNG, CAMERON KENNETH
Art Unit: 2655
Tech Center: 2600 (Communications)
Assignee: Toyota Research Institute, Inc.
OA Round: 2 (Final)

Grant Probability: 70% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 11m
Grant Probability with Interview: 82%

Examiner Intelligence

Career Allow Rate: 70% (14 granted / 20 resolved; +8.0% vs TC avg, above average)
Interview Lift: +12.5% (moderate), measured across resolved cases with interview
Avg Prosecution: 2y 11m (typical timeline)
Currently Pending: 23
Total Applications: 43 (across all art units)

Statute-Specific Performance

§101: 20.1% (-19.9% vs TC avg)
§103: 58.9% (+18.9% vs TC avg)
§102: 11.4% (-28.6% vs TC avg)
§112: 7.7% (-32.3% vs TC avg)

Tech Center averages are estimates; based on career data from 20 resolved cases.
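The headline figures above follow from simple ratio arithmetic. A minimal sketch of that arithmetic is below; the helper names are ours, and the 62.0% Tech Center average is inferred from the stated +8.0% delta rather than taken from any analytics source:

```python
# Sketch: reproduce the headline examiner statistics shown above.
# 14 granted of 20 resolved comes from the dashboard; the 62.0% TC
# average is an assumption back-computed from the +8.0% delta.

def allow_rate(granted: int, resolved: int) -> float:
    """Career allowance rate as a percentage, rounded to one decimal."""
    return round(100 * granted / resolved, 1)

def lift_vs_avg(rate: float, tc_avg: float) -> float:
    """Percentage-point difference against the Tech Center average."""
    return round(rate - tc_avg, 1)

career = allow_rate(14, 20)  # 70.0
print(f"Career allow rate: {career}%")
print(f"vs TC avg: {lift_vs_avg(career, 62.0):+.1f}%")  # prints +8.0%
```

The same two helpers reproduce the statute-specific deltas (e.g., 58.9% under §103 against a 40.0% estimated average gives +18.9%).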

Office Action (§102, §103)
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

Applicant's amendments, filed 12/31/2025, have been entered. Claims 1 – 20 remain pending. Applicant's amendments have overcome each and every 35 U.S.C. § 101 rejection previously laid out in the Office Action dated 10/03/2025.

Response to Arguments

Applicant's arguments, see page 6 of Applicant's Response, filed 12/31/2025, with respect to the 35 U.S.C. § 101 rejections have been fully considered and are persuasive. The 35 U.S.C. § 101 rejections of claims 1 – 20 have been withdrawn. Particularly, Applicant's amendments directed to training generative machine learning language and visual models used to generate texts and images take the claims beyond an interpretation as a mental process, primarily because the human mind is not capable of training machine learning models. As such, the 35 U.S.C. § 101 rejections of the claims have been withdrawn for at least this reason.

Applicant's arguments filed 12/31/2025 with respect to the 35 U.S.C. § 102 and 103 rejections of claims 1 – 20 have been fully considered but they are not persuasive. Particularly, Applicant alleges, on page 7 of Applicant's Response, that Finegan does not teach the newly amended limitations, let alone training machine learning language models. Examiner disagrees. Particularly, Examiner notes that Finegan uses trained machine learning models to both generate text and generate images which are made into video clips. See Finegan at 2:45 – 3:32 and 3:57 – 4:6. As referenced, Finegan indeed teaches training machine learning models for the generation of text and images. Therefore, the 35 U.S.C. § 102 rejections of claims 1, 3, 6, 8, 10, 13, 15, 17, and 20 are maintained in view of the rejections laid out below and the arguments presented above.
Further, Applicant argues that Choi and Chakiat do not cure the deficiencies of Finegan. However, as laid out in the 35 U.S.C. § 102 rejections of claims 1 and 15 below, Finegan indeed teaches the newly amended limitations. As such, Finegan has no deficiencies to cure, and the 35 U.S.C. § 103 rejections of claims 2, 4 – 5, 9, 11 – 12, 16, and 18 – 19 are maintained in light of the arguments above and the rejections laid out below.

Claim Rejections - 35 USC § 102

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office Action.

Claims 1, 3, 6, 8, 10, 13, 15, 17, and 20 are rejected under 35 U.S.C. § 102(a)(2) as being anticipated by U.S. Patent No. 11,797,780 B1 to Corinne Finegan et al. (hereinafter Finegan).

Regarding claim 1, Finegan teaches a system comprising: a processor; and a memory storing machine-readable instructions that, when executed by the processor, cause the processor to: (Finegan teaches a computer-implemented system including a processor, memory, and a variety of other connected devices. Finegan at 10:57 – 11:41.)

train a generative language model based on a machine learning model using a set of training data; (Finegan teaches generative language models are trained on more than a first threshold amount of data (i.e., Finegan's generative model is trained, and therefore Finegan explicitly discloses training a generative machine learning model). Finegan at 2:45 – 3:32.)

train a generative visual model based on a machine learning model using a second set of training data; (Finegan teaches the text-to-image model (i.e., generative visual model) is trained to generate digital images using a corpus comprising a large collection of potentially noisy text-image pairs (i.e., the generative text-to-image model is trained using a second set of training data, namely the potentially noisy text-image pairs). Finegan at 3:57 – 4:6.)

generate a plurality of texts that are related and semantically diverse based on one or more prompts using the generative language model; (Finegan teaches generating a summary of a plurality of text documents (i.e., related texts (summaries) generated from prompts (input text documents) using a large language model (i.e., a generative language model)). Finegan at 5:41 – 8:47.)

generate a plurality of images based on at least a portion of the plurality of texts using the generative visual model; (Finegan teaches generating a plurality of images from a collection of keywords distilled from the summaries generated by the language model (i.e., generating a plurality of images based on a portion of a plurality of texts). Finegan at 5:41 – 8:47.)

and output at least a portion of the plurality of images to a data storage location. (Finegan teaches generating a video clip (i.e., a plurality of images) and both outputting the video clip and storing the video clip (i.e., at least a portion of the plurality of images is output by storing the video clip). Finegan at 8:36 – 8:47.)

Regarding claim 3, Finegan teaches the system of claim 1, wherein the one or more prompts are based on at least one of: machine-generated data; or human-generated data. (Finegan teaches the summary sentences are generated from input documents (i.e., human-generated data). Finegan at 5:41 – 8:47. Further, Finegan teaches using the generated summaries to generate keywords which are used to generate images (i.e., machine-generated data). Finegan at 5:41 – 8:47. As such, a person of ordinary skill in the art would have understood that the prompts are based on both machine-generated and human-generated data because the keywords are extracted from machine-generated summaries based on user input documents which, in essence, are based on both machine-generated and human-generated content.)
Regarding claim 6, Finegan teaches the system of claim 1, wherein the portion of the plurality of texts is based on a diverse selection of the plurality of texts. (Finegan teaches generating a plurality of images from a collection of keywords distilled from the summaries generated by the language model (i.e., generating a plurality of images based on a portion of a plurality of texts). Finegan at 5:41 – 8:47. As such, a collection of keywords amounts to a diverse selection of the plurality of texts, as the collection of keywords can be extracted from all the texts, resulting in a spread as diverse as the collection of input documents.)

Regarding claim 8, Finegan teaches a method comprising: training a generative language model based on a machine learning model using a set of training data; (Finegan teaches generative language models are trained on more than a first threshold amount of data (i.e., Finegan's generative model is trained, and therefore Finegan explicitly discloses training a generative machine learning model). Finegan at 2:45 – 3:32.)

training a generative visual model based on a machine learning model using a second set of training data; (Finegan teaches the text-to-image model (i.e., generative visual model) is trained to generate digital images using a corpus comprising a large collection of potentially noisy text-image pairs (i.e., the generative text-to-image model is trained using a second set of training data, namely the potentially noisy text-image pairs). Finegan at 3:57 – 4:6.)

generating a plurality of texts that are related and semantically diverse based on one or more prompts using the generative language model; (Finegan teaches generating a summary of a plurality of text documents (i.e., related texts (summaries) generated from prompts (input text documents) using a large language model (i.e., a generative language model)). Finegan at 5:41 – 8:47.)

generating a plurality of images based on at least a portion of the plurality of texts using the generative visual model; (Finegan teaches generating a plurality of images from a collection of keywords distilled from the summaries generated by the language model (i.e., generating a plurality of images based on a portion of a plurality of texts). Finegan at 5:41 – 8:47.)

and outputting at least a portion of the plurality of images to a data storage location. (Finegan teaches generating a video clip (i.e., a plurality of images) and both outputting the video clip and storing the video clip (i.e., at least a portion of the plurality of images is output by storing the video clip). Finegan at 8:36 – 8:47.)

Regarding claim 10, Finegan teaches the method of claim 8, wherein the one or more prompts are based on at least one of: machine-generated data; or human-generated data. (Finegan teaches the summary sentences are generated from input documents (i.e., human-generated data). Finegan at 5:41 – 8:47. Further, Finegan teaches using the generated summaries to generate keywords which are used to generate images (i.e., machine-generated data). Finegan at 5:41 – 8:47. As such, a person of ordinary skill in the art would have understood that the prompts are based on both machine-generated and human-generated data because the keywords are extracted from machine-generated summaries based on user input documents which, in essence, are based on both machine-generated and human-generated content.)

Regarding claim 13, Finegan teaches the method of claim 8, wherein the portion of the plurality of texts is based on a diverse selection of the plurality of texts. (Finegan teaches generating a plurality of images from a collection of keywords distilled from the summaries generated by the language model (i.e., generating a plurality of images based on a portion of a plurality of texts). Finegan at 5:41 – 8:47. As such, a collection of keywords amounts to a diverse selection of the plurality of texts, as the collection of keywords can be extracted from all the texts, resulting in a spread as diverse as the collection of input documents.)

Regarding claim 15, Finegan teaches a non-transitory computer-readable medium including instructions that, when executed by a processor, cause the processor to: (Finegan teaches a computer-implemented system including a processor, memory, and a variety of other connected devices. Finegan at 10:57 – 11:41.)

train a generative language model based on a machine learning model using a set of training data; (Finegan teaches generative language models are trained on more than a first threshold amount of data (i.e., Finegan's generative model is trained, and therefore Finegan explicitly discloses training a generative machine learning model). Finegan at 2:45 – 3:32.)

train a generative visual model based on a machine learning model using a second set of training data; (Finegan teaches the text-to-image model (i.e., generative visual model) is trained to generate digital images using a corpus comprising a large collection of potentially noisy text-image pairs (i.e., the generative text-to-image model is trained using a second set of training data, namely the potentially noisy text-image pairs). Finegan at 3:57 – 4:6.)

generate a plurality of texts that are related and semantically diverse to one or more prompts using the generative language model; (Finegan teaches generating a summary of a plurality of text documents (i.e., related texts (summaries) generated from prompts (input text documents) using a large language model (i.e., a generative language model)). Finegan at 5:41 – 8:47.)

generate a plurality of images based on at least a portion of the plurality of texts using the generative visual model; (Finegan teaches generating a plurality of images from a collection of keywords distilled from the summaries generated by the language model (i.e., generating a plurality of images based on a portion of a plurality of texts). Finegan at 5:41 – 8:47.)

and output at least a portion of the plurality of images to a data storage location. (Finegan teaches generating a video clip (i.e., a plurality of images) and both outputting the video clip and storing the video clip (i.e., at least a portion of the plurality of images is output by storing the video clip). Finegan at 8:36 – 8:47.)

Regarding claim 17, Finegan teaches the non-transitory computer-readable medium of claim 15, wherein the one or more prompts are based on at least one of: machine-generated data; or human-generated data. (Finegan teaches the summary sentences are generated from input documents (i.e., human-generated data). Finegan at 5:41 – 8:47. Further, Finegan teaches using the generated summaries to generate keywords which are used to generate images (i.e., machine-generated data). Finegan at 5:41 – 8:47. As such, a person of ordinary skill in the art would have understood that the prompts are based on both machine-generated and human-generated data because the keywords are extracted from machine-generated summaries based on user input documents which, in essence, are based on both machine-generated and human-generated content.)

Regarding claim 20, Finegan teaches the non-transitory computer-readable medium of claim 15, wherein the portion of the plurality of texts is based on a diverse selection of the plurality of texts. (Finegan teaches generating a plurality of images from a collection of keywords distilled from the summaries generated by the language model (i.e., generating a plurality of images based on a portion of a plurality of texts). Finegan at 5:41 – 8:47. As such, a collection of keywords amounts to a diverse selection of the plurality of texts, as the collection of keywords can be extracted from all the texts, resulting in a spread as diverse as the collection of input documents.)
Claim Rejections - 35 USC § 103

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office Action.

Claims 2, 4 – 5, 9, 11 – 12, 16, and 18 – 19 are rejected under 35 U.S.C. § 103 as being unpatentable over Finegan in view of U.S. Patent Application Publication No. 2019/0251355 A1 to Hyungtak Choi et al. (hereinafter Choi).

Regarding claim 2, Finegan teaches all the limitations of claim 1 as laid out above. Further, Finegan teaches the system of claim 1, wherein the one or more prompts include at least one of: a text; (Finegan teaches using text as input to generate refined prompts for generation of images from a generative language model. Finegan at 5:41 – 8:47.) Finegan, however, does not teach the prompts including an image; a sound; or a video. In a similar field of endeavor (e.g., generating text using machine learning and artificial intelligence based on input text or prompts), Choi teaches one or more prompts include at least one of: an image; a sound; or a video. (Choi teaches generating a text based on user input wherein the user input is retrieved by an A/V interface which uses an audio signal, video signal, or photography mode to collect/input data for generation of a text comment (i.e., the prompts include an image, sound, or video). Choi at ¶ [0088].)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Finegan with the teachings of Choi to provide prompts including an image, sound, or video. A person of ordinary skill in the art would have recognized the significant overlapping material between Finegan and Choi. As such, because of their significant overlapping fields of endeavor, a person of ordinary skill in the art would have been motivated to combine the concepts of Choi with the concepts of Finegan, as Choi's alternate input interfaces would have expanded Finegan's capabilities to receive input. Further, various forms of input are commonly understood in the art; as such, a person of ordinary skill in the art would have found it obvious to provide additional modes of input for the generative model.

Regarding claim 4, Finegan teaches all the limitations of claim 1 as laid out above. Finegan, however, does not teach the one or more prompts are based on at least one of the plurality of images. In a similar field of endeavor (e.g., generating text using machine learning and artificial intelligence based on input text or prompts), Choi teaches the system of claim 1, wherein the one or more prompts are based on at least one of the plurality of images. (Choi teaches generating a text based on user input wherein the user input is retrieved by an A/V interface which uses an audio signal, video signal, or photography mode to collect/input data for generation of a text comment (i.e., the prompts include an image, sound, or video). Choi at ¶ [0088]. As such, a person of ordinary skill in the art would have understood that images generated by Finegan could also be used as input to Finegan's system based on Choi's teaching of using images, sound, and video as prompts for text generation.)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Finegan with the teachings of Choi to provide prompts based on at least one of the plurality of images. A person of ordinary skill in the art would have recognized the significant overlapping material between Finegan and Choi. As such, because of their significant overlapping fields of endeavor, a person of ordinary skill in the art would have been motivated to combine the concepts of Choi with the concepts of Finegan, as Choi's alternate input interfaces would have expanded Finegan's capabilities to receive input. Further, various forms of input are commonly understood in the art; as such, a person of ordinary skill in the art would have found it obvious to provide additional modes of input for the generative model.

Regarding claim 5, Finegan teaches all the limitations of claim 1 as laid out above. Finegan, however, does not teach generating the plurality of images based on at least an image. In a similar field of endeavor (e.g., generating text using machine learning and artificial intelligence based on input text or prompts), Choi teaches the system of claim 1, wherein the machine-readable instructions further include instructions that when executed by the processor cause the processor to: generate the plurality of images based on at least an image. (Choi teaches generating a text based on user input wherein the user input is retrieved by an A/V interface which uses an audio signal, video signal, or photography mode to collect/input data for generation of a text comment (i.e., the prompts include an image, sound, or video). Choi at ¶ [0088]. As such, Choi's use of image, audio, and video-based inputs for text generation, in combination with Finegan's generation of text based on user input, amounts to generating a plurality of images based on at least an image.)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Finegan with the teachings of Choi to provide generating the plurality of images based on at least an image. A person of ordinary skill in the art would have recognized the significant overlapping material between Finegan and Choi. As such, because of their significant overlapping fields of endeavor, a person of ordinary skill in the art would have been motivated to combine the concepts of Choi with the concepts of Finegan, as Choi's alternate input interfaces would have expanded Finegan's capabilities to receive input. Further, various forms of input are commonly understood in the art; as such, a person of ordinary skill in the art would have found it obvious to provide additional modes of input for the generative model.

Regarding claim 9, Finegan teaches all the limitations of claim 8 as laid out above. Further, Finegan teaches the method of claim 8, wherein the one or more prompts include at least one of: a text; (Finegan teaches using text as input to generate refined prompts for generation of images from a generative language model. Finegan at 5:41 – 8:47.) Finegan, however, does not teach the prompts including an image; a sound; or a video. In a similar field of endeavor (e.g., generating text using machine learning and artificial intelligence based on input text or prompts), Choi teaches one or more prompts include at least one of: an image; a sound; or a video. (Choi teaches generating a text based on user input wherein the user input is retrieved by an A/V interface which uses an audio signal, video signal, or photography mode to collect/input data for generation of a text comment (i.e., the prompts include an image, sound, or video). Choi at ¶ [0088].)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Finegan with the teachings of Choi to provide prompts including an image, sound, or video. A person of ordinary skill in the art would have recognized the significant overlapping material between Finegan and Choi. As such, because of their significant overlapping fields of endeavor, a person of ordinary skill in the art would have been motivated to combine the concepts of Choi with the concepts of Finegan, as Choi's alternate input interfaces would have expanded Finegan's capabilities to receive input. Further, various forms of input are commonly understood in the art; as such, a person of ordinary skill in the art would have found it obvious to provide additional modes of input for the generative model.

Regarding claim 11, Finegan teaches all the limitations of claim 8 as laid out above. Finegan, however, does not teach the one or more prompts are based on at least one of the plurality of images. In a similar field of endeavor (e.g., generating text using machine learning and artificial intelligence based on input text or prompts), Choi teaches the method of claim 8, wherein the one or more prompts are based on at least one of the plurality of images. (Choi teaches generating a text based on user input wherein the user input is retrieved by an A/V interface which uses an audio signal, video signal, or photography mode to collect/input data for generation of a text comment (i.e., the prompts include an image, sound, or video). Choi at ¶ [0088]. As such, a person of ordinary skill in the art would have understood that images generated by Finegan could also be used as input to Finegan's system based on Choi's teaching of using images, sound, and video as prompts for text generation.)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Finegan with the teachings of Choi to provide prompts based on at least one of the plurality of images. A person of ordinary skill in the art would have recognized the significant overlapping material between Finegan and Choi. As such, because of their significant overlapping fields of endeavor, a person of ordinary skill in the art would have been motivated to combine the concepts of Choi with the concepts of Finegan, as Choi's alternate input interfaces would have expanded Finegan's capabilities to receive input. Further, various forms of input are commonly understood in the art; as such, a person of ordinary skill in the art would have found it obvious to provide additional modes of input for the generative model.

Regarding claim 12, Finegan teaches all the limitations of claim 8 as laid out above. Finegan, however, does not teach generating the plurality of images based on at least an image. In a similar field of endeavor (e.g., generating text using machine learning and artificial intelligence based on input text or prompts), Choi teaches the method of claim 8, further comprising: generating the plurality of images based on at least an image. (Choi teaches generating a text based on user input wherein the user input is retrieved by an A/V interface which uses an audio signal, video signal, or photography mode to collect/input data for generation of a text comment (i.e., the prompts include an image, sound, or video). Choi at ¶ [0088]. As such, Choi's use of image, audio, and video-based inputs for text generation, in combination with Finegan's generation of text based on user input, amounts to generating a plurality of images based on at least an image.)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Finegan with the teachings of Choi to provide generating the plurality of images based on at least an image. A person of ordinary skill in the art would have recognized the significant overlapping material between Finegan and Choi. As such, because of their significant overlapping fields of endeavor, a person of ordinary skill in the art would have been motivated to combine the concepts of Choi with the concepts of Finegan, as Choi's alternate input interfaces would have expanded Finegan's capabilities to receive input. Further, various forms of input are commonly understood in the art; as such, a person of ordinary skill in the art would have found it obvious to provide additional modes of input for the generative model.

Regarding claim 16, Finegan teaches all the limitations of claim 15 as laid out above. Further, Finegan teaches the non-transitory computer-readable medium of claim 15, wherein the one or more prompts include at least one of: a text; (Finegan teaches using text as input to generate refined prompts for generation of images from a generative language model. Finegan at 5:41 – 8:47.) Finegan, however, does not teach the prompts including an image; a sound; or a video. In a similar field of endeavor (e.g., generating text using machine learning and artificial intelligence based on input text or prompts), Choi teaches one or more prompts include at least one of: an image; a sound; or a video. (Choi teaches generating a text based on user input wherein the user input is retrieved by an A/V interface which uses an audio signal, video signal, or photography mode to collect/input data for generation of a text comment (i.e., the prompts include an image, sound, or video). Choi at ¶ [0088].)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Finegan with the teachings of Choi to provide prompts including an image, sound, or video. A person of ordinary skill in the art would have recognized the significant overlapping material between Finegan and Choi. As such, because of their significant overlapping fields of endeavor, a person of ordinary skill in the art would have been motivated to combine the concepts of Choi with the concepts of Finegan, as Choi's alternate input interfaces would have expanded Finegan's capabilities to receive input. Further, various forms of input are commonly understood in the art; as such, a person of ordinary skill in the art would have found it obvious to provide additional modes of input for the generative model.

Regarding claim 18, Finegan teaches all the limitations of claim 15 as laid out above. Finegan, however, does not teach the one or more prompts are based on at least one of the plurality of images. In a similar field of endeavor (e.g., generating text using machine learning and artificial intelligence based on input text or prompts), Choi teaches the non-transitory computer-readable medium of claim 15, wherein the one or more prompts are based on at least one of the plurality of images. (Choi teaches generating a text based on user input wherein the user input is retrieved by an A/V interface which uses an audio signal, video signal, or photography mode to collect/input data for generation of a text comment (i.e., the prompts include an image, sound, or video). Choi at ¶ [0088]. As such, a person of ordinary skill in the art would have understood that images generated by Finegan could also be used as input to Finegan's system based on Choi's teaching of using images, sound, and video as prompts for text generation.)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Finegan with the teachings of Choi to provide prompts based on at least one of the plurality of images. A person of ordinary skill in the art would have recognized the significant overlapping material between Finegan and Choi. As such, because of their significant overlapping fields of endeavor, a person of ordinary skill in the art would have been motivated to combine the concepts of Choi with the concepts of Finegan, as Choi's alternate input interfaces would have expanded Finegan's capabilities to receive input. Further, various forms of input are commonly understood in the art; as such, a person of ordinary skill in the art would have found it obvious to provide additional modes of input for the generative model.

Regarding claim 19, Finegan teaches all the limitations of claim 15 as laid out above. Finegan, however, does not teach generating the plurality of images based on at least an image. In a similar field of endeavor (e.g., generating text using machine learning and artificial intelligence based on input text or prompts), Choi teaches the non-transitory computer-readable medium of claim 15, wherein the instructions further include instructions that when executed by the processor cause the processor to: generate the plurality of images based on at least an image. (Choi teaches generating a text based on user input wherein the user input is retrieved by an A/V interface which uses an audio signal, video signal, or photography mode to collect/input data for generation of a text comment (i.e., the prompts include an image, sound, or video). Choi at ¶ [0088]. As such, Choi's use of image, audio, and video-based inputs for text generation, in combination with Finegan's generation of text based on user input, amounts to generating a plurality of images based on at least an image.)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Finegan with the teachings of Choi to provide generating the plurality of images based on at least an image. A person of ordinary skill in the art would have recognized the significant overlapping material between Finegan and Choi. As such, because of their significant overlapping fields of endeavor, a person of ordinary skill in the art would have been motivated to combine the concepts of Choi with the concepts of Finegan, as Choi's alternate input interfaces would have expanded Finegan's capabilities to receive input.
Further, various forms of input are commonly understood in the art, as such, a person of ordinary skill in the art would have found it obvious to provide additional modes of input for the generative model. Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Finegan in view of U.S. Patent Application Publication No. 2025/0077568 A1 to Chakiat et al. (hereinafter Chakiat). Regarding claim 7, Finegan teaches all the limitations of claim 1 as laid out above. Further, Finegan teaches the system of claim 1, wherein the plurality of images includes at least one of: new images created by the generative visual model; (Finegan teaches generating a plurality of images from a collection of keywords distilled from the summaries generated by the language model. (i.e., generating a plurality of images based on a portion of a plurality of texts). Finegan 5:41 - 8:47.) Finegan, however, does not teach the plurality of images includes already existing images selected based on the generative visual model. In a similar field of endeavor (e.g., generating text and images based on the generated text), Chakiat teaches already existing images selected based on the generative visual model. (Chakiat teaches generating personalized content by generating a summary using artificial intelligence and select an image from a plurality of images stored in a database. (i.e., select an already existing image based on a generative model). Chakiat at ¶¶ [0018] - [0021].) It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Finegan with the teachings of Chakiat to provide a plurality of images selected based on the generative visual model. A person of ordinary skill in the art would have found it obvious to combine the teachings of Finegan with the teachings of Chakiat due to their significant overlapping fields of endeavor (e.g., generation of text and images based on the text). 
Particularly, a simple substitution of Chakiat's selection of images based on generated text for Finegan's generation of images would accomplish the limitations of the claims. As such, a person of ordinary skill in the art would have been motivated to combine Finegan and Chakiat because of their similar fields of endeavor and processes.

Regarding claim 14, Finegan teaches all the limitations of claim 8 as laid out above. Further, Finegan teaches the method of claim 8, wherein the plurality of images includes at least one of: new images created by the generative visual model. (Finegan teaches generating a plurality of images from a collection of keywords distilled from the summaries generated by the language model (i.e., generating a plurality of images based on a portion of a plurality of texts). Finegan at 5:41 - 8:47.) Finegan, however, does not teach that the plurality of images includes already existing images selected based on the generative visual model. In a similar field of endeavor (e.g., generating text and images based on the generated text), Chakiat teaches already existing images selected based on the generative visual model. (Chakiat teaches generating personalized content by generating a summary using artificial intelligence and selecting an image from a plurality of images stored in a database (i.e., selecting an already existing image based on a generative model). Chakiat at ¶¶ [0018] - [0021].)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Finegan with the teachings of Chakiat to provide a plurality of images selected based on the generative visual model. A person of ordinary skill in the art would have found it obvious to combine the teachings of Finegan with the teachings of Chakiat due to their significantly overlapping fields of endeavor (e.g., generation of text and images based on the text).
Particularly, a simple substitution of Chakiat's selection of images based on generated text for Finegan's generation of images would accomplish the limitations of the claims. As such, a person of ordinary skill in the art would have been motivated to combine Finegan and Chakiat because of their similar fields of endeavor and processes.

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CAMERON KENNETH YOUNG, whose telephone number is (703) 756-1527. The examiner can normally be reached Mon - Fri, 9:00 AM - 5:00 PM. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CAMERON KENNETH YOUNG/
Examiner, Art Unit 2655

/ANDREW C FLANDERS/
Supervisory Patent Examiner, Art Unit 2655

Prosecution Timeline

Oct 31, 2023: Application Filed
Sep 29, 2025: Non-Final Rejection (§102, §103)
Dec 31, 2025: Response Filed
Apr 01, 2026: Final Rejection (§102, §103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602409: INFORMATION SEARCH SYSTEM (granted Apr 14, 2026; 2y 5m to grant)
Patent 12592230: RECOGNITION OR SYNTHESIS OF HUMAN-UTTERED HARMONIC SOUNDS (granted Mar 31, 2026; 2y 5m to grant)
Patent 12567429: VOICE CALL CONTROL METHOD AND APPARATUS, COMPUTER-READABLE MEDIUM, AND ELECTRONIC DEVICE (granted Mar 03, 2026; 2y 5m to grant)
Patent 12525250: Cascade Architecture for Noise-Robust Keyword Spotting (granted Jan 13, 2026; 2y 5m to grant)
Patent 12493748: LARGE LANGUAGE MODEL UTTERANCE AUGMENTATION (granted Dec 09, 2025; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 70%
With Interview: 82% (+12.5%)
Median Time to Grant: 2y 11m
PTA Risk: Moderate
Based on 20 resolved cases by this examiner. Grant probability derived from career allow rate.
