DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This action is in response to the Pre-Brief Appeal Conference decision of 15 January 2026. Claims 1-21 are currently pending and have been examined.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US Pub. No. 2024/0161369 A1) in view of Divakaran et al. (US Pub. No. 2021/0297498 A1).
With respect to claim 1, Li teaches a method for content generation (Figs. 2, 4A-4B, and 5 and paragraphs [0005], [0013], and [0019] disclose systems and methods for subject-driven image generation), comprising:
obtaining, by a user experience platform, a content provider context for the user experience platform, wherein the content provider context includes profile information for a content provider and an interaction history of the content provider (paragraph [0045] discloses that a computing device may receive input (such as a training dataset) from a networked database via a communication interface, such as input subject matter from a user via the user interface; paragraph [0066] discloses that a database may store a user profile relating to the user, predictions previously viewed or saved by the user, and historical data received from the server; and paragraph [0068] discloses providing a training dataset including training images and prompts to the server);
Additionally, paragraph [0029] of provisional application No. 63/424,413 discloses that database 218 may store a user profile relating to the user 240, predictions previously viewed or saved by the user 240, historical data received from the server 230, and the like, and paragraph [0031] discloses that a data vendor server may correspond to a server that hosts database 219 to provide training datasets including image, text, or image-text pairs to the server 230. The database 219 may be implemented by one or more relational databases, distributed databases, cloud databases, and/or the like. Thus, obtaining a content provider context that includes profile information for a content provider and an interaction history of the content provider is addressed by provisional application No. 63/424,413.
receiving, by the query elements of the user interface, a user query relating to the content provider context (Fig. 3 and paragraph [0040] disclose an input subject image 102 illustrating a backpack [query elements] that resembles a dog face, using input subject image 102 with subject text 112 "backpack" and text prompt 118 "cube shaped"; Fig. 7 discloses receiving, via a data interface, a subject image containing a subject and a text description of the subject in the image); and
Paragraph [0019] of provisional application No. 63/424,413 discloses receiving input such as input training data (e.g., image-text pairs [a user query relating to the content provider context]) via data interface 115, and paragraph [0031] of provisional application No. 63/424,413 discloses providing a training dataset including image, text, or image-text pairs to the server 230. Thus, provisional application No. 63/424,413 addresses receiving, by the query elements of the user interface, a user query relating to the content provider context;
generating, by the user experience platform, a text prompt for a machine learning model based on the chart and the user query, wherein the text prompt includes the user query and a text instruction to the machine learning model to generate content based on the chart (paragraph [0040] discloses that output image 124a may be generated by subject-driven image model 130 using input subject image 102 with subject text 112 "backpack" and text prompt 118 "cube shaped");
Paragraph [0019] of provisional application No. 63/424,413 discloses receiving input such as input training data (e.g., image-text pairs) via data interface 115 and generating an output 150, which may be image captions or classification labels [text prompt], and paragraph [0028] discloses a graphical user interface (GUI) configured to provide an interface to the user 240 to view generated captions or classification outputs. Thus, provisional application No. 63/424,413 addresses generating, by a user experience platform, a text prompt for a machine learning model based on the chart and the user query, wherein the text prompt includes the user query and a text instruction to the machine learning model to generate content based on the chart;
encoding, by a multimodal encoder (paragraph [0006] discloses a multimodal encoder), the text prompt and the chart to obtain a prompt embedding in a multimodal vector space, wherein the prompt embedding comprises a vector (Fig. 7, step 702, discloses encoding, via an image encoder, the subject image into an image feature vector; paragraph [0092] discloses that the system generates, by a multimodal encoder, a vector representation (e.g., subject embedding 116) of the subject based on the image feature vector and the text feature vector; and paragraph [0093] discloses generation based on an input combining the text prompt and the vector representation, combined by being concatenated and input to a text encoder); and
Paragraph [0039] of provisional application No. 63/424,413 discloses a multimodal vision-language model comprising a frozen image encoder, and paragraph [0042] of provisional application No. 63/424,413 discloses that GemFormer takes as input image feature vectors from the frozen Image Encoder, at least one query, and text. The images are encoded by the frozen image encoder while the text representations are generated by GemFormer from the text. Thus, provisional application No. 63/424,413 addresses encoding, by a multimodal encoder, the text prompt and the chart to obtain a prompt embedding in a multimodal vector space, wherein the prompt embedding comprises a vector.
generating, using the machine learning model, content according to the text instruction based on the prompt embedding (Fig. 1 and paragraph [0026] disclose generating an output image; paragraph [0028] discloses that subject embedding 116 and text prompt 118 may be combined and input to a text encoder to generate the prompt for the image model, and generating an output image based on the prompt, where the prompt template may be, for example, "[text prompt], the [subject text] is [subject embedding]"; and paragraph [0029] discloses generating a combined subject embedding). Paragraph [0042] of provisional application No. 63/424,413 discloses that, in addition to the image feature input, GemFormer also receives a set of queries that it embeds and uses to contrast the image feature vectors and text representations. The output query embeddings learn the relevant image features before they are contrasted to the text representations. Thus, provisional application No. 63/424,413 addresses generating, using the machine learning model, content according to the text instruction based on the prompt embedding.
Provisional application No. 63/424,413 teaches all of the above elements, including that a vision-language model may be trained to receive an input image and generate a text caption of the input image and, as another example, that a vision-language model may be trained to receive a text description of a visual scene and generate an image reconstructing the described visual scene (paragraph [0002]), as well as vision interfaces and other display modules that may receive input and/or output information; for example, other applications 216 may contain software programs for asset management, executable by a processor, including a graphical user interface (GUI) configured to provide an interface to the user 240 to view generated captions or classification outputs (paragraph [0028]). Provisional application No. 63/424,413 of Li fails to explicitly teach displaying, by a user interface of the user experience platform, a chart and query elements of the user interface based on the content provider.
However, Divakaran teaches displaying, by a user interface of the user experience platform, a chart and query elements of the user interface based on the content provider (Fig. 6 and paragraph [0020] disclose a visual display of top images [chart] and top words [query elements], and paragraph [0065] discloses a visual display of the top images and top words of eight (8) clusters of images and words). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the text description of a visual scene used to generate an image reconstructing the described visual scene of the provisional application of Li with the feature of displaying the representation of embedded content of Divakaran in order to allow one to represent/graph the idea of word relationships as hard-coded "word vectors" (see Divakaran, paragraph [0037]).
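As a purely illustrative aid (not part of the claim mapping, and not code from Li, Divakaran, or the instant application), the following minimal Python sketch restates the subject-driven generation pipeline that the cited paragraphs describe: a subject image and its text description are encoded, fused into a subject embedding (cf. subject embedding 116, paragraph [0092]), and spliced into the prompt template of paragraph [0028] before conditioning an image generator. All function names and the toy encoders below are hypothetical stand-ins.

```python
import numpy as np

def encode_image(image: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the image encoder of Li's Fig. 7, step 702:
    # maps a subject image to a toy 64-dimensional feature vector.
    return np.tanh(image.flatten()[:64])

def encode_text(text: str) -> np.ndarray:
    # Hypothetical stand-in for a text encoder: hashes tokens into a toy vector.
    vec = np.zeros(64)
    for token in text.split():
        vec[hash(token) % 64] += 1.0
    return vec / max(1.0, float(np.linalg.norm(vec)))

def fuse_subject_embedding(image_vec: np.ndarray, text_vec: np.ndarray) -> np.ndarray:
    # Fuse image and text features into one subject embedding
    # (cf. subject embedding 116, paragraph [0092]); averaging is a toy choice.
    return (image_vec + text_vec) / 2.0

def build_prompt(text_prompt: str, subject_text: str) -> str:
    # Prompt template from paragraph [0028]:
    # "[text prompt], the [subject text] is [subject embedding]".
    # The embedding itself is a vector, so it travels alongside the string.
    return f"{text_prompt}, the {subject_text} is [subject embedding]"

subject_image = np.random.rand(8, 8)                 # toy 8x8 "subject image"
image_vec = encode_image(subject_image)
text_vec = encode_text("backpack")                   # subject text 112
subject_embedding = fuse_subject_embedding(image_vec, text_vec)
prompt = build_prompt("cube shaped", "backpack")     # text prompt 118
print(prompt)   # -> cube shaped, the backpack is [subject embedding]
```

The sketch only illustrates the data flow; in Li the encoders are learned neural networks rather than the toy functions used here.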
With respect to claim 2, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches the method further comprising: displaying, by a user interface, the query element based on the interaction history (Fig. 9 and paragraph [0098] disclose a chart illustrating the performance of at least one embodiment described, e.g., backpack, backpack_dog, can, cat, etc. [query elements]).
With respect to claim 3, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches generating, by the user experience platform, the text prompt based on the content provider context (paragraph [0075] discloses generating or editing an image given a conditioning input such as a text prompt).
With respect to claim 4, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches the method wherein: the content provider context includes information in multiple modalities including a text modality and an image modality, wherein the content is generated based on the information in the multiple modalities (paragraph [0018] discloses text-to-image generation models that generate images in different contexts or different variations, as well as existing generation models, and paragraph [0019] discloses image generation models).
With respect to claim 5, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches the method wherein: the content provider context comprises a user journey, an analytics context, an audience segmentation context, a campaign generation context, or any combination thereof (paragraph [0017] discloses information obtained from any given marketing element or performance measures of other aspects of the marketing elements).
With respect to claim 6, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches the method wherein: the content provider context includes structured information representing a user journey, a campaign brief, or a campaign program (paragraph [0068] discloses providing a training dataset including training images and prompts to the server, and paragraph [0076] discloses that a training dataset may be associated with information such as a caption for each image in the training dataset that may be used as a conditioning input).
With respect to claim 7, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches the method further comprising: providing, by the user experience platform, a recommendation to the content provider for an interaction with the user experience platform, wherein the recommendation is based on the content (paragraph [0064] discloses that the user device may receive a message indicating a generated image [recommendation] from the server and display the message via a UI application).
With respect to claim 8, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches the method further comprising:
receiving, by the machine learning model, a request from the content provider to generate the content, wherein the content is generated in response to the request (paragraph [0018] discloses that machine learning systems have been widely used in image generation tasks).
With respect to claim 9, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches the method further comprising: receiving, by a training component, feedback from the content provider based on the content (paragraph [0045] discloses receiving a training dataset from a networked database via a communication interface, and paragraph [0076] discloses that a training dataset may include a variety of images associated with information such as a caption for each image in the training dataset); and
updating, by the training component, the machine learning model (paragraph [0094] discloses updating the plurality of query vectors). Li fails to explicitly teach that the received training dataset and the updating of the plurality of query vectors are based on feedback.
However, Divakaran teaches receiving feedback and updating based on the received feedback (paragraph [0032] discloses a content consumption monitoring module, and paragraph [0033] discloses that information regarding a user(s) can be monitored and separated, including monitoring a user's interaction with associated content). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the receiving of a training dataset from a networked database of Li with the content consumption monitoring module of Divakaran in order to achieve the resulting performance improvements (see Divakaran, paragraph [0061]).
With respect to claim 10, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches the method wherein: the machine learning model is trained (paragraph [0045] discloses receiving a training dataset from a networked database via a communication interface, and paragraph [0076] discloses that a training dataset may include a variety of images associated with information such as a caption for each image in the training dataset). Li fails to teach that the corresponding training dataset is based on a public corpus of natural language documents and fine-tuned based on data from the user experience platform.
However, Divakaran teaches training based on a public corpus of natural language documents and fine-tuning based on data from the user experience platform (paragraph [0075] discloses that determined user-preferred content can include audio content spoken in a user-preferred English accent). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the receiving of a training dataset from a networked database of Li with the audio content spoken in a user-preferred English accent of Divakaran in order to convey the desired information using the determined user-preferred content type (see Divakaran, paragraph [0075]).
With respect to claim 11, Li teaches a non-transitory computer readable medium storing code for content generation, the code comprising instructions that, when executed by at least one processor, cause the at least one processor to perform operations (paragraph [0047] discloses that computing devices, such as computing device 400, may include non-transitory, tangible, machine-readable media that include executable code that, when run by one or more processors (e.g., a processor), may cause the one or more processors to perform operations), comprising:
obtaining, by a user experience platform, a content provider context for the user experience platform, wherein the content provider context includes profile information for a content provider and an interaction history of the content provider (paragraph [0045] discloses that a computing device may receive input (such as a training dataset) from a networked database via a communication interface, such as input subject matter from a user via the user interface; paragraph [0066] discloses that a database may store a user profile relating to the user, predictions previously viewed or saved by the user, and historical data received from the server; and paragraph [0068] discloses providing a training dataset including training images and prompts to the server);
Additionally, paragraph [0029] of provisional application No. 63/424,413 discloses that database 218 may store a user profile relating to the user 240, predictions previously viewed or saved by the user 240, historical data received from the server 230, and the like, and paragraph [0031] discloses that a data vendor server may correspond to a server that hosts database 219 to provide training datasets including image, text, or image-text pairs to the server 230. The database 219 may be implemented by one or more relational databases, distributed databases, cloud databases, and/or the like. Thus, obtaining a content provider context that includes profile information for a content provider and an interaction history of the content provider is addressed by provisional application No. 63/424,413.
receiving, by the query elements of the user interface, a user query relating to the content provider context (Fig. 3 and paragraph [0040] disclose an input subject image 102 illustrating a backpack [query elements] that resembles a dog face, using input subject image 102 with subject text 112 "backpack" and text prompt 118 "cube shaped"; Fig. 7 discloses receiving, via a data interface, a subject image containing a subject and a text description of the subject in the image); and
Paragraph [0019] of provisional application No. 63/424,413 discloses receiving input such as input training data (e.g., image-text pairs [a user query relating to the content provider context]) via data interface 115, and paragraph [0031] of provisional application No. 63/424,413 discloses providing a training dataset including image, text, or image-text pairs to the server 230. Thus, provisional application No. 63/424,413 addresses receiving, by the query elements of the user interface, a user query relating to the content provider context;
generating, by the user experience platform, a text prompt for a machine learning model based on the chart and the user query, wherein the text prompt includes the user query and a text instruction to the machine learning model to generate content based on the chart (paragraph [0040] discloses that output image 124a may be generated by subject-driven image model 130 using input subject image 102 with subject text 112 "backpack" and text prompt 118 "cube shaped");
Paragraph [0019] of provisional application No. 63/424,413 discloses receiving input such as input training data (e.g., image-text pairs) via data interface 115 and generating an output 150, which may be image captions or classification labels [text prompt], and paragraph [0028] discloses a graphical user interface (GUI) configured to provide an interface to the user 240 to view generated captions or classification outputs. Thus, provisional application No. 63/424,413 addresses generating, by a user experience platform, a text prompt for a machine learning model based on the chart and the user query, wherein the text prompt includes the user query and a text instruction to the machine learning model to generate content based on the chart;
encoding, by a multimodal encoder (paragraph [0006] discloses a multimodal encoder), the text prompt and the chart to obtain a prompt embedding in a multimodal vector space, wherein the prompt embedding comprises a vector (Fig. 7, step 702, discloses encoding, via an image encoder, the subject image into an image feature vector; paragraph [0092] discloses that the system generates, by a multimodal encoder, a vector representation (e.g., subject embedding 116) of the subject based on the image feature vector and the text feature vector; and paragraph [0093] discloses generation based on an input combining the text prompt and the vector representation, combined by being concatenated and input to a text encoder); and
Paragraph [0039] of provisional application No. 63/424,413 discloses a multimodal vision-language model comprising a frozen image encoder, and paragraph [0042] of provisional application No. 63/424,413 discloses that GemFormer takes as input image feature vectors from the frozen Image Encoder, at least one query, and text. The images are encoded by the frozen image encoder while the text representations are generated by GemFormer from the text. Thus, provisional application No. 63/424,413 addresses encoding, by a multimodal encoder, the text prompt and the chart to obtain a prompt embedding in a multimodal vector space, wherein the prompt embedding comprises a vector.
generating, using the machine learning model, content according to the text instruction based on the prompt embedding (Fig. 1 and paragraph [0026] disclose generating an output image; paragraph [0028] discloses that subject embedding 116 and text prompt 118 may be combined and input to a text encoder to generate the prompt for the image model, and generating an output image based on the prompt, where the prompt template may be, for example, "[text prompt], the [subject text] is [subject embedding]"; and paragraph [0029] discloses generating a combined subject embedding). Paragraph [0042] of provisional application No. 63/424,413 discloses that, in addition to the image feature input, GemFormer also receives a set of queries that it embeds and uses to contrast the image feature vectors and text representations. The output query embeddings learn the relevant image features before they are contrasted to the text representations. Thus, provisional application No. 63/424,413 addresses generating, using the machine learning model, content according to the text instruction based on the prompt embedding.
Provisional application No. 63/424,413 teaches all of the above elements, including that a vision-language model may be trained to receive an input image and generate a text caption of the input image and, as another example, that a vision-language model may be trained to receive a text description of a visual scene and generate an image reconstructing the described visual scene (paragraph [0002]), as well as vision interfaces and other display modules that may receive input and/or output information; for example, other applications 216 may contain software programs for asset management, executable by a processor, including a graphical user interface (GUI) configured to provide an interface to the user 240 to view generated captions or classification outputs (paragraph [0028]). Provisional application No. 63/424,413 of Li fails to explicitly teach displaying, by a user interface of the user experience platform, a chart and query elements of the user interface based on the content provider.
However, Divakaran teaches displaying, by a user interface of the user experience platform, a chart and query elements of the user interface based on the content provider (Fig. 6 and paragraph [0020] disclose a visual display of top images [chart] and top words [query elements], and paragraph [0065] discloses a visual display of the top images and top words of eight (8) clusters of images and words). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the text description of a visual scene used to generate an image reconstructing the described visual scene of the provisional application of Li with the feature of displaying the representation of embedded content of Divakaran in order to allow one to represent/graph the idea of word relationships as hard-coded "word vectors" (see Divakaran, paragraph [0037]).
With respect to claim 12, Li in view of Divakaran teaches the elements of claim 11; furthermore, Li teaches the non-transitory computer readable medium, the code further comprising instructions executable by the processor to cause the at least one processor to perform operations comprising: displaying the query element based on the interaction history (Fig. 9 and paragraph [0098] disclose a chart illustrating the performance of at least one embodiment described, e.g., backpack, backpack_dog, can, cat, etc. [query elements]).
With respect to claim 13, Li in view of Divakaran teaches the elements of claim 11; furthermore, Li teaches the non-transitory computer readable medium, the code further comprising instructions executable by the processor to cause the at least one processor to perform operations comprising: generating, by the user experience platform, the prompt based on the content provider context (paragraph [0075] discloses generating or editing an image given a conditioning input such as a text prompt).
With respect to claim 14, Li in view of Divakaran teaches the elements of claim 11; furthermore, Li teaches the non-transitory computer readable medium wherein: the content provider context includes information in multiple modalities including a text modality and an image modality, wherein the content is generated based on the information in the multiple modalities (paragraph [0018] discloses text-to-image generation models that generate images in different contexts or different variations, as well as existing generation models, and paragraph [0019] discloses image generation models).
With respect to claim 15, Li in view of Divakaran teaches the elements of claim 11; furthermore, Li teaches the non-transitory computer readable medium wherein: the content provider context comprises a user journey, an analytics context, an audience segmentation context, a campaign generation context, or any combination thereof (paragraph [0017] discloses information obtained from any given marketing element or performance measures of other aspects of the marketing elements).
With respect to claim 16, Li in view of Divakaran teaches the elements of claim 11; furthermore, Li teaches the non-transitory computer readable medium wherein: the content provider context includes structured information representing a user journey, a campaign brief, or a campaign program (paragraph [0068] discloses providing a training dataset including training images and prompts to the server, and paragraph [0076] discloses that a training dataset may be associated with information such as a caption for each image in the training dataset that may be used as a conditioning input).
With respect to claim 17, Li in view of Divakaran teaches the elements of claim 11; furthermore, Li teaches the non-transitory computer readable medium further comprising: providing, by the user experience platform, a recommendation to the content provider for an interaction with the user experience platform, wherein the recommendation is based on the content (paragraph [0017] discloses generating or updating a predictive model that is used at least in part for generating a marketing plan, and paragraph [0018] discloses that a model may include a deterministic, discrete, dynamic, distributed, machine learning, or discriminative mathematical model (e.g., a support vector machine)).
With respect to claim 18, Li in view of Divakaran teaches the elements of claim 11; furthermore, Li teaches the non-transitory computer readable medium further comprising:
receiving, by the machine learning model, a request from the content provider to generate the content, wherein the content is generated in response to the request (paragraph [0018] discloses that machine learning systems have been widely used in image generation tasks).
With respect to claim 19, Li in view of Divakaran teaches the elements of claim 11; furthermore, Li teaches the non-transitory computer readable medium wherein: the machine learning model is trained (paragraph [0045] discloses receiving a training dataset from a networked database via a communication interface, and paragraph [0076] discloses that a training dataset may include a variety of images associated with information such as a caption for each image in the training dataset). Li fails to teach that the corresponding training dataset is based on a public corpus of natural language documents and fine-tuned based on data from the user experience platform.
However, Divakaran teaches training based on a public corpus of natural language documents and fine-tuning based on data from the user experience platform (paragraph [0075] discloses that determined user-preferred content can include audio content spoken in a user-preferred English accent). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the receiving of a training dataset from a networked database of Li with the audio content spoken in a user-preferred English accent of Divakaran in order to convey the desired information using the determined user-preferred content type (see Divakaran, paragraph [0075]).
With respect to claim 20, Li teaches a system comprising:
a memory component (Fig. 1, element 410; paragraph [0041] discloses a processor coupled to memory);
a processing device coupled to the memory component, the processing device configured to perform operations comprising (paragraph [0041] discloses a processor coupled to memory, and paragraph [0042] discloses that memory may be used to store software executed by the computing device):
obtaining, by a user experience platform, a content provider context for the user experience platform, wherein the content provider context includes profile information for a content provider and an interaction history of the content provider (paragraph [0045] discloses that a computing device may receive input (such as a training dataset) from a networked database via a communication interface, such as input subject matter from a user via the user interface; paragraph [0066] discloses that a database may store a user profile relating to the user, predictions previously viewed or saved by the user, and historical data received from the server; and paragraph [0068] discloses providing a training dataset including training images and prompts to the server);
Additionally, paragraph [0029] of provisional application No. 63/424,413 discloses that database 218 may store a user profile relating to the user 240, predictions previously viewed or saved by the user 240, historical data received from the server 230, and the like, and paragraph [0031] discloses that a data vendor server may correspond to a server that hosts database 219 to provide training datasets including image, text, or image-text pairs to the server 230. The database 219 may be implemented by one or more relational databases, distributed databases, cloud databases, and/or the like. Thus, obtaining a content provider context that includes profile information for a content provider and an interaction history of the content provider is addressed by provisional application No. 63/424,413.
receiving, by the query elements of the user interface, a user query relating to the content provider context (Fig. 3 and paragraph [0040] disclose an input subject image 102 illustrating a backpack [query elements] that resembles a dog face, using input subject image 102 with subject text 112 "backpack" and text prompt 118 "cube shaped"; Fig. 7 discloses receiving, via a data interface, a subject image containing a subject and a text description of the subject in the image); and
Paragraph [0019] of provisional application No. 63/424,413 discloses receiving input such as input training data (e.g., image-text pairs [a user query relating to the content provider context]) via data interface 115, and paragraph [0031] of provisional application No. 63/424,413 discloses providing a training dataset including image, text, or image-text pairs to the server 230. Thus, provisional application No. 63/424,413 addresses receiving, by the query elements of the user interface, a user query relating to the content provider context;
generating, by the user experience platform, a text prompt for a machine learning model based on the chart and the user query, wherein the text prompt includes the user query and a text instruction to the machine learning model to generate content based on the chart (paragraph [0040] discloses that output image 124a may be generated by subject-driven image model 130 using input subject image 102 with subject text 112 "backpack" and text prompt 118 "cube shaped");
Paragraph [0019] of provisional application No. 63/424,413 discloses receiving input such as input training data (e.g., image-text pairs) via data interface 115 and generating an output 150, which may be image captions or classification labels [text prompt], and paragraph [0028] discloses a graphical user interface (GUI) configured to provide an interface to the user 240 to view generated captions or classification outputs. Thus, provisional application No. 63/424,413 addresses generating, by a user experience platform, a text prompt for a machine learning model based on the chart and the user query, wherein the text prompt includes the user query and a text instruction to the machine learning model to generate content based on the chart;
encoding, by a multimodal encoder (paragraph [0006] discloses a multimodal encoder), the text prompt and the chart to obtain a prompt embedding in a multimodal vector space, wherein the prompt embedding comprises a vector (Fig. 7, step 702, discloses encoding, via an image encoder, the subject image into an image feature vector; paragraph [0092] discloses that the system generates, by a multimodal encoder, a vector representation (e.g., subject embedding 116) of the subject based on the image feature vector and the text feature vector; and paragraph [0093] discloses generation based on an input combining the text prompt and the vector representation, combined by being concatenated and input to a text encoder); and
Paragraph [0039] of provisional application No. 63/424,413 discloses a multimodal vision-language model comprising a frozen image encoder, and paragraph [0042] of provisional application No. 63/424,413 discloses that GemFormer takes as input image feature vectors from the frozen Image Encoder, at least one query, and text. The images are encoded by the frozen image encoder while the text representations are generated by GemFormer from the text. Thus, provisional application No. 63/424,413 addresses encoding, by a multimodal encoder, the text prompt and the chart to obtain a prompt embedding in a multimodal vector space, wherein the prompt embedding comprises a vector.
generating, using the machine learning model, content according to the text instruction based on the prompt embedding (Fig. 1 and paragraph [0026] disclose generating an output image; paragraph [0028] discloses that subject embedding 116 and text prompt 118 may be combined and input to a text encoder to generate the prompt for the image model, and generating an output image based on the prompt, where the prompt template may be, for example, "[text prompt], the [subject text] is [subject embedding]"; and paragraph [0029] discloses generating a combined subject embedding). Paragraph [0042] of provisional application No. 63/424,413 discloses that, in addition to the image feature input, GemFormer also receives a set of queries that it embeds and uses to contrast the image feature vectors and text representations. The output query embeddings learn the relevant image features before they are contrasted to the text representations. Thus, provisional application No. 63/424,413 addresses generating, using the machine learning model, content according to the text instruction based on the prompt embedding.
Provisional application No. 63/424,413 teaches all of the above elements, including that a vision-language model may be trained to receive an input image and generate a text caption of the input image and, as another example, that a vision-language model may be trained to receive a text description of a visual scene and generate an image reconstructing the described visual scene (paragraph [0002]), as well as vision interfaces and other display modules that may receive input and/or output information; for example, other applications 216 may contain software programs for asset management, executable by a processor, including a graphical user interface (GUI) configured to provide an interface to the user 240 to view generated captions or classification outputs (paragraph [0028]). Provisional application No. 63/424,413 of Li fails to explicitly teach displaying, by a user interface of the user experience platform, a chart and query elements of the user interface based on the content provider.
However, Divakaran teaches displaying, by a user interface of the user experience platform, a chart and query elements of the user interface based on the content provider (Fig. 6 and paragraph [0020] disclose a visual display of top images [chart] and top words [query elements], and paragraph [0065] discloses a visual display of the top images and top words of eight (8) clusters of images and words). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the text description of a visual scene used to generate an image reconstructing the described visual scene of the provisional application of Li with the feature of displaying the representation of embedded content of Divakaran in order to allow one to represent/graph the idea of word relationships as hard-coded "word vectors" (see Divakaran, paragraph [0037]).
The prior art of record:
Li et al. (US Pub. No. 2024/0161369 A1) discloses systems and methods of subject-driven image generation. In at least one embodiment, a system receives, via a data interface, an image containing a subject, a text description of the subject in the image, and a text prompt relating to a different rendition of the subject. The system encodes, via an image encoder, the image into an image feature vector. The system encodes, via a text encoder, the text description into a text feature vector.
Divakaran et al. (US Pub. No. 2021/0297498 A1) discloses a method, apparatus, and system for determining user content associations and for determining and providing user-preferred content using multimodal embeddings, which include creating an embedding space for multimodal content by creating a first modality vector representation of the multimodal content having a first modality, creating a second modality vector representation of the multimodal content having a second modality, creating a user vector representation, as a third modality, for each user associated with at least a portion of the multimodal content, and embedding the first and second modality vector representations and the user vector representations in the common embedding space.
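For illustration only, the following hypothetical Python sketch mirrors the common multimodal embedding space summarized above: separate projections stand in for Divakaran's per-modality encoders, and image, word, and user vectors are embedded in one space so that user-preferred content can be ranked by proximity. Every name and the toy projections here are assumptions for the sketch, not Divakaran's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # dimensionality of the shared embedding space

def embed(raw: np.ndarray, projection: np.ndarray) -> np.ndarray:
    # Project a raw modality vector into the shared space and unit-normalize.
    out = projection @ raw
    return out / np.linalg.norm(out)

# Toy per-modality projections; in Divakaran these would be learned encoders.
proj_image = rng.standard_normal((DIM, 32))
proj_word = rng.standard_normal((DIM, 32))
proj_user = rng.standard_normal((DIM, 32))

content = {
    "image_0": embed(rng.standard_normal(32), proj_image),
    "word_dog": embed(rng.standard_normal(32), proj_word),
    "word_backpack": embed(rng.standard_normal(32), proj_word),
}
user_vec = embed(rng.standard_normal(32), proj_user)  # user as third modality

# User-preferred content = items nearest the user vector (cosine similarity,
# since all embedded vectors are unit-normalized).
ranked = sorted(content, key=lambda name: -float(content[name] @ user_vec))
print(ranked)
```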
Cavander et al. (US Pub. No. 2015/03176701 A1) discloses techniques for generating a forward-looking, goal-seeking marketing plan that links prior media purchase transactions to predicted future financial results for a brand, product market, or campaign. A computing device is configured to receive input data associated with one or more marketing elements, such as television ads, print ads, and online ads. From the input data, response factors corresponding to each marketing element can be calculated.
Zeng et al. (US Pub. No. 2024/0169623 A1) discloses systems and methods for multi-modal image generation. One or more aspects of the systems and methods include obtaining a text prompt and layout information indicating a target location for an element of the text prompt within an image to be generated, and computing a text feature map including a plurality of values corresponding to the element of the text prompt at pixel locations corresponding to the target location.
Applebaum et al. (US Pub. No. 2005/0171851 A1) discloses a user authentication system that includes a dialogue manager adapted to prompt the user with multiple, selectable passphrases. A selection recognizer recognizes user selection of at least one of the multiple, selectable passphrases. A user identity analysis module analyzes one or more potential user identities based on adherence of user selection of the passphrase to predetermined passphrase selection criteria assigned to one or more enrolled users.
Rodriguez et al. (US Pub. No. 2017/0255198 A1) discloses a system that may be configured to manage at least one robotic device. The system may comprise one or more databases and one or more processors in communication with the one or more databases. The one or more processors may be configured to provide an operating system for the at least one robotic device, control motion of the at least one robotic device, configure at least one sensor removably coupled to the at least one robotic device, process data collected by the at least one sensor, and/or perform localization and/or area mapping for the at least one robotic device by comparing data collected by the at least one sensor with data in the one or more databases to generate localization and/or area mapping data.
Response to Arguments
Applicant's arguments with respect to the 35 U.S.C. § 103 rejections of claims 1-20, filed on 5 December 2025, have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SABA DAGNEW whose telephone number is (571)270-3271. The examiner can normally be reached 9-6:45.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Waseem Ashraf, can be reached at (571) 270-3948. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SABA DAGNEW/Primary Examiner, Art Unit 3682