DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This action is in response to the Pre-Brief Appeal Conference decision of 15 January 2026. Claims 1-21 are currently pending and have been examined.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US Pub. No. 2024/0161369 A1) in view of Divakaran et al. (US Pub. No. 2021/0297498 A1).
With respect to claim 1, Li teaches a method for content generation (Figs. 2, 4A-4B, and 5 and paragraphs [0005], [0013], and [0019] disclose systems and methods for subject-driven image generation), comprising:
obtaining, by a user experience platform, a content provider context for the user experience platform, wherein the content provider context includes profile information for a content provider and an interaction history of the content provider (paragraph [0045] discloses that a computing device may receive input (such as a training dataset) from a networked database via a communication interface, such as input subject matter from a user via the user interface; paragraph [0066] discloses that a database may store a user profile relating to the user, predictions previously viewed or saved by the user, and historical data received from the server; and paragraph [0068] discloses providing a training dataset including training images and prompts to the server);
Additionally, paragraph [0029] of provisional application No. 63/424,413 discloses that database 218 may store a user profile relating to the user 240, predictions previously viewed or saved by the user 240, historical data received from the server 230, and the like, and paragraph [0031] discloses that a data vendor server may correspond to a server that hosts database 219 to provide training datasets including image, text, or image-text pairs to the server 230. The database 219 may be implemented by one or more relational databases, distributed databases, cloud databases, and/or the like. Thus, obtaining a content provider context that includes profile information for a content provider and an interaction history of the content provider is addressed by provisional application No. 63/424,413.
receiving, by the query elements of the user interface, a user query relating to the content provider context (Fig. 3 and paragraph [0040] disclose an input subject image 102 illustrating a backpack [query elements] that resembles a dog face, using input subject image 102 with subject text 112 "backpack" and text prompt 118 "cube shaped"; Fig. 7 discloses receiving, via a data interface, a subject image containing a subject and a text description of the subject in the image); and
Paragraph [0019] of provisional application No. 63/424,413 discloses receiving input such as input training data (e.g., image-text pairs [a user query relating to the content provider context]) via data interface 115, and paragraph [0031] of provisional application No. 63/424,413 discloses providing a training dataset including image, text, or image-text pairs to the server 230. Thus, provisional application No. 63/424,413 addresses receiving, by the query elements of the user interface, a user query relating to the content provider context;
generating, by the user experience platform, a text prompt for a machine learning model based on the chart and the user query, wherein the text prompt includes the user query and a text instruction to the machine learning model to generate content based on the chart (paragraph [0040] discloses that output image 124a may be generated by subject-driven image model 130 using input subject image 102 with subject text 112 "backpack" and text prompt 118 "cube shaped");
Paragraph [0019] of provisional application No. 63/424,413 discloses receiving input such as input training data (e.g., image-text pairs) via data interface 115 and generating an output 150, which may be image captions or classification labels [text prompt], and paragraph [0028] discloses a graphical user interface (GUI) configured to provide an interface to the user 240 to view generated captions or classification outputs. Thus, provisional application No. 63/424,413 addresses generating, by a user experience platform, a text prompt for a machine learning model based on the chart and the user query, wherein the text prompt includes the user query and a text instruction to the machine learning model to generate content based on the chart;
encoding, by a multimodal encoder (paragraph [0006] discloses a multimodal encoder), the text prompt and the chart to obtain a prompt embedding in a multimodal vector space, wherein the prompt embedding comprises a vector (Fig. 7, step 702, discloses encoding, via an image encoder, the subject image into an image feature vector; paragraph [0092] discloses that the system generates, by a multimodal encoder, a vector representation (e.g., subject embedding 116) of the subject based on the image feature vector and the text feature vector; and paragraph [0093] discloses generation based on an input combining the text prompt and the vector representation, combined by being concatenated and input to a text encoder); and
Paragraph [0039] of provisional application No. 63/424,413 discloses a multimodal vision-language model comprising a frozen image encoder, and paragraph [0042] of provisional application No. 63/424,413 discloses that GemFormer takes as input image feature vectors from the frozen Image Encoder, at least one query, and text. The images are encoded by the frozen image encoder while the text representations are generated by GemFormer from the text. Thus, provisional application No. 63/424,413 addresses encoding, by a multimodal encoder, the text prompt and the chart to obtain a prompt embedding in a multimodal vector space, wherein the prompt embedding comprises a vector.
generating, using the machine learning model, content according to the text instruction based on the prompt embedding (Fig. 1 and paragraph [0026] disclose generating an output image; paragraph [0028] discloses that subject embedding 116 and text prompt 118 may be combined and input to a text encoder to generate the prompt for the image model, and generating an output image based on the prompt, where the prompt template may be, for example, "[text prompt], the [subject text] is [subject embedding]"; and paragraph [0029] discloses generating a combined subject embedding). Paragraph [0042] of provisional application No. 63/424,413 discloses that, in addition to the image feature input, GemFormer also receives a set of queries that it embeds and uses to contrast the image feature vectors and text representations. The output query embeddings learn the relevant image features before they are contrasted to the text representations. Thus, provisional application No. 63/424,413 addresses generating, using the machine learning model, content according to the text instruction based on the prompt embedding.
Provisional application No. 63/424,413 teaches all of the above elements, including that a vision-language model may be trained to receive an input image and generate a text caption of the input image and, as another example, that a vision-language model may be trained to receive a text description of a visual scene and generate an image reconstructing the described visual scene (paragraph [0002]), as well as vision interfaces and other display modules that may receive input and/or output information; for example, other applications 216 may contain software programs for asset management, executable by a processor, including a graphical user interface (GUI) configured to provide an interface to the user 240 to view generated captions or classification outputs (paragraph [0028]). Provisional application No. 63/424,413 of Li fails to explicitly teach displaying, by a user interface of the user experience platform, a chart and query elements of the user interface based on the content provider.
However, Divakaran teaches displaying, by a user interface of the user experience platform, a chart and query elements of the user interface based on the content provider (Fig. 6 and paragraph [0020] disclose a visual display of top images [chart] and top words [query elements], and paragraph [0065] discloses a visual display of the top images and top words of eight (8) clusters of images and words). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the text description of a visual scene used to generate an image reconstructing the described visual scene of the provisional application of Li with the feature of displaying the representation of embedded content of Divakaran in order to allow one to represent/graph the idea of word relationships as hard-coded "word vectors" (see Divakaran, paragraph [0037]).
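As a purely illustrative aid (not part of the claim mapping, and not code from Li, Divakaran, or the instant application), the following minimal Python sketch restates the subject-driven generation pipeline that the cited paragraphs describe: a subject image and its text description are encoded, fused into a subject embedding (cf. subject embedding 116, paragraph [0092]), and spliced into the prompt template of paragraph [0028] before conditioning an image generator. All function names and the toy encoders below are hypothetical stand-ins.

```python
import numpy as np

def encode_image(image: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the image encoder of Li's Fig. 7, step 702:
    # maps a subject image to a toy 64-dimensional feature vector.
    return np.tanh(image.flatten()[:64])

def encode_text(text: str) -> np.ndarray:
    # Hypothetical stand-in for a text encoder: hashes tokens into a toy vector.
    vec = np.zeros(64)
    for token in text.split():
        vec[hash(token) % 64] += 1.0
    return vec / max(1.0, float(np.linalg.norm(vec)))

def fuse_subject_embedding(image_vec: np.ndarray, text_vec: np.ndarray) -> np.ndarray:
    # Fuse image and text features into one subject embedding
    # (cf. subject embedding 116, paragraph [0092]); averaging is a toy choice.
    return (image_vec + text_vec) / 2.0

def build_prompt(text_prompt: str, subject_text: str) -> str:
    # Prompt template from paragraph [0028]:
    # "[text prompt], the [subject text] is [subject embedding]".
    # The embedding itself is a vector, so it travels alongside the string.
    return f"{text_prompt}, the {subject_text} is [subject embedding]"

subject_image = np.random.rand(8, 8)                 # toy 8x8 "subject image"
image_vec = encode_image(subject_image)
text_vec = encode_text("backpack")                   # subject text 112
subject_embedding = fuse_subject_embedding(image_vec, text_vec)
prompt = build_prompt("cube shaped", "backpack")     # text prompt 118
print(prompt)   # -> cube shaped, the backpack is [subject embedding]
```

The sketch only illustrates the data flow; in Li the encoders are learned neural networks rather than the toy functions used here.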
With respect to claim 2, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches the method further comprising: displaying, by a user interface, the query element based on the interaction history (Fig. 9 and paragraph [0098] disclose a chart illustrating the performance of at least one embodiment described, e.g., backpack, backpack_dog, can, cat, etc. [query elements]).
With respect to claim 3, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches generating, by the user experience platform, the text prompt based on the content provider context (paragraph [0075] discloses generating or editing an image given a conditioning input such as a text prompt).
With respect to claim 4, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches the method wherein: the content provider context includes information in multiple modalities including a text modality and an image modality, wherein the content is generated based on the information in the multiple modalities (paragraph [0018] discloses text-to-image generation models that generate images in different contexts or different variations, as well as existing generation models, and paragraph [0019] discloses image generation models).
With respect to claim 5, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches the method wherein: the content provider context comprises a user journey, an analytics context, an audience segmentation context, a campaign generation context, or any combination thereof (paragraph [0017] discloses information obtained from any given marketing element or performance measures of other aspects of the marketing elements).
With respect to claim 6, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches the method wherein: the content provider context includes structured information representing a user journey, a campaign brief, or a campaign program (paragraph [0068] discloses providing a training dataset including training images and prompts to the server, and paragraph [0076] discloses that a training dataset may be associated with information such as a caption for each image in the training dataset that may be used as a conditioning input).
With respect to claim 7, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches the method further comprising: providing, by the user experience platform, a recommendation to the content provider for an interaction with the user experience platform, wherein the recommendation is based on the content (paragraph [0064] discloses that the user device may receive a message indicating a generated image [recommendation] from the server and display the message via a UI application).
With respect to claim 8, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches the method further comprising:
receiving, by the machine learning model, a request from the content provider to generate the content, wherein the content is generated in response to the request (paragraph [0018] discloses that machine learning systems have been widely used in image generation tasks).
With respect to claim 9, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches the method further comprising: receiving, by a training component, feedback from the content provider based on the content (paragraph [0045] discloses receiving a training dataset from a networked database via a communication interface, and paragraph [0076] discloses that a training dataset may include a variety of images associated with information such as a caption for each image in the training dataset); and
updating, by the training component, the machine learning model (paragraph [0094] discloses updating the plurality of query vectors). Li fails to explicitly teach that the received training dataset and the updating of the plurality of query vectors are based on feedback.
However, Divakaran teaches receiving feedback and updating based on the received feedback (paragraph [0032] discloses a content consumption monitoring module, and paragraph [0033] discloses that information regarding a user(s) can be monitored and separated, including monitoring a user's interaction with associated content). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the receiving of a training dataset from a networked database of Li with the content consumption monitoring module of Divakaran in order to achieve the resulting performance improvements (see Divakaran, paragraph [0061]).
With respect to claim 10, Li in view of Divakaran teaches the elements of claim 1; furthermore, Li teaches the method wherein: the machine learning model is trained (paragraph [0045] discloses receiving a training dataset from a networked database via a communication interface, and paragraph [0076] discloses that a training dataset may include a variety of images associated with information such as a caption for each image in the training dataset). Li fails to teach that the corresponding training dataset is based on a public corpus of natural language documents and fine-tuned based on data from the user experience platform.
However, Divakaran teaches training based on a public corpus of natural language documents and fine-tuning based on data from the user experience platform (paragraph [0075] discloses that determined user-preferred content can include audio content spoken in a user-preferred English accent). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the receiving of a training dataset from a networked database of Li with the audio content spoken in a user-preferred English accent of Divakaran in order to convey the desired information using the determined user-preferred content type (see Divakaran, paragraph [0075]).
With respect to claim 11, Li teaches a non-transitory computer readable medium storing code for content generation, the code comprising instructions that, when executed by at least one processor, cause the at least one processor to perform operations (paragraph [0047] discloses that computing devices, such as computing device 400, may include non-transitory, tangible, machine-readable media that include executable code that, when run by one or more processors (e.g., a processor), may cause the one or more processors to perform operations), comprising:
obtaining, by a user experience platform, a content provider context for the user experience platform, wherein the content provider context includes profile information for a content provider and an interaction history of the content provider (paragraph [0045] discloses that a computing device may receive input (such as a training dataset) from a networked database via a communication interface, such as input subject matter from a user via the user interface; paragraph [0066] discloses that a database may store a user profile relating to the user, predictions previously viewed or saved by the user, and historical data received from the server; and paragraph [0068] discloses providing a training dataset including training images and prompts to the server);
Additionally, paragraph [0029] of provisional application No. 63/424,413 discloses that database 218 may store a user profile relating to the user 240, predictions previously viewed or saved by the user 240, historical data received from the server 230, and the like, and paragraph [0031] discloses that a data vendor server may correspond to a server that hosts database 219 to provide training datasets including image, text, or image-text pairs to the server 230. The database 219 may be implemented by one or more relational databases, distributed databases, cloud databases, and/or the like. Thus, obtaining a content provider context that includes profile information for a content provider and an interaction history of the content provider is addressed by provisional application No. 63/424,413.
receiving, by the query elements of the user interface, a user query relating to the content provider context (Fig. 3 and paragraph [0040] disclose an input subject image 102 illustrating a backpack [query elements] that resembles a dog face, using input subject image 102 with subject text 112 "backpack" and text prompt 118 "cube shaped"; Fig. 7 discloses receiving, via a data interface, a subject image containing a subject and a text description of the subject in the image); and
Paragraph [0019] of provisional application No. 63/424,413 discloses receiving input such as input training data (e.g., image-text pairs [a user query relating to the content provider context]) via data interface 115, and paragraph [0031] of provisional application No. 63/424,413 discloses providing a training dataset including image, text, or image-text pairs to the server 230. Thus, provisional application No. 63/424,413 addresses receiving, by the query elements of the user interface, a user query relating to the content provider context;
generating, by the user experience platform, a text prompt for a machine learning model based on the chart and the user query, wherein the text prompt includes the user query and a text instruction to the machine learning model to generate content based on the chart (paragraph [0040] discloses that output image 124a may be generated by subject-driven image model 130 using input subject image 102 with subject text 112 "backpack" and text prompt 118 "cube shaped");
Paragraph [0019] of provisional application No. 63/424,413 discloses receiving input such as input training data (e.g., image-text pairs) via data interface 115 and generating an output 150, which may be image captions or classification labels [text prompt], and paragraph [0028] discloses a graphical user interface (GUI) configured to provide an interface to the user 240 to view generated captions or classification outputs. Thus, provisional application No. 63/424,413 addresses generating, by a user experience platform, a text prompt for a machine learning model based on the chart and the user query, wherein the text prompt includes the user query and a text instruction to the machine learning model to generate content based on the chart;
encoding, by a multimodal encoder (paragraph [0006] discloses a multimodal encoder), the text prompt and the chart to obtain a prompt embedding in a multimodal vector space, wherein the prompt embedding comprises a vector (Fig. 7, step 702, discloses encoding, via an image encoder, the subject image into an image feature vector; paragraph [0092] discloses that the system generates, by a multimodal encoder, a vector representation (e.g., subject embedding 116) of the subject based on the image feature vector and the text feature vector; and paragraph [0093] discloses generation based on an input combining the text prompt and the vector representation, combined by being concatenated and input to a text encoder); and
Paragraph [0039] of provisional application No. 63/424,413 discloses a multimodal vision-language model comprising a frozen image encoder, and paragraph [0042] of provisional application No. 63/424,413 discloses that GemFormer takes as input image feature vectors from the frozen Image Encoder, at least one query, and text. The images are encoded by the frozen image encoder while the text representations are generated by GemFormer from the text. Thus, provisional application No. 63/424,413 addresses encoding, by a multimodal encoder, the text prompt and the chart to obtain a prompt embedding in a multimodal vector space, wherein the prompt embedding comprises a vector.
generating, using the machine learning model, content according to the text instruction based on the prompt embedding (Fig. 1 and paragraph [0026] disclose generating an output image; paragraph [0028] discloses that subject embedding 116 and text prompt 118 may be combined and input to a text encoder to generate the prompt for the image model, and generating an output image based on the prompt, where the prompt template may be, for example, "[text prompt], the [subject text] is [subject embedding]"; and paragraph [0029] discloses generating a combined subject embedding). Paragraph [0042] of provisional application No. 63/424,413 discloses that, in addition to the image feature input, GemFormer also receives a set of queries that it embeds and uses to contrast the image feature vectors and text representations. The output query embeddings learn the relevant image features before they are contrasted to the text representations. Thus, provisional application No. 63/424,413 addresses generating, using the machine learning model, content according to the text instruction based on the prompt embedding.
Provisional application No. 63/424,413 teaches all of the above elements, including that a vision-language model may be trained to receive an input image and generate a text caption of the input image and, as another example, that a vision-language model may be trained to receive a text description of a visual scene and generate an image reconstructing the described visual scene (paragraph [0002]), as well as vision interfaces and other display modules that may receive input and/or output information; for example, other applications 216 may contain software programs for asset management, executable by a processor, including a graphical user interface (GUI) configured to provide an interface to the user 240 to view generated captions or classification outputs (paragraph [0028]). Provisional application No. 63/424,413 of Li fails to explicitly teach displaying, by a user interface of the user experience platform, a chart and query elements of the user interface based on the content provider.
However, Divakaran teaches displaying, by a user interface of the user experience platform, a chart and query elements of the user interface based on the content provider (Fig. 6 and paragraph [0020] disclose a visual display of top images [chart] and top words [query elements], and paragraph [0065] discloses a visual display of the top images and top words of eight (8) clusters of images and words). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the text description of a visual scene used to generate an image reconstructing the described visual scene of the provisional application of Li with the feature of displaying the representation of embedded content of Divakaran in order to allow one to represent/graph the idea of word relationships as hard-coded "word vectors" (see Divakaran, paragraph [0037]).
With respect to claim 12, Li in view of Divakaran teaches the elements of claim 11; furthermore, Li teaches the non-transitory computer readable medium, the code further comprising instructions executable by the processor to cause the at least one processor to perform operations comprising: displaying the query element based on the interaction history (Fig. 9 and paragraph [0098] disclose a chart illustrating the performance of at least one embodiment described, e.g., backpack, backpack_dog, can, cat, etc. [query elements]).
With respect to claim 13, Li in view of Divakaran teaches the elements of claim 11; furthermore, Li teaches the non-transitory computer readable medium, the code further comprising instructions executable by the processor to cause the at least one processor to perform operations comprising: generating, by the user experience platform, the prompt based on the content provider context (paragraph [0075] discloses generating or editing an image given a conditioning input such as a text prompt).
With respect to claim 14, Li in view of Divakaran teaches the elements of claim 11; furthermore, Li teaches the non-transitory computer readable medium wherein: the content provider context includes information in multiple modalities including a text modality and an image modality, wherein the content is generated based on the information in the multiple modalities (paragraph [0018] discloses text-to-image generation models that generate images in different contexts or different variations, as well as existing generation models, and paragraph [0019] discloses image generation models).
With respect to claim 15, Li in view of Divakaran teaches the elements of claim 11; furthermore, Li teaches the non-transitory computer readable medium wherein: the content provider context comprises a user journey, an analytics context, an audience segmentation context, a campaign generation context, or any combination thereof (paragraph [0017] discloses information obtained from any given marketing element or performance measures of other aspects of the marketing elements).
With respect to claim 16, Li in view of Divakaran teaches the elements of claim 11; furthermore, Li teaches the non-transitory computer readable medium wherein: the content provider context includes structured information representing a user journey, a campaign brief, or a campaign program (paragraph [0068] discloses providing a training dataset including training images and prompts to the server, and paragraph [0076] discloses that a training dataset may be associated with information such as a caption for each image in the training dataset that may be used as a conditioning input).
With respect to claim 17, Li in view of Divakaran teaches the elements of claim 11; furthermore, Li teaches the non-transitory computer readable medium further comprising: providing, by the user experience platform, a recommendation to the content provider for an interaction with the user experience platform, wherein the recommendation is based on the content (paragraph [0017] discloses generating or updating a predictive model that is used at least in part for generating a marketing plan, and paragraph [0018] discloses that a model may include a deterministic, discrete, dynamic, distributed, machine learning, or discriminative mathematical model (e.g., a support vector machine)).
With respect to claim 18, Li in view of Divakaran teaches the elements of claim 11; furthermore, Li teaches the non-transitory computer readable medium further comprising:
receiving, by the machine learning model, a request from the content provider to generate the content, wherein the content is generated in response to the request (paragraph [0018] discloses that machine learning systems have been widely used in image generation tasks).
With respect to claim 19, Li in view of Divakaran teaches the elements of claim 11; furthermore, Li teaches the non-transitory computer readable medium wherein: the machine learning model is trained (paragraph [0045] discloses receiving a training dataset from a networked database via a communication interface, and paragraph [0076] discloses that a training dataset may include a variety of images associated with information such as a caption for each image in the training dataset). Li fails to teach that the corresponding training dataset is based on a public corpus of natural language documents and fine-tuned based on data from the user experience platform.
However, Divakaran teaches training based on a public corpus of natural language documents and fine-tuning based on data from the user experience platform (paragraph [0075] discloses that determined user-preferred content can include audio content spoken in a user-preferred English accent). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the receiving of a training dataset from a networked database of Li with the audio content spoken in a user-preferred English accent of Divakaran in order to convey the desired information using the determined user-preferred content type (see Divakaran, paragraph [0075]).
With respect to claim 20, Li teaches a system comprising:
a memory component (Fig. 1, element 410; paragraph [0041] discloses a processor coupled to memory);
a processing device coupled to the memory component, the processing device configured to perform operations comprising (paragraph [0041] discloses a processor coupled to memory, and paragraph [0042] discloses that memory may be used to store software executed by the computing device):
obtaining, by a user experience platform, a content provider context for the user experience platform, wherein the content provider context includes profile information for a content provider and an interaction history of the content provider (paragraph [0045] discloses that a computing device may receive input (such as a training dataset) from a networked database via a communication interface, such as input subject matter from a user via the user interface; paragraph [0066] discloses that a database may store a user profile relating to the user, predictions previously viewed or saved by the user, and historical data received from the server; and paragraph [0068] discloses providing a training dataset including training images and prompts to the server);
Additionally, paragraph [0029] of provisional application No. 63/424,413 discloses that database 218 may store a user profile relating to the user 240, predictions previously viewed or saved by the user 240, historical data received from the server 230, and the like, and paragraph [0031] discloses that a data vendor server may correspond to a server that hosts database 219 to provide training datasets including image, text, or image-text pairs to the server 230. The database 219 may be implemented by one or more relational databases, distributed databases, cloud databases, and/or the like. Thus, obtaining a content provider context that includes profile information for a content provider and an interaction history of the content provider is addressed by provisional application No. 63/424,413.
receiving, by the query elements of the user interface, a user query relating to the content provider context (Fig. 3 and paragraph [0040] disclose an input subject image 102 illustrating a backpack [query elements] that resembles a dog face, using input subject image 102 with subject text 112 "backpack" and text prompt 118 "cube shaped"; Fig. 7 discloses receiving, via a data interface, a subject image containing a subject and a text description of the subject in the image); and
Paragraph [0019] of provisional application No. 63/424,413 discloses receiving input such as input training data (e.g., image-text pairs [a user query relating to the content provider context]) via data interface 115, and paragraph [0031] of provisional application No. 63/424,413 discloses providing a training dataset including image, text, or image-text pairs to the server 230. Thus, provisional application No. 63/424,413 addresses receiving, by the query elements of the user interface, a user query relating to the content provider context;
generating, by the user experience platform, a text prompt for a machine learning model based on the chart and the user query, wherein the text prompt includes the user query and a text instruction to the machine learning model to generate content based on the chart (paragraph [0040] discloses that output image 124a may be generated by subject-driven image model 130 using input subject image 102 with subject text 112 "backpack" and text prompt 118 "cube shaped");
Paragraph [0019] of provisional application No. 63/424,413 discloses receiving input such as input training data (e.g., image-text pairs) via data interface 115 and generating an output 150, which may be image captions or classification labels [text prompt], and paragraph [0028] discloses a graphical user interface (GUI) configured to provide an interface to the user 240 to view generated captions or classification outputs. Thus, provisional application No. 63/424,413 addresses generating, by a user experience platform, a text prompt for a machine learning model based on the chart and the user query, wherein the text prompt includes the user query and a text instruction to the machine learning model to generate content based on the chart;
encoding, by a multimodal encoder (paragraph [0006] discloses a multimodal encoder), the text prompt and the chart to obtain a prompt embedding in a multimodal vector space, wherein the prompt embedding comprises a vector (Fig. 7, step 702, discloses encoding, via an image encoder, the subject image into an image feature vector; paragraph [0092] discloses that the system generates, by a multimodal encoder, a vector representation (e.g., subject embedding 116) of the subject based on the image feature vector and the text feature vector; and paragraph [0093] discloses generation based on an input combining the text prompt and the vector representation, combined by being concatenated and input to a text encoder); and
Paragraph [0039] of provisional application No. 63/424,413 discloses a multimodal vision-language model comprising a frozen image encoder, and paragraph [0042] of provisional application No. 63/424,413 discloses that GemFormer takes as input image feature vectors from the frozen Image Encoder, at least one query, and text. The images are encoded by the frozen image encoder while the text representations are generated by GemFormer from the text. Thus, provisional application No. 63/424,413 addresses encoding, by a multimodal encoder, the text prompt and the chart to obtain a prompt embedding in a multimodal vector space, wherein the prompt embedding comprises a vector.
generating, using the machine learning model, content according to the text instruction based on the prompt embedding (Fig. 1 and paragraph [0026] disclose generating an output image; paragraph [0028] discloses that subject embedding 116 and text prompt 118 may be combined and input to a text encoder to generate the prompt for the image model, and generating an output image based on the prompt, where the prompt template may be, for example, "[text prompt], the [subject text] is [subject embedding]"; and paragraph [0029] discloses generating a combined subject embedding). Paragraph [0042] of provisional application No. 63/424,413 discloses that, in addition to the image feature input, GemFormer also receives a set of queries that it embeds and uses to contrast the image feature vectors and text representations. The output query embeddings learn the relevant image features before they are contrasted to the text representations. Thus, provisional application No. 63/424,413 addresses generating, using the machine learning model, content according to the text instruction based on the prompt embedding.
Provisional application No. 63/424,413 teaches all of the above elements, including that a vision-language model may be trained to receive an input image and generate a text caption of the input image and, as another example, that a vision-language model may be trained to receive a text description of a visual scene and generate an image reconstructing the described visual scene (paragraph [0002]), as well as vision interfaces and other display modules that may receive input and/or output information; for example, other applications 216 may contain software programs for asset management, executable by a processor, including a graphical user interface (GUI) configured to provide an interface to the user 240 to view generated captions or classification outputs (paragraph [0028]). Provisional application No. 63/424,413 of Li fails to explicitly teach displaying, by a user interface of the user experience platform, a chart and query elements of the user interface based on the content provider.
However, Divakaran teaches displaying, by a user interface of the user experience platform, a chart and query elements of the user interface based on the content provider (Fig. 6 and paragraph [0020] disclose a visual display of top images [chart] and top words [query elements], and paragraph [0065] discloses a visual display of the top images and top words of eight (8) clusters of images and words). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the text description of a visual scene used to generate an image reconstructing the described visual scene of the provisional application of Li with the feature of displaying the representation of embedded content of Divakaran in order to allow one to represent/graph the idea of word relationships as hard-coded "word vectors" (see Divakaran, paragraph [0037]).
The prior art of record:
Li et al. (US Pub. No. 2024/0161369 A1) discloses systems and methods of subject-driven image generation. In at least one embodiment, a system receives, via a data interface, an image containing a subject, a text description of the subject in the image, and a text prompt relating to a different rendition of the subject. The system encodes, via an image encoder, the image into an image feature vector. The system encodes, via a text encoder, the text description into a text feature vector.
Divakaran et al. (US Pub. No. 2021/0297498 A1) discloses a method, apparatus, and system for determining user content associations and for determining and providing user-preferred content using multimodal embeddings, which include creating an embedding space for multimodal content by creating a first modality vector representation of the multimodal content having a first modality, creating a second modality vector representation of the multimodal content having a second modality, creating a user vector representation, as a third modality, for each user associated with at least a portion of the multimodal content, and embedding the first and second modality vector representations and the user vector representations in the common embedding space.
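For illustration only, the following hypothetical Python sketch mirrors the common multimodal embedding space summarized above: separate projections stand in for Divakaran's per-modality encoders, and image, word, and user vectors are embedded in one space so that user-preferred content can be ranked by proximity. Every name and the toy projections here are assumptions for the sketch, not Divakaran's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # dimensionality of the shared embedding space

def embed(raw: np.ndarray, projection: np.ndarray) -> np.ndarray:
    # Project a raw modality vector into the shared space and unit-normalize.
    out = projection @ raw
    return out / np.linalg.norm(out)

# Toy per-modality projections; in Divakaran these would be learned encoders.
proj_image = rng.standard_normal((DIM, 32))
proj_word = rng.standard_normal((DIM, 32))
proj_user = rng.standard_normal((DIM, 32))

content = {
    "image_0": embed(rng.standard_normal(32), proj_image),
    "word_dog": embed(rng.standard_normal(32), proj_word),
    "word_backpack": embed(rng.standard_normal(32), proj_word),
}
user_vec = embed(rng.standard_normal(32), proj_user)  # user as third modality

# User-preferred content = items nearest the user vector (cosine similarity,
# since all embedded vectors are unit-normalized).
ranked = sorted(content, key=lambda name: -float(content[name] @ user_vec))
print(ranked)
```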
Cavander et al. (US Pub. No. 2015/03176701 A1) discloses techniques for generating a forward-looking, goal-seeking marketing plan that links prior media purchase transactions to predicted future financial results for a brand, product market, or campaign. A computing device is configured to receive input data associated with one or more marketing elements, such as television ads, print ads, and online ads. From the input data, response factors corresponding to each marketing element can be calculated.
Zeng et al. (US Pub. No. 2024/0169623 A1) discloses systems and methods for multi-modal image generation. One or more aspects of the systems and methods include obtaining a text prompt and layout information indicating a target location for an element of the text prompt within an image to be generated, and computing a text feature map including a plurality of values corresponding to the element of the text prompt at pixel locations corresponding to the target location.
Applebaum et al. (US Pub. No. 2005/0171851 A1) discloses a user authentication system that includes a dialogue manager adapted to prompt the user with multiple, selectable passphrases. A selection recognizer recognizes user selection of at least one of the multiple, selectable passphrases. A user identity analysis module analyzes one or more potential user identities based on adherence of user selection of the passphrase to predetermined passphrase selection criteria assigned to one or more enrolled users.
Rodriguez et al. (US Pub. No. 2017/0255198 A1) discloses a system that may be configured to manage at least one robotic device. The system may comprise one or more databases and one or more processors in communication with the one or more databases. The one or more processors may be configured to provide an operating system for the at least one robotic device, control motion of the at least one robotic device, configure at least one sensor removably coupled to the at least one robotic device, process data collected by the at least one sensor, and/or perform localization and/or area mapping for the at least one robotic device by comparing data collected by the at least one sensor with data in the one or more databases to generate localization and/or area mapping data.
Response to Arguments
Applicant's arguments with respect to the 35 U.S.C. § 103 rejections of claims 1-20, filed on 5 December 2025, have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SABA DAGNEW whose telephone number is (571)270-3271. The examiner can normally be reached 9-6:45.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Waseem Ashraf, can be reached at (571) 270-3948. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SABA DAGNEW/Primary Examiner, Art Unit 3682