DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on January 12, 2026 has been entered.
Response to Arguments
Applicant’s arguments with respect to claims 1-20 have been considered but are moot because new grounds of rejection have been made in view of Raziperchikolaei (US 20230055699A1).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 7-11, 13, and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Gong (US 20240037810A1), Lundin (US 20240257179A1), and Raziperchikolaei (US 20230055699A1).
As per Claim 1, Gong teaches a computing system, comprising: one or more processors; and a memory storing program instructions that, when executed by the one or more processors, cause the one or more processors (machine 2000 configured to read instructions from a machine-readable medium and perform any of the features described herein, [0123]) to at least: access a training dataset including a plurality of training data records, the plurality of training data records including triplets of data for a plurality of users, one or more triplets of data comprising a training text string submitted by a first user, a training image submitted by the first user (color palette prediction model 625 may be trained using training data derived from text and image pairs, the text represents query text, [0061], multimodal input that includes both query text and at least one sample image, [0055], textual or multimodal input is provided by a user, [0024]), and training supplemental information [0065]; train a machine learning model using the training dataset to generate an image based at least in part on a textual input and supplemental information ([0061], color palette information 630 output by the color palette prediction model 625 is provided as input to the color palette prediction unit 615, color palette prediction unit 615 may output this color palette information 630 as color palette information 635, [0063], generates the abstract image 120 based on the color palette information 635, [0064], predict mood and/or sentiment information that provides additional semantic context for the model to consider when predicting the color palette for the query, [0065]); receive, by the trained machine learning model, a text input provided by a second user and a supplemental information input associated with the second user; generate, using the trained machine learning model, a rendered image based at least in part on the text input and the supplemental information input; and return the rendered image to the second user (neural model that is trained to predict sequences of color palettes from text inputs, analyzes the textual inquiry and predicts mood and/or sentiment information that provides additional semantic context for the model to consider when predicting the color palette for the query, [0064, 0065]).
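For illustration of the mapped limitations only, the following minimal Python sketch (not drawn from Gong or any other applied reference; all identifiers are hypothetical) shows a training data record of the recited triplet form and one way generation could be conditioned on both a textual input and supplemental information:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TrainingRecord:
    """One training data record in the recited triplet form (illustrative only)."""
    text: str                  # training text string submitted by the first user
    image: np.ndarray          # training image submitted by the first user
    supplemental: np.ndarray   # training supplemental information for the first user


def build_conditioning(text_embedding: np.ndarray,
                       supplemental_embedding: np.ndarray) -> np.ndarray:
    """Condition image generation on both the text and the supplemental input."""
    return np.concatenate([text_embedding, supplemental_embedding])
```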
However, Gong does not teach the training supplemental information being representative of the first user, or the supplemental information input comprising a user embedding representative of the second user. Lundin teaches the training supplemental information representative of the first user and the supplemental information input comprising a user embedding representative of the second user (use the user interaction time series data 186 to generate predictions of user interest levels in events, user interest level predictions for an event may be based on the type and quantity of user interactions present in the user interaction time series data 182 for previous occurrences of the event, previous occurrences of events similar to the event, and for the event itself, generate user interest level predictions on a continuous basis, and the user interaction time series data 702 is updated with additional user interactions, [0053], content items that are related to the events may be determined, output the generated user interest level predictions for events to the content management system 120, content management system 120 may determine content items related to the events from among the content items 190, content items related to an event may be candidate content items generated by the content generation system 410 based on topic phrases that are related to the event, [0054], the image content and text content generated by the generative systems may be based on the manner in which the generative systems were trained, [0023], select different content items to be displayed at various times to various users based on the user level interest predictions, [0039]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gong to include the training supplemental information representative of the first user and the supplemental information input comprising a user embedding representative of the second user, because Lundin suggests that this is useful for generating content in which the user is expected to be interested, which is advantageous because it displays to the user content they are likely to want to see and avoids displaying content they likely do not want to see [0053, 0054].
However, Gong and Lundin do not teach the training supplemental information comprising a user embedding vector representative of the first user, or the supplemental information input comprising a user embedding vector representative of the second user. Raziperchikolaei teaches accessing a training dataset including training supplemental information comprising a user embedding vector representative of the first user, and the supplemental information input comprising a user embedding vector representative of the second user (obtains a training data set with user data, [0044], the user data includes a user-item interaction vector for each training user, [0045], a predicted interaction value for each of a plurality of users, [0082]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gong and Lundin so that the training supplemental information comprises a user embedding vector representative of the first user and the supplemental information input comprises a user embedding vector representative of the second user, because Raziperchikolaei suggests that, in this way, a recommender system achieves better performance with fewer training iterations (Abstract).
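For illustration only, a minimal sketch of the kind of user embedding vector Raziperchikolaei describes, computed from a binary user-item interaction vector (the encoder here is a hypothetical stand-in, not Raziperchikolaei's actual network):

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ITEMS, EMBED_DIM = 1000, 64

# Binary user-item interaction vector: 1 at each item with which the user
# has had a positive interaction, per Raziperchikolaei's description.
interaction_vector = np.zeros(NUM_ITEMS)
interaction_vector[[3, 42, 507]] = 1.0

# Stand-in for a trained user neural network encoder: a single random
# projection mapping the interaction vector to a user embedding vector.
encoder_weights = rng.standard_normal((NUM_ITEMS, EMBED_DIM))
user_embedding_vector = np.tanh(interaction_vector @ encoder_weights)
```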
As per Claim 2, Gong and Lundin do not teach wherein the user embedding vector representative of the second user is used to predict at least one content item with which the second user is expected to engage. However, Raziperchikolaei teaches wherein the user embedding vector representative of the second user is used to predict at least one content item with which the second user is expected to engage (the user data includes a user-item interaction vector for each training user, [0045], obtains a user-item interaction vector for a test user, computes a user vector representation for the test user by applying the user neural network encoder to the test user’s user-item interaction vector, computes a predicted interaction value for the test user and item k based on the user and item vector representations computed for the test user and item, [0081], a predicted interaction value for each of a plurality of users, [0082], to identify products to recommend to a particular user, the system computes a predicted interaction value for the user and each of a plurality of items, the system then recommends to the user a certain number of items with which the user has the highest probability of a positive interaction, [0083]). This would be obvious for the reasons given in the rejection for Claim 1.
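Again for illustration only, a minimal sketch of computing a predicted interaction value for each item and recommending the items with the highest values, in the manner Raziperchikolaei describes (the dot-product scoring is an assumption; the reference's exact scoring function may differ):

```python
import numpy as np

rng = np.random.default_rng(1)
NUM_ITEMS, EMBED_DIM = 1000, 64

user_embedding_vector = rng.standard_normal(EMBED_DIM)
item_embeddings = rng.standard_normal((NUM_ITEMS, EMBED_DIM))

# Predicted interaction value for the user and each item (dot-product score).
predicted_interaction = item_embeddings @ user_embedding_vector

# Recommend the items with which the user has the highest predicted
# probability of a positive interaction.
top_items = np.argsort(predicted_interaction)[::-1][:10]
```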
As per Claim 3, Gong does not teach wherein the supplemental information input further includes a content embedding for the second user, the content embedding representative of at least one content item that the second user has interacted with. However, Lundin teaches wherein the supplemental information input further includes a content embedding for the second user, the content embedding representative of at least one content item that the second user has interacted with (user interaction time series data 186 may include telemetry, the telemetry may include adding of items to a user’s cart and purchasing of items, [0034], select different content items to be displayed at various times to various users based on the user level interest predictions, [0039], [0053, 0054]). This would be obvious for the reasons given in the rejection for Claim 1.
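For illustration, a minimal sketch of one plausible way (an assumption for clarity, not taken from Lundin) to form a content embedding representative of the items a user has interacted with:

```python
import numpy as np

rng = np.random.default_rng(2)
EMBED_DIM = 64

# Embeddings of the content items the second user has interacted with
# (e.g., items added to a cart or purchased).
interacted_item_embeddings = rng.standard_normal((3, EMBED_DIM))

# One simple way to summarize the interaction history as a single content
# embedding: average the embeddings of the interacted-with items.
content_embedding = interacted_item_embeddings.mean(axis=0)
```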
As per Claim 4, Gong does not teach wherein the supplemental information input includes a plurality of content items that form at least a portion of a collection and the rendered image is generated to be included in the collection. However, Lundin teaches wherein the supplemental information input includes a plurality of content items that form at least a portion of a collection and the rendered image is generated to be included in the collection (additional image content and text content, which may be used along with any previously generated image content and text content, the additional image content may also be input to the auto-summarizer, creating a loop that may allow for the continuous generation of content based on the topic phrases originally input to the content generation system, [0047]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gong so that the supplemental information input includes a plurality of content items that form at least a portion of a collection and the rendered image is generated to be included in the collection, because Lundin suggests that, in this way, the user can see more content related to the topic phrases originally input [0047].
As per Claim 7, Claim 7 is similar in scope to Claim 1, except that Claim 7 has the additional limitations of receiving, from a client device associated with a user, the text input; the supplemental information embedding vector comprising user interactions with respective content items; and providing the rendered image to the client device. Gong teaches receiving, from a client device (205) associated with a user, the text input; and providing the rendered image to the client device (receives a request for an abstract image from a client device 205, the request may include the textual content that may be provided as the textual input 205, client device 205 may receive the abstract image 120 from the image generation service 210, [0028], a user of the client devices 205, [0031]).
However, Gong does not teach the supplemental information embedding vector comprising user interactions with respective content items. Lundin teaches the supplemental information embedding comprising user interactions with respective content items [0023, 0034, 0053, 0054]. This would be obvious for the reasons given in the rejection for Claim 1.
However, Gong and Lundin do not teach the supplemental information embedding vector comprising the user interactions with the respective content items. Raziperchikolaei teaches the supplemental information embedding vector comprising user interactions with respective content items (user-item interaction vector has a “1” value for each item with which a user has had a positive interaction, [0045]). This would be obvious for the reasons given in the rejection for Claim 1. Thus, Claim 7 is rejected under the same rationale as Claim 1, together with these additional teachings from Gong, Lundin, and Raziperchikolaei.
As per Claims 8-9, these claims are each similar in scope to Claim 4, and therefore are rejected under the same rationale. As per Claim 10, Claim 10 is similar in scope to Claim 2, and therefore is rejected under the same rationale.
As per Claim 11, Gong teaches further comprising: determining a sequence of text embeddings representative of the text input; and including the supplemental information embedding in the sequence of text embeddings to generate an updated input sequence of embeddings, wherein the machine learning model is configured to receive the updated input sequence of embeddings as an input and generate the rendered image based at least in part on the updated input sequence of embeddings [0064, 0065].
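For illustration only, a minimal sketch of including a supplemental information embedding in a sequence of text embeddings to form the updated input sequence (identifiers hypothetical, not drawn from Gong):

```python
import numpy as np

rng = np.random.default_rng(3)
SEQ_LEN, EMBED_DIM = 8, 64

# Sequence of text embeddings representative of the text input (one per token).
text_sequence = rng.standard_normal((SEQ_LEN, EMBED_DIM))

# Supplemental information embedding, shaped as one additional sequence position.
supplemental = rng.standard_normal((1, EMBED_DIM))

# Updated input sequence of embeddings: the supplemental embedding is included
# as an extra position that the model consumes along with the text embeddings.
updated_sequence = np.concatenate([text_sequence, supplemental], axis=0)
```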
As per Claim 13, Gong teaches wherein: the text input is received as a search query (structured textual input, such as a query, [0025]); and the method further comprises: determining a plurality of responsive content items (identifies sample images based on the textual input 410, conducts a search of a datastore of images and a search of images available on data sources on the Internet using a search engine, [0045]); providing the plurality of responsive content items (sample images 420 are provided as an input to the color palette prediction unit 425, [0046]); and providing the rendered image as a responsive content item (receives the color palette information from the color palette prediction unit 425 and to generate the abstract image 120 based on the color palette information, [0050]).
As per Claim 16, Claim 16 is similar in scope to Claim 1, and therefore is rejected under the same rationale.
As per Claim 17, Claim 17 is similar in scope to Claim 1, except that Claim 17 has the additional limitation of receiving, at the trained machine learning model, a plurality of example content items provided by the second user that form at least a portion of a collection of content items. Gong does not teach receiving, at the trained machine learning model, a plurality of example content items provided by the second user that form at least a portion of a collection of content items. However, Lundin describes “user interest prediction model 110 may generate predictions of user interest levels in an event or related topics based on, for example, a set of time series data of user interactions” [0032]. Lundin describes “topic phrases may be input to the content generation system 410. The topic phrases may be received from any suitable source, such as, for example, the user prediction model 110” [0043]. Lundin describes “image generator 420 may generate and output image content based on the input topic phrases” [0045]. Lundin describes “additional image content and text content based on the topic phrases generated by the auto-summarizer 550, which may be used along with any previously generated image content and text content by the layout generator 540. The additional image content may also be input to the auto-summarizer 550, creating a loop that may allow for the continuous generation of content based on the topic phrases originally input to the content generation system 410” [0047]. Thus, Lundin teaches that the user interactions provided by the user are input to the user interest prediction model 110 [0032], which uses the user interactions to output topic phrases [0043], and these topic phrases are input to the image generator 420, which uses these topic phrases to generate image content [0045], and this image content is the previously generated image content that may be used along with the image content generated by the auto-summarizer [0047]. Thus, both this previously generated image content and the image content generated by the auto-summarizer form the collection of content items. Thus, this previously generated image content forms at least a portion of the collection of content items [0047], and this previously generated image content was generated based on the user interactions provided by the user [0032, 0043, 0045], and thus this previously generated image content is provided by the user. Thus, Lundin teaches receiving, at the trained machine learning model, a plurality of example content items provided by the second user that form at least a portion of a collection of content items [0032, 0039, 0043, 0045, 0047]. This would be obvious for the reasons given in the rejection for Claim 4. Thus, Claim 17 is rejected under the same rationale as Claim 1 along with this additional teaching from Lundin.
As per Claim 18, Claim 18 is similar in scope to Claim 11, and therefore is rejected under the same rationale. As per Claim 19, Claim 19 is similar in scope to Claim 4, and therefore is rejected under the same rationale.
Claims 5, 6, 14, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Gong (US 20240037810A1), Lundin (US 20240257179A1), and Raziperchikolaei (US 20230055699A1) in view of Ravi (US 20250077842A1).
As per Claim 5, Gong, Lundin, and Raziperchikolaei are relied upon for the teachings as discussed above relative to Claim 1.
However, Gong, Lundin, and Raziperchikolaei do not teach wherein the program instructions that, when executed by the one or more processors, further cause the one or more processors to at least determine a first weight for the textual input and a second weight for the supplemental information input. Ravi teaches wherein the program instructions that, when executed by the one or more processors, further cause the one or more processors to at least determine a first weight for the textual input and a second weight for the supplemental information input (user interface 802 includes a style-and-content-weight controller 810, based on user interaction with the style-and-content-weight controller 810, the selective layer conditioning system can determine a weight parameter indicating a relative degree for how much the image prompt 806 and how much the text prompt 808 contribute, respectively, to a desired style and a desired content for generating a digital image, [0072], selective layer conditioning system iteratively generates digital images as the selective layer conditioning system receives additional user interactions via the user interface 802, in response to selection of a different (additional) image prompt, selection of a different (additional) text prompt, and/or selection of a different weight parameter, the selective layer conditioning system generates an additional digital image and provides the additional digital image for display, [0079]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gong, Lundin, and Raziperchikolaei so that the program instructions, when executed by the one or more processors, further cause the one or more processors to at least determine a first weight for the textual input and a second weight for the supplemental information input, because Ravi suggests that, in this way, the user can easily control how much the generated image is based on the text prompt, as desired [0072, 0079].
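For illustration only, a minimal sketch of the recited first and second weights applied to the two conditioning inputs (the linear blend is an assumption for clarity; Ravi's exact combination may differ):

```python
import numpy as np

rng = np.random.default_rng(4)
EMBED_DIM = 64

text_embedding = rng.standard_normal(EMBED_DIM)
supplemental_embedding = rng.standard_normal(EMBED_DIM)

# Weight parameter, e.g., set via user interaction with a slider-style control.
first_weight = 0.7                     # weight for the textual input
second_weight = 1.0 - first_weight     # weight for the supplemental information input

# Blend the two conditioning signals according to the selected weights.
conditioning = first_weight * text_embedding + second_weight * supplemental_embedding
```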
As per Claim 6, Gong, Lundin, and Raziperchikolaei do not teach wherein determining the first weight and the second weight includes receiving the first weight and the second weight via a user interaction with the user interface. However, Ravi teaches wherein determining the first weight and the second weight includes receiving the first weight and the second weight via a user interaction with the user interface [0072, 0079]. This would be obvious for the reasons given in the rejection for Claim 5.
As per Claim 14, Gong, Lundin, and Raziperchikolaei do not teach further comprising determining a first weight for the text input and a second weight for the user information. However, Ravi teaches further comprising determining a first weight for the text input and a second weight for the user information [0072, 0079]. This would be obvious for the reasons given in the rejection for Claim 5.
As per Claim 15, Claim 15 is similar in scope to Claim 6, and therefore is rejected under the same rationale.
As per Claim 20, Gong, Lundin, and Raziperchikolaei do not teach further comprising determining a first weight for the text input and a second weight for the plurality of content items. However, Ravi teaches further comprising determining a first weight for the text input and a second weight for the plurality of content items [0072, 0079]. This would be obvious for the reasons given in the rejection for Claim 5.
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Gong (US 20240037810A1), Lundin (US 20240257179A1), and Raziperchikolaei (US 20230055699A1) in view of Bennett (US 20250021789A1).
Gong, Lundin, and Raziperchikolaei are relied upon for the teachings as discussed above relative to Claim 7.
However, Gong, Lundin, and Raziperchikolaei do not teach further comprising: determining an overall text embedding representative of the text input; and concatenating the supplemental information embedding with the overall text embedding to generate an updated overall input embedding, wherein the machine learning model is configured to receive the updated overall input embedding as an input and generate the rendered image based at least in part on the updated overall input embedding. Bennett teaches further comprising: determining an overall text embedding representative of the text input; and concatenating the supplemental information embedding with the overall text embedding to generate an updated overall input embedding, wherein the machine learning model is configured to receive the updated overall input embedding as an input and generate the rendered image based at least in part on the updated overall input embedding (deliver two pre-weighted influence inputs, which then may combine the influencing words sets into a single influence output that may then influence a generative AI node, [0100], merged influence output that balances and weighs the value of one of such text inputs against the value of others, weighted influence merger in a first segment made by a first instance might be different than the subsequent instances to perform a different weighting across influence sources or drop or add influence sources, [0071], training one of the reconfigurable neural net based circuit units 105 with a capacity to convert user input text to an image might involve a first set of training data (images and text), [0023]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gong, Lundin, and Raziperchikolaei to include determining an overall text embedding representative of the text input; and concatenating the supplemental information embedding with the overall text embedding to generate an updated overall input embedding, wherein the machine learning model is configured to receive the updated overall input embedding as an input and generate the rendered image based at least in part on the updated overall input embedding, because Bennett suggests that this combines the influencing words sets into a single influence output that then influences a generative AI node so that there are fewer delays [0003, 0100].
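For illustration only, a minimal sketch of concatenating a supplemental information embedding with an overall text embedding to form the updated overall input embedding recited in Claim 12 (identifiers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)

# Overall (pooled) text embedding representative of the whole text input.
overall_text_embedding = rng.standard_normal(64)
supplemental_embedding = rng.standard_normal(32)

# Updated overall input embedding: the supplemental information embedding is
# concatenated with the overall text embedding into one conditioning vector.
updated_overall_embedding = np.concatenate(
    [overall_text_embedding, supplemental_embedding])
```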
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONI HSU whose telephone number is (571)272-7785. The examiner can normally be reached M-F 10am-6:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
JH
/JONI HSU/Primary Examiner, Art Unit 2611