Prosecution Insights
Last updated: May 29, 2026
Application No. 18/499,984

IDENTIFYING IMAGE BASED CONTENT ITEMS USING A LARGE LANGUAGE MODEL

Final Rejection §101 §102
Filed: Nov 01, 2023
Examiner: SULTANA, NADIRA
Art Unit: 2653
Tech Center: 2600 (Communications)
Assignee: Pinterest Inc.
OA Round: 2 (Final)
Grant Probability: 74% (Favorable)
OA Rounds: 3-4
Est. Remaining: 4m
With Interview: 99%

Examiner Intelligence

Grants 74% (above average)
Career Allowance Rate: 74% (75 granted / 101 resolved; +12.3% vs TC avg)
Interview Lift: +30.4% (strong; based on resolved cases with interview)
Typical timeline
Avg Prosecution: 2y 11m (17 currently pending)
Career history
Total Applications: 129 (across all art units)

Statute-Specific Performance

§101: 5.4% (-34.6% vs TC avg)
§103: 91.4% (+51.4% vs TC avg)
§102: 2.5% (-37.5% vs TC avg)
§112: 0.4% (-39.6% vs TC avg)
Based on career data from 101 resolved cases

Office Action

§101 §102
DETAILED ACTION

Notice of AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 01/07/2026 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments

Applicant’s amendments and arguments filed 02/26/2026, with respect to claim(s) 5-14 have been fully considered. Applicant amended claims 5, 7-9 and added new claims 21-30.

Applicant’s arguments in pages 8-10, filed 02/26/2026, with respect to 35 U.S.C. 101 rejections of Claims 5-14 have been fully considered but they are not persuasive. Applicant argued, regarding the amended claim, that “claim recites computer-implemented features that cannot be practically performed in the human mind. These include, for example as recited by claim 5: providing, as part of a large language model ("LLM") input to an LLM, the caption and at least a portion of the contextual metadata, the LLM configured to generate an output text-description from the LLM input; receiving from the LLM and in response to the LLM input, an LLM output, the LLM output being a text-description representative of the collective plurality of content items of the session; providing, to a recommender system, the text-description as a request for at least one recommended content item from a corpus of content items; determining, based at least in part on the request, the at least one recommended content item from the corpus of content items; and providing the at least one recommended content item for presentation to the user”. Examiner respectfully disagrees. In the amended claim applicant mentioned that “LLM is configured to generate an output text description from LLM input”, “the LLM output being a text-description representative of the collective plurality of content items of the session”. But there is no detailed information about what feature of the LLM is needed to be used in this invention. The way the claim language is written, any human can perform what the LLM is doing. The LLM is recited as an additional element. A human can generate a description from an input; he/she can write a description of a plurality of content items.

Applicant again argued that the claims integrate a judicial exception into a practical application of the exception and reflect an improvement to technology and thereby integrating the judicial exception into a practical application. The specification discloses that recommender systems "respond to queries with recommended content, i.e., content calculated to lead a requesting user to discovering new content”. Examiner respectfully disagrees. “Recommender system”, as added in the amended claim, can be a human. A human can recommend content to a user based on the query and available content items. The human can decide, based on different factors from the query, what to recommend to the user. There is nothing in the claim language which shows improvement to providing recommended content to a user. The use of a computer does not preclude performance of the invention via pen and paper or in a person’s mind. Also, the use of a computer or other machinery in its ordinary capacity to perform a task, or simply adding a general purpose computer to an abstract idea, does not integrate a judicial exception into a practical application. Here the computer is the machine that is merely an object on which the method operates, which does not integrate the exception into a practical application or provide significantly more. Thus, 35 U.S.C. 101 rejections of Claims 5-15 have been maintained. Newly added claims 21-30 are also rejected over 35 U.S.C. 101, since they recite features corresponding to features found within claims 5-14.

Applicant’s arguments filed 02/26/2026, with respect to claim(s) 5-14, under 35 U.S.C. 102 have been fully considered but they are not persuasive. Applicant argued that the reference Rivkin doesn’t support the amended claim, specifically "the LLM output being a text-description representative of the collective plurality of content items of the session." Examiner respectfully disagrees. Rivkin in para. [0061] recites “when a user provides a high-level depiction of a photography event, a large language model generates a natural language list of photo descriptions that a photographer would typically capture during the event.” So Rivkin does teach the amended claim limitations. For these reasons examiner believes that the previously cited prior art of record of Rivkin does teach the claimed amended limitations. Please see the rejections below.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 5-14, 21-30 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

The independent claims 5, 21, 30 recite “processing a plurality of content items of a session of a user to produce a caption”; “determining, for the plurality of content items, contextual metadata corresponding to the plurality of content items”; “providing, as part of a large language model (“LLM”) input to an LLM, the caption and at least a portion of the contextual metadata, the LLM configured to generate an output text-description from the LLM input”; “receiving from the LLM and in response to the LLM input, an LLM output, the LLM output being a text-description representative of the collective plurality of content items of the session”. The limitations of “processing ...”, “determining ...”, “providing ...”, “receiving ...”, as drafted, cover mental activities. More specifically, a human can review/process a catalogue or magazines (a plurality of content items) of certain items, determine the context and generate captions for the content items, based on the captions, context and using a knowledge base, determine/recognize a certain item is a part of the plurality of content items that were processed, someone can recommend which content item to present to user among all the content items, based on user’s criteria and can present the selection on a piece of paper. All the steps above are examples of observation and evaluation that could be performed in the human mind or with the aid of pencil and paper.

Claims recite the additional limitation of “large language model”, for performing the method. “Large language model”, as specified in para. [0088], can be GPT-4, BERT, Galactica, LaMDA, Llama, or an LLM defined and trained by the hosting service, which is not sufficient to amount to significantly more than the judicial exception. Claims 21 and 30 recite additional limitations of “processor”, “storage media”, “non transitory computer readable media”, which are not sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible. Thus, taken alone, the additional elements do not amount to significantly more than the above identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation. Claims 5, 21 and 30 are therefore not drawn to eligible subject matter as they are directed to an abstract idea without significantly more than the abstract idea.
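
For orientation, the data flow recited in independent claims 5, 21 and 30 (caption, contextual metadata, LLM input, LLM output, recommender request) can be pictured with a minimal sketch. Nothing below comes from the application or from Rivkin: every function name, the prompt layout, and the toy corpus are hypothetical stand-ins, and the "LLM" is a stub that returns a fixed string.

```python
from typing import Callable, Dict, List, Sequence


def recommend_for_session(
    session_items: Sequence[str],
    caption_fn: Callable[[Sequence[str]], str],
    metadata_fn: Callable[[Sequence[str]], Dict[str, str]],
    llm_fn: Callable[[str], str],
    recommender_fn: Callable[[str], List[str]],
) -> List[str]:
    """Chain the recited steps; every callable is a hypothetical stand-in."""
    caption = caption_fn(session_items)        # produce a caption for the session
    metadata = metadata_fn(session_items)      # determine contextual metadata
    # Assumed prompt layout: the claims only require that the LLM input
    # includes the caption and at least part of the metadata.
    llm_input = f"Caption: {caption}\nMetadata: {metadata}"
    text_description = llm_fn(llm_input)       # LLM output: a text description
    return recommender_fn(text_description)    # description serves as the request


# Toy stand-ins so the sketch runs without any model or service.
CORPUS = ["rustic kitchen shelf", "modern sofa", "rustic barn door"]


def toy_caption(items: Sequence[str]) -> str:
    return "images of " + ", ".join(items)


def toy_metadata(items: Sequence[str]) -> Dict[str, str]:
    return {"item_count": str(len(items))}


def toy_llm(prompt: str) -> str:
    return "rustic"  # fixed reply standing in for a generated description


def toy_recommender(request: str) -> List[str]:
    # Naive substring match standing in for a real recommender system.
    return [item for item in CORPUS if request in item]


result = recommend_for_session(
    ["farmhouse table", "wood beams"],
    toy_caption, toy_metadata, toy_llm, toy_recommender,
)
print(result)
```

Passing the four stages in as callables keeps the sketch neutral about which captioning model, LLM, or recommender is used, which is the point in dispute in the rejection.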

Claims 6, 22 recite “determining a first sequence of presentation corresponding to the plurality of content items”; “determining, based at least in part on the LLM output, a second plurality of content items from the corpus, wherein the second plurality of content items includes the at least one content item”; “determining, based at least in part on the first sequence, a second sequence of presentation for the second plurality of content items”; “and causing a presentation of the second plurality of content items according to the second sequence”. Determining the sequence of presentation of the different sets of content items could be performed in the human mind or with the aid of pen and paper. The claim recites the additional element of an LLM, which, as specified in para. [0088], can be GPT-4, BERT, Galactica, LaMDA, Llama, or an LLM defined and trained by the hosting service, and which is not sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.

Claims 7, 23 recite the additional limitations of “determining, based at least in part on the plurality of content items, a second plurality of content items that do not include the plurality of content items”; “processing the second plurality of content items to produce, for each content item of the second plurality of content items, a second content item caption”; “and including, as part of the LLM input, the second content item caption for each of the second plurality of content items”; “and wherein the at least one content item is included in the second plurality of content items”. Determining from different sets of content items which content item is included or not included in which set, and determining the captions of the content items, could be done in the human mind or with the aid of pen and paper. The claim recites the additional element of an LLM, which, as specified in para. [0088], can be GPT-4, BERT, Galactica, LaMDA, Llama, or an LLM defined and trained by the hosting service, and which is not sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.

Claims 8, 24 recite “wherein processing the plurality of content items includes processing the plurality of content items to produce, for each content item of the plurality of content items, a content item caption that is descriptive of the content item”; “and the method further comprising: including, as part of the LLM input, the second content item caption for each of the second plurality of content items”, where determining that the caption is a description of the content item and determining captions for other content items could be done by evaluation, observation, assessment and could be performed in the human mind or with the aid of pen and paper. The claim recites the additional element of an LLM, which, as specified in para. [0088], can be GPT-4, BERT, Galactica, LaMDA, Llama, or an LLM defined and trained by the hosting service, and which is not sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.

Claims 9, 25 recite “wherein each second content item caption for each content item of the second plurality of content items that is included in the LLM input further includes a second content item identifier corresponding to the content item”; “and wherein the LLM output includes a content item identifier that is used to determine the at least one content item”. Determining that the identifiers of the content items correspond to those content items could be done by evaluation, observations and could be performed in the human mind or with the aid of pen and paper. The claim recites the additional element of an LLM, which, as specified in para. [0088], can be GPT-4, BERT, Galactica, LaMDA, Llama, or an LLM defined and trained by the hosting service, and which is not sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.

Claims 10, 26 recite “wherein: the LLM output includes a narrative description of the plurality of content items”; “and determining the at least one content item, further includes: processing the narrative description to determine the at least one content item”, where determining that the output is a description of the content item is an observation or assessment that could be performed in the human mind or with the aid of pen and paper. The claim recites the additional element of an LLM, which, as specified in para. [0088], can be GPT-4, BERT, Galactica, LaMDA, Llama, or an LLM defined and trained by the hosting service, and which is not sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.

Claims 11, 27 recite “wherein processing the narrative description includes: converting the narrative description into an embedding”; “projecting the embedding into a multi-dimensional space that includes a plurality of embeddings corresponding to content items of the corpus of content items”; “and determining, based at least in part on the projection and the plurality of embeddings, the at least one content item”. Converting a description into a mapping of content items and determining one content item, based on that mapping, could be done by evaluation, observations and could be performed in the human mind or with the aid of pen and paper. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception, as claim 11 does not recite any additional limitations. The claims, as drafted, are not patent eligible.
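
The embedding steps recited in claims 11 and 27 (embed the description, place it in the same space as the corpus embeddings, pick the closest items) can be illustrated with a minimal nearest-neighbour sketch. It assumes cosine similarity as the closeness measure and uses hand-written three-dimensional vectors in place of a real text or image encoder; the item identifiers and vectors are hypothetical and are not taken from the application or the cited reference.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_items(
    query_embedding: Sequence[float],
    corpus_embeddings: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[str]:
    """Return the ids of the k corpus items most similar to the query embedding."""
    scored = [
        (cosine_similarity(query_embedding, vector), item_id)
        for item_id, vector in corpus_embeddings.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Hypothetical embeddings; a real system would obtain these from an encoder.
corpus = {
    "item_a": [1.0, 0.0, 0.0],
    "item_b": [0.0, 1.0, 0.0],
    "item_c": [0.7, 0.7, 0.0],
}
description_embedding = [0.9, 0.1, 0.0]  # stands in for the embedded narrative description

print(top_k_items(description_embedding, corpus, k=2))
```

With these toy vectors the query is closest to `item_a`, then `item_c`, so those two identifiers are returned in that order.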

Claims 12, 28 recite “wherein the at least one caption is a session caption that is descriptive of the plurality of content items of the session”. Determining that the session caption is describing the plurality of content items of the session could be done by evaluation, observations and could be performed in the human mind or with the aid of pen and paper. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception, as claim 12 does not recite any additional limitations. The claims, as drafted, are not patent eligible.

Claims 13, 29 recite “wherein the LLM output further includes an indication of a taste preference represented in the plurality of content items.” Determining the taste/choice preference of the user could be done by evaluating the caption, which is an observation and could be performed in the human mind or with the aid of pen and paper. The claim recites the additional element of an LLM, which, as specified in para. [0088], can be GPT-4, BERT, Galactica, LaMDA, Llama, or an LLM defined and trained by the hosting service, and which is not sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.

Claim 14 recites “wherein the LLM input is defined to include at least: the caption; an instruction to be followed by the LLM in processing the LLM input; and a response structure indicating a structure in which the LLM output is to be provided”. Determining how to process the content items could be done by evaluation, observations and could be performed in the human mind or with the aid of pen and paper. The claim recites the additional element of an LLM, which, as specified in para. [0088], can be GPT-4, BERT, Galactica, LaMDA, Llama, or an LLM defined and trained by the hosting service, and which is not sufficient to amount to significantly more than the judicial exception. The claim, as drafted, is not patent eligible.
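
The three-part LLM input recited in claim 14 (caption, instruction, response structure) can be pictured as a small structured prompt. The sketch below is only one possible layout: the JSON encoding, the field names, and the example values are assumptions made for illustration and do not appear in the application or in Rivkin.

```python
import json
from typing import Dict


def build_llm_input(
    caption: str,
    instruction: str,
    response_structure: Dict[str, str],
) -> str:
    """Assemble an LLM input containing the three elements recited in claim 14.

    The JSON layout and key names are hypothetical; the claim only requires
    that the input include a caption, an instruction, and a response structure.
    """
    payload = {
        "instruction": instruction,
        "caption": caption,
        "response_structure": response_structure,
    }
    # sort_keys makes the serialized prompt deterministic.
    return json.dumps(payload, indent=2, sort_keys=True)


llm_input = build_llm_input(
    caption="Session of farmhouse-style kitchen images",
    instruction="Describe the session as a whole in one short paragraph.",
    response_structure={"description": "string", "taste_preference": "string"},
)
print(llm_input)
```

Serializing to JSON is a design choice of this sketch; a plain-text template with labelled sections would satisfy the same three-element description.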

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 5-14, 21-30 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Rivkin et al. (US 20240114236 A1), hereinafter referenced as Rivkin.

Regarding Claim 5, Rivkin teaches a computer-implemented method, comprising: processing a plurality of content items of a session of a user to produce a caption (Rivkin: Para. [0074]-[0076], Fig. 1, to extract key event descriptors from the images, image captions are generated by using photo caption generation module 124, from the images of the user’s photo collection (plurality of content items) of photo collection module 121. Photo collection module 121 captures an event concept); determining, for the plurality of content items, contextual metadata corresponding to the plurality of content items (Rivkin: Para. [0072], [0073], key event descriptor of the photo collection may provide context for an event, textual descriptions, keywords, metadata, or specific attributes associated with an event); providing, as part of a large language model ("LLM") input to an LLM, the caption and at least a portion of the contextual metadata (Rivkin: Para. [0114], [0115], Fig. 5, In operation 506, a language model query is constructed using the extracted key event descriptor and the user query including an event description. In operation 507, the language model query is input to the language model. Para. [0125], [0126], Fig. 7A illustrates a language model query which is constructed to include a caption field, which includes image captions generated by the AI captioning model), the LLM configured to generate an output text-description from the LLM input (Rivkin: Para. [0061], When a user provides a high-level depiction of a photography event, a large language model generates a natural language list of photo descriptions that a photographer would typically capture during the event); receiving from the LLM and in response to the LLM input, an LLM output (Rivkin: Para. [0116], Fig. 5, In operation 508, a large language model output is generated as photoshoot suggestions), the LLM output being a text-description representative of the collective plurality of content items of the session (Rivkin: Para. [0061], When a user provides a high-level depiction of a photography event, a large language model generates a natural language list of photo descriptions that a photographer would typically capture during the event); and providing, to a recommender system, the text-description as a request for at least one recommended content item from a corpus of content items (Rivkin: Para. [0079], Fig. 1, The photo selection module 122 (recommender system) may conduct a comparison between the event description provided in the user query and the images stored within the photo collection database. From this process, the photo selection module 122 may identify and select at least one image within the photo collection database as a corresponding match to the event description); determining, based at least in part on the request, the at least one recommended content item from the corpus of content items (Rivkin: Para. [0180], Fig. 15, If the user query 1503 includes a request for three photos, three image-text embedding pairs with the highest similarity scores are selected, and the associated images and the corresponding photoshoot suggestions are presented to the user); and providing the at least one recommended content item for presentation to the user (Rivkin: Para. [0180], Fig. 15, If the user query 1503 includes a request for three photos, three image-text embedding pairs with the highest similarity scores are selected, and the associated images and the corresponding photoshoot suggestions are presented to the user).

Claim 21 is a system claim comprising program instructions that, when executed by the one or more processors, cause the one or more processors (Rivkin: Para. [0195], Fig. 17, processor 1720 includes one or more of a central processing unit, is able to perform control of any one or any combination of the other components of the electronic device 1700, and/or perform an operation or data processing relating to communication. The processor 1720 executes one or more programs stored in the memory 1730) to perform the steps in method claim 5 above and as such, claim 21 is similar in scope and content to claim 5 and therefore, claim 21 is rejected under similar rationale as presented against claim 5 above.

Claim 30 is a non-transitory computer readable media claim having one or more non-transitory computer readable media having instructions which, when executed by one or more computers, causes the one or more computers (Rivkin: Para. [0195], [0196], Fig. 17, The memory 1730 may include a volatile and/or non-volatile memory. The memory 1730 stores information, such as one or more of commands, data, programs (one or more instructions), applications 1734, etc., which are related to at least one other component of the electronic device 1700 and for driving and controlling the electronic device 1700. The processor 1720 executes one or more programs stored in the memory 1730) to perform the steps in method claim 5 above and as such, claim 30 is similar in scope and content to claim 5 and therefore, claim 30 is rejected under similar rationale as presented against claim 5 above.

Regarding Claim 6, Rivkin teaches the computer-implemented method of claim 5. Rivkin further teaches, further comprising: determining a first sequence of presentation corresponding to the plurality of content items (Rivkin: Para. [0180], Fig. 15, based on the user query 1503 which includes a request for three photos, three image-text embedding pairs with the highest similarity scores are selected, and the associated images and the corresponding photoshoot suggestions are presented to the user); determining, based at least in part on the LLM output, a second plurality of content items from the corpus, wherein the second plurality of content items includes the at least one content item (Rivkin: Para. [0019], [0144], Fig. 10, based on the cosine similarity score calculated between the text embedding vectors and image embedding vectors, second plurality of content items can be selected with the sequence of "opening presents" photo at the beginning); determining, based at least in part on the first sequence, a second sequence of presentation for the second plurality of content items (Rivkin: Para. [0144], Fig. 10, first sequence was T1, T2, T3, now the second sequence presentation is starting with T3 based on the similarity score); and causing a presentation of the second plurality of content items according to the second sequence (Rivkin: Para. [0080], Fig. 1, image processing module 123 may provide the processed image to the display 150 for presentation to the user. Para. [0180], Fig. 15 illustrates how image-text embedding pairs based on similarity scores can be selected, and the associated images and the corresponding photoshoot suggestions are presented to the user).

Claim 22 is a system claim performing the steps in method claim 6 above and as such, claim 22 is similar in scope and content to claim 6 and therefore, claim 22 is rejected under similar rationale as presented against claim 6 above.

Regarding Claim 7, Rivkin teaches the computer-implemented method of claim 5. Rivkin further teaches, further comprising: determining, based at least in part on the plurality of content items, a second plurality of content items that do not include the plurality of content items (Rivkin: Para. [0076], the photo caption generation module 124 generates a key event descriptor such as "clown" as an event concept, which corresponds to the image features from the image of the clown. This occurs even when the user query does not contain any terms related to "clown."); processing the second plurality of content items to produce, for at least one content item of the second plurality of content items, a second content item caption (Rivkin: Para. [0074]-[0076], Fig. 1, to extract key event descriptors from the images, image captions are generated by using photo caption generation module 124, from the images within the photo collection (plurality of content items) of photo collection module 121. Photo collection module 121 captures an event concept); and including, as part of the LLM input, the second content item caption for at least one of the second plurality of content items (Rivkin: Para. [0114], [0115], Fig. 5, In operation 506, a language model query is constructed using the extracted key event descriptor and the user query including an event description. In operation 507, the language model query is input to the language model. Para. [0125], [0126], Fig. 7A illustrates a language model query which is constructed to include a caption field, which includes image captions generated by the AI captioning model. Para. [0060], [0061], present disclosure provides an apparatus and a method for utilizing large language models); and wherein the at least one content item is included in the second plurality of content items (Rivkin: Para. [0112], [0113], if a camera captures an image of a clown at a child's birthday party, an automatic photography system dynamically adjusts its content selection to include the clown as a key event descriptor, even when the user did not specifically mention it).

Claim 23 is a system claim performing the steps in method claim 7 above and as such, claim 23 is similar in scope and content to claim 7 and therefore, claim 23 is rejected under similar rationale as presented against claim 7 above.

Regarding Claim 8, Rivkin teaches the computer-implemented method of claim 7. Rivkin further teaches, wherein processing the plurality of content items includes processing the plurality of content items to produce, for at least one content item of the plurality of content items, a content item caption that is descriptive of the content item (Rivkin: Para. [0122], the AI captioning model uses its internal language model, which can be implemented using a recurrent neural network (RNN) or a transformer-based architecture, in order to generate a caption. This language model may grasp the context of the input image and generate a coherent and contextually relevant caption in the form of a word sequence. Finally, the AI captioning model outputs this generated caption, which provides a natural language description of the content within the input image); and the method further comprising: including, as part of the LLM input, the second content item caption for at least one of the second plurality of content items (Rivkin: Para. [0124]-[0126], Figs. 7A, 7B illustrate an LLM query structure (input), where caption field includes image captions generated by the AI captioning model).

Claim 24 is a system claim performing the steps in method claim 8 above and as such, claim 24 is similar in scope and content to claim 8 and therefore, claim 24 is rejected under similar rationale as presented against claim 8 above.

Regarding Claim 9, Rivkin teaches the computer-implemented method of claim 8. Rivkin further teaches, wherein: at least one second content item caption for at least one content item of the second plurality of content items that is included in the LLM input further includes a second content item identifier corresponding to the content item (Rivkin: Para. [0073], The photo collection module 121 may use key event descriptors derived from images taken during the event as input to the language model, alongside the user query. The term "key event descriptor" may refer to a piece of information or a characteristic that serves as an identifier or descriptor for a specific event); and wherein the LLM output includes a content item identifier that is used to determine the at least one content item (Rivkin: Para. [0180], Fig. 15, an argmax function may be used to identify the image-text embedding pairs 1507 (content item) from the LLM output).

Claim 25 is a system claim performing the steps in method claim 9 above and as such, claim 25 is similar in scope and content to claim 9 and therefore, claim 25 is rejected under similar rationale as presented against claim 9 above.

Regarding Claim 10, Rivkin teaches the computer-implemented method of claim 5. Rivkin further teaches, wherein: the LLM output includes a narrative description of the plurality of content items (Rivkin: Para. [0116], Fig. 5, In operation 508, a language model output is generated as photoshoot suggestions. Para. [0057], photoshoot suggestions may refer to recommendations, guidance, or ideas proposed by an AI powered language model to assist a user in planning and conducting a photography session. It may describe a specific photography event, and may include advice on subjects and concepts to be captured (narrative description)); and determining the at least one content item, further includes: processing the narrative description to determine the at least one content item (Rivkin: Para. [0072], in response to a user query of "I am going to a birthday party. Please list three types of photos I should take," the language model may yield the following photo shoot suggestions: (1) A photo of the birthday person with their cake; (2) A photo of the birthday person opening their presents; and (3) A photo of the birthday person blowing out their candles).
Claim 26 is a system claim performing the steps in method claim 10 above and as such, claim 26 is similar in scope and content to claim 10 and therefore, claim 26 is rejected under similar rationale as presented against claim 10 above. Regarding Claim 11, Rivkin teaches the computer-implemented method of claim 10. Rivkin further teaches, wherein processing the narrative description includes: converting the narrative description into an embedding ( Rivkin: Para.[0180], Fig.15, The large language model outputs photoshoot suggestions 1504 ( narrative descriptions) are processed through a text encoder which extracts text embeddings T1, T2, . . . , TN, 1505 from the photo shoot suggestions 1504, respectively) ; projecting the embedding into a multi-dimensional space that includes a plurality of embeddings corresponding to content items of the corpus of content items ( Rivkin: Para.[0180], Fig.15, a plurality of photos 1501 within the photo collection are input into an image encoder, resulting in image embeddings I1 , I2 , ... , IM, 1502. The photoshoot suggestions 1504 are processed through a text encoder which extracts text embeddings T1, T2, . . . , TN, 1505. Image embeddings and the text embeddings creates a joint embedding space 1506); and determining, based at least in part on the projection and the plurality of embeddings, the at least one content item ( Rivkin: Para.[0180], Fig.15, image-text embedding pairs 1507 with the highest similarity scores can be selected ( content item)). Claim 27 is a system claim performing the steps in method claim 11 above and as such, claim 27 is similar in scope and content to claim 11 and therefore, claim 27 is rejected under similar rationale as presented against claim 11 above. Regarding Claim 12, Rivkin teaches the computer-implemented method of claim 5. 
Rivkin further teaches, wherein the at least one caption is a session caption that is descriptive of the plurality of content items of the session (Rivkin: Para.[0122], the AI captioning model uses its internal language model, which can be implemented using a recurrent neural network (RNN) or a transformer-based architecture, in order to generate a caption. This language model may grasp the context of the input image and generate a coherent and contextually relevant caption in the form of a word sequence. Finally, the AI captioning model outputs this generated caption, which provides a natural language description of the content within the input image).

Claim 28 is a system claim performing the steps in method claim 12 above and as such, claim 28 is similar in scope and content to claim 12 and therefore, claim 28 is rejected under similar rationale as presented against claim 12 above.

Regarding Claim 13, Rivkin teaches the computer-implemented method of claim 5. Rivkin further teaches, wherein the LLM output further includes an indication of a taste preference represented in the plurality of content items (Rivkin: Para.[0057],[0109], photoshoot suggestions (LLM output) are generated based on a user query describing a specific photography event, and may include advice on subjects and concepts to be captured (taste preference); for example, for “wedding photos”, the user’s input might be “the bride placing the ring on the groom”).

Claim 29 is a system claim performing the steps in method claim 13 above and as such, claim 29 is similar in scope and content to claim 13 and therefore, claim 29 is rejected under similar rationale as presented against claim 13 above.

Regarding Claim 14, Rivkin teaches the computer-implemented method of claim 5.
Rivkin further teaches, wherein the LLM input is defined to include at least: the caption (Rivkin: Para.[0124]-[0126], Figs. 7A, 7B illustrate an LLM query structure (input), where the caption field includes image captions generated by the AI captioning model); an instruction to be followed by the LLM in processing the LLM input (Rivkin: Para.[0124]-[0126], Figs. 7A, 7B illustrate an LLM query structure (input); the preamble field clarifies the task for the language model. The example field offers a model function as an illustrative reference. The event description field contains a user-provided event description. The background information field includes background information provided by the user); and a response structure indicating a structure in which the LLM output is to be provided (Rivkin: Para.[0124]-[0126], Figs. 7A, 7B illustrate an LLM query structure (input); the demand field provides instructions to guide the language model's output).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
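The reply-period rules in the preceding paragraph reduce to date arithmetic, which the following minimal sketch works through. It assumes the May 19, 2026 mailing date shown in the Prosecution Timeline entry for this final rejection (the action itself carries a 05/15/2026 signature date). The helper names `add_months` and `reply_window` are hypothetical, and the sketch does not model weekend or federal-holiday adjustments or extension fees.

```python
import calendar
from datetime import date
from typing import Optional, Tuple


def add_months(d: date, months: int) -> date:
    """Return the date `months` later, clamped to the last day of the target month."""
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_window(
    mailing_date: date,
    first_reply: Optional[date] = None,
    advisory_mailed: Optional[date] = None,
) -> Tuple[date, date]:
    """Return (end of shortened statutory period, six-month statutory limit)."""
    two_month_mark = add_months(mailing_date, 2)
    shortened_end = add_months(mailing_date, 3)
    statutory_limit = add_months(mailing_date, 6)

    # If a first reply came within two months and the advisory action was
    # mailed after the three-month period ended, the shortened period
    # expires on the advisory mailing date, never past the six-month limit.
    if (
        first_reply is not None
        and advisory_mailed is not None
        and first_reply <= two_month_mark
        and advisory_mailed > shortened_end
    ):
        shortened_end = min(advisory_mailed, statutory_limit)

    return shortened_end, statutory_limit


# Assumed mailing date, taken from the timeline entry for this action.
mailed = date(2026, 5, 19)
base_window = reply_window(mailed)

# Hypothetical scenario: reply on Jul 10, 2026, advisory mailed Sep 1, 2026.
late_advisory_window = reply_window(mailed, date(2026, 7, 10), date(2026, 9, 1))
```

Under that assumed mailing date, `base_window` evaluates to August 19, 2026 for the shortened statutory period and November 19, 2026 for the six-month limit. In the hypothetical late-advisory scenario the shortened period moves to September 1, 2026.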
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NADIRA SULTANA whose telephone number is (571) 272-4048. The examiner can normally be reached M-F, 7:30 am-5:00 pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Paras D. Shah, can be reached on (571) 270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NADIRA SULTANA/
Examiner, Art Unit 2653

/Paras D Shah/
Supervisory Patent Examiner, Art Unit 2653
05/15/2026
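For readers tracing the claim 11 mapping in the action above, the cited steps (text embeddings, image embeddings, a joint embedding space, and selection of the highest-similarity pair) can be illustrated with a small sketch. The toy vectors stand in for encoder outputs, cosine similarity is an assumed choice of similarity score, and the function names are hypothetical; none of this is taken from Rivkin or from the application.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_matches(
    text_embeddings: List[Sequence[float]],
    image_embeddings: List[Sequence[float]],
) -> List[Tuple[int, int, float]]:
    """For each text embedding, return (text index, best image index, score)."""
    matches = []
    for t_idx, text_vec in enumerate(text_embeddings):
        scores = [cosine(text_vec, image_vec) for image_vec in image_embeddings]
        # argmax over the similarity scores for this text embedding
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((t_idx, best, scores[best]))
    return matches


# Toy stand-ins for text embeddings T1..TN and image embeddings I1..IM,
# already expressed in the same (joint) three-dimensional space.
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_embeddings = [[0.0, 0.9, 0.1], [0.8, 0.1, 0.0], [0.0, 0.0, 1.0]]

matches = best_matches(text_embeddings, image_embeddings)
matched_pairs = [(t_idx, i_idx) for t_idx, i_idx, _ in matches]
```

With these toy vectors, the first text embedding pairs with the second image and the second text embedding pairs with the first image. A real system would use learned text and image encoders in place of the hand-written vectors.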

Prosecution Timeline

Nov 01, 2023: Application Filed
Dec 10, 2025: Non-Final Rejection mailed (§101, §102)
Feb 16, 2026: Interview Requested
Feb 24, 2026: Examiner Interview Summary
Feb 24, 2026: Applicant Interview (Telephonic)
Feb 26, 2026: Response Filed
May 19, 2026: Final Rejection mailed (§101, §102) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12639522
SYSTEMS AND METHODS FOR EMBODIED MULTIMODAL ARTIFICIAL INTELLIGENCE QUESTION ANSWERING AND DIALOGUE WITH COMMONSENSE KNOWLEDGE
3y 8m to grant Granted May 26, 2026
Patent 12626060
SYSTEMS AND METHODS FOR FACILITATING TEXT ANALYSIS
2y 8m to grant Granted May 12, 2026
Patent 12614029
INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD AND PROGRAM
4y 1m to grant Granted Apr 28, 2026
Patent 12609130
METHOD FOR ADJUSTING AUDIO FREQUENCY AND AUDIO FREQUENCY ADJUSTMENT DEVICE
3y 5m to grant Granted Apr 21, 2026
Patent 12603086
CONTEXTUAL EDITABLE SPEECH RECOGNITION METHODS AND SYSTEMS
4y 1m to grant Granted Apr 14, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 74%
With Interview: 99% (+30.4%)
Median Time to Grant: 2y 11m (~4m remaining)
PTA Risk: Moderate
Based on 101 resolved cases by this examiner. Grant probability derived from career allowance rate.