DETAILED ACTION
This communication is in response to the after-final amendment filed 01/07/26, in which claims 1, 11, and 16 were amended. Claims 1-20 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/07/26 has been entered.
Response to Arguments
Applicant’s arguments with respect to claims 1, 11, and 16 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: Claims 1-10, 11-15, and 16-20 recite a method, a system, and a non-transitory computer-readable medium, respectively. Each of these corresponds to a statutory category (process, machine, and article of manufacture, respectively).
Claim Interpretation: Under the broadest reasonable interpretation, the terms of the claim are presumed to have their plain meaning consistent with the specification as it would be interpreted by one of ordinary skill in the art. See MPEP 2111. The claim recites a communication platform, which the specification describes as a computer system comprising processors, memory, input/output peripherals, data stores, and other components. See, e.g., Spec. ¶ 29. The claim recites a theme identifier, which the detailed description describes as textual captions, summaries, descriptions of a message composition, or keywords. The claim also recites a generated content item, which the detailed description describes as including multimedia or non-multimedia items. The claim recites a message composition, which the description describes as a message drafted by a user.
Each claim is individually analyzed below according to Steps 2A and 2B of the Alice/Mayo eligibility framework.
[Claim 1/11/16] A [method/system/non-transitory computer-readable medium comprising instructions that, responsive to execution by a processing device, cause the processing device to] comprising:
receiving, by a communication platform, information identifying a plurality of recipients to receive a message of an entity, and a message composition of the entity for the message; [Step 2A Prong 2: data inputting step recited at a high level of generality and, thus, insignificant extra-solution activity. Step 2B: data inputting or receiving data has been found by courts as well understood, routine and conventional. MPEP 2106.05(d) Section II.]
determining, by the communication platform and using a first machine learning model trained to determine theme identifiers for given message compositions, one or more theme candidates each associated with the theme of the message composition; [Step 2A Prong 1: Mental process that can be performed in the human mind. Step 2A Prong 2: “by the communication platform and using a first machine learning model trained to determine theme identifiers for given message compositions” constitutes an instruction to apply the exception using generic computer components. Specifically, the limitation refers to the model as a trained model but does not expressly state a training step. Step 2B: mere instruction to apply the exception cannot provide an inventive concept.]
providing the one or more theme candidates for presentation to the entity; [Step 2A Prong 2: data outputting/insignificant extra-solution activity. Step 2B: transmitting data has been found by courts as well-understood, routine, and conventional. MPEP 2106.05(d) Section II.]
determining a theme identifier associated with a theme of the message composition based on a user input indicating one of the one or more theme candidates; [Step 2A Prong 2: data inputting constitutes insignificant extra-solution activity; the type of data does not cause the data inputting to practically integrate the judicial exception. Step 2B: receiving data has been found by courts as well-understood, routine, and conventional. MPEP 2106.05(d) Section II.]
obtaining, by the communication platform and using a second machine learning model trained to provide content item candidates for given theme identifiers, a first generated content item corresponding to the theme identifier and to a subset of the plurality of recipients; and [Step 2A Prong 1: Mental process that can be performed in the human mind. Step 2A Prong 2: “by the communication platform and using a second machine learning model trained to provide content item candidates for given theme identifiers” constitutes an instruction to apply the exception using generic computer components. Specifically, the limitation refers to the model as a trained model but does not expressly state a training step. Step 2B: mere instruction to apply the exception cannot provide an inventive concept.]
adding, by the communication platform, the first generated content item to the message composition to customize the message of the entity for the subset of the plurality of recipients, [Step 2A Prong 1: Mental process that can be performed in the human mind, or by a human using a pen and paper. Step 2A Prong 2: “by the communication platform” constitutes an instruction to apply the exception using generic computer components. Step 2B: mere instruction to apply the exception cannot provide an inventive concept.] the customized message to be transmitted to a plurality of recipient devices each associated with one of the subset of the plurality of recipients [Step 2A Prong 2: data outputting/insignificant extra-solution activity. Step 2B: transmitting data has been found by courts as well-understood, routine, and conventional. MPEP 2106.05(d) Section II.].
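For illustration only, the sequence of steps recited in claim 1 may be summarized in the following minimal sketch (Python); every object, function, and variable name below is hypothetical and appears nowhere in the claims, the specification, or the cited art:

```python
# Illustrative sketch of the recited flow; all names are hypothetical.
from dataclasses import dataclass

@dataclass
class Message:
    recipients: list   # plurality of recipients identified in the received information
    composition: str   # message composition of the entity

def customize_message(msg, theme_model, content_model, present_to_entity, segment):
    # First machine learning model: determine one or more theme candidates.
    theme_candidates = theme_model.predict(msg.composition)
    # Provide the candidates for presentation; the user input selects one
    # candidate, which becomes the theme identifier.
    theme_identifier = present_to_entity(theme_candidates)
    # Second machine learning model: obtain a generated content item for the
    # theme identifier and for a subset of the recipients.
    subset = segment(msg.recipients)
    content_item = content_model.generate(theme_identifier, subset)
    # Add the generated content item to the composition to customize the message.
    customized = msg.composition + "\n" + content_item
    return customized, subset   # customized message to be transmitted to the subset's devices
```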
[Claim 2/12] The method of claim 1, wherein determining the theme identifier further comprises:
receiving the user input indicating the one of the one or more theme candidates [Step 2A Prong 2: data inputting step recited at a high level of generality and, thus, insignificant extra-solution activity; the type of data does not cause the data inputting to practically integrate the judicial exception. Step 2B: data inputting or receiving data has been found by courts as well understood, routine and conventional.].
[Claim 3/13/18] The method of claim 2, wherein generating the one or more theme candidates further comprises:
providing content of the message composition as input to the first machine learning model; and [Step 2A Prong 2: data inputting step recited at a high level of generality and, thus, insignificant extra-solution activity. Step 2B: data inputting/transmitting has been found by courts as well understood, routine, and conventional]
obtaining an output of the first machine learning model, the output indicating the one or more theme candidates [Step 2A Prong 2: data outputting step recited at a high level of generality and, thus, insignificant extra-solution activity. Step 2B: data outputting has been found by courts as well understood, routine, and conventional].
[Claim 4] The method of claim 1, wherein the first generated content item is a first generated multimedia content item, [Step 2A Prong 1: Mental process that can be performed in the human mind, or by a human using a pen and paper.] and wherein obtaining the first generated multimedia content item further comprises:
generating one or more multimedia content item candidates using the second machine learning model, wherein each multimedia content item candidate of the one or more multimedia content item candidates is associated with the theme identifier; [Step 2A Prong 1: mental process. Step 2A Prong 2: Using a machine learning model links the exception to a technological environment and/or constitutes mere instructions to apply the exception. Step 2B: Mere instructions to apply the exception or linking the exception to a technological environment cannot provide an inventive concept.]
providing the one or more multimedia content item candidates for presentation to the entity; and [Step 2A Prong 2: data outputting step recited at a high level of generality and, thus, insignificant extra-solution activity. Step 2B: data outputting has been found by courts as well understood, routine, and conventional]
receiving user input indicating the first generated multimedia content item, the first generated multimedia content item corresponding to one of the one or more multimedia content item candidates [Step 2A Prong 2: data inputting step recited at a high level of generality and, thus, insignificant extra-solution activity. Step 2B: data inputting/transmitting has been found by courts as well understood, routine, and conventional].
[Claim 5] The method of claim 4, wherein generating the one or more multimedia content item candidates further comprises:
providing the theme identifier as input to the second machine learning model; and [Step 2A Prong 2: data inputting step recited at a high level of generality and, thus, insignificant extra-solution activity. Step 2B: data inputting/transmitting has been found by courts as well understood, routine, and conventional]
obtaining an output of the second machine learning model, the output indicating the one or more multimedia content item candidates [Step 2A Prong 2: data outputting step recited at a high level of generality and, thus, insignificant extra-solution activity. Step 2B: data outputting has been found by courts as well understood, routine, and conventional].
[Claim 6/19] The method of claim 4, further comprising:
upon providing the one or more multimedia content item candidates for presentation to the entity, receiving user input indicating an updated theme identifier; [Step 2A Prong 2: data inputting step recited at a high level of generality and, thus, insignificant extra-solution activity. Step 2B: data inputting/transmitting has been found by courts as well understood, routine, and conventional]
generating one or more updated multimedia content item candidates using the second machine learning model, wherein each updated multimedia content item candidate of the one or more updated multimedia content item candidates is associated with the updated theme identifier; and [Step 2A Prong 1: generating updated multimedia content item candidates associated with the updated theme identifier describes a mental process. Step 2A Prong 2: second machine learning model is recited at a high level of generality and, thus, constitutes mere instructions to apply the exception or to link the exception to a technological environment. Step 2B: Mere instructions to apply the exception or to link the exception to a technological environment cannot provide an inventive concept]
providing the one or more updated multimedia content item candidates for presentation to the entity [Step 2A Prong 2: data outputting step recited at a high level of generality and, thus, insignificant extra-solution activity. Step 2B: data outputting has been found by courts as well understood, routine, and conventional].
[Claim 7/14] The method of claim 1, wherein adding the first generated content item to the message composition further comprises:
identifying a template field of the message composition; and [Step 2A Prong 1: Mental process that can be performed in the human mind, or by a human using a pen and paper.]
replacing the template field with the first generated content item for a first recipient segment of the plurality of recipients [Step 2A Prong 1: Mental process that can be performed in the human mind, or by a human using a pen and paper.].
[Claim 8/15] The method of claim 7, wherein the template field further indicates a plurality of recipient segments of the plurality of recipients, [Step 2A Prong 2: data outputting/presentation step recited at a high level of generality and thus constitutes insignificant extra-solution activity. Step 2B: data outputting has been found by courts as well understood, routine, and conventional] the method further comprising replacing the template field with a second generated content item for a second recipient segment of the plurality of recipients [Step 2A Prong 1: Mental process that can be performed in the human mind, or by a human using a pen and paper.].
[Claim 9/20] The method of claim 1, wherein the message composition is an electronic mass communication message template, [Step 2A Prong 2: Generally linking the exception to a particular technological field of use (electronic mass messaging). Step 2B: Generally linking the exception to a particular technological field does not provide an inventive concept.] the theme identifier is a descriptive text caption, and the first generated content item is an image [Step 2A Prong 1: Mental process that can be performed in the human mind, or by a human using a pen and paper.].
[Claim 10] The method of claim 1, wherein the first machine learning model is a transformer machine learning model, and the second machine learning model is a diffusion machine learning model [Step 2A Prong 2: Generally linking the exception to a particular technological field of use (machine learning/artificial intelligence). Step 2B: Generally linking the exception to a particular technological field does not provide an inventive concept.].
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
Claims 1, 2, 4-6, 9, 11, 12, 16, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Bojja (US 2017/0185581 A1; published Jun. 29, 2017) in view of Manico (US 2008/0155422 A1; published Jun. 26, 2008), Zhu, Lixing, et al., "Topic-driven and knowledge-aware transformer for dialogue emotion detection," arXiv preprint arXiv:2106.01071 (2021) ("Zhu"), Ebrahimian, Z., Toosi, R., and Akhaee, M.A., "Multinomial emoji prediction using deep bidirectional transformers and topic modeling," 2022 30th International Conference on Electrical Engineering (ICEE), May 17, 2022, pp. 272-277, IEEE ("Ebrahimian"), and Roy (US 2016/0171560 A1; published Jun. 16, 2016).
Regarding claim 1, Bojja discloses [a] method comprising:
receiving, by a communication platform, information identifying a plurality of recipients to receive a message of an entity, and a message composition of the entity for the message; (Bojja ¶ 3 (“Emoji are available for use through a variety of digital devices (e.g., mobile telecommunication devices and tablet computing devices) and are often used when drafting personal e-mails, posting messages on the Internet (e.g., on a social networking site or a web forum), and messaging between mobile devices.”), Bojja ¶ 5 (“Implementations of the systems and methods described herein can be used to suggest one or more emoji to users for insertion into, or to replace content in, documents and electronic communications. Content can include text (e.g., words, phrases, abbreviations, characters, and/or symbols), emoji, images, audio, video, and combinations thereof. [. . .] For example, content can be analyzed by the system as a user types or enters the content.”), Bojja ¶ 7 (“The mixture of text and emoji provides a new communication paradigm that can serve as a messaging platform for use with various clients and for various purposes, including gaming, text messaging, and chat room communications.”), Bojja ¶ 33 (“FIG. 2 illustrates an example method 200 that uses the system 100 to suggest emoji for insertion into a communication. The method 200 begins by obtaining (step 202) features associated with a communication (e.g., an electronic message) of a user. The features can include, for example, a cursor position in the content, one or more words from the communication, one or more words from a previous communication, a user preference (e.g., preferred instances when emoji are to be used, preferred specific emoji, preferred types of emoji, or preferred categories of emoji), and/or demographic information (e.g., an age, gender, ethnicity, income, or citizenship of the user and/or a recipient).”)).
Although Bojja teaches identifying tags, keywords, or phrases (“theme identifier”) to map to emojis using learning models, see ¶ 13, ¶ 21, Bojja does not expressly disclose determining, by the communication platform and using a first machine learning model…one or more theme candidates each associated with the theme of the message composition; (but see Manico ¶ 70 (“At decision 680 an analysis of the combined metadata set and selected assets is performed to determine if an appropriate theme can be suggested. For example, if the face recognition algorithm identifies "Molly" and the user's profile indicates that "Molly" is the user's daughter. The user profile can also contain information that last year at this time the user produced a commemorative DVD of "Molly's 4th Birthday Party". Dynamic themes can be provided to automatically customize a generic theme such as "Birthday" with additional details. If image templates are used in the theme that can be modified with automatic "fill in the blank" text and graphics this would enable changing "Happy Birthday" to "Happy 5.sup.th Birthday Molly" without requiring user intervention.”))
providing the one or more theme candidates for presentation to the entity; (but see Manico ¶ 70 (“Box 690 is included in step 680 and contains a list of available themes, which can be provided locally via a removable memory device such as a memory card or DVD or via a network connection to a service provider. Third party participants and copyrighted content owners can also provide themes on a pay per use type arrangement. The combined input and derived metadata, the analysis and classification algorithm output, and organized asset collection is used to limit the user's choices to themes that are appropriate for the content of the assets and compatible with the asset types. At step 200 the user has the option to accept or reject the suggested theme. If no theme is suggested at step 680 or the user decides to reject the suggested theme at step 200, she is given the option to manually select a theme from a limited list of themes or from the entire available library of available themes at step 210.”))
determining a theme identifier associated with a theme of the message composition based on a user input indicating one of the one or more theme candidates; (but see Manico ¶ 71 (“A selected theme is used in conjunction with the metadata to acquire theme specific third party assets and effects. At step 220 this additional content and treatments can be provided by a removable memory device or can be accessed via a communication network from a service provider or via pointers to a third party provider. Arrangements between various participants concerning revenue distribution and terms for usage of these properties can be automatically monitored and documented by the system based on usage and popularity. These records can also be used to determine user preferences so that popular theme specific third party assets and effects can be ranked higher or given a higher priority increasing the likelihood of consumer satisfaction. These third party assets and effects include dynamic auto-scaling image templates, automatic image layout algorithms, video scene transitions, scrolling titles, graphics, text, poetry, music, songs, digital motion and still images of celebrities, popular figures, and cartoon characters all designed to be used in conjunction with user generated and/or acquired assets. The theme specific third party assets and effects as a whole are suitable for both hardcopy such as greeting cards, collages, posters, mouse pads, mugs, albums, calendars, and soft copy such as movies, videos, digital slide shows, interactive games, websites, DVDs, and digital cartoons. The selected assets and effects can be presented to the user, for her approval, as set of graphic images, a story board, a descriptive list, or as a multimedia presentation.”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bojja to incorporate the teachings of Manico to automatically find and present a theme to the user for selection and applying a thematic treatment to the communication to provide a compelling visual story, at least because doing so would be “beneficial because of the almost unlimited thematic treatments available today for consumer-generated assets.” Manico ¶ 70.
Bojja and Manico do not expressly disclose a machine learning model trained to determine theme identifiers for given message compositions (but see Zhu p. 2 1st column (“Recently, the Transformer architecture (Vaswani et al., 2017) has empowered language models to transfer large quantities of data to low-resource domains, making it viable to discover topics in conversational texts. In this paper, we propose to add an extra layer to the pre-trained language model to model the latent topics, which is learned by fine-tuning on dialogue datasets to alleviate the data sparsity problem. Inspired by the success of Transformers, we use the Transformer EncoderDecoder structure to perform the Seq2Seq prediction in which an emotion label sequence is predicted given an utterance sequence (i.e., each utterance is assigned with an emotion label).”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bojja to incorporate the teachings of Zhu to use a transformer model to discover topics in the text, at least because doing so would enable automatic emotion detection, which is important for the development of empathetic conversational agents. See Zhu pg. 1.
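By way of a non-limiting illustration, topic discovery over a message composition with a pre-trained Transformer of the kind Zhu describes could be approximated with an off-the-shelf zero-shot classifier; the checkpoint, example text, and candidate labels below are placeholders and are not Zhu's model:

```python
# Illustration only; Zhu adds a latent-topic layer to a pre-trained Transformer,
# which is not reproduced here. The checkpoint and labels are placeholders.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

composition = "Join us to celebrate Molly's fifth birthday this Saturday!"
candidate_themes = ["birthday", "wedding", "graduation", "product sale"]
result = classifier(composition, candidate_labels=candidate_themes)
theme_candidates = list(zip(result["labels"], result["scores"]))  # ranked theme candidates
```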
Bojja further discloses obtaining, by the communication platform and using a second machine learning model…a first generated content item corresponding to the theme identifier and to a subset of the plurality of recipients; and (Bojja ¶ 33 (“FIG. 2 illustrates an example method 200 that uses the system 100 to suggest emoji [“generated content item”] for insertion into a communication. The method 200 begins by obtaining (step 202) features associated with a communication (e.g., an electronic message) of a user. The features can include, for example, a cursor position in the content, one or more words from the communication [“theme identifier”], one or more words from a previous communication, a user preference (e.g., preferred instances when emoji are to be used, preferred specific emoji, preferred types of emoji, or preferred categories of emoji), and/or demographic information (e.g., an age, gender, ethnicity, income, or citizenship of the user and/or a recipient [“subset”]). Other suitable features are possible. The features are provided (step 204) to the emoji detection module 116, which preferably employs a plurality of emoji detection methods to identify candidate emoji that might be appropriate for the communication. Output from the emoji detection module 116 is provided (step 206) to the emoji classifier module 118, where one or more classifiers process the output from the emoji detection module and provide (step 208) suggested emoji for the communication. The suggested emoji [content item candidates] can be identified with the assistance of the manager module 120, which can select particular emoji detection methods and/or classifiers to use based on various factors, including, for example, a linguistic domain (e.g., gaming, news, parliamentary proceedings, politics, health, travel, web pages, newspaper articles, and microblog messages) [“subset”], a language used in the communication, one or more user preferences, and the like. The linguistic domain may define or include, for example, words, phrases, sentence structures, or writing styles that are unique or common to particular types of subject matter and/or to users of particular communication systems. For example, gamers may use unique terminology, slang, or sentence structures when communicating with one another in a game environment, whereas newspaper articles or parliamentary proceedings might have a more formal tone with well-structured sentences and/or different terminology.”)).
Bojja does not expressly disclose that the classifier models providing suggested emoji are trained to provide content item candidates for given theme identifiers (but see Ebrahimian Abstract (“As social networks grow more prominent, emojis become highly popular with a huge number of users. The primary function of emojis is to fill in emotional cues that might otherwise be missing from textual communication. These days, emojis are considered to be a large part of popular culture around the world. As a consequence, proper use of emojis in text messages will make you more friendly. Recently, emoji prediction becomes a challenging task, due to the large number of classes and the lack of suitable datasets. In this paper, we propose a multimodal emoji prediction model using the contextual information and the visual information. We used the EfficientNetB7 network to extract information from the images. EfficientNetB7 gives us the 10 most likely classes for each image. Also, we used Latent Dirichlet Allocation (LDA) as topic modeling to find hidden topics [theme identifiers] in the text. The topics extracted by LDA are then combined with the BERT network to improve the performance of this network.”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bojja to incorporate the teachings of Ebrahimian to predict emoji with a multimodal prediction model using contextual information and visual information, at least because doing so would enable recommending an emoji based on the input text of the user. See Ebrahimian Section 1, 2nd paragraph (“Emoji prediction model could offer benefits for various models. They could be used as recommender in messaging apps, i.e. they recommend an emoji based on the input text of the user. In text generation models, we could use them to enhance the output with emojis. This property is probably becoming more important in chat bots. Since emojis usually reflect an emotion, they could also be useful in the text emotion recognition systems. Sometimes emojis reflect an activity, country flag, objects, etc., so they could be useful for various text classification tasks.”)).
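For illustration, LDA-based extraction of hidden topics from message text, as described in Ebrahimian's abstract, might be sketched as follows; the toy corpus and parameters are hypothetical, and the combination with BERT features is omitted:

```python
# Illustration only of LDA-style hidden-topic extraction; corpus and parameters
# are hypothetical, and Ebrahimian's BERT combination step is not reproduced.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "happy birthday cake balloons party",
    "big sale discount coupon store",
    "vacation beach flight hotel summer",
]
counts = CountVectorizer().fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)
topic_distribution = lda.transform(counts[:1])  # hidden-topic mixture for one message
# In Ebrahimian, such topic features are combined with BERT outputs to rank emoji.
```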
Bojja further discloses adding, by the communication platform, the first generated content item to the message composition to customize the message of the entity for [the subset of] the plurality of recipients, the customized message to be transmitted to a plurality of recipient devices each associated with one of [the subset of] the plurality of recipients (Bojja ¶ 33 (“Finally, at least one of the suggested emoji is inserted (step 210) into the communication. The emoji can be inserted into the communication automatically and/or be selected by the user for insertion. The inserted emoji can replace one or more words or phrases in the communication.”)).
Although Bojja teaches generating suggested emoji using a classifier model based on user-specific and demographic information for insertion in a message such as when drafting a personal email, see ¶ 3, ¶ 33, Bojja does not expressly disclose inserting the emoji into an email for a subset of the plurality of recipients (but see Roy ¶ 18 (“The message skeleton is a basic version of the advertising message, typically without modifiers, that will be personalized for each segment. The message skeleton is evaluated to identify transformation points where keywords may be modified by the insertion of segment-specific modifiers. Extracted modifiers are then evaluated for insertion at the transformation points. If a modifier, which sufficiently expresses the desired sentiment in the language model of a segment, is found, the modifier is inserted to modify a keyword to personalize the message skeleton for the target segment. Personalized messages produced in this manner are then used by the marketer for communications to customers in the segment of the targeted campaign.”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bojja to incorporate the teachings of Roy to insert the suggested emoji to personalize the message for a target segment, at least because doing so would enable personalizing textual messages based on demographic characteristics. See Roy ¶ 20 (“Although linguistic personalization of advertising messages for a product is described herein, it should be understood that the techniques herein are applicable to personalizing messages for a set of products, as well as a single product. Further, the techniques for linguistic personalization of messages are described in the context of personalizing advertising messages for targeted advertising campaigns; however, these techniques are generally applicable to personalizing textual messages based on demographic characteristics in any context. Consequently, performance of the example procedures is not limited to the advertising messages and targeted marketing campaigns.”).
Claims 11 and 16 are apparatus and CRM claims corresponding to claim 1 and, therefore, are similarly rejected.
Regarding claim 2, Bojja, in view of Manico, Zhu, Ebrahimian, and Roy, discloses the invention of claim 1 as discussed above. Bojja does not expressly disclose wherein determining the theme identifier further comprises:
receiving the user input indicating the one of the one or more theme candidates (but see Manico ¶ 70 (“Box 690 is included in step 680 and contains a list of available themes, which can be provided locally via a removable memory device such as a memory card or DVD or via a network connection to a service provider. Third party participants and copyrighted content owners can also provide themes on a pay per use type arrangement. The combined input and derived metadata, the analysis and classification algorithm output, and organized asset collection is used to limit the user's choices to themes that are appropriate for the content of the assets and compatible with the asset types. At step 200 the user has the option to accept or reject the suggested theme. If no theme is suggested at step 680 or the user decides to reject the suggested theme at step 200, she is given the option to manually select a theme from a limited list of themes or from the entire available library of available themes at step 210.”)).
The rationale for combining Bojja with Manico is the same as set forth earlier.
Claim 12 is an apparatus claim corresponding to claim 2 and, therefore, is similarly rejected.
Regarding claim 4, Bojja, in view of Manico, Zhu, Ebrahimian, and Roy, discloses the invention of claim 1 as discussed above. Bojja further discloses wherein the first generated content item is a first generated multimedia content item, (Bojja ¶ 5 (“Implementations of the systems and methods described herein can be used to suggest one or more emoji to users…”)) and wherein obtaining the first generated multimedia content item further comprises:
generating one or more multimedia content item candidates using the second machine learning model, wherein each multimedia content item candidate of the one or more multimedia content item candidates is associated with the theme identifier; (Bojja ¶ 102 (“When a user starts typing a word, for example, the language model can predict or suggest emoji, based on the partially typed word. The language model can preferably rank any emoji suggestions from a group of possible suggestions, and the highest ranked suggestion can be presented at or near a cursor position, for possible selection by the user.”))
providing the one or more multimedia content item candidates for presentation to the entity; and (Bojja ¶ 5 (“…the system can provide emoji suggestions to the user in real-time or near real-time.”))
receiving user input indicating the first generated multimedia content item, the first generated multimedia content item corresponding to one of the one or more multimedia content item candidates (Bojja ¶ 5 (“The user may then select one of the emoji suggestions, and the emoji of the suggestion can be inserted into the content at the appropriate location (e.g., at or near a current input cursor position) or can replace a portion of the content.”)).
Regarding claim 5, Bojja, in view of Manico, Zhu, Ebrahimian, and Roy, discloses the invention of claim 4 as discussed above. Bojja further discloses wherein generating the one or more multimedia content item candidates further comprises: providing the theme identifier as input to the second machine learning model; and obtaining an output of the second machine learning model, the output indicating the one or more multimedia content item candidates (Bojja ¶ 102 (“When a user starts typing a word, for example, the language model can predict or suggest emoji, based on the partially typed word. The language model can preferably rank any emoji suggestions from a group of possible suggestions, and the highest ranked suggestion can be presented at or near a cursor position, for possible selection by the user.”)).
Regarding claim 6, Bojja, in view of Manico, Zhu, Ebrahimian, and Roy, discloses the invention of claim 4 as discussed above. Bojja further discloses upon providing the one or more multimedia content item candidates for presentation to the entity, receiving user input indicating an updated theme identifier; (Bojja ¶ 97 (user enters the word “police” and receives a police emoji character and then the user types “gear” (“updated theme identifier”))
generating one or more updated multimedia content item candidates using the second machine learning model, wherein each updated multimedia content item candidate of the one or more updated multimedia content item candidates is associated with the updated theme identifier; and (Bojja ¶ 97 (upon entering the word “gear” the user receives additional emoji suggestions (“updated multimedia content item candidates”))
providing the one or more updated multimedia content item candidates for presentation to the entity (Bojja ¶ 97 (emojis are suggested to user for the word “gear”)).
Claim 19 is a CRM claim corresponding to claim 6 and, therefore, is rejected similarly.
Regarding claim 9, Bojja, in view of Manico, Zhu, Ebrahimian, and Roy, discloses the invention of claim 1 as discussed above. Bojja further discloses wherein the message composition is an electronic mass communication message template, (Bojja ¶ 29 (“The given content can be within an electronic document, an electronic message, or other electronic communication.”)) . . . and the first generated content item is an image (Bojja ¶ 29 (suggest emoji to users)). Bojja does not expressly disclose the theme identifier is a descriptive text caption. However, Leydon teaches identifying a context of a user input by semantically analyzing the segments of interest present in the input: “To determine a context of one or more segments of interest, the segment analysis module 206 may semantically analyze the segments of interest present in the input field. Those of skill in the art will appreciate that the semantic analysis of segments may be performed in accordance with one or more techniques known in the art. When analyzing the context of one or more segments of interest, the segment analysis module 206 may determine a subtext or a meaning for the segments of interest. Based on the subtext or meaning identified for the segments of interest, the emoticon suggestion system 200 may identify one or more candidate emoticons for suggestion. The subtext of a segment of the interest may identify a mood or an emotion for that segment of interest. Example subtexts for segments of interest may include, without limitation, happiness, sadness, indifference, anger, resentment, contrition, or excitement. The meaning for segments of the interest may identify an explicit meaning for segments of interest. For example, where a segment of interest recites “I just got a new job!,” the segment analysis module 206 may identify the meaning for the segment of interest as “new job.” Leydon ¶ 47.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bojja to incorporate the teachings of Leydon to explicitly identify a subtext of the user input, at least because doing so would enable users to browse and select appropriate emoticons for a given context. Leydon ¶ 4.
Claim 20 is a CRM claim corresponding to claim 9 and, therefore, is rejected similarly.
Regarding claim 17, Bojja, in view of Manico, Zhu, Ebrahimian, and Roy, discloses the invention of claim 16 as discussed above. Bojja does not expressly disclose wherein the first generated content item is a first generated multimedia content item, (but see Manico ¶ 71 (“At decision step 230 the user is given the option to accept or reject the theme specific assets and effects and if she chooses to reject them, the system presents an alternative set of assets and effects for approval or rejection at step 250. Once the user accepts the theme specific third party assets and effects at step 230, they are combined with the organized user assets at step 240 and the preview module is initiated at step 260.”)) and wherein obtaining the first generated multimedia content item further comprises:
generating one or more multimedia content item candidates using the second machine learning model, wherein each multimedia content item candidate of the one or more multimedia content item candidates is associated with the theme identifier; (but see Manico ¶ 71 (“A selected theme is used in conjunction with the metadata to acquire theme specific third party assets and effects. At step 220 this additional content and treatments can be provided by a removable memory device or can be accessed via a communication network from a service provider or via pointers to a third party provider. Arrangements between various participants concerning revenue distribution and terms for usage of these properties can be automatically monitored and documented by the system based on usage and popularity. These records can also be used to determine user preferences so that popular theme specific third party assets and effects can be ranked higher or given a higher priority increasing the likelihood of consumer satisfaction. These third party assets and effects include dynamic auto-scaling image templates, automatic image layout algorithms, video scene transitions, scrolling titles, graphics, text, poetry, music, songs, digital motion and still images of celebrities, popular figures, and cartoon characters all designed to be used in conjunction with user generated and/or acquired assets. The theme specific third party assets and effects as a whole are suitable for both hardcopy such as greeting cards, collages, posters, mouse pads, mugs, albums, calendars, and soft copy such as movies, videos, digital slide shows, interactive games, websites, DVDs, and digital cartoons.”))
providing the one or more multimedia content item candidates for presentation to the entity; and (but see Manico ¶ 71 (“The selected assets and effects can be presented to the user, for her approval, as set of graphic images, a story board, a descriptive list, or as a multimedia presentation. At decision step 230 the user is given the option to accept or reject the theme specific assets and effects and if she chooses to reject them, the system presents an alternative set of assets and effects for approval or rejection at step 250.”))
receiving user input indicating the first generated multimedia content item, the first generated multimedia content item corresponding to one of the one or more multimedia content item candidates (but see Manico ¶ 71 (“Once the user accepts the theme specific third party assets and effects at step 230, they are combined with the organized user assets at step 240 and the preview module is initiated at step 260.”)).
The rationale for combining Bojja with Manico is the same as set forth earlier.
Claims 3, 13, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Bojja, Manico, Zhu, Ebrahimian, and Roy as applied to claims 2, 12, and 17 above.
Regarding claim 3, Bojja, in view of Manico, Zhu, Ebrahimian, and Roy, discloses the invention of claim 2 as discussed above. Bojja does not expressly disclose wherein generating the one or more theme candidates further comprises: providing content of the message composition as input to the first machine learning model; and obtaining an output of the first machine learning model, the output indicating the one or more theme candidates. However, Zhu teaches using a transformer to label a word collection with a dominant theme name. Abstract, page 8 of 12. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bojja to incorporate the teachings of Zhu to analyze the input text to determine a dominant theme, at least because doing so would enable a user to correct the keyword as input to the emoji selection process.
Claims 13 and 18 are apparatus and CRM claims corresponding to claim 3 and, therefore, are rejected similarly.
Claims 7, 8, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Bojja, Manico, Zhu, Ebrahimian, and Roy as applied to claims 1 and 11 above, and further in view of Desjardins (US 2018/0356957 A1; published Dec. 13, 2018).
Regarding claim 7, Bojja, in view of Manico, Zhu, Ebrahimian, and Roy, discloses the invention of claim 1 as discussed above. Bojja does not expressly disclose wherein adding the first generated content item to the message composition further comprises: identifying a template field of the message composition; and replacing the template field with the first generated content item for a first recipient segment of the plurality of recipients. However, Desjardins teaches determining placement positions for insertion of an emoji in a message thread based on a result of a contextual analysis and automatically inserting the emoji at the determined position. ¶ 12. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bojja to incorporate the teachings of Desjardins to automatically insert emojis at determined positions in a message thread between users, at least because doing so would improve user interaction for suggestion and placement of emojis. See Desjardins ¶ 4.
Claim 14 is an apparatus claim corresponding to claim 7 and, therefore, is rejected similarly.
Regarding claim 8, Bojja, in view of Manico, Zhu, Ebrahimian, Roy, and Desjardins, discloses the invention of claim 7 as discussed above. Bojja does not expressly disclose wherein the template field further indicates a plurality of recipient segments of the plurality of recipients, the method further comprising replacing the template field with a second generated content item for a second recipient segment of the plurality of recipients. However, Desjardins teaches determining placement positions for insertion of an emoji in a message thread based on a result of a contextual analysis and automatically inserting the emoji at the determined position. ¶ 12. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bojja to incorporate the teachings of Desjardins to automatically insert emojis at determined positions in a message thread between users, at least because doing so would improve user interaction for suggestion and placement of emojis. See Desjardins ¶ 4.
Claim 15 is an apparatus claim corresponding to claim 8 and, therefore, is rejected similarly.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Bojja, Manico, Zhu, Ebrahimian, and Roy as applied to claim 1 above, and further in view of Gehrman (US 2021/0192126 A1; published Jun. 24, 2021) and Ghosh (US 2024/0062008 A1; published Feb. 22, 2024).
Regarding claim 10, Bojja, in view of Manico, Zhu, Ebrahimian, and Roy, discloses the invention of claim 1 as discussed above. Bojja does not expressly disclose wherein the first machine learning model is a transformer machine learning model. However, Gehrman teaches generating structured text summaries based on a document/text segment utilizing a sequence-to-sequence transformer model. ¶ 27. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bojja to incorporate the teachings of Gehrman to generate a text summary corresponding to a theme of user-inputted text using a sequence-to-sequence transformer model, at least because doing so would enable focusing on a particular theme of the input text. Gehrman ¶ 3.
Bojja does not expressly disclose the second machine learning model is a diffusion machine learning model. However, Ghosh teaches a text-to-image machine-learning model that converts input text to an image, the text-to-image machine-learning model being a latent diffusion model. ¶ 49. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bojja to incorporate the teachings of Ghosh to generate emoji corresponding to a detected keyword in the user input at least because doing so would enable enhancing electronic text messages with related images such as emoji. Ghosh ¶ 2.
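For illustration, text-to-image generation with a latent diffusion model of the kind Ghosh describes might look like the following sketch; the checkpoint, prompt, and file name are placeholders and are not Ghosh's model:

```python
# Illustration only; the checkpoint below is a public placeholder, not Ghosh's model.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
theme_identifier = "Happy 5th Birthday Molly"    # descriptive text caption
image = pipe(theme_identifier).images[0]         # generated content item (image)
image.save("generated_content_item.png")
```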
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Jia et al. (US 11,822,612 B2), Automatic Identification of Additional Content for Webpages.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAHID KHAN whose telephone number is (571)270-0419. The examiner can normally be reached Monday through Friday, 9:00 a.m. to 5:00 p.m. ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Jung can be reached at (571)270-3779. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHAHID K KHAN/Primary Examiner, Art Unit 2146