Last updated: May 29, 2026
Application No. 18/381,105
ADAPTIVE SUGGESTIONS FOR STICKERS

Final Rejection: §101, §102, §103
Filed: Oct 17, 2023
Priority: Jun 02, 2023 (provisional 63/470,829)
Examiner: KY, KEVIN
Art Unit: 2671
Tech Center: 2600 (Communications)
Assignee: Apple Inc.
OA Round: 2 (Final)
Interview: Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 77% grant rate with a +25.6% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 560 resolved cases, 2023–2026
Examiner Intelligence

KY, KEVIN
Career Allowance Rate: 77% (above average)
429 granted / 560 resolved
+14.6% vs TC avg
Interview Lift: +25.6% (strong), resolved cases with an interview vs. without
Typical timeline: 2y 6m average prosecution; 21 currently pending
Career history: 584 total applications across all art units
Statute-Specific Performance

§101: 13.0% (-27.0% vs TC avg)
§103: 74.8% (+34.8% vs TC avg)
§102: 5.6% (-34.4% vs TC avg)
§112: 4.0% (-36.0% vs TC avg)
Based on career data from 560 resolved cases; the Tech Center average is an estimate.
Office Action

§101 §102 §103
DETAILED ACTION
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception without significantly more. Under Step 2A, Prong One, the claims recite limitations that fall under the abstract-idea groupings of “Certain Methods of Organizing Human Activity” (e.g., concepts relating to managing human behavior) and “Mental Processes” (e.g., concepts performed in the human mind, including an observation, evaluation, judgment, or opinion). Specifically, the claims recite various steps of human activity that can be performed in the mind or with pen and paper, such as receiving an image and text input, generating tags or embeddings, storing tags, and selecting/providing an image based on a comparison. These limitations describe organizing, storing, and retrieving information, which are mental processes and certain methods of organizing human activity, and therefore abstract ideas.
Under step 2A, prong two, this judicial exception is not integrated into a practical application. The claim does not recite any specific computing implementation, does not require any particular algorithm, model or hardware configuration, and lacks technical improvement to image processing. Furthermore, the judicial exception is not integrated into a practical application because the claims are directed to an abstract idea with additional generic computer elements (e.g. processor, memory, computer storage medium, etc.), which are generically recited computer elements that do not add a meaningful limitation to the abstract idea because they amount to simply implementing the abstract idea on a computer.
Under step 2B, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because these are well-understood, routine, conventional computer functions as recognized by the court decisions listed in MPEP § 2106.05(d). The claims do not include an inventive concept that is sufficient to transform the abstract idea into a patent-eligible application.
The dependent claims are considered an abstract idea for the same reasons as listed above (e.g. human activity, mental process), or the claims are not considered to be significantly more. For example, claims 2, 12, and 21 recite nothing more than limiting the abstract idea to a particular environment or field of use. Claims 3 and 13 reflect a conventional manner of receiving input on a generic device, which does not add a meaningful limitation to the abstract idea. Claims 4 and 15 recite a computer vision model, but this amounts to a known or generic algorithm for automatic tagging and does not improve the functioning of a computer or another technology. Claims 5 and 16 recite that the tags and text input correspond to embeddings and that the comparison comprises determining a distance. However, this limitation recites a mathematical operation, which is itself an abstract concept and does not integrate the abstract idea into a practical application. Claims 6-10 and 17-20 merely recite extra information gathering, analysis, and output steps or specify where the image is sent. Claims 22-24 recite steps of data analysis, classification, and retrieval, which are mental-process and information-organizing steps. These are all well-known, routine, and conventional steps that do not recite an improvement or integrate the abstract idea into a practical application.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 11-15 and 17-18 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Bojja et al. (US 20170185581).
Regarding claim 11, Bojja discloses a non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform operations (¶125) comprising:
obtaining, by a user device, text input (Fig. 5 showing multiple clients 500 with crowdsourcing client 524, type-guessing 520, and text transformations 522; ¶33 The method 200 begins by obtaining (step 202) features associated with a communication (e.g., an electronic message) of a user; the features can include…one or more words from the communication); 
selecting, by the user device, an image based on a comparison between the text input and a tag associated with the image, the tag having been derived from at least one of the image or a prior use of the image, and the image having been extracted from another image (¶12 the keyword matching module can be configured to search for at least one keyword in the communication and match the at least one keyword with at least one tag associated with emoji; ¶33 The suggested emoji can be identified with the assistance of the manager module 120, which can select particular emoji detection methods and/or classifiers to use based on various factors, including, for example, a linguistic domain (e.g., gaming, news, parliamentary proceedings, politics, health, travel, web pages, newspaper articles, and microblog messages), a language used in the communication, one or more user preferences, and the like. The linguistic domain may define or include, for example, words, phrases, sentence structures, or writing styles that are unique or common to particular types of subject matter and/or to users of particular communication systems. For example, gamers may use unique terminology, slang, or sentence structures when communicating with one another in a game environment, whereas newspaper articles or parliamentary proceedings might have a more formal tone with well-structured sentences and/or different terminology); and 
providing, by the user device and responsive to obtaining the text input, the image (¶33 at least one of the suggested emoji is inserted (step 210) into the communication.).
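The keyword-to-tag comparison mapped onto claim 11 above can be illustrated with a minimal sketch. It is not taken from Bojja or from the application; the `TaggedImage` record, the `select_images` function, and the sample library are hypothetical names chosen only to show text input being compared against stored tags.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaggedImage:
    """Hypothetical record: an image identifier plus its descriptive tags."""
    image_id: str
    tags: set[str] = field(default_factory=set)


def select_images(text_input: str, library: list[TaggedImage]) -> list[str]:
    """Return ids of images that have at least one tag matching a typed word."""
    # Lowercase and strip simple punctuation so "Star!" matches the tag "star".
    words = {w.strip(".,!?").lower() for w in text_input.split()}
    return [img.image_id for img in library if words & img.tags]


# Illustrative data only (assumed, not from the cited references).
library = [
    TaggedImage("sticker_star", {"star", "yellow"}),
    TaggedImage("sticker_dog", {"dog", "puppy"}),
]
print(select_images("You are a Star!", library))  # ['sticker_star']
```

An exact word match is the simplest possible comparison; the rejection does not say which matching strategy either the claim or the reference requires.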

Regarding claim 12, Bojja discloses the non-transitory computer-readable medium of claim 11, wherein the image is a user-placeable digital sticker (Bojja ¶124 the systems and methods, including the emoji detection module 116 and/or the emoji classifier module 118, are configured to suggest GIFs, stickers, and/or other non-word expression items, in addition to emoji).

Regarding claim 13, Bojja discloses the non-transitory computer-readable medium of claim 11, wherein obtaining the text input comprises receiving the text input from the user device as the text input is being typed on a keyboard of the user device (Bojja ¶36 can suggest emoji at real-time or near real-time for words or phrases and can suggest emoji while users are typing or entering messages).

Regarding claim 14, Bojja discloses the non-transitory computer-readable medium of claim 11, wherein the tag represents a subject of the image (¶123 In some instances, the emoji can be tagged with content that describes what each emoji represents. The tagging facilitates formation of a list of emoji that may be available for users).

Regarding claim 15, Bojja discloses the non-transitory computer-readable medium of claim 14, wherein the instructions cause the processor to perform operations further comprising deriving the tag by generating, with a computer vision model (¶13 one classifier includes a supervised learning model, a partially supervised learning model, an unsupervised learning model, and/or an interpolation model), one or more words that describe the subject of the image (Bojja ¶39 For example, the word “star” can be mapped to an image of a yellow star or an image of a red star; Identified phrases may overlap or be mapped to the same emoji in some instances; ¶80 The distributed storage module 506 is a server side data store (e.g., a distributed database) that stores data relevant to emoji-keyword maps).

Regarding claim 17, Bojja discloses the non-transitory computer-readable medium of claim 11, wherein the instructions cause the processor to perform operations further comprising deriving the tag by using the image in an application, wherein the usage comprises text data associated with the application (Bojja ¶32 The dictionaries 124 database may include a dictionary that relates words, phrases, or portions thereof to one or more emoji. The dictionary may cover more than one language and/or multiple dictionaries may be included in the dictionaries 124 database to cover multiple languages (e.g., a separate dictionary for each language).).

Regarding claim 18, Bojja discloses the non-transitory computer-readable medium of claim 17, wherein the text data associated with the application comprises one or more words corresponding to one or more emojis (Bojja ¶32 The dictionaries 124 database may include a dictionary that relates words, phrases, or portions thereof to one or more emoji. The dictionary may cover more than one language and/or multiple dictionaries may be included in the dictionaries 124 database to cover multiple languages (e.g., a separate dictionary for each language).).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6-8, 21, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Bojja et al. (US 20170185581) in view of An et al. (US 10497121 B2).
Regarding claim 1, Bojja discloses a method comprising: 
generating, by the user device, one or more tags for the second image based on the subject, wherein the one or more tags represent the subject (¶123 In some instances, the emoji can be tagged with content that describes what each emoji represents. The tagging facilitates formation of a list of emoji that may be available for users); 
storing, in a data structure on the user device, the one or more tags in association with the second image (¶80 The distributed storage module 506 is a server side data store (e.g., a distributed database) that stores data relevant to emoji-keyword maps, player usage information, player preferences, and other information useful for suggesting emoji. The distributed storage module 506 can be, include, or form part of the training data 122, dictionaries 124, chat histories 126, and/or user information 128 databases); 
obtaining, from the user device, text input (Fig. 5 showing multiple clients 500 with crowdsourcing client 524, type-guessing 520, and text transformations 522; ¶33 The method 200 begins by obtaining (step 202) features associated with a communication (e.g., an electronic message) of a user; the features can include…one or more words from the communication);
selecting, by the user device, the second image based on a comparison between the text input and the one or more tags of the data structure (¶12 the keyword matching module can be configured to search for at least one keyword in the communication and match the at least one keyword with at least one tag associated with emoji; ¶33 The suggested emoji can be identified with the assistance of the manager module 120, which can select particular emoji detection methods and/or classifiers to use based on various factors, including, for example, a linguistic domain (e.g., gaming, news, parliamentary proceedings, politics, health, travel, web pages, newspaper articles, and microblog messages), a language used in the communication, one or more user preferences, and the like. The linguistic domain may define or include, for example, words, phrases, sentence structures, or writing styles that are unique or common to particular types of subject matter and/or to users of particular communication systems. For example, gamers may use unique terminology, slang, or sentence structures when communicating with one another in a game environment, whereas newspaper articles or parliamentary proceedings might have a more formal tone with well-structured sentences and/or different terminology); and 
providing, by the user device and responsive to obtaining the text input, the second image (¶33 at least one of the suggested emoji is inserted (step 210) into the communication.).
Bojja does not specifically teach the following limitations, which An teaches: obtaining, from a user device, a first image comprising a subject (col 3 lines 37-40 S101: Determine whether a region to be recognized corresponding to a specified feature exists in the image to be processed.); and
generating, by the user device, a second image based on the subject extracted from the first image (col 8 lines 55-60 S103: Extract an image of the main subject region as a foreground image for an extraction process of a foreground target, and take the extracted image as the main subject of the image to be processed.).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of obtaining, from a user device, a first image comprising a subject and generating, by the user device, a second image based on the subject extracted from the first image from An into the method as disclosed by Bojja. The motivation for doing this is to accurately determine and extract contents of the main subject of the image.
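The "second image based on the subject extracted from the first image" limitation can be pictured with a small sketch. It assumes a subject mask is already available; how An actually determines the main subject region is not reproduced here. The `extract_subject` function and the toy pixel grid are hypothetical.

```python
def extract_subject(image, mask):
    """Crop `image` to the bounding box of `mask`, blanking background pixels.

    `image` and `mask` are same-sized lists of rows; `mask` holds booleans
    marking subject pixels. Background pixels inside the box become None.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return []  # no subject pixels were marked
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    return [
        [image[r][c] if mask[r][c] else None for c in range(left, right + 1)]
        for r in range(top, bottom + 1)
    ]


# Toy 3x4 "first image" with integer pixel values (assumed data).
first_image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
subject_mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, False, True, False],
]
second_image = extract_subject(first_image, subject_mask)
print(second_image)  # [[6, 7], [None, 11]]
```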

Regarding claim 2, the combination of Bojja and An discloses the method of claim 1, wherein the second image is a user-placeable digital sticker (Bojja ¶124 the systems and methods, including the emoji detection module 116 and/or the emoji classifier module 118, are configured to suggest GIFs, stickers, and/or other non-word expression items, in addition to emoji).

Regarding claim 3, the combination of Bojja and An discloses the method of claim 1, wherein obtaining the text input from the user device comprises receiving the text input from the user device as the text input is being typed on a keyboard of the user device (Bojja ¶36 can suggest emoji at real-time or near real-time for words or phrases and can suggest emoji while users are typing or entering messages).

Regarding claim 4, the combination of Bojja and An discloses the method of claim 1, wherein generating the one or more tags for the second image comprises: generating, by a computer vision model (Bojja ¶13 at least one classifier includes a supervised learning model, a partially supervised learning model, an unsupervised learning model, and/or an interpolation model), one or more words that describe the second image (Bojja ¶39 For example, the word “star” can be mapped to an image of a yellow star or an image of a red star; Identified phrases may overlap or be mapped to the same emoji in some instances; ¶80 The distributed storage module 506 is a server side data store (e.g., a distributed database) that stores data relevant to emoji-keyword maps).

Regarding claim 6, the combination of Bojja and An discloses the method of claim 1, further comprising:
providing, by the user device, the second image to an application, wherein the application includes text data (Bojja ¶3 Emoji are available for use through a variety of digital devices (e.g., mobile telecommunication devices and tablet computing devices) and are often used when drafting personal e-mails, posting messages on the Internet (e.g., on a social networking site or a web forum), and messaging between mobile devices; ¶33 at least one of the suggested emoji is inserted (step 210) into the communication); 
generating, by the user device, contextual data based on the text data associated with the application (Bojja ¶33 The method 200 begins by obtaining (step 202) features associated with a communication (e.g., an electronic message) of a user. The features can include, for example, a cursor position in the content, one or more words from the communication, one or more words from a previous communication, a user preference (e.g., preferred instances when emoji are to be used, preferred specific emoji, preferred types of emoji, or preferred categories of emoji), and/or demographic information (e.g., an age, gender, ethnicity, income, or citizenship of the user and/or a recipient)); and
storing, in another data structure on the user device, the contextual data in association with the second image (Bojja ¶32 the chat histories 126 database can contain information about past usage of emoji by users, including, for example, whether the users selected one or more emoji suggestions and/or the resultant emoji suggested by the automated system 112. Information related to selection based on rank ordering of emoji suggestions may be stored. The user information 128 database may include demographic information (e.g., age, race, ethnicity, gender, income, residential location, etc.) for users, including both senders and recipients. The user information 128 database may include certain user emoji preferences, such as settings that define the instances when emoji are to be used or are not to be used, any preferences for automatic emoji insertion, and/or any preferred emoji types (e.g., facial expressions or animals) that users may have. In general, the emoji classifier module 118 receives input from the emoji detection module 116, and/or the manager module 120 receives input from the emoji classifier module 118.).

Regarding claim 7, the combination of Bojja and An discloses the method of claim 6, further comprising, in response to the comparison yielding no images:
selecting, by the user device, a third image based on a comparison between the text input and the contextual data of the other data structure (Bojja ¶97 even when there are no emoji to suggest for a complete phrase, there can be an emoji mapping for few words in the phrase; The systems can locate emojifiable words or phrases and rank suggestions among many available suggestions. For example, when a user is typing “police gear” in a search box, emoji suggestions may be available for the words “police man” and “sports gear” separately, but there may be no emoji suggestions for the complete phrase “police gear.”); and 
providing, by the user device and responsive to obtaining the text input, the third image (Bojja ¶97 emoji suggestions may be available for the words “police man” and “sports gear” separately).
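Claim 7's fallback, which consults the contextual data only when the tag comparison yields no images, can be sketched as a two-stage lookup. The `suggest` function and both dictionaries are hypothetical illustrations and are not drawn from Bojja ¶97 or from the application.

```python
def suggest(text_input, tag_index, context_index):
    """Return image ids from the tag index, else from the contextual index."""
    words = [w.lower() for w in text_input.split()]
    # Primary comparison: typed words against the stored tags.
    hits = [img for w in words for img in tag_index.get(w, [])]
    if hits:
        return hits
    # Fallback: consulted only when the tag comparison yields no images.
    return [img for w in words for img in context_index.get(w, [])]


# Assumed sample data: tags describe the subject, context words come from prior use.
tag_index = {"dog": ["sticker_dog"]}
context_index = {"congrats": ["sticker_confetti"]}

print(suggest("good dog", tag_index, context_index))       # ['sticker_dog']
print(suggest("congrats team", tag_index, context_index))  # ['sticker_confetti']
```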

Regarding claim 8, the combination of Bojja and An discloses the method of claim 6, wherein generating the contextual data comprises generating one or more contextual words based on a comparison between at least some of the associated text data and a predetermined word list comprising one or more words corresponding to one or more emojis (Bojja ¶32 The dictionaries 124 database may include a dictionary that relates words, phrases, or portions thereof to one or more emoji. The dictionary may cover more than one language and/or multiple dictionaries may be included in the dictionaries 124 database to cover multiple languages (e.g., a separate dictionary for each language).).

Regarding claim 21, Bojja discloses a device comprising: 
a processor (¶129-130) configured to: 
generate one or more tags for an image of a subject, wherein the one or more tags represent the subject (¶123 In some instances, the emoji can be tagged with content that describes what each emoji represents. The tagging facilitates formation of a list of emoji that may be available for users); 
store, in a data structure on the device, the one or more tags in association with the image (¶80 The distributed storage module 506 is a server side data store (e.g., a distributed database) that stores data relevant to emoji-keyword maps, player usage information, player preferences, and other information useful for suggesting emoji. The distributed storage module 506 can be, include, or form part of the training data 122, dictionaries 124, chat histories 126, and/or user information 128 databases); 
obtain text input (Fig. 5 showing multiple clients 500 with crowdsourcing client 524, type-guessing 520, and text transformations 522; ¶33 The method 200 begins by obtaining (step 202) features associated with a communication (e.g., an electronic message) of a user; the features can include…one or more words from the communication); 
select the image based on a comparison between the text input and the one or more tags of the data structure (¶12 the keyword matching module can be configured to search for at least one keyword in the communication and match the at least one keyword with at least one tag associated with emoji; ¶33 The suggested emoji can be identified with the assistance of the manager module 120, which can select particular emoji detection methods and/or classifiers to use based on various factors, including, for example, a linguistic domain (e.g., gaming, news, parliamentary proceedings, politics, health, travel, web pages, newspaper articles, and microblog messages), a language used in the communication, one or more user preferences, and the like. The linguistic domain may define or include, for example, words, phrases, sentence structures, or writing styles that are unique or common to particular types of subject matter and/or to users of particular communication systems. For example, gamers may use unique terminology, slang, or sentence structures when communicating with one another in a game environment, whereas newspaper articles or parliamentary proceedings might have a more formal tone with well-structured sentences and/or different terminology); and 
provide, responsive to obtaining the text input, the image (¶33 at least one of the suggested emoji is inserted (step 210) into the communication.).
Bojja does not specifically teach, but An teaches, the image of the subject having been extracted from another image (col 8 lines 55-60 S103: Extract an image of the main subject region as a foreground image for an extraction process of a foreground target, and take the extracted image as the main subject of the image to be processed.).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of the image of the subject having been extracted from another image from An into the method as disclosed by Bojja. The motivation for doing this is to accurately determine and extract contents of the main subject of the image.

Regarding claim 23, the combination of Bojja and An discloses the method of claim 1, wherein the generating of the one or more tags, the storing in the data structure, the selecting of the second image, and the providing of the second image are performed locally on the user device without transmitting the first image or the second image to a remote server (Bojja ¶31 software components for the system 100 (e.g., the emoji detection module 116, the emoji classifier module 118, and/or the manager module 120) or any portions thereof can reside on or be used to perform operations on one or more client devices).



Claims 5 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Bojja and An as applied to claims 1 and 6 above, and further in view of Dela Rosa et al. (US Patent 12254049 B2).
Claims 16 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Bojja as applied to claims 11 and 17 above, and further in view of Dela Rosa et al. (US Patent 12254049 B2).
Regarding claim 5, the combination of Bojja and An discloses the method of claim 1 but fails to teach the following, which Dela Rosa teaches: wherein the one or more tags of the data structure comprises one or more image embeddings, the text input corresponds to a text embedding (col 19 lines 15-25 encode a plurality of images and text into a common embedding space and search, based on the embedding query vector, a database of AR experiences to identify a subset of AR experiences associated with one or more embeddings that correspond to the embedding query vector), and the comparison comprises determining a distance between the one or more tags of the data structure and the text embedding in an embedding space (col 19 lines 44-50 computes a Gaussian distance metric between the embedding query vector and each of a plurality of indices. The AR experience search system 224 determines that the nearest neighbor indices are associated with the Gaussian distance metric that corresponds to a similarity threshold).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein the one or more tags of the data structure comprises one or more image embeddings, the text input corresponds to a text embedding and the comparison comprises determining a distance between the one or more tags of the data structure and the text embedding in an embedding space from Dela Rosa into the method as disclosed by the combination of Bojja and An. The motivation for doing this is to improve determining a category of content associated with the features of the image.
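The embedding comparison recited in claim 5 can be illustrated with a short nearest-neighbor sketch. It uses plain Euclidean distance as a stand-in; the passage quoted from Dela Rosa refers to a "Gaussian distance metric", which is not reproduced here. The two-dimensional vectors, the threshold value, and the function names are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_image(text_embedding, image_embeddings, threshold):
    """Return the id of the closest image embedding, or None if too far away."""
    best_id, best_dist = None, float("inf")
    for image_id, emb in image_embeddings.items():
        dist = euclidean(text_embedding, emb)
        if dist < best_dist:
            best_id, best_dist = image_id, dist
    return best_id if best_dist <= threshold else None


# Toy 2-D embeddings (assumed); real models emit far larger vectors.
image_embeddings = {
    "sticker_dog": [0.9, 0.1],
    "sticker_star": [0.1, 0.9],
}
print(nearest_image([0.8, 0.2], image_embeddings, threshold=0.5))  # sticker_dog
print(nearest_image([5.0, 5.0], image_embeddings, threshold=0.5))  # None
```

The threshold mirrors the "similarity threshold" language in the cited passage: a query far from every stored embedding returns no image rather than a poor match.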

Regarding claim 9, the combination of Bojja and An discloses the method of claim 6 but fails to teach the following, which Dela Rosa teaches: wherein the text data corresponds to a text embedding and wherein generating the contextual data comprises selecting one or more contextual words associated with contextual embeddings that are within a threshold distance from the text embedding (Dela Rosa col 19 lines 15-25 encode a plurality of images and text into a common embedding space and search, based on the embedding query vector, a database of AR experiences to identify a subset of AR experiences associated with one or more embeddings that correspond to the embedding query vector; col 19 lines 44-50 computes a Gaussian distance metric between the embedding query vector and each of a plurality of indices. The AR experience search system 224 determines that the nearest neighbor indices are associated with the Gaussian distance metric that corresponds to a similarity threshold).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein the text data corresponds to a text embedding and wherein generating the contextual data comprises selecting one or more contextual words associated with contextual embeddings that are within a threshold distance from the text embedding from Dela Rosa into the method as disclosed by the combination of Bojja and An. The motivation for doing this is to improve determining a category of content associated with the features of the image.

Regarding claim 16, Bojja discloses the non-transitory computer-readable medium of claim 11 but fails to teach the following, which Dela Rosa teaches: wherein the tag comprises an image embedding, the text input comprises a text embedding (col 19 lines 15-25 encode a plurality of images and text into a common embedding space and search, based on the embedding query vector, a database of AR experiences to identify a subset of AR experiences associated with one or more embeddings that correspond to the embedding query vector), and the comparison comprises determining a distance between the tag and the text embedding in an embedding space (col 19 lines 44-50 computes a Gaussian distance metric between the embedding query vector and each of a plurality of indices. The AR experience search system 224 determines that the nearest neighbor indices are associated with the Gaussian distance metric that corresponds to a similarity threshold).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein the one or more tags of the data structure comprises one or more image embeddings, the text input corresponds to a text embedding and the comparison comprises determining a distance between the one or more tags of the data structure and the text embedding in an embedding space from Dela Rosa into the non-transitory computer-readable medium as disclosed by Bojja. The motivation for doing this is to improve determining a category of content associated with the features of the image.

Regarding claim 19, Bojja discloses the non-transitory computer-readable medium of claim 17 but fails to teach the following, which Dela Rosa teaches: wherein the text data associated with the application comprises a text embedding (col 19 lines 15-25 encode a plurality of images and text into a common embedding space and search, based on the embedding query vector, a database of AR experiences to identify a subset of AR experiences associated with one or more embeddings that correspond to the embedding query vector).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein the text data associated with the application comprises a text embedding from Dela Rosa into the non-transitory computer-readable medium as disclosed by Bojja. The motivation for doing this is to improve determining a category of content associated with the features of the image.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Bojja and An as applied to claim 6 above, and further in view of Ise (US 20170208288).
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Bojja as applied to claim 17 above, and further in view of Ise (US 20170208288).
Regarding claim 10, the combination of Bojja and An discloses the method of claim 6 but fails to teach the following, which Ise teaches: wherein providing the second image to the application comprises:
receiving, by the user device, a touch-down input on a first area of an electronic display associated with the second image (¶104 When “touch down” against the display item is detected on the screen displaying a captured image with OSD superimpose recording being off, a first guide is displayed. The first guide notifies the user that it is possible to superimpose and record the touched display item on the captured image by turning on OSD superimpose recording by dragging the touched display item to the display area of the captured image.); 
receiving, by the user device, a touch-up input on a second area of the electronic display (¶104 Thereafter, when the display item is dragged and the touch position is moved into the display area of the captured image before “touch up,” a second guide is displayed. The second guide notifies the user that the touched display item can be superimposed and recorded on the captured image by turning on OSD superimpose recording when “touch up” is detected. Then, when “touch up” is detected with the touch position within the display area of the captured image, OSD superimpose recording setting is turned on so that the dragged display item is superimposed and recorded on the captured image) associated with a message transcript (wherein Bojja specifically teaches a message transcript ¶33 at least one of the suggested emoji is inserted (step 210) into the communication); and 
adding, by the user device, the second image to the message transcript (wherein Bojja specifically teaches a message transcript ¶33 at least one of the suggested emoji is inserted (step 210) into the communication), in response to receiving the touch-up input (¶104 Then, when “touch up” is detected with the touch position within the display area of the captured image, OSD superimpose recording setting is turned on so that the dragged display item is superimposed and recorded on the captured image.).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of receiving, by the user device, a touch-down input on a first area of an electronic display associated with the second image; receiving, by the user device, a touch-up input on a second area of the electronic display associated with a message transcript; and adding, by the user device, the second image to the message transcript in response to receiving the touch-up input, from Ise into the method including the message transcript as disclosed by the combination of Bojja and An. The motivation for doing this is to improve superimposition of images.

Regarding claim 20, Bojja discloses the non-transitory computer-readable medium of claim 17, but fails to teach where Ise teaches wherein using the image in the application comprises:
receiving, by the user device, a touch-down input on a first area of an electronic display associated with the image (¶104 When “touch down” against the display item is detected on the screen displaying a captured image with OSD superimpose recording being off, a first guide is displayed. The first guide notifies the user that it is possible to superimpose and record the touched display item on the captured image by turning on OSD superimpose recording by dragging the touched display item to the display area of the captured image.); 
receiving, by the user device, a touch-up input on a second area of the electronic display (¶104 Thereafter, when the display item is dragged and the touch position is moved into the display area of the captured image before “touch up,” a second guide is displayed. The second guide notifies the user that the touched display item can be superimposed and recorded on the captured image by turning on OSD superimpose recording when “touch up” is detected. Then, when “touch up” is detected with the touch position within the display area of the captured image, OSD superimpose recording setting is turned on so that the dragged display item is superimposed and recorded on the captured image) associated with a message transcript of the application (wherein Bojja specifically teaches a message transcript ¶33 at least one of the suggested emoji is inserted (step 210) into the communication); and 
adding, by the user device, the image to the message transcript (wherein Bojja specifically teaches a message transcript ¶33 at least one of the suggested emoji is inserted (step 210) into the communication), in response to receiving the touch-up input (¶104 Then, when “touch up” is detected with the touch position within the display area of the captured image, OSD superimpose recording setting is turned on so that the dragged display item is superimposed and recorded on the captured image.).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein using the image in the application comprises: receiving, by the user device, a touch-down input on a first area of an electronic display associated with the image; receiving, by the user device, a touch-up input on a second area of the electronic display; and adding, by the user device, the image to the message transcript, in response to receiving the touch-up input from Ise into the non-transitory computer-readable medium including the message transcript as disclosed by Bojja. The motivation for doing this is to improve superimposition of images.

Claim(s) 22 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Bojja and An as applied to claims 1 and 6 above, and further in view of Chang et al (US Patent 10659405 B2).
Regarding claim 22, the combination of Bojja and An disclose the method of claim 1, but fail to teach where Chang teaches wherein:
generating the second image based on the subject extracted from the first image comprises extracting the subject responsive to a user selection of the subject in the first image (col 47 line 62-67 When a different representation 622 is selected, device 600 updates region 618 to indicate the selected representation 622 and updates sticker region 620 to display stickers corresponding to the selected representation. In FIG. 6C, monkey representation 622-1 is selected in first region 618, and monkey stickers 624 are displayed in sticker region 620);
the second image is a user-placeable digital sticker with a transparent background (see stickers in Fig. 6k wherein sticker 642-1 has transparent background; see further Fig. 9L having different background options 952-1 to 952-6); and 
obtaining the text input comprises obtaining the text input while the user is typing on a predictive keyboard (col 46 lines 36-39 In FIG. 6A, device 600 detects input 604 (e.g., a tap gesture) on affordance 606 and, in response, displays avatar keyboard 605 in keyboard region 603-3, as shown in FIG. 6B.), and providing the second image comprises providing the second image for display within a suggestion area of the predictive keyboard (col 46 lines 48-55 As shown in FIG. 6B, sticker region 610 includes stickers 612, which can be selected for communicating in message user interface 603. The stickers displayed in sticker region 610 each have an appearance that is based on various avatars that are available at device 600).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein: generating the second image based on the subject extracted from the first image comprises extracting the subject responsive to a user selection of the subject in the first image, the second image is a user-placeable digital sticker with a transparent background, and obtaining the text input comprises obtaining the text input while the user is typing on a predictive keyboard, and providing the second image comprises providing the second image for display within a suggestion area of the predictive keyboard from Chang into the method including the message transcript as disclosed by the combination of Bojja and An. The motivation for doing this is to improve displaying and using images in various application user interfaces using electronic devices.

Claim(s) 24 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Bojja and An as applied to claims 1 and 6 above, and further in view of Manin et al (US 20200159765).
Regarding claim 24, the combination of Bojja and An disclose the method of claim 1, but fail to teach where Manin teaches wherein selecting the second image based on the comparison between the text input and the one or more tags comprises computing a similarity score between the text input and the one or more tags and selecting the second image based on the similarity score (¶58 The system determines the relevance score for a candidate image based on a similarity measure that measures a similarity of: (i) the content labels for the candidate image, and (ii) the content labels for the search query; ¶61 The system determines a ranking of the candidate images based on the relevance scores for each candidate image (312)).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein selecting the second image based on the comparison between the text input and the one or more tags comprises computing a similarity score between the text input and the one or more tags and selecting the second image based on the similarity score from Manin into the method including the message transcript as disclosed by the combination of Bojja and An. The motivation for doing this is to improve suggestion of candidate images based in part on the relevance scores for the candidate images.
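The claim 24 limitation discussed above (computing a similarity score between the text input and the tags, then selecting an image based on that score) can be sketched as follows. This is a hedged illustration, not the scoring used by Manin, Bojja, or the application: the Jaccard token overlap, the function names, and the sample tags are all assumptions.

```python
def similarity_score(text_input, tags):
    """Jaccard overlap between the words of the text input and the tag set."""
    text_tokens = set(text_input.lower().split())
    tag_tokens = {tag.lower() for tag in tags}
    if not text_tokens or not tag_tokens:
        return 0.0
    return len(text_tokens & tag_tokens) / len(text_tokens | tag_tokens)

def select_image(text_input, tagged_images):
    """Return the image id whose tags score highest, or None if nothing overlaps.

    tagged_images is a hypothetical mapping of image id -> list of tags.
    """
    scored = [
        (similarity_score(text_input, tags), image_id)
        for image_id, tags in tagged_images.items()
    ]
    best_score, best_id = max(scored)
    return best_id if best_score > 0 else None

# Hypothetical stored tags for two sticker images.
tagged_images = {
    "dog_sticker": ["dog", "puppy"],
    "cake_sticker": ["cake", "birthday"],
}
suggestion = select_image("happy birthday", tagged_images)
```

Here "happy birthday" shares one token with the cake sticker's tags and none with the dog sticker's, so the cake sticker is selected; a text input with no overlapping token yields no suggestion.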

Response to Arguments
Applicant's arguments filed 12/19/2025 have been fully considered but they are not persuasive.
Regarding claims 1-21, the applicant contends that the claims are not directed to an abstract idea because they involve computer vision processing, embedding space comparisons, and operations that cannot be performed mentally. These arguments are not persuasive.
The claims recite steps including obtaining data, analyzing content, storing data, comparing text to tags, and selecting and presenting an image. These steps constitute data collection, analysis, and selection, which fall within the abstract idea groupings of mental processes and certain methods of organizing human activity. The applicant's argument that the claims cannot be performed mentally is not persuasive. The proper inquiry is whether the claim steps are conceptually capable of being performed in the human mind, not whether they can be performed with the same speed or scale. A human can observe an image, identify a subject, associate descriptive labels, compare those labels to text inputs, and select a corresponding image.
Applicant's reliance on computer vision and embedding techniques is also not persuasive because such features are not recited in the claim. Eligibility is determined based on the claim language, not unclaimed details in the specification. Accordingly, the claims recite an abstract idea.
Under Step 2A Prong 2, the claims do not integrate the abstract idea into a practical application. The additional elements, such as user device, processor, and data structure, are recited at a high level of generality and perform generic computer functions. The claims do not improve computer functionality or another technology but instead use a computer as a tool to perform the abstract idea of selecting content based on input.
Under Step 2B, the claims do not include additional elements that amount to significantly more than the abstract idea. The recited elements, such as obtaining data, storing data, comparing data, and presenting results, are well-understood, routine, and conventional computer functions. The claims are drafted in a functional manner and do not recite any specific technological implementation or improvement.
Applicant's arguments regarding Berkheimer are not persuasive because the claims do not recite any specific features that would require a factual determination regarding whether they are well-understood, routine, and conventional. Rather, the claims recite only generic computer operations. Additionally, applicant's reliance on cases such as BASCOM, McRO and Enfish is improper and not persuasive, as those cases involve specific technological improvements and claim structures not present here.
Regarding claim 11, applicant contends that the applied references fail to disclose (i) tags derived from an image or prior use of the image and (ii) an image extracted from another image. These arguments are not persuasive.
Under the broadest reasonable interpretation consistent with the specification, the claimed “tag having been derived from at least one of the image or a prior use of the image” encompasses tags that describe or represent the content of the image and/or are informed by usage context.
Bojja discloses that emoji are associated with descriptive tags indicating what each emoji represents (see paragraph 123). Such tags inherently correspond to the visual content of the emoji (e.g. the image), and thus are reasonably interpreted as being derived from the image. Further, Bojja discloses that the selection of emoji may be influenced by user preferences, communication context, and prior usage (¶133), which constitutes derivation based on prior use. The applicant's argument improperly requires an explicit derivation mechanism which is not recited in the claim.
With respect to the limitation that “the image is extracted from another image”, the claim does not positively recite any step requiring that the extraction be performed by the claimed system or at a particular time. Rather, the claim merely requires that the selected image has such a characteristic. Bojja operates on a repository of image assets (emoji), and does not limit the origin or manner of creation of those images. Under BRI, the reference need not expressly describe the image generation pipeline where the claim does not require it. Applicant's argument improperly imports a process limitation into a structural/result limitation. Accordingly, Bojja discloses or renders obvious all limitations of claim 11 and the rejection is maintained.
The applicant then argues, regarding claim 1, that Bojja does not disclose image extraction and that An does not disclose tagging or text-based selection, and further asserts that the proposed combination would require substantial modification and lacks motivation. These arguments are not persuasive.
At the outset, applicant's arguments improperly attack the references individually rather than the rejection as a whole. A rejection under 35 U.S.C. 103 is based on the combined teachings of the references (MPEP 2145). In response to applicant's argument that there is no teaching, suggestion, or motivation to combine the references, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art. See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007). In this case, Bojja teaches generating tags associated with images (emoji) (¶123), storing the tags (¶180), obtaining text input (¶133), and selecting images based on a comparison between text and associated tags (¶112, ¶133). An discloses extracting a subject from an image and generating a second image based on the extracted subject (e.g. col. 8 lines 55-60). The rejection properly relies on An for the subject extraction and image generating limitations, and on Bojja for the tagging, storage, comparison, and selecting limitations. Applicant's argument that neither reference alone discloses all limitations is therefore unpersuasive.
Applicant's assertion that the combination requires structural modification or a change in the principle of operation is also not persuasive. Bojja's core functionality is selecting and providing images based on semantic correspondence with the user input. Incorporating images generated via subject extraction as taught by An merely provides an additional or alternative source of image content and does not alter the fundamental operation of Bojja's system. The combination represents a predictable use of prior art elements according to their established function.
Further, the claim recites “image” broadly and does not exclude emoji or other graphical assets. Emoji are images, and substituting one type of image (e.g. predefined emoji) with another (e.g. extracted subject images) is a routine design choice well within the level of ordinary skill in the art.
Regarding the limitation that tags are generated “based on the subject”, Bojja discloses tags that describe what an image represents (¶123), and An discloses extracting the subject of an image. It would have been obvious that tags describing an image correspond to the subject of that image, including where the image is generated based on an extracted subject. Accordingly, the combination of Bojja and An teaches or renders obvious all limitations of claim 1 and the rejection of claim 1 is maintained.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
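As a rough aid to the date arithmetic in the paragraph above, the sketch below computes the two-month, three-month, and six-month marks from a mailing date. It assumes the Apr 13, 2026 mailing date listed for the final rejection in the prosecution timeline on this page; the helper name `add_months` is hypothetical, and the sketch ignores weekend, holiday, and advisory-action adjustments, so it is not a substitute for a docketed deadline.

```python
import calendar
from datetime import date

def add_months(start, months):
    """Return the same day-of-month `months` later, clamped to the month's last day."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

# Assumed mailing date of the final action (taken from the prosecution timeline).
mailing_date = date(2026, 4, 13)

two_month_mark = add_months(mailing_date, 2)        # first-reply window for the advisory-action rule
shortened_period_end = add_months(mailing_date, 3)  # end of the shortened statutory period
statutory_limit = add_months(mailing_date, 6)       # latest possible expiry of the period for reply
```

Under these assumptions the three dates fall on Jun 13, 2026, Jul 13, 2026, and Oct 13, 2026 respectively.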

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN KY whose telephone number is (571)272-7648. The examiner can normally be reached Monday-Friday 9-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph can be reached at 571-272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KEVIN KY/Primary Examiner, Art Unit 2671
Prosecution Timeline

Sep 19, 2025: Non-Final Rejection mailed (§101, §102, §103)
Dec 10, 2025: Interview Requested
Dec 18, 2025: Applicant Interview (Telephonic)
Dec 18, 2025: Examiner Interview Summary
Dec 19, 2025: Response Filed
Jan 20, 2026: Interview Requested
Apr 13, 2026: Final Rejection mailed (§101, §102, §103)
May 28, 2026: Interview Requested
Precedent Cases

Applications granted by this same examiner with similar technology

18/663,734 (Patent 12639357): DYNAMIC CONVERSATION INSIGHTS USING LARGE LANGUAGE MODELS; 2y 0m to grant, granted May 26, 2026
18/216,898 (Patent 12632925): PARAMETER-EFFICIENT AND RESOLUTION-ROBUST NETWORK ARCHITECTURES FOR IMAGE-TO-IMAGE TRANSLATION; 2y 10m to grant, granted May 19, 2026
17/334,959 (Patent 12620214): METHOD AND APPARATUS FOR EMPLOYING DEEP LEARNING NEURAL NETWORK TO PREDICT CROPLAND DATA LAYER; 4y 11m to grant, granted May 05, 2026
18/406,478 (Patent 12614033): DEVICE AND COMPUTER-IMPLEMENTED METHOD FOR OPERATING A KNOWLEDGE BASE COMPRISING A LANGUAGE MODEL; 2y 3m to grant, granted Apr 28, 2026
17/676,432 (Patent 12597158): POSE ESTIMATION; 4y 1m to grant, granted Apr 07, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 77%
With Interview (+25.6%): 99%
Median Time to Grant: 2y 6m (~0m remaining)
PTA Risk: Moderate
Based on 560 resolved cases by this examiner. Grant probability derived from career allowance rate.