Prosecution Insights
Last updated: April 19, 2026
Application No. 18/392,407

Image Generation with Encoding Semi-structured Multimodal Entity Signature

Final Rejection §103
Filed: Dec 21, 2023
Examiner: SALVUCCI, MATTHEW D
Art Unit: 2613
Tech Center: 2600 — Communications
Assignee: Google LLC
OA Round: 2 (Final)
Grant Probability: 72% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 12m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 72% (348 granted / 485 resolved; +9.8% vs TC avg; above average)
Interview Lift: +28.5% (strong); allowance rate among resolved cases with an interview vs. without
Avg Prosecution: 2y 12m typical timeline; 17 applications currently pending
Career History: 502 total applications across all art units
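The headline figures above follow from the raw counts. A minimal sketch of that arithmetic is below; the Tech Center baseline of roughly 62% is an assumption back-solved from the displayed +9.8% delta, not a figure stated in the report.

```python
# Reproduce the headline allow-rate figures from the counts shown above.
# The Tech Center baseline is an assumption implied by the +9.8% delta.

granted = 348
resolved = 485
tc_avg_allow_rate = 0.62   # assumed; back-solved from the displayed delta

allow_rate = granted / resolved                 # ~0.718, displayed as 72%
delta_vs_tc = allow_rate - tc_avg_allow_rate    # ~+0.098

print(f"Career allow rate:   {allow_rate:.1%}")
print(f"Delta vs TC average: {delta_vs_tc:+.1%}")
```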

Statute-Specific Performance

§101: 4.6% (-35.4% vs TC avg)
§103: 60.8% (+20.8% vs TC avg)
§102: 17.0% (-23.0% vs TC avg)
§112: 14.3% (-25.7% vs TC avg)
Deltas are relative to the Tech Center average estimate • Based on career data from 485 resolved cases
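Reading each delta as a simple difference between this examiner's rate and the Tech Center average, the implied baselines can be back-solved as in the sketch below. That reading is an assumption; the fact that it yields roughly 40% for every statute suggests the tool may instead compare against a single pooled baseline.

```python
# Back-solve the implied Tech Center baselines from the per-statute rates and
# deltas above, assuming delta = examiner rate - TC average (an assumption).

examiner_rates = {"§101": 4.6, "§103": 60.8, "§102": 17.0, "§112": 14.3}   # percent
deltas = {"§101": -35.4, "§103": 20.8, "§102": -23.0, "§112": -25.7}       # percentage points

for statute, rate in examiner_rates.items():
    implied_tc_avg = rate - deltas[statute]
    print(f"{statute}: examiner {rate:.1f}% vs implied TC average {implied_tc_avg:.1f}%")
```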

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims

Applicant's amendments filed on 26 January 2026 have been entered. Claims 1, 7, 14, and 20 have been amended. Claims 5 and 18 have been canceled. Claims 21 and 22 have been added. Claims 1-4, 6-17, and 19-22 are still pending in this application, with claims 1, 14, and 20 being independent.

Response to Arguments

Applicant's arguments with respect to claims 1, 14, and 20 have been considered but are moot because the new ground of rejection does not rely on every reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6, 7, 9-17, and 19-22 are rejected under 35 U.S.C. 103 as being unpatentable over Jindal et al. (US Pub. 2025/0022186), hereinafter Jindal, in view of Kanefield et al. (US Patent 11893433), hereinafter Kanefield.

Regarding claim 1, Jindal discloses a method, comprising:

receiving, by one or more processors, a signature associated with an entity, wherein the signature includes a plurality of different types of signature elements (Paragraph [0070]: according to some aspects, image generation apparatus 500 obtains a prompt that includes a description of a typographic characteristic of text. In some aspects, the typographic characteristic includes at least one of a font, a text size, a text justification, and a color. In some aspects, the prompt includes a visual description of the image and a description of a location of the text within the image. In some examples, image generation apparatus 500 receives the prompt from a user via a graphical user interface. In some examples, image generation apparatus 500 provides the image to the user via the graphical user interface in response to receiving the prompt; Paragraph [0040]: a “typographic characteristic” refers to one or more of a location of text within an image, a font of the text, a size of the text, a justification of the text, a color of the text, or a style for the text (e.g., an identification of the text as being a heading, a subheading, a body, a title, etc.));

storing, at a memory in communication with the one or more processors, the signature as a data set (Paragraph [0066]: processor unit 505 is configured to operate a memory array using a memory controller.
In other cases, a memory controller is integrated into processor unit 505. In some cases, processor unit 505 is configured to execute computer-readable instructions stored in memory unit 510 to perform various functions; Paragraph [0071]: Text recognition component 515 is an example of, or includes aspects of, the corresponding element described with reference to FIG. 11. According to some aspects, text recognition component 515 is implemented as software stored in memory unit 510 and executable by processor unit 505, as firmware, as one or more hardware circuits, or as a combination thereof; Paragraph [0096]: image generation network 530 comprises image generation parameters stored in memory unit 510. According to some aspects, image generation network 530 is implemented as software stored in memory unit 510 and executable by processor unit 505, as firmware, as one or more hardware circuits, or as a combination thereof);

receiving, by the one or more processors, from the memory, a request for an image from a requestor, wherein the request includes the signature and specifications for the image (Fig. 3; Fig. 4; Paragraphs [0062]-[0063]: Referring to FIG. 4, an image generation network (such as the image generation network described with reference to FIG. 5) generates image 405 including text (the words “Ollyer Teas”) and a typographic characteristic (an italic font) specified by prompt 400, as well as other visual characteristics (“the logo of a palm tree”) specified by prompt 400. As shown in FIG. 4, the text and the visual characteristics of image 405 are arranged in a realistic and visually appealing manner. Prompt 400 is the same as the prompt shown in FIG. 3…the image generation network is typographically aware (e.g., able to generate an image including a specific typographic characteristic) because it has been trained based on training data provided by the image generation apparatus. An example of a process for obtaining the training data and training the image generation network based on the training data is described with reference to FIGS. 10-14);

in response to receiving the request: selecting, by the one or more processors, at least one of the plurality of signature elements to incorporate into a response to the request (Fig. 8; Paragraphs [0146]-[0152]: in some cases, a user provides a prompt to the image generation apparatus. In some cases, the image generation apparatus receives the prompt from the user via a user interface (e.g., a graphical user interface) provided on a user device by the image generation apparatus. In some cases, the image generation apparatus retrieves the prompt from a database (such as the database described with reference to FIG. 5) or from another data source (such as the Internet). In some cases, the image generation apparatus retrieves the prompt in response to a user instruction…the typographic characteristic includes at least one of a font, a text size, a text justification, and a color. In some cases, the typographic characteristic includes a text style (e.g., a characteristic of the text that corresponds to an implementation of the text as a heading, a subheading, a body, a title, etc.) for the text…operation 815, the system generates an image that includes the text with the typographic characteristic based on the prompt encoding, where the image is generated using an image generation network that is trained to generate images having specific typographic characteristics.
In some cases, the operations of this step refer to, or may be performed by, an image generation network as described with reference to FIGS. 5 and 6; Paragraph [0092]: attention mechanism addresses these difficulties by enabling an ANN to selectively focus on different parts of an input sequence, assigning varying degrees of importance or attention to each part. The attention mechanism achieves the selective focus by considering a relevance of each input element with respect to a current state of the ANN);

and generating, by the one or more processors based on the specifications for the image, an image incorporating the selected one or more compatible signature elements (Paragraph [0152]: operation 815, the system generates an image that includes the text with the typographic characteristic based on the prompt encoding, where the image is generated using an image generation network that is trained to generate images having specific typographic characteristics. In some cases, the operations of this step refer to, or may be performed by, an image generation network as described with reference to FIGS. 5 and 6; Paragraphs [0174]-[0178]: a user provides the training image to the training component (for example, via a user interface provided on a user device by an image generation apparatus, such as the image generation apparatus described with reference to FIGS. 1, 5, and 11-13). In some cases, the training component retrieves the training image from a database (such as the database described with reference to FIG. 1) or from another data source (such as the Internet). In some cases, the training component retrieves the training image in response to a user instruction…the training component provides the training image to a multimodal text generation model of a machine learning model (such as the multimodal text generation model described with reference to FIG. 5). In some cases, the multimodal text generation model generates the training description as described with reference to FIGS. 11-13. In some cases, the machine learning model provides the training description to the training component…operation 1015, the system trains the image generation network to generate images having the typographic characteristic based on the loss function. In some cases, the operations of this step refer to, or may be performed by, a training component as described with reference to FIG. 5. For example, in some cases, the training component trains the image generation network as described with reference to FIG. 14…FIG. 11 shows an example of a process for obtaining a decoded word 1150 for a training description according to aspects of the present disclosure. The example shown includes image generation apparatus 1100, training image 1125, visual object encoding 1130, text data 1135, text data encoding 1140, joint embedding space 1145, and decoded word 1150. Image generation apparatus 1100 is an example of, or includes aspects of, the corresponding element described with reference to FIGS. 1, 5, 12, and 13).

Jindal does not explicitly disclose the identity of an entity, and the signature elements are features of the entity's identity; or wherein the selecting comprises identifying which signature elements in the plurality of signature elements are compatible with specifications of the request, and selecting one or more of the compatible signature elements for inclusion in the response.
However, Kanefield teaches image generation from a request (Columns 18-19), further comprising the identity of an entity, and the signature elements are features of the entity's identity (Column 16, lines 59-64: the user may submit or otherwise identify a URL. The embodiment may leverage a third-party tool, such as a tool named Brandfetch located at www.brandfetch.com, to retrieve the hex colors (typically a two-color combination) and logo from the URL of the company; Column 17, lines 30-36: the embodiments may include the ability to cache a URL's parameters so that a URL may preferably only be retrieved once, and then referred back to at a later time(s). In certain embodiments, a brand kit may preferably include identity and design parameters (hex colors, logo) which can be tied together at the account or organization level); and wherein the selecting comprises identifying which signature elements in the plurality of signature elements are compatible with specifications of the request, and selecting one or more of the compatible signature elements for inclusion in the response (Column 2, lines 22-28: memory may be used to store logo selection logic. The logo selection logic may instruct the processor to search the URL to identify a pre-determined minimum number of color logos in a specified color band. The number of color logos in the color band may be determined based on a pre-set value, based on artificial intelligence derived from legacy selections, and/or based on a human input number; Column 8, lines 20-30: platform may generate an environmental zone having a user selectable aesthetic appearance. For example the platform may generate numerous QR-code choices from among which the user may select. The platform may generate an environmental zone that includes aspects defined by machine generated design choices; Column 17, lines 30-36: the embodiments may include the ability to cache a URL's parameters so that a URL may preferably only be retrieved once, and then referred back to at a later time(s). In certain embodiments, a brand kit may preferably include identity and design parameters (hex colors, logo) which can be tied together at the account or organization level).

Kanefield teaches that this will allow the user to have control in creating visually pleasing codes that enhance brand identity (Column 18). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Jindal with the features above as taught by Kanefield so as to allow the user to have control in creating visually pleasing codes that enhance brand identity as presented by Kanefield.

Regarding claim 2, Jindal, in view of Kanefield, teaches the method of claim 1. Jindal discloses wherein the signature is a semi-structured set of data composed of multimodal inputs (Paragraph [0072]: text recognition component 515 performs text recognition on a training image to obtain text data, where the multimodal text generation model 535 takes the text data as an input. According to some aspects, text recognition component 515 is configured to perform text recognition on the training image to obtain text data, wherein the training description is generated based on the text data. In some cases, the text data comprises at least one of a word or words comprised in a text, a location of the text, a font of the text, and a text style of the text. In some cases, the text data is comprised in a text recognition token.
In some cases, text recognition component 515 outputs the text recognition token comprising the text data in response to the text recognition; Paragraph [0166]: examples of the method further include generating the training description of the training image using a multimodal text generation model based on the training image. Some examples of the method further include performing text recognition on the training image to obtain text data, wherein the multimodal text generation model takes the text data as an input).

Regarding claim 3, Jindal, in view of Kanefield, teaches the method of claim 2. Jindal discloses wherein the signature is defined from inputs from a user (Paragraphs [0141]-[0143]: U-Net 700 receives additional input features to produce a conditionally generated output. In some cases, the additional input features include a vector representation of an input prompt (such as the prompt described with reference to FIG. 4). In some cases, the additional input features are combined with intermediate features 715 within U-Net 700 at one or more layers. For example, in some cases, a cross-attention module is used to combine the additional input features and intermediate features 715…examples of the method further include obtaining a noise image. Some examples further include removing noise from the noise image based on the prompt encoding to obtain the image. Some examples of the method further include receiving the prompt from a user via a graphical user interface. Some examples further include providing the image to the user via the graphical user interface in response to receiving the prompt).

Regarding claim 4, Jindal, in view of Kanefield, teaches the method of claim 2. Jindal discloses wherein the signature is defined from inputs from artificial intelligence techniques (Paragraph [0090]: attention mechanism is a key component in some ANN architectures, particularly ANNs employed in natural language processing (NLP) and sequence-to-sequence tasks, that allows an ANN to focus on different parts of an input sequence when making predictions or generating output. NLP refers to techniques for using computers to interpret or generate natural language. In some cases, NLP tasks involve assigning annotation data such as grammatical information to words or phrases within a natural language expression. Different classes of machine-learning algorithms have been applied to NLP tasks. Some algorithms, such as decision trees, utilize hard if-then rules. Other systems use neural networks or statistical models which make soft, probabilistic decisions based on attaching real-valued weights to input features. These models can express the relative probability of multiple answers; Paragraph [0139]: U-Net 700 receives input features 705, where input features 705 include an initial resolution and an initial number of channels, and processes input features 705 using an initial neural network layer 710 (e.g., a convolutional network layer) to produce intermediate features 715. In some cases, intermediate features 715 are then down-sampled using a down-sampling layer 720 such that down-sampled features 725 have a resolution less than the initial resolution and a number of channels greater than the initial number of channels; Paragraph [0179]: Supervised learning is one of three basic machine learning paradigms, alongside unsupervised learning and reinforcement learning.
Supervised learning is a machine learning technique based on learning a function that maps an input to an output based on example input-output pairs. Supervised learning generates a function for predicting labeled data based on labeled training data consisting of a set of training examples. In some cases, each example is a pair consisting of an input object (typically a vector) and a desired output value).

Regarding claim 6, Jindal, in view of Kanefield, teaches the method of claim 1. Jindal discloses wherein the selecting comprises selecting less than all of the signature elements to be incorporated into the response for the request (Fig. 8; Paragraph [0092]: attention mechanism addresses these difficulties by enabling an ANN to selectively focus on different parts of an input sequence, assigning varying degrees of importance or attention to each part. The attention mechanism achieves the selective focus by considering a relevance of each input element with respect to a current state of the ANN; Paragraph [0024]: Prompt-based image generation via a machine learning model is more efficient and less laborious and time-consuming than manual creation of an image by a user. However, conventional image generation models (such as diffusion models and generative adversarial networks) are not able to generate an image that accurately depicts a description of a typographic characteristic included in a prompt because they have not been trained to do so. The lack of appropriate training to provide a typographically aware image generation model is due to a lack of suitable training data for the conventional image generation models to learn from; Paragraph [0137]: cross-attention block then normalizes the attention scores to obtain attention weights (for example, using a softmax function), where the attention weights determine how much information from each value element is incorporated into the final attended representation. By attending to different parts of the key-value sequence simultaneously, the cross-attention block captures relationships and dependencies across the input sequences, allowing reverse diffusion process 640 to better understand the context and generate more accurate and contextually relevant outputs; Paragraphs [0146]-[0152]: in some cases, a user provides a prompt to the image generation apparatus. In some cases, the image generation apparatus receives the prompt from the user via a user interface (e.g., a graphical user interface) provided on a user device by the image generation apparatus. In some cases, the image generation apparatus retrieves the prompt from a database (such as the database described with reference to FIG. 5) or from another data source (such as the Internet). In some cases, the image generation apparatus retrieves the prompt in response to a user instruction…the typographic characteristic includes at least one of a font, a text size, a text justification, and a color. In some cases, the typographic characteristic includes a text style (e.g., a characteristic of the text that corresponds to an implementation of the text as a heading, a subheading, a body, a title, etc.) for the text…operation 815, the system generates an image that includes the text with the typographic characteristic based on the prompt encoding, where the image is generated using an image generation network that is trained to generate images having specific typographic characteristics.
In some cases, the operations of this step refer to, or may be performed by, an image generation network as described with reference to FIGS. 5 and 6; Paragraph [0199]: the ground-truth caption for training multimodal text generation model 1120 does not describe a font or a font style for the text included in the image, and so the ground-truth caption for the image for training multimodal text generation model 1120 to generate the training description (or the intermediate description) is not suitable as training data for training an image generation network (such as the image generation network described with reference to FIG. 5) to generate a specific typographic characteristic).

Regarding claim 7, Jindal, in view of Kanefield, teaches the method of claim 1. Jindal discloses wherein the signature elements associated with the entity include at least one of the following: entity name, visual elements of logo, attributes of the entity (Paragraph [0029]: example of the present disclosure is used in an image generation context. In the example, a user wants to generate a logo for a dentistry practice, where the logo includes the text element “Smile” rendered with a typographic characteristic of a Cooper Std Black font. The user provides the prompt “a dentistry logo, Smile text rendered with font CooperStdBlack” to the image generation system; Paragraph [0070]: according to some aspects, image generation apparatus 500 obtains a prompt that includes a description of a typographic characteristic of text. In some aspects, the typographic characteristic includes at least one of a font, a text size, a text justification, and a color. In some aspects, the prompt includes a visual description of the image and a description of a location of the text within the image).

Regarding claim 9, Jindal, in view of Kanefield, teaches the method of claim 1. Jindal discloses wherein the request further incorporates at least one target emotion (Paragraphs [0225]-[0226]: text combination model 1310 receives intermediate description 1325 and one or more of text encoding(s) 1330, location encoding(s) 1335, font encoding(s) 1340, and text style encoding(s) 1345 for the training image as input and generates training description 1350 in response, where training description 1350 includes font and/or text style information for one or more text objects included in the training image. For example, as shown in FIG. 13, training description 1350 comprises “a man and a dog working on a desk, LOCAL BUSINESSES rendered with a formal font as a title on the top right combined with subheading SUPPORT NETWORK in clean font”, where “formal font” and “clean font” respectively comprise font information for text objects LOCAL BUSINESSES and SUPPORT NETWORK, and “title” and “subheading” respectively comprise text style information for the text objects. In some cases, “font information” refers to a characteristic associated with a font included in the training image. In some cases, text combination model 1310 determines the font information based on an encoding of a font tag included in font encoding… text combination model 1310 is trained to combine intermediate description 1325 with font information provided by font encoding(s) 1340 and text style encoding(s) 1345 to generate one or more preliminary training descriptions, where training description 1350 is generated based on the one or more preliminary training descriptions.
Examples of preliminary training descriptions that can be generated based on templates include “<a text object included in intermediate description 1325> is rendered with font <font provided in font encoding(s) 1340 for the text object>”, “<a text object included in intermediate description 1325> is rendered with a <font style information (such as clean, comic, formal, etc.) associated with font provided in font encoding(s) 1340 for the text object> font”, “<a text object included in intermediate description 1325> is used as <text style provided in text style encoding for the text object>”, and the like, which serve as predetermined few-shot learning inputs for text combination model).

Regarding claim 10, Jindal, in view of Kanefield, teaches the method of claim 1. Jindal discloses wherein the signature elements are ranked by level of importance, and selecting the at least one signature element is based on the ranking of the signature elements (Paragraphs [0090]-[0093]: attention mechanism is a key component in some ANN architectures, particularly ANNs employed in natural language processing (NLP) and sequence-to-sequence tasks, that allows an ANN to focus on different parts of an input sequence when making predictions or generating output. NLP refers to techniques for using computers to interpret or generate natural language. In some cases, NLP tasks involve assigning annotation data such as grammatical information to words or phrases within a natural language expression. Different classes of machine-learning algorithms have been applied to NLP tasks. Some algorithms, such as decision trees, utilize hard if-then rules. Other systems use neural networks or statistical models which make soft, probabilistic decisions based on attaching real-valued weights to input features. These models can express the relative probability of multiple answers… attention scores are transformed into attention weights through a normalization process, such as applying a softmax function. The attention weights represent the contribution of each input element to the overall attention. The attention weights are used to compute a weighted sum of the input elements, resulting in a context vector. The context vector represents the attended information or the part of the input sequence that the ANN considers most relevant for the current step. The context vector is combined with the current state of the ANN, providing additional information and influencing subsequent predictions or decisions of the ANN; Paragraph [0104]: self-attention mechanism can capture relationships between words of a sequence by assigning attention weights to each word based on a relevance to other words in the sequence, thereby enabling the transformer to model dependencies regardless of a distance between words; Paragraph [0137]: cross-attention block then normalizes the attention scores to obtain attention weights (for example, using a softmax function), where the attention weights determine how much information from each value element is incorporated into the final attended representation. By attending to different parts of the key-value sequence simultaneously, the cross-attention block captures relationships and dependencies across the input sequences, allowing reverse diffusion process 640 to better understand the context and generate more accurate and contextually relevant outputs).
Regarding claim 11, Jindal, in view of Kanefield, teaches the method of claim 1. Jindal discloses further comprising providing the generated image to the requestor (Paragraph [0055]: operation 215, the system provides the image to the user. In some cases, the operations of this step refer to, or may be performed by, an image generation apparatus as described with reference to FIGS. 1, 5, and 11-13. For example, in some cases, the image generation apparatus provides the image to the user via the user interface provided on the user device by the image generation apparatus).

Regarding claim 12, Jindal, in view of Kanefield, teaches the method of claim 11. Jindal discloses wherein the generated image does not require additional input from the requestor (Fig. 2; Fig. 3; Paragraphs [0056]-[0057]: FIG. 3 shows an example of comparative generated images. The example shown includes prompt 300, first set of comparative images 305, and second set of comparative images 310. Prompt 300 is an example of, or includes aspects of, the corresponding element described with reference to FIG. 4… Referring to FIG. 3, first set of comparative images 305 includes images generated by a comparative text-based diffusion model based on prompt 300 and second set of comparative images 310 includes images generated by a comparative transformer-based text-to-image machine learning model based on prompt 300. As shown in FIG. 3, each of first set of comparative images 305 and second set of comparative images 310 fails to include an image including text and a typographic characteristic specified by prompt 300).

Regarding claim 13, Jindal, in view of Kanefield, teaches the method of claim 11. Jindal discloses wherein the generated image does not require subsequent or repeat requests for additional images to be generated from the requestor (Paragraph [0060]: existing training data sets for comparative text-based image generation machine learning models do not include either training descriptions that are sufficiently descriptive of typographic characteristics of training images or a sufficiently large number of training description and training image pairs for the comparative text-based image generation machine learning models to learn to be typographically aware).

Regarding claim 14, the limitations of this claim substantially correspond to the limitations of claim 1; thus they are rejected on similar grounds.
Regarding claim 15, the limitations of this claim substantially correspond to the limitations of claim 2; thus they are rejected on similar grounds.
Regarding claim 16, the limitations of this claim substantially correspond to the limitations of claim 3; thus they are rejected on similar grounds.
Regarding claim 17, the limitations of this claim substantially correspond to the limitations of claim 4; thus they are rejected on similar grounds.
Regarding claim 19, the limitations of this claim substantially correspond to the limitations of claim 6; thus they are rejected on similar grounds.
Regarding claim 20, the limitations of this claim substantially correspond to the limitations of claim 1; thus they are rejected on similar grounds.

Regarding claim 21, Jindal, in view of Kanefield, teaches the method of claim 1. Kanefield discloses wherein generating the image incorporating the selected compatible signature elements comprises changing the selected compatible signature elements to align with the request (Column 10, lines 12-23: software platform may prompt the user to make changes to user-selections of URL(s).
The software platform may suggest alternatives to the user's URL(s). The software platform may suggest changes based on defining an aesthetic appearance associated with the user-selected URL(s). The software platform may allow the user to override rejection of the URL(s). For example, the user may wish to generate a QR code having a desired aesthetic appearance, even though the software platform has determined that the QR code(s) derived from the selected URL will not include a target error correction level).

Regarding claim 22, the limitations of this claim substantially correspond to the limitations of claim 21; thus they are rejected on similar grounds.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Jindal, in view of Kanefield, and further in view of Isaacson et al. (US Pub. 2023/0360109), hereinafter Isaacson.

Regarding claim 8, Jindal, in view of Kanefield, discloses the method of claim 1. While Jindal teaches wherein the request further incorporates a message from a company (Paragraph [0029]: example of the present disclosure is used in an image generation context. In the example, a user wants to generate a logo for a dentistry practice, where the logo includes the text element “Smile” rendered with a typographic characteristic of a Cooper Std Black font. The user provides the prompt “a dentistry logo, Smile text rendered with font CooperStdBlack” to the image generation system), Jindal, in view of Kanefield, does not explicitly disclose wherein the request further incorporates a marketing message.

However, Isaacson teaches multimodal image generation (Paragraphs [0053]-[0056]), further comprising wherein the request further incorporates a marketing message (Fig. 5; Paragraph [0210]: FIG. 5 next shows that the system constructs a drop-down menu (508) or a presentation of various options or object at any location in the user interface. This construction can also include a marketing aspect as companies may pay for how the option is presented. Amazon.com, or a product manufacturer, can pay a small fee to present their product with graphics or multimedia content, if it appears that the user may desire to buy that product, in order to encourage the user to select that option to purchase the product. The system presents the menu or other structured presentation of options for the user to choose (510). The options include one or more purchasing options (512) when the user input indicates via the algorithm that a purchase may be desired). Isaacson teaches that this will allow the requestor to specify the image presentation (Paragraph [0210]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Jindal, in view of Kanefield, with the features above as taught by Isaacson so as to allow the requestor to specify the image presentation as presented by Isaacson.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW D SALVUCCI, whose telephone number is (571) 270-5748. The examiner can normally be reached M-F: 7:30-4:00 PT. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, XIAO WU, can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MATTHEW SALVUCCI/
Primary Examiner, Art Unit 2613
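Note: several Jindal passages cited in this rejection (e.g., Paragraphs [0090]-[0093] and [0137]) describe attention scores being softmax-normalized into attention weights that form a weighted sum, or context vector, over the inputs. The snippet below is a generic illustration of that standard mechanism only; it is not code from Jindal, Kanefield, or the application.

```python
import numpy as np

# Generic scaled dot-product attention, illustrating the mechanism the cited
# Jindal paragraphs describe: softmax-normalized scores become attention
# weights, which produce a weighted sum (context vector) over the values.
# Illustrative only; not taken from any reference of record.

def attention(query: np.ndarray, keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    scores = keys @ query / np.sqrt(query.shape[-1])  # relevance of each input element
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax -> attention weights
    return weights @ values                           # context vector

rng = np.random.default_rng(0)
q = rng.normal(size=4)           # current state / query
K = rng.normal(size=(3, 4))      # one key per input element
V = rng.normal(size=(3, 8))      # one value per input element

print(attention(q, K, V).shape)  # (8,) -> a single context vector
```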

Prosecution Timeline

Dec 21, 2023: Application Filed
Oct 23, 2025: Non-Final Rejection — §103
Jan 14, 2026: Applicant Interview (Telephonic)
Jan 14, 2026: Examiner Interview Summary
Jan 26, 2026: Response Filed
Mar 05, 2026: Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597198: RAY TRACING METHOD AND APPARATUS BASED ON ATTENTION FOR DYNAMIC SCENES
Granted Apr 07, 2026 (2y 5m to grant)

Patent 12597207: Camera Reprojection for Faces
Granted Apr 07, 2026 (2y 5m to grant)

Patent 12579753: Phased Capture Assessment and Feedback for Mobile Dimensioning
Granted Mar 17, 2026 (2y 5m to grant)

Patent 12561899: Vector Graphic Parsing and Transformation Engine
Granted Feb 24, 2026 (2y 5m to grant)

Patent 12548256: IMAGE PROCESSING APPARATUS FOR GENERATING SURFACE PROFILE OF THREE-DIMENSIONAL GEOMETRIC MODEL, CONTROL METHOD THEREFOR, AND STORAGE MEDIUM
Granted Feb 10, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 72%
Grant Probability with Interview: 99% (+28.5%)
Median Time to Grant: 2y 12m
PTA Risk: Moderate
Based on 485 resolved cases by this examiner. Grant probability derived from career allow rate.
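As a rough cross-check, the sketch below maps the median pendency onto a calendar date, assuming the "2y 12m" figure (i.e., 36 months) is measured from the Dec 21, 2023 filing date; the report does not state that convention, so treat the result as illustrative.

```python
from datetime import date

# Illustrative only: project a grant date by adding the displayed median
# pendency to the filing date. The report does not state that its "2y 12m"
# median is measured from filing, so that convention is an assumption.

filing_date = date(2023, 12, 21)      # Application Filed (from the timeline above)
median_months = 2 * 12 + 12           # "2y 12m" = 36 months

month_index = filing_date.month - 1 + median_months
projected_grant = date(filing_date.year + month_index // 12,
                       month_index % 12 + 1,
                       filing_date.day)

print(projected_grant)  # 2026-12-21 under this reading
```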
