Prosecution Insights
Last updated: April 18, 2026
Application No. 18/657,472

DISTRIBUTING PROMPT PROCESSING IN GENERATIVE ARTIFICIAL INTELLIGENCE MODELS

Final Rejection: §101, §102, §103
Filed: May 07, 2024
Examiner: CHUNG, DANIEL WONSUK
Art Unit: 2659
Tech Center: 2600 (Communications)
Assignee: Qualcomm Incorporated
OA Round: 2 (Final)
Grant Probability: 54% (Moderate)
Projected OA Rounds: 3-4
Projected Time to Grant: 2y 10m
Grant Probability with Interview: 92%

Examiner Intelligence

Career Allow Rate: 54% (24 granted / 44 resolved; -7.5% vs TC avg)
Interview Lift: +37.5% across resolved cases with interview
Avg Prosecution: 2y 10m (33 currently pending)
Total Applications: 77 across all art units

Statute-Specific Performance

§101: 25.2% (-14.8% vs TC avg)
§103: 52.3% (+12.3% vs TC avg)
§102: 17.3% (-22.7% vs TC avg)
§112: 5.2% (-34.8% vs TC avg)
Tech Center averages are estimates. Based on career data from 44 resolved cases.

Office Action

Rejections under §101, §102, and §103
DETAILED ACTION

This communication is in response to the Amendments and Arguments filed on 2/20/2026. Claims 1-19 are pending and have been examined. All previous objections/rejections not mentioned in this Office Action have been withdrawn by the examiner.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments

Applicant's arguments filed on 2/20/2026 have been fully considered but they are not persuasive. Applicant has amended independent claims 1, 10, and 19.

Regarding the arguments for the rejections under 35 U.S.C. § 101, applicant asserts that the independent claim limitations cannot be performed in the mind and are directed to a technological solution to problems arising in the field of machine learning. Examiner respectfully disagrees. During patent examination, pending claims must be "given their broadest reasonable interpretation consistent with the specification." MPEP 2111. Also, claims should not be interpreted by reading limitations of the specification into the claim to narrow its scope by implicitly adding disclosed limitations that have no express basis in the claim language. In re Prater, 415 F.2d 1393. Here, the steps in the claim language are broad, and the examiner interprets the claims broadly. First, the steps recited in the claim limitations can be performed in the mind: a human mind can read a prompt, divide the prompt based on context, think of a drawing based on the prompt segments, and write the drawing from the mind on paper. The claims encompass mental observations or evaluations that can be practically performed in the human mind. The use of a generative artificial intelligence model can be broadly interpreted as a set of rules or instructions that generate an output according to a prompt input.
The use of layers can be interpreted as sub-rules, where a sub-prompt can be routed to a sub-rule in the set of rules to generate an output. Second, MPEP 2106.05(f) provides the following considerations for determining whether a claim simply recites a judicial exception with the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer: (1) whether the claim recites only the idea of a solution or outcome, i.e., the claim fails to recite details of how a solution to a problem is accomplished; (2) whether the claim invokes computers or other machinery merely as a tool to perform an existing process; and (3) the particularity or generality of the application of the judicial exception. Here, the steps to partition an input prompt and route the sub-prompts to different layers of a model to generate a response recite only an outcome and do not include any details of how the steps are accomplished. The claim utilizes layers but fails to provide technical details of how the layers are used to output a response. Therefore, the claims as currently recited do not overcome the 35 U.S.C. § 101 abstract idea rejection.

Regarding the arguments for the rejections under 35 U.S.C. § 103, applicant has amended the independent claim language by adding "wherein the generative artificial intelligence model comprises a plurality of layers" and "route the plurality of sub-prompts to different layers of the plurality of layers of the generative artificial intelligence model". Here, "layer" is interpreted as the layers in the image generator (Specification P0024) that is part of the generative artificial intelligence model (Specification P0016). Prior art Brdiczka teaches neural layers as part of the image compiler (Brdiczka P0041) and teaches routing sub-prompts to layers (Brdiczka P0057). Therefore, the claims as currently recited do not overcome the prior art reference.
Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked. As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term "means" or "step" or a term used as a substitute for "means" that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term "means" or "step" or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word "for" (e.g., "means for") or another linking word or phrase, such as "configured to" or "so that"; and
(C) the term "means" or "step" or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word "means" (or "step") in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word "means" (or "step") in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.

Claim limitations in this application that use the word "means" (or "step") are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Conversely, claim limitations in this application that do not use the word "means" (or "step") are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. The claim limitations in claim 19 include the "means for" language and are interpreted under 35 U.S.C. 112(f). Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.

If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid their being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid interpretation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claims 1, 10, and 19, the limitations of "receive an input prompt for processing using a generative artificial intelligence model, wherein the generative artificial intelligence model comprises a plurality of layers", "partition the input prompt into a plurality of sub-prompts based on contextual information associated with tokens in the input prompt", "route the plurality of sub-prompts to different layers of the plurality of layers of the generative artificial intelligence model", "generate a response to the input prompt using the generative artificial intelligence model based on the plurality of sub-prompts routed to different layers and the contextual information associated with the tokens in the input prompt", and "output the generated response", as drafted, are processes that, under the broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components. More specifically, they recite the mental process of a human reading a prompt, dividing the prompt in the mind based on context, thinking of a drawing in the mind based on the prompt segments and connecting sub-rules, and writing the drawing from the mind on paper. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Accordingly, the claims recite an abstract idea.

This judicial exception is not integrated into a practical application because the recitation of a system and processor reads on generalized computer components, based upon the claim interpretation wherein the structure is interpreted using P0040-P0055 in the specification. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
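For readers who want a concrete picture of the recited sequence (receive, partition, route, generate, output), the steps can be sketched in code. This is a minimal illustrative sketch only, not the applicant's or Brdiczka's implementation; the function names, the contextual labels, and the round-robin routing rule are all hypothetical stand-ins for the claimed elements.

```python
# Hypothetical sketch of the claim 1 pipeline: partition an input prompt
# into sub-prompts by contextual information, then route each sub-prompt
# to a model layer. Names and the routing rule are illustrative only.

def partition_prompt(tokens, context):
    """Split tokens into sub-prompts wherever the contextual label changes."""
    sub_prompts, current = [], [tokens[0]]
    for prev, tok in zip(tokens, tokens[1:]):
        if context[tok] == context[prev]:
            current.append(tok)
        else:
            sub_prompts.append(current)
            current = [tok]
    sub_prompts.append(current)
    return sub_prompts

def route_to_layers(sub_prompts, num_layers):
    """Assign each sub-prompt a layer index (round-robin placeholder)."""
    return {i % num_layers: sp for i, sp in enumerate(sub_prompts)}

tokens = ["a", "pirate", "wearing", "a", "hat"]
context = {"a": "global", "pirate": "subject",
           "wearing": "subject", "hat": "object"}
subs = partition_prompt(tokens, context)
routing = route_to_layers(subs, num_layers=4)
```

In a real system the generation step would run each routed sub-prompt through its assigned layer; here the sketch stops at the routing map, since the claim itself recites the steps rather than any particular implementation.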
The claims are directed to an abstract idea. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using generalized computer components to read a prompt, divide the prompt in the mind based on context, think of a drawing in the mind based on the prompt segments and connecting sub-rules, and write the drawing from the mind on paper amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claims are not patent eligible.

With respect to claims 2 and 11, the claims recite "the contextual information comprises breadth metrics associated with the tokens in the input prompt", "to partition the input prompt into the plurality of sub-prompts, the one or more processors are configured to cause the processing system to:", "associate a respective breadth metric to a respective token from the tokens in the input prompt using a language model", and "partition the tokens in the input prompt based on respective breadth metrics associated with respective tokens from the tokens in the input prompt", which reads on a human dividing the prompt according to breadth in the mind. No additional limitations are present.

With respect to claims 3 and 12, the claims recite "wherein the respective breadth metric comprises an indication of whether the respective token corresponds to a global concept in the input prompt or one or more local concepts in the input prompt", which reads on a human dividing the prompt according to a global concept or local concepts in the mind. No additional limitations are present.
With respect to claims 4 and 13, the claims recite "wherein to partition the tokens in the input prompt, the one or more processors are configured to cause the processing system to partition the tokens into a set of tokens corresponding to the global concept and one or more sets of tokens corresponding to the one or more local concepts in the input prompt", which reads on a human dividing the prompt according to a global concept or local concepts in the mind. No additional limitations are present.

With respect to claims 5 and 14, the claims recite "the contextual information comprises temporal embeddings associated with the tokens in the input prompt" and "to partition the input prompt into the plurality of sub-prompts, the one or more processors are configured to cause the processing system to partition the tokens in the input prompt into groups of temporally related tokens based on the temporal embeddings", which reads on a human dividing the prompt according to temporal embeddings in the mind. No additional limitations are present.

With respect to claims 6 and 15, the claims recite "wherein the generative artificial intelligence model includes a gating mechanism configured to perform the routing based on the contextual information associated with the tokens in the input prompt", which reads on a human dividing the prompt according to calculations done in the mind. No additional limitations are present.
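The breadth-metric partitioning of claims 2-4 (each token labeled global or local, then split into a global set and local sets) can be made concrete with a small sketch. This is illustrative only: the claims associate the metric via a language model, whereas the lookup table below is a toy stand-in, and all names (`breadth_metric`, `partition_by_breadth`, `GLOBAL_WORDS`) are hypothetical.

```python
# Hypothetical sketch of the claims 2-4 partitioning: assign each token a
# "breadth metric" (global vs. local concept) and split the prompt into a
# global token set and a local token set. The word list and heuristic are
# toy placeholders for the claimed language-model scoring.

GLOBAL_WORDS = {"photo", "painting", "style", "realistic"}  # toy stand-in

def breadth_metric(token):
    # The claim derives this via a language model; a lookup suffices here.
    return "global" if token in GLOBAL_WORDS else "local"

def partition_by_breadth(tokens):
    global_set = [t for t in tokens if breadth_metric(t) == "global"]
    local_set = [t for t in tokens if breadth_metric(t) == "local"]
    return global_set, local_set

g, loc = partition_by_breadth(["realistic", "photo", "pirate", "hat"])
```

A fuller version would further split the local set into one group per concept (claim 4's "one or more sets of tokens corresponding to the one or more local concepts"); the two-way split above shows only the global/local distinction.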
With respect to claims 7 and 16, the claims recite "wherein the gating mechanism comprises an attention layer in the generative artificial intelligence model, the attention layer comprising:", "a first projection block that projects the contextual information to key data and value data", "a second projection block that projects the tokens in the input prompt to query data", "a multi-head attention block that generates an attention output based on the key data, the value data, and the query data", and "a nonlinear layer that generates an attention mask based on the attention output, the attention mask being combined with the tokens in the input prompt to generate a masked set of tokens as an output of the gating mechanism", which reads on a human dividing the prompt based on important words or segments that are found through calculations done in the mind. No additional limitations are present.

With respect to claims 8 and 17, the claims recite "wherein the generated response comprises an image depicting one or more objects specified by the input prompt", which reads on a human thinking of an image of an object specified in the read prompt. No additional limitations are present.

With respect to claims 9 and 18, the claims recite "wherein the generative artificial intelligence model comprises a text-to-image diffusion model configured to generate an image output from a textual input", which reads on a human thinking of an image from the read prompt. No additional limitations are present.

These claims do not remedy the failure to integrate the judicial exception into a practical application and further fail to include additional elements that are sufficient to amount to significantly more than the judicial exception.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-6, 8-15, and 17-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Brdiczka et al. (U.S. PG Pub No. 20240127511), hereinafter Brdiczka.

Regarding claims 1 and 10, Brdiczka teaches:

(Claim 1) A processing system, comprising: at least one memory having executable instructions stored thereon; and one or more processors coupled to the at least one memory and configured to execute the executable instructions in order to cause the processing system to: (P0018, Target scene generation system that creates composites of images and/or generates images into a structured scene using a prompt.; P0090, The components and their corresponding elements can comprise one or more instructions stored on a computer-readable storage medium and executable by processors of one or more computing devices.)

(Claim 10) A processor-implemented method for machine learning, comprising: (P0090, When executed by the one or more processors, the computer-executable instructions of the target scene generation system can cause a client device and/or a server device to perform the methods described.)
receive an input prompt for processing using a generative artificial intelligence model, wherein the generative artificial intelligence model comprises a plurality of layers; (P0019, Present disclosure combines natural language processing to analyze and decompose textual descriptions, textual image operations to define a composition, and generative AI to automatically create a composite image with desired styles, visual elements, and image operations.; P0022, The target scene generation system receives an input. The input is a prompt, or a textual description containing 1) image operations, 2) a sentence with a grammatical structure that reflects composition and/or 3) objects and subjects. A desired composition (otherwise referred to herein as a target scene or a description of an image to be generated) is described using the prompt in a natural language format.; P0041, Control elements may translate to specific neural layers (corresponding to generated images determined using generative AI module).)

partition the input prompt into a plurality of sub-prompts based on contextual information associated with tokens in the input prompt; (P0025, The input extractor identifies and parses out information defining visual elements of a target scene (e.g., an arrangement/composition of objects in an image). For example, such parsed information from a natural language description of the image (e.g., input) can include identifying control language and sub-prompts from a received prompt.; P0059, The structural analyzer determines groupings based on the relationships of words in the remaining prompt. Each grouping groups properties of a noun (or subject/object). As described herein, groupings may also include (or otherwise be associated with) control language (such as a frame control element).; P0060, The structural analyzer of the input extractor parses the remaining prompt to isolate sub-prompts for visual element generation.
Sub-prompts identified by the structural analyzer include “pirate”, “hat”, and “parrot”, and such sub-prompts are generated into images or visual elements using the generative AI module. As illustrated in 302, a group is formed with “pirate” “wearing” “hat”. As described herein, the image compiler receives such groupings to group the “pirate” and “hat” visual elements together in a manner such that the generated pirate visual element has a relationship with (e.g., is wearing) the generated hat visual element.)

route the plurality of sub-prompts to different layers of the plurality of layers of the generative artificial intelligence model; (P0057, Neural compositioning is used to generate neural images for a subject of a composition, and a neural layer for each related object. The relationships between the nouns in the remaining prompt (e.g., subject and related objects) specify the neural image (subject) and the neural layers that are applied to it (the objects). In an example, the structural analyzer parses out “pirate” (a neural image because the pirate is the subject), “hat” (a neural layer because the hat is an object related to the pirate), and “parrot” (another neural layer because the parrot is an object related to the pirate).)

generate a response to the input prompt using the generative artificial intelligence model based on the plurality of sub-prompts routed to different layers and the contextual information associated with the tokens in the input prompt; and (P0031, The generative AI module generates a visual element (e.g., an object or a subject) of a target scene using the information received from the image orchestrator (e.g., each of the sub-prompts, features, and/or control elements).
The generative AI module generates the image using the control elements, sub-prompts, and a relationship identified between the control elements and sub-prompts (determined using the input extractor as described herein) obtained from the natural language description of the image (e.g., input).)

output the generated response. (P0042, The image compiler provides the target scene as output. … The output is displayed on one or more user devices and/or communicated to one or more downstream devices (e.g., servers, applications, systems, processors, or some combination).)

Regarding claims 2 and 11, Brdiczka teaches claims 1 and 10. Brdiczka further teaches: the contextual information comprises breadth metrics associated with the tokens in the input prompt; and to partition the input prompt into the plurality of sub-prompts, the one or more processors are configured to cause the processing system to: associate a respective breadth metric to a respective token from the tokens in the input prompt using a language model; and partition the tokens in the input prompt based on respective breadth metrics associated with respective tokens from the tokens in the input prompt. (P0025, Parsed information from a natural language description of the image (e.g., input) can include identifying control language and sub-prompts from a received prompt.; P0047, The input extractor identifies and extracts control language from a prompt. Control language is identified using the control language identifier. Control language may refer to language describing control elements which are visual elements such as borders (e.g., frames), shapes (e.g., circles), vignettes, one or more effects (e.g., double exposure), and the like.
Control language may also refer to language describing a composition of a scene.; P0052, The input extractor executes a structural analyzer to identify sub-prompts (including subjects, related objects, and properties (e.g., adjectives)). In some embodiments, the structural analyzer is executed on a remaining prompt (e.g., an input with parsed out control language).)

Regarding claims 3 and 12, Brdiczka teaches claims 2 and 11. Brdiczka further teaches: wherein the respective breadth metric comprises an indication of whether the respective token corresponds to a global concept in the input prompt or one or more local concepts in the input prompt. (P0025, Parsed information from a natural language description of the image (e.g., input) can include identifying control language and sub-prompts from a received prompt.; P0047, The input extractor identifies and extracts control language from a prompt. Control language is identified using the control language identifier. Control language may refer to language describing control elements which are visual elements such as borders (e.g., frames), shapes (e.g., circles), vignettes, one or more effects (e.g., double exposure), and the like. Control language may also refer to language describing a composition of a scene.; P0052, The input extractor executes a structural analyzer to identify sub-prompts (including subjects, related objects, and properties (e.g., adjectives)). In some embodiments, the structural analyzer is executed on a remaining prompt (e.g., an input with parsed out control language).)

Regarding claims 4 and 13, Brdiczka teaches claims 3 and 12. Brdiczka further teaches: wherein to partition the tokens in the input prompt, the one or more processors are configured to cause the processing system to partition the tokens into a set of tokens corresponding to the global concept and one or more sets of tokens corresponding to the one or more local concepts in the input prompt.
(P0025, Parsed information from a natural language description of the image (e.g., input) can include identifying control language and sub-prompts from a received prompt.; P0047, The input extractor identifies and extracts control language from a prompt. Control language is identified using the control language identifier. Control language may refer to language describing control elements which are visual elements such as borders (e.g., frames), shapes (e.g., circles), vignettes, one or more effects (e.g., double exposure), and the like. Control language may also refer to language describing a composition of a scene.; P0052, The input extractor executes a structural analyzer to identify sub-prompts (including subjects, related objects, and properties (e.g., adjectives)). In some embodiments, the structural analyzer is executed on a remaining prompt (e.g., an input with parsed out control language).)

Regarding claims 5 and 14, Brdiczka teaches claims 1 and 10. Brdiczka further teaches: the contextual information comprises temporal embeddings associated with the tokens in the input prompt; and to partition the input prompt into the plurality of sub-prompts, the one or more processors are configured to cause the processing system to partition the tokens in the input prompt into groups of temporally related tokens based on the temporal embeddings. (P0051, The control language identifier may derive groupings using any suitable grouping technique (e.g., any natural language processing technique, any clustering technique, etc.). For example, the control language identifier executes a natural language toolkit (NLTK) to identify grammatical relationships of the prompt. The identified grammatical relationships group related information of the prompt. The control language identifier may indicate such groupings to the image orchestrator.
As a result, the image orchestrator passes the groupings to the image compiler such that a target scene is generated that preserves the structure of the prompt (e.g., input). The groupings preserve the relationship of the objects in the group such that visual elements (corresponding to the objects of the group) are arranged by the image compiler according to the group. For example, a prompt describing “pirate wearing a hat with a parrot on the shoulder frame rectangle 60 20 colors” may result in the control language identifier deriving a frame grouping. The frame grouping groups the frame with the parameters of the frame (e.g., ‘60’, ‘20’, and ‘colors’). Additionally or alternatively, the control language identifier groups the frame with a subject of the image (e.g., a pirate). As described herein, the structural analyzer may also derive groups. For example, a pirate is grouped with a hat and a parrot. Accordingly, one subject (e.g., a pirate) may be in two groups (e.g., a frame, and a group of pirate objects). Alternatively, one group may include both control elements and subject/object relationships (determined by the structural analyzer). As described herein, the image compiler may perform image operations on such groupings.)

Regarding claims 6 and 15, Brdiczka teaches claims 1 and 10. Brdiczka further teaches: wherein the generative artificial intelligence model includes a gating mechanism configured to perform the routing based on the contextual information associated with the tokens in the input prompt. (P0057, Neural compositioning is used to generate neural images for a subject of a composition, and a neural layer for each related object. The relationships between the nouns in the remaining prompt (e.g., subject and related objects) specify the neural image (subject) and the neural layers that are applied to it (the objects).
In an example, the structural analyzer parses out “pirate” (a neural image because the pirate is the subject), “hat” (a neural layer because the hat is an object related to the pirate), and “parrot” (another neural layer because the parrot is an object related to the pirate).)

Regarding claims 8 and 17, Brdiczka teaches claims 1 and 10. Brdiczka further teaches: wherein the generated response comprises an image depicting one or more objects specified by the input prompt. (P0031, The generative AI module generates a visual element (e.g., an object or a subject) of a target scene using the information received from the image orchestrator (e.g., each of the sub-prompts, features, and/or control elements). The generative AI module generates the image using the control elements, sub-prompts, and a relationship identified between the control elements and sub-prompts (determined using the input extractor as described herein) obtained from the natural language description of the image (e.g., input).)

Regarding claims 9 and 18, Brdiczka teaches claims 1 and 10. Brdiczka further teaches: wherein the generative artificial intelligence model comprises a text-to-image diffusion model configured to generate an image output from a textual input. (P0067, The generative AI can be performed using any suitable mechanism. In some embodiments, such generative AI is performed using a diffusion model.)

Regarding claim 19, Brdiczka teaches: A processing system, comprising: (P0018, Target scene generation system that creates composites of images and/or generates images into a structured scene using a prompt.)
means for receiving an input prompt for processing using a generative artificial intelligence model; (P0019, Present disclosure combines natural language processing to analyze and decompose textual descriptions, textual image operations to define a composition, and generative AI to automatically create a composite image with desired styles, visual elements, and image operations.; P0022, The target scene generation system receives an input. The input is a prompt, or a textual description containing 1) image operations, 2) a sentence with a grammatical structure that reflects composition and/or 3) objects and subjects. A desired composition (otherwise referred to herein as a target scene or a description of an image to be generated) is described using the prompt in a natural language format.)

means for partitioning the input prompt into a plurality of sub-prompts based on contextual information associated with tokens in the input prompt; (P0025, The input extractor identifies and parses out information defining visual elements of a target scene (e.g., an arrangement/composition of objects in an image). For example, such parsed information from a natural language description of the image (e.g., input) can include identifying control language and sub-prompts from a received prompt.; P0059, The structural analyzer determines groupings based on the relationships of words in the remaining prompt. Each grouping groups properties of a noun (or subject/object). As described herein, groupings may also include (or otherwise be associated with) control language (such as a frame control element).; P0060, The structural analyzer of the input extractor parses the remaining prompt to isolate sub-prompts for visual element generation. Sub-prompts identified by the structural analyzer include “pirate”, “hat”, and “parrot”, and such sub-prompts are generated into images or visual elements using the generative AI module.
As illustrated in 302, a group is formed with “pirate” “wearing” “hat.” As described herein, the image compiler receives such groupings to group the “pirate” and “hat” visual elements together in a manner such that the generated pirate visual element has a relationship with (e.g., is wearing) the generated hat visual element.)

means for generating a response to the input prompt using the generative artificial intelligence model based on the plurality of sub-prompts and the contextual information associated with the tokens in the input prompt; and (P0031, The generative AI module generates a visual element (e.g., an object or a subject) of a target scene using the information received from the image orchestrator (e.g., each of the sub-prompts, features, and/or control elements). The generative AI module generates the image using the control elements, sub-prompts, and a relationship identified between the control elements and sub-prompts (determined using the input extractor as described herein) obtained from the natural language description of the image (e.g., input).)

means for outputting the generated response. (P0042, The image compiler provides the target scene as output. … The output is displayed on one or more user devices and/or communicated to one or more downstream devices (e.g., servers, applications, systems, processors, or some combination).)

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.
Patentability shall not be negated by the manner in which the invention was made.

Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Brdiczka in view of Yan (U.S. PG Pub. No. 2022/0318601), hereinafter Yan.

Regarding claims 7 and 16: Brdiczka teaches claims 6 and 15. Brdiczka does not specifically teach: wherein the gating mechanism comprises an attention layer in the generative artificial intelligence model, the attention layer comprising: a first projection block that projects the contextual information to key data and value data; a second projection block that projects the tokens in the input prompt to query data; a multi-head attention block that generates an attention output based on the key data, the value data, and the query data; and a nonlinear layer that generates an attention mask based on the attention output, the attention mask being combined with the tokens in the input prompt to generate a masked set of tokens as an output of the gating mechanism.

Yan, however, teaches: wherein the gating mechanism comprises an attention layer in the generative artificial intelligence model, the attention layer comprising: (P0003, Attention mechanism, implemented by a neural network, that generates attention information based on head-specific query information and shared key and value (KV) information.) a first projection block that projects the contextual information to key data and value data; (Fig. 2, Token embeddings input into linear projection layer to output head-specific key information and head-specific value information.) a second projection block that projects the tokens in the input prompt to query data; (Fig. 2, Token embeddings input into linear projection layer to output head-specific query information.; P0027, A query expansion component produces each instance of head-specific query information Qi by linearly projecting original query information.)
a multi-head attention block that generates an attention output based on the key data, the value data, and the query data; and (Fig. 2, Multiple heads, head-specific key information, head-specific value information, and head-specific query information fed into attention layer.)

a nonlinear layer that generates an attention mask based on the attention output, the attention mask being combined with the tokens in the input prompt to generate a masked set of tokens as an output of the gating mechanism. (P0028, A full path (FP) attention probability generation component 118 can then generate a plurality of instances of probability information (p1, p2, . . . , ph) 120, for model dimension d, using the following equation: p_i = softmax(Q_i K_i^T / √d); P0029, Equation (1) generates a dot product of the head-specific query information Qi and the transpose of the head-specific key information Ki. This effectively identifies the relevance of at least one individual token associated with the original query with each of a plurality of tokens associated with the original key information. Equation (1) scales this product by a scaling factor √d, to produce a scaled result, and then generates the normalized exponential function (softmax) of the scaled result.; P0062, The self-attention mechanism performs masked self-attention on decoder input information fed to it. The decoder input information, in turn, includes one or more output tokens produced by the decoder 406 (after these tokens have been converted to embeddings in the manner previously described). The self-attention mechanism performs masking so that positions in a sequence after a last-predicted token (which are unknown at this time) do not bias its results.)
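For readers mapping the claim language, the attention-layer gating mechanism recited above (contextual information projected to key and value data, prompt tokens projected to query data, scaled dot-product attention, then a nonlinear layer producing a mask that is combined with the input tokens) can be sketched in a few lines. This is an illustrative reconstruction only: the single attention head, the matching dimensions, and the choice of sigmoid as the nonlinearity are assumptions, not the applicant's or Yan's actual implementation.

```python
import math

def linear(rows, w):
    """Project each row vector by weight matrix w (d_in x d_out)."""
    return [[sum(r[k] * w[k][j] for k in range(len(w)))
             for j in range(len(w[0]))] for r in rows]

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def attention_gate(tokens, context, wq, wk, wv):
    """Single-head sketch of the claimed gating mechanism."""
    d = len(wk[0])
    Q = linear(tokens, wq)   # second projection block: prompt tokens -> query data
    K = linear(context, wk)  # first projection block: contextual info -> key data
    V = linear(context, wv)  # first projection block: contextual info -> value data
    # scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    scores = [[sum(q[i] * k[i] for i in range(d)) / math.sqrt(d) for k in K]
              for q in Q]
    probs = [softmax(row) for row in scores]
    attn = [[sum(p[j] * V[j][i] for j in range(len(V))) for i in range(len(V[0]))]
            for p in probs]
    # nonlinear layer (sigmoid assumed) turns the attention output into a mask,
    # which is combined elementwise with the input tokens
    mask = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in attn]
    return [[m * t for m, t in zip(mrow, trow)]
            for mrow, trow in zip(mask, tokens)]
```

With identity weight matrices and a single context vector, each token is simply scaled by the sigmoid of its attention to that context, which makes the gating behavior easy to verify by hand.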
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to include a gating mechanism that comprises key data, value data, query data, multi-head attention, and a nonlinear layer to obtain masked tokens. It would have been obvious to combine the references because the use of key, value, query, and an attention layer is a known technique to yield a predictable result of obtaining relevant tokens for a query. (Yan P0009, P0029)

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL WONSUK CHUNG, whose telephone number is (571) 272-1345. The examiner can normally be reached Monday - Friday (7am-4pm) [PT]. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, PIERRE-LOUIS DESIR, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DANIEL W CHUNG/
Examiner, Art Unit 2659

/PIERRE LOUIS DESIR/
Supervisory Patent Examiner, Art Unit 2659
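As background on the Brdiczka citations in the rejection, the structural analyzer's decomposition of a prompt into sub-prompts and noun groupings can be illustrated with a toy partitioner. The hard-coded noun and relation sets below are hypothetical stand-ins for the reference's actual NLP parsing; this is a sketch of the idea, not Brdiczka's implementation.

```python
def partition_prompt(prompt, nouns, relations):
    """Toy partitioner: isolate noun sub-prompts and link the nouns
    that appear on either side of a relation word."""
    tokens = prompt.lower().split()
    sub_prompts = [t for t in tokens if t in nouns]
    groups = []
    for i, t in enumerate(tokens):
        if t in relations:
            # nearest noun to the left and to the right of the relation word
            left = next((x for x in reversed(tokens[:i]) if x in nouns), None)
            right = next((x for x in tokens[i + 1:] if x in nouns), None)
            if left and right:
                groups.append((left, t, right))
    return sub_prompts, groups
```

For the running example, partition_prompt("a pirate wearing a hat with a parrot", {"pirate", "hat", "parrot"}, {"wearing", "with"}) yields the sub-prompts ["pirate", "hat", "parrot"] and the grouping ("pirate", "wearing", "hat"), mirroring how the cited structural analyzer groups the pirate and hat visual elements.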

Prosecution Timeline

May 07, 2024
Application Filed
Nov 15, 2025
Non-Final Rejection — §101, §102, §103
Feb 20, 2026
Response Filed
Apr 03, 2026
Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579471
DATA AUGMENTATION AND BATCH BALANCING METHODS TO ENHANCE NEGATION AND FAIRNESS
2y 5m to grant Granted Mar 17, 2026
Patent 12493892
METHOD AND SYSTEM FOR EXTRACTING CONTEXTUAL PRODUCT FEATURE MODEL FROM REQUIREMENTS SPECIFICATION DOCUMENTS
2y 5m to grant Granted Dec 09, 2025
Patent 12400078
INTERPRETABLE EMBEDDINGS
2y 5m to grant Granted Aug 26, 2025
Patent 12387000
PRIVACY-PRESERVING AVATAR VOICE TRANSMISSION
2y 5m to grant Granted Aug 12, 2025
Patent 12380875
SPEECH SYNTHESIS WITH FOREIGN FRAGMENTS
2y 5m to grant Granted Aug 05, 2025
Based on the 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
54%
Grant Probability
92%
With Interview (+37.5%)
2y 10m
Median Time to Grant
Moderate
PTA Risk
Based on 44 resolved cases by this examiner. Grant probability derived from career allow rate.
