Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: an image reception module, a processing module, a storage module, and an image output module in claims 13-14.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 7, 10, 12-14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Green (US 2024/0282130) in view of Denison (US 2024/0193821).
Regarding claim 1, Green discloses a method for generating an image, the method suitable for generating an artificial intelligence image by means of a computer device (Green, [0050], “a method for modifying an image generated using an image generation artificial intelligence model implementing latent diffusion techniques based on editing of labeled features of the image”. In addition, in paragraph [0074], “FIG. 5 illustrates components of an example device 500 that can be used to perform aspects of the various embodiments of the present disclosure”), the method comprising:
receiving at least one input image (Green, [0027], “the image 101 may include a person 105a holding an object 106a (e.g., a soda can)”);
generating a keyword character set based on the at least one input image (Green, [0028], “the original image 101 is analyzed by the feature extractor 110 that is configured to extract features that have been previously learned…output a hierarchical tree of labeled features 150a for the original image 101, wherein the labeled features can be used by an IGAI model (e.g., as input useful for generating another image). FIG. 1B illustrates an exemplary hierarchical tree 150a including labeled features 1 though N, wherein each of the features can be nested with other features (e.g., features, sub-features, sub-sub-features, etc.)”. The labeled features are textual descriptors of image features and read on the keyword character set);
performing at least one labeled feature editing operation on the keyword character set based on an editing instruction set corresponding to an editing request after receiving the editing request sent from any one of at least one editing button, and generating an editing character set (Green, [0035], “a label editor 133 that is configured to allow the user to edit and/or modify a labeled feature accessible to the user”. The user’s interaction with the label editor (e.g., via buttons) defines editing instructions that guide how the textual content is modified, and therefore corresponds to an editing instruction set. In addition, in paragraph [0035], “FIG. 1B shows that the user has edited the labeled feature 1 from its original state provided in the hierarchical tree of labeled features 150a, as is shown by the edited labeled feature 1 (151a)”. The user interaction within the label editor (e.g., selection, confirmation) reads on the editing request sent from any one of at least one editing button. Fig. 1B illustrates an editing character set); and
generating the artificial intelligence image based on the editing character set (Green, [0035], “label editor 133 outputs an edited version 150b of the hierarchical tree of labeled features related to the image 101. For example, FIG. 1B shows that the user has edited the labeled feature 1 from its original state provided in the hierarchical tree of labeled features 150a, as is shown by the edited labeled feature 1 (151a). For illustration purposes only, the labeled feature of the object 106a may be edited or modified from being labeled a “can” to a “glass bottle.””. In addition, in paragraph [0038], “the modified version 150c may be in a format that is suitable for performing latent diffusion, such as when the diffusion model 180 internally generates a hierarchical tree of labeled features as a conditioning factor for each internal iteration it performs, such as a latent representation of the modified image 102 at each iteration”. The modified image is considered the artificial intelligence image).
Green is silent with respect to “performing at least one string editing operation”;
Denison discloses performing at least one string editing operation (Denison, [0061], “the user feedback (i.e., prompt 2) can be provided at the text input field as well as the image input field, which are then used to adjust the mouth feature of the dog, as illustrated in box 410a”. The user modifying the predefined text reads on a string editing operation).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the user feedback function of Denison into the image modification system of Green in order to modify its text features to generate a modified image. The motivation for doing so would have been to allow the user to modify text prompts used to generate images, thereby providing more accurate and desirable output.
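For illustrative purposes only, the following is a minimal hypothetical sketch of a hierarchical tree of labeled features and a string edit applied to one of its labels, in the spirit of Green’s labeled feature tree ([0028], [0035]) and Denison’s text editing ([0061]); the data structure and function names are assumptions of this illustration and are not drawn from either reference.

# Hypothetical representation of a hierarchical tree of labeled features
# (cf. Green, FIG. 1B) with a string editing operation (cf. Denison, [0061]).
from dataclasses import dataclass, field

@dataclass
class LabeledFeature:
    label: str                                    # textual descriptor, e.g., "can"
    children: list["LabeledFeature"] = field(default_factory=list)

def edit_label(node: LabeledFeature, old: str, new: str) -> None:
    """Recursively replace one label string throughout the tree."""
    if node.label == old:
        node.label = new                          # the string editing operation
    for child in node.children:
        edit_label(child, old, new)

# Example mirroring Green's "can" -> "glass bottle" discussion ([0035]):
tree = LabeledFeature("person", [LabeledFeature("holding object", [LabeledFeature("can")])])
edit_label(tree, "can", "glass bottle")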
Regarding claim 2, Green discloses inputting the at least one input image into an image description model (Green, [0028], “the original image 101 is analyzed by the feature extractor 110 that is configured to extract features that have been previously learned…the classifier 120 may be configured as an AI model executed by a deep/machine learning engine that may be configured as a neural network 125. In some embodiments, the feature extractor 110 and classifier 120 are implemented within one or more AI models”); and
outputting the keyword character set by the image description model, wherein the image description model automatically generates the keyword character set corresponding to the at least one input image based on the at least one input image (Green, [0028], “the classifier 120 is configured to output a hierarchical tree of labeled features 150a for the original image 101”).
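For illustrative purposes only, a minimal hypothetical sketch of an image description model producing a keyword character set from an input image follows; the BLIP captioning model used here is an illustrative stand-in and is not the feature extractor/classifier disclosed by Green ([0028]).

# Hypothetical stand-in for an image description model (cf. Green, [0028]);
# BLIP is used purely for illustration, not as Green's disclosed model.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
caption = captioner("input.png")[0]["generated_text"]  # e.g., "a person holding a can"
keyword_character_set = caption.split()                # crude keyword set, for illustration
print(keyword_character_set)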
Regarding claim 3, Green discloses inputting the editing character set into an image generation model (Green, [0039], “to modify the original image 101 based on the edited labeled features, the IGAI processing or diffusion model 180 may act in latent space to perform latent diffusion, based on edited labeled features of the image 101”); and
outputting the artificial intelligence image by the image generation model (Green, [0041], “As such, the diffusion model 180 generates a latent space representation of the image 101, now modified, and after decoding, the decoder 190 outputs the modified image 102”), wherein the image generation model automatically generates the artificial intelligence image corresponding to the editing character set based on the editing character set (Green, [0040], “generated from the encoded and modified version 150d of the hierarchical tree of labeled features) for conditioning (i.e., by the diffusion model 180)”. In addition, in paragraph [0041], “As such, the diffusion model 180 generates a latent space representation of the image 101, now modified, and after decoding, the decoder 190 outputs the modified image 102. This process may be repeated iteratively by the user to achieve a desired and final or resulting image”).
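For illustrative purposes only, a minimal hypothetical sketch of generating a modified image by conditioning an image generation model on an edited character set, in the spirit of Green’s diffusion-based regeneration ([0039]-[0041]); the Hugging Face diffusers pipeline shown is an illustrative stand-in, not Green’s disclosed latent diffusion implementation, and the file names and prompt are assumptions.

# Hypothetical stand-in for Green's diffusion model 180 ([0039]-[0041]):
# an img2img pipeline conditioned on the edited text description.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

edited_prompt = "a person holding a glass bottle"       # the editing character set
original = Image.open("original.png").convert("RGB")
modified = pipe(prompt=edited_prompt, image=original, strength=0.6).images[0]
modified.save("modified.png")                           # the artificial intelligence image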
Regarding claim 4, Green discloses automatically generating the at least one editing button according to at least one of contents of the keyword character set (Green, [0034], “The user interface 130 may also include a feature tree presenter 135 configured for generating a feature tree for display”. In addition, in paragraph [0035], “a label editor 133 that is configured to allow the user to edit and/or modify a labeled feature accessible to the user”).
Regarding claim 5, Green discloses determining, based on a stack attribute value individually corresponding to each of the at least one editing button, whether to stack the editing operation (Green, [0036-0037], “A modified version 150c of the hierarchical tree of labeled features is then output by the context analyzer 140. For example, FIG. 1B shows the modified version 150c includes modified labeled feature 1 (151b), and modified labeled feature 2 (152a) as well as modified labeled sub-feature 3 (152b) (i.e., a sub-feature of feature 2), and a deleted feature 4 (153). Note that the labeled feature N (154) has not been modified in the modified version 150c of the hierarchical tree of labeled features… the user may wish to add features to the original image 101…in FIG. 1B, the modified version 150c includes the newly added labeled feature N+1 (154), and newly added labeled sub-feature N+2 (155)”).
Green as modified by Denison with the same motivation from claim 1 discloses the at least one editing button (Denison, Fig. 4B) and at least one string editing operation (Denison, [0036], “the user prompt to determine if the user prompt contains text string”).
Regarding claim 7, Green discloses wherein the stack attribute value is set according to a user preference setting (Green, [0036], “determine a context within which the user is editing one or more labeled features for the image 101”. In addition, in paragraph [0037], “the user may wish to add features to the original image 101”).
Regarding claim 10, Green discloses receiving object information corresponding to physical dimensions of an object (Green, [0060], “wherein the editing may include changing the cylindrical characteristic of the surface to a bottle or bottle shaped characteristic of the surface”. The cylindrical characteristic of an object is considered object information corresponding to physical dimensions of an object); and
determining dimensions of the artificial intelligence image based on the object information (Green, [0060], “As such, the modified image can be generated based on the editing of one or more labeled features, as previously described”).
Regarding claim 12, Green discloses transmitting the artificial intelligence image to an image output device (Green, [0047], “A decoder 212 then transforms a resulting output from the latent space back to the pixel space. The output 214 may then be processed to improve the resolution. The output 214 is then passed out as the result, which may be an image, graphics, 3D data, or data that can be rendered to a physical form or digital form”), and physically outputting the artificial intelligence image by the image output device (Green, [0057], “the user interface may be configured to display the image and the modified image to the user”).
Regarding claim 13, Green discloses a computer device for generating an image (Green, [0074], “FIG. 5 illustrates components of an example device 500 that can be used to perform aspects of the various embodiments of the present disclosure…a server”), the computer device suitable to be electrically coupled to a user terminal (Green, [0088], “a networked controller and client device can be configured to send certain types of inputs directly from the controller to the cloud game server”), to receive at least one input image (Green, [0051], “the method includes identifying a plurality of features of an image, such as an original image.”), and to generate an output image based on the at least one input image (Green, [0056], “The latent representation of the image is then decoded to generate a high resolution modified image as output”), the computer device comprising:
an image reception module (Green, [0074], “CPU 502 is one or more general-purpose microprocessors having one or more processing cores. Further embodiments can be implemented using one or more CPUs with microprocessor architectures specifically adapted for highly parallel and computationally intensive applications, such as processing operations of interpreting a query, identifying contextually relevant resources, and implementing and rendering the contextually relevant resources in a video game immediately”. The function of the image reception module is performed by the processor);
a processing module, configured to be electrically coupled to the image reception module (Green, [0074], “CPU 502 is one or more general-purpose microprocessors having one or more processing cores. Further embodiments can be implemented using one or more CPUs with microprocessor architectures specifically adapted for highly parallel and computationally intensive applications, such as processing operations of interpreting a query, identifying contextually relevant resources, and implementing and rendering the contextually relevant resources in a video game immediately”. The function of the processing module is performed by the processor. Fig. 5 illustrates the electrical coupling to the image reception module);
a storage module, configured to be electrically coupled to the processing module, wherein the storage module has a program code stored therein, and wherein the processing module is, after executing the program code stored in the storage module (Green, [0008], “memory coupled to the processor and having stored therein instructions that, if executed by the computer system, cause the computer system to execute a method”. In addition, in paragraph [0077], “The data and/or instructions defining the desired output images can be stored in memory 504 and/or graphics memory 518”. Fig. 5 illustrates the storage module, configured to be electrically coupled to the processing module);
Green as modified by Denison with the same motivation from claim 1 discloses so as to receive at least one input from the user terminal (Denison, [0031], “The client device 100, in some implementations, includes an encoder to encode the user prompt and forward the user prompt over the network 200 to the server 300 for processing”).
The remaining limitations recited in claim 13 are similar in scope to those of the method recited in claim 1 and are therefore rejected under the same rationale.
Regarding claim 14, Green discloses an image output module, configured to be electrically coupled to the processing module, and suitable for transmitting the output image to an image output device (Green, [0074], “CPU 502 is one or more general-purpose microprocessors having one or more processing cores. Further embodiments can be implemented using one or more CPUs with microprocessor architectures specifically adapted for highly parallel and computationally intensive applications, such as processing operations of interpreting a query, identifying contextually relevant resources, and implementing and rendering the contextually relevant resources in a video game immediately”. The function of the image output module is performed by the processor. Fig. 5 illustrates the electrical coupling to the processing module. Display 510 is considered an image output device).
Regarding claim 20, Green discloses a non-transitory computer-readable recording medium for generating an image, capable of completing the method for generating an image of claim 1 after a computer device loads and executes a program code stored therein (Green, [0050], “a method for modifying an image generated using an image generation artificial intelligence model implementing latent diffusion techniques based on editing of labeled features of the image”. In addition, in paragraph [0098], “computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data, which can be thereafter be read by a computer system”).
Claims 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Green (US 2024/0282130) in view of Denison (US 2024/0193821), as applied to claim 1, and further in view of Ravi et al. (US 2024/0362842).
Regarding claim 8, Green teaches operation on the keyword character set (Green, [0028], “the original image 101 is analyzed by the feature extractor 110 that is configured to extract features that have been previously learned…output a hierarchical tree of labeled features 150a for the original image 101, wherein the labeled features can be used by an IGAI model (e.g., as input useful for generating another image). FIG. 1B illustrates an exemplary hierarchical tree 150a including labeled features 1 though N, wherein each of the features can be nested with other features (e.g., features, sub-features, sub-sub-features, etc.)”); Green as modified by Denison with the same motivation from claim 1 teaches each of the at least one editing button (Denison, [0025], “The node map is generated to include a plurality of nodes, with each node in the node map corresponding to an image feature identified from the image input. The node map shows the interconnectivity between the nodes to correspond with the inter-relationship between the image features of the image input. When the image input is the generated image itself, selection of a node from the node map for tuning would result in the selection of the corresponding image feature of the generated image for tuning”), at least one string editing operation (Denison, [0036], “The text content describing the source image is then processed by the analysis module 310 in a manner similar to the text string”);
Green as modified by Denison does not expressly disclose “determining, based on an editing weight value”;
Ravi et al. (hereinafter Ravi) discloses determining, based on an editing weight value (Ravi, [0095], “the modified digital image 616 reflects the base digital image 606 and the edit text according to the conceptual edit strength parameter and the structural edit strength parameter indicated by the conceptual weight element 610 and the structural weight element 612”);
a level of editing performed (Ravi, [0092], “Based on user interaction with the conceptual weight element the diffusion prior image editing system 102 can determine a conceptual edit strength parameter. Although illustrated in FIG. 6A as a slider element, the conceptual weight element 610 can include a variety of different user interface elements”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the conceptual weighting function of Ravi into the string editing operation of Green as modified by Denison. The motivation for doing so would have been improving efficiency and semantic consistency of the editing process.
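For illustrative purposes only, a minimal hypothetical sketch of an editing weight value controlling the level of editing performed, in the spirit of Ravi’s conceptual edit strength parameter ([0092], [0095]); the embedding interpolation shown is an assumption of this illustration, not Ravi’s disclosed method.

# Hypothetical illustration of an editing weight value (cf. Ravi, [0092]):
# blend original and edited text embeddings by a slider-style weight.
import numpy as np

def apply_edit_weight(orig_emb: np.ndarray, edit_emb: np.ndarray, weight: float) -> np.ndarray:
    """Interpolate embeddings; weight in [0, 1] sets the level of editing."""
    weight = float(np.clip(weight, 0.0, 1.0))
    return (1.0 - weight) * orig_emb + weight * edit_emb

# weight=0.3 applies a mild edit; weight=1.0 applies the full edit.
blended = apply_edit_weight(np.zeros(768), np.ones(768), weight=0.3)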
Regarding claim 9, Green teaches the keyword character set; Green as modified by Denison and Ravi with the same motivation from claim 8 discloses that the editing weight value is set according to content of the text string (Ravi, [0092], “receiving a user interface selection closer to the edit text element 608, the diffusion prior image editing system 102 can modify the conceptual edit strength parameter to emphasize the edit text”).
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Green (US 2024/0282130) in view of Denison (US 2024/0193821), as applied to claim 1, and further in view of Winnemoeller et al. (US 2021/0311619).
Regarding claim 11, Green discloses inputting an image into an image description model (Green, [0028], “the original image 101 is analyzed by the feature extractor 110 that is configured to extract features that have been previously learned…the classifier 120 may be configured as an AI model executed by a deep/machine learning engine that may be configured as a neural network 125. In some embodiments, the feature extractor 110 and classifier 120 are implemented within one or more AI models”);
outputting the keyword character set by the image description model, wherein the image description model automatically generates the keyword character set corresponding to the image (Green, [0028], “the classifier 120 is configured to output a hierarchical tree of labeled features 150a for the original image 101”);
Green as modified by Denison does not expressly disclose “receiving at least one decorative image”;
Winnemoeller et al. (hereinafter Winnemoeller) discloses receiving at least one decorative image (Winnemoeller, [0087], “the snap effects system 106 applies the color action block to a different portion of the digital image (e.g., the background remaining after the cutout). Specifically, the snap effects system 106 executes the color action block by identifying a sub-portion of the digital image 516 associated with the field 512c that is able to receive digital content (i.e., a background portion that can overlay a given color)”);
performing a composition operation on the at least one input image and the at least one decorative image, and generating a choreographed image (Winnemoeller, [0087], “the snap effects system 106 applies the color action block to a different portion of the digital image (e.g., the background remaining after the cutout). Specifically, the snap effects system 106 executes the color action block by identifying a sub-portion of the digital image 516 associated with the field 512c that is able to receive digital content (i.e., a background portion that can overlay a given color)…the snap effects system 106 completes execution of the color action block by applying the overlay at the identified sub-portion of the digital image 516”. The overlaid image reads on a choreographed image).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Green to generate its features based on the overlaid image of Winnemoeller. The motivation for doing so would have been improving consistency and efficiency in feature generation.
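For illustrative purposes only, a minimal hypothetical sketch of a composition operation overlaying a decorative image onto an input image to produce a choreographed image, in the spirit of Winnemoeller’s overlay at an identified sub-portion ([0087]); the file names, paste coordinates, and use of the PIL library are assumptions of this illustration.

# Hypothetical composition operation (cf. Winnemoeller, [0087]): overlay a
# decorative image onto an input image at a chosen sub-portion.
from PIL import Image

base = Image.open("input.png").convert("RGBA")
decoration = Image.open("decorative.png").convert("RGBA")

# Use the decoration's alpha channel as the paste mask so only its visible
# pixels overlay the base; the result is the choreographed image.
base.paste(decoration, (40, 60), mask=decoration)
base.convert("RGB").save("choreographed.png")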
Allowable Subject Matter
Claim 6 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KYLE ZHAI whose telephone number is (571) 270-3740. The examiner can normally be reached 9 AM-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ke Xiao, can be reached at (571) 272-7776. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KYLE ZHAI/Primary Examiner, Art Unit 2611