DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (US 12346995 B2), referred to herein as Liu, in view of Zakharov et al. (US 12169626 B2), referred to herein as Zakharov.
Regarding Claim 1, Liu in view of Zakharov teaches a computer implemented method of generating at least one likeness of a specific human subject for use in the generation of a media product, the method comprising (Liu Abst: systems and methods for generating a synthesized image of a user with a trained machine learning diffusion model; col4, ln22-26: The synthesized image 130 includes a character having the same or similar visual features as the user):
receiving subject capture data from a capture session in respect of the specific human subject, wherein the subject capture data is captured using at least one capture device (Liu col4, ln19-22: A trained machine learning diffusion model 128 is configured to receive the image 124 of the user and generate a synthesized image 130 of the user based at least on the image 124 of the user captured via the camera 110);
processing the subject capture data to generate a training data set from the subject capture that enables separation of facial features of the specific human subject from other features (Liu col4, ln41-47: the trained machine learning diffusion model 128 may be trained to extract visual features of the face of the user captured in the image 124 of the user. In other implementations, the trained machine learning diffusion model 128 may be trained to extract visual features of at least some or all of the body of the user captured in the image 124 of the user; col7, ln49-55: FIG. 3 schematically shows a data flow of an example training phase of the machine learning diffusion model 128. A set of training images 300 is generated to train the image encoder 200. In some examples, the set of training images 300 includes images of faces of different people. In some examples, the set of training images 300 includes a set of open-source images); and
using an AI algorithm with the training data set to at least one of (Liu col1, ln1-4: Machine learning generative models can be implemented in a variety of applications such as image-to-text generation, style transfer, image-to-image translation, and text-to-three-dimensional (3D) object generation):
generate at least one likeness of the specific human subject (Liu col4, ln19-24: A trained machine learning diffusion model 128 is configured to receive the image 124 of the user and generate a synthesized image 130 of the user based at least on the image 124 of the user captured via the camera 110. The synthesized image 130 includes a character having the same or similar visual features as the user); and
train an AI model to generate likenesses of the specific human subject (Liu col8, ln10-13: The set of training images 300 with the corresponding amount of noise 304 are fed to the image encoder 200 to generate a set of training embeddings 306 based at least on the set of training images 300 and the amount of noise 304).
Liu recites a machine learning model but does not explicitly teach an AI algorithm. However, Zakharov teaches an AI algorithm (Zakharov col24, ln63-67: The machine learning programs 900, also referred to as machine learning algorithms or tools, are used as part of the systems described herein to perform operations associated with searches and query responses).
Zakharov discloses an automated image generation system, which is analogous art to the present application.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Liu to incorporate the teachings of Zakharov, applying Zakharov's machine learning algorithms or tools to Liu's systems and methods for generating a synthesized image of a user with a trained machine learning diffusion model.
Doing so would allow users to receive high-quality images from an automated image generator without requiring sophisticated prompt engineering skills.
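For illustration only, the image-synthesis flow cited from Liu above (an image encoder extracting visual features of the user's face, combined with word/style embeddings and passed to a diffusion model that iteratively refines a random latent array) can be sketched roughly as follows. The sketch is not taken from either reference: every function name, the embedding scheme, and the simplified refinement loop are hypothetical toy stand-ins written in plain NumPy.

    import numpy as np

    def encode_face(image, dim=64):
        # Toy stand-in for an image encoder: pool pixel values into a fixed-length embedding.
        flat = (image.astype(np.float32) / 255.0).ravel()
        chunks = np.array_split(flat, dim)
        return np.array([chunk.mean() for chunk in chunks])

    def encode_prompt(words, dim=64):
        # Toy stand-in for word/style embeddings (e.g. "warrior", "mountain background").
        seed = sum(ord(ch) for ch in " ".join(words))
        return np.random.default_rng(seed).normal(size=dim)

    def toy_refinement(condition, steps=50, seed=0):
        # Toy stand-in for iterative diffusion-style refinement: start from a random
        # latent array and nudge it toward the conditioning vector while noise decays.
        rng = np.random.default_rng(seed)
        latent = rng.normal(size=condition.shape)
        for t in range(steps):
            noise_scale = 1.0 - t / steps
            latent += 0.1 * (condition - latent) + rng.normal(scale=0.05 * noise_scale, size=condition.shape)
        return latent

    # A synthetic "captured" array stands in for the input image of the user.
    face_image = np.random.default_rng(1).integers(0, 256, size=(128, 128, 3))
    condition = encode_face(face_image) + encode_prompt(["portrait", "warrior", "mountain background"])
    synthesized_latent = toy_refinement(condition)
    print(synthesized_latent.shape)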
Regarding Claim 2, Liu in view of Zakharov teaches the method of claim 1, and further teaches wherein the AI algorithm trains an AI model pre-trained to generate likenesses of human subjects with the training data set to thereby generate an instance of the AI model trained to generate likenesses of the specific human subject (Liu col7, ln20-24: the diffusion model 204 is a pretrained Stable Diffusion model. The Stable Diffusion model processes the input feature vector 216 in an iterative fashion starting with a random starting image information array—e.g., a latent array).
Regarding Claim 3, Liu in view of Zakharov teaches the method of claim 1, and further teaches further comprising conducting the capture session using the at least one capture device (Liu col9, ln26-28: In FIG. 4B, the social media application detects that a face 412 of the user is aligned with the oval 410 and captures an input image of the face of the user via the camera 406).
Regarding Claim 4, Liu in view of Zakharov teaches the method of claim 1, and further teaches further comprising conducting the capture session in a capture booth comprising at least one capture device (Liu col9, ln9-14: The smartphone 402 includes a display 404 and a camera 406 positioned on a same side of the smartphone 402 as the display 404. The smartphone 402 executes a social media application configured to communicate via a computer network with a social network platform executed on a server computing system). The front-facing camera of a smartphone performs a function equivalent to that of a capture booth.
Regarding Claim 5, Liu in view of Zakharov teaches the method of claim 1, and further teaches further comprising conducting the capture session at a kiosk comprising at least one capture device (Liu col9, ln9-14: The smartphone 402 includes a display 404 and a camera 406 positioned on a same side of the smartphone 402 as the display 404. The smartphone 402 executes a social media application configured to communicate via a computer network with a social network platform executed on a server computing system). The front-facing camera of a smartphone performs a function equivalent to that of a capture kiosk.
Regarding Claim 6, Liu in view of Zakharov teaches the method of claim 1, and further teaches wherein the capture data comprises at least one of: a plurality of images captured in the capture session; and video from which a plurality of images can be extracted (Liu col5, ln8-11: the social media application 114 optionally may be configured to capture a video stream 132 of the user via the camera 110. The video stream 132 includes a sequence of images of the user).
Regarding Claim 7, Liu in view of Zakharov teaches the method of claim 1, and further teaches further comprising processing the plurality of images to generate a larger set of images, and outputting the larger set of images as the training data set (Liu col7, ln56-68: the set of training images 300 includes a plurality of synthesized training images 302 that are generated based at least on an initial training image of the set of training images 300… Doing so increases the number and variety of training images, which improves the generalization ability and robustness of the image encoder 200).
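As a rough, purely illustrative sketch of the kind of augmentation Liu describes (deriving a larger and more varied set of training images from an initial captured image), the following toy NumPy example produces several variants of each captured frame by flipping, cropping, and adding mild pixel noise. The specific operations and parameters are assumptions for illustration, not drawn from Liu.

    import numpy as np

    def augment(image, rng, n_variants=4):
        # Derive n_variants modified copies of one captured image.
        variants = []
        for _ in range(n_variants):
            img = image.astype(np.float32)
            if rng.random() < 0.5:
                img = img[:, ::-1]                      # horizontal flip
            h, w = img.shape[:2]
            ch, cw = int(h * 0.9), int(w * 0.9)          # random 90% crop
            y0 = rng.integers(0, h - ch + 1)
            x0 = rng.integers(0, w - cw + 1)
            crop = img[y0:y0 + ch, x0:x0 + cw]
            rows = np.linspace(0, ch - 1, h).astype(int)  # nearest-neighbour resize back
            cols = np.linspace(0, cw - 1, w).astype(int)
            img = crop[rows][:, cols]
            img = img + rng.normal(scale=5.0, size=img.shape)  # mild pixel noise
            variants.append(np.clip(img, 0, 255).astype(np.uint8))
        return variants

    rng = np.random.default_rng(0)
    captured = [rng.integers(0, 256, size=(96, 96, 3), dtype=np.uint8) for _ in range(3)]
    training_set = [v for img in captured for v in ([img] + augment(img, rng))]
    print(len(captured), "captured images ->", len(training_set), "training images")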
Regarding Claim 8, Liu in view of Zakharov teaches the method of claim 7, and further teaches wherein processing the plurality of images to generate a larger set of images comprises at least one of:
generating at least one image having a different viewpoint (Liu FIG. 6);
generating at least one image having a modified background (Liu col6, ln55-58: Each template may include a different combination of embeddings corresponding to a different combination of style and/or scene words that are used to generate the synthesized image 130);
generating at least one image in which the subject's clothing is modified (Liu FIG. 5); and
generating different crop levels for at least one image (Liu FIG. 5).
Regarding Claim 9, Liu in view of Zakharov teaches the method of claim 1, and further teaches a computer implemented method of generating a media product comprising generating likenesses of a specific human subject using at least one of the algorithm and model trained to generate likenesses of the specific human subject according to the method of claim 1 (Liu col9, ln64-67: the social media application displays the synthesized image 414 in the GUI 408 without incorporating the synthesized image 414 in the video stream; col11, ln24-28: at 710 the computer-implemented method 700 optionally may include publishing or sharing the synthesized image of the user to a social network platform for viewing by other users of the social network platform).
Regarding Claim 10, Liu in view of Zakharov teaches the method of claim 9, and further teaches wherein generating the media product comprises generating the media product based on at least one prompt comprising a predefined prompt which has been combined with data received from the specific human subject (Liu col10, ln25-36: A first set of synthesized images 502 is generated based at least on the input image 500 and a set of word embeddings that specify a portrait of the first user as a warrior with a mountain background. A second set of synthesized images 504 is generated based at least on the input image 500 and a set of word embeddings that specify an oil painting of the first user wearing a hip-hop rap Christmas hat. A third set of synthesized images 506 is generated based at least on the input image 500 and a set of word embeddings that specify a half body portrait of the first user as a doctor wearing blue scrubs and a white coat).
Regarding Claim 11, Liu in view of Zakharov teaches the method of claim 9, and further teaches wherein generating the media product comprises generating the media product based on at least one prompt combined through the AI algorithm with images captured of the specific human subject (Zakharov col17, ln11-16: The method 500 commences at opening loop block 502 and proceeds to block 504, where input is obtained from a user, e.g., via an input text box in a user interface provided by the interaction client 104. The user wishes to obtain an AI-generated image and provides user input describing the desired image).
Regarding Claim 12, Liu in view of Zakharov teaches a computer implemented method of generating at least one likeness of a specific human subject for use in the generation of a media product (Liu Abst: systems and methods for generating a synthesized image of a user with a trained machine learning diffusion model; col4, ln22-26: The synthesized image 130 includes a character having the same or similar visual features as the user).
The metes and bounds of claim 12 substantially correspond to the limitations set forth in claims 1, 2, 9, and 10; thus claim 12 is rejected on similar grounds and rationale as its corresponding limitations.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Samantha (Yuehan) Wang whose telephone number is (571)270-5011. The examiner can normally be reached Monday-Friday, 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, King Poon, can be reached at (571)272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Samantha (YUEHAN) WANG/
Primary Examiner
Art Unit 2617