Prosecution Insights
Last updated: April 19, 2026
Application No. 18/742,835

GENERATIVE AI PET AVATAR GENERATION

Status: Non-Final OA (§103)
Filed: Jun 13, 2024
Examiner: SUN, HAI TAO
Art Unit: 2616
Tech Center: 2600 — Communications
Assignee: Snap Inc.
OA Round: 1 (Non-Final)

Grant Probability: 73% (Favorable)
Expected OA Rounds: 1-2
To Grant: 2y 7m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 73% (above average; 347 granted / 476 resolved; +10.9% vs TC avg)
Interview Lift: +26.6% (strong; allow rate among resolved cases with an interview vs. without)
Typical Timeline: 2y 7m average prosecution; 35 applications currently pending
Career History: 511 total applications across all art units
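These headline figures are simple ratios over the examiner's resolved cases; the reported 73% matches 347 grants out of 476 resolutions. The sketch below shows one plausible way to compute the allow rate and interview lift from per-case records. The metric definitions and the ResolvedCase record are illustrative assumptions, not the analytics provider's actual formulas.

```python
# Minimal sketch (assumed definitions, not this report's actual pipeline):
# allow rate = granted / resolved; interview lift = allow rate among resolved
# cases that had an examiner interview minus the rate among those that did not.
from dataclasses import dataclass
from typing import List

@dataclass
class ResolvedCase:            # hypothetical record shape
    granted: bool
    had_interview: bool

def allow_rate(cases: List[ResolvedCase]) -> float:
    return sum(c.granted for c in cases) / len(cases) if cases else 0.0

def interview_lift(cases: List[ResolvedCase]) -> float:
    with_iv = [c for c in cases if c.had_interview]
    without_iv = [c for c in cases if not c.had_interview]
    return allow_rate(with_iv) - allow_rate(without_iv)

# Sanity check against the headline numbers: 347 grants / 476 resolved cases.
print(f"career allow rate: {347 / 476:.1%}")   # ~72.9%, shown as 73%
```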

Statute-Specific Performance

§101: 6.9% (-33.1% vs TC avg)
§103: 65.8% (+25.8% vs TC avg)
§102: 2.3% (-37.7% vs TC avg)
§112: 15.9% (-24.1% vs TC avg)
Tech Center averages are estimates. Based on career data from 476 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 11-16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Xie (US 20240169622 A1) and in view of Borovikov (US 20200312003 A1).

Regarding to claim 1, Xie discloses a system (Fig. 1; [0039]: user 105 provides an image, a text prompt, and a mask to image editing apparatus 115 via user device 110; [0040]: Image editing apparatus 115 generates a composite image in response to the input and provides the composite image to user 105 via user device 110) comprising: at least one processor ([0043]: image editing apparatus 115 includes one or more processors); and at least one memory component storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations (Fig. 4; [0060]: execute computer-readable instructions stored in memory unit 410 to perform various functions; [0061]: a memory device includes random access memory (RAM), read-only memory (ROM), or a hard disk) comprising: receiving, by a computing device of a first user, an image of a pet (Fig. 1; [0039]: pixels depict a cat, i.e. a pet; Fig. 2; [0051]: the system provides an image, a mask, and a text prompt; the user provides the image, the mask, and the text prompt via a graphical user interface; Fig. 7; [0103]: the pre-processing component 720 receives image 705, mask 710, and prompt 715 as input from a user; receive a cat, i.e. a pet, image; [0114]: a user provides the image, the prompt, and the mask to the pre-processing component via a user interface); identifying a prompt corresponding to desired characteristics of a first virtual pet avatar for the first user ([0024]: a prompt, i.e. a text input, describes an object to be added to the image as input; [0035]: specify a text prompt; [0040]: the text prompt can be the text of, “hat”, “round-brimmed hat”, “banded hat”, “light-colored banded hat”, etc.; [0049]: the image includes pixels depicting a cat wearing a hat; [0051]: the image editing apparatus identifies the image, the mask, and the text prompt; Fig. 8; [0113]: the system identifies an image, and a prompt identifying an element to be added to the image; [0114]: identify the prompt); processing the image of the pet and the prompt by a first generative Artificial Intelligence (AI) model (Fig. 7; [0104]: Diffusion model 740, i.e. a generative AI, outputs composite image map 745 to decoder 750 based on the input; Fig. 8; [0127]: the system generates a composite image map using a diffusion model, i.e. a first generative Artificial Intelligence (AI) model, based on the partially noisy image map and the prompt), the first generative AI model being trained to receive images and prompts and to generate virtual pet avatars based on the received images and prompts (Fig. 8; [0127]: the system generates a composite image map using a diffusion model, i.e. a first generative Artificial Intelligence (AI) model, based on the partially noisy image map and the text prompt; Fig. 12; [0158]: the system trains a diffusion model to generate a composite image map based on an image, a mask, and a text prompt; [0166]: the training component pretrains the diffusion model based on image caption training data to perform the reverse diffusion process; Fig. 12; [0169]: the system trains the diffusion model by updating parameters of the diffusion model based on the comparison); and receiving a first virtual pet avatar from the first generative AI model ([0041]: the image editing apparatus 115 provides the composite image to the user 105 via the graphical user interface; [0050]: the image editing apparatus generates a composite image in response to the input and provides the composite image to the user; Fig. 7; [0104]: diffusion model 740 outputs composite image map 745 to decoder 750 based on the input; decoder 750 decodes composite image map 745 to output composite image 755 to the user; [0128]: the diffusion model obtains the composite image map using a reverse diffusion process).

Xie fails to explicitly disclose a real life image. In same field of endeavor, Borovikov teaches a real life image ([0036]: images depict a player's real life pet or other animal as captured by a camera; Fig. 3; [0071]: the video frame 302 is captured by a video camera and depicts a player's pet dog standing in a room). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Xie to include a real life image as taught by Borovikov. The motivation for doing so would have been to capture a player's real life pet image by a camera; to generate one or more custom behavior models and one or more custom appearance models from the input media; to capture the video frame 302 by a video camera; to provide input images and video data of the input media that depicts the real animal; to generate a list of actions, behaviors, sequences of actions, or similar behavior information observed of the real animal in the input media as taught by Borovikov in paragraphs [0036], [0068], [0071], [0076], and [0086].

Regarding to claim 2, Xie in view of Borovikov discloses the system of claim 1, wherein receiving the real life image includes receiving an image captured by a camera of the computing device of the first user (Borovikov; [0036]: images depict a player's real life pet or other animal as captured by a camera; [0064]: the images and video obtained at block 202 are originally captured by a camera of the player using a traditional two-dimensional (2D) camera; Fig. 3; [0071]: the video frame 302 is captured by a video camera and depicts a player's pet dog standing in a room). Same motivation of claim 1 is applied here.
Regarding to claim 3, Xie in view of Borovikov discloses the system of claim 1, wherein receiving the real life image includes identifying a friend of the first user, identifying an image uploaded by the friend of the first user, determining that the image includes a real life pet, determining that the first user reacted to the uploaded image, and identifying the real life pet as the real life image of the pet (Borovikov; [0036]: images depict a player's real life pet or other animal as captured by a camera; [0063]: the input media is provided by a player, such as by the player selecting to upload images or videos of the player's pet from player computing system 102; the player enables the interactive computing system 120 to access a third party media source 103, i.e. a friend; [0064]: the images and video obtained at block 202 are originally captured by a camera of the player using a traditional two-dimensional (2D) camera; Fig. 2; [0065]: obtain the animal depicted in the input media; Fig. 3; [0071]: the video frame 302 is captured by a video camera and depicts a player's pet dog standing in a room; Fig. 4; [0074]: obtain input visual media depicting a real animal; [0076]: the appearance learning system 136 provides input images and video data of the input media that depicts the real animal as input to the retrieved visual style extraction model). Same motivation of claim 1 is applied here.

Regarding to claim 11, Xie in view of Borovikov discloses the system of claim 1, wherein the first generative AI model comprises a stable diffusion model configured to receive the real image and the prompt to generate the first virtual pet avatar (Xie; [0103]: pre-processing component 720 receives image 705, mask 710, and prompt 715 as input from a user; Fig. 7; [0104]: Diffusion model 740 outputs composite image map 745 to decoder 750 based on the input; Fig. 8; [0127]: the system generates a composite image map using a diffusion model, i.e. a first generative Artificial Intelligence (AI) model, based on the partially noisy image map, i.e. real image, and the prompt). Xie in view of Borovikov further discloses real life image (Borovikov; [0036]: images depict a player's real life pet or other animal as captured by a camera; Fig. 3; [0071]: the video frame 302 may have been captured by a video camera and depicts a player's pet dog standing in a room). Same motivation of claim 1 is applied here.

Regarding to claim 12, Xie in view of Borovikov discloses the system of claim 1, wherein the first virtual pet avatar comprises a three-dimensional media content item configured to perform one or more animated actions (Borovikov; [0023]: 3D mesh or other geometry or 3D object data, textures, animation data, behavior trees; [0041]: a 3D model; 3D mesh data; [0043]: generate animation data for a virtual animal character to move in a manner that mimics or approximates specific movements performed by the real life animal depicted in input media; [0065]: generate a 3D model for a particular pig depicted in a video; [0070]: store the virtual animal's final 3D model, textures, animations). Same motivation of claim 1 is applied here.
Regarding to claim 13, Xie in view of Borovikov discloses the system of claim 12, wherein the first generative AI model is configured to generate the first virtual pet avatar to perform the one or more animated actions based on a personality trait within the prompt (Borovikov; [0043]: generate animation data for a virtual animal character to move in a manner that mimics or approximates specific movements performed by the real life animal depicted in input media; [0045]: the player indicates that a certain action just performed by a virtual dog character in a game).

Regarding to claim 14, Xie in view of Borovikov discloses the system of claim 1, further comprising removing a background of the first virtual pet avatar to generate a modified virtual pet avatar (Xie; [0021]: remove information from an image; [0090]: gradually remove the noise from noisy features 535 at the various noise levels to obtain denoised image features 545 in latent space 525; Fig. 8; [0112]: remove noise from the partially noisy image map such that the composite image map comprises the element realistically inpainted in a region of the image), and applying the modified virtual pet avatar to one or more interaction functions (Xie; Fig. 2; [0054]: the image editing apparatus provides the composite image to the user via the graphical user interface of the user device; Fig. 7; [0104]: decoder 750 decodes composite image map 745 to output composite image 755 to the user; [0181]: the untrained diffusion model predicts noise that can be removed from an intermediate image to obtain the predicted image; an original image is predicted at each stage of the training process).

Regarding to claim 15, Xie in view of Borovikov discloses the system of claim 1, further comprising applying the prompt and the real image iteratively to the first generative AI model to generate at least a first and second version of the first virtual pet avatar (Xie; [0126]: the partially noisy image data is iteratively generated at each diffusion step t; [0128]: the diffusion model iteratively removes noise from the partially noisy image map to predict a composite image map comprising the element in the first region that corresponds to the mask and image features from the image in a second region that does not correspond to the mask; [0144]: the diffusion model iteratively denoises the noisy data x.sub.T to obtain the conditional probability distribution; iteratively outputs a prediction of x.sub.t-1, such as second intermediate image 1025, until the noisy data x.sub.T is reverted to a prediction of the observed variable x.sub.0), and receiving a selection from a user of the first version of the first virtual pet avatar (Xie; [0067]: select the max from the inputs as the output; [0077]: select a large number of regions to analyze using conventional CNN techniques; Fig. 8; [0112]: selectively add noise to a region of an input for the diffusion model; [0118]: the user selects the mask precision indicator via a brush tool input; [0128]: the composite image map is the last iteration of the intermediate composite image map). Xie in view of Borovikov further discloses real life image (Borovikov; [0036]: images depict a player's real life pet or other animal as captured by a camera; Fig. 3; [0071]: the video frame 302 may have been captured by a video camera and depicts a player's pet dog standing in a room). Same motivation of claim 1 is applied here.
Regarding to claim 16, Xie in view of Borovikov discloses the system of claim 1, further comprising: modifying the prompt to create first and second versions of the prompt (Xie; [0114]: a user provides the image, the prompt, and the mask to the pre-processing component via a user interface; [0115]: the pre-processing component converts the audio to text to obtain the prompt); and applying the first and second versions of the prompt and the real image iteratively to the first generative AI model to generate at least a first and second version of the first virtual pet avatar (Xie; Fig. 2; [0054]: the system decodes the composite image map to obtain a composite image; [0126]: the partially noisy image data is iteratively generated at each diffusion step t; [0128]: the diffusion model iteratively removes noise from the partially noisy image map to predict a composite image map comprising the element in the first region that corresponds to the mask and image features from the image in a second region that does not correspond to the mask; [0144]: the diffusion model iteratively denoises the noisy data x.sub.T to obtain the conditional probability distribution; iteratively outputs a prediction of x.sub.t-1, such as second intermediate image 1025), and receiving a selection from a user of the first version of the first virtual pet avatar (Xie; Fig. 2; [0054]: the system decodes the composite image map to obtain a composite image; [0128]: the composite image map is the last iteration of the intermediate composite image map). Xie in view of Borovikov further discloses real life image (Borovikov; [0036]: images depict a player's real life pet or other animal as captured by a camera; Fig. 3; [0071]: the video frame 302 may have been captured by a video camera and depicts a player's pet dog standing in a room). Same motivation of claim 1 is applied here.

Regarding to claim 19, Xie discloses a method (Fig. 1; [0039]: user 105 provides an image, a text prompt, and a mask to image editing apparatus 115 via user device 110; [0040]: Image editing apparatus 115 generates a composite image in response to the input and provides the composite image to user 105 via user device 110) comprising: The rest claim limitations are similar to claim limitations recited in claim 1. Therefore, same rational used to reject claim 1 is also used to reject claim 19.

Regarding to claim 20, Xie discloses a non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform operations (Fig. 1; [0039]: user 105 provides an image, a text prompt, and a mask to image editing apparatus 115 via user device 110; [0040]: Image editing apparatus 115 generates a composite image in response to the input and provides the composite image to user 105 via user device 110; [0043]: image editing apparatus 115 includes one or more processors; Fig. 4; [0060]: processors execute computer-readable instructions stored in memory unit 410 to perform various functions; [0061]: a memory device includes random access memory (RAM), read-only memory (ROM), or a hard disk) comprising: The rest claim limitations are similar to claim limitations recited in claim 1. Therefore, same rational used to reject claim 1 is also used to reject claim 20.

Claims 4-6, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Xie (US 20240169622 A1) in view of Borovikov (US 20200312003 A1), and further in view of Lotti (US 12293604 B1).

Regarding to claim 4, Xie in view of Borovikov discloses the system of claim 1, further comprising generating the prompt based on the first virtual pet avatar from the first user (Xie; [0115]: the pre-processing component converts the audio to text to generate and obtain the prompt). Xie in view of Borovikov discloses the first virtual pet avatar from the first user (Borovikov; [0045]: a series of actions are performed by the virtual dog; [0066]: create a custom virtual dog; [0070]: a particular virtual dog character should carry a player's virtual ammo or virtual healing potion in a particular video game in the dog's mouth rather than on the dog's back). Xie in view of Borovikov fails to explicitly disclose generating the prompt based on one or more preferences from the first user. In same field of endeavor, Lotti teaches: generating the prompt based on one or more preferences from the first user (col. 26, lines 1-10: generate the prompt 164 using the textual identifiers 240A-N and the user preferences 222; col. 29, lines 5-15: the identified objects 270 are generated in response to information related to the textual identifiers 240A-N and the user preferences 222 included in the prompt 164). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Xie in view of Borovikov to include generating the prompt based on one or more preferences from the first user as taught by Lotti. The motivation for doing so would have been to gradually improve its ability to generate increasingly realistic and diverse data; to generate the prompt 164 using the textual identifiers 240A-N and the user preferences 222 as taught by Lotti in col. 15, lines 25-40 and col. 26, lines 1-10.

Regarding to claim 5, Xie in view of Borovikov and Lotti discloses the system of claim 4, wherein generating the prompt comprises inputting the one or more preferences for the first virtual pet avatar to a large language model (LLM) to generate the prompt (Lotti; col. 15, lines 34-46: the generative machine learning model 170 is a generative large language model; generate human-like text based on given input; col. 16, lines 1-10: generate human-like text and/or image data based on given input), the LLM trained to receive as input pet preferences and outputting prompts corresponding to the input pet preferences (Lotti; col. 15, lines 34-46: the generative machine learning model 170 is a large language model that has been pre-trained on a large corpus of data so as to process, analyze, and generate human-like text based on given input; col. 40, lines 1-15: train the machine learning model; col. 26, lines 1-10: generate the prompt 164 using the textual identifiers 240A-N and the user preferences 222; col. 29, lines 5-15: the identified objects 270 are generated in response to information related to the textual identifiers 240A-N and the user preferences 222 included in the prompt 164; col. 40, lines 40-55: processing logic provides training set T to train the machine learning model). Same motivation of claim 4 is applied here.

Regarding to claim 6, Xie in view of Borovikov and Lotti discloses the system of claim 5, wherein the LLM is trained to generate prompts configured to be received as input by the first generative AI model (Lotti; col. 15, lines 34-46: the generative machine learning model 170 is a large language model that has been pre-trained on a large corpus of data so as to process, analyze, and generate human-like text based on given input; col. 26, lines 1-10: generate the prompt 164 using the textual identifiers 240A-N and the user preferences 222; col. 29, lines 5-15: the identified objects 270 are generated in response to information related to the textual identifiers 240A-N and the user preferences 222 included in the prompt 164). Same motivation of claim 4 is applied here.

Regarding to claim 10, Xie in view of Borovikov and Lotti discloses the system of claim 4, further comprising identifying the one or more preferences for the first virtual pet avatar based on interaction data by the first user with an interaction function (Xie; [0051]: the image editing apparatus identifies the image, the mask, and the text prompt; Fig. 8; [0113]: the system identifies an image, and a prompt identifying an element to be added to the image; [0114]: identify the image, the prompt, and the mask).

Allowable Subject Matter

Claims 7-9 and 17-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hai Tao Sun whose telephone number is (571)272-5630. The examiner can normally be reached 9:00AM-6:00PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel Hajnik, can be reached at 571-272-7642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HAI TAO SUN/
Primary Examiner, Art Unit 2616
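For readers mapping the rejection's citations back to practice: the pipeline the examiner reads onto claim 1 (an image, a mask, and a text prompt fed to a diffusion model that returns a composite image) is the standard text-guided inpainting pattern, and claim 11 recites a stable diffusion model explicitly. The sketch below is purely illustrative, using the open-source diffusers library with an assumed public checkpoint, file names, and prompt; it is not the implementation of Xie, Borovikov, or the application.

```python
# Illustrative only: a generic Stable Diffusion inpainting call of the kind the
# rejection describes (pet image + mask + prompt -> composite image). The
# checkpoint, file names, and prompt are assumptions, not taken from the cited
# references or the application.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",   # assumed public checkpoint
    torch_dtype=torch.float16,
).to("cuda")

pet_image = Image.open("cat.png").convert("RGB")   # the received "image of a pet"
mask = Image.open("mask.png").convert("RGB")       # region to edit (Xie's mask input)
prompt = "a round-brimmed hat"                     # the identified text prompt

# The diffusion model denoises the masked region conditioned on the prompt and
# returns a composite image, analogous to receiving the generated avatar.
result = pipe(prompt=prompt, image=pet_image, mask_image=mask).images[0]
result.save("composite.png")
```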

Prosecution Timeline

Jun 13, 2024
Application Filed
Mar 03, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602816: SIMULATED CONFIGURATION EVALUATION APPARATUS AND METHOD (granted Apr 14, 2026; 2y 5m to grant)
Patent 12603024: DISPLAY CONTROL DEVICE (granted Apr 14, 2026; 2y 5m to grant)
Patent 12586310: APPARATUS AND METHOD WITH IMAGE PROCESSING (granted Mar 24, 2026; 2y 5m to grant)
Patent 12578846: GENERATING MASKED REGIONS OF AN IMAGE USING A PREDICTED USER INTENT (granted Mar 17, 2026; 2y 5m to grant)
Patent 12579727: APPARATUS AND METHOD FOR ASYNCHRONOUS RAY TRACING (granted Mar 17, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 73%
With Interview: 99% (+26.6%)
Median Time to Grant: 2y 7m
PTA Risk: Low
Based on 476 resolved cases by this examiner. Grant probability derived from career allow rate.
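Taking the note above at face value, the projection arithmetic appears to be: grant probability equals the career allow rate, and the with-interview figure adds the interview lift (73% + 26.6% lands at roughly 99% once capped). A minimal sketch of that assumed logic, not the provider's actual model:

```python
# Assumed projection logic (per the note that grant probability is derived from
# the career allow rate); the cap value is a guess, not a documented rule.
def project(allow_rate: float, interview_lift: float, cap: float = 0.99) -> dict:
    base = allow_rate
    with_interview = min(base + interview_lift, cap)
    return {"grant_probability": base, "with_interview": with_interview}

print(project(0.73, 0.266))   # {'grant_probability': 0.73, 'with_interview': 0.99}
```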
