Prosecution Insights
Last updated: May 29, 2026
Application No. 18/734,454

METHOD FOR VIRTUAL FITTING, ELECTRONIC DEVICE, AND STORAGE MEDIUM

Non-Final OA §103
Filed: Jun 05, 2024
Priority: Apr 01, 2024 (CN 202410392719.8)
Examiner: LETT, THOMAS J
Art Unit: 2611
Tech Center: 2600 (Communications)
Assignee: Xiao-I Plus Inc.
OA Round: 1 (Non-Final)
Grant Probability: 84% (Favorable)
OA Rounds: 1-2
Est. Remaining: 10m
With Interview: 48%

Examiner Intelligence

Career Allowance Rate: 84% (606 granted / 725 resolved), +21.6% vs TC avg; above average
Interview Lift: -35.9% (minimal), based on resolved cases with interview
Typical timeline: 2y 10m avg prosecution; 21 currently pending
Career history: 748 total applications across all art units
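The headline rates above can be cross-checked from the raw counts. This is a minimal sketch, assuming that "+21.6% vs TC avg" and the "-35.9%" interview lift are percentage-point differences from the career allowance rate; the page does not define them, so the implied TC average and with-interview rate are derived values, not figures the page reports.

```python
# Figures as displayed on this page.
GRANTED = 606
RESOLVED = 725
VS_TC_AVG_PTS = 21.6         # "+21.6% vs TC avg", read as percentage points (assumption)
INTERVIEW_LIFT_PTS = -35.9   # "Interview Lift", read as percentage points (assumption)


def allowance_rate_pct(granted: int, resolved: int) -> float:
    """Career allowance rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved


rate = allowance_rate_pct(GRANTED, RESOLVED)
implied_tc_avg = rate - VS_TC_AVG_PTS
implied_with_interview = rate + INTERVIEW_LIFT_PTS

print(f"allowance rate: {rate:.1f}%")                            # 83.6%
print(f"implied TC average: {implied_tc_avg:.1f}%")              # 62.0%
print(f"implied rate with interview: {implied_with_interview:.1f}%")  # 47.7%
```

Under that reading, 606/725 rounds to the displayed 84%, and subtracting the 35.9-point lift lands on the displayed 48% "With Interview" figure.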

Statute-Specific Performance

§101: 5.3% (-34.7% vs TC avg)
§103: 41.0% (+1.0% vs TC avg)
§102: 51.0% (+11.0% vs TC avg)
§112: 2.4% (-37.6% vs TC avg)
Tech Center averages are estimates. Based on career data from 725 resolved cases.
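Backing out the Tech Center baseline from each statute row is simple arithmetic. A minimal sketch, again assuming each "vs TC avg" delta is a percentage-point difference (the page does not say); statutes are keyed by section number only.

```python
# (statute, examiner rate %, difference vs TC avg in points) as displayed above
STATUTE_FIGURES = [
    ("101", 5.3, -34.7),
    ("103", 41.0, +1.0),
    ("102", 51.0, +11.0),
    ("112", 2.4, -37.6),
]


def implied_tc_average(rate: float, delta: float) -> float:
    """Baseline implied by 'rate = baseline + delta', rounded to the page's precision."""
    return round(rate - delta, 1)


implied = {statute: implied_tc_average(r, d) for statute, r, d in STATUTE_FIGURES}
print(implied)  # {'101': 40.0, '103': 40.0, '102': 40.0, '112': 40.0}
```

Every row implies the same 40.0% baseline, which suggests the page compares all four statutes against a single estimated Tech Center figure rather than per-statute averages.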

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 7-11, 14-17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. (TryOnDiffusion: A Tale of Two UNets) in view of Choi et al. (VITON-HD: High-Resolution Virtual Try-On) and further in view of Rombach et al. (High-Resolution Image Synthesis with Latent Diffusion Models).

Regarding claim 1, Zhu et al. (TryOn) discloses a method for virtual fitting (apparel try-on results with a significant body shape and pose modification, figure 1), comprising: obtaining a first person image and a garment image (inputs: for both person and garment images using off-the-shelf methods [11, 28]. For garment image, we further segment out the garment Ic using the parsing map. For person image, we generate clothing-agnostic RGB image Ia which removes the original clothing but retains the person identity, section 3, see top of Figure 2); inputting the second person image and the garment image into a virtual fitting model obtained by pre-training to obtain a virtual fitting image (the model takes as input 256×256 try-on result from previous Parallel-UNet model and synthesizes the final try on result Itr at 1024×1024 resolution, section 3.1. The model takes as input the agnostic person image and target garment, section 3); wherein the virtual fitting model is a dual U-Net structure (achieve it via two UNets that handle the garment and the person respectively, section 3.2, page 5) which comprises an image encoder, two U-Nets, and an image decoder, and the two U-Nets are respectively used as a garment characterization network and a latent diffusion network (the model takes as input 256×256 try-on result from previous Parallel-UNet model and synthesizes the final try on result Itr at 1024×1024 resolution, section 3.1).

Zhu et al. does not expressly disclose performing a masking process of garment information on the first person image to obtain a second person image. Choi et al. teaches generating the clothing-agnostic image Ia and the clothing-agnostic segmentation map Sa, which allow the model to remove the original clothing information thoroughly, and preserve the rest of the image, section 3.1, page 14134. Zhu et al. in view of Choi et al. are analogous art because they are from the similar problem solving area of virtual fitting. At the time of the invention, it would have been obvious to a person of ordinary skill in the art to add the masking feature of Choi et al. to the method of Zhu et al. in order to obtain a masking process of garment information. The motivation for doing so would be to differentiate image data.

Zhu et al. does not expressly disclose wherein the two U-Nets have a same network structure that comprises one or more down-sampling layers, one or more intermediate layers, and one or more up-sampling layers and an image encoder, two U-Nets, and an image decoder. Rombach et al. teaches advanced sampling, undersampling, and downsampling blocks, section 3.2 and teaches an autoencoding model which learns a space that is perceptually equivalent to the image space, but offers significantly reduced computational complexity, a decoder D reconstructs the image from the latent and a U-Net performing denoising, section 3. Zhu et al. in view of Rombach et al. are analogous art because they are from the similar problem solving area of image synthesis. At the time of the invention, it would have been obvious to a person of ordinary skill in the art to add the one or more down-sampling layers, one or more intermediate layers, and one or more up-sampling layers and an image encoder, two U-Nets, and an image decoder of Rombach et al. to the method of Zhu et al. in order to obtain an image formation process. The motivation for doing so would be to achieve state-of-the-art synthesis results on image data.

Regarding claim 2, Zhu et al. (TryOn) discloses the method according to claim 1, wherein the inputting the second person image and the garment image into a virtual fitting model obtained by pre-training to obtain a virtual fitting image comprises: inputting the garment image into the image encoder to obtain a garment latent feature (Examiner articulates that Virtual Try-on with Diffusion Model employ a network module (e.g., CLIP image encoder or ReferenceNet) to extract garment features, which are injected into the process of diffusion denoising to preserve the identity and details of the garment), and taking the garment latent feature as an input of the garment characterization network (similarity between the target person and the source garment, providing a learnable way to represent correspondence for the try-on task, section 3.2); recording a feature of the up-sampling layers, the intermediate layers, and the down-sampling layers when performing a spatial self-attention operation (garment is warped implicitly via a cross attention mechanism, page 1; pose embeddings are then fused to the person-UNet through the attention mechanism, which is implemented by concatenating pose embeddings to the key-value pairs of each self attention layer, section 3.2); inputting the second person image into the image encoder to obtain a person latent feature and mask region information, and taking the person latent feature, the mask region information and a random noise obeying Gaussian distribution as an input of the latent diffusion network (Choi et al.: generating the clothing-agnostic image Ia and the clothing-agnostic segmentation map Sa, which allow the model to remove the original clothing information thoroughly, and preserve the rest of the image, section 3.1, page 14134); respectively concatenating the feature, recorded by the garment characterization network, of the up-sampling layers, the intermediate layers, and the down-sampling layers when performing the spatial self-attention operation with a feature of the up-sampling layers, the intermediate layers, and the down-sampling layers at a corresponding position of the latent diffusion network when performing the spatial self-attention operation in a process of performing iterative denoising, to obtain a concatenated feature, and taking the concatenated feature as a feature of the latent diffusion network at the corresponding position (person-UNet takes the clothing-agnostic RGB Ia and the noisy image zt as input. Since Ia and zt are pixel-wise aligned, we directly concatenate them along the channel dimension at the beginning of UNet processing, section 3.2); and inputting a feature output by the latent diffusion network into the image decoder to output the virtual fitting image (Output from 256×256 Parallel-UNet is sent to standard super resolution diffusion to create the 1024×1024 image, figure 2).

Regarding claim 3, Rombach et al. discloses the method according to claim 1, wherein a training process of the virtual fitting model comprises: adding a random noise to a training sample in a diffusion step (applies JPEG compressions noise, camera sensor noise, different image interpolations for downsampling, Gaussian blur kernels and Gaussian noise in a random order to an image, D.6.1, page 23) based on Markov chain (learning the reverse process of a fixed Markov Chain of length T), recovering a clean sample from a noise sample in a reverse process (gradually denoising a normally distributed variable, section 3.2), calculating a loss between a real noise and an estimated noise (ability to build the underlying UNet primarily from 2D convolutional layers, and further focusing the objective on the perceptually most relevant bits using the reweighted bound, section 3.2, Eq. 2), back propagating and updating a model parameter of the latent diffusion network until convergence, saving the model parameter and taking the model parameter as a model parameter of the garment characterization network.
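As an illustration of the claim 3 training steps quoted above (not text from the Office action), here is a minimal NumPy sketch of standard DDPM-style noise prediction. The linear beta schedule, the toy latent shape, and the helper names are assumptions; back-propagation is omitted because it depends on the network and framework. The last helper reflects the claim's final step, copying the trained denoiser weights into the garment characterization network, which only works key-for-key because claim 1 recites the same network structure for both U-Nets.

```python
import numpy as np


def alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0: np.ndarray, t: int, abar: np.ndarray, rng: np.random.Generator):
    """Sample x_t ~ q(x_t | x_0), the closed form of t steps of the Gaussian Markov chain."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps
    return x_t, eps


def noise_loss(eps_true: np.ndarray, eps_pred: np.ndarray) -> float:
    """Mean squared error between the real noise and the estimated noise."""
    return float(np.mean((eps_true - eps_pred) ** 2))


def init_garment_net(denoiser_params: dict) -> dict:
    """Copy trained weights key-for-key into a second network with the same structure."""
    return {name: w.copy() for name, w in denoiser_params.items()}


# Usage with a stand-in predictor (a real system would call the latent diffusion U-Net here).
rng = np.random.default_rng(0)
abar = alpha_bars(1000)
x0 = rng.standard_normal((4, 8, 8))            # a toy "latent" (hypothetical shape)
x_t, eps = add_noise(x0, t=500, abar=abar, rng=rng)
loss = noise_loss(eps, np.zeros_like(eps))     # an untrained predictor that always guesses zero
```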
Regarding claim 4, Choi et al. (ViTOn) discloses the method according to claim 1, wherein the performing a masking process of garment information on the first person image to obtain a second person image comprises: inputting the first person image into a pre-trained deep learning image semantic segmentation neural network model (the image I by utilizing the pre-trained networks [7, 3], where L is a set of integers indicating the semantic labels, section 3.1) for semantic segmentation to obtain a semantic segmented person image (a clothing-agnostic image Ia and a clothing-agnostic segmentation map Sa as inputs of each stage, which truly eliminate the shape of clothing item and preserve the body parts that need to be reproduced, section 3.1), wherein the semantic segmented person image at least comprises an image divided into a human body information region and a garment information region (remove the clothing region to be replaced and preserve the rest of the image, section 3.1); and performing the mask processing on the garment information region in the semantic segmented person image to obtain the second person image (remove clothing, figure 3).

Regarding claim 7, Zhu et al. (TryOn) discloses a method for virtual fitting, comprising: obtaining virtual fitting images by using the method for virtual fitting according to claim 1, wherein the first person image comprises a user image, there are at least two garment images, and the virtual fitting images respectively correspond to the garment images (randomly selected 2804 input pairs out of the 6K test set, ran all four methods on those pairs, section 4); and selecting at least one target virtual fitting image from at least two virtual fitting images for display or recommendation (randomly selected 2804 input pairs out of the 6K test set, ran all four methods on those pairs, and presented to raters. 15 non-expert raters (on crowdsource platform) have been asked to select the best result out of four, section 4).
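The masking step of claim 4 (segment the person, then mask the garment region) can be illustrated with a minimal NumPy sketch; this is not text from the Office action. It assumes a segmentation map of integer labels has already been produced by some pre-trained network, and the label IDs, gray fill value, and function name are hypothetical. It is also a simplification: the clothing-agnostic image described by Choi et al. involves more than blanking garment pixels.

```python
import numpy as np

# Hypothetical label IDs; the claim only requires that the segmented image
# distinguish a human body information region from a garment information region.
BACKGROUND, BODY, UPPER_GARMENT, LOWER_GARMENT = 0, 1, 2, 3


def mask_garment_region(person: np.ndarray, seg: np.ndarray,
                        garment_labels=(UPPER_GARMENT,), fill: float = 0.5) -> np.ndarray:
    """Return a copy of `person` (H x W x C floats) with garment pixels replaced by `fill`."""
    if person.shape[:2] != seg.shape:
        raise ValueError("segmentation map must match the image height and width")
    garment_mask = np.isin(seg, garment_labels)   # H x W boolean mask of garment pixels
    masked = person.copy()
    masked[garment_mask] = fill                   # the scalar broadcasts over all channels
    return masked


# Usage: a toy 4x4 RGB "first person image" and its segmentation map.
rng = np.random.default_rng(0)
first_person = rng.random((4, 4, 3))
seg = np.array([[0, 1, 1, 0],
                [1, 2, 2, 1],
                [1, 2, 2, 1],
                [0, 3, 3, 0]])
second_person = mask_garment_region(first_person, seg)
```

Passing `garment_labels=(UPPER_GARMENT, LOWER_GARMENT)` would mask both garment classes; which regions count as "garment information" is a design choice the claim leaves open.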
Claim 8, an electronic device, is rejected for the same reason as claim 1. Claim 9, an electronic device, is rejected for the same reason as claim 2. Claim 10, an electronic device, is rejected for the same reason as claim 3. Claim 11, an electronic device, is rejected for the same reason as claim 4. Claim 14, an electronic device, is rejected for the same reason as claim 7. Claim 15, a storage medium claim, is rejected for the same reason as claim 1. Claim 16, a storage medium claim, is rejected for the same reason as claim 2. Claim 17, a storage medium claim, is rejected for the same reason as claim 3. Claim 20, a storage medium claim, is rejected for the same reason as claim 1.

Allowable Subject Matter

Claims 5, 6, 12, 13, 18 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to THOMAS J LETT whose telephone number is (571)272-7464. The examiner can normally be reached Mon-Fri 9-6 ET.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tammy Goddard can be reached at (571) 272-7773. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/THOMAS J LETT/
Primary Examiner, Art Unit 2611
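The feature concatenation recited in claim 2 (garment-network self-attention features joined with the latent diffusion network's features at the corresponding position) can be illustrated with a minimal single-head NumPy sketch; this is not text from the Office action. The claim does not say along which axis the features are concatenated or how the result is reduced back to the person feature size. This sketch assumes one common reading, concatenation along the token (spatial) axis with shared projection weights, keeping only the person positions of the output. All names, sizes, and weights are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over tokens x of shape (n, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def concat_self_attention(person_feat, garment_feat, w_q, w_k, w_v):
    """Self-attention over [person; garment] tokens, keeping only the person positions."""
    n = person_feat.shape[0]
    joint = np.concatenate([person_feat, garment_feat], axis=0)   # (n + m, d)
    return self_attention(joint, w_q, w_k, w_v)[:n]


rng = np.random.default_rng(0)
d = 16
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
person = rng.standard_normal((64, d))    # e.g. an 8x8 feature map flattened to 64 tokens
garment = rng.standard_normal((64, d))   # the feature recorded by the garment network
out = concat_self_attention(person, garment, w_q, w_k, w_v)
```

Because person tokens can attend to garment tokens, garment detail flows into the denoising branch without explicit warping; with an empty garment feature the function reduces to ordinary self-attention.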

Prosecution Timeline

Jun 05, 2024: Application Filed
Apr 13, 2026: Non-Final Rejection mailed, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12633014
GENERATING IMAGE METHOD AND APPARATUS, DEVICE, AND MEDIUM
2y 5m to grant; granted May 19, 2026
Patent 12627947
APPARATUSES, COMPUTER-IMPLEMENTED METHODS, AND COMPUTER PROGRAM PRODUCTS FOR IMPROVED DATA TRANSMISSION AND TRACKING
2y 8m to grant; granted May 12, 2026
Patent 12620181
DETERMINING AN ASSIGNMENT OF VIRTUAL OBJECTS TO POSITIONS IN A USER FIELD OF VIEW TO RENDER IN A MIXED REALITY DISPLAY
3y 7m to grant; granted May 05, 2026
Patent 12619774
CONTROLLED EXPOSURE TO LOCATION-BASED VIRTUAL CONTENT
2y 5m to grant; granted May 05, 2026
Patent 12602714
LIGHTING AND INTERNET OF THINGS DESIGN USING AUGMENTED REALITY
2y 5m to grant; granted Apr 14, 2026
Based on the 5 most recent grants. Study what changed to get past this examiner.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 84%
With Interview: 48% (-35.9%)
Median Time to Grant: 2y 10m (~10m remaining)
PTA Risk: Low
Based on 725 resolved cases by this examiner. Grant probability derived from career allowance rate.
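The "~10m remaining" figure is consistent with simple date arithmetic. A minimal sketch, assuming the 2y 10m time to grant is counted from the Jun 05, 2024 filing date and "remaining" is measured from the page's May 29, 2026 update; both are assumptions, since the page does not state its reference points.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day to the target month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


filed = date(2024, 6, 5)
last_updated = date(2026, 5, 29)
projected = add_months(filed, 2 * 12 + 10)            # "2y 10m" counted from filing (assumption)
remaining_days = (projected - last_updated).days
remaining_months = remaining_days / (365.25 / 12)     # average month length in days
print(projected, remaining_days, round(remaining_months, 1))  # 2027-04-05 311 10.2
```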
