Prosecution Insights
Last updated: April 19, 2026
Application No. 18/081,638

SYSTEMS AND METHODS FOR CUSTOMIZING IMAGES BASED ON USER PREFERENCES

Status: Non-Final OA (§103)
Filed: Dec 14, 2022
Examiner: WANG, YI
Art Unit: 2619
Tech Center: 2600 — Communications
Assignee: Sony Interactive Entertainment Inc.
OA Round: 5 (Non-Final)
Grant Probability: 76% (Favorable)
Expected OA Rounds: 5-6
Projected Time to Grant: 2y 7m
Grant Probability With Interview: 91%

Examiner Intelligence

Career Allow Rate: 76% (368 granted / 481 resolved; +14.5% vs TC avg, above average)
Interview Lift: +14.7% among resolved cases with an interview (a moderate, roughly +15% lift)
Avg Prosecution: 2y 7m typical timeline; 24 applications currently pending
Total Applications: 505 across all art units

Statute-Specific Performance

§101: 5.3% (-34.7% vs TC avg)
§103: 64.1% (+24.1% vs TC avg)
§102: 10.3% (-29.7% vs TC avg)
§112: 11.7% (-28.3% vs TC avg)

Baseline is the Tech Center average estimate. Based on career data from 481 resolved cases.

Office Action (§103)
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

This is in response to applicant's amendment/response filed on 01/26/2026, which has been entered and made of record. Claims 1-4 and 6-20 have been amended. No claim has been cancelled. No claim has been added. Claims 1-20 are pending in the application.

The objections to claims 1, 8, and 15 are withdrawn in view of the amendments to these claims. The objections to claims 6, 13, and 20 are withdrawn in view of the amendments to these claims. The rejections of claims 1-20 are withdrawn in view of the amendments to the independent claims 1, 8, and 15.

Response to Arguments

Applicant's arguments (Remarks, pp. 10-14) with respect to the independent claims 1, 8, and 15 and the dependent claims have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Applicant's arguments directed to the amended limitations have been addressed in the detailed rejection below with the new reference to Min et al. in view of Zhang et al. The arguments regarding the dependent claims, by virtue of their dependency, are moot because the independent claims are not allowable.

Drawings

The drawings are objected to under 37 CFR 1.83(a) because they fail to show "a textual description classifier 553" as described in the specification. Any structural detail that is essential for a proper understanding of the disclosed invention should be shown in the drawing. MPEP § 608.02(d). Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as "amended." If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either "Replacement Sheet" or "New Sheet" pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

The drawings are further objected to as failing to comply with 37 CFR 1.84(p)(4) because reference character "542" has been used to designate both "Input Data" and "Encoder". Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either "Replacement Sheet" or "New Sheet" pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification

The disclosure is objected to because of the following informalities: the specification (in paragraph [0065]) recites "encoder 542" and "input data 542". Appropriate correction is required.

The disclosure is also objected to because of the following informalities: the specification (in paragraph [0066]) recites "encoder 552" and "substantive image data identifier 552". Appropriate correction is required.

Claim Objections

Claims 2, 9, and 16 are objected to because of the following informalities: there is a typo in the claim limitation "wherein he plurality of training textual descriptions". Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-2, 8-9, and 15-16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Min et al. (US 20240087179) in view of Zhang et al. (US 20230419571 A1).
Regarding Claim 8, Min discloses A server system (Abstract, reciting "Methods and systems"), comprising: one or more processors; and one or more memories including instructions executable by the one or more processors to cause the one or more processors to: (Fig. 6, processor 610; ¶5 reciting "A system for training a model includes a hardware processor and a memory that stores a computer program. When executed by the hardware processor, the computer program causes the hardware processor to train an encoder");

receive, from a user device, an initial image and a textual description specifying desired content for a target image (¶18 reciting "the image generation system 100 accepts as input an image and a text condition. The condition may include a natural language expression that identifies a particular characteristic or activity that output video 108 should match."; ¶47 disclosing a user input device, and reciting "the peripheral devices 660 may include a display, touch screen, graphics circuitry, keyboard, mouse, speaker system, microphone, network interface, and/or other input/output devices, interface devices, and/or peripheral devices.");

encode the plurality of textual descriptions to generate an embedding vector representing the textual description in a latent space (¶19 reciting "The text condition is processed by a language encoder, which generates a representation of the input text condition in a respective second latent space.");

receive noisy image data by: encoding the plurality of initial images to a first latent space (¶19 reciting "The input image is processed by an image encoder 102, generating a representation of the input image in a first latent space."); and adding a random noise to the encoded initial image (¶23 disclosing adding Gaussian noise to the sample data.);

input the embedding vector and the noisy image data to an artificial intelligence (AI) model (¶19 reciting "These representations are used as inputs to a latent flow diffusion model (LFDM) 106. The LFDM 106 in turn generates a series of images that make up an output 108 which satisfies the input text condition."); and

generate, by using the AI model, a denoisy image corresponding to the target image based on the embedding vector, wherein the AI model outputs the target image by executing a denoising process that includes selecting image features in the noisy image data that correspond to semantic features in the embedding vector, removing noise from the selected image features to create denoised image features, and combining the denoised image features into the target image (¶36-37 disclosing outputting a denoisy image based on the embedding vector e, and ¶37 reciting "to synthesize each new frame {circumflex over (x)} in an output video").

However, Min does not explicitly disclose a plurality of initial images and a plurality of textual descriptions, or that the first latent space and the second latent space are the same latent space.

Zhang teaches "the search-based editing system receives a multi-modal search input that includes multiple visual (e.g., sketch, brush, or image) and/or textual components that provide semantic and layout information to consider when conducting the image search. The search-based editing system further utilizes a multi-modal embedding neural network to generate an input embedding that represents the semantic and layout information from the multi-modal search input." (¶34). More specifically, ¶139 recites "a common embedding space includes an embedding space for input embeddings that correspond to search input (e.g., queries) of different modals. For instance, as will be discussed below, the search-based editing system 106 generates text embeddings for text queries and image embeddings for image queries within a text-image embedding space in some cases." Further, ¶198 recites "receiving the search input for conducting the image search comprises receiving a plurality of search inputs; and generating the input embedding for the search input comprises generating a plurality of input embeddings for the plurality of search inputs within a common embedding space."

It would have been obvious to one with ordinary skill, before the effective filing date of the claimed invention, to modify the system (taught by Min) to receive a plurality of image inputs and textual inputs and to generate input embeddings within a common embedding space (taught by Zhang). The suggestions/motivations would have been "image editing using flexible and accurate image search results" (¶27), and to apply a known technique to a known device (method, or product) ready for improvement to yield predictable results.

Regarding Claim 9, Min in view of Zhang discloses the server system of claim 8, further comprising training the AI model based on a plurality of training images and a plurality of training textual descriptions (Min, ¶38 reciting "Referring now to FIG. 4, a training method is shown"; and ¶39 reciting "a conditional latent diffusion probabilistic model is learned over the concatenated continuous latent embeddings of video frames, conditioned on the input text and optionally the latent embedding of the given image. The latent flow diffusion model may be trained on datasets with paired text-video labels."), wherein the plurality of training images are different from the plurality of initial images, wherein he plurality of training textual descriptions are different from the plurality of textual descriptions (Min, ¶40 disclosing generating a video using an input text and an input image.), and wherein said training the AI model includes determining a similarity between each of the plurality of textual descriptions and a respective one of the plurality of initial images (Zhang, ¶170 reciting "the search-based editing system 106 retrieves digital images having an embedding that satisfies a threshold proximity (e.g., a threshold cosine distance). Thus, the search-based editing system 106 can provide search results in response to the multi-modal search input 1220." The suggestions/motivations would have been the same as those for the Claim 8 rejection.)

Claim 1 has similar limitations as Claim 8, and is therefore rejected under the same rationale as Claim 8. Claim 15 has similar limitations as Claim 8, and is therefore rejected under the same rationale as Claim 8. Claim 2 has similar limitations as Claim 9, and is therefore rejected under the same rationale as Claim 9. Claim 16 has similar limitations as Claim 9, and is therefore rejected under the same rationale as Claim 9.

Claim(s) 4-5, 11-12, and 18-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Min in view of Zhang, and further in view of Melina et al. (US 20190205946 A1).

Regarding Claim 11, Min in view of Zhang discloses the server system of claim 8.
However, Min in view of Zhang does not explicitly disclose wherein the instructions further cause the one or more processors to: access a user account to identify a characteristic of a user and a profile of the user, wherein the profile of the user includes an age of the user and the characteristic includes a geographic location of the user, a plurality of game titles played by the user via the user account, a comment made by the user, and a preference of the user, wherein the profile is stored within the user account; and generate the denoisy image corresponding to the target image based on the characteristic and the profile.

Melina teaches "a generative process for a tree may take content potentially created via a conditional process to specify season and/or location by region. Location content associated with user-profile content may likewise be used by a generative process to create a tree image intended to appear as local to where a user lives. As another example, a generative process may create an image of a smiling face intended to be appealing to users with profiles including particularly specified aspects. Thus, in some embodiments, a relationship may emerge between facial features (e.g., gender, ethnicity, age, geometric structure, etc.) appealing to various users, based at least in part on user-interaction content." (¶60). More specifically, ¶14 recites "content related to a user, such as identification content and/or demographic content, for example, user id, age, gender, ethnicity, geographical location, education level, income bracket, profession, marital status, social networks, etc., referred to herein as 'user-profile content.' User-profile content may be provided by users explicitly, such as a component of registering for an online product and/or service. However, likewise, user-profile content may be collected in other ways, in other embodiments, such as, for example, by use of cookies. The foregoing, again, is meant to be illustrative and thus non-limiting." Further, ¶52 recites "a generative process, such as 324, may be conditioned on user-profile content from user-profile content database 302, to create one or more media objects". Furthermore, ¶56 recites "For example, user identification content and/or demographic content may be pushed or pulled, e.g., user id, age, gender, ethnicity, geographical location, education level, income bracket, profession, marital status, social networks, etc., for use in processing."

It would have been obvious to one with ordinary skill, before the effective filing date of the claimed invention, to modify the system (taught by Min in view of Zhang) to access a user profile and characteristic to generate an image using a generative model, wherein the profile of the user includes an age of the user and the characteristic includes a geographic location of the user, a plurality of game titles played by the user via the user account, a comment made by the user, and a preference of the user, wherein the profile is stored within the user account (taught by Melina). The suggestions/motivations would have been that "created media objects may be made more appealing to users" (¶27), and to apply a known technique to a known device (method, or product) ready for improvement to yield predictable results.

Regarding Claim 12, Min in view of Zhang and Melina discloses the server system of claim 8, wherein the instructions further cause the one or more processors to receive a geographic location within a predetermined time period from receiving the textual description (Melina, ¶56 reciting "For example, user identification content and/or demographic content may be pushed or pulled, e.g., user id, age, gender, ethnicity, geographical location, education level, income bracket, profession, marital status, social networks, etc., for use in processing. Additionally, other ad contextual content may be employed similarly, e.g., time of day, season of year, location of a user, topic of an article, the like, and/or combinations thereof. Of course, these are meant as non-limiting illustrations." The suggestions/motivations would have been the same as those for the Claim 11 rejection.)

Claim 4 has similar limitations as Claim 11, and is therefore rejected under the same rationale as Claim 11. Claim 5 has similar limitations as Claim 12, and is therefore rejected under the same rationale as Claim 12. Claim 18 has similar limitations as Claim 11, and is therefore rejected under the same rationale as Claim 11. Claim 19 has similar limitations as Claim 12, and is therefore rejected under the same rationale as Claim 12.

Claim(s) 7, 14, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Min in view of Zhang, further in view of Zhang et al. (hereinafter referred to as Zhang-2) (US 20230230198 A1), and further in view of Hirt (US 20220408070 A1).

Regarding Claim 14, Min in view of Zhang discloses the server system of claim 8. However, Min in view of Zhang does not explicitly disclose that the instructions further cause the one or more processors to: condition the denoisy image corresponding to the target image to confirm that the target image satisfies a plurality of constraints, thereby generating a conditional image; and provide the conditioned image.

Zhang-2 teaches "systems, non-transitory computer-readable media, and methods that implement a deep learning framework for interactive, multi-round image generation utilizing natural-language feedback." (¶6). ¶50 teaches outputting a conditioned image, and recites "At an act 206, the interactive image generation system 106 receives an additional natural language command indicating a targeted image modification to the digital image generated at the act 204.
For example, the interactive image generation system 106 receives the additional natural language command of "he should have long hair" to indicate the targeted image modification is longer hair length of the portrayed subject." Further, ¶51 recites "the interactive image generation system 106 leverages additional textual features extracted from the additional natural language command to condition the generative neural network." In addition, ¶42 recites "the interactive image generation client system 110 presents or displays information to a user associated with the client device 108, including generated digital images (and modified digital images) as provided in this disclosure."

It would have been obvious to one with ordinary skill, before the effective filing date of the claimed invention, to modify the system (taught by Min in view of Zhang) to generate and provide a conditional image (taught by Zhang-2). The suggestions/motivations would have been to solve the problems mentioned in ¶2-¶5, and to apply a known technique to a known device (method, or product) ready for improvement to yield predictable results.

However, Min in view of Zhang and Zhang-2 does not explicitly disclose outputting a conditioned image by upscaling the image. Hirt teaches upscaling an image, and recites "When the input image data from rendering unit 10 is provided in lower resolution than the desired output resolution, viewpoint reprojection unit 20 may perform scaling (e.g., upscaling or downscaling) such that the synthesized image data is at the desired output resolution" (¶46). Further, ¶65 recites "Synthesizing the first plurality of viewpoints may include scaling the input image data to the desired output resolution and interpolating or extrapolating the image data to match the positions and directions of the output viewpoints (which may be declared in a task definition, as described above)."

It would have been obvious to one with ordinary skill, before the effective filing date of the claimed invention, to modify the system (taught by Min in view of Zhang and Zhang-2) to upscale and interpolate or extrapolate the image data (taught by Hirt). The suggestions/motivations would have been the desired resolution (¶65), and to apply a known technique to a known device (method, or product) ready for improvement to yield predictable results.

Claim 7 has similar limitations as Claim 14, and is therefore rejected under the same rationale as Claim 14. Claim 20 has similar limitations as Claim 14, and is therefore rejected under the same rationale as Claim 14.

Allowable Subject Matter

Claims 3, 6, 10, 13, and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim 10 is distinguished from the closest known prior art, alone or in reasonable combination, in consideration of the claim as a whole, particularly the limitations similar to "to determine a similarity between portions of the embedding vector and each of the plurality of initial images" in combination with the remaining aspects of the claim and any intervening claims. Claims 3 and 17 are each similar in scope to Claim 10, and therefore also contain allowable subject matter.

Claim 13 is distinguished from the closest known prior art, alone or in reasonable combination, in consideration of the claim as a whole, particularly the limitations similar to "combining the denoised image features into the target image comprises: identifying, based on the embedding vector, a plurality of sub-portions of each of the plurality of initial images as representing substantive image data; and stitching the plurality of sub-portions together to form the target image" in combination with the remaining aspects of the claim and any intervening claims. Claim 6 is similar in scope to Claim 13, and therefore also contains allowable subject matter.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YI WANG, whose telephone number is (571) 272-6022. The examiner can normally be reached 9am-5pm. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Jason Chan, can be reached at (571) 272-3022. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/YI WANG/
Primary Examiner, Art Unit 2619
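The claim 8 limitations mapped in the rejection describe a text-conditioned latent-diffusion pipeline: encode the textual description into an embedding vector, encode the initial image into a latent, add random noise, then denoise conditioned on the embedding. The sketch below is purely illustrative and comes from neither the application nor the cited references; the stand-in encoders, the shared 64-dimensional latent space, and the one-step denoiser are all hypothetical simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_text(description: str, dim: int = 64) -> np.ndarray:
    """Stand-in text encoder: hashes tokens into a fixed-size embedding
    vector (a real system would use a learned language encoder)."""
    vec = np.zeros(dim)
    for token in description.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)

def encode_image(image: np.ndarray, dim: int = 64) -> np.ndarray:
    """Stand-in image encoder: random projection of pixels into the
    same latent space as the text embedding (a 'common embedding space')."""
    proj = rng.standard_normal((dim, image.size))
    z = proj @ image.ravel()
    return z / (np.linalg.norm(z) + 1e-8)

def add_noise(latent: np.ndarray, sigma: float = 0.5) -> np.ndarray:
    """Forward diffusion step: add Gaussian noise to the encoded image."""
    return latent + sigma * rng.standard_normal(latent.shape)

def denoise(noisy: np.ndarray, text_emb: np.ndarray, step: float = 0.5) -> np.ndarray:
    """Toy conditioned 'denoiser': nudges the noisy latent toward the
    text embedding (a real model would be a trained diffusion network)."""
    return noisy - step * (noisy - text_emb)

# Pipeline: initial image + textual description -> target latent
initial_image = rng.random((8, 8))
text_emb = encode_text("a snowy mountain at sunset")
noisy_latent = add_noise(encode_image(initial_image))
target_latent = denoise(noisy_latent, text_emb)

# Similarity check in the shared space (cf. claim 9's training step)
cos = float(text_emb @ target_latent /
            (np.linalg.norm(text_emb) * np.linalg.norm(target_latent)))
print(round(cos, 3))
```

Here cosine similarity in the shared latent space stands in for the "similarity between each of the plurality of textual descriptions and a respective one of the plurality of initial images" recited for claim 9, echoing the "threshold cosine distance" language the examiner quotes from Zhang ¶170.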

Prosecution Timeline

Dec 14, 2022: Application Filed
Sep 21, 2024: Non-Final Rejection — §103
Dec 02, 2024: Response Filed
Feb 17, 2025: Final Rejection — §103
May 02, 2025: Request for Continued Examination
May 07, 2025: Response after Non-Final Action
May 24, 2025: Non-Final Rejection — §103
Jul 16, 2025: Interview Requested
Jul 23, 2025: Applicant Interview (Telephonic)
Jul 23, 2025: Examiner Interview Summary
Jul 29, 2025: Response Filed
Oct 29, 2025: Final Rejection — §103
Jan 26, 2026: Applicant Interview (Telephonic)
Jan 26, 2026: Examiner Interview Summary
Jan 26, 2026: Request for Continued Examination
Jan 30, 2026: Response after Non-Final Action
Mar 03, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579758: DEVICES, METHODS, AND GRAPHICAL USER INTERFACES FOR INTERACTING WITH VIRTUAL OBJECTS USING HAND GESTURES (granted Mar 17, 2026; 2y 5m to grant)
Patent 12579752: SYSTEM AND METHOD FOR CREATING AND FURNISHING DIGITAL MODELS OF INDOOR SPACES (granted Mar 17, 2026; 2y 5m to grant)
Patent 12579708: CHARACTER DISPLAY METHOD AND APPARATUS, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM (granted Mar 17, 2026; 2y 5m to grant)
Patent 12573009: IMAGE PROCESSING METHOD, IMAGE GENERATING METHOD, APPARATUS, DEVICE, AND MEDIUM (granted Mar 10, 2026; 2y 5m to grant)
Patent 12562084: AUGMENTED REALITY WINDOW (granted Feb 24, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 76% (91% with interview, +14.7%)
Median Time to Grant: 2y 7m
PTA Risk: High

Based on 481 resolved cases by this examiner. Grant probability derived from career allow rate.
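The headline probabilities can be reconciled with the raw counts reported above, assuming the tool uses a simple career allow rate plus an additive interview lift. That formula is an assumption about the dashboard's methodology, not something it documents.

```python
# Career allow rate from the raw counts reported above (368 of 481)
granted, resolved = 368, 481
allow_rate = granted / resolved              # ~0.765, displayed as 76%

# Assumed methodology: with-interview probability = base rate + lift
interview_lift = 0.147                       # the reported +14.7% lift
with_interview = allow_rate + interview_lift # ~0.912, displayed as 91%

print(f"allow rate: {allow_rate:.1%}, with interview: {with_interview:.1%}")
# prints: allow rate: 76.5%, with interview: 91.2%
```

Under this assumption the displayed 76% and 91% figures are consistent with each other to within rounding.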
