DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claims 1-5 and 7-10 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-5 and 7-10 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 1 recites “generate, on a basis of the input image and the user input signal, a control signal partially adjusting an output texture in an output image to be inferred using a learned inference model”. This is being interpreted as requiring that the control signal itself performs the partial adjustment to the output texture. However, the specification teaches that texture adjustment is performed by a learned inference model, which uses the control signal (e.g., texture axis values) to adjust texture in partial regions of an output image (see paragraphs 87 and 130-137). The control signal corresponds merely to input texture axis values, and no support is provided for the control signal itself performing texture adjustment. Thus, the claim is rejected on the ground of a lack of written description under 35 U.S.C. 112(a).
Dependent claims 2-5 and 7-8 are rejected as being dependent on a rejected base claim.
Claims 9 and 10 recite elements analogous to those of claim 1. Thus, claims 9 and 10 are similarly rejected under 35 U.S.C. 112(a).
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-5 and 7-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites “generating, on a basis of the input image and the user input signal, a control signal partially adjusting an output texture in an output image…”, which is indefinite. It is unclear what the phrase “partially adjusting” is intended to mean. For example, it is not clear whether this refers to a partial adjustment to texture characteristics (e.g., quality or degree) or an adjustment applied only to partial regions of the output image. Further, it is unclear how the control signal performs the partial adjustments (see the 35 U.S.C. 112(a) analysis above). One of ordinary skill in the art could not ascertain the scope of the claim; thus, the claim is indefinite. For examination purposes, the limitation will be interpreted as “generating, on a basis of the input image and the user input signal, a control signal which is used to partially adjust an output texture in an output image…”.
Claim 2 recites “input the input image to another inference model, and infer a texture label expressing the specific texture of each region formed in the output image…”, which is indefinite. It is unclear whether the limitation “the specific texture” is meant to refer to the specific texture for the partial region of claim 1, or to a different specific texture corresponding to various regions in the output image. For example, in claim 1 a specific texture is designated by a user for a partial region, but there is no mention of various regions formed in the output image. Thus, it is unclear whether “the specific texture” is meant to indicate that there is a plurality of regions with the same texture or a plurality of regions with different specific textures. One of ordinary skill in the art could not ascertain the scope of the claim; thus, the claim is indefinite. For examination purposes, the limitation will be interpreted to mean “input the input image to another inference model, and infer a texture label expressing a specific texture of each region formed in the output image…”.
Dependent claims 3-5 and 7-8 are rejected as being dependent on a rejected base claim.
Claims 9 and 10 recite elements analogous to those of claim 1. Thus, claims 9 and 10 are similarly rejected under 35 U.S.C. 112(b).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5 and 7-10 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al., “Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018 (hereinafter Wang), in view of Men et al., “A Common Framework for Interactive Texture Transfer,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018 (hereinafter Men).
Regarding claim 1, Wang teaches an image processing device (Wang, see Fig. 3) comprising:
circuitry configured to:
receive an input image; receive an input signal designating a texture for respective regions in the input image; generate, on a basis of the input image, a control signal partially adjusting an output texture in an output image to be inferred using a learned inference model (Wang, “We argue that the semantic categorical prior, i.e., knowing which region belongs to the sky, water, or grass, is beneficial for generating richer and more realistic textures. The categorical prior can be conveniently represented by semantic segmentation probability maps”, pg. 608, 2nd column, Section 3. Methodology, 2nd paragraph, lines 1-5; “A Spatial Feature Transform (SFT) layer learns a mapping function M that outputs a modulation parameter pair (γ, β) based on some prior condition Ψ. The learned parameter pair adaptively influences the outputs by applying an affine transformation spatially to each intermediate feature maps in an SR network… Meanwhile, we still keep few parameters inside each SFT layer to further adapt the shared conditions to the specific parameters and, providing fine-grained control to the features.”, pg. 609, 1st column, Section 3.1. Spatial Feature Transform, paragraphs 1 and 3, lines 1-6 and 10-13, respectively; “Some LR images and the corresponding segmentation results are depicted in Fig. 4.”, pg. 609, 2nd column, Section 3.1. Spatial Feature Transform, paragraph 5, lines 7-8; see Figs. 3 and 4. An image is first input to a segmentation model that generates probability maps representing identified texture regions in the image. The segmentation map is then used to calculate parameters associated with each texture region, acting as the control signal for the texture regions in the super-resolution process.), the output texture indicating at least two of fineness, granularity, shape properties, glossiness, transparency, shadowiness, skin fineness, matteness, irregularities or sizzling feeling (Wang, “In this work, we focus on outdoor scenes since their textures are rich and well-suited for our study. For example, the sky is smooth and lacks sharp edges, while the building is rich of geometric patterns. The water presents smooth surface with waves, while the grass has matted textures. We assume seven categories, i.e., sky, mountain, plant, grass, water, animal and building.”, pg. 610, 2nd column, Section 4. Experiments, paragraph 1, lines 1-7. The semantic classes correspond to distinct visual texture characteristics, such as the smooth appearance of the sky, the shape properties of patterns on a building, and the matted texture of grass.);
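For clarity of the record, the modulation mechanism quoted above can be expressed as a short sketch: a mapping function M turns a prior condition (e.g., segmentation probability maps) into a parameter pair (γ, β) that modulates feature maps through a spatially varying affine transform. The following is the examiner's minimal PyTorch illustration only; channel counts, layer depths, and names are assumptions, not Wang's published implementation.

```python
# Minimal illustrative sketch of a spatial feature transform (SFT) layer:
# a mapping network produces per-pixel (gamma, beta) from a prior condition,
# and the feature map is modulated by a spatial affine transform.
# All sizes and names are assumptions for illustration.
import torch
import torch.nn as nn

class SFTLayer(nn.Module):
    def __init__(self, feat_channels: int = 64, cond_channels: int = 32):
        super().__init__()
        # Mapping function M: condition -> (gamma, beta)
        self.to_gamma = nn.Sequential(
            nn.Conv2d(cond_channels, feat_channels, 1), nn.LeakyReLU(0.1),
            nn.Conv2d(feat_channels, feat_channels, 1))
        self.to_beta = nn.Sequential(
            nn.Conv2d(cond_channels, feat_channels, 1), nn.LeakyReLU(0.1),
            nn.Conv2d(feat_channels, feat_channels, 1))

    def forward(self, feat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # Spatial affine transform: each pixel is scaled and shifted
        # according to the condition at that location.
        return feat * self.to_gamma(cond) + self.to_beta(cond)

# Example: modulate a 64-channel feature map with a 32-channel shared condition.
feat = torch.randn(1, 64, 48, 48)
cond = torch.randn(1, 32, 48, 48)
out = SFTLayer()(feat, cond)  # same shape as feat, modulated per region
```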
infer, based on the input image, the control signal and the inference model, the output image in which the partial region has the adjusted output texture (Wang, “The architecture of Gθ is shown in Fig. 3. It consists of two streams: a condition network and an SR network. The condition network takes segmentation probability maps as input, which are then processed by four convolution layers. It generates intermediate conditions shared by all the SFT layers…The SR network is built with 16 residual blocks with the proposed SFT layers, which take the shared conditions as input and learn (γ, β) to modulate the feature maps by applying affine transformation.”, pg. 610, 1st column, Section 3.2. Architecture, paragraphs 2-3. The segmentation probability map of the input image is used to guide the super-resolution network. A final output image is generated, with improved texture regions based on the defined control signal.), the inference model being obtained by performing learning based on a degraded image and a ground-truth image, the degraded image being generated by performing predetermined image processing on the ground-truth image (Wang, “During training, we followed existing studies [7, 27] to obtain LR images by down sampling HR images using MATLAB bicubic kernel... After initialization, with the same training setting, we fine-tuned our full network on outdoor scenes conditionally on the input segmentation probability maps.”, pg. 610, 2nd column, Section 4. Experiments, paragraphs 2 and 3, see Fig. 4 bottom row. The super-resolution network is trained with various images, including segmented high-resolution images and corresponding down-sampled low-resolution images; see the labels of Fig. 4 corresponding to the segmented images.),
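The two-stream arrangement quoted above (a condition network feeding shared conditions into SFT-modulated residual blocks in an SR trunk) can likewise be sketched. This is an illustrative skeleton under assumed widths and kernel sizes; only the 16-block depth is taken from Wang's description.

```python
# Illustrative skeleton of the two-stream arrangement Wang describes.
# Widths, kernel sizes, and names are the examiner's assumptions.
import torch
import torch.nn as nn

class SFTResidualBlock(nn.Module):
    """Residual block whose features are modulated by the shared condition."""
    def __init__(self, channels: int = 64, cond_channels: int = 32):
        super().__init__()
        self.to_gamma = nn.Conv2d(cond_channels, channels, 1)
        self.to_beta = nn.Conv2d(cond_channels, channels, 1)
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feat, cond):
        # Affine modulation followed by a residual convolution.
        modulated = feat * self.to_gamma(cond) + self.to_beta(cond)
        return feat + self.conv(torch.relu(modulated))

class ConditionNetwork(nn.Module):
    """Condition stream: segmentation probability maps -> shared conditions."""
    def __init__(self, num_classes: int = 8, cond_channels: int = 32):
        super().__init__()
        self.net = nn.Sequential(  # four conv layers, per Wang's description
            nn.Conv2d(num_classes, 64, 1), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 64, 1), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 64, 1), nn.LeakyReLU(0.1),
            nn.Conv2d(64, cond_channels, 1))

    def forward(self, seg_probs):
        return self.net(seg_probs)

# Wiring: one shared condition feeds every block in the SR trunk.
seg_probs = torch.softmax(torch.randn(1, 8, 32, 32), dim=1)
feat = torch.randn(1, 64, 32, 32)
cond = ConditionNetwork()(seg_probs)
for block in [SFTResidualBlock() for _ in range(16)]:  # Wang reports 16 blocks
    feat = block(feat, cond)
```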
Wang does not teach receiving a user input signal designating a partial region in the input image and a specific texture for the partial region; and generating, on the basis of the input image and the user input signal, a control signal.
However, Men teaches receiving a user input signal designating a partial region in the input image and a specific texture for the partial region, and generating, on the basis of the input image and the user input signal, a control signal (Men, “Interactive texture transfer aims to generate the stylized target image from a given source image with user guidance. Users can control the shape, scale and spatial distribution of the objects to be synthesized in the target image via semantic maps.”, pg. 6355, 2nd column, 1st full paragraph, lines 1-5; “The semantic map specified by users introduces manual control to the texture transfer process. Same color labels in Ssem and Tsem manifest the similar objects with identical stylized texture. We manually produced labels via the brush and quick selection tool of photoshop in about 30 seconds for each image. A semantic label should cover an object to naked eyes to avoid textures in one label being synthesized in another.”, pg. 6357, 2nd column, 1st full paragraph, lines 1-8; see Fig. 2. A user specifies texture regions with labels as semantic maps for texture synthesis.).
Wang teaches determining segmentation probability maps representing texture regions and using those maps as priors to a super-resolution model (Wang, “We argue that the semantic categorical prior, i.e., knowing which region belongs to the sky, water, or grass, is beneficial for generating richer and more realistic textures. The categorical prior can be conveniently represented by semantic segmentation probability maps P,”, pg. 608, 2nd column, 2nd full paragraph, lines 1-5). Men teaches receiving user-specified semantic maps defining texture regions (see above). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the segmentation probability maps of Wang to include user-specified texture regions, such as the semantic maps taught by Men (Men, pg. 6355, 2nd column, 1st full paragraph, lines 1-5; pg. 6357, 2nd column, 1st full paragraph, lines 1-8; see Fig. 2). The motivation for doing so would have been to allow users to directly define the spatial distribution of textures for each region according to user preference, thereby improving the flexibility and adaptability of the model. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Wang with Men to obtain the invention as specified in claim 1.
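The combination articulated above can be illustrated with a short hypothetical sketch: a user-painted label map (as in Men) is converted to one-hot probability maps and supplied in place of the automatically inferred segmentation prior (as in Wang). The function and names below are the examiner's illustration; neither reference publishes this code.

```python
# Hypothetical sketch of the proposed combination: user-assigned texture
# labels (per Men) are converted into one-hot "probability" maps usable as
# the prior condition of Wang's SR network. Names are illustrative only.
import torch
import torch.nn.functional as F

def user_labels_to_prior(label_map: torch.Tensor, num_classes: int) -> torch.Tensor:
    """label_map: (H, W) integer tensor of user-assigned texture labels.
    Returns (1, num_classes, H, W) one-hot maps serving as the prior."""
    one_hot = F.one_hot(label_map.long(), num_classes)   # (H, W, C)
    return one_hot.permute(2, 0, 1).float().unsqueeze(0)  # (1, C, H, W)

# Example: the user designates a partial region (rows 8-23, cols 8-23) as
# class 3 (say, "grass") on an otherwise background-labeled image.
labels = torch.zeros(32, 32, dtype=torch.long)
labels[8:24, 8:24] = 3
prior = user_labels_to_prior(labels, num_classes=8)
```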
Regarding claim 2, Wang in view of Men teaches the image processing device according to claim 1, wherein the circuitry is further configured to:
input the input image to another inference model, and infer a texture label expressing the specific texture of each region formed in the output image… (Wang, “Segmentation probability maps as prior. We provide a brief discussion on the segmentation network we used.”, pg. 609, 1st column, Section 3.1. Spatial Feature Transform, paragraph 4, lines 1-2; “The condition network takes segmentation probability maps as input…”, pg. 610, Section 3.2. Architecture, paragraph 2, lines 2-4. A segmentation network is used prior to the super-resolution network to identify segmentation probability maps of the image. The segmentation effectively labels the textures of the image, such as sky, building or grass, and these labels are used to generate the output image.), the another inference model being obtained by performing learning with degraded data and ground-truth data, the degraded data being an image generated by performing the predetermined image processing on an image for learning, the ground-truth data being a texture label expressing a texture of each region of the image for learning (see Wang Supplementary Material, “The network is pre-trained on the COCO dataset [8] and then fine-tuned on OutdoorSeg dataset. To better adapt to the bicubic-ed LR input in testing, we fed the network with bicubic-ed training samples during fine-tuning.”, pg. 3, lines 3-4. The fine-tuning of the segmentation network utilizes the same dataset used for training of the super-resolution network, consisting of high-resolution annotated images and corresponding down-sampled low-resolution images; a sketch of such degraded/ground-truth pair generation appears after this claim analysis.); and
generate the control signal on a basis of the texture label that is an inference result (Wang, “A Spatial Feature Transform (SFT) layer learns a mapping function M that outputs a modulation parameter pair (γ, β) based on some prior condition Ψ.”, pgs. 608-609, Section 3.1. Spatial Feature Transform, paragraph 1, lines 1-3. The control signal is generated based on the segmentation probability map, which indicates the appropriate categorical prior for each region.).
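As referenced in the claim 2 analysis above, the degraded/ground-truth training-pair generation that Wang describes (bicubic down-sampling of high-resolution images) can be illustrated as follows. Wang uses a MATLAB bicubic kernel; Pillow's BICUBIC resampling and the file name below are the examiner's stand-ins, so the exact kernel differs from the reference's.

```python
# Illustrative sketch of generating a (degraded, ground-truth) training pair:
# the degraded low-resolution image is produced by bicubic down-sampling of
# the ground-truth high-resolution image. Pillow's BICUBIC is an assumed
# stand-in for the MATLAB bicubic kernel Wang reports.
from PIL import Image

def make_training_pair(hr_path: str, scale: int = 4):
    """Return (degraded LR image, ground-truth HR image) for one sample."""
    hr = Image.open(hr_path).convert("RGB")
    w, h = hr.size
    lr = hr.resize((w // scale, h // scale), Image.BICUBIC)  # degradation step
    return lr, hr

# Usage (file name hypothetical): lr, hr = make_training_pair("scene.png")
# The pair then serves as (degraded data, ground-truth data) during learning.
```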
Regarding claim 3, Wang in view of Men teaches the image processing device according to claim 2, wherein a plurality of types of texture labels expressing qualitative textures and texture intensities is defined (Wang, “We argue that the semantic categorical prior, i.e., knowing which region belongs to the sky, water, or grass, is beneficial for generating richer and more realistic textures. The categorical prior can be conveniently represented by semantic segmentation probability maps P”, pg. 608, Section 3. Methodology, paragraph 2, lines 1-5, A plurality of different categories is expressed in the segmentation probability maps, such as sky, building or water. The generation of the segmentation probability maps inherently defines qualitative textures and texture intensities in order to divide the regions into corresponding class categories based on observed texture features of the image.).
Regarding claim 4, Wang in view of Men teaches the image processing device according to claim 3, wherein the circuitry is further configured to:
convert a texture intensity expressed by the texture label inferred as the inference result with the another inference model into a numerical value, on a basis of a likelihood; and generate the control signal indicating a type of the texture expressed by the texture label as the inference result, and the numerical value (Wang, “A Spatial Feature Transform (SFT) layer learns a mapping function M that outputs a modulation parameter pair (γ, β) based on some prior condition Ψ.”, pgs. 608-609, Section 3.1. Spatial Feature Transform, paragraph 1, lines 1-3. The probability maps defined as priors are converted to adjustable parameters based on the texture intensity of each class. The parameters are used as a control signal to individually adjust the texture regions in the super-resolution network.).
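The claim 4 mapping above, under which likelihoods are read as numerical texture-intensity values, can be illustrated with a short sketch. The argmax/likelihood reading below is the examiner's illustration of the cited interpretation, not code from either reference.

```python
# Illustrative sketch: from per-pixel class likelihoods, take the argmax class
# as the texture type and its likelihood as the numerical intensity value,
# then pack both as a simple control signal. Shapes are assumptions.
import torch

probs = torch.softmax(torch.randn(1, 8, 32, 32), dim=1)  # per-pixel likelihoods
likelihood, label = probs.max(dim=1)  # (1, H, W) value and (1, H, W) class id

# Control signal carrying the texture type and its numerical (likelihood) value.
control_signal = torch.stack([label.float(), likelihood], dim=1)  # (1, 2, H, W)
```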
Regarding claim 5, Wang in view of Men teaches the image processing device according to claim 4, wherein the circuitry is further configured to adjust a relationship between the texture intensity and the numerical value, in accordance with an object included in each region (Wang, “We argue that the semantic categorical prior, i.e., knowing which region belongs to the sky, water, or grass, is beneficial for generating richer and more realistic textures.”, pg. 608, Section 3. Methodology, paragraph 2, lines 1-3. The objects in the image define the probability map. A relationship between the identified texture of each region and the parameters of the super-resolution network is determined based on the type of object present in the region.).
Regarding claim 7, Wang in view of Men teaches the image processing device according to claim 1, wherein the circuitry is further configured to:
detect an object included in the input image (Wang, “We assume seven categories, i.e., sky, mountain, plant, grass, water, animal and building. A ‘background’ category is used to encompass regions that do not appear in the aforementioned categories.”, pg. 610, 2nd column, Section 4. Experiments, paragraph 1, lines 6-9. Prior to input to the super-resolution network, the input image is subject to segmentation; this process includes categorizing regions of the image as objects.), wherein the learning of the inference model is performed by learning a coefficient that varies with each object included in the ground-truth image (Wang, “A Spatial Feature Transform (SFT) layer learns a mapping function M that outputs a modulation parameter pair (γ, β) based on some prior condition Ψ.”, pgs. 608-609, Section 3.1. Spatial Feature Transform, paragraph 1, lines 1-3. A texture-region-dependent parameter is inferred for each object region of the image.); and
input the input image to the inference model in which a coefficient corresponding to an object included in the input image is set, and infer the output image (Wang, “Our method employs categorical priors to help capture the characteristics of each category, leading to more natural and realistic textures.”, pg. 611, Section 4.1. Qualitative Evaluation, paragraph 1, lines 14-16. The identified parameters of each object region are used to generate the corresponding textures in the output image.).
Regarding claim 8, Wang in view of Men teaches the image processing device according to claim 1, wherein the texture of each region is expressed with a texture of an object included in each region (Wang, “We argue that the semantic categorical prior, i.e., knowing which region belongs to the sky, water, or grass, is beneficial for generating richer and more realistic textures. The categorical prior can be conveniently represented by semantic segmentation probability maps”, pg. 608, 2nd column, Section 3. Methodology, paragraph 2, lines 1-5. The texture probability maps are defined by different objects, such as sky, water or grass.).
Claim 9 corresponds to claim 1, reciting an image processing method (Wang, “Our method employs categorical priors to help capture the characteristics of each category, leading to more natural and realistic textures.”, pg. 611, 2nd column, Section 4.1. Qualitative Evaluation; see Fig. 3; see the analysis of claim 1) to perform the functions according to claim 1. As indicated in the analysis of claim 1, Wang in view of Men teaches the image processing device according to claim 1. Therefore, claim 9 is rejected for the same reasons of obviousness as claim 1.
Claim 10 corresponds to claim 1, reciting a program for causing a computer to perform a process (Wang, “A Spatial Feature Transform (SFT) layer learns a mapping function M that outputs a modulation parameter pair (γ, β) based on some prior condition Ψ.”, pg. 609, 1st column, Section 3.1. Spatial Feature Transform, paragraph 1, lines 1-6. The mapping function of Wang is carried out through computational processes that inherently require a program executed by a computer.) to perform the functions according to claim 1. As indicated in the analysis of claim 1, Wang in view of Men teaches the image processing device according to claim 1. Therefore, claim 10 is rejected for the same reasons of obviousness as claim 1.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CONNOR LEVI HANSEN whose telephone number is (703) 756-5533. The examiner can normally be reached Monday-Friday, 9:00-5:00 (ET).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached at (571) 272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CONNOR L HANSEN/Examiner, Art Unit 2672
/GANDHI THIRUGNANAM/Primary Examiner, Art Unit 2672