Prosecution Insights
Last updated: April 19, 2026
Application No. 18/485,981

METHOD OF AND APPARATUS FOR PROCESSING DIGITAL IMAGE DATA

Non-Final OA (§103)
Filed: Oct 12, 2023
Examiner: JIA, XIN
Art Unit: 2663
Tech Center: 2600 — Communications
Assignee: Robert Bosch GmbH
OA Round: 1 (Non-Final)
Grant Probability: 85% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 6m
With Interview: 98%

Examiner Intelligence

Career Allow Rate: 85% (above average; 510 granted / 601 resolved; +22.9% vs TC avg)
Interview Lift: +12.8% (moderate lift, roughly +13%; over resolved cases with interview)
Avg Prosecution: 2y 6m (typical timeline)
Career History: 624 total applications across all art units; 23 currently pending

Statute-Specific Performance

§101: 3.2% (-36.8% vs TC avg)
§103: 73.2% (+33.2% vs TC avg)
§102: 7.8% (-32.2% vs TC avg)
§112: 6.3% (-33.7% vs TC avg)
Tech Center averages are estimates • Based on career data from 601 resolved cases

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claim 4 is objected to because of the following informalities: "wherein the determination the at east one further digital image is based on the plurality of hierarchical feature maps." Appropriate correction is required.

Election/Restrictions

Applicant's response of 24 Nov. 2025 has been received and entered. In the response, Applicant amended claims 4 and 6-7, added new claims 17-26, cancelled claims 2-3 and 10-13, and elected Species II without traverse, corresponding to at least claims 4-9 and 16.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 8-9, and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Kearney (PGPUB: 20210118099 A1) in view of Smith (PGPUB: 20220122305 A1).

Regarding claims 1, 14, and 15: Kearney teaches a computer-implemented method of processing digital image data, comprising the following steps: determining, by an encoder configured to map a first digital image (see Fig. 14B, paragraph 238: the adversarial network 1408 utilizes a six multi-scale level deep encoder-decoder architecture; each convolutional layer within the encoder and decoder stage of the networks may use three 3×3 convolutions paired with batch normalization and rectified linear unit (ReLU) activations), a noise prediction associated with the first digital image (see Fig. 14A, paragraph 233: an adversarial network 1408 may receive an uncontaminated image 1410 and process the image 1410 to generate additive noise 1412 to contaminate the input image in order to deceive a victim machine learning model 1414); and determining, by the generator of the GAN system, at least one further digital image based on the noise prediction associated with the first digital image (see Fig. 14B: the resulting high-resolution output channels may be passed through a 1×1 convolutional layer and hyperbolic tangent activation function to produce adversarial noise 1412, which may be in the form of an image, where each pixel is the noise to be added to the pixel at that position in the input image 1410. At each iteration, the adversarial noise 1412 may be added to an image 1410 from a repository of training data entries to obtain the contaminated input image 1402. The contaminated input image 1402 may then be processed using the victim model 1414. The training algorithm may update model parameters of the adversarial network 1408 according to the loss function 1418. In some embodiments, the loss function 1418 is a function of mean squared error (MSE) of the adversarial noise 1412 and inverse cross entropy loss of the victim prediction 1416 relative to an accurate prediction associated with the input image 1408).

However, Kearney does not expressly teach an extended latent space associated with a generator of a generative adversarial network (GAN) system and a plurality of latent variables associated with the extended latent space. Smith teaches that an input image is processed by a pipeline of an image editing system including an encoder and generator.
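Stepping briefly outside the quoted text: the Kearney training objective cited above (tanh-bounded additive noise, scored by MSE plus an "inverse cross entropy" on the victim's prediction) can be sketched as below. This is a minimal sketch under stated assumptions: a per-pixel linear map stands in for the deep encoder-decoder, the negated cross-entropy is one plausible reading of the "inverse" term, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def adversarial_noise(image, weights):
    # Stand-in for the 1x1 convolution + tanh head: each output pixel
    # is the bounded noise to add at that position of the input image.
    return np.tanh(image * weights)

def training_loss(noise, victim_probs, true_label):
    # MSE keeps the noise small; the negated cross-entropy term falls
    # as the victim model's confidence in the true label drops.
    mse = np.mean(noise ** 2)
    neg_ce = np.log(victim_probs[true_label])  # = -(cross-entropy)
    return mse + neg_ce

image = rng.random((8, 8))                # uncontaminated input image
weights = rng.normal(size=(8, 8))         # illustrative noise-head weights
noise = adversarial_noise(image, weights)
contaminated = image + noise              # image fed to the victim model

victim_probs = np.array([0.7, 0.2, 0.1])  # hypothetical victim softmax output
loss = training_loss(noise, victim_probs, true_label=0)
```

Kearney's actual loss weighting and network depth are not specified here; only the overall shape of the objective is shown.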
The encoder processes the input image to produce a latent space representation of the input image (see paragraph 43). The projection pipeline 200 includes an encoder 206 and a generator 210. In the projection pipeline 200, an input image 202 is encoded using the encoder 206 to produce a latent space representation w 208, which is then optimized using a combination of pixel loss 212, latent loss 216, and perceptual loss 218, resulting in an optimized latent space representation w_opt 228 (see Fig. 2, paragraph 92). An edit management subsystem 120 configures edits to the input image 106 using an edit configurer 122 and a feedback generator 124. A projection subsystem 110 generates a latent space representation 113 representing the input image 106. A latent code transformer 114 generates a modified latent space representation 117 by applying one or more transformations, including the edits configured by the edit management subsystem 120, to the latent space representation 113 of the input image. An image generation subsystem 130 includes a generator 132 that generates an image according to the transformed latent space representation 117 (see Fig. 1, paragraph 70).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kearney by Smith such that an input image is processed by a pipeline of an image editing system including an encoder and generator, where the encoder processes the input image to produce a latent space representation of the input image, in order to provide an extended latent space associated with a generator of a generative adversarial network (GAN) system; and to obtain an edit management subsystem 120 that configures edits to the input image 106 using an edit configurer 122 and a feedback generator 124, a projection subsystem 110 that generates a latent space representation 113 representing the input image 106, a latent code transformer 114 that generates a modified latent space representation 117 by applying one or more transformations, including the edits configured by the edit management subsystem 120, to the latent space representation 113 of the input image, and an image generation subsystem 130 including a generator 132 that generates an image according to the transformed latent space representation 117, in order to provide a plurality of latent variables associated with the extended latent space. Therefore, combining the elements from the prior art according to known methods and techniques would yield predictable results.

Regarding claim 8: The combination teaches the method according to claim 1, further comprising: combining the noise prediction associated with the first digital image with a style prediction of a second digital image (see Kearney, Fig. 38, paragraph 452: the input image 3812 may already be a high resolution image (e.g., 1024×1024 instead of 256×256) such that the dimensions of the stages of the encoder 3804b and decoder 3806b are larger; the image may also be contaminated with noise such as gaussian noise, salt and pepper noise, contrast, shadowing noise, or learned noise with a separate machine learning model; the masks 3814 for the input image 3812 may be combined with the output of each stage of the decoder 3806b and the combination may be used as the input to the next stage of the decoder 3806b); and generating a further digital image using the generator based on the combined noise prediction associated with the first digital image and the style prediction of the second digital image (see Kearney, Fig. 38, paragraph 436: each convolutional stage within the encoder 3804 and decoder 3806 of the networks may use 4×4 convolutions paired with batch normalization and rectified linear unit (ReLU) activations. Convolutional downsampling may be used to downsample the output of each multi-scale stage of the encoder 3804.
The output of the last stage of the encoder 3804 may be fed into the decoder 3806 to control stylistic variation captured by the resulting synthetic output image 3816 output by the decoder 3806 for a given input image 3812 input to the encoder 3804).

Regarding claim 9: The combination teaches the method according to claim 1, further comprising: providing the noise prediction associated with the first digital image; providing different sets of latent variables characterizing different styles to be applied to a semantic content of the first digital image (see Smith, Fig. 1, paragraph 80: the mapper/augmenter 114B includes functionality to map the latent space representation 113 from one latent space to another. For example, the encoder 112 generates a latent code in a first space, Z space, and the mapper/augmenter 114B applies a mapping to transform the latent code from the Z space to a second space, W space); and generating a plurality of digital images with the different styles using the generator based on the noise prediction associated with the first digital image and the different sets of latent variables characterizing the different styles (see Smith, Fig. 1, paragraph 80: this mapping is executed in some implementations to facilitate image editing by transforming the latent space such that movement in the latent space smoothly correlates with changes to one or more target attributes. As an example, in the W space, incrementing the latent variable in a particular direction continuously makes hair color lighter in an image while maintaining the overall look of the image. In the Z space, such smooth changes with direction in the latent space are not always possible, as the Z space is more "entangled.").

Regarding claim 16: The combination teaches the method according to claim 1, further comprising using the method for at least one of the following:
a) determining at least one further digital image based on the noise prediction associated with the first digital image and the plurality of latent variables associated with the extended latent space, at least some of the plurality of latent variables being associated with another image and/or other data than the first digital image,
b) transferring a style from a second digital image to the first digital image, while preserving a content of the first digital image,
c) disentangling style and content of at least one digital image,
d) creating different stylized digital images with unchanged content, based on the first digital image and a style of at least one further second digital image,
e) using or re-using labelled annotations for stylized images,
f) avoiding annotation work when changing a style of at least one digital image,
g) generating perceptually realistic digital images with different styles,
h) providing proxy validation sets for testing out-of-distribution generalization of a neural network system,
i) training a machine learning system (see Kearney, Fig. 3, paragraph 84: a training algorithm 302 takes as inputs training data entries that each include an image 304 according to any of the imaging modalities described herein and an orientation label 306 indicating the orientation of the image, e.g., 0 degrees, 90 degrees, 180 degrees, and 270 degrees; the orientation label 306 for an image may be assigned by a human observing the image and determining its orientation),
j) testing a machine learning system,
k) verifying a machine learning system,
l) validating a machine learning system,
m) generating training data for a machine learning system,
n) data augmentation of existing image data,
o) improving a generalization performance of a machine learning system,
p) flexibly manipulating image styles without a training associated with multiple data sets,
q) utilizing an encoder GAN pipeline to manipulate image styles,
r) embedding, by the encoder, information associated with an image style into intermediate latent variables,
s) mixing styles of digital images for generating at least one further digital image including a style based on the mixing.

Claims 4 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Kearney (PGPUB: 20210118099 A1) in view of Smith (PGPUB: 20220122305 A1), and further in view of Zhu (PGPUB: 20230100413 A1).

Regarding claim 4: The combination teaches the method according to claim 1, further comprising determining a plurality of feature maps from the first digital image (see Smith, Fig. 1, paragraph 86: the generator 132 is attached to one or more auxiliary networks 133A, 133B. Although two auxiliary networks 133A and 133B are pictured, more or fewer auxiliary networks may be implemented. The auxiliary networks 133A and 133B are neural networks attached to selected layers of the generator 132. The auxiliary networks 133A and 133B are trained to output a reduced-resolution version of the ultimate GAN output 139 using intermediate feature vectors extracted from the intermediate layers of the generator 132.
These reduced-resolution preview images 135 are transmitted to the feedback generator 124 for further processing), wherein the determination of the at least one further digital image is based on the plurality of hierarchical feature maps (see Smith, paragraph: "In a training process, the generator neural network learns to map points in the latent space to specific output images. Such interpretation by the generator neural network gives structure to the latent space, which varies according to the generator used. For a given generator neural network, the latent space structure can be analyzed and traversed to control image generation"; see Smith, Fig. 7 and 8, paragraph 148).

However, the combination does not expressly teach hierarchical feature maps. Zhu teaches that the process 900 includes generating, by a plurality of encoder transformer layers of an encoder sub-network using the plurality of patches as input, a frame of encoded image data. In some examples, the frame of encoded image data is a latent representation of image data. In some cases, the latent representation is a hierarchical feature map generated by the plurality of encoder transformer layers of the encoder sub-network. In some examples, the process 900 can include generating, by the plurality of encoder transformer layers of the encoder sub-network using the plurality of patches as input, a hierarchical feature map for the segmented frame. The process 900 can include generating the frame of encoded image data from the hierarchical feature map (see Fig. 9, paragraph 155).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination by Zhu such that the process 900 can include generating, by the plurality of encoder transformer layers of the encoder sub-network using the plurality of patches as input, a hierarchical feature map for the segmented frame, and generating the frame of encoded image data from the hierarchical feature map, in order to provide hierarchical feature maps. Therefore, combining the elements from the prior art according to known methods and techniques would yield predictable results.

Regarding claim 17: The combination teaches the method according to claim 4, further comprising determining the plurality of latent variables associated with the extended latent space for the first digital image (see Smith, Fig. 1, paragraph 80: the encoder 112 generates a latent code in a first space, Z space, and the mapper/augmenter 114B applies a mapping to transform the latent code from the Z space to a second space, W space. This mapping is executed in some implementations to facilitate image editing by transforming the latent space such that movement in the latent space smoothly correlates with changes to one or more target attributes. As an example, in the W space, incrementing the latent variable in a particular direction continuously makes hair color lighter in an image while maintaining the overall look of the image).

However, the combination does not expressly teach the digital image being based on the plurality of hierarchical feature maps. Zhu teaches that the frame of encoded image data is a latent representation of image data. In some cases, the latent representation is a hierarchical feature map generated by the plurality of encoder transformer layers of the encoder sub-network. In some examples, the process 900 can include generating, by the plurality of encoder transformer layers of the encoder sub-network using the plurality of patches as input, a hierarchical feature map for the segmented frame. The process 900 can include generating the frame of encoded image data from the hierarchical feature map (see Fig. 9, paragraph 155).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination by Zhu such that the process 900 can include generating, by the plurality of encoder transformer layers of the encoder sub-network using the plurality of patches as input, a hierarchical feature map for the segmented frame, and generating the frame of encoded image data from the hierarchical feature map, in order to provide a digital image based on the plurality of hierarchical feature maps. Therefore, combining the elements from the prior art according to known methods and techniques would yield predictable results.

Regarding claim 18 (new): The method according to claim 17, wherein at least some of the plurality of latent variables determined from the plurality of hierarchical maps characterize at least one of the following aspects of the first digital image: a) a style including a non-semantic appearance, b) a texture, c) a color (see Smith, Fig. 3, paragraph 110: perceptual features are visually representable properties of objects, such as size, shape, color, position, facial expression, etc. These perceptual features are compared, for the input image and the initial output image (e.g., a first initial output image and/or updated initial output images generated at 308)).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Kearney (PGPUB: 20210118099 A1) in view of Smith (PGPUB: 20220122305 A1), and further in view of Epstein (US Patent No. 12136151 B2).

Regarding claim 5: The combination does not expressly teach the method according to claim 1, further comprising: randomly and/or pseudo-randomly masking at least a portion of the noise prediction associated with the first digital image. Epstein teaches that the digital image collaging system 102 performs an act 202 to identify a noise vector. In particular, the digital image collaging system 102 identifies, receives, or generates a noise vector by determining a random sample of data such as pixel data or other digital image data. In certain implementations, the noise vector is a Gaussian noise vector. In some embodiments, the digital image collaging system 102 randomly samples a digital image or randomly generates a vector of data for input into a digital image collaging neural network (e.g., the digital image collaging neural network 116) (see Fig. 2, Col. 9, lines 31-41). Epstein further illustrates an example series of acts 900 for generating a combined digital image utilizing a digital image collaging neural network. In particular, the series of acts 900 includes an act 902 of determining a scene layout. For example, the act 902 involves determining a scene layout from a noise vector by utilizing a mask generator neural network to generate a digital image mask. In some cases, the act 902 involves utilizing the mask generator neural network to generate, from the noise vector, a digital image mask that indicates one or more regions of the combined digital image to mask (see Fig. 5 and 9, Col. 21, lines 17-27). The series of acts includes an act of generating encoded features for the combined digital image utilizing the encoder neural network. The act 902 can thus include determining a scene layout from the encoded features by utilizing the mask generator neural network to generate a digital image mask. Generating the encoded features can involve extracting the encoded features from a noise vector utilizing the encoder neural network. In one or more implementations, the act 902 involves utilizing the mask generator neural network to generate, from the encoded features, the digital image mask indicating masked regions and unmasked regions for the combined digital image (see Fig. 5 and 9, Col. 21, lines 28-40).
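To make the claim-5 limitation concrete, random and/or pseudo-random masking of a noise prediction can be sketched as follows. A Bernoulli mask is an illustrative stand-in for Epstein's learned mask generator network, and all names here are assumptions.

```python
import numpy as np

def mask_noise_prediction(noise_pred, keep_prob=0.5, seed=None):
    # With a seed the mask is pseudo-random (reproducible); without
    # one it is drawn fresh each call. Masked positions are zeroed.
    rng = np.random.default_rng(seed)
    mask = rng.random(noise_pred.shape) < keep_prob
    return noise_pred * mask, mask

noise_pred = np.ones((4, 4))  # toy noise prediction for a 4x4 image
masked, mask = mask_noise_prediction(noise_pred, keep_prob=0.5, seed=42)
# Kept positions pass through unchanged; the rest are zeroed out.
```

The `seed` parameter is what separates the "pseudo-randomly" reading (deterministic given the seed) from the "randomly" reading (no seed) in the claim language.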
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination by Epstein to obtain an act 902 that involves determining a scene layout from a noise vector by utilizing a mask generator neural network to generate a digital image mask, where in some cases the act 902 involves utilizing the mask generator neural network to generate, from the noise vector, a digital image mask that indicates one or more regions of the combined digital image to mask, in order to provide randomly and/or pseudo-randomly masking at least a portion of the noise prediction associated with the first digital image. Therefore, combining the elements from the prior art according to known methods and techniques would yield predictable results.

Allowable Subject Matter

Claims 6-7 and 19-26 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIN JIA, whose telephone number is (571) 270-5536. The examiner can normally be reached 9:00 am - 7:30 pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Gregory Morse, can be reached at (571) 272-3838. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/XIN JIA/
Primary Examiner, Art Unit 2663

Prosecution Timeline

Oct 12, 2023: Application Filed
Feb 08, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602786: FREE FLUID ESTIMATION (2y 5m to grant; granted Apr 14, 2026)
Patent 12602782: IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM (2y 5m to grant; granted Apr 14, 2026)
Patent 12602923: METHODS AND APPARATUS TO PROVIDE AN EFFICIENT SAFETY MECHANISM FOR SIGNAL PROCESSING HARDWARE (2y 5m to grant; granted Apr 14, 2026)
Patent 12597137: DIGITAL SYNTHESIS OF HISTOLOGICAL STAINS USING MULTIPLEXED IMMUNOFLUORESCENCE IMAGING (2y 5m to grant; granted Apr 07, 2026)
Patent 12592311: SYSTEMS AND METHODS FOR PERFORMING OPTIMAL ANCHOR-PRIOR MATCHING OPERATIONS (2y 5m to grant; granted Mar 31, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 85%
With Interview (+12.8%): 98%
Median Time to Grant: 2y 6m
PTA Risk: Low
Based on 601 resolved cases by this examiner. Grant probability derived from career allow rate.
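The projection figures above are internally consistent if the interview lift is read as additive percentage points on the base grant probability; this is an assumption, since the dashboard does not state its formula.

```python
# Hypothetical reconstruction of the "With Interview" figure from the
# base grant probability and the interview lift (percentage points).
base_grant_prob = 85.0   # career allow rate, in percent
interview_lift = 12.8    # interview lift, in percentage points
with_interview = base_grant_prob + interview_lift
print(round(with_interview))  # 98
```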
