Last updated: May 29, 2026
Application No. 18/417,916
IMAGE PROCESSING METHOD AND APPARATUS, COMPUTER, READABLE STORAGE MEDIUM, AND PROGRAM PRODUCT

Non-Final OA §103
Filed: Jan 19, 2024
Priority: Aug 12, 2022 (CN 202210967272.3, +1 more)
Examiner: CHEN, BIAO
Art Unit: 2611
Tech Center: 2600 (Communications)
Assignee: Tencent Technology (Shenzhen) Company Limited
OA Round: 2 (Non-Final)
Interview Optional

+26.3% interview lift. An interview has already been conducted in this application's prosecution history. This examiner has an 84% grant rate with a +26.3% interview lift. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 32 resolved cases, 2023–2026

Examiner Intelligence
CHEN, BIAO
Career Allowance Rate: 84% (27 granted / 32 resolved), above average; +22.4% vs TC avg
Interview Lift: +26.3% (strong)
Typical timeline: 2y 4m average prosecution; 20 currently pending
Statute-Specific Performance (vs. Tech Center average estimate; based on career data from 32 resolved cases):
§103: 87.1% (+47.1% vs TC avg)
§102: 2.2% (-37.8% vs TC avg)
§112: 8.6% (-31.4% vs TC avg)
Office Action

DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This Office Action is in response to Applicant’s amendment/response filed on 12/12/2025, which has been entered and made of record. Applicant’s amendments to the Specification previously set forth in the Non-Final Office Action mailed 09/17/2025. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7 are rejected under 35 U.S.C. 103 as being unpatentable over Naruniec et al. (High-Resolution Neural Face Swapping for Visual Effects, Eurographics Symposium on Rendering 2020, Volume 39 (2020), Number 4, pp. 1-16, hereinafter “Naruniec”) in view of Fchollet (Transfer learning & fine-tuning, archive.org, https://web.archive.org/web/20220622141018/https://keras.io/guides/transfer_learning/, hereinafter “Fchollet”).

Regarding claim 1, Naruniec discloses A method for generating an image processing model performed by a computer device, the method comprising: (page 1, Abstract, “this is the first method capable of rendering photo-realistic and temporally coherent results at megapixel resolution”; page 4, col. right, para. 1, “All the models were trained on a single NVIDIA 1080Ti GPU workstation (Intel® Core™ i7-6700K CPU @ 4.00GHz)”).
obtaining a first source image sample, a first template image sample, (page 3, col. right, Figures 2-3: [Figures 2-3 image], “A schematic of the full pipeline for swapping a source face of identity s onto a person [inline image]. In steps (1) and (2) we preprocess the input by cropping and normalizing the face. In step (3) the pre-processed image is fed into the common encoder and decoded with corresponding decoder Ds. In (4) we use our multiband blending to swap the target with the source face … Single-encoder, multi-decoder network architecture”; page 4, col. left, paras. 2-3, “The training of our network is executed using a progressive regime, which we have adapted to work in a non-adversarial setting. This process starts from coarse, low-resolution images formed by down-sampling high-resolution input data and then gradually expands the network’s capacity as higher-resolution images are used for training … We partition the data X into P subsets, where each subset corresponds to an individual identity. We normalize all available examples to 1024×1024 resolution. Note that in the progressive regime, these images will be downsized in the initial stages of training, while 1024×1024 is the final resolution (see the appendix for details)”). Note that: (1) The training uses a progressive regime. The high-resolution (e.g., 1024x1024) input image data can be down-sampled to form datasets of lower-resolution input data for the corresponding lower resolutions (e.g., 256x256 and 512x512); and (2) At a lower resolution level l (e.g., l = 6; 256x256), the corresponding down-sampled version of image xt after cropping and normalization in Figure 2 can be regarded as a first template image sample, while the corresponding down-sampled version of image xs after cropping and normalization in Figure 2 can be regarded as a first source image sample.
and a first standard synthesized image to form a first training dataset including images having a first resolution and a first quantity; (page 5, col. left, paras. 3-6, “Multi-band blending [BA83], as recently used by Thies et al. [TZN*15] in the context of face-image compositing is a competing approach to Poisson blending … With this in mind, we copy the two coarsest (i.e. low-frequency) levels of the target’s Laplacian pyramid and blend only the remaining, more detailed levels. The final image is then obtained by reconstructing from the manipulated and blended Laplacian pyramid. We also enforce that the boundary smoothing effect is propagated only into the interior of the face … Therefore, we additionally align the amount of contrast in the generated source face to match the contrast of the target”). Note that: (1) a contrast-preserving, multi-band compositing method is used to composite or reconstruct the first standard synthesized image. The first source image sample and the first template image sample at the first resolution can each be decomposed into a Laplacian pyramid representation. The Laplacian representations are manipulated or blended at the corresponding levels, and the final image (the first standard synthesized image, serving as the ground-truth face) is then obtained by reconstructing from the blended Laplacian pyramid; and (2) the combination of the first source image sample, the first template image sample, and the first standard synthesized image forms a first training dataset. The images in that dataset are at a lower resolution level l (e.g., l = 6; 256x256), which serves as a first resolution, and the dataset contains the original number of high-resolution (1024x1024) images, which serves as a first quantity.
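For illustration only, the multi-band compositing relied on above can be sketched in Python. This is not Naruniec's code. Several choices are assumptions: 2x2 box down-sampling, nearest-neighbour up-sampling, a mask pyramid built with the same box filter, and a reading of "the two coarsest levels" as the low-pass residual plus the coarsest band-pass level. Naruniec's contrast alignment and interior-only boundary smoothing are omitted.

```python
# Illustrative sketch only: Laplacian-pyramid multi-band blending on square
# grayscale images stored as nested lists (side length a power of two).
# Assumptions (not from Naruniec): 2x2 box down-sampling, nearest-neighbour
# up-sampling, and a mask pyramid built with the same box down-sampling.

def downsample(img):
    """Halve the resolution by averaging each 2x2 block."""
    n = len(img) // 2
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1]
              + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(n)] for i in range(n)]


def upsample(img):
    """Double the resolution by nearest-neighbour replication."""
    n = 2 * len(img)
    return [[img[i // 2][j // 2] for j in range(n)] for i in range(n)]


def combine(a, b, op):
    """Apply a binary operation element-wise to two equally sized images."""
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def laplacian_pyramid(img, levels):
    """Return [L_0, ..., L_{levels-1}, G_levels]: finest band first, low-pass residual last."""
    pyramid, current = [], img
    for _ in range(levels):
        low = downsample(current)
        pyramid.append(combine(current, upsample(low), lambda x, y: x - y))
        current = low
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Invert laplacian_pyramid by up-sampling and adding the bands back."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = combine(band, upsample(current), lambda x, y: x + y)
    return current


def multiband_blend(source, target, mask, levels, copy_coarsest=2):
    """Blend `source` into `target` where `mask` is 1.

    The `copy_coarsest` coarsest pyramid entries are copied from the target
    unchanged; only the finer bands are mixed using the down-sampled mask.
    """
    ps = laplacian_pyramid(source, levels)
    pt = laplacian_pyramid(target, levels)
    masks = [mask]
    for _ in range(levels):
        masks.append(downsample(masks[-1]))
    blended = []
    for k, (ls, lt, m) in enumerate(zip(ps, pt, masks)):
        if k >= len(ps) - copy_coarsest:
            blended.append(lt)  # low-frequency content comes from the target
        else:
            blended.append([[w * s + (1.0 - w) * t for s, t, w in zip(rs, rt, rw)]
                            for rs, rt, rw in zip(ls, lt, m)])
    return reconstruct(blended)
```

Copying the coarsest entries from the target keeps the target's overall low-frequency shading. The finer bands carry the source face's detail.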
performing parameter adjustment on an initial image fusion model by using the first training dataset (page 3, col. right, Figures 2-3: [Figures 2-3 image], “A schematic of the full pipeline for swapping a source face of identity s onto a person [inline image]. In steps (1) and (2) we preprocess the input by cropping and normalizing the face. In step (3) the pre-processed image is fed into the common encoder and decoded with corresponding decoder Ds. In (4) we use our multiband blending to swap the target with the source face … Single-encoder, multi-decoder network architecture”; page 14, Table 1: “Detailed description of our encoder (left) and decoder (right)”; page 4, col. left, paras. 2-3, “The training of our network is executed using a progressive regime, which we have adapted to work in a non-adversarial setting. This process starts from coarse, low-resolution images formed by down-sampling high-resolution input data and then gradually expands the network’s capacity as higher-resolution images are used for training … We partition the data X into P subsets, where each subset corresponds to an individual identity”; page 4 / col. left / para. 5 – page 4 / col. right / para. 1, “[quoted passage reproduced as images]”). Note that: (1) Naruniec’s Figure 3 shows a neural network with a single encoder and multiple decoders. The network as designed, before training, can be regarded as an initial image fusion model for the face-swapping function; (2) Training the neural network is a process in which its parameters are updated or adjusted, so the training is equivalent to performing parameter adjustment on the neural network; (3) According to Naruniec’s Table 1, the encoder has layers for levels 0-8 and acts as a shared encoder that encodes features shared across identities. The multiple decoders can output image data at the corresponding resolution levels (e.g., from a low resolution of 256x256 up to a high resolution of 1024x1024); and (4) The training itself uses a progressive regime to minimize an SSIM- or MS-SSIM-based loss at the corresponding level for all P subsets.
to obtain a first parameter adjustment model. Note that: after the first-stage training at the corresponding resolution level has been performed, the partially trained neural network can be regarded as a first parameter adjustment model.
inserting a first resolution update layer into the first parameter adjustment model, to obtain a first update model (Figures 2-3; page 14, Table 1: “Detailed description of our encoder (left) and decoder (right)”; page 13, col. left, para. 3, “Our model is trained in a progressive regime, starting from coarse, low-resolution 4x4 pixel images and then gradually expanding the network’s capacity as higher-resolution images are used for training, up to 1024x1024 pixels. The base architecture, which focuses on the lowest-resolution data, corresponds to “level 0” in Figures 3 and 13 and Table 1. Each new “level” of the network doubles input and output resolution by adding a composition of two convolutional layers and a down- or up-scaling layer in the encoder and decoder, respectively”). Note that: (1) According to the progressive training regime, the first parameter adjustment model (i.e., the neural network trained using the first template image sample / the first source image sample / the first standard synthesized image at resolution level l) already has all resolution-related layers for resolution levels up to 8. However, only the encoder’s layers corresponding to level l or lower are trained. The encoder’s layers at levels higher than l act merely as units or operators that leave their input unchanged, and their parameters are neither trained nor changed. Likewise, only the decoders’ layers corresponding to level l or lower are trained. The decoders’ layers at levels higher than l act merely as units or operators that leave their input unchanged, and their parameters are neither trained nor changed; and (2) enabling the encoder’s layer corresponding to resolution level l+1 and the decoders’ layers corresponding to level l+1 for the next training stage is equivalent to inserting a first resolution update layer into the first parameter adjustment model. 
Meanwhile, the trained encoder’s layers at resolution level l or lower and the decoders’ layers at resolution level l or lower remain unchanged.
including a first plurality of convolution layers and the first resolution update layer concatenated after the first plurality of convolution layers; (page 13, col. left, para. 3, “The base architecture, which focuses on the lowest-resolution data, corresponds to “level 0” in Figures 3 and 13 and Table 1. Each new “level” of the network doubles input and output resolution by adding a composition of two convolutional layers and a down- or up-scaling layer in the encoder and decoder, respectively”; page 14, Table 1, “Detailed description of our encoder (left) and decoder (right). For the Leaky rectified unit (LeakyReLU) we use α = 0.2.”; at level l the encoder has a Downsample layer and 2-3 convolution layers, while the decoder at the same level has an Upsample layer and 2-3 convolution layers). Note that: (1) the first resolution update layer can be regarded as the layer that is enabled at level l+1 (=7); and (2) the convolution layers before this layer can be regarded as a first plurality of convolution layers, so the first resolution update layer is concatenated after the first plurality of convolution layers.
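The level bookkeeping described above can be sketched as follows. The sketch assumes the resolution rule quoted from page 13: level 0 works on 4x4 images, and each new level doubles the resolution. The layer names and the level-0 decoder block are hypothetical. The sketch shows only that a newly enabled up-scaling layer is concatenated after the convolutions that are already trained.

```python
# Illustrative sketch only: decoder-side bookkeeping for progressive growing.
# Layer names and the level-0 block are hypothetical; only the doubling rule
# (level 0 = 4x4, each level doubles) comes from the quoted passage.

BASE_RESOLUTION = 4  # "level 0" operates on 4x4 images


def level_resolution(level):
    """Resolution handled by a given level: 4 * 2**level."""
    return BASE_RESOLUTION * 2 ** level


def decoder_layers(max_level):
    """Decoder layer names up to `max_level`, in forward (concatenation) order."""
    layers = ["dec0_conv_a", "dec0_conv_b"]
    for level in range(1, max_level + 1):
        # The up-scaling ("resolution update") layer follows the existing convolutions.
        layers += [f"dec{level}_upsample", f"dec{level}_conv_a", f"dec{level}_conv_b"]
    return layers


def insert_resolution_update(max_level):
    """Enable level max_level + 1; return (all layers, already trained layers, new layers)."""
    old = decoder_layers(max_level)
    new = decoder_layers(max_level + 1)[len(old):]
    return old + new, old, new
```

For example, `insert_resolution_update(6)` models the step from the 256x256 stage to the 512x512 stage.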
obtaining a second source image sample, a second template image sample, and a second standard synthesized image to form a second training dataset including images having a second resolution and a second quantity, wherein the second resolution is greater than the first resolution, wherein the second quantity is smaller than the first quantity; Note that: (1) The training uses a progressive regime. The high-resolution (e.g., 1024x1024) input image data can be down-sampled to form datasets of lower-resolution input data for the corresponding lower resolution levels (e.g., 256x256 and 512x512). At a resolution level l (e.g., l = 7; 512x512), the corresponding down-sampled version of image xt after cropping and normalization in Figure 2 can be regarded as a second template image sample, and the corresponding down-sampled version of image xs after cropping and normalization can be regarded as a second source image sample. The corresponding image reconstructed from the manipulated and blended Laplacian pyramid, using the Laplacian pyramid decompositions of the second template image sample and the second source image sample as explained above, can be regarded as a second standard synthesized image serving as the ground-truth face. The training itself minimizes an SSIM- or MS-SSIM-based loss at the corresponding level for all P subsets; (2) the second resolution (512x512 at level 7) is greater than the first resolution (256x256 at level 6); and (3) it is obvious to one having ordinary skill in the art that the second quantity can be smaller than the first quantity. For example, one may use only a smaller, randomly selected subset of the down-sampled images, drawn from all of the original high-resolution (1024x1024) images, in order to speed up the training process.
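The quantity rationale in note (3) can be sketched as a per-stage dataset builder. Each entry records the resolution its triplet would be down-sampled to, rather than resizing pixels. The triplet names, quantities, and seeds are hypothetical.

```python
import random

# Illustrative sketch only: per-stage training sets drawn from full-resolution
# (1024x1024) triplets. Names, quantities, and seeds are hypothetical.

def build_stage_dataset(triplets, resolution, quantity, seed=0):
    """Randomly pick `quantity` (source, template, standard) triplets and tag
    each with the resolution used for this training stage."""
    if quantity > len(triplets):
        raise ValueError("quantity exceeds the number of available triplets")
    chosen = random.Random(seed).sample(triplets, quantity)
    return [(src, tpl, std, resolution) for src, tpl, std in chosen]


# Usage: a larger low-resolution set and a smaller, randomly chosen
# high-resolution subset.
full = [(f"src{i}", f"tpl{i}", f"std{i}") for i in range(100)]
first = build_stage_dataset(full, resolution=256, quantity=100)
second = build_stage_dataset(full, resolution=512, quantity=40, seed=1)
```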
… performing parameter adjustment on the first update model by: … Note that: The training uses a progressive regime. The training itself minimizes an SSIM- or MS-SSIM-based loss at the corresponding level for all P subsets while adjusting the parameters.
inserting a second resolution update layer into the second parameter adjustment model, to obtain a second update model; and Note that: (1) According to the progressive training regime, the second parameter adjustment model (i.e., the partially trained neural network using the second template image sample / the second source image sample / the second standard synthesized image at resolution level l) already has all resolution-related layers for resolution levels up to 8. However, the encoder’s layers corresponding to level l (e.g., l=6; 256x256) or lower are not trained, and the encoder’s layers at levels higher than l also behave like dummies without training. The decoders’ layer corresponding to level l+1 (e.g., 512x512) is trained. The decoders’ layers at levels higher than l+1 behave only like dummies, being neither enabled nor trained, and the layers at levels lower than l+1 are not trained; and (2) enabling the decoders’ layer corresponding to level l+2 (e.g., 1024x1024) for the next training stage is equivalent to inserting a second resolution update layer into the second parameter adjustment model. Meanwhile, the parameters of the trained encoder’s layers at resolution level l or lower and of the decoders’ layers at resolution level l+1 or lower remain unchanged.
performing parameter adjustment on the second update model by using a third source image sample and a third template image sample, and a third standard synthesized image, to obtain a target image fusion model configured to fuse an object in one image into another image. Note that: (1) The training uses a progressive regime. The high-resolution (e.g., 1024x1024) input image data can be down-sampled to form datasets of lower-resolution input data for the corresponding lower resolution levels (e.g., 256x256 and 512x512). At a resolution level l (e.g., l = 7; 512x512), the corresponding down-sampled version of image xt after cropping and normalization in Figure 2 can be regarded as a third template image sample, and the corresponding down-sampled version of image xs after cropping and normalization can be regarded as a third source image sample. The corresponding image reconstructed from the manipulated and blended Laplacian pyramid, using the Laplacian pyramid decompositions of the third template image sample and the third source image sample as explained above, can be regarded as a third standard synthesized image serving as the ground-truth face. The training itself minimizes an SSIM- or MS-SSIM-based loss at the corresponding level for all P subsets; (2) after the third-stage training at the corresponding resolution level has been performed, the trained neural network can be regarded as a target image fusion model; and (3) the target image fusion model’s decoders can output synthesized images to swap the target with the source face.
However, Naruniec fails to disclose, but in the same art of neural network technology, Fchollet discloses
using the second training dataset to adjust parameters in the first resolution update layer while fixing parameters in the first plurality of convolution layers, to obtain a first layer adjustment model, and (Fchollet, page 1, paras. 1-3, “Transfer learning consists of taking features learned on one problem, and leveraging them on a new, similar problem … The most common incarnation of transfer learning in the context of deep learning is the following workflow: Take layers from a previously trained model.  Freeze them, so as to avoid destroying any of the information they contain during future training rounds. Add some new, trainable layers on top of the frozen layers. They will learn to turn the old features into predictions on a new dataset.  Train the new layers on your dataset”). Note that: (1) by “transfer learning”, parameters in the first plurality of convolution layers can be fixed; (2) the inserted first resolution update layer can be trained by adjusting the corresponding parameters with the second training dataset; (3) after the network has been trained, a first layer adjustment model is obtained.
using the second training dataset to adjust all parameters in the first layer adjustment model, to obtain a second parameter adjustment model; (Fchollet, page 1, para. 4, “optional step, is fine-tuning, which consists of unfreezing the entire model you obtained above (or part of it), and re-training it on the new data with a very low learning rate. This can potentially achieve meaningful improvements, by incrementally adapting the pretrained features to the new data”). Note that: (1) by “fine-tuning”, all parameters of the neural network (the first layer adjustment model) can be unfrozen and opened for training; (2) the neural network can be trained by adjusting the corresponding parameters with the second training dataset; (3) after the neural network has been trained, a second parameter adjustment model is obtained.
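The freeze-then-fine-tune workflow quoted from Fchollet can be sketched with plain gradient descent on a toy two-parameter model. In Keras itself, a layer is frozen by setting its `trainable` attribute to `False`. The parameter names (`w_conv` standing in for the already trained convolution layers, `w_update` for the inserted layer), the data, and the learning rates are hypothetical.

```python
# Illustrative sketch only: freeze-then-fine-tune on a toy model
# y_hat = w_update * (w_conv * x). Names, data, and learning rates are
# hypothetical stand-ins for the trained convolutions and the new layer.

def mse(params, data):
    """Mean squared error of the toy model on (x, y) pairs."""
    return sum((params["w_update"] * params["w_conv"] * x - y) ** 2 for x, y in data) / len(data)


def train(params, trainable, data, lr, steps):
    """Plain gradient descent on MSE; only names in `trainable` are updated."""
    p = dict(params)
    for _ in range(steps):
        grad = {"w_conv": 0.0, "w_update": 0.0}
        for x, y in data:
            hidden = p["w_conv"] * x
            err = p["w_update"] * hidden - y
            grad["w_update"] += 2 * err * hidden / len(data)
            grad["w_conv"] += 2 * err * p["w_update"] * x / len(data)
        for name in trainable:  # frozen parameters are simply skipped
            p[name] -= lr * grad[name]
    return p


pretrained = {"w_conv": 2.0, "w_update": 0.5}
data = [(1.0, 3.0), (2.0, 6.0), (-1.0, -3.0)]  # target mapping y = 3x

# Step 1 (transfer learning): train only the new layer; "w_conv" stays frozen.
layer_adjusted = train(pretrained, ["w_update"], data, lr=0.05, steps=200)
# Step 2 (fine-tuning): unfreeze everything and train with a much lower learning rate.
fine_tuned = train(layer_adjusted, ["w_conv", "w_update"], data, lr=0.001, steps=100)
```

After step 1, `layer_adjusted["w_conv"]` is still exactly 2.0. Only the unfrozen parameter moved to fit the new data.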
Naruniec and Fchollet are in the same field of endeavor, namely neural network technology. Before the effective filing date of the claimed invention, it would have been obvious to apply transfer learning and fine-tuning, as taught by Fchollet, to Naruniec. The motivation would have been “This can potentially achieve meaningful improvements, by incrementally adapting the pretrained features to the new data” (Fchollet, page 1, para. 4). The suggestion for doing so is that it would allow specific neural network layers to be trained separately and achieve meaningful improvements by incrementally adapting the pretrained features to the new data. Therefore, it would have been obvious to combine Naruniec and Fchollet.

Regarding claim 2, Naruniec in view of Fchollet discloses The method according to claim 1, wherein the performing parameter adjustment on the second update model by using a third source image sample and a third template image sample, and a third standard synthesized image, to obtain a target image fusion model configured to fuse an object in one image into another image comprises: 
inputting the first source image sample and the first template image sample into the initial image fusion model to obtain a first predicted synthesized image; and (Naruniec, page 3, col. right, Figures 2-3: [Figures 2-3 image], “A schematic of the full pipeline for swapping a source face of identity s onto a person [inline image]. In steps (1) and (2) we preprocess the input by cropping and normalizing the face. In step (3) the pre-processed image is fed into the common encoder and decoded with corresponding decoder Ds. In (4) we use our multiband blending to swap the target with the source face … Single-encoder, multi-decoder network architecture”; page 14, Table 1: “Detailed description of our encoder (left) and decoder (right)”; page 4, col. left, paras. 2-3, “The training of our network is executed using a progressive regime, which we have adapted to work in a non-adversarial setting. This process starts from coarse, low-resolution images formed by down-sampling high-resolution input data and then gradually expands the network’s capacity as higher-resolution images are used for training … We partition the data X into P subsets, where each subset corresponds to an individual identity”). Note that: (1) As explained for claim 1 above, the training uses a progressive regime. The high-resolution (e.g., 1024x1024) input image data can be down-sampled to form datasets of lower-resolution input data for the corresponding lower resolutions (e.g., 256x256 and 512x512). At a lower resolution level l (e.g., l = 6; 256x256), the corresponding down-sampled version of image xt after cropping and normalization in Figure 2 can be regarded as a first template image sample, while the corresponding down-sampled version of image xs after cropping and normalization can be regarded as a first source image sample; and (2) during the training, the output image of the neural network, with the first template image sample and the first source image sample as inputs, can be regarded as a first predicted synthesized image (the predicted face). The training itself minimizes an SSIM- or MS-SSIM-based loss at the corresponding level for all P subsets.
performing parameter adjustment on the initial image fusion model by using the first predicted synthesized image and the first standard synthesized image, to obtain the first parameter adjustment model. (Naruniec, page 4 / col. left / para. 5 – page 4 / col. right / para. 1, “[quoted passage reproduced as images]”). Note that: (1) Training the neural network is a process in which the parameters of the neural network are updated or adjusted, so the training is equivalent to performing parameter adjustment on the neural network. The training itself minimizes an SSIM- or MS-SSIM-based loss between the first predicted synthesized image and the first standard synthesized image at the corresponding level; and (2) the trained neural network is the first parameter adjustment model.

Regarding claim 3, Naruniec in view of Fchollet discloses The method according to claim 1, further comprising:
performing resolution enhancement processing on the first standard synthesized image, to obtain the second standard synthesized image at a third resolution greater than the first resolution. (Naruniec, page 5, col. left, paras. 3-6, “Multi-band blending [BA83], as recently used by Thies et al. [TZN*15] in the context of face-image compositing is a competing approach to Poisson blending … With this in mind, we copy the two coarsest (i.e. low-frequency) levels of the target’s Laplacian pyramid and blend only the remaining, more detailed levels. The final image is then obtained by reconstructing from the blended Laplacian pyramid. We also enforce that the boundary smoothing effect is propagated only into the interior of the face … Therefore, we additionally align the amount of contrast in the generated source face to match the contrast of the target”). Note that: (1) a contrast-preserving, multi-band compositing method is used to composite or reconstruct the standard synthesized image at the original high resolution (1024x1024). An original high-resolution source image sample and an original high-resolution template image sample can each be decomposed into a Laplacian pyramid representation. After the representations are manipulated and blended, the standard synthesized image at the original high resolution (1024x1024) can be reconstructed; (2) the first standard synthesized image has resolution level l (e.g., l = 6; 256x256), while the second standard synthesized image has resolution level l+1 (e.g., 512x512), which is a third resolution greater than the first resolution (e.g., 256x256); and (3) the standard synthesized image at the original high resolution (1024x1024) is down-sampled to resolution level l+1 (512x512) to give the second standard synthesized image. The first standard synthesized image can in turn be down-sampled from the second standard synthesized image. Therefore, obtaining the second standard synthesized image at a third resolution (512x512) is equivalent to performing resolution enhancement on, or up-scaling, the first standard synthesized image.

Regarding claim 4, Naruniec in view of Fchollet discloses The method according to claim 1, further comprising: 
performing resolution enhancement processing on the first source image sample, to obtain the second source image sample at the second resolution; (Naruniec, Figures 2-3; page 14, Table 1: “Detailed description of our encoder (left) and decoder (right)”; page 4, col. left, paras. 2-3, “The training of our network is executed using a progressive regime, which we have adapted to work in a non-adversarial setting. This process starts from coarse, low-resolution images formed by down-sampling high-resolution input data and then gradually expands the network’s capacity as higher-resolution images are used for training … We partition the data X into P subsets, where each subset corresponds to an individual identity”). Note that: (1) When the second resolution (e.g., 512x512) is greater than the first resolution (e.g., 256x256), the second source image sample at the second resolution (e.g., 512x512) can be obtained by down-sampling the corresponding training image data to the resolution level l=7 (512x512) from the original high resolution 1024x1024; and (2) since the first source image sample is down-sampled to the resolution level l=6 (256x256) from the original high resolution 1024x1024, obtaining the second source image sample at the second resolution (512x512) is equivalent to performing resolution enhancement processing on or upscaling the first source image sample.
performing resolution enhancement processing on the first template image sample, to obtain the second template image sample at the second resolution; and Note that: (1) When the second resolution (e.g., 512x512) is greater than the first resolution (e.g., 256x256), the second template image sample at the second resolution (e.g., 512x512) can be obtained by down-sampling the corresponding training image data to the resolution level l=7 (512x512) from the original high resolution 1024x1024; and (2) since the first template image sample is down-sampled to the resolution level l=6 (256x256) from the original high resolution 1024x1024, obtaining the second template image sample at the second resolution (512x512) is equivalent to performing resolution enhancement processing on or upscaling the first template image sample.
performing resolution enhancement processing on the first standard synthesized image, to obtain the second standard synthesized image at a third resolution greater than the first resolution. Note that: (1) When the second resolution (e.g., 512x512) is greater than the first resolution (e.g., 256x256), the second standard synthesized image at the second resolution (e.g., 512x512) can be obtained by down-sampling the corresponding synthesized image obtained at the original high resolution (1024x1024) to the resolution level l=7 (512x512); and (2) since the first standard synthesized image is down-sampled to the resolution level l=6 (256x256) from the original high resolution 1024x1024, obtaining the second standard synthesized image at the second resolution (512x512) is equivalent to performing resolution enhancement processing on, or upscaling, the first standard synthesized image.
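The premise of the reasoning for claims 3 and 4 can be illustrated with area-average (box) down-sampling. Under that filter, deriving the lower-resolution image from the higher-resolution one gives the same result as down-sampling the original directly. Both versions are therefore consistent views of the same 1024x1024 data. Box filtering is an assumption; the passages quoted above do not name a resampling filter. The sketch does not itself perform any upscaling.

```python
# Illustrative sketch only: with box (area-average) down-sampling, the
# 256x256-level image derived from the 512x512-level image equals the one
# obtained directly from the original. The filter choice is an assumption.

def box_downsample(img, factor):
    """Average non-overlapping factor x factor blocks of a square nested-list image."""
    n = len(img) // factor
    return [[sum(img[i * factor + di][j * factor + dj]
                 for di in range(factor) for dj in range(factor)) / factor ** 2
             for j in range(n)] for i in range(n)]


# Usage with an 8x8 stand-in for the 1024x1024 original.
original = [[float((3 * i + 5 * j) % 11) for j in range(8)] for i in range(8)]
second_level = box_downsample(original, 2)  # stands in for the 512x512 version
first_level = box_downsample(original, 4)   # stands in for the 256x256 version
rederived = box_downsample(second_level, 2)
```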

Regarding claim 5, Naruniec in view of Fchollet discloses The method according to claim 1, wherein the performing parameter adjustment on the second update model by using a third source image sample and a third template image sample, and a third standard synthesized image, to obtain a target image fusion model comprises: 
performing parameter adjustment on the second resolution update layer in the second update model by using the third source image sample, the third template image sample, and the third standard synthesized image, to obtain a third parameter adjustment model; and (Naruniec, Figures 2-3; page 14, Table 1: “Detailed description of our encoder (left) and decoder (right)”; page 4, col. left, paras. 2-3, “The training of our network is executed using a progressive regime, which we have adapted to work in a non-adversarial setting. This process starts from coarse, low-resolution images formed by down-sampling high-resolution input data and then gradually expands the network’s capacity as higher-resolution images are used for training … We partition the data X into P subsets, where each subset corresponds to an individual identity”). Note that: (1) Training the neural network is a process in which the parameters are updated or adjusted, so the training is equivalent to performing parameter adjustment on the neural network; (2) as explained for claim 1 above, the third template image sample and the third source image sample have a resolution of 512x512, and this 512x512 resolution can be regarded as a fourth resolution; (3) the parameters of the correspondingly enabled resolution layer (e.g., 1024x1024) and of the corresponding encoder’s layers (e.g., 512x512) are updated or adjusted during the training, while the parameters of the other layers are not changed; and (4) the trained model can be regarded as a third image fusion model.
performing fine-tuning on the third parameter adjustment model by using a fourth source image sample, a fourth template image sample, and a fourth standard synthesized image, to obtain the target image fusion model. Note that: (1) the training of the neural network is a process in which the parameters are updated or adjusted, so the training is equivalent to performing parameter adjustment for the neural network. It is known that training can also include a fine-tuning stage to further adjust or update the parameters of a pretrained model (the third parameter adjustment model); and (2) a source image sample, a template image sample, and a standard synthesized image of the original high-resolution (1024x1024) training data can be regarded as a fourth source image sample, a fourth template image sample, and a fourth standard synthesized image, respectively, serving as the inputs and the ground truth for further fine-tuning the parameters of the enabled layers (e.g., 1024x1024) of the corresponding encoder and decoders, while the other layers may remain unchanged.

Regarding claim 6, Naruniec in view of Fchollet discloses The method according to claim 5, wherein the third source image sample and the third template image sample both have a fourth resolution, and the third standard synthesized image, the fourth source image sample, the fourth template image sample, and the fourth standard synthesized image all have a fifth resolution that is greater than or equal to the fourth resolution. Note that: as explained for claim 5 above, the third source image sample and the third template image sample both have a fourth resolution (512x512), and the fourth source image sample, the fourth template image sample, and the fourth standard synthesized image all have a resolution of 1024x1024. The resolution 1024x1024 can be regarded as a fifth resolution, which is greater than the fourth resolution (512x512).

Regarding claim 7, Naruniec in view of Fchollet discloses The method according to claim 1, wherein the first standard synthesized image is generated by: 
obtaining a first source input image and a first template input image; (Naruniec, page 4, col. left, paras. 2-4, “The training of our network is executed using a progressive regime, which we have adapted to work in a non-adversarial setting. This process starts from coarse, low-resolution images formed by down-sampling high-resolution input data and then gradually expands the network’s capacity as higher-resolution images are used for training … We partition the data X into P subsets, where each subset corresponds to an individual identity. We normalize all available examples to 1024x1024 resolution. Note that in the progressive regime, these images will be downsized in the initial stages of training, while 1024x1024 is the final resolution (see the appendix for details)”). Note that: the P subsets with respective identities include all input images at a high resolution of 1024x1024. One of the input images with a first identity can be regarded as a first source input image. Another one of the input images with a different identity can be regarded as a first template input image.
performing target object detection on the first source input image, to obtain a target object region corresponding to a target object type in the first source input image, and cropping the target object region in the first source input image, to obtain the first source image sample at a first resolution;
detecting the first template input image, to obtain a to-be-fused region corresponding to a target object type in the first template input image, and cropping the to-be-fused region in the first template input image, to obtain the first template image sample at the first resolution; (Naruniec, page 4, col. left, paras. 2-4, “The training of our network is executed using a progressive regime, which we have adapted to work in a non-adversarial setting. This process starts from coarse, low-resolution images formed by down-sampling high-resolution input data and then gradually expands the network’s capacity as higher-resolution images are used for training … This process is performed by applying an affine transformation to the face image, which aligns the position of a set of defined localized landmarks to the average landmark locations at the desired resolution. In our implementation, we use outer eye corners, outer nose points, and outer mouth corners from the standard 68 landmark point set as our reference. For each normalized face image we create a binary mask mXp that is used during the training process. This mask is delimited by the convex hull of the set of standard 68 facial landmarks fit to Xp. The mask is additionally upscaled by 10 percent to ensure that important features such as eyebrows are not missed due to slight misalignment of the landmarks. The values inside the convex hull are set to 1, while the values outside the hull are set to 0”).
Note that: (1) for the normalized face image of the first source input image, creating a binary mask mXp is equivalent to performing target object detection on the first source input image, to obtain a target object region corresponding to a target object type in the first source input image, where mXp is a target object region (target face region) obtained after the target object (face) detection, and an image with contents cropped by the mask is the first source image sample at the original high resolution when the input image is the first source input image. The first source image sample at a first resolution can be obtained by down-sampling the first source image sample at the original high resolution to the first resolution; (2) the same process is applied to both the first source input image and the first template input image; and (3) for the normalized face image of the first template input image, creating a binary mask mXp is equivalent to detecting the first template input image, to obtain a to-be-fused region corresponding to a target object type in the first template input image, where mXp is a to-be-fused region (face region) corresponding to a target object type (face) obtained after the to-be-fused region detection, and an image with contents cropped by the mask is the first template image sample when the input image is the first template input image. The first template image sample at a first resolution can be obtained by down-sampling the first template image sample at the original high resolution to the first resolution.
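For illustration only, the mask construction quoted from Naruniec (convex hull of the facial landmarks, enlarged by 10 percent, set to 1 inside and 0 outside) and the cropping step mapped above can be sketched as follows. Assumptions not stated in Naruniec: landmarks are given as (x, y) pixel coordinates, the hull is enlarged about the mean of its vertices, pixel centers lie at integer coordinates, and cropping means taking the mask's bounding box and zeroing pixels outside the mask. All function names are hypothetical.

```python
def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone-chain convex hull; vertices returned in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        raise ValueError("need at least three distinct landmarks")
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def face_mask(landmarks, height, width, scale=1.10):
    """Binary mask: 1 inside the landmark hull enlarged by `scale`, 0 outside."""
    hull = convex_hull(landmarks)
    cx = sum(x for x, _ in hull) / len(hull)
    cy = sum(y for _, y in hull) / len(hull)
    # Enlarge the hull about its vertex mean (10 percent by default).
    hull = [(cx + scale * (x - cx), cy + scale * (y - cy)) for x, y in hull]
    edges = list(zip(hull, hull[1:] + hull[:1]))
    # A point is inside a counter-clockwise convex polygon if it is left of every edge.
    return [[1 if all(_cross(a, b, (col, row)) >= 0 for a, b in edges) else 0
             for col in range(width)]
            for row in range(height)]


def crop_to_mask(image, mask):
    """Crop `image` to the mask's bounding box and zero the pixels outside the mask."""
    rows = [r for r, mrow in enumerate(mask) if any(mrow)]
    cols = [c for c in range(len(mask[0])) if any(mrow[c] for mrow in mask)]
    return [[image[r][c] * mask[r][c] for c in range(cols[0], cols[-1] + 1)]
            for r in range(rows[0], rows[-1] + 1)]
```

With four hypothetical landmarks at the corners of a 21x21 pixel square, the enlarged hull covers a 23x23 region, so the cropped sample is slightly larger than the raw landmark hull, consistent with the stated purpose of the 10 percent enlargement.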
obtaining the first standard synthesized image of the first source image sample and the first template image sample at the first resolution. (Naruniec, page 5, col. left, paras. 3-6, “Multi-band blending [BA83], as recently used by Thies et al. [TZN*15] in the context of face-image compositing is a competing approach to Poisson blending … With this in mind, we copy the two coarsest (i.e. low-frequency) levels of the target’s Laplacian pyramid and blend only the remaining, more detailed levels. The final image is then obtained by reconstructing from the blended Laplacian pyramid. We also enforce that the boundary smoothing effect is propagated only into the interior of the face … Therefore, we additionally align the amount of contrast in the generated source face to match the contrast of the target”). Note that: a contrast-preserving, multi-band compositing method is used to composite or reconstruct the first standard synthesized image. The first source image sample and the first template image sample at the first resolution can be decomposed into their respective Laplacian pyramid representations. The Laplacian representations are manipulated or blended at the corresponding levels. The final image (the first standard synthesized image) is then obtained by reconstructing from the blended Laplacian pyramid.
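For illustration only, the quoted multi-band blending (decompose both images into Laplacian pyramids, copy the two coarsest levels from the target, blend the remaining levels, and reconstruct) can be sketched as below. This is a simplified stand-in, not Naruniec's implementation: it uses 2x2 box-filter down-sampling and nearest-neighbour up-sampling instead of Gaussian filtering, blends each level with a down-sampled mask (1 = source), and omits the contrast alignment and interior-only boundary smoothing described in the quotation. Function names are hypothetical.

```python
def _down(img):
    """Halve resolution by averaging non-overlapping 2x2 blocks (box filter)."""
    n = len(img) // 2
    return [[(img[2*i][2*j] + img[2*i][2*j+1] + img[2*i+1][2*j] + img[2*i+1][2*j+1]) / 4
             for j in range(n)] for i in range(n)]


def _up(img):
    """Double resolution by nearest-neighbour replication."""
    n = 2 * len(img)
    return [[img[i // 2][j // 2] for j in range(n)] for i in range(n)]


def _combine(a, b, op):
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def gaussian_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(_down(pyr[-1]))
    return pyr


def laplacian_pyramid(img, levels):
    """Band-pass levels from fine to coarse; the last entry is the low-pass residual."""
    gauss = gaussian_pyramid(img, levels)
    bands = [_combine(g, _up(g_next), lambda x, y: x - y)
             for g, g_next in zip(gauss[:-1], gauss[1:])]
    return bands + [gauss[-1]]


def reconstruct(pyr):
    img = pyr[-1]
    for band in reversed(pyr[:-1]):
        img = _combine(_up(img), band, lambda x, y: x + y)
    return img


def multiband_blend(source, target, mask, levels=4, copy_coarsest=2):
    """Blend `source` into `target` inside `mask`, level by level.

    The `copy_coarsest` lowest-frequency levels are taken from the target unchanged.
    Assumes square inputs whose side is divisible by 2 ** (levels - 1).
    """
    ls = laplacian_pyramid(source, levels)
    lt = laplacian_pyramid(target, levels)
    gm = gaussian_pyramid(mask, levels)
    blended = []
    for k in range(levels):
        if k >= levels - copy_coarsest:
            blended.append(lt[k])  # coarse (low-frequency) levels come from the target
        else:
            blended.append([[m * s + (1 - m) * t for s, t, m in zip(rs, rt, rm)]
                            for rs, rt, rm in zip(ls[k], lt[k], gm[k])])
    return reconstruct(blended)
```

Because the coarsest levels are copied from the target, a uniformly bright source blended into a uniformly dark target under a full mask keeps the target's overall (low-frequency) intensity, which is the effect the quoted passage attributes to copying those levels.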

Claims 8-20 are rejected under 35 U.S.C. 103 as being unpatentable over Naruniec in view of Fchollet, and further in view of Peng et al. (US 2020/0007914 A1, hereinafter “Peng”).

Claim 8, reciting “A computer device, comprising a processor, a memory, and an input/output interface; the processor being separately connected to the memory and the input/output interface, the input/output interface being configured to receive data and output data, the memory being configured to store a computer program, and the processor being configured to invoke the computer program, to cause the computer device to perform a method for generating an image processing model including:”, corresponds to the method of claim 1. Therefore, claim 8 is rejected with the same prior art and citations as claim 1.
In addition, Naruniec in view of Fchollet discloses … to cause the computer device to perform a method for generating an image processing model including: (Naruniec, page 1, Abstract, “this is the first method capable of rendering photo-realistic and temporally coherent results at megapixel resolution”; page 4, col. right, para. 1, “All the models were trained on a single NVIDIA 1080Ti GPU workstation (Intel® Core™ i7-6700K CPU @ 4.00GHz)”).
However, Naruniec in view of Fchollet fails to disclose, but in the same art of computer graphics, Peng discloses A computer device, comprising a processor, a memory, and an input/output interface; the processor being separately connected to the memory and the input/output interface, the input/output interface being configured to receive data and output data, the memory being configured to store a computer program, and the processor being configured to invoke the computer program, (Peng, FIG. 6: a computer device, comprising “CPU” 402 separately connected to “MEMORY” 401 and “PERIPHERAL INTERFACE” 403 as the input/output interface that receives data from and outputs data to “I/O SUBSYSTEM” 409; para. [0088], “An electronic device is provided. The electronic device includes a memory, a processor, and computer programs stored in the memory and configured to be executed by the processor”).
Naruniec in view of Fchollet, and Peng, are in the same field of endeavor, namely computer graphics. Before the effective filing date of the claimed invention, it would have been obvious to apply a computer device with a memory, a processor, and computer programs stored in the memory and configured to be executed by the processor, as taught by Peng, to Naruniec in view of Fchollet. The motivation would have been “An electronic device is provided. The electronic device includes a memory, a processor, and computer programs stored in the memory and configured to be executed by the processor” (Peng, page 6, para. [0088]). The suggestion for doing so would be to allow the method of Naruniec in view of Fchollet to be executed by a computer device having a processor, a memory, and computer programs stored in the memory. Therefore, it would have been obvious to combine Naruniec, Fchollet, and Peng.
	
	Claims 9-14 correspond to the method of claims 2-7, respectively. Therefore, claims 9-14 are rejected for the same rationale as claims 2-7, respectively, and the same rationale as claim 8.

Claim 15, reciting “A non-transitory computer-readable storage medium, storing a computer program, the computer program, applicable to be loaded and executed by a processor of a computer device, causing the computer device to perform a method for generating an image processing model including:”, corresponds to the method of claim 1. Therefore, claim 15 is rejected with the same prior art and citations as claim 1.
In addition, Naruniec in view of Fchollet discloses … causing the computer device to perform a method for generating an image processing model including: (Naruniec, page 1, Abstract, “this is the first method capable of rendering photo-realistic and temporally coherent results at megapixel resolution”; page 4, col. right, para. 1, “All the models were trained on a single NVIDIA 1080Ti GPU workstation (Intel® Core™ i7-6700K CPU @ 4.00GHz)”).
However, Naruniec in view of Fchollet fails to disclose, but in the same art of computer graphics, Peng discloses A non-transitory computer-readable storage medium, storing a computer program, the computer program, applicable to be loaded and executed by a processor of a computer device, causing the computer device to perform a method for generating an image processing model including: (Peng, page 8, para. [0123], “A non-transitory computer readable storage medium is provided. The non-transitory computer readable storage medium is configured to store a computer program which, when executed by a processor, causes the processor to carry out following actions”).
Naruniec in view of Fchollet, and Peng, are in the same field of endeavor, namely computer graphics. Before the effective filing date of the claimed invention, it would have been obvious to apply a non-transitory computer-readable medium storing instructions, as taught by Peng, to Naruniec in view of Fchollet. The motivation would have been “A non-transitory computer readable storage medium is provided. The non-transitory computer readable storage medium is configured to store a computer program which, when executed by a processor, causes the processor to carry out following actions” (Peng, page 8, para. [0123]). The suggestion for doing so would be to allow the method of Naruniec in view of Fchollet to be stored as instructions on a non-transitory computer-readable medium. Therefore, it would have been obvious to combine Naruniec, Fchollet, and Peng.

Claims 16-19 and 20 correspond to the method of claims 2-5 and 7, respectively. Therefore, claims 16-19 and 20 are rejected for the same rationale as claims 2-5 and 7, respectively, and the same rationale as claim 15.

Response to Arguments
Applicant's arguments with respect to the claim rejections under 35 U.S.C. 102 and 35 U.S.C. 103 have been fully considered but they are not persuasive.

Applicant alleges, “Naruniec does not disclose or suggest "inserting a first resolution update layer into the first parameter adjustment model, to obtain a first update model including a first plurality of convolution layers and the first resolution update layer concatenated after the first plurality of convolution layers," as recited in the amended claim 1.” (page 15, lines 2-5), “Naruniec fails to disclose or suggest that the following steps in training: "using the second training dataset to adjust parameters in the first resolution update layer while fixing parameters in the first plurality of convolution layers, to obtain a first layer adjustment model," and "using the second training dataset to adjust all parameters in the first layer adjustment model, to obtain a second parameter adjustment model," as recited in the amended claim 1.” (page 15, lines 9-14), and “Naruniec also fails to teach or suggest "obtaining a second source image sample, a second template image sample, and a second standard synthesized image to form a second training dataset including images having a second resolution and a second quantity, wherein the second resolution is greater than the first resolution, wherein the second quantity is smaller than the first quantity," as recited in the amended claim 1.” (page 15, lines 18-22), “Claims 2-7 depends from and includes each limitation of amended independent claim 1, and is thus patentable at least for depending from an allowable claim, and for further features recited therein.” (page 16, lines 1-3), “Independent claims 8 and 15 have been amended to recite features similar to the above-mentioned features in the amended claim 1. As such, claims 8 and 15 are also patentable over Naruniec for at least the same reasons discussed above regarding claim 1.” (page 16, lines 10-12), and “Claims 9-14 and 16-20 depend from and include each limitation of amended independent claims 8 and 15, respectively, are thus patentable at least for depending from an allowable claim, and for further features recited therein.” (page 16, lines 14-16). However, the Examiner respectfully disagrees with the respective allegations as a whole because:
Naruniec discloses obtaining a first source image sample, a first template image sample, (page 3, col. right, Figures 2-3 [figure images not reproduced], “A schematic of the full pipeline for swapping a source face of identity s onto a person [inline symbol not reproduced]. In steps (1) and (2) we preprocess the input by cropping and normalizing the face. In step (3) the pre-processed image is fed into the common encoder and decoded with corresponding decoder Ds. In (4) we use our multiband blending to swap the target with the source face … Single-encoder, multi-decoder network architecture”; page 4, col. left, paras. 2-3, “The training of our network is executed using a progressive regime, which we have adapted to work in a non-adversarial setting. This process starts from coarse, low-resolution images formed by down-sampling high-resolution input data and then gradually expands the network’s capacity as higher-resolution images are used for training … We partition the data X into P subsets, where each subset corresponds to an individual identity. We normalize all available examples to 1024x1024 resolution. Note that in the progressive regime, these images will be downsized in the initial stages of training, while 1024x1024 is the final resolution (see the appendix for details)”). Note that: (1) the training uses a progressive regime. The high-resolution (e.g., 1024x1024) input image data can be down-sampled to form the datasets of lower-resolution input data for the corresponding lower resolutions (e.g., 256x256 and 512x512); and (2) at a lower resolution level l (e.g., l = 6; 256x256), the corresponding down-sampled version of image xt after cropping and normalization in Figure 2 can be regarded as a first template image sample, while the corresponding down-sampled version of image xs after cropping and normalization in Figure 2 can be regarded as a first source image sample. and a first standard synthesized image to form a first training dataset including images having a first resolution and a first quantity; (page 5, col. left, paras. 3-6, “Multi-band blending [BA83], as recently used by Thies et al. [TZN*15] in the context of face-image compositing is a competing approach to Poisson blending … With this in mind, we copy the two coarsest (i.e. low-frequency) levels of the target’s Laplacian pyramid and blend only the remaining, more detailed levels. The final image is then obtained by reconstructing from the manipulated and blended Laplacian pyramid. We also enforce that the boundary smoothing effect is propagated only into the interior of the face … Therefore, we additionally align the amount of contrast in the generated source face to match the contrast of the target”). Note that: (1) a contrast-preserving, multi-band compositing method is used to composite or reconstruct the first standard synthesized image. The first source image sample and the first template image sample at the first resolution can be decomposed into their respective Laplacian pyramid representations. The Laplacian representations are manipulated or blended at the corresponding levels. The final image (the first standard synthesized image, serving as the ground truth face) is then obtained by reconstructing from the blended Laplacian pyramid; and (2) the combination of the first source image sample, the first template image sample, and the first standard synthesized image forms a first training dataset including images having a low resolution at a lower resolution level l (e.g., l = 6; 256x256) as a first resolution and the original number of high-resolution (1024x1024) images as a first quantity.
Naruniec discloses performing parameter adjustment on an initial image fusion model by using the first training dataset (page 3, col. right, Figures 2-3 [figure images not reproduced], “A schematic of the full pipeline for swapping a source face of identity s onto a person [inline symbol not reproduced]. In steps (1) and (2) we preprocess the input by cropping and normalizing the face. In step (3) the pre-processed image is fed into the common encoder and decoded with corresponding decoder Ds. In (4) we use our multiband blending to swap the target with the source face … Single-encoder, multi-decoder network architecture”; page 14, Table 1: “Detailed description of our encoder (left) and decoder (right)”; page 4, col. left, paras. 2-3, “The training of our network is executed using a progressive regime, which we have adapted to work in a non-adversarial setting. This process starts from coarse, low-resolution images formed by down-sampling high-resolution input data and then gradually expands the network’s capacity as higher-resolution images are used for training … We partition the data X into P subsets, where each subset corresponds to an individual identity”; page 4 / col. left / para. 5 – page 4 / col. right / para. 1, “[quoted passage reproduced as images; not reproduced here]”). Note that: (1) Naruniec’s Figure 3 shows a neural network with a single encoder and multiple decoders. The neural network as designed, before training, can be regarded as an initial image fusion model for the face-swapping function; (2) the training of the neural network is a process in which the parameters of the neural network are updated or adjusted, so the training is equivalent to performing parameter adjustment for the neural network; (3) according to Naruniec’s Table 1, the encoder has its layers for levels 0-8 as a shared encoder to encode the shared features, while the multiple decoders can output different image-related data at corresponding resolution levels (e.g., from a low resolution of 256x256 through a high resolution of 1024x1024); and (4) the training itself uses a progressive regime to minimize an SSIM- or MS-SSIM-based loss at the corresponding level for all P subsets. to obtain a first parameter adjustment model. Note that: after the first stage of training at the corresponding resolution level has been performed, the partially trained neural network can be regarded as a first parameter adjustment model.
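For illustration only, an SSIM-based training loss of the kind referred to above can be sketched as 1 - SSIM, so that minimizing the loss maximizes structural similarity. This sketch is a simplification and an assumption, not Naruniec's loss: it evaluates SSIM over the whole image as a single window (standard SSIM averages over local windows), uses the common constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2 with data range L, and does not implement the multi-scale (MS-SSIM) variant.

```python
def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two equally sized grayscale images (lists of rows)."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((a - mx) ** 2 for a in xs) / n
    var_y = sum((b - my) ** 2 for b in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (var_x + var_y + c2))


def ssim_loss(prediction, ground_truth):
    """Loss to minimize during training: 0 for identical images, larger when dissimilar."""
    return 1.0 - global_ssim(prediction, ground_truth)
```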
Naruniec discloses inserting a first resolution update layer into the first parameter adjustment model, to obtain a first update model (Figures 2-3; page 14, Table 1: “Detailed description of our encoder (left) and decoder (right)”; page 13, col. left, para. 3, “Our model is trained in a progressive regime, starting from coarse, low-resolution 4x4 pixel images and then gradually expanding the network’s capacity as higher-resolution images are used for training, up to 1024x1024 pixels. The base architecture, which focuses on the lowest-resolution data, corresponds to “level 0” in Figures 3 and 13 and Table 1. Each new “level” of the network doubles input and output resolution by adding a composition of two convolutional layers and a down- or up-scaling layer in the encoder and decoder, respectively”). Note that: (1) according to the progressive training regime, the first parameter adjustment model (i.e., the neural network trained using the first template image sample / the first source image sample / the first standard synthesized image at resolution level l) already has all resolution-related layers at resolution levels up to 8. However, only the encoder layers corresponding to level l or lower are trained, while the other encoder layers at levels higher than l behave only like identity operators that leave their inputs unchanged, with their parameters neither trained nor changed. In the same way, only the decoder layers corresponding to level l or lower are trained, while the other decoder layers at levels higher than l behave only like identity operators that leave their inputs unchanged, with their parameters neither trained nor changed; and (2) enabling the encoder layer corresponding to resolution level l+1 and the decoder layer corresponding to level l+1 for the next model training stage is equivalent to inserting a first resolution update layer into the first parameter adjustment model.
Meanwhile, the trained encoder layers at resolution level l or lower and the trained decoder layers at resolution level l or lower will remain unchanged. including a first plurality of convolution layers and the first resolution update layer concatenated after the first plurality of convolution layers; (page 13, col. left, para. 3, “The base architecture, which focuses on the lowest-resolution data, corresponds to “level 0” in Figures 3 and 13 and Table 1. Each new “level” of the network doubles input and output resolution by adding a composition of two convolutional layers and a down- or up-scaling layer in the encoder and decoder, respectively”; page 14, Table 1, “Detailed description of our encoder (left) and decoder (right). For the Leaky rectified unit (LeakyReLU) we use α = 0.2.”; for the encoder at level l there are a Downsample layer and 2-3 convolution layers, while for the decoder at the same level there are an Upsample layer and 2-3 convolution layers). Note that: (1) the first resolution update layer can be regarded as a layer that is enabled at level l (=6); and (2) before this layer there are other convolution layers that can be regarded as a first plurality of convolution layers, so that the first resolution update layer is concatenated after the first plurality of convolution layers.
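For illustration only, the level-by-level growth quoted from Naruniec (each new level adds two convolutional layers plus a down- or up-scaling layer and doubles the resolution) can be sketched as plain layer lists. The sketch models growth as appending a new decoder block after the existing layers, which the rejection treats as equivalent to enabling a pre-built level. The layer names, the ordering inside each block, and the placement of new encoder blocks at the input side are assumptions; Naruniec's Table 1 is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    level: int  # progressive level the layer belongs to


def resolution(level: int) -> int:
    """Side length handled at `level`: 4x4 at level 0, doubling per level."""
    return 4 * 2 ** level


def encoder_block(level: int) -> list:
    # Two convolutions followed by a down-scaling layer (hypothetical names).
    return [Layer(f"enc{level}_conv_a", level), Layer(f"enc{level}_conv_b", level),
            Layer(f"enc{level}_downsample", level)]


def decoder_block(level: int) -> list:
    # An up-scaling layer followed by two convolutions (hypothetical names).
    return [Layer(f"dec{level}_upsample", level), Layer(f"dec{level}_conv_a", level),
            Layer(f"dec{level}_conv_b", level)]


def build(max_level: int):
    """Encoder and decoder layer lists covering levels 0..max_level."""
    # Higher-resolution encoder blocks sit at the input side of the encoder.
    encoder = [layer for lvl in range(max_level, -1, -1) for layer in encoder_block(lvl)]
    decoder = [layer for lvl in range(max_level + 1) for layer in decoder_block(lvl)]
    return encoder, decoder


def grow(encoder, decoder, new_level: int):
    """Add one level: a new encoder block at the input, a new decoder block at the output.

    Existing layers are kept as-is, so the new decoder block is concatenated
    after the previously built convolution layers.
    """
    return encoder_block(new_level) + encoder, decoder + decoder_block(new_level)


enc, dec = build(6)               # operates at resolution(6) == 256
enc2, dec2 = grow(enc, dec, 7)    # operates at resolution(7) == 512
```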
Naruniec discloses obtaining a second source image sample, a second template image sample, and a second standard synthesized image to form a second training dataset including images having a second resolution and a second quantity, wherein the second resolution is greater than the first resolution, wherein the second quantity is smaller than the first quantity; Note that: (1) the training uses a progressive regime. The high-resolution (e.g., 1024x1024) input image data can be down-sampled to form the datasets of lower-resolution input data for the corresponding lower resolution levels (e.g., 256x256 and 512x512). At a resolution level l (e.g., l = 7; 512x512), the corresponding down-sampled version of image xt after cropping and normalization in Figure 2 can be regarded as a second template image sample, while the corresponding down-sampled version of image xs after cropping and normalization can be regarded as a second source image sample, and the corresponding image reconstructed from the manipulated and blended Laplacian pyramid, using the Laplacian pyramid decompositions of the second template image sample and the second source image sample as explained above, can be regarded as a second standard synthesized image serving as the ground truth (face). The training itself minimizes an SSIM- or MS-SSIM-based loss at the corresponding level for all P subsets; (2) the second resolution (512x512 at level 7) is greater than the first resolution (256x256 at level 6); and (3) it would have been obvious to one having ordinary skill in the art that the second quantity can be smaller than the first quantity when only a smaller subset of the down-sampled images, randomly selected from all of the original high-resolution (1024x1024) images, is used in order to speed up the training process.
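For illustration only, the random-subset rationale in item (3) above can be sketched as follows: each training stage draws its records from the same full-resolution corpus, and a later, higher-resolution stage may simply draw fewer of them. The record identifiers, function names, and the fixed random seed are hypothetical; actual down-sampling of the selected records (to the returned side length) is not performed in this sketch.

```python
import random


def level_resolution(level: int) -> int:
    """Side length at progressive level `level`: 4x4 at level 0, doubling per level."""
    return 4 * 2 ** level


def stage_dataset(records, level, quantity, seed=0):
    """Randomly pick `quantity` (source, template, synthesized) records for one stage.

    `records` stands for the full 1024x1024 corpus; the returned side length is the
    resolution to which each selected record would be down-sampled at `level`.
    """
    if quantity > len(records):
        raise ValueError("quantity exceeds the available records")
    rng = random.Random(seed)  # seeded for reproducibility
    return level_resolution(level), rng.sample(records, quantity)


corpus = [(f"src{i}", f"tpl{i}", f"syn{i}") for i in range(1000)]  # hypothetical IDs
first_res, first = stage_dataset(corpus, level=6, quantity=1000)   # first dataset
second_res, second = stage_dataset(corpus, level=7, quantity=250)  # second dataset
```

Here the second dataset has the higher resolution (512 versus 256) and the smaller quantity (250 versus 1000), which is the relationship recited in the amended claim.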
Naruniec discloses … performing parameter adjustment on the first update model by: … Note that: the training uses a progressive regime. The training itself minimizes an SSIM- or MS-SSIM-based loss at the corresponding level for all P subsets while adjusting the parameters.
Naruniec discloses inserting a second resolution update layer into the second parameter adjustment model, to obtain a second update model; and Note that: (1) according to the progressive training regime, the second parameter adjustment model (i.e., the neural network partially trained using the second template image sample / the second source image sample / the second standard synthesized image at resolution level l) already has all resolution-related layers at resolution levels up to 8. However, the encoder layers corresponding to level l (e.g., l=6; 256x256) or lower are not trained, while the other encoder layers at levels higher than l also behave like dummies without training. The decoder layer corresponding to level l+1 (e.g., 512x512) is trained, while the other decoder layers at levels higher than l+1 only behave like dummies, being neither enabled nor trained, and the layers at levels lower than l+1 are not trained; and (2) enabling the decoder layer corresponding to level l+2 (e.g., 1024x1024) for the next model training stage is equivalent to inserting a second resolution update layer into the second parameter adjustment model. Meanwhile, the parameters of the trained encoder layers at resolution level l or lower and of the decoder layers at resolution level l+1 or lower will remain unchanged.
Naruniec discloses performing parameter adjustment on the second update model by using a third source image sample and a third template image sample, and a third standard synthesized image, to obtain a target image fusion model configured to fuse an object in one image into another image. Note that: (1) the training uses a progressive regime. The high-resolution (e.g., 1024x1024) input image data can be down-sampled to form the datasets of lower-resolution input data for the corresponding lower resolution levels (e.g., 256x256 and 512x512). At a resolution level l (e.g., l = 7; 512x512), the corresponding down-sampled version of image xt after cropping and normalization in Figure 2 can be regarded as a third template image sample, while the corresponding down-sampled version of image xs after cropping and normalization can be regarded as a third source image sample, and the corresponding image reconstructed from the manipulated and blended Laplacian pyramid, using the Laplacian pyramid decompositions of the third template image sample and the third source image sample as explained above, can be regarded as a third standard synthesized image serving as the ground truth (face). The training itself minimizes an SSIM- or MS-SSIM-based loss at the corresponding level for all P subsets; (2) after the third stage of training at the corresponding resolution level has been performed, the trained neural network can be regarded as a target image fusion model; and (3) the decoders of the target image fusion model can output synthesized images in which the target face is swapped with the source face.
However, Naruniec fails to disclose, but in the same art of neural network technology, Fchollet discloses using the second training dataset to adjust parameters in the first resolution update layer while fixing parameters in the first plurality of convolution layers, to obtain a first layer adjustment model, and (Fchollet, page 1, paras. 1-3, “Transfer learning consists of taking features learned on one problem, and leveraging them on a new, similar problem … The most common incarnation of transfer learning in the context of deep learning is the following workflow: Take layers from a previously trained model.  Freeze them, so as to avoid destroying any of the information they contain during future training rounds. Add some new, trainable layers on top of the frozen layers. They will learn to turn the old features into predictions on a new dataset.  Train the new layers on your dataset”). Note that: (1) by “transfer learning”, parameters in the first plurality of convolution layers can be fixed; (2) the inserted first resolution update layer can be trained by adjusting the corresponding parameters with the second training dataset; (3) after the network has been trained, a first layer adjustment model is obtained. using the second training dataset to adjust all parameters in the first layer adjustment model, to obtain a second parameter adjustment model; (Fchollet, page 1, para. 4, “optional step, is fine-tuning, which consists of unfreezing the entire model you obtained above (or part of it), and re-training it on the new data with a very low learning rate. This can potentially achieve meaningful improvements, by incrementally adapting the pretrained features to the new data”). 
Note that: (1) by “fine-tuning”, all parameters of the neural network (the first layer adjustment model) can be unfrozen and opened for training; (2) the neural network can then be trained by adjusting the corresponding parameters with the second training dataset; and (3) after the neural network has been trained, a second parameter adjustment model is obtained.
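The freeze-then-fine-tune workflow quoted from Fchollet can be sketched in plain NumPy. This is a hypothetical toy, not the claimed image fusion model and not Keras itself (in Keras, a layer is frozen by setting its `trainable` attribute to `False`). Here `w_pre` stands in for the pretrained first plurality of convolution layers, `w_add` for the inserted first resolution update layer, and the data, learning rates, and step counts are arbitrary assumptions chosen only to show the two stages.

```python
import numpy as np

def forward(x, w_base, w_new):
    """Pretrained (possibly frozen) layer followed by the newly added layer."""
    h = np.tanh(x @ w_base)
    return h, h @ w_new

def mse(x, y, w_base, w_new):
    return float(np.mean((forward(x, w_base, w_new)[1] - y) ** 2))

def train(x, y, w_base, w_new, lr, steps, train_base):
    """Gradient descent on MSE; w_base is updated only when train_base is True."""
    w_base, w_new = w_base.copy(), w_new.copy()
    n = x.shape[0]
    for _ in range(steps):
        h, pred = forward(x, w_base, w_new)
        g_pred = 2.0 * (pred - y) / n                # dL/dpred
        g_new = h.T @ g_pred                         # dL/dw_new
        if train_base:
            g_h = g_pred @ w_new.T                   # dL/dh, using w_new before its update
            g_base = x.T @ (g_h * (1.0 - h ** 2))    # back-propagate through tanh
            w_base -= lr * g_base
        w_new -= lr * g_new
    return w_base, w_new

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))                   # hypothetical "second training dataset"
y = 0.5 * x[:, :1] - 0.3 * x[:, 1:2]
w_pre = 0.5 * rng.normal(size=(4, 8))          # stands in for the pretrained layers
w_add = np.zeros((8, 1))                       # stands in for the inserted layer

# Stage 1 (transfer learning): freeze w_pre and train only the new layer.
w_base1, w_new1 = train(x, y, w_pre, w_add, lr=0.1, steps=200, train_base=False)
# Stage 2 (fine-tuning): unfreeze every parameter and retrain with a much lower learning rate.
w_base2, w_new2 = train(x, y, w_base1, w_new1, lr=0.005, steps=200, train_base=True)
```

Stage 1 leaves `w_pre` unchanged while the new layer learns (the analogue of the first layer adjustment model); stage 2 updates all parameters with a learning rate 20x smaller (the analogue of the second parameter adjustment model).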
Naruniec and Fchollet are in the same field of endeavor, namely neural network technology. Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to apply the transfer learning and fine-tuning taught by Fchollet to Naruniec. The motivation would have been that “This can potentially achieve meaningful improvements, by incrementally adapting the pretrained features to the new data” (Fchollet, page 1, para. 4). Doing so would allow specific neural network layers to be trained separately and would achieve meaningful improvements by incrementally adapting the pretrained features to the new data. Therefore, it would have been obvious to combine Naruniec and Fchollet.
Therefore, Naruniec in view of Fchollet discloses all limitations of claim 1.
Claims 2-7 depend from claim 1. They are rejected for the respective prior art / citations / rationale above.
Independent claims 8 and 15 correspond to the features of claim 1. Therefore, claims 8 and 15 are rejected for the same rationale as claim 1.
Claims 9-14 and 16-20 depend from independent claims 8 and 15, respectively. They are rejected for the respective prior art / citations / rationale above.
The arguments are not persuasive. 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BIAO CHEN whose telephone number is (703)756-1199. The examiner can normally be reached M-F 8am-5pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee M Tung can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Biao Chen/
Patent Examiner, Art Unit 2611



/KEE M TUNG/Supervisory Patent Examiner, Art Unit 2611
Prosecution Timeline

Sep 17, 2025: Non-Final Rejection mailed (§103)
Dec 11, 2025: Applicant Interview (Telephonic)
Dec 11, 2025: Examiner Interview Summary
Dec 12, 2025: Response Filed
Feb 02, 2026: Final Rejection mailed (§103)
Mar 06, 2026: Examiner Interview Summary
Mar 06, 2026: Applicant Interview (Telephonic)
Mar 11, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

18/120,636 (Patent 12639886): CONTENT PLAYBACK AND MODIFICATIONS IN A 3D ENVIRONMENT. 3y 2m to grant; granted May 26, 2026.
18/451,961 (Patent 12633057): EXTRACTING 3D SHAPES FROM LARGE-SCALE UNANNOTATED IMAGE DATASETS. 2y 9m to grant; granted May 19, 2026.
18/498,713 (Patent 12614365): NORMALIZING INDIVIDUAL DEPTH PERCEPTION FOR VR. 2y 6m to grant; granted Apr 28, 2026.
18/522,197 (Patent 12602873): AUTOMATIC RETOPOLOGIZATION OF TEXTURED 3D MESHES. 2y 4m to grant; granted Apr 14, 2026.
18/384,696 (Patent 12597149): APPARATUS, METHOD, AND COMPUTER PROGRAM FOR NETWORK COMMUNICATIONS. 2y 5m to grant; granted Apr 07, 2026.
Based on 5 most recent grants.
Prosecution Projections
Expected OA Rounds: 2-3
Grant Probability: 84%
With Interview (+26.3%): 99%
Median Time to Grant: 2y 4m (~0m remaining)
PTA Risk: Moderate
Based on 32 resolved cases by this examiner. Grant probability derived from career allowance rate.