DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/27/2026 has been entered.
Response to Amendment
In light of Applicant’s amendment of claim 16, the claims no longer invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. Accordingly, the claim interpretations under 35 U.S.C. 112(f) are withdrawn.
Status of Claims
Claims 1-3, 6-17 and 19-20 are pending. Claims 1, 15 and 16 are amended. Claims 4, 5 and 18 are cancelled.
Response to Arguments
Applicant's arguments filed on January 27, 2026 with respect to the rejection of claims under 35 U.S.C. 103 have been fully considered, but they are not found persuasive. Specifically, Applicant argues on page 8, first paragraph, that the cited references do not appear to disclose, teach, or suggest vision transformers with a cross-shaped window multi-headed self-attention mechanism wherein the horizontal and the vertical stripes computed in parallel form a cross-shaped window and the widths of the stripes increase throughout the depth of the network. Examiner respectfully disagrees. The cited prior art reference Wang teaches a window-based multi-headed self-attention mechanism in page 4, seventh paragraph: “Window-based Multi-head Self-Attention (W-MSA)”; also see Fig. 2. In an analogous art, the cited prior art reference Dong teaches computation of vertical and horizontal stripes in parallel forming a cross-shaped window (Dong, page 2, ¶02: “Cross-Shaped Window (CSWin)...we perform the self-attention calculation in the horizontal and vertical stripes in parallel”). Dong further discloses, later in the same paragraph, that the widths of the stripes increase throughout the depth of the network (Dong, page 2, ¶02: “we adjust the stripe width according to the depth of the network: small widths for shallow layers and larger widths for deep layers”). Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang in view of Lee using the teachings of Dong to introduce variable stripe widths in a cross-shaped window. A person skilled in the art would be motivated to combine the known elements as described above and achieve the predictable result of generating a computationally efficient, optimized vision transformer. Therefore, Applicant's arguments are not found persuasive.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-3, 6, 7, 9, 10, 12-17, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over the Non-Patent Literature (NPL) Wang et al. (Uformer: A General U-Shaped Transformer for Image Restoration) in view of Lee et al. (US 2021/0133932 A1) and in further view of NPL Dong et al. (CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows).
Regarding claim 1, Wang teaches, An image processing method comprising: (Wang, page 1, ¶01: “Recovering clear images from their degraded versions, i.e., image restoration”) acquiring a first image whose spatial resolution and (Wang, page 3, ¶05: “given a degraded image I”) generating a residual image from the first image (Wang, page 4, ¶03: “apply a 3 × 3 convolution layer to obtain a residual image R”) using a multi-scale hierarchical neural network (Wang, page 2, ¶05: “U-shaped structures with skip-connection to capture multi-scale information hierarchically”) for joint learning of (Wang, page 3, ¶03: “jointly trains standard Transformer blocks with multi-tails and multi-heads on multiple low-level vision tasks”) (Wang, page 2, ¶05: “methods for different image restoration tasks, such as super-resolution”) the multi-scale hierarchical neural network comprising an encoder stage and a decoder stage forming a plurality of symmetrical encoder-decoder levels, (Wang, page 2, ¶02: “Transformer-based encoder-decoder structure”) each encoder and decoder in each level comprising a vision transformer block; (Wang, page 2, ¶01: “inspired by the recent vision Transformers [10, 11], we leverage a depth-wise convolutional layer between two fully-connected layers of the feed-forward network in the Transformer block”) and generating a reconstructed image based on the first and residual images (Wang, page 4, ¶03: “Finally, the restored image is obtained by I’ = I+R”) wherein each vision transformer block uses a Cross-Shaped Window multi-headed self-attention mechanism (Wang, page 4, ¶07: “Window-based Multi-head Self-Attention (W-MSA). Instead of using global self-attention like the vanilla Transformer, we perform the self-attention within non-overlapping local windows”; also see Fig. 2).
However, Wang does not explicitly teach low-light enhancement and self-attention mechanism comprises horizontal and vertical stripes in parallel that form a cross-shaped window, and wherein the widths of the stripes are gradually increased throughout the depth of the network.
In an analogous field of endeavor, Lee teaches low-light enhancement (Lee, ¶0210: “super resolution, and brightness enhancement of the input image, may be performed through various operation functions of the processor 180”).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang using the teachings of Lee to introduce brightness enhancement of an image. A person skilled in the art would be motivated to combine the known elements as described above and achieve the predictable result of improving the overall image quality. Therefore, it would have been obvious to combine the analogous arts Wang and Lee to obtain the above-described limitations in claim 1. However, the combination of Wang and Lee does not explicitly teach self-attention mechanism comprises horizontal and vertical stripes in parallel that form a cross-shaped window, and wherein the widths of the stripes are gradually increased throughout the depth of the network.
In an analogous field of endeavor, Dong teaches, self-attention mechanism comprises horizontal and vertical stripes in parallel that form a cross-shaped window, (Dong, page 2, ¶02: “With CSWin self-attention, we perform the self-attention calculation in the horizontal and vertical stripes in parallel”) and wherein the widths of the stripes are gradually increased throughout the depth of the network. (Dong, page 2, ¶02: “we adjust the stripe width according to the depth of the network: small widths for shallow layers and larger widths for deep layers”).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang in view of Lee using the teachings of Dong to introduce variable stripe widths in a cross shaped window. A person skilled in the art would be motivated to combine the known elements as described above and achieve the predictable result of generating a computationally efficient, optimized vision transformer. Therefore, it would have been obvious to combine the analogous arts Wang, Lee and Dong to obtain the invention of claim 1.
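For illustration of the mechanism cited from Dong, the following is a minimal NumPy sketch of cross-shaped window self-attention: half of the channels attend within horizontal stripes, half within vertical stripes, and the stripe width grows with network depth. The single-head formulation, the absence of learned query/key/value projections, and the particular width schedule `(1, 2, 4, 8)` are simplifications assumed here for clarity, not details taken from the cited references.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def stripe_attention(x, stripe, axis):
    """Self-attention restricted to non-overlapping stripes of width `stripe`
    along the given spatial axis. x has shape (H, W, C)."""
    H, W, C = x.shape
    out = np.zeros_like(x)
    n = x.shape[axis]
    for start in range(0, n, stripe):
        sl = [slice(None), slice(None)]
        sl[axis] = slice(start, start + stripe)
        win = x[tuple(sl)].reshape(-1, C)          # flatten tokens in the stripe
        attn = softmax(win @ win.T / np.sqrt(C))   # q = k = v = win (projections omitted)
        out[tuple(sl)] = (attn @ win).reshape(x[tuple(sl)].shape)
    return out

def cswin_attention(x, depth, widths=(1, 2, 4, 8)):
    """Cross-shaped window attention: half the channels attend within
    horizontal stripes, half within vertical stripes (conceptually in
    parallel), with stripe width increasing with network depth."""
    sw = widths[min(depth, len(widths) - 1)]
    C = x.shape[-1]
    h = stripe_attention(x[..., : C // 2], sw, axis=0)   # horizontal stripes
    v = stripe_attention(x[..., C // 2 :], sw, axis=1)   # vertical stripes
    return np.concatenate([h, v], axis=-1)
```

The union of one horizontal and one vertical stripe centered on a token forms the cross-shaped receptive field; small widths at shallow layers keep computation local, while larger widths at deep layers widen the attention area.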
Regarding claim 2, Wang in view of Lee and in further view of Dong teaches, The image processing method according to claim 1, wherein the network is a residual neural network comprising skip-connections. (Wang, page 2, ¶05: “Learning effective models using the U-shaped structures with skip-connection to capture multi-scale information”).
Regarding claim 3, Wang in view of Lee and in further view of Dong teaches, The image processing method according to claim 1, wherein the network has a U-shaped architecture, (Wang, page 2, ¶05: “Learning effective models using the U-shaped structures”) the encoder stage reducing the spatial resolution of the first image while increasing the number of feature channels of the first image at every level, (Wang, page 3, ¶05: “down-sample the maps and double the channels using 4 × 4 convolution with stride 2”) and the decoder stage increasing the said spatial resolution while reducing the said number of feature channels at every level, (Wang, page 4, ¶03: “This layer reduces half of the feature channels and doubles the size of the feature maps”) wherein the spatial resolution of the generated residual image is identical to the spatial resolution of the first acquired image. (Wang, page 4, ¶03: “After the K decoder stages, we reshape the flattened features to 2D feature maps and apply a 3 × 3 convolution layer to obtain a residual image R ∈ R^(3×H×W). Finally, the restored image is obtained by I’ = I+R”).
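To illustrate the resolution/channel bookkeeping of the U-shaped architecture cited from Wang, the following sketch traces three symmetric encoder-decoder levels. Naive stride-2 subsampling and nearest-neighbour repetition are assumed stand-ins for Wang's learned strided and transposed convolutions, and the final 3×3 convolution producing the residual R is replaced by a channel slice; only the shape arithmetic is faithful.

```python
import numpy as np

def downsample(x):
    """Halve spatial size, double channels (stand-in for a strided 4x4 conv)."""
    H, W, C = x.shape
    x = x[: H // 2 * 2 : 2, : W // 2 * 2 : 2, :]   # naive stride-2 subsampling
    return np.concatenate([x, x], axis=-1)          # channel-doubling placeholder

def upsample(x):
    """Double spatial size, halve channels (stand-in for a 2x2 transposed conv)."""
    x = x[..., : x.shape[-1] // 2]
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def trace_uformer_shapes(H=64, W=64, C=16, levels=3):
    """Run a feature map through symmetric encoder/decoder levels and return
    the final feature map and a 3-channel residual placeholder."""
    x = np.zeros((H, W, C))
    for _ in range(levels):   # e.g. (64,64,16)->(32,32,32)->(16,16,64)->(8,8,128)
        x = downsample(x)
    for _ in range(levels):   # decoder mirrors the encoder back to (64,64,16)
        x = upsample(x)
    residual = x[..., :3]     # placeholder for the 3x3 conv producing R
    return x, residual
```

After the symmetric decoder stages, the residual has the same H × W as the input, which is what makes the element-wise addition I’ = I + R well defined.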
Regarding claim 6, Wang in view of Lee and in further view of Dong teaches, The image processing method according to claim 1, wherein each vision transformer block is an Enhanced Cross-Shaped Window transformer block (Wang, page 4, ¶05: “we propose a locally-enhanced window (LeWin) Transformer block”; also see Fig. 2) obtained by combining a Cross-Shaped Window self-attention mechanism with a Locally-enhanced Feed-Forward module (Wang, page 4, ¶05: “Locally-enhanced Feed-Forward Network”) and a Locally-Enhanced Positional Encoding module. (Wang, page 5, ¶02: “we also apply the relative position encoding into the attention module”).
Regarding claim 7, Wang in view of Lee and in further view of Dong teaches, The image processing method according to claim 1, wherein the reconstructed image Î_NLHR is generated based on the following equation: Î_NLHR = (I_LLLR + I_R)↑s (Wang, page 4, ¶03: “the restored image is obtained by I’ = I+R”) wherein I_LLLR is the first image, (Wang, page 3, ¶05: “given a degraded image I”) I_R is the residual image (Wang, page 4, ¶03: “obtain a residual image R”) and s is a scaling factor for the upsampling (Wang, page 4, ¶03: “decoder also contains K stages. Each consists of an upsampling layer”) and the symbol + means element-wise addition. (Wang, page 4, ¶03: “the restored image is obtained by I’ = I+R”).
Regarding claim 9, Wang in view of Lee and in further view of Dong teaches, The image processing method according to claim 1, comprising extracting a low-level feature map F_0 ∈ R^(H×W×C) from the first image, wherein W and H are a width and a height of the first image and C a number of feature channels of the first image, (Wang, page 3, ¶05: “Uformer firstly applies a 3 × 3 convolutional layer with LeakyReLU to extract low-level features X_0 ∈ R^(C×H×W)”) and inputting the low-level feature map F_0 to the first encoder level. (Wang, page 3, ¶05: “the feature maps X_0 are passed through K encoder stages”).
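For illustration of the cited input projection, the following is a direct (unoptimized) NumPy sketch of a 3×3 "same" convolution followed by LeakyReLU, producing a C_out-channel low-level feature map of the same H × W as the input. The weight/bias shapes and the 0.01 negative slope are assumptions for the sketch, not details taken from Wang.

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def extract_low_level_features(img, weights, bias):
    """3x3 'same' convolution followed by LeakyReLU.
    img: (H, W, Cin); weights: (3, 3, Cin, Cout); bias: (Cout,)."""
    H, W, Cin = img.shape
    Cout = weights.shape[-1]
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)))   # zero-pad for 'same' output
    out = np.zeros((H, W, Cout))
    for i in range(H):
        for j in range(W):
            patch = padded[i : i + 3, j : j + 3, :]          # 3x3xCin window
            out[i, j] = np.tensordot(patch, weights, axes=3) + bias
    return leaky_relu(out)
```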
Regarding claim 10, Wang in view of Lee and in further view of Dong teaches, The image processing method according to claim 9, wherein extracting a low-level feature map F_0 comprises performing convolutional operations. (Wang, page 3, ¶05: “Uformer firstly applies a 3 × 3 convolutional layer with LeakyReLU to extract low-level features X_0 ∈ R^(C×H×W)”).
Regarding claim 12, Wang in view of Lee and in further view of Dong teaches, The image processing method according to claim 1, wherein the network comprises a bottleneck stage between the last encoder level and the first decoder level. (Wang, page 4, ¶02: “a bottleneck stage with a stack of LeWin Transformer blocks is added at the end of the encoder”).
Regarding claim 13, Wang in view of Lee and in further view of Dong teaches, The image processing method according to claim 12, wherein an output of the bottleneck stage is processed to upsample the size of a latent feature map (Wang, page 4, ¶03: “the proposed decoder also contains K stages. Each consists of an upsampling layer and a stack of LeWin Transformer blocks similar to the encoder”) output at the last encoder level (Wang, page 4, ¶02: “a bottleneck stage with a stack of LeWin Transformer blocks is added at the end of the encoder”) and to reduce the number of feature channels input to the first decoder level. (Wang, page 4, ¶03: “This layer reduces half of the feature channels and doubles the size of the feature maps”).
Regarding claim 14, Wang in view of Lee and in further view of Dong teaches, The image processing method according to claim 1, wherein the neural network is trained beforehand with low-resolution patch images and corresponding high-resolution patch images, wherein the low-resolution patch images are bigger than 64 x 64 pixels, (Wang, page 6, ¶06: “we train Uformer for 250 epochs with batch size 32 and input image size 128 × 128”) and wherein the corresponding high-resolution patch images are at least 2 to 4 times bigger. (Wang, page 13, ¶02: “We train the proposed Uformer16 on one GPU, with mini-batches of size 8 on the 256×256 samples”).
Regarding claim 15, it recites a computer-readable medium storing a program corresponding to the steps of the method recited in claim 1. Therefore, the recited program steps of the computer-readable medium of claim 15 are mapped to the proposed combination in the same manner as the corresponding steps of the method claim 1. Additionally, the rationale and motivation to combine Wang, Lee and Dong presented in rejection of claim 1, apply to this claim. Additionally, Lee teaches, A non-transitory computer-readable medium storing a program (Lee, ¶0295: “computer program may be recorded in computer readable media”) that, when run on a computer, causes the computer to carry out a method, the method comprising: (Lee, ¶0295: “embodiments of the present disclosure may be implemented in the form of a computer program which can be executed by various components on a computer”).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang in view of Lee and in further view of Dong using the additional teachings of Lee to introduce a computer readable media. A person skilled in the art would be motivated to combine the known elements as described above and achieve the predictable result of storing the program containing the image restoration mechanism. Therefore, it would have been obvious to combine the analogous arts Wang, Lee and Dong to obtain the invention of claim 15.
Regarding claim 16, it recites an apparatus with elements corresponding to the steps of the method recited in claim 1. Therefore, the recited elements of apparatus claim 16 are mapped to the proposed combination in the same manner as the corresponding steps in method claim 1. Additionally, the rationale and motivation to combine Wang, Lee and Dong presented in rejection of claim 1, apply to this claim. Additionally, Lee teaches, An image processing apparatus comprising: (Lee, ¶0046: “The color restoration apparatus 100 according to an embodiment of the present disclosure may be represented as devices such as a terminal, a desktop computer, and a digital camera, depending on the form of implementation”) a processor; and a memory storing a program which, when executed by the processor causes the image processing apparatus to: (Lee, ¶0205: “in order to drive the application program stored in the memory 170, the processor 180 may control at least some of components described”) acquire a first image whose spatial resolution and lightness are to be enhanced (Lee, ¶0043: “restoration apparatus may restore color of an inputted low light image”).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang in view of Lee and in further view of Dong using the additional teachings of Lee to introduce an apparatus to run the image restoration program. A person skilled in the art would be motivated to combine the known elements as described above and achieve the predictable result of automatically performing image restoration mechanism on the apparatus. Therefore, it would have been obvious to combine the analogous arts Wang, Lee and Dong to obtain the invention of claim 16.
Regarding claim 17, it recites an apparatus with elements corresponding to the steps of the method recited in claim 2. Therefore, the recited elements of apparatus claim 17 are mapped to the proposed combination in the same manner as the corresponding steps in method claim 2. Additionally, the rationale and motivation to combine Wang, Lee and Dong presented in rejection of claim 1, apply to this claim.
Regarding claim 19, it recites an apparatus with elements corresponding to the steps of the method recited in claim 6. Therefore, the recited elements of apparatus claim 19 are mapped to the proposed combination in the same manner as the corresponding steps in method claim 6. Additionally, the rationale and motivation to combine Wang, Lee and Dong presented in rejection of claim 1, apply to this claim.
Regarding claim 20, it recites an apparatus with elements corresponding to the steps of the method recited in claim 12. Therefore, the recited elements of apparatus claim 20 are mapped to the proposed combination in the same manner as the corresponding steps in method claim 12. Additionally, the rationale and motivation to combine Wang, Lee and Dong presented in rejection of claim 1, apply to this claim.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over the Non-Patent Literature (NPL) Wang et al. (Uformer: A General U-Shaped Transformer for Image Restoration), in view of Lee et al. (US 2021/0133932 A1), in further view of NPL Dong et al. (CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows) and still in further view of Lim et al. (US 2022/0375483 A1).
Regarding claim 8, Wang in view of Lee and in further view of Dong teaches, The image processing method according to claim 7, wherein upsampling the combination of the acquired first image and generated residual image (Wang, page 4, ¶03: “the features inputted to the LeWin Transformer blocks are the up-sampled features”) comprises performing (Wang, page 4, ¶03: “We use 2 × 2 transposed convolution with stride 2 for the up-sampling”). However, the combination of Wang, Lee and Dong does not explicitly teach pixel-shuffling.
In an analogous field of endeavor, Lim teaches, pixel-shuffling (Lim, ¶0064: “the up-sampling layers 308 may be implemented by transposed convolution and pixel shuffling”).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang in view of Lee and in further view of Dong using the teachings of Lim to introduce pixel shuffling. A person skilled in the art would be motivated to combine the known elements as described above and achieve the predictable result of performing efficient convolutional operations. Therefore, it would have been obvious to combine the analogous arts Wang, Lee, Dong and Lim to obtain the invention of claim 8.
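To illustrate the pixel-shuffling operation cited from Lim, the sketch below implements the standard depth-to-space rearrangement: a (H, W, C·r²) tensor becomes (H·r, W·r, C) by moving channel groups into spatial positions, which is how pixel shuffling performs upsampling without interpolation. The channels-last layout is an assumption of this sketch.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Depth-to-space: rearrange (H, W, C*r*r) into (H*r, W*r, C) by moving
    groups of channels into an r x r spatial block at each location."""
    H, W, Crr = x.shape
    C = Crr // (r * r)
    x = x.reshape(H, W, r, r, C)
    x = x.transpose(0, 2, 1, 3, 4)     # interleave to (H, r, W, r, C)
    return x.reshape(H * r, W * r, C)
```

Each group of r² input channels at a location fills the corresponding r × r output block, so no new values are synthesized during the upsampling step itself.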
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over the Non-Patent Literature (NPL) Wang et al. (Uformer: A General U-Shaped Transformer for Image Restoration), in view of Lee et al. (US 2021/0133932 A1), in further view of NPL Dong et al. (CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows) and still in further view of Fu et al. (US 10,482,603 B1).
Regarding claim 11, Wang in view of Lee and in further view of Dong teaches, The image processing method according to claim 9, wherein generating the residual image comprises. However, the combination of Wang, Lee and Dong does not explicitly teach extracting deep-level features F_d from the low-level features F_0 in the plurality of symmetrical encoder-decoder levels.
In an analogous field of endeavor, Fu teaches, extracting deep-level features F_d from the low-level features F_0 in the plurality of symmetrical encoder-decoder levels. (Fu, col 13, lines 35-38: “Using this residual connection scheme, the encoder 310 can generate or extract class-specific, high-level features (e.g., object-level feature information 235) that is passed to the decoder”).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang in view of Lee and in further view of Dong using the teachings of Fu to introduce extracting high-level features. A person skilled in the art would be motivated to combine the known elements as described above and achieve the predictable result of performing optimized image reconstructions. Therefore, it would have been obvious to combine the analogous arts Wang, Lee, Dong and Fu to obtain the invention of claim 11.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MEHRAZUL ISLAM whose telephone number is (571)270-0489. The examiner can normally be reached Monday-Friday: 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Saini Amandeep can be reached at (571) 272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MEHRAZUL ISLAM/Examiner, Art Unit 2662
/AMANDEEP SAINI/Supervisory Patent Examiner, Art Unit 2662