DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. The Amendment filed 27 February 2026 has been entered and considered. Claims 1, 12, and 13 have been amended. Claims 1-20 are all the claims pending in the application. Claims 1-8, 10-11, and 13-19 are rejected. Claims 9, 12, and 20 are objected to. Accordingly, this action is made final.
Response to Amendment
Prior Art Rejections
In view of the amendments to independent Claims 1 and 13, and to their dependent claims by extension, the rejection under 35 U.S.C. 103 over the previously cited art, Prabhakar in view of Park, is maintained.
On page 8 of the Amendment, the Applicant argues against the combination of Prabhakar and Park on the grounds that Park is not explicitly directed toward multi-exposure fusion, and that Park utilizes a single input image whereas Prabhakar uses a plurality of input images. The Applicant has also further amended independent Claims 1 and 13 to specify that the inputs are multiple LDR images at different exposures, and asserts that this distinguishes the application from the references. Examiner respectfully disagrees.
Although Park uses a single image as an input, it still explicitly teaches multi-exposure fusion by generating multiple virtual images from the input image that correspond to different camera exposures (Park: 2) Reflectance And Illumination Enhancement and Fig. 1; main step of our HDRI algorithm is to derive several illuminations that correspond to different camera exposures, which we call as virtual illumination generation (VIG) algorithm). These virtual multi-exposure images are then added together for HDR image generation (Park: 3) HDR Images Generation; the multi-exposure images can be added together with appropriate weights to generate a tone-mapped-like LDR images, which is called exposure fusion method).
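For illustration only, the exposure-fusion step Park describes, in which virtual multi-exposure images are added together with appropriate weights, can be sketched as follows. The Gaussian well-exposedness weight and its sigma parameter are illustrative assumptions, not Park's exact weighting scheme.

```python
import numpy as np

def exposure_fusion(virtual_exposures, sigma=0.2):
    """Weighted fusion of a stack of (virtual) multi-exposure luminance
    images, in the spirit of Park's exposure fusion step.

    virtual_exposures: list of HxW float arrays scaled to [0, 1].
    The well-exposedness weight below (favoring pixels near mid-gray)
    is an illustrative choice, not Park's exact scheme.
    """
    stack = np.stack(virtual_exposures)                    # K x H x W
    weights = np.exp(-((stack - 0.5) ** 2) / (2 * sigma ** 2))
    weights /= weights.sum(axis=0, keepdims=True) + 1e-12  # normalize per pixel
    return (weights * stack).sum(axis=0)                   # tone-mapped-like LDR
```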
Furthermore, with respect to the newly added limitations of Claim 1, and analogous limitations in Claim 13, Prabhakar as modified by Park further teaches an image processing apparatus comprising one or more processors and one or more memories comprising instructions, the one or more processors configured to execute the instructions to cause the apparatus to perform the following (Prabhakar: obvious for the CNN model to be executed on a CPU): receive a plurality of input images, wherein the plurality of input images are low dynamic range (LDR) images with different exposures (Prabhakar: 3.2. Training; each scene consists of 2 low dynamic range images with ±2 EV difference; exposure bias value (EV) indicates the amount of exposure offset from the auto exposure setting of a camera, for example, EV 1 is equal to doubling auto exposure time (EV 0)); wherein the construction operation is adapted for image construction from a plurality of frequency-specific components, to thereby form a high dynamic range (HDR) output image representing a combination of the input images (Park: 3) HDR Images Generation; generate a higher bit-depth HDR luminance image from these virtual multi-exposures (Yk) by using conventional HDRI technique). Thus, contrary to Applicant’s assertions, the proposed combination of Prabhakar in view of Park does indeed teach the newly added features of Claim 1.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 4-5, 13, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Prabhakar et al. (NPL: DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs, hereafter referred to as Prabhakar) in view of Park et al. (NPL: High Dynamic Range and Super-Resolution Imaging From a Single Image, hereafter referred to as Park).
Regarding Claim 1:
Prabhakar teaches an image processing apparatus comprising one or more processors and one or more memories comprising instructions, the one or more processors configured to execute the instructions to cause the apparatus to perform the following (Prabhakar: obvious for the CNN model to be executed on a CPU): receive a plurality of input images, wherein the plurality of input images are low dynamic range (LDR) images with different exposures (Prabhakar: 3.2. Training; each scene consists of 2 low dynamic range images with ±2 EV difference; exposure bias value (EV) indicates the amount of exposure offset from the auto exposure setting of a camera, for example, EV 1 is equal to doubling auto exposure time (EV 0)); process each set of decomposed data using one or more convolutional neural networks to form a combined image dataset; and perform a construction operation on the combined image dataset (Prabhakar: 3.1. DeepFuse CNN and Fig. 2; Architecture of proposed image fusion CNN illustrated for input exposure stack; the input to this architecture would be exposure image pairs stacked in third dimension).
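For illustration only, an addition-based fusion CNN in the spirit of Prabhakar's DeepFuse (shared-weight feature extraction per exposure, fusion by tensor addition, then reconstruction) can be sketched as below; the layer widths, kernel sizes, and activations are illustrative assumptions rather than Prabhakar's exact architecture.

```python
import torch
import torch.nn as nn

class FusionCNN(nn.Module):
    """Sketch of an exposure-fusion CNN in the spirit of Prabhakar's
    DeepFuse: shared-weight feature extraction per exposure, fusion by
    tensor addition, then reconstruction. Layer widths and kernel sizes
    are illustrative, not Prabhakar's exact values."""
    def __init__(self):
        super().__init__()
        self.extract = nn.Sequential(          # shared across both exposures
            nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 7, padding=3), nn.ReLU(),
        )
        self.reconstruct = nn.Sequential(
            nn.Conv2d(32, 16, 5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 1, 5, padding=2),
        )

    def forward(self, under, over):            # N x 1 x H x W luminance pair
        fused = self.extract(under) + self.extract(over)  # addition-based fusion
        return self.reconstruct(fused)
```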
Prabhakar fails to teach for each input image, form a set of decomposed data by decomposing the input image or a filtered version thereof into a plurality of frequency-specific components each representing the occurrence of features of a respective frequency interval in the input image or the filtered version thereof, wherein the construction operation is adapted for image construction from a plurality of frequency-specific components, to thereby form a high dynamic range (HDR) output image representing a combination of the input images.
Park, like Prabhakar, is directed to multi-exposure fusion for HDR image generation from LDR images. Park does teach for each input image, form a set of decomposed data by decomposing the input image or a filtered version thereof into a plurality of frequency-specific components each representing the occurrence of features of a respective frequency interval in the input image or the filtered version thereof (Park: B. Proposed Single Image HDRI; the proposed HDRI algorithm consists of three steps: image decomposition, enhancement of each component and HDR image reconstruction; in the first stage, the luminance component (Y) of an input image is decomposed into low frequency layer (illumination denoted as I) and high frequency details (reflectance component denoted as R)), wherein the construction operation is adapted for image construction from a plurality of frequency-specific components, to thereby form a high dynamic range (HDR) output image representing a combination of the input images (Park: 3) HDR Images Generation; generate a higher bit-depth HDR luminance image from these virtual multi-exposures (Yk) by using conventional HDRI technique).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Prabhakar to utilize the image decomposition technique on the input image stacks and the multi-exposure blending technique on the DeepFuse CNN, as taught by Park, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. As taught by Park, the proposed modification allows for the enhancement of individual components, like reflectance, to improve image details in relatively bright areas (Park: 2 Reflectance and Illumination Enhancement).
In regards to Claim 4, Prabhakar as modified by Park further teaches the image processing apparatus as claimed in claim 1, the apparatus comprising a camera and the apparatus being configured to, in response to an input from a user of the apparatus, cause the camera to capture the said plurality of input images, each of the input images being captured with a different exposure from others of the input images (Park: 2) Reflectance And Illumination Enhancement and Fig. 1; main step of our HDRI algorithm is to derive several illuminations that correspond to different camera exposures by utilizing multi-exposure images).
In regards to Claim 5, Prabhakar as modified by Park further teaches the image processing apparatus as claimed in claim 1, wherein the decomposed data is formed by decomposing a version of the respective input image filtered by a convolutional filter (Park: 1) Image Decomposition; exploit the Retinex filtering algorithm which estimates the I that corresponds to low frequency component (base layer) and the R that corresponds to high frequency component (detail layer) as: log(Y) = R + log(I); the illumination is estimated as the filtering of Y with a low pass filter G).
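For illustration only, the Retinex-style decomposition Park describes, log(Y) = R + log(I), can be sketched as below; a Gaussian low-pass filter and its sigma stand in for Park's filter G and are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def retinex_decompose(y, sigma=15.0):
    """Decompose a luminance image Y per log(Y) = R + log(I): the
    illumination I (low frequency base layer) is estimated by low-pass
    filtering Y, and the reflectance R (high frequency detail layer) is
    the log-domain residual. The Gaussian filter is an illustrative
    stand-in for the low pass filter G."""
    y = np.clip(y, 1e-6, None)                      # avoid log(0)
    illumination = gaussian_filter(y, sigma)        # I: base layer
    reflectance = np.log(y) - np.log(illumination)  # R: detail layer
    return illumination, reflectance
```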
Regarding Claim 13:
Prabhakar as modified by Park further teaches a computer-implemented image processing method comprising: receiving a plurality of input images, wherein the plurality of input images are low dynamic range (LDR) images with different exposures (Prabhakar: 3.2. Training; each scene consists of 2 low dynamic range images with ±2 EV difference; exposure bias value (EV) indicates the amount of exposure offset from the auto exposure setting of a camera, for example, EV 1 is equal to doubling auto exposure time (EV 0)); for each input image, forming a set of decomposed data by decomposing the input image or a filtered version thereof into a plurality of frequency-specific components each representing the occurrence of features of a respective frequency interval in the input image or the filtered version thereof (Park: B. Proposed Single Image HDRI; the proposed HDRI algorithm consists of three steps: image decomposition, enhancement of each component and HDR image reconstruction; in the first stage, the luminance component (Y) of an input image is decomposed into low frequency layer (illumination denoted as I) and high frequency details (reflectance component denoted as R)); processing each set of decomposed data using one or more convolutional neural networks to form a combined image dataset (Prabhakar: 3.1. DeepFuse CNN and Fig. 2; Architecture of proposed image fusion CNN illustrated for input exposure stack; the input to this architecture would be exposure image pairs stacked in third dimension); and subjecting the combined image dataset to a construction operation that is adapted for image construction from a plurality of frequency-specific components to thereby form an output image representing a high dynamic range (HDR) combination of the input images (Park: 3) HDR Images Generation; generate a higher bit-depth HDR luminance image from these virtual multi-exposures (Yk) by using conventional HDRI technique).
In regards to Claim 16, Prabhakar as modified by Park further teaches the method as claimed in claim 13, wherein the decomposed data is formed by decomposing a version of the respective input image filtered by a convolutional filter (Park: 1) Image Decomposition; exploit the Retinex filtering algorithm which estimates the I that corresponds to low frequency component (base layer) and the R that corresponds to high frequency component (detail layer) as: log(Y) = R + log(I); the illumination is estimated as the filtering of Y with a low pass filter G).
Claims 2-3 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Prabhakar et al. (NPL: DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs, hereafter referred to as Prabhakar) in view of Park et al. (NPL: High Dynamic Range and Super-Resolution Imaging From a Single Image, hereafter referred to as Park) and Fu et al. (CN Patent Pub No. 111861957 A, hereafter referred to as Fu).
In regards to Claim 2, Prabhakar as modified by Park fails to further teach the image processing apparatus as claimed in claim 1, wherein the step of decomposing the input image comprises performing a discrete wavelet transform operation on the input image.
Fu, like Prabhakar, is directed to the fusion and generation of higher definition images. Fu does teach wherein the step of decomposing the input image comprises performing a discrete wavelet transform operation on the input image (Fu: Contents of the Invention, Par. 4; using the two-dimensional discrete wavelet transform function to perform multi-layer wavelet decomposition to each image to be fused).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Prabhakar to utilize discrete wavelet transform, as taught by Fu, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. As taught by Fu, the proposed modification solves the technical problem that fusion images obtained by existing image fusion methods lack sufficient contrast and clarity (Fu: Contents of the Invention, Par. 1).
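For illustration only, the multi-layer wavelet decomposition Fu describes can be sketched with the PyWavelets library; the wavelet family and decomposition level are illustrative choices.

```python
import pywt  # PyWavelets

def decompose(image, wavelet="db2", levels=2):
    """Multi-layer 2D discrete wavelet decomposition of an image to be
    fused, per Fu. Returns the approximation coefficient matrix (low
    frequency layer) and, per level, a detail coefficient matrix group
    (horizontal, vertical, and diagonal high frequency layers). The
    wavelet family and level count are illustrative choices."""
    coeffs = pywt.wavedec2(image, wavelet=wavelet, level=levels)
    approx, detail_groups = coeffs[0], coeffs[1:]
    return approx, detail_groups
```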
In regards to Claim 3, Prabhakar as modified by Park and Fu further teaches the image processing apparatus as claimed in claim 1, wherein the construction operation is an inverse discrete wavelet transform operation (Fu: Specific Implementation Examples, Par. 3 and Fig. 1; and a target second layer high frequency layer performing inverse two-dimensional discrete wavelet decomposition (2D-I DWT) to obtain the final fusion image A).
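For illustration only, the corresponding inverse construction operation can be sketched as below; paired with the decomposition sketch above, it recombines a fused approximation matrix and fused detail coefficient groups into the final fusion image.

```python
import pywt  # PyWavelets

def reconstruct(approx, detail_groups, wavelet="db2"):
    """Inverse multi-level 2D discrete wavelet transform (2D-IDWT), per
    Fu's construction operation: the (fused) approximation matrix and
    the (fused) detail coefficient groups are recombined into the final
    fusion image. The wavelet must match the one used to decompose."""
    return pywt.waverec2([approx] + list(detail_groups), wavelet=wavelet)
```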
In regards to Claim 14, Prabhakar as modified by Park and Fu further teaches the method as claimed in claim 13, wherein the step of decomposing the input image comprises performing a discrete wavelet transform operation on the input image (Fu: Contents of the Invention, Par. 4; using the two-dimensional discrete wavelet transform function to perform multi-layer wavelet decomposition to each image to be fused).
In regards to Claim 15, Prabhakar as modified by Park and Fu further teaches the method as claimed in claim 13, wherein the construction operation is an inverse discrete wavelet transform operation (Fu: Specific Implementation Examples, Par. 3 and Fig. 1; and a target second layer high frequency layer performing inverse two-dimensional discrete wavelet decomposition (2D-I DWT) to obtain the final fusion image A).
Claims 6-8, 10-11, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Prabhakar et al. (NPL: DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs, hereafter referred to as Prabhakar) in view of Park et al. (NPL: High Dynamic Range and Super-Resolution Imaging From a Single Image, hereafter referred to as Park), Fu et al. (CN Patent Pub No. 111861957 A, hereafter referred to as Fu), and Ma et al. (CN Patent Pub No. 111382795 A, hereafter referred to as Ma).
In regards to Claim 6, Prabhakar as modified by Park and Fu further teaches the image processing apparatus as claimed in claim 1, wherein the apparatus is configured to: mask and weight at least some areas of some of the sets of the decomposed data (Fu: Contents of the Invention, Par. 4; decomposition module, to obtain the approximate coefficient matrix corresponding to the low frequency layer, and a detail coefficient matrix group corresponding to each high frequency layer in the plurality of high frequency layers; each detail coefficient matrix group comprises a plurality of detail coefficient matrices, and different detail coefficient matrices correspond to different decomposition directions; a calculating module, for weighting and calculating the approximation coefficient matrix corresponding to the plurality of images to be fused, so as to obtain the target approximation coefficient matrix).
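For illustration only, the weighting of approximation coefficient matrices Fu describes can be sketched as below; the uniform weights and the optional binary masks are illustrative assumptions, not Fu's exact scheme.

```python
import numpy as np

def weighted_approximation(approx_list, masks=None):
    """Weight (and optionally mask) the approximation coefficient
    matrices of the images to be fused to obtain the target
    approximation coefficient matrix, per Fu's calculating module.
    Uniform weights and optional binary masks are illustrative
    assumptions, not Fu's exact scheme."""
    stack = np.stack(approx_list)                       # K x h x w (float)
    weights = np.full_like(stack, 1.0 / len(approx_list))
    if masks is not None:                               # zero out masked areas
        weights = weights * np.stack(masks)
        weights /= weights.sum(axis=0, keepdims=True) + 1e-12
    return (weights * stack).sum(axis=0)
```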
Prabhakar as modified by Park and Fu fails to further teach the remaining limitations: so as to form attention-filtered decomposed data; select a subset of components of the attention-filtered decomposed data that correspond to lower frequencies than other components of the attention-filtered decomposed data; merge at least the components of the subset of components to form merged data; and wherein the merged data form an input to the construction operation.
Ma, like Prabhakar, is directed to decomposing images into different frequencies. Ma does teach so as to form attention-filtered decomposed data; select a subset of components of the attention-filtered decomposed data that correspond to lower frequencies than other components of the attention-filtered decomposed data (Ma: Summary of the Invention, Par. 12; in step 2, the low frequency group is used as the input of the neural network, the intermediate frequency group is injected into the neural network in the second stage, and the highest frequency group is injected into the neural network as the final input); merge at least the components of the subset of components to form merged data; and wherein the merged data form an input to the construction operation (Ma: Summary of the Invention, Par. 14-16; in step 2, the features and injection information are merged through attention splicing; stitch together the feature and injection frequency information and feed it into a two-level attention module).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to improve the different frequency image decomposition technique of Prabhakar by utilizing attention guided techniques for the frequency grouping of the decomposition, as taught by Ma, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. As taught by Ma, the proposed modification, by using an attention module, allows the features in the neural network to be fused and information to be injected at different frequencies, which significantly improves accuracy across various datasets and neural networks (Ma: Detailed ways, Par. 7).
In regards to Claim 7, Prabhakar as modified by Park, Fu, and Ma further teaches the image processing apparatus as claimed in claim 6, wherein the apparatus is configured to decompose the attention-filtered data (Ma: Summary of the Invention, Par. 4; step 1, use multi-level discrete wavelet transform to decompose the information in the natural image into three groups according to the distribution of its frequency bands), merge relatively low frequency components of the attention-filtered data through a plurality of residual operations to form convolved low frequency data (Ma: Summary of the Invention, Par. 14; in step 2, the features and injection information are merged through attention splicing), and perform a reconstruction operation in dependence on relatively high frequency components of the attention-filtered data and the convolved low frequency data (Ma: Summary of the Invention, Par. 16; step 2.2, stitch together the feature and injection frequency information and feed it into a two-level attention module).
In regards to Claim 8, Prabhakar as modified by Park, Fu, and Ma further teaches the image processing apparatus as claimed in claim 1, the apparatus being configured to: for each input image, form the respective set of decomposed data by decomposing the input image or a filtered version thereof into a first plurality of sets of frequency-specific components each representing the occurrence of features of a respective frequency interval in the input image or the filtered version thereof (Ma: Summary of the Invention, Par. 4; step 1, use multi-level discrete wavelet transform to decompose the information in the natural image into three groups according to the distribution of its frequency bands), performing a convolution operation on each of the sets of frequency-specific components to form convolved data and decomposing the convolved data into a second plurality of sets of frequency-specific components each representing the occurrence of features of a respective frequency interval in the convolved data (Ma: Summary of the Invention, Par. 12 and 15-17; in step 2, the low frequency group is used as the input of the neural network, the intermediate frequency group is injected into the neural network in the second stage, and the highest frequency group is injected into the neural network as the final input; Step 2.1, use a 1×1 convolutional layer to increase the channel of injected information to half of the features; Step 2.2, stitch together the feature and injection frequency information and feed it into a two-level attention module, which consists of a convolutional layer and a deconvolutional layer, and generates a spatial and semantic enhancement; Step 2.3, utilize 1×1 convolution to reduce the channels of attention module results to the origin of feature channels).
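For illustration only, Ma's Steps 2.1 through 2.3 can be sketched as a PyTorch module; the channel counts, stride, and sigmoid gating are illustrative assumptions, and even spatial dimensions are assumed.

```python
import torch
import torch.nn as nn

class FrequencyInjection(nn.Module):
    """Sketch of Ma's Steps 2.1-2.3: a 1x1 convolution raises the
    injected frequency information to half the feature channels (2.1);
    features and injected information are stitched together and fed
    into a two-level attention module built from a convolutional and a
    deconvolutional layer (2.2); a final 1x1 convolution reduces the
    result back to the original feature channel count (2.3). Channel
    counts, stride, and the sigmoid gate are illustrative assumptions;
    even spatial dimensions are assumed."""
    def __init__(self, feat_ch=64, inject_ch=3):
        super().__init__()
        half = feat_ch // 2
        self.expand = nn.Conv2d(inject_ch, half, 1)          # Step 2.1
        self.attention = nn.Sequential(                      # Step 2.2
            nn.Conv2d(feat_ch + half, feat_ch, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(feat_ch, feat_ch + half, 4, stride=2, padding=1),
            nn.Sigmoid(),
        )
        self.reduce = nn.Conv2d(feat_ch + half, feat_ch, 1)  # Step 2.3

    def forward(self, features, injected):
        x = torch.cat([features, self.expand(injected)], dim=1)  # attention splicing
        return self.reduce(x * self.attention(x))
```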
In regards to Claim 10, Prabhakar as modified by Park, Fu, and Ma further teaches the image processing apparatus as claimed in claim 8, wherein the first subsets are subsets of relatively low frequency components (Ma: Summary of the Invention, Par. 4 and 12; step 1, use multi-level discrete wavelet transform to decompose the information in the natural image into three groups according to the distribution of its frequency bands; in step 2, the low frequency group is used as the first input of the neural network).
In regards to Claim 11, Prabhakar as modified by Park, Fu, and Ma further teaches the image processing apparatus as claimed in claim 8, wherein the second subsets are subsets of relatively high frequency components (Ma: Summary of the Invention, Par. 4 and 12; step 1, use multi-level discrete wavelet transform to decompose the information in the natural image into three groups according to the distribution of its frequency bands; in step 2, the intermediate frequency group is injected into the neural network in the second stage, and the highest frequency group is injected into the neural network as the final input).
In regards to Claim 17, Prabhakar as modified by Park, Fu, and Ma further teaches the method as claimed in claim 13, further comprising: masking and weighting at least some areas of some of the sets of the decomposed data (Fu: Contents of the Invention, Par. 4; decomposition module, to obtain the approximate coefficient matrix corresponding to the low frequency layer, and a detail coefficient matrix group corresponding to each high frequency layer in the plurality of high frequency layers; each detail coefficient matrix group comprises a plurality of detail coefficient matrices, and different detail coefficient matrices correspond to different decomposition directions; a calculating module, for weighting and calculating the approximation coefficient matrix corresponding to the plurality of images to be fused, so as to obtain the target approximation coefficient matrix) so as to form attention-filtered decomposed data; selecting a subset of components of the attention-filtered decomposed data that correspond to lower frequencies than other components of the attention-filtered decomposed data (Ma: Summary of the Invention, Par. 12; in step 2, the low frequency group is used as the input of the neural network, the intermediate frequency group is injected into the neural network in the second stage, and the highest frequency group is injected into the neural network as the final input); merging at least the components of the subset of components to form merged data; and wherein the merged data form an input to the construction operation (Ma: Summary of the Invention, Par. 14-16; in step 2, the features and injection information are merged through attention splicing; stitch together the feature and injection frequency information and feed it into a two-level attention module).
In regards to Claim 18, Prabhakar as modified by Park, Fu, and Ma further teaches the method as claimed in claim 17, further comprising decomposing the attention-filtered data (Ma: Summary of the Invention, Par. 4; step 1, use multi-level discrete wavelet transform to decompose the information in the natural image into three groups according to the distribution of its frequency bands), merging relatively low frequency components of the attention-filtered data through a plurality of residual operations to form convolved low frequency data (Ma: Summary of the Invention, Par. 14; in step 2, the features and injection information are merged through attention splicing), and performing a reconstruction operation in dependence on relatively high frequency components of the attention-filtered data and the convolved low frequency data (Ma: Summary of the Invention, Par. 16; step 2.2, stitch together the feature and injection frequency information and feed it into a two-level attention module).
In regards to Claim 19, Prabhakar as modified by Park, Fu, and Ma further teaches the method as claimed in claim 13, further comprising: for each input image, forming the respective set of decomposed data by decomposing the input image or a filtered version thereof into a first plurality of sets of frequency-specific components each representing the occurrence of features of a respective frequency interval in the input image or the filtered version thereof (Ma: Summary of the Invention, Par. 4; step 1, use multi-level discrete wavelet transform to decompose the information in the natural image into three groups according to the distribution of its frequency bands), performing a convolution operation on each of the sets of frequency-specific components to form convolved data and decomposing the convolved data into a second plurality of sets of frequency-specific components each representing the occurrence of features of a respective frequency interval in the convolved data (Ma: Summary of the Invention, Par. 12 and 15-17; in step 2, the low frequency group is used as the input of the neural network, the intermediate frequency group is injected into the neural network in the second stage, and the highest frequency group is injected into the neural network as the final input; Step 2.1, use a 1×1 convolutional layer to increase the channel of injected information to half of the features; Step 2.2, stitch together the feature and injection frequency information and feed it into a two-level attention module, which consists of a convolutional layer and a deconvolutional layer, and generates a spatial and semantic enhancement; Step 2.3, utilize 1×1 convolution to reduce the channels of attention module results to the origin of feature channels).
Allowable Subject Matter
Claims 9, 12, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Claims 9 and 20 recite, in some variation: the image processing apparatus as claimed in claim 8, the apparatus being configured to: merge the first subset of the second plurality of sets of frequency-specific components to form first merged data; perform a masked and weighted combination of a first subset of the second plurality of sets of frequency-specific components and the first merged data to form first combined data; perform a first convolutional combination of a second subset of the second plurality of sets of frequency-specific components to form second combined data; upsample the first and second combined data to form first upsampled data; perform a masked and weighted combination of a first subset of the first plurality of sets of frequency-specific components and the first upsampled data to form third combined data; perform a second convolutional combination of a second subset of the first plurality of sets of frequency-specific components to form fourth combined data; upsample the third and fourth combined data to form second upsampled data; and wherein the output image is formed in dependence on the second upsampled data. The cited art of record does not teach or suggest such a combination of features.
Because the cited art of record, alone or in combination, does not teach or suggest each and every feature of dependent Claims 9 and 20, these claims would be allowable. Claim 12 would be allowable based on its dependence on Claim 9.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RENAE BITOR whose telephone number is (703)756-5563. The examiner can normally be reached Monday through Friday, 8:00 - 5:30, except the first Friday of each biweek.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, GREG MORSE can be reached at (571)272-3838. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RENAE A BITOR/Examiner, Art Unit 2663
/GREGORY A MORSE/Supervisory Patent Examiner, Art Unit 2698