DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1, 3, 7, 10, 15, and 17 have been amended. Claims 1-20 are pending for examination.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 09/17/2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Response to Arguments
Applicant's arguments filed 11/04/2025 have been fully considered but they are not persuasive.
Applicant argues that:
According to MAO, both the first convolution layer and the second convolution layer are included in the UV component processing module, which is before the Joint processing module. The Office seems to have mapped MAO's Joint processing module to the claimed "second stage convolution," Office Action p. 3. Therefore, both the first convolution layer and the second convolution layer are performed before the second stage convolution, and cannot constitute a teaching or suggestion of the claimed "third stage convolution" which is performed "on an output of the second stage convolution, wherein the third stage convolution comprises a fourth convolution and a fifth convolution that are provided in parallel." The framework of MAO shown in FIG. 8B is different from the framework described in the present application, in which both the first stage convolution before the second stage convolution and the third stage convolution after the second stage convolution have two convolutions provided in parallel.
Therefore, MAO does not disclose or suggest "performing a first stage convolution on the input image data, wherein the first stage convolution comprises a first convolution and a second convolution that are provided in parallel; performing a second stage convolution on a channel-wise concatenation result of an output of the first convolution and an output of the second convolution; performing a third stage convolution on an output of the second stage convolution, wherein the third stage convolution comprises a fourth convolution and a fifth convolution that are provided in parallel" as recited in amended claim 1.
Examiner respectfully disagrees.
Fig. 9B of MAO shows three sequential stages of convolution. The first stage, labeled in the annotated figure below, comprises a first and a second convolution that are provided in parallel. A second stage convolution is then performed on the result of the first and second convolutions. Finally, Fig. 9B shows a third stage made up of convolutions provided in parallel.
[media_image1.png (greyscale): examiner-annotated reproduction of MAO Fig. 9B identifying the three convolution stages]
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-4 and 15-18 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by MAO (US 20240007658 A1).
Regarding claim 1, MAO teaches a method of encoding a video sequence into a bitstream, the method comprising:
receiving a video sequence (Figs. 1A & 1B);
performing a plurality of convolutions on an input image data of the video sequence in YUV format ([0223] Similar to a processing manner of the YUV420 data format, for a YUV444 data format and a YUV422 data format, a quantity of convolution layers and downsampling factors in a horizontal direction and a vertical direction are controlled); wherein performing the plurality of convolutions comprises:
performing a first stage convolution on the input image data, wherein the first stage convolution comprises a first convolution and a second convolution that are provided in parallel ([0222] The Y and UV components are respectively input into the Y component processing module and the UV component processing module, and a network outputs the feature maps of the Y and UV components. In the example shown in FIG. 8B, the Y component processing module includes two convolution layers and two nonlinear layers. Examiner note: Figs. 8A & 8B show the convolutions performed in parallel; [0251] Operation 3: Concatenate and stitch a feature map of the Y component, a feature map of the U component, and a feature map of the V component to form a to-be-encoded feature map, input the to-be-encoded feature map to the entropy encoding module, and output a bitstream.);
performing a second stage convolution on a channel-wise concatenation result of an output of the first convolution and an output of the second convolution (In the example shown in FIG. 8B, the joint processing module 2 includes two convolution layers and one nonlinear layer [0233].);
performing a third stage convolution on an output of the second stage convolution (Fig. 8B & 9B. In the example shown in FIG. 8B, the joint processing module 2 includes two convolution layers and one nonlinear layer [0233]. Operation 6: … The response vector g.sub.yi is multiplied by a feature map output at a second convolution layer in the Y component processing module 2 channel by channel, to obtain a feature map after a quality gain.), wherein the third stage convolution comprises a fourth convolution and a fifth convolution that are provided in parallel (Fig. 9B: the top portion shows nine convolutions; the Y component processing module, the U component processing module, and the V component processing module are each made up of three convolutions. The rightmost convolution layers collectively make up a third stage convolution comprising a fourth convolution and a fifth convolution); and
obtaining an output image data based on an output of the third stage convolution ([0150] Operation 705: Obtain a bitstream of a video signal based on the second feature map of the first signal component and the second feature map of the second signal component.);
and encoding the output image data for generating the bitstream ([0227] Operation 3: Input the to-be-encoded feature map into an entropy encoding module, and output a bitstream.).
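Examiner note (illustrative sketch only, not drawn from MAO's disclosure or from Applicant's specification): the claimed three-stage arrangement mapped above can be expressed in PyTorch-style Python as follows, where all module names, channel counts, kernel sizes, strides, and paddings are hypothetical choices for illustration, and YUV444 input is assumed so that the Y and UV planes share spatial dimensions.

import torch
import torch.nn as nn

class ThreeStageConv(nn.Module):
    # Hypothetical sketch of the claimed arrangement; hyperparameters are illustrative.
    def __init__(self, c=64):
        super().__init__()
        # First stage: a first and a second convolution provided in parallel.
        self.conv1 = nn.Conv2d(1, c, kernel_size=3, stride=2, padding=1)  # first convolution (Y plane)
        self.conv2 = nn.Conv2d(2, c, kernel_size=3, stride=2, padding=1)  # second convolution (U/V planes)
        # Second stage: operates on the channel-wise concatenation of the two branch outputs.
        self.stage2 = nn.Conv2d(2 * c, c, kernel_size=3, stride=1, padding=1)
        # Third stage: a fourth and a fifth convolution provided in parallel.
        self.conv4 = nn.Conv2d(c, 1, kernel_size=3, stride=1, padding=1)  # fourth convolution (Y output)
        self.conv5 = nn.Conv2d(c, 2, kernel_size=3, stride=1, padding=1)  # fifth convolution (U/V output)

    def forward(self, y, uv):
        f1 = self.conv1(y)                            # first stage, branch 1
        f2 = self.conv2(uv)                           # first stage, branch 2 (parallel)
        s2 = self.stage2(torch.cat([f1, f2], dim=1))  # channel-wise concatenation, then second stage
        return self.conv4(s2), self.conv5(s2)         # third stage, two parallel branches

Under the mapping above, MAO's Y and UV component processing modules correspond to the parallel first-stage branches, and MAO's joint processing module operates on the channel-wise concatenation.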
Regarding claim 2, MAO teaches the method according to claim 1, wherein performing the first stage convolution on the input image data further comprises:
performing the first convolution on a Y component of the input image data (Fig. 8B: In the example shown in FIG. 8B, the Y component processing module includes two convolution layers and two nonlinear layers [0222]); and
performing the second convolution on a channel-wise concatenation result of a U component and a V component of the input image data (Fig. 8B: The UV component processing module includes two convolution layers and two nonlinear layers. [0222]).
Regarding claim 3, MAO teaches the method according to claim 1, wherein performing the third stage convolution on the output of the second stage convolution further comprises:
performing a fourth convolution on the output of the second stage convolution (Fig. 9B & Fig. 8B: In the example shown in FIG. 8B, the Y component processing module includes two convolution layers and two nonlinear layers [0222]); and
performing a fifth convolution on the output of the second stage convolution (Fig. 9B & Fig. 8B: The UV component processing module includes two convolution layers and two nonlinear layers. [0222] Examiner note: the second convolution layer of the UV component processing module and the second convolution layer of the Y component processing module are performed in parallel).
Regarding claim 4, MAO teaches the method according to claim 3, wherein obtaining the output image data based on the output of the third stage convolution further comprises:
obtaining a Y component of the output image data based on an output of the fourth convolution (Fig. 8B: output of Y component processing module); and
obtaining a channel-wise concatenation result of a U component and a V component of the output image data based on an output of the fifth convolution (Fig. 8B: output of UV component processing module).
Regarding claim 15, MAO teaches a non-transitory computer readable storage medium storing a bitstream generated by operations (
A bitstream generated by operations comprising… is a product-by-process claim limitation where the product is the bitstream and the process is the method steps to generate the bitstream. MPEP §2113 recites "Product-by-Process claims are not limited to the manipulations of the recited steps, only the structure implied by the steps". Thus, the scope of the claim is the storage medium storing the bitstream (with the structure implied by the method steps). The structure includes the information and samples manipulated by the steps.
“To be given patentable weight, the printed matter and associated product must be in a functional relationship. A functional relationship can be found where the printed matter performs some function with respect to the product to which it is associated”. MPEP §2111.05(I)(A). When a claimed “computer-readable medium merely serves as a support for information or data, no functional relationship exists.” MPEP §2111.05(III). The storage medium storing the claimed bitstream in claim 15 merely serves as a support for the storage of the bitstream and provides no functional relationship between the stored bitstream and the storage medium. Therefore, the bitstream structure, whose scope is implied by the method steps, is non-functional descriptive material and is given no patentable weight. MPEP §2111.05(III). Thus, the claim scope is just a storage medium storing data and is anticipated by MAO, which recites a storage medium storing a bitstream ([0096]).
) comprising:
performing a plurality of convolutions on an input image data of a video sequence in YUV format ([0223] Similar to a processing manner of the YUV420 data format, for a YUV444 data format and a YUV422 data format, a quantity of convolution layers and downsampling factors in a horizontal direction and a vertical direction are controlled); wherein performing the plurality of convolutions comprises:
performing a first stage convolution on the input image data, wherein the first stage convolution comprises a first convolution and a second convolution that are provided in parallel ([0222] The Y and UV components are respectively input into the Y component processing module and the UV component processing module, and a network outputs the feature maps of the Y and UV components. In the example shown in FIG. 8B, the Y component processing module includes two convolution layers and two nonlinear layers. Examiner note: Figs. 8A & 8B show the convolutions performed in parallel; [0251] Operation 3: Concatenate and stitch a feature map of the Y component, a feature map of the U component, and a feature map of the V component to form a to-be-encoded feature map, input the to-be-encoded feature map to the entropy encoding module, and output a bitstream.);
performing a second stage convolution on a channel-wise concatenation result of an output of the first convolution and an output of the second convolution (In the example shown in FIG. 8B, the joint processing module 2 includes two convolution layers and one nonlinear layer [0233].);
performing a third stage convolution on an output of the second stage convolution (In the example shown in FIG. 8B, the joint processing module 2 includes two convolution layers and one nonlinear layer [0233]. Operation 6: … The response vector g.sub.yi is multiplied by a feature map output at a second convolution layer in the Y component processing module 2 channel by channel, to obtain a feature map after a quality gain.), wherein the third stage convolution comprises a fourth convolution and a fifth convolution that are provided in parallel (Fig. 9B: the top portion shows nine convolutions; the Y component processing module, the U component processing module, and the V component processing module are each made up of three convolutions. The rightmost convolution layers collectively make up a third stage convolution comprising a fourth convolution and a fifth convolution); and
obtaining an output image data based on an output of the third stage convolution ([0150] Operation 705: Obtain a bitstream of a video signal based on the second feature map of the first signal component and the second feature map of the second signal component.);
and encoding the output image data for generating the bitstream ([0227] Operation 3: Input the to-be-encoded feature map into an entropy encoding module, and output a bitstream.).
Regarding claim 16, MAO teaches the non-transitory computer readable storage medium according to claim 15, wherein performing the first stage convolution on the input image data further comprises:
performing the first convolution on a Y component of the input image data (Fig. 8B: In the example shown in FIG. 8B, the Y component processing module includes two convolution layers and two nonlinear layers [0222]); and
performing the second convolution on a channel-wise concatenation result of a U component and a V component of the input image data (Fig. 8B: The UV component processing module includes two convolution layers and two nonlinear layers. [0222]).
Regarding claim 17, MAO teaches the non-transitory computer readable storage medium according to claim 15, wherein performing the third stage convolution on the output of the second stage convolution further comprises:
performing a fourth convolution on the output of the second stage convolution (Fig. 9B & Fig. 8B: In the example shown in FIG. 8B, the Y component processing module includes two convolution layers and two nonlinear layers [0222]); and
performing a fifth convolution on the output of the second stage convolution (Fig. 9B & Fig. 8B: The UV component processing module includes two convolution layers and two nonlinear layers. [0222] Examiner note: the second convolution layer of the UV component processing module and the second convolution layer of the Y component processing module are performed in parallel).
Regarding claim 18, MAO teaches the non-transitory computer readable storage medium according to claim 17, wherein obtaining the output image data based on the output of the third stage convolution further comprises:
obtaining a Y component of the output image data based on an output of the fourth convolution (Fig. 8B: output of Y component processing module); and
obtaining a channel-wise concatenation result of a U component and a V component of the output image data based on an output of the fifth convolution (Fig. 8B: output of UV component processing module).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 5-6 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over MAO in view of Cui (US 20230116285 A1).
Regarding claim 5, MAO teaches the method according to claim 1. MAO does not explicitly teach the following limitations, however, in an analogous art, Cui teaches wherein a set of parameters of each of the plurality of convolutions comprises: an input channel number, an output channel number, a kernel size, a stride, and a padding size (kernel_size being the size of the kernel of the neural network which is a convolutional neural network and n_layers being the number of the layers of the neural network [0015]. The performance of a multichannel image enhancement algorithm varies with some parameters of the input multichannel image (e.g. number of channels, their quality) and also varies across the image data in each channel [0107]. the padding is performed so that the vertical dimension (number of samples after padding) is an integer multiple of the vertical patch dimension. [0119]).
It would have been obvious for a person of ordinary skill in the art, before the effective filing date of the claimed invention, to take the teachings of Cui and apply them to MAO. One would have been motivated to do so to improve image modification performance due to the possibility of adapting the primary channel by selecting it from among the image channels (Cui: [0010]).
Regarding claim 6, MAO in view of Cui teaches the method according to claim 5. Cui teaches wherein the set of parameters is determined based on a type of a YUV format and a number of the plurality of convolutions (The patch may have size of 32a×32a×Z, where a∈Z, and Z is the depth of the full stack (e.g. Z=6 for YUV 4:2:0). Then, in a second stage, 64 convolutions are performed with a kernel size of 3×3×Z. Each convolution builds one layer of the next stage… Finally, in stage 5, the output of the network is a class, specifying the best processing parameters for the patch, e.g. Primary=N (primary channel is the channel N) or Filter_off (indication that the image enhancement should not be applied) [0160]). The same motivation used to combine MAO in view of Cui in claim 5 is applicable.
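Examiner note (illustrative sketch only): Cui's teaching that the convolution parameter set varies with the YUV format type and the number of convolutions can be sketched as a hypothetical lookup table of per-convolution parameter tuples; the format keys and all numeric values below are assumptions for illustration, not values disclosed by Cui.

import torch.nn as nn

# Hypothetical (in_channels, out_channels, kernel_size, stride, padding) per convolution.
CONV_PARAMS = {
    "yuv420": [(1, 64, 3, 2, 1), (2, 64, 3, 1, 1)],  # chroma already subsampled: smaller chroma stride
    "yuv444": [(1, 64, 3, 2, 1), (2, 64, 3, 2, 1)],  # full-resolution chroma: downsample both paths
}

def make_convs(fmt):
    # Build the convolution list determined by the YUV format type.
    return [nn.Conv2d(i, o, kernel_size=k, stride=s, padding=p)
            for (i, o, k, s, p) in CONV_PARAMS[fmt]]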
Regarding claim 19, MAO teaches the non-transitory computer readable storage medium according to claim 15. MAO does not explicitly teach the following limitations, however, in an analogous art, Cui teaches wherein a set of parameters of each of the plurality of convolutions comprises: an input channel number, an output channel number, a kernel size, a stride, and a padding size (kernel_size being the size of the kernel of the neural network which is a convolutional neural network and n_layers being the number of the layers of the neural network [0015]. The performance of a multichannel image enhancement algorithm varies with some parameters of the input multichannel image (e.g. number of channels, their quality) and also varies across the image data in each channel [0107]. the padding is performed so that the vertical dimension (number of samples after padding) is an integer multiple of the vertical patch dimension. [0119]).
It would have been obvious for a person of ordinary skill in the art, before the effective filing date of the claimed invention, to take the teachings of Cui and apply them to MAO. One would have been motivated to do so to improve image modification performance due to the possibility of adapting the primary channel by selecting it from among the image channels (Cui: [0010]).
Regarding claim 20, MAO in view of Cui teaches the non-transitory computer readable storage medium according to claim 19. Cui teaches wherein the set of parameters is determined based on a type of a YUV format and a number of the plurality of convolutions (The patch may have size of 32a×32a×Z, where a∈Z, and Z is the depth of the full stack (e.g. Z=6 for YUV 4:2:0). Then, in a second stage, 64 convolutions are performed with a kernel size of 3×3×Z. Each convolution builds one layer of the next stage… Finally, in stage 5, the output of the network is a class, specifying the best processing parameters for the patch, e.g. Primary=N (primary channel is the channel N) or Filter_off (indication that the image enhancement should not be applied) [0160]). The same motivation used to combine MAO in view of Cui in claim 19 is applicable.
Claims 7-8, 10, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over MAO in view of XIE (US 20250259275 A1).
Regarding claim 7, MAO teaches a method of decoding a bitstream to output one or more pictures for a video stream, the method comprising:
receiving a bitstream (Figs. 1A & 1B); and
decoding, using coded information of the bitstream, one or more pictures comprising a down-sampled image data in YUV format ([0223] Similar to a processing manner of the YUV420 data format, for a YUV444 data format and a YUV422 data format, a quantity of convolution layers and downsampling factors in a horizontal direction and a vertical direction are controlled. Fig. 8B); and
performing a plurality of convolutions on the down-sampled image data, wherein performing the plurality of convolutions (Fig. 8B) comprises:
performing a first stage convolution on the down-sampled image data, wherein the first stage convolution comprises a first convolution and a second convolution provided in parallel ([0222] The Y and UV components are respectively input into the Y component processing module and the UV component processing module, and a network outputs the feature maps of the Y and UV components. In the example shown in FIG. 8B, the Y component processing module includes two convolution layers and two nonlinear layers. Examiner note: Figs. 8A & 8B show the convolutions performed in parallel; [0251] Operation 3: Concatenate and stitch a feature map of the Y component, a feature map of the U component, and a feature map of the V component to form a to-be-encoded feature map, input the to-be-encoded feature map to the entropy encoding module, and output a bitstream.);
performing a second stage convolution on a channel-wise concatenation result of an output of the first convolution and an output of the second convolution (In the example shown in FIG. 8B, the joint processing module 2 includes two convolution layers and one nonlinear layer [0233].);
performing a third stage convolution on an output of the second stage convolution (In the example shown in FIG. 8B, the joint processing module 2 includes two convolution layers and one nonlinear layer [0233]. Operation 6: … The response vector g.sub.yi is multiplied by a feature map output at a second convolution layer in the Y component processing module 2 channel by channel, to obtain a feature map after a quality gain.), wherein the third stage convolution comprises a fourth convolution and a fifth convolution that are provided in parallel (Fig. 9B: the top portion shows nine convolutions; the Y component processing module, the U component processing module, and the V component processing module are each made up of three convolutions. The rightmost convolution layers collectively make up a third stage convolution comprising a fourth convolution and a fifth convolution);
MAO does not explicitly teach the following limitations, however, in an analogous art, Xie teaches performing a bicubic interpolation on the down-sampled image data to obtain a bicubic interpolation result (Taking a picture of a YUV format as an example, the number of channels corresponding to the picture of the YUV format is 1 (that is, C=1), image quality enhancement processing is performed on a Y component through the image model shown in FIG. 5a, and an output result is a super-resolution single-channel image, and a UV component may be interpolated by adopting methods such as bilinear interpolation or bicubic interpolation. [0141]); and
performing an element-wise addition to an output of third stage convolution and the bicubic interpolation result to obtain an up-sampled image data (Fig. 5C).
It would have been obvious for a person of ordinary skill in the art, before the effective filing date of the claimed invention, to take the teachings of XIE and apply them to MAO. One would have been motivated to do so to improve the effect of image quality enhancement, reduce the risk of gradient vanishing in the image processing process, and enhance the performance of the image model (XIE: [0135]).
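Examiner note (illustrative sketch only): the bicubic-interpolation path with element-wise addition, as mapped to XIE above, can be sketched as follows using PyTorch's built-in bicubic interpolation mode; the scale factor is hypothetical, and the tensor shapes are assumed to match after interpolation.

import torch.nn.functional as F

def upsample_with_residual(x_down, stage3_out, scale=2):
    # Bicubic interpolation of the down-sampled image data ...
    bicubic = F.interpolate(x_down, scale_factor=scale, mode="bicubic", align_corners=False)
    # ... followed by element-wise addition with the third stage convolution output
    # to obtain the up-sampled image data.
    return bicubic + stage3_out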
Regarding claim 8, MAO in view of XIE teaches the method according to claim 7. MAO teaches wherein performing the first stage convolution on the image data comprises:
performing the first convolution on a Y component of the down-sampled image data (Fig. 8B: In the example shown in FIG. 8B, the Y component processing module includes two convolution layers and two nonlinear layers [0222]); and
performing the second convolution on a channel-wise concatenation result of a U component and a V component of the down-sampled image data (Fig. 8B: The UV component processing module includes two convolution layers and two nonlinear layers. [0222]).
Regarding claim 10, MAO in view of XIE teaches the method according to claim 8. MAO teaches wherein performing the third stage convolution on the output of the second stage convolution further comprises:
performing a fourth convolution on the output of the second stage convolution (Fig. 9B & Fig. 8B: output of Y component processing module); and
performing a fifth convolution on the output of the second stage convolution (Fig. 9B & Fig. 8B: output of UV component processing module).
Regarding claim 14, MAO in view of XIE teaches the method according to claim 7. XIE teaches wherein a Rectified Linear Unit (ReLU) is applied to each convolution in the first stage convolution and the second stage convolution as an activation function (The activation function may include, but is not limited to, a Prelu activation function, a Relu activation function, and the like. [0115]). The same motivation used to combine MAO in view of XIE in claim 7 is applicable.
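Examiner note (illustrative sketch only): a ReLU activation applied after a convolution in a stage, consistent with the mapping to XIE above, takes the following minimal form in PyTorch; channel counts and kernel size are hypothetical.

import torch.nn as nn

stage_with_relu = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
    nn.ReLU(inplace=True),  # ReLU applied as the activation function on the convolution output
)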
Claims 9 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over MAO in view of XIE further in view of Cui.
Regarding claim 9, MAO in view of XIE teaches the method according to claim 7. MAO in view of XIE does not explicitly teach the following limitations, however, in an analogous art, Cui teaches wherein the second stage convolution comprises a series of convolutions (The patch may have size of 32a×32a×Z, where a∈Z, and Z is the depth of the full stack (e.g. Z=6 for YUV 4:2:0). Then, in a second stage, 64 convolutions are performed with a kernel size of 3×3×Z. Each convolution builds one layer of the next stage… Finally, in stage 5, the output of the network is a class, specifying the best processing parameters for the patch, e.g. Primary=N (primary channel is the channel N) or Filter_off (indication that the image enhancement should not be applied) [0160]).
It would have been obvious for a person of ordinary skill in the art, before the effective filing date of the claimed invention, to take the teachings of Cui and apply them to MAO. One would have been motivated to do so to improve image modification performance due to the possibility of adapting the primary channel by selecting it from among the image channels (Cui: [0010]).
Regarding claim 12, MAO in view of XIE teaches the method according to claim 7. MAO in view of XIE does not explicitly teach the following limitations, however, in an analogous art, Cui teaches wherein a set of parameters of each of the plurality of convolutions comprises: an input channel number, an output channel number, a kernel size, a stride, and a padding size (kernel_size being the size of the kernel of the neural network which is a convolutional neural network and n_layers being the number of the layers of the neural network [0015]. The performance of a multichannel image enhancement algorithm varies with some parameters of the input multichannel image (e.g. number of channels, their quality) and also varies across the image data in each channel [0107]. the padding is performed so that the vertical dimension (number of samples after padding) is an integer multiple of the vertical patch dimension. [0119]).
It would have been obvious for a person of ordinary skill in the art, before the effective filing date of the claimed invention, to take the teachings of Cui and apply them to MAO. One would have been motivated to do so to improve image modification performance due to the possibility of adapting the primary channel by selecting it from among the image channels (Cui: [0010]).
Regarding claim 13, MAO in view of XIE and Cui teaches the method according to claim 12. Cui teaches wherein the set of parameters is determined based on a type of a YUV format and a number of the plurality of convolutions (The patch may have size of 32a×32a×Z, where a∈Z, and Z is the depth of the full stack (e.g. Z=6 for YUV 4:2:0). Then, in a second stage, 64 convolutions are performed with a kernel size of 3×3×Z. Each convolution builds one layer of the next stage… Finally, in stage 5, the output of the network is a class, specifying the best processing parameters for the patch, e.g. Primary=N (primary channel is the channel N) or Filter_off (indication that the image enhancement should not be applied) [0160]). The same motivation used to combine MAO in view of XIE and Cui in claim 12 is applicable.
Allowable Subject Matter
Claim 11 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HESHAM K ABOUZAHRA whose telephone number is (571)270-0425. The examiner can normally be reached M-F 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jamie Atala, can be reached at 57127227384. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HESHAM K ABOUZAHRA/Primary Examiner, Art Unit 2486