Prosecution Insights
Last updated: April 19, 2026
Application No. 18/591,943

IMAGE PROCESSING DEVICE DETERMINING MOTION VECTOR BETWEEN FRAMES, AND METHOD THEREBY

Status: Non-Final OA (§103)
Filed: Feb 29, 2024
Examiner: LANTZ, KARSTEN FOSTER
Art Unit: 2664
Tech Center: 2600 — Communications
Assignee: Samsung Electronics Co., Ltd.
OA Round: 1 (Non-Final)
Grant Probability: Favorable
Predicted OA Rounds: 1-2
Predicted Time to Grant: 2y 9m

Examiner Intelligence

Career Allow Rate: 0% (0 granted / 0 resolved; -62.0% vs TC avg)
Interview Lift: +0.0% (minimal; based on resolved cases with interview)
Avg Prosecution: 2y 9m (typical timeline)
Career History: 19 total applications across all art units (19 currently pending)

Statute-Specific Performance

§103: 73.8% (+33.8% vs TC avg)
§102: 14.3% (-25.7% vs TC avg)
§112: 11.9% (-28.1% vs TC avg)

Black line = Tech Center average estimate. Based on career data from 0 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Receipt is acknowledged that this application is a National Stage application of PCT/KR2022/012599. Priority to KR10-2021-0115025, with a priority date of 8/30/2021, is acknowledged under 35 U.S.C. 119(a)-(d) and 37 CFR 1.55.

Information Disclosure Statement

The IDSs dated 2/29/2024, 1/22/2025, and 12/12/2025 have been considered and placed in the application file.

1st Claim Rejections - 35 U.S.C. § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention, in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-6, 9, 10, and 15-20 are rejected under 35 U.S.C. 103 as obvious over US 2021/0366082 A1 (Xiao et al.) in view of US 2019/0138898 A1 (Song et al.).

Claim 1

Regarding claim 1, Xiao et al. teach an image processing device comprising: at least one memory storing one or more instructions; and at least one processor configured to execute the one or more instructions to ("memory 1304 includes main memory for storing instructions for processor 1302 to execute or data for processor 1302 to operate on," par. 77):

obtain a plurality of difference maps between a first frame or a first feature map corresponding to the first frame, and a plurality of second feature maps corresponding to a second frame ("The computing system may generate a first feature map for the first frame and one or more second feature maps for the one or more second frames," par. 25);

obtain a plurality of third feature maps and a plurality of fourth feature maps by performing a first pooling process based on a first size, and a second pooling process based on a second size, on the plurality of difference maps ("when an image having a size of 24×24 pixels is input in the neural network 1 of FIG. 1, the input image may be output as feature maps of 4 channels having a 20×20 size via a convolution operation between the input image and the kernel. Also, feature maps of 4 channels having a 10×10 size may be output by using only one or more of pixel values of the feature maps of 4 channels having the 20×20 size via a sub-sampling process. Methods for sub-sampling may include max-pooling," par. 57);

obtain a plurality of modified difference maps by weighted-summing the plurality of third feature maps and the plurality of fourth feature maps ("The computing system may generate, by the feature weighting module, a pixel-wise weighting map for each of the one or more up-sampled and warped second feature maps. The computing system may further multiply the pixel-wise weighting map with the corresponding up-sampled and warped second feature map to generate a reweighted feature map for the corresponding second frame," par. 47);

identify any one collocated sample based on sizes of sample values of collocated samples of the plurality of modified difference maps corresponding to a current sample of the first frame ("the feature reweighting module may be a 3-layer convolutional neural network, which may take the RGB-D of the zero-upsampled current frame as well as the zero-upsampled, warped previous frames as input, and generate a pixel-wise weighting map for each previous frame, with values between 0 and 10, where 10 is a hyperparameter," par. 47);

and determine a motion vector of the current sample ("The determining may comprise identifying a motion vector for the corresponding second frame having the resolution lower than the target resolution and resizing the motion vector to the target resolution based on bilinear up-sampling," par. 44).

Xiao et al. do not explicitly teach determining a filter kernel used to obtain one of the plurality of second feature maps corresponding to one of the plurality of modified difference maps comprising the identified collocated sample. However, Song et al. teach this limitation ("As the convolution operation between the first feature map FM1 and the kernel is performed, a channel of the second feature map FM2 may be generated," par. 61).

Therefore, taking the teachings of Xiao et al. and Song et al. as a whole, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the feature map generating methods taught by Xiao et al. to use the kernel division methods taught by Song et al. The suggestion/motivation for doing so would have been that "semantic segmentation attempts to partition an image into semantically meaningful parts, and to classify the parts into classes. Semantic segmentation is a technique not only for identifying what is in the image, but also for precisely figuring out locations of objects in the image," as noted by Song et al. in paragraph [0065]. The combination would predictably have higher accuracy, as there is a reasonable expectation that the specific feature map generation method and the kernel division method would result in enhanced spatial resolution and more accurate boundary localization in the produced segmentation map; and/or doing so merely combines prior art elements according to known methods to yield predictable results.
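The claim 1 mapping above traces a concrete pipeline: difference maps are pooled at two scales, the pooled maps are weighted-summed into modified difference maps, and the motion vector is taken from the candidate whose collocated sample value is selected. The following is a minimal sketch of that flow, assuming average pooling and a smallest-value selection rule (the claim recites only "sizes of sample values," so choosing the minimum is an assumption), with all helper names hypothetical:

```python
# Illustrative sketch of the claimed pipeline; a "map" is a 2-D list of floats.

def avg_pool(m, size, stride):
    """Average-pool a 2-D map with the given window size and stride."""
    h, w = len(m), len(m[0])
    out = []
    for i in range(0, h - size + 1, stride):
        row = []
        for j in range(0, w - size + 1, stride):
            window = [m[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

def weighted_sum(a, b, w1, w2):
    """Weighted sum of two equally sized pooled maps (a 'modified difference map')."""
    return [[w1 * x + w2 * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def pick_motion_vector(modified_maps, vectors, i, j):
    """Compare the collocated samples at (i, j) across the modified difference
    maps and return the motion vector of the best (here: smallest) candidate."""
    best = min(range(len(modified_maps)), key=lambda k: modified_maps[k][i][j])
    return vectors[best]

# The smallest collocated value is in the second map, so its vector is chosen.
vec = pick_motion_vector([[[5.0]], [[2.0]], [[9.0]]],
                         [(0, 0), (1, -1), (2, 2)], 0, 0)
```

The selection step is where the determined filter kernel would be looked up in the combined Xiao/Song reading: the winning modified difference map identifies which second feature map, and hence which kernel, produced it.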
The rejection of device claim 1 above applies mutatis mutandis to the corresponding limitations of method claim 15, noting that the rejection above cites to both device and method disclosures. Claim 15 is mapped below for clarity of the record and to specify any new limitations not included in claim 1.

Claim 2

Regarding claim 2, Xiao et al. and Song et al. teach the image processing device of claim 1 as noted above. Xiao et al. do not explicitly teach wherein a first stride used in the first pooling process and a second stride used in the second pooling process are different from each other.

[Figure 7 shows the first kernel and second sub-kernel.]

However, Song et al. teach this limitation ("The neural network apparatus may generate the sub-kernels 540 by dividing the second kernel 530 based on a stride value. According to an embodiment, the neural network apparatus may divide the second kernel 530 into the sub-kernels 540, the number of the sub-kernels 540 corresponding to a value obtained by squaring the stride value," par. 79). Xiao et al. and Song et al. are combined as per claim 1.

Claim 3

Regarding claim 3, Xiao et al. and Song et al. teach the image processing device of claim 2 as noted above. Xiao et al. do not explicitly teach wherein the first size and the first stride are greater than the second size and the second stride. However, Song et al. teach this limitation ("The neural network apparatus may generate the sub-kernels 540 by dividing the second kernel 530 based on a stride value. According to an embodiment, the neural network apparatus may divide the second kernel 530 into the sub-kernels 540, the number of the sub-kernels 540 corresponding to a value obtained by squaring the stride value," par. 79). Xiao et al. and Song et al. are combined as per claim 1.

Claim 4

Regarding claim 4, Xiao et al. and Song et al. teach the image processing device of claim 3 as noted above. Xiao et al. do not explicitly teach wherein the first size and the first stride are k and k is a natural number, and wherein the second size and the second stride are k/2. However, Song et al. teach this limitation ("The neural network apparatus may generate the sub-kernels 540 by dividing the second kernel 530 based on a stride value. According to an embodiment, the neural network apparatus may divide the second kernel 530 into the sub-kernels 540, the number of the sub-kernels 540 corresponding to a value obtained by squaring the stride value," par. 79). Xiao et al. and Song et al. are combined as per claim 1.

Claim 5

Regarding claim 5, Xiao et al. and Song et al. teach the image processing device of claim 1 as noted above. Song et al. do not explicitly teach wherein the at least one processor is further configured to execute the one or more instructions to obtain, from a neural network, a first weight applied to the plurality of third feature maps, and a second weight applied to the plurality of fourth feature maps. However, Xiao et al. teach this limitation ("this subnetwork may process each input frame individually and share weights across all frames except for the current frame … In particular embodiments, the initial feature map may be based on a first number of channels whereas each of the first feature map and the one or more second feature maps may be based on a second number of channels," par. 42). Xiao et al. and Song et al. are combined as per claim 1.

Claim 6

Regarding claim 6, Xiao et al. and Song et al. teach the image processing device of claim 1 as noted above. Xiao et al. teach wherein the at least one processor is further configured to execute the one or more instructions to ("memory 1304 includes main memory for storing instructions for processor 1302 to execute or data for processor 1302 to operate on," par. 77):

obtain the plurality of modified difference maps by weighted-summing the plurality of third feature maps and the plurality of fourth feature maps ("The computing system may generate, by the feature weighting module, a pixel-wise weighting map for each of the one or more up-sampled and warped second feature maps. The computing system may further multiply the pixel-wise weighting map with the corresponding up-sampled and warped second feature map to generate a reweighted feature map for the corresponding second frame," par. 47), based on a first preliminary weight and a second preliminary weight that are output from a neural network ("this subnetwork may process each input frame individually and share weights across all frames except for the current frame … In particular embodiments, the initial feature map may be based on a first number of channels whereas each of the first feature map and the one or more second feature maps may be based on a second number of channels," par. 42);

determine motion vectors corresponding to samples of the first frame, from the plurality of modified difference maps ("In particular embodiments, the computing system may determine the motion estimation between the associated second time and the first time," par. 44);

and motion-compensate the second frame based on the motion vectors ("In particular embodiments, generating the reconstructed frame corresponding to the first frame may comprise combining the up-sampled first feature map and the reweighted feature maps associated with the one or more second frames," par. 49);

and wherein the neural network is trained based on first loss information corresponding to a difference between the motion-compensated second frame and the first frame ("The training loss of our method, as given in Eq. (1), may be a weighted combination of the perceptual loss computed from a pretrained VGG-16 network and the structural similarity index (SSIM)," par. 51). Xiao et al. and Song et al. are combined as per claim 1.

Claim 9

Regarding claim 9, Xiao et al. and Song et al. teach the image processing device of claim 1 as noted above. Xiao et al. teach wherein each of the first pooling process and the second pooling process comprises an average pooling process or a median pooling process ("The embodiments disclosed herein demonstrate the first learned super-sampling method that achieves significant 4×4 super-sampling with high spatial and temporal fidelity," par. 34). Xiao et al. and Song et al. are combined as per claim 1.
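Claims 2-4 above turn on the pooling geometry (a first window and stride of k against a second of k/2), and claim 9 on average versus median pooling. The arithmetic can be sketched as follows, using the standard no-padding pooling output-size formula; this is illustrative and not drawn from either cited reference:

```python
from statistics import median

def pooled_dims(h, w, size, stride):
    """Output dimensions of pooling an h x w map:
    floor((dim - size) / stride) + 1 per axis (no padding)."""
    return ((h - size) // stride + 1, (w - size) // stride + 1)

k = 4  # per claims 3-4: first size/stride k, second size/stride k/2
first = pooled_dims(16, 16, size=k, stride=k)             # coarser grid
second = pooled_dims(16, 16, size=k // 2, stride=k // 2)  # finer grid, 2x denser

# Claim 9 allows average or median pooling over each window:
window = [1.0, 2.0, 9.0]
avg_val = sum(window) / len(window)  # average pooling
med_val = median(window)             # median pooling, robust to the outlier 9.0
```

With k = 4 on a 16×16 map, the first pass yields a 4×4 map and the second an 8×8 map, which is why the two pooled branches capture the difference maps at distinct spatial scales.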
Claim 10

Regarding claim 10, Xiao et al. and Song et al. teach the image processing device of claim 1 as noted above.

[Figure 4B shows the network and sub-network architecture.]

Xiao et al. teach wherein the first feature map is obtained through first convolution processing on the first frame based on a first filter kernel, and wherein the plurality of second feature maps are obtained through second convolution processing on the second frame based on a plurality of second filter kernels ("FIG. 4A illustrates an example network architecture of our method. FIG. 4B illustrates example sub-networks of the example network architecture. The sub-networks may include the feature extraction, feature reweighting, and reconstruction networks. The numbers under each network layer represent the output channels at corresponding layers. The filter size is 3×3 at all layers," par. 42). Xiao et al. and Song et al. are combined as per claim 1.

Claim 15

Regarding claim 15, Xiao et al. teach an image processing method performed by an image processing device, the image processing method comprising:

obtaining a plurality of difference maps between a first frame or a first feature map corresponding to the first frame, and a plurality of second feature maps corresponding to a second frame ("The computing system may generate a first feature map for the first frame and one or more second feature maps for the one or more second frames," par. 25);

obtaining a plurality of third feature maps and a plurality of fourth feature maps by performing a first pooling process based on a first size, and a second pooling process based on a second size, on the plurality of difference maps ("when an image having a size of 24×24 pixels is input in the neural network 1 of FIG. 1, the input image may be output as feature maps of 4 channels having a 20×20 size via a convolution operation between the input image and the kernel. Also, feature maps of 4 channels having a 10×10 size may be output by using only one or more of pixel values of the feature maps of 4 channels having the 20×20 size via a sub-sampling process. Methods for sub-sampling may include max-pooling," par. 57);

obtaining a plurality of modified difference maps by weighted-summing the plurality of third feature maps and the plurality of fourth feature maps ("The computing system may generate, by the feature weighting module, a pixel-wise weighting map for each of the one or more up-sampled and warped second feature maps. The computing system may further multiply the pixel-wise weighting map with the corresponding up-sampled and warped second feature map to generate a reweighted feature map for the corresponding second frame," par. 47);

identifying any one collocated sample by considering sizes of sample values of collocated samples of the plurality of modified difference maps corresponding to a current sample of the first frame ("the feature reweighting module may be a 3-layer convolutional neural network, which may take the RGB-D of the zero-upsampled current frame as well as the zero-upsampled, warped previous frames as input, and generate a pixel-wise weighting map for each previous frame, with values between 0 and 10, where 10 is a hyperparameter," par. 47);

and determining a motion vector of the current sample ("The determining may comprise identifying a motion vector for the corresponding second frame having the resolution lower than the target resolution and resizing the motion vector to the target resolution based on bilinear up-sampling," par. 44).

Xiao et al. do not explicitly teach determining a filter kernel used to obtain one of the plurality of second feature maps corresponding to one of the plurality of modified difference maps comprising the identified collocated sample. However, Song et al. teach this limitation ("As the convolution operation between the first feature map FM1 and the kernel is performed, a channel of the second feature map FM2 may be generated," par. 61). Xiao et al. and Song et al. are combined as per claim 1.

Claim 16

Regarding claim 16, Xiao et al. and Song et al. teach the image processing method of claim 15 as noted above. Xiao et al. do not explicitly teach wherein a first stride used in the first pooling process, and a second stride used in the second pooling process, are different from each other. However, Song et al. teach this limitation ("The neural network apparatus may generate the sub-kernels 540 by dividing the second kernel 530 based on a stride value. According to an embodiment, the neural network apparatus may divide the second kernel 530 into the sub-kernels 540, the number of the sub-kernels 540 corresponding to a value obtained by squaring the stride value," par. 79). Xiao et al. and Song et al. are combined as per claim 15.

Claim 17

Regarding claim 17, Xiao et al. and Song et al. teach the image processing method of claim 16 as noted above. Xiao et al. do not explicitly teach wherein the first size and the first stride are greater than the second size and the second stride. However, Song et al. teach this limitation ("The neural network apparatus may generate the sub-kernels 540 by dividing the second kernel 530 based on a stride value. According to an embodiment, the neural network apparatus may divide the second kernel 530 into the sub-kernels 540, the number of the sub-kernels 540 corresponding to a value obtained by squaring the stride value," par. 79). Xiao et al. and Song et al. are combined as per claim 15.

Claim 18

Regarding claim 18, Xiao et al. and Song et al. teach the image processing method of claim 17 as noted above. Xiao et al. do not explicitly teach wherein the first size and the first stride are k and k is a natural number, and wherein the second size and the second stride are k/2. However, Song et al. teach this limitation ("The neural network apparatus may generate the sub-kernels 540 by dividing the second kernel 530 based on a stride value. According to an embodiment, the neural network apparatus may divide the second kernel 530 into the sub-kernels 540, the number of the sub-kernels 540 corresponding to a value obtained by squaring the stride value," par. 79). Xiao et al. and Song et al. are combined as per claim 15.

Claim 19

Regarding claim 19, Xiao et al. and Song et al. teach the image processing method of claim 15 as noted above. Song et al. do not explicitly teach obtaining, from a neural network, a first weight applied to the plurality of third feature maps, and a second weight applied to the plurality of fourth feature maps. However, Xiao et al. teach this limitation ("this subnetwork may process each input frame individually and share weights across all frames except for the current frame … In particular embodiments, the initial feature map may be based on a first number of channels whereas each of the first feature map and the one or more second feature maps may be based on a second number of channels," par. 42). Xiao et al. and Song et al. are combined as per claim 15.

Claim 20

Regarding claim 20, Xiao et al. and Song et al. teach the image processing method of claim 15 as noted above. Xiao et al. teach:

obtaining the plurality of modified difference maps by weighted-summing the plurality of third feature maps and the plurality of fourth feature maps ("The computing system may generate, by the feature weighting module, a pixel-wise weighting map for each of the one or more up-sampled and warped second feature maps. The computing system may further multiply the pixel-wise weighting map with the corresponding up-sampled and warped second feature map to generate a reweighted feature map for the corresponding second frame," par. 47), based on a first preliminary weight and a second preliminary weight that are output from a neural network ("this subnetwork may process each input frame individually and share weights across all frames except for the current frame … In particular embodiments, the initial feature map may be based on a first number of channels whereas each of the first feature map and the one or more second feature maps may be based on a second number of channels," par. 42);

determining motion vectors corresponding to samples of the first frame, from the plurality of modified difference maps ("In particular embodiments, the computing system may determine the motion estimation between the associated second time and the first time," par. 44);

and motion-compensating the second frame based on the motion vectors ("In particular embodiments, generating the reconstructed frame corresponding to the first frame may comprise combining the up-sampled first feature map and the reweighted feature maps associated with the one or more second frames," par. 49);

wherein the neural network is trained based on first loss information corresponding to a difference between the motion-compensated second frame and the first frame ("The training loss of our method, as given in Eq. (1), may be a weighted combination of the perceptual loss computed from a pretrained VGG-16 network and the structural similarity index (SSIM)," par. 51). Xiao et al. and Song et al. are combined as per claim 15.

2nd Claim Rejections - 35 U.S.C. § 103

Claim 7 is rejected under 35 U.S.C. 103 as obvious over US 2021/0366082 A1 (Xiao et al.) and US 2019/0138898 A1 (Song et al.) in view of U.S. Patent No. 11,604,993 B1 (Tan).

Claim 7

Regarding claim 7, Xiao et al. and Song et al. teach the image processing device of claim 6 as noted above. Xiao et al. and Song et al. do not explicitly teach wherein the neural network is trained further based on second loss information indicating how much a sum of the first preliminary weight and the second preliminary weight differs from a predetermined threshold. However, Tan teaches this limitation ("The difference may comprise a loss calculated between the target weights and weights specified by the filter. In some examples, the techniques may comprise adjusting parameters associated with a filter of a convolutional layer based at least in part on the second loss (e.g., adjusting weights of the filter to reduce a magnitude of the second loss)," col. 2, line 34).
Therefore, taking the teachings of Xiao et al., Song et al., and Tan as a whole, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the feature map generating methods taught by Xiao et al. and the kernel division methods taught by Song et al. to use the weight difference threshold calculation taught by Tan. The suggestion/motivation for doing so would have been "(e.g., adjusting weights of the filter to reduce a magnitude of the second loss). This may functionally drive parameters associated with the filter towards the target parameters," as noted by Tan in paragraph [10]. The combination would predictably have higher efficiency, as there is a reasonable expectation that the optimized filter weights derived from the threshold calculation would result in faster convergence and improved accuracy of the neural network model; and/or doing so merely combines prior art elements according to known methods to yield predictable results.

3rd Claim Rejections - 35 U.S.C. § 103

Claim 8 is rejected under 35 U.S.C. 103 as obvious over US 2021/0366082 A1 (Xiao et al.) and US 2019/0138898 A1 (Song et al.) in view of US 2022/0108183 A1 (Arpit).

Claim 8

Regarding claim 8, Xiao et al. and Song et al. teach the image processing device of claim 6 as noted above. Xiao et al. and Song et al. do not explicitly teach wherein the neural network is trained further based on third loss information indicating how small negative values of the first preliminary weight and the second preliminary weight are. However, Arpit teaches this limitation ("The training aims to minimize the loss L(Enc,Dec;λ,τ,B,K) based on the theory above, where λ is the regularization weight, τ is the temperature hyperparameter, B is the mini-batch size, and K≥B is the number of samples used to estimate the negative component of the constructive loss," par. 33).

Therefore, taking the teachings of Xiao et al., Song et al., and Arpit as a whole, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the feature map generating methods taught by Xiao et al. and the kernel division methods taught by Song et al. to use the weight regularization techniques taught by Arpit. The suggestion/motivation for doing so would have been "The negative component is minimized when the representations of the latent variables z_q are uniformly distributed over the unit hyper-sphere. Minimizing the negative component may minimize the contrastive loss," as noted by Arpit in paragraph [0043]. The combination would predictably have higher efficiency, as there is a reasonable expectation that the resulting model would produce more uniformly distributed representations, thereby reducing contrastive loss and improving feature representation quality; and/or doing so merely combines prior art elements according to known methods to yield predictable results.

4th Claim Rejections - 35 U.S.C. § 103

Claim 11 is rejected under 35 U.S.C. 103 as obvious over US 2021/0366082 A1 (Xiao et al.) and US 2019/0138898 A1 (Song et al.) in view of US 2022/0292806 A1 (Su et al.).
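The auxiliary loss terms at issue in claims 7 and 8 above reduce to simple penalties on the two preliminary weights. An illustrative sketch follows; the exact functional forms and the threshold value are assumptions for illustration, and neither Tan nor Arpit gives these expressions:

```python
# Claim 7: "second loss" measures how far the sum of the two preliminary
# weights is from a predetermined threshold (threshold value assumed here).
def second_loss(w1, w2, threshold=1.0):
    return abs((w1 + w2) - threshold)

# Claim 8: "third loss" penalizes negative preliminary weights; it is zero
# when both weights are non-negative.
def third_loss(w1, w2):
    return max(0.0, -w1) + max(0.0, -w2)

# Both auxiliary terms vanish for well-behaved weights that sum to the threshold.
total_aux = second_loss(0.6, 0.4) + third_loss(0.6, 0.4)
```

Under this reading, training against the second loss drives the weighted sum toward a convex combination, while the third loss discourages sign flips in the blending weights.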
Claim 11

Regarding Claim 11, Xiao et al. and Song et al. teach the image processing method of claim 10 as noted above. Xiao et al. teach a first distance between samples of the first frame on which a first convolution operation with the first filter kernel is performed, and a second distance between samples of the second frame on which a second convolution operation with the plurality of second filter kernels is performed ("generating the first feature map for the first frame and the one or more second feature maps for the one or more second frames may be based on one or more convolutional neural networks … The numbers under each network layer represent the output channels at corresponding layers. The filter size is 3×3 at all layers," par. 42). Xiao et al. and Song et al. do not explicitly teach that both the first distance between samples and the second distance between samples are greater than 1. However, Su et al. teaches a first distance between samples and a second distance between samples that are greater than 1 ("Accordingly, a dilation rate of three (D=3) 206 implies that between two kernel elements 208 there are two free spaces, and the size of the receptive field grows to 7×7," par. 61). Therefore, taking the teachings of Xiao et al., Song et al., and Su et al. as a whole, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention of the instant application to modify the feature map generating methods as taught by Xiao et al. and the kernel division methods as taught by Song et al. to use the dilated convolution techniques as taught by Su et al. The suggestion/motivation for doing so would have been that "Dilated convolution, which means the convolution kernel has a dilation rate of at least two or higher, as illustrated in FIG. 2 may also have the effect of increasing the receptive field, while still capture well the local information," as noted by the Su et al. disclosure in paragraph [0060]. The combination would also predictably improve efficiency, as there is a reasonable expectation that the increased receptive field would allow broader context to be captured in the feature maps without a proportional increase in computational parameters or loss of resolution; and/or doing so merely combines prior art elements according to known methods to yield predictable results.

5th Claim Rejections - 35 USC § 103

Claims 12 and 13 are rejected under 35 U.S.C. 103 as obvious over US Patent Publication 2021/0366082 A1 (Xiao et al.) and US Patent Publication 2019/0138898 A1 (Song et al.) in view of US Patent Publication 2018/0174031 A1 (Yang et al.).

Claim 12

Regarding Claim 12, Xiao et al. and Song et al. teach the image processing method of claim 10 as noted above. Xiao et al. and Song et al. do not explicitly teach wherein, in the first filter kernel, a sample corresponding to the current sample of the first frame has a preset first value, and other samples of the first filter kernel have a value of 0. However, Yang et al. teaches wherein, in the first filter kernel, a sample corresponding to the current sample of the first frame has a preset first value, and other samples of the first filter kernel have a value of 0 ("Each of the 3×3 kernels in the identity-value convolutional layer P.sub.1 2021 contains numerical value “0” except those kernels located on the diagonal of the N×N kernels. Each of the diagonal kernels 2022 contains numerical value “0” in each of the eight perimeter positions and “1” in the center position," par. 92). Therefore, taking the teachings of Xiao et al., Song et al., and Yang et al. as a whole, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention of the instant application to modify feature map generating methods as taught by Xiao et al.
and the kernel division methods as taught by Song et al. to use the filter kernels of an identity-value convolutional layer as taught by Yang et al. The suggestion/motivation for doing so would have been that "As shown in FIG. 20C, the third replacement convolutional layer 2053 contains N×2N of 3×3 filter kernels formed by two identity-value convolutional layer P.sub.1 2021 each containing N×N of 3×3 filter kernels in a vertical stack. As a result, the third particular convolutional layer 2053 is configured for 2N-channel input and N-channel output," as noted by the Yang et al. disclosure in paragraph [0094]. The combination would also predictably provide additional utility, as there is a reasonable expectation that the resulting convolutional layer would maintain structural integrity while effectively reducing computational complexity, enabling efficient processing of higher-channel inputs into lower-channel outputs; and/or doing so merely combines prior art elements according to known methods to yield predictable results.

Claim 13

Regarding Claim 13, Xiao et al., Song et al., and Yang et al. teach the image processing method of claim 12 as noted above. Xiao et al. and Song et al. do not explicitly teach wherein, in the plurality of second filter kernels, any one sample has a preset second value, and other samples of the plurality of second filter kernels have a value of 0, and wherein positions of samples having the preset second value in the plurality of second filter kernels are different from each other.

[Figure 20A of Yang et al. shows the plurality of kernels in the identity-value layer.]

However, Yang et al. teaches wherein, in the plurality of second filter kernels, any one sample has a preset second value, and other samples of the plurality of second filter kernels have a value of 0, and wherein positions of samples having the preset second value in the plurality of second filter kernels are different from each other ("Each of the 3×3 kernels in the identity-value convolutional layer P.sub.1 2021 contains numerical value “0” except those kernels located on the diagonal of the N×N kernels. Each of the diagonal kernels 2022 contains numerical value “0” in each of the eight perimeter positions and “1” in the center position," par. 92). Xiao et al., Song et al., and Yang et al. are combined as per claim 12.

6th Claim Rejections - 35 USC § 103

Claim 14 is rejected under 35 U.S.C. 103 as obvious over US Patent Publication 2021/0366082 A1 (Xiao et al.) and US Patent Publication 2019/0138898 A1 (Song et al.) in view of US Patent Publication 2018/0174031 A1 (Yang et al.) and US Patent Publication 2019/0205740 A1 (Judd et al.).

Claim 14

Regarding Claim 14, Xiao et al., Song et al., and Yang et al. teach the image processing method of claim 13 as noted above. Xiao et al., Song et al., and Yang et al. do not explicitly teach wherein a sign of the preset first value and a sign of the preset second value are opposite to each other. However, Judd et al. teaches wherein a sign of the preset first value and a sign of the preset second value are opposite to each other ("In the example of FIG. 3, the calculation of the complete filter would take one additional cycle, only the first three cycles are shown here. The elements of both filters have the same values with opposite signs," par. 56). Therefore, taking the teachings of Xiao et al., Song et al., Yang et al., and Judd et al.
as a whole, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention of the instant application to modify the feature map generating methods as taught by Xiao et al. and the kernel division methods as taught by Song et al. to use the filter kernels of an identity-value convolutional layer as taught by Yang et al. and the filter calculation as taught by Judd et al. The suggestion/motivation for doing so would have been that the preset filter values, as modified to have opposite signs, can yield a predictable result of improved feature extraction, since the oppositely signed filters allow for direct, simultaneous, or accelerated calculation of both positive and negative feature responses within a single pass, thereby reducing the total number of required filters and associated convolutional operations. Thus, a person of ordinary skill would have appreciated giving the preset filter values opposite signs, since the claimed invention is merely a combination of old elements, in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.

References Cited

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US Patent Publication 2019/0362518 A1 to Croxford et al. discloses a method of processing video data representative of a video comprising a first and second frame to generate output data representative of at least one feature of the second frame.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KARSTEN F. LANTZ, whose telephone number is (571) 272-4564. The examiner can normally be reached Monday-Friday, 8:00-4:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Ms. Jennifer Mehmood, can be reached at 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Karsten F. Lantz/
Examiner, Art Unit 2664
Date: 3/5/2026

/JENNIFER MEHMOOD/
Supervisory Patent Examiner, Art Unit 2664
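As an editorial aside on the Su et al. dilated-convolution teaching cited in the rejection of claim 11 (a dilation rate of three leaves two free spaces between kernel elements and grows a 3×3 receptive field to 7×7), the receptive-field arithmetic can be checked with a short sketch. The helper below is illustrative only and does not come from any cited reference; it assumes the standard definition of dilation for a single 2-D convolution layer:

```python
def dilated_receptive_field(kernel_size, dilation):
    """Effective receptive field (per axis) of one dilated convolution:
    k_eff = k + (k - 1) * (d - 1).
    With dilation d, adjacent kernel elements are separated by d - 1
    free spaces, matching the 'two free spaces' in Su et al.'s D=3 example."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

# Su et al.'s example: a 3x3 kernel with dilation rate D=3 covers 7x7.
print(dilated_receptive_field(3, 3))  # 7
# An ordinary convolution (dilation 1) keeps its nominal size.
print(dilated_receptive_field(3, 1))  # 3
```

The claimed "distance between samples greater than 1" thus corresponds to a dilation rate of at least 2, where the receptive field already exceeds the nominal kernel size.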

Prosecution Timeline

Feb 29, 2024: Application Filed
Mar 06, 2026: Non-Final Rejection, §103 (current)


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: Favorable
Median Time to Grant: 2y 9m
PTA Risk: Low

Based on 0 resolved cases by this examiner; grant probability derived from career allow rate.
