DETAILED ACTION
Status of Claims
Claim(s) 1-24 and 31-36 are pending and are examined herein.
Claim(s) 1 and 13 have been amended. Claim(s) 25-30 were previously canceled.
Claim(s) 1-24 and 31-36 remain rejected under 35 U.S.C. § 103.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed on July 30, 2025 has been entered. Claims 1-24 and 31-36 are pending in the application. Applicant’s amendments to the claims have overcome the rejection under 35 U.S.C. § 101 previously set forth in the Non-Final Office Action mailed on May 02, 2025. Applicant’s amendments to the claims have been fully considered and are addressed in the rejections below.
Response to Arguments
Applicant's arguments with respect to the rejection under 35 U.S.C. § 103, filed on 07/30/2025 (see Remarks, pp. 21-22), have been fully considered but are moot in view of the new grounds of rejection necessitated by Applicant's amendments.
The examiner refers to the updated rejection under 35 U.S.C. § 103 for more details.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claim(s) 31-36 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Specifically, newly presented claim 31 recites “An edge device capable of performing machine learning, comprising: at least one memory storing computer-executable instructions; and one or more processors configured to execute the computer-executable instructions and cause the processing system to: [perform the recited configuration steps].” The specification as filed describes elastic bottleneck blocks and a method/system for the configuration of an elastic bottleneck block (see, e.g., [0005], [0023]-[0025], [0030], [0043]-[0044], [0077], [0080], and [0110]-[0125]). While the specification supports that the described method or system may be used to deploy a machine learning model that includes elastic bottleneck blocks on hardware devices, the disclosure does not provide written description of an edge device structurally integrated with a processing system that performs the claimed configuration steps. Nor does the disclosure provide written description of an edge device that sends a request or remote command that causes the processing system to execute the configuration steps.
Accordingly, the claim language introduces limitations (i.e., an edge device comprising processors and memory configured to cause a processing system to perform the recited steps) that lack adequate written description.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION. —The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim(s) 31-36 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, for pre-AIA the applicant regards as the invention.
Specifically, claim 31 recites: “An edge device capable of performing machine learning, comprising: at least one memory storing computer-executable instructions; and one or more processors configured to execute the computer-executable instructions and cause the processing system to” perform the recited configuration steps. However, the recited “the processing system” lacks sufficient antecedent basis in the claim. It is unclear from the claim how the edge device executes or interacts with the processing system to perform the recited steps. The specification does not provide any detail about an edge device that structurally contains a processing system to perform these operations, nor does it describe how the edge device would cause a processing system to perform such operations.
Accordingly, the metes and bounds of the claim cannot be ascertained, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.
Dependent claims 32-36 inherit the deficiencies of their parent claim 31 and are therefore rejected for the same reason.
For examination purposes, the claimed “edge device” is interpreted as “a processing system” capable of performing the recited steps.
Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1-2, 8-10, 13-14, 20-22, and 31-32 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al. (NPL: "Dynamic Recursive Neural Network." (2019)) in view of Hanagandi et al., (Pub. No.: US 20210192336 A1).
Regarding Amended Claim 1,
Guo discloses the following:
A method of machine learning, comprising: (Guo, [P. 1, Section: 1, Col 2] “… we propose Dynamic Recursive Neural Network which can reuse blocks dynamically. Fig. 1 gives an overview of our approach. In DRNN, feature that is embedded with high-level information can be brought back to refine low-level filters. In this way, the recursive structure makes full use of the parameters. We introduce a gate unit to determine whether to jump out of the loop in advance. It means that different inputs could loop different times in dynamic recursive blocks, which significantly saves computational resources.” [P. 6, Col. 1] “Our proposed approach for dynamic recursive networks is applicable to both deep network architectures and shallower ones.” [P. 6, Col. 2] “Our proposed method can be applied not only to deep networks, but also to shallower networks like ResNet-20. DRResNet 16 can also reduce parameters and computation by 33.3% and 22.0% while outperforming its counterpart.”)
dynamically configuring, at runtime, a number of loops for a convolution layer of an elastic bottleneck block in a machine learning model architecture; (Guo, [Abstract] “we demonstrate that the DRNN can achieve better performance with fewer blocks by employing block recursively. We further add a gate structure to each block, which can adaptively decide the loop times of recursive blocks to reduce the computational cost.” [P. 1, Col. 2] “… we propose Dynamic Recursive Neural Network which can reuse blocks dynamically. Fig. 1 gives an overview of our approach. … We introduce a gate unit to determine whether to jump out of the loop in advance. It means that different inputs could loop different times in dynamic recursive blocks, which significantly saves computational resources.” [P. 7, Col. 2] “Due to our proposed approach for dynamic recursive networks is general, we also apply dynamic recursive block on MobileNetV2. The result show that the training of dynamic recursive models can be applied to convolution and depthwise-sparable convolution layers in different building blocks. These results indicate that the parameters of convolutional layer is underused and DRNN is an effective means to adaptively assemble network graph on the fly.”) [Examiner’s Note: the gate structure adaptively decides the loop times (i.e., decides whether to jump out of the loop in advance), so that different inputs loop a different number of times in the dynamic recursive blocks. This reads on dynamically configuring a number of loops for a convolution layer of an elastic bottleneck block. The gate structure implemented on the residual bottleneck block of a machine learning architecture (e.g., ResNet and MobileNetV2) corresponds to the claimed “elastic bottleneck block.”] for each loop of the number of loops: …, performing a convolution operation … (Guo, [P. 3, Col. 1] “More precisely, let x^i be the input of the i-th loop of the block, F(x^i) be the output and the number of loops is N. We denote x^0 as the input of the recursive block, then x^N is the output of the block.” [P. 5, Figure 5] “Next, the outputs of convolution layer will be divided in to two groups(green and gray) and normalized independently.”)
Guo does not appear to explicitly teach:
for each loop of the number of loops: loading a loop-specific set of convolution weights; performing a convolution operation using the loop-specific set of convolution weights; and storing loop-specific convolution results in a local memory; and determining an output of the convolution layer with the configured number of loops based on a summation of loop-specific convolution results associated with each loop of the number of loops.
However, Hanagandi, in combination with Guo, teaches the following:
for each loop of the number of loops: loading a loop-specific set of convolution weights; (Hanagandi, [0035] “At each clock cycle in the convolution computation, the controller 310 can access the weights kernel 500 from the memory 305, can select (i.e., can be adapted to select, can configured to select, can execute a program to cause selection of, etc.) a specific weight value from the weights kernel 500 and can load (i.e., can be adapted to load, can be configured to load, can execute a program to cause loading of, etc.) that specific weight value into all the primary processing elements 301 in the sub-array 399 and, particularly, into the multipliers 323 of all of MAC units 322 therein …etc.”) performing a convolution operation using the loop-specific set of convolution weights; (Hanagandi, [0035] “so that each MAC unit 322 of each primary processing element 301 can perform a MAC operation using the corresponding activation value stored in the register 325 and that specific weight value.” [0055] “The method can further include, after the pre-loading of the activation values at process step 2024, performing the convolution operation at process steps 2026-2028 of FIG. 20. Specifically, the method can include, at each clock cycle, accessing (e.g., by the controller 310 from the memory 305) the M×M weights kernel and selecting (e.g., by the controller 310) a specific weight value. The method can further include loading the specific weight value into the multipliers 323 of all the MAC units 322 of all of the primary processing elements 301 so that within each MAC Unit of each primary processing element the multiplier determines the product of (i.e., multiplies) the stored activation value by the specific weight value. …etc.”) storing loop-specific convolution results in a local memory; (Hanagandi, [0037] “More specifically, at the end of each clock cycle in the convolution computation, accumulated partial product inputs are forwarded by the processing elements to all immediately adjacent processing elements such that each multiplexor in each processing element receives accumulated partial product inputs from all immediately adjacent processing elements. [0040] “Within the second processing elements 302, the selected accumulated partial product input 338 will simply be buffered (i.e., temporarily held) by the buffer 331 and then output at the end of the clock cycle to each immediately adjacent processing element for possible selection in the next clock cycle (i.e., as an accumulated partial product input available for selection by the multiplexor of that adjacent processing element during a next clock cycle).”) and determining an output of the convolution layer with the configured number of loops based on a summation of loop-specific convolution results associated with each loop of the number of loops. (Hanagandi, [0040] “The accumulator 324 can then determine the sum 329 (i.e., can be adapted to determine the sum, can be configured to determine the sum, etc.) of the product 327 from the multiplier 322 and the selected accumulated partial product 328 from the multiplexor 325.” [Abstract] “the computation can be completed in a relatively low number of clock cycles, which is independent of the number of activation values in the activation matrix and which is equal to the number of weight values in a weights kernel.”)
Accordingly, before the effective filing date of the claimed invention, it would have been prima facie obvious to one of ordinary skill in the art to modify Guo to incorporate the hardware implementation of convolution computation as taught by Hanagandi. One would have been motivated to make such a combination in order to reduce the area and energy consumption associated with convolution computations. Doing so would enable efficient neural network processing (Hanagandi [0025]-[0026]).
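For illustration only, the following minimal Python sketch depicts the loop-wise convolution arrangement mapped to claim 1 above (a dynamically configured loop count, a loop-specific weight set per loop, loop results held locally, and a final summation). The function and variable names are hypothetical and are not drawn from Guo or Hanagandi.

```python
# Illustrative sketch only; names are hypothetical and not drawn from Guo or Hanagandi.
import numpy as np

def elastic_pointwise_layer(x, loop_weights):
    """x: activations of shape (C_in, H, W); loop_weights: one (C_out, C_in) weight set per loop.
    The number of loops is len(loop_weights), which may be chosen at runtime (e.g., by a gate)."""
    local_results = []                          # loop-specific results stored in "local memory"
    for w in loop_weights:                      # for each loop of the configured number of loops
        y = np.einsum('oc,chw->ohw', w, x)      # 1x1 (pointwise) convolution with this loop's weights
        local_results.append(y)                 # store the loop-specific convolution result
    return np.sum(local_results, axis=0)        # layer output = summation of per-loop results

# Example usage: the loop count is configured dynamically at runtime.
x = np.random.randn(8, 4, 4)                    # 8 input channels, 4x4 spatial map
num_loops = 3                                   # dynamically configured number of loops
weights = [np.random.randn(16, 8) for _ in range(num_loops)]
out = elastic_pointwise_layer(x, weights)       # shape (16, 4, 4)
```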
Regarding Original Claim 2, Guo in view of Hanagandi teaches the elements of claim 1 as outlined above, and further teaches:
further comprising, for each loop of the number of loops, accumulating the loop-specific convolution results to a current convolution results value stored in the local memory. (Hanagandi, [0055]-[0059] “The method can also include, during each successive clock cycle of the convolution computation, …., the accumulator 324 determines the sum 329 of (i.e., accumulates) the product 327 from the multiplier 323 and the selected accumulated partial product input 328 from the multiplexor 325 and then outputs the sum 329 at the end of the clock cycle to all immediately adjacent processing elements for possible selection in the next clock cycle; ….”)
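For illustration only, a minimal variant of the sketch above (hypothetical names) in which each loop's result is accumulated into a current results value held in local memory rather than summed after all loops complete, as recited in claim 2:

```python
# Illustrative sketch only; hypothetical names.
import numpy as np

def elastic_pointwise_layer_accumulating(x, loop_weights):
    acc = None                                  # current convolution results value in local memory
    for w in loop_weights:                      # for each loop of the number of loops
        y = np.einsum('oc,chw->ohw', w, x)      # loop-specific convolution result
        acc = y if acc is None else acc + y     # accumulate into the stored current value
    return acc                                  # equals the summation over all loops
```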
Regarding Original Claim 8, Guo in view of Hanagandi teaches the elements of claim 1 as outlined above, and further teaches:
Guo further teaches: wherein the convolution layer is one of a plurality of convolution layers in the elastic bottleneck block. (Guo, [P. 7, Section: 4.2] “For the deeper network, reusing the parameters of convolutional layer first improve accuracy, before increasing the loop time further decrease accuracy insignificantly. This demonstrates that reusing the parameters of convolutional layer is efficient to improve the capacity of network by dynamic recursive mechanism. … The recursive structure benefits feature re-usage. Due to our proposed approach for dynamic recursive networks is general, we also apply dynamic recursive block on MobileNetV2. The result show that the training of dynamic recursive models can be applied to convolution and depthwise-sparable convolution layers in different building blocks. These results indicate that the parameters of convolutional layer is underused and DRNN is an effective means to adaptively assemble network graph on the fly.”)
Regarding Original Claim 9, Guo in view of Hanagandi teaches the elements of claim 8 as outlined above, and further teaches:
Guo further teaches: wherein the convolution layer comprises a pointwise convolution layer. (Guo, [P. 7, Section: 4.2] “For the deeper network, reusing the parameters of convolutional layer first improve accuracy, before increasing the loop time further decrease accuracy insignificantly. This demonstrates that reusing the parameters of convolutional layer is efficient to improve the capacity of network by dynamic recursive mechanism. … The recursive structure benefits feature re-usage. Due to our proposed approach for dynamic recursive networks is general, we also apply dynamic recursive block on MobileNetV2. The result show that the training of dynamic recursive models can be applied to convolution and depthwise-sparable convolution layers in different building blocks. These results indicate that the parameters of convolutional layer is underused and DRNN is an effective means to adaptively assemble network graph on the fly.”) [Examiner’s Note: the depthwise-separable convolution layer is composed of a depthwise convolution layer and a pointwise convolution layer.]
Regarding Original Claim 10, Guo in view of Hanagandi teaches the elements of claim 8 as outlined above, and further teaches:
Guo further teaches: wherein the convolution layer comprises a depthwise convolution layer. (Guo, [P. 7, Section: 4.2] “For the deeper network, reusing the parameters of convolutional layer first improve accuracy, before increasing the loop time further decrease accuracy insignificantly. This demonstrates that reusing the parameters of convolutional layer is efficient to improve the capacity of network by dynamic recursive mechanism. … The recursive structure benefits feature re-usage. Due to our proposed approach for dynamic recursive networks is general, we also apply dynamic recursive block on MobileNetV2. The result show that the training of dynamic recursive models can be applied to convolution and depthwise-sparable convolution layers in different building blocks. These results indicate that the parameters of convolutional layer is underused and DRNN is an effective means to adaptively assemble network graph on the fly.”) [Examiner’s Note: the depthwise-separable convolution layer is composed of a depthwise convolution layer and a pointwise convolution layer.]
Regarding Amended Claim 13,
Guo discloses the following:
A processing system comprising: at least one memory storing computer-executable instructions; and one or more processors configured to execute the computer-executable instructions and cause the processing system to: dynamically configuring, at runtime, a number of loops for a convolution layer of an elastic bottleneck block in a machine learning model architecture; (Guo, [Abstract] “This paper proposes the dynamic recursive neural network (DRNN), which simplifies the duplicated building blocks in deep neural network. Different from forwarding through different blocks sequentially in previous networks, we demonstrate that the DRNN can achieve better performance with fewer blocks by employing block recursively. We further add a gate structure to each block, which can adaptively decide the loop times of recursive blocks to reduce the computational cost. Since the recursive networks are hard to train, we propose the Loopy Variable Batch Normalization (LVBN) to stabilize the volatile gradient. Further, we improve the LVBN to correct statistical bias caused by the gate structure. Experiments show that the DRNN reduces the parameters and computational cost and while outperforms the original model in term of the accuracy consistently on CIFAR10 and ImageNet-1k. Lastly we visualize and discuss the relation between image saliency and the number of loop time.” [P. 1, Col. 2] “… we propose Dynamic Recursive Neural Network which can reuse blocks dynamically. Fig. 1 gives an overview of our approach. … We introduce a gate unit to determine whether to jump out of the loop in advance. It means that different inputs could loop different times in dynamic recursive blocks, which significantly saves computational resources.” [P. 7, Col. 2] “Due to our proposed approach for dynamic recursive networks is general, we also apply dynamic recursive block on MobileNetV2. The result show that the training of dynamic recursive models can be applied to convolution and depthwise-sparable convolution layers in different building blocks. These results indicate that the parameters of convolutional layer is underused and DRNN is an effective means to adaptively assemble network graph on the fly.” [Section 4] “4. Experiments: We perform a series experiments to evaluate our Dynamic Recursive Neural Network on image classification benchmarks.”) [Examiner’s Note: the gate structure adaptively decides the loop times (i.e., decides whether to jump out of the loop in advance), so that different inputs loop a different number of times in the dynamic recursive blocks. This reads on dynamically configuring a number of loops for a convolution layer of an elastic bottleneck block. The gate structure implemented on the residual bottleneck block of a machine learning architecture (e.g., ResNet and MobileNetV2) corresponds to the claimed “elastic bottleneck block.” The DRNN architecture is a computer-based network architecture and would inherently include a memory and one or more processors to execute it.] for each loop of the number of loops: …, performing a convolution operation … (Guo, [P. 3, Col. 1] “More precisely, let x^i be the input of the i-th loop of the block, F(x^i) be the output and the number of loops is N. We denote x^0 as the input of the recursive block, then x^N is the output of the block.” [P. 5, Figure 5] “Next, the outputs of convolution layer will be divided in to two groups(green and gray) and normalized independently.”)
Guo does not appear to explicitly teach:
for each loop of the number of loops: loading a loop-specific set of convolution weights; performing a convolution operation using the loop-specific set of convolution weights; and storing loop-specific convolution results in a local memory; and determining an output of the convolution layer with the configured number of loops based on a summation of loop-specific convolution results associated with each loop of the number of loops.
However, Hanagandi, in combination with Guo, teaches the following:
for each loop of the number of loops: loading a loop-specific set of convolution weights; (Hanagandi, [0035] “At each clock cycle in the convolution computation, the controller 310 can access the weights kernel 500 from the memory 305, can select (i.e., can be adapted to select, can configured to select, can execute a program to cause selection of, etc.) a specific weight value from the weights kernel 500 and can load (i.e., can be adapted to load, can be configured to load, can execute a program to cause loading of, etc.) that specific weight value into all the primary processing elements 301 in the sub-array 399 and, particularly, into the multipliers 323 of all of MAC units 322 therein …etc.”) performing a convolution operation using the loop-specific set of convolution weights; (Hanagandi, [0035] “so that each MAC unit 322 of each primary processing element 301 can perform a MAC operation using the corresponding activation value stored in the register 325 and that specific weight value.” [0055] “The method can further include, after the pre-loading of the activation values at process step 2024, performing the convolution operation at process steps 2026-2028 of FIG. 20. Specifically, the method can include, at each clock cycle, accessing (e.g., by the controller 310 from the memory 305) the M×M weights kernel and selecting (e.g., by the controller 310) a specific weight value. The method can further include loading the specific weight value into the multipliers 323 of all the MAC units 322 of all of the primary processing elements 301 so that within each MAC Unit of each primary processing element the multiplier determines the product of (i.e., multiplies) the stored activation value by the specific weight value. …etc.”) storing loop-specific convolution results in a local memory; (Hanagandi, [0037] “More specifically, at the end of each clock cycle in the convolution computation, accumulated partial product inputs are forwarded by the processing elements to all immediately adjacent processing elements such that each multiplexor in each processing element receives accumulated partial product inputs from all immediately adjacent processing elements. [0040] “Within the second processing elements 302, the selected accumulated partial product input 338 will simply be buffered (i.e., temporarily held) by the buffer 331 and then output at the end of the clock cycle to each immediately adjacent processing element for possible selection in the next clock cycle (i.e., as an accumulated partial product input available for selection by the multiplexor of that adjacent processing element during a next clock cycle).”) and determining an output of the convolution layer with the configured number of loops based on a summation of loop-specific convolution results associated with each loop of the number of loops. (Hanagandi, [0040] “The accumulator 324 can then determine the sum 329 (i.e., can be adapted to determine the sum, can be configured to determine the sum, etc.) of the product 327 from the multiplier 322 and the selected accumulated partial product 328 from the multiplexor 325.” [Abstract] “the computation can be completed in a relatively low number of clock cycles, which is independent of the number of activation values in the activation matrix and which is equal to the number of weight values in a weights kernel.”)
Accordingly, before the effective filing date of the claimed invention, it would have been prima facie obvious to one of ordinary skill in the art to modify Guo to incorporate the hardware implementation of convolution computation as taught by Hanagandi. One would have been motivated to make such a combination in order to reduce the area and energy consumption associated with convolution computations. Doing so would enable efficient neural network processing (Hanagandi [0025]-[0026]).
Regarding Original Claim 14,
The claim recites substantially similar limitations as corresponding claim 2 and is rejected for similar reasons as claim 2 using similar teachings and rationale.
Regarding Original Claim 20,
The claim recites substantially similar limitations as corresponding claim 8 and is rejected for similar reasons as claim 8 using similar teachings and rationale.
Regarding Original Claim 21,
The claim recites substantially similar limitations as corresponding claim 9 and is rejected for similar reasons as claim 9 using similar teachings and rationale.
Regarding Original Claim 22,
The claim recites substantially similar limitations as corresponding claim 10 and is rejected for similar reasons as claim 10 using similar teachings and rationale.
Regarding New Claim 31,
As established earlier, claim 31 exhibits issues under 112(a) and 112(b); hence, for the purposes of examination, the claim is broadly interpreted to recite:
“A processing system capable of performing machine learning, comprising:
at least one memory storing computer-executable instructions; and
one or more processors configured to execute the computer-executable instructions and cause the processing system to: …etc.”
Accordingly, the claim recites substantially similar limitations as corresponding claims 1 and 13 and is rejected for similar reasons as claims 1 and 13 using similar teachings and rationale.
New claim 31 further recites “dynamically configure, at runtime, a number of loops for a convolution layer of an elastic bottleneck block in a machine learning model architecture executable on the edge device”.
Guo also teaches: (Guo, [P. 2] “DRNN, combined with the gate unit and I-LVBN, achieves even better performance while accelerating the inference of deep networks. To evaluate DRNN, we use MobileNetV2 [33] and ResNet [13] as the base models on classification and other visual tasks. We approve that, with the progressive strategy designed for recursive network, Dynamic Recursive (DR) ResNet-53 outperforms ResNet-101 while reducing model parameters by 47.0% and computational cost by 35.2%. Further, we study the dynamic recursive behavior of the learned model and reveal the relation between the image saliency and the number of loop time.” [P. 2, Col. 2] “Thanks to the loop structure controlled by gate units, DRNN could reuse one block dynamically. In experiments, we prove even compact model like MobileNetV2 could be further improved by applying the dynamic recursive block.” [P. 7, Section: 4.2] “As expected, decreasing the target rate reduces computation time. Interestingly, increasing loop times leads to better result than reducing the execution rate. More loop time means that more high-level information is used to refine the low-level filters to get a stronger ability for representation. The recursive structure benefits feature re-usage. Due to our proposed approach for dynamic recursive networks is general, we also apply dynamic recursive block on MobileNetV2. The result show that the training of dynamic recursive models can be applied to convolution and depthwise-sparable convolution layers in different building blocks. These results indicate that the parameters of convolutional layer is underused and DRNN is an effective means to adaptively assemble network graph on the fly.”) [Examiner’s Note: the DRNN approach is applied to machine learning model architectures (e.g., MobileNetV2 and ResNet) that are executable on edge devices.]
Regarding New Claim 32,
The claim recites substantially similar limitations as corresponding claim 2 and is rejected for similar reasons as claim 2 using similar teachings and rationale.
Claim(s) 3, 15, and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Guo in view of Hanagandi as described above, and further in view of Oh et al., (NPL: “RRNet: Repetition-reduction network for energy efficient depth estimation” (2019)).
Regarding Original Claim 3, Guo in view of Hanagandi teaches the elements of claim 1 as outlined above:
As explained above, Guo teaches the gate unit that dynamically controls the loop times of the convolution layer in the bottleneck block.
Guo in view of Hanagandi does not appear to explicitly teach:
determining an intermediate layer mode for the convolution layer of the elastic bottleneck block; and configuring a loop parameter based on the intermediate layer mode, wherein the loop parameter configures the number of loops.
However, Oh, in combination with Guo and Hanagandi, teaches the limitations:
determining an intermediate layer mode for the convolution layer of the elastic bottleneck block; and configuring a loop parameter based on the intermediate layer mode, wherein the loop parameter configures the number of loops. (Oh, [Pp. 3-4, Section: 3] “To leverage this critical reduction potential, the proposed RRNet intensifies this bottleneck layer by iterating a RR block and stacking each output per repetition. we derive repetition parameter r from the number of repetition of RR blocks. Each iteration can follow two paths. One path is for the next layer in the encoder; the other path leads to the decoder. In the decoder path, each output per iteration is stacked in the CDCs. This block is repeated by the parameter r. Here, we introduce another two hyperparameters, a reduction parameter rr and a expansion parameter re. rr is the number of output channel in stage 1 by pointwise convolution and re is the number of output channel present after expanding the input through 3x3 depthwise separable convolution (DWconv), which limit the computation or the number of parameters when the channel numbers change.”) [Examiner’s Note: Oh teaches determining the configuration or structure of the intermediate layer mode (i.e., the lightweight pointwise-depthwise-pointwise layers in the bottleneck block) scaled to a reduction or expansion mode. The repetition parameter r corresponds to the claimed “loop parameter.”]
Therefore, it would have been prima facie obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Guo, Hanagandi, and Oh before them, to incorporate the repetition-reduction (RR) block structure for energy-efficient depth estimation as taught by Oh. One would have been motivated to make such a combination in order to reduce GPU latency, power consumption, and memory usage in resource-constrained environments, while maintaining or improving the model performance (Oh [Abstract]).
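For illustration only, a minimal Python sketch (the mode names and loop counts are hypothetical and not drawn from Oh) of a loop parameter being configured from an intermediate layer mode, which in turn sets the number of loops for the convolution layer, as recited in claim 3:

```python
# Illustrative sketch only; the mode names and loop counts below are assumed, not from Oh.
MODE_TO_LOOP_PARAMETER = {"reduction": 1, "baseline": 2, "expansion": 4}

def configure_loop_parameter(intermediate_layer_mode: str) -> int:
    loop_parameter = MODE_TO_LOOP_PARAMETER[intermediate_layer_mode]  # derived from the layer mode
    return loop_parameter                                             # configures the number of loops

num_loops = configure_loop_parameter("expansion")                     # e.g., 4 loops for this layer
```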
Regarding Original Claim 15,
The claim recites substantially similar limitations as corresponding claim 3 and is rejected for similar reasons as claim 3 using similar teachings and rationale.
Regarding New Claim 33,
The claim recites substantially similar limitations as corresponding claim 3 and is rejected for similar reasons as claim 3 using similar teachings and rationale.
Claim(s) 4-5, 16-17, and 34 are rejected under 35 U.S.C. 103 as being unpatentable over Guo in view of Hanagandi as described above, and further in view of Gill et al., (Pub. No.: US 20210064955 A1).
Regarding Original Claim 4, Guo in view of Hanagandi teaches the elements of claim 1 as outlined above, and further teaches:
Guo further teaches: performing a nonlinear operation on the output of the convolution layer to generate intermediate activation data; (Guo, [P. 4, Figure 3] “Figure 3: Overview of LVBN when loop time is 3. After the convolution layer, a separated BN layer will be activated each time instead of the same BN layer activated for 3 times. The different style of arrow line represents being into different loop.” [0052] “we adopt a lightweight and universal design for gate unit shown in Fig. 4. We first apply global average pooling on the input feature map to squeeze global spatial information into a 1 x 1 x C channel descriptor and then project the feature to unnormalized scores s for the discrete outputs. We opt to employ two fully connected layers around the non-linearity. where δ refers to the ReLU function, W_1 ∈ R^(d×C), W_2 ∈ R^(2×d), x_p is the output of global average pooling layer and d is the dimension of the hidden layer. The gate unit constructed by this way adds only a computational overhead of 0.04%.”)
Guo in view of Hanagandi does not appear to explicitly teach:
providing the intermediate activation data as an input to a second convolution layer in the elastic bottleneck block.
However, Gill, in combination with Guo in view of Hanagandi, teaches the limitations:
performing a nonlinear operation on the output of the convolution layer to generate intermediate activation data; (Gill, [0133] “Forming a repeated convolutional application layer through one or more sub-layers may provide additional benefits in implementations of repeated convolution-based attention modules (including parallel repeated convolution-based attention modules as described below). For example, each sub-layer may enable the implementation to further add complexity and/or non-linearity to the module.”) and providing the intermediate activation data as an input to a second convolution layer in the elastic bottleneck block. (Gill, [0117] “the attention module output may be produced and provided as input to a subsequent repeated convolution-based attention module for further processing.” [0118] “… for applying the attention module output to a remaining neural network processing layer. In some embodiments, the remaining neural network processing layer may include one or more convolutional application layer(s), pooling layer(s), non-linearity layer(s), ….”)
Accordingly, before the effective filing date, it would have been prima facie obvious to one of ordinary skill in the art to modify the combination of Guo and Hanagandi to incorporate the repeated convolution-based attention module as taught by Gill. One would have been motivated to make such a combination in order to utilize a simplified structure that can be implemented in resource-constrained environments, including systems with constrained processing resources, memory resources, networking resources, or the like. Doing so would enable improved performance on edge devices that are constrained by such computational resources (Gill [0031] & [0042]).
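For illustration only, a minimal Python sketch (hypothetical names, not drawn from Guo, Hanagandi, or Gill) of a nonlinear operation applied to the first convolution layer's output, with the resulting intermediate activation data provided as input to a second convolution layer of the block, as recited in claim 4:

```python
# Illustrative sketch only; hypothetical names, not drawn from the cited references.
import numpy as np

def pointwise(x, w):                             # 1x1 convolution as a channel-mixing contraction
    return np.einsum('oc,chw->ohw', w, x)

def looped_layer(x, loop_weights):               # summed per-loop results, as in claim 1
    return sum(pointwise(x, w) for w in loop_weights)

def bottleneck_block(x, weights_layer1, weights_layer2):
    h = looped_layer(x, weights_layer1)          # first (looped) convolution layer
    h = np.maximum(h, 0.0)                       # nonlinear operation -> intermediate activation data
    return looped_layer(h, weights_layer2)       # provided as input to the second convolution layer
```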
Regarding Original Claim 5, Guo in view of Hanagandi teaches the elements of claim 1 as outlined above.
Guo in view of Hanagandi does not appear to explicitly suggest:
wherein the number of loops does not change an input size or an output size of the elastic bottleneck block
However, Gill, in combination with Guo in view of Hanagandi, teaches the limitation:
wherein the number of loops does not change an input size or an output size of the elastic bottleneck block. (Gill, [0172] “In some such embodiments, the attention input data object remains unchanged. As such, computing resources are spared by executing the initial convolutional application layer only once, and augmenting the results with the output of each iteration of the repeated convolutional application layers.”)
The same motivation that was utilized for combining Guo, Hanagandi, and Gill, as set forth in claim 4, is equally applicable to claim 5.
Regarding Original Claim 16,
The claim recites substantially similar limitations as corresponding claim 4 and is rejected for similar reasons as claim 4 using similar teachings and rationale.
Regarding Original Claim 17,
The claim recites substantially similar limitations as corresponding claim 5 and is rejected for similar reasons as claim 5 using similar teachings and rationale.
Regarding New Claim 34,
The claim recites substantially similar limitations as corresponding claim 5 and is rejected for similar reasons as claim 5 using similar teachings and rationale.
Claim(s) 6, 18, and 35 are rejected under 35 U.S.C. 103 as being unpatentable over Guo in view of Hanagandi as outlined above, and further in view of Zhong et al., (Pub. No.: CN 111416743 A).
Regarding Original Claim 6, Guo in view of Hanagandi teaches the elements of claim 1 as outlined above.
Guo in view of Hanagandi does not appear to explicitly describe the claim limitations:
loading bottleneck block configuration data; and configuring a plurality of convolution layers of the elastic bottleneck block based on the bottleneck block configuration data, wherein: the plurality of convolution layers includes the convolution layer, the bottleneck block configuration data configures a loop parameter for each respective layer of the plurality of convolution layers, and the bottleneck block configuration data configures an input size and an output size for each convolution layer of the plurality of convolution layers.
However, Zhong, in combination with Guo and Hanagandi, teaches:
loading bottleneck block configuration data; and configuring a plurality of convolution layers of the elastic bottleneck block based on the bottleneck block configuration data, (Zhong, [Pp. 2-3] “The invention is realized in this way, a convolution network accelerator configuration method, comprising: step one, judging the number of layers of a currently executed forward network layer in an overall network model through a mark, and obtaining a configuration parameter of the currently executed forward network layer; secondly, Load the feature map and weight parameters from the DDR through the configuration parameter; at the same time, the acceleration core of the convolutional layer also configures the parallelism according to the configuration parameters of the executed forward network layer.”) wherein: the plurality of convolution layers includes the convolution layer, the bottleneck block configuration data configures a loop parameter for each respective layer of the plurality of convolution layers, (Zhong, [Pp. 3-6] “Another objective of the present invention is to provide a convolutional network accelerator (a convolutional network accelerator composed of a parameter-configurable network structure and a loop call structure), which includes a convolutional layer, a pooling layer, an activation function layer, and a batch normalization operation layer; the convolution layer judges the number of layers of the currently executed forward network layer in the whole network model through the mark, obtains the configuration parameters of the currently executed forward network layer, and loads the characteristic diagram and the weight parameters from the DDR through the configuration parameters; meanwhile, the acceleration core of the convolutional layer configures the parallelism according to the configuration parameters of the executed forward network layer ….”) and the bottleneck block configuration data configures an input size and an output size for each convolution layer of the plurality of convolution layers. (Zhong, [Pp. 3-6] [0059]-[0061] “The convolutional network accelerator configuration method provided by the embodiment of the present invention includes: Step 1: Use the flag to determine the layer number of the currently executed forward network layer relative to the overall network model, and obtain the configuration parameters of the current layer, such as the size of the input and output feature maps (length, width, number of channels), the size of the convolution kernel (length, width, number of channels), the step size of the convolution and pooling operations, etc. Step 2: Load feature maps and weight parameters in batches from DDR (double data rate off-chip memory) by configuring parameters. At the same time, the acceleration kernel of the convolution layer can also configure the parallelism according to the configuration parameters.”)
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, having the combination of Guo, Hanagandi, and Zhong before them, to incorporate the configuration method for a convolutional network accelerator as taught by Zhong. One would have been motivated to make such a combination in order to achieve flexible hardware reconfiguration on FPGA for different CNN network structures (Zhong [Abstract]).
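For illustration only, a minimal Python sketch (the configuration schema and values are hypothetical, not drawn from Zhong) of bottleneck block configuration data that sets a loop parameter and an input and output size for each convolution layer of the block, as recited in claim 6:

```python
# Illustrative sketch only; the configuration schema below is assumed, not from Zhong.
bottleneck_config = {
    "layers": [
        {"name": "pointwise_expand",  "loops": 2, "in_channels": 16, "out_channels": 64},
        {"name": "depthwise",         "loops": 1, "in_channels": 64, "out_channels": 64},
        {"name": "pointwise_project", "loops": 3, "in_channels": 64, "out_channels": 24},
    ]
}

def configure_block(config):
    configured_layers = []
    for layer_cfg in config["layers"]:                    # configure each convolution layer
        configured_layers.append({
            "loop_parameter": layer_cfg["loops"],         # per-layer loop parameter
            "input_size":  layer_cfg["in_channels"],      # configured input size
            "output_size": layer_cfg["out_channels"],     # configured output size
        })
    return configured_layers
```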
Regarding Original Claim 18,
The claim recites substantially similar limitations as corresponding claim 6 and is rejected for similar reasons as claim 6 using similar teachings and rationale.
Regarding New Claim 35,
The claim recites substantially similar limitations as corresponding claim 6 and is rejected for similar reasons as claim 6 using similar teachings and rationale.
Claim(s) 7 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Guo in view of Hanagandi as outlined above, and further in view of Fan et al., (IDS: "F-E3D: FPGA-based acceleration of an efficient 3D convolutional neural network for human action recognition." (2019)).
Regarding Original Claim 7, Guo in view of Hanagandi teaches the elements of claim 1 as outlined above:
Guo in view of Hanagandi teaches the claimed limitation “determining the output of the convolution layer based on the summation of loop-specific convolution results associated with each loop of the number of loops …” as outlined above in claim 1.
Guo in view of Hanagandi does not appear to explicitly teach:
a skip connection from an input of the convolution layer.
However, Fan, in combination with Guo and Hanagandi, teaches the limitations:
determining the output of the convolution layer based on the summation of loop-specific convolution results associated with each loop of the number of loops (Fan, [P. 6, Section B and FIGS 8 & 10] “2) Computational Flow: The computation starts from the temporal convolution. …, Then the inputs are streamed into the computational engine through the matrix buffer, where intermediate results are accumulated in the output buffer. That block of data is then fed into the ReLU, sliding window and computational engine for 3D depth-wise convolution. The process of point-wise convolution starts when 3D depth-wise convolution finishes the computation of one block.”) and a skip connection from an input of the convolution layer. (Fan, [P. 3, Section: III, Fig. 5] “we propose two variants of building block for 3D CNNs, 3D-1 BRB and 3D-3 BRB, where the number indicates the position of the 3D convolution. Figure 5 shows the structure of these two variants. …., Fig. 5: Two 3D bottleneck residual blocks, with different position of the 3D (3x1x1) convolution.”) [Note: the bottleneck residual blocks depicted in Fig. 5 show the summation of convolution results (i.e., addition) and the skip connection (i.e., Shortcut). See Figures 5 & 8.]
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, having the combination of Guo, Hanagandi, and Fan before them, to incorporate the algorithm-hardware co-design method for the bottleneck residual block as taught by Fan. One would have been motivated to make such a combination in order to achieve comparable performance and accuracy with higher energy efficiency for platforms with limited resources (Fan [Intro]).
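For illustration only, a minimal Python sketch (hypothetical names, not drawn from Fan; assumes matching input and output channel counts, as is typical for a shortcut) of determining the layer output from the per-loop summation together with a skip connection from the layer input, as recited in claim 7:

```python
# Illustrative sketch only; hypothetical names, not drawn from Fan.
import numpy as np

def looped_layer_with_skip(x, loop_weights):
    # Sum of loop-specific convolution results (each weight set must map C_in -> C_in here
    # so that the skip connection from the layer input can be added directly).
    summed = sum(np.einsum('oc,chw->ohw', w, x) for w in loop_weights)
    return summed + x                            # skip connection from the input of the layer
```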
Regarding Original Claim 19,
The claim recites substantially similar limitations as corresponding claim 7 and is rejected for similar reasons as claim 7 using similar teachings and rationale.
Claim(s) 11, 23, and 36 are rejected under 35 U.S.C. 103 as being unpatentable over Guo in view of Hanagandi as outlined above, and further in view of Chao et al., (Pub. No.: US 20200065251 A1).
Regarding Original Claim 11,
Guo in view of Hanagandi teaches the elements of claim 1 as outlined above.
Guo in view of Hanagandi does not appear to explicitly teach:
wherein the number of loops is based on a number of input data sources available for the convolution layer.
However, Chao, in combination with Guo and Hanagandi, teaches the limitation:
wherein the number of loops is based on a number of input data sources available for the convolution layer. (Chao, [0033] “The memory-adaptive processing method 100 for the convolutional neural network of the present disclosure utilizes the memory-adaptive processing technique to adaptively change the processing loop structure for the convolutional layer operation according to the size relation among the number N of the input channels of the input feature maps, the number M of the output channels of the output feature maps, the input feature map tile size, the output feature map tile size and the cache free space size of the feature map cache.”)
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, having the combination of Guo, Hanagandi, and Chao before them, to incorporate the memory-adaptive processing method as taught by Chao. One would have been motivated to make such a combination in order to reduce DRAM access and power consumption (Chao [0039]).
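For illustration only, a minimal Python sketch (the source names and cap are hypothetical, not drawn from Chao) in which the number of loops is derived from the number of input data sources available to the convolution layer, as recited in claim 11:

```python
# Illustrative sketch only; the source names and cap below are assumed.
def loops_from_input_sources(available_sources, max_loops=4):
    return min(len(available_sources), max_loops)     # one loop per available input data source

num_loops = loops_from_input_sources(["camera", "lidar", "radar"])   # -> 3 loops configured
```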
Regarding Original Claim 23,
The claim recites substantially similar limitations as corresponding claim 11 and is rejected for similar reasons as claim 11 using similar teachings and rationale.
Regarding New Claim 36,
The claim recites substantially similar limitations as corresponding claim 11 and is rejected for similar reasons as claim 11 using similar teachings and rationale.
Claim(s) 12 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Guo in view of Hanagandi as outlined above, and further in view of Yoshinaga et al., (Pub. No.: US 20200285961 A1).
Regarding Original Claim 12, Guo in view of Hanagandi teaches the elements of claim 1 as outlined above.
Guo in view of Hanagandi does not appear to explicitly teach:
selecting a subset of input data channels from a set of input data channels, wherein the number of loops is based on a number of the selected subset of input data channels.
However, Yoshinaga, in combination with Guo and Hanagandi, teaches the limitations:
selecting a subset of input data channels from a set of input data channels, wherein the number of loops is based on a number of the selected subset of input data channels.