Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-23 are pending in the application.
Priority
Acknowledgment is made of applicant's claim for foreign priority based on application 110141505 filed in Taiwan on 11/08/2021. Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55. Acknowledgment is also made of applicant's claim for the benefit of provisional application 63/139,809 filed on 01/21/2021.
Information Disclosure Statement
The information disclosure statements submitted on 01/18/2022 and 10/31/2022 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Interpretation
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) are: “an activation unit for performing activation operations on the first operation result” and “a pooling unit for performing pooling operations on the first operation result output from the fourth register region” in claims 4-5.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. See PGPUB ¶¶35-36, where the algorithms of these units (ReLU and max-pooling) are discussed as tied to an accelerator.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
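For reference, the ReLU activation and max-pooling operations the specification ties to these units can be sketched generically as follows. This is an illustrative sketch only; the 2x2 pooling window and all function names are assumptions, not taken from the application.

```python
# Illustrative sketch only: generic ReLU and max-pooling routines of the kind
# PGPUB paragraphs 35-36 associate with the activation unit and pooling unit.
# The 2x2 window and all names here are assumptions, not the application's.

def relu(values):
    """Element-wise rectified linear unit: max(0, x) for each element."""
    return [max(0.0, v) for v in values]

def max_pool_2x2(matrix):
    """Non-overlapping 2x2 max-pooling over a 2D list of equal-length rows."""
    pooled = []
    for r in range(0, len(matrix) - 1, 2):
        row = []
        for c in range(0, len(matrix[r]) - 1, 2):
            # Each output value is the maximum of a 2x2 input window.
            row.append(max(matrix[r][c], matrix[r][c + 1],
                           matrix[r + 1][c], matrix[r + 1][c + 1]))
        pooled.append(row)
    return pooled
```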
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-23 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The analysis of the claims follows the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50 (“2019 PEG”).
Claims 1, 11, 21 and 23 are analyzed for abstract ideas as follows.
Step 1: The claims are directed to a method, a system, an apparatus, and a computer-readable medium (CRM), and accordingly fall within the statutory categories.
Step 2A Prong 1: The claims recite the abstract idea limitations of "a first operator for operating the first part of the input data and the first part of the weight data to generate a first operation result", "a second operator for operating the first operation result and the second part of the weight data to generate a second operation result," and “performing a first part of the input data and a first part of the weight data by a first operator for generating a first operation result.” These limitations all encompass mathematical concepts (applying operators such as multiplication to a weight value). The specification also gives multiply and accumulate as example operations, as explained in USPGPUB ¶33. The MPEP lists comparable formulas (see MPEP 2106.04(a)(2)), for example “calculating the force of the object by multiplying its mass by its acceleration”. Other portions of the claims, such as "registering the first operation result", "registering weights or a descriptor", "triggering based on a predetermined data amount" and "writing the second operation result into the memory unit," are hardware processes too generic or high-level to be characterized as mental or mathematical concepts given the available descriptions and MPEP comparisons.
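By way of illustration only, the multiply-and-accumulate operation noted in USPGPUB ¶33 amounts to a plain dot product of an input vector and a weight vector. The sketch below is a generic example of that mathematical concept, not the applicant's implementation:

```python
# Illustrative sketch of a generic multiply-and-accumulate (MAC) operation:
# multiply each input by its corresponding weight and sum the products.
# Names are assumptions for exposition only.

def multiply_accumulate(inputs, weights):
    acc = 0.0
    for x, w in zip(inputs, weights):
        acc += x * w  # multiply, then accumulate into a running sum
    return acc
```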
Step 2A Prong 2: The judicial exceptions recited in these claims are not integrated into a practical application. Merely invoking an "AI algorithm operation accelerator", "a register" or a "memory unit" does not confer eligibility. The claims remain directed to mathematical concepts; claims 1, 11, 21 and 23 are not tied to a specific practical application. The additional elements are generic registers, processors and memory and do not include specialized hardware. See MPEP § 2106.05(f).
Claims 1, 11, 21 and 23 do not recite a particular field of use, and even doing so may not be sufficient to overcome the abstract idea rejection. Merely applying an AI algorithm to a field or to data, without an advancement in that field or new hardware, is ineligible. See MPEP § 2106.05(h).
Step 2B: The claims do not contain significantly more than their judicial exceptions. The registers and hardware are recited in their standard forms in the field. These additional elements are well-understood, routine, and conventional activity; see MPEP 2106.05(d)(II). The claims lack any particular "how" or algorithm providing a solution in a field in a novel way. The claims would require more specificity, reciting processes that could not be performed as simple mathematics or mental processes, or structure more substantial than conventional devices (e.g., non-textbook implementations).
Claims 2-10, 12-20 and 22 merely narrow the previously recited abstract idea limitations with further abstract concepts and/or routine, fundamental processes. For the reasons described above with respect to claims 1, 11 and 21, this judicial exception is not meaningfully integrated into a practical application, nor significantly more than the abstract idea. The Step 1 and Step 2A Prong 1 and Prong 2 analyses remain the same as for the independent claims above. The specification describes more practical application concepts, but none are recited in claims 2-10, 12-20 and 22.
With respect to Step 2B, these claims disclose similar limitations to those described for the independent claims above and do not provide anything significantly more than mathematical or mental concepts. Claims 2-10, 12-20 and 22 recite the additional elements of "wherein when the second operator is triggered to be in operation, the first operator continues in operating the input data. wherein the predetermined data amount is configured based on a batch width and a filter parameter. an activation unit for performing activation operations on the first operation result. a pooling unit for performing pooling operations on the first operation result output from the fourth register region. the first operator further includes a first operation element array having a plurality of first operation elements, and each of the first operation elements is configured to: receive the input data and the first part of the weight data corresponding to multi-dimensional positions; and process the input data and the first part of the weight data to generate a plurality of operation results as the first operation result. the second operator further includes a second operation element array having a plurality of second operation elements; and each of the second operation elements is configured to: receive the first operation result and the second part of the weight data; and process the first operation result and the second part of the weight data to generate a plurality of operation results as the second operation result. wherein the first operator has a first maximum operation capacity, the second operator has a second maximum operation capacity smaller than the first maximum operation capacity. wherein a capacity of the fourth register region is configured at least triple times of the predetermined data length of the first register region. wherein a number of the first operation elements is larger than a number of the second operation elements. 
wherein in the step D, when the second operator performs the second operation, the first operator and the second operator are in parallel processing state. reading the first part of the input data from the memory unit into a first register region; reading a first part of the descriptor from the memory unit into a second register region; and reading the first part of the weight data from the memory unit into a third register region. storing the first operation result of the first operator into a fourth register region. reading a second part of the weight data from the memory unit into a fifth register region. determining whether all the input data in the first register memory are read out and operated, when the step F is no, loading a next batch of the input data from the first register region, and when the step F is yes, the method proceeds to step G; determining whether all data in the fourth register region is processed, when the step G is no, a data address parameter is updated, and when the step G is yes, the method proceeds to step H; and determining whether any input data in the first register region is not read out yet, when the step H is no, the method ends, wherein the predetermined data amount is configured based on a batch width and a filter parameter. determining whether all data in the fourth register region are operated by the second operation, when the step I is no, data in the fourth register region is read out for performing the second operation, and when the step I is yes, a data address is updated and the method ends. performing activation operations on the first operation result. performing pooling operations on the first operation result. wherein the first register region is configured a predetermined data length, and a capacity of the fourth register region is configured at least triple times of the predetermined data length." 
These elements are further abstract concepts, generic applications to a field of use, or well-understood, routine, conventional activity (see MPEP § 2106.05(d)) and cannot simply be appended to qualify as significantly more or as a practical application. What type of application, or what structure of components beyond generic machine learning, is involved remains unknown from these claims. Therefore claims 2-10, 12-20 and 22 also recite abstract ideas that do not integrate into a practical application or amount to significantly more than the judicial exception, and are rejected under 35 U.S.C. 101.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-8, 10-19 and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Dally et al. (US 20180046900 A1 hereinafter Dally) in view of Ramirez et al. (US 8745455 B2 hereinafter Ramirez).
As to independent claim 1, Dally teaches an AI algorithm operation accelerator adapted to perform operations on an input data in a memory unit, the memory unit including a first data storage region for storing the input data, a second data storage region for storing a descriptor which includes a weight data, and a third data storage region for storing an output data, the AI algorithm operation accelerator including: [SCNN accelerator with buffers (memory/registers) with input data, weights and outputs Fig. 2b-c 230, 235, 240 ¶51, ¶60 "dataflow relies on input buffers, weight buffer 230 and input activations buffer 235, for storing weights and input activations"]
a first register region for registering a part of the input data, wherein the first register region is configured a predetermined data length; [buffer for input data may be registers ¶87"input activations buffer 310 and buffer 320 may be a set of registers or SRAM that are configured to store the input activations and the positions associated with each input activation value"], [amount of input is compressed into a format with fixed number ¶113, 120]
a second register region for registering a first part of the descriptor; [compressed-sparse format (descriptor) and buffers ¶74, ¶120 "weights are encoded in a compressed-sparse format "]
a third register region for registering a first part of the weight data; [FIFO weight buffer 305 with a storage capacity ¶83-84 ]
a first operator for operating the first part of the input data and the first part of the weight data to generate a first operation result; [Fig. 3A 325 multipliers on input data and weight ¶72 "processing a vector of F non-zero filter weights a vector of I non-zero input activations in within the F×I multiplier array 325"]
a fourth register region for registering the first operation result; [Fig. 3A 340 array includes results and buffers ¶73 "accumulator array 340 may include one or more accumulation buffers and adders to store the products generated in the multiplier array 325 and sum the products into the partial sums"]
a fifth register region for registering a second part of the weight data; and [buffer with sequencer and pointers for selecting different weights ¶86 " weight buffer 305 is a FIFO buffer that includes a tail pointer, a channel pointer, and a head pointer. The layer sequencer 215 controls the “input” side of the weight buffer 305, pushing weight vectors into the weight buffer 305"]
a second operator for operating the first operation result and the second part of the weight data to generate a second operation result, [output (result) fed into next layer to generate another result ¶36, ¶83 " output activation volume of a neural network layer can serve as the input activation volume for the next neural network layer, then the output activations buffer 350 is logically swapped with the input activations buffer 310 between processing of the different neural network layers"]
Dally does not specifically teach wherein when a predetermined data amount is stored in the fourth register region, the second operator is triggered to operate the first operation result and the second part of the weight data.
However, Ramirez teaches wherein when a predetermined data amount is stored in the fourth register region, the second operator is triggered to operate the first operation result and the second part of the weight data. [trigger based on buffer fill level (predetermined amount stored) Col. 8 ln 27-40 "When the buffer hits a certain fill level, e.g., a predetermined threshold level, it may send a request signal to an arbiter to indicate that it has a data packet ready for sending (block 680)"]
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the accelerator disclosed by Dally by incorporating the limitation wherein when a predetermined data amount is stored in the fourth register region, the second operator is triggered to operate the first operation result and the second part of the weight data, as disclosed by Ramirez, because both techniques address the same field of system design, and incorporating Ramirez into Dally enhances the efficiency and throughput of the system's memory usage [Ramirez Col. 2 ln. 28-35].
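For illustration of the combined teaching, a first stage that fills a buffer and a second stage triggered at a predetermined fill level can be sketched as below. All names, the threshold value, and the scalar second operation are assumptions chosen for exposition; this is not the actual design of either reference.

```python
# Illustrative sketch only: a first operator writes partial results into a
# buffer (loosely, the "fourth register region"); once the buffer reaches a
# predetermined fill level (cf. Ramirez's threshold), a second operator is
# triggered to consume the buffered results. Threshold and names are assumed.

THRESHOLD = 4  # predetermined data amount (assumption for this sketch)

def run_pipeline(inputs, weights1, weight2):
    buffer = []   # accumulation buffer between the two operators
    outputs = []
    # First operator: elementwise products form the first operation result.
    for r in (x * w for x, w in zip(inputs, weights1)):
        buffer.append(r)
        if len(buffer) >= THRESHOLD:  # fill level reached -> trigger
            # Second operator: apply the second part of the weight data.
            outputs.extend(v * weight2 for v in buffer)
            buffer.clear()
    return outputs, buffer  # drained outputs plus any residue below threshold
```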
As to dependent claim 2, the rejection of claim 1 is incorporated, Dally and Ramirez further teach wherein when the second operator is triggered to be in operation, the first operator continues in operating the input data. [Dally double buffering allows keep storing new results while another set is processed by PEs ¶82 " double-buffered so that one set of registers can store new partial sums while the second set of registers are drained out by the post-processing unit 345"]
As to dependent claim 3, the rejection of claim 1 is incorporated, Dally and Ramirez further teach wherein the predetermined data amount is configured based on a batch width and a filter parameter. [Dally batch length and filter weights ¶55 "batch of length N of groups of C channels of input activation planes can be applied to the same volume of filter weights."]
As to dependent claim 4, the rejection of claim 1 is incorporated, Dally and Ramirez further teach an activation unit for performing activation operations on the first operation result. [Dally activation operations are processed by a destination calculation unit ¶82]
As to dependent claim 5, the rejection of claim 1 is incorporated, Dally and Ramirez further teach a pooling unit for performing pooling operations on the first operation result output from the fourth register region. [Dally post processing unit for pooling ¶82]
As to dependent claim 6, the rejection of claim 1 is incorporated, Dally and Ramirez further teach wherein the first operator further includes a first operation element array having a plurality of first operation elements, and [Dally multiplier array Fig. 3A 325 ¶82]
each of the first operation elements is configured to: receive the input data and the first part of the weight data corresponding to multi-dimensional positions; and process the input data and the first part of the weight data to generate a plurality of operation results as the first operation result. [Dally array 325 computes products using weights and inputs ¶81-82]
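As a generic illustration of an operation element array (not the applicant's or Dally's actual design; the element function and the 2-D list layout are assumptions), each element can be viewed as multiplying one input value by the weight at the corresponding position, with the collection of products forming the first operation result:

```python
# Illustrative sketch only: an "operation element array" in which each
# element receives the input value and weight at its (row, col) position
# and produces one product. Layout and names are assumptions.

def operation_element(x, w):
    return x * w  # each element performs a single multiply

def element_array(inputs, weights):
    # inputs and weights are equal-shaped 2D lists (multi-dimensional positions)
    return [[operation_element(x, w) for x, w in zip(in_row, w_row)]
            for in_row, w_row in zip(inputs, weights)]
```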
As to dependent claim 7, the rejection of claim 6 is incorporated, Dally and Ramirez further teach wherein the second operator further includes a second operation element array having a plurality of second operation elements; and [Dally Each processing element (PE) includes a multiplier array fed from another PE Fig. 2A 210, Fig. 2C 240 ¶66]
each of the second operation elements is configured to: receive the first operation result and the second part of the weight data; and [Dally results get fed into next PE ¶83]
process the first operation result and the second part of the weight data to generate a plurality of operation results as the second operation result. [Dally output (result) fed into next layer to generate another result ¶36, ¶83 " output activation volume of a neural network layer can serve as the input activation volume for the next neural network layer, then the output activations buffer 350 is logically swapped with the input activations buffer 310 between processing of the different neural network layers"]
As to dependent claim 8, the rejection of claim 1 is incorporated, Dally and Ramirez further teach wherein the first operator has a first maximum operation capacity, the second operator has a second maximum operation capacity smaller than the first maximum operation capacity. [Dally 16 operations capacity and a 1 operation per cycle capacity ¶105 " a post-processing unit 345 performing one operation per cycle should keep pace with a F×I multiplier array 325 that performs 16 operations per cycle"]
As to dependent claim 10, the rejection of claim 7 is incorporated, Dally and Ramirez further teach wherein a number of the first operation elements is larger than a number of the second operation elements. [Dally 16 operations capacity vs a 1 operation per cycle capacity ¶105 " a post-processing unit 345 performing one operation per cycle should keep pace with a F×I multiplier array 325 that performs 16 operations per cycle"]
As to independent claim 11, Dally teaches an AI algorithm operation accelerating method including steps of: [SCNN accelerator with buffers (memory/registers) with input data, weights and outputs Fig. 2b-c 230, 235, 240 ¶51, ¶60 "dataflow relies on input buffers, weight buffer 230 and input activations buffer 235, for storing weights and input activations"]
A. reading an input data and a descriptor from a memory unit, wherein the descriptor includes a weight data; [memory ¶45, compressed-sparse format (descriptor) and buffers ¶74, ¶120 "weights are encoded in a compressed-sparse format "]
B. performing a first part of the input data and a first part of the weight data by a first operator for generating a first operation result; [Fig. 3A 325 multipliers on input data and weight ¶72 "processing a vector of F non-zero filter weights a vector of I non-zero input activations in within the F×I multiplier array 325"]
C. registering the first operation result; [Dally Fig. 3A 340 accumulator array includes results and buffers ¶73 "accumulator array 340 may include one or more accumulation buffers and adders to store the products generated in the multiplier array 325 and sum the products into the partial sums"]
E. writing the second operation result into the memory unit. [Writes output and activations ¶45-46 " write weight and/or activation data from the SCNN 200 to the memory"]
Dally does not specifically teach when the first operation result reaches a predetermined data amount, triggering a second operator to perform the first operation result and a second part of the weight data by the second operator for generating a second operation result; and
However, Ramirez teaches when the first operation result reaches a predetermined data amount, triggering a second operator to perform the first operation result and a second part of the weight data by the second operator for generating a second operation result; [trigger based on buffer fill level (predetermined amount stored) Col. 8 ln 27-40 "When the buffer hits a certain fill level, e.g., a predetermined threshold level, it may send a request signal to an arbiter to indicate that it has a data packet ready for sending (block 680)"]
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the accelerator disclosed by Dally by incorporating the limitation when the first operation result reaches a predetermined data amount, triggering a second operator to perform the first operation result and a second part of the weight data by the second operator for generating a second operation result, as disclosed by Ramirez, because both techniques address the same field of system design, and incorporating Ramirez into Dally enhances the efficiency and throughput of the system's memory usage [Ramirez Col. 2 ln. 28-35].
As to dependent claim 12, the rejection of claim 11 is incorporated, Dally and Ramirez further teach wherein in the step D, when the second operator performs the second operation, the first operator and the second operator are in parallel processing state. [Dally parallel processing PEs ¶44]
As to dependent claim 13, the rejection of claim 11 is incorporated, Dally and Ramirez further teach A01. reading the first part of the input data from the memory unit into a first register region; [Dally buffer for input data may be registers ¶87"input activations buffer 310 and buffer 320 may be a set of registers or SRAM that are configured to store the input activations and the positions associated with each input activation value"], [amount of input is compressed into a format with fixed number ¶113, 120]
A03. reading a first part of the descriptor from the memory unit into a second register region; and [Dally compressed-sparse format (descriptor) and buffers ¶74, ¶120 "weights are encoded in a compressed-sparse format "]
A05. reading the first part of the weight data from the memory unit into a third register region. [Dally read weights from memory ¶45 into multiple PEs with buffers Fig. 2A 204-210, Fig. 2C 230 ¶60]
As to dependent claim 14, the rejection of claim 13 is incorporated, Dally and Ramirez further teach wherein the step C further includes storing the first operation result of the first operator into a fourth register region. [Dally Fig. 3A 340 array includes results and buffers ¶73 "accumulator array 340 may include one or more accumulation buffers and adders to store the products generated in the multiplier array 325 and sum the products into the partial sums"]
As to dependent claim 15, the rejection of claim 14 is incorporated, Dally and Ramirez further teach A07. reading a second part of the weight data from the memory unit into a fifth register region. [Dally read weights from memory ¶45 into multiple PEs with buffers Fig. 2A 204-210, Fig. 2C 230 ¶60]
As to dependent claim 16, the rejection of claim 15 is incorporated, Dally and Ramirez further teach F. determining whether all the input data in the first register memory are read out and operated, when the step F is no, loading a next batch of the input data from the first register region, and when the step F is yes, the method proceeds to step G; [Dally batch processing and channels ¶55 sequencing ¶79]
G. determining whether all data in the fourth register region is processed, when the step G is no, a data address parameter is updated, and when the step G is yes, the method proceeds to step H; and [Dally outputs of operations are an updated address (linear address) ¶96]
H. determining whether any input data in the first register region is not read out yet, when the step H is no, the method ends, [Dally counts outputs to test if data is in register ¶188]
wherein the predetermined data amount is configured based on a batch width and a filter parameter. [Dally batch length and filter weights ¶55 "batch of length N of groups of C channels of input activation planes can be applied to the same volume of filter weights."]
As to dependent claim 17, the rejection of claim 11 is incorporated, Dally and Ramirez further teach I. determining whether all data in the fourth register region are operated by the second operation, when the step I is no, data in the fourth register region is read out for performing the second operation, and when the step I is yes, a data address is updated and the method ends. [Dally counters and end conditions (all data operated) with pointers ¶87-88]
As to dependent claim 18, the rejection of claim 17 is incorporated, Dally and Ramirez further teach performing activation operations on the first operation result. [Dally activations like relu ¶82]
As to dependent claim 19, the rejection of claim 17 is incorporated, Dally and Ramirez further teach performing pooling operations on the first operation result. [Dally pooling ¶82]
As to independent claim 21, Dally teaches a computing system including:
a memory unit [sram ¶87] including a first data storage region for storing an input data, a second data storage region for storing a descriptor which includes a weight data, and a third data storage region for storing an output data; [compressed-sparse format (descriptor) and buffers ¶74, ¶120 "weights are encoded in a compressed-sparse format "]
a memory read-write controller coupled to the memory unit, for controlling read and write of the memory unit; and [read/write interface ¶45-46 " memory interface 205 reads weight and activation data from a memory coupled to the SCNN"]
an AI algorithm operation accelerator coupled to the memory read-write controller, the AI algorithm operation accelerator including: [SCNN accelerator with buffers (memory/registers) with input data, weights and outputs Fig. 2b-c 230, 235, 240 ¶51, ¶60 "dataflow relies on input buffers, weight buffer 230 and input activations buffer 235, for storing weights and input activations"]
a first register region for registering a part of the input data, wherein the first register region is configured a predetermined data length; [buffer for input data may be registers ¶87"input activations buffer 310 and buffer 320 may be a set of registers or SRAM that are configured to store the input activations and the positions associated with each input activation value"], [amount of input is compressed into a format with fixed number ¶113, 120]
a second register region for registering a first part of the descriptor; [compressed-sparse format (descriptor) and buffers ¶74, ¶120 "weights are encoded in a compressed-sparse format "]
a third register region for registering a first part of the weight data; [FIFO weight buffer 305 with a storage capacity ¶83-84 ]
a first operator for operating the first part of the input data and the first part of the weight data to generate a first operation result; [Fig. 3A 325 multipliers on input data and weight ¶72 "processing a vector of F non-zero filter weights a vector of I non-zero input activations in within the F×I multiplier array 325"]
a fourth register region for registering the first operation result; [Fig. 3A 340 array includes results and buffers ¶73 "accumulator array 340 may include one or more accumulation buffers and adders to store the products generated in the multiplier array 325 and sum the products into the partial sums"]
a fifth register region for registering a second part of the weight data; and [buffer with sequencer and pointers for selecting different weights ¶86 " weight buffer 305 is a FIFO buffer that includes a tail pointer, a channel pointer, and a head pointer. The layer sequencer 215 controls the “input” side of the weight buffer 305, pushing weight vectors into the weight buffer 305"]
a second operator for operating the first operation result and the second part of the weight data to generate a second operation result, [output (result) fed into next layer to generate another result ¶36, ¶83 " output activation volume of a neural network layer can serve as the input activation volume for the next neural network layer, then the output activations buffer 350 is logically swapped with the input activations buffer 310 between processing of the different neural network layers"]
Dally does not specifically teach wherein when a predetermined data amount is stored in the fourth register region, the second operator is triggered to operate the first operation result and the second part of the weight data.
However, Ramirez teaches wherein when a predetermined data amount is stored in the fourth register region, the second operator is triggered to operate the first operation result and the second part of the weight data. [trigger based on buffer fill level (predetermined amount stored) Col. 8 ln 27-40 "When the buffer hits a certain fill level, e.g., a predetermined threshold level, it may send a request signal to an arbiter to indicate that it has a data packet ready for sending (block 680)"]
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the accelerator disclosed by Dally by incorporating the limitation wherein when a predetermined data amount is stored in the fourth register region, the second operator is triggered to operate the first operation result and the second part of the weight data, as disclosed by Ramirez, because both techniques address the same field of system design, and incorporating Ramirez into Dally enhances the efficiency and throughput of the system's memory usage [Ramirez Col. 2 ln. 28-35].
As to dependent claim 22, the rejection of claim 21 is incorporated, Dally and Ramirez further teach wherein when the second operator is triggered to be in operation, the first operator continues in operating the input data. [Dally double buffering allows keep storing new results while another set is processed by PEs ¶82 " double-buffered so that one set of registers can store new partial sums while the second set of registers are drained out by the post-processing unit 345"]
As to independent claim 23, Dally teaches a non-transitory computer readable media storing a program code readable and executable by a computer, when the program code is executed by the computer, the computer performing steps of: [computer programs and computer readable media ¶212-213]
A. reading an input data and a descriptor from a memory unit, wherein the descriptor includes a weight data; [memory ¶45, compressed-sparse format (descriptor) and buffers ¶74, ¶120 "weights are encoded in a compressed-sparse format "]
B. performing a first part of the input data and a first part of the weight data by a first operator for generating a first operation result; [Fig. 3A 325 multipliers on input data and weight ¶72 "processing a vector of F non-zero filter weights a vector of I non-zero input activations in within the F×I multiplier array 325"]
C. registering the first operation result; [buffer with sequencer and pointers for selecting different weights ¶86 " weight buffer 305 is a FIFO buffer that includes a tail pointer, a channel pointer, and a head pointer. The layer sequencer 215 controls the “input” side of the weight buffer 305, pushing weight vectors into the weight buffer 305"]
E. writing the second operation result into the memory unit. [Writes output and activations ¶45-46 " write weight and/or activation data from the SCNN 200 to the memory"]
Dally does not specifically teach step D: when the first operation result reaches a predetermined data amount, triggering a second operator to perform the first operation result and a second part of the weight data by the second operator for generating a second operation result; and
However, Ramirez teaches when the first operation result reaches a predetermined data amount, triggering a second operator to perform the first operation result and a second part of the weight data by the second operator for generating a second operation result; [trigger based on buffer fill level (predetermined amount stored) Col. 8 ln 27-40 "When the buffer hits a certain fill level, e.g., a predetermined threshold level, it may send a request signal to an arbiter to indicate that it has a data packet ready for sending (block 680)"]
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the accelerator disclosed by Dally by incorporating the triggering of a second operator, when the first operation result reaches a predetermined data amount, to perform the first operation result and a second part of the weight data for generating a second operation result, as disclosed by Ramirez, because both techniques address the same field of system design, and incorporating Ramirez into Dally enhances the efficiency and throughput of the system's memory usage [Ramirez Col. 2 ln. 28-35].
Claims 9 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Dally and Ramirez as applied in the rejection of claims 1 and 13 above, and further in view of Du et al. (US 10162799 B2, hereinafter Du).
As to dependent claim 9, the rejection of claim 1 above is incorporated.
Dally and Ramirez do not specifically teach wherein a capacity of the fourth register region is configured at least triple times of the predetermined data length of the first register region.
However, Du teaches wherein a capacity of the fourth register region is configured at least triple times of the predetermined data length of the first register region. [buffer is 3 items wide for remap (triple) Col. 2 ln. 26-36 "each stride between the convolution operations is 1, and each set of the remap data includes 3 remap data. The convolution operation is a 3×3 convolution operation, and the input buffer unit is configured to buffer the latest 2 inputted data and output the latest 2 inputted data in the later clock"]
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the memories described by Dally and Ramirez by incorporating the configuration of the capacity of the fourth register region at least triple times of the predetermined data length of the first register region, as disclosed by Du, because all techniques address the same field of system design, and incorporating Du into Dally and Ramirez provides better buffer devices that can improve the performance of the convolution operation and are capable of processing data streams [Du Col. 1 ln. 47-57].
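For illustration only (not part of the claim mapping or the record), Du's rationale for a buffer roughly three entries wide for a stride-1, 3-wide convolution window can be sketched with a hypothetical 1-D simplification; the function name and window width below are assumptions, not Du's disclosed structure:

```python
def sliding_windows(stream, width=3):
    # Holds the latest (width - 1) inputs plus the newly arrived one,
    # mirroring Du's remap buffer for a 3x3 stride-1 convolution
    # (cf. Du Col. 2 ln. 26-36).
    buffer = []
    for value in stream:
        buffer.append(value)
        if len(buffer) > width:
            buffer.pop(0)  # stride 1: retire only the oldest entry
        if len(buffer) == width:
            yield tuple(buffer)  # emit one 3-wide window per step
```

The sketch shows why a capacity of about three times the input data length suffices: each stride-1 window reuses two previously buffered inputs plus the one just received.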
As to dependent claim 20, the rejection of claim 13 above is incorporated.
Dally and Ramirez do not specifically teach wherein the first register region is configured a predetermined data length, and a capacity of the fourth register region is configured at least triple times of the predetermined data length.
However, Du teaches wherein the first register region is configured a predetermined data length, and a capacity of the fourth register region is configured at least triple times of the predetermined data length. [buffer is 3 items wide for remap (triple) Col. 2 ln. 26-36 "each stride between the convolution operations is 1, and each set of the remap data includes 3 remap data. The convolution operation is a 3×3 convolution operation, and the input buffer unit is configured to buffer the latest 2 inputted data and output the latest 2 inputted data in the later clock"]
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the memories described by Dally and Ramirez by incorporating the configuration of the first register region with a predetermined data length and of the capacity of the fourth register region at least triple times of the predetermined data length, as disclosed by Du, because all techniques address the same field of system design, and incorporating Du into Dally and Ramirez provides better buffer devices that can improve the performance of the convolution operation and are capable of processing data streams [Du Col. 1 ln. 47-57].
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Applicant is required under 37 C.F.R. § 1.111(c) to consider these references fully when responding to this action.
Narayanaswami et al. (US 9836691 B1) teaches a neural network architecture with weight tensors and multiplication operations (see Col. 11 ln. 45-53).
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. In re Heck, 699 F.2d 1331, 1332-33, 216 U.S.P.Q. 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 U.S.P.Q. 275, 277 (C.C.P.A. 1968)).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Beau Spratt whose telephone number is 571 272 9919. The examiner can normally be reached 8:30am to 5:00pm (PST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Welch can be reached at 571 272 7212. The fax phone number for the organization where this application or proceeding is assigned is 571 483 7388.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866 217 9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800 786 9199 (IN USA OR CANADA) or 571 272 1000.
/BEAU D SPRATT/Primary Examiner, Art Unit 2143