DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings are objected to because Figure 7 labels an extended feature unit as “7140-…”, which should be “740-…”.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Objections
Claims 19-20 are objected to because of the following informalities:
Claim 19, line 22: “the feature processor circuit” should be “a feature processor circuit” because there is a lack of antecedent basis for this limitation.
Claim 19, line 23: “the weight processor circuit” should be “a weight processor circuit” because there is a lack of antecedent basis for this limitation.
The dependent claim is also objected to for inheriting the same deficiencies from the claim on which it depends.
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-4, 15-16, and 19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 1 recites a device for performing convolution.
Under Prong One of Step 2A of the USPTO's current eligibility guidance (MPEP 2106), the claim recites limitations that, under the broadest reasonable interpretation, cover performance with pen and paper, such as: a plurality of input units each comprising a plurality of input feature elements from different respective channels of an input tensor; generate a plurality of extended feature units each comprising an input feature element from each of the plurality of input units and from a common channel of the input tensor (see at least figure 4, wherein a plurality of data are generated as illustrated in 440 based on 410); a plurality of weight units each comprising a plurality of weight elements from different respective channels of a kernel; and generate a plurality of extended weight units each comprising a weight element from each of the plurality of weight units and from a common channel of the kernel (see at least figure 4, wherein a plurality of weights are generated as illustrated in 450 based on 420). Furthermore, the claim also recites limitations that cover mathematical calculations, relationships, and/or formulas, such as multiply the input feature elements of the extended feature unit by the respective weight elements of the extended weight unit and output a sum of the products (performing MAC operations). Therefore, the claim includes limitations that fall within the “Mental Processes / Mathematical Concepts” groupings of abstract ideas. Accordingly, the claim recites an abstract idea.
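For illustration only (not part of the record and not the claimed circuits), the regrouping and sum-of-products operations characterized above reduce to a few lines of arithmetic. The sketch below uses hypothetical names for the claim terms and shows the transpose-style generation of extended units followed by the multiply-accumulate step:

```python
# Illustrative sketch (hypothetical names, not the claimed implementation):
# the regrouping and multiply-accumulate steps characterized above as
# mental processes / mathematical concepts.

def extend(units):
    """Regroup units so each extended unit holds one element per original
    unit, all drawn from a common channel (a simple transpose)."""
    num_channels = len(units[0])
    return [[unit[c] for unit in units] for c in range(num_channels)]

def mac(extended_feature_unit, extended_weight_unit):
    """Multiply corresponding elements and output the sum of the products."""
    return sum(f * w for f, w in zip(extended_feature_unit, extended_weight_unit))

# Each input/weight unit holds elements from different channels (2 channels here).
input_units = [[1, 2], [3, 4], [5, 6]]    # e.g., three spatial positions
weight_units = [[7, 8], [9, 10], [11, 12]]

extended_features = extend(input_units)   # [[1, 3, 5], [2, 4, 6]]
extended_weights = extend(weight_units)   # [[7, 9, 11], [8, 10, 12]]

sums = [mac(f, w) for f, w in zip(extended_features, extended_weights)]
```

Every step above is ordinary arithmetic on small lists, which is the point of the Prong One characterization.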
Under Prong Two of Step 2A, this judicial exception is not integrated into a practical application. The claim additionally recites a device comprising a plurality of MAC cells, a feature processor circuit, and a weight processor circuit. However, the additional elements are recited at a high level of generality, i.e., as computer components performing generic computer functions such as receiving, processing, and transmitting data. Furthermore, the claim recites steps of receiving the input and weight units and providing the plurality of extended feature units and extended weight units to the MAC cells; such limitations are at most insignificant extra-solution or post-solution activity (e.g., mere data gathering and outputting the generated result). Moreover, the concept of processing data in parallel using MAC cells is also at most insignificant extra-solution activity because such a concept is well known in the art using computer components (see support under Step 2B). Such additional elements fail to provide a meaningful limitation on the judicial exception and amount to no more than mere instructions to apply the exception using computer components. Thus, the claim is directed to an abstract idea.
Under Step 2B, as discussed with respect to Prong Two of Step 2A, the additional elements in the claim amount to no more than mere instructions to apply the exception using generic components. The same conclusion is reached in Step 2B, i.e., mere instructions to apply an exception on a generic element cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept furnished by an element, or combination of elements, recited in the claim in addition to (beyond) the judicial exception. The steps of receiving data, providing data, and performing operations in parallel are considered insignificant extra-solution activity in Step 2A, and are determined to be well-understood, routine, and conventional activity in the field. Court decisions cited in MPEP 2106.05(d)(II), section (i), indicate that merely receiving or transmitting data over a network is a well-understood, routine, and conventional function when it is claimed in a merely generic manner. Also see Hennessy, John L., et al., Computer Architecture: A Quantitative Approach, Elsevier Science & Technology, 2014, ProQuest Ebook Central, pages 530-532, figures 6.1-6.2, which describe the concept of a multiprocessor architecture supporting parallel operation, and also describe the concept of SIMD, which processes multiple data in parallel. Thus, the additional elements fail to ensure the claim as a whole amounts to significantly more than the judicial exception itself. Accordingly, the claim is not patent-eligible under 35 U.S.C. 101.
Claim 2 further recites a controller circuit configured to: read a command from a command register; and generate requests for the plurality of input units and the plurality of weight units based on the command, wherein the plurality of input units and the plurality of weight units are provided to the feature processor circuit and the weight processor circuit from a memory in response to the requests. Such additional elements are at most insignificant extra-solution activity under Step 2A, Prong Two, and are determined to be well-understood, routine, and conventional activity under Step 2B (see Hennessy, John L., et al., Computer Architecture: A Quantitative Approach, Elsevier Science & Technology, 2014, ProQuest Ebook Central, page 700, which describes a CPU first setting up the DMA registers, which contain a memory address and the number of bytes to be transferred; thus, the DMA controller is configured to read a command from the DMA registers and generate requests to transfer data based on the command. Page 700 also describes the DMA hardware as a specialized processor that transfers data between memory and an I/O device; thus, the data to be operated on are transferred from a memory based on requests generated by the DMA controller). Therefore, the claim does not recite additional elements that would integrate the judicial exception into a practical application under Step 2A, Prong Two, or ensure the claim as a whole amounts to significantly more than the judicial exception itself under Step 2B. Accordingly, the claim is not patent-eligible under 35 U.S.C. 101.
Claim 3 further recites: for each iteration of an outer loop, generate and provide a different plurality of extended weight units; and for each iteration of an inner loop within the outer loop, generate and provide a different plurality of extended feature units and output the sums of the products for each iteration of the inner loop. Such limitations cover mathematical calculations, relationships, and/or formulas (performing convolution by using the same weight values to operate on different feature input data within the inner loop). The claim recites additional elements, such as the controller circuit configured to generate one or more requests for different data, the weight processor circuit, the feature processor circuit, and the plurality of MAC cells. However, as explained above for claims 1-2, such limitations amount to no more than mere instructions to apply the exception using computer components, and the limitation of having the controller generate requests for different data is mere insignificant extra-solution activity under Step 2A, Prong Two, and is determined to be well-understood, routine, and conventional under Step 2B. Therefore, the claim does not recite additional elements that would integrate the judicial exception into a practical application under Step 2A, Prong Two, or ensure the claim as a whole amounts to significantly more than the judicial exception itself under Step 2B. Accordingly, the claim is not patent-eligible under 35 U.S.C. 101.
Claim 4 further recites accumulating the sums of the products for each iteration of the inner loop to generate an output tensor representing a depthwise convolution of the input tensor and the kernel. Such limitations cover mathematical calculations, relationships, and/or formulas (performing accumulation of the sums of the products to generate an output tensor representing a depthwise convolution). The claim further recites an accumulator circuit, but such a limitation is recited at a high level of generality, e.g., a computer component performing the computer function of accumulating. Such a limitation amounts to no more than mere instructions to apply the exception using a computer component. Therefore, the claim does not recite additional elements that would integrate the judicial exception into a practical application under Step 2A, Prong Two, or ensure the claim as a whole amounts to significantly more than the judicial exception itself under Step 2B. Accordingly, the claim is not patent-eligible under 35 U.S.C. 101.
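For illustration only (hypothetical shapes and names; not the claimed accumulator circuit or any cited reference), the accumulation characterized above reduces to plain arithmetic: the same per-channel kernel is reused as the window slides, and each window's products are accumulated into the output tensor:

```python
# Illustrative sketch only: accumulating sums of products across sliding
# windows to form an output tensor representing a depthwise convolution.
# Shapes and names are hypothetical, not taken from any cited reference.

def depthwise_conv(inputs, kernel):
    """inputs: C x H x W lists; kernel: C x K x K lists (one slice per channel).
    Each channel is convolved with its own kernel slice (depthwise)."""
    C, H, W = len(inputs), len(inputs[0]), len(inputs[0][0])
    K = len(kernel[0])
    out_h, out_w = H - K + 1, W - K + 1
    # Output tensor has the same channel count as the input and kernel.
    out = [[[0] * out_w for _ in range(out_h)] for _ in range(C)]
    for c in range(C):                      # per-channel (depthwise)
        for i in range(out_h):              # slide the window over rows...
            for j in range(out_w):          # ...and columns, reusing the kernel
                acc = 0                     # accumulate the products per window
                for u in range(K):
                    for v in range(K):
                        acc += inputs[c][i + u][j + v] * kernel[c][u][v]
                out[c][i][j] = acc
    return out
```

For a 1-channel 3x3 input of ones and a 2x2 kernel of ones, every accumulated output value is 4.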
Claim 15 further recites that a number of channels of the input tensor is equal to a number of channels of the kernel and a number of channels of the output tensor. Such limitations cover mathematical calculations, relationships, and/or formulas (a convolution operation where the numbers of channels of the input tensor, the kernel, and the output tensor are the same; such a limitation merely describes the number of channels of the input and output data). The claim does not recite additional elements that would integrate the judicial exception into a practical application under Step 2A, Prong Two, or ensure the claim as a whole amounts to significantly more than the judicial exception itself under Step 2B. Accordingly, the claim is not patent-eligible under 35 U.S.C. 101.
Claim 16 recites an apparatus claim having limitations similar to those of apparatus claim 4. Thus, it is rejected for the same reasons.
Claim 19 recites a method claim that would be practiced by the apparatus of claim 16. Thus, it is rejected for the same reasons.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Park (US 20220335282) in view of Chen (US 20220269752).
Regarding claim 1, Park discloses a device (Park figure 1 illustrates a device B), comprising:
a plurality of multiplication and accumulation (MAC) cells (Park figure 1 illustrates NPU 1000, wherein figures 4-5 illustrate that the NPU includes a processing element array, wherein each PE includes a multiplier and an adder [i.e., a plurality of MAC cells]);
a feature processor circuit (Park figure 15 illustrates a feature map storage unit 220 [i.e., a feature processor circuit] within the NPU) configured to:
receive a plurality of input units each comprising a plurality of input feature elements from different respective channels of an input tensor (Park [0108] describes that the NPU 1000 may call data stored in main memory into internal memory 200, which includes the feature map storage unit 220, and figure 13 [0334] describes the input feature map 1310 for performing depthwise convolution. Thus, the feature map storage unit 220 receives input feature map 1310 that includes A0-AM, F0-FM, K0-KM, … M0-Mm [i.e., a plurality of input units], each of A0-AM, F0-FM, K0-KM, … M0-Mm comprising a plurality of input feature elements from different channels M of the feature map 1310 [i.e., an input tensor]);
generate a plurality of extended feature units each comprising an input feature element from each of the plurality of input units and from a common channel of the input tensor (Park figures 14-15 [0337-0339] illustrate a plurality of feature elements A0, B0, C0, F0, G0, H0, K0, L0, M0 to Am, Bm, Cm, Fm, Gm, Hm, Km, Lm, Mm [i.e., a plurality of extended feature units] generated for performing the depthwise convolution, each comprising an element from each of the plurality of input units and from a common channel of the tensor 1310. For example, A0, B0, C0, F0, G0, H0, K0, L0, M0 [i.e., an extended feature unit] includes A0 from A0-AM, F0 from F0-FM, K0 from K0-KM, and so on [i.e., an element from each of the plurality of input units], and from a first channel (e.g., common)); and
provide the plurality of extended feature units to respective MAC cells of the plurality of MAC cells (Park figures 14-15, A0, B0, C0, F0, G0, H0, K0, L0, M0 to Am, Bm, Cm, Fm, Gm, Hm, Km, Lm, Mm [i.e., a plurality of extended feature units] are provided to PEs in the first row of the array [i.e., respective MAC cells of the plurality of MAC cells] via the F_in signal);
a weight processor circuit (Park figure 15 illustrates a weight storage unit 210 [i.e., a weight processor circuit] within the NPU) configured to:
receive a plurality of weight units each comprising a plurality of weight elements from different respective channels of a kernel (Park [0108] describes that the NPU 1000 may call data stored in main memory into internal memory 200, which includes the weight storage unit 210, and figure 13 [0334] describes the weight kernel data 1300 for performing depthwise convolution. Thus, the weight storage unit 210 receives weight kernel data 1300 that includes a0-am, d0-dm, g0-gm, … i0-im [i.e., a plurality of weight units], each of a0-am, d0-dm, g0-gm, … i0-im comprising a plurality of weight elements from different channels M of the kernel data 1300 [i.e., a kernel]);
generate a plurality of extended weight units each comprising a weight element from each of the plurality of weight units and from a common channel of the kernel (Park figures 14-15 [0337-0339] illustrate a plurality of weight elements a0, b0, c0, d0, e0, f0, g0, h0, i0 to am, bm, cm, dm, em, fm, gm, hm, im [i.e., a plurality of extended weight units] generated for performing the depthwise convolution, each comprising an element from each of the plurality of weight units and from a common channel of the kernel data 1300. For example, a0, b0, c0, d0, e0, f0, g0, h0, i0 [i.e., an extended weight unit] includes a0 from a0-am, d0 from d0-dm, g0 from g0-gm, and so on [i.e., an element from each of the plurality of weight units], and from a first channel (e.g., common)); and
provide the plurality of extended weight units to respective MAC cells of the plurality of MAC cells (Park figures 14-15, a0, b0, c0, d0, e0, f0, g0, h0, i0 to am, bm, cm, dm, em, fm, gm, hm, im [i.e., a plurality of extended weight units] are provided to PEs in the first row of the array [i.e., respective MAC cells of the plurality of MAC cells] via W_in signal),
wherein each MAC cell of the plurality of MAC cells is configured to multiply the input feature elements of the extended feature unit provided by the feature processor circuit by the respective weight elements of the extended weight unit provided by the weight processor circuit, sequentially in Park, and output a sum of the products (Park figures 14-15 describe that PE_00 receives the first kernel (i.e., a0, b0, c0, d0, e0, f0, g0, h0, i0) [i.e., the respective weight elements of the extended weight unit] and the first feature map portion (i.e., A0, B0, C0, F0, G0, H0, K0, L0, M0) [i.e., the input feature elements of the extended feature unit], which are sequentially processed over 9 clock cycles to output a sum of the products).
Park does not teach each MAC cell configured to multiply the input feature elements of the extended feature unit by the respective weight elements of the extended weight unit in parallel. However, Chen discloses that each MAC cell of a plurality of MAC cells is configured to multiply input feature elements provided by a feature processor circuit by respective weight elements provided by a weight processor circuit in parallel and output a sum of the products (Chen figure 1 illustrates a processing unit array 110 that includes a plurality of processing units 111 [i.e., a plurality of MAC cells], an input data memory 131 [i.e., a feature processor circuit], and a weight memory 133 [i.e., a weight processor circuit]. [0038] describes that each processing unit 111 [i.e., each MAC cell] includes 9 multiply-accumulate units that can perform a set of multiply-accumulate operations for a 3x3 kernel in parallel; also see figure 3B, which illustrates input feature elements multiplied by the respective weight elements in parallel to output a sum of the products).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify each PE of the PE array of Park illustrated in figure 15 to have 9 multiply-accumulate units operating in parallel to process a 3x3 kernel as illustrated in figure 3B of Chen. This modification would have been obvious because both references teach methods and apparatus for performing convolution operations on a 3x3 kernel. Furthermore, having 9 MAC units within each PE to perform 9 MAC operations in parallel would increase the speed of computation relative to performing them sequentially.
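The rationale above rests on the nine products of a 3x3 window being mutually independent. As an illustrative sketch only (not the actual implementation of Park or Chen), the following contrasts one-product-per-cycle accumulation with forming all nine products at once followed by a reduction:

```python
# Illustrative sketch only: the nine products of a 3x3 kernel window are
# mutually independent, so hardware with nine multipliers can form them in
# one step rather than nine. Modeled with plain lists; names hypothetical.

window = [1, 2, 3, 4, 5, 6, 7, 8, 9]     # 3x3 feature window, flattened
weights = [9, 8, 7, 6, 5, 4, 3, 2, 1]    # 3x3 kernel, flattened

# Sequential style (one multiply-accumulate per cycle, nine cycles total).
acc = 0
for f, w in zip(window, weights):
    acc += f * w

# Parallel style (nine independent multiplies at once, then a reduction
# of the partial products).
products = [f * w for f, w in zip(window, weights)]
parallel_sum = sum(products)

assert acc == parallel_sum  # both orderings produce the same sum of products
```

The equality holds for any window because multiplication of each pair involves no data dependence on the others; only the final reduction combines them.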
Claim 16 recites an apparatus claim having limitations similar to those of claim 4. Thus, it is rejected for the same reasons.
Claim 19 recites a method claim that would be practiced by the apparatus of claim 16. Thus, it is rejected for the same reasons.
Claims 2-4, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Park in view of Chen as applied to claim 1 above, and further in view of Whatmough – US 20190311243.
Regarding claim 2, the combined system of Park in view of Chen teaches the device of claim 1, further comprising a controller circuit configured to generate requests for the plurality of input units and the plurality of weight units, wherein the plurality of input units and the plurality of weight units are provided to the feature processor circuit and the weight processor circuit from a memory based on the requests (Park figure 4 illustrates a controller 300; [0109] describes that the controller may control the read and write sequence of the internal memory 200; [0186] describes that the controller includes a scheduler configured to load the weight kernel and feature map data into the weight storage unit 210 and the feature map storage unit 220; [0108] describes that data for performing the neural network operation are stored in main memory 4000 [i.e., a memory]. Thus, requests for such data are generated in order for the controller to load the feature data and weight kernel from main memory into the internal storage units). However, the combined system of Park in view of Chen does not teach a controller configured to read a command from a command register. Whatmough teaches a controller circuit (Whatmough figure 1 illustrates a DMA controller 112) configured to: read a command from a command register; and generate requests for data based on the command, wherein the data are provided to a feature processor circuit and a weight processor circuit from a memory in response to the requests (Whatmough figure 1 illustrates control register 122 [i.e., a command register]; [0019] describes that the interface to a host data processing system is through the DMA unit 112 and bus 124, which enable the accelerator to exchange data and commands with the host processing system; [0018] describes that the DMA 112 retrieves data to be processed from a host data processing system, wherein [0017] describes such data as image data and kernel weights).
Thus, the DMA controller reads a command from the control register and generates requests for image data and kernel weights [i.e., data] based on the command to perform convolution, wherein figure 1 [0020] further illustrates that data are provided to buffer 104, which comprises a transposing buffer [i.e., a feature processor circuit] and a weight buffer [i.e., a weight processor circuit], from a data buffer 102 or from main memory [i.e., a memory] via the DMA.
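The command-register flow described above (a controller reading a command and generating memory requests) can be sketched generically as follows. All names and the chunked-request scheme are hypothetical illustrations of the generic DMA-style behavior, not the actual design of Whatmough or the claimed controller:

```python
# Illustrative sketch only: a controller reading a command from a command
# register and generating data-transfer requests based on it. Hypothetical
# names; models no particular reference or claim.

from dataclasses import dataclass

@dataclass
class Command:
    base_address: int   # where the data to be fetched begins
    num_bytes: int      # how much data the command describes

def generate_requests(command_register, chunk=4):
    """Read the command, then emit (address, size) requests covering the
    described transfer, as a DMA-style controller would."""
    cmd = command_register["command"]          # read a command from the register
    requests = []
    addr, remaining = cmd.base_address, cmd.num_bytes
    while remaining > 0:
        size = min(chunk, remaining)
        requests.append((addr, size))          # one request per chunk
        addr += size
        remaining -= size
    return requests

reg = {"command": Command(base_address=0x100, num_bytes=10)}
reqs = generate_requests(reg)   # [(256, 4), (260, 4), (264, 2)]
```

The memory (not modeled here) would answer each request by delivering the addressed data to the feature and weight buffers.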
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Park in view of Chen to include a control register storing a command, as illustrated in figure 1 of Whatmough, for the controller to read the command and generate requests for the data to be processed. This modification would have been obvious because the references are directed to systems and methods for performing convolutional neural network operations, and having a control register, as recognized by Whatmough [0019], enables the accelerator to exchange data and commands with the host data processing system.
Regarding claim 3, the combined system of Park in view of Chen and Whatmough teaches the device of claim 2, wherein: the controller circuit (Park figure 4, controller 300) is further configured to: for each iteration of an outer loop (Park figure 13, each set of kernel data 1300), generate one or more requests for a different plurality of weight units of the kernel (Park [0109] describes that the controller controls operation of the PE array and the read and write sequence of the internal memory 200; [0187] describes that the controller may load weight data into the weight storage unit. Thus, the controller generates requests for a0-am, d0-dm, g0-gm, … i0-im [i.e., a different plurality of weight units] of the kernel to be loaded into the weight storage unit 210), wherein the weight processor circuit is configured to generate and provide a different plurality of extended weight units to the plurality of MAC cells for each iteration of the outer loop (Park figures 13-15, the weight storage unit 210 generates a plurality of different weight elements a0, b0, c0, d0, e0, f0, g0, h0, i0 to am, bm, cm, dm, em, fm, gm, hm, im [i.e., a different plurality of extended weight units] and provides them to the plurality of processing elements for each set of kernel data 1300); and for each iteration of an inner loop within the outer loop (Park figure 14 illustrates each sliding step with stride 1 using the same set of weight kernels [i.e., each iteration of an inner loop within the outer loop]), generate one or more requests for a different plurality of input units of the input tensor (Park figure 14 illustrates that, for each step, the controller generates requests for a different plurality of input units of the input tensor 1310, such as F0-FM, K0-KM, Q0-Pm, … S0-Rm [i.e., a different plurality of input units]), wherein the feature processor circuit is configured to generate and provide a different plurality of extended feature units to the plurality of MAC cells for each iteration of the inner loop (Park figures 14-15, feature map storage unit 220 generates F0, G0, H0, K0, L0, M0, Q0, R0, S0 to Fm, Gm, Hm, Km, Lm, Mm, Pm, Qm, Rm [i.e., a different plurality of extended feature units], which are provided to the PE array [i.e., the plurality of MAC cells] for each slide step [i.e., each iteration of the inner loop]); and the plurality of MAC cells are configured to output the sums of the products for each iteration of the inner loop (Park figure 15 illustrates that the PE array is configured to output MAC results for each sliding window. For example, PE_00 performs step 1, PE_10 performs step 2, and PE_20 performs step 3 of figure 14, as described in [0337-0349]).
Regarding claim 4, the combined system of Park in view of Chen and Whatmough teaches the device of claim 3, further comprising an accumulator circuit configured to accumulate the sums of the products output by the plurality of MAC cells for each iteration of the inner loop to generate an output tensor representing a depthwise convolution of the input tensor and the kernel (Park figure 4 illustrates that each PE includes an accumulator 643; thus, the PE array includes a plurality of accumulators 643 [i.e., an accumulator circuit] configured to accumulate the sums of products output by the plurality of PEs for each sliding window to generate an output tensor representing a depthwise convolution, as [0064] describes and figure 14 illustrates a depthwise convolution on the weight kernel 1300 and the feature data 1310).
Regarding claim 15, the combined system of Park in view of Chen and Whatmough teaches the device of claim 4, wherein a number of channels of the input tensor is equal to a number of channels of the kernel and a number of channels of the output tensor (Park figure 13 [0064] illustrates a depthwise convolution operation on a 3x3xm kernel and a 5x5xM input feature map, wherein figure 14 illustrates that each kernel channel is mapped onto input data of the corresponding channel to generate an output tensor having corresponding channels in the depthwise convolution. Thus, the kernel, the feature map, and the output tensor have the same number of channels).
Allowable Subject Matter
Claims 5-14 and 17-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim 20 would be allowable if rewritten to overcome the claim objection set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
Regarding claims 5, 10, 13, 17, and 20, the prior art of record does not teach or suggest the combination of limitations as claimed, including the feature processor circuit and the weight processor circuit each comprising an input buffer configured to receive the input units/weight units from the memory, a series of delay registers each storing one of the input units or weight units, a plurality of multiplexer circuits corresponding to respective channels of the kernel, and performing writing and shifting of data for each iteration of the inner loop.
Park – US 20220335282
Park teaches a method and system for performing depthwise convolution using a processing element array, wherein Park discloses a controller, a weight storage unit, and a feature map storage unit configured to store the weight kernel and feature map as illustrated in figure 13. However, Park does not teach or suggest the particular structure of the feature processor circuit and the weight processor circuit, each comprising an input buffer configured to receive the input units/weight units from the memory, a series of delay registers each storing one of the input units or weight units, a plurality of multiplexer circuits corresponding to respective channels of the kernel, and performing writing and shifting of data for each iteration of the inner loop.
Park – US 20220188612
Park teaches an NPU device for performing convolution operations based on the number of channels, wherein figures 6 and 7 illustrate an input having a plurality of channels, where a buffer 11 includes a plurality of vector generators that generate a vector based on values along the channel dimension to be operated on with the weights for the convolution. However, Park does not teach or suggest the particular structure of the feature processor circuit and the weight processor circuit, each comprising an input buffer configured to receive the input units/weight units from the memory, a series of delay registers each storing one of the input units or weight units, a plurality of multiplexer circuits corresponding to respective channels of the kernel, and performing writing and shifting of data for each iteration of the inner loop.
Kim – US 20180129935
Kim teaches a convolutional neural network system for performing convolution operations. Figure 3 further teaches an input buffer device 110 and a weight kernel buffer device 140 that output kernel data and feature map data having a plurality of channels as illustrated in figure 4. Kim, in figure 7, merely illustrates that the channels of the input feature and kernel are sequentially stored in the corresponding buffers. Thus, Kim does not teach or suggest the particular structure of the feature processor circuit and the weight processor circuit, each comprising an input buffer configured to receive the input units/weight units from the memory, a series of delay registers each storing one of the input units or weight units, a plurality of multiplexer circuits corresponding to respective channels of the kernel, and performing writing and shifting of data for each iteration of the inner loop.
Whatmough – US 20190311243
Whatmough teaches a systolic convolutional neural network having a controller, a control register, a buffer, and a systolic array of MAC cells to perform MAC operations, wherein the buffer 104 includes transposing buffers for weight and activation inputs. Figure 7 illustrates an implementation of the transpose buffer having rows of FIFO buffers, such that data are buffered in one dimension and output in another dimension. However, Whatmough does not teach or suggest the particular structure of the feature processor circuit and the weight processor circuit, each comprising an input buffer configured to receive the input units/weight units from the memory, a series of delay registers each storing one of the input units or weight units, a plurality of multiplexer circuits corresponding to respective channels of the kernel, and performing writing and shifting of data for each iteration of the inner loop.
Chen – US 20220269752
Chen teaches a method for convolution computation having an input data memory, a weight memory, a controller, and a processing unit array, each processing unit including 9 MAC circuits as illustrated in figure 1, wherein the input data memory provides input feature elements having a plurality of channels to the processing unit array to operate with the weight data provided by the weight memory, and each channel of the input feature is processed sequentially, one after another. Chen does not teach or suggest the particular structure of the feature processor circuit and the weight processor circuit, each comprising an input buffer configured to receive the input units/weight units from the memory, a series of delay registers each storing one of the input units or weight units, a plurality of multiplexer circuits corresponding to respective channels of the kernel, and performing writing and shifting of data for each iteration of the inner loop.
Mills – US 20190340486
Mills teaches a system for performing multiply and accumulate operations in a neural network processor; figure 4 illustrates an input buffer circuit having a shifter and a kernel extract component to generate input feature data and kernel data and provide such data to the MAC unit. Figure 10A illustrates matrix-matrix multiplication having N-dimensional data of M input channels and rows of weight data having M channels. However, Mills does not teach or suggest that the input buffer circuit and the kernel extract component each comprise an input buffer configured to receive the input units / weight units from the memory, a series of delay registers each storing one of the input units and weight units, and a plurality of multiplexer circuits corresponding to respective channels of the kernel, and performing writing and shifting of data for each iteration of the inner loop.
Shao – US 20200293867
Shao teaches a system comprising a controller, a weight buffer, an activation buffer, and a plurality of vector units to perform convolution operations. Figure 8 illustrates that the input data channels are divided into four sections, one for each row of PEs, to generate four output tiles. However, Shao does not teach or suggest the particular structure of the feature processor circuit and the weight processor circuit, each comprising an input buffer configured to receive the input units / weight units from the memory, a series of delay registers each storing one of the input units and weight units, and a plurality of multiplexer circuits corresponding to respective channels of the kernel, and performing writing and shifting of data for each iteration of the inner loop.
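For illustration only, Shao's division of the input channels into four sections, one per row of PEs, can be sketched as a simple partition; this is a hypothetical model, not Shao's actual dataflow:

```python
def partition_channels(channels, num_sections=4):
    # Hypothetical model of Shao-style channel partitioning: the input data
    # channels are split into four sections, one section per row of PEs,
    # each row producing one output tile.
    size = (len(channels) + num_sections - 1) // num_sections  # ceiling division
    return [channels[i * size:(i + 1) * size] for i in range(num_sections)]
```

Such partitioning distributes channels spatially across PE rows; it does not involve writing and shifting units through delay registers on each inner-loop iteration as claimed.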
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUY DUONG whose telephone number is (571)272-2764. The examiner can normally be reached Monday-Friday, 7:30 am-5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Caldwell can be reached at (571) 272-3702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HUY DUONG/Examiner, Art Unit 2182 (571)272-2764
/ANDREW CALDWELL/Supervisory Patent Examiner, Art Unit 2182