DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Action is non-final and is in response to the claims filed 11/11/2025. Claims 1-14 are currently pending, of which claims 1-14 are currently rejected.
Response to Arguments
Applicant’s arguments filed on 11/11/2025 have been fully considered.
Objection to Title: Applicant argues at the bottom of page 6 of 9 that the title of the invention is indicative of the elements presented in the claims included in a data processing apparatus. Applicant specifically argues “Applicant respectfully declines to present a new title at this time as the claims are all directed toward elements included in a data processing apparatus, which is the title of the present application. Therefore, Applicant fails to understand how the title is not clearly indicative of the invention to which the claims are directed.”
Examiner respectfully disagrees. The title “DATA PROCESSING APPARATUS” is not descriptive of the invention to which the claims are directed, because this title generally describes any processing apparatus that receives and processes data and does not clearly describe the invention. Examiner suggests that the title at least include the field to which the invention is directed, which is convolutional neural networks.
35 U.S.C. 112(b): Previous 35 U.S.C. 112(b) rejections have been withdrawn. See new 35 U.S.C. 112(b) rejections necessitated by amendments.
35 U.S.C. 103: Applicant argues on pages 7 and 8 that Kfir does not teach data being fed back to the input buffer. Applicant specifically argues that “the above-emphasized portion of claims 1 requires that data in the first output buffer (that is based on data included in the input buffer) to be fed back to the input buffer in a feedback loop. Claim 8 includes a similar feedback loop. The Office Action contends that Fig. 1 of Kfir shows such feedback, but as discussed during the Interview, there is no feedback from an output buffer back to an input buffer in Fig. 1 of Kfir.”
Applicant’s arguments are persuasive. However, see new grounds of rejection below.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation "the data stored in the input buffer unit" in lines 5-8 of claim 1. It is unclear if applicant intends the data to be part of the “processing target data” previously recited in claim 1, or for the data to be separate from the processing target data. There is insufficient antecedent basis for this limitation in the claim.
Claims 2-7 inherit the same deficiency as claim 1 by reason of dependence and are rejected for the same reason.
Claim 8 recites the limitation "the data stored in the input buffer unit" in lines 6-9 of claim 8. It is unclear if applicant intends the data to be part of the “processing target data” previously recited in claim 8, or for the data to be separate from the processing target data. There is insufficient antecedent basis for this limitation in the claim.
Claims 9-14 inherit the same deficiency as claim 8 by reason of dependence and are rejected for the same reason.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 6-11, and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Kalin Ovtcharov in NPL: “Accelerating Deep Convolutional Neural Networks Using Specialized Hardware” (https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/CNN20Whitepaper.pdf), hereinafter “Ovtcharov”, in view of Xilinx’s NPL: “Accelerating DNNs with Xilinx Alveo Accelerator Cards” (https://docs.amd.com/v/u/en-US/wp504-accel-dnns), hereinafter “Xilinx”, in view of Haseeb Bokhari in NPL: “Network-on-Chip Design” (https://liacs.leidenuniv.nl/~stefanovtp/courses/ES/papers/Ch15_NoC_Design.pdf), hereinafter “Bokhari”, further in view of Chen Zhang in NPL: “Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks” (https://dl.acm.org/doi/pdf/10.1145/2684746.2689060), hereinafter “Zhang”.
Regarding Claim 1, Ovtcharov teaches:
A data processing apparatus comprising:
an external memory that stores processing target data (Fig. 3, e.g., DRAM Channels (external memory); Page 2, Last paragraph, e.g., input image pixels are stored in a multi-banked input buffer from the DRAM);
an input buffer unit that stores at least part of the processing target data stored in the external memory (Fig. 3, e.g., shows Multi-Banked Input Buffer (input buffer unit); Page 2, Last paragraph, e.g., input image pixels are stored in a multi-banked input buffer from the DRAM);
[a first PE array] that performs … convolution processing using the data stored in the input buffer unit (Fig. 3, e.g., shows PE arrays receiving data from Multi-Banked Input Buffer (input buffer unit); Page 2, Last paragraph, e.g., inputs from Multi-Banked Input Buffer (input buffer unit) are streamed into multiple PE arrays to perform convolutional operations);
[a second PE array] that performs … convolution processing using the data stored in the input buffer unit (Fig. 3, e.g., shows PE arrays receiving data from Multi-Banked Input Buffer (input buffer unit); Page 2, Last paragraph, e.g., inputs from Multi-Banked Input Buffer (input buffer unit) are streamed into multiple PE arrays to perform convolutional operations);
a [Network-on-Chip] that stores, as a first stored result, one of a first result of processing by the [first PE array] or a second result of processing by the [second PE array] (Fig. 3, e.g., shows PE arrays outputting data (first stored result) to Network-on-Chip); and
[the Network-on-Chip] that stores, as a second stored result, the other of the first result or the second result … (Page 3, Paragraph above Fig. 3, e.g., accumulated results are sent back to input buffer for next round of layer computation; Fig. 3, e.g., shows PE arrays outputting data (second stored result) to Network-on-Chip), wherein the first stored result stored in the [Network-on-Chip] is stored in the input buffer unit (Fig. 3, shows Network-on-Chip outputting data to Multi-Banked Input Buffer (input buffer unit)),
Ovtcharov does not teach:
an M x M data processing unit that performs M x M convolution processing using the data stored in the input buffer unit;
an N x N data processing unit that performs N x N convolution processing using the data stored in the input buffer unit;
a first output buffer unit that stores, as a first stored result, one of a first result of processing by the M x M data processing unit or a second result of processing by the N x N data processing unit;
a second output buffer unit that stores, as a second stored result, the other of the first result or the second result not stored in the first output buffer unit, wherein the first stored result stored in the first output buffer unit is stored in the input buffer unit,
the second stored result stored in the second output buffer unit is transferred to the external memory.
However, Xilinx teaches:
an M x M data processing unit that performs M x M convolution processing (Fig. 2, e.g., 3x3 convolution is performed; Page 4, Top paragraph, e.g., xDNN processing engine has dedicated execution paths for different size convolutions) …
an N x N data processing unit that performs N x N convolution processing (Fig. 2, e.g., 1x1 convolution is performed; Page 4, Top paragraph, e.g., xDNN processing engine has dedicated execution paths for different size convolutions) …
Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which said subject matter pertains to combine the different convolution sizes performed by xDNN processing engine as taught by Xilinx with the PE arrays performing convolution operations as taught by Ovtcharov. One would have been motivated to combine these references because both references disclose PE arrays performing convolutional operations for neural networks, and Xilinx enhances the model of Ovtcharov because “The xDNN processing engine has dedicated execution paths for each type of command (download, conv, pooling, element-wise, and upload). This allows for convolution commands to be run in parallel with other commands if the network graph allows it.” (Xilinx: Page 4, First paragraph)
Ovtcharov in view of Xilinx does not teach:
a first output buffer unit that stores, as a first stored result, one of a first result of processing by the M x M data processing unit or a second result of processing by the N x N data processing unit;
a second output buffer unit that stores, as a second stored result, the other of the first result or the second result not stored in the first output buffer unit, wherein the first stored result stored in the first output buffer unit is stored in the input buffer unit,
the second stored result stored in the second output buffer unit is transferred to the external memory.
However, Bokhari teaches the structure of a Network-on-Chip (NoC) architecture. Specifically, Bokhari teaches that the Wormhole routing architecture is widely used in NoCs. Bokhari explains “The wormhole flow control scheme results in low-area routers, and it is therefore widely used in most on-chip networks” (Bokhari: Page 471, Fourth paragraph). Bokhari shows in Fig. 15.5 that the Wormhole router architecture receives inputs through input buffers, and that the buffered inputs are then selected by the XBAR and output.
Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which said subject matter pertains to combine the Wormhole architecture in a Network-on-Chip including input buffers as taught by Bokhari with the Network-on-Chip as taught by Ovtcharov in view of Xilinx. One would have been motivated to combine these references because both references disclose data transmission using a Network-on-Chip architecture, and Bokhari enhances the model of Ovtcharov in view of Xilinx because “The wormhole flow control scheme results in low-area routers, and it is therefore widely used in most on-chip networks” (Bokhari: Page 471, Fourth paragraph). Hence, each output from the PE arrays would arrive at a respective input buffer of the Network-on-Chip (first and second output buffers).
Ovtcharov in view of Xilinx in view of Bokhari does not teach:
the second stored result stored in the second output buffer unit is transferred to the external memory.
However, in the same field of endeavor, Zhang teaches writing data to DRAM when N/Tn phases (all phases) of computation are complete. Zhang explains “When N/Tn phases of computation and data copying are done, the resulting output feature maps are written down to DRAM” (Zhang: Page 168). Additionally, Ovtcharov shows that the Convolutional Neural Network Accelerator can also write data back to the DRAM Channels (external memory).
Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which said subject matter pertains to combine the storing of the final output in DRAM (external memory) as taught by Zhang with the Convolutional Neural Network Accelerator as taught by Ovtcharov in view of Xilinx in view of Bokhari. One would have been motivated to combine these references because both references disclose PE arrays used for Convolutional Neural Networks, and Zhang enhances the model of Ovtcharov in view of Xilinx in view of Bokhari by allowing for final results of the convolution operations to be stored in an external memory.
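For illustration only, the data flow recited in claim 1, as mapped to the combination above, may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from any cited reference: the names `conv2d`, `run_pass`, and the dictionary keys are hypothetical, M = 3 and N = 1 are chosen to match the convolution sizes cited from Xilinx, and assigning the M x M result to the first output buffer is only one of the two alternatives the claim language permits.

```python
# Illustrative sketch only. All names are hypothetical; M = 3 and N = 1 are assumed.

def conv2d(image, kernel):
    """Valid (no padding) sliding-window product sum, as commonly used in CNN layers."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def run_pass(external_memory, input_buffer, kernel_m, kernel_n):
    """One pass: both processing units read the same input buffer."""
    first_result = conv2d(input_buffer, kernel_m)   # M x M data processing unit
    second_result = conv2d(input_buffer, kernel_n)  # N x N data processing unit

    # Assumption: the first output buffer holds the M x M result.
    first_output_buffer = first_result
    second_output_buffer = second_result

    # The second stored result is transferred to the external memory.
    external_memory["results"].append(second_output_buffer)

    # The first stored result is stored in the input buffer (feedback path).
    return first_output_buffer


external_memory = {
    "target": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    "results": [],
}
# The input buffer stores at least part of the processing target data.
input_buffer = [row[:] for row in external_memory["target"]]

kernel_3x3 = [[1] * 3 for _ in range(3)]
kernel_1x1 = [[2]]

input_buffer = run_pass(external_memory, input_buffer, kernel_3x3, kernel_1x1)
```

After the pass, `input_buffer` holds the 3x3 result for use in a subsequent pass, while the 1x1 result resides only in `external_memory["results"]`.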
Regarding Claim 2, Ovtcharov in view of Xilinx in view of Bokhari in view of Zhang teaches:
The data processing apparatus according to claim 1, wherein M and N are integers greater than or equal to 1, and M > N (Xilinx: Fig. 2, e.g., 3x3 (MxM) and 1x1 (NxN) convolution is performed; Page 4, Top paragraph, e.g., xDNN processing engine has dedicated execution paths for different size convolutions).
The motivation to combine provided with respect to claim 1 applies equally to claim 2.
Regarding Claim 3, Ovtcharov in view of Xilinx in view of Bokhari in view of Zhang teaches:
The data processing apparatus according to claim 2, wherein N = 1 (Xilinx: Fig. 2, e.g., 1x1 convolution is performed; Page 4, Top paragraph, e.g., xDNN processing engine has dedicated execution paths for different size convolutions).
The motivation to combine provided with respect to claim 1 applies equally to claim 3.
Regarding Claim 4, Ovtcharov in view of Xilinx in view of Bokhari in view of Zhang teaches:
The data processing apparatus according to claim 1, wherein the processing target data is data defined by three or more orthogonal axes (Ovtcharov: Page 2, Last paragraph, e.g., PE arrays perform 3D convolution step (three orthogonal axes)), and
the M x M convolution processing or the N x N convolution processing is performed on a first axis and a second axis in the processing target data (Xilinx: Fig. 2, e.g., 3x3 (MxM) and 1x1 (NxN) convolution is performed (first and second axis)).
The motivation to combine provided with respect to claim 1 applies equally to claim 4.
Regarding Claim 6, Ovtcharov in view of Xilinx in view of Bokhari in view of Zhang teaches:
The data processing apparatus according to claim 4, wherein
if the number of data items belonging to a third axis in the data of the second result of the N x N convolution processing is smaller than the number of data items belonging to the third axis in the data of the first result of the M x M convolution processing, the second result of processing by the N x N data processing unit is stored in the second output buffer unit, and the first result of processing by the M x M data processing unit is stored in the first output buffer unit (Ovtcharov: Fig. 3, e.g., results of convolution operations performed in the PE arrays are input to the Network-on-Chip; Bokhari: Fig. 15.5, e.g., Wormhole router architecture in NoC includes input buffers for each input; the combination would cause each output from the PE arrays to arrive at a respective input buffer (first and second output buffers)).
The motivation to combine provided with respect to claim 1 applies equally to claim 6.
Regarding Claim 7, Ovtcharov in view of Xilinx in view of Bokhari in view of Zhang teaches:
The data processing apparatus according to claim 1, wherein the N x N convolution processing and the M x M convolution processing are performed as part of image processing using a neural network (Ovtcharov: Page 2, Last paragraph, e.g., CNN accelerator receives input image pixels; Xilinx: Fig. 2, e.g., 3x3 (MxM) and 1x1 (NxN) convolution is performed).
The motivation to combine provided with respect to claim 1 applies equally to claim 7.
Claims 8-11 and 13-14 are media claims practiced by the apparatus of claims 1-4 and 6-7, and are rejected for the same reasons as claims 1-4 and 6-7.
Allowable Subject Matter
Claims 5 and 12 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CARLOS H DE LA GARZA whose telephone number is (571)272-0474. The examiner can normally be reached Monday-Friday 9:30AM-6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Caldwell, can be reached at (571) 272-3702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/C.H.D./
Carlos H. De La Garza
Examiner, Art Unit 2182
(571)272-0474
/ANDREW CALDWELL/
Supervisory Patent Examiner, Art Unit 2182