Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The abstract of the disclosure is objected to because the abstract refers to purported merits of the invention: “a flexible and efficient convolution computation mechanism and structure”. A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. See MPEP § 608.01(b).
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Apparatus claims 9-16 will be addressed first, followed by method claims 1-8.
Claims 1-3, 6, 9-11, and 14 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by US 20220309336 A1 to Minkin (hereinafter “Minkin”).
Regarding claim 9, Minkin teaches the following:
A data processing circuit based on convolution computation (fig 1A, 1B, 2, [0076-0085]).
at least one memory, used to store a code (fig 1A, or 1B 105, [0063]); and
a processor, coupled to the at least one memory and configured to load and execute the code to (fig 1A 110 or 1B 106, [0066-0071], [0080]):
extend input data according to a padding mode to generate extended input data, wherein the input data is used for convolution computation ([0076-0085] input data used for convolution; fig 5, 6A, 6C, [0117-0126] bounding box offset for extend input data; fig 6A, 6C, [0120], [0130-0131], [0138] showing bounding box offset by size, according to im2col mode with padding, or padding with zeros or constant values, for according to padding mode);
provide coordinates of a two-dimensional coordinate system to a plurality of elements in the extended input data (fig 5, [0120-0125], H, W dimensional coordinate system of the bounding box); and
read the elements in the extended input data according to location information, wherein the location information comprises a size of non-extended input data and coordinates of the elements in the extended input data ([0079-0085], [0120-0126] load for read according to bounding box of fig 5, fig 6A, 6C), and the step of reading the elements in the extended input data comprises:
in response to a coordinate of one of the elements in the location information being located outside the non-extended input data in the two-dimensional coordinate system, converting the coordinate in the location information according to the padding mode, wherein the coordinate in the location information is mapped to a coordinate of the non-extended input data (fig 5 501, H = -1, W = -1, 503 H = -1, W = -2, examples of converting location information according to the padding mode wherein the coordinate in the location information is mapped to a coordinate in the non-extended area: offset -1, -2 etc).
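The read-and-convert step mapped above can be illustrated with a minimal sketch. This code is hypothetical (not taken from the application or from Minkin); the `convert` callback stands in for whatever conversion the active padding mode dictates:

```python
def read_extended(data, y, x, convert):
    """Read an element at (y, x) from the extended view of `data`.

    If the coordinate falls outside the non-extended input, it is first
    converted, per the padding mode, to a coordinate of the non-extended
    data -- the behavior claim 9 recites.
    """
    h, w = len(data), len(data[0])
    if not 0 <= y < h:
        y = convert(y, h)
    if not 0 <= x < w:
        x = convert(x, w)
    return data[y][x]

# One possible padding-mode conversion (reflect about the edge element):
reflect = lambda c, n: abs(c) if c < 0 else 2 * (n - 1) - c
```

For example, with a 3x3 input, reading coordinate (-1, 0) under the reflect conversion folds the row index back to 1 before the element is fetched.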
Regarding claim 10, in addition to the teachings addressed in the claim 9 analysis, Minkin teaches the following:
wherein the processor is further configured to:
set coordinates of the non-extended input data to be between 0 and w in a first dimension and between 0 and h in a second dimension, where w is a width of the non-extended input data and h is a height of the non-extended input data (fig 5, [0120], e.g., 501 HW = 14x9, h = 14 for height 0-14 loaded); and
set coordinates in the extended input data not belonging to the non-extended input data to be less than zero or greater than w in the first dimension and less than zero or greater than h in the second dimension (fig 5, [0120], e.g., 501 HW = 14x9, w = 9 for width 0-9 loaded).
Regarding claim 11, in addition to the teachings addressed in the claim 10 analysis, Minkin teaches the following:
wherein the processor is further configured to:
determine whether a coordinate of one of the elements corresponding to the location information is less than zero or greater than w in the first dimension (fig 5, [0120-0121], [0124] logic specifies bounding box offset with respect to w); and
determine whether a coordinate of the element corresponding to the location information is less than zero or greater than h in the second dimension (fig 5 [0120-0121], [0124] logic specifies bounding box offset with respect to h).
Regarding claim 14, in addition to the teachings addressed in the claim 9 analysis, Minkin teaches the following:
wherein the at least one memory comprises a plurality of memories, the input data is stored in the memories ([0091] shared memory/cache comprising a plurality of cache lines, for a plurality of memories), and the processor is further configured to:
according to a size of a storage space of a single address of each of the memories, store a plurality of first partial data in the input data into the memories, wherein coordinates of at least one of the first partial data at each address in two-dimensional coordinates of the input data of any channel are different, and the address stores elements of a plurality of channels with same coordinates in the input data ([0090-0092]).
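The storage arrangement read onto [0090-0092] — a single address holding the same-coordinate elements of every channel — might be sketched as follows (hypothetical illustration; the function name and list-based layout are assumptions, not Minkin's disclosure):

```python
def interleave_channels(x, C, H, W):
    """Lay out a [channel][height][width] tensor so that each 'address'
    (one entry of the returned list) stores the elements of all C
    channels sharing the same (h, w) coordinate."""
    addresses = []
    for h in range(H):
        for w in range(W):
            # All channels' elements at (h, w) land at one address.
            addresses.append([x[c][h][w] for c in range(C)])
    return addresses
```

A two-channel 2x2 input thus produces four addresses, each holding one element per channel.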
Claims 1-3, and 6 are directed to a method that would be practiced by the apparatus as in claims 9-11, and 14 respectively. All steps recited in the method of claims 1-3, and 6 are performed by the apparatus as in claims 9-11, and 14 respectively, as configured. The claim 9-11, and 14 analysis applies equally to claims 1-3, and 6 respectively.
Claims 7-8, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Minkin in view of US 20210303909 A1 to Gunnam et al. (hereinafter “Gunnam”).
Regarding claim 15, in addition to the teachings addressed in the claim 9 analysis, Minkin discloses the processor configured to read a first convolution kernel group among a plurality of convolution kernels according to a size ([0127]), discloses sizes of registers generally ([0209]), and discloses storing operands in registers ([0332], [0414], [0422]). Minkin does not, however, explicitly disclose wherein the reading of a first convolution kernel group among a plurality of convolution kernels is according to a size of a sum register, wherein a number of the convolution kernels in the first convolution kernel group is the same as the size of the sum register; and temporarily store a first convolution computation result of the input data and the first convolution kernel group into the sum register through first input first output.
However, in the same field of endeavor, Gunnam discloses an apparatus similar to Minkin for performing convolution on tensors in a convolutional neural network (abstract, fig 1-2, fig 10). Gunnam further discloses the size of the convolution kernel being according to the size of a sum register, wherein a number of the convolution kernels in the first convolution kernel group is the same as the size of the sum register ([0102] accumulator register for sum register, sized according to (2k-1)x(2k-1), where k is the kernel size of the kernel matrix for kernel group, fig 10 1025), and temporarily storing a first convolution computation result of the input data and the first group into the sum register through first input first output (fig 10 1025A for first group, fig 20A, 20B, [0194-0195], accumulation through FIFO 1425).
It would have been obvious to one of ordinary skill in the art before the effective filing date to read a first convolution kernel group among a plurality of convolution kernels according to Minkin, wherein the reading is according to a size of a sum register and a number of the convolution kernels in the first convolution kernel group is the same as the size of the sum register according to Gunnam, and to temporarily store a first convolution computation result of the input data and the first convolution kernel group into the sum register through first input first output according to Gunnam. It would have been obvious to size the read of a first convolution group according to a size of a sum register to achieve the benefit of processing large amounts of data in a convolutional neural network by partitioning the input and kernel values according to register bank sizes, thereby processing the large amounts of data more efficiently ([0003], [0029-0031]). It would further have been obvious to temporarily store in the sum register through first input first output to achieve the benefit of avoiding collisions ([0194-0195]).
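The combined teaching — partial convolution results for a kernel group, sized to the sum register, staged through a FIFO and accumulated — might be sketched as follows (hypothetical code; the function and variable names are illustrative and do not appear in Gunnam):

```python
from collections import deque

def fifo_accumulate(partials, reg_size):
    """Stage partial convolution results through a FIFO (first in,
    first out) and accumulate them into a sum register whose size
    matches the kernel-group size."""
    fifo = deque(partials)            # FIFO staging of partial results
    sum_register = [0] * reg_size
    slot = 0
    while fifo:
        # Each popped partial accumulates into its sum-register slot.
        sum_register[slot % reg_size] += fifo.popleft()
        slot += 1
    return sum_register
```

Because partials leave the FIFO in arrival order, writes to each register slot never collide, which is the benefit the rejection attributes to [0194-0195].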
Regarding claim 16, Minkin in view of Gunnan teach the claim 15 limitations. Minkin further teaches:
wherein the processor is further configured to:
judge that a size of one of the convolution kernels is less than a computation amount of convolution computation ([0102-0103] judging completion based on index value); and
repeatedly provide the input data for the convolution kernels to perform convolution computation ([0004], [0102-0119] iterate for repeatedly provide).
Claims 7-8 are directed to a method that would be practiced by the apparatus as in claims 15-16 respectively. All steps recited in the method of claims 7-8 are performed by the apparatus as in claims 15-16 respectively, as configured. The claim 15-16 analysis applies equally to claims 7-8 respectively.
Allowable Subject Matter
Claims 4-5, and 12-13 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter.
Applicant claims apparatus and methods for convolution computation, wherein the apparatus as in claim 9 comprises:
at least one memory, used to store a code; and
a processor, coupled to the at least one memory and configured to load and execute the code to:
extend input data according to a padding mode to generate extended input data, wherein the input data is used for convolution computation;
provide coordinates of a two-dimensional coordinate system to a plurality of elements in the extended input data; and
read the elements in the extended input data according to location information, wherein the location information comprises a size of non-extended input data and coordinates of the elements in the extended input data, and the step of reading the elements in the extended input data comprises:
in response to a coordinate of one of the elements in the location information being located outside the non-extended input data in the two-dimensional coordinate system, converting the coordinate in the location information according to the padding mode, wherein the coordinate in the location information is mapped to a coordinate of the non-extended input data.
Wherein the apparatus as in claim 9 further comprises as in claim 10:
wherein the processor is further configured to:
set coordinates of the non-extended input data to be between 0 and w in a first dimension and between 0 and h in a second dimension, where w is a width of the non-extended input data and h is a height of the non-extended input data; and
set coordinates in the extended input data not belonging to the non-extended input data to be less than zero or greater than w in the first dimension and less than zero or greater than h in the second dimension.
Wherein the apparatus as in claim 10 further comprises as in claim 12:
wherein the padding mode is a reflect mirror mode, and the processor is further configured to:
determine that a coordinate of one of the elements corresponding to the location information is less than zero in the first dimension, and further convert a first coordinate of the element in the first dimension into an absolute value of the first coordinate;
determine that a coordinate of the element corresponding to the location information is greater than w in the first dimension, and further convert the first coordinate of the element into a difference between the first coordinate and twice w;
determine that a coordinate of the element corresponding to the location information is less than zero in the second dimension, and further convert a second coordinate of the element in the second dimension into an absolute value of the second coordinate; and
determine that a coordinate of the element corresponding to the location information is greater than h in the second dimension, and further convert the second coordinate of the element into a difference between the second coordinate and twice h.
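The claim 12 conversion rules reduce to a small per-dimension function. The following sketch applies the claimed equations literally, with `upper` standing in for w or h as appropriate (hypothetical illustration, not code from the application):

```python
def reflect_mirror(c, upper):
    """Claim 12 reflect mirror mode, per dimension:
    c < 0      -> |c|
    c > upper  -> 2*upper - c
    otherwise  -> c (already inside the non-extended data)
    """
    if c < 0:
        return abs(c)
    if c > upper:
        return 2 * upper - c
    return c
```

For example, with w = 5, an out-of-range coordinate of -2 maps to 2 and a coordinate of 7 maps to 3, both inside the non-extended data.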
Wherein the apparatus as in claim 10 further comprises as in claim 13:
wherein the padding mode is a symmetric mirror mode, and the processor is further configured to:
determine that a coordinate of one of the elements corresponding to the location information is less than zero in the first dimension, and further convert a first coordinate of the element in the first dimension into an absolute value of the first coordinate plus one;
determine that a coordinate of the element corresponding to the location information is greater than w in the first dimension, and further convert the first coordinate of the element into a difference between the first coordinate plus one and twice w;
determine that a coordinate of the element corresponding to the location information is less than zero in the second dimension, and further convert a second coordinate of the element in the second dimension into an absolute value of the second coordinate plus one; and
determine that a coordinate of the element corresponding to the location information is greater than h in the second dimension, and further convert the second coordinate of the element into a difference between the second coordinate plus one and twice h.
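The claim 13 equations admit a sketch of the same shape (hypothetical code; the claim recites only "a difference," so the assumption here is that the difference is taken in the direction that folds the coordinate back toward the data):

```python
def symmetric_mirror(c, upper):
    """Claim 13 symmetric mirror mode, per dimension:
    c < 0      -> |c + 1|
    c > upper  -> 2*upper - (c + 1)
    otherwise  -> c
    """
    if c < 0:
        return abs(c + 1)
    if c > upper:
        return 2 * upper - (c + 1)
    return c
```

Compared with the reflect mirror sketch above, the "+1" offsets shift the mirror axis by half an element, so the edge element itself is repeated (e.g., with w = 5, a coordinate of -1 maps to 0 rather than 1).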
The primary reason for indication of allowable subject matter is the specific equations used to determine the coordinates with respect to the respective padding modes.
Minkin is the closest prior art found. Minkin discloses the claimed invention according to the above claim mappings. Minkin is silent with respect to both reflect mirror mode and symmetric mirror mode, and further neither explicitly discloses nor suggests the above highlighted equations for determining coordinate location with respect to padding.
Gunnam discloses the claimed invention according to the above claim mappings. Gunnam further discloses a padded feature map [0107], but is silent with respect to both reflect mirror mode and symmetric mirror mode, and further neither explicitly discloses nor suggests the above highlighted equations for determining coordinate location with respect to padding.
US 20200104690 A1 to Bai et al. (hereinafter “Bai”) discloses a neural processing unit including a direct memory access core, which includes a read engine for reading and writing blocks of tensors in artificial neural networks (abstract, fig 5A-5C). Bai further discloses padding an input feature map, including disclosure of both reflective and symmetric padding types. Bai does not, however, explicitly disclose or suggest the specific equations highlighted above for determining coordinate location with respect to padding.
US 20190392287 A1 to Ovsiannikov et al. (hereinafter “Ovsiannikov”) discloses a neural processor wherein a multiplier multiplies tiles of activations and weights, the results then being stored in an output register (abstract, fig 1B-1P). Ovsiannikov further discloses padding operations ([0318], [0322]), but is silent with respect to both reflect mirror mode and symmetric mirror mode. Ovsiannikov does not, however, explicitly disclose or suggest the specific equations highlighted above for determining coordinate location with respect to padding.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EMILY E LAROCQUE whose telephone number is (469) 295-9289. The examiner can normally be reached 10:00am - 12:00pm and 2:00pm - 8:00pm ET M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Caldwell, can be reached at (571) 272-3702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EMILY E LAROCQUE/Primary Examiner, Art Unit 2182