Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner notes the entry of the following papers:
Amended claims filed 10/3/2025.
Applicant arguments/remarks made in amendment filed 10/3/2025.
Claims 2, 4-5, 7-9, 12, 14, 16, and 18-21 are amended. Claims 2-21 are presented for examination.
Response to Arguments
Applicant presents several arguments; each is addressed in turn below.
Applicant argues “…independent claims 2, 9, and 16 have been amended, thus rendering the rejections moot. Applicant respectfully requests the Examiner to reconsider and withdraw the double patenting rejections.” (Remarks, page 7, paragraph 7.) Examiner finds that the amended claims are patentably distinct from the claims of co-pending application 19/030,424. Therefore, the provisional nonstatutory double patenting rejection is withdrawn. However, the nonstatutory double patenting rejection over claims 1, 3, 7, and 13 of US 11,783,174 B2 is maintained in view of new mappings to the amended claims.
Applicant argues “Therefore, Applicant respectfully requests that the rejection of claims 2-21 under 35 U.S.C. § 101 be reconsidered and withdrawn…” (Remarks, page 8, paragraph 3, line 8.) Examiner finds Applicant’s arguments persuasive. The rejections under 35 U.S.C. § 101 are withdrawn.
Applicant argues “The combination of Martin and McQuillan does not teach or suggest each and every feature of claim 2” as amended. (Remarks, page 9, paragraph 2, line 1.) The argument is moot in view of the new grounds of rejection necessitated by amendment. Independent claims 9 and 16 recite the same significant limitations as independent claim 2 and are similarly rejected. The dependent claims remain rejected, at least by virtue of their dependence from rejected base claims.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 2-21 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 3, 7, and 13 of U.S. Patent No. 11,783,174 B2. Although the claims at issue are not identical, they are not patentably distinct from each other.
Application 18/360,136 | Patent US 11,783,174 B2

Application claim 2:
A method implemented by a neural processor circuit, comprising: sending, work units from a data buffer circuit of the neural processor circuit to a plurality of neural engine circuits of the neural processor circuit, wherein each of the work units corresponds to a segment of input data, and wherein each of the plurality of neural engine circuits comprises a respective input buffer circuit; storing the work units in the respective input buffer circuit of at least one neural engine circuit of the plurality of neural engine circuits; sending, by a control circuit, a control signal to the respective input buffer circuit of the at least one neural engine circuit to shift a portion of the work units to a multiply-accumulator circuit of the at least one neural engine circuit; shifting, by the respective input buffer circuit, the portion of the work units to the multiply-accumulator circuit based on at least the control signal; and performing, by the multiply-accumulator circuit, a digital multiply and add operation on the shifted portion of the work units using a kernel.

Patent claim 7 (excerpts):
A method of operating a neural processor circuit, comprising:
sending the work units from the data buffer to the neural engines separate from the data reader circuit and the data buffer,
each of the work units corresponding to a portion of the segment of the input data
a third rasterizer circuit in each of the neural engines (see Note 1)
storing the received segment of the input data in the data buffer of the neural processor circuit separate from the data reader circuit
instructing the input buffer circuit to shift portions of the corresponding one of the units to be multiplied with the kernel at different cycles
shift portions of the corresponding one of the units to be multiplied with the kernel at different cycles
performing a digital multiply and add operation on a portion of the corresponding one of the work units using a kernel to generate processed digital values in the at least one of the neural engines;

Application claim 3:
The method of claim 2, wherein the segment of the input data has a first size.

Patent claim 1 (excerpts):
a segment of input data received from a source external to the neural processor circuit
work units of a size that results in an output of a size that fits in an accumulator of each of the neural engines

Application claim 4:
The method of claim 3, further comprising: sending, to the plurality of neural engine circuits, second work units corresponding to a second segment of the input data from the data buffer circuit, wherein the second segment of the input data has a second size.

Patent claim 1 (excerpts):
a second rasterizer circuit configured to track, independent from the first rasterizer circuit, the segment of the input data and work units stored in the data buffer, each of the work units corresponding to a portion of the segment of the input data stored in the data buffer

Application claim 5:
The method of claim 2, wherein shifting the portion of the work units comprises: shifting the portion of the work units based on the control signal and task information, wherein the task information indicates a manner in which the input data is segmented into the work units.

Patent claim 1 (excerpts):
third rasterizer circuit instructing the input buffer circuit to shift portions of the corresponding one of the work units to be multiplied with the kernel at different cycles
the task information indicating at least how the input data is segmented into the work units.

Application claim 6:
The method of claim 5, wherein the task information further comprises: a dimension of the input data.

Patent claim 3 (excerpts):
wherein the task information further indicates a dimension of the input data

Application claim 7:
The method of claim 2, further comprising: generating a processed digital value based on the digital multiply and add operation on the shifted portion of the work units using the kernel; and storing the processed digital value in an accumulator of the at least one neural engine circuit of the plurality of neural engine circuits.

Patent claim 1 (excerpts):
perform a digital multiply and add operation on a portion of the corresponding one of the work units using a kernel to generate processed digital values
store the processed digital values, the corresponding one of the work units having a size that results in the processed digital values of a size that fits in the accumulator,

Application claim 8:
The method of claim 7, further comprising: performing, by the multiply-accumulator circuit, a second digital multiply and add operation on the processed digital value.

Patent claim 7 (excerpt):
performing a digital multiply and add operation on a portion of the corresponding one of the work units using a kernel to generate processed digital values in the at least one of the neural engines

Application claim 9 (excerpt):
A neural processor circuit, comprising: a plurality of neural engine circuits, each comprising an input buffer circuit; a data buffer circuit configured to provide, to a respective input buffer circuit of at least one neural engine circuit of the plurality of neural engine circuits, a respective work unit of a plurality of work units, wherein

Patent claim 1 (excerpt):
A neural processor circuit, comprising: a data reader circuit comprising a first rasterizer circuit configured to track a segment of input data received from a source external to the neural processor circuit; a data buffer separate from the data reader circuit and configured to store the segment of the input data received from the data reader circuit

Application claim 16 (excerpt):
A system, comprising: a data buffer circuit; and a plurality of neural engine circuits, each neural engine circuit of the plurality of neural engine circuits

Patent claim 13 (excerpt):
An integrated circuit (IC) system comprising a neural processor circuit, the neural processor circuit comprising: a data reader circuit and neural engines separate from the data reader circuit and the data buffer
Note 1: Examiner notes that FIG. 7 of patent US 11,783,174 B2 shows rasterizer 714 receiving vector input from the Neural Task Manager 310, which, in its broadest sense, functions as a buffer circuit.
Claims 9-15 of the claimed invention are neural processor circuit claims corresponding to method claims 2-8, respectively; apart from the difference in statutory category, they are not patentably distinct. The additional limitations of claim 9 are presented in the table and are not patentably distinct from limitations of claim 1 of the reference patent. Claims 16-21 of the claimed invention are system claims corresponding to method claims 2-7, respectively; apart from the difference in statutory category, they are not patentably distinct. The additional elements of claim 16 are presented in the table and are not patentably distinct from limitations of claim 13 of the reference patent.
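For orientation only, the dataflow recited by both sets of claims (work units sent from a data buffer to per-engine input buffers, portions shifted into a multiply-accumulator, and a digital multiply and add performed against a kernel) can be sketched in software. This is a hypothetical illustration, not a characterization of either claim set; all names and sizes are invented for the example.

    from typing import List

    def multiply_accumulate(portion: List[float], kernel: List[float]) -> float:
        """Digital multiply and add of a shifted work-unit portion with a kernel."""
        return sum(x * w for x, w in zip(portion, kernel))

    def run_neural_engine(work_unit: List[float], kernel: List[float]) -> List[float]:
        """Buffer the work unit, then shift kernel-sized portions into the
        multiply-accumulator on successive (simulated) control signals."""
        input_buffer = list(work_unit)           # per-engine input buffer circuit
        width = len(kernel)
        return [multiply_accumulate(input_buffer[i:i + width], kernel)
                for i in range(len(input_buffer) - width + 1)]

    # The data buffer circuit sends one work unit (a segment of the input data)
    # to each neural engine circuit.
    input_data = [float(i) for i in range(16)]
    work_units = [input_data[:8], input_data[8:]]    # segments of the input data
    kernel = [0.5, -1.0, 0.5]
    results = [run_neural_engine(wu, kernel) for wu in work_units]
    print(results)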
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2-21 are rejected under 35 U.S.C. § 103 as being unpatentable over Martin et al. (US 2022/0043886 A1, Hardware Implementation of Convolutional Layer of Deep Neural Network, herein Martin) and Young (US 9,842,293 B2, Batch Processing in a Neural Network Processor, herein Young).
Regarding claim 2,
Martin teaches a method, implemented by a neural processor circuit (Martin, FIG. 3, and, abstract, line 1 “Hardware implementations of, and methods for processing, a convolution layer of a DNN…” and paragraph [0011], line 1 “The hardware implementation of a convolutional layer of a DNN may be embodied in hardware on an integrated circuit.”
In other words, method is method, and convolution engine implemented on integrated circuit is implemented by a neural processor circuit.), comprising:
sending, work units from a data buffer circuit of the neural processor circuit to a plurality of neural engine circuits of the neural processor circuit (Martin, FIG. 9, and, paragraph [0007], line 1 “Described herein are hardware implementations, and methods for processing, a convolution layer of a DNN that comprise a plurality of convolution engines wherein the input data and weights are provided to the convolution engines in an order…” and, paragraph [0008], line 4 “each convolution engine comprising hardware logic configured to receive in each of a plurality of cycles a set of weights and a set of input data values…” and, paragraph [0024], line 1 “FIG. 9 is a block diagram of an example computer system in which the hardware implementation of the DNN is implemented”
In other words, set of input data values is work units, input buffer is data buffer circuit, input is provided is sending…work units from a data buffer circuit, and, from FIG. 3, plurality of convolution engines implemented on integrated circuit is plurality of neural engine circuits.), wherein
each of the work units corresponds to a segment of input data (Martin, paragraph [0008], line 4 “each convolution engine comprising hardware logic configured to receive in each of a plurality of cycles a set of weights and a set of input data values…” In other words, each set of input data values is each of the work units corresponds to a segment of input data.), and wherein
[each of the plurality of neural engine circuits comprises a respective input buffer circuit];
storing the work units in the respective input buffer circuit of at least one neural engine circuit of the plurality of neural engine circuits (Martin, FIG. 3. See above mapping. In other words, integrated circuit is circuit, convolution engine is neural engine, set of input values is work units, input buffer is input buffer circuit of at least one neural engine, and data sent to input buffer is storing the work units in the input buffer circuit.);
sending, by a control circuit, a control signal to the respective input buffer circuit of the at least one neural engine circuit (Martin, paragraph [0051], line 1 “In some cases, the hardware implementation 300 may also comprise an input buffer controller (not shown) which may be configured to obtain the plurality of input data values related to a particular convolution layer (or fully connected layer) of a DNN from external memory (not shown) via a memory interface (not shown) and store the received input data values in the input buffer 310.” In other words, input buffer controller is control circuit that sends a signal to the respective input buffer circuit.) to
[shift a portion of the work units] to
a multiply-accumulator circuit of the at least one neural engine circuit (Martin, page 10, column 2, claim 8 “The hardware implementation of claim 7, further comprising the plurality of convolution engines, each convolution engine comprising hardware logic configured to receive in each of the plurality of cycles a set of weights and a set of input data values, and perform a multiply accumulate operation on the set of weights and the set of input data values.” In other words, hardware logic configured to …perform a multiply accumulate operation is multiply-accumulator circuit, and each convolution engine comprising…hardware logic to perform multiply accumulate operation is of the at least one neural engine circuit.); and
performing, by the multiply-accumulator circuit, a digital multiply and add operation on the shifted portion of the work units using a kernel (Martin, paragraph [0008], line 1 “first aspect provides a hardware implementation of a convolution layer of a deep neural network, the hardware implementation comprising: a plurality of convolution engines, each convolution engine comprising hardware logic configured to receive in each of a plurality of cycles a set of weights and a set of input data values, and perform a multiply accumulate operation on the set of weights and the set of input data values.” And, paragraph [0030], line 5 “The weights may be grouped to form or define one or more filters or kernels.” In other words, hardware implementation is circuit, perform a multiply accumulate operation is performing, by the multiply-accumulator circuit, a digital multiply and add operation, and kernel is kernel.).
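For illustration only, and not as a characterization of Martin’s actual hardware, the cited per-cycle behavior (each convolution engine receiving a set of weights and a set of input data values each cycle and performing a multiply accumulate operation, Martin [0008]) reduces to the following minimal sketch; all names are hypothetical.

    def convolution_engine(cycles):
        """cycles: iterable of (weights, input_values) pairs, one pair per cycle."""
        accumulator = 0.0
        for weights, values in cycles:
            # multiply accumulate operation on the set of weights and input values
            accumulator += sum(w * v for w, v in zip(weights, values))
        return accumulator

    result = convolution_engine([
        ([1.0, 2.0], [0.5, 0.25]),   # cycle 1
        ([3.0, 4.0], [0.1, 0.2]),    # cycle 2
    ])
    print(result)  # 0.5 + 0.5 + 0.3 + 0.8 = 2.1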
Thus far, Martin does not explicitly teach each of the plurality of neural engine circuits comprises a respective input buffer circuit.
Young teaches each of the plurality of neural engine circuits comprises a respective input buffer circuit (Young, FIG. 3 and FIG. 4.
In other words, from FIG. 3, each cell is a neural engine circuit, and, from FIG. 4, which shows the circuitry of each cell, the activation register is the respective input buffer circuit for each of the plurality of cells.)
Young teaches shift a portion of the work units (Young, column 5, line 23 “For example, over one clock cycle, the activation input at cell 314 can shift to an activation register in cell 316, which is to the right of cell 314. Similarly, the weight input at cell 316 can shift to a weight register at cell 318, which is below cell 314.” Examiner notes the specification of the instant application recites “By changing portions of input data provided to the computation core 416 via shifting, neural engine 314 can perform multiply-accumulate for different portions of input data based on fewer number of read operations.” Therefore, Examiner is interpreting shifting as changing where the allocation of input data goes. In other words, activation input at cell 314 can shift to an activation register in cell 316 is shift a portion of the work units.)
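As a hypothetical software analogy to the cited passage (Young, column 5), in which activation inputs shift to a neighboring cell’s activation register and weight inputs shift to the cell below on each clock cycle, consider this toy grid; the structure and names are invented for illustration.

    def shift_cycle(activations, weights):
        """activations[r][c], weights[r][c]: register contents of a cell grid.
        Returns the register contents after one clock cycle."""
        rows, cols = len(activations), len(activations[0])
        new_act = [[None] * cols for _ in range(rows)]
        new_wgt = [[None] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if c + 1 < cols:                 # activation shifts right
                    new_act[r][c + 1] = activations[r][c]
                if r + 1 < rows:                 # weight shifts down
                    new_wgt[r + 1][c] = weights[r][c]
        return new_act, new_wgt

    acts = [[1, 2], [3, 4]]
    wgts = [[5, 6], [7, 8]]
    acts, wgts = shift_cycle(acts, wgts)
    assert acts[0][1] == 1 and wgts[1][0] == 5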
Both Martin and Young are directed to implementing neural networks on hardware, among other things. Martin teaches a method, implemented by a neural processor circuit comprising sending, work units from a data buffer circuit of the neural processor circuit to a plurality of neural engine circuits of the neural processor circuit, wherein each of the work units corresponds to a segment of input data; but does not explicitly teach each of the plurality of neural engine circuits comprises a respective input buffer circuit or shifting a portion of work units. Young teaches each of the plurality of neural engine circuits comprises a respective input buffer circuit and shifting a portion of work units.
In view of the teachings of Martin and Young, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Young into Martin. This would result in a method implemented by a neural processor circuit comprising sending work units from a data buffer circuit of the neural processor circuit to a plurality of neural engine circuits of the neural processor circuit, wherein each of the work units corresponds to a segment of input data, each of the plurality of neural engine circuits comprises a respective input buffer circuit, and the work units can be shifted.
One of ordinary skill in the art would have been motivated to do so in order to process neural network inputs more efficiently and thereby improve execution speed. (Young, column 1, line 38 “In general, this specification describes a special-purpose hardware circuit that computes neural network inferences. In general, one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of generating a respective neural network output for each of a plurality of inputs, wherein the generating comprises processing each input through each of a plurality of neural network layers to generate the respective neural network output for the input…”)
Regarding claim 3,
The combination of Martin and Young teaches the method of claim 2, wherein
the segment of the input data has a first size (Martin, paragraph [0048], line 6 “For example, the software tool may provide the input buffer 310 with the dimensions of the input data (e.g. x×y×P of FIG. 2), the convolution window size (e.g. n×m×P in FIG. 2) and the step sizes (s and t in FIG. 2).” In other words, dimensions of the input data is the segment of the input data has a first size.)
Regarding claim 4,
The combination of Martin and Young teaches the method of claim 3, further comprising:
sending, to the plurality of neural engine circuits, second work units corresponding to a second segment of the input data from the data buffer circuit, wherein the second segment of the input data has a second size (Martin, paragraph [0048], line 6 “For example, the software tool may provide the input buffer 310 with the dimensions of the input data (e.g. x×y×P of FIG. 2), the convolution window size (e.g. n×m×P in FIG. 2) and the step sizes (s and t in FIG. 2).” In other words, provide the input buffer is sending…second work units, and dimensions of the input data is the second segment of the input data has a second size.)
Regarding claim 5,
The combination of Martin and Young teaches the method of claim 2, wherein shifting the portion of the work units comprises:
shifting the portion of the work units based on the control signal and task information, wherein the task information indicates a manner in which the input data is segmented into the work units (Young, see mapping of claim 2, and column 6, line 19 “In some implementations, the cell also includes a control register. The control register can store a control signal that determines whether the cell should shift either the weight input or the activation input to adjacent cells.” In other words, shift either the weight input or the activation input is shifting a portion of the work units, and control signal that determines whether the cell should shift is the control signal and task information indicating a manner in which the input data is segmented.)
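A toy illustration of the cited control register (Young, column 6), which determines whether a cell shifts its weight input or its activation input; the encoding and names are hypothetical.

    # Hypothetical control encoding; for illustration only.
    SHIFT_WEIGHT, SHIFT_ACTIVATION = 0, 1

    def cell_output(control: int, weight_reg: int, activation_reg: int) -> int:
        """Return the register value the cell forwards, per the control signal."""
        return weight_reg if control == SHIFT_WEIGHT else activation_reg

    assert cell_output(SHIFT_ACTIVATION, weight_reg=7, activation_reg=3) == 3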
Regarding claim 6,
The combination of Martin and Young teaches the method of claim 5, wherein the task information further comprises:
a dimension of the input data (Martin, paragraph [0031], line 2 “As can be seen in FIG. 2, the data 200 used in a DNN may be arranged as P planes of data, where each plane has a dimension X x Y.” In other words, the data arranged as P planes is the input data, and dimension X x Y is a dimension of the input data.)
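As a worked example of the cited arrangement (Martin, paragraph [0031]), input data can be organized as P planes, each of dimension X x Y; the concrete sizes below are illustrative only, with NumPy used purely to show the shape.

    import numpy as np

    X, Y, P = 8, 6, 3                  # a dimension of the input data
    input_data = np.zeros((P, Y, X))   # P planes, each of dimension X x Y
    print(input_data.shape)            # (3, 6, 8)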
Regarding claim 7,
The combination of Martin and Young teaches the method of claim 2, further comprising:
generating a processed digital value based on the digital multiply and add operation on the shifted portion of the work units using the kernel (Martin, FIG. 4, and, paragraph [0030], line 5 “The weights may be grouped to form or define one or more filters or kernels.”
In other words, w is weights that define a kernel, and FIG. 4 shows generating a digital value based on a digital multiply and add operation on the portion of work units using the kernel.); and
storing the processed digital value in an accumulator of the at least one neural engine circuit of the plurality of neural engine circuits (Martin, FIG. 3. In other words, from FIG. 3, accumulator is accumulator, and FIG. 3 shows storing the processed digital value in an accumulator of at least one of the plurality of neural engine circuits.).
Regarding claim 8,
The combination of Martin and Young teaches the method of claim 7, further comprising:
performing, by the multiply-accumulator circuit, a second digital multiply and add operation on the processed digital value (Martin, paragraph [0007], line 4 “…input data and weights are provided to the convolution engines in an order that allows input data and weights read from memory to be used in at least two filter-window calculations performed either by the same convolution engine in successive cycles or by different convolution engines in the same cycle…” In other words, input data and weights read from memory to be used… in at least two…calculations performed either by the same convolution engine in successive cycles or by different convolution engines in the same cycle is performing… a second digital multiply and add operation on the processed digital value.)
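As a hypothetical sketch of the cited reuse (Martin, paragraph [0007]): values read once from memory participate in at least two filter-window calculations, i.e., a second multiply and add is performed on the running processed value. Names and numbers are invented for illustration.

    def two_window_accumulate(values, weights_a, weights_b):
        """First multiply-add produces a processed value; a second multiply-add
        is then performed on that processed value with the next filter window."""
        processed = sum(v * w for v, w in zip(values, weights_a))   # first multiply and add
        processed += sum(v * w for v, w in zip(values, weights_b))  # second multiply and add on the processed value
        return processed

    print(two_window_accumulate([1.0, 2.0], [0.5, 0.5], [1.5, -0.5]))  # 2.0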
Claim 9 is a neural processor circuit claim corresponding to method claim 2. Otherwise, they are the same. Claim 9 recites the additional elements of “a neural processor circuit, comprising a plurality of neural engine circuits, each comprising an input buffer circuit, a data buffer circuit.” The combination of Martin and Young teaches this. (Martin, FIG. 3, FIG. 9, and, paragraph [0007], line 1, and, paragraph [0008], line 4. And, Young, FIG. 3, and FIG. 4. See mapping of claim 2.) Therefore, claim 9 is rejected for the same reasons as claim 2.
Claims 10-15 are neural processor circuit claims corresponding to method claims 3-8, respectively. Otherwise, they are the same. Therefore, claims 10-15 are rejected for the same reasons as claims 3-8, respectively.
Claims 16-21 are system claims corresponding to method claims 2-7, respectively. Otherwise, they are the same. Claim 16 recites the additional elements of a system comprising a data buffer circuit and a plurality of neural engine circuits. The combination of Martin and Young teaches this. (Martin, FIG. 3, FIG. 9, and, paragraph [0007], line 1, and, paragraph [0008], line 4. FIG. 9 discloses a system. Also, see mapping of claim 2.) Therefore, claims 16-21 are rejected for the same reasons as claims 2-7, respectively.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached at 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/B.I.R./Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124