Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
1. Claims 1-6, 10-11, and 15-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The independent claims recite performing a mathematical algorithm that eliminates a multiplier element from the process. This judicial exception is not integrated into a practical application. The mere recitation of an SRAM device fails to render the claim statutory, as the additional element of using a processing device to perform the mathematical operation amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
2. Claims 1, 5-10, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng (US 11,822,617) in view of Rosing et al (US 2022/0019441, herein Rosing).
Regarding claim 1, Zheng teaches a Random-Access Memory (RAM) device configured to improve in-RAM processing in deep neural network (DNN) systems by eliminating one or more digital to analog converters (DACs) (4:47-55, RAM PIM device, 5:4-12, PIM blocks to eliminate ADCs/DACs), the RAM device comprising:
a deep neural network (DNN) operator that eliminates processes in a correlation of a weight (w) and an input (x) (5:4-24, eliminate neural network processes by utilizing PIM blocks in compute mode & 10:15-35, neural network operations on weights and inputs).
Zheng fails to teach wherein the RAM device is a static random-access memory (SRAM) device or wherein the DNN operator explicitly eliminates multiplication processes.
Rosing teaches a Static Random-Access Memory (SRAM) device ([0383], SRAM) wherein a deep neural network (DNN) operator eliminates multiplication processes ([0332-0334], [0348], elimination of multiplication steps of neural network & correlated datasets).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Zheng and Rosing to utilize SRAM in the PIM device and to specifically eliminate multiplication operations in the neural network operator. While Zheng discloses many types of RAM being potential options for implementing the processing-in-memory blocks, Zheng does not explicitly name SRAM as one of them. However, SRAM is a routine and conventional form of RAM and thus implementing the neural network within an SRAM PIM block would merely be a simple design choice as disclosed by Rosing. Additionally, while Zheng states that the PIM blocks may be used to eliminate dot product and convolution operations, which one of ordinary skill in the art would understand both contain multiplication operations, Zheng does not explicitly name multiplication as the operation being eliminated as Rosing does. However, as both disclose the use of processing-in-memory blocks for performing deep neural network operations, the combination would merely entail a simple substitution of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.
Regarding claim 5, the combination of Zheng and Rosing teaches the SRAM device according to claim 1, further comprising an analog to digital converter (ADC) that obviates the need for a dedicated ADC primitive (Rosing [0223], [0501], ADCs to digitize results in PIM blocks).
Regarding claim 6, the combination of Zheng and Rosing teaches the SRAM device according to claim 1, configured to both store DNN weights and locally process mixed DNN layers to reduce traffic between a processor and memory units (Zheng Fig 4, 5:25-35, 8:25-55, storing data in PIM device to perform neural network operation on weights and other inputs).
Regarding claim 7, the combination of Zheng and Rosing teaches the SRAM device according to claim 1, defined by an array of cells, wherein each cell only performs a 1-bit logic operation, and a plurality of outputs are integrated over time for multibit operations (Zheng 8:48-51, performing operations on single bit inputs then repeating for multibit operation).
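For clarity of the record only, the following is a brief illustrative sketch in the examiner's own notation (Python) of the bit-serial scheme Zheng is cited for at 8:48-51, as the examiner understands it: each cell resolves only a single 1-bit logic operation per cycle, and the per-bit partial sums are integrated over successive cycles with binary weights to recover a multibit result. The function name and vector values are hypothetical and form no part of the cited disclosure.

```python
# Illustrative only; not part of the cited disclosures. Bit-serial accumulation:
# every per-cycle operation on a cell is a single 1-bit AND, and the partial sums
# are integrated over successive cycles with binary weights to recover the
# multibit dot product.

def bit_serial_dot(weights, inputs, n_bits):
    """Dot product of unsigned integer vectors using only 1-bit ANDs per cycle."""
    total = 0
    for k in range(n_bits):              # weight bit plane (LSB first)
        for j in range(n_bits):          # input bit plane (LSB first)
            # each "cell" contributes one AND of a single weight bit and a single input bit
            ands = sum(((w >> k) & 1) & ((x >> j) & 1) for w, x in zip(weights, inputs))
            total += ands << (k + j)     # integrate the partial sum with its binary weight
    return total

# Hypothetical example: weights [3, 1], inputs [2, 5] -> 3*2 + 1*5 = 11
print(bit_serial_dot([3, 1], [2, 5], n_bits=3))  # 11
```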
Regarding claim 8, the combination of Zheng and Rosing teaches the SRAM device according to claim 1, further comprising a charge/current representation of the operands to reduce the computation to charge/current summation over a wire, to eliminate the need for dedicated modules and operation cycles for product summations (Rosing [0225], [0227], accumulated currents representing result values, [0286], tracking bitline currents & Zheng 4:62-64, tracking bit line currents to form output result).
Regarding claim 9, the combination of Zheng and Rosing teaches the SRAM device according to claim 1, wherein the array is configured to map one or more DNNs with one or more weight matrices in the order of megabytes (Zheng 5:56-58, 11:55-59, mapping input matrices, 10:15-35, weight matrix mapped to multiple sets of PIM blocks when size is greater than 256x256, Rosing [0430], [0506], matrix sizes of 1024x1024).
Regarding claim 10, the combination of Zheng and Rosing teaches the SRAM device according to claim 1, configured for single-ended processing (Rosing [0513], transmitting execution voltage on one input bitline and ground on second input bitline).
Regarding claim 15, Zheng teaches a process performed by a Random-Access Memory (RAM) device, the process configured to improve processing in deep neural network (DNN) systems (4:47-55, RAM PIM device), the process including instructions for performing by the RAM the steps of:
eliminating one or more digital to analog converters (5:4-12, PIM blocks to eliminate ADCs/DACs); and
multiplying a one-bit element sign(x) against a full precision weight (w), and a one-bit sign(w) against an input (x) to avoid direct multiplication between full precision variables while performing the step of processing at least one of binary DNN layers and mixed DNN layers (6:40-42, 6:50-60 matrix multiplication, 8:57-60, single bit sign values & 10:27-32, high precision weight values as inputs, 8:43-51, binary precision DNN layer).
Zheng fails to teach wherein the RAM device is a static random-access memory (SRAM) device.
Rosing teaches a Static Random-Access Memory (SRAM) device ([0383], SRAM) wherein a deep neural network (DNN) operator eliminates multiplication processes ([0332-0334], [0348], elimination of multiplication steps of neural network & correlated datasets).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Zheng and Rosing to utilize SRAM in the PIM device and to specifically eliminate multiplication operations in the neural network operator. While Zheng discloses many types of RAM being potential options for implementing the processing-in-memory blocks, Zheng does not explicitly name SRAM as one of them. However, SRAM is a routine and conventional form of RAM and thus implementing the neural network within an SRAM PIM block would merely be a simple design choice as disclosed by Rosing. Additionally, while Zheng states that the PIM blocks may be used to eliminate dot product and convolution operations, which one of ordinary skill in the art would understand both contain multiplication operations, Zheng does not explicitly name multiplication as the operation being eliminated as Rosing does. However, as both disclose the use of processing-in-memory blocks for performing deep neural network operations, the combination would merely entail a simple substitution of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.
Regarding claim 16, the combination of Zheng and Rosing teaches the process according to claim 15, further comprising the step of processing within a single product port of SRAM cells, thus reducing dynamic energy of the system (Zheng 9:41-47, single output for a column of PIM blocks computing kernel operations, 4:32-36, improving energy efficiency through PIM usage).
Regarding claim 17, the combination of Zheng and Rosing teaches the process according to claim 15, wherein the process is configured for single-ended processing (Rosing [0513], transmitting execution voltage on one input bitline and ground on second input bitline).
3. Claims 2 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng and Rosing in view of Afrasiyabi et al (“Non-Euclidean Vector Product for Neural Networks”, herein Afrasiyabi).
Regarding claim 2, the combination of Zheng and Rosing teaches the SRAM device according to claim 1. Zheng and Rosing fail to teach wherein the DNN operator is w ⨁ x = ∑_i sign(x_i) ⋅ abs(w_i) + sign(w_i) ⋅ abs(x_i), wherein ⋅ is an element-wise multiplication operator, + is an element-wise addition operator, ∑ is a vector sum operator, the sign() operator is ±1, and the abs() operator produces an absolute unsigned value of the operand w or the operand x.
Afrasiyabi teaches a device wherein a DNN operator is w ⨁ x = ∑_i sign(x_i) ⋅ abs(w_i) + sign(w_i) ⋅ abs(x_i), wherein ⋅ is an element-wise multiplication operator, + is an element-wise addition operator, ∑ is a vector sum operator, the sign() operator is ±1, and the abs() operator produces an absolute unsigned value of the operand w or the operand x (Abstract, §2, vector product defined by summation of the signs of two inputs with their absolute values).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Zheng and Rosing with those of Afrasiyabi to utilize the modified vector product operation as another means to eliminate multiplication operations in the neural network operator. While Zheng states that the PIM blocks may be used to eliminate dot product and convolution operations, which one of ordinary skill in the art would understand both contain multiplication operations, Zheng teaches the dot product as being performed using an AND gate and a sign bit of each element and handling negative values in the final accumulation stage (Zheng 8:44-62). Utilizing the vector product operation taught by Afrasiyabi would eliminate the need for the final subtraction operation by using absolute values in the dot product operation, thus improving the efficiency of the processing device. As all three references disclose techniques for increasing energy efficiency and removing certain operations from the typical order of neural network operations, the combination would merely entail a simple substitution of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.
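For clarity of the record only, the following is a short illustrative sketch in the examiner's own notation (Python; the function name and numeric values are hypothetical and form no part of the cited disclosures) of the operator recited in claim 2: because sign() evaluates to ±1, each recited "multiplication" reduces to a conditional negation of an absolute value, so no full-precision multiplier is required.

```python
# Illustrative only; not part of the cited disclosures. The operator of claim 2,
# w (+) x = sum_i [ sign(x_i)*abs(w_i) + sign(w_i)*abs(x_i) ], computed without any
# full-precision multiplication: because sign() is +/-1, each product reduces to a
# conditional negation of an absolute value.

def multiplier_free_correlation(w, x):
    """Correlate weight vector w with input vector x using only sign flips and adds."""
    total = 0.0
    for wi, xi in zip(w, x):
        term_w = abs(wi) if xi >= 0 else -abs(wi)   # sign(x_i) * abs(w_i)
        term_x = abs(xi) if wi >= 0 else -abs(xi)   # sign(w_i) * abs(x_i)
        total += term_w + term_x
    return total

# Hypothetical example: w = [0.5, -1.25], x = [2.0, 3.0]
# term-wise: (0.5 + 2.0) + (1.25 - 3.0) = 0.75
print(multiplier_free_correlation([0.5, -1.25], [2.0, 3.0]))  # 0.75
```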
Regarding claim 3, the combination of Zheng and Rosing teaches the SRAM device according to claim 1, wherein the DNN operator performs the steps of multiplying one-bit sign(x) against higher precision w, and one-bit sign(w) against higher precision x (Zheng 6:40-42, 6:50-60 matrix multiplication, 8:57-60, single bit sign values & 10:27-32, high precision weight values as inputs).
Zheng fails to teach wherein the sign values are multiplied by absolute values of the inputs.
Afrasiyabi teaches a device wherein a DNN operator performs the steps of multiplying one-bit sign(x) against abs(w), and one-bit sign(w) against abs(x) (Abstract, §2, vector product defined by summation of the signs of two inputs with their absolute values).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Zheng and Rosing with those of Afrasiyabi to utilize the modified vector product operation as another means to eliminate multiplication operations in the neural network operator. While Zheng states that the PIM blocks may be used to eliminate dot product and convolution operations, which one of ordinary skill in the art would understand both contain multiplication operations, Zheng teaches the dot product as being performed using an AND gate and a sign bit of each element and handling negative values in the final accumulation stage (Zheng 8:44-62). Utilizing the vector product operation taught by Afrasiyabi would eliminate the need for the final subtraction operation by using absolute values in the dot product operation, thus improving the efficiency of the processing device. As all three references disclose techniques for increasing energy efficiency and removing certain operations from the typical order of neural network operations, the combination would merely entail a simple substitution of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.
4. Claims 11 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng and Rosing in view of Nurvitadhi et al (US 11,210,760, herein Nurvitadhi).
Regarding claim 11, the combination of Zheng and Rosing teaches the SRAM device of claim 1 configured to facilitate summing of weight input products (Zheng 8:51-62, accumulation of products).
Zheng and Rosing fail to teach wherein the summing occurs in the time domain and the frequency domain.
Nurvitadhi teaches a device configured to facilitate the summation of time-domain and frequency domain products (37:14-36, matrix multiplication routine of neural network utilizing both time and frequency domains).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Zheng and Rosing with those of Nurvitadhi to utilize both the time and frequency domains for performing convolution operations in the neural network. While Zheng and Rosing do not explicitly disclose the use of transforms to convert between the time and frequency domains, Zheng does disclose converting digital inputs into analog signals and back (Zheng 4:55-58), and all three references disclose performing convolution operations (Zheng 5:35-40, Rosing [0533], Nurvitadhi 37:14-36). As Zheng describes the PIM blocks as being reconfigurable to optimize these types of operations, the combination would merely entail a simple substitution of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.
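For clarity of the record only, the following illustrative sketch (Python with NumPy; the sample values are hypothetical and form no part of the cited disclosures) shows the time-/frequency-domain duality underlying the reasoning for claims 11 and 18: a convolution may be accumulated directly as a time-domain summation of weight-input products, or computed as a pointwise product after a transform into the frequency domain, with the two results agreeing.

```python
# Illustrative only; not part of the cited disclosures. The same convolution can be
# accumulated directly as a time-domain summation of products or computed in the
# frequency domain as a pointwise product after a transform; the results agree.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical input samples
w = np.array([0.5, -1.0, 0.25])      # hypothetical kernel weights

time_domain = np.convolve(x, w)      # direct summation of weight-input products

n = len(x) + len(w) - 1              # zero-pad to the full linear-convolution length
freq_domain = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(w, n), n)

print(np.allclose(time_domain, freq_domain))  # True
```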
Claim 18 recites a process embodiment corresponding to the device of claim 11. Therefore, the above rejection of claim 11 applies equally to claim 18.
5. Claims 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng in view of He et al (US 2023/0186979, herein He).
Regarding claim 19, Zheng teaches a random-access memory (RAM) (4:50) comprising:
a first array half (5:25-55, array of PIM blocks split according to processing mode); and
a second array half, wherein bit lines in the first array half compute a weight-input correlation and bit lines in the second array half digitize a correlation output (5:25-55, array of PIM blocks split into sets, 5:19-24, 9:21-57, 10:66-11:11, kernel weights input to CNN layer to calculate correlation to inputs, PIM array set used for computation, 4:55-56, PIM block includes analog to digital converter).
Zheng fails to teach wherein the RAM is a static random-access memory (SRAM) or the second array half processes a binary search to digitize the correlation output.
He teaches an SRAM device configured to process a binary search to digitize a correlation output ([0028], SRAM cell, [0023], SRAM cell generates MAC operation results for neural network weights and inputs, [0029-0032], quantization of results to a digital representation by SAR ADC, [0030], binary search).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Zheng and He to utilize a binary search for quantizing the analog values into a digital representation. While Zheng does not disclose the implementation details of the ADC used to quantize the neural network outputs, one of ordinary skill in the art would understand that the binary search taught by He is a conventional means of performing analog to digital conversion. As both Zheng and He disclose using RAM cells to perform processing-in-memory operations in a neural network, the combination would merely entail a simple substitution of known prior art elements to achieve predictable results.
Regarding claim 20, the combination of Zheng and He teaches the SRAM according to claim 19, wherein the binary search is a successive approximation-based analog-to-digital converter (He [0030], SAR ADC).
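For clarity of the record only, the following illustrative sketch (Python; the function name and values are hypothetical and form no part of the cited disclosures) outlines the successive-approximation binary search He is cited for at [0030]: starting from the most significant bit, each trial bit is provisionally set, the corresponding reference level is compared against the analog input, and the bit is retained only if that level does not exceed the input.

```python
# Illustrative only; not part of the cited disclosures. Successive-approximation
# (binary-search) quantization: starting from the MSB, each trial bit is set, the
# corresponding reference level is compared with the analog input, and the bit is
# kept only if that level does not exceed the input.

def sar_quantize(v_in, v_ref, n_bits):
    """Return the n_bit code approximating v_in on the range [0, v_ref)."""
    code = 0
    for bit in range(n_bits - 1, -1, -1):        # MSB first
        trial = code | (1 << bit)                # tentatively set this bit
        v_trial = v_ref * trial / (1 << n_bits)  # level the trial code represents
        if v_trial <= v_in:                      # comparator decision
            code = trial                         # keep the bit
    return code

# Hypothetical example: 4-bit conversion of 0.63 against a 1.0 reference -> 1010b
print(sar_quantize(0.63, 1.0, 4))  # 10
```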
6. Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Zheng and Rosing in view of He.
Regarding claim 12, the combination of Zheng and Rosing teaches the SRAM device according to claim 7, wherein the array comprises: a first array half (Zheng 5:25-55, array of PIM blocks split according to processing mode); and a second array half, wherein bit lines in the first array half compute a weight-input correlation and bit lines in the second array half digitize a correlation output (Zheng 5:25-55, array of PIM blocks split into sets, 5:19-24, 9:21-57, 10:66-11:11, kernel weights input to CNN layer to calculate correlation to inputs, PIM array set used for computation, 4:55-56, PIM block includes analog to digital converter).
Zheng and Rosing fail to teach wherein the second array half processes a binary search to digitize the correlation output.
He teaches an SRAM device configured to process a binary search to digitize a correlation output ([0028], SRAM cell, [0023], SRAM cell generates MAC operation results for neural network weights and inputs, [0029-0032], quantization of results to a digital representation by SAR ADC, [0030], binary search).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Zheng and Rosing with those of He to utilize a binary search for quantizing the analog values into a digital representation. While Zheng does not disclose the implementation details of the ADC used to quantize the neural network outputs, one of ordinary skill in the art would understand that the binary search taught by He is a conventional means of performing analog to digital conversion. As both Zheng and He disclose using RAM cells to perform processing-in-memory operations in a neural network, the combination would merely entail a simple substitution of known prior art elements to achieve predictable results.
Allowable Subject Matter
7. Claims 13 and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Rasch (US 12,169,534) discloses an SRAM processing-in-memory device including analog to digital converters.
Far (US 11,615,256) discloses a compute-in-memory neural network processor using digital to analog converters.
Jia (US 2023/0074229) discloses a compute-in-memory array using digital to analog converters.
Sun (US 2022/0398037) discloses a deep neural network processor operating on absolute values and using digital to analog converters.
Yudanov (US 2021/0397932) discloses a processor that performs quantization using a successive approximation analog to digital converter.
Lee (US 2020/0218962) discloses a neural network processor that calculates weight and input correlations and a binary search for quantization.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J METZGER whose telephone number is (571)272-3105. The examiner can normally be reached Monday-Friday 8:30-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL J METZGER/ Primary Examiner, Art Unit 2183