DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in Korea on May 2, 2024. It is noted, however, that applicant has not filed a certified copy of the KR10-2024-0058729 application as required by 37 CFR 1.55.
The examiner notes that the certified copy has been received for KR10-2024-0088951, filed in Korea on July 5, 2024.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on December 17, 2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5, 8-11, and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Fujiwara et al. (US Publication No. 2023/0068645 – “Fujiwara”) in view of Kim et al. (US Publication No. 2025/0335756 – “Kim”).
Regarding claim 1, Fujiwara teaches A memory device comprising: a wordline decoder configured to control a plurality of wordlines and to select a wordline, to which a first turn-on voltage is applied, depending on a weight value to be applied to an activation value; (Fujiwara paragraph [0082], When the transmission gate TG3 is turned ON by appropriate voltages on the read word lines RWL/RWLB, the stored piece of weight data corresponding to the logic state of the node QB is read through the inverter INV3 and the turned ON transmission gate TG3, and is supplied to the read bit line RBLB for use in a CIM operation. A turn-on voltage can be applied depending on stored weight data) a first memory cell array including memory cells respectively connected to wordlines including a first wordline and a second wordline from among the plurality of wordlines; (Fujiwara paragraph [0031], The memory array 112 further comprises a plurality of word lines (also referred to as “address lines”) WL1 to WLr extending along the rows, and a plurality of bit lines (also referred to as “data lines”) BL1 to BLt extending along the columns of the memory cells MC, where r and t are natural numbers) and a shift adder connected to the first memory cell array through a first bitline and a first bitline bar and configured to generate a first initial calculation result by adding a first input received through the first bitline and a second input received through the first bitline bar, (Fujiwara paragraph [0068], The input data IN are supplied to the other input LOC_2 of the logic circuit 416. In some embodiments, the input data IN comprises multiple bits serially supplied over several clock cycles to the logic circuit 416 to be processed together with the weight data W[0]. 
In at least one embodiment, the logic circuit 416 is configured to multiply the series of bits in the input data IN with the weight data W[0], and output the multiplication result to the MAC circuit 417 which is configured to perform further processing such as addition, shift, or the like, to obtain a final result of the CIM operation. The input weights may be manipulated through processing such as addition, which can be input through both a bitline and a bitline bar (i.e., see Fujiwara paragraph [0064], In the example configuration in FIG. 4A, the memory segment 412 is a memory column. The description herein with respect to the configuration and operation of the memory segment 412 being a memory column is applicable to other types of memory segments. The memory segment 412 comprises a plurality of memory cells MC[0] . . . MC[N] correspondingly storing weight data W[0] . . . W[N]. The memory cells MC[0] . . . MC[N] are coupled to a pair of a bit line BL and a complementary bit line BLB. For simplicity, the bit line BL is described herein and the description with respect to the bit line BL is applicable to the complementary bit line BLB) wherein the first bit is stored in a first memory cell connected to the first wordline, and wherein the second bit is stored in a second memory cell connected to the second wordline (Fujiwara paragraph [0069], While the CIM operation is being performed using the latched weight data W[0], the bit line BL is isolated by the register 415 from the logic circuit 416 and MAC circuit 417, and is usable for weight data updating in one or more of the memory cells without affecting the CIM operation, and without being affected by the CIM memory device. For example, one of the memory cells MC[1]˜MC[N] is accessed in a write operation by a pulse 428 on the corresponding word line WL[1]_WL[N]. 
A corresponding new piece of weight data Wn[1]˜Wn[N] is supplied from the weight buffer 414 to the bit line BL and is written, or updated, in the accessed memory cell among the memory cells MC[1]˜MC[N]. The weight data may be stored in a different memory cell corresponding to the bitline/wordline).
Fujiwara does not teach wherein the first memory cell array stores a first activation value including a first bit and a second bit.
However, Kim teaches wherein the first memory cell array stores a first activation value including a first bit and a second bit, (Kim paragraph [0006], In an embodiment of the present disclosure, a deep neural network accelerator may be provided. The deep neural network accelerator may include a memory cell array including memory cells arranged along word lines and bit lines, wherein at least one of the memory cells includes a first transistor programmed such that a threshold voltage thereof is shifted by a shift voltage corresponding to a weight value; a row driver configured to apply a word line voltage corresponding to an input activation value to the word lines corresponding to the first transistor; and a column driver configured to measure a voltage drop caused by memory cells corresponding to a first bit line among the bit lines, wherein a gate-source voltage of the first transistor is a voltage of a sub-threshold region. Memory cells may store activation values corresponding to a plurality of bits for voltage activation).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Fujiwara with those of Kim. Kim teaches using activation values corresponding to bit values for determination of an activation/turn-on voltage, which allows for more accurate voltage outputs for efficient data storage and processing (i.e., see Kim paragraph [0013], With this approach, the processing to be performed is for example broken down into a series of smaller processing actions, each applied to a different bit range of the input data to be processed. This for example allows different bit ranges to be processed separately, and transferred independently to the second storage device, e.g. at different times. This for example allows a smaller storage to be used as the first storage device and can reduce the bandwidth for transfer of data to the second storage device by transferring the data in portions rather than all at once. In this way, the storage footprint for performing the accumulation can be reduced. Furthermore, the first and second bit ranges of the second storage device can be accessed independently, e.g. by being read from or written to at different times from each other. This can further reduce bandwidth for interacting with the second storage device).
Regarding claim 2, Fujiwara in view of Kim teaches The memory device of claim 1, wherein the first memory cell array further includes memory cells respectively connected to a third wordline and a fourth wordline from among the plurality of wordlines and further stores a second activation value including a third bit and a fourth bit, wherein the third bit is stored in a third memory cell connected to the third wordline, and wherein the fourth bit is stored in a fourth memory cell connected to the fourth wordline (Fujiwara paragraph [0069], While the CIM operation is being performed using the latched weight data W[0], the bit line BL is isolated by the register 415 from the logic circuit 416 and MAC circuit 417, and is usable for weight data updating in one or more of the memory cells without affecting the CIM operation, and without being affected by the CIM memory device. For example, one of the memory cells MC[1]˜MC[N] is accessed in a write operation by a pulse 428 on the corresponding word line WL[1]_WL[N]. A corresponding new piece of weight data Wn[1]˜Wn[N] is supplied from the weight buffer 414 to the bit line BL and is written, or updated, in the accessed memory cell among the memory cells MC[1]˜MC[N]. The bits may be written in n memory cells corresponding to n word lines).
Regarding claim 3, Fujiwara in view of Kim teaches The memory device of claim 1, wherein the wordline decoder is further configured to: control a plurality of wordline bars; and select a wordline bar to which a second turn-on voltage is applied, depending on a weight value, and wherein the plurality of wordline bars include wordline bars respectively connected to the plurality of memory cells included in the first memory cell array (Fujiwara paragraphs [0041-0042], The word line driver 122 is coupled to the memory array 112 via the word lines WL. The word line driver 122 is configured to decode a row address of the memory cell MC selected to be accessed in a read operation or a write operation. The word line driver 122 is configured to supply a voltage to the selected word line WL corresponding to the decoded row address, and a different voltage to the other, unselected word lines WL. The bit line driver 124 is coupled to the memory array 112 via the bit lines BL. The bit line driver 124 is configured to decode a column address of the memory cell MC selected to be accessed in a read operation or a write operation. The bit line driver 124 is configured to supply a voltage to the selected bit line BL corresponding to the decoded column address, and a different voltage to the other, unselected bit lines BL. Each of the different wordlines and bitlines may correspond to its own respective turn-on voltage through weight values).
Regarding claim 4, Fujiwara in view of Kim teaches The memory device of claim 3, further comprising: a shift adder controller configured to provide the shift adder with sign information of the first input and sign information of the second input (Fujiwara paragraph [0099], The memory macro 630 further comprises an adder tree 650 comprising a plurality of adders arranged in multiple stages to accumulate the multiplication results output from the 64 logic circuits or multipliers. In the example configuration in FIG. 6A, the adder tree 650 includes six stages, including adders 647 in the first stage, adders 648 in the second stage, and an adder 649 in the sixth stage. In some embodiments, the adder tree 650 and the 64 logic circuits together form a MAC circuit 651. The result output from the final adder 649 comprises a 10-bit word, in one or more embodiments. In the example configuration in FIG. 6A, the memory macro 630 further comprises a bit shifter and accumulator 652 configured to further perform bit shifting and/or accumulation in response to control signals SIGN and ACM_EN. In some embodiments, the bit shifter and accumulator 652 is simplified or omitted, or has a different configuration. The shift adder tree may utilize a bit from each input (i.e., a sign of the input) to perform the operation).
Regarding claim 5, Fujiwara in view of Kim teaches The memory device of claim 3, wherein the wordline decoder is configured to: apply the first turn-on voltage to one wordline among the plurality of wordlines such that data of a memory cell connected to the wordline to which the first turn-on voltage is applied are transferred to the shift adder through the first bitline; (Fujiwara paragraph [0082], When the transmission gate TG3 is turned ON by appropriate voltages on the read word lines RWL/RWLB, the stored piece of weight data corresponding to the logic state of the node QB is read through the inverter INV3 and the turned ON transmission gate TG3, and is supplied to the read bit line RBLB for use in a CIM operation. A turn-on voltage can be applied depending on stored weight data. Fujiwara paragraph [0099], The memory macro 630 further comprises an adder tree 650 comprising a plurality of adders arranged in multiple stages to accumulate the multiplication results output from the 64 logic circuits or multipliers. In the example configuration in FIG. 6A, the adder tree 650 includes six stages, including adders 647 in the first stage, adders 648 in the second stage, and an adder 649 in the sixth stage. In some embodiments, the adder tree 650 and the 64 logic circuits together form a MAC circuit 651. The result output from the final adder 649 comprises a 10-bit word, in one or more embodiments. In the example configuration in FIG. 6A, the memory macro 630 further comprises a bit shifter and accumulator 652 configured to further perform bit shifting and/or accumulation in response to control signals SIGN and ACM_EN. In some embodiments, the bit shifter and accumulator 652 is simplified or omitted, or has a different configuration. 
The shift adder tree may utilize a bit from each input (i.e., a sign of the input) to perform the operation) and apply the second turn-on voltage to one wordline bar among the plurality of wordline bars such that data of a memory cell connected to the wordline bar to which the second turn-on voltage is applied are transferred to the shift adder through the first bitline bar (Fujiwara paragraph [0099], The memory macro 630 further comprises an adder tree 650 comprising a plurality of adders arranged in multiple stages to accumulate the multiplication results output from the 64 logic circuits or multipliers. In the example configuration in FIG. 6A, the adder tree 650 includes six stages, including adders 647 in the first stage, adders 648 in the second stage, and an adder 649 in the sixth stage. In some embodiments, the adder tree 650 and the 64 logic circuits together form a MAC circuit 651. The result output from the final adder 649 comprises a 10-bit word, in one or more embodiments. In the example configuration in FIG. 6A, the memory macro 630 further comprises a bit shifter and accumulator 652 configured to further perform bit shifting and/or accumulation in response to control signals SIGN and ACM_EN. In some embodiments, the bit shifter and accumulator 652 is simplified or omitted, or has a different configuration. The shift adder tree may utilize a bit from each input (i.e., a sign of the input) to perform the operation).
Regarding claim 8, Fujiwara in view of Kim teaches The memory device of claim 4, wherein the shift adder includes a first shift adder, and wherein the memory device further comprises: a second memory cell array including memory cells respectively connected to a fifth wordline and a sixth wordline among the plurality of wordlines; (Fujiwara paragraph [0069], While the CIM operation is being performed using the latched weight data W[0], the bit line BL is isolated by the register 415 from the logic circuit 416 and MAC circuit 417, and is usable for weight data updating in one or more of the memory cells without affecting the CIM operation, and without being affected by the CIM memory device. For example, one of the memory cells MC[1]˜MC[N] is accessed in a write operation by a pulse 428 on the corresponding word line WL[1]_WL[N]. A corresponding new piece of weight data Wn[1]˜Wn[N] is supplied from the weight buffer 414 to the bit line BL and is written, or updated, in the accessed memory cell among the memory cells MC[1]˜MC[N]. The bits may be written in n memory cells corresponding to n word lines) a second shift adder connected to the second memory cell array through the first bitline and the first bitline bar and to generate a second initial calculation result by adding a third input received through the first bitline and a fourth input received through the first bitline bar; (Fujiwara paragraph [0098], The memory macro 630 further comprises 64 logic circuits which are schematically illustrated in FIG. 6A as NOR gates. Each of the 64 logic circuits is coupled to a corresponding data line among the 64 data lines XINLB, and a corresponding group of four read bit lines RBLB. For example, a logic circuit 646 is coupled to the data line 645 and the four read bit lines RBLB corresponding to the memory segment 612. In at least one embodiment, the logic circuit 646 corresponds to the logic circuit 516 described with respect to FIG. 5A. 
In some embodiments, the 64 logic circuits are 4-bit by 1-bit multipliers, as described herein with respect to FIG. 6B. The values that are input to the plurality of shift adders can be generated through a plurality of bitline bars, from a total of 64) and an adder tree connected to the first shift adder and the second shift adder, and configured to add the first initial calculation result and the second initial calculation result (Fujiwara paragraph [0099], The memory macro 630 further comprises an adder tree 650 comprising a plurality of adders arranged in multiple stages to accumulate the multiplication results output from the 64 logic circuits or multipliers. In the example configuration in FIG. 6A, the adder tree 650 includes six stages, including adders 647 in the first stage, adders 648 in the second stage, and an adder 649 in the sixth stage. In some embodiments, the adder tree 650 and the 64 logic circuits together form a MAC circuit 651. The result output from the final adder 649 comprises a 10-bit word, in one or more embodiments. In the example configuration in FIG. 6A, the memory macro 630 further comprises a bit shifter and accumulator 652 configured to further perform bit shifting and/or accumulation in response to control signals SIGN and ACM_EN. In some embodiments, the bit shifter and accumulator 652 is simplified or omitted, or has a different configuration. An adder tree may be used for multiple calculation results in a series of steps).
Regarding claim 9, Fujiwara in view of Kim teaches The memory device of claim 8, wherein the second memory cell array stores a third activation value including a fifth bit and a sixth bit, wherein the fifth bit is stored in a fifth memory cell connected to the fifth wordline, and wherein the sixth bit is stored in a sixth memory cell connected to the sixth wordline (Fujiwara paragraph [0069], While the CIM operation is being performed using the latched weight data W[0], the bit line BL is isolated by the register 415 from the logic circuit 416 and MAC circuit 417, and is usable for weight data updating in one or more of the memory cells without affecting the CIM operation, and without being affected by the CIM memory device. For example, one of the memory cells MC[1]˜MC[N] is accessed in a write operation by a pulse 428 on the corresponding word line WL[1]_WL[N]. A corresponding new piece of weight data Wn[1]˜Wn[N] is supplied from the weight buffer 414 to the bit line BL and is written, or updated, in the accessed memory cell among the memory cells MC[1]˜MC[N]. The bits may be written in n memory cells corresponding to n word lines).
Regarding claim 10, Fujiwara in view of Kim teaches The memory device of claim 9, wherein the plurality of memory cells are static random access memory (SRAM) cells (Fujiwara paragraph [0031], Various numbers of word lines and/or bit lines in the memory array 112 are within the scope of various embodiments. Example memory types of the memory cells MC include, but are not limited to, static random-access memory (SRAM) ... In one or more example embodiments described herein, the memory cells MC include SRAM memory cells).
Regarding claim 11, Fujiwara in view of Kim teaches The memory device of claim 9, wherein the first memory cell includes: an n-type metal-oxide-semiconductor (NMOS) transistor connected between the first bitline and a first node and including a gate node connected to the first wordline; a PMOS transistor connected between the first bitline bar and a second node and including a gate node connected to a first wordline bar; (Fujiwara Fig. 5A; Fujiwara paragraph [0081], FIG. 5A shows a schematic circuit diagram of an example multi-port memory cell 510, in accordance with some embodiments. In the example configuration in FIG. 5A, the multi-port memory cell 510 comprises inverters INV1, INV2, INV3 and transmission gates TG1, TG2, TG3. The inverters INV1, INV2 are coupled to form a storage circuit ST as described with respect to FIG. 4E. Each of the transmission gates TG1, TG2, TG3 comprises an NMOS transistor and a PMOS transistor. The gates of the PMOS and NMOS transistors in the transmission gate TG1 are correspondingly coupled to a pair of write word lines WWL/WWLB. The gates of the NMOS and PMOS transistors in the transmission gate TG2 are correspondingly coupled to the write word lines WWL/WWLB. The gates of the NMOS and PMOS transistors in the transmission gate TG3 are correspondingly coupled to a pair of read word lines RWL/RWLB. A PMOS and NMOS transistor may be used to calculate values received from wordlines) a first inverter including an input terminal connected to the first node and an output terminal connected to the second node; and a second inverter including an input terminal connected to the second node and an output terminal connected to the first node (Fujiwara paragraph [0077], The memory cell 400E comprises transistors M1, M2, and inverters INV1, INV2. An input of the inverter INV2 is coupled to an output of the inverter INV1 at a node Q. An output of the inverter INV2 is coupled to an input of the inverter INV1 at a node QB.
Gates of the transistors M1, M2 are coupled to a word line WL. The transistor M1 is serially coupled between the node Q and a bit line BL. The transistor M2 is serially coupled between the node QB and a complementary bit line BLB. The inverters INV1, INV2 form a storage circuit ST for storing a datum corresponding to a logic state of the node Q or QB. The transistors M1, M2 are access transistors configured to couple the storage circuit ST to the bit lines BL/BLB for read access or write access, in response to an appropriate voltage applied to the word line WL. In some embodiments, each of the inverters INV1, INV2 comprises two transistors, resulting in a total of 6 transistors in the memory cell 400E. A plurality of inverters may receive data from nodes/storage cells).
Regarding claim 13, Fujiwara teaches An operation method of a memory device which provides a compute-in-memory, the method comprising: (Fujiwara paragraph [0024], Memory devices configured to perform computing-in-memory (CIM) operations (also referred to herein as CIM memory devices) are usable in neural network applications, as well as other applications. A CIM memory device includes a memory array configured to store weight data to be used, together with input data, in one or more CIM operations. After one or more CIM operations, the weight data in the memory array are updated for further CIM operations) determining signs of a first activation value and a second activation value stored in a memory cell array including a plurality of memory cells connected to a plurality of wordlines; (Fujiwara paragraph [0064], In the example configuration in FIG. 4A, the memory segment 412 is a memory column. The description herein with respect to the configuration and operation of the memory segment 412 being a memory column is applicable to other types of memory segments. The memory segment 412 comprises a plurality of memory cells MC[0] . . . MC[N] correspondingly storing weight data W[0] . . . W[N]. The memory cells MC[0] . . . MC[N] are coupled to a pair of a bit line BL and a complementary bit line BLB. For simplicity, the bit line BL is described herein and the description with respect to the bit line BL is applicable to the complementary bit line BLB. The memory cells MC[0] . . . MC[N] are single-port memory cells configured to use the bit line BL in both a read operation and a write operation. The memory cells MC[0] . . . MC[N] are coupled to corresponding word lines WL[0] . . . WL[N] to be accessed via the corresponding word lines in a read operation or a write operation. The bit line BL is coupled to the weight buffer 414 to receive new weight data to be updated in one of the memory cells MC[0] . . . MC[N] in a write operation.
The activation values corresponding to the weight data may be stored in memory cells coupled to wordlines) determining a sign of a first weight to be applied to the first activation value and a sign of a second weight to be applied to the second activation value; (Fujiwara paragraph [0034], Each of the memory cells MC is configured to store a piece of weight data to be used in a CIM operation. In one or more example embodiments described herein, the memory cells MC are single-bit memory cells, i.e., each memory cell is configured to store a bit of weight data. This is an example, and multi-bit memory cells, each of which is configured to store more than one bit of weight data, are within the scopes of various embodiments. In some embodiments, a single-bit memory cell is also referred to as a bitcell. For example, the memory cell 113 coupled to the word line WL1 and the bit line BLt is configured to store a piece W1,t of the weight data. A combination of multiple pieces of weight data stored in multiple memory cells constitutes a weight value to be used in a CIM operation. For simplicity, a piece of weight data stored in a memory cell MC, multiple pieces of weight data stored in multiple memory cells MC, or all pieces of weight data stored in all memory cells MC of the memory array 112 are referred to herein as weight data) and adding a first calculation result obtained by applying the first weight to the first activation value and a second calculation result obtained by applying the second weight to the second activation value for each digit, (Fujiwara paragraph [0099], The memory macro 630 further comprises an adder tree 650 comprising a plurality of adders arranged in multiple stages to accumulate the multiplication results output from the 64 logic circuits or multipliers. In the example configuration in FIG. 6A, the adder tree 650 includes six stages, including adders 647 in the first stage, adders 648 in the second stage, and an adder 649 in the sixth stage. 
In some embodiments, the adder tree 650 and the 64 logic circuits together form a MAC circuit 651. The result output from the final adder 649 comprises a 10-bit word, in one or more embodiments. In the example configuration in FIG. 6A, the memory macro 630 further comprises a bit shifter and accumulator 652 configured to further perform bit shifting and/or accumulation in response to control signals SIGN and ACM_EN. In some embodiments, the bit shifter and accumulator 652 is simplified or omitted, or has a different configuration. The result of the CIM operation is based on the weight data stored in the memory array 602. A shift adder may obtain a sum calculated from bits based on weight data) wherein the first activation value is stored in first memory cells respectively connected to wordlines including a first wordline and a second wordline from among the plurality of wordlines, (Fujiwara paragraph [0041], The word line driver 122 is coupled to the memory array 112 via the word lines WL. The word line driver 122 is configured to decode a row address of the memory cell MC selected to be accessed in a read operation or a write operation. The word line driver 122 is configured to supply a voltage to the selected word line WL corresponding to the decoded row address, and a different voltage to the other, unselected word lines WL. The activation value (i.e., turn-on voltage) may be stored for each wordline).
Fujiwara does not teach wherein the first weight is applied to the first activation value, based on selecting a wordline, to which a first turn-on voltage is applied, from among the plurality of wordlines.
However, Kim teaches wherein the first weight is applied to the first activation value, based on selecting a wordline, to which a first turn-on voltage is applied, from among the plurality of wordlines (Kim paragraph [0006], In an embodiment of the present disclosure, a deep neural network accelerator may be provided. The deep neural network accelerator may include a memory cell array including memory cells arranged along word lines and bit lines, wherein at least one of the memory cells includes a first transistor programmed such that a threshold voltage thereof is shifted by a shift voltage corresponding to a weight value; a row driver configured to apply a word line voltage corresponding to an input activation value to the word lines corresponding to the first transistor; and a column driver configured to measure a voltage drop caused by memory cells corresponding to a first bit line among the bit lines, wherein a gate-source voltage of the first transistor is a voltage of a sub-threshold region. Memory cells may store activation values corresponding to a plurality of bits for voltage activation).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Fujiwara with those of Kim. Kim teaches using activation values corresponding to bit values for determination of an activation/turn-on voltage, which allows for more accurate voltage outputs for efficient data storage and processing (i.e., see Kim paragraph [0013], With this approach, the processing to be performed is for example broken down into a series of smaller processing actions, each applied to a different bit range of the input data to be processed. This for example allows different bit ranges to be processed separately, and transferred independently to the second storage device, e.g. at different times. This for example allows a smaller storage to be used as the first storage device and can reduce the bandwidth for transfer of data to the second storage device by transferring the data in portions rather than all at once. In this way, the storage footprint for performing the accumulation can be reduced. Furthermore, the first and second bit ranges of the second storage device can be accessed independently, e.g. by being read from or written to at different times from each other. This can further reduce bandwidth for interacting with the second storage device).
Regarding claim 14, Fujiwara in view of Kim teaches The method of claim 13, wherein the first activation value includes: a first bit stored in a first memory cell connected to the first wordline; and a second bit stored in a second memory cell connected to the second wordline, and wherein the first memory cells are respectively connected to first wordline bars (Fujiwara paragraph [0069], While the CIM operation is being performed using the latched weight data W[0], the bit line BL is isolated by the register 415 from the logic circuit 416 and MAC circuit 417, and is usable for weight data updating in one or more of the memory cells without affecting the CIM operation, and without being affected by the CIM memory device. For example, one of the memory cells MC[1]˜MC[N] is accessed in a write operation by a pulse 428 on the corresponding word line WL[1]˜WL[N]. A corresponding new piece of weight data Wn[1]˜Wn[N] is supplied from the weight buffer 414 to the bit line BL and is written, or updated, in the accessed memory cell among the memory cells MC[1]˜MC[N]. The bits may be written in n memory cells corresponding to n word lines).
Regarding claim 15, Fujiwara in view of Kim teaches The method of claim 13, wherein the second activation value is stored in second memory cells respectively connected to wordlines including a third wordline and a fourth wordline from among the plurality of wordlines, and wherein the second activation value includes: a third bit stored in a third memory cell connected to the third wordline; and a fourth bit stored in a fourth memory cell connected to the fourth wordline (Fujiwara paragraph [0069], While the CIM operation is being performed using the latched weight data W[0], the bit line BL is isolated by the register 415 from the logic circuit 416 and MAC circuit 417, and is usable for weight data updating in one or more of the memory cells without affecting the CIM operation, and without being affected by the CIM memory device. For example, one of the memory cells MC[1]˜MC[N] is accessed in a write operation by a pulse 428 on the corresponding word line WL[1]˜WL[N]. A corresponding new piece of weight data Wn[1]˜Wn[N] is supplied from the weight buffer 414 to the bit line BL and is written, or updated, in the accessed memory cell among the memory cells MC[1]˜MC[N]. The bits may be written in n memory cells corresponding to n word lines).
Regarding claim 16, Fujiwara in view of Kim teaches The method of claim 14, wherein the first weight is applied to the first activation value, further based on selecting a wordline bar, to which a second turn-on voltage is applied, from among the first wordline bars (Fujiwara paragraphs [0041-0042], The word line driver 122 is coupled to the memory array 112 via the word lines WL. The word line driver 122 is configured to decode a row address of the memory cell MC selected to be accessed in a read operation or a write operation. The word line driver 122 is configured to supply a voltage to the selected word line WL corresponding to the decoded row address, and a different voltage to the other, unselected word lines WL. The bit line driver 124 is coupled to the memory array 112 via the bit lines BL. The bit line driver 124 is configured to decode a column address of the memory cell MC selected to be accessed in a read operation or a write operation. The bit line driver 124 is configured to supply a voltage to the selected bit line BL corresponding to the decoded column address, and a different voltage to the other, unselected bit lines BL. Each of the different wordlines and bitlines may correspond to their own respective turn-on voltage through weight values).
Regarding claim 17, Fujiwara in view of Kim teaches The method of claim 16, wherein a fifth bit corresponding to a first digit of the first calculation result is transferred to a shift adder, (Fujiwara paragraph [0082], When the transmission gate TG3 is turned ON by appropriate voltages on the read word lines RWL/RWLB, the stored piece of weight data corresponding to the logic state of the node QB is read through the inverter INV3 and the turned ON transmission gate TG3, and is supplied to the read bit line RBLB for use in a CIM operation. A turn-on voltage can be applied depending on stored weight data. Fujiwara paragraph [0099], The memory macro 630 further comprises an adder tree 650 comprising a plurality of adders arranged in multiple stages to accumulate the multiplication results output from the 64 logic circuits or multipliers. In the example configuration in FIG. 6A, the adder tree 650 includes six stages, including adders 647 in the first stage, adders 648 in the second stage, and an adder 649 in the sixth stage. In some embodiments, the adder tree 650 and the 64 logic circuits together form a MAC circuit 651. The result output from the final adder 649 comprises a 10-bit word, in one or more embodiments. In the example configuration in FIG. 6A, the memory macro 630 further comprises a bit shifter and accumulator 652 configured to further perform bit shifting and/or accumulation in response to control signals SIGN and ACM_EN. In some embodiments, the bit shifter and accumulator 652 is simplified or omitted, or has a different configuration. 
The shift adder tree may utilize a bit from each input (i.e., a sign of the input) to perform the operation) based on that a first turn-on voltage is applied to a wordline connected to a fifth memory cell storing the fifth bit, wherein a sixth bit corresponding to the first digit of the second calculation result is transferred to the shift adder, based on that a second turn-on voltage is applied to a wordline bar connected to a memory cell storing the sixth bit, (Fujiwara paragraph [0069], While the CIM operation is being performed using the latched weight data W[0], the bit line BL is isolated by the register 415 from the logic circuit 416 and MAC circuit 417, and is usable for weight data updating in one or more of the memory cells without affecting the CIM operation, and without being affected by the CIM memory device. For example, one of the memory cells MC[1]˜MC[N] is accessed in a write operation by a pulse 428 on the corresponding word line WL[1]˜WL[N]. A corresponding new piece of weight data Wn[1]˜Wn[N] is supplied from the weight buffer 414 to the bit line BL and is written, or updated, in the accessed memory cell among the memory cells MC[1]˜MC[N]. The bits may be written in n memory cells corresponding to n word lines) and wherein the shift adder generates a first initial calculation result by adding the fifth bit and the sixth bit (Fujiwara paragraph [0082], When the transmission gate TG3 is turned ON by appropriate voltages on the read word lines RWL/RWLB, the stored piece of weight data corresponding to the logic state of the node QB is read through the inverter INV3 and the turned ON transmission gate TG3, and is supplied to the read bit line RBLB for use in a CIM operation. A turn-on voltage can be applied depending on stored weight data.
Fujiwara paragraph [0099], The memory macro 630 further comprises an adder tree 650 comprising a plurality of adders arranged in multiple stages to accumulate the multiplication results output from the 64 logic circuits or multipliers. In the example configuration in FIG. 6A, the adder tree 650 includes six stages, including adders 647 in the first stage, adders 648 in the second stage, and an adder 649 in the sixth stage. In some embodiments, the adder tree 650 and the 64 logic circuits together form a MAC circuit 651. The result output from the final adder 649 comprises a 10-bit word, in one or more embodiments. In the example configuration in FIG. 6A, the memory macro 630 further comprises a bit shifter and accumulator 652 configured to further perform bit shifting and/or accumulation in response to control signals SIGN and ACM_EN. In some embodiments, the bit shifter and accumulator 652 is simplified or omitted, or has a different configuration. The shift adder tree may utilize a bit from each input (i.e., a sign of the input) to perform the operation).
Regarding claim 18, Fujiwara teaches provide a compute-in-memory, wherein the memory cluster module includes: (Fujiwara paragraph [0024], Memory devices configured to perform computing-in-memory (CIM) operations (also referred to herein as CIM memory devices) are usable in neural network applications, as well as other applications. A CIM memory device includes a memory array configured to store weight data to be used, together with input data, in one or more CIM operations. After one or more CIM operations, the weight data in the memory array are updated for further CIM operations) a memory block configured to store the data including activation values and to perform a weight calculation between the activation values; a weight management block configured to control the memory block such that weights are applied to the activation values; (Fujiwara paragraph [0034], Each of the memory cells MC is configured to store a piece of weight data to be used in a CIM operation. In one or more example embodiments described herein, the memory cells MC are single-bit memory cells, i.e., each memory cell is configured to store a bit of weight data. This is an example, and multi-bit memory cells, each of which is configured to store more than one bit of weight data, are within the scopes of various embodiments. In some embodiments, a single-bit memory cell is also referred to as a bitcell. For example, the memory cell 113 coupled to the word line WL1 and the bit line BLt is configured to store a piece W1,t of the weight data. A combination of multiple pieces of weight data stored in multiple memory cells constitutes a weight value to be used in a CIM operation. For simplicity, a piece of weight data stored in a memory cell MC, multiple pieces of weight data stored in multiple memory cells MC, or all pieces of weight data stored in all memory cells MC of the memory array 112 are referred to herein as weight data. 
The activation values may be calculated based on stored weight data in the memory cells) and a partial-sum accumulation block configured to sum an initial calculation result generated by the memory block to generate an overall calculation result, (Fujiwara paragraph [0099], The memory macro 630 further comprises an adder tree 650 comprising a plurality of adders arranged in multiple stages to accumulate the multiplication results output from the 64 logic circuits or multipliers. In the example configuration in FIG. 6A, the adder tree 650 includes six stages, including adders 647 in the first stage, adders 648 in the second stage, and an adder 649 in the sixth stage. In some embodiments, the adder tree 650 and the 64 logic circuits together form a MAC circuit 651. The result output from the final adder 649 comprises a 10-bit word, in one or more embodiments. In the example configuration in FIG. 6A, the memory macro 630 further comprises a bit shifter and accumulator 652 configured to further perform bit shifting and/or accumulation in response to control signals SIGN and ACM_EN. In some embodiments, the bit shifter and accumulator 652 is simplified or omitted, or has a different configuration. The result of the CIM operation based on the weight data stored in the memory array 602 and the input data XIN is outputted from the memory macro 630 as NOUT. In the example configuration in FIG. 6A, the weight buffer 614 and weight demux 642 are included in the memory macro 630, whereas the write address demux 641, read address demux 643 and input driver 644 are outside the memory macro 630 and are included in a memory controller as described herein. 
The sums calculated based on the operation from the shift adder are utilized in an adder tree, resulting in partial sums being calculated before the final summation) and a memory cell array connected to the plurality of wordlines and configured to store a first activation value including a first bit and a second bit, and wherein the first bit is stored in a first memory cell connected to a first wordline, and the second bit is stored in a second memory cell connected to the second wordline (Fujiwara paragraph [0069], While the CIM operation is being performed using the latched weight data W[0], the bit line BL is isolated by the register 415 from the logic circuit 416 and MAC circuit 417, and is usable for weight data updating in one or more of the memory cells without affecting the CIM operation, and without being affected by the CIM memory device. For example, one of the memory cells MC[1]˜MC[N] is accessed in a write operation by a pulse 428 on the corresponding word line WL[1]˜WL[N]. A corresponding new piece of weight data Wn[1]˜Wn[N] is supplied from the weight buffer 414 to the bit line BL and is written, or updated, in the accessed memory cell among the memory cells MC[1]˜MC[N]. The weight data may be stored in a different memory cell corresponding to the bitline/wordline).
Fujiwara does not teach An accelerator comprising: a processing unit configured to control the accelerator and to perform an operation of the accelerator; and a memory cluster module configured to store data of the accelerator; wherein the memory block includes: a wordline decoder configured to control a plurality of wordlines including a first wordline and a second wordline and to apply the weights to the activation values, based on selecting a wordline to which a first turn-on voltage is applied.
However, Kim teaches An accelerator comprising: a processing unit configured to control the accelerator and to perform an operation of the accelerator; (Kim paragraph [0008], In an embodiment of the present disclosure, an electronic device may be provided. The electronic device may include a deep neural network accelerator configured to perform a matrix computation of a deep neural network; a memory configured to store therein at least partial data of the deep neural network; and a processor configured to control the deep neural network accelerator and the memory. An accelerator may be used to perform in-memory computing, and may include a processor/memory, as described above) and a memory cluster module configured to store data of the accelerator (Kim paragraph [0005-0006], A purpose of the present disclosure is to provide a deep neural network accelerator having a structure of a memory cell array including memory cells, each composed of a transistor having a charge storage layer, and an electronic device including the same. In an embodiment of the present disclosure, a deep neural network accelerator may be provided. The deep neural network accelerator may include a memory cell array including memory cells arranged along word lines and bit lines) wherein the memory block includes: a wordline decoder configured to control a plurality of wordlines including a first wordline and a second wordline and to apply the weights to the activation values, based on selecting a wordline to which a first turn-on voltage is applied; (Kim paragraph [0006], In an embodiment of the present disclosure, a deep neural network accelerator may be provided. 
The deep neural network accelerator may include a memory cell array including memory cells arranged along word lines and bit lines, wherein at least one of the memory cells includes a first transistor programmed such that a threshold voltage thereof is shifted by a shift voltage corresponding to a weight value; a row driver configured to apply a word line voltage corresponding to an input activation value to the word lines corresponding to the first transistor; and a column driver configured to measure a voltage drop caused by memory cells corresponding to a first bit line among the bit lines, wherein a gate-source voltage of the first transistor is a voltage of a sub-threshold region. Memory cells may store activation values corresponding to a plurality of bits for voltage activation).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Fujiwara with those of Kim. Kim teaches using activation values corresponding to bit values for determination of an activation/turn-on voltage, which allows for more accurate voltage outputs for efficient data storage and processing (i.e., see Kim paragraph [0013], With this approach, the processing to be performed is for example broken down into a series of smaller processing actions, each applied to a different bit range of the input data to be processed. This for example allows different bit ranges to be processed separately, and transferred independently to the second storage device, e.g. at different times. This for example allows a smaller storage to be used as the first storage device and can reduce the bandwidth for transfer of data to the second storage device by transferring the data in portions rather than all at once. In this way, the storage footprint for performing the accumulation can be reduced. Furthermore, the first and second bit ranges of the second storage device can be accessed independently, e.g. by being read from or written to at different times from each other. This can further reduce bandwidth for interacting with the second storage device).
Regarding claim 19, Fujiwara in view of Kim teaches The accelerator of claim 18, wherein the memory block further includes: a shift adder connected to a first bitline and a first bitline bar connected to the memory cell array, and configured to sum a third bit received through the first bitline and a fourth bit received through the first bitline bar (Fujiwara paragraph [0102], The logic circuit 646 comprises four NOR gates 670-673. First inputs of the NOR gates 670-673 are commonly coupled to the data line 645 to receive the sequence of four bits of input data. Second inputs of the NOR gates 670-673 are correspondingly coupled to the read bit lines RBLB [0]˜[3] coupled to the corresponding four memory columns in the memory segment 612. Four bits of weight data read from a selected row of the memory segment 612 are applied to the second inputs of the NOR gates 670-673. The values corresponding to a plurality of bitlines may be connected to a shift adder for calculating a sum, such as in paragraph [0103], In a first clock cycle, the NOR gates 670-673 multiply each bit of the four bits of weight data with a first bit in the sequence of four bits of input data, and corresponding first results are output to the corresponding adder(s) in the first stage of the adder tree 650).
Regarding claim 20, Fujiwara in view of Kim teaches The accelerator of claim 19, wherein the memory cell array is further connected to a plurality of wordline bars, wherein the third bit is read from a memory cell connected to a wordline, to which the first turn-on voltage is applied, from among the plurality of wordlines, (Fujiwara paragraph [0069], While the CIM operation is being performed using the latched weight data W[0], the bit line BL is isolated by the register 415 from the logic circuit 416 and MAC circuit 417, and is usable for weight data updating in one or more of the memory cells without affecting the CIM operation, and without being affected by the CIM memory device. For example, one of the memory cells MC[1]˜MC[N] is accessed in a write operation by a pulse 428 on the corresponding word line WL[1]˜WL[N]. A corresponding new piece of weight data Wn[1]˜Wn[N] is supplied from the weight buffer 414 to the bit line BL and is written, or updated, in the accessed memory cell among the memory cells MC[1]˜MC[N]. The bits may be written in n memory cells corresponding to n word lines) and wherein the fourth bit is connected to a memory cell connected to a wordline bar, to which a second turn-on voltage is applied, from among the plurality of wordline bars (Fujiwara paragraphs [0041-0042], The word line driver 122 is coupled to the memory array 112 via the word lines WL. The word line driver 122 is configured to decode a row address of the memory cell MC selected to be accessed in a read operation or a write operation. The word line driver 122 is configured to supply a voltage to the selected word line WL corresponding to the decoded row address, and a different voltage to the other, unselected word lines WL. The bit line driver 124 is coupled to the memory array 112 via the bit lines BL. The bit line driver 124 is configured to decode a column address of the memory cell MC selected to be accessed in a read operation or a write operation. 
The bit line driver 124 is configured to supply a voltage to the selected bit line BL corresponding to the decoded column address, and a different voltage to the other, unselected bit lines BL. Each of the different wordlines and bitlines may correspond to their own respective turn-on voltage through weight values).
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Fujiwara in view of Kim as applied to claim 9 above, and further in view of Symes et al. (US Publication No. 2024/0248621 – “Symes”).
Regarding claim 12, Fujiwara in view of Kim in further view of Symes teaches The memory device of claim 9, wherein the first activation value and the third activation value are stored in a sign-magnitude form, (see claim 9 Fujiwara above) and wherein the first initial calculation result and the second initial calculation result express a negative number in the form of a 2’s complement (Symes paragraph [0033], The resulting bit output from the first XOR blocks 106a, 106b is provided as an input to a respective second XOR block 108a, 108b along with the output from the respective multiplier block 104a, 104b (which is 12 bits in size) to conditionally take the 1's complement. The bit output by each first XOR block 106a, 106b is also fed into an adder tree 110, along with the output from each of the second XOR blocks 108a, 108b (which is also 12 bits in size), for the adder tree 110 to add the 2's complement of negative multiplier outputs. The adder tree 110 in this example is an 8-input adder tree with 8 carry-ins. A further adder 112 is then used to add the output of the adder tree 110 to a first storage device 114 (labelled in FIG. 1 as L1_A) so as to accumulate a set of operation data for an operation (in this case, a multiplication) with stored data stored within the first storage device 114. In this example, after a predetermined number of operation cycles, the accumulated data within the first storage device 114 is also written to a further first storage device 116 (labelled in FIG. 1 as L1_B). The first storage device 114 and the further first storage device 116 in this example are each accumulators, and may each be referred to as a respective primary accumulator, or a respective L1 accumulator. A 2’s complement negative value may be calculated based on inputs of values to a shift adder).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Fujiwara and Kim with those of Symes. Symes teaches a calculation result expressing a negative number in a 2’s complement form, which may allow for more efficient calculations and minimal storage usage (i.e., see Symes paragraph [0033], The resulting bit output from the first XOR blocks 106a, 106b is provided as an input to a respective second XOR block 108a, 108b along with the output from the respective multiplier block 104a, 104b (which is 12 bits in size) to conditionally take the 1's complement. The bit output by each first XOR block 106a, 106b is also fed into an adder tree 110, along with the output from each of the second XOR blocks 108a, 108b (which is also 12 bits in size), for the adder tree 110 to add the 2's complement of negative multiplier outputs. The adder tree 110 in this example is an 8-input adder tree with 8 carry-ins. A further adder 112 is then used to add the output of the adder tree 110 to a first storage device 114 (labelled in FIG. 1 as L1_A) so as to accumulate a set of operation data for an operation (in this case, a multiplication) with stored data stored within the first storage device 114. In this example, after a predetermined number of operation cycles, the accumulated data within the first storage device 114 is also written to a further first storage device 116 (labelled in FIG. 1 as L1_B). The first storage device 114 and the further first storage device 116 in this example are each accumulators, and may each be referred to as a respective primary accumulator, or a respective L1 accumulator).
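As an illustrative aside on the number representations at issue in claim 12 (this sketch is an editorial illustration of standard fixed-width binary arithmetic; the function name and bit widths are hypothetical and are not drawn from the Fujiwara, Kim, or Symes disclosures), the relationship between a sign-magnitude operand and its 2's complement encoding can be sketched as:

```python
# Editorial illustration only: standard sign-magnitude to 2's complement
# conversion, as generally discussed in connection with negative results.

def sign_magnitude_to_twos_complement(sign_bit: int, magnitude: int, width: int) -> int:
    """Return the unsigned width-bit pattern that encodes the signed value
    given by (sign_bit, magnitude) in 2's complement form."""
    value = -magnitude if sign_bit else magnitude
    # Masking a negative Python int to 'width' bits yields its 2's complement
    # bit pattern (e.g., -5 & 0xFF == 0b11111011).
    return value & ((1 << width) - 1)

# Example: -5 in 8-bit sign-magnitude is (sign=1, magnitude=5).
print(bin(sign_magnitude_to_twos_complement(1, 5, 8)))  # 0b11111011
print(sign_magnitude_to_twos_complement(0, 5, 8))       # 5
```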
Allowable Subject Matter
Claims 6-7 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: Dependent claim 6 has been indicated as containing allowable subject matter. The claim contains the following limitations: “The memory device of claim 4, wherein the shift adder includes: a first multiplexer including a first input terminal connected to the first bitline and a second input terminal receiving logic “0”, and configured to select a first output value in response to the sign information of the first input; a NOT gate including an input connected to the first bitline bar; a second multiplexer including a first input terminal connected to an output terminal of the NOT gate and a second input terminal receiving logic “0”, and configured to select a second output value in response to the sign information of the second input; a third multiplexer including a first input terminal connected to the first bitline and a second input terminal receiving logic “0”, and configured to select a third output value in response to an inverse signal of shift signal; and a fourth multiplexer including a first input terminal connected to the first bitline and a second input terminal receiving the first output value of the first multiplexer, and configured to select a fourth output value in response to the shift signal, and wherein the shift adder generates the first initial calculation result, based on the first output value, the second output value, the third output value, and the fourth output value.”
The specific process described in dependent claim 6 further limits dependent claim 4, which teaches the use of sign values to determine the weights of a given bitline and wordline based on voltage activation values. Claim 6 further teaches the use of sign information to determine inputs to the bitline and bitline bars for a plurality of input terminals to a multiplexer, as well as the specific outputs determined based on predetermined logic operations and logic states, including the use of an inverted shift signal. These limitations are not taught in the technological field, and claim 6 is accordingly objected to for depending upon a rejected base claim. Dependent claim 7 depends from dependent claim 6 and is objected to for similar rationale.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Yoo et al. (US Publication No. 2023/0195420) teaches a computing-in-memory system wherein a bitline bar can store logic values which can be used to perform various logic operations, such as “AND” or “NOR” operations (i.e., see Yoo paragraph [0068], FIG. 5 is a Computing-in-Memory (CIM) truth table according to an embodiment of the present invention. Referring to FIG. 5, it can be seen that a local bitline value 53 is determined through an AND operation performed on a bit 51 stored in the memory cell 13 and a bit 52 precharged in the local bitline and that a local bitline bar value 54 is determined through a NOR operation performed on the bit 51 stored in the memory cell 13 and the bit 52 precharged in the local bitline).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONAH C KRIEGER whose telephone number is (571)272-3627. The examiner can normally be reached Monday - Friday 8 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Rocio Del Mar Perez-Velez, can be reached at (571) 270-5935. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.C.K./Examiner, Art Unit 2133
/ROCIO DEL MAR PEREZ-VELEZ/Supervisory Patent Examiner, Art Unit 2133