Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
In the instant application, Application No. 17/836,720, the amendment filed on 11/26/2025 is acknowledged. Claims 46, 62 and 69 have been amended, and claims 1-45 have been canceled. Claims 46-75 are pending.
In responding to this Office action, the Examiner respectfully requests that support be shown for language added to any original claim by amendment and for any new claims. That is, indicate support for newly added claim language by specifically pointing to the page(s) and line number(s) in the specification and/or drawing figure(s). This will assist the Examiner in prosecuting this application.
The Examiner cites particular columns and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.
REJECTIONS BASED ON PRIOR ART
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 46-47, 51-53, 62-63 and 67-68 are rejected under 35 U.S.C. 103 as being unpatentable over Yudanov (US 2021/0294746) in view of Tucek et al. (US 2012/0233381).
46. A memory controller of a shared memory system that is shared among a plurality of access agents, wherein the shared memory system is arranged into a set of shared resources (SRs), the memory controller comprising: [Yudanov teaches “a plurality of memory modules (e.g., see memory modules 102a and 102b) that can at least in part implement a global shared context 101… each of the memory modules of the system 100 has a plurality of physical memory partitions (e.g., see physical memory partitions 104a, 104b, 104c, 104d, 104e and 104f)…” (par. 0018) and processors 106a, 106b, 106c and 106d (par. 0018; figs. 1-4 and related text)]
input/output (I/O) circuitry arranged to: receive, from an individual access agent of the plurality of access agents, an access address for a memory transaction, [Yudanov teaches “at least one controller (e.g., a CPU or special-purpose controller), and at least one interface device configured to communicate input and output data for the memory module” (par. 0043) “interface devices… 208a and 208b… can be configured to communicate input and output data, including data related to the global shared context…” (par. 0047; fig. 2 and related text) (see figs. 1 and 3-4 and related text) where “In FIG. 6, the method 600 begins at step 602 with executing code and accessing physical memory of a system of memory modules based on virtual memory addresses of the system associated with memory accesses coded in code of computer programs, by a process of memory module of the system” (par. 0083)]
wherein the access address is assigned to at least one shared resource (SR) in the set of SRs, and access data stored in the at least one SR using the SR address; and [Yudanov teaches “The memory modules of the system can provide address space shared between the modules and applications running on the modules and/or coupled processors. The address space sharing can be achieved by having logical addresses global to the modules, and each logical address can be associated with a certain physical address of a specific module” (par. 0011) “An application running on an embodiment of the memory module system can have its own virtual address space. Association between virtual spaces of various applications and logical address space can be implemented through page tables… Such tables can provide virtual to logical addressing and can further associate a physical address at each module.” (par. 0012)]
control circuitry connected to the I/O circuitry, the control circuitry is arranged to translate the access address into the SR address based on a staggering parameter, wherein the staggering parameter is based on a number of bytes by which individual SR addresses of the set of SRs are staggered in the shared memory system [Yudanov teaches “a predetermined schema and/or algorithm of how virtual addresses map to logical and/or physical addresses, each virtual memory address of the memory module system can include a first sequence of bits. And, each shared memory address of the system can include a second sequence of bits. The mapping of the virtual memory addresses of the system to the shared memory addresses of the system can be based at least partly on mapping the first sequences of bits to the second sequences of bits and the second sequences of bits to the first sequences of bits. In such examples, the first sequence of bits of a virtual memory address of the system is at least in part offset with a second sequence of bits of a shared memory address of the system.” (par. 0031) where “[0032] The arrangement of the first and second sequences of bits can be offset from each other or a formula containing an offset can be used. Thus, it is possible to map an address range of some shared app or module shared among many applications. For example, an address range shared app or module can be fixed in the global shared context but it can be different in virtual space of apps that are using it via sharing. The difference is in the offset or a formula containing the offset. For example, if in global shared context, the shared module is mapped to the address range 30-40 and two applications, using it in their virtual address space, can have it mapped via offset: for the first application offset+100 (130-140) and for the second+1000 (1030-1040). In such an example, applications using the global shared context can map any range to their available virtual address space range by a simple offset or a formula containing an offset.” (see par. 0033)].
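Examiner's note: as an illustrative aid only, and not as part of Yudanov's disclosure, the offset-based mapping quoted above may be sketched as follows (a minimal Python sketch; the function names are the Examiner's own, and the numeric values are taken from the example of par. 0032):

    # Sketch of Yudanov's offset-based mapping (pars. 0031-0033): a range fixed
    # in the global shared context is mapped into each application's virtual
    # address space by a per-application offset (or a formula containing one).

    def to_virtual(global_addr: int, app_offset: int) -> int:
        """Map a global shared-context address into an application's virtual space."""
        return global_addr + app_offset

    def to_global(virtual_addr: int, app_offset: int) -> int:
        """Invert the mapping back to the global shared context."""
        return virtual_addr - app_offset

    # Per par. 0032: the shared module occupies 30-40 globally; the first
    # application maps it via offset +100, the second via offset +1000.
    assert (to_virtual(30, 100), to_virtual(40, 100)) == (130, 140)
    assert (to_virtual(30, 1000), to_virtual(40, 1000)) == (1030, 1040)
    assert to_global(130, 100) == 30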
Yudanov does not expressly disclose “wherein: the control circuitry is to generate the SR address based upon at least one swizzling operation that involves portions of the access address; and at least one of the portions of the access address is determined, at least in part, based upon the staggering parameter”; however, regarding these limitations, Tucek teaches [“[0035] FIG. 7 is a flowchart illustrating an exemplary wear leveling operation 300 applied to the memory system 1 of FIG. 1. In FIG. 7, the operation 300 begins with an operation to write data 30 to an area of the memory 10; the data 30 are accompanied by a logical address. To prevent excessive wear, or memory "hot spots," a wear leveling technique that involves translation down to the line level is applied.” “[0020] The line portion 24 of the address 20 is translated using the logical block address 22 to lookup a line remap key 33 in a line remap table 35. The line portion 24 of the address 20 and the key 33 are passed to function (F) 34, which produces a translated physical line address 25. Using this line address translation allows each block 12 of the memory 10 to use a different remap key for function F, and allows the behavior of the remap function F to be changed…. [0021] The subline portion 26 of the address 20 may be passed through without translation. In many memory types, updates can only be done at a line granularity, but reads may be done at a finer granularity; i.e., at the subline level.” “[0023] Use of a bit-swizzling function as the function 34 is illustrated in FIG. 2. Swizzling is a process in which an input is shifted, rotated, or mixed. In the context of the memory 10 of FIG. 1, the address bits are shifted according to a specified pattern to produce an output different from the input. In FIG. 2, bit-swizzling function 50 includes a number of swizzling elements 52 that either swap or do not swap locations. As shown networked together, the elements 52 can swizzle the incoming bits into any permutation. In FIG. 2, the function 50 can swizzle four bits into any permutation. The swizzling function 50 takes in six bits (either a 0 or a 1) as a remap control key to control the execution of the elements 52. FIG. 2 illustrates one specific remapping based on values of the six remap control key bits.” where “[0033] In FIG. 6, assume that the logical address is a 10-bit address consisting of the first three bits as a logical block address, the next three bits as a logical line address, and the final four bits as an offset or byte address, and for example is 0011111000 (178). The logical block address 22 bits can take any value from zero to seven. The block remap table 132 provides that a value of zero maps to one, one maps to four, two maps to three, and so on. The output of the translation process from the block remap table 132 is a physical block address 23. The physical block address is applied to the line key lookup table 135 to acquire a key by which the line address portion of the logical address is to be adjusted. The line remap key performs line remapping within a block of the same block size that the block remapping is performed. Assume that the function implemented by the line key lookup table is an addition function (as noted above, multiplication, swizzling, and other functions also are possible), and that the key is one. If the line address is seven, addition of one to this address results in a line remapping from seven to zero.
That is, the line remap key of one is applied at the line remap table 133 to the logical line address 24 to yield a physical line address 25. Assuming further that the offset was 8, the logical address translation from 178 yields 408, or 1000001000. Thus, the physical block address 23, the physical line address 25, and the subline, or offset address 26 are combined to form physical address 28 (which corresponds to the claimed SR address)”; thus teaching that a portion of the physical address is determined based on swizzling function 34, which uses line address portion 24 of the logical address, and that a portion corresponding to the sub-line address is determined based on the offset, i.e., the staggering parameter]
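Examiner's note: for clarity of the record, the worked example of Tucek par. 0033 may be reproduced as the following sketch (an illustration by the Examiner, not Tucek's own code; the block remap entries beyond those quoted are unspecified by Tucek, and the addition function of the quoted example is used in place of the bit-swizzling alternative of par. 0023):

    # Sketch of Tucek's translation (pars. 0020-0021, 0033): a 10-bit logical
    # address split as 3-bit block | 3-bit line | 4-bit offset. The "178"/"408"
    # notation in the quote concatenates the three field values as digits.

    BLOCK_REMAP = {0: 1, 1: 4, 2: 3}  # par. 0033: "zero maps to one, one maps to four, two maps to three"
    LINE_REMAP_KEY = 1                # key obtained from the line key lookup table 135

    def translate(logical: int) -> int:
        block = (logical >> 7) & 0b111   # logical block address 22 (bits 9-7)
        line = (logical >> 4) & 0b111    # logical line address 24 (bits 6-4)
        offset = logical & 0b1111        # subline/offset 26, passed through (par. 0021)
        phys_block = BLOCK_REMAP[block]            # coarse-grain block remap
        phys_line = (line + LINE_REMAP_KEY) % 8    # addition-based line remap
        return (phys_block << 7) | (phys_line << 4) | offset  # physical address 28

    logical = 0b0011111000                     # block=1, line=7, offset=8 -> "178"
    assert translate(logical) == 0b1000001000  # block=4, line=0, offset=8 -> "408"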
Yudanov and Tucek are analogous art because they are from the same field of endeavor of memory access and control.
Before the effective filing date of the claimed inventions, it would have been obvious to a person of ordinary skill in the art to modify Yudanov to include wherein: the control circuitry is to generate the SR address based upon at least one swizzling operation that involves portions of the access address; and at least one of the portions of the access address is determined, at least in part, based upon the staggering parameter as taught by Tucek since doing so would provide the benefits of facilitating address mapping while also performing wear leveling operations in the memory device.
Therefore, it would have been obvious to combine Yudanov and Tucek for the benefit of creating a storage system/method to obtain the invention as specified in claim 46.
47. The memory controller of claim 46, wherein the staggering parameter is an offset by which the individual SR addresses are staggered in the shared memory system [Yudanov teaches offset (see pars. 0031-0033). Tucek teaches “Thus, the physical block address 23, the physical line address 25, and the subline, or offset address 26 are combined to form physical address 28, Note that the block translation portion is a coarse-grain update or translation; the line translation or update is a function-based remapping. Note also that this address translation scheme also allows very fine-grained addressing--to the byte level. However, the offset portion of the address also could be set to zero so that the smallest unit of memory addressed is a line; similarly, by setting the number of bits in the line portion of the address to zero, the smallest memory unit addressed is a block.”].
52. The memory controller of claim 46, wherein the I/O circuitry is arranged to: provide the accessed data to the individual access agent when the access address is received with a request to obtain data from the at least one SR; and cause storage of the received data in the at least one SR when the access address is received with data to be stored in the at least one SR [Yudanov teaches “interface devices… 208a and 208b… can be configured to communicate input and output data, including data related to the global shared context…” (par. 0047; fig. 2 and related text) (see figs. 1 and 3-4 and related text) where “In FIG. 6, the method 600 begins at step 602 with executing code and accessing physical memory of a system of memory modules based on virtual memory addresses of the system associated with memory accesses coded in code of computer programs, by a process of memory module of the system” (par. 0083). Tucek teaches write accesses (fig. 7 and related text); however, note that one of ordinary skill in the art would have found it obvious to also perform read accesses as taught by Yudanov since doing so would facilitate accessing the stored data].
62. One or more non-transitory computer-readable media (NTCRM) comprising instructions, wherein execution of the instructions by a memory controller of a shared memory system is to cause the memory controller to perform operations comprising: receive, from an individual access agent of the plurality of access agents, an access address for a memory transaction, wherein the access address is assigned to at least one shared resource (SR) in a set of SRs; translate the access address into an SR address based on a staggering parameter, wherein the staggering parameter is based on a number of bytes by which individual SR addresses of the set of SRs are staggered in the shared memory system; and access data stored in the at least one SR using the SR address; wherein: the memory controller is to generate the SR address based upon at least one swizzling operation that involves portions of the access address; and at least one of the portions of the access address is determined, at least in part, based upon the staggering parameter [The rationale in the rejection of claim 46 is herein incorporated].
63. The one or more NTCRM of claim 62, wherein the staggering parameter is an offset by which the individual SR addresses are staggered in the shared memory system [The rationale in the rejection of claim 47 is herein incorporated].
68. The one or more NTCRM of claim 62, wherein execution of the instructions is to cause the memory controller to: provide the accessed data to the individual access agent when the access address is received with a request to obtain data from the at least one SR; and cause storage of the received data in the at least one SR when the access address is received with data to be stored in the at least one SR [The rationale in the rejection of claim 52 is herein incorporated].
Claims 48-50 and 64-66 are rejected under 35 U.S.C. 103 as being unpatentable over Yudanov (US 2021/0294746) in view of Tucek et al. (US 2012/0233381) as applied in the rejection of claim 46 above, and further in view of Zhao et al. (US 2018/0052776).
48. The memory controller of claim 46, wherein the access address includes: …and a stagger seed field, wherein the stagger seed field includes a stagger seed value, the stagger seed value is used for the translation [Yudanov teaches “a predetermined schema and/or algorithm of how virtual addresses map to logical and/or physical addresses, each virtual memory address of the memory module system can include a first sequence of bits. And, each shared memory address of the system can include a second sequence of bits. The mapping of the virtual memory addresses of the system to the shared memory addresses of the system can be based at least partly on mapping the first sequences of bits to the second sequences of bits and the second sequences of bits to the first sequences of bits. In such examples, the first sequence of bits of a virtual memory address of the system is at least in part offset with a second sequence of bits of a shared memory address of the system.” (par. 0031) where “[0032] The arrangement of the first and second sequences of bits can be offset from each other or a formula containing an offset can be used. Thus, it is possible to map an address range of some shared app or module shared among many applications. For example, an address range shared app or module can be fixed in the global shared context but it can be different in virtual space of apps that are using it via sharing. The difference is in the offset or a formula containing the offset. For example, if in global shared context, the shared module is mapped to the address range 30-40 and two applications, using it in their virtual address space, can have it mapped via offset: for the first application offset+100 (130-140) and for the second+1000 (1030-1040). In such an example, applications using the global shared context can map any range to their available virtual address space range by a simple offset or a formula containing an offset.” (see par. 0033). Tucek teaches “[0020] The line portion 24 of the address 20 is translated using the logical block address 22 to lookup a line remap key 33 in a line remap table 35. The line portion 24 of the address 20 and the key 33 are passed to function (F) 34, which produces a translated physical line address 25. Using this line address translation allows each block 12 of the memory 10 to use a different remap key for function F, and allows the behavior of the remap function F to be changed…. [0021] The subline portion 26 of the address 20 may be passed through without translation. In many memory types, updates can only be done at a line granularity, but reads may be done at a finer granularity; i.e., at the subline level.” “[0023] Use of a bit-swizzling function as the function 34 is illustrated in FIG. 2. Swizzling is a process in which an input is shifted, rotated, or mixed. In the context of the memory 10 of FIG. 1, the address bits are shifted according to a specified pattern to produce an output different from the input. In FIG. 2, bit-swizzling function 50 includes a number of swizzling elements 52 that either swap or do not swap locations. As shown networked together, the elements 52 can swizzle the incoming bits into any permutation. In FIG. 2, the function 50 can swizzle four bits into any permutation. The swizzling function 50 takes in six bits (either a 0 or a 1) as a remap control key to control the execution of the elements 52. FIG. 
2 illustrates one specific remapping based on values of the six remap control key bits.”] but does not expressly disclose an agent address field, wherein the agent address field includes an agent address value, and the agent address value is a virtual address for the at least one SR in an access agent address space; however, regarding these limitations, Zhao teaches [“[0029] The shared virtual index translation table may be implemented in various forms. Each row of the shared virtual index translation table may be representative of a processing device. The shared virtual index translation table may be filled with a beginning virtual address and an ending virtual address for the allocated segment of the data input for each processing device. The shared virtual index translation table may also be filled with an offset for each processing device provided by the respective processing device. The shared virtual index translation table may be optionally filled with a stride value provided by the kernel for kernel functions that are executed using non-contiguous segments of the input data. The shared virtual index translation table may include multiple rows for a processing device associated with multiple outstanding kernels.” “[0034] In some implementations having a stride value, modifying successive physical addresses to the new base physical address may include the physical address generator adding the new base physical address and a result of the shared virtual index modulo stride value (i.e., new physical address=new base physical address+shared virtual index % stride value).” “[0064] Rather than ignoring the stride value for storing all of the output 412 to the privatized output buffer 404 as in FIG. 4B, successive modified physical addresses may be calculated to eliminate unused space created by the execution of the kernel function for noncontiguous portions of the portion of input data 402b because of the stride value. The modified physical addresses may be calculated by adding the new base physical address 408 to the shared virtual index modulo the stride value (i.e., new physical address=new base physical address+shared virtual index % stride value). Because the stride value may be accounted for in the calculation of the successive new physical addresses 414, the memory space of the memory 410 allocated to accommodate the privatized output buffer 404 may be compacted to a smaller size than when ignoring the stride value as in FIG. 4B.”].
Yudanov, Tucek and Zhao are analogous art because they are from the same field of endeavor of memory access and control.
Before the effective filing date of the claimed inventions, it would have been obvious to a person of ordinary skill in the art to modify Yudanov and Tucek to include an agent address field, wherein the agent address field includes an agent address value, and the agent address value is a virtual address for the at least one SR in an access agent address space as taught by Zhao since doing so would provide the benefits of facilitating memory sharing while privatizing or assigning memory portions to each processor (par. 0003).
Therefore, it would have been obvious to combine Yudanov, Tucek and Zhao for the benefit of creating a storage system/method to obtain the invention as specified in claim 48.
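Examiner's note: as an illustrative aid regarding Zhao's stride-based address calculation (pars. 0034, 0064), the following sketch applies the quoted formula; the base address and stride values are hypothetical and are the Examiner's own:

    # Zhao pars. 0034, 0064: new physical address =
    #     new base physical address + (shared virtual index % stride value)

    def new_physical_address(new_base: int, shared_virtual_index: int, stride: int) -> int:
        return new_base + (shared_virtual_index % stride)

    # Hypothetical base 0x4000 and stride 4: successive indices wrap within the
    # stride, compacting the privatized output buffer as described in par. 0064.
    addresses = [new_physical_address(0x4000, i, 4) for i in range(8)]
    assert addresses == [0x4000, 0x4001, 0x4002, 0x4003,
                         0x4000, 0x4001, 0x4002, 0x4003]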
49. The memory controller of claim 48, wherein the control circuitry is arranged to: perform a bitwise operation on the agent address value using the stagger seed value to obtain the SR address, wherein the bitwise operation includes: a binary shift left operation based on a difference between a number of bits of the agent address field and the staggering parameter, or a binary addition operation to add the stagger seed value to the agent address value; and insert the SR address into the agent address field [Yudanov teaches “[0032] The arrangement of the first and second sequences of bits can be offset from each other or a formula containing an offset can be used. Thus, it is possible to map an address range of some shared app or module shared among many applications. For example, an address range shared app or module can be fixed in the global shared context but it can be different in virtual space of apps that are using it via sharing. The difference is in the offset or a formula containing the offset. For example, if in global shared context, the shared module is mapped to the address range 30-40 and two applications, using it in their virtual address space, can have it mapped via offset: for the first application offset+100 (130-140) and for the second+1000 (1030-1040). In such an example, applications using the global shared context can map any range to their available virtual address space range by a simple offset or a formula containing an offset.” (see par. 0033). Zhao teaches “[0064] Rather than ignoring the stride value for storing all of the output 412 to the privatized output buffer 404 as in FIG. 4B, successive modified physical addresses may be calculated to eliminate unused space created by the execution of the kernel function for noncontiguous portions of the portion of input data 402b because of the stride value. The modified physical addresses may be calculated by adding the new base physical address 408 to the shared virtual index modulo the stride value (i.e., new physical address=new base physical address+shared virtual index % stride value). Because the stride value may be accounted for in the calculation of the successive new physical addresses 414, the memory space of the memory 410 allocated to accommodate the privatized output buffer 404 may be compacted to a smaller size than when ignoring the stride value as in FIG. 4B.” Tucek teaches “[0020] The line portion 24 of the address 20 is translated using the logical block address 22 to lookup a line remap key 33 in a line remap table 35. The line portion 24 of the address 20 and the key 33 are passed to function (F) 34, which produces a translated physical line address 25. Using this line address translation allows each block 12 of the memory 10 to use a different remap key for function F, and allows the behavior of the remap function F to be changed…. [0021] The subline portion 26 of the address 20 may be passed through without translation. In many memory types, updates can only be done at a line granularity, but reads may be done at a finer granularity; i.e., at the subline level.” “[0022]… Possible bijective functions include XOR, addition, multiplication, various cryptographic functions, a bijective look up table, and a keyed bit swizzling function.” “[0023] Use of a bit-swizzling function as the function 34 is illustrated in FIG. 2. Swizzling is a process in which an input is shifted, rotated, or mixed. In the context of the memory 10 of FIG. 
1, the address bits are shifted according to a specified pattern to produce an output different from the input. In FIG. 2, bit-swizzling function 50 includes a number of swizzling elements 52 that either swap or do not swap locations. As shown networked together, the elements 52 can swizzle the incoming bits into any permutation. In FIG. 2, the function 50 can swizzle four bits into any permutation. The swizzling function 50 takes in six bits (either a 0 or a 1) as a remap control key to control the execution of the elements 52. FIG. 2 illustrates one specific remapping based on values of the six remap control key bits.”].
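Examiner's note: the two bitwise alternatives recited in claim 49 (a shift left based on the difference between the agent address field width and the staggering parameter, or a binary addition of the stagger seed value) may be sketched as follows; the field width and all names are hypothetical illustrations by the Examiner, not the disclosure of any cited reference:

    AGENT_ADDR_BITS = 16  # hypothetical width of the agent address field

    def sr_address_by_shift(agent_addr: int, staggering_parameter: int) -> int:
        """Binary shift left based on the difference between the number of
        bits of the agent address field and the staggering parameter."""
        return agent_addr << (AGENT_ADDR_BITS - staggering_parameter)

    def sr_address_by_addition(agent_addr: int, stagger_seed: int) -> int:
        """Binary addition of the stagger seed value to the agent address value."""
        return agent_addr + stagger_seed

    # Examples with hypothetical values: seed 0b101, staggering parameter 3.
    assert sr_address_by_shift(0b101, 3) == 40960      # 5 << 13
    assert sr_address_by_addition(0x0123, 0b101) == 0x0128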
50. The memory controller of claim 49, wherein the staggering parameter is a number of bits of the stagger seed field or a number of bits of the stagger seed value [Yudanov teaches “[0032]… For example, if in global shared context, the shared module is mapped to the address range 30-40 and two applications, using it in their virtual address space, can have it mapped via offset: for the first application offset+100 (130-140) and for the second+1000 (1030-1040). In such an example, applications using the global shared context can map any range to their available virtual address space range by a simple offset or a formula containing an offset.” (see par. 0033). Zhao teaches “[0064] Rather than ignoring the stride value for storing all of the output 412 to the privatized output buffer 404 as in FIG. 4B, successive modified physical addresses may be calculated to eliminate unused space created by the execution of the kernel function for noncontiguous portions of the portion of input data 402b because of the stride value. The modified physical addresses may be calculated by adding the new base physical address 408 to the shared virtual index modulo the stride value (i.e., new physical address=new base physical address+shared virtual index % stride value). Because the stride value may be accounted for in the calculation of the successive new physical addresses 414, the memory space of the memory 410 allocated to accommodate the privatized output buffer 404 may be compacted to a smaller size than when ignoring the stride value as in FIG. 4B.” Tucek teaches “[0023] Use of a bit-swizzling function as the function 34 is illustrated in FIG. 2. Swizzling is a process in which an input is shifted, rotated, or mixed. In the context of the memory 10 of FIG. 1, the address bits are shifted according to a specified pattern to produce an output different from the input. In FIG. 2, bit-swizzling function 50 includes a number of swizzling elements 52 that either swap or do not swap locations. As shown networked together, the elements 52 can swizzle the incoming bits into any permutation. In FIG. 2, the function 50 can swizzle four bits into any permutation. The swizzling function 50 takes in six bits (either a 0 or a 1) as a remap control key to control the execution of the elements 52. FIG. 2 illustrates one specific remapping based on values of the six remap control key bits.” “[0030] Modulo 8 addition is but one example of the function F. As discussed above, a swizzling function also may be used for the remapping. Other remapping functions include XOR, which would be both faster and probably give better wear balancing than addition. For addition (and XOR) the key 33 in the line remap table 35 should be the same number of bits as the line portion 24 of the address 20. Other functions have different qualities, e.g. swizzling function 50 would require a 6 bit key for a 4 bit line address 24.” (see par. 0033)].
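Examiner's note: regarding the relationship between key width and line-portion width quoted from Tucek par. 0030 (for addition and XOR, the key has the same number of bits as the line portion of the address), a minimal sketch with a hypothetical 3-bit line portion and key:

    LINE_BITS = 3  # width of the line portion (3 bits, as in Tucek's par. 0033 example)

    def xor_remap(line_addr: int, key: int) -> int:
        """XOR-based line remap; per Tucek par. 0030, the key must be the
        same number of bits as the line portion of the address."""
        assert key < (1 << LINE_BITS) and line_addr < (1 << LINE_BITS)
        return line_addr ^ key

    assert xor_remap(0b111, 0b001) == 0b110  # hypothetical key of 1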
51. The memory controller of claim 46, wherein data stored in the shared memory system is staggered by: half of a number of SRs in the set of SRs when the staggering parameter is one, a quarter of a number of SRs in the set of SRs when the staggering parameter is two, an eighth of a number of SRs in the set of SRs when the staggering parameter is three, a sixteenth of a number of SRs in the set of SRs when the staggering parameter is four, and a thirty-second of a number of SRs in the set of SRs when the staggering parameter is five [Yudanov teaches “[0032] The arrangement of the first and second sequences of bits can be offset from each other or a formula containing an offset can be used. Thus, it is possible to map an address range of some shared app or module shared among many applications. For example, an address range shared app or module can be fixed in the global shared context but it can be different in virtual space of apps that are using it via sharing. The difference is in the offset or a formula containing the offset. For example, if in global shared context, the shared module is mapped to the address range 30-40 and two applications, using it in their virtual address space, can have it mapped via offset: for the first application offset+100 (130-140) and for the second+1000 (1030-1040). In such an example, applications using the global shared context can map any range to their available virtual address space range by a simple offset or a formula containing an offset. Since virtual address space is flexible then an application can find a free range for mapping. The application compiler or interpreter or a hypervisor can provide semantics for integrating offset-based mapping into the app's framework.” Tucek teaches (pars. 0021-0023; 0030, 0033)] but does not expressly refer to the offset having values such that data is staggered by half of a number of SRs in the set of SRs when the staggering parameter is one, a quarter of a number of SRs in the set of SRs when the staggering parameter is two, an eighth of a number of SRs in the set of SRs when the staggering parameter is three, a sixteenth of a number of SRs in the set of SRs when the staggering parameter is four, and a thirty-second of a number of SRs in the set of SRs when the staggering parameter is five; however, in the manner that Yudanov and Tucek teach offsets of 100 and 1000 or an offset corresponding to the sub-line address, it would have been obvious to one of ordinary skill in the art to modify these values, e.g., to a value of one so as to stagger half of the storage resources, or to any other desired value according to how many storage resources are to be addressed, since doing so would involve a mere modification to the value of the offset, and it has been held that discovering an optimum value of a result effective variable involves only routine skill in the art. In re Boesch, 617 F.2d 272, 205 USPQ 215 (CCPA 1980). Therefore, it would have been obvious to one of ordinary skill in the art to modify the offset to have a value of 1, 2, 3, 4 or 5 since doing so would at least provide flexibility of design and allow for addressing the different storage resources accordingly.
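Examiner's note: the claimed progression in claim 51 follows the relation stagger amount = (number of SRs)/2^p for staggering parameter p, which may be checked as follows (the set size of 32 SRs is borrowed from claim 53 purely for illustration):

    NUM_SRS = 32  # illustrative set size (cf. claim 53)

    def stagger_amount(num_srs: int, p: int) -> int:
        """Number of SRs by which data is staggered: num_srs / 2**p."""
        return num_srs // (2 ** p)

    # p=1 -> half (16), p=2 -> quarter (8), p=3 -> eighth (4),
    # p=4 -> sixteenth (2), p=5 -> thirty-second (1) of the 32 SRs.
    assert [stagger_amount(NUM_SRS, p) for p in range(1, 6)] == [16, 8, 4, 2, 1]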
53. The memory controller of claim 46, wherein the shared memory system has a size of two megabytes, the set of SRs includes 32 SRs, a size of each SR in the set of SRs is 64 kilobytes, and the memory transaction is 16 bytes wide [Yudanov teaches “[0011] The memory modules of the system can provide address space shared between the modules and applications running on the modules and/or coupled processors. The address space sharing can be achieved by having logical addresses global to the modules, and each logical address can be associated with a certain physical address of a specific module. In some embodiments, the logical address space can be the same size as a sum of physical address spaces of the modules in the memory module system” “[0023]… the logical address space can be the same size as a sum of physical address spaces of the modules in the memory module system. For example, if there are eight modules, then association (or mapping) from logical to physical addresses can be achieved by a predetermined first group of 3 bits at a predetermined position of address bits of logical and shared memory addresses associated with virtual memory addresses coded in a code or address space (e.g., 3 bits provides 2{circumflex over ( )}3 numbers or eight numbers—one number for each module of the eight modules). The rest of the logical and shared address bits or a part of them (such as a second group of bits) can be mapped to a specific physical address within each module using a second mapping scheme. The second mapping scheme can be as simple as one-to-one mapping or more complex scheme such as round robin among the banks of each memory device in the module, or interleaved, etc.” where Yudanov depicts the memory modules having a plurality of partitions or modules corresponding to the claimed SRs (see figs. 1-3 and related text). Tucek teaches (pars. 0021-0023; 0030, 0033)] but the combination of Yudanov and Tucek does not expressly disclose a size of two megabytes, the set of SRs including 32 SRs, a size of each SR in the set of SRs being 64 kilobytes, and the memory transaction being 16 bytes wide; however, modifying the system/method of the combination of Yudanov and Tucek to explicitly have a memory size of two megabytes, a set of 32 SRs, a size of 64 kilobytes for each SR, and a 16-byte-wide memory transaction would have involved a mere change in the size of a component. A change in size is generally recognized as being within the level of ordinary skill in the art. In re Rose, 105 USPQ 237 (CCPA 1955). Therefore, it would have been obvious to one of ordinary skill in the art to explicitly configure the combination of Yudanov and Tucek to have a memory size of two megabytes, a set of 32 SRs, a size of 64 kilobytes for each SR, and a 16-byte-wide memory transaction, since doing so would at least provide flexibility of design.
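Examiner's note: the sizes recited in claim 53 are mutually consistent, as the following arithmetic check on the claimed values shows:

    SR_COUNT = 32                # claimed number of SRs
    SR_SIZE = 64 * 1024          # 64 kilobytes per SR
    TRANSACTION_WIDTH = 16       # bytes per memory transaction

    assert SR_COUNT * SR_SIZE == 2 * 1024 * 1024  # 32 x 64 KB = 2 MB total
    assert SR_SIZE % TRANSACTION_WIDTH == 0       # 4096 transactions span one SR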
64. The one or more NTCRM of claim 62, wherein the access address includes: an agent address field, wherein the agent address field includes an agent address value, and the agent address value is a virtual address for the at least one SR in an access agent address space, and a stagger seed field, wherein the stagger seed field includes a stagger seed value, the stagger seed value is used for the translation [The rationale in the rejection of claim 48 is herein incorporated].
65. The one or more NTCRM of claim 64, wherein execution of the instructions is to cause the memory controller to: perform a bitwise operation on the agent address value using the stagger seed value to obtain the SR address, wherein the bitwise operation includes: a binary shift left operation based on a difference between a number of bits of the agent address field and the staggering parameter, or a binary addition operation to add the stagger seed value to the agent address value; and insert the SR address into the agent address field [The rationale in the rejection of claim 49 is herein incorporated].
66. The one or more NTCRM of claim 65, wherein the staggering parameter is a number of bits of the stagger seed field or a number of bits of the stagger seed value [The rationale in the rejection of claim 50 is herein incorporated].
67. The one or more NTCRM of claim 62, wherein data stored in the shared memory system is staggered by: half of a number of SRs in the set of SRs when the staggering parameter is one, a quarter of a number of SRs in the set of SRs when the staggering parameter is two, an eighth of a number of SRs in the set of SRs when the staggering parameter is three, a sixteenth of a number of SRs in the set of SRs when the staggering parameter is four, and a thirty-second of a number of SRs in the set of SRs when the staggering parameter is five [The rationale in the rejection of claim 51 is herein incorporated].
Claims 54-57, 60-61, 69 and 72-75 are rejected under 35 U.S.C. 103 as being unpatentable over Yudanov (US 2021/0294746) in view of Tucek et al. (US 2012/0233381) as applied in the rejection of claim 46 above, and further in view of Han et al. (US 10,922,258).
54. The combination of Yudanov and Tucek teaches the memory controller of claim 46, but does not expressly disclose wherein each access agent of the plurality of access agents is connected to the shared memory system via a set of input delivery unit (IDU) ports and a set of output delivery unit (ODU) ports; however, regarding these limitations, Han teaches [“Each local memory 620 can include full ports (e.g., left two ports of local memory 620 in FIG. 7, which are associated with F.write and F.read ports involving 32 connections having 16 bits—one connection for each processing unit of the processing element) providing processing units 630 parallel access to local memory 620. Full ports can be used for SIMD access to private data, such as weights of a fully connected layer in a neural network. It is appreciated that local data stored in local memory 620 and shared directly with processing units 630 associated with the same local memory is treated as private data. Each local memory 620 can also include narrow ports (e.g., the right two ports of local memory 620 in FIG. 7, which are associated with N.write and N.read ports involving 1 connection having 16 bits) providing processing units 630 narrow access to the memory. Narrow ports can be used for broadcasting or broadcasted shared data. It is appreciated that remote data stored in another local memory and shared with all processing units 630 of all LMs is treated as shared data.” (col. 7, lines 3-22)].
Yudanov, Tucek and Han are analogous art because they are from the same field of endeavor of memory access and control.
Before the effective filing date of the claimed inventions, it would have been obvious to a person of ordinary skill in the art to modify Yudanov and Tucek to have the processors/access agents connected to the shared memory system via a set of input delivery unit (IDU) ports and a set of output delivery unit (ODU) ports as taught by Han since doing so would at least provide the benefits of facilitating connectivity and access to memory and “direct and fast way to share data among the 32 processing units that are associated with the same local memory” (col. 11, lines 30-50).
Therefore, it would have been obvious to combine Yudanov, Tucek and Han for the benefit of creating a storage system/method to obtain the invention as specified in claim 54.
55. The memory controller of claim 54, wherein the set of ODU ports has a first number of ports and the set of IDU ports has a second number of ports, wherein the first number is different than the second number [Han teaches “Each local memory 620 can include full ports (e.g., left two ports of local memory 620 in FIG. 7, which are associated with F.write and F.read ports involving 32 connections having 16 bits—one connection for each processing unit of the processing element) providing processing units 630 parallel access to local memory 620. Full ports can be used for SIMD access to private data, such as weights of a fully connected layer in a neural network. It is appreciated that local data stored in local memory 620 and shared directly with processing units 630 associated with the same local memory is treated as private data. Each local memory 620 can also include narrow ports (e.g., the right two ports of local memory 620 in FIG. 7, which are associated with N.write and N.read ports involving 1 connection having 16 bits) providing processing units 630 narrow access to the memory. Narrow ports can be used for broadcasting or broadcasted shared data. It is appreciated that remote data stored in another local memory and shared with all processing units 630 of all LMs is treated as shared data.” (col. 7, lines 3-22) (see col. 8, line 44 - col. 9, line 16)]; where Han does not expressly refer to the number of ODU and IDU ports being different; however, one of ordinary skill in the art would have found it obvious to reconfigure the memory to have a different number of ODU and IDU ports by either adding more ODU or IDU ports or omitting some of the ODU or IDU ports, since doing so would involve mere duplication or omission of elements (St. Regis Paper Co. v. Bemis Co., 193 USPQ 8; In re Karlson, 136 USPQ 184). Therefore, it would have been obvious to have a different number of ODU and IDU ports since doing so would at least provide the benefits of flexibility of design.
56. The memory controller of claim 55, wherein the memory controller is implemented by an infrastructure processing unit (IPU) configured to support one or more processors connected to the IPU [Yudanov teaches computing devices may include infrastructure computing/processing devices (fig. 4 and related text; see par. 0071) corresponding to the claimed IPU].
57. The memory controller of claim 56, wherein the IPU is part of an X-processing unit (XPU) arrangement, and the XPU arrangement also includes one or more processing elements connected to the IPU [Yudanov teaches “in some embodiments, the memory module can be configured to include a special purpose chip, such as a GPU, an artificial intelligence (AI) accelerator (corresponding to the claimed XPU), and/or a processing-in-memory (PIM) unit.” (par. 0044). Han teaches “While NPU architecture 500 of FIG. 5 incorporates the embodiments of the present disclosure, it is appreciated that the disclosed embodiments can be applied to chips with SIMD architecture for accelerating some applications such as deep learning. Such chips can be, for example, GPU, CPU with vector processing ability, or neural network accelerators for deep learning (corresponding to the claimed XPU).” (col. 5, line 61 - col. 6, line 6)].
60. The memory controller of claim 57, wherein the shared memory system and the plurality of access agents are part of a compute tile [Han teaches “Cores 5024 can include one or more processing elements that each include single instruction, multiple data (SIMD) architecture including one or more processing units configured to perform one or more operations (e.g., multiplication, addition, multiply-accumulate, etc.) on the communicated data under the control of global manager 5022. To perform the operation on the communicated data packets, cores 5024 can include one or more processing elements for processing information in the data packets. Each processing element may comprise any number of processing units. In some embodiments, core 5024 can be considered a tile or the like” (col. 4, lines 39-49)].
61. The memory controller of claim 60, wherein the shared memory system is a Neural Network (NN) Connection Matrix (CMX) memory device [Yudanov teaches “[0068] FIG. 4 illustrates the example networked system 400 that includes at least computing devices 402, 422a, 422b, 422c, and 422d, in accordance with some embodiments of the present disclosure. Also, FIG. 4 illustrates example parts of an example computing device 402 which is part of the networked system 400. And, FIG. 4 shows how such computing devices can be integrated into various machines, apparatuses, and systems, such as IoT devices, mobile devices, communication network devices and apparatuses (e.g., see base station 430), appliances (e.g., see appliance 440), and vehicles (e.g., see vehicle 450). It is to be understood that the parts and devices described with respect to FIG. 4 can use the global shared context to unify such devices and parts and provide acceleration of large-scale applications such as neural networks, big data applications, and machine learning used between the devices and parts.” Han teaches “Deep neural network algorithms involve a large number of matrix calculations” (col. 1, lines 16-17) “FIG. 5 illustrates an exemplary neural network processing unit (NPU) architecture 500, according to embodiments of the disclosure. As shown in FIG. 5, NPU architecture 500 can include an on-chip communication system 502, a host memory 504, a memory controller 506, a direct memory access (DMA) unit 508, a Joint Test Action Group (JTAG)/Test Access End (TAP) controller 510, peripheral interface 512, a bus 514, a global memory 516, and the like. It is appreciated that on-chip communication system 502 can perform algorithmic operations based on communicated data. Moreover, NPU architecture 500 can include a global memory 516 having on-chip memory blocks (e.g., 4 blocks of 8 GB second generation of high bandwidth memory (HBM2)) to serve as main memory.” (col. 4, lines 13-26)].
69. A shared memory system that is shared among a plurality of processing devices, the shared memory system comprising: a plurality of shared resources (SRs) configured to store data in a staggered arrangement according to a staggering parameter, wherein the staggering parameter is based on a number of bytes by which individual share resource (SR) addresses of the plurality of SRs are staggered in the shared memory system; and a memory controller … and the memory controller is to: receive, from an individual processing device of the plurality of processing devices, an access address for a memory transaction, wherein the access address is assigned to at least one SR in the plurality of SRs, translate the access address into an SR address based on the staggering parameter, and access data stored in the at least one SR using the SR address; wherein: the memory controller is to generate the SR address based upon at least one swizzling operation that involves portions of the access address; and at least one of the portions of the access address is determined, at least in part, based upon the staggering parameter [The rationale in the rejection of claim 46 is herein incorporated]; but the combination of Yudanov and Tucek does not expressly disclose the memory controller communicatively coupled with the plurality of processing devices via a set of input delivery unit (IDU) ports and a set of output delivery unit (ODU) ports of each processing device of the plurality of processing devices; however, regarding these limitations, Han teaches [“Memory controller 506 can manage the reading and writing of data to and from a specific memory block (e.g., HBM2) within global memory 516. For example, memory controller 506 can manage read/write data coming from outside chip communication system 502 (e.g., from DMA unit 508 or a DMA unit corresponding with another NPU) or from inside chip communication system 502 (e.g., from a local memory in core 5024 via a 2D mesh controlled by a task manager of global manager 5022). Moreover, while one memory controller is shown in FIG. 5, it is appreciated that more than one memory controller can be provided in NPU architecture 500. For example, there can be one memory controller for each memory block (e.g., HBM2) within global memory 516.” (col. 4, line 57 - col. 5, line 3). “Each local memory 620 can include full ports (e.g., left two ports of local memory 620 in FIG. 7, which are associated with F.write and F.read ports involving 32 connections having 16 bits—one connection for each processing unit of the processing element) providing processing units 630 parallel access to local memory 620. Full ports can be used for SIMD access to private data, such as weights of a fully connected layer in a neural network. It is appreciated that local data stored in local memory 620 and shared directly with processing units 630 associated with the same local memory is treated as private data. Each local memory 620 can also include narrow ports (e.g., the right two ports of local memory 620 in FIG. 7, which are associated with N.write and N.read ports involving 1 connection having 16 bits) providing processing units 630 narrow access to the memory. Narrow ports can be used for broadcasting or broadcasted shared data.” (col. 7, lines 3-22)].