DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 5, 6, 8-12, and 14-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Fang et al., "NetDAM: Network Direct Attached Memory with Programmable In-Memory Computing ISA."
Consider Claim 1,
FANG teaches a system for performing distributed reduction operations using near-memory computation, the system comprising:
a first near-memory compute node (FANG, e.g., Fig 6, shows plural first nodes.); and
a second near-memory compute node coupled to the first near-memory compute node (FANG, e.g., Fig 6, shows plural second nodes coupled to a first node.), wherein
the first near-memory compute node comprises a processor, memory, and a processing-in-memory (PIM) execution unit (FANG, e.g., Fig 2, shows nodes may include a CPU or an accelerator; §3.1; Fig 1, NetDAM includes memory and ALU (PIM) units.) configured to:
receive one or more memory access requests addressed to a memory address (FANG, e.g., §3.1, describes executing an instruction resulting in memory access. This requires receipt of an instruction which requests memory access. Figs 7 and 8 show accesses to local memory which requires them to be addressed to a memory address.); and
responsive to the memory address being within a memory address range, trigger the PIM execution unit to perform a reduce-scatter operation (FANG, e.g., §3.1, NetDAM logic functions to store first data (i.e., A1) in SRAM, add second data (i.e., B1), and store the result; §2.5, each device has its own memory address space and therefore accesses to local memory must be responsive to the memory address being within the local memory address range.) comprising operations to:
store first data loaded from the second near-memory compute node (FANG, e.g., §3.1, describes fetching first data (e.g., A1) from a second node and storing on a first node.);
perform a reduction operation on second data and the first data to compute a result (FANG, e.g., §3.1, describes adding second data (e.g., B1) to the first data to compute a SUM. A sum is a reduction operation that reduces two values to one.); and
store the result within the first near-memory compute node (FANG, e.g., §3.1, store the sum result in SRAM of the receiving node (i.e., the first node); Fig 8, SRAM is node internal.).
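For illustration of the mapping above, the ring reduce-scatter pattern the rejection reads onto §3.1 of FANG (load a partial sum from the neighboring node, add the local chunk, and store the result) can be sketched as follows. The function name, node count, and chunk layout are illustrative assumptions, not drawn from FANG.

```python
def ring_reduce_scatter(node_data):
    # node_data[i] is node i's full vector, pre-split into one chunk per node.
    n = len(node_data)                       # number of nodes (and of chunks)
    chunks = [list(v) for v in node_data]    # per-node working copies
    for step in range(n - 1):
        # Capture this step's outgoing chunks before any updates.
        sends = [chunks[i][(i - step) % n] for i in range(n)]
        for i in range(n):
            recv = sends[(i - 1) % n]        # partial sum from the previous node
            tgt = (i - 1 - step) % n         # chunk index being reduced locally
            # The "store first data, add second data, store result" pattern.
            chunks[i][tgt] = [a + b for a, b in zip(chunks[i][tgt], recv)]
    # After n - 1 steps, node i holds the fully reduced chunk (i + 1) % n.
    return chunks
```

With three nodes each holding three one-element chunks, node 0 ends with the full sum of chunk 1, node 1 with the full sum of chunk 2, and node 2 with the full sum of chunk 0.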
Consider Claim 5,
FANG further teaches wherein each of the one or more memory access requests addressed to the memory address bypasses a cache hierarchy of a host processor that issues the memory access request (FANG, e.g., Figs 7 and 8, illustrate the data movement through the node without going through a cache hierarchy of a host processor. This is considered analogous to the claimed bypass.).
Consider Claim 6,
FANG further teaches wherein the triggering of the reduce-scatter operation is responsive to one or more of the memory access requests including an indication of a memory request type (FANG, e.g., §2.2; Fig 3, operation is triggered based on an instruction which identifies a memory access type.).
Consider Claim 8,
FANG further teaches wherein performing the reduction operation on the second data and the first data includes performing an add, multiply, MIN, MAX, AND, OR, or XOR operation on the first data and the second data to compute the result (FANG, e.g., §3.1; Fig 6, describes summing/adding first data and second data to compute the result.).
Consider Claim 9,
FANG further teaches a system wherein storing the result within the first near- memory compute node includes executing a PIM store command within the first near-memory compute node (FANG, e.g., §3.1 ¶3, execute instruction to add B1 and store the sum result in SRAM. The examiner notes that the broadest reasonable interpretation of a “PIM store command” includes commands used to store PIM results.).
Consider Claim 10,
FANG further teaches wherein the first and second near-memory compute nodes are coupled to a plurality of other near-memory compute nodes in at least one of a ring topology or a tree topology (FANG, e.g., §3, based on ring topology; Fig 6, coupled to plural other near-memory compute nodes; §2.3, may use tree or torus (i.e., ring) topology.).
Consider Claim 11,
FANG further teaches wherein the reduction operation forms part of an all-reduce operation (FANG, e.g., §3, implement ring all-reduce in NetDAM.).
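As background for this mapping, a ring all-reduce is conventionally composed of a reduce-scatter phase followed by an all-gather phase, so a reduce-scatter reduction such as that of §3.1 naturally forms part of an all-reduce. A minimal sketch of that decomposition follows; the function and data names are illustrative assumptions, not taken from FANG.

```python
def all_reduce(vectors):
    # vectors[i] is node i's vector; for simplicity, one chunk per node.
    n_chunks = len(vectors)
    # Reduce-scatter phase: node i is responsible for summing chunk i.
    reduced = [sum(v[i] for v in vectors) for i in range(n_chunks)]
    # All-gather phase: every node receives every reduced chunk,
    # so each node ends up holding the complete elementwise sum.
    return [list(reduced) for _ in vectors]
```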
Consider Claim 12,
FANG teaches an apparatus for performing distributed reduction operations using near-memory computation, the apparatus comprising:
memory (FANG, e.g., Fig 1, NetDAM includes memory.); and
a first processing-in-memory (PIM) execution unit configured to:
receive a memory access request addressed to a memory address (FANG, e.g., §3.1, describes executing an instruction resulting in memory access. This requires receipt of an instruction which requests memory access. Figs 7 and 8 show accesses to local memory which requires them to be addressed to a memory address.); and
responsive to the memory address being within a memory address range, trigger execution of a combined PIM load and a PIM add command (FANG, e.g., §3.1, NetDAM logic functions to store (i.e., load) first data (i.e., A1) in SRAM, add second data (i.e., B1), and store the result; §2.5, each device has its own memory address space and therefore accesses to local memory must be responsive to the memory address being within the local memory address range.) to:
load first data from a second PIM execution unit (FANG, e.g., §3.1, describes fetching first data (e.g., A1) from a second node (i.e., PIM execution unit) and loading it into a first node.);
perform a reduction operation on second data and the first data to compute a first result (FANG, e.g., §3.1, describes adding second data (e.g., B1) to the first data to compute a SUM. A sum is a reduction operation that reduces two values to one.); and
store the first result within the memory of the first PIM execution unit (FANG, e.g., §3.1, store the sum result in SRAM of the receiving node (i.e., the first node); Fig 8, SRAM is node internal.).
Consider Claim 14,
FANG further teaches wherein the memory access request is addressed to a memory address, and the execution is triggered in response to the memory address being within a memory address range (FANG, e.g., §2.5, global virtual address is translated to a local address. A local address corresponds to a range of addresses within the global address space. Operations on a particular device are triggered when the identified address is within the local memory address range of the particular device.).
Consider Claim 15,
FANG further teaches wherein the execution is triggered in response to the memory access request including an indication of a memory request type (FANG, e.g., §2.2; Fig 3, operation is triggered based on an instruction which identifies a memory access type.).
Consider Claim 16,
FANG further teaches wherein the first data is used as a first operand and the second data is used as a second operand of the reduction operation (FANG, e.g., §3.1, describes summing first data (i.e., A1) and second data (i.e., B1) as part of the reduction operation.).
Consider Claim 17,
FANG further teaches wherein the PIM execution unit is coupled to a plurality of PIM execution units in at least one of a ring topology or a tree topology (FANG, e.g., §3, based on ring topology; Fig 6, coupled to plural other near-memory compute nodes; §2.3, may use tree or torus (i.e., ring) topology.).
Consider Claim 18,
FANG teaches a method for performing distributed reduction operations using near-memory computation, the method comprising:
receiving, by a first near-memory compute node of a plurality of near-memory compute nodes, one or more memory access requests addressed to a memory address (FANG, e.g., §3.1, describes executing an instruction in a first near-memory compute node resulting in memory access. This requires receipt of an instruction which requests memory access. Figs 7 and 8 show accesses to local memory which requires them to be addressed to a memory address.); and
triggering, responsive to the memory address being within a memory address range, a reduce-scatter operation comprising operations (FANG, e.g., §3.1, reduce-scatter operation; §2.5, each device has its own memory address space and therefore accesses to local memory must be responsive to the memory address being within the local memory address range.) including:
storing, by the first near-memory compute node, first data within the first near-memory compute node, the first data being loaded from a second near-memory compute node (FANG, e.g., §3.1, describes fetching first data (e.g., A1) from a second node and storing on a first node.);
performing, by the first near-memory compute node, a reduction operation on second data and the first data to compute a result (FANG, e.g., §3.1, describes adding second data (e.g., B1) to the first data to compute a SUM. A sum is a reduction operation that reduces two values to one.); and
storing, by the first near-memory compute node, the result within the first near-memory compute node (FANG, e.g., §3.1, store the sum result in SRAM of the receiving node (i.e., the first node); Fig 8, SRAM is node internal.).
Consider Claim 19,
FANG further teaches wherein performing the reduction operation on the second data and the first data includes adding, multiplying, minimizing, maximizing, ANDing, or ORing the first data and the second data to compute the first result (FANG, e.g., §3.1; Fig 6, describes summing/adding first data and second data to compute the result.).
Consider Claim 20,
FANG further teaches wherein the reduction operation forms part of an all-reduce operation (FANG, e.g., §3, implement ring all-reduce in NetDAM.).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 3, 4, and 7 are rejected under 35 U.S.C. 103 as being unpatentable over FANG.
Consider Claim 3,
FANG teaches the system of claim 2, and further teaches CPU-based memory access (FANG, e.g., §1.1, a CPU can attach to NetDAM and share the memory pool.), but fails to expressly describe wherein the one or more memory access requests are received from the processor of the first near-memory compute node.
FANG shows nodes which are illustrated to include a NetDAM component and a CPU (FANG, e.g., Fig 2) and further describes inter and intra-host access (FANG, e.g., Fig 1). In other words, requests may come locally or remotely from a limited number of types of components (e.g., CPU, accelerator, or storage). The examiner takes official notice of the fact that memory requests are commonly issued by a CPU. It would have been obvious to a person of ordinary skill in the art, prior to the effective filing date of the invention, to modify the system of FANG such that the one or more memory access requests are received from the processor of the first near-memory compute node because it is a notoriously well-known and common method of accessing same-node memory.
Consider Claim 4,
The modified system of FANG further describes wherein the processor is configured to send the one or more access requests to the second near-memory compute node (FANG, e.g., §3.1; Fig 6, describes forwarding "SUM" and a result to the next node; §2.2, a segment routing header could serve as a chaining function to process packets on a different node; §3.1, Ring Reduce-Scatter is a chaining function. In other words, FANG teaches instructions to send the access requests to another (i.e., second) near-memory compute node.).
Consider Claim 7,
FANG fails to expressly describe wherein the one or more access requests are received from a second processor associated with the second near-memory compute node.
FANG shows nodes which are illustrated to include a NetDAM component and a CPU (FANG, e.g., Fig 2) and further describes inter and intra-host access (FANG, e.g., Fig 1). In other words, requests may come locally or remotely from a limited number of types of components (e.g., CPU, accelerator, or storage). The examiner takes official notice of the fact that memory requests are commonly issued by a CPU. Additionally, the examiner notes that every processor in the system described by FANG, including the processor of the first node, is associated in some context with the second near-memory compute node. It would have been obvious to a person of ordinary skill in the art, prior to the effective filing date of the invention, to modify the system of FANG such that the one or more memory access requests are received from a second processor associated with the second near-memory compute node because it is a notoriously well-known and common method of accessing same-node memory.
Allowable Subject Matter
Claims 21 and 22 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Response to Arguments
Applicant's arguments filed 12NOV2025 have been fully considered but they are not persuasive.
The applicant argues that the cited art fails to describe "receiving one or more memory access requests addressed to a memory address and responsive to the memory address being within a memory address range, trigger the PIM …" However, given that the claims are directed to a non-specific memory address and a non-specific memory address range, the examiner respectfully disagrees. As noted in the updated rejections, above, FANG shows accesses to local memory which requires them to be addressed to at least a memory address (see, e.g., Figs 7 and 8). Additionally, NetDAM packets include a memory address (see, e.g., Fig 3). The examiner further notes that FANG describes wherein each NetDAM device has its own memory address space (see, e.g., §2.5). Therefore, valid operations within a NetDAM node must include addresses that are part of the valid address range (i.e., a memory address range) of the NetDAM device. For at least these reasons, the applicant's argument is considered not persuasive.
The applicant additionally argues that the cited art fails to describe wherein the triggering of the reduce-scatter operation is responsive to one or more of the memory access requests including an indication of a memory request type. The examiner notes that the very first step of the NetDAM-based reduce-scatter is to fetch A1 directly from DRAM on Node 1 and send it to Node 2 (see, e.g., §3.1). Again, as long as the request type is valid and requires a memory access, it is considered to teach the argued limitation: memory accesses are required as part of the reduce-scatter operation, so their absence would result in an absent or improper memory access.
The remaining arguments are directed towards new limitations which are considered fully addressed in the updated rejections provided above.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Gary W Cygiel whose telephone number is (571)270-1170. The examiner can normally be reached Monday - Thursday 11am-3pm PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Arpan P Savla can be reached at (571) 272-1077. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Gary W. Cygiel/Primary Examiner, Art Unit 2137