DETAILED ACTION
This action is responsive to the Application filed 4/17/2024.
Accordingly, claims 1-20 are submitted for prosecution on the merits.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. Claim 1 is directed to an abstract idea. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as shown by the two-step analysis below.
Step I:
Claim 1 is directed to an apparatus category.
Step IIA
Prong One: the step recited as marking (a portion of the source code) can be viewed as activity that can be performed in the human mind or with pen and paper. MPEP 2106.04(a)(2) - similar to identifying and categorizing data, rendering opinions, re-arranging data, or performing judgments or observations by a human via a generic computer or via use of pen/paper. This activity is directed to a judicial exception of the abstract-idea type. The mention of “compiling” and “executing on a processor” does not change the nature of the identification via “marking” and does not necessarily transition the claim out of the “mental process” category.
Prong Two: the “compiler executing” and the “offloading” (of a code portion) can be viewed respectively as pre-solution activity that collects or prepares data for use by the mental process of “marking”, and as post-solution activity that makes use of the data/result from the mental process via a generic computer. The activity of preparing or pre-converting data and dispatching mentally processed data amounts to mere extra-solution activity of insignificant impact on the field of computer technology, or to well-understood routines – see MPEP 2106.05(d) – which are not viewed as actually transforming computer technology into an improvement in this field or an inventive state thereof; e.g., the claim fails to recite how the compiler-effected marking is performed in a way that improves over the internals of a computer operation. The compiler concept is treated as a mere tool or stage with which the mental activities are realized. The extra-solution activities of compiling and offloading thus fail to integrate the judicial exception into a practical application.
Step IIB:
The additional elements such as “processor” and “compiler executing”, recited at a high level of generality, do not amount to “significantly more” than the abstract idea, because simply applying an abstract idea using a conventional computer or a generic mention of a compiler is insufficient - MPEP 2106.05(a). The element recited as “offloading … portion … based on the marking” is considered insignificant post-solution activity, since once the abstract idea of marking has been performed, the act of sending the mentally processed data to a destination (PIM units) is viewed as a routine or conventional consequence of the mental process – MPEP 2106.05(f) – and this offloading, being purely dependent on the “marking”, does not provide an “inventive concept” to the judicial exception. That is, the additional elements identified cannot amount to significantly more than the judicial exception found in Step IIA.
Claim 1 amounts to nothing more than using a conventional compiler environment to perform a mental act of selecting code/data for a destination. Claim 1 is therefore directed to non-statutory subject matter.
Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. Claim 11 is directed to an abstract idea. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as shown by the two-step analysis below.
Step I:
The claim is directed to a process/method as statutory category.
Step IIA
Prong One:
The steps of “generating a Read dependence graph” (representing one or more chains of dependent elements) and establishing “memory capacity … being greater than or equal to … number of elements represented by a longest chain” are acts that can be practically performed in the human mind or with the aid of pen and paper. MPEP 2106.04(a) - a human programmer can look at source code, manually trace the dependencies between data structures (“chains”), and determine the length of those structures; e.g., a human can compare that length to a known memory threshold (or “bank capacity”). As the above generating and establishing of memory capacity are viewed as concepts of organizing data and performing comparisons, they fall within the abstract-idea grouping of mental processes.
Prong Two:
The claim as recited does not provide a “technical improvement” to the computer itself under MPEP 2106.04(d)(1). The step of “compiling” (a portion of source code) is mere insignificant extra-solution activity, as compiling is a well-understood, routine function of any generic system that merely serves as a stage for the abstract dependency analysis. The step of “offloading” (a portion of code for execution), as extra-solution or post-solution activity of insignificant impact, cannot be viewed as capable of transforming the abstract idea into a patent-eligible application, notably when the claim does not recite a specific change to the PIM hardware or a specific improvement in the way the PIM processes data. The claim is directed to an abstract idea (logical dependence comparison) rather than depicting a clear focus on a technical constraint associated with hardware (PIM processing); hence the elements of the claim cannot integrate the abstract idea into a practical application.
Step IIB:
The “PIM units” and the “memory banks” are described in a generic sense.
The tracing of dependency data during compilation is a fundamental and well-understood practice in the computer arts, and use of a “memory threshold” is routine in resource management. Looking at the elements recited as “compiling” and “offloading” of the portion of code being traced via pen/paper as chains of dependencies, the claim as a whole simply invokes an action sequence: take source code, find dependencies (the abstract part), and, if the data fits (conventional routine), send it to a processor. This is overall conventional practice and does not amount to “significantly more” than the abstract idea itself.
Claim 11 is deemed ineligible under 35 U.S.C. § 101.
Claim 18 is rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. Claim 18 is directed to an abstract idea. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as shown by the two-step analysis below.
Step I:
Claim 18 is directed to a system category.
Step IIA
Prong One: The action of computing a metric capturing an amount of data duplication can be seen as a mathematical exercise that quantifies a relationship between data sets, and under MPEP 2106.04(a)(1) this is a mathematical concept.
Further, the computing of a metric and the comparing of that metric with a threshold to decide whether to offload are acts that can be practically performed in the human mind or via use of pen/paper, in that a human can inspect a portion of code, count duplicate data entries, and apply a logical “if-then” rule to determine the destination of that portion.
Prong Two:
The step of “compiling” is a well-understood routine that pre-processes data to prepare it for an abstract calculation, whereas the step of “offloading … to PIM units based on the duplication metric” is mere post-solution activity, the latter being a generic function which cannot meaningfully transform the abstract idea. The claim does not describe how the internal operation of the PIM units and memory banks is altered or improved. Instead, the claim is – per Electric Power Group, LLC v. Alstom S.A. – about collecting information, analyzing it, and displaying or routing the results, which is clearly not an improvement to computer functionality; hence the claimed compiling and offloading cannot integrate the abstract idea of Step IIA into a practical application. MPEP 2106.04(d)(1).
Step IIB
The components recited as PIM units and memory banks are recited at a high level of generality, and identifying data duplication is a longstanding, conventional practice in computer science. Viewed as a whole – the compiling as standard code preparation, the abstract calculation of a metric (mathematical concept), and the conventional routing decision (i.e., offloading) without specific details – the claim cannot in combination add anything beyond the abstract idea itself, as it only instructs the user to apply the abstract idea of duplication analysis within the well-known environment of a PIM architecture, without explicitly showing how the PIM units handle memory or execute instructions in ways not already inherent in their conventional design.
Claim 18 is therefore rejected as being directed to non-eligible subject matter under 35 U.S.C. § 101.
Analysis of the dependent claims under Step IIB:
Claim 2: the generating of a dependence graph, as a conventional routine to organize data, cannot add significantly more to the abstract idea of claim 1.
Claim 3: describes loop specificity associated with the dependence graph, and hence cannot be viewed as an internal, computer-based improvement over the abstract idea of claim 1.
Claims 4 and 12: describe generating first data structures, second data structures, and linked data structures, which can be viewed as human arrangement of data (source code) by way of a mental process or via use of pen and paper.
Claim 5: describes marking based on memory capacity via a size comparison, and thus cannot add significantly more to the abstract idea identified in Step IIA.
Claims 6 and 13: describe the basis of marking via a metric; hence such marking cannot add significantly more to the “marking” of a code portion in the base claim.
Claims 7 and 20: describe a duplication metric based on a comparison of elements represented on a dependence graph, and hence do not add significantly more to the abstract idea of “marking” from the base claim.
Claim 8: describes the basis of marking in terms of comparing graph elements and memory elements; this cannot be seen as a transformation in a computer technical field but rather as one more sub-functionality of the abstract idea.
Claim 9: recites marking using a maximum number as a basis; hence it cannot be construed as adding significantly more to the abstract idea of “marking”.
Claim 10: describes marking a code portion and executing the portion, which can be viewed respectively as a variant of a mental process and as post-solution activity using the result of that process.
Claim 14: describes how a duplication metric is derived from the dependence graph; but using a metric as a means can be viewed as part of the abstract idea, and thus does not add significantly more to the judicial exception of base claim 11.
Claim 15: describes generating a dependence graph and verifying the absence of cycles, which in all amount to activities that can be performed in the human mind or via pen and paper.
Claim 16: describes what a cycle amounts to, and this cannot be seen as integrating the judicial exception of claim 15 into a practical application.
Claim 17: describes verifying by comparing memory bank size with the number of elements acquired from the dependence graph; but verifying via a size comparison does not add significantly more to the abstract idea of claim 11.
Claim 19: describes the same generating steps as claim 4, and hence cannot add significantly more to the abstract idea of the base claim.
The dependent claims thus all fail to add significantly more to the judicial exceptions (abstract ideas) of their base claims.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1 and 10 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kalamatianos et al., WO 2023043711, 3-23-2023, 33 pgs (herein Kalamatianos).
As per claim 1, Kalamatianos discloses a device, comprising: a memory that includes one or more processing-in-memory units (units 150 – para 0030; PIM execution unit – Fig. 2); a processor core (host processor 132 – para 0030; Fig. 3A, 3B); and a compiler (para 0036, 0064-0065 – Note1: a work scheduler from a host computer – Fig. 5 - working with reservation of register through static analysis of a compiler – para 0065-0066 - and mapping command buffer with allocations destined for dispatch by the scheduler reads on compiler executing on the host environment in support for reserving resources for dispatch to a PIM execution) executing on the processor core, the compiler causing the processor core to perform operations including:
compiling source code (compiling – para 0064, 0067) of a software program;
during the compiling (see Note1), marking a portion (by tracking command buffer indices as invalid – para 0066; set of offloaded PIM instructions is marked by two special commands … start command and end command – para 0036) of the source code as suitable (instructions that will be written to the command buffer – para 0065) for execution using the one or more processing-in-memory units (indices in the command buffer that will be required to offload – para 0065; initiate an Offload of … PIM instructions to a PIM device 410 – Fig. 6); and
offloading the portion of the source code (workload on the processor cores … alleviated by offloading … a PIM device – para 0025; for offloading PIM instructions for execution by the PIM … units 150 – para 0034; completed offloading the PIM instructions – para 0035) for execution by the one or more processing-in-memory units (PIM execution units – para 0030) based on the marking (para 0036, 0066).
As per claim 10, Kalamatianos discloses the device of claim 1, the operations further including:
during the compiling, marking an additional portion of the source code as not
suitable (see below) for execution using the one or more processing-in-memory units; and
executing the portion of the source code (set of PIM instructions … amount of space – indices … required to offload the operations – para 0065) based on the portion of the source code being marked as not suitable (command set of PIM instructions … and marking those indices as invalid – para 0066) for execution using the one or more processing-in-memory units.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2-3 are rejected under 35 U.S.C. § 103 as being unpatentable over Kalamatianos et al., WO 2023043711, 3-23-2023, 33 pgs (herein Kalamatianos) in view of Bertacco et al., USPubN: 2022/0019545 (herein Bertacco).
As per claims 2-3, Kalamatianos does not explicitly disclose the device of claim 1, the operations further including generating a data dependence graph based on the portion of the source code, wherein the marking is based on an absence of cycles in the data dependence graph that include at least one loop-carried true dependency;
wherein a cycle includes the at least one loop-carried true dependency based on a read access that is performed during a subsequent iteration of the cycle being dependent on a write access that is performed during a previous iteration of the cycle.
Similar to Kalamatianos' implementation of off-chip memory via offloading to processing-in-memory execution, Bertacco discloses a GraphPIM-based approach to offload (para 0030, 0035) all atomic operations to an off-chip memory using a ranking algorithm (Fig. 1-2) of a graph analytics that assesses vertex data of the graph (para 0029) to direct their data storing, with emphasis on using the cache locality provided with the off-chip memory (para 0030) as part of bypassing the cache coherence (para 0029) pressure from the host CPU core. Each off-chip memory includes an atomic compute unit, a memory, and a controller, so that the atomic operations are handled by that compute unit using the respective memory module local to the off-chip memory, contributing to the high throughput typical of processing-in-memory execution (para 0003-0004) while minimizing read/write traffic with the host core; i.e., the graph-based vertex ranking that places atomic operations at the level of the off-chip memory, as part of the cache coherence bypassing (para 0036), results in alleviating a high performance cost (para 0031) and reducing the cache pollution and access latency that would otherwise exist between the off-chip memory and the main core (para 0037).
Hence, analyzing code to maximize the atomic operations that can be carried out solely at the level of off-chip memory entails scanning the data graph operations while heeding the presence of possible backward branches/loops, by which a write to cache must be effectuated before a newly stored value can be read again in the next iteration, said read-after-write effect being typical of the cache coherence traffic required to be observed by the host core. Thus, analyzing a data graph where the marking is based on an absence of cycles in the data dependence graph in the form of at least one loop-carried true dependency (cache coherence read-after-write cycle) is recognized, in the sense that said loop-carried true dependency is based on a read access that is performed during a subsequent iteration of the cycle being dependent on a write access that is performed during a previous iteration of the cycle, which defines the very backward traffic that performs the read-after-write sequence underlying the cache coherence requirement.
Therefore, based on code marking with intent to employ a local cache (para 0001-0002) proximal to a local processing unit using memory banks allocated to respective PIM execution units (para 0030-0031) in Kalamatianos to alleviate additional traffic with the host core for updating its memory, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement analysis of code with data graph analytics – as in Bertacco, via a GraphPIM-associated algorithm that processes the magnitude of data stored at the graph vertices – such that data dependence from traversing the graph nodes would enable marking where atomic operations (considered for offload) can be sequentially performed, and where a cycle in the data dependence graph would reveal at least one loop-carried true dependency, in the sense that said loop-carried true dependency is caused by a read access that is performed during a subsequent iteration of the cycle being dependent on a write access that is performed during a previous iteration of the cycle, a dependency typical of the read-after-write underlying observance of a cache coherence policy which the GraphPIM algorithm in Bertacco endeavors to mitigate or bypass to maximize throughput of a PIM execution; because
marking code via a data graph analysis would enable identifying and consolidating the longest stretch of atomic operations that can be distributed as a long, non-interrupted flow using a local processing unit and a local memory bank, as intended with the use of offloading in Kalamatianos; and observance of any interruption thereof, via detection of a loop-carried true dependency, would enable determining where offloading of instructions to a PIM execution should be adjusted, halted, or otherwise deferred back to handling by the host core in accordance with the conventional Von Neumann approach. That is, a flow of atomic operations at the level of local memory and PIM units – as set forth in Bertacco – when disrupted by a read-and-write-back loop associated with cache coherence, not only can hurt the desired off-memory throughput intended with PIM execution offloading, but might also cause added traffic or latency that in turn would cause unexpected cache pollution and jeopardize the performance outcome of the offloaded execution approach, when in fact the performance cost of this cache writeback can better be resolved with a direct handover to the host core for this operation to be carried out in the conventional approach.
Claims 4-5, 8-9, 11-12, and 15-17 are rejected under 35 U.S.C. § 103 as being unpatentable over Kalamatianos et al., WO 2023043711, 3-23-2023, 33 pgs (herein Kalamatianos) in view of Bertacco et al., USPubN: 2022/0019545 (herein Bertacco), further in view of Lin et al., WO 2017076296 (translation), 05-11-2017, 17 pgs (herein Lin), Chang et al., USPubN: 2023/0119291 (herein Chang), and Alsop et al., CN 117063155 (translation), 11-14-2023, 11 pgs (herein Alsop).
As per claims 4-5, Kalamatianos discloses the device of claim 1, wherein the portion of the source code accesses a first data structure and a second data structure (see reading/writing – para 0030).
Kalamatianos does not explicitly disclose the operations further including:
(i)generating a first read dependence graph representing one or more first chains of
dependent elements of the first data structure based on the portion of the source code;
generating a second read dependence graph representing one or more second chains of dependent elements of the second data structure based on the portion of the source code; and
generating a linked read dependence graph representing one or more linked chains of dependent elements by linking the one or more first chains with the one or more second chains based on the portion of the source code; wherein
(ii) the marking is based on a memory capacity of a first number of banks communicatively coupled to respective ones of the one or more processing-in-memory units being greater than or equal to an amount of the memory to store a second number of elements represented by a longest chain of the one or more linked chains.
As for (i)
Implementation of a reduce map using traversal of graph data is shown in Lin's graph processing and map simplification, whereby a Reduce phase processes the input data and intermediate calculation results to obtain a simplified message thereof through a shuffle phase, during which the intermediate results are taken out of the storage media (pg. 4). Hence, processing the data result from a data node of a data graph and reducing it by eliminating intermediate results from the node-edge flow is recognized.
Bertacco discloses identification of atomic operations for forming an off-loadable sequence for in-memory execution by way of a GraphPIM analysis, with emphasis on bypassing or averting the cache coherence (para 0029-0030) required under a Von Neumann architecture, which causes additional delay/traffic unfavorable to the performance throughput (para 0003-0004) of the in-memory offloading approach; the identification of off-loadable instructions is based on an algorithm traversing a compile-time data graph and ranking (Fig. 1-2; para 0007) the stored content of vertices to determine the nodes' weights for offloading, where each offloaded atomic operation can involve a start address for read and a final address for write (para 0003).
Hence, based on the teaching by Lin, consideration of what is finally collected into a node, without consideration of the intermediate operations leading to the final stored value in the node, entails determination by the compiler to the effect of linking sequences of two or more atomic operations into a compacted representation of successive reads, so as to form a merged chain of atomic operations considered the best candidate for offloading (without complying with a cache write-back policy), in which results from intermediate steps (i.e., edge computation), such as non-final values, can be elided by effect of compaction by a compiler algorithm traversing a data flow graph and weighting the final content in its vertices. That is, generating a linked read dependence graph in terms of one or more linked chains of dependent elements, by linking the one or more first chains (first sequence of atomic reads) with the one or more second chains (second sequence of atomic reads) based on the portion of the source code represented on the DAG, is recognized.
As for (ii),
Chang discloses a high-bandwidth memory (HBM) system operating under a FIM controller mode via Function-In-HBM control logic to coordinate which atomic operations (para 0007-0008) can be included for transfer from a (GPU) application to the HBM/FIM environment (Fig. 2) – e.g., using a scratchpad sector local to each RAM of the HBM environment to store the result of a load/store instruction (para 0029-0030) of an ALU operation; according to which, a GPU control associated with the transfer effectuates analysis over compiler-originated commands from the GPU source to determine source and destination memory addresses of the HBM RAM, as to whether a GPU candidate instruction (para 0034) is suited as a FIM (Function-in-HBM) instruction or, otherwise, as a non-FIM instruction.
Hence, the effect of a controller associated with offloading atomic operations for execution via a fast-memory environment, determining whether an operation is FIM-appropriate or denied as non-FIM-ready by matching the range of memory addresses from the HBM with the scope of a candidate atomic operation, is recognized.
Further, Alsop discloses offloading operations from a host processor onto a fast, high-bandwidth memory or PIM execution environment (pg. 3), where instructions destined for PIM units must be mapped to the actual hardware or fall into the same memory partition of the PIM environment (pg. 6) in terms of address-to-physical memory, or else addressing errors can occur (pg. 3); the consecutive instructions from the processor cores should each time be mapped to the same offload target device (PIM module) as part of the divergence detection associated with declaration of unload operations destined for the offloading (pg. 7-8).
Therefore, based on the marking in Kalamatianos in accordance with identifying a start and an end of a PIM command (para 0036) to be spanned within the maximum capacity range of registers, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement determination of a code portion as a candidate for PIM offloading in Kalamatianos so that the operations associated with the compiler marking include
(1) generating a linked read dependence graph representing one or more linked chains of dependent elements by linking the one or more first chains with the one or more second chains based on the portion of the source code – as in Bertacco – where the generated first chains and second chains respectively represent a first read dependence graph composed of one or more first chains of dependent elements of the first data structure based on the portion of the source code, and a second read dependence graph composed of one or more second chains of dependent elements of the second data structure based on the portion of the source code, each chain representing an atomic operation chaining as set forth in Bertacco, each with a start address for read and a final address for write, where disruption thereof by a write-back for cache coherence operation is bypassed;
such that
(2) the ensuing effect of marking is based on a memory capacity of a first number of banks communicatively coupled to respective ones of the one or more processing-in-memory units being greater than or equal to an amount of the memory – as shown in Chang and Alsop – to store a second number of elements represented by a longest chain of the one or more linked chains, as realized from the linking of atomic operations in Bertacco's compiler approach that averts cache write-back traffic and Lin's map reduction that removes intermediate results from a data graph; because
use of a data graph affords a compiler the ability to mark simple atomic operations typical of a PIM execution offloading mode, in which these simple operations can be achieved on locally stored data without necessitating the host CPU to consolidate a final result and/or coordinate the result with other cores under a cache coherence policy as set forth above. In that respect, the compiler would be able to determine which stretch of atomic operations (reads and writes) can be linked into a longest contiguous sequence, in accordance with which only the final content resulting from each internal stage operation (like a load, read, or store) is read into the next stage of the sequence, without any latency caused by additional traffic awaiting intervention outside the local context of the PIM memory. This would have the effect of avoiding cache pollution and averting the complexity/latency of relying upon a CPU layer to synchronize accesses by other runtime entities contending for a value that cannot be timely established as final and properly read, thus boosting higher throughput for a type of data read using the spatial locality of the PIM memory banks, notably when the scope of the address range of the instructions to handle under a PIM environment (e.g., a longer stretch of atomic operations) is pre-mapped at compile time to the actual hardware/physical capacity of the memory provision, as set forth above in Alsop and Chang, in terms of matching a desired offload code or contiguous instruction range with an equal or larger memory size in each bank on the PIM side, rendering the throughput of the offloaded execution largely enhanced with minimized delay caused by traffic and interaction with the host core.
As per claim 8, Kalamatianos does not explicitly disclose the device of claim 4, wherein the marking is based on a first number of rows in a second number of banks communicatively coupled to respective ones of the one or more processing-in-memory units being greater than or equal to a maximum number of interacting elements of the linked read dependence graph.
But marking of offload code in consideration of what the code size requires for proper local realization of results within a local bank location under a PIM execution mode entails consideration of spatial and locality-based code marking by the compiler, correlating a number of rows in a corresponding number of banks with their being communicatively coupled to respective ones of the one or more processing-in-memory units, the latter enlisted for executing elements of the linked read dependence graph set forth per the obviousness rationale for claim 4; hence the marking of rows of respective memory banks associated with a corresponding PIM unit, as capacity deemed proper to carry out a PIM execution of a range of contiguous, acyclic operations in the context of observing spatial mapping to the PIM hardware and locality between a processing unit and its memory, would have been recognized for the obvious reasons set forth in the rationale of claims 4-5.
Thus, a basis for a marking to the effect of identifying a first number of rows in a second number of banks that are communicatively coupled to respective ones of the one or more processing-in-memory units, the number of rows being greater than or equal to a maximum number of interacting elements of the linked read dependence graph, would have been obvious for the same reasons set forth in the rationale addressing claim 5, relying on the obviousness of the linked read dependence graph of claim 4.
As per claim 9, Kalamatianos does not explicitly disclose device of claim 8, wherein the maximum number of interacting elements includes an element of the linked read dependence graph and one or more elements directly connected to the element in the linked read dependence graph, the element having a highest number of elements directly connected thereto in the linked read dependence graph.
But joining elements of a dependency context by traversing a PIM graph, with the effect of generating chains of read instructions as a first graph of dependent elements and a second graph of dependent elements, so as to have them linked into a highest number of connected elements under a linked read dependence graph to be marked for submission to an offloaded PIM execution, has been addressed as obvious using the teachings of Bertacco and the map reduction of a data graph in Lin, in accordance with the obviousness of claim 4 above.
Thus, generating a linked read dependence graph per a compiler marking associated with traversing a data graph, so that identification of the maximum number of interacting elements takes into account an element of the linked read dependence graph in conjunction with one or more elements directly connected to that element, the element having the highest number of elements directly connected thereto in the linked read dependence graph, would have been obvious for the same reasons set forth in the rejection of claim 4, in view of the intended spatial mapping raised as obvious in claim 5.
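For illustration only (an editor's sketch, not taken from the claims or the cited art; all names are hypothetical), the "maximum number of interacting elements" of claims 8-9, i.e., an element together with the elements directly connected to it, taking the element with the highest number of direct connections, might be computed as:

```python
def max_interacting_elements(adjacency):
    """adjacency: dict mapping each element of the linked read dependence
    graph to the set of elements directly connected to it.
    Returns the size of the largest group formed by one element plus its
    direct neighbors (claim 9's definition)."""
    return max((1 + len(neighbors) for neighbors in adjacency.values()),
               default=0)

def rows_sufficient(num_rows, adjacency):
    """Sketch of claim 8's marking condition: the number of rows in the
    banks coupled to the PIM units must be greater than or equal to the
    maximum number of interacting elements."""
    return num_rows >= max_interacting_elements(adjacency)
```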
As per claim 11, Kalamatianos discloses a method, comprising:
compiling a portion of source code (refer to claim 1) of a software program;
during the compiling, generating a read dependence graph representing one or
more chains of dependent elements of one or more data structures accessed by the
portion of the source code; and
offloading the portion of the source code for execution by one or more
processing-in-memory units based on a memory capacity of a number of banks
communicatively coupled to respective ones of the one or more processing-in-memory
units being greater than or equal to an amount of memory to store a number of elements
represented by a longest chain of the one or more chains.
(all of which having been addressed in claims 4-5)
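For illustration only (hypothetical names, not from the application), the offload condition recited in claim 11, i.e., bank memory capacity at least equal to the memory needed to store the elements of the longest chain, reduces to a simple comparison:

```python
def offload_allowed(bank_capacities_bytes, element_size_bytes,
                    longest_chain_length):
    """Sketch of claim 11's offload gate: the combined capacity of the
    banks communicatively coupled to the PIM units must be able to store
    every element represented by the longest chain of the read
    dependence graph."""
    required = element_size_bytes * longest_chain_length
    return sum(bank_capacities_bytes) >= required
```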
As per claim 12, Kalamatianos discloses method of claim 11, wherein the portion of the source code accesses a first data structure and a second data structure, wherein generating the read dependence graph includes:
generating a first read dependence graph representing one or more first chains of
dependent elements of the first data structure based on the portion of the source code;
generating a second read dependence graph representing one or more second
chains of dependent elements of the second data structure based on the portion of the
source code; and
generating the read dependence graph representing the one or more chains of
dependent elements by linking the one or more first chains with the one or more second
chains based on the portion of the source code.
(all of which having been addressed in the rejection of claim 4)
As per claims 15-16, Kalamatianos does not explicitly disclose method of claim 11, further comprising:
generating a data dependence graph based on the portion of the source code; and verifying an absence of cycles in the data dependence graph that include at least one loop-carried true dependency, wherein offloading the portion of the source code is further based on the verifying.
wherein a cycle includes the at least one loop-carried true dependency based on a read access that is performed during a subsequent iteration of the cycle being dependent on a write access that is performed during a previous iteration of the cycle.
But formation of chains of atomic operations with the intent to bypass or avert a cache coherence operation obliging a read-after-write associated with the cache of a host CPU, such a read-after-write being considered a loop-carried true dependency and a deterrent to the smooth sequential realization of a PIM execution pipeline, in which the longest stretch of atomic operations (read-modify-write) can be realized via the proximity of PIM units to their respective memory banks without a backward dependency on the host CPU, has been shown in the teaching by Bertacco as set forth in the rationale of claim 4.
Thus, implementing a compiler effect to heed the presence of this write-back loop dependency in association with generating a data dependence graph based on the portion of the source code, whereby verifying an absence of cycles in the data dependence graph would detect the presence of at least one loop-carried true dependency so as to enable offloading the portion of the source code based on the verifying, would have been obvious for the same reasons set forth in claim 4 with the cache coherence bypassing effect of Bertacco's GraphPIM and the marking approach.
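For illustration only (an editor's sketch; the edge labels and helper names are hypothetical), the verification of claims 15-16, refusing offload whenever some cycle of the data dependence graph contains a loop-carried true dependency, can be expressed as checking, for each loop-carried read-after-write edge, whether the graph contains a path closing that edge into a cycle:

```python
def offload_safe(deps):
    """deps: dict mapping each node to a list of (successor, loop_carried)
    edges, where loop_carried marks a read-after-write across iterations.
    Returns True only if no cycle contains a loop-carried true dependency
    (claims 15-16's verification)."""
    for u, edges in deps.items():
        for v, loop_carried in edges:
            # A cycle through edge u -> v exists iff v can reach u back.
            if loop_carried and _reachable(deps, v, u):
                return False
    return True

def _reachable(deps, src, dst):
    """Iterative DFS: is there a path (possibly empty) from src to dst?"""
    seen, stack = set(), [src]
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        if n in seen:
            continue
        seen.add(n)
        stack.extend(s for s, _ in deps.get(n, []))
    return False
```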
As per claim 17, Kalamatianos discloses method of claim 11, further comprising verifying that an additional number of rows in the number of banks is greater than or equal to a maximum number of elements that are operated on together in a single computation of the portion of the source code based on the read dependence graph, wherein offloading the portion of the source code is further based on the verifying.
(Refer to rationale of claim 5 and claim 8)
Allowable Subject Matter
Claims 6-7 are objected to as being dependent upon a rejected base claim, but would be allowable (pending resolution of any pending rejection to the base claims) if rewritten in independent form including all of the limitations of the base claim and any intervening claims, the objected to subject matter including:
(claims 6-7), device of claim 4, wherein the marking is based on a duplication metric falling below a threshold, the duplication metric capturing an amount of data duplication in the memory to execute the portion of the source code using the one or more processing-in-memory units;
the operations further including computing the duplication metric based on a comparison of a first number of elements, including duplicated elements, represented by the linked read dependence graph to a second number of unique elements represented by the linked read dependence graph
Claims 13-14 are objected to as being dependent upon a rejected base claim, but would be allowable (pending resolution of any pending rejection to the base claims) if rewritten in independent form including all of the limitations of the base claim and any intervening claims, the objected to subject matter including:
(claims 13-14), method of claim 11, further comprising computing a duplication metric capturing an amount of data duplication in the memory to execute the portion of the source code using the one or more processing-in-memory units, wherein offloading the portion of the source code is further based on the duplication metric falling below a threshold; wherein
the duplication metric is based on a comparison of a first number of elements, including duplicated elements, represented by the read dependence graph to a second number of unique elements represented by the read dependence graph.
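For illustration only (a hypothetical helper, not part of the objected-to claims' disclosure), the duplication metric of claims 13-14, comparing the number of elements including duplicates against the number of unique elements represented by the read dependence graph, might be computed as:

```python
def duplication_metric(elements):
    """elements: list of elements represented by the read dependence
    graph, possibly containing duplicates.
    Returns the ratio of total elements (duplicates included) to unique
    elements; higher values mean more data duplication in memory."""
    total = len(elements)        # first number, including duplicates
    unique = len(set(elements))  # second number, unique elements only
    return total / unique if unique else 0.0

def should_offload(elements, threshold):
    # Offload only when the duplication metric falls below the threshold.
    return duplication_metric(elements) < threshold
```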
Claim 18 is allowed over the prior art (pending resolution of any outstanding rejection to these claims) along with its dependent claims.
(claim 18) a system, comprising a memory that includes one or more processing-in-memory units; and a processor core to perform operations including:
compiling a portion of source code of a software program;
during the compiling, computing a duplication metric capturing an amount of data duplication in the memory to execute the portion of the source code using the one or more processing-in-memory units; and
offloading the portion of the source code for execution by the one or more processing-in-memory units based on the duplication metric falling below a threshold.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Tuan A Vu, whose telephone number is (571) 272-3735. The examiner can normally be reached Mon-Fri, 8:00 AM-4:30 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Chat Do, can be reached at (571) 272-3721.
The fax phone number for the organization where this application or proceeding is assigned is (571) 273-3735 (for non-official correspondence; please consult the Examiner before using) or (571) 273-8300 (for official correspondence), or inquiries may be redirected to customer service at (571) 272-3609. Any inquiry of a general nature or relating to the status of this application should be directed to the TC 2100 Group receptionist at (571) 272-2100.
/Tuan A Vu/
Primary Examiner, Art Unit 2193
March 20, 2026