Prosecution Insights
Last updated: April 19, 2026
Application No. 17/474,372

SUPPORTING PROCESSING-IN-MEMORY EXECUTION IN A MULTIPROCESSING ENVIRONMENT

Final Rejection (§103)
Filed: Sep 14, 2021
Examiner: LIN, HSING CHUN
Art Unit: 2195
Tech Center: 2100 — Computer Architecture & Software
Assignee: Advanced Micro Devices, Inc.
OA Round: 6 (Final)
Grant Probability: 59% (Moderate)
OA Rounds: 7-8
To Grant: 3y 4m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 59% (grants 59% of resolved cases: 64 granted / 108 resolved; +4.3% vs TC avg)
Interview Lift: +79.8% (strong, roughly +80%, for resolved cases with interview)
Avg Prosecution: 3y 4m (typical timeline); 37 applications currently pending
Total Applications: 145 (career history, across all art units)

Statute-Specific Performance

§101: 17.1% (-22.9% vs TC avg)
§103: 35.8% (-4.2% vs TC avg)
§102: 6.5% (-33.5% vs TC avg)
§112: 34.0% (-6.0% vs TC avg)
Tech Center averages are estimates • Based on career data from 108 resolved cases

Office Action

§103
DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claims 1, 3, 5-9, 11, 13-17, 19, and 21-26 are pending in this application.

Response to Arguments

Applicant's arguments regarding the rejections of claims 1, 3-9, 11-17, and 19-23 under 35 U.S.C. 112(b) have been fully considered and are persuasive. Applicant's arguments regarding the 35 U.S.C. 101 rejections of claims 1, 3-9, 11-17, and 19-23 have been fully considered and are persuasive. Applicant's arguments regarding the 35 U.S.C. 103 rejections of claims 1, 3-9, 11-17, and 19-23 have been fully considered but are unpersuasive.

Regarding the 35 U.S.C. 103 rejection, the applicant argues the following in the remarks:

However, the cited portions of Lee discuss individual PIM commands, such as "mov A[i], pim_r0". None of these individual PIM commands are associated with a plurality of PIM instructions, and none of these individual PIM commands indicate PIM resource requirements of such a plurality of PIM instructions. Rather, these individual PIM commands identify a single action (e.g., "mov") and a single PIM register (e.g., "pim_r0"). Lee is silent regarding a separate command that is associated with a plurality of such individual PIM commands, and that indicates PIM resource requirements of the plurality.

However, like the Hall and Lee references, Kogge also fails to teach or suggest a command associated with a plurality of PIM instructions that indicates PIM resource requirements of the plurality of PIM instructions.

Regarding claim 7, the Office Action alludes to the segment registers of Hall. However, the segment registers are used for mapping memory space allocated to the PIM node. As such, these registers are not resources that execute PIM instructions.
Further, Hall does not suggest that segment registers can be allocated to a thread, nor does Hall suggest that a thread can specify how many segment registers it requires. The Office Action also alludes to the registers shown in FIG. 6 of Hall. While it might be interpreted that these registers are used by the PIM node, Hall still fails to suggest that a thread can specify how many registers it requires. Rather, Hall describes context switching instead of dividing registers into allocations. Thus, if a new PIM kernel requires registers, the previous register state is saved and all of the registers are initialized for the new PIM kernel. There is no determination as to how many registers are required.

It is also noted that the PIM resource requirements (that specify a quantity of registers, as recited in claim 7) are indicated in a command that is associated with a plurality of PIM instructions. The cited references fail to teach or suggest such a command, and specifying a quantity of registers with such a command.

The Office Action indicated that Hall and Kogge disclose the subject matter of dependent claim 21 (see Office Action at pages 34-37), but has failed to identify any disclosure in the cited references that mentions anything about a command that indicates a PIM register requirement specifying a quantity of PIM registers, or determining that the quantity of PIM registers specified by the command is less than or equal to a quantity of available PIM registers.

Examiner has thoroughly considered Applicant's arguments, but respectfully finds them unpersuasive for at least the following reasons:

As to point (a), the examiner respectfully disagrees.
Lee recites in [0080] “To this end, the memory device may be preset so that commands for addresses of matrices A, B, and C are interpreted as PIM commands” and [0084] “In response to a first read command “mov A[i], pim_r0” issued from a thread, the memory device reads data of the matrix A stored in a bank of a computing core corresponding to the thread and stores the read data in the register pim_r0 of a computing circuit of the computing core”. The command mov A[i], pim_r0 is associated with a plurality of PIM instructions, which are the PIM commands for the addresses of matrices A, B, and C. The claims do not recite that the command has to be separate from the plurality of PIM instructions.

As to point (b), the examiner respectfully disagrees. As explained above, the cited references teach this limitation.

As to point (c), the examiner respectfully disagrees. Some of the limitations that Applicant argues are not taught by Hall are limitations that are not in dependent claim 7, or are limitations of claim 1 that are taught by the Lee or Kogge references. Hall teaches a determination as to how many registers are required since Hall recites on pg. 111 4 Memory Allocation Virtual vs. Physical paragraph 2 “host allocation of contiguous virtual address spaces for global and PIM local segments using the Reserve functions”, on pg. 109 2.1 Address Translation for Locally Mapped Data paragraph 1 “To condense translation information, we use segments, each of which is defined by segment registers containing a physical base address and limit”, and on pg. 111 4 Memory Allocation: Virtual vs. Physical paragraph 2 “There are three phases to allocation: (1) host allocation of contiguous virtual address spaces for global and PIM local segments using the Reserve functions; (2) physical allocation of an object and binding to reserved virtual segments; and, (3) mapping of existing global objects to a global segment for sharing between PIMs”.
Allocating address spaces to reserve segment registers indicates how many registers are required. As explained above, the cited references do teach a command that is associated with a plurality of PIM instructions.

As to point (d), the examiner respectfully disagrees. Hall teaches a command that indicates a PIM register requirement specifying a quantity of PIM registers since Hall recites on pg. 111 4 Memory Allocation Virtual vs. Physical paragraph 2 “host allocation of contiguous virtual address spaces for global and PIM local segments using the Reserve functions”, on pg. 109 2.1 Address Translation for Locally Mapped Data paragraph 1 “To condense translation information, we use segments, each of which is defined by segment registers containing a physical base address and limit”, and on pg. 111 4 Memory Allocation: Virtual vs. Physical paragraph 2 “There are three phases to allocation: (1) host allocation of contiguous virtual address spaces for global and PIM local segments using the Reserve functions; (2) physical allocation of an object and binding to reserved virtual segments; and, (3) mapping of existing global objects to a global segment for sharing between PIMs”. Allocating address spaces to reserve segment registers using a Reserve function teaches a command that indicates a PIM register requirement specifying a quantity of PIM registers.

Kogge teaches determining that the quantity of PIM registers specified by the command is less than or equal to a quantity of available PIM registers because it recites in Col. 17 line 66-Col. 18 line 1 “a threadlet, when it starts, does in fact have local register space equal to the maximum possible parcel size”, Col. 20 lines 18-19 “Threadlets may come in a variety of sizes, including at least 256, 512, and 2304 bits”, Col.
3 lines 15-18 “the term "parcel" refers to PArallel Communication ELement and is the packet of information that contains all the information needed to execute a threadlet”, and as shown in Fig. 1, the local memory is in the PIM nodes.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 5-9, 11, 13-17, 19, and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Hall et al. (Memory Management in a PIM-Based Architecture, hereinafter Hall), in view of Lee et al. (US 20220237041 A1, hereinafter Lee), and further in view of Kogge (US 7185150 B1). Hall, Lee, and Kogge were cited in a previous office action.

As per claim 1, Hall teaches a device for supporting PIM (Processing-in-Memory) execution, the device comprising: a processor (pg. 110 3 Overview of Memory Management paragraph 1 On the host processor, the standard operating system (in DIVA, Linux) is augmented with functionality to support PIMs; pg. 113 4.2 Physical Memory Allocation paragraph 3 PIM applications perform); and a work scheduler coupled to the processor, wherein the work scheduler comprises logic circuitry configured to (Fig. 1: Host CPU, Memory Controller; pg.
111 paragraph 2 the host, which has a system-level view, remains a central figure in system-level scheduling, disk I/O operations, and memory management; Abstract paragraph 2 host processor with a main memory; pg. 117 paragraph 3 the host must attend to global changes in the PIM-based computation, i.e., scheduling functions):

receive a command associated with a plurality of PIM instructions, the command issued, wherein the command indicates PIM resource requirements of the plurality of PIM instructions (Fig. 4: ReserveGlobalSegment (int numBytes, int virtualNode), ReserveLocalHeapSegment ReserveLocalCodeSegment ReserveLocalStackSegment (int numBytes, int virtualNode); pg. 113 paragraph 2 Similar functions exist to allocate the virtual address space for PIM-local code; pg. 111 4.1 Virtual Memory Allocation paragraph 1 as part of the allocation process, we must reserve a contiguous chunk of the virtual address space for each segment prior to physical allocation. The virtual memory allocation is performed by the host using the Reserve functions for global and local segments. Because the virtual address space is quite large, these reservations should always strive to overestimate the space requirements of the segment; pg. 111 4 Memory Allocation Virtual vs. Physical paragraph 2 host allocation of contiguous virtual address spaces for global and PIM local segments using the Reserve functions; pg. 106 paragraph 4 The physical memory on each PIM chip is flexibly partitioned into these three distinct uses. Dumb memory is managed exclusively by the host operating system in standard ways, with address translation handled solely by the host processor's memory-management hardware.
Figure 2 depicts the two more interesting uses of PIM memory, as part of the shared global address space, or as PIM local memory.);

and store, in a data structure, a virtual allocation of one or more of the resources within the PIM device based on the PIM resource requirements indicated by the command (Fig. 4: ReserveGlobalSegment (int numBytes, int virtualNode), ReserveLocalHeapSegment ReserveLocalCodeSegment ReserveLocalStackSegment (int numBytes, int virtualNode); pg. 113 paragraph 2 Similar functions exist to allocate the virtual address space for PIM-local code; pg. 116 6.1 Contents of Context paragraph 1 On initialization of a user program, the host performs virtual memory allocation of the segments, as discussed in Section 4.1, and writes the range of allocated virtual addresses into the context data structure in the PIM run-time kernel segment; pg. 111 4 Memory Allocation Virtual vs. Physical paragraph 2 host allocation of contiguous virtual address spaces for global and PIM local segments using the Reserve functions; pg. 111 4.1 Virtual Memory Allocation paragraph 1 as part of the allocation process, we must reserve a contiguous chunk of the virtual address space for each segment prior to physical allocation. The virtual memory allocation is performed by the host using the Reserve functions for global and local segments. Because the virtual address space is quite large, these reservations should always strive to overestimate the space requirements of the segment; pg. 113 4.2 Physical Memory Allocation paragraph 3 PIM applications perform when accessing non-local memory; 4.1 Virtual Memory Allocation paragraph 1 Linux supports this reservation process by clustering free pages in the virtual address space together; reservations select a cluster that matches the requested size; pg. 106 paragraph 4 The physical memory on each PIM chip is flexibly partitioned into these three distinct uses.
Dumb memory is managed exclusively by the host operating system in standard ways, with address translation handled solely by the host processor's memory-management hardware. Figure 2 depicts the two more interesting uses of PIM memory, as part of the shared global address space, or as PIM local memory).

Hall fails to teach a command associated with a plurality of PIM instructions, the command issued by a first thread of a plurality of threads concurrently executing on the processor, wherein the command indicates PIM resource requirements; transmit, to the first thread in dependence upon determining that sufficient resources within the PIM device are available based on the PIM resource requirements indicated by the command, a grant response indicating that access to the PIM device by the first thread is granted, wherein the first thread is configured to dispatch the plurality of PIM instructions to the PIM device after the grant response is received; and an allocation of one or more of the resources within the PIM device to the first thread.

However, Lee teaches a command associated with a plurality of PIM instructions, the command issued by a first thread of a plurality of threads concurrently executing on the processor, wherein the command indicates PIM resource requirements (Abstract The memory device including a plurality of computing cores each including a bank and a computing circuit.
The memory device is configured to perform in-memory processing in one of the plurality of computing cores according to the PIM command; [0086] a write command “mov 0×0, C[i]” issued from the thread; [0080] To this end, the memory device may be preset so that commands for addresses of matrices A, B, and C are interpreted as PIM commands; [0084] In response to a first read command “mov A[i], pim_r0” issued from a thread, the memory device reads data of the matrix A stored in a bank of a computing core corresponding to the thread and stores the read data in the register pim_r0 of a computing circuit of the computing core; [0027] a thread created in the host 100; [0031] The host 100 operates according to software including an application program 10 and an operating system 20. [0033] During operations of the software, multiple threads can be created to process a given operation; [0018] The host 100 includes a central processing unit (CPU) 110); an allocation of one or more of the resources within the PIM device to the first thread (Abstract The memory device including a plurality of computing cores each including a bank and a computing circuit. The memory device is configured to perform in-memory processing in one of the plurality of computing cores according to the PIM command. The host allocates the plurality of computing cores to the plurality of threads, and PIM commands of each thread are processed using the computing core allocated to that thread). 
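The Lee passages quoted above describe a preset sequence of individual PIM commands (read A[i] into pim_r0, read B[i] into pim_r1, write the result to C[i]) that the memory device interprets as PIM commands. A minimal, runnable sketch of that interpretation follows; the class and variable names are hypothetical illustrations, and an element-wise add is assumed for the computing circuit's operation, since neither appears verbatim in Lee.

```python
# Toy interpreter for the individual PIM commands quoted from Lee
# ([0084]-[0086]): "mov A[i], pim_r0", "mov B[i], pim_r1", "mov 0x0, C[i]".
# All names here are hypothetical, not code from any cited reference.

class PimComputingCircuit:
    """One computing core's circuit with two named PIM registers."""

    def __init__(self):
        self.regs = {"pim_r0": 0, "pim_r1": 0}

    def execute(self, command, bank):
        op, src, dst = command          # e.g. ("mov", "A[0]", "pim_r0")
        if op != "mov":
            raise ValueError("only 'mov' is modeled in this sketch")
        if dst in self.regs:
            self.regs[dst] = bank[src]  # read: bank -> PIM register
        else:
            # write: combine registers and store to the bank; the "0x0"
            # source is ignored here, mirroring Lee's "mov 0x0, C[i]"
            bank[dst] = self.regs["pim_r0"] + self.regs["pim_r1"]

# The preset sequence for one element of C = A + B
bank = {"A[0]": 3, "B[0]": 4, "C[0]": 0}
circuit = PimComputingCircuit()
for cmd in [("mov", "A[0]", "pim_r0"),
            ("mov", "B[0]", "pim_r1"),
            ("mov", "0x0", "C[0]")]:
    circuit.execute(cmd, bank)
```

In this toy model, each mov names a single action and a single register, while the preset A/B/C address ranges tie the individual commands into one sequence, which illustrates the association between commands that the parties dispute.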
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Hall with the teachings of Lee so that threads can be adapted to use in-memory processing (see Lee [0055] In this manner, in a host using a shared memory model, shared memory-based parallel program APIs such as OpenMP and Pthread can be adapted to use computing cores operating as a distributed memory; [0089] In addition, it is possible to easily reuse various program codes developed with conventional APIs for in-memory processing such as provided by the present invention.).

Hall and Lee fail to teach transmit, to the first thread in dependence upon determining that sufficient resources within the PIM device are available based on the PIM resource requirements indicated by the command, a grant response indicating that access to the PIM device by the first thread is granted, wherein the first thread is configured to dispatch the plurality of PIM instructions to the PIM device after the grant response is received.

However, Kogge teaches transmit, to the first thread in dependence upon determining that sufficient resources within the PIM device are available based on the PIM resource requirements indicated by the command, a grant response indicating that access to the PIM device by the first thread is granted (Fig. 1; Table on page 29 Row: V “Wait on Semaphore” (semaphore_adr, E[6]:return_adr, E[7]:free_space_adr); Claim 1 wherein said at least one first node and said at least one second node have at least one first memory and at least one second memory, respectively; a threadlet configured to cause a program to run in said computer system executed by said at least one first node when said at least one first memory is local to said threadlet; Claim 2 wherein said program requires access to a first memory location to run; claim 15 wherein said first node is on a PIM-enhanced memory chip; Col. 32 line 44-Col.
33 line 14 Threadlets may also support semaphore semantics in a relatively direct way…A P parcel has as an argument the address of the semaphore pair. When sent to the memory containing the semaphore, the P program will release the semaphore by accessing it and incrementing it atomically….A parcel of type V will try to grab the lock, and suspend itself until the lock is available. It has the same set of arguments as P, but is a bit more complex. It first atomically decrements the counter. If the result was non-negative, it returns to its sender (using whatever acknowledgement protocol is desired); Col. 16 lines 7-9 When a match occurs, the send threadlet will perform the memory transfer into the specified receiving buffer; Col. 3 lines 15-18 the term "parcel" refers to PArallel Communication ELement and is the packet of information that contains all the information needed to execute a threadlet; Col. 11 lines 41-57 Threadlets can help this situation in at least two ways. First, if it is crucial that operations be done in order at the destination location, and some extra storage can be allocated for ordering control, a "sequence number" can be appended to each parcel as an operand…After performing a designated memory operation, a threadlet could be programmed to return to the source of the parcel, and signal that the threadlet had successfully completed the threadlet's execution; Col. 17 line 66-Col. 18 line 1 a threadlet, when it starts, does in fact have local register space equal to the maximum possible parcel size; Col. 10 lines 39-67 The most basic of all memory operations is a simple load, access a specified operand and return the value to some other location, usually a CPU register or a cache entry. 
If the return location is in fact a memory location somewhere, the threadlet program becomes fairly simple: move to the memory that contains the data to be accessed, read the data word into the threadlet state, move to the memory representing the target, store the data from the threadlet, and quit. Such a threadlet requires two operands at the time it is launched: the address to be read and the address to receive the data…After completion the final store, a waiting thread (or threadlet), other than the active threadlet, may need to be notified that the data has arrived, and the threadlet completed its assigned task; Col. 5 lines 28-36 Such architecture, illustrated in FIG. 1, is clearly a good match for Processing-In-Memory (PIM) technology, where processing logic capable of executing such programs can be placed on a memory chip, next to a memory macro. Such an architecture is also a good match for massively parallel systems, where there are huge numbers of such PIM-enhanced memory chips, which may also include large numbers of "conventional" CPUs embedded throughout the memory; Col. 28 lines 30-32 data processing may even be done in some form of simultaneous multi-threading, such as done in PIM Lite; Col. 13 lines 21-22 a parcel in local storage (a cache line by a classical CPU or a wide word by a PIM Lite ISA)), wherein the first thread is configured to dispatch the plurality of PIM instructions to the PIM device after the grant response is received (Table 1: When execution finally moves beyond the MOVE, the threadlet is guaranteed that the location associated with the address descriptor in A is in fact in the local memory. If memory has been locked before the MOVE, the lock is released before the threadlet is moved; Col. 32 line 44-Col. 33 line 14 Threadlets may also support semaphore semantics in a relatively direct way…A P parcel has as an argument the address of the semaphore pair. 
When sent to the memory containing the semaphore, the P program will release the semaphore by accessing it and incrementing it atomically….A parcel of type V will try to grab the lock, and suspend itself until the lock is available. It has the same set of arguments as P, but is a bit more complex. It first atomically decrements the counter. If the result was non-negative, it returns to its sender (using whatever acknowledgement protocol is desired); Col. 17 lines 18-27 exporting the method invocations to the PIM node for execution there. In such cases, a one way transmission of the method name and the method arguments to the PIM node, followed by a one way return of method function value (if any), is all that is needed for the method. All the intermediate object accesses are low latency ones, and even if lock outs are needed, the duration of the lock out periods are greatly reduced. Implementation of these methods can thus be done in at least two ways with a PIGLET threadlet; Col. 15 line 62-Col. 16 line 9 the latter indicates where in a thread's local memory such data should be put if data is received…When a match occurs, the send threadlet will perform the memory transfer into the specified receiving buffer.).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Hall and Lee with the teachings of Kogge since semaphores prevent deadlocks.

As per claim 3, Hall, Lee, and Kogge teach the device of claim 1.
Lee teaches wherein the logic circuitry is further configured to: receive a second command associated with a second plurality of PIM instructions, the second command issued by a second thread (Abstract The host includes a central processing unit configured to process processing in-memory (PIM) requests generated in a plurality of threads for in-memory processing; [0086] a write command “mov 0×0, C[i]” issued from the thread; [0085] a second read command “mov B[i], pim_r1” issued from the thread; [0067] 32 threads will be created in parallel).

Additionally, Kogge teaches queue, based on insufficient available resources of the PIM device, the second command until sufficient resources of the PIM device become available (Table on page 29 Row: V “Wait on Semaphore” (semaphore_adr, E[6]:return_adr, E[7]:free_space_adr); Col. 33 lines 8-22 A parcel of type V will try to grab the lock, and suspend itself until the lock is available. It has the same set of arguments as P, but is a bit more complex. It first atomically decrements the counter. If the result was non-negative, it returns to its sender (using whatever acknowledgement protocol is desired). If the result was negative, it atomically requests a block of storage from the free space pointer on the current node, and copies its own state into that node. It then checks the semaphore again, and if the semaphore is now free, it releases the storage and returns to sender. If it is still blocked, it atomically links the stored version onto the semaphore's queue, and quits. When this stored version awakens, by P, it will have been dequeued; Col. 10 lines 5-7 some hardware implementations will suspend the memory requests that find the word empty, and queue them up with the head of the queue; claim 15 wherein said first node is on a PIM-enhanced memory chip).

As per claim 5, it is a method claim of claim 1, so it is rejected for similar reasons.

As per claim 6, Hall, Lee, and Kogge teach the method of claim 5.
Hall teaches further comprising: receiving, by the work scheduler, a second command and freeing, by the work scheduler in response to receiving the second command, the resources of the PIM device that are virtually allocated by updating the data structure (Fig. 6; Fig. 4: GlobalFree free (void *existingObject); pg. 111 4 Memory Allocation: Virtual vs. Physical paragraphs 2-3 Deallocation (GlobalFree) frees physical memory but does not shrink the virtual-space allocation. The standard memory allocation functions malloc and free can be used on either the host or PIMs; pg. 116 6.1 Contents of Context paragraph 1 On initialization of a user program, the host performs virtual memory allocation of the segments, as discussed in Section 4.1, and writes the range of allocated virtual addresses into the context data structure in the PIM run-time kernel segment… When this context is restored, the host updates the segment mappings as needed; pg. 115 6 Contexts and Swapping paragraph 3 The host operating system is responsible for creating contexts for the PIMs, and also for updating contexts in response to major system context switches; pg. 116 6.2 Swapping paragraph 1 Swapping is somewhat similar to paging; the primary distinctions are that the entire context is moved to the disk backing store, freeing all the process memory; pg. 111 4 Memory Allocation Virtual vs. Physical paragraph 2 host allocation of contiguous virtual address spaces for global and PIM local segments using the Reserve functions; pg. 113 paragraph 2 Similar functions exist to allocate the virtual address space for PIM-local code; pg. 117 paragraph 3 the host must attend to global changes in the PIM-based computation, i.e., scheduling functions). 
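The claim 6 mapping above turns on a work scheduler freeing a previously recorded virtual allocation when a second command arrives. The bookkeeping can be sketched as follows; the function name and dictionary fields are hypothetical stand-ins for the claimed data structure, not code from Hall or the other cited references.

```python
# Hypothetical sketch of the claim 6 flow: a work scheduler records each
# thread's virtual allocation in a data structure and, on a second
# command from that thread, frees the recorded resources by updating the
# data structure. Names and fields are illustrative only.

def free_allocation(allocations, thread_id):
    """Drop a thread's recorded allocation; return how many registers it freed."""
    entry = allocations.pop(thread_id, None)
    return 0 if entry is None else entry["num_registers"]

# data structure holding one thread's virtual allocation
allocations = {"t0": {"num_registers": 4, "scratchpad_bytes": 256}}

# a second command from t0, indicating the offload completed,
# triggers the free; the data structure then records no allocation
freed = free_allocation(allocations, "t0")
```

The point of contention is not this mechanics but whether the references disclose tying the free to a second command issued by the same first thread, as the claim recites.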
Additionally, Lee teaches receiving, by the work scheduler, a second command issued by the first thread (Abstract The host includes a central processing unit configured to process processing in-memory (PIM) requests generated in a plurality of threads for in-memory processing and a memory controller configured to generate a PIM command corresponding to the PIM request. The memory device including a plurality of computing cores each including a bank and a computing circuit. The memory device is configured to perform in-memory processing in one of the plurality of computing cores according to the PIM command. The host allocates the plurality of computing cores to the plurality of threads, and PIM commands of each thread are processed using the computing core allocated to that thread.). Additionally, Kogge teaches receiving, by the work scheduler, a second command issued by the first thread indicating that an offload of the plurality of PIM instructions has completed (Fig. 1; Col. 5 lines 28-36 Such architecture, illustrated in FIG. 1, is clearly a good match for Processing-In-Memory (PIM) technology, where processing logic capable of executing such programs can be placed on a memory chip, next to a memory macro. Such an architecture is also a good match for massively parallel systems, where there are huge numbers of such PIM-enhanced memory chips, which may also include large numbers of "conventional" CPUs embedded throughout the memory; Claim 1 wherein said at least one first node and said at least one second node have at least one first memory and at least one second memory, respectively; a threadlet configured to cause a program to run in said computer system executed by said at least one first node when said at least one first memory is local to said threadlet; Claim 2 wherein said program requires access to a first memory location to run; claim 15 wherein said first node is on a PIM-enhanced memory chip; Col. 
11 lines 53-57 After performing a designated memory operation, a threadlet could be programmed to return to the source of the parcel, and signal that the threadlet had successfully completed the threadlet's execution; Col. 4 lines 38-40 a request for a memory operation to be sent from a source processing logic to some destination memory; Col. 22 lines 6-8 A new instruction is executed each time a threadlet is either awoken from a parcel or when it has completed a prior instruction; Col. 17 lines 18-28 exporting the method invocations to the PIM node for execution there. In such cases, a one way transmission of the method name and the method arguments to the PIM node, followed by a one way return of method function value…The threadlet code itself could be the entire method code);

freeing the resources of the PIM device that are virtually allocated to the first thread (Table 1 The execution of this threadlet is halted, and all resources associated with it are freed; Table 1 thus the resources currently associated with the current threadlet can be released, perhaps for the new thread; Col. 18 lines 27-30 The physical storage is divided into contiguous (but not necessarily same size) pages, each of which may be mapped into one (or more) virtual address spaces; Col. 33 lines 36-42 The threadlets can work directly with embedded addresses, and "where" those addresses actually lie. A variety of mechanisms for responding to threadlets after an atomic memory operation are possible by separate programming. The processing of threadlets may be done with relatively simple logic at the memory interface (such as is made available by PIM technology).).

As per claim 7, Hall, Lee, and Kogge teach the method of claim 5. Hall teaches wherein storing the virtual allocation of the resources of the PIM device includes: reserving an allocation of PIM device registers based on the PIM resource requirements, wherein the PIM resource requirements specify a quantity of registers (Figs.
4 and 6; Fig. 4: ReserveGlobalSegment (int numBytes, int virtualNode), ReserveLocalHeapSegment ReserveLocalCodeSegment ReserveLocalStackSegment (int numBytes, int virtualNode); pg. 116 6.1 Contents of Context paragraph 1 On initialization of a user program, the host performs virtual memory allocation of the segments, as discussed in Section 4.1, and writes the range of allocated virtual addresses into the context data structure in the PIM run-time kernel segment; pg. 111 4 Memory Allocation Virtual vs. Physical paragraph 2 host allocation of contiguous virtual address spaces for global and PIM local segments using the Reserve functions; pg. 109 2.1 Address Translation for Locally Mapped Data paragraph 1 To condense translation information, we use segments, each of which is defined by segment registers containing a physical base address and limit; pg. 111 4 Memory Allocation: Virtual vs. Physical paragraph 2 There are three phases to allocation: (1) host allocation of contiguous virtual address spaces for global and PIM local segments using the Reserve functions; (2) physical allocation of an object and binding to reserved virtual segments; and, (3) mapping of existing global objects to a global segment for sharing between PIMs; pg. 106 paragraph 4 The physical memory on each PIM chip is flexibly partitioned into these three distinct uses. Dumb memory is managed exclusively by the host operating system in standard ways, with address translation handled solely by the host processor's memory-management hardware. Figure 2 depicts the two more interesting uses of PIM memory, as part of the shared global address space, or as PIM local memory; pg. 109 paragraph 1 The local memory region is partitioned into eight segments at fixed virtual bases, for kernel code, stack and data, user code; pg.
115 6 Contexts and Swapping paragraph 2 program-specific segment registers; 4.1 Virtual Memory Allocation paragraph 1 Linux supports this reservation process by clustering free pages in the virtual address space together; reservations select a cluster that matches the requested size; pg. 115 6 Contexts and Swapping paragraph 1 Switching between different threads in the same user program, such as when performing the command associated with an incoming parcel, requires modification to only two of the segment registers, but does require saving and restoring of portions of the register state; pg. 115 6 Contexts and Swapping paragraph 3 creating contexts for the PIMs, and also for updating contexts in response to major system context switches).

As per claim 8, Hall, Lee, and Kogge teach the method of claim 5. Hall teaches wherein storing the virtual allocation of the resources of the PIM device includes: reserving a portion of a command buffer in the PIM device based on the PIM resource requirements (Fig. 4; pg. 116 6.1 Contents of Context paragraph 1 On initialization of a user program, the host performs virtual memory allocation of the segments, as discussed in Section 4.1, and writes the range of allocated virtual addresses into the context data structure in the PIM run-time kernel segment; pg. 113 paragraph 2 Similar functions exist to allocate the virtual address space for PIM-local code; pg. 111 4 Memory Allocation: Virtual vs. Physical paragraph 2 host allocation of contiguous virtual address spaces for global and PIM local segments using the Reserve functions; pg. 111 4.1 Virtual Memory Allocation paragraph 1 as part of the allocation process, we must reserve a contiguous chunk of the virtual address space for each segment prior to physical allocation. The virtual memory allocation is performed by the host using the Reserve functions for global and local segments.
Because the virtual address space is quite large, these reservations should always strive to overestimate the space requirements of the segment; pg. 111 4.1 Virtual Memory Allocation paragraph 1 Linux supports this reservation process by clustering free pages in the virtual address space together; reservations select a cluster that matches the requested size;).

As per claim 9, Hall, Lee, and Kogge teach the method of claim 5. Hall teaches wherein storing the virtual allocation of the resources of the PIM device includes: reserving a scratchpad allocation based on the PIM resource requirements (Fig. 4; pg. 116 6.1 Contents of Context paragraph 1 On initialization of a user program, the host performs virtual memory allocation of the segments, as discussed in Section 4.1, and writes the range of allocated virtual addresses into the context data structure in the PIM run-time kernel segment; pg. 113 paragraph 2 Similar functions exist to allocate the virtual address space for PIM-local code; pg. 111 4 Memory Allocation: Virtual vs. Physical paragraph 2 host allocation of contiguous virtual address spaces for global and PIM local segments using the Reserve functions; pg. 111 4.1 Virtual Memory Allocation paragraph 1 as part of the allocation process, we must reserve a contiguous chunk of the virtual address space for each segment prior to physical allocation. The virtual memory allocation is performed by the host using the Reserve functions for global and local segments. Because the virtual address space is quite large, these reservations should always strive to overestimate the space requirements of the segment; pg. 114 4.3 Mapping Existing Objects to PIM Global Segments paragraph 1 Like the GlobalMalloc to a remote node, it is sometimes desirable to temporarily map non-resident global data to facilitate sharing among PIMs.).

As per claim 11, it is a method claim of claim 3, so it is rejected for the same reasons.
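Hall's three-phase allocation quoted above (reserve a contiguous virtual address range, possibly overestimated, then bind physical memory to the reserved segment) can be sketched as follows. This is an illustrative model only: the names `VirtualSpace`, `reserve_segment`, and `bind_physical`, the bump-pointer reservation policy, and the 4 KiB page size are assumptions for illustration, not taken from the DIVA reference.

```python
# Illustrative sketch of a reserve-then-bind allocation scheme.
# All names and policies here are assumptions, not from the reference.

PAGE = 4096  # assumed page granularity

class VirtualSpace:
    """Tracks contiguous reservations in a large virtual address space."""

    def __init__(self, size):
        self.size = size
        self.next_free = 0   # simplistic bump-pointer reservation
        self.bindings = {}   # virtual base -> physical base

    def reserve_segment(self, num_bytes):
        """Phase 1: reserve contiguous virtual addresses (overestimating is safe)."""
        num_bytes = -(-num_bytes // PAGE) * PAGE  # round up to a page multiple
        if self.next_free + num_bytes > self.size:
            raise MemoryError("virtual address space exhausted")
        base = self.next_free
        self.next_free += num_bytes
        return base

    def bind_physical(self, virtual_base, physical_base):
        """Phase 2: bind a physical allocation to a previously reserved segment."""
        self.bindings[virtual_base] = physical_base

space = VirtualSpace(size=1 << 40)
seg = space.reserve_segment(10_000)             # rounded up to 3 pages (12288 bytes)
space.bind_physical(seg, physical_base=0x8000)  # physical allocation happens later
```

Rounding the reservation up mirrors the reference's advice that reservations "should always strive to overestimate the space requirements of the segment."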
As per claim 13, Hall, Lee, and Kogge teach the method of claim 5. Lee teaches wherein the first thread dispatches the plurality of PIM instructions to a set of memory channels concurrently with at least one second thread dispatching other PIM instructions to that set of memory channels ([0045] a PIM command provided by the 0th thread can be associated with 0th channel and 0th bank according to the address; claim 3 wherein each of the plurality of threads is allocated a computing core among the plurality of computing cores according to a bank address and a channel address and generates a PIM request for a computing core allocated thereto; [0053] At times t0 and t2, a plurality of computing cores perform in-memory processing in parallel under the respective control of a plurality of corresponding threads.).

As per claim 14, Hall, Lee, and Kogge teach the method of claim 5. Lee teaches wherein the first thread dispatches the plurality of PIM instructions to a first partition of memory channels concurrently with at least one second thread dispatching other PIM instructions to a second partition of memory channels ([0045] a PIM command provided by the 0th thread can be associated with 0th channel and 0th bank according to the address; claim 3 wherein each of the plurality of threads is allocated a computing core among the plurality of computing cores according to a bank address and a channel address and generates a PIM request for a computing core allocated thereto; [0053] At times t0 and t2, a plurality of computing cores perform in-memory processing in parallel under the respective control of a plurality of corresponding threads.).

As per claim 15, Hall, Lee, and Kogge teach the method of claim 5. Hall teaches wherein storing the virtual allocation of the resources of the PIM device includes: mapping an index of an architectural register to an index of a physical register of the PIM device (pg.
116 6.1 Contents of Context paragraph 1 On initialization of a user program, the host performs virtual memory allocation of the segments, as discussed in Section 4.1, and writes the range of allocated virtual addresses into the context data structure in the PIM run-time kernel segment… As a result of physical memory allocations, the physical segment mappings are added to this structure; pg. 120 paragraph 2 The GlobalMalloc functions perform two distinct roles: allocating a block of physical memory and mapping a portion of the (previously reserved) global virtual address space to that physical memory; pg. 119 7.1 Large Global Segments paragraph 2 Since each PIM has multiple global segment registers, even a single global segment can be managed as multiple segments by having distinct segment registers mapping different portions of the segment.).

As per claim 16, Hall, Lee, and Kogge teach the method of claim 5. Hall teaches wherein the PIM device is included in a memory device (Abstract paragraph 2 a main memory composed of VLSI PIM chips).

As per claim 17, it is a system claim of claim 1, so it is rejected for similar reasons. Additionally, Hall teaches a system for supporting PIM (Processing-in-Memory) execution, the system comprising: a memory device, the memory device comprising a PIM device for executing PIM instructions; and a host device coupled to the memory device, the host device comprising a processor and a work scheduler (pg. 110 3 Overview of Memory Management paragraph 1 On the host processor, the standard operating system (in DIVA, Linux) is augmented with functionality to support PIMs; Abstract paragraph 2 host processor with a main memory composed of VLSI PIM chips; pg. 111 paragraph 2 the host, which has a system-level view, remains a central figure in system-level scheduling; pg. 113 4.2 Physical Memory Allocation paragraph 3 PIM applications perform when accessing non-local memory).
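The claim 15 limitation discussed above, mapping an index of an architectural register to an index of a physical register of the PIM device, resembles a simple register rename table. A minimal sketch, where the function names, the free-list policy, and the register count of 8 are all assumptions for illustration rather than anything taught by the cited references:

```python
# Hypothetical rename-table sketch for an architectural-to-physical
# PIM register mapping. Names and policy are illustrative assumptions.

free_physical = list(range(8))  # physical PIM registers 0..7 available
rename_table = {}               # architectural index -> physical index

def map_register(arch_index):
    """Allocate (or look up) the physical register backing an architectural one."""
    if arch_index in rename_table:
        return rename_table[arch_index]
    if not free_physical:
        raise RuntimeError("no free physical PIM registers")
    phys = free_physical.pop(0)
    rename_table[arch_index] = phys
    return phys

def free_register(arch_index):
    """Release the mapping when the thread's allocation is freed."""
    free_physical.append(rename_table.pop(arch_index))

assert map_register(3) == 0  # first request gets physical register 0
assert map_register(5) == 1
assert map_register(3) == 0  # repeated lookup of the same index is stable
```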
As per claim 19, it is a system claim of claim 3, so it is rejected for the same reasons.

As per claim 21, Hall, Lee, and Kogge teach the device of claim 1. Hall teaches wherein the PIM resource requirements indicated by the command include a PIM register requirement specifying a quantity of PIM registers (Fig. 4: ReserveGlobalSegment (int numBytes, int virtualNode), ReserveLocalHeapSegment ReserveLocalCodeSegment ReserveLocalStackSegment (int numBytes, int virtualNode); pg. 116 6.1 Contents of Context paragraph 1 On initialization of a user program, the host performs virtual memory allocation of the segments, as discussed in Section 4.1, and writes the range of allocated virtual addresses into the context data structure in the PIM run-time kernel segment; pg. 111 4 Memory Allocation: Virtual vs. Physical paragraph 2 host allocation of contiguous virtual address spaces for global and PIM local segments using the Reserve functions; pg. 109 2.1 Address Translation for Locally Mapped Data paragraph 1 To condense translation information, we use segments, each of which is defined by segment registers containing a physical base address and limit; pg. 111 4 Memory Allocation: Virtual vs. Physical paragraph 2 There are three phases to allocation: (1) host allocation of contiguous virtual address spaces for global and PIM local segments using the Reserve functions; (2) physical allocation of an object and binding to reserved virtual segments; and, (3) mapping of existing global objects to a global segment for sharing between PIMs; pg. 106 paragraph 4 The physical memory on each PIM chip is flexibly partitioned into these three distinct uses. Dumb memory is managed exclusively by the host operating system in standard ways, with address translation handled solely by the host processor's memory-management hardware. Figure 2 depicts the two more interesting uses of PIM memory, as part of the shared global address space, or as PIM local memory).
Additionally, Kogge teaches wherein transmitting, to the first thread in dependence upon determining that sufficient resources within the PIM device are available based on the PIM resource requirements, a grant response indicating that access to the PIM device by the first thread is granted includes: determining that the quantity of PIM registers specified by the command is less than or equal to a quantity of available PIM registers (Fig. 1; Table on page 29 Row: V “Wait on Semaphore” (semaphore_adr, E[6]:return_adr, E[7]:free_space_adr); Claim 1 wherein said at least one first node and said at least one second node have at least one first memory and at least one second memory, respectively; a threadlet configured to cause a program to run in said computer system executed by said at least one first node when said at least one first memory is local to said threadlet; Claim 2 wherein said program requires access to a first memory location to run; claim 15 wherein said first node is on a PIM-enhanced memory chip; Col. 32 line 44-Col. 33 line 14 Threadlets may also support semaphore semantics in a relatively direct way…A P parcel has as an argument the address of the semaphore pair. When sent to the memory containing the semaphore, the P program will release the semaphore by accessing it and incrementing it atomically….A parcel of type V will try to grab the lock, and suspend itself until the lock is available. It has the same set of arguments as P, but is a bit more complex. It first atomically decrements the counter. If the result was non-negative, it returns to its sender (using whatever acknowledgement protocol is desired); Col. 16 lines 7-9 When a match occurs, the send threadlet will perform the memory transfer into the specified receiving buffer; Col. 3 lines 15-18 the term "parcel" refers to PArallel Communication ELement and is the packet of information that contains all the information needed to execute a threadlet; Col. 
11 lines 41-57 Threadlets can help this situation in at least two ways. First, if it is crucial that operations be done in order at the destination location , and some extra storage can be allocated for ordering control, a "sequence number" can be appended to each parcel as an operand…After performing a designated memory operation, a threadlet could be programmed to return to the source of the parcel, and signal that the threadlet had successfully completed the threadlet's execution; Col. 17 line 66-Col. 18 line 1 a threadlet, when it starts, does in fact have local register space equal to the maximum possible parcel size; Col. 10 lines 39-67 The most basic of all memory operations is a simple load, access a specified operand and return the value to some other location, usually a CPU register or a cache entry. If the return location is in fact a memory location somewhere, the threadlet program becomes fairly simple: move to the memory that contains the data to be accessed, read the data word into the threadlet state, move to the memory representing the target, store the data from the threadlet, and quit. Such a threadlet requires two operands at the time it is launched: the address to be read and the address to receive the data…After completion the final store, a waiting thread (or threadlet), other than the active threadlet, may need to be notified that the data has arrived, and the threadlet completed its assigned task; Col. 5 lines 28-36 Such architecture, illustrated in FIG. 1, is clearly a good match for Processing-In-Memory (PIM) technology, where processing logic capable of executing such programs can be placed on a memory chip, next to a memory macro. Such an architecture is also a good match for massively parallel systems, where there are huge numbers of such PIM-enhanced memory chips, which may also include large numbers of "conventional" CPUs embedded throughout the memory; Col. 
28 lines 30-32 data processing may even be done in some form of simultaneous multi-threading, such as done in PIM Lite; Col. 17 line 66-Col. 18 line 1 a threadlet, when it starts, does in fact have local register space equal to the maximum possible parcel size; Table 2 size of the threadlet to 1, 2, or 9 wide words; Col. 21 lines 53-54 Wide Word Registers (denoted W[0] through W[7])).

As per claims 22 and 23, they are method and system claims of claim 21, so they are rejected for similar reasons.

Claims 24-26 are rejected under 35 U.S.C. 103 as being unpatentable over Hall, Lee, and Kogge as applied to claims 1, 5, and 17 above, in view of Dobelstein et al. (US 20200051599 A1 hereinafter Dobelstein).

As per claim 24, Hall, Lee, and Kogge teach the device of claim 1. Hall teaches wherein store, in the data structure, the virtual allocation (pg. 116 6.1 Contents of Context paragraph 1 On initialization of a user program, the host performs virtual memory allocation of the segments, as discussed in Section 4.1, and writes the range of allocated virtual addresses into the context data structure in the PIM run-time kernel segment). Additionally, Kogge teaches wherein store comprises: associate an identification of the first thread with one or more numbers indicating a quantity of the one or more of the resources within the PIM device (Fig. 1; Col. 28 lines 47-48 For each threadlet is a brief description of what it does, its size (in wide words); Col. 13 lines 21-22 a parcel in local storage (a cache line by a classical CPU or a wide word by a PIM Lite ISA); Col. 3 lines 15-18 the term "parcel" refers to PArallel Communication ELement and is the packet of information that contains all the information needed to execute a threadlet;). Hall, Lee, and Kogge fail to teach decrement a total usable resources number based on the quantity.
However, Dobelstein teaches decrement a total usable resources number based on the quantity (claim 31 a counter having a value corresponding to a threshold number of memory regions among the plurality of memory regions allowed to perform the respective PIM operation, wherein the counter is configured to decrement based on a memory region among the plurality of memory regions performing the PIM operation; [0085] the plurality of banks can be in a processor in memory (PIM) device; [0050] a counter associated with the threshold number of banks that can perform a concurrent operation is decremented for each bank that is performing an operation). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Hall, Lee, and Kogge with the teachings of Dobelstein to prevent resources from being accessed when a sufficient amount is not available (see Dobelstein [0087] The method can further include denying the operation associated with the particular subarray among the plurality of subarrays based on the threshold number associated with the counter being exceeded).

As per claims 25 and 26, they are method and system claims of claim 24, so they are rejected for similar reasons.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HSING CHUN LIN whose telephone number is (571)272-8522. The examiner can normally be reached on Mon - Fri 9AM-5PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/H.L./
Examiner, Art Unit 2195

/Aimee Li/
Supervisory Patent Examiner, Art Unit 2195
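Taken together, the claim language at issue describes a command stating PIM resource requirements, a grant response issued only when sufficient resources within the PIM device are available, and, per the Dobelstein combination, a total-usable-resources counter that is decremented on grant. A minimal sketch of that flow, in which the class and method names are assumptions for illustration and not drawn from any cited reference:

```python
# Sketch of the claimed grant-and-decrement flow: a thread's command
# states how many PIM registers it needs; access is granted only if
# that many are available, and the usable-resources counter is
# decremented. Names are illustrative; this models the claim language,
# not any cited reference's implementation.

class PimScheduler:
    def __init__(self, total_registers):
        self.available = total_registers  # total usable resources counter
        self.holders = {}                 # thread id -> registers granted

    def request(self, thread_id, registers_needed):
        """Grant access only if the stated requirement fits; decrement on grant."""
        if registers_needed <= self.available:
            self.available -= registers_needed
            self.holders[thread_id] = registers_needed
            return True   # grant response
        return False      # deny (cf. the denial rationale cited from Dobelstein [0087])

    def release(self, thread_id):
        """Free the thread's virtual allocation when its PIM work completes."""
        self.available += self.holders.pop(thread_id)

sched = PimScheduler(total_registers=16)
assert sched.request("t0", 10) is True   # 10 <= 16, granted
assert sched.request("t1", 10) is False  # only 6 remain, denied
sched.release("t0")                      # freeing restores the counter
assert sched.request("t1", 10) is True
```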

Prosecution Timeline

Sep 14, 2021
Application Filed
Oct 07, 2022
Non-Final Rejection — §103
Jan 17, 2023
Response Filed
Apr 21, 2023
Final Rejection — §103
Jun 22, 2023
Examiner Interview Summary
Aug 24, 2023
Request for Continued Examination
Aug 27, 2023
Response after Non-Final Action
Jan 17, 2024
Non-Final Rejection — §103
Feb 20, 2024
Examiner Interview Summary
Apr 10, 2024
Response Filed
Jul 12, 2024
Final Rejection — §103
Sep 13, 2024
Examiner Interview Summary
Oct 17, 2024
Request for Continued Examination
Oct 23, 2024
Response after Non-Final Action
May 14, 2025
Non-Final Rejection — §103
Jul 31, 2025
Applicant Interview (Telephonic)
Aug 04, 2025
Examiner Interview Summary
Sep 16, 2025
Response Filed
Jan 07, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12554523
REDUCING DEPLOYMENT TIME FOR CONTAINER CLONES IN COMPUTING ENVIRONMENTS
2y 5m to grant (Granted Feb 17, 2026)
Patent 12547458
PLATFORM FRAMEWORK ORCHESTRATION AND DISCOVERY
2y 5m to grant (Granted Feb 10, 2026)
Patent 12468573
ADAPTIVE RESOURCE PROVISIONING FOR A MULTI-TENANT DISTRIBUTED EVENT DATA STORE
2y 5m to grant (Granted Nov 11, 2025)
Patent 12461785
GRAPHIC-BLOCKCHAIN-ORIENTATED SHARDING STORAGE APPARATUS AND METHOD THEREOF
2y 5m to grant (Granted Nov 04, 2025)
Patent 12443425
ISOLATED ACCELERATOR MANAGEMENT INTERMEDIARIES FOR VIRTUALIZATION HOSTS
2y 5m to grant (Granted Oct 14, 2025)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

7-8
Expected OA Rounds
59%
Grant Probability
99%
With Interview (+79.8%)
3y 4m
Median Time to Grant
High
PTA Risk
Based on 108 resolved cases by this examiner. Grant probability derived from career allow rate.
