Last updated: May 29, 2026
Application No. 18/689,861
MEMORY MANAGEMENT METHOD FOR PSEUDO-FUNCTIONAL DIFFERENTIABLE PROGRAMMING

Non-Final OA §103§112
Filed
Mar 06, 2024
Priority
Sep 10, 2021 — provisional 63/242,963 +1 more
Examiner
TSAI, SHENG JEN
Art Unit
2139
Tech Center
2100 — Computer Architecture & Software
Assignee
Purdue Research Foundation
OA Round
3 (Non-Final)
Interview Optional

— +13.3% interview lift. Interview lift (+13.3%) is below the 15.0% threshold. A written response is recommended.
Based on 792 resolved cases, 2023–2026
Examiner Intelligence

TSAI, SHENG JEN View full profile →
Grants 70% — above average
Career Allowance Rate
558 granted / 792 resolved
+15.5% vs TC avg
Moderate +13% lift
Without
With
+13.3%
Interview Lift
resolved cases with interview
Typical timeline
3y 4m
Avg Prosecution
18 currently pending
Career history
817
Total Applications
across all art units
Statute-Specific Performance

§101
1.0%
-39.0% vs TC avg
§103
80.5%
+40.5% vs TC avg
§102
11.4%
-28.6% vs TC avg
§112
3.3%
-36.7% vs TC avg
Black line = Tech Center average estimate • Based on career data from 792 resolved cases
Office Action

§103 §112
DETAILED ACTION
1. 	This Office Action is taken in response to Applicants’ Amendments and Remarks filed on 2/18/2026 regarding application 18/689,861 filed on 3/6/2024.  
 	Claims 1, 6, 8-17, 19-21, and 23-27 are pending for consideration.

2.				Response to Amendments and Remarks 
	Applicants’ amendments and remarks have been fully and carefully considered, with the Examiner’s response set forth below.
(1) In view of the amendments and remarks, objections of claims 2-5 have been withdrawn.
(2) The Examiner acknowledges Applicant’s explanation that the limitation “streaming tensor defined as a block of immutable data” finds its support in the Specification of the current Application, which recites “data is not mutated.”
However, other issues remain, and 112(a) and 112(b) rejections are in order. Refer to the updated section of 112 rejections in this Office Action for details.
	(3) In response to the amendments and remarks, an updated claim analysis with a newly identified reference has been made. Refer to the corresponding sections of the following Office Action for details.

3.					Examiner’s Note
(1) In the case of amending the Claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention. This will assist in expediting compact prosecution.  MPEP 714.02 recites: “Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06. An amendment which does not comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) may be held not fully responsive. See MPEP § 714.”  Amendments not pointing to specific support in the disclosure may be deemed as not complying with provisions of 37 C.F.R.  1.131(b), (c), (d), and (h) and therefore held not fully responsive.  Generic statements such as “Applicants believe no new matter has been introduced” may be deemed insufficient.
(2) Examiner has cited particular columns/paragraph and line numbers in the references applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested from the applicant in preparing responses, to fully consider the references in entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

4.	Claims 1, 6, 8-17, 19-21, and 23-27 are rejected under 35 U.S.C. 112(a) as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. 
Currently amended claim 1 recites “wherein the at least one instruction operates on a streaming tensor. streaming tensor defined as a block of immutable data wherein residence established by pointers to said block of immutable data is decoupled from manipulation of said block of immutable data.”
However, the Examiner was not able to identify and locate any passage in the Specification of the current application which provides written description regarding “residence established by pointers to said block of immutable data is decoupled from manipulation of said block of immutable data.” 
It is noticed that paragraph [0056] of the Specification of the current application does recite “Scorch supports tensors of a variety of types: byte, char, short, int, long, float, and double that can reside on either the CPU or GPU. In the design of TORCH and PYTORCH, the residence of a tensor is coupled to where it is manipulated. CPU tensors both reside on the CPU and are manipulated on the CPU. GPU tensors both reside on the GPU and are manipulated on the GPU. Here a new class of tensors is introduced that is called streaming tensors. Streaming tensors include each of the above types of tensors. Streaming tensors decouple the residence from manipulation of a tensor. They are always manipulated on the GPU, but they can reside either on the CPU, GPU, or both. When GPU RAM is scarce, streaming tensors can be migrated from the GPU to the CPU. And when they need to be manipulated, they are copied to the GPU if they are not currently resident on the GPU.”
Thus, it is understood that the residence of streaming tensors is decoupled from its manipulation. However, there is no written descriptions in the Specification that provide support of “residence established by pointers to said block.” In fact, the Specification does not even use the term “pointers” at all, let alone “using pointers to establish the residence of the block.”
As such, the cited limitation lacks the support of written descriptions by the Specification of the current application, as required by 35 U.S.C. 112(a).
Clarifications/corrections are needed.
Claims 6, 8-17, 19-21, and 23-27 are rejected by virtue of their dependency from claim 1.

The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

5.	Claims 1, 6, 8-17, 19-21, and 23-27 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, regards as the invention.
Currently amended claim 1 recites “wherein the at least one instruction operates on a streaming tensor. streaming tensor defined as a block of immutable data wherein residence established by pointers to said block of immutable data is decoupled from manipulation of said block of immutable data.”
Claim 1 specifically recites that “streaming tensor defined as a block of immutable data,” and paragraph [0065] of the Specification of the current Application recites “Since Scorch is a functional language (i.e., data is not mutated), any data copied to the GPU will not be changed and need not be copied back …,” and paragraph [0079] of the Specification of the current Application recites “The tensor-streaming mechanism according to the present disclosure has three benefits. Since Scorch is a functional language, no mutation can occur, which means that, once used, the tensors copied to the GPU can be evicted and do not need to be copied back to the CPU, so long as the original copy is kept on the CPU. This saves half the communication time incurred by other duplex streaming systems …”
Thus, it is understood that the limitation “streaming tensor defined as a block of immutable data” is attributed to the fact that “data is not mutated.” However, it is not clear if a streaming tensor, by definition, is a type of “read-only data,” just like a read-only memory (ROM), which does not allow its contents be modified/changed/nutated in any circumstance. Or the immutability is not the intrinsic characteristics of the streaming tensor itself, but only because the functional language Scorch would never change the streaming tensor. If the streaming tensors by themselves can never be modified/changed/mutated, then it would mean that neither the CPU nor the GPU may modify/change/mutate the data, which means the CPU and the GPU can only read or copy the streaming tensors, and cannot perform any operations that result in the modification of the streaming tensors, which imposes a heavy restriction on the usefulness of the streaming tensors.
On the other hand, if the streaming tensors by themselves are not read-only, and the immutability is contributed by the functional language Scorch itself, then the limitation “streaming tensor defined as a block of immutable data” is misleading, to say the least, because it unnecessarily deprives the usefulness of the streaming tensors to be able to change their contents.
Clarifications/corrections are needed.
Claims 6, 8-17, 19-21, and 23-27 are rejected by virtue of their dependency from claim 1.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


6.	Claim 1, 6-10, 12-17, 19-21, and 23 are rejected under 35 U.S.C. 103 as being anticipated by Patrick et al. (US Patent Application Publication 2024/0370302, hereinafter Patrick), and in view of Becchi et al. (US Patent Application Publication 2011/0173155, hereinafter Becchi). 
	As to claim 1, Becchi teaches A computer-implemented method of operating on a program, comprising: 
executing at least one instruction towards method of operating a functional program [as shown in figure 7, where program instructions (724) are stored in a main memory (704), then loaded into and executed by a processor (702); Some portions of this description describe the embodiments of the disclosure in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times, to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules can be embodied in software, firmware, hardware, or any combinations thereof … (¶ 0152-0154);
Becchi also teaches this limitation -- Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. A computer-usable or computer readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device … A data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution … (¶ 0023-0024)], wherein the at least one instruction operates on a streaming tensor [Disclosed are configurations that include a deterministic streaming system with one or more deterministic streaming processors (e.g., tensor streaming processors (TSPs) or artificial intelligence processors) … The disclosed embodiments are directed to one or more deterministic streaming processors each having a functional slicing architecture. In some embodiments, each deterministic streaming processor comprises a tensor streaming processor (TSP) having a functional slicing architecture, which can be used for hardware-accelerated machine learning (ML) applications … The computational elements of the deterministic streaming processor can be divided between different functionalities (e.g., memory, arithmetic operation, etc.), and can be organized as functional slices which operate on multi-dimensional data (e.g., tensors) … (¶ 0037-0039)], streaming tensor defined as a block of immutable data wherein residence established by pointers to said block of immutable data is decoupled from manipulation of said block of immutable data [FIG. 1A illustrates an arrangement of functional slices in a tensor streaming processor (TSP), in accordance with some embodiments (¶ 0023); FIG. 2 depicts stream registers of a TSP that are numbered to show their locations between functional slices within a superlane, in accordance with some embodiments (¶ 0026)], the streaming tensors operated on a processor of a second class [Disclosed are configurations that include a deterministic streaming system with one or more deterministic streaming processors (e.g., tensor streaming processors (TSPs) or artificial intelligence processors) … The disclosed embodiments are directed to one or more deterministic streaming processors each having a functional slicing architecture. In some embodiments, each deterministic streaming processor comprises a tensor streaming processor (TSP) having a functional slicing architecture, which can be used for hardware-accelerated machine learning (ML) applications … The computational elements of the deterministic streaming processor can be divided between different functionalities (e.g., memory, arithmetic operation, etc.), and can be organized as functional slices which operate on multi-dimensional data (e.g., tensors) … (¶ 0037-0039)] and wherein a copy of the streaming tensor is reliably available in memory of a processor of a first class and may be optionally available in a memory of the processor of a second class [The deterministic cloud system 400 can run a workload (i.e., a stream of incoming tasks 430) that is otherwise very expensive to process using the traditional CPU or GPU computational resources. The workload can vary and the request patterns of users 435 can be unknown … (¶ 0089); 
Becchi more expressively teaches this limitation -- If the requested block does not reside in GPU memory, as shown in FIG. 3a, a GPU memory allocation is performed and a new entry is added to the data block list. FIG. 3a shows a block list 302 that includes a block A 306, but not the requested block B 308. The memory allocation is followed by a data transfer (from CPU to GPU) 216 only if the update parameter of the get call is set to true. The resulting data block list 304 then includes the synced block B 308 … (¶ 0043-0045)];
the execution of the at least one instruction includes: 
initially analyzing the functional program [The deterministic cloud system 400 can run a workload (i.e., a stream of incoming tasks 430) that is otherwise very expensive to process using the traditional CPU or GPU computational resources. The workload can vary and the request patterns of users 435 can be unknown. By employing the TSP farm 420, it is possible to dynamically change the quality of output results. For example, the TSP farm 420 is configured to process 200 tasks at a first quality level or 400 tasks at a second quality level that is lower than the first quality level. Details about dynamically changing the quality of output results are described below in relation to FIG. 4B (¶ 0089)]; 
generating a streaming plan for usage of the streaming tensor, where the streaming plan runs in a background execution environment [“capacity planning” as shown in figure 4B, 465; The process for ensuring the drainage condition is simplified due to the deterministic nature of the TSP farm 420. A non-real-time subcomponent of the scheduler 425 that can be referred to as a “capacity planner” … If the simulation can drain the leaky buckets within all registered contractual agreements, then the capacity planner would proceed with the registration. Otherwise, the capacity planner determines the new registration to be infeasible and would require a user 435 to change their registration parameters to be less intensive on the TSP farm 420 … (¶ 0111-0112); The benefits of involving the scheduler 425 in the compilation process arise from the fact that the scheduler 425 supports a plurality of models 415 for a plurality of users 435. If a new model 415 belonging to an arbitrary user 435 is registered to the TSP farm 420 with pre-existing registered models 415, the scheduler 425 can elect to change which binary variations would be utilized for any subset of existing pre-registered models 415 as part of its optimization routine (e.g., when ensuring the drainage condition for capacity planning, as discussed in more detail in the section below). Partial compilation is useful to expedite this process because, otherwise, recompilation of models 415 would be required. Additionally, the scheduler 415 can perform its part of compilation process outside the critical path of incoming requests as, e.g., a background job. Otherwise, the non-determinism and an additional latency would be introduced to the incoming requests as model compilation itself is not deterministic … The benefit of splitting the compilation of model 415 between the compiler 410 and the scheduler 425 is that the scheduler 425 can dynamically modify a manner of running the compiled model 415 during runtime in the background … (¶ 0103-0105)];
implementing memory management based on the streaming plan [ … Each deterministic streaming processor is divided into a plurality of functional units organized into a plurality of functional slices. Each functional slice is configured to perform specific functions within the deterministic streaming processor, which can include memory functional slices (MEMs) for storing operand data, arithmetic functional slices for performing operations on received operand data (e.g., vector processing, matrix manipulation), and/or the like … (¶ 0012-0017)], wherein the memory management includes: determining when the streaming tensor is needed by the processor of a second class [as shown in figure 4A, where users (user 1 to user n, 435) request for services, which are translated into the corresponding tasks (task 1 to task n, 430), and in response, Tensor Streaming Processors (TSP 1 to TSP n, 420) are activated to serve users’ requests; FIG. 4A illustrates an example deterministic cloud system 400, in accordance with some embodiments. The deterministic cloud system 400 is implemented as a serverless cloud configuration with multiple TSPs configured to manage, e.g., Deep Neural Network (DNN) inference workloads … (¶ 0083-0089)]
receiving request for the streaming tensor [as shown in figure 4A, where users (user 1 to user n, 435) request for services, which are translated into the corresponding tasks (task 1 to task n, 430), and in response, Tensor Streaming Processors (TSP 1 to TSP n, 420) are activated to serve users’ requests; FIG. 4A illustrates an example deterministic cloud system 400, in accordance with some embodiments. The deterministic cloud system 400 is implemented as a serverless cloud configuration with multiple TSPs configured to manage, e.g., Deep Neural Network (DNN) inference workloads … (¶ 0083-0089); In one or more embodiments, the deterministic cloud system 400 would offer reserved execution of models 415 for customers with strict SLA requirements. This requires registering their model 415 before issuing tasks 430 by providing a variety of constraints of the TSP farm 420 and constraints of users 435. The constraints of the TSP farm 420 can be, e.g., required latency SLAs of registered models 415, quality SLAs, and accuracy SLAs. The users 435 can be constrained to issuing a maximum inferences per second (IPS) (i.e., constraining an average request load), and a maximum request queue size (i.e., constraining a peak request load) … (¶ 0109-0112); Becchi also teaches this limitation -- figure 6, step 602, “determine size and location of requested data”]; 
at run time determining if the streaming tensor is resident on the memory of the processor of a second class [this limitation is taught by Becchi -- Referring now to FIG. 3b, if the block B 308 resides in GPU memory, no memory allocation is performed and the content of the data block list is used to return the proper GPU address … (¶ 0044)]; 
if the streaming tensor is resident on the memory of the processor of a second class, i) retrieving the streaming tensor from the memory of the processor of a second class, ii) using the retrieved streaming tensor in the execution of the at last one instruction, and iii) generating an output [this limitation is taught by Becchi -- as shown in figure 6; Referring now to FIG. 3b, if the block B 308 resides in GPU memory, no memory allocation is performed and the content of the data block list is used to return the proper GPU address … (¶ 0044-0045)]; 
if not resident on the memory of the processor of a second class, [this limitation is taught by Becchi -- If the requested block does not reside in GPU memory, as shown in FIG. 3a, a GPU memory allocation is performed and a new entry is added to the data block list. FIG. 3a shows a block list 302 that includes a block A 306, but not the requested block B 308. The memory allocation is followed by a data transfer (from CPU to GPU) 216 only if the update parameter of the get call is set to true. The resulting data block list 304 then includes the synced block B 308 …  (¶ 0043-0045)], i) retrieving the streaming tensor from the memory of the processor of a first class, ii) copying the streaming tensor onto the memory of the processor of a second class, iii) using the retrieved input data in the execution of the at last one instruction, and iv) generating an output [this limitation is taught by Becchi -- as shown in figure 6; If the requested block does not reside in GPU memory, as shown in FIG. 3a, a GPU memory allocation is performed and a new entry is added to the data block list. FIG. 3a shows a block list 302 that includes a block A 306, but not the requested block B 308. The memory allocation is followed by a data transfer (from CPU to GPU) 216 only if the update parameter of the get call is set to true. The resulting data block list 304 then includes the synced block B 308 (¶ 0043); When a kernel is invoked on CPU, the runtime must ensure that the CPU memory has an up-to-date copy of all input parameters …  After execution of a CPU kernel call, output parameters are marked as residing on the CPU memory … (¶ 0043-0045)].
Regarding claim 1, Patrick does not teach coping the needed tensor data into the tensor streaming processor (TSP) if it does not reside in the memory of the TSP.
However, Becchi specifically teaches coping the needed data into a GPU from a CPU if it does not reside in the memory of the GPU [as shown in figure 6; If the requested block does not reside in GPU memory, as shown in FIG. 3a, a GPU memory allocation is performed and a new entry is added to the data block list. FIG. 3a shows a block list 302 that includes a block A 306, but not the requested block B 308. The memory allocation is followed by a data transfer (from CPU to GPU) 216 only if the update parameter of the get call is set to true. The resulting data block list 304 then includes the synced block B 308 (¶ 0043); When a kernel is invoked on CPU, the runtime must ensure that the CPU memory has an up-to-date copy of all input parameters …  After execution of a CPU kernel call, output parameters are marked as residing on the CPU memory … (¶ 0043-0045)].
Therefore, it would have been obvious for one of ordinary skills in the art before the effective filing date of the claimed invention to copy the needed data into a GPU/TSP from a CPU if it does not reside in the memory of the GPU/TSP, as specifically demonstrated by Becchi, and to incorporate it into the existing scheme disclosed by Patrick because Becchi teaches doing so allows better utilizing the performance of both CPU and GPU [If an application has three candidate kernels with both CPU and GPU implementations and, during a certain execution path, the first kernel is estimated to be much faster, but the second and third much slower on the GPU (based on the sizes of their parameters), a data-agnostic scheduler is likely to run the first kernel on the GPU, and the rest on the CPU … (¶ 0028)].
As to claim 6, Patrick in view of Becchi teaches The method of claim 1, further comprising: initially analyzing the program; and generating a streaming plan for usage of the streaming tensor, where the streaming plan runs in a background execution environment and includes determining when the streaming tensor is needed by the processor of a second class, if not already in the memory of the processor of a second class, prefetching the streaming tensors into the memory of the processor of a second class ahead of when the streaming tensor is needed by the processor of a second class to avoid the processor of a second class waiting for the streaming tensor [Becchi -- The eval_loc routine (line 3) is also defined within the function call handler 204, and determines the best target for the intercepted function call at 208. This decision is made by estimating the data transfer time of the input parameters and the kernel execution time on both CPU and GPU. The runtime transfers data only when such data do not reside on the memory module where they are needed for execution … (¶ 0036); If the requested block does not reside in GPU memory, as shown in FIG. 3a, a GPU memory allocation is performed and a new entry is added to the data block list. FIG. 3a shows a block list 302 that includes a block A 306, but not the requested block B 308. The memory allocation is followed by a data transfer (from CPU to GPU) 216 only if the update parameter of the get call is set to true. The resulting data block list 304 then includes the synced block B 308 (¶ 0043); Each data block consists of a linear start address, a byte size, a device address, a location (identifier of the device where the data have been allocated), the timestamp of the last access to the block on device, and a synchronization status, indicating whether the content of CPU and device memory is synchronized or whether the up-to-date copy of the data in the block resides in CPU or device memory. Additionally, in the case of integrated devices, an additional field indicates the address in page-locked memory where the data block has been mapped (¶ 0067)].
As to claim 8, Patrick in view of Becchi teaches The method of claim 6, wherein the step of analyzing the program includes performing a profile run at run time to determine structure the program [Becchi -- If an application has three candidate kernels with both CPU and GPU implementations and, during a certain execution path, the first kernel is estimated to be much faster, but the second and third much slower on the GPU (based on the sizes of their parameters), a data-agnostic scheduler is likely to run the first kernel on the GPU, and the rest on the CPU. However if the runtime discovers that the first kernel produces a large amount of data that is consumed by the second kernel, a better schedule may be to run the second kernel also on the GPU … A runtime according to the present principles analyzes such situations using history-based models to predict processing as well as data transfer time and uses these to guide the scheduling policy … The runtime has mechanisms to ensure coherent access to multiple copies of the same data residing in different memories (e.g., CPU and GPU memory) … (¶ 0028-0031)].
As to claim 9, Patrick in view of Becchi teaches The method of claim 8, wherein the program is a functional differentiable program [Becchi -- Systems and method for data-aware scheduling of applications on a heterogeneous platform having at least one central processing unit (CPU) and at least one accelerator. Such systems and methods include a function call handling module configured to intercept, analyze, and schedule library calls on a processing element. The function call handling module further includes a function call interception module configured to intercept function calls to predefined libraries, a function call analysis module configured to analyze argument size and location, and a function call redirection module configured to schedule library calls and data transfers. The systems and methods also use a memory unification module, configured to keep data coherent between memories associated with the at least one CPU and the at least one accelerator based on the output of the function call redirection module (abstract)].
As to claim 10, Patrick in view of Becchi teaches The method of claim 9, wherein the functional differentiable program is a structured network [Patrick -- FIG. 4A illustrates an example deterministic cloud system 400, in accordance with some embodiments. The deterministic cloud system 400 is implemented as a serverless cloud configuration with multiple TSPs configured to manage, e.g., Deep Neural Network (DNN) inference workloads … (¶ 0083); Becchi -- Systems and method for data-aware scheduling of applications on a heterogeneous platform having at least one central processing unit (CPU) and at least one accelerator. Such systems and methods include a function call handling module configured to intercept, analyze, and schedule library calls on a processing element. The function call handling module further includes a function call interception module configured to intercept function calls to predefined libraries, a function call analysis module configured to analyze argument size and location, and a function call redirection module configured to schedule library calls and data transfers. The systems and methods also use a memory unification module, configured to keep data coherent between memories associated with the at least one CPU and the at least one accelerator based on the output of the function call redirection module (abstract); Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters (¶ 0025)].
As to claim 11, Patrick in view of Becchi teaches The method of claim 10, wherein the structured network is a neural network [Patrick -- FIG. 4A illustrates an example deterministic cloud system 400, in accordance with some embodiments. The deterministic cloud system 400 is implemented as a serverless cloud configuration with multiple TSPs configured to manage, e.g., Deep Neural Network (DNN) inference workloads … (¶ 0083)].
As to claim 12, Patrick in view of Becchi teaches The method of claim 6, wherein the streaming plan includes pre-fetching data for the memories of the processor of a second class based on a window of future cycles of the one or more processors of a second class [Becchi -- A runtime operates at the granularity of a function call. An application runs by default on the CPU and may perform calls to well known kernels for which multiple implementations--either targeting CPU or GPU--are provided. When one of these computational kernels is invoked, the runtime determines the implementation to instantiate. This decision depends on two factors: the kernel execution time and the data transfer time. In turn, these factors depend on the size of the function call parameters and on the location of the corresponding data. GPU kernel implementations assume that their parameters reside on the GPU memory. It is therefore the responsibility of the runtime to hide this fact from the calling application and to maintain a mapping between data structures residing on CPU and on GPU memories. Data is not transferred to the CPU memory at the end of each GPU kernel invocation, but only when that data is used (¶ 0031); If the requested block does not reside in GPU memory, as shown in FIG. 3a, a GPU memory allocation is performed and a new entry is added to the data block list. FIG. 3a shows a block list 302 that includes a block A 306, but not the requested block B 308. The memory allocation is followed by a data transfer (from CPU to GPU) 216 only if the update parameter of the get call is set to true. The resulting data block list 304 then includes the synced block B 308 (¶ 0043)].
As to claim 13, Patrick in view of Becchi teaches The method of claim 12, wherein size of the window is predefined [Becchi -- A runtime operates at the granularity of a function call. An application runs by default on the CPU and may perform calls to well known kernels for which multiple implementations--either targeting CPU or GPU--are provided. When one of these computational kernels is invoked, the runtime determines the implementation to instantiate. This decision depends on two factors: the kernel execution time and the data transfer time. In turn, these factors depend on the size of the function call parameters and on the location of the corresponding data. GPU kernel implementations assume that their parameters reside on the GPU memory. It is therefore the responsibility of the runtime to hide this fact from the calling application and to maintain a mapping between data structures residing on CPU and on GPU memories. Data is not transferred to the CPU memory at the end of each GPU kernel invocation, but only when that data is used (¶ 0031)].
As to claim 14, Patrick in view of Becchi teaches The method of claim 12, wherein size of the window is provided by a user [Becchi -- Heterogeneous platforms are those with both a multi-core central processing unit (CPU) and a many-core accelerated processor such as a graphics processing unit (GPU). To realize the higher performance that such platforms can deliver, however, programmers need intimate knowledge of the GPU architecture. In order to help the common programmer develop code for such platforms, GPU implementations of several "kernels" are made available as libraries. Thus each library kernel has both a CPU and GPU implementation (¶ 0005); The present principles enable legacy applications to automatically run on heterogeneous platforms with minimal data transfers and with full data coherence. The operating system and runtime may be used to provide the programmer with a unified memory view of possibly discrete underlying memory sub-systems … (¶ 0052)].
As to claim 15, Patrick in view of Becchi teaches The method of claim 12, wherein size of the window is adaptive based on the out-of-memory errors [Becchi -- … The runtime transfers data only when such data do not reside on the memory module where they are needed for execution. eval_loc queries the memory access module 206 for the location of each input parameter, and estimates the data transfer time based on the parameter size. In case of GPU execution, eval_loc considers the size and the location of the output parameters to determine whether the GPU has enough free memory to allocate them … (¶ 0036)].
As to claim 16, Patrick in view of Becchi teaches The method of claim 6, wherein the step of analyzing the program includes performing a static analysis at compiler stage to determine variable control flow of the program [Becchi -- Ideally, a heterogeneous system should enable any legacy code written for homogeneous systems to run faster, transparently to the programmer. Library-based programming, where pre-compiled assembly-level libraries for common kernels on the accelerators are made available, eases the burden of parallelizing applications on heterogeneous systems such as that shown in FIG. 1 … (¶ 0027)].
As to claim 17, Patrick in view of Becchi teaches The method of claim 16, wherein the static analysis is adaptive based on a speculative execution scheme [Becchi -- Systems and method for data-aware scheduling of applications on a heterogeneous platform having at least one central processing unit (CPU) and at least one accelerator. Such systems and methods include a function call handling module configured to intercept, analyze, and schedule library calls on a processing element. The function call handling module further includes a function call interception module configured to intercept function calls to predefined libraries, a function call analysis module configured to analyze argument size and location, and a function call redirection module configured to schedule library calls and data transfers. The systems and methods also use a memory unification module, configured to keep data coherent between memories associated with the at least one CPU and the at least one accelerator based on the output of the function call redirection module (abstract); A runtime operates at the granularity of a function call. An application runs by default on the CPU and may perform calls to well known kernels for which multiple implementations--either targeting CPU or GPU--are provided. When one of these computational kernels is invoked, the runtime determines the implementation to instantiate. This decision depends on two factors: the kernel execution time and the data transfer time. In turn, these factors depend on the size of the function call parameters and on the location of the corresponding data. GPU kernel implementations assume that their parameters reside on the GPU memory. It is therefore the responsibility of the runtime to hide this fact from the calling application and to maintain a mapping between data structures residing on CPU and on GPU memories. Data is not transferred to the CPU memory at the end of each GPU kernel invocation, but only when that data is used (¶ 0031)].
As to claim 19, Patrick in view of Becchi teaches The method of claim 1, wherein the one or more processors of a second class includes a coprocessor [Becchi -- Heterogeneous platforms are those with both a multi-core central processing unit (CPU) and a many-core accelerated processor such as a graphics processing unit (GPU) … (¶ 0005-0006)].
As to claim 20, Patrick in view of Becchi teaches The method of claim 19, wherein the coprocessor include graphics processing units [Becchi -- Heterogeneous platforms are those with both a multi-core central processing unit (CPU) and a many-core accelerated processor such as a graphics processing unit (GPU) … (¶ 0005-0006)].
As to claim 21, Patrick in view of Becchi teaches The method of claim 19, wherein the coprocessor include tensor processing units [Patrick -- Disclosed are configurations that include a deterministic streaming system with one or more deterministic streaming processors (e.g., tensor streaming processors (TSPs) or artificial intelligence processors) … The disclosed embodiments are directed to one or more deterministic streaming processors each having a functional slicing architecture. In some embodiments, each deterministic streaming processor comprises a tensor streaming processor (TSP) having a functional slicing architecture, which can be used for hardware-accelerated machine learning (ML) applications … The computational elements of the deterministic streaming processor can be divided between different functionalities (e.g., memory, arithmetic operation, etc.), and can be organized as functional slices which operate on multi-dimensional data (e.g., tensors) … (¶ 0037-0039)].
As to claim 23, Patrick in view of Becchi teaches The method of claim 1, the copying steps, includes: determining if there is sufficient contiguous memory available in the memory of the processor of a second class; if there is sufficient contiguous memory, proceeding with the copying step [Becchi -- … The runtime transfers data only when such data do not reside on the memory module where they are needed for execution. eval_loc queries the memory access module 206 for the location of each input parameter, and estimates the data transfer time based on the parameter size. In case of GPU execution, eval_loc considers the size and the location of the output parameters to determine whether the GPU has enough free memory to allocate them … (¶ 0036); If the requested block does not reside in GPU memory, as shown in FIG. 3a, a GPU memory allocation is performed and a new entry is added to the data block list. FIG. 3a shows a block list 302 that includes a block A 306, but not the requested block B 308. The memory allocation is followed by a data transfer (from CPU to GPU) 216 only if the update parameter of the get call is set to true. The resulting data block list 304 then includes the synced block B 308 (¶ 0043); When a kernel is invoked on CPU, the runtime must ensure that the CPU memory has an up-to-date copy of all input parameters …  After execution of a CPU kernel call, output parameters are marked as residing on the CPU memory … (¶ 0047-0048)].

7.	Claims 24-27 are rejected under 35 U.S.C. 103 as being unpatentable over Patrick in view of Becchi, and further in view of Payer et al. (US Patent 9,448,929, hereinafter Payer).
Regarding claim 24, Patrick in view of Becchi does not teach if there is insufficient contiguous memory in the memory of the processor of the second class, calling a garbage collector adapted to remove unneeded data in the memories of the one or more processors of a second class.
However, Payer specifically teaches if there is sufficient contiguous memory, performing the copying or writing operation to the memory of the one or more processors of a second class; if there is not sufficient contiguous memory, calling a garbage collector adapted to remove unneeded data in the memories [… the inserted instruction being configured to determine if a contiguous memory block of sufficient size to allocate the first amount of memory and the second amount of memory is available and, if the contiguous memory block of sufficient size is not available, trigger garbage collection … (c5 L35-65)].
 Therefore, it would have been obvious for one of ordinary skills in the art before the effective filing date of the claimed invention to determine if there is sufficient contiguous memory, performing the copying or writing operation to the memory of the one or more processors of a second class; if there is not sufficient contiguous memory, calling a garbage collector adapted to remove unneeded data in the memories, as expressively demonstrated by Payer, and to incorporate it into the existing scheme disclosed by Patrick in view of Becchi, in order to obtain enough memory space to accommodate the desired data.
As to claim 25, Patrick in view of Becchi & Payer teaches The method of claim 24, further comprising if there is still insufficient contiguous memory in the memories of the processor of a second class and collectively there is sufficient contiguous and non-contiguous memory, then compacting data in the memories memory of the processor of a second class, and if there is sufficient contiguous memory in the memory of the processor of a second class after the compacting data step, then proceeding with the copying step [Payer -- … Garbage collection can also be used to defragment a block of memory that is associated with a given program. Such a defragmentation process can group (move) live (active) memory objects together in memory and, as a result, free up larger blocks (sections, chunks, etc.) of available (free, unassigned, and so forth) memory space by eliminating portions of free memory that located between live objects (fragmented memory). These portions of unused (fragmented) memory may, for instance, be associated with objects that are no longer being used by the given application (c1 L30-46); Becchi -- … The runtime transfers data only when such data do not reside on the memory module where they are needed for execution. eval_loc queries the memory access module 206 for the location of each input parameter, and estimates the data transfer time based on the parameter size. In case of GPU execution, eval_loc considers the size and the location of the output parameters to determine whether the GPU has enough free memory to allocate them … (¶ 0036); If the requested block does not reside in GPU memory, as shown in FIG. 3a, a GPU memory allocation is performed and a new entry is added to the data block list. FIG. 3a shows a block list 302 that includes a block A 306, but not the requested block B 308. The memory allocation is followed by a data transfer (from CPU to GPU) 216 only if the update parameter of the get call is set to true. The resulting data block list 304 then includes the synced block B 308 (¶ 0043); When a kernel is invoked on CPU, the runtime must ensure that the CPU memory has an up-to-date copy of all input parameters …  After execution of a CPU kernel call, output parameters are marked as residing on the CPU memory … (¶ 0047-0048)].
	As to claim 26, Patrick in view of Becchi & Payer teaches The method of claim 25, further comprising if there is still insufficient contiguous memory in the memories of the processor of a second class, then removing one or more recently used data in the memories of the processor of a second class [Payer -- The garbage collector 147 may be implemented as a generational garbage collector, though other types of garbage collectors may be used. The garbage collector 147 can use a semi-space strategy that classifies objects as “young generation” objects, which have not yet been observed and/or moved by the garbage collector 147, and “old generation” objects, which have been previously observed and/or moved by the garbage collector. In such approaches, the garbage collector 147 can be configured to perform frequent minor collections of the young generation objects, and may also implement a mark-and-sweep collector with incremental marking for major collections of the old generation objects (c7 L27-39)], and if there is sufficient contiguous memory in the memory of the processor of a second class after the removing the one or more least recently used streaming tensors step, then proceeding with the copying step, and if there is still insufficient contiguous memory in the memory of the processor of a second class and collectively there is sufficient contiguous and non-contiguous memory, then re- compacting data in the memory of the processor of a second class, and if there is sufficient contiguous memory in the memory of the processor of a second class after the re-compacting data step, then proceeding with the copying step [Payer -- … Garbage collection can also be used to defragment a block of memory that is associated with a given program. Such a defragmentation process can group (move) live (active) memory objects together in memory and, as a result, free up larger blocks (sections, chunks, etc.) of available (free, unassigned, and so forth) memory space by eliminating portions of free memory that located between live objects (fragmented memory). These portions of unused (fragmented) memory may, for instance, be associated with objects that are no longer being used by the given application (c1 L30-46); Becchi -- … The runtime transfers data only when such data do not reside on the memory module where they are needed for execution. eval_loc queries the memory access module 206 for the location of each input parameter, and estimates the data transfer time based on the parameter size. In case of GPU execution, eval_loc considers the size and the location of the output parameters to determine whether the GPU has enough free memory to allocate them … (¶ 0036); If the requested block does not reside in GPU memory, as shown in FIG. 3a, a GPU memory allocation is performed and a new entry is added to the data block list. FIG. 3a shows a block list 302 that includes a block A 306, but not the requested block B 308. The memory allocation is followed by a data transfer (from CPU to GPU) 216 only if the update parameter of the get call is set to true. The resulting data block list 304 then includes the synced block B 308 (¶ 0043); When a kernel is invoked on CPU, the runtime must ensure that the CPU memory has an up-to-date copy of all input parameters …  After execution of a CPU kernel call, output parameters are marked as residing on the CPU memory … (¶ 0047-0048)].
	As to claim 27, Patrick in view of Becchi & Payer teaches The method of claim 26, further comprising determining if there is still insufficient contiguous memory in the memories of the processor of a second class, then halt execution of the at least one instruction and issuing an out-of- memory error [Becchi -- … The runtime transfers data only when such data do not reside on the memory module where they are needed for execution. eval_loc queries the memory access module 206 for the location of each input parameter, and estimates the data transfer time based on the parameter size. In case of GPU execution, eval_loc considers the size and the location of the output parameters to determine whether the GPU has enough free memory to allocate them … (¶ 0036)].
	
					Conclusion
8.	Claims 1, 6, 8-17, 19-21, and 23-27 are rejected as explained above. 
9.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHENG JEN TSAI whose telephone number is 571-272-4244.  The examiner can normally be reached on Monday-Friday, 9-6.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Reginald Bragdon can be reached on 571-272-4204. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/SHENG JEN TSAI/Primary Examiner, Art Unit 2139
Read full office action
Prosecution Timeline

Mar 06, 2024
Application Filed
Jun 04, 2025
Non-Final Rejection mailed — §103, §112
Sep 04, 2025
Response Filed
Sep 26, 2025
Final Rejection mailed — §103, §112
Feb 17, 2026
Request for Continued Examination
Feb 20, 2026
Response after Non-Final Action
Mar 13, 2026
Non-Final Rejection mailed — §103, §112 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/820,356
Patent 12638971
BI-MODAL MEMORY IDLE HYSTERESIS FOR OPTIMAL ADD-IN CARD ACCELERATOR PERFORMANCE AND POWER
3y 9m to grant Granted May 26, 2026
17/863,300
Patent 12608158
SYSTEMS AND METHODS FOR SEND LOG PAGE COMMANDS FOR PULL MODEL DEVICES
3y 9m to grant Granted Apr 21, 2026
18/604,149
Patent 12596490
MEMORY MANAGEMENT USING A REGISTER
2y 0m to grant Granted Apr 07, 2026
18/190,724
Patent 12585387
Clock Domain Phase Adjustment for Memory Operations
2y 12m to grant Granted Mar 24, 2026
18/340,291
Patent 12579075
USING RETIRED PAGES HISTORY FOR INSTRUCTION TRANSLATION LOOKASIDE BUFFER (TLB) PREFETCHING IN PROCESSOR-BASED DEVICES
2y 8m to grant Granted Mar 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
70%
Grant Probability
84%
With Interview (+13.3%)
3y 4m (~1y 1m remaining)
Median Time to Grant
High
PTA Risk
Based on 792 resolved cases by this examiner. Grant probability derived from career allowance rate.