Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This Office action is in response to the claim listing filed on December 5, 2025. Claims 1-21 are currently pending.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-21 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Itani et al. (US 2023/0042226 A1, hereinafter referred to as Itani) in view of Dalal (US 2023/0231811 A1, hereinafter referred to as Dalal).
Referring to claim 1, Itani discloses a method, comprising {“systems and methods of the present disclosure may use the DMA and the processor (e.g., VPU) to configure a tightly coupled processing…”, see Figs. 11a and 11c [0304] 1st sentence}:
at a direct memory access (DMA) controller {“HW sequencer control 1060” to influence DMA “DMA engine 1056”, see Fig. 10H [0287]}, issuing memory operations {“[issue memory] sequence”, see Fig. 10H [0287], 2nd sentence} based on descriptors to copy data from a global memory {“may read the image structure, pull in the descriptors,” (see Fig. 10H [0287]) from global memory “HW sequencer command memory 1054” (see Fig. 10H, [0283]) also referred to as “descriptor SRAM 1054” ([0284])} of a parallel processor {“process 1100 may include a processing controller 1102” (see Fig. 11a [0306], 1st sentence)} and a local memory {local memory “access data from VMEM 1110 that was written to VMEM 1110 using the DMA 1104” (see Figs. 11a and 11c [0316], last sentence)} of a module executing {“vector processing unit (VPU) 1108 or another processor type”, see Fig. 11a [0306], 1st sentence} at the parallel processor {parallel processing system “Parallel processing is used to accelerate many compute tasks” ([0245], 1st sentence, [0263], 1st sentence)};
Itani does not appear to explicitly disclose issuing memory operations based on descriptors to copy data to and from a global memory of a parallel processor; and in response to a context switch at the shader, halting issuing memory operations at the DMA controller.
However, Dalal discloses issuing memory operations based on descriptors to copy data to {“intrinsic traffic management of sessions [metadata] on the Xockets DIMM [memory operations]” ([0218]) issued as claimed based on descriptors “many parallel sessions (such that the hardware’s prefetching of session-specific data” (see Fig. 30, [0170], last two sentences)} and from a global memory {“very high number of random accesses as the streams are all independently striding through large video files” from a global memory “memory to memory or disk to memory” (see [0216])} of a parallel processor {parallel processor “each wimpy core can serve data from local memory”, see Fig. 51, [0216]};
and in response to a context switch {“Xockets DIMM can allow context switching”, [0218]} at the shader {shaders as part of “Adobe Flash Media Server, Microsoft IIS, Wowza, Kaltura, or a CDN may”, see Fig. 51, [0218]}, halting issuing memory operations at the DMA controller {“a context stored [halted] by one offload processor can be resumed by a different offload processor”, see Fig. 60-0, [0320]}.
Itani and Dalal are analogous because they are from the same field of endeavor, routing packet stream(s).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Itani and Dalal before him or her, to modify Itani’s parallel processor and corresponding DMA controller “HW sequencer control 1060” (see Figs. 10H, 11a, and 11c) by incorporating Dalal’s “each wimpy core” and respective “Xockets DIMM” functionality (see Figs. 30 and 60-0).
The suggestion/motivation for doing so would have been to take advantage of the benefit of Xockets’ context switching mechanism implementation (Dalal [0191], last sentence), said Xockets context switching being facilitated by one or more additional virtual switches connected to a typical virtual switch via SR-IOV + IOMMU, as but one example, in which a series of wimpy cores, each with its own independent memory channel, can be managed with any suitable virtualization framework (Dalal [0095], 1st and 2nd sentences, paraphrased).
Therefore, it would have been obvious to combine Dalal with Itani to obtain the invention as specified in the instant claim(s).
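For clarity of the record, the arrangement recited in claim 1, as best understood by the Examiner, may be sketched as follows. This sketch is purely the Examiner's own illustration: the class and identifier names (DMAController, Descriptor, on_context_switch, and so forth) are hypothetical and appear in neither Itani nor Dalal nor Applicant's specification.

```python
# Examiner's illustrative model of claim 1; all names are hypothetical.

class Descriptor:
    """Describes one copy between global memory and local memory."""
    def __init__(self, src, dst, direction):
        self.src = src                  # source address (key)
        self.dst = dst                  # destination address (key)
        self.direction = direction      # "to_local" or "to_global"

class DMAController:
    def __init__(self, global_mem, local_mem):
        self.global_mem = global_mem    # global memory of the parallel processor
        self.local_mem = local_mem      # local memory of the shader
        self.pending = []               # descriptors not yet issued
        self.halted = False

    def enqueue(self, desc):
        self.pending.append(desc)

    def issue(self):
        """Issue memory operations based on descriptors until halted."""
        while self.pending and not self.halted:
            d = self.pending.pop(0)
            if d.direction == "to_local":
                self.local_mem[d.dst] = self.global_mem[d.src]
            else:
                self.global_mem[d.dst] = self.local_mem[d.src]

    def on_context_switch(self):
        # In response to a context switch at the shader, halt issuing.
        self.halted = True
```

As modeled, a context switch merely sets a flag at the DMA controller; any descriptors enqueued afterwards remain pending rather than being issued.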
As per claim 2, the rejection of claim 1 is incorporated and Dalal discloses further comprising: saving at least one descriptor of incomplete memory operations {“can store current context information” (see Fig. 60-0 [0319], 2nd sentence) including at least one descriptor “selected offload processors can save their current context data 6354” (see Fig. 63, [0347], last sentence) that “context data 6354” including at least one descriptor “based on session metadata” (see Fig. 59a, [0303], 1st sentence)} to a region of the global memory {“[global] buffer memory can be different from local memory 6010. For example, [global] buffer memory can have a slower access time than local memory 6010. However, in other embodiments, buffer memory and local memory can be implemented with [global] memory devices”, see Fig. 60-0, [0324], last three sentences}.
As per claim 3, the rejection of claim 2 is incorporated and Dalal discloses further comprising:
saving metadata associated with the at least one descriptor {“selected offload processors can save their current context data 6354” (see Fig. 63, [0347], last sentence) that “context data 6354” including at least one descriptor “based on session metadata” (see Fig. 59a, [0303], 1st sentence)} to the region of the global memory {“can then switch between such different [global memory region] spaces”, see Fig. 60-0, [0322], last sentence}, wherein the metadata indicates that the at least one descriptor {“context data 6354” (see Fig. 63, [0347], last sentence) including at least one descriptor “based on session metadata” (see Fig. 59a, [0303], 1st sentence)} is a restored descriptor {“will be fetched and pre-fetched into the cache so that when the thread resumes most of its [restored descriptor] previous working set is there in the cache already”, see Fig. 59-0 [0455], last two sentences}.
As per claim 4, the rejection of claim 3 is incorporated and Dalal discloses wherein the metadata further indicates a number of descriptors of incomplete memory operations saved {“session’s contents [number of descriptors] are prefetched and transferred into the cache”, see Figs. 59-0 and 85 [0456], 2nd sentence} to the region of the global memory {“register set is saved to [global] memory as part of switch-out, ”, see Fig. 59-0 [0456], 1st sentence}.
As per claim 5, the rejection of claim 2 is incorporated and Dalal discloses further comprising: receiving the at least one descriptor of incomplete memory operations {“register contents [at least one descriptor] are [received] loaded by the kernel upon resuming the thread, these loads should be from the cache”, see Fig. 59-0 [0456], 2nd sentence} at the DMA controller in response to a context resume at the shader {“According to [context resume] instructions from hardware scheduler 5923,” (see Fig. 59B [0310], last two sentences) to DMA controller “DMA master” or “DMA Slave 5927” ([0310], 2nd and 3rd sentences respectively)}.
As per claim 6, the rejection of claim 5 is incorporated and Dalal discloses further comprising:
resuming issuing memory operations {“According to [context resume] instructions from hardware scheduler 5923,” (see Fig. 59B [0310], last two sentences) to DMA controller “DMA master” or “DMA Slave 5927” ([0310], 2nd and 3rd sentences respectively), the context resume including any issued memory operations in the given context} based on the at least one descriptor of incomplete memory operations {“session’s contents [number of descriptors] are prefetched and transferred into the cache”, see Figs. 59-0 and 85 [0456], 2nd sentence} to copy data to {“intrinsic traffic management of sessions [metadata] on the Xockets DIMM [memory operations]” ([0218]) issued as claimed based on descriptors “many parallel sessions (such that the hardware’s prefetching of session-specific data” (see Fig. 30, [0170], last two sentences)} and from the global memory of the parallel processor {“very high number of random accesses as the streams are all independently striding through large video files” from a global memory “memory to memory or disk to memory” (see [0216])} and the local memory of the shader {parallel processor “each wimpy core can serve data from local memory”, see Fig. 51, [0216]}.
As per claim 7, the rejection of claim 6 is incorporated and Dalal discloses further comprising: prioritizing issuing memory operations {“packets [associated with memory operations] are prioritized and scheduled and as such”, see Fig. 59a [0293], 1st sentence} based on the at least one descriptor {“context data 6354” including at least one descriptor “based on session metadata” (see Fig. 59a, [0303], 1st sentence)} of incomplete copy operations {“can allocate a priority to each of the output queues and carry out reordering of incoming packets” (see Fig. 59a [0296], 1st sentence) that re-ordering including copy operations “shape of flows and micro-flows through delay (buffering), … delay jitter (temporally shifting [copying] cells of a flow by different amounts”, see Fig. 79, [0415], last sentence} over issuing instructions based on a descriptor received {“session’s contents [number of descriptors] are prefetched and transferred into the cache”, see Figs. 59-0 and 85 [0456], 2nd sentence} at the DMA controller subsequent to the context resume at the shader {“According to [context resume] instructions from hardware scheduler 5923,” (see Fig. 59B [0310], last two sentences) to DMA controller “DMA master” or “DMA Slave 5927” ([0310], 2nd and 3rd sentences respectively)}.
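Similarly, for clarity of the record, the save/restore behavior recited in claims 2-7 may be sketched as follows. This is again the Examiner's own illustrative model, under the assumption that DMA state is a simple list of pending descriptors; the function names and metadata keys are hypothetical and drawn from neither reference.

```python
# Examiner's illustrative model of claims 2-7; all names are hypothetical.
# DMA state is modeled as {"pending": [...], "halted": bool}.

def save_context(dma, global_mem, region_key):
    """Save descriptors of incomplete memory operations, plus metadata
    indicating they are restorable and how many were saved (claims 2-4),
    to a region of global memory."""
    saved = list(dma["pending"])
    metadata = {"count": len(saved), "restored": True}
    global_mem[region_key] = (saved, metadata)
    dma["pending"] = []
    dma["halted"] = True

def resume_context(dma, global_mem, region_key):
    """On a context resume (claims 5-6), reload the saved descriptors,
    placing them ahead of any descriptors received after the resume
    (the prioritization of claim 7)."""
    saved, metadata = global_mem.pop(region_key)
    dma["pending"] = saved + dma["pending"]   # restored descriptors go first
    dma["halted"] = False
    return metadata
```

Under this model, descriptors saved at a context switch are re-issued before any newly received descriptor once the context resumes.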
Claims 8-15 are system claims reciting the same functionality as method claims 1-7, and are thereby rejected under the same rationale as claims 1-7 recited above. Inter alia, Dalal discloses wherein a trap handler {“profiling of Wowza [shader] (streaming engine) informs the Apache performance” ([0122], last sentence) via associated trap handler “request to a particular memory address representing the disk creates a trap executing code from the Dockets′ OS driver.” (emphasis added by Examiner, [0231], 1st sentence)} associated with the shader is to {shaders as part of “Adobe Flash Media Server, Microsoft IIS, Wowza, Kaltura, or a CDN may”, see Fig. 51, [0218]}: save at least one descriptor of incomplete copy operations {“can store current context information” (see Fig. 60-0 [0319], 2nd sentence) including at least one descriptor “selected offload processors can save their current context data 6354” (see Fig. 63, [0347], last sentence) that “context data 6354” including at least one descriptor “based on session metadata” (see Fig. 59a, [0303], 1st sentence)} to a region of the global memory {“[global] buffer memory can be different from local memory 6010. For example, [global] buffer memory can have a slower access time than local memory 6010. However, in other embodiments, buffer memory and local memory can be implemented with [global] memory devices”, see Fig. 60-0, [0324], last three sentences}.
Additionally, as per claim 15, the rejection of claim 13 is incorporated and Dalal discloses further comprising:
a scheduler to initiate a second context resume {“external hardware scheduler 8004 can act as a traffic management queue” for a plurality of contexts, see Fig. 85 [0454], 3rd sentence} in response to receiving an indication {“response to the completion of the processing task [as indicated], scheduler [8004/] 6216 can update a schedule.”, see Fig. 62-4 [0343], 3rd sentence} from the DMA controller that the first context resume has completed {“eliminating two sources of context switch overhead and reducing the latency for the switched-in [first] session to resume useful processing”, see Fig. 59-0 [0456], last sentence}.
Claims 16-21 are system claims reciting the same functionality as apparatus claims 8-15, and are thereby rejected under the same rationale as claims 8-15 recited above.
Response to Arguments
Applicant’s arguments filed on 12/05/2025 have been fully considered but are deemed moot in view of the following explanation:
Applicant alleges that there is no motivation for the combination of Itani and Dalal to teach claim 1’s “issuing memory operations based on descriptors [sic]” (Remarks, page 6, 2nd paragraph after the § 103 section header).
The Examiner respectfully acknowledges the Attorney’s assertion regarding Dalal’s high-level network session management system in conjunction with prefetching as a memory optimization policy (Remarks, page 6, last paragraph). The Attorney further asserts that the Office has not pointed to teachings in either reference detailing the necessary hardware synchronization, that Dalal’s general commercial video platforms constitute an unsupported assertion, and that generic wimpy cores are not a specific shader architecture (Remarks, page 7, paraphrased).
The Examiner will further elaborate on the claim interpretation of the independent claims and draw parallels to the Itani and Dalal references. Each independent claim recites a “comprising” preamble, which renders the claim open-ended, meaning additional features/structure/steps can be included provided the claimed limitations are facilitated; in this broad scope, claim 1 describes a method performing memory operations between a DMA controller and a shader executing at the parallel processor. Claim 16 recites a system performing steps similar to claim 1, with the additional functionality of the trap handler, but simply swapping the “halt” of claim 1 for “stop issuing instructions” in the last 3 lines of claim 16. Outside of these two distinctions, the term “shader” is not further defined in any claim tree or dependent claim as filed, whether such shader performs tessellation, culling, shading, or other imaging tasks known in the 2D/3D processing art. Applicant’s PGPUB [0008] recites “shader programs, raytracing programs” and a processing system to group work items into waves, but does not further elaborate to what extent, or whether, the image processing involves a 2D/3D plane, light sources, and the like.
The specification goes further into the structure surrounding the term “shader”, for example a shader providing a descriptor to a DMA controller, where the descriptor includes “tensor dimensions, tile dimensions, strides, padding, the global memory address to or from which the tensor is to be copied” (PGPUB [0031]); however, which shader performs which portions of the descriptor processing is not further defined in the specification. In other words, claim 1 recites a generic shader that may be executing as hardware/software in communication with a DMA controller for processing tensor dimensions or tile dimensions, even though neither is brought into the currently filed claims.
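By way of illustration only, the descriptor contents quoted from PGPUB [0031] might be modeled as below. The field names and the tiling helper are the Examiner's hypothetical rendering for discussion purposes, not Applicant's disclosure or code.

```python
# Examiner's hypothetical rendering of the descriptor fields recited
# at PGPUB [0031]; names and structure are the Examiner's own.
from dataclasses import dataclass

@dataclass
class TensorCopyDescriptor:
    tensor_dims: tuple   # e.g. (H, W, C) of the tensor to copy
    tile_dims: tuple     # shape of each tile moved per DMA operation
    strides: tuple       # element strides within global memory
    padding: tuple       # padding applied at the tensor edges
    global_addr: int     # global memory address to/from which to copy

    def num_tiles(self):
        # Tiles needed to cover each dimension (ceiling division).
        return [-(-t // l) for t, l in zip(self.tensor_dims, self.tile_dims)]
```

Such a structure, handed from a shader to a DMA controller, would suffice to drive the descriptor-based copies discussed above, though the claims as filed recite none of these fields.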
Turning to Dalal, as already cited, “Adobe Flash Media Server, Microsoft IIS, Wowza, Kaltura, or a CDN may” (see Fig. 51, [0218]), and further expanding, the “Suricata Header Detection Engine” ([0212]) includes a program “coded for external HW acceleration (the normal use is in the CUDA framework of graphics providers such as Nvidia)” ([0241]), where Nvidia is a well-known trademark for providing shader solutions in computer architecture. While Dalal does not use the term “tensor” explicitly, Dalal does recite “accelerators” ad nauseam, as well as an “Accelerator Coherency Port” ([0172]) to facilitate “IO data can stream in and out, filling L1 and other memory during the packet processing” ([0175]) between parallel processors and DMA.
In view of the rebuttal recited above, Dalal’s ACP works as hardware synchronization to “augment the available L2 cache and extend the coherency domain of sessions” (see Fig. 59-0, [0455]), whereby “with the careful management of a session’s cache contents, the cost of context switching due to register set save and restore and cache misses on switch-in are greatly reduced, and even eliminated in some optimal cases, thereby eliminating two sources of context switch overhead and reducing the latency for the switched-in session to resume useful processing” ([0456], last sentence), which occurs after the halt: “when the thread resumes most of its previous working set is there in the cache already” ([0455], last three sentences). Another example of hardware synchronization is “scheduler circuit 5908b/n… engineering computational availability can be hardware context switching synchronized with network queuing”; this also handles the situation of incomplete memory operations in a use case where, given “the lopsided nature of this ratio, embodiments can utilize computation having many parallel sessions (such that the hardware’s prefetching of session-specific data offloads a large portion of the host processor load) and having minimal general purpose processing of data.” ([0300])
It is further worth noting, at least with respect to claim 1, that the DMA controller simply halts in response, and could receive such an instruction based on a context table, or from another component entirely that alerts the DMA controller of the context switch. Similar issues are also present with respect to claim 8. Claim 16’s trap handler simply resides in the system and can be implemented as hardware or software.
Dalal [0134]: “A rack-level 2-4 TB shared [global] in-memory disk can be created with a maximum 1… A rack hosting 3600 ARM cores can query across this [global memory] disk using SQL at speeds orders of magnitude faster than a single serve”; such data movement to the “[local memory] local Xockets DIMM upon requesting a certain address range” (Dalal [0132]) facilitated by a trap handler “mmap routine can trap and execute the code of the Xockets driver, which in turn can issue the correct set of write and read commands to Xockets Memory 1222 to produce and return the sought after data, to the requesting user process.” (Dalal [0132] last sentence).
For these reasons the current ground of rejection(s) is respectfully maintained.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following references are pertinent to claim 1’s “DMA controller”, “memory operations”, or “local memory”: US 11409685 B1, US 11256641 B2, US 11048435 B2, US 11042496 B1, US 20210117246 A1, and US 10977198 B2.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHRISTOPHER A. BARTELS whose telephone number is (571) 270-3182. The examiner can normally be reached Monday-Friday, 9:00 am-5:30 pm EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Dr. Henry Tsai, can be reached at 571-272-4176. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/C. B./
Examiner, Art Unit 2184
/STEVEN G SNYDER/ Primary Examiner, Art Unit 2184