DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2. Claims 1–20 are presented for examination in a non-provisional application filed on Jul. 18, 2023.
Claim Interpretation Under 35 USC § 112
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
3. The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f):
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f), except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f), except as otherwise indicated in an Office action.
Accordingly, claim 20 is being interpreted under 35 U.S.C. 112(f).
Abbreviations
4. Where appropriate, the following abbreviations will be used when referencing Applicant’s submissions and specific teachings of the reference(s):
i. figure / figures: Fig. / Figs.
ii. column / columns: Col. / Cols.
iii. page / pages: p. / pp.
References Cited
5. (A) Applicant’s Specification, construed as Applicant Admitted Prior Art (“AAPA”).
(B) Pandurangan et al., US 2023/0114636 A1 (“Pandurangan”).
(C) Gangani et al., US 2021/0240524 A1 (“Gangani”).
Notice re prior art available under both pre-AIA and AIA
6. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
A.
7. Claims 1–3, 5–6, 12, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over (A) AAPA in view of (B) Pandurangan and (C) Gangani.
See “References Cited” section, above, for full citations of references.
8. Regarding claim 1, (A) AAPA teaches/suggests the invention substantially as claimed, including:
“A system, comprising:
a data storage device comprising:
a peripheral interface configured to connect to a host system;
a storage medium configured to store host data”
(¶ 4: Each storage device in a multi-device storage system may be connected to a host system through at least one high-bandwidth interface, such as PCIe, using an appropriate storage protocol for the storage device, such as non-volatile memory express (NVMe) for accessing solid state drives (SSDs) or the storage blades of all flash arrays);
“a direct memory access service configured to:
store, to a host … data; and access, from the host … data”
(¶ 4: fabric-based distributed storage systems may include storage devices configured with direct memory access to enable more efficient transfer of data to and from hosts and other systems).
AAPA does not teach “store, to a host memory buffer of the host system and through the peripheral interface, a first set of task input data; and
access, from the host memory buffer and through the peripheral interface, a first set of task output data”
(B) Pandurangan, directed to a computational storage device, however, teaches or suggests:
“store, to a host memory buffer of the host system and through the peripheral interface, a first set of task input data; and
access, from the host memory buffer and through the peripheral interface, a first set of task output data”
(¶ 22: A computational storage device in accordance with example embodiments of the disclosure may enable a user (e.g., an application running on a host) to access a device program using a storage protocol such as Nonvolatile Memory Express (NVMe). For example, in some embodiments, a user may start a device program by sending a command to the storage device using a storage protocol. The command may include, for example, the name of the device program to execute, one or more parameters to pass to the device program, a pointer to a memory area that may be used to exchange input and/or output data between the device program and an application running on the host;
¶ 24: a host may collect a stream of output data such as logs, debug messages, and/or the like, from a device program by directly reading the output data from a buffer using the storage protocol. In such an embodiment, a host and a storage device may exchange output data from a device program through a ring buffer using a producer-consumer model in which the device may produce data and the host may consume data;
¶ 71: the memory space in the data buffer pointed to by DPTR in the start command illustrated in FIG. 7 may include, for example, the name of the device program to execute and/or one or more input arguments for the device program. The data buffer may be located, for example, at a storage device, at a host, at any other location;
¶ 72: the data buffer may be populated (at least partially) automatically, for example, by an application running on a host, a driver on a host, a computational storage device, and/or the like, in response to a command to execute a device program;
¶ 73: As another example, if the data buffer is located at least partially at a host, the device program may send output data to the data buffer at the host using the storage protocol;
¶ 54: The buffer logic 438 may maintain any type of buffer that may enable a user (e.g., an application running on a host) to receive output data from one or more device programs).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of (B) Pandurangan with those of (A) AAPA to provide a computational storage device and device programs therein to execute user/device programs and exchange data with the host. The motivation or advantage to do so is to provide the host with shared access to the compute resources of a computational storage device, including specialized hardware/software components of the storage device (see AAPA, ¶ 5: storage devices may incorporate custom hardware accelerators integrated in their controller application specific integrated circuits (ASICs)).
AAPA and Pandurangan do not teach “a processing offload service configured to notify, through the peripheral interface, a processor device to initiate a first processing task on the first set of task data, wherein the processor device comprises a graphics processing unit.”
(C) Gangani, in the context of AAPA and Pandurangan’s teachings, however, teaches or suggests implementing:
“a processing offload service configured to notify, through the peripheral interface, a processor device to initiate a first processing task on the first set of task data, wherein the processor device comprises a graphics processing unit”
(¶ 51: CPU 210 may be configured to execute the application 212. The application 212 may be an application that offloads the performing of ML tasks to the GPU 220. For example, the CPU 210 may use the GPU 220 to execute one or more ML primitives. For example, the application 212 may include operations that cause the GPU 220 to execute one or more computational jobs associated with an ML primitive;
¶ 52: graphics driver 214 may receive the operations from the application 212 and may control operation of the GPU 220 to facilitate executing the operations. For example, the graphics driver 214 may generate one or more command streams, store the generated command streams in the command buffer 230 of the memory 124, and instruct the GPU 220 to execute the command streams;
¶ 54: the CPU 210 may provide graphics data to the GPU 220 for rendering and issue one or more graphics commands to the GPU 220 … the CPU 210 may provide the graphics commands and the graphics data to the memory 124, which may be accessed by the GPU 220;
¶ 61: an ML command may cause the GPU 220 to generate a quantity of outputs (sometimes referred to as “primitive outputs”) associated with an ML primitive;
¶ 63: GPU 220 may access ML data associated with each of the respective computational jobs from the ML data buffer 232 at the memory 124, execute the respective computational jobs using the accessed ML data, and then write the output generated by executing each of the 100 computational jobs to the ML data buffer 232 of the memory 124).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to further combine the teachings of (C) Gangani with those of AAPA and Pandurangan to offload specific computation tasks from a device program (i.e., a computational storage device) to the GPU. The motivation or advantage to do so is to provide specialized or additional computational processing or acceleration devices (functions) to meet the performance or processing requirements of host applications and/or user programs.
9. Regarding claim 2, AAPA, Pandurangan, and Gangani, in combination, teach or suggest:
“a peripheral bus configured for communication among the data storage device, the host system, and the processor device”
(AAPA, ¶ 2: Some computing systems, such as storage arrays, may include multiple data storage devices supporting one or more host systems through a peripheral or storage interface bus, such as peripheral component interconnect express (PCIe);
Pandurangan, Fig. 10 and ¶ 90: The host apparatus 1000 illustrated in FIG. 10 may include a processor 1002, which may include a memory controller 1004, a system memory 1006, host control logic 1008, and/or an interconnect interface 1010, which may be implemented, for example using CXL. Any or all of the components illustrated in FIG. 10 may communicate through one or more system buses 1012;
Gangani, Fig. 2 and ¶ 48: FIG. 2, the example CPU 210, the example GPU 220, and the example memory 124 are in communication via an example bus 202); and
“the host system comprising:
a host processor; and
a host memory device comprising a set of host memory locations configured to be:
allocated to the host memory buffer;
accessible to the data storage device using direct memory access; and
accessible to the processor device using direct memory access”
(AAPA, ¶ 2: host systems … are being tasked with mathematically intensive tasks related to graphics, video processing, machine learning, and other computational tasks;
¶ 4: fabric-based distributed storage systems may include storage devices configured with direct memory access to enable more efficient transfer of data to and from hosts and other systems;
Pandurangan, Fig. 10 and ¶ 90: The host apparatus 1000 illustrated in FIG. 10 may include a processor 1002, which may include a memory controller 1004, a system memory 1006, host control logic 1008, and/or an interconnect interface 1010, which may be implemented, for example using CXL. Any or all of the components illustrated in FIG. 10 may communicate through one or more system buses 1012;
¶ 35: remote direct memory access (RDMA);
Gangani, Fig. 2 and ¶ 48: FIG. 2, the example CPU 210, the example GPU 220, and the example memory 124 are in communication via an example bus 202).
10. Regarding claim 3, AAPA, Pandurangan, and Gangani, in combination, teach or suggest:
“wherein the host memory buffer is further configured with:
a first subset of the set of host memory locations allocated to task input data and including the first set of task input data;
a second subset of the set of host memory locations allocated to task output data and including the first set of task output data”
(AAPA, ¶ 4: fabric-based distributed storage systems may include storage devices configured with direct memory access to enable more efficient transfer of data to and from hosts and other systems;
Pandurangan, (as applied in rejecting claim 1) ¶ 22: A computational storage device in accordance with example embodiments of the disclosure may enable a user (e.g., an application running on a host) to access a device program using a storage protocol such as Nonvolatile Memory Express (NVMe). For example, in some embodiments, a user may start a device program by sending a command to the storage device using a storage protocol. The command may include, for example, the name of the device program to execute, one or more parameters to pass to the device program, a pointer to a memory area that may be used to exchange input and/or output data between the device program and an application running on the host;
¶ 24: a host may collect a stream of output data such as logs, debug messages, and/or the like, from a device program by directly reading the output data from a buffer using the storage protocol. In such an embodiment, a host and a storage device may exchange output data from a device program through a ring buffer using a producer-consumer model in which the device may produce data and the host may consume data;
¶ 71: the memory space in the data buffer pointed to by DPTR in the start command illustrated in FIG. 7 may include, for example, the name of the device program to execute and/or one or more input arguments for the device program. The data buffer may be located, for example, at a storage device, at a host, at any other location;
¶ 72: the data buffer may be populated (at least partially) automatically, for example, by an application running on a host, a driver on a host, a computational storage device, and/or the like, in response to a command to execute a device program;
¶ 73: As another example, if the data buffer is located at least partially at a host, the device program may send output data to the data buffer at the host using the storage protocol;
¶ 54: The buffer logic 438 may maintain any type of buffer that may enable a user (e.g., an application running on a host) to receive output data from one or more device programs).
Gangani, (as applied in rejecting claim 1) ¶ 52: graphics driver 214 may receive the operations from the application 212 and may control operation of the GPU 220 to facilitate executing the operations. For example, the graphics driver 214 may generate one or more command streams, store the generated command streams in the command buffer 230 of the memory 124, and instruct the GPU 220 to execute the command streams;
¶ 54: the CPU 210 may provide graphics data to the GPU 220 for rendering and issue one or more graphics commands to the GPU 220 … the CPU 210 may provide the graphics commands and the graphics data to the memory 124, which may be accessed by the GPU 220;
¶ 61: an ML command may cause the GPU 220 to generate a quantity of outputs (sometimes referred to as “primitive outputs”) associated with an ML primitive;
¶ 63: GPU 220 may access ML data associated with each of the respective computational jobs from the ML data buffer 232 at the memory 124, execute the respective computational jobs using the accessed ML data, and then write the output generated by executing each of the 100 computational jobs to the ML data buffer 232 of the memory 124); and
“a third subset of the set of host memory locations allocated to a status register configured to include at least one status indicator for the first processing task”
(Pandurangan, ¶ 54: buffer logic 438 may notify the user that output data is available, for example, through an interrupt, a status bit in a status register, and/or the like;
¶ 73: data buffer is located at least partially at a host, the device program may send output data to the data buffer at the host using the storage protocol).
11. Regarding claim 5, AAPA, Pandurangan, and Gangani, in combination, teach or suggest:
“further comprising the processor device, the processor device configured to:
receive the notification to initiate the first processing task”
(Pandurangan, ¶ 22: start a device program by sending a command to the storage device using a storage protocol. The command may include, for example, the name of the device program to execute, one or more parameters to pass to the device program, a pointer to a memory area that may be used to exchange input and/or output data between the device program and an application running on the host;
Gangani, ¶ 52: graphics driver 214 may receive the operations from the application 212 and may control operation of the GPU 220 to facilitate executing the operations. For example, the graphics driver 214 may generate one or more command streams, store the generated command streams in the command buffer 230 of the memory 124, and instruct the GPU 220 to execute the command streams;
¶ 61: an ML command may cause the GPU 220 to generate a quantity of outputs … once the GPU 220 receives the ML command (e.g., from the command buffer 230), control may be passed to the GPU 220 for launching one or more computational jobs for generating the requested quantity of outputs);
“access, using direct memory access to the host memory buffer, the first set of task input data;
process, using a first set of task code for the first processing task, the first set of task input data to determine the first set of task output data;
store, using direct memory access, the first set of task output data to the host memory buffer”
(AAPA, ¶ 4: fabric-based distributed storage systems may include storage devices configured with direct memory access to enable more efficient transfer of data to and from hosts and other systems;
Pandurangan, ¶ 22, 24, 71–73, and ¶ 54, as applied in rejecting claim 1 above. See also rejection of claim 3 incorporating similar features;
Gangani, ¶ 52, 54, 61, and ¶ 63, as applied in rejecting claim 1 above. See also rejection of claim 3 incorporating similar features); and
“notify, responsive to storing the first set of task output data to the host memory buffer, the data storage device that the first processing task is complete”
(Pandurangan, ¶ 54: buffer logic 438 may notify the user that output data is available, for example, through an interrupt, a status bit in a status register, and/or the like;
¶ 73: data buffer is located at least partially at a host, the device program may send output data to the data buffer at the host using the storage protocol;
Gangani, ¶ 72: after completion of the first batch of computational jobs, the GPU 220 may repeat, for each subsequent batch, the loading of the respective input data, the executing of the respective computational jobs to generate job output data, and the writing (or storing) of the respective job output data to the ML data buffer 232;
¶ 73: after completion of the first batch of computational jobs, the GPU 220 may write the batch output data (e.g., the job output data generated by the execution of each computational job of the batch) from the GPU memory 226 to the memory 124 (e.g., to the ML data buffer 232)).
12. Regarding claim 6, Pandurangan and Gangani teach or suggest:
“the processing offload service is further configured to determine the first set of task code for the first processing task”
(Pandurangan, ¶ 45: commands may include commands to download a user program from a host, execute a user program, transfer input and/or output data for a user program to and/or from the storage media 312 and/or a host or other device through the storage protocol interface 304 and/or the like;
Gangani, ¶ 51: CPU 210 may be configured to execute the application 212. The application 212 may be an application that offloads the performing of ML tasks to the GPU 220. For example, the CPU 210 may use the GPU 220 to execute one or more ML primitives. For example, the application 212 may include operations that cause the GPU 220 to execute one or more computational jobs associated with an ML primitive;
¶ 52: graphics driver 214 may receive the operations from the application 212 and may control operation of the GPU 220 to facilitate executing the operations. For example, the graphics driver 214 may generate one or more command streams, store the generated command streams in the command buffer 230 of the memory 124, and instruct the GPU 220 to execute the command streams); and
“the notification to initiate the first processing task includes the first set of task code for the first processing task”
(Pandurangan, ¶ 22: start a device program by sending a command to the storage device using a storage protocol. The command may include, for example, the name of the device program to execute, one or more parameters to pass to the device program, a pointer to a memory area that may be used to exchange input and/or output data between the device program and an application running on the host;
Gangani, ¶ 52: graphics driver 214 may receive the operations from the application 212 and may control operation of the GPU 220 to facilitate executing the operations. For example, the graphics driver 214 may generate one or more command streams, store the generated command streams in the command buffer 230 of the memory 124, and instruct the GPU 220 to execute the command streams;
¶ 61: an ML command may cause the GPU 220 to generate a quantity of outputs … once the GPU 220 receives the ML command (e.g., from the command buffer 230), control may be passed to the GPU 220 for launching one or more computational jobs for generating the requested quantity of outputs).
13. Regarding claim 12, it is the corresponding method claim reciting similar limitations of commensurate scope to the system of claim 1. Therefore, it is rejected on the same basis as claim 1 above.
14. Regarding claim 14, it is the corresponding method claim reciting similar limitations of commensurate scope to the system of claim 5. Therefore, it is rejected on the same basis as claim 5 above.
15. Regarding claim 20, it is a corresponding system claim reciting similar limitations of commensurate scope to the system of claim 1. Therefore, it is rejected on the same basis as claim 1 above.
Allowable Subject Matter
16. Claims 4, 7–11, 13, and 15–19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN C WU whose telephone number is (571)270-5906. The examiner can normally be reached Monday through Friday, 8:30 A.M. to 5:00 P.M.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee J. Li can be reached on (571)272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BENJAMIN C WU/Primary Examiner, Art Unit 2195
January 9, 2026