DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This is in response to applicant's RCE amendment/response filed on 4/6/26, which has been entered and made of record. Claims 1, 4-5, 8, 11, 13-14, 17 and 19 have been amended. No claim has been cancelled or newly added. Claims 1-20 are pending in the application.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 8 and 14 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument (applicant's arguments are directed to the amended limitation, which was part of original dependent claim 17 and is now incorporated into independent claim 1; that limitation is addressed in the new ground of rejection).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 2, 7-9, 14, 15, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Benthin et al. (U.S. Patent Application Publication No. 2018/0293784), hereinafter referenced as Benthin, Preble (U.S. Patent Application Publication No. 2010/0268921), hereinafter referenced as Preble, Srinivasan et al. (U.S. Patent Application Publication No. 2007/0277152), hereinafter referenced as Srinivasan and Zhao et al. (U.S. Patent No. 10891156), hereinafter referenced as Zhao.
Regarding claim 1, Benthin teaches one or more processors comprising: circuitry to: (fig. 13 teaches a graphics processor of a system on a chip integrated circuit); wherein input to the GPU prefetch instruction comprises a pointer to a source location from GPU global memory (paragraph 139 teaches a prefetch instruction using a pointer to identify a block of data to be prefetched; paragraph 185 teaches a graphics multiprocessor [which is part of a GPU] accessing global memory and that any memory outside of the parallel processing unit can be used as global memory; global memory accessible by the GPU can be considered GPU global memory); and a size of variable amount of information (paragraph 141 teaches "In one implementation, this is accomplished by specifying an address pointer (PTR) in combination with a size value (SIZE), thereby establishing the memory range"; the size value here is the size of the variable amount of information, since a size that establishes a memory range defines how large the chunk of memory is and therefore how much variable data/information fits within it); and cause the variable amount of information to be stored into one or more GPU caches (paragraph 141 teaches "a prefetch range instruction) may request a range of data to be copied from the memory subsystem 1503 into one of the local caches of the GPU"; this shows a range of data/variable amount of information being stored in GPU caches).
However, Benthin fails to teach receive an application programming interface (API) call to perform a graphics processing unit (GPU) prefetch instruction; and in response to the API call, cause the variable amount of information to be stored into one or more GPU caches.
However, Preble teaches receive an application programming interface (API) call to perform a graphics processing unit (GPU) prefetch instruction (Preble, paragraph 8 teaches “application program interface (API) generates an instruction to prefetch a second element of the data collection”; this shows the prefetch instruction being generated by an API, meaning the GPU prefetch instruction of Benthin would be as well); and in response to the API call, cause the variable amount of information to be stored into one or more GPU caches (Preble, paragraph 26 teaches "the API 104 will provide prefetch instructions to the execution core 106 to prefetch the records 316 and 317 to ensure these records are stored in the cache 109."; this shows the information/records (variable from Benthin [range of data in paragraph 141]) would be stored in cache(s) (of GPU from Benthin [range of data copied into local caches of GPU in paragraph 141]), which is all done in response to the API call). Preble is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of using an API to prefetch. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Benthin’s invention with the API prefetch techniques of Preble to ensure the efficiency of the application can be increased (Preble, paragraph 8). This would be done by using the API to store data in the cache before it is needed.
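For illustration only, the pointer-plus-size prefetch discussed above can be sketched as a toy model. The sketch below is not taken from Benthin, Preble, or the claims; the names ToyGpuCache and api_prefetch_range and the 64-byte line size are hypothetical assumptions. It shows only how a PTR and SIZE pair establishes a range, and how an API-level call could cause that variable amount of information to become resident in a cache.

```python
from dataclasses import dataclass, field

CACHE_LINE_BYTES = 64  # assumed line size, chosen only for illustration


@dataclass
class ToyGpuCache:
    """Hypothetical stand-in for a GPU cache; records resident line addresses."""
    resident_lines: set = field(default_factory=set)


def api_prefetch_range(cache: ToyGpuCache, ptr: int, size: int) -> int:
    """Hypothetical API entry point: prefetch `size` bytes starting at `ptr`.

    Returns the number of cache lines covered by the requested range.
    """
    if size <= 0:
        return 0
    first_line = ptr // CACHE_LINE_BYTES
    last_line = (ptr + size - 1) // CACHE_LINE_BYTES
    for line in range(first_line, last_line + 1):
        # Mark the line as resident; a real device would copy data here.
        cache.resident_lines.add(line * CACHE_LINE_BYTES)
    return last_line - first_line + 1


cache = ToyGpuCache()
lines_touched = api_prefetch_range(cache, ptr=0x1000, size=200)
```

In this model the amount of information stored varies with the size argument: a 200-byte request starting at a line-aligned address covers four 64-byte lines.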
However, the combination of Benthin and Preble fails to teach and generate an indicator identifying whether asynchronous hardware is used to perform the GPU prefetch instruction.
However, Srinivasan teaches and generate an indicator identifying whether asynchronous hardware is used to perform the GPU prefetch instruction (Srinivasan, paragraph 71 teaches "It is common practice in hardware systems to use signals to control, synchronize and coordinate activities. In synchronous hardware, time signals are used and in asynchronous hardware, start and completion signals are used"); since it is common practice to use signals, the type of signal received/used (start and completion signals or time signals) would indicate (the signal acts as the indicator) whether asynchronous hardware was used to perform the aforementioned GPU prefetch instruction (from the combination above). Srinivasan is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of usage of asynchronous hardware and signaling/indicating of such. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Benthin and Preble with the asynchronous hardware signaling techniques of Srinivasan to simplify parallel programming, and realize scalability, high efficiencies and verifiability (Srinivasan, paragraph 58). This would be done because signaling when asynchronous hardware is used leads to a more optimized/tuned mode for asynchronous operations.
However, the combination of Benthin, Preble and Srinivasan fails to explicitly teach the asynchronous data movement hardware of the GPU.
However, Zhao teaches asynchronous data movement hardware of the GPU (Zhao, col. 12, lines 33-37 teach “so as to coordinate asynchronous data movement operations to and from specific processing devices and/or memory devices (e.g., batch loading of data into GPU memory, prefetching data from memory/storage”); this shows asynchronous data movement hardware of the GPU since the data is moved to and from specific processing devices (asynchronously) with GPU memory being listed as an example. Zhao is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of asynchronous data movement of GPU. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Benthin, Preble and Srinivasan with the data movement techniques of Zhao to optimize task execution and enhance system throughput (Zhao, col. 12, lines 37-38). This would be done by the asynchronous data movement due to the efficiency added from it.
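For illustration only, the limitation of generating an indicator identifying whether asynchronous data movement hardware is used can be sketched as follows. This is a hypothetical model, not the mechanism of Srinivasan or Zhao; PrefetchResult, prefetch_with_indicator, and the async_engine_available parameter are assumed names introduced solely for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefetchResult:
    bytes_requested: int
    used_async_hardware: bool  # the "indicator" in this sketch


def prefetch_with_indicator(ptr: int, size: int,
                            async_engine_available: bool) -> PrefetchResult:
    """Hypothetical: pick a data movement path and report which one was taken."""
    # Assumption: the asynchronous engine is used only when it exists and
    # there is something to move; otherwise a fallback path is taken.
    use_async = async_engine_available and size > 0
    # A real implementation would issue the transfer at this point; the
    # sketch only records the path selection.
    return PrefetchResult(bytes_requested=size, used_async_hardware=use_async)


with_engine = prefetch_with_indicator(0x2000, 512, async_engine_available=True)
without_engine = prefetch_with_indicator(0x2000, 512, async_engine_available=False)
```

The caller learns from the returned indicator which path served the request, without needing to inspect the hardware directly.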
Regarding claim 2, the combination of Benthin, Preble, Srinivasan and Zhao teaches wherein the one or more GPU caches comprise one or more level two (L2) caches (Benthin, paragraph 141 and reference 1514 of fig. 18 teaches an L2 cache of the GPU).
Regarding claim 7, the combination of Benthin, Preble, Srinivasan and Zhao teaches wherein the circuitry is to, in response to the API call, cause the GPU prefetch instruction to be performed (Preble, paragraph 8 teaches "an application program interface (API) generates an instruction to prefetch a second element...prefetching the second element"); this shows the prefetch instruction being performed in response to the API generating the instruction (the API call), and when viewed in combination, this would be the GPU prefetch instruction from Benthin. The same motivations used in claim 1 apply here in claim 7.
Regarding claim 8, a system recites similar limitations as product/processor claim 1, and thus is rejected under similar rationale.
Regarding claim 9, the system claim is similar to product/processor claim 2, and thus is rejected under similar rationale.
Method claim 14 is similar to product/processor claim 1, and thus is rejected under similar rationale.
Method claim 15 is similar to product/processor claim 2, and thus is rejected under similar rationale.
Regarding claim 20, a non-transitory computer-readable medium recites similar limitations as method claim 14, and thus is rejected under similar rationale.
Claim(s) 3, 5, 10-13, 16, 18, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Benthin, Preble, Srinivasan and Zhao as applied to claim 1 above, and further in view of Zhang et al. (Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning), hereinafter referenced as Zhang.
Regarding claim 3, the combination of Benthin, Preble, Srinivasan and Zhao teaches the GPU prefetch instruction causing the variable amount of information to be stored into the one or more GPU caches (Benthin, paragraph 141 teaches "a prefetch range instruction) may request a range of data to be copied from the memory subsystem 1503 into one of the local caches of the GPU"); this shows a range of data/variable amount of information being stored in GPU caches by the prefetch instruction. However, the combination of Benthin, Preble, Srinivasan and Zhao fails to teach wherein the GPU prefetch instruction is compiled from an assembly-level instruction to cause the variable amount of information to be stored into the one or more GPU caches.
However, Zhang teaches wherein the GPU prefetch instruction is compiled from an assembly-level instruction to cause the variable amount of information to be stored into the one or more GPU caches (Zhang, page 34 and 35, section 3.3 teach assembly instructions where the instructions are used for reading/writing information/data as shown in page 38, section 4.3). Zhang is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of storage of information with specific GPU architecture. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Benthin, Preble, Srinivasan and Zhao to incorporate the teachings of Zhang to allow users to optimize any code segment based on the generated assembly instead of coding from scratch and also help to understand and optimize the performance of a computational program (Zhang, page 32, section 1).
Regarding claim 5, the combination of Benthin, Preble, Srinivasan, Zhao and Zhang teaches wherein the circuitry is to: compile the GPU prefetch instruction to executable binary code; and perform the executable binary code to be performed by the GPU (Zhang, page 33, section 3.1 teaches instructions compiled to cubin files). A cubin file is an ELF-formatted file, meaning it contains executable binary code to be performed specifically by a GPU (and must first be compiled). Also, since Zhang shows use of the GPU ISA and encodings thereof, the GPU prefetch instruction from above would be part of the graphics ISA (when viewed in combination). The same motivations used in claim 3 apply here in claim 5.
Regarding claim 10, the combination of Benthin, Preble, Srinivasan, Zhao and Zhang teaches wherein the GPU prefetch instruction is an assembly-level instruction (Zhang, page 34 and 35, section 3.3 teach assembly instructions). The same motivations used in claim 3 apply here in claim 10.
Regarding claim 11, the system claim is similar to product/processor claim 4, and thus is rejected under similar rationale.
Regarding claim 12, the system claim is similar to product/processor claim 5, and thus is rejected under similar rationale.
Regarding claim 13, the combination of Benthin, Preble, Srinivasan, Zhao and Zhang teaches wherein the system comprises a GPU and wherein the one or more processors are to perform the GPU prefetch instruction by compiling the GPU prefetch instruction into one or more instructions in binary executable code to be performed by the GPU of the system. (Benthin, fig. 10 teaches a system with a graphics processor and processor; Zhang, page 33, section 3.1 teaches instructions compiled to cubin files). A cubin file is an ELF-formatted file meaning it contains executable binary code to be specifically performed by a GPU. The same motivations used in claim 3 apply here in claim 13.
Method claim 16 is similar to product/processor claim 5, and thus is rejected under similar rationale.
Method claim 18 is similar to system claim 10, and thus is rejected under similar rationale.
Method claim 19 is similar to product/processor claim 4, and thus is rejected under similar rationale.
Claim(s) 4 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Benthin, Preble, Srinivasan and Zhao as applied to claim 1 above, and further in view of Otterness et al. (U.S. Patent No. 6,460,122), hereinafter referenced as Otterness.
Regarding claim 4, the combination of Benthin, Preble, Srinivasan, and Zhao fails to teach wherein the indicator is a boolean value or an enumerated value.
However, Otterness teaches wherein the indicator is a boolean value or an enumerated value (Otterness, col. 11, lines 25-27 teach “Copy Complete Boolean field is used to indicate the start of a Direct Memory Access (DMA) operation and the completion of that operation”); this shows a boolean value as indicator of a memory access (such as the asynchronous data movement hardware operation from claim 1 when viewed in combination). Otterness is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of Boolean value as indicator. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Benthin, Preble, Srinivasan, and Zhao with the Boolean indicator techniques of Otterness to improve data throughput to achieve improved memory bandwidth (Otterness, col. 39, lines 13-14). This would be done by using Boolean instead of other data types for further efficiency.
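For illustration only, the two indicator forms recited in claim 4 (a Boolean value or an enumerated value) can be contrasted in a short sketch. The names PrefetchPath and as_boolean are hypothetical and are not drawn from Otterness or from the application.

```python
from enum import Enum


class PrefetchPath(Enum):
    """Hypothetical enumerated indicator of the path that served a prefetch."""
    ASYNC_HARDWARE = "async_hardware"
    FALLBACK = "fallback"


def as_boolean(path: PrefetchPath) -> bool:
    """Collapse the enumerated indicator to a Boolean indicator."""
    return path is PrefetchPath.ASYNC_HARDWARE


boolean_indicator = as_boolean(PrefetchPath.ASYNC_HARDWARE)
```

An enumerated value leaves room for additional states later, while a Boolean answers only the yes/no question of whether the asynchronous hardware was used.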
Claim(s) 6 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Benthin, Preble, Srinivasan and Zhao as applied to claim 1 above, and further in view of Fu et al. (U.S. Patent Application Publication No. 2024/0111534), hereinafter referenced as Fu.
Regarding claim 6, the combination of Benthin, Preble, Srinivasan and Zhao fails to teach wherein the GPU prefetch instruction is an asynchronous prefetch instruction.
However, Fu teaches wherein the GPU prefetch instruction is an asynchronous prefetch instruction (Fu, paragraph 434 teaches “data can be read from the L3 cache 4222 and staged in the L2 cache 2814 during the multicast asynchronous load to the graphics cores of the graphics core cluster. In one embodiment, multicast asynchronous loads can be used to fetch or prefetch data into the L1 cache 4220”); this shows prefetching data/instruction from graphics processor in an asynchronous manner. Fu is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of using asynchronous prefetch instruction. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Benthin, Preble, Srinivasan and Zhao with the asynchronous techniques of Fu to enable access to the data at a higher data rate and lower latency relative to accessing the data directly from memory (Fu, paragraph 2). This means better overall access to data by reducing the time needed to access it from enabling asynchronous instructions.
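For illustration only, the notion of an asynchronous prefetch (the request is issued, the caller continues, and completion is observed later) can be modeled with a standard thread pool. This is a software analogy under stated assumptions, not the multicast asynchronous load of Fu; load_into_cache, prefetch_async, and the dictionary used as a cache are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def load_into_cache(cache: dict, memory: bytes, ptr: int, size: int) -> int:
    """Copy memory[ptr:ptr+size] into the toy cache; returns bytes copied."""
    chunk = memory[ptr:ptr + size]
    cache[ptr] = chunk
    return len(chunk)


def prefetch_async(executor: ThreadPoolExecutor, cache: dict,
                   memory: bytes, ptr: int, size: int) -> Future:
    """Hypothetical asynchronous prefetch: returns immediately with a Future."""
    return executor.submit(load_into_cache, cache, memory, ptr, size)


memory = bytes(range(256))  # toy "global memory"
cache = {}                  # toy "GPU cache", keyed by start address
with ThreadPoolExecutor(max_workers=1) as executor:
    pending = prefetch_async(executor, cache, memory, ptr=16, size=32)
    # The caller is free to do other work here before waiting.
    copied = pending.result()  # completion is observed only at this point
```

The design point is that issuing the request and consuming the data are decoupled, which is what allows the latency of the copy to overlap with other work.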
Claim(s) 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Benthin, Preble, Srinivasan and Zhao as applied to claim 14 above, and further in view of Leslie-Hurd (U.S. Patent Application Publication No. 2016/0378664), hereinafter referenced as Leslie.
Regarding claim 17, the combination of Benthin, Preble, Srinivasan and Zhao fails to teach wherein the indicator comprises a data structure that includes one or more reason codes indicating whether the asynchronous data movement hardware was not used.
However, Leslie teaches wherein the indicator comprises a data structure that includes one or more reason codes indicating whether the asynchronous data movement hardware was not used (Leslie, paragraph 21 teaches "error codes of the PFEC are extended to include an error code representing information associated with the EPC-related fault" and paragraph 28 teaches "a page fault may occur as a result of accessing the EPC memory page...PFEC may be a data structure used by the processing device to indicate the occurrence of a hardware page fault. In one embodiment, the PMH 160 may utilize EPC-related fault delivery logic 165 to generate an error code related to the EPC page fault condition"); this shows PFEC as data structure and indicator, accessing EPC memory page is data movement hardware usage and a fault there would indicate the hardware (asynchronous data movement hardware from claim 14 when viewed in combination) was not used which is provided by the error/reason code. Leslie is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of indicator as data structure and reason/error code. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Benthin, Preble, Srinivasan and Zhao with the data structure and error/reason code techniques of Leslie to “generate an error code associated with the fault, wherein the error code reflects an EPC-related fault cause; and 3) encode the error code into a data structure associated with the processor core” (Leslie, paragraph 90). This would allow a user to know the fault or reason why specific hardware isn’t used and add efficiency due to the data structure used (due to organized data).
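For illustration only, an indicator in the form of a data structure carrying one or more reason codes can be sketched as below. The specific reason codes, the alignment and minimum-size thresholds, and the names ReasonCode, PrefetchStatus, and evaluate_prefetch are all hypothetical assumptions; none is taken from Leslie or from the application.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ReasonCode(Enum):
    """Hypothetical reasons the asynchronous hardware path was not taken."""
    UNSUPPORTED_DEVICE = 1
    MISALIGNED_POINTER = 2
    SIZE_TOO_SMALL = 3


@dataclass
class PrefetchStatus:
    used_async_hardware: bool
    reasons: List[ReasonCode] = field(default_factory=list)


def evaluate_prefetch(ptr: int, size: int, device_supports_async: bool,
                      alignment: int = 16, min_size: int = 128) -> PrefetchStatus:
    """Collect every reason (if any) that rules out the asynchronous path."""
    reasons = []
    if not device_supports_async:
        reasons.append(ReasonCode.UNSUPPORTED_DEVICE)
    if ptr % alignment != 0:
        reasons.append(ReasonCode.MISALIGNED_POINTER)
    if size < min_size:
        reasons.append(ReasonCode.SIZE_TOO_SMALL)
    # The asynchronous path is reported as used only when no reason applies.
    return PrefetchStatus(used_async_hardware=not reasons, reasons=reasons)


ok_status = evaluate_prefetch(0x1000, 256, device_supports_async=True)
bad_status = evaluate_prefetch(0x1001, 64, device_supports_async=True)
```

A structure of this kind reports not just that the hardware was bypassed but why, which a single Boolean cannot express.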
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Dover (U.S. Patent Application Publication No. 2018/0276146) paragraph 94 teaches “produce a Boolean value indicating that the memory storage location is accessible”; this shows a Boolean indicator indicating that hardware is accessible (and, if not, that the hardware is not used).
Wiegert et al. (U.S. Patent Application Publication No. 2024/0134797) claim 1 teaches “transmit the data read from the cache memory to a consumer graphics core of the plurality of graphics cores” and claim 8 teaches “graphics processor as in claim 7, wherein the direct memory access circuit is to read the data from the cache memory asynchronously relative to the plurality of execution resources.”; this shows asynchronous data movement hardware of GPU since the read data is done asynchronously and is from cache memory of GPU then transferred to graphics core.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NAUMAN U AHMAD whose telephone number is (703)756-5306. The examiner can normally be reached Monday - Friday 9:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (571) 272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KEE M TUNG/Supervisory Patent Examiner, Art Unit 2611
/N.U.A./Examiner, Art Unit 2611