DETAILED ACTION
Claims 1-20 are pending.
Notice of Pre-AIA or AIA Status
This Office Action is sent in response to Applicant’s Communication received on 05/17/2024 for application number 18/667,614.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 7 is objected to because of the following informalities: “modification comprises inserting one or more instructions to be performed by a first to begin thread of the one or more GPU kernels” (emphasis added) should read “modification comprises inserting one or more instructions to be performed by a first-to-begin thread of the one or more GPU kernels” (emphasis added). Claims 14 and 20 are objected to for similar reasons. Appropriate correction is required.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Levit-Gurevich (US 2023/0109752 A1).
Regarding claim 1, Levit-Gurevich teaches a processor (Figure 1, CPU 118 and GPU 110), comprising: one or more circuits to cause one or more software programs to be modified to initialize information to be used by one or more application programming interfaces (APIs) (“the profiling instructions 104A-104C are inserted at a first address (e.g., a first position) of a kernel (e.g., the beginning of the first kernel 106) to initialize variables used for profiling.” Par 0069 and “in response to executing the first instrumentation routine 1302, the trace emulator 430 can invoke a callback routine (e.g., “Callback Before( )”) to invoke an API to provide GPU states of a hardware thread that executed the software thread to an upper level construct, such as the application 120 of FIG. 1, the hardware profiling analysis tool 718 of FIG. 7, etc.” par 0171 and Figures 1, 7, 13, and 18) [this shows modification of a software program/kernel, through instrumentation, by inserting instructions to initialize variables used for profiling; the GPU states (the initialized/captured data) are used by an API].
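For purposes of illustration only (this sketch is not taken from Levit-Gurevich or the present application; all function names and instruction mnemonics are hypothetical), the cited teaching of inserting initialization instructions at a kernel's first address may be modeled as prepending initialization code to the kernel's instruction stream:

```python
# Hypothetical sketch: instrumenting a GPU kernel by inserting
# profiling-initialization instructions at its first address, so that
# initialization occurs before the kernel's original first instruction.
def instrument_kernel(kernel_instructions, init_instructions):
    """Return a new instruction list with the init code at the first address."""
    return list(init_instructions) + list(kernel_instructions)

# Hypothetical original kernel and profiling-initialization instructions.
original = ["ADD r1, r2, r3", "MUL r4, r1, r1", "SEND result"]
profiling_init = ["MOV prof_count, 0", "MOV prof_base, SLM_ADDR"]

instrumented = instrument_kernel(original, profiling_init)
# The instrumented kernel initializes the profiling variables before the
# original first instruction (the code-dispatch point) executes.
```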
Claims 8 and 15 correspond to claim 1 and are rejected accordingly.
Regarding claim 2, Levit-Gurevich teaches the processor of claim 1, wherein the one or more circuits are to modify the one or more software programs at runtime of the one or more software programs (“the GLIT engine 102 may modify the first kernel 106 to create an instrumented GPU kernel, such as the second kernel 108. That is, the GLIT engine 102 creates the second kernel 108 without executing any compilation of the first kernel 106.” Par 0074 and “In some examples, the GLIT engine 102 instruments binary shaders/kernels prior to sending them to the GPU 110.” Par 0089 and Figure 4) [processor logic (the GLIT engine) modifies existing kernel binaries before they are dispatched (i.e., at runtime)].
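For purposes of illustration only (hypothetical names; this sketch is not from the reference), the cited runtime modification, rewriting an already-compiled kernel binary at dispatch time with no recompilation step, may be modeled as:

```python
# Hypothetical sketch: a dispatch path that patches a compiled kernel
# binary just before sending it to the GPU, without any recompilation.
PROLOGUE = b"\x90\x90"  # hypothetical encoded profiling instructions

def instrument_binary(kernel_binary: bytes) -> bytes:
    """Prepend encoded profiling instructions to a compiled kernel binary."""
    return PROLOGUE + kernel_binary

def dispatch(kernel_binary: bytes, send_to_gpu) -> None:
    """Instrument the binary at dispatch time, then hand it to the GPU."""
    send_to_gpu(instrument_binary(kernel_binary))

# Stand-in for the GPU dispatch queue.
sent = []
dispatch(b"\x01\x02\x03", sent.append)
```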
Claims 9 and 16 correspond to claim 2 and are rejected accordingly.
Regarding claim 3, Levit-Gurevich teaches the processor of claim 1, wherein the modification comprises selecting one or more instructions with which to perform the initialization (“in the example provided in FIG. 22, the original code 2200 at the left represents an original kernel or shader, in which there are certain points (which are italicized) that are identified as Events to be instrumented. Specifically, a process is to save data before the first instruction (ADD) of the code (code dispatch point),” par 0239 and paragraphs 223-230 and Figure 22).
Claims 10 and 17 correspond to claim 3 and are rejected accordingly.
Regarding claim 4, Levit-Gurevich teaches the processor of claim 1, wherein the one or more software programs comprise one or more GPU kernels (“As used herein, a GPU kernel refers to a kernel in binary format.” Par 0042), and the modification comprises inserting one or more instructions to be performed by the one or more GPU kernels to perform the initialization prior to each of one or more invocations of the one or more APIs within the one or more software programs (“the profiling instructions 104A-104C are inserted at a first address (e.g., a first position) of a kernel (e.g., the beginning of the first kernel 106) to initialize variables used for profiling.” Par 0069 and “the trace emulator 430 can execute the first instrumentation routine 1302 prior to executing the emulation routine (EmulRoutines) and the second instrumentation routine 1304 after executing the emulation routine.” Par 0170 and paragraphs 99-102 and Figure 12) [this directly shows inserting instructions (which perform the initialization) at the beginning of the kernel, prior to the execution calls].
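For purposes of illustration only (hypothetical mnemonics; not taken from the reference), inserting initialization instructions prior to each API invocation within an instruction stream may be modeled as a single rewriting pass:

```python
# Hypothetical sketch: insert initialization instructions immediately
# before every API invocation found in a kernel's instruction stream.
def insert_before_api_calls(instructions, init_instructions,
                            is_api_call=lambda ins: ins.startswith("CALL_API")):
    out = []
    for ins in instructions:
        if is_api_call(ins):
            out.extend(init_instructions)  # initialize just before the call
        out.append(ins)
    return out

# Hypothetical kernel with two API invocations.
program = ["ADD", "CALL_API foo", "MUL", "CALL_API bar"]
rewritten = insert_before_api_calls(program, ["INIT"])
# rewritten is ["ADD", "INIT", "CALL_API foo", "MUL", "INIT", "CALL_API bar"]
```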
Claims 11 and 18 correspond to claim 4 and are rejected accordingly.
Regarding claim 5, Levit-Gurevich teaches the processor of claim 1, wherein the one or more software programs comprise one or more GPU kernels (“As used herein, a GPU kernel refers to a kernel in binary format.” Par 0042), and the modification comprises inserting one or more instructions to be performed by the one or more GPU kernels after each of one or more invocations of the one or more APIs within the one or more software programs (“insert a second callback routine in the instrumented routine after the emulation routine, the second call-back routine to invoke the first API or a second API to provide the first GPU state to the application.” Par 0305 and “In some examples, in response to executing the second instrumentation routine 1304, the trace emulator 430 can invoke a callback routine (e.g., “CallbackAfter( )”) to invoke an API to provide GPU states of the hardware thread that executed the software thread to an upper level construct, such as the application 120 of FIG. 1, the hardware profiling analysis tool 718 of FIG. 7, etc.” par 0172).
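For purposes of illustration only (hypothetical names; not from the reference), the pattern of paragraphs 0171-0172, where one instrumentation routine invokes a callback before the emulation routine and a second invokes a callback after it, may be modeled as:

```python
# Hypothetical sketch: wrap an emulation routine with before/after
# callbacks, each of which could invoke an API to report GPU state.
def run_instrumented(emulation_routine, gpu_state,
                     callback_before, callback_after):
    callback_before(gpu_state)   # e.g., API reporting state to the application
    emulation_routine(gpu_state)
    callback_after(gpu_state)    # e.g., API reporting the resulting state

trace = []
run_instrumented(lambda s: trace.append("emulate"),
                 {"GRF": [0] * 8},
                 lambda s: trace.append("before"),
                 lambda s: trace.append("after"))
# trace is now ["before", "emulate", "after"]
```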
Claim 12 corresponds to claim 5 and is rejected accordingly.
Regarding claim 6, Levit-Gurevich teaches the processor of claim 1, wherein the information is to be initialized on a reserved shared memory included in one or more GPUs (“In the illustrated example of FIG. 2, the GPU slice 200 includes example cache memory 210. In this example, the cache memory 210 is implemented by level three (L3) data cache that includes example atomic barriers 212 and example shared local memory 214.” Par 0059 and Figure 2) [the shared local memory serves as a location for data storage].
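For purposes of illustration only (hypothetical sizes and layout; not from the reference), reserving a region of a GPU's shared local memory (SLM) so that initialized information has a known location may be modeled as a simple bump allocator:

```python
# Hypothetical sketch: reserve a region of shared local memory (SLM)
# for profiling data initialized by the instrumented kernel.
class SharedLocalMemory:
    def __init__(self, size: int):
        self.data = bytearray(size)
        self.next_free = 0

    def reserve(self, nbytes: int) -> int:
        """Reserve nbytes; return the base offset of the reserved region."""
        if self.next_free + nbytes > len(self.data):
            raise MemoryError("SLM exhausted")
        base = self.next_free
        self.next_free += nbytes
        return base

slm = SharedLocalMemory(64 * 1024)   # 64 KiB SLM; size is illustrative
profiling_base = slm.reserve(256)    # region reserved for profiling data
```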
Claim 19 corresponds to claim 6 and is rejected accordingly.
Regarding claim 7, Levit-Gurevich teaches the processor of claim 1, wherein the one or more software programs comprise one or more GPU kernels (“As used herein, a GPU kernel refers to a kernel in binary format.” Par 0042), and the modification comprises inserting one or more instructions to be performed by a first to begin thread of the one or more GPU kernels (“the thread dispatcher 506 may load initial GPU state(s) into an idle one of the thread(s) 208 and start its execution based on the determination(s).” par 0120 and “Identified Events may include, but are not limited to: (a) Code dispatch” par 0224 and “the profiling instructions 104A-104C are inserted at a first address (e.g., a first position) of a kernel (e.g., the beginning of the first kernel 106)” par 0069 and “Specifically, a process is to save data before the first instruction (ADD) of the code (code dispatch point),” par 0239 and Figure 2) [the code dispatch point marks the start of kernel execution; the instructions are inserted before the kernel’s main execution by a hardware thread].
Claims 14 and 20 correspond to claim 7 and are rejected accordingly.
Regarding claim 13, Levit-Gurevich teaches the method of claim 8, wherein the information is to be initialized on a memory that is not accessible until performance of a GPU kernel thread of one or more software programs associated with the memory has started (“As used herein, a GPU state refers to one or more first values stored in a general-purpose register file (GRF) and/or one or more second values stored in an architecture register file (ARF) associated with a hardware thread of the GPU.” Par 0045 and “the thread dispatcher 506 may load initial GPU state(s) into an idle one of the thread(s) 208 and start its execution based on the determination(s).” par 0120 and Figure 5) [this shows that the initial GPU states, stored in thread-specific registers, are loaded into the thread by the dispatcher only when its execution is started, meaning those register contents become accessible to the executing thread only once its performance has begun].
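For purposes of illustration only (hypothetical structure; not from the reference), a dispatcher that loads initial GPU state (GRF/ARF register values) into an idle hardware thread and then starts it, such that the state becomes accessible only once the thread is running, may be modeled as:

```python
# Hypothetical sketch: thread-specific register state is loaded by the
# dispatcher and is readable only after the thread's execution begins.
class HardwareThread:
    def __init__(self):
        self.grf = None      # general-purpose register file
        self.arf = None      # architecture register file
        self.running = False

    def read_grf(self):
        if not self.running:
            raise RuntimeError("state not accessible before the thread starts")
        return self.grf

def dispatch_thread(thread, initial_grf, initial_arf):
    """Load initial GPU state into an idle thread, then start it."""
    thread.grf = list(initial_grf)
    thread.arf = list(initial_arf)
    thread.running = True
```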
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Hari et al. (US 2019/0102180 A1) teaches methods for software optimization to reduce the overhead of intra-thread instruction duplication on a GPU. The method identifies duplication-eligible original instructions, inserts duplicate instructions after the original ones, creates shadow registers, and verifies the integrity of the instructions’ source operands using the shadow registers.
Hu et al. (US 2022/0004438 A1) teaches a processing unit generating different versions of a GPU based on allocated hardware resources from a global memory and a fast shared resource.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AYMAN FATIMA whose telephone number is (571)270-0830. The examiner can normally be reached Monday through Friday between 8am and 4pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jaweed Abbaszadeh can be reached on (571)270-1640. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AYMAN FATIMA/Examiner, Art Unit 2176
/JAWEED A ABBASZADEH/Supervisory Patent Examiner, Art Unit 2176