Prosecution Insights
Last updated: April 19, 2026
Application No. 17/955,123
APPLICATION PROGRAMMING INTERFACE TO GENERATE KERNELS
Non-Final OA (§102, §103)

Filed: Sep 28, 2022
Examiner: VINCENT, ROSS MICHAEL
Art Unit: 2196
Tech Center: 2100 — Computer Architecture & Software
Assignee: Nvidia Corporation
OA Round: 3 (Non-Final)

Grant Probability: 54% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 3y 5m
With Interview: 90%

Examiner Intelligence

Career Allow Rate: 54% of resolved cases (12 granted / 22 resolved; -0.5% vs TC avg)
Interview Lift: +35.9% (strong), among resolved cases with interview
Typical Timeline: 3y 5m average prosecution; 32 applications currently pending
Career History: 54 total applications across all art units

Statute-Specific Performance

§101: 22.7% (-17.3% vs TC avg)
§103: 57.4% (+17.4% vs TC avg)
§102: 8.2% (-31.8% vs TC avg)
§112: 11.4% (-28.6% vs TC avg)

Tech Center averages are estimates; based on career data from 22 resolved cases.

Office Action

§102, §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-2, 8, 10, 17, 19-20, 26, 28-29, and 36 are currently amended. No new claims have been added. No claims have been canceled. Claims 1-36 are pending for examination.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/18/2025 has been entered.

Response to Arguments

Applicant's arguments, see pgs. 9-10, filed 12/18/2025, with respect to the rejection of claims 1, 10, 19, and 28 under 35 USC 103 have been fully considered and are persuasive. Therefore, the rejection over Munshi in view of Hukerikar has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of Herbert (US 20210326175 A1).

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 10, 19, and 28 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Herbert (US 20210326175 A1).
As per claim 1, Herbert discloses:

One or more processors, comprising: circuitry to, in response to a call to an application programming interface (API) comprising parameters indicative of a cluster of two or more groups of blocks of threads, cause a kernel to be performed by at least scheduling the cluster comprising the two or more groups of blocks of threads to be performed in parallel ("The embodiments in the present invention are directed to a system including one or more computers and one or more storage devices on which are stored instructions that are operable, the system including one or more memory and address formats, one or more hardware schedulers, external memory, CPU set shared memory shared amongst a cooperative set of CPUs, and CPU local memory, and one or more accelerators, when executed by the one or more computers, to cause the one or more computers to perform operations including utilizing a software programming model and API to program serial data processing including primitives for parallelism and synchronization for serial processing pipelines whereby the software programming model and the API employ lightweight micro threading and synchronization mechanisms to construct horizontal pipelines and vertical pipelines with concurrent processing whereby the API is targeted to a domain specific space of serial pipeline processing and serial data processing for hardware acceleration, executing an operation for horizontal parallelization, vertical parallelization, or hybrid parallelization of a serial processing pipeline to produce materialized data objects, … utilizing programmable threads as a unit of execution that implements one stage in a processing pipeline, utilizing a programming language and model to program the threads, utilizing thread sets that are groups of threads that define instances of the vertical pipelines, utilizing datapaths, each of which comprises a group of thread sets", 0004; "This section specifies an API for the PANDA programming model. … The basic structures of this API are: objects, work items, dependencies, threads, thread sets, and datapaths. These map the corresponding elements of the architecture.", 0140; "The PANDA API will be supported in XDP via helper functions. … In either scheduling case, the number of threads could be limited as necessary to ensure that the kernel does not go into a long or even an infinite loop (following the design philosophy of XDP/eBPF to promote robustness)", 0293; Examiner Note: the set of datapaths, each comprising groups of thread sets (blocks), equates to a cluster. The PANDA API structures equate to parameters.)

wherein each group of blocks of threads comprises one or more blocks of threads and each block of threads comprises two or more threads ("utilizing thread sets that are groups of threads that define instances of the vertical pipelines, utilizing datapaths, each of which comprises a group of thread sets", 0004).

As per claim 10, it is a computer-implemented method claim with substantially the same limitations as claim 1, and as such it is rejected for substantially the same reasons.
As per claim 19, Herbert discloses:

A computer system comprising: one or more processors and memory storing executable instructions that, when performed by the one or more processors, are to, in response to a call to an application programming interface (API) comprising parameters indicative of a cluster of two or more groups of blocks of threads, cause a kernel to be performed by at least scheduling the cluster comprising the two or more groups of blocks of threads to be performed in parallel ("The embodiments in the present invention are directed to a system including one or more computers and one or more storage devices on which are stored instructions that are operable, the system including one or more memory and address formats, one or more hardware schedulers, external memory, CPU set shared memory shared amongst a cooperative set of CPUs, and CPU local memory, and one or more accelerators, when executed by the one or more computers, to cause the one or more computers to perform operations including utilizing a software programming model and API to program serial data processing including primitives for parallelism and synchronization for serial processing pipelines whereby the software programming model and the API employ lightweight micro threading and synchronization mechanisms to construct horizontal pipelines and vertical pipelines with concurrent processing whereby the API is targeted to a domain specific space of serial pipeline processing and serial data processing for hardware acceleration, executing an operation for horizontal parallelization, vertical parallelization, or hybrid parallelization of a serial processing pipeline to produce materialized data objects, … utilizing programmable threads as a unit of execution that implements one stage in a processing pipeline, utilizing a programming language and model to program the threads, utilizing thread sets that are groups of threads that define instances of the vertical pipelines, utilizing datapaths, each of which comprises a group of thread sets", 0004; "This section specifies an API for the PANDA programming model. … The basic structures of this API are: objects, work items, dependencies, threads, thread sets, and datapaths. These map the corresponding elements of the architecture.", 0140; "The PANDA API will be supported in XDP via helper functions. … In either scheduling case, the number of threads could be limited as necessary to ensure that the kernel does not go into a long or even an infinite loop (following the design philosophy of XDP/eBPF to promote robustness)", 0293; Examiner Note: the set of datapaths, each comprising groups of thread sets (blocks), equates to a cluster. The PANDA API structures equate to parameters.)

wherein each group of blocks of threads comprises one or more blocks of threads and each block of threads comprises two or more threads ("utilizing thread sets that are groups of threads that define instances of the vertical pipelines, utilizing datapaths, each of which comprises a group of thread sets", 0004).

As per claim 28, it is a non-transitory machine readable medium claim with substantially the same limitations as claim 1, and as such it is rejected for substantially the same reasons.
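For technical context on the cluster limitation at issue in these §102 rejections: an API call whose parameters describe a cluster of thread blocks to be co-scheduled resembles the thread block cluster launch exposed by the public CUDA runtime. The sketch below is illustrative only. It assumes a CUDA 12 toolkit and a cluster-capable GPU (compute capability 9.0+); the kernel name and buffer are hypothetical, and nothing here is drawn from the application or from Herbert. Note that CUDA clusters group blocks directly, so this analogue flattens the claim's intermediate "groups of blocks" level.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the "kernel to be performed".
__global__ void scaleKernel(float *data, float factor)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    data[idx] *= factor;
}

int main()
{
    float *d_data;
    cudaMalloc(&d_data, 1024 * sizeof(float));

    // Grid of 8 blocks of 128 threads, grouped into clusters of 2 blocks.
    cudaLaunchConfig_t config = {};
    config.gridDim  = dim3(8, 1, 1);
    config.blockDim = dim3(128, 1, 1);

    cudaLaunchAttribute attr = {};
    attr.id = cudaLaunchAttributeClusterDimension;
    attr.val.clusterDim.x = 2;   // two thread blocks per cluster
    attr.val.clusterDim.y = 1;
    attr.val.clusterDim.z = 1;
    config.attrs    = &attr;
    config.numAttrs = 1;

    // The API call carries the cluster parameters and causes the kernel to
    // be scheduled with the blocks of each cluster co-scheduled.
    cudaLaunchKernelEx(&config, scaleKernel, d_data, 2.0f);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```

Because the cluster dimension is supplied as a launch attribute, the same compiled kernel can be scheduled with different cluster shapes from call to call.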
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 11, 20, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Herbert (US 20210326175 A1) in view of Kramer (US 20210287325 A1).

As per claim 2, Herbert fully discloses the limitations of claim 1, but does not disclose the generated kernel comprising multiple partitions of blocks. However, Kramer discloses: in response to the call to the API, generate the kernel to comprise multiple partitions of blocks of threads of a grid of threads, the two or more blocks being in one of the partitions ("In one implementation, a single-pass compute shader downsampling kernel is launched on compute unit 300. The kernel is partitioned into multiple thread groups, with each thread group including a plurality of threads.", 0039; Examiner Note: the kernel being partitioned into multiple groups equates to a kernel comprising multiple partitions of blocks).

It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the performance of a kernel by an API call of Herbert (0004) with the method of Kramer (0039) in order to provide a system capable of using an API to generate a kernel which can run blocks from separate partitions in parallel, thereby allowing for the parallel processing of threads from any block and increasing the versatility of the kernel's parallel processing.

As per claim 11, it is a computer-implemented method claim with substantially the same limitations as claim 2, and therefore it is rejected for substantially the same reasons.

As per claim 20, it is a computer system claim with substantially the same limitations as claim 2, provided by the computer system of claim 19, and therefore it is rejected for substantially the same reasons.

As per claim 29, it is a machine-readable medium claim with substantially the same limitations as claim 2, provided by the machine-readable medium of claim 28, and therefore it is rejected for substantially the same reasons.

Claims 3-4, 12-13, 21-22, and 30-31 are rejected under 35 U.S.C. 103 as being unpatentable over Herbert (US 20210326175 A1) in view of Perelygin (US 11080111 B1).
As per claim 3, Herbert fully discloses the limitations of claim 1, but does not disclose the API being called during compilation of a program to comprise the kernel. However, Perelygin discloses: the API is to be called during compiling of a program to comprise the kernel ("In at least one embodiment, source code 602 is compiled into a kernel 606 by a compiler 604. In at least one embodiment, a compiler 604 is a set of software instructions that, when executed, take as input one or more source code files 602 and translate said source code files 602 into executable code such as a kernel 606. In at least one embodiment, a compiler 604 links in supporting libraries implementing additional functionality such as parallel computing functionality provided by an API, such as CUDA or other parallel computing libraries described herein, during compilation of a kernel 606.", 0077).

It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the performance of a kernel by an API call causing two or more groups of thread blocks to be scheduled in parallel of Herbert (0004) with the method of Perelygin (0077) in order to provide a system capable of calling an API during compilation of a program from source code, thereby allowing the system to place the API calls sooner and consequently arrive at the compiled kernel running threads in parallel in a shorter amount of time.

As per claim 4, Herbert fully discloses the limitations of claim 1, but does not disclose the API being called during compilation of a kernel from source code. However, Perelygin discloses: the API is to be called during compilation of the kernel from source code (Perelygin (US 11080111 B1), "In at least one embodiment, source code 602 is compiled into a kernel 606 by a compiler 604. In at least one embodiment, a compiler 604 is a set of software instructions that, when executed, take as input one or more source code files 602 and translate said source code files 602 into executable code such as a kernel 606. In at least one embodiment, a compiler 604 links in supporting libraries implementing additional functionality such as parallel computing functionality provided by an API, such as CUDA or other parallel computing libraries described herein, during compilation of a kernel 606.", 0077).

As per claim 12, it is a computer-implemented method claim with substantially the same limitations as claim 3, and therefore it is rejected for substantially the same reasons.

As per claim 13, it is a computer-implemented method claim with substantially the same limitations as claim 4, and therefore it is rejected for substantially the same reasons.

As per claim 21, it is a computer system claim with substantially the same limitations as claim 3, provided by the computer system of claim 19, and therefore it is rejected for substantially the same reasons.

As per claim 22, it is a computer system claim with substantially the same limitations as claim 4, provided by the computer system of claim 19, and therefore it is rejected for substantially the same reasons.

As per claim 30, it is a machine-readable medium claim with substantially the same limitations as claim 3, provided by the machine-readable medium of claim 28, and therefore it is rejected for substantially the same reasons.

As per claim 31, it is a machine-readable medium claim with substantially the same limitations as claim 4, provided by the machine-readable medium of claim 28, and therefore it is rejected for substantially the same reasons.
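Claims 3-4 turn on when the API is invoked relative to compilation. As a hedged illustration only (not the applicant's API and not Perelygin's system), CUDA offers a compile-time analogue: a cluster shape can be baked into a kernel with the __cluster_dims__ attribute when nvcc compiles the program. The kernel and function names below are invented for the example.

```cuda
#include <cuda_runtime.h>

// Compile-time cluster shape fixed when nvcc compiles the program, loosely
// analogous to an API "called during compiling of a program to comprise the
// kernel". Requires compiling for a cluster-capable target (e.g. -arch=sm_90).
__global__ void __cluster_dims__(2, 1, 1) addOne(float *data)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    data[idx] += 1.0f;
}

// A kernel compiled this way is launched with an ordinary triple-chevron
// call; the cluster dimensions were already fixed at compile time, so the
// grid (8 blocks here) must be divisible by the 2-block cluster shape.
void launch(float *d_data)
{
    addOne<<<dim3(8, 1, 1), dim3(128, 1, 1)>>>(d_data);
}
```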
Claims 5, 14, 23, and 32 are rejected under 35 U.S.C. 103 as being unpatentable over Herbert (US 20210326175 A1) in view of Reed (US 20230350661 A1).

As per claim 5, Herbert fully discloses the limitations of claim 1, but does not disclose an API call causing a scheduling policy to be set. However, Reed discloses: in response to the call to the API, set a scheduling policy ("Additionally or alternatively, the operator also defines in the initial API call 301 or another API call a set of push schedule rules and subscriber exclusion rules that control when and to whom the dependency-aware rules engine 300 may push upgrades", 0041; Examiner Note: a set of schedule rules equates to a scheduling policy).

It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the performance of a kernel by an API call causing two or more groups of thread blocks to be scheduled in parallel of Herbert (0004) with the API call to set a scheduling policy of Reed (0041) in order to provide an improved system which allows the operator to define a schedule remotely (Reed, [0021]).

As per claim 14, it is a computer-implemented method claim with substantially the same limitations as claim 5, and therefore it is rejected for substantially the same reasons.

As per claim 23, it is a computer system claim with substantially the same limitations as claim 5, provided by the computer system of claim 19, and therefore it is rejected for substantially the same reasons.

As per claim 32, it is a machine-readable medium claim with substantially the same limitations as claim 5, provided by the machine-readable medium of claim 28, and therefore it is rejected for substantially the same reasons.

Claims 6, 15, 24, and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Herbert (US 20210326175 A1) in view of Kini (US 20230185706 A1).

As per claim 6, Herbert fully discloses the limitations of claim 1, but does not disclose the circuitry launching the kernel. However, Kini discloses: the circuitry is further to launch the kernel ("In at least one embodiment, control thread 204 performs an API to launch a kernel, using systems and methods such as those described herein", 0081).

It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the performance of a kernel by an API call causing two or more groups of thread blocks to be scheduled in parallel of Herbert (0004) with the method of Kini (0081) in order to provide an API capable of both generating and launching a kernel, thereby increasing the efficiency of the computing system.

As per claim 15, it is a computer-implemented method claim with substantially the same limitations as claim 6, and therefore it is rejected for substantially the same reasons.

As per claim 24, it is a computer system claim with substantially the same limitations as claim 6, provided by the computer system of claim 19, and therefore it is rejected for substantially the same reasons.

As per claim 33, it is a machine-readable medium claim with substantially the same limitations as claim 6, provided by the machine-readable medium of claim 28, and therefore it is rejected for substantially the same reasons.
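For context on the scheduling-policy limitation (claim 5) and the kernel-launch limitation (claim 6), here is a minimal sketch using standard CUDA runtime calls rather than the claimed API: a stream is created with an explicit priority, one common form of scheduling policy, and a kernel is then launched into that stream. Kernel and buffer names are invented for the example.

```cuda
#include <cuda_runtime.h>

__global__ void stepKernel(float *data)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    data[idx] += 1.0f;
}

int main()
{
    // Query the legal priority range for this device.
    int least = 0, greatest = 0;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);

    // Create a stream with the highest priority; numerically lower values
    // indicate higher scheduling priority in the CUDA runtime.
    cudaStream_t stream;
    cudaStreamCreateWithPriority(&stream, cudaStreamNonBlocking, greatest);

    float *d_data;
    cudaMalloc(&d_data, 1024 * sizeof(float));

    // Launch the kernel into the prioritized stream.
    stepKernel<<<8, 128, 0, stream>>>(d_data);
    cudaStreamSynchronize(stream);

    cudaFree(d_data);
    cudaStreamDestroy(stream);
    return 0;
}
```

Stream priority is only one possible scheduling policy; the point of the sketch is that the policy is supplied through an API parameter rather than being fixed in the kernel itself.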
Claims 7, 16, 25, and 34 are rejected under 35 U.S.C. 103 as being unpatentable over Herbert (US 20210326175 A1) in view of Powers (US 20160371081 A1).

As per claim 7, Herbert fully discloses the limitations of claim 1, but does not disclose the API causing the kernel to be performed by an accelerator. However, Powers discloses: the circuitry is further to, in response to the call to the API, cause the kernel to be performed by an accelerator ("On runtime computing system 12, the host software may use a DASH API to execute computational kernels, almost as if it were calling a regular function.", 0026; "For example, end user 15 can select processing units 12 (e.g., off-the-shelf components, such as GPU's, FPGA's, CPU's) that meet particular operational and environmental requirements, confident in the knowledge that execution of software 10, and particularly source code 8, will utilize the capabilities of the various hardware resources during execution of application source code 6 and computational source code 8 on runtime computing system 12.", 0021; Examiner Note: a GPU equates to an accelerator).

It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the performance of a kernel by an API call causing two or more groups of thread blocks to be scheduled in parallel of Herbert (0004) with the method of Powers (0081) in order to provide an API capable of both generating a GPU kernel and causing the GPU kernel to be performed, thereby increasing the efficiency of CPU-GPU interaction and leading to the kernel being performed on the GPU sooner than if a different API had to be called to cause the kernel to be performed.

As per claim 16, it is a computer-implemented method claim with substantially the same limitations as claim 7, and therefore it is rejected for substantially the same reasons.

As per claim 25, it is a computer system claim with substantially the same limitations as claim 7, provided by the computer system of claim 19, and therefore it is rejected for substantially the same reasons.

As per claim 34, it is a machine-readable medium claim with substantially the same limitations as claim 7, provided by the machine-readable medium of claim 28, and therefore it is rejected for substantially the same reasons.

Claims 8, 17, 26, and 35 are rejected under 35 U.S.C. 103 as being unpatentable over Herbert (US 20210326175 A1) in view of Munshi (CN 102099789 A) [published: 2011-06-15, translation: PE2E].

As per claim 8, Herbert fully discloses the limitations of claim 1, but does not disclose the API call setting a dimension of the group of threads. However, Munshi discloses: the circuitry is further to, in response to the call to the API, set one or more dimensions of the two or more groups of blocks of threads ("API request may include multi-dimensional value as the global thread number of the array N integers (G1, G2, ..., Gn). integer N is data parallel task-related dimension. processing 800 processing logic capable of counting the number of integer multidimensional value of the group to determine the dimension. in response to the API request, processing logic may process 800 according to a product of N integers G1 * G2 * ... * Gn, determining the total number of thread of executable code is performed between the multiple calculating units.", 0075).

Furthermore, Herbert discloses: two or more groups of blocks of threads ("utilizing thread sets that are groups of threads that define instances of the vertical pipelines, utilizing datapaths, each of which comprises a group of thread sets", 0004). The system of Munshi in view of Herbert would be capable of setting the dimensions of two or more groups of blocks of threads.

It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the performance of a kernel by an API call causing two or more groups of thread blocks to be scheduled in parallel of Herbert (0004) with the method of Munshi (0075) in order to provide an API which is capable of setting the dimensions of the group of blocks being processed, thereby providing the API with increased control over the kernel's execution and ensuring that the dimensions of the group of blocks are appropriate for the kernel in an efficient manner.

As per claim 17, it is a computer-implemented method claim with substantially the same limitations as claim 8, and therefore it is rejected for substantially the same reasons.

As per claim 26, it is a computer system claim with substantially the same limitations as claim 8, provided by the computer system of claim 19, and therefore it is rejected for substantially the same reasons.

As per claim 35, it is a machine-readable medium claim with substantially the same limitations as claim 8, provided by the machine-readable medium of claim 28, and therefore it is rejected for substantially the same reasons.
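The Munshi passage describes an API request carrying an N-dimensional global size whose product gives the total thread count. The same arithmetic expressed in CUDA terms, as a hypothetical illustration of the claim 8 notion of setting dimensions through an API call (the names and sizes below are invented for the example):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    // Caller-supplied multi-dimensional sizes: the dimensions of the group
    // of blocks (grid) and of each block of threads.
    dim3 gridDim(4, 2, 1);    // blocks per grid, per dimension
    dim3 blockDim(16, 8, 1);  // threads per block, per dimension

    // Total threads = product of all dimensions, the same computation the
    // Munshi excerpt describes for its N-integer global size.
    unsigned long long totalThreads =
        (unsigned long long)gridDim.x * gridDim.y * gridDim.z *
        blockDim.x * blockDim.y * blockDim.z;

    printf("total threads = %llu\n", totalThreads);  // 4*2*1 * 16*8*1 = 1024
    return 0;
}
```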
Claims 9, 18, 27, and 36 are rejected under 35 U.S.C. 103 as being unpatentable over Herbert (US 20210326175 A1) in view of Yang (CN 107357661 A) [published: 2017-11-17, translation: PE2E].

As per claim 9, Herbert fully discloses the limitations of claim 1, but does not disclose the API indicating a maximum number of blocks per group of blocks. However, Yang discloses: the circuitry is further to, in response to the call to the API, indicate a maximum number of blocks per group of blocks of threads ("In the step (3), according to the kernel functions of task, and task block size TaskBlockSizei, calculate The active threads number of blocks that can be accommodated on one GPU SM, what the process can be provided by CUDA CudaOccupancyMaxActiveBlocksPerMultiprocessor API can accommodate to calculate on each SM or CapSM Maximum activity thread number of blocks MaxActivePBlocki", Par. 34, contents of the invention).

The system of Herbert in view of Yang would be capable of indicating the maximum number of blocks per group in the API call which indicates the number of threads and compute instances.

It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the performance of a kernel by an API call causing two or more groups of thread blocks to be scheduled in parallel of Herbert (0004) with the method of Yang (Par. 34, contents of the invention) in order to provide an API which is capable of indicating a maximum number of blocks of threads per group, thereby preventing the kernel from ever attempting to process a group with an incompatibly large number of blocks and avoiding the potential errors associated with an excessively large group of blocks.

As per claim 18, it is a computer-implemented method claim with substantially the same limitations as claim 9, and therefore it is rejected for substantially the same reasons.

As per claim 27, it is a computer system claim with substantially the same limitations as claim 9, provided by the computer system of claim 19, and therefore it is rejected for substantially the same reasons.

As per claim 36, it is a machine-readable medium claim with substantially the same limitations as claim 9, provided by the machine-readable medium of claim 28, and therefore it is rejected for substantially the same reasons.
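Yang's paragraph 34 refers to the CUDA occupancy query by name. For reference, a minimal call to that standard API, which reports the maximum number of resident blocks of a given kernel per streaming multiprocessor; the kernel itself is a hypothetical stand-in.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical kernel used only as the subject of the occupancy query.
__global__ void workKernel(float *data)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    data[idx] *= 2.0f;
}

int main()
{
    int maxBlocksPerSM = 0;
    // Standard CUDA occupancy query named in the office action (via Yang).
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &maxBlocksPerSM,   // out: max active blocks per SM
        workKernel,        // kernel being queried
        128,               // threads per block
        0);                // dynamic shared memory per block, in bytes

    printf("max active blocks per SM: %d\n", maxBlocksPerSM);
    return 0;
}
```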
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Slesarenko (US 20200394202 A1) discloses an apparatus for extracting data from a database, comprising a data extraction kernel on the native side and an API which is used to write database data to a buffer.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROSS MICHAEL VINCENT, whose telephone number is (703) 756-1408. The examiner can normally be reached Mon-Fri 8:30 AM-5:30 PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, April Blair, can be reached at (571) 270-1014. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/R.M.V./ Examiner, Art Unit 2196
/HIREN P PATEL/ Primary Examiner, Art Unit 2196

Prosecution Timeline

Sep 28, 2022: Application Filed
Mar 21, 2025: Non-Final Rejection — §102, §103
Jun 18, 2025: Examiner Interview Summary
Jul 28, 2025: Response Filed
Sep 15, 2025: Final Rejection — §102, §103
Nov 12, 2025: Interview Requested
Nov 18, 2025: Examiner Interview Summary
Dec 18, 2025: Request for Continued Examination
Jan 06, 2026: Response after Non-Final Action
Jan 22, 2026: Non-Final Rejection — §102, §103
Mar 31, 2026: Interview Requested

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12530219: TIME-BOUND LIVE MIGRATION WITH MINIMAL STOP-AND-COPY
2y 5m to grant; granted Jan 20, 2026

Patent 12511158: TASK ALLOCATION METHOD, APPARATUS, ELECTRONIC DEVICE AND COMPUTER-READABLE STORAGE MEDIUM
2y 5m to grant; granted Dec 30, 2025

Patent 12493493: METHOD AND SYSTEM FOR ALLOCATING GRAPHICS PROCESSING UNIT PARTITIONS FOR A COMPUTER VISION ENVIRONMENT
2y 5m to grant; granted Dec 09, 2025

Patent 12481529: CONTROLLER FOR COMPUTING ENVIRONMENT FRAMEWORKS
2y 5m to grant; granted Nov 25, 2025

Patent 12430170: QUANTUM COMPUTING SERVICE WITH QUALITY OF SERVICE (QoS) ENFORCEMENT VIA OUT-OF-BAND PRIORITIZATION OF QUANTUM TASKS
2y 5m to grant; granted Sep 30, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 54%
With Interview: 90% (+35.9%)
Median Time to Grant: 3y 5m
PTA Risk: High
Based on 22 resolved cases by this examiner. Grant probability derived from career allow rate.
