Prosecution Insights
Last updated: April 19, 2026
Application No. 18/072,053

APPLICATION PROGRAMMING INTERFACE TO SYNCHRONIZE MATRIX MULTIPLY-ACCUMULATE MEMORY TRANSACTIONS

Final Rejection (§103, §112)
Filed: Nov 30, 2022
Examiner: SPANN, COURTNEY P
Art Unit: 2183
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Nvidia Corporation
OA Round: 4 (Final)
Grant Probability: 80% (Favorable)
OA Rounds: 5-6
To Grant: 2y 11m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 80% (206 granted / 258 resolved), above average; +24.8% vs TC avg
Interview Lift: +21.3% (strong), measured on resolved cases with interview
Typical Timeline: 2y 11m average prosecution; 21 currently pending
Career History: 279 total applications across all art units

Statute-Specific Performance

§101: 6.4% (-33.6% vs TC avg)
§103: 44.6% (+4.6% vs TC avg)
§102: 9.1% (-30.9% vs TC avg)
§112: 28.3% (-11.7% vs TC avg)
Tech Center (TC) averages are estimates. Based on career data from 258 resolved cases.

Office Action

DETAILED ACTION

Response to Amendment
This action is responsive to the amendment filed on 3/2/2026. Claims 1-6, 8-11 and 13-22 are pending and have been examined. Claims 1-6, 8-11, 13 and 15-20 have been amended. Claims 7 and 12 have been canceled. Claims 21-22 have been added.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretations
Claim 15 recites the following underlined contingent limitations: “…executing one or more instructions, in response to a prefetch call, indicating…” The contingent limitations use the language “in response to” and are contingent because they precede steps that are only required to be performed in response to a condition being met. For example, the steps of “executing one or more instructions” are only required to be performed if (e.g., “in response to”) a prefetch call is received. However, if a prefetch call does not occur, none of the following steps of executing one or more instructions are required to occur, based on the broadest reasonable interpretation given to contingent limitations in method claims (see MPEP 2111.04(II); Ex parte Schulhauser, Appeal 2013-007847 (PTAB April 28, 2016)). For purposes of examination, the examiner will provide a prior art rejection under the above broadest reasonable interpretation. Dependent claim 16 includes steps which are dependent upon the contingent limitation of claim 15; thus the steps of claim 16 are not required to occur. The examiner suggests amending the claim to remove the contingent limitations stating “in response to” and to positively recite each step of the method claim (e.g., receiving a call and, in response to the call, executing one or more instructions…).

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 15-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
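The claim interpretation above for claims 15 and 16 rests on a simple logical point: under MPEP 2111.04(II) and Ex parte Schulhauser, steps gated by a condition are not required when the condition is not met. The hypothetical Python model below treats the claimed method as a function whose gated steps run only when a prefetch call is received. It is a sketch for illustration and comes from neither the action nor the application; the step labels and the `claimed_method` and `required_steps_under_bri` names are assumptions. The steps performed regardless of the condition's outcome are the intersection across both outcomes, and that intersection is empty here.

```python
from dataclasses import dataclass, field


@dataclass
class MethodTrace:
    """Records which recited steps were actually performed (hypothetical model)."""
    steps: list[str] = field(default_factory=list)


def claimed_method(prefetch_call_received: bool) -> MethodTrace:
    """Hypothetical model of claims 15-16 as interpreted in the action.

    Every recited step is gated on the "in response to a prefetch call"
    condition, so none of them runs when no prefetch call is received.
    """
    trace = MethodTrace()
    if prefetch_call_received:
        # Claim 15: executing instructions that pause groups of threads
        trace.steps.append("execute instructions / pause thread groups")
        # Claim 16: each group performs a memory fence
        trace.steps.append("memory fence per group")
    return trace


def required_steps_under_bri(condition_outcomes):
    """Steps performed on every listed outcome of the condition, i.e. the
    steps that occur whether or not the condition is met (intersection)."""
    performed = [set(claimed_method(c).steps) for c in condition_outcomes]
    return set.intersection(*performed)


print(required_steps_under_bri([True, False]))  # set()
```

Because the "no prefetch call" path performs none of the gated steps, a reference only needs to meet the claim along that path. This is why the action treats claim 16's fence as not required.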
In regards to claim 15, the limitation stating “…executing one or more instructions, in response to receiving a prefetch call indicating data for one or more matrix multiply-accumulate (MMA) operations; pausing progress of a plurality of groups of threads until one or more MMA memory transactions storing data in memory accessible …” fails to comply with the written description requirement. The original disclosure does not describe a prefetch call in sufficient detail for one of ordinary skill in the art to reasonably conclude that the inventor had possession of the claimed invention. Specifically, paragraphs [0084-0087] disclose using function calls to implement components of element 308 with an API, wherein the elements indicate an arrive operation (element 310), an MMA operation (element 312), a commit (element 314) and a wait (element 316); however, none of the operations include a “prefetch call” as claimed. Rather, the function calls in paragraphs [0084-0087] and Figs. 3-7 are used to indicate threads arriving at a location and waiting until data is stored prior to executing MMA operations (e.g., the pausing progress of threads as claimed above). Furthermore, the specification does not mention prefetching outside of paragraph [0271], which mentions using an instruction prefetcher as known in the art. Thus, the specification does not provide sufficient support for a prefetch call. Additionally, the applicant’s disclosure contains no call used to indicate any prefetching as claimed. As known to one of ordinary skill in the art, prefetching data would require the data to be gathered early and would prevent the pausing of threads, which contradicts the current claim language. Claims 16-20 depend from claim 15 and are rejected on the same basis as claim 15 for including its deficiencies.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8-11 and 13-22 are rejected under 35 U.S.C. 103 as being unpatentable over Khailany (PGPUB No. 2014/0032828) in view of Gadre (PGPUB No. 2012/0198214), and further in view of Boswell (PGPUB No. 2018/0321938, cited in the PTO-892 filed on 1/12/2024).

In regards to claim 1, Khailany discloses one or more processors ([0009] and Fig. 4) comprising: circuitry to execute one or more instructions to: cause a thread to wait until the thread completes memory transactions that store data in memory ([0034-0036, 0043 and 0049]: wherein a memory fence instruction is executed by processor circuitry to cause a thread to wait until memory copy instruction data transfers (memory transactions) store data to a shared/local memory accessible by processor cores (see Fig. 4)); and, in response to the thread completing the memory transactions, cause the thread to perform matrix multiply operations using the data stored in the memory ([0034-0036, 0043 and 0049]: wherein, when the memory transfers are complete, the thread can perform matrix-vector multiply operations using the data stored in memory).

Khailany does not explicitly disclose one or more instructions to: cause a plurality of groups of threads to wait until the plurality of groups of threads complete memory transactions that store data in memory accessible to one or more matrix multiply-accumulate (MMA) accelerators; and, in response to the plurality of groups of threads completing the memory transactions, cause the plurality of groups of threads to perform MMA operations on the one or more MMA accelerators using the data stored in the memory. Khailany discloses a “membar” fence instruction that causes a single thread to wait until one or more matrix multiply memory transactions have been performed, and additionally discloses embodiments in which threads (i.e., a group of threads) may wish to wait on the completion of one or more matrix multiply memory transactions (Khailany [0039]). However, Khailany has not explicitly disclosed, in a single embodiment, a “membar” fence instruction that causes one or more groups of threads to wait until one or more memory transactions have been performed.
Gadre discloses an instruction to cause a plurality of groups of threads to wait until the plurality of groups of threads complete memory transactions that store data to memory ([0006, 0075-0085 and 0134]: wherein a membar.GL instruction can cause a plurality of cooperative thread arrays (groups of threads) to wait until memory transactions have been performed), in response to the plurality of groups of threads completing the memory transactions ([0006, 0075-0085 and 0134]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the membar fence instruction of Khailany to cause a plurality of groups of threads to wait until memory transactions have been performed, as do the membar fence instructions of Gadre. It would have been obvious to one of ordinary skill in the art because using a membar instruction that causes a plurality of groups of threads to wait enforces memory consistency as well as program and dependency ordering between threads of different CTAs (see Gadre [0072-0074]). Additionally, Gadre includes a membar instruction that can be used at various affinity levels (i.e., single thread, warp, CTA, global, etc.; see paragraphs [0070 and 0076]), and therefore modifying the membar instruction of Khailany to operate as the membar instruction of Gadre would add flexibility to the instruction of Khailany.

The combination of Khailany and Gadre does not explicitly disclose memory accessible to one or more matrix multiply-accumulate (MMA) accelerators, or causing the plurality of groups of threads to perform MMA operations on the one or more MMA accelerators using the data stored in the memory. Khailany discloses performing matrix multiplication using cores after the memory transactions complete, but discloses neither matrix multiply-accumulate (MMA) operations nor matrix multiply-accumulate accelerators.
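The difference between a fence that orders memory only for one thread or one CTA and Gadre's global-level MEMBAR can be modeled as a scope check. The hypothetical Python sketch below follows the action's reading of Gadre [0075-0089]: MEMBAR.CTA orders memory transactions with respect to threads in the same CTA, MEMBAR.GL with respect to all threads on the same PPU (across CTAs), and MEMBAR.SYS with respect to all threads and clients in the system. The `Scope`, `ThreadId` and `fence_orders_for` names are illustrative assumptions. They are not Gadre's interface or any vendor API.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    """Hypothetical fence scopes, following the action's reading of Gadre."""
    CTA = "cta"  # threads in the issuing thread's cooperative thread array
    GL = "gl"    # all threads on the same PPU, across CTAs
    SYS = "sys"  # all threads and clients in the system


@dataclass(frozen=True)
class ThreadId:
    ppu: int  # parallel processing unit (GPU) index
    cta: int  # CTA index on that PPU
    tid: int  # thread index within the CTA


def fence_orders_for(scope: Scope, issuer: ThreadId, observer: ThreadId) -> bool:
    """True if a fence of `scope` issued by `issuer` is modeled as ordering the
    issuer's prior memory transactions before its later ones, as seen by `observer`."""
    if scope is Scope.SYS:
        return True
    same_ppu = issuer.ppu == observer.ppu
    if scope is Scope.GL:
        return same_ppu
    return same_ppu and issuer.cta == observer.cta  # Scope.CTA


issuer = ThreadId(ppu=0, cta=0, tid=5)
other_cta = ThreadId(ppu=0, cta=1, tid=7)
for s in Scope:
    print(s.name, fence_orders_for(s, issuer, other_cta))
# CTA False
# GL True
# SYS True
```

For two threads on the same PPU but in different CTAs, only the GL and SYS scopes order the issuer's transactions for the observer. The action relies on this property for the "plurality of groups of threads" limitation.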
Boswell discloses memory accessible to one or more matrix multiply-accumulate (MMA) accelerators ([0097 and 0107]: wherein register file memory is accessible to cores, including the HMMA data path (see Figs. 4 and 9)), and causing the plurality of groups of threads to perform MMA operations on the one or more MMA accelerators using the data stored in the memory ([0106-0111]: wherein a plurality of groups of threads implemented in a SIMT architecture perform MMA operations on MMA accelerators (cores including the HMMA data path) using data stored in memory; see [0045 and 0051] for further details on thread groups and warps). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the processor processing a plurality of groups of threads, as in the combination of Khailany and Gadre, to include matrix multiply-accumulate hardware that allows groups of threads to perform matrix multiply-accumulate operations as taught in Boswell. It would have been obvious to one of ordinary skill in the art because it would have been the simple substitution of one known element (using a multi-threaded processor to perform matrix multiply-accumulate operations as taught in Boswell) for another (using a processor to perform generic matrix multiply operations using threads as taught in Khailany and Gadre) to obtain predictable results (performing matrix multiply-accumulate operations for a plurality of threads using matrix multiply-accumulate hardware) (MPEP 2143, Example B). Furthermore, it would have been obvious because using a core data path to accelerate matrix operations can improve processor efficiency (Boswell [0024, 0111 and 0150]).

Claim 9 is similarly rejected on the same basis as claim 1 above, as claim 9 is the system corresponding to the processor of claim 1 above. (Note: Khailany Fig. 4 discloses a system including one or more processors as claimed in claim 9.)

Claim 15 is similarly rejected on the same basis as claim 1 above, as claim 15 is the method corresponding to the processor of claim 1 above. (See the claim interpretation section above, which indicates that neither the “prefetch call” nor the “executing one or more instructions” of claim 15 is required under the broadest reasonable interpretation of the method claim.)

In regards to claim 2, the overall combination of Khailany, Gadre and Boswell discloses the one or more processors of claim 1 (see rejection of claim 1 above), wherein an instruction of the one or more instructions is a fence instruction (Khailany [0034-0035] | Gadre [0074-0076]).

In regards to claim 3, the overall combination of Khailany, Gadre and Boswell discloses the one or more processors of claim 1 (see rejection of claim 1 above), the circuitry to perform one or more memory ordering operations to cause the plurality of groups of threads to wait (Khailany [0034-0039]: wherein a fence instruction causes one or more memory ordering operations to be performed such that all prior memory copy instructions are performed before subsequent computation operations access the copied data from memory | Gadre [0076-0085, 0132 and 0134]: wherein the membar instruction causes memory ordering operations between memory transactions (reads and writes) of multiple thread groups such that all memory transactions prior to the membar instruction must be executed before subsequent memory transactions that come after the membar instruction). Claim 10 is similarly rejected on the same basis as claim 3 above, as claim 10 is the system corresponding to the processor of claim 3 above. Claim 18 is similarly rejected on the same basis as claim 3 above, as claim 18 is the method corresponding to the processor of claim 3 above.
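As mapped above, claim 1 recites a two-phase pattern. Groups of threads complete memory transactions that store operands in memory, all groups wait until those transactions finish, and then the groups perform MMA operations D = A × B + C on the stored data. The following is a host-side Python analogy under stated assumptions: each Python thread stands in for a group of threads, a `threading.Barrier` stands in for the fence-and-wait step, and a nested-loop multiply-accumulate stands in for the MMA accelerator. It is a sketch only. It is not the application's API, not any reference's implementation, and not GPU code; `run_groups` is a hypothetical name.

```python
import threading


def run_groups(a, b, c, num_groups):
    """Compute D = A @ B + C with `num_groups` worker threads in two phases.

    Phase 1: each group copies a strided share of the rows of A and B into
             shared buffers (stand-in for the memory transactions).
    Wait:    every group blocks at a barrier until all copies are done.
    Phase 2: each group computes its rows of D from the shared buffers
             (stand-in for the MMA operations).
    """
    n, k, m = len(a), len(b), len(b[0])  # A is n x k, B is k x m
    shared_a = [None] * n
    shared_b = [None] * k
    d = [[0] * m for _ in range(n)]
    barrier = threading.Barrier(num_groups)

    def group(g):
        # Phase 1: copy rows g, g + num_groups, ... of A and B.
        for i in range(g, n, num_groups):
            shared_a[i] = list(a[i])
        for r in range(g, k, num_groups):
            shared_b[r] = list(b[r])
        # Wait: no group starts the MMA until every group's copies are done.
        barrier.wait()
        # Phase 2: multiply-accumulate for the rows this group owns. It reads
        # rows of B copied by other groups, which is why the wait is needed.
        for i in range(g, n, num_groups):
            for j in range(m):
                acc = c[i][j]
                for p in range(k):
                    acc += shared_a[i][p] * shared_b[p][j]
                d[i][j] = acc

    workers = [threading.Thread(target=group, args=(g,)) for g in range(num_groups)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return d


print(run_groups([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[1, 1], [1, 1]], num_groups=2))
# [[20, 23], [44, 51]]
```

On a GPU, the ordering (fence) and the blocking (wait) are typically separate mechanisms. The barrier combines both here only to keep the analogy short.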
In regards to claim 4, the overall combination of Khailany, Gadre and Boswell discloses the one or more processors of claim 1 (see rejection of claim 1 above), wherein completing the memory transactions synchronizes the plurality of groups of threads dependent on data copied as a result of the memory transactions (Khailany [0034-0036 and 0043]: wherein a fence instruction causes a thread to synchronize, wherein the thread comprises matrix multiply operations or other computation operations dependent on data copied as a result of memory copy instructions; note that Gadre [0059, 0075-0085, and 0093] discloses the membar instructions synchronizing a plurality of groups of threads, and therefore the combination of references discloses synchronizing thread groups). Claim 11 is similarly rejected on the same basis as claim 4 above, as claim 11 is the system corresponding to the processor of claim 4 above. Claim 17 is similarly rejected on the same basis as claim 4 above, as claim 17 is the method corresponding to the processor of claim 4 above.

In regards to claim 5, the overall combination of Khailany, Gadre and Boswell discloses the one or more processors of claim 1 (see rejection of claim 1 above), wherein the memory transactions are to cause one or more operands for the MMA operations to be copied to a memory (Khailany [0012-0014, 0027, 0032 and 0043]: wherein the memory copy transactions for matrix multiply operations cause one or more operands of the matrix multiply operations to be copied to a local memory; note that Figs. 1 and 9 of Boswell disclose the explicit MMA operations and operands, and the overall combination of references discloses the above limitation). The overall combination of Khailany, Gadre and Boswell thus far does not disclose copying operands to one or more registers of a register file. Khailany discloses copying operands to a location in local memory, such as on-chip memory, SRAM or shared memory.
However, Khailany does not disclose that a local memory can include one or more registers of a register file. Boswell discloses one or more registers of a register file (see Fig. 4, element 420). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the local memory of Khailany to include a register file as taught in Boswell. It would have been obvious to one of ordinary skill in the art because it would have been the simple substitution of one known element (using a register file as a memory as taught in Boswell) for another (using a generic local memory as taught in Khailany) to obtain predictable results (using a register file as a local memory to store copied operand data), with the benefit of using a register file, which is fast memory, for storage (MPEP 2143, Example B). Claim 19 is similarly rejected on the same basis as claim 5 above, as claim 19 is the method corresponding to the processor of claim 5 above.

In regards to claim 6, the overall combination of Khailany, Gadre and Boswell discloses the one or more processors of claim 1 (see rejection of claim 1 above), wherein completing the memory transactions causes one or more operands for the MMA operations to be copied to the memory, wherein the memory is shared memory for a streaming multiprocessor (Khailany [0012-0014, 0027, 0032 and 0043]: wherein the memory copy transactions for matrix multiply operations cause one or more operands of the matrix multiply operations to be copied to shared memory for a streaming multiprocessor; note that Fig. 1 of Boswell discloses the explicit MMA operations and operands, and the overall combination of references discloses the above limitation). Claim 20 is similarly rejected on the same basis as claim 6 above, as claim 20 is the method corresponding to the processor of claim 6 above.
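Claims 5 and 6 concern where MMA operands land before the multiply: a register file or shared memory. A common GPU pattern stages a tile of each operand from shared memory into per-thread registers and accumulates tile products, loosely mirroring the "matrix fragments" recited in claim 21. The hypothetical pure-Python sketch below illustrates that tiled multiply-accumulate. It assumes square tiles and matrix dimensions that are multiples of the tile size. `load_fragment` models the shared-memory-to-register copy, and `mma_fragment` models one fragment-level MMA step. The sketch does not reproduce Boswell's HMMA data path or any real fragment layout.

```python
def load_fragment(shared, row0, col0, size):
    """Copy a size x size tile out of a shared buffer into a local 'register' fragment."""
    return [row[col0:col0 + size] for row in shared[row0:row0 + size]]


def mma_fragment(acc, a_frag, b_frag):
    """One fragment-level MMA step: acc += a_frag @ b_frag, in place."""
    size = len(acc)
    for i in range(size):
        for j in range(size):
            acc[i][j] += sum(a_frag[i][p] * b_frag[p][j] for p in range(size))


def tiled_mma(shared_a, shared_b, c, tile):
    """D = A @ B + C computed tile by tile from operands already in 'shared memory'."""
    n, k, m = len(shared_a), len(shared_b), len(shared_b[0])
    if n % tile or k % tile or m % tile:
        raise ValueError("matrix dimensions must be multiples of the tile size")
    d = [row[:] for row in c]  # copy so that C is not modified
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            acc = load_fragment(d, i0, j0, tile)  # accumulator starts as the C tile
            for p0 in range(0, k, tile):
                a_frag = load_fragment(shared_a, i0, p0, tile)
                b_frag = load_fragment(shared_b, p0, j0, tile)
                mma_fragment(acc, a_frag, b_frag)
            for i in range(tile):
                d[i0 + i][j0:j0 + tile] = acc[i]  # write the finished tile back
    return d


identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
b = [[i * 4 + j for j in range(4)] for i in range(4)]
zeros = [[0] * 4 for _ in range(4)]
print(tiled_mma(identity, b, zeros, tile=2) == b)  # True
```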
In regards to claim 8, the overall combination of Khailany, Gadre and Boswell discloses the one or more processors of claim 1 (see rejection of claim 1 above), wherein each processor of the one or more processors is a graphics processing unit (GPU) (Khailany [0009 and 0049] | Gadre [0030] | Boswell [0029]).

In regards to claim 13, the overall combination of Khailany, Gadre and Boswell discloses the system of claim 9 (see rejection of claim 9 above), the system to fence the memory transactions to cause the plurality of groups of threads to wait (Khailany [0034-0036 and 0043]: wherein the memory fence instruction causes the processor to fence the memory transactions performed by the memory copy instructions for matrix multiply instructions (note: Fig. 1 of Boswell discloses the explicit MMA operations and operands, and the overall combination of references discloses the above limitation) | Gadre [0074-0085]: wherein the membar.GL instruction fences memory transactions of a plurality of groups of threads).

In regards to claim 14, the overall combination of Khailany, Gadre and Boswell discloses the system of claim 9 (see rejection of claim 9 above), wherein the one or more processors are streaming multiprocessors (SMs) of one or more graphics processing units (GPUs) (Khailany [0043] | Gadre [0030 and 0046] | Boswell [0029 and 0040]).

In regards to claim 16, the overall combination of Khailany, Gadre and Boswell discloses the computer-implemented method of claim 15 (see rejection of claim 15 above), wherein executing the one or more instructions further comprises performing, by each of the plurality of groups of threads, a memory fence of the memory transactions. (The executing of instructions is contingent upon a prefetch call in claim 15; because the claim is rejected under a contingent limitation interpretation in which no prefetch call occurs, the performing of a memory fence is not required.)

In regards to claim 21, the overall combination of Khailany, Gadre and Boswell discloses the one or more processors of claim 1 (see rejection of claim 1 above), wherein the MMA operations are performed on matrix fragments stored in the memory accessible to the one or more MMA accelerators (Boswell [0084-0091 and 0104] and Figs. 7 and 9).

In regards to claim 22, the overall combination of Khailany, Gadre and Boswell discloses the system of claim 9 (see rejection of claim 9 above), wherein the plurality of groups of threads are further grouped into groups of warps (Gadre [0048 and 0066] | Boswell [0045, 0051, 0053 and 0107-0111]) and each warp in the same group performs the same MMA operation (Boswell [0045, 0051, 0053 and 0107-0111]).

Response to Arguments
Applicant’s arguments (see pages 6-7 of the remarks filed on 3/2/2026) with respect to 35 USC 112(b) have been fully considered and are persuasive. The previous 35 USC 112(b) rejections have been withdrawn. Applicant’s arguments (see pages 8-11 of the remarks filed on 3/2/2026) with respect to the rejections of claims 1, 9 and 15 under 35 USC 103 have been fully considered and are partially persuasive. Therefore, the rejections have been withdrawn.
However, upon further consideration, a new ground of rejection is made in view of Khailany, Gadre and Boswell under 35 USC 103 for claims 1, 9 and 15. Additionally, a new ground of rejection under 35 USC 112(a) is made for claim 15. Claims 2-6, 8, 10-14 and 16-22 depend from one of independent claims 1, 9 and 15 and are argued at least based upon their respective dependencies. Therefore, dependent claims 2-6, 8, 10-14 and 16-22 remain rejected at least based on their respective dependencies.

Regarding the previously cited Gadre reference in the 35 USC 103 rejection, applicant argues in substance, on pages 8-9 of the remarks, that:
“The secondary reference, Gadre, does not cure the deficiency of Khailany with respect to the amended claim. Gadre describes "a memory barrier (MEMBAR) instruction [that] is used to ensure that all memory transactions issued before the MEMBAR instruction are sufficiently performed so that their results are visible to any memory transactions issued after the MEMBAR instruction." Gadre at [0067]. This includes "multiple levels of MEMBAR instructions that differ in the scope of other threads that are affected," including "enforc[ing] memory ordering among threads in the [cooperative thread array] CTA." Id. at [0075]. The disclosure of the Gadre reference is still limited to threads in a cooperative thread array, i.e., a single group of threads. Therefore, Gadre does not disclose the "plurality of groups of threads" and the "MMA accelerators," to disclose "in response to the plurality of groups of threads completing the memory transactions, cause the plurality of groups of threads to perform the MMA operations on the one or more MMA accelerators using the data stored in the memory," as in the amended claim.”

The examiner respectfully disagrees with applicant’s assertion, argued above, that Gadre does not disclose a memory barrier instruction that enforces memory ordering between a plurality of thread groups.
For example, Gadre discloses MEMBAR.GL and MEMBAR.SYS instructions, which wait until all prior memory requests are complete with respect to all other threads in a PPU and with respect to all threads and clients of the system, respectively (see paragraphs [0083-0089]). In particular, Gadre [0083-0085] discloses that MEMBAR.GL is used for communication between threads in different CTAs; thus, the global-level membar instruction causes threads of different thread groups (e.g., multiple CTAs [0006], or a plurality of thread groups) to wait.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to COURTNEY P SPANN whose telephone number is (571)431-0692. The examiner can normally be reached M-F, 9am-6pm, EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /COURTNEY P SPANN/Primary Examiner, Art Unit 2183

Prosecution Timeline

Nov 30, 2022: Application Filed
Jan 08, 2024: Non-Final Rejection (§103, §112)
Apr 02, 2024: Interview Requested
Apr 08, 2024: Applicant Interview (Telephonic)
Apr 08, 2024: Examiner Interview Summary
Jun 12, 2024: Response Filed
Jun 29, 2024: Final Rejection (§103, §112)
Aug 07, 2024: Interview Requested
Sep 11, 2024: Examiner Interview Summary
Sep 11, 2024: Applicant Interview (Telephonic)
Jan 06, 2025: Notice of Allowance
Jun 06, 2025: Request for Continued Examination
Jun 11, 2025: Response after Non-Final Action
Sep 03, 2025: Non-Final Rejection (§103, §112)
Dec 08, 2025: Interview Requested
Dec 15, 2025: Applicant Interview (Telephonic)
Dec 18, 2025: Examiner Interview Summary
Mar 02, 2026: Response Filed
Mar 16, 2026: Final Rejection (§103, §112) (current)

Precedent Cases

Applications granted by this same examiner with similar technology (based on the 5 most recent grants):

Patent 12596550: Dual-Mode Floating Point Processor Operation (2y 5m to grant; granted Apr 07, 2026)
Patent 12585468: APPARATUS AND METHOD USING HINT CAPABILITY FOR CONTROLLING MICRO-ARCHITECTURAL CONTROL FUNCTION (2y 5m to grant; granted Mar 24, 2026)
Patent 12572362: PROCESSOR AND METHOD FOR EXECUTING A LOOPING CODE SEGMENT WITH ZERO OVERHEAD (2y 5m to grant; granted Mar 10, 2026)
Patent 12566609: MICROPROCESSOR WITH APPARATUS AND METHOD FOR HANDLING OF INSTRUCTIONS WITH LONG THROUGHPUT (2y 5m to grant; granted Mar 03, 2026)
Patent 12566724: SEQUENTIAL PROCESSING METHOD AND APPARATUS OF DATA PACKET (2y 5m to grant; granted Mar 03, 2026)

Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 80%
With Interview: 99% (+21.3%)
Median Time to Grant: 2y 11m
PTA Risk: High
Based on 258 resolved cases by this examiner. Grant probability derived from career allow rate.
