Last updated: April 19, 2026
Application No. 16/824,457
TECHNIQUES FOR ORCHESTRATING STAGES OF THREAD SYNCHRONIZATION

Non-Final OA: §101, §103, §112
Filed: Mar 19, 2020
Examiner: YUN, CARINA
Art Unit: 2194
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Nvidia Corporation
OA Round: 5 (Non-Final)
This examiner grants 50% of cases after interview (+33.5% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 322 resolved cases, 2023–2026
Examiner Intelligence

YUN, CARINA
Career Allow Rate: 50% of resolved cases (160 granted / 322 resolved), -5.3% vs TC avg
Interview Lift: +33.5% (resolved cases with interview)
Avg Prosecution (typical timeline): 4y 7m; 25 currently pending
Total Applications (career history): 347 across all art units
Statute-Specific Performance

§101: 17.8% (-22.2% vs TC avg)
§103: 47.5% (+7.5% vs TC avg)
§102: 8.6% (-31.4% vs TC avg)
§112: 21.4% (-18.6% vs TC avg)
Based on career data from 322 resolved cases
Office Action

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Examiner Notes
Examiner cites particular columns and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-39 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claims 1, 11, 20, and 30 recite “information indicative of how many of the plurality of threads have completed performance,” and the examiner cannot find any specific mention of such language in the specification, nor anything in the specification that comes close to supporting it. Applicant should point to support by quotations and paragraph numbers, or remove the subject matter.
Dependent claims 2-10, 12-19, 21-29, and 31-39 have similar issues and are rejected based on dependency.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-10, 20-39 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites “when executed,” and it is not clear whether these instructions are actually executed. Dependent claims 2-10 are rejected based on dependency.
Claim 1 recites “to provide” but does not say what receives the indication, so it is unclear what the verb requires in this context. Dependent claims 2-10 are rejected based on dependency.
Claims 1, 11, 20, and 30 recite “the plurality of threads have completed performance,” and it is not clear what it means for threads to have completed performance, as threads are normally executed.
Dependent claims 2-10, 12-19, 21-29, and 31-39 have similar issues and are rejected based on dependency.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-39 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: Regarding claim 1 this part of the eligibility analysis evaluates whether the claim falls within any statutory category. MPEP §2106.03. The claim recites a medium; thus, the claim is directed to a machine which is one of the statutory categories of invention.
Step 2A Prong 1: This part of the eligibility analysis evaluates whether the claim recites a judicial exception. As explained in MPEP 2106.04(II) and the October 2019 Update, a claim “recites” a judicial exception when the judicial exception is “set forth” or “described” in the claim. 
The limitations “information indicative of how many of the plurality of threads have completed performance of one or more instructions on data shared by the plurality of threads” as drafted, recite functions that, under its broadest reasonable interpretation, covers functions that could reasonably be performed in the mind, including with the aid of pen and paper, but for the recitation of generic computer components. That is, the limitations as drafted, are functions that, under its broadest reasonable interpretation, recite the abstract idea of a mental process. The limitations encompass a human mind carrying out the functions through observation, evaluation, judgment and/or opinion, or even with the aid of pen and paper.  Thus, these limitations recite and fall within the “Mental Processes” grouping of abstract ideas. See MPEP §2106.04(a)(2). Accordingly, claim 1 recites a judicial exception (i.e. an abstract idea).
Step 2A, Prong 2, This part of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception into a practical application of the exception. This evaluation is performed by (a) identifying whether there are any additional elements recited in the claim beyond the judicial exception, and (b) evaluating those additional elements individually and in combination to determine whether the claim as a whole integrates the exception into a practical application. 2019 PEG Section III(A)(2), 84 Fed. Reg. at 54-55.
In this case, the judicial exception is not integrated into a practical application. The claim recites the following additional elements: “computer readable medium having stored thereon a set of instructions that when executed, at least one or more processors,” “API,” and “plurality of threads.” These are recited at a high level of generality (i.e., general medium, processor, API, threads) such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, and the claim is therefore directed to the judicial exception. See MPEP 2106.05(f).
The additional elements “in response to a call to the API by a thread” and “cause the one or more processor to provide, to the thread” are at best the equivalent of merely adding the words “apply it” to the judicial exception. Accordingly, the additional elements do not integrate the recited judicial exception into a practical application, and the claim is therefore directed to the judicial exception. See MPEP 2106.05(f).
Step 2B, This part of the eligibility analysis evaluates whether the claim as a whole amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim. MPEP 2106.05.  
As discussed above with respect to integration of the abstract idea into a practical application, the additional elements “in response to a call to the API, cause the one or more processor to provide, to a caller of the API,” “computer readable medium having stored thereon a set of instructions that when executed, at least one or more processors,” “API,” and “plurality of threads” are merely a generic computer or generic computer components used to apply the judicial exception, which cannot provide an inventive concept.
The additional elements “in response to a call to the API by a thread” and “cause the one or more processor to provide, to the thread” are at best the equivalent of merely adding the words “apply it” to the judicial exception and are not sufficient to amount to significantly more than the judicial exception. See MPEP 2106.05(f).
Accordingly, the additional elements mentioned above are not sufficient to amount to significantly more than the judicial exception, and the claims are therefore directed to the judicial exception. Thus, the claims do not appear to be patent eligible under 35 USC 101.

Regarding Claim 2, is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “a first thread of the plurality of threads modifies a first shared data value; a second thread of the plurality of threads modifies a second shared data value; the first thread indicates a first arrival to the API when the first thread has completed modifying the first shared data value; the second thread indicates a second arrival to the API when the second thread has completed modifying the second shared data value; and the first thread or the second thread is selected by one or more threads of the plurality of threads to perform a set of operations that depend on the first shared data value and the second shared data value, the selection based, at least in part, on the order of arrival comprising the first arrival and the second arrival indicated by the information indicative of how many of the plurality of threads have completed performance of one or more instructions on data shared by the plurality of threads,” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
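For readers following the claim language, the arrangement quoted from claim 2 can be illustrated with a minimal Python sketch. It is not taken from the application or from any Nvidia API: `ArrivalTracker`, `arrive`, and `run_claim2_sketch` are hypothetical names, and ordinary host threads stand in for the claimed plurality of threads. Each thread modifies its own shared value, indicates its arrival, and the thread whose arrival completes the set is the one selected to perform the dependent operations.

```python
import threading


class ArrivalTracker:
    """Hypothetical stand-in for the claimed API: records arrivals in order."""

    def __init__(self, expected):
        self._lock = threading.Lock()
        self._expected = expected
        self._arrived = 0

    def arrive(self):
        """Record one arrival; return (arrival_index, is_last)."""
        with self._lock:
            index = self._arrived
            self._arrived += 1
            return index, self._arrived == self._expected


def run_claim2_sketch():
    shared = {"first": 0, "second": 0}
    tracker = ArrivalTracker(expected=2)
    result = {}

    def worker(key, value):
        shared[key] = value                  # modify this thread's shared data value
        index, is_last = tracker.arrive()    # indicate arrival to the tracker
        if is_last:
            # Only the final arriver knows both shared values are complete,
            # so it performs the operations that depend on both.
            result["sum"] = shared["first"] + shared["second"]
            result["selected_index"] = index

    threads = [
        threading.Thread(target=worker, args=("first", 3)),
        threading.Thread(target=worker, args=("second", 4)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

Whichever thread happens to arrive second performs the dependent work, so the selection follows the order of arrival rather than a fixed thread identity.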
Regarding claim 3, is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the first thread and the second thread complete modifying the first shared data value and the second shared data value when the first thread and the second thread do not contain additional instructions that depend on the first shared data value and the second shared data value.” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).

Regarding claim 4, is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the first thread and the second thread indicate to the API by performing a function call” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).

Regarding claim 5, is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the first thread or the second thread is selected by receiving, from the one or more processors an order, indicated by the information indicative of how many of the plurality of threads have completed performance of one or more instructions on data shared by the plurality of threads, and determining if the first thread or the second thread has a lowest value in the order” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).

Regarding claim 6, is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the set of operations that depend on the first shared data value and the second shared data value are an epilogue” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).

Regarding claim 7, is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the set of instructions when executed by the one or more processors, facilitates parallel computations performed by each thread of the plurality of threads” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).

Regarding claim 8, is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein each thread of the plurality of threads is synchronized by waiting for one or more arrivals indicated by the information indicative of how many of the plurality of threads have completed performance of one or more instructions on data shared by the plurality of threads” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
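The synchronization quoted from claim 8, in which each thread waits for the arrivals of the others, might be sketched as a single-use split barrier where arriving and waiting are separate calls. This is an illustration under stated assumptions only: `ArriveWaitBarrier` and `run_wait_sketch` are hypothetical names, and the sketch uses Python's standard `threading.Condition` rather than anything described in the application.

```python
import threading


class ArriveWaitBarrier:
    """Hypothetical single-use barrier with separate arrive() and wait() calls."""

    def __init__(self, expected):
        self._cond = threading.Condition()
        self._expected = expected
        self._arrived = 0

    def arrive(self):
        """Record an arrival and return how many threads have arrived so far."""
        with self._cond:
            self._arrived += 1
            if self._arrived == self._expected:
                self._cond.notify_all()      # wake every thread blocked in wait()
            return self._arrived

    def wait(self, timeout=None):
        """Block until all expected arrivals have been recorded."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._arrived >= self._expected, timeout
            )


def run_wait_sketch(n=4):
    barrier = ArriveWaitBarrier(n)
    partial = [0] * n
    totals = [None] * n

    def worker(i):
        partial[i] = i + 1       # work on data shared by the threads
        barrier.arrive()         # signal that this thread's shared writes are done
        # Work that does not depend on other threads could run here.
        barrier.wait()           # synchronize on the remaining arrivals
        totals[i] = sum(partial)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

Separating `arrive` from `wait` lets a thread overlap independent work with the time it would otherwise spend blocked, which is the usual motivation for this style of barrier.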

Regarding claim 9, is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein a first thread among the plurality of threads is selected to perform a set of preprocessing instructions based, at least in part on if the first thread is available to perform the set of preprocessing instructions before other threads of the plurality of threads” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 10, is a dependent claim rejected for the same reasons as claim 1. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein if the first thread is not available to perform the set of preprocessing instructions before other threads of the plurality of threads, a second thread among the group of threads is selected to perform the set of preprocessing instructions” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
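The selection quoted from claims 9 and 10, where whichever thread is available first performs the preprocessing and another thread takes over if the first is not available, could look roughly like the sketch below. The names `PrologueElection`, `try_claim`, and `run_prologue_sketch` are hypothetical, and the lock-plus-event mechanism is an assumption made for illustration, not the mechanism disclosed in the application.

```python
import threading


class PrologueElection:
    """Hypothetical helper: the first thread to ask runs the preprocessing."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claimed = False
        self.ready = threading.Event()

    def try_claim(self):
        """Return True for exactly one caller: the first one to get here."""
        with self._lock:
            if self._claimed:
                return False
            self._claimed = True
            return True


def run_prologue_sketch(n=4):
    election = PrologueElection()
    shared = {}
    ran_by = []
    outputs = [None] * n

    def worker(i):
        if election.try_claim():
            # This thread was available first, so it prepares the shared data.
            shared["table"] = [x * x for x in range(n)]
            ran_by.append(i)
            election.ready.set()
        else:
            # Some other thread was first; wait until its preprocessing is done.
            election.ready.wait()
        outputs[i] = shared["table"][i]

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ran_by, outputs
```

No thread is designated in advance: if the thread one might expect to go first is delayed, a different thread claims the preprocessing instead.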
Regarding claim 11, is rejected for the same reasons as claim 1. 
Regarding claim 12, is a dependent claim rejected for the same reasons as claim 11. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “determining if the thread is last in an order of arrival based, at least in part, on the information indicative of how many of the group of threads have completed performance of one or more instructions on data shared by the group of threads; selecting the thread from the group of threads to perform a set of instructions that depend on one or more shared data values, the selection based, at least in part, on if the thread is last in the order of arrival; performing the set of instructions if the thread is selected; and performing, if the thread is not selected, the set of instructions by a second thread from the group of threads that is last in the order of arrival” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 13, is a dependent claim rejected for the same reasons as claim 11. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the set of instructions that depend on the one or more shared data values is an epilogue” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 14, is a dependent claim rejected for the same reasons as claim 11. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein determining that the thread is last in the order of arrival based, at least in part, on a count value included in the information, the count value indicating how many of the threads have not completed performance of the one or more instructions” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
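The count value quoted from claim 14, which indicates how many threads have not yet completed, can be sketched as a countdown in which the thread that brings the count to zero is last in the order of arrival and runs the dependent instructions. `PendingCounter` and `run_countdown_sketch` are hypothetical names chosen for this illustration; the application's actual data layout is not shown in this office action.

```python
import threading


class PendingCounter:
    """Hypothetical counter of threads that have NOT yet completed."""

    def __init__(self, expected):
        self._lock = threading.Lock()
        self._pending = expected

    def arrive(self):
        """Record a completion; return how many threads are still pending."""
        with self._lock:
            self._pending -= 1
            return self._pending

    def pending(self):
        with self._lock:
            return self._pending


def run_countdown_sketch(n=8):
    counter = PendingCounter(n)
    data = [0] * n
    epilogue_runs = []

    def worker(i):
        data[i] = i * 2                  # modify a shared data item
        if counter.arrive() == 0:
            # A remaining count of zero means this thread arrived last,
            # so every element of data is final and the epilogue can run.
            epilogue_runs.append(sum(data))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return epilogue_runs, counter.pending()
```

Because exactly one call to `arrive` can observe zero, the epilogue runs once regardless of how the threads are scheduled.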
Regarding claim 15, is a dependent claim rejected for the same reasons as claim 11. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the order of arrival is determined based on when each thread of the group of threads has completed modifying one or more shared data items” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 16, is a dependent claim rejected for the same reasons as claim 11. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein each thread of the group of threads has completed modifying one or more shared data items when it does not contain additional modifications required by the set of instructions” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 17, is a dependent claim rejected for the same reasons as claim 11. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the thread is selected to perform one or more operations to prepare shared data if the thread is first among the group of threads to be available to perform the one or more operations to prepare shared data” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 18, is a dependent claim rejected for the same reasons as claim 11. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the API provides one or more function calls that facilitate parallel computations performed by each thread of the group of threads” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 19, is a dependent claim rejected for the same reasons as claim 11. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein each thread of the group of threads is synchronized by waiting for one or more arrivals indicated in the order of arrival” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 20, the claim is rejected for the same reasons as claim 1. In particular, the claim recites two additional elements: a processor and one or more circuits. The processor and circuits are recited at a high level of generality (i.e., as a generic processor and generic circuit) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Regarding claim 21, is a dependent claim rejected for the same reasons as claim 20. Furthermore, the claims do not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “an arrival index of a thread of the plurality of threads is determined based on an order indicated by the information indicative of how many of the plurality of threads have completed performance of one or more instructions on data shared by the plurality of threads; if the arrival index of the thread is last in the order, the thread performs a set of instructions that depend on one or more shared data values; and if the arrival index of the thread is not last in the order, a second thread of the plurality of threads is determined based on the order to perform the set of instructions” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instruction; these additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 22, the claim is a dependent claim rejected for the same reasons as claim 20. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the arrival index is determined based, at least in part, on a count value of the information indicative of how many of the plurality of threads have completed performance of one or more instructions on data shared by the plurality of threads” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 23, the claim is a dependent claim rejected for the same reasons as claim 20. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the second thread performs the set of instructions if it is last in the order” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 24, the claim is a dependent claim rejected for the same reasons as claim 20. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the set of instructions that depend on the one or more shared data values is an epilogue” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 25, the claim is a dependent claim rejected for the same reasons as claim 20. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the order is determined based on when, during execution, each thread of the plurality of threads has completed modifying the one or more shared data values” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 26, the claim is a dependent claim rejected for the same reasons as claim 20. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein each thread of the plurality of threads has completed modifying the one or more shared data values when it does not contain additional modifications to the one or more shared data values required by the set of instructions” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 27, the claim is a dependent claim rejected for the same reasons as claim 20. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein each thread of the plurality of threads is synchronized by waiting for one or more arrivals indicated by the information indicative of how many of the plurality of threads have completed performance of one or more instructions on data shared by the plurality of threads” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 28, the claim is a dependent claim rejected for the same reasons as claim 20. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the thread is selected to perform one or more operations to prepare one or more shared data items if the thread is first among the plurality of threads to be available to perform the one or more operations to prepare the one or more shared data items” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 29, the claim is a dependent claim rejected for the same reasons as claim 20. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the API provides one or more function calls that facilitate parallel computations performed by each of the one or more of the plurality of threads” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 30, the claim is rejected for the same reasons as claim 1. In particular, the claim recites the additional element of “one or more circuits.” The circuits are recited at a high level of generality (i.e., as a generic circuit) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Regarding claim 31, the claim is a dependent claim rejected for the same reasons as claim 30. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “a first thread of the group of threads modifies a first shared data value; a second thread of the group of threads modifies a second shared data value; the first thread indicates a first arrival to the API when the first thread has completed modifying the first shared data value; the second thread indicates a second arrival to the API when the second thread has completed modifying the second shared data value; and the first thread or the second thread is selected by one or more threads of the group of threads to perform a set of operations that depend on the first shared data value and the second shared data value, the selection based, at least in part, on an order of arrival comprising the first arrival and the second arrival indicated by the information indicative of how many group of threads have completed performance of one or more instructions on data shared by the group of threads” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).

Regarding claim 32, the claim is a dependent claim rejected for the same reasons as claim 30. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the set of operations that depend on the first shared data value and the second shared data values is an epilogue” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).

Regarding claim 33, the claim is a dependent claim rejected for the same reasons as claim 30. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the first thread or the second thread are selected by receiving, from the API, the order of arrival and determining if the first thread or the second thread is last in the order of arrival” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).

Regarding claim 34, the claim is a dependent claim rejected for the same reasons as claim 30. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the first thread and the second thread complete modifying the first shared data value and the second shared data value when the first thread and the second thread do not contain operations that compute one or more new values for the first shared data value and the second shared data value” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).

Regarding claim 35, the claim is a dependent claim rejected for the same reasons as claim 30. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the first thread and the second thread indicate to the API by performing a function call provided by the API” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).

Regarding claim 36, the claim is a dependent claim rejected for the same reasons as claim 30. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the API provides one or more function calls that facilitate parallel computing” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).

Regarding claim 37, the claim is a dependent claim rejected for the same reasons as claim 30. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein each thread of the group of threads is synchronized by waiting for one or more arrivals indicated in the order of arrival indicated by the information indicative of how many of the group of threads have completed performance of one or more instructions on data shared by the group of threads” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 38, the claim is a dependent claim rejected for the same reasons as claim 30. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein the thread is selected to perform a set of prologue instructions based, at least in part, on if the thread is first among the group of threads available to perform the set of prologue instructions” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).
Regarding claim 39, the claim is a dependent claim rejected for the same reasons as claim 30. Furthermore, the claim does not add additional elements and does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The additional element of “wherein a second thread is selected to perform the set of prologue instructions if the first thread is not first among the group of threads available to perform the set of prologue instructions and the second thread is first among the group of threads available to perform the set of prologue instructions” does not render the judicial exception as a practical limitation or make a combination that is significantly more than the judicial exception because the step is still drawn to an abstract idea. The limitation is an additional element reciting computer instructions; such additional elements are merely instructions to implement an abstract idea on a computer. MPEP 2106.04(d).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 1-8 and 11-37 are rejected under 35 U.S.C. 103 as being unpatentable over Budge (U.S. Patent 8,438,370) in view of Jiao (U.S. PG PUB 2012/0096474).

Regarding claim 1, Budge teaches a non-transitory computer readable medium having stored thereon a set of instructions that, when executed, at least in part by one or more processors (see col. 4, lines 13-43 “a computer program product includes a computer readable medium encoded with program code for controlling operation of a computer system”), cause the one or more processors to:
 in response to a call to an application programming interface (API) by a thread of a plurality of threads (see col. 17, lines 19-27, “The library functions may include driver API calls that instruct a driver program executing on a CPU to send commands that define the CTA(s) to a PPU, which executes the CTA(s) as specified by the driver.” See col. 5, lines 45-60, “PPU 122 advantageously implements a highly parallel processor including one or more processing cores, each of which is capable of executing a large number (e.g., hundreds or thousands) of threads concurrently. PPU 122 can be programmed to perform a wide array of computations, including linear and nonlinear data transforms, filtering of video and/or audio data, pseudorandom number generation, modeling (e.g., applying laws of physics to determine position, velocity and other attributes of objects), image rendering, and so on. PPU 122 may transfer data from system memory 104 and/or PP memory 124 into internal memory, process the data, and write result data back to system memory 104 and/or PP memory 124, where such data can be accessed by other system components, including, e.g., CPU 102.”), 
Budge does not expressly disclose, but Jiao teaches, instructions that cause the one or more processors to provide to the thread information indicative of how many of the threads have completed performance of one or more instructions on data shared by the plurality of threads (see ¶[0031] “the thread monitor 306 constantly updates the thread counter 312 in FIG. 3 to track the number of threads 316 that have completed execution”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Budge by adapting Jiao for synchronizing a plurality of threads in a general purpose shader in a graphics processor (see ¶[0004] of Jiao).

Regarding claim 2, Budge teaches wherein: 
a first thread of plurality of threads modifies a first shared data value (see ¶ 17, “According to one aspect of the present invention, a method for generating a plurality of data values includes defining a thread array having a number of threads” and ¶ 20, “Each thread is associated with a target one of the data elements to be updated and is also associated with a source one of the of data elements, where the source data element is to be used to update the target data element. Each thread is assigned to one of a number of subsets of threads that independently update their respective target data elements. In this aspect, each thread in a first one of the subsets update their respective target data elements using an initial value of their respective source data elements, and at least some of the target data elements associated with the threads of the first subset are the source data elements for at least some of the threads in a second one of the subsets.”); 
a second thread of the plurality of threads modifies a second shared data value (see ¶ 17, “According to one aspect of the present invention, a method for generating a plurality of data values includes defining a thread array having a number of threads” and see ¶ 20, “Each of the threads in the second subset computes an updated value for its respective target data element using the value of the source data element associated therewith and stores the updated value of the target data element in the memory.”); 
the first thread indicates a first arrival to the API when the first thread has completed modifying the first shared data value (see ¶ 44, “In one embodiment, thread synchronization techniques are advantageously employed to guarantee that all threads in the third subset update their respective elements MT[tid] only after the threads in the first and second subsets have completed their updates”); 
the second thread indicates a second arrival to the API when the second thread has completed modifying the second shared data value (see ¶ 44, “In one embodiment, thread synchronization techniques are advantageously employed to guarantee that all threads in the third subset update their respective elements MT[tid] only after the threads in the first and second subsets have completed their updates”); and 
the first thread or the second thread is selected by the one or more of the plurality of threads to perform a set of operations that depend on the first shared data value and the second shared data value, the selection based, at least in part, on the order of arrival comprising the first arrival and the second arrival indicated by the API (see ¶ 17, “According to one aspect of the present invention, a method for generating a plurality of data values includes defining a thread array having a number of threads” and  ¶ 42, “In one embodiment, thread synchronization techniques are advantageously employed to guarantee that all threads in the second subset update their respective elements MT[tid] after the threads in the first subset perform their updates and before any of the remaining threads perform updates.”).
Budge does not expressly disclose, but Jiao teaches, information indicative of how many of the threads have completed performance of one or more instructions on data shared by the plurality of threads (see ¶[0031] “the thread monitor 306 constantly updates the thread counter 312 in FIG. 3 to track the number of threads 316 that have completed execution”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Budge by adapting Jiao for synchronizing a plurality of threads in a general purpose shader in a graphics processor (see ¶[0004] of Jiao).

Regarding claim 3, Budge teaches wherein the first thread and the second thread complete modifying the first shared data value and the second shared data value when the first thread and the second thread do not contain additional instructions that depend on the first shared data value and the second shared data value (see ¶ 8, “To the extent that threads assigned to different data elements do not have data dependencies on each other, those threads can be executed in parallel, providing a potentially significant performance advantage”; see ¶ 38, “In the twister phase, the threads are executed in subsets of mutually independent threads, and the elements of MT are updated in order to the extent needed to preserve correct behavior while still providing a high degree of parallelism.”).

Regarding claim 4, Budge teaches wherein the first thread and the second thread indicate to the API by performing a function call (see ¶ 82, “The library functions may include driver API calls that instruct a driver program executing on a CPU to send commands that define the CTA(s) to a PPU, which executes the CTA(s) as specified by the driver”).

Regarding claim 5, Budge teaches wherein the first thread or the second thread is selected by receiving, from the one or more processors, an order (see ¶ 17, “Execution of different threads is ordered so that all of the threads in the first subset storing their data values in the memory before any of the threads in the second subset compute their data values. For example, each thread may execute a thread synchronization command that synchronizes the threads at a point at which all of the threads in the first subset have stored their data values in the memory and before any of the threads in the second subset compute their data values.”).
Budge does not expressly disclose, but Jiao teaches, information indicative of how many of the threads have completed performance of one or more instructions on data shared by the plurality of threads (see ¶[0031] “the thread monitor 306 constantly updates the thread counter 312 in FIG. 3 to track the number of threads 316 that have completed execution”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Budge by adapting Jiao for synchronizing a plurality of threads in a general purpose shader in a graphics processor (see ¶[0004] of Jiao).

Regarding claim 6, Budge teaches wherein the set of operations that depend on the first shared data value and the second shared data value are an epilogue (see ¶ 17, “The subsets include at least a first subset and a second subset, where none of the threads in the first subset depend on data values to be computed by any of the threads in the second subset but at least some of threads in the second subset depend on data values to be computed by at least some of the threads in the first subset.”). Note: an epilogue is defined as a set of instructions that perform a data-dependent operation using one or more shared values; see ¶ 57 of applicant’s specification.

Regarding claim 7, Budge teaches wherein the set of instructions, when executed by the one or more processors, facilitates parallel computations performed by each thread of the plurality of threads (see ¶16, “To the extent that threads assigned to different data elements do not have data dependencies on each other, those threads can be executed in parallel”).

Regarding claim 8, Budge teaches wherein each thread of the plurality of threads is synchronized by waiting for one or more arrivals indicated (see ¶77, “Thus, while the threads in the first subset are executing their iterations, the other threads are simply waiting for synchronization to occur. Once the threads of the first subset finish, they reach the syncthreads command, and synchronization (step 510) is achieved”).
Budge does not expressly disclose that the arrivals are indicated by the information indicative of how many of the threads have completed performance of one or more instructions on data shared by the plurality of threads; however, Jiao teaches this limitation (see ¶[0031] “the thread monitor 306 constantly updates the thread counter 312 in FIG. 3 to track the number of threads 316 that have completed execution”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Budge by adapting Jiao for synchronizing a plurality of threads in a general purpose shader in a graphics processor (see ¶[0004] of Jiao).
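
For illustration only, the barrier-style synchronization described in the passages of Budge quoted above (a first subset of threads stores its values before a second subset reads them) can be sketched as follows. This is a minimal sketch in Python using the standard library's threading.Barrier in place of the syncthreads command; the function name run_two_phase, the thread count, and the arithmetic are assumptions chosen for the example and are not taken from Budge, Jiao, or the application.

```python
import threading

def run_two_phase(n_threads=4):
    """Hypothetical two-subset update separated by a barrier."""
    shared = [0] * n_threads
    barrier = threading.Barrier(n_threads)
    half = n_threads // 2

    def worker(tid):
        # First subset: write a value that the second subset depends on.
        if tid < half:
            shared[tid] = tid + 1
        # Every thread waits here until all threads have arrived.
        barrier.wait()
        # Second subset: safe to read the first subset's values now.
        if tid >= half:
            shared[tid] = shared[tid - half] * 10

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```

In this sketch the threads of the second subset do nothing before the barrier, which mirrors the quoted description of threads that are "simply waiting for synchronization to occur."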

Regarding claim 11, this is a method claim that contains substantially the same limitations as claim 1. Therefore, it is rejected for the same reasons as claim 1. 

Regarding claim 12, Budge teaches further comprising: 
determining if the thread is last in an order of arrival (see ¶ 62, “At step 460, the thread checks its thread identifier tid to determine whether it is the last thread (e.g., whether tid=623).”); 
selecting the thread from the group of threads to perform a set of instructions that depend on one or more shared data values, the selection based, at least in part, on if the thread is last in the order of arrival (see ¶ 52, “At step 438, the thread checks its thread identifier tid to determine whether it is the last thread (i.e., whether tid=623) or another thread (i.e., whether tid<623). At step 440, each thread except the last thread computes a local variable y using: y=MSB(MT[tid])|LSBS(MT[tid+1]). (Eq. 4)”);
 performing the set of instructions if the thread is selected (see ¶52, y=MSB(MT[tid])| LSBS(MT[tid+1]). (Eq. 4) ); and 
performing, if the thread is not selected, the set of instructions by a second thread from the group of threads (see ¶ 59, “the second subset 332 includes threads with 227<=tid<397. These threads proceed to step 452 to update their respective elements MT[tid]. For example, each thread in the second subset may compute: MT[tid]=U(MT[tid-227],y), (Eq. 6)”) that is last in the order of arrival (see ¶ 62, “At step 460, the thread checks its thread identifier tid to determine whether it is the last thread (e.g., whether tid=623). At step 462, the last thread computes its local variable y, e.g., using: y=MSB(MT[623])|LSBS(MT[0]). (Eq. 8)”).
Budge does not expressly disclose, but Jiao teaches, information indicative of how many of the threads have completed performance of one or more instructions on data shared by the plurality of threads (see ¶[0031] “the thread monitor 306 constantly updates the thread counter 312 in FIG. 3 to track the number of threads 316 that have completed execution”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Budge by adapting Jiao for synchronizing a plurality of threads in a general purpose shader in a graphics processor (see ¶[0004] of Jiao).

Regarding claim 13, Budge teaches wherein the set of instructions that depend on the one or more shared data values is an epilogue (see ¶ 17, “The subsets include at least a first subset and a second subset, where none of the threads in the first subset depend on data values to be computed by any of the threads in the second subset but at least some of threads in the second subset depend on data values to be computed by at least some of the threads in the first subset.”). Note: an epilogue is defined as a set of instructions that perform a data-dependent operation using one or more shared values; see ¶ 57 of applicant’s specification.

Regarding claim 14, Budge teaches wherein determining that the thread is last in the order of arrival (see ¶ 62, “At step 460, the thread checks its thread identifier tid to determine whether it is the last thread (e.g., whether tid=623).”).
Budge does not expressly disclose, but Jiao teaches, a count value included in the information, the count value indicating how many of the threads have not completed performance of the one or more instructions (see ¶[0031] “As described earlier, the thread counter 312 may be initialized to the total number of threads 316, and the thread monitor 316 may decrement the thread counter 312 every time a thread arrives. The threads that complete execution first are suspended until the remaining threads arrive (i.e., until the thread counter 312 reaches zero)”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Budge by adapting Jiao for synchronizing a plurality of threads in a general purpose shader in a graphics processor (see ¶[0004] of Jiao).
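
For illustration only, the countdown behavior described in the quoted passage of Jiao (a counter initialized to the total number of threads, decremented on each arrival, with early arrivers suspended until the counter reaches zero) might look like the following minimal Python sketch. The class name ArrivalCounter, the method arrive_and_wait, and the helper run_counter are hypothetical names introduced for this example; they are not identifiers from Jiao, Budge, or the application.

```python
import threading

class ArrivalCounter:
    """Hypothetical countdown counter: remaining == threads not yet arrived."""

    def __init__(self, total_threads):
        self._remaining = total_threads
        self._cond = threading.Condition()

    def arrive_and_wait(self):
        """Record one arrival and block until all threads have arrived.

        Returns True only for the thread whose arrival brings the
        counter to zero, i.e. the last thread in the order of arrival.
        """
        with self._cond:
            self._remaining -= 1
            is_last = self._remaining == 0
            if is_last:
                # Wake every thread suspended on the counter.
                self._cond.notify_all()
            else:
                self._cond.wait_for(lambda: self._remaining == 0)
            return is_last

def run_counter(total_threads=4):
    counter = ArrivalCounter(total_threads)
    flags = []
    flags_lock = threading.Lock()

    def worker():
        last = counter.arrive_and_wait()
        with flags_lock:
            flags.append(last)

    threads = [threading.Thread(target=worker) for _ in range(total_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return flags
```

Under these assumptions exactly one thread observes the counter reaching zero, which is how a count of not-yet-arrived threads can identify the last arrival without relying on a fixed thread identifier.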

Regarding claim 15, Budge teaches wherein the order of arrival is determined based on when each thread of the group of threads has completed modifying one or more shared data items (see ¶28, “When execution of a thread or SIMD group is completed, core 210 advantageously notifies core interface 203. Core interface 203 can then initiate other processes, e.g., to retrieve output data from shared memory 206 and/or to prepare core 210 for execution of additional threads.”).

Regarding claim 16, Budge teaches wherein each thread of the group of threads has completed modifying one or more shared data items when it does not contain additional modifications required by the set of instructions (see ¶58, “When all of the threads have reached step 448, it is guaranteed that updates to elements MT[0] through MT[226] are complete. Threads in a second subset can then update their respective elements MT[tid].”).

Regarding claim 17, Budge teaches wherein the thread is selected to perform one or more operations to prepare shared data if the thread is first among the group of threads to be available to perform the one or more operations to prepare shared data (see ¶ 42, “In one embodiment, thread synchronization techniques are advantageously employed to guarantee that all threads in the second subset update their respective elements MT[tid] after the threads in the first subset perform their updates and before any of the remaining threads perform updates.”).

Regarding claim 18, Budge teaches each thread of the group of threads wherein the API provides one or more function calls that facilitate parallel computations performed by each thread (see ¶16, “To the extent that threads assigned to different data elements do not have data dependencies on each other, those threads can be executed in parallel”).

Regarding claim 19, Budge teaches wherein each thread of the group of threads is synchronized by waiting for one or more arrivals indicated in an order of arrival indicated by the information (see ¶ [0077], “Thus, while the threads in the first subset are executing their iterations, the other threads are simply waiting for synchronization to occur. Once the threads of the first subset finish, they reach the syncthreads command, and synchronization (step 510) is achieved.”).

Regarding claim 20, this is a processor claim that contains substantially the same limitations as claim 1. Therefore, it is rejected for the same reasons as claim 1. In addition, Budge teaches a processor comprising: one or more circuitry that, by executing instructions (see Fig. 1). 

Regarding claim 21, Budge teaches wherein: 
an arrival index a thread of the plurality of threads is determined based on an order (see ¶ 17, “According to one aspect of the present invention, a method for generating a plurality of data values includes defining a thread array having a number of threads” and  ¶ 55, “In one embodiment, when a thread encounters the syncthreads command, it generates an "arrival" signal for an instruction unit (e.g., instruction unit 212 of FIG. 2) that controls instruction issue for the threads of the CTA. After receiving the arrival signal from a particular thread, the instruction unit defers issuing further instructions for that thread until such time as the instruction unit receives corresponding arrival signals from all of the threads of the CTA”); 
if the arrival index of the thread is last in the order, the thread performs a set of instructions that depend on one or more shared data values (see ¶ 62, “Threads that do not satisfy the condition at step 456 skip step 458. At step 460, the thread checks its thread identifier tid to determine whether it is the last thread (e.g., whether tid=623). At step 462, the last thread computes its local variable y, e.g., using: y=MSB(MT[623])|LSBS(MT[0]). (Eq. 8)”);
and if the arrival index of the thread is not last in the order, a second thread of the plurality of threads is determined based on the order to perform the set of instructions (see ¶ 59, “Accordingly, at step 450, the thread checks its thread identifier tid to determine whether it is in the second subset. In the embodiment shown in FIG. 3, the second subset 332 includes threads with 227.ltoreq.tid<397. These threads proceed to step 452 to update their respective elements MT[tid]. For example, each thread in the second subset may compute: MT[tid]=U(MT[tid-227],y), (Eq. 6)”).
Budge does not expressly disclose, however, Jiao teaches indicated by information indicative of how many of the threads have completed performance of one or more instructions on data shared by the plurality of threads (see ¶[0031] “the thread monitor 306 constantly updates the thread counter 312 in FIG. 3 to track the number of threads 316 that have completed execution”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Budge by adapting Jiao for synchronizing a plurality of threads in a general purpose shader in a graphics processor (see ¶[0004] of Jiao).
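
For illustration only, the combination discussed for claim 21 (a completion count that yields an arrival index, with the last arriver performing the data-dependent instructions) can be sketched on a CPU with standard C++ threads. This is a minimal sketch under stated assumptions, not the claimed implementation and not the mechanism of Budge or Jiao. The names `run_with_arrival_counter`, `ArrivalResult`, `arrived`, and `shared` are hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical result type: how many threads ran the epilogue, and its output.
struct ArrivalResult {
    int epilogue_runs;
    long total;
};

// Each thread modifies its own shared element, then "arrives" by incrementing
// a shared counter. The value returned by the increment is used as that
// thread's arrival index (an assumption made for this sketch).
ArrivalResult run_with_arrival_counter(int num_threads) {
    std::atomic<int> arrived{0};
    std::atomic<int> epilogue_runs{0};
    std::vector<long> shared(num_threads, 0);
    long total = 0;

    auto worker = [&](int tid) {
        shared[tid] = tid + 1;  // modify this thread's shared data value

        // fetch_add returns the count before this arrival. acq_rel ordering
        // makes the writes of earlier arrivers visible to later arrivers.
        int index = arrived.fetch_add(1, std::memory_order_acq_rel);

        if (index == num_threads - 1) {
            // Last in the order of arrival: all shared values are final,
            // so this thread performs the data-dependent step.
            long sum = 0;
            for (long v : shared) sum += v;
            total = sum;
            epilogue_runs.fetch_add(1, std::memory_order_relaxed);
        }
        // Threads that are not last simply return; a later arriver does the work.
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) threads.emplace_back(worker, t);
    for (auto& th : threads) th.join();
    return {epilogue_runs.load(), total};
}
```

In this sketch exactly one thread runs the final step regardless of scheduling, because only one increment can observe the count `num_threads - 1`.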

Regarding claim 22, Budge does not expressly disclose, however, Jiao teaches wherein the arrival index is determined based, at least in part, on a count value of the information indicative of how many of the threads have completed performance of one or more instructions on data shared by the plurality of threads (see ¶[0031] “the thread monitor 306 constantly updates the thread counter 312 in FIG. 3 to track the number of threads 316 that have completed execution”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Budge by adapting Jiao for synchronizing a plurality of threads in a general purpose shader in a graphics processor (see ¶[0004] of Jiao).

Regarding claim 23, Budge teaches wherein the second thread performs the set of instructions if it is last in the order (see ¶ 62, “At step 460, the thread checks its thread identifier tid to determine whether it is the last thread (e.g., whether tid=623).”).

Regarding claim 24, Budge teaches wherein the set of instructions that depend on the one or more shared data values is an epilogue. Note: an epilogue is defined as a set of instructions that perform a data-dependent operation using one or more shared values; see ¶ 57 of applicant’s specification. (see ¶ 17, “The subsets include at least a first subset and a second subset, where none of the threads in the first subset depend on data values to be computed by any of the threads in the second subset but at least some of threads in the second subset depend on data values to be computed by at least some of the threads in the first subset.”).

Regarding claim 25, Budge teaches wherein the order is determined based on when, during execution, each thread of the plurality of threads has completed modifying the one or more shared data items (see ¶ 58, “Threads that are not in the first subset skip step 446. At step 448, all of the threads are synchronized, e.g., using another syncthreads command. When all of the threads have reached step 448, it is guaranteed that updates to elements MT[0] through MT[226] are complete. Threads in a second subset can then update their respective elements MT[tid].”).
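
For illustration only, the ordering described in the quoted ¶ 58 of Budge (a first subset updates its elements, all threads synchronize, and a second subset then updates elements that depend on the first subset's results) can be sketched with standard C++ threads. This is a hedged sketch, not Budge's syncthreads hardware mechanism. `OneShotBarrier` and `two_phase_update` are hypothetical names, and the array size and update formulas are placeholders rather than Budge's equations.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical one-shot barrier: every caller blocks in arrive_and_wait()
// until `expected` threads have arrived.
class OneShotBarrier {
public:
    explicit OneShotBarrier(int expected) : expected_(expected) {}

    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(m_);
        if (++count_ == expected_) {
            cv_.notify_all();  // last arrival releases the waiters
        } else {
            cv_.wait(lock, [this] { return count_ == expected_; });
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
    const int expected_;
};

// First subset: tid in [0, half). Second subset: tid in [half, 2 * half).
std::vector<int> two_phase_update(int num_threads) {
    const int half = num_threads / 2;
    std::vector<int> data(num_threads, 0);
    OneShotBarrier barrier(num_threads);

    auto worker = [&](int tid) {
        if (tid < half) {
            data[tid] = 10 * (tid + 1);  // first subset updates its elements
        }
        barrier.arrive_and_wait();       // all threads synchronize here
        if (tid >= half && tid < 2 * half) {
            // Second subset reads values the first subset has finished writing.
            data[tid] = data[tid - half] + 1;
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) threads.emplace_back(worker, t);
    for (auto& th : threads) th.join();
    return data;
}
```

The barrier guarantees only that first-subset updates are complete before second-subset updates begin; it does not by itself record the order in which threads arrived.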

Regarding claim 26, Budge teaches wherein each thread of the plurality of threads has completed modifying the one or more shared data items when it does not contain additional modifications to the one or more shared data items required by the set of instructions (see ¶ 28, “When execution of a thread or SIMD group is completed, core 210 advantageously notifies core interface 203. Core interface 203 can then initiate other processes, e.g., to retrieve output data from shared memory 206 and/or to prepare core 210 for execution of additional threads.”).

Regarding claim 27, Budge teaches wherein each thread of the plurality of threads is synchronized by waiting for one or more arrivals indicated (see ¶77, “Thus, while the threads in the first subset are executing their iterations, the other threads are simply waiting for synchronization to occur. Once the threads of the first subset finish, they reach the syncthreads command, and synchronization (step 510) is achieved.”).
Budge does not expressly disclose, however, Jiao teaches by the information indicative of how many of the threads have completed performance of one or more instructions on data shared by the plurality of threads (see ¶[0031] “the thread monitor 306 constantly updates the thread counter 312 in FIG. 3 to track the number of threads 316 that have completed execution”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Budge by adapting Jiao for synchronizing a plurality of threads in a general purpose shader in a graphics processor (see ¶[0004] of Jiao).

Regarding claim 28, Budge teaches wherein the thread is selected to perform one or more operations to prepare one or more shared data items if the thread is first among the plurality of threads to be available to perform the one or more operations to prepare the one or more shared data items (see ¶ 17, “Execution of different threads is ordered so that all of the threads in the first subset storing their data values in the memory before any of the threads in the second subset compute their data values. For example, each thread may execute a thread synchronization command that synchronizes the threads at a point at which all of the threads in the first subset have stored their data values in the memory and before any of the threads in the second subset compute their data values”).

Regarding claim 29, Budge teaches wherein the API provides one or more function calls that facilitate parallel computations performed by each of the one or more plurality of threads (see ¶16, “To the extent that threads assigned to different data elements do not have data dependencies on each other, those threads can be executed in parallel”).

Regarding claim 30, the claim is a system claim that contains substantially the same limitations as claim 1. Therefore, it is rejected for the same reasons as claim 1. In addition, Budge teaches a system comprising: one or more circuits (see Fig. 1). Note: the circuit only needs to be capable of performing the steps. Examiner is providing art for compact prosecution.

Regarding claim 31, Budge teaches wherein: a first thread of the group of threads modifies a first shared data value (see ¶ 20, “Each thread is associated with a target one of the data elements to be updated and is also associated with a source one of the of data elements, where the source data element is to be used to update the target data element. Each thread is assigned to one of a number of subsets of threads that independently update their respective target data elements. In this aspect, each thread in a first one of the subsets update their respective target data elements using an initial value of their respective source data elements, and at least some of the target data elements associated with the threads of the first subset are the source data elements for at least some of the threads in a second one of the subsets.”); 
a second thread modifies a second shared data value (see ¶ 20, “Each of the threads in the second subset computes an updated value for its respective target data element using the value of the source data element associated therewith and stores the updated value of the target data element in the memory.”); 
the first thread indicates a first arrival to the API when the first thread has completed modifying the first shared data value (see ¶ 44, “In one embodiment, thread synchronization techniques are advantageously employed to guarantee that all threads in the third subset update their respective elements MT[tid] only after the threads in the first and second subsets have completed their updates”); 
the second thread indicates a second arrival to the API when the second thread has completed modifying the second shared data value (see ¶ 44, “In one embodiment, thread synchronization techniques are advantageously employed to guarantee that all threads in the third subset update their respective elements MT[tid] only after the threads in the first and second subsets have completed their updates”); and 
the first thread or the second thread is selected by one or more threads of the group of threads to perform a set of operations that depend on the first shared data value and the second shared data value, the selection based, at least in part, on the order of arrival comprising the first arrival and the second arrival indicated by the API (see ¶ 42, “In one embodiment, thread synchronization techniques are advantageously employed to guarantee that all threads in the second subset update their respective elements MT[tid] after the threads in the first subset perform their updates and before any of the remaining threads perform updates.”).
Budge does not expressly disclose, however, Jiao teaches information indicative of how many of the threads have completed performance of one or more instructions on data shared by the plurality of threads (see ¶[0031] “the thread monitor 306 constantly updates the thread counter 312 in FIG. 3 to track the number of threads 316 that have completed execution”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Budge by adapting Jiao for synchronizing a plurality of threads in a general purpose shader in a graphics processor (see ¶[0004] of Jiao).

Regarding claim 32, Budge teaches wherein the set of operations that depend on the first shared data value and the second shared data values is an epilogue. Note: an epilogue is defined as a set of instructions that perform a data-dependent operation using one or more shared values; see ¶ 57 of applicant’s specification. (see ¶ 17, “The subsets include at least a first subset and a second subset, where none of the threads in the first subset depend on data values to be computed by any of the threads in the second subset but at least some of threads in the second subset depend on data values to be computed by at least some of the threads in the first subset.”).

Regarding claim 33, Budge teaches wherein the first thread or the second thread are selected by receiving, from the circuitry, the order of arrival and determining if the first thread or the second thread is last in the order of arrival (see ¶ 62, “At step 460, the thread checks its thread identifier tid to determine whether it is the last thread (e.g., whether tid=623).”).

Regarding claim 34, Budge teaches wherein the first thread and the second thread complete modifying the first shared data value and the second shared data value when the first thread and the second thread do not contain operations that compute one or more new values for the first shared data value and the second shared data value (see ¶58, “When all of the threads have reached step 448, it is guaranteed that updates to elements MT[0] through MT[226] are complete. Threads in a second subset can then update their respective elements MT[tid].”). 

Regarding claim 35, Budge teaches wherein the first thread and the second thread indicate to the API by performing a function call provided by the API (see ¶ 82, “The library functions may include driver API calls that instruct a driver program executing on a CPU to send commands that define the CTA(s) to a PPU, which executes the CTA(s) as specified by the driver”).

Regarding claim 36, Budge teaches wherein the API provides one or more function calls that facilitate parallel computing (see ¶16, “To the extent that threads assigned to different data elements do not have data dependencies on each other, those threads can be executed in parallel”).

Regarding claim 37, Budge teaches wherein each thread of the group of threads is synchronized by waiting for one or more arrivals indicated in an order of arrival (see ¶77, “Thus, while the threads in the first subset are executing their iterations, the other threads are simply waiting for synchronization to occur. Once the threads of the first subset finish, they reach the syncthreads command, and synchronization (step 510) is achieved”).
Budge does not expressly disclose, however, Jiao teaches information indicative of how many of the threads have completed performance of one or more instructions on data shared by the plurality of threads (see ¶[0031] “the thread monitor 306 constantly updates the thread counter 312 in FIG. 3 to track the number of threads 316 that have completed execution”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Budge by adapting Jiao for synchronizing a plurality of threads in a general purpose shader in a graphics processor (see ¶[0004] of Jiao).

Claims 9-10 and 38-39 are rejected under 35 U.S.C. 103 as being unpatentable over Budge (U.S. Patent 8,438,370) and Jiao (U.S. PG PUB 2012/0096474) as applied to claims 1 and 30 above, further in view of Lindholm et al. (U.S. Patent 7,015,913).

Regarding claim 9, Budge does not specifically disclose wherein a first thread among the group of threads is selected to perform a set of preprocessing instructions based, at least in part on if the first thread is available to perform the set of preprocessing instructions before other threads of the group of threads.
However, Lindholm teaches wherein a first thread among the group of threads is selected to perform a set of preprocessing instructions based, at least in part on if the first thread is available to perform the set of preprocessing instructions before other threads of the group of threads (see ¶ 48, “In step 511, Instruction Scheduler 430 determines if source data required to process the program instruction associated with the thread is available, and, if so, in step 513 Instruction Scheduler 430 outputs the program instruction associated with the thread to Instruction Dispatcher 440.”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Budge by adapting the teachings of Lindholm to perform multithreaded processing.

Regarding claim 10, Budge does not specifically disclose wherein if the first thread is not available to perform the set of preprocessing instructions before other threads of the group of threads, a second thread among the group of threads is selected to perform the set of preprocessing instructions.
However, Lindholm teaches wherein if the first thread is not available to perform the set of preprocessing instructions before other threads of the group of threads, a second thread among the group of threads is selected to perform the set of preprocessing instructions (see ¶ 11, “First source data required to process the program instruction associated with the first thread are determined to be not available. Second source data required to process the program instruction associated with the second thread are determined to be available. The program instruction associated with the second thread to process the second sample in the execution unit is dispatched prior to dispatching the program instruction associated with the first thread to process the first sample in the execution unit.”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Budge by adapting the teachings of Lindholm to enable processing of samples independent of an order in which the samples are received (see ¶ 12 of Lindholm).

Regarding claim 38, Budge and Jiao do not expressly disclose wherein the thread is selected to perform a set of prologue instructions based, at least in part, on if the thread is first among the group of threads available to perform the set of prologue instructions. 
However, Lindholm teaches wherein the thread is selected to perform a set of prologue instructions based, at least in part, on if the thread is first among the group of threads available to perform the set of prologue instructions (see ¶ 48, “In step 511, Instruction Scheduler 430 determines if source data required to process the program instruction associated with the thread is available, and, if so, in step 513 Instruction Scheduler 430 outputs the program instruction associated with the thread to Instruction Dispatcher 440.”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Budge and Jiao by adapting the teachings of Lindholm to enable processing of samples independent of an order in which the samples are received (see ¶ 12 of Lindholm).

Regarding claim 39, Budge and Jiao do not expressly disclose wherein a second thread is selected to perform the set of prologue instructions if the first thread is not first among the group of threads available to perform the set of prologue instructions and the second thread is first among the group of threads available to perform the set of prologue instructions.
However, Lindholm teaches wherein a second thread is selected to perform the set of prologue instructions if the first thread is not first among the group of threads available to perform the set of prologue instructions and the second thread is first among the group of threads available to perform the set of prologue instructions (see ¶ 11, “First source data required to process the program instruction associated with the first thread are determined to be not available. Second source data required to process the program instruction associated with the second thread are determined to be available. The program instruction associated with the second thread to process the second sample in the execution unit is dispatched prior to dispatching the program instruction associated with the first thread to process the first sample in the execution unit.”).
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Budge and Jiao by adapting the teachings of Lindholm to enable processing of samples independent of an order in which the samples are received (see ¶ 12 of Lindholm).
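
For illustration only, the limitation recited in claims 9-10 and 38-39 (whichever thread is first available performs the prologue, and otherwise another thread does) can be sketched with an atomic flag in standard C++. This sketch illustrates the claim language as quoted above; it is not Lindholm's instruction scheduler and not applicant's implementation. `run_first_available_prologue`, `PrologueResult`, and `claimed` are hypothetical names.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical result type: how many threads ran the prologue, and which one.
struct PrologueResult {
    int prologue_runs;
    int winner;
};

PrologueResult run_first_available_prologue(int num_threads) {
    std::atomic<bool> claimed{false};
    std::atomic<int> prologue_runs{0};
    std::atomic<int> winner{-1};

    auto worker = [&](int tid) {
        // exchange returns the previous value, so only the first thread to
        // reach this point observes `false` and is selected for the prologue.
        if (!claimed.exchange(true, std::memory_order_acq_rel)) {
            winner.store(tid);           // placeholder for the prologue work
            prologue_runs.fetch_add(1);
        }
        // Every other thread skips the prologue; if a given thread is not
        // first, some other thread has already been selected.
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) threads.emplace_back(worker, t);
    for (auto& th : threads) th.join();
    return {prologue_runs.load(), winner.load()};
}
```

Which thread is selected depends on scheduling, so the sketch guarantees only that exactly one thread performs the prologue.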

Response to Arguments
Applicant's arguments filed on 12/02/2024 have been fully considered but they are not persuasive. 
Regarding 112(a) rejections, examiner has withdrawn the rejections in light of the amendments filed.
Regarding 112(b) rejections, applicants only address the claims in terms of a plurality of threads completed performance, a rejection which has already been withdrawn. Applicants have not addressed the remaining 112(b) rejections.
Regarding 101 rejections, applicants argue that responding to an API call by a thread of a plurality of threads by providing, to the thread, information indicative of how many of the plurality of threads have completed performance of one or more instructions on data shared by the plurality of threads is not something that can practically be performed by the human mind.
Applicants further argue the additional elements would integrate the abstract idea into a practical application because the limitations are related to processors, circuitry, and execution of instructions in response to a call to an API.
Applicants further argue that the claims improve parallel computing related to threads that perform instructions using data shared by threads.
Examiner disagrees. The claimed limitations recite an abstract idea, because the limitations, as drafted, are functions that, under their broadest reasonable interpretation, recite the abstract idea of a mental process. The limitations encompass a human mind carrying out the functions through observation, evaluation, judgment and/or opinion, or even with the aid of pen and paper. Thus, these limitations recite and fall within the “Mental Processes” grouping of abstract ideas. See MPEP §2106.04(a)(2). For example, “an indication of an order in which one or more of a plurality of threads completed performance of one or more instructions” can be produced by a human mind with the aid of pen and paper. Likewise, information indicative of how many of the plurality of threads have completed performance of the one or more instructions on data shared by the plurality of threads can be indicated on a piece of paper by a human with the aid of a pen. The response to the API call is merely a way of applying the abstract idea to a computing component and is neither a practical application nor an inventive concept. The improvement applicant mentions is merely insignificant post-solution activity of applying the abstract idea to a plurality of threads, which is neither a practical application nor an inventive concept.

Regarding 103 rejections, applicants’ arguments are moot due to the new grounds of rejection.

Support for Amendments and Newly Added Claims
Applicants are respectfully requested, in the event of an amendment to claims or submission of new claims, that such claims and their limitations be directly mapped to the specification, which provides support for the subject matter.  This will assist in expediting compact prosecution.  MPEP 714.02 recites: “Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06. An amendment which does not comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) may be held not fully responsive. See MPEP § 714.”  Amendments not pointing to specific support in the disclosure may be deemed as not complying with provisions of 37 C.F.R.  1.121(b), (c), (d), and (h) and therefore held not fully responsive.  Generic statements such as “Applicants believe no new matter has been introduced” may be deemed insufficient.
Interview Requests
In accordance with 37 CFR 1.133(a)(3), requests for interview must be made in advance. Interview requests are to be made by telephone call (571-270-7848) or fax (571-270-8848). Applicants must provide a detailed agenda as to what will be discussed (a generic statement such as “discuss §102 rejection” or “discuss rejections of claims 1-3” may result in the interview being denied). The detailed agenda, along with any proposed amendments, is to be written on a PTOL-413A or a custom form and should be faxed (or emailed, subject to MPEP 713.01.I / MPEP 502.03) to the Examiner at least 5 business days prior to the scheduled interview. Interview requests submitted within amendments may be denied because the Examiner was not notified, in advance, of the Applicant Initiated Interview Request and, due to time constraints, may not be able to review the interview request prior to the mailing of the next Office Action.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Daloze (U.S. PG PUB 2018/0300132) teaches a single thread and a sequential data structure. The single thread includes functionality to execute on a parallel processor. The method further includes detecting whether the collection is shared by multiple threads by tracking reachability of the collection, and modifying the data representation and the implementation of the shared collection for synchronization of the multiple threads. The method may also include testing whether the multiple threads are synchronized on the shared collection.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CARINA YUN whose telephone number is (571)270-7848. The examiner can normally be reached Mon, Tues, Thurs, 9-4 (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to call.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kevin Young can be reached at (571) 270-3180. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Carina Yun
Patent Examiner
Art Unit 2194



/CARINA YUN/Examiner, Art Unit 2194
Prosecution Timeline

Mar 19, 2020: Application Filed
Aug 18, 2021: Non-Final Rejection (§101, §103, §112)
Jan 24, 2022: Interview Requested
Feb 16, 2022: Response Filed
Feb 28, 2022: Final Rejection (§101, §103, §112)
Jul 20, 2022: Interview Requested
Sep 07, 2022: Response after Non-Final Action
Sep 07, 2022: Notice of Allowance
Oct 03, 2022: Response after Non-Final Action
Apr 07, 2023: Response after Non-Final Action
Apr 12, 2023: Response after Non-Final Action
Apr 24, 2023: Response after Non-Final Action
Jun 30, 2023: Response after Non-Final Action
Jul 01, 2023: Response after Non-Final Action
Jul 03, 2023: Response after Non-Final Action
Jul 03, 2023: Response after Non-Final Action
Sep 30, 2024: Response after Non-Final Action
Dec 02, 2024: Request for Continued Examination
Dec 11, 2024: Response after Non-Final Action
Feb 10, 2025: Non-Final Rejection (§101, §103, §112)
Jun 03, 2025: Interview Requested
Jun 13, 2025: Response Filed
Jul 01, 2025: Final Rejection (§101, §103, §112)
Aug 26, 2025: Interview Requested
Dec 08, 2025: Request for Continued Examination
Dec 18, 2025: Response after Non-Final Action
Feb 16, 2026: Non-Final Rejection (§101, §103, §112; current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/454,088, Patent 12578996: ADAPTIVE HIGH-PERFORMANCE TASK DISTRIBUTION FOR MANAGING COMPUTING RESOURCES ON CLOUD (2y 5m to grant; granted Mar 17, 2026)
17/718,620, Patent 12572398: CONSOLE COMMAND COMPOSITION (2y 5m to grant; granted Mar 10, 2026)
17/388,127, Patent 12554562: INTERSYSTEM PROCESSING EMPLOYING BUFFER SUMMARY GROUPS (2y 5m to grant; granted Feb 17, 2026)
18/194,214, Patent 12498996: HYBRID PAGINATION FOR RETRIEVING DATA (2y 5m to grant; granted Dec 16, 2025)
17/213,733, Patent 12474974: SYSTEMS AND METHODS FOR POWER MANAGEMENT FOR MODERN WORKSPACES (2y 5m to grant; granted Nov 18, 2025)
Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 50%
With Interview (+33.5%): 83%
Median Time to Grant: 4y 7m
PTA Risk: High
Based on 322 resolved cases by this examiner. Grant probability derived from career allow rate.