Prosecution Insights
Last updated: April 19, 2026
Application No. 17/691,872

Techniques for Scalable Load Balancing of Thread Groups in a Processor

Non-Final OA (§102, §103)
Filed: Mar 10, 2022
Examiner: DAO, TUAN C.
Art Unit: 2198
Tech Center: 2100 — Computer Architecture & Software
Assignee: Nvidia Corporation
OA Round: 3 (Non-Final)

Grant Probability: 82% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 3y 1m
With Interview: 98%

Examiner Intelligence

Career Allow Rate: 82% — above average (642 granted / 782 resolved; +27.1% vs TC avg)
Interview Lift: +15.6% among resolved cases with interview (strong)
Typical Timeline: 3y 1m average prosecution; 38 applications currently pending
Career History: 820 total applications across all art units

Statute-Specific Performance

§101: 18.3% (-21.7% vs TC avg)
§103: 51.8% (+11.8% vs TC avg)
§102: 18.6% (-21.4% vs TC avg)
§112: 5.3% (-34.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 782 resolved cases.
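The headline figures in these panels are simple ratios over the examiner's resolved cases. A minimal sketch of how they can be reproduced from the raw counts on this page; note that the Tech Center baseline below is back-solved from the "+27.1% vs TC avg" delta and is an assumption, not published USPTO data:

```python
# Reproduce the examiner-level figures shown above from raw counts.
# The counts (642 granted of 782 resolved) come from this page; the
# Tech Center baseline is back-solved from the "+27.1%" delta, so it
# is an assumption rather than an official statistic.
granted, resolved = 642, 782

allow_rate = granted / resolved        # ~0.821, shown above as 82%
tc_baseline = allow_rate - 0.271       # implied TC 2100 average, ~55%

print(f"Career allow rate: {allow_rate:.1%}")    # Career allow rate: 82.1%
print(f"Implied TC average: {tc_baseline:.1%}")  # Implied TC average: 55.0%
```

The statute-specific rows above would be computed the same way, one ratio per rejection type over the same 782-case denominator.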

Office Action

§102, §103
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 09/02/2025 has been entered. Claims 1, 3-14, and 17-28 have been examined.

Claim Objections

Claim 28 is objected to because of the following informalities: claim 28 recites "The processor system of claim." without identifying the claim from which it depends, so the dependency of claim 28 is unclear. Furthermore, the feature "the same GPC" lacks sufficient antecedent basis and should be spelled out. Appropriate correction is required.

Response to Amendment

In the instant amendment, claims 1, 6, 8-9, 12, 14, and 19-20 have been amended. Claims 21-28 are newly added.

Allowable Subject Matter

Claims 14 and 17-19 are allowed. Claim 23 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 3-5, and 9-12 are rejected under 35 U.S.C. 102(XYZ) as being anticipated by US 2014/0123150 to Lindholm et al. (hereinafter "Lindholm") in view of US 2014/0337389 to Ricketts et al.
(hereafter “Ricketts”) and US 2013/0132684 to Ostrovsky et al. (hereafter “Ostrovsky”).

As per claim 1, Lindholm discloses a processing system including: a set of plural processors (FIGs. 2-3; paragraphs 0027, 0029, 0030-0031, and 0047-0048: a plurality of GPCs 208 → a plurality of multiprocessors SM 310), and a work distributor (FIG. 3C; paragraphs 0031 and 0047-0048: CPU 102 and SM 310) that distributes thread blocks to the set of processors for execution (FIGs. 3-4; paragraphs 0047-0048, 0055 and 0067: “As shown in FIG. 4, 128 warps may be simultaneously processed by an SM 310 and the four thread blocks may be distributed to different execution units 302 for load balancing across the different execution units 302.” → thread blocks are assigned to execution units 302), each of the plural processors being configured to execute the thread blocks (FIGs. 3-4; paragraphs 0047-0048, 0055 and 0067), the work distributor being configured to: (a) balance loading of the thread blocks across the set of plural processors (FIGs. 3-4; paragraph 0067: “As shown in FIG. 4, 128 warps may be simultaneously processed by an SM 310 and the four thread blocks may be distributed to different execution units 302 for load balancing across the different execution units 302.”), and (b) guarantee the set of plural processors can execute the thread blocks concurrently (FIGs. 3-4; paragraphs 0048, 0063 and 0067: “Sixteen warps are reserved for processing the thread blocks, where each warp includes 4 threads. Therefore, each thread block is a group of 64 threads having resources that are allocated together. In another embodiment, a thread block includes a different number of threads, e.g., a thread block is a group of 16 threads (4 warps of 4 threads each). As shown in FIG. 4, 128 warps may be simultaneously processed by an SM 310 and the four thread blocks may be distributed to different execution units 302 for load balancing across the different execution units 302.”), wherein the respective thread blocks are assigned identifier coordinates for execution on the set of processors (FIG. 4; paragraphs 0067-0071: “the thread block is computed by truncating the lowest 4 bits of the logical identifier. The lowest 4 bits of the logical identifier are an offset within the thread block. The physical identifier for the thread is computed by mapping the thread block to a corresponding physical identifier base and then using the offset to locate the processing resources allocated for the thread. For example, the high bits of the physical identifier may be used to determine the thread block and the lower bits may be used to determine the particular thread within the thread block.”).

Ricketts further discloses wherein the thread blocks are represented by a grid (paragraphs 0023 and 0039) comprising a collection of thread blocks arranged in a multi-dimensional array (paragraphs 0023 and 0039). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Ricketts with Lindholm’s teaching because the CPU executes a driver kernel that implements an application programming interface (API) that enables one or more applications executing on the CPU to schedule operations for execution on the PPU (Ricketts, paragraph 0023). Ostrovsky further discloses the thread blocks in a respective portion of the grid (FIGs. 2 and 6A-B; paragraphs 0035, 0046 and 0058-0059). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Ostrovsky with the teachings of Lindholm and Ricketts because the function is configured to be invoked as a plurality of threads on a parallel accelerator processor to operate over a multi-dimensional matrix of memory cells (Ostrovsky, paragraph 0006).

As per claim 3, Lindholm does not explicitly disclose wherein the grid comprises a three-dimensional array of cooperative thread arrays. Ricketts further discloses wherein the grid comprises a three-dimensional array of cooperative thread arrays (paragraphs 0023 and 0039). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Ricketts with Lindholm’s teaching for the reasons given above with respect to claim 1 (Ricketts, paragraph 0023).

As per claim 4, Lindholm discloses wherein the processors comprise streaming multiprocessors (paragraphs 0046-0047: GPC 208 including streaming multiprocessor 310) and the work distributor comprises a hardware circuit (FIGs. 3-5; paragraphs 0047-0050).

As per claim 5, Lindholm discloses wherein the work distributor comprises a first work distributor configured to distribute work across the set of plural processors (FIGs. 1-2; paragraph 0031: CPU 102 writes stream commands for each PPU), and a plurality of second work distributors structured to assign work to individual processors (FIGs. 2-5; paragraphs 0047-0048: each PPU has GPCs including SMs that schedule work across execution units 302).
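The identifier arithmetic quoted above from Lindholm (paragraphs 0067-0071: truncate the lowest 4 bits of the logical identifier to get the thread block, use those bits as the offset, then map the block to a physical base) can be sketched as follows. The per-block base-address table is a made-up illustration, not a value taken from the reference:

```python
# Sketch of the logical-to-physical thread ID mapping described in the
# Lindholm passage quoted above. The per-block physical base addresses
# are hypothetical values for illustration only.
BLOCK_SHIFT = 4                 # thread blocks of 16 threads -> low 4 bits
OFFSET_MASK = 0xF

physical_base = {0: 0x100, 1: 0x200, 2: 0x300}   # hypothetical base table

def to_physical(logical_id: int) -> int:
    block = logical_id >> BLOCK_SHIFT      # truncate the lowest 4 bits
    offset = logical_id & OFFSET_MASK      # offset within the thread block
    return physical_base[block] + offset   # map to base, then add offset

# Logical thread 0x17 is thread 7 of block 1:
assert to_physical(0x17) == 0x207
```

This matches the quoted observation that the high bits of the physical identifier select the thread block while the low bits select the particular thread within it.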
As per claim 9, Lindholm discloses wherein the work distributor load balances the thread blocks across the set of processors by simultaneously selecting more than one processor to launch and execute thread blocks (FIGs. 3-4; paragraphs 0048, 0063 and 0067: “Sixteen warps are reserved for processing the thread blocks, where each warp includes 4 threads. Therefore, each thread block is a group of 64 threads having resources that are allocated together. In another embodiment, a thread block includes a different number of threads, e.g., a thread block is a group of 16 threads (4 warps of 4 threads each). As shown in FIG. 4, 128 warps may be simultaneously processed by an SM 310 and the four thread blocks may be distributed to different execution units 302 for load balancing across the different execution units 302.”).

As per claim 10, Lindholm discloses wherein the respective thread blocks are part of a Cooperative Group Array (CGA) (paragraphs 0049, 0056 and 0084).

As per claim 11, Lindholm discloses wherein the work distributor selectively does not launch more than one thread array that is part of the common array on any one of the processors of the set of plural processors (paragraphs 0084-0085).

As per claim 12, Lindholm discloses wherein the plural processors each comprise hardware that independently derives or calculates a unique thread block identifier (FIG. 4; paragraphs 0067-0071).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Lindholm, Ricketts, and Ostrovsky, as applied to claim 1, and further in view of US 2003/0120778 to Chaboud et al. (hereafter “Chaboud”) and US 2015/0039860 to Sundar et al. (hereafter “Sundar”).

As per claim 6, Lindholm discloses wherein the work distributor includes a query model of the set of processors (FIGs. 3-4; paragraphs 0047-0048, 0055 and 0067).
Chaboud further discloses using the query model to launch the thread blocks against a state of the set of processors to test whether the thread blocks can launch and execute concurrently (FIGs. 2-3; paragraphs 0048-0049). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Chaboud with the teachings of Lindholm, Ricketts and Ostrovsky because allocating remotely accessible system resources is performed if there are insufficient local system resources to perform the task (Chaboud, paragraph 0010). Sundar further discloses that the state is a shadow state of the set of processors (paragraph 0061: copies of state at a particular point in time). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Sundar with the teachings of Lindholm, Ricketts, Ostrovsky and Chaboud because, when a speculative misprediction or an exception is detected, control logic within the processor 200 is able to select a given checkpoint or snapshot; the control logic utilizes the information stored in the snapshot to recover the architectural state and restart instruction processing at that point (Sundar, paragraph 0061).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Lindholm, Ricketts, Ostrovsky, Chaboud and Sundar, as applied to claim 6, and further in view of US 2006/0036580 to Stata et al. (hereafter “Stata”).

As per claim 7, Lindholm does not explicitly disclose wherein the work distributor maintains a live query model that is updated continually, and a further query model that stores a shadow state.
Stata further discloses wherein the work distributor maintains a live query model that is updated continually, and a further query model that stores a shadow state (paragraphs 0056 and 0071). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Stata with the teachings of Lindholm, Ricketts, Ostrovsky, Chaboud and Sundar because it would provide the capability to continuously update the query (Stata, paragraph 0056).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Lindholm, Ricketts, Ostrovsky, Chaboud, and Sundar, as applied to claim 6, and further in view of US 2014/0013330 to Wang et al. (hereafter “Wang”).

As per claim 8, Lindholm does not explicitly disclose wherein the work distributor uses the query model in an iterative or recursive manner to test launch of multiple hierarchical levels of thread block groups. Chaboud further discloses testing the launch of multiple hierarchical levels of thread block groups (FIGs. 2-3; paragraph 0048). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Chaboud with the teachings of Lindholm, Ricketts and Ostrovsky because allocating remotely accessible system resources is performed if there are insufficient local system resources to perform the task (Chaboud, paragraph 0010). Wang further discloses wherein the work distributor uses the query model in an iterative or recursive manner to launch multiple hierarchical levels of thread block groups (paragraph 0066). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Wang with the teachings of Lindholm, Ricketts, Ostrovsky, Chaboud and Sundar because instances of deadlocked applications, disabled application features, and/or general unresponsiveness in the operations of applications executing on the computing device may be diminished (Wang, paragraph 0005).

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Lindholm, Ricketts, and Ostrovsky, as applied to claim 1, and further in view of US 2009/0063885 to Arimilli et al. (hereafter “Arimilli”).

As per claim 13, Lindholm does not explicitly disclose wherein the work distributor is configured to determine, based on respective loading levels of the processors in the set of plural processors, which processors in the set are likely to execute new work fastest. Arimilli further discloses wherein the work distributor is configured to determine, based on respective loading levels of the processors in the set of plural processors, which processors in the set are likely to execute new work fastest (paragraph 0084). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Arimilli with the teachings of Lindholm, Ricketts, and Ostrovsky because, based on a determined time difference between the fastest and slowest processors, a corresponding amount of data to be shifted may be identified (Arimilli, paragraph 0084).

Claims 20-21 and 25-28 are rejected under 35 U.S.C. 103 as being unpatentable over Lindholm further in view of US 2010/0325187 to Juffa et al.
(hereafter “Juffa”).

As per claim 20, Lindholm discloses a processing system comprising: a launch test circuit connected to receive instructions to launch a thread group array (FIG. 4; paragraphs 0035, 0049, 0067-0068 and 0084-0085: “This collection of thread groups is referred to herein as a "cooperative thread array" ("CTA") or "thread array." Each CTA comprises a programmer-specified number of warps executing in the same SM 310. One or more CTAs can potentially execute concurrently in the same SM 310. The size of a CTA is generally determined by the programmer and the amount of hardware resources, such as memory or registers, available to the CTA.” and “The task/work unit 207 receives tasks from the front end 212 and ensures that GPCs 208 are configured to a valid state before the processing specified by each one of the TMDs is initiated. A priority may be specified for each TMD that is used to schedule execution of the processing task. Processing tasks can also be received from the processing cluster array 230. Optionally, the TMD can include a parameter that controls whether the TMD is added to the head or the tail for a list of processing tasks (or list of pointers to the processing tasks), thereby providing another level of control over priority.” → the task/work unit determines/ensures/checks (testing, as claimed) resources/priority/barriers before scheduling CTA (thread array) processing) comprising a collection of multiple thread blocks (FIGs. 3-4; paragraphs 0047-0048, 0055 and 0067: “As shown in FIG. 4, 128 warps may be simultaneously processed by an SM 310 and the four thread blocks may be distributed to different execution units 302 for load balancing across the different execution units 302.” → thread blocks are assigned to execution units 302), the launch test circuit configured to determine, before launching the thread group array (paragraphs 0035 and 0085: “The task/work unit 207 receives tasks from the front end 212 and ensures that GPCs 208 are configured to a valid state before the processing specified by each one of the TMDs is initiated. A priority may be specified for each TMD that is used to schedule execution of the processing task. Processing tasks can also be received from the processing cluster array 230. Optionally, the TMD can include a parameter that controls whether the TMD is added to the head or the tail for a list of processing tasks (or list of pointers to the processing tasks), thereby providing another level of control over priority.” → the task/work unit determines/ensures/checks (testing, as claimed) resources/priority/barriers before scheduling CTA (thread array) processing), whether all thread blocks in the thread group array can execute concurrently across a set of plural processors at the same hardware organization level the thread group array specifies or is associated with (FIGs. 2 and 3A-B; paragraphs 0034-0035, 0045 and 0047: multiple PPUs → each PPU having multiple GPCs → each GPC having multiple SMs → each SM having multiple execution units 302 → the execution units 302 are considered to be at the same level), without requiring preempting of other executing tasks the set of plural processors are already executing (paragraphs 0048-0049 and 0067: “Sixteen warps are reserved for processing the thread blocks, where each warp includes 4 threads. Therefore, each thread block is a group of 64 threads having resources that are allocated together. In another embodiment, a thread block includes a different number of threads, e.g., a thread block is a group of 16 threads (4 warps of 4 threads each). As shown in FIG. 4, 128 warps may be simultaneously processed by an SM 310 and the four thread blocks may be distributed to different execution units 302 for load balancing across the different execution units 302.” → parallel/simultaneous processing → threads are processed at the same time); and concurrently launches all the thread blocks in the thread group array while balancing loading by the launching thread blocks and the other executing tasks of the plural processors across the set (paragraphs 0048-0049 and 0067: “Sixteen warps are reserved for processing the thread blocks, where each warp includes 4 threads. Therefore, each thread block is a group of 64 threads having resources that are allocated together. In another embodiment, a thread block includes a different number of threads, e.g., a thread block is a group of 16 threads (4 warps of 4 threads each). As shown in FIG. 4, 128 warps may be simultaneously processed by an SM 310 and the four thread blocks may be distributed to different execution units 302 for load balancing across the different execution units 302.”). Lindholm discloses thread groups in the thread array; however, Lindholm does not explicitly disclose a launch circuit that, conditioned on the determination by the launch test circuit that all thread blocks in the thread group array can execute concurrently across the set of plural processors (FIGs.
2 and 4A; paragraphs 0034-0039 and 0050-0051: “The size of a particular CTA is equal to m*k, where k is the number of concurrently executing threads in a thread group and is also an integer multiple of the number of streaming processors in a streaming multiprocessor, and m is the number of thread groups simultaneously active on the streaming multiprocessor. The size of a CTA is generally determined by the amount of hardware resources, such as memory or registers, available to the CTA as well as by the size of the matrix tiles processed by the CTA.” and “In step 404, a software process determines the size of the CTAs. As previously described herein, the CTA size is generally determined by the amount of hardware resources within the streaming multiprocessors available to the CTAs as well as by the size of the result tiles.”) at the same hardware organization level the thread group array specifies or is associated with (FIGs. 2 and 4A; paragraphs 0034-0039 and 0050-0051: “The size of a particular CTA is equal to m*k, where k is the number of concurrently executing threads in a thread group and is also an integer multiple of the number of streaming processors in a streaming multiprocessor, and m is the number of thread groups simultaneously active on the streaming multiprocessor. The size of a CTA is generally determined by the amount of hardware resources, such as memory or registers, available to the CTA as well as by the size of the matrix tiles processed by the CTA.” → SMs (streaming processors) in graphics adapter 102).
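The CTA sizing relation quoted in the rejection above is plain arithmetic: size = m * k, with k constrained to be an integer multiple of the streaming processor count. A minimal sketch with illustrative numbers; the concrete values are assumptions for demonstration, not taken from the cited references:

```python
# Sketch of the quoted CTA sizing relation: cta_size = m * k, where k is
# the number of concurrently executing threads per thread group (an
# integer multiple of the SM's streaming processor count) and m is the
# number of thread groups simultaneously active on the SM.
# The concrete numbers below are illustrative assumptions.
streaming_processors = 8
k = 4 * streaming_processors     # threads per group: 32, a multiple of 8
m = 2                            # thread groups simultaneously active
cta_size = m * k

assert k % streaming_processors == 0   # the quoted multiple-of constraint
print(cta_size)                        # 64
```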
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Juffa with Lindholm’s teaching because it provides an elegant way to compute the elements of a result matrix on a tile-by-tile basis using multiple CTAs that execute concurrently on the different streaming multiprocessors of a graphics processing unit (Juffa, paragraph 0014).

As per claim 21, Lindholm discloses wherein the launch test circuit comprises a speculative launch test circuit (FIG. 4; paragraphs 0035, 0049, 0067-0068 and 0084-0085: “This collection of thread groups is referred to herein as a "cooperative thread array" ("CTA") or "thread array." Each CTA comprises a programmer-specified number of warps executing in the same SM 310. One or more CTAs can potentially execute concurrently in the same SM 310. The size of a CTA is generally determined by the programmer and the amount of hardware resources, such as memory or registers, available to the CTA.” and “The task/work unit 207 receives tasks from the front end 212 and ensures that GPCs 208 are configured to a valid state before the processing specified by each one of the TMDs is initiated. A priority may be specified for each TMD that is used to schedule execution of the processing task. Processing tasks can also be received from the processing cluster array 230. Optionally, the TMD can include a parameter that controls whether the TMD is added to the head or the tail for a list of processing tasks (or list of pointers to the processing tasks), thereby providing another level of control over priority.” → the task/work unit determines/ensures/checks (testing, as claimed) resources/priority/barriers before scheduling CTA (thread array) processing).

As per claim 25, Lindholm does not explicitly disclose wherein the thread group array comprises a Cooperative Group Array (CGA) represented by a multi-dimensional grid. Juffa further discloses wherein the thread group array comprises a Cooperative Group Array (CGA) represented by a multi-dimensional grid (in view of paragraph 0118 of the PGPUB or the instant specification, a CGA is a collection/group of a number of CTAs; Juffa FIG. 5 and paragraph 0064 disclose a group of CTAs of a specific size being created, as with the claimed CGA). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Juffa with Lindholm’s teaching because it provides an elegant way to compute the elements of a result matrix on a tile-by-tile basis using multiple CTAs that execute concurrently on the different streaming multiprocessors of a graphics processing unit (Juffa, paragraph 0014).

As per claim 26, Lindholm discloses wherein the launch circuit is configured to launch each thread of a thread block onto the same processor of the plural processors (FIG. 3B; paragraphs 0048-0049 and 0067: executing a CTA in parallel in one SM), and to launch at least some different thread blocks on different processors of the plural processors (FIG. 3B; paragraphs 0048-0049 and 0067: executing a CTA in parallel in another SM).

As per claim 27, Lindholm discloses wherein the launch circuit is configured to launch thread blocks on the least utilized processors of the plural processors (FIG.
3B; paragraphs 0048-0049 and 0067).

As per claim 28, Lindholm discloses wherein the launch circuit is configured to schedule and confine the thread group array to launch on processors within the same GPC cluster (FIGs. 2 and 3A-B; paragraphs 0034-0035, 0045, 0047 and 0061).

Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Lindholm in view of Juffa, as applied to claim 20, and further in view of US 2010/0153541 to Arimilli et al. (hereafter “Arimilli ‘541”).

As per claim 22, Lindholm does not explicitly disclose wherein the launch test circuit hardware-guarantees availability of sufficient processing resources to launch and concurrently execute in parallel the collection of multiple thread blocks of the thread group array. Arimilli ‘541 further discloses wherein the launch test circuit hardware-guarantees availability of sufficient processing resources to launch and concurrently execute in parallel the collection of multiple thread blocks of the thread group array (paragraph 0027: “threads associated with a job may be assigned (or re-assigned) to processors based on individual processor loads and network traffic. As used herein, the term "job" is a collection of threads that perform parallel computing.”). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Arimilli ‘541 with the teachings of Lindholm and Juffa because it employs monitoring hardware that dynamically modifies job assignments for processors in an HPC cluster (Arimilli ‘541, paragraph 0027).

Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over Lindholm in view of Juffa, as applied to claim 20, and further in view of US 2018/0181502 to Jen et al. (hereafter “Jen”).

As per claim 24, Lindholm does not explicitly disclose communication circuits that provide direct processor-to-processor communication between the set of plural processors.
Jen further discloses communication circuits that provide direct processor-to-processor communication between the set of plural processors (FIG. 5; paragraph 0009). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Jen with the teachings of Lindholm and Juffa because a high performance PCIe device may be coupled to UPI through an appropriate translation bridge (i.e., UPI to PCIe); moreover, the UPI links may be utilized by many UPI based devices, such as processors, in various ways (e.g., stars, rings, meshes, etc.) (Jen, paragraph 0052).

Response to Arguments

Applicants’ arguments have been considered but are moot in view of the new ground(s) of rejection. Applicants’ amendment necessitated the new ground(s) of rejection presented in this Office action.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Tuan Dao, whose telephone number is (571) 270-3387. The examiner can normally be reached Monday to Friday from 9:00 am to 5:00 pm, and on alternate Fridays. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Vital, can be reached at (571) 272-4215. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) Form at https://www.uspto.gov/patents/uspto-automated-interview-request-air-form.

/TUAN C DAO/
Primary Examiner, Art Unit 2198

Prosecution Timeline

Mar 10, 2022: Application Filed
Sep 06, 2024: Non-Final Rejection (§102, §103)
Jan 09, 2025: Response Filed
Mar 25, 2025: Final Rejection (§102, §103)
Jul 31, 2025: Response after Non-Final Action
Sep 02, 2025: Request for Continued Examination
Sep 09, 2025: Response after Non-Final Action
Nov 12, 2025: Non-Final Rejection (§102, §103)
Apr 08, 2026: Examiner Interview Summary
Apr 08, 2026: Applicant Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602257
ELECTRONIC DEVICE AND OPERATING METHOD WITH MODEL CO-LOCATION
Granted Apr 14, 2026 (2y 5m to grant)

Patent 12566648
METHOD OF PROCESSING AGREEMENT TASK
Granted Mar 03, 2026 (2y 5m to grant)

Patent 12566627
PREDICTING THE NEXT BEST COMPRESSOR IN A STREAM DATA PLATFORM
Granted Mar 03, 2026 (2y 5m to grant)

Patent 12561173
METHOD FOR DATA PROCESSING AND APPARATUS, AND ELECTRONIC DEVICE
Granted Feb 24, 2026 (2y 5m to grant)

Patent 12561591
CLASSIFICATION AND TRANSFORMATION OF SEQUENTIAL EVENT DATA
Granted Feb 24, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 82%
With Interview: 98% (+15.6%)
Median Time to Grant: 3y 1m
PTA Risk: High
Based on 782 resolved cases by this examiner. Grant probability derived from career allow rate.
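The interview figure in this panel is consistent with treating the lift as additive in percentage points (82% + 15.6 points, capped at 100%, rounds to 98%). That additive model is an assumption about how the tool combines its numbers, sketched here only to show the arithmetic lines up:

```python
# Sketch of the interview adjustment shown above. Assumes the +15.6 point
# lift is added directly to the 82% base grant probability and capped at
# 100%; this reproduces the page's figures, but the tool's actual model
# is not published.
base = 0.82
interview_lift = 0.156

with_interview = min(base + interview_lift, 1.0)
print(f"{with_interview:.0%}")   # 98%
```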
