Prosecution Insights
Last updated: April 19, 2026
Application No. 17/471,116

ARITHMETIC UNIT SELECTION PROCESS BASED ON PROCESS EXECUTION CONTROL

Non-Final OA: §103, §112
Filed: Sep 09, 2021
Examiner: BOSTWICK, SIDNEY VINCENT
Art Unit: 2124
Tech Center: 2100 — Computer Architecture & Software
Assignee: Actapio Inc.
OA Round: 3 (Non-Final)

Grant Probability: 52% (Moderate)
Projected OA Rounds: 3-4
Projected Time to Grant: 4y 7m
Grant Probability with Interview: 90%

Examiner Intelligence

Career Allow Rate: 52% (71 granted / 136 resolved; -2.8% vs TC avg)
Interview Lift: +38.2% (resolved cases with interview vs. without)
Avg Prosecution: 4y 7m (68 applications currently pending)
Career History: 204 total applications across all art units

Statute-Specific Performance

§101: 24.4% (-15.6% vs TC avg)
§102: 12.0% (-28.0% vs TC avg)
§103: 40.9% (+0.9% vs TC avg)
§112: 21.9% (-18.1% vs TC avg)

Tech Center averages are estimates; based on career data from 136 resolved cases.

Office Action

Rejections: §103, §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 9/25/2025 has been entered.

Remarks

This Office Action is responsive to Applicant's Amendment filed on September 25, 2025, in which claims 1, 6, and 9-11 are currently amended. Claims 1, 2, and 4-11 are currently pending.

Response to Arguments

The rejection of claim 6 under 35 U.S.C. § 112(b) is hereby withdrawn, as necessitated by applicant's amendments and remarks. Applicant's arguments with respect to the rejection of claims 1, 2, and 4-11 under 35 U.S.C. § 101 based on amendment have been considered and are persuasive; the rejection under 35 U.S.C. § 101 is withdrawn in view of the amendments and remarks. Applicant's arguments with respect to the rejection of claims 1, 2, and 4-11 under 35 U.S.C. § 103 based on amendment have been considered but are moot in view of the new ground of rejection set forth below.

Claim Objections

Claim 1 is objected to because of the following informalities: "decide an arithmetic unit as an execution target, that is, decides which" should read "decide an arithmetic unit as an execution target, that is, decide which". Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 11 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Regarding claim 11, "a learning model with a highest model accuracy" is indefinite: it is unclear relative to what the learning model's accuracy is determined to be highest. Claim 1 only requires training a single learning model and makes no mention of a plurality of learning models, so it is unclear what the learning model's accuracy is high relative to. In the interest of further examination, the trained learning model of claim 11 is interpreted as a learning model with a highest model accuracy (trained to minimize loss) selected for deployment.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 2, 5, 6, 9, and 10 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Liu ("Griffin: Uniting CPU and GPU in Search Engines for Intra-Query Parallelism", 2018) and Nagarajan ("Deterministic Implementations for Reproducibility in Deep Reinforcement Learning", 2019).
Regarding claim 1, Liu teaches An execution control apparatus comprising: a plurality of arithmetic units; and a processor in communication with the plurality of arithmetic units, wherein the processor is configured to: ([Abstract] "We present Griffin, a search engine that dynamically combines GPU- and CPU-based algorithms to process individual queries according to their characteristics. Griffin uses state of-the-art CPU-based query processing techniques and incorporates a novel approach to GPU-based query evaluation" The GPU and CPU are interpreted as arithmetic units.)

specify features of a model used when the plurality of arithmetic units each having different architectures executes a predetermined process; ([p. 2 §2] "the best techniques for a CPU-based search engine will differ from those for a GPU-based one. Below, we describe state-of-the-art approaches of query processing in search on CPU and GPU" [p. 4 §3] "Griffin consists of three main components: Griffin-GPU, the CPU query processing component, and the scheduler. Griffin-GPU implements the advanced parallel algorithms on the GPU. The CPU query processing component implements state-of-the-art CPU query algorithms [11, 26, 40]. The scheduler decides where to run the current query operation" Griffin-GPU, the CPU query processing component, and the scheduler are all features of the model used when the plurality of arithmetic units having different architectures (CPU and GPU) execute a predetermined process.)

decide an arithmetic unit as an execution target, that is, decides which of the plurality of arithmetic units is to execute the process using the model, based on the features; and ([p. 2 §2] "the best techniques for a CPU-based search engine will differ from those for a GPU-based one. Below, we describe state-of-the-art approaches of query processing in search on CPU and GPU" [p. 4 §3] "Griffin consists of three main components: Griffin-GPU, the CPU query processing component, and the scheduler. Griffin-GPU implements the advanced parallel algorithms on the GPU. The CPU query processing component implements state-of-the-art CPU query algorithms [11, 26, 40]. The scheduler decides where to run the current query operation" [p. 7] "the scheduler first decides if the current query is suitable for running on Griffin-GPU or on CPU, depending on the length ratio for the two shortest inverted lists. If the ratio is less than 128, Griffin begins execution on the GPU. After each intersection, the scheduler will check if the length ratio of the two lists in the next round is less than 128. If it is, processing continues on the GPU. Otherwise, Griffin transfers the intermediate results to CPU")

control the arithmetic unit to execute the process using the model. ([p. 2 §2] "the best techniques for a CPU-based search engine will differ from those for a GPU-based one. Below, we describe state-of-the-art approaches of query processing in search on CPU and GPU" [p. 4 §3] "Griffin consists of three main components: Griffin-GPU, the CPU query processing component, and the scheduler. Griffin-GPU implements the advanced parallel algorithms on the GPU. The CPU query processing component implements state-of-the-art CPU query algorithms [11, 26, 40]. The scheduler decides where to run the current query operation" [p. 7] "the scheduler first decides if the current query is suitable for running on Griffin-GPU or on CPU, depending on the length ratio for the two shortest inverted lists. If the ratio is less than 128, Griffin begins execution on the GPU. After each intersection, the scheduler will check if the length ratio of the two lists in the next round is less than 128. If it is, processing continues on the GPU. Otherwise, Griffin transfers the intermediate results to CPU" In Liu, Griffin is the model which controls the process execution.)
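The length-ratio scheduling rule quoted from Liu can be illustrated in a few lines. This is a hypothetical sketch, not Griffin's actual code; only the threshold of 128 and the CPU/GPU decision come from the quoted passages, and all names are invented for illustration:

```python
# Hypothetical sketch of the length-ratio heuristic Liu's scheduler is quoted
# as using: compare the lengths of the two shortest inverted lists and run the
# current intersection round on the GPU only when the ratio is below 128.
RATIO_THRESHOLD = 128  # threshold stated in the quoted passage

def choose_unit(list_lengths):
    """Return 'GPU' or 'CPU' for one intersection round."""
    shortest, second_shortest = sorted(list_lengths)[:2]
    ratio = second_shortest / shortest
    return "GPU" if ratio < RATIO_THRESHOLD else "CPU"

# Per the quote, the check repeats after each intersection, and intermediate
# results are transferred to the CPU once the ratio reaches the threshold.
print(choose_unit([100, 900, 4000]))  # ratio 9, below threshold
print(choose_unit([10, 50000]))       # ratio 5000, at or above threshold
```

The point is only the decision shape: a cheap runtime feature (the list-length ratio) picks the arithmetic unit per sub-operation rather than per query.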
wherein deciding the arithmetic unit as the execution target comprises deciding the arithmetic unit from a first arithmetic unit that is a deterministic central processing unit (CPU) having a branch prediction function ([p. 3 §2.2] "Most search engines execute decompression, list intersection, and ranking operations on CPU. CPU is good at dealing with complex logic, and its advanced prefetch and branch handling can provide high performance and efficiency. As a result, CPU is able to run fast sequential merge, especially when the data accesses exhibit ample spatial locality [...] the CPU clock speed and aggressive branch handling will still perform well" [p. 7 §3.2] "At this time, the modern CPUs with speculative execution and branch prediction can address the branch divergence effectively while avoiding the additional overhead of moving data to GPU.") and a second arithmetic unit that is a non-deterministic graphics processing unit (GPU) with no branch prediction function ([p. 4 §2.3] "GPU binary search is not efficient. The frequent branch divergence results in idle threads and reduced performance.").

However, Liu does not explicitly teach wherein when a process is executed by the first arithmetic unit using same input data at different times, outputs of the first arithmetic unit remain the same, wherein when a process is executed by the second arithmetic unit using same input data at different times, outputs of the second arithmetic unit are not guaranteed to be the same.

Nagarajan, in the same field of endeavor, teaches wherein when a process is executed by the first arithmetic unit using same input data at different times, outputs of the first arithmetic unit remain the same, ([p. 1] "Deterministic implementation: a computer program that, when run under some fixed experimental conditions, will always produce identical outputs for a given input." [p. 3] "From the hardware side, running the same deterministic implementation on a CPU can yield different results from running deterministically on a GPU. This can be due to several reasons (Whitehead and Fit-Florea 2011), including differences in available operations and in the precision between the CPU and GPU. Further, when a deterministic implementation is run on two different GPU architectures, it may produce different results, since code generated by the compiler is then compiled at run-time for a specific target GPU (NVIDIA Corporation 2018a; NVIDIA Corporation 2018b)") wherein when a process is executed by the second arithmetic unit using same input data at different times, outputs of the second arithmetic unit are not guaranteed to be the same. ([p. 2 Right Column] "GPU Neural networks are typically trained on graphics processing units (GPUs). Many numerical operations performed on the GPU are nondeterministic by default." [p. 3] "PyTorch exposes a modifiable boolean variable that allows us to enable or disable deterministic numerical computations on the GPU [...] From the hardware side, running the same deterministic implementation on a CPU can yield different results from running deterministically on a GPU. This can be due to several reasons (Whitehead and Fit-Florea 2011), including differences in available operations and in the precision between the CPU and GPU. Further, when a deterministic implementation is run on two different GPU architectures, it may produce different results, since code generated by the compiler is then compiled at run-time for a specific target GPU (NVIDIA Corporation 2018a; NVIDIA Corporation 2018b)")

Liu and Nagarajan are both directed towards heterogeneous processor schedulers. Therefore, Liu and Nagarajan are analogous art in the same field of endeavor.
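The GPU non-determinism Nagarajan describes stems largely from floating-point addition not being associative: a parallel reduction whose accumulation order varies between runs need not be bit-stable. A minimal CPU-side illustration (not from either reference) shows that merely reassociating the same additions changes the result:

```python
# Floating-point addition is not associative, so a reduction whose
# accumulation order is nondeterministic (as in many default GPU kernels)
# can yield different outputs for the same input data.
vals = [1e16, 1.0, -1e16, 1.0]

left_to_right = sum(vals)          # (((1e16 + 1.0) - 1e16) + 1.0)
reassociated = sum(sorted(vals))   # same numbers, different association

print(left_to_right, reassociated)  # the two sums differ
```

The 1.0 terms are smaller than the floating-point spacing at 1e16, so whether they survive depends entirely on the order in which the partial sums are formed; this is the mechanism behind "nondeterministic by default" GPU reductions.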
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Liu with the teachings of Nagarajan by considering non-determinism when scheduling operations. Not only does Nagarajan explicitly disclose that GPUs are naturally non-deterministic, Nagarajan also provides additional motivation for the combination ([Abstract] "We find that individual sources of nondeterminism can substantially impact the performance of agent, illustrating the benefits of deterministic implementations. In addition, we also discuss the important role of deterministic implementations in achieving exact replicability of results").

Regarding claim 2, the combination of Liu and Nagarajan teaches The execution control apparatus according to claim 1, wherein features of a plurality of processes executed as the model are specified as the features of the model, and wherein the arithmetic unit is decided as the execution target to execute the process, for each of the plurality of processes, from among the plurality of arithmetic units, based on the features of the plurality of processes. (Liu [p. 2] "Griffin decides when and where to execute query operation without a priori knowledge of the query's characteristics. To make a proper decision, Griffin considers overheads due to data transfer between CPU and GPU, GPU memory management, as well as the system load. Griffin addresses these challenges with a dynamic intra-query scheduling algorithm that breaks a query into sub-operations and schedules them to the GPU or to the CPU based on their runtime characteristics" [p. 7] "the scheduler first decides if the current query is suitable for running on Griffin-GPU or on CPU, depending on the length ratio for the two shortest inverted lists. If the ratio is less than 128, Griffin begins execution on the GPU. After each intersection, the scheduler will check if the length ratio of the two lists in the next round is less than 128. If it is, processing continues on the GPU. Otherwise, Griffin transfers the intermediate results to CPU")

Regarding claim 5, the combination of Liu and Nagarajan teaches The execution control apparatus according to claim 1, wherein a first arithmetic unit adopting an out-of-order method or a second arithmetic unit not adopting the out-of-order method is decided as the execution target. (Liu [p. 4 §2.3] "GPU binary search is not efficient. The frequent branch divergence results in idle threads and reduced performance." The GPU in Liu does not adopt an out-of-order method as the execution target.)

Regarding claim 6, the combination of Liu and Nagarajan teaches The execution control apparatus according to claim 1, wherein the first arithmetic unit is a central processing unit having a branch prediction function, (Liu [p. 3 §2.2] "Most search engines execute decompression, list intersection, and ranking operations on CPU. CPU is good at dealing with complex logic, and its advanced prefetch and branch handling can provide high performance and efficiency. As a result, CPU is able to run fast sequential merge, especially when the data accesses exhibit ample spatial locality [...] the CPU clock speed and aggressive branch handling will still perform well" [p. 7 §3.2] "At this time, the modern CPUs with speculative execution and branch prediction can address the branch divergence effectively while avoiding the additional overhead of moving data to GPU.") and the second arithmetic unit is an image arithmetic unit having no branch prediction function as the second arithmetic unit. (Liu [p. 4 §2.3] "GPU binary search is not efficient. The frequent branch divergence results in idle threads and reduced performance." The GPU (graphics processing unit) is interpreted as an image (graphic) arithmetic unit.)

Regarding claim 9, claim 9 is directed towards the method performed by the apparatus of claim 1. Therefore, the rejection applied to claim 1 also applies to claim 9.
Regarding claim 10, claim 10 is substantially similar to claim 1. Therefore, the rejection applied to claim 1 also applies to claim 10.

Claim 4 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Liu and Nagarajan in further view of Singh (US10726583B2).

Regarding claim 4, the combination of Liu and Nagarajan teaches The execution control apparatus according to claim 1. However, the combination of Liu and Nagarajan does not explicitly teach wherein a first arithmetic unit that performs scalar operations or a second arithmetic unit that performs vector operations is decided as the execution target. Singh, in the same field of endeavor, teaches wherein a first arithmetic unit that performs scalar operations or a second arithmetic unit that performs vector operations is decided as the execution target. ([Col. 10 l. 5-50] "Each execution unit (e.g. 608A) is an individual vector processor capable of executing multiple simultaneous threads and processing multiple data elements in parallel for each thread")

The combination of Liu and Nagarajan as well as Singh are directed towards heterogeneous processor scheduling systems. Therefore, the combination of Liu and Nagarajan as well as Singh are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Liu and Nagarajan with the teachings of Singh by using a processor that performs vector operations. One of ordinary skill in the art would recognize that vector operations are vital for most machine learning tasks. Singh provides an improved method of processing vector operations and provides additional motivation for the combination ([Col. 28 l. 1-7] "Data with a very low dynamic range can be encoded using mean encoding. Data having a very small number of unique values can be encoded using unique values coordinate encoding. In one embodiment UAV table encoding can be used as a default encoding method, with significance map (SM) encoding or table encoding (TE) enabled to further increase the efficiency of the UAV table encoding method for certain types of data.")

Claim 7 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Liu and Nagarajan in further view of Lee (US20170205863A1).

Regarding claim 7, the combination of Liu and Nagarajan teaches The execution control apparatus according to claim 6. However, the combination of Liu and Nagarajan does not explicitly teach wherein when the model is a model for multi-class classification, the image arithmetic unit is decided as the arithmetic unit as an execution target. Lee, in the same field of endeavor, teaches The execution control apparatus according to claim 6, wherein when the model is a model for multi-class classification, the image arithmetic unit is decided as the arithmetic unit as an execution target. ([¶0139] "various offline supervised models may be used. In one example, a multi-class logistic regression model with a ridge estimator may be used to measure the relationship between more than two categorical dependent or independent variables, which exhibit good accuracy and simple implementation")

The combination of Liu and Nagarajan as well as Lee are directed towards heterogeneous processor scheduling systems. Therefore, the combination of Liu and Nagarajan as well as Lee are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Liu and Nagarajan with the teachings of Lee by performing multi-class classification. Lee provides additional motivation for the combination ([¶0139] "a multi-class logistic regression model with a ridge estimator may be used to measure the relationship between more than two categorical dependent or independent variables, which exhibit good accuracy and simple implementation").

Claim 8 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Liu and Nagarajan in further view of Salajegheh (US20160103996A1).

Regarding claim 8, the combination of Liu and Nagarajan teaches The execution control apparatus according to claim 6. However, the combination of Liu and Nagarajan does not explicitly teach wherein when the model is a model for two-class classification, the central processing unit is decided as the arithmetic unit as an execution target. Salajegheh, in the same field of endeavor, teaches when the model is a model for two-class classification, the central processing unit is decided as the arithmetic unit as an execution target. ([¶0104] "Boosted decision stumps are one level decision trees that have exactly one node (and thus one test question or test condition) and a weight value, and thus are well suited for use in a binary classification of data/behaviors")

The combination of Liu and Nagarajan as well as Salajegheh are directed towards heterogeneous processor systems for machine learning. Therefore, the combination of Liu and Nagarajan as well as Salajegheh are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Liu and Nagarajan with the teachings of Salajegheh by performing binary classification through boosted decision trees. Salajegheh provides additional motivation for the combination ([¶0104] "Boosted decision stumps are efficient because they are very simple and primal (and thus do not require significant processing resources).
Boosted decision stumps are also very parallelizable, and thus many stumps may be applied or tested in parallel/at the same time (e.g., by multiple cores or processors in the computing device).")

Claim 11 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Abadi and Liu in further view of Two Sigma ("A Workaround for Non Determinism in TensorFlow", 2017).

Regarding claim 11, Abadi teaches An execution control apparatus comprising: a processor, the processor is configured to: receive training data for learning model training; ([p. 2] "Figure 2: A schematic TensorFlow dataflow graph for a training pipeline, containing subgraphs for reading input data, preprocessing, training, and checkpointing state" See FIG. 2, which receives input data for training.)

divide the training data into a plurality of file sets in chronological order; generate a plurality of subsets of file sets, wherein file sets in each subset are selected based on a predetermined order; ([p. 2] "Figure 2: A schematic TensorFlow dataflow graph for a training pipeline, containing subgraphs for reading input data, preprocessing, training, and checkpointing state" [p. 7 §3.1] "Like a Variable, the FIFOQueue operation produces a reference handle that can be consumed by one of the standard queue operations, such as Enqueue and Dequeue. These operations push their input onto the tail of the queue and, respectively, pop the head element and output it. Enqueue will block if its given queue is full, and Dequeue will block if its given queue is empty. When queues are used in an input preprocessing pipeline, this blocking provides backpressure; it also supports synchronization (§4.4). The combination of queues and dynamic control flow (§3.4) can also implement a form of streaming computation between subgraphs" See FIG. 2, where input data is divided into a chronological shuffle queue and then further processed into another chronological queue (subsets).
Abadi explicitly discloses that the queue determines the order in time in which each partial gradient was and is processed/streamed.)

perform, for each of the plurality of subsets of file sets: connect the subset of file sets based on the predetermined order to generate a training data set; ([p. 11 §4.4] "We implement the synchronous version using queues (§3.1) to coordinate execution: a blocking queue acts as a barrier to ensure that all workers read the same parameter values, and a per-variable queue accumulates gradient updates from all workers in order to apply them atomically. The simple synchronous version (Figure 5(b)) accumulates updates from all workers before applying them, but slow workers limit overall throughput" Accumulating gradient updates is interpreted as connecting the subset of file sets based on the predetermined order to generate a training data set.)

train a learning model to learn features of the training data set; ([p. 6 §2] "We show in Subsection 6.3 that TensorFlow can train larger models on larger clusters with step times as short as 2 seconds" See also FIG. 2.)

and evaluate the learning model to determine model accuracy; select a learning model with a highest model accuracy for deployment; ([p. 14 §6.3] "Training a network to high accuracy requires a large amount of computation, and we use TensorFlow to scale out this computation across a cluster of GPU-enabled servers. In these experiments, we focus on Google's Inception-v3 model, which achieves 78.8% accuracy in the ILSVRC 2012 image classification")

select an arithmetic unit as an execution target from a plurality of arithmetic units for executing the learning model with the highest model accuracy based on associated features, ([p. 8 §3.3] "Dataflow simplifies distributed execution, because it makes communication between subcomputations explicit. It enables the same TensorFlow program to be deployed to a cluster of GPUs for training, a cluster of TPUs for serving, and a cellphone for mobile inference. Each operation resides on a particular device, such as a CPU or GPU in a particular task. A device is responsible for executing a kernel for each operation assigned to it. TensorFlow allows multiple kernels to be registered for a single operation, with specialized implementations for a particular device or data type (see §5 for details). For many operations, such as element-wise operators (Add, Sub, etc.), we can compile a single kernel implementation for CPU and GPU using different compilers")

wherein selecting the arithmetic unit comprises analyzing computational complexity metrics of the learning model processes, ([p. 8 §3.3] "The TensorFlow runtime places operations on devices, subject to implicit or explicit constraints in the graph. The placement algorithm computes a feasible set of devices for each operation, calculates the sets of operations that must be collocated, and selects a satisfying device for each colocation group. It respects implicit colocation constraints that arise because each stateful operation and its state must be placed on the same device")

each of the plurality of arithmetic units has a unique architecture that differs from architectures of other arithmetic units of the plurality of arithmetic units; ([p. 8 §3.3] "Dataflow simplifies distributed execution, because it makes communication between subcomputations explicit. It enables the same TensorFlow program to be deployed to a cluster of GPUs for training, a cluster of TPUs for serving, and a cellphone for mobile inference. Each operation resides on a particular device, such as a CPU or GPU in a particular task. A device is responsible for executing a kernel for each operation assigned to it. TensorFlow allows multiple kernels to be registered for a single operation, with specialized implementations for a particular device or data type (see §5 for details). For many operations, such as element-wise operators (Add, Sub, etc.), we can compile a single kernel implementation for CPU and GPU using different compilers")

and control the arithmetic unit to execute the model. ([p. 8 §3.3] "Dataflow simplifies distributed execution, because it makes communication between subcomputations explicit. It enables the same TensorFlow program to be deployed to a cluster of GPUs for training, a cluster of TPUs for serving, and a cellphone for mobile inference. Each operation resides on a particular device, such as a CPU or GPU in a particular task. A device is responsible for executing a kernel for each operation assigned to it. TensorFlow allows multiple kernels to be registered for a single operation, with specialized implementations for a particular device or data type (see §5 for details). For many operations, such as element-wise operators (Add, Sub, etc.), we can compile a single kernel implementation for CPU and GPU using different compilers")

However, Abadi does not explicitly teach wherein the plurality of arithmetic units comprises a first arithmetic unit that is a deterministic central processing unit (CPU) having a branch prediction function and a second arithmetic unit that is a non-deterministic graphics processing unit (GPU) with no branch prediction function. Liu, in the same field of endeavor, teaches wherein the plurality of arithmetic units comprises a first arithmetic unit that is a deterministic central processing unit (CPU) having a branch prediction function and ([p. 3 §2.2] "Most search engines execute decompression, list intersection, and ranking operations on CPU. CPU is good at dealing with complex logic, and its advanced prefetch and branch handling can provide high performance and efficiency.
As a result, CPU is able to run fast sequential merge, especially when the data accesses exhibit ample spatial locality [...] the CPU clock speed and aggressive branch handling will still perform well" [p. 7 §3.2] "At this time, the modern CPUs with speculative execution and branch prediction can address the branch divergence effectively while avoiding the additional overhead of moving data to GPU.")

Abadi and Liu are both directed towards heterogeneous processor schedulers. Therefore, Abadi and Liu are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Abadi with the teachings of Liu by using a CPU with branch prediction. Liu provides as additional motivation for the combination that this is a known feature of modern CPUs ([p. 7] "the modern CPUs with speculative execution and branch prediction can address the branch divergence effectively while avoiding the additional overhead of moving data to GPU").

However, the combination of Abadi and Liu does not explicitly teach a second arithmetic unit that is a non-deterministic graphics processing unit (GPU) with no branch prediction function. Two Sigma, in the same field of endeavor, teaches a second arithmetic unit that is a non-deterministic graphics processing unit (GPU) with no branch prediction function. ("the GPU non-determinism may be explored in this deterministic setting by controlling how mini-batches are constructed. A second benefit of repeatability is that is makes it easier to write regression tests for the training framework […] Non-Determinism with GPU Training" At the time of writing this Office Action, GPUs do not use branch prediction.)

The combination of Abadi and Liu as well as Two Sigma are directed towards heterogeneous processor schedulers. Therefore, the combination of Abadi and Liu as well as Two Sigma are reasonably pertinent analogous art.
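The claim 11 pipeline as interpreted above (chronological file sets, ordered subsets, per-subset training, selection of the highest-accuracy model) can be sketched as follows. Every name here is hypothetical, and `train`/`evaluate` are stand-ins for whatever learning and evaluation procedure the claim covers:

```python
# Hypothetical sketch of the claim 11 pipeline: divide time-ordered training
# data into file sets, connect ordered subsets of those sets into training
# data sets, train one model per subset, and keep the most accurate model.
def split_chronologically(records, n_sets):
    """Divide time-ordered records into n_sets contiguous file sets."""
    size = max(1, len(records) // n_sets)
    return [records[i:i + size] for i in range(0, len(records), size)]

def select_best_model(file_sets, train, evaluate):
    """Train on each growing prefix of file sets; return the best model."""
    best_model, best_accuracy = None, float("-inf")
    for k in range(1, len(file_sets) + 1):
        # "Connect" the first k file sets, preserving the predetermined order.
        training_set = [rec for fs in file_sets[:k] for rec in fs]
        model = train(training_set)
        accuracy = evaluate(model)
        if accuracy > best_accuracy:
            best_model, best_accuracy = model, accuracy
    return best_model, best_accuracy
```

The point is only the shape of the loop (ordered subsets in, single highest-accuracy model out), not any particular learner; the examiner's mapping reads Abadi's queue-ordered input pipeline and accuracy-driven training onto this structure.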
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Abadi and Liu with the teachings of Two Sigma by using a non-deterministic GPU without branch prediction. Two Sigma explicitly uses TensorFlow, introduced by the primary reference of Abadi, and further discloses the issue of non-determinism in GPUs. Two Sigma provides as additional motivation for combination that non-determinism is a known issue and provides workaround improvements ([p. 7] "To correct for the behavior of reduce_sum and non-deterministic reduction when broadcasting biases, as described in the main article we implement our own version of reduce_sum that uses matmul, and we implement biases by augmenting the weights matrix as well as all previous layer inputs with a ones column").

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Babaeizadeh ("REINFORCEMENT LEARNING THROUGH ASYNCHRONOUS ADVANTAGE ACTOR-CRITIC ON A GPU", 2017) is directed towards a heterogeneous processor scheduling system for machine learning and orders training data in a queue.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Miranda Huang, can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
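The Two Sigma workaround quoted above replaces reduce_sum with a matrix multiply against a ones vector, so the reduction runs inside the matmul kernel (with a fixed accumulation pattern) rather than as a schedule-dependent parallel sum. A rough NumPy sketch of the idea; the function name and shapes here are illustrative assumptions, not Two Sigma's actual implementation:

```python
import numpy as np

def reduce_sum_via_matmul(x, axis=0):
    """Sum a 2-D array along one axis by multiplying with a ones vector.

    A parallel reduce_sum on GPU may accumulate partial sums in a
    nondeterministic order; routing the reduction through matmul pins
    it to the matrix-multiply kernel instead. Illustrative sketch only.
    """
    x = np.asarray(x, dtype=np.float32)
    if axis == 0:
        ones = np.ones((1, x.shape[0]), dtype=x.dtype)
        return (ones @ x)[0]        # sum over rows -> one row vector
    ones = np.ones((x.shape[1], 1), dtype=x.dtype)
    return (x @ ones)[:, 0]         # sum over columns -> one column vector

x = np.arange(6, dtype=np.float32).reshape(2, 3)  # [[0,1,2],[3,4,5]]
col_sums = reduce_sum_via_matmul(x, axis=0)       # [3., 5., 7.]
row_sums = reduce_sum_via_matmul(x, axis=1)       # [3., 12.]
```

The quoted passage applies the same trick to biases, appending a ones column to the inputs and folding the bias into the weight matrix so the bias addition also happens inside matmul.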
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SIDNEY VINCENT BOSTWICK/
Examiner, Art Unit 2124

Prosecution Timeline

Sep 09, 2021
Application Filed
Nov 07, 2024
Non-Final Rejection — §103, §112
May 19, 2025
Response Filed
Jun 21, 2025
Final Rejection — §103, §112
Sep 19, 2025
Applicant Interview (Telephonic)
Sep 19, 2025
Examiner Interview Summary
Sep 25, 2025
Response after Non-Final Action
Oct 27, 2025
Request for Continued Examination
Oct 29, 2025
Response after Non-Final Action
Dec 05, 2025
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561604
SYSTEM AND METHOD FOR ITERATIVE DATA CLUSTERING USING MACHINE LEARNING
2y 5m to grant Granted Feb 24, 2026
Patent 12547878
Highly Efficient Convolutional Neural Networks
2y 5m to grant Granted Feb 10, 2026
Patent 12536426
Smooth Continuous Piecewise Constructed Activation Functions
2y 5m to grant Granted Jan 27, 2026
Patent 12518143
FEEDFORWARD GENERATIVE NEURAL NETWORKS
2y 5m to grant Granted Jan 06, 2026
Patent 12505340
STASH BALANCING IN MODEL PARALLELISM
2y 5m to grant Granted Dec 23, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
52%
Grant Probability
90%
With Interview (+38.2%)
4y 7m
Median Time to Grant
High
PTA Risk
Based on 136 resolved cases by this examiner. Grant probability derived from career allow rate.
