Last updated: May 29, 2026
Application No. 17/526,003
HIGHLY PARALLEL PROCESSING ARCHITECTURE WITH COMPILER

Non-Final OA §103
Filed
Nov 15, 2021
Priority
Sep 09, 2020 — provisional 63/075,849 +9 more
Examiner
KIM, SISLEY NAHYUN
Art Unit
2196
Tech Center
2100 — Computer Architecture & Software
Assignee
Ascenium, Inc.
OA Round
3 (Non-Final)
Interview Optional

— +16.2% interview lift. Examiner has a relatively high allowance rate (89%); +16.2% interview lift. A written response may suffice.
Based on 678 resolved cases, 2023–2026
Examiner Intelligence

KIM, SISLEY NAHYUN View full profile →
Grants 89% — above average
Career Allowance Rate
603 granted / 678 resolved
+33.9% vs TC avg
Strong +16% interview lift
Without
With
+16.2%
Interview Lift
resolved cases with interview
Typical timeline
2y 7m
Avg Prosecution
23 currently pending
Career history
707
Total Applications
across all art units
Statute-Specific Performance

§101
1.7%
-38.3% vs TC avg
§103
81.0%
+41.0% vs TC avg
§102
13.6%
-26.4% vs TC avg
§112
1.6%
-38.4% vs TC avg
Black line = Tech Center average estimate • Based on career data from 678 resolved cases
Office Action

§103
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . 

Continued Examination
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 26 November 2025 has been entered.

Response to Arguments
Applicant’s arguments with respect to claims 1-14, 16-18, 20, 22-24, 27-30, 35, and 36 have been considered but are moot because the arguments do not apply to any of the references being used in the current rejection.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made

Claims 1-14, 16-18, 20, 22-24, 27-30, 35, and 36 are rejected under 35 U.S.C. 103 as being unpatentable over Meixner (US 2016/0313984, hereinafter Meixner) in view of and Armoni et al. (US 2010/0235608, hereinafter Armoni) and Jacob (Yaakov) et al. (US 2020/0097442, hereinafter Jacob).

Regarding claim 1, Meixner discloses 
A processor-implemented method for task processing comprising:
accessing a two-dimensional (2D) array of compute elements, wherein each compute element within the array of compute elements (paragraph [0045]: the virtualized environment can be viewed as a type of two-dimensional (2D), SIMD processor composed of a 2D array of, e.g., identical processors each executing identical code in lock-step) is known to a compiler (paragraph [0037]: FIG. 1 shows a high level view of an image processor technology platform that includes a virtual image processing environment 101, the actual image processing hardware 103 and a compiler 102 for translating higher level code written for the virtual processing environment 101 to object code that the actual hardware 103 physically executes) and is coupled to its neighboring compute elements within the array of compute elements (paragraph [0127]: the pair of execution lanes 1110 are drawn as horizontal neighbors when in fact, according to the following example, they are vertical neighbors);
providing a set of directions to the 2D array of compute elements, through a control word generated by the compiler (paragraph [0095]: The image processor may be targeted, for example, by a compiler that converts program code written for a virtual processor within a simulated environment into program code that is actually executed by the hardware processor), for compute element operation and memory access precedence (paragraph [0053]: The memory model therefore includes per processor private scratchpad regions 402 for the storage of such intermediate information by each virtual processor's corresponding thread. In an embodiment, the scratch pad region for a particular processor is accessed 409 by that processor through a typical (e.g., linear) random access memory address and is a read/write region of memory (i.e., a virtual processor is able to both read information from private memory as well as write information into private memory); paragraph [0162]: the virtual code sequence 2101 of FIG. 21 retrieves data values requiring a maximum or near maximum number of shifts if data where to be accessed in the order specified (A-M C K B L F G E)), … and wherein the set of directions enables the 2D array of compute elements to properly sequence compute element results (paragraph [0059]: it is expected that SIMD image processing sequences will often perform a look-up into a same look-up table during a same clock cycle. When confronted with virtual code having inefficient data access sequences from a two-dimensional shift array perspective, the compiler will reorder the data access sequence to keep the number of shifts minimal between mathematical operations (e.g., one shift between mathematical operations)); and
executing a compiled task on the array of compute elements, based on the set of directions (paragraph [0161]: FIG. 21 shows a related process in which the compiler will reorder data load sequences to reduce or minimize the number of shifts in the hardware that are required to load desired data into their respective execution lanes; paragraph [0162]: if the virtual code sequence 2101 of FIG. 21 was presented to the compiler, the compiler would still produce object code like the object code sequence 2002 observed in FIG. 20 which sequences data accesses in boustrophedonic order (A B C H F G F K L M)).
Meixner does not disclose wherein the set of directions includes directions to idle an unneeded compute element within the 2D array of compute elements … and wherein the set of directions enables the 2D array of compute elements to properly sequence compute element results. 
Armoni discloses wherein the set of directions includes directions to idle an unneeded compute element within the 2D array of compute elements … (paragraph [0045]: An array of interconnected processor elements (PE) (1-1) … The array can be … at idle state when no instructions are driven through the instruction buses (1-8); paragraph [0046]: Configuration of the array is done by writing into the processor elements' memory) and wherein the set of directions enables the 2D array of compute elements to properly sequence compute element results (paragraph [0044]: once test execution is complete, the back tracing of the shortest-path to every node from the origins of propagation. Multiple tests of the same terrain representation can be achieved consuming minimal time by simply repeating the test flow once for each new test). It would have been obvious to one of ordinary skill in the art at the time the claimed invention was effectively filed to modify Meixner’s compiler/control-word framework to incorporate the idle/configuration and tracing shortest-path teachings of Armoni. The motivation would have been to allow for the finding of the T-connected region connectivity in a highly efficient manner, by allowing the embedded processor to halt the propagation in the propagation unit inside the graph-processing unit after time T, and retrieve the nodes that the signal arrived at (Armoni paragraph [0036]).
Meixner in view of Armoni does not disclose wherein the set of directions includes directions to idle an unneeded compute element within the 2D array of compute elements, causing the compute element to be placed in a low power state. 
Jacob discloses wherein the set of directions includes directions to idle an unneeded compute element within the 2D array of compute elements, causing the compute element to be placed in a low power state (paragraph [0077]: a column 140 of PEs 110 may be enabled or disabled according to the value of the ColumnValid bit (or indication). Disabling columns 140 that are not used in a specific cycle may improve the efficiency of systolic array 100 by reducing the power consumption of systolic array 100, by isolating inactive logic and disabling clocks of the multiplications, accumulators, and PSUMs 480 in a column 140 of PEs 110, and by enabling efficient utilization of PEs 110 as disclosed herein). 
Incorporating Jacob’s column-disable by reducing power consumption technique into the combined Meixner in view of Armoni framework provides a known, implementation-level means to realize the idle/low-power behavior suggested by Armoni.
It would have been obvious to one of ordinary skill in the art at the time the claimed invention was effectively filed to modify the teaching of Meixner by using Armoni’s control/configuration approach to permit selective idling of PE and tracking shortest-path, and adopting Jacob’s disabling PEs that are not used according to the value of the ColumnValid bit (or indication) by reducing the power consumption of systolic array. The motivation would have been to enable efficient utilization of PEs (Jacob paragraph [0077]).
Regarding claim 35 referring to claim 1, Meixner discloses A computer program product embodied in a non-transitory computer readable medium for task processing, the computer program product comprising code which causes one or more processors to perform operations of: … (Fig. 24, paragraph [0195]).
Regarding claim 36 referring to claim 1, Meixner discloses A computer system for task processing comprising: a memory which stores instructions; one or more processors coupled to the memory, wherein the one or more processors, when executing the instructions which are stored, are configured to: … (Fig. 24, paragraph [0195]).

Regarding claim 2, Meixner discloses 
wherein the compute element results are generated in parallel in the array of compute elements (paragraph [0045]: Regardless, by instantiating a separate processor for each of multiple locations in the output array, the processors can execute their respective threads in parallel so that, e.g., the respective values for all locations in the output array are produced concurrently … the virtualized environment can be viewed as a type of two-dimensional (2D), SIMD processor composed of a 2D array of, e.g., identical processors each executing identical code in lock-step).

Regarding claim 3, Meixner discloses 
wherein the compute element results are ordered independently from control word arrival at each compute element within the array of compute elements (paragraph [0045]: Regardless, by instantiating a separate processor for each of multiple locations in the output array, the processors can execute their respective threads in parallel; paragraph [0059]: it is expected that SIMD image processing sequences will often perform a look-up into a same look-up table during a same clock cycle. When confronted with virtual code having inefficient data access sequences from a two-dimensional shift array perspective, the compiler will reorder the data access sequence to keep the number of shifts minimal between mathematical operations (e.g., one shift between mathematical operations)).

Regarding claim 4, Meixner discloses 
wherein the set of directions controls data movement for the array of compute elements (paragraph [0059]: it is expected that SIMD image processing sequences will often perform a look-up into a same look-up table during a same clock cycle. When confronted with virtual code having inefficient data access sequences from a two-dimensional shift array perspective, the compiler will reorder the data access sequence to keep the number of shifts minimal between mathematical operations (e.g., one shift between mathematical operations).

Regarding claim 5, Meixner discloses 
wherein the data movement includes loads and stores with a memory array (paragraph [0059]: it is expected that SIMD image processing sequences will often perform a look-up into a same look-up table during a same clock cycle. When confronted with virtual code having inefficient data access sequences from a two-dimensional shift array perspective, the compiler will reorder the data access sequence to keep the number of shifts minimal between mathematical operations (e.g., one shift between mathematical operations; paragraph [0161]: FIG. 21 shows a related process in which the compiler will reorder data load sequences to reduce or minimize the number of shifts in the hardware that are required to load desired data into their respective execution lanes).

Regarding claim 6, Meixner discloses 
wherein the data movement includes loads and stores with a memory array (paragraph [0059]: it is expected that SIMD image processing sequences will often perform a look-up into a same look-up table during a same clock cycle. When confronted with virtual code having inefficient data access sequences from a two-dimensional shift array perspective, the compiler will reorder the data access sequence to keep the number of shifts minimal between mathematical operations (e.g., one shift between mathematical operations).

Regarding claim 7, Meixner discloses 
wherein the memory access precedence enables ordering of memory data (paragraph [0059]: it is expected that SIMD image processing sequences will often perform a look-up into a same look-up table during a same clock cycle. When confronted with virtual code having inefficient data access sequences from a two-dimensional shift array perspective, the compiler will reorder the data access sequence to keep the number of shifts minimal between mathematical operations (e.g., one shift between mathematical operations).

Regarding claim 8, Meixner discloses 
wherein the ordering of memory data enables compute element result sequencing (paragraph [0053]: The memory model therefore includes per processor private scratchpad regions 402 for the storage of such intermediate information by each virtual processor's corresponding thread. In an embodiment, the scratch pad region for a particular processor is accessed 409 by that processor through a typical (e.g., linear) random access memory address and is a read/write region of memory (i.e., a virtual processor is able to both read information from private memory as well as write information into private memory); paragraph [0162]: the virtual code sequence 2101 of FIG. 21 retrieves data values requiring a maximum or near maximum number of shifts if data where to be accessed in the order specified (A-M C K B L F G E)).

Regarding claim 9, Meixner discloses 
wherein the set of directions controls the array of compute elements on a cycle-by-cycle basis (paragraph [0059]: it is expected that SIMD image processing sequences will often perform a look-up into a same look-up table during a same clock cycle; paragraph [0119]: the scalar instruction 951 is executed by the scalar processor before the execution lanes within the execution lane array execute either of the other to instructions 952, 953. That is, the execution of the VLIW word includes a first cycle upon which the scalar instruction 951 is executed followed by a second cycle upon with the other instructions 952, 953 may be executed (note that in various embodiments instructions 952 and 953 may be executed in parallel)).

Regarding claim 10, Meixner discloses 
wherein the cycle-by-cycle basis is enabled by a stream of wide, variable length, microcode control words generated by the compiler (paragraph [0113]: the instruction format of the instructions read from scalar memory 903 and issued to the execution lanes of the execution lane array 905 includes a very-long-instruction-word (VLIW) type format that includes more than one opcode per instruction; paragraph [0119]: the execution of the VLIW word includes a first cycle upon which the scalar instruction 951 is executed followed by a second cycle upon with the other instructions 952, 953 may be executed (note that in various embodiments instructions 952 and 953 may be executed in parallel)).

Regarding claim 11, Meixner discloses 
wherein the cycle-by-cycle basis comprises an architectural cycle (paragraph [0059]: it is expected that SIMD image processing sequences will often perform a look-up into a same look-up table during a same clock cycle).

Regarding claim 12, Meixner discloses 
wherein the compiler provides, via the control word, valid bits for each column of the array of compute elements, on the cycle-by-cycle basis (paragraph [0145]: The HI half sheet contains the upper 8 bits of each data item at the correct array location. The LO half sheet contains the lower 8 bits of each data item at the correct array location; paragraph [0167]: For example, as depicted in FIG. 15 in the case of 16 bit input operands the sheet generator will generate a HI half sheet and a LO half sheet. The HI half sheet contains the upper 8 bits of each data item at the correct array location. The LO half sheet contains the lower 8 bits of each data item at the correct array location. 16 bit operations are then performed by loading both sheets into the stencil processor and informing the execution lane hardware (e.g., via an immediate value in the program code) that 16 bit operation is to take place).

Regarding claim 13, Meixner discloses 
wherein the valid bits indicate a valid memory load access is emerging from the array (paragraph [0145]: The HI half sheet contains the upper 8 bits of each data item at the correct array location. The LO half sheet contains the lower 8 bits of each data item at the correct array location; paragraph [0167]: For example, as depicted in FIG. 15 in the case of 16 bit input operands the sheet generator will generate a HI half sheet and a LO half sheet. The HI half sheet contains the upper 8 bits of each data item at the correct array location. The LO half sheet contains the lower 8 bits of each data item at the correct array location. 16 bit operations are then performed by loading both sheets into the stencil processor and informing the execution lane hardware (e.g., via an immediate value in the program code) that 16 bit operation is to take place).

Regarding claim 14, Meixner discloses 
wherein the compiler provides, via the control word, operand size information for each column of the array of compute elements (paragraph [0084]: Thus, the image data at each pixel location in an input array or output array can have a data size of 8, 16 or 32 bits. Here, a virtual processor can be configured for an execution mode that establishes the bit size and the numerical format of the values within the general purpose register. Instructions may also specify immediate operands (which are input operands whose input values are expressed directly in the instruction itself rather being found in a specified register). Immediate operands can also have configurable 8, 16 or 32 bit widths; paragraph [0111]: a two-dimensional shift array structure 906 and separate random access memories 907 associated with specific rows or columns of the array).

Regarding claim 16, Meixner discloses 
wherein the set of directions controls code conditionality for the array of compute elements (paragraph [0085]: In an extended embodiment, each virtual processor is also capable of operating in a scalar mode or a SIMD mode internal to itself. That is, the data within a specific array location may be viewed as a scalar value or as a vector having multiple elements; paragraph [0086]: Predicate values are used, e.g., to determine branch directions through the code during execution (and therefore are used as operands in conditional branch instructions)).

Regarding claim 17, Meixner discloses 
wherein the conditionality determines code jumps (paragraph [0086]: Predicate values are used, e.g., to determine branch directions through the code during execution (and therefore are used as operands in conditional branch instructions)).

Regarding claim 18, Meixner discloses 
wherein the conditionality is established by a control unit (paragraph [0086]: each virtual processor also includes registers to hold predicate values … Predicate values are used, e.g., to determine branch directions through the code during execution (and therefore are used as operands in conditional branch instructions). Predicate values can also be expressed as an immediate operand in an instruction).

Regarding claim 20, Meixner discloses 
wherein the set of directions enables simultaneous execution of two or more potential compiled task outcomes (paragraph [0141]: in response to a need detected from the application software that the kernel will simultaneously process data from different channels (which may have been hinted at from a compiler) the program code executed by the sheet generator will proceed to form separate sheets along different “planes” (i.e., form a different sheet from each channel) and load them together into the data computation unit).

Regarding claim 22, Meixner discloses 
wherein the two or more potential compiled task outcomes are controlled by a same control word (paragraph [0141]: in response to a need detected from the application software that the kernel will simultaneously process data from different channels (which may have been hinted at from a compiler) the program code executed by the sheet generator will proceed to form separate sheets along different “planes” (i.e., form a different sheet from each channel) and load them together into the data computation unit).

Regarding claim 23, Meixner discloses 
wherein the same control word is executed on a given cycle across the array of compute elements (paragraph [0141]: in response to a need detected from the application software that the kernel will simultaneously process data from different channels (which may have been hinted at from a compiler) the program code executed by the sheet generator will proceed to form separate sheets along different “planes” (i.e., form a different sheet from each channel) and load them together into the data computation unit; paragraph [0167]: Given that the execution lane array is intended to operate in a SIMD like fashion, the program code will naturally cause execution lanes in the array (which includes both rows and columns) to issue memory access requests on a same cycle).

Regarding claim 24, Meixner discloses 
wherein the two or more potential compiled task outcomes are executed on spatially separate compute elements within the array of compute elements (paragraph [0002]: a spatially organized two dimensional array captures the two dimensional nature of images (additional dimensions may include time (e.g., a sequence of two dimensional images) and data type (e.g., colors); paragraph [0141]: in response to a need detected from the application software that the kernel will simultaneously process data from different channels (which may have been hinted at from a compiler) the program code executed by the sheet generator will proceed to form separate sheets along different “planes” (i.e., form a different sheet from each channel) and load them together into the data computation unit; paragraph [0167]: Given that the execution lane array is intended to operate in a SIMD like fashion, the program code will naturally cause execution lanes in the array (which includes both rows and columns) to issue memory access requests on a same cycle).

Regarding claim 27, Meixner discloses 
wherein the set of directions includes a spatial allocation of subtasks on one or more compute elements within the array of compute elements (paragraph [0041]: a separate processor and thread can be allocated for each pixel in the output array; paragraph [0124]: Additional spill-over room is provided by random access memories 1007 that are coupled to each row and/or each column in the array, or portions thereof (E.g., a random access memory may be assigned to a “region” of the execution lane array that spans 4 execution lanes row wise and 2 execution lanes column wise. For simplicity the remainder of the application will refer mainly to row and/or column based allocation schemes).

Regarding claim 28, Meixner discloses 
wherein the set of directions includes scheduling computation in the array of compute elements (paragraph [0059]: it is expected that SIMD image processing sequences will often perform a look-up into a same look-up table during a same clock cycle. When confronted with virtual code having inefficient data access sequences from a two-dimensional shift array perspective, the compiler will reorder the data access sequence to keep the number of shifts minimal between mathematical operations (e.g., one shift between mathematical operations)).

Regarding claim 29, Meixner discloses 
wherein the computation includes compute element placement, results routing, and computation wavefront propagation within the array of compute elements (paragraph [0100]: In the case of an image processing pipeline or a DAG flow having a single input, generally, input frames are directed to the same line buffer unit 701_1 which parses the image data into line groups and directs the line groups to the sheet generator 703_1 whose corresponding stencil processor 702_1 is executing the code of the first kernel in the pipeline/DAG. Upon completion of operations by the stencil processor 702_1 on the line groups it processes, the sheet generator 703_1 sends output line groups to a “downstream” line buffer unit 701_2 (in some use cases the output line group may be sent_back to the same line buffer unit 701_1 that earlier had sent the input line groups)).

Regarding claim 30, Meixner discloses 
wherein the set of directions enables multiple programming loop instances circulating within the array of compute elements (paragraph [0120]: The program code then enters a loop of NOOP instructions for instruction fields 952, 953 until the sheet generator completes its load/store to/from the data computation unit).

Claims 37 and 38 are rejected under 35 U.S.C. 103 as being unpatentable over Meixner (US 2016/0313984, hereinafter Meixner) in view of and Armoni et al. (US 2010/0235608, hereinafter Armoni) and Jacob (Yaakov) et al. (US 2020/0097442, hereinafter Jacob) as applied to claim 1, and further in view of Corbal et al. (US 2020/0310797, hereinafter Corbal).

Regarding claim 37, Meixner in view of Armoni does not disclose wherein placing the compute element in a low power state includes idling the compute element such that no operation is performed. Jacob discloses wherein placing the compute element in a low power state includes idling the compute element such that no operation is performed (paragraph [0077]: a column 140 of PEs 110 may be enabled or disabled according to the value of the ColumnValid bit (or indication). Disabling columns 140 that are not used in a specific cycle may improve the efficiency of systolic array 100 by reducing the power consumption of systolic array 100, by isolating inactive logic and disabling clocks of the multiplications, accumulators, and PSUMs 480 in a column 140 of PEs 110, and by enabling efficient utilization of PEs 110 as disclosed herein). 
Incorporating Jacob’s column-disable by reducing power consumption technique into the combined Meixner in view of Armoni framework provides a known, implementation-level means to realize the idle/low-power behavior suggested by Armoni.
It would have been obvious to one of ordinary skill in the art at the time the claimed invention was effectively filed to modify the teaching of Meixner by using Armoni’s control/configuration approach to permit selective idling of PE and tracking shortest-path, and adopting Jacob’s disabling PEs that are not used according to the value of the ColumnValid bit (or indication) by reducing the power consumption of systolic array. The motivation would have been to enable efficient utilization of PEs (Jacob paragraph [0077]).
Meixner in view of Armoni and Jacob does not disclose idling the compute element such that no operation is performed, while maintaining data pass thru functionality. Corbal discloses wherein idling the compute element such that no operation is performed, while maintaining data pass thru functionality (paragraph [0465]: Note that although an addition operation is depicted in FIG. 65, it should be understood that other operations may be performed. In one embodiment, each element of a packed data operand includes a corresponding bit in the configuration value of a PE that indicates is the operation is to be disabled for that element (e.g., if that element is to pass through the PE without being modified and/or operated on). In one embodiment, the “disable” portion (e.g., field) of a configuration value is a static mask that determines which packed data element positions (e.g., slots) (shown in FIG. 65 as having two available element positions) perform the configured (e.g., non-disable) operation or rather perform a pass-through of one (e.g., from the first packed data source 6502 in one embodiment or from the second packed data source 6504 in another embodiment) of the sources instead (e.g., performing the bypass when the associated bit is set to 1 (or 0 in another embodiment)). In particular, a PE configuration ‘disable’ field/mask that causes individual packed-element positions to bypass the PE operation and perform pass-through of an input operand.
It would have been obvious to one of ordinary skill in the art at the time the claimed invention was effectively filed to expand Meixner’s compiler control words to include per-element disable bits (per Armoni’s configuration model and Corbal’s disable field), and implement disabled elements using Jacob’s technique of disabling PEs to realize a low-power idle state. This combination yields the predictable result of compiler-generated control that both sequences operations/results (Meixner) and causes unused element positions to (i) be functionally bypassed (Corbal) and (ii) be placed into a low-power/clock-gated state (Jacob) for reduced energy consumption and improved utilization. The motivation would have been to provide improvements in performance and reductions in energy (Corbal paragraph [0152]).

Regarding claim 38, Meixner in view of Armoni and Jacob does not disclose wherein the data pass thru functionality includes operating one or more ring buses in a pass thru mode to maintain connectivity across the array. 
Corbal discloses wherein the data pass thru functionality includes operating one or more ring buses in a pass thru mode to maintain connectivity across the array (Fig. 13, paragraph [0235]: FIG. 14 shows a high-level microarchitecture of a network (e.g., mezzanine) endpoint (e.g., stop), which may be a member of a ring network for context … Flow control and backpressure behavior may be utilized in each communication channel, e.g., in a (e.g., packet switched communications) network and (e.g., circuit switched) network (e.g., fabric of a spatial array of processing elements); paragraph [0465]: Note that although an addition operation is depicted in FIG. 65, it should be understood that other operations may be performed. In one embodiment, each element of a packed data operand includes a corresponding bit in the configuration value of a PE that indicates is the operation is to be disabled for that element (e.g., if that element is to pass through the PE without being modified and/or operated on). In one embodiment, the “disable” portion (e.g., field) of a configuration value is a static mask that determines which packed data element positions (e.g., slots) (shown in FIG. 65 as having two available element positions) perform the configured (e.g., non-disable) operation or rather perform a pass-through of one (e.g., from the first packed data source 6502 in one embodiment or from the second packed data source 6504 in another embodiment) of the sources instead (e.g., performing the bypass when the associated bit is set to 1 (or 0 in another embodiment)). In particular, a PE configuration ‘disable’ field/mask that causes individual packed-element positions to bypass the PE operation and perform pass-through of an input operand.
It would have been obvious to one of ordinary skill in the art at the time the claimed invention was effectively filed to expand Meixner’s compiler control words to include per-element disable bits (per Armoni’s configuration model and Corbal’s disable field), and implement disabled elements using Jacob’s technique of disabling PEs to realize a low-power idle state. This combination yields the predictable result of compiler-generated control that both sequences operations/results (Meixner) and causes unused element positions to (i) be functionally bypassed (Corbal) and (ii) be placed into a low-power/clock-gated state (Jacob) for reduced energy consumption and improved utilization. The motivation would have been to provide improvements in performance and reductions in energy (Corbal paragraph [0152]).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Li et al. (US 2022/0405553) discloses “each processing element of computation engine 1104 can bypass circuits to skip computations when inactive values are received” (paragraph [0177]).
Patel et al. (US 7,978,205) discloses “Even in the best optimization schemes that a programmer may use, hardware will be found sitting idle and unused waiting for information to pass through, simply because of the variance associated with different kinds of tasks requested of a graphics subsystem” (col. 5, lines 57-62).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SISLEY N. KIM whose telephone number is (571)270-7832. The examiner can normally be reached M-F 11:30AM -7:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, April Y. Blair can be reached on (571)270-1014. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 
/SISLEY N KIM/Primary Examiner, Art Unit 2196                                                                                                                                                                                                        02/05/2026
Read full office action
Prosecution Timeline

Nov 15, 2021
Application Filed
Feb 26, 2025
Non-Final Rejection mailed — §103
Jun 05, 2025
Response Filed
Jun 26, 2025
Final Rejection mailed — §103
Nov 26, 2025
Request for Continued Examination
Dec 06, 2025
Response after Non-Final Action
Feb 12, 2026
Non-Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/483,986
Patent 12639107
SCHEDULING WITH SUPPORT FOR MULTIPLE SCHEDULING OBJECTIVES
4y 8m to grant Granted May 26, 2026
17/994,685
Patent 12625743
PERFORMING PRE-MIGRATION AND POST-MIGRATION CONNECTION CHECKS
3y 5m to grant Granted May 12, 2026
18/350,149
Patent 12625719
DYNAMIC VISIBILITY AND AUTHORIZATION POLICY MANAGEMENT FOR A CLOUD SERVICE PLATFORM
2y 10m to grant Granted May 12, 2026
18/451,254
Patent 12625750
Automatic State Migration of Stateful Container During Secondary Application Container Hot Upgrade
2y 8m to grant Granted May 12, 2026
17/702,224
Patent 12619473
Cloud System Resource Set Recommendation Method and Apparatus, and Computing Device Cluster
4y 1m to grant Granted May 05, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
89%
Grant Probability
99%
With Interview (+16.2%)
2y 7m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 678 resolved cases by this examiner. Grant probability derived from career allowance rate.