DETAILED ACTION
Claims 1-20 are pending.
The Office acknowledges the following papers:
Claims and remarks filed on 11/20/2025.
New Claim Rejections - 35 U.S.C. § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-8, 12-15, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kesiraju et al. (U.S. 2022/0083343) in view of Krig (U.S. 2014/0136816), further in view of Di et al. (U.S. 2016/0259648), and further in view of Official Notice.
As per claim 1:
Kesiraju and Krig disclosed a method comprising:
identifying, by a processor, one or more first arithmetic logic unit (ALU) operations in a first ALU operation queue (Kesiraju: Figures 1 and 10 elements 10A, 14, 116, and 122, paragraphs 21, 23, 26, and 59-61)(The coprocessor identifies coprocessor instructions by decoding them prior to placement into the op queue. Additionally, the coprocessor scheduler detects ready coprocessor instructions within the op queue for scheduling. Official notice is given that vector execution units can include logical execution units for the advantage of performing move, shift, rotate, and permutation operations. Thus, it would have been obvious to one of ordinary skill in the art to implement logical operation execution within the PEs (i.e. ALUs) of the execute unit of the coprocessor.), wherein the first ALU operations are associated with a first requested vector length and at least one first input vector (Kesiraju: Figure 10 element 120, paragraphs 60-61)(The coprocessor instructions include source vector operands, which are associated with a given data element length based on data element sizes and storage size of the source vector.);
identifying, by the processor, one or more second ALU operations in a second ALU operation queue (Kesiraju: Figures 1 and 10 elements 10N, 14, 116, and 122, paragraphs 21, 23, 26, and 59-61)(The coprocessor identifies coprocessor instructions by decoding them prior to placement into the op queue. Additionally, the coprocessor scheduler detects ready coprocessor instructions within the op queue for scheduling. In view of the above official notice, the PEs (i.e. ALUs) include logical operation execution logic. Separate subsets of the op queue that store coprocessor instructions from a second processor read on the second ALU operation queue. Lastly, official notice is given that distributed thread instruction queues can be implemented for the advantage of reducing port costs on the op queue. Thus, it would have been obvious to one of ordinary skill in the art to implement distributed thread instruction queues within the coprocessor.), wherein the second ALU operations are associated with a second requested vector length and at least one second input vector (Kesiraju: Figure 10 element 120, paragraphs 60-61)(The coprocessor instructions include source vector operands, which are associated with a given data element length based on data element sizes and storage size of the source vector.), wherein the processor comprises a vector logic unit, and the vector logic unit comprises a set of ALUs (Kesiraju: Figure 10 element 120, paragraph 60)(The execute unit (i.e. vector logic unit) includes a set of PEs (i.e. ALUs).);
determining a first subset of the set of ALUs and a second subset of the set of ALUs based at least on the first requested vector length or the second requested vector length, wherein the first subset or the second subset is determined in view of one or more allocation criteria, wherein the first subset includes a first number of ALUs of the vector logic unit, and wherein the second subset includes a second number of ALUs of the vector logic unit, wherein a total number of ALUs allocated to the first subset and the second subset is based on the first requested vector length or the second requested vector length and are shared between the first ALU operations and the second ALU operations (Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 element 120, paragraphs 23 and 60-61)(Krig disclosed changing SIMD width of the processor based on runtime conditions. Kesiraju disclosed SMT execution on the coprocessor, but doesn’t provide a detailed explanation of how this is performed. The combination allows for scheduling multiple ready coprocessor instructions (i.e. allocation criteria) from multiple different processor cores based on the vector widths of each coprocessor instruction. For example, two coprocessor instructions from separate processing cores that use up half of the PEs (i.e. first/second subset) can be scheduled for SMT processing on the coprocessor execute unit. In this instance, the total number of ALUs allocated to the two coprocessor instructions is the entire set (based on needed vector length – i.e. half of PEs) of PEs, which are shared by the two coprocessor instructions.);
identifying one or more first identified operations from the first ALU operations, wherein each first identified operation corresponds to an ALU of the first subset of the set of ALUs (Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 elements 114-116 and 120-122, paragraphs 23 and 58-61)(The combination allows for scheduling multiple ready coprocessor instructions (i.e. allocation criteria) from multiple different processor cores based on the vector widths of each coprocessor instruction. For example, two coprocessor instructions from separate processing cores that use up half of the PEs (i.e. first/second subset) can be scheduled for SMT processing on the coprocessor execute unit. The first coprocessor instruction is decoded to identify the operations to be performed by the execute unit.); and
performing each first identified operation using the corresponding ALU of the first subset of the set of ALUs (Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 element 120, paragraphs 23 and 60-61)(The combination allows for scheduling multiple ready coprocessor instructions (i.e. allocation criteria) from multiple different processor cores based on the vector widths of each coprocessor instruction. For example, two coprocessor instructions from separate processing cores that use up half of the PEs (i.e. first/second subset) can be scheduled for SMT processing on the coprocessor execute unit. The first coprocessor instruction that uses half of the PEs is scheduled and executed.).
The advantage of splitting a large set of SIMD lanes into slices is that multiple operations can be performed in parallel and SIMD lanes not in use can be powered off. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the SIMD lane slices in the execute unit of Kesiraju for the above advantages.
Kesiraju and Krig failed to teach wherein the first subset or the second subset is determined in view of one or more allocation criteria and a time factor associated with the first ALU operations or the second ALU operations, wherein a priority is assigned to the first ALU operations and the second ALU operations based on the time factor, wherein a processing allocation of the first subset of the set of ALUs and the second subset of the set of ALUs is based on the priority.
However, Di combined with Kesiraju and Krig disclosed wherein the first subset or the second subset is determined in view of one or more allocation criteria and a time factor associated with the first ALU operations or the second ALU operations, wherein a priority is assigned to the first ALU operations and the second ALU operations based on the time factor, wherein a processing allocation of the first subset of the set of ALUs and the second subset of the set of ALUs is based on the priority (Di: Figures 1-2 elements 108 and 208-212, paragraphs 4, 18, 21-22, and 24)(Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 element 120, paragraphs 23 and 60-61)(Di disclosed scheduling the oldest ready instructions (i.e. time factor) in even and odd groups within a reservation station. Di gives arbitration priority to the oldest ready instructions over younger ready instructions. Krig disclosed changing SIMD width of the processor based on runtime conditions. Kesiraju disclosed SMT execution on the coprocessor, but doesn't provide a detailed explanation of how this is performed. The combination allows for scheduling multiple oldest even/odd group (i.e. time factor, priority when multiple ready instructions) ready coprocessor instructions (i.e. allocation criteria) from multiple different processor cores based on the vector widths of each coprocessor instruction. For example, two oldest ready coprocessor instructions from even and odd entries of the Op Queue from separate processing cores that use up half of the PEs (i.e. first/second subset) can be scheduled for SMT processing on the coprocessor execute unit. Additionally, official notice is given that instructions can be given priority levels such that higher priority instructions are scheduled first for the advantage of ensuring faster execution of critical instructions.
Thus, it would have been obvious to one of ordinary skill in the art to implement assigning and scheduling instructions with higher priority levels.).
The advantage of scheduling the oldest ready instructions in operation queues is that instruction dependencies can be resolved sooner. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the oldest-ready scheduling method of Di within the scheduler of Kesiraju for the above advantage.
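For illustration of the combined scheduling described above (width-based subset allocation per Krig/Kesiraju plus oldest-ready arbitration per Di), a minimal sketch follows. This is hypothetical only; the queue format, age tags, and PE count are assumptions for illustration and are not drawn from any cited reference.

```python
# Hypothetical sketch: pick the oldest ready operation from each of two
# queues and schedule both together only when their combined requested
# vector lengths fit within the total number of PEs (shared ALU set).

TOTAL_PES = 8  # assumed execute-unit width for illustration

def schedule(first_queue, second_queue, total_pes=TOTAL_PES):
    """Each queue entry is (age, requested_vector_length); lower age = older.

    Returns the pair of entries scheduled together this cycle, or None if
    either queue is empty or the oldest ready pair does not fit in the PEs.
    """
    if not first_queue or not second_queue:
        return None
    # Oldest-ready arbitration (the "time factor"): smallest age tag wins.
    first = min(first_queue, key=lambda op: op[0])
    second = min(second_queue, key=lambda op: op[0])
    # Width-based allocation criteria: both fit only if the sum of the
    # requested vector lengths does not exceed the total PE count.
    if first[1] + second[1] <= total_pes:
        return first, second
    return None
```

For example, two half-width (length 4) operations from separate queues can share the eight assumed PEs in one cycle, while a length-6 and length-4 pair cannot.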
As per claim 2:
Kesiraju, Krig, and Di disclosed the method of claim 1, further comprising:
removing each first identified operation from the first ALU operation queue (Kesiraju: Figure 10 element 116, paragraphs 58 and 61)(The op queue stores ops until the ops are ready to execute. Official notice is given that instruction queues remove ready instructions upon issuance to execution units for the advantage of recovering resources to be reused for other younger decoded instructions. Thus, it would have been obvious to one of ordinary skill in the art to implement removing scheduled coprocessor instructions from the op queue.).
As per claim 3:
Kesiraju, Krig, and Di disclosed the method of claim 1, further comprising:
identifying one or more second identified operations from the second ALU operations, wherein each second identified operation corresponds to an ALU of the second subset of the set of ALUs (Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 elements 114-116 and 120-122, paragraphs 23 and 58-61)(The combination allows for scheduling multiple ready coprocessor instructions (i.e. allocation criteria) from multiple different processor cores based on the vector widths of each coprocessor instruction. For example, two coprocessor instructions from separate processing cores that use up half of the PEs (i.e. first/second subset) can be scheduled for SMT processing on the coprocessor execute unit. The second coprocessor instruction is decoded to identify the operations to be performed by the execute unit.); and
performing each second identified operation using the corresponding ALU of the second subset of the set of ALUs (Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 element 120, paragraphs 23 and 60-61)(The combination allows for scheduling multiple ready coprocessor instructions (i.e. allocation criteria) from multiple different processor cores based on the vector widths of each coprocessor instruction. For example, two coprocessor instructions from separate processing cores that use up half of the PEs (i.e. first/second subset) can be scheduled for SMT processing on the coprocessor execute unit. The second coprocessor instruction that uses half of the PEs is scheduled and executed.).
As per claim 4:
Kesiraju, Krig, and Di disclosed the method of claim 3, wherein performing each first identified operation using the corresponding ALU of the first subset of the set of ALUs is in parallel with performing each second identified operation using the corresponding ALU of the second subset of the set of ALUs (Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 element 120, paragraphs 23 and 60-61)(The combination allows for scheduling multiple ready coprocessor instructions (i.e. allocation criteria) from multiple different processor cores based on the vector widths of each coprocessor instruction. For example, two coprocessor instructions from separate processing cores that use up half of the PEs (i.e. first/second subset) can be scheduled for SMT processing on the coprocessor execute unit. The first and second coprocessor instructions that each use half of the PEs are scheduled and executed in parallel.).
As per claim 5:
Kesiraju, Krig, and Di disclosed the method of claim 3, wherein each first identified operation is performed in a same clock cycle as each second identified operation (Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 23, 44, and 50-52)(Kesiraju: Figure 10 element 120, paragraphs 23 and 60-61)(The combination allows for scheduling multiple ready coprocessor instructions (i.e. allocation criteria) from multiple different processor cores based on the vector widths of each coprocessor instruction. For example, two coprocessor instructions from separate processing cores that use up half of the PEs (i.e. first/second subset) can be scheduled for SMT processing on the coprocessor execute unit. The first and second coprocessor instructions that each use half of the PEs are scheduled and executed in parallel in a pipeline. Official notice is given that processor pipelines can include clocked pipeline stages for the advantage of synchronized execution. Thus, it would have been obvious to one of ordinary skill in the art to implement clock cycles in the instruction pipeline of Kesiraju, which includes the execute unit.).
As per claim 6:
Kesiraju, Krig, and Di disclosed the method of claim 1, wherein the allocation criteria comprise a Quality of Service associated with a vector instruction, a total number of ALUs of the set of ALUs of the vector arithmetic logic unit, and the first and second requested vector lengths (Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figures 2 and 10 elements 22 and 120, paragraphs 23 and 60-61, abstract)(The combination allows for scheduling multiple ready coprocessor instructions (i.e. allocation criteria) from multiple different processor cores based on the vector widths of each coprocessor instruction and total number of PEs in the execute unit. Kesiraju disclosed a thread priority table and real-time threads, but doesn’t explicitly discuss quality of service (QoS) metrics. Official notice is given that QoS metrics for workloads can be assigned priority levels in processors for the advantage of allocating resources to meet QoS metrics. Thus, it would have been obvious to one of ordinary skill in the art to implement QoS metrics for threads in Kesiraju and map QoS metrics to thread priority levels in Kesiraju.).
As per claim 7:
Kesiraju, Krig, and Di disclosed the method of claim 6, wherein the Quality of Service comprises a numeric value that corresponds to a Quality of Service level (Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figures 2 and 10 elements 22 and 120, paragraphs 23 and 60-61, abstract)(In view of the above official notice, the QoS metric is a numeric value.).
As per claim 8:
Kesiraju, Krig, and Di disclosed the method of claim 1, wherein determining the first subset of the set of ALUs and the second subset of the set of ALUs further comprises:
determining whether a sum of the first requested vector length and the second requested vector length is less than or equal to a total number of ALUs of the set of ALUs of the vector arithmetic logic unit (Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 elements 114-116 and 120-122, paragraphs 23 and 58-61)(The combination allows for scheduling multiple ready coprocessor instructions (i.e. allocation criteria) from multiple different processor cores based on the vector widths of each coprocessor instruction. For example, two coprocessor instructions from separate processing cores that use up half of the PEs (i.e. first/second subset) can be scheduled for SMT processing on the coprocessor execute unit. The first and second coprocessor instructions that each use half of the PEs are scheduled and executed in parallel.); and
responsive to determining that the sum is less than or equal to the total number of ALUs:
setting a number of ALUs in the first subset to the first requested vector length (Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 element 120, paragraphs 23 and 60-61)(The combination allows for scheduling multiple ready coprocessor instructions (i.e. allocation criteria) from multiple different processor cores based on the vector widths of each coprocessor instruction. For example, two coprocessor instructions from separate processing cores that use up half of the PEs (i.e. first/second subset) can be scheduled for SMT processing on the coprocessor execute unit. The first coprocessor instruction that uses half of the PEs is scheduled and executed.); and
setting a number of ALUs in the second subset to the second requested vector length (Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 element 120, paragraphs 23 and 60-61)(The combination allows for scheduling multiple ready coprocessor instructions (i.e. allocation criteria) from multiple different processor cores based on the vector widths of each coprocessor instruction. For example, two coprocessor instructions from separate processing cores that use up half of the PEs (i.e. first/second subset) can be scheduled for SMT processing on the coprocessor execute unit. The second coprocessor instruction that uses half of the PEs is scheduled and executed.).
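The claim 8 allocation logic mapped above (size each subset to its requested vector length when the sum fits) can be sketched as follows. This is a hypothetical illustration; the function and parameter names are assumptions, not content of any cited reference.

```python
# Hypothetical sketch of the claim 8 determination: when the sum of the
# two requested vector lengths is less than or equal to the total ALU
# count, each subset is set to exactly its requested vector length.

def allocate_subsets(first_len, second_len, total_alus):
    """Return (first_subset_size, second_subset_size) when both requests
    fit in the shared ALU set, or None when the sum exceeds the total."""
    if first_len + second_len <= total_alus:
        # Both subsets get their full requested widths this cycle.
        return first_len, second_len
    return None
```

With eight ALUs, two half-width (length 4) requests are each granted four ALUs; a 6+4 pair exceeds the total and is not fully allocated (the over-subscription case is addressed by the claim 9 rejection below).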
As per claim 12:
Kesiraju, Krig, and Di disclosed the method of claim 1, wherein the first ALU operations cause the first subset of the set of ALUs to generate a first output vector in view of the first input vector (Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 element 120, paragraphs 23 and 60-61)(The combination allows for scheduling multiple ready coprocessor instructions (i.e. allocation criteria) from multiple different processor cores based on the vector widths of each coprocessor instruction. For example, two coprocessor instructions from separate processing cores that use up half of the PEs (i.e. first/second subset) can be scheduled for SMT processing on the coprocessor execute unit. The first coprocessor instruction that uses half of the PEs is scheduled and executed.), and wherein the first output vector is provided to a first processor core of the processor (Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 element 120, paragraphs 23 and 60-61)(The combination allows for scheduling multiple ready coprocessor instructions (i.e. allocation criteria) from multiple different processor cores based on the vector widths of each coprocessor instruction. Official notice is given that coprocessors can send execution results back to processor cores for the advantage of further processing on processor cores. Thus, it would have been obvious to one of ordinary skill in the art to implement sending coprocessor results back to the originating processor cores.).
As per claim 13:
Claim 13 essentially recites the same limitations as claim 1. Claim 13 additionally recites the following limitations:
receiving a first vector instruction from a program executing on a first core of a processor (Kesiraju: Figure 1 elements 10A and 14, paragraph 26)(The coprocessor receives coprocessor instructions from the processor cores via the coprocessor interface.),
receiving a second vector instruction from a second core of the processor (Kesiraju: Figure 1 elements 10A and 14, paragraph 26)(The coprocessor receives coprocessor instructions from the processor cores via the coprocessor interface.).
As per claim 14:
The additional limitation(s) of claim 14 essentially recite the additional limitation(s) of claim 4. Therefore, claim 14 is rejected for the same reason(s) as claim 4.
As per claim 15:
The additional limitation(s) of claim 15 essentially recite the additional limitation(s) of claim 5. Therefore, claim 15 is rejected for the same reason(s) as claim 5.
As per claim 19:
Claim 19 essentially recites the same limitations as claim 1. Claim 19 additionally recites the following limitations:
a memory (Kesiraju: Figure 11 element 158); and
a processing device operatively coupled to the memory (Kesiraju: Figure 11 elements 152 and 160).
As per claim 20:
The additional limitation(s) of claim 20 essentially recite the additional limitation(s) of claim 2. Therefore, claim 20 is rejected for the same reason(s) as claim 2.
Claims 9-11 and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Kesiraju et al. (U.S. 2022/0083343) in view of Krig (U.S. 2014/0136816), further in view of Di et al. (U.S. 2016/0259648), further in view of Official Notice, and further in view of Greathouse et al. (U.S. 2016/0085551).
As per claim 9:
Kesiraju, Krig, and Di disclosed the method of claim 1, wherein determining the first subset of the set of ALUs and the second subset of the set of ALUs further comprises:
determining whether a sum of the first requested vector length and the second requested vector length is less than or equal to a total number of ALUs of the set of ALUs of the vector arithmetic logic unit (Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 elements 114-116 and 120-122, paragraphs 23 and 58-61)(The combination allows for scheduling multiple ready coprocessor instructions (i.e. allocation criteria) from multiple different processor cores based on the vector widths of each coprocessor instruction. For example, two coprocessor instructions from separate processing cores that use up half of the PEs (i.e. first/second subset) can be scheduled for SMT processing on the coprocessor execute unit. Scheduling first and second coprocessor instructions that each use half of the PEs determines that the total sum of requested PEs for both coprocessor instructions is less than or equal to the total number of PEs in the execute unit.).
Kesiraju, Krig, and Di failed to teach responsive to determining that the sum is greater than the total number of ALUs, setting a number of ALUs in the first subset to a value less than the first requested vector length.
However, Greathouse combined with Kesiraju, Krig, and Di disclosed responsive to determining that the sum is greater than the total number of ALUs, setting a number of ALUs in the first subset to a value less than the first requested vector length (Greathouse: Figures 4 and 6 elements 406, 412, 418, and 602-606, paragraphs 29, 31-32, and 54)(Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 elements 114-116 and 120-122, paragraphs 23 and 58-61)(Greathouse allows for large wavefronts with larger widths than SIMD execution widths to be executed over multiple execution cycles. The combination allows for coprocessor instructions of Kesiraju having larger vector widths than can be processed in a single cycle by the execute unit to be processed over multiple clock cycles. This involves setting a PE width over multiple clock cycles that is less than the coprocessor instruction vector width.).
The advantage of executing large vector width operations over multiple clock cycles is that execution unit costs can be reduced. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement executing large coprocessor vector width operations in Kesiraju over multiple clock cycles for the above advantage.
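The multi-cycle execution described for the Greathouse combination (a vector wider than the execute unit processed over several clock cycles, with a per-cycle PE width less than the requested vector length) can be sketched as follows. This is a hypothetical illustration; the chunking scheme and names are assumptions.

```python
# Hypothetical sketch: split a vector wider than the execute unit into
# per-cycle chunks, each using at most pe_width PEs (i.e. a value less
# than the requested vector length when the vector does not fit at once).

def execute_wide_vector(vector_len, pe_width):
    """Return the number of elements processed in each clock cycle."""
    cycles = []
    remaining = vector_len
    while remaining > 0:
        chunk = min(remaining, pe_width)  # PEs active this clock cycle
        cycles.append(chunk)
        remaining -= chunk  # leftover elements deferred to later cycles
    return cycles
```

For a requested length of 10 on a four-PE unit, the operation would occupy three cycles (4, 4, then 2 elements), with the unprocessed elements deferred as in the claim 11 rejection below.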
As per claim 10:
Kesiraju, Krig, Di, and Greathouse disclosed the method of claim 9, wherein the allocation criteria comprise a Quality of Service associated with a first vector instruction, the total number of ALUs, and the first and second vector lengths (Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figures 2 and 10 elements 22 and 120, paragraphs 23 and 60-61, abstract)(The combination allows for scheduling multiple ready coprocessor instructions (i.e. allocation criteria) from multiple different processor cores based on the vector widths of each coprocessor instruction and total number of PEs in the execute unit. Kesiraju disclosed a thread priority table and real-time threads, but doesn’t explicitly discuss quality of service (QoS) metrics. Official notice is given that QoS metrics for workloads can be assigned priority levels in processors for the advantage of allocating resources to meet QoS metrics. Thus, it would have been obvious to one of ordinary skill in the art to implement QoS metrics for threads in Kesiraju and map QoS metrics to thread priority levels in Kesiraju.), and wherein the value less than the first vector length is determined using a resource allocation model in view of the Quality of Service associated with the first vector instruction (Greathouse: Figures 4 and 6 elements 406, 412, 418, and 602-606, paragraphs 29, 31-32, and 54)(Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 elements 114-116 and 120-122, paragraphs 23 and 58-61)(Greathouse allows for large wavefronts with larger widths than SIMD execution widths to be executed over multiple execution cycles. The combination allows for coprocessor instructions of Kesiraju having larger vector widths than can be processed in a single cycle by the execute unit to be processed over multiple clock cycles. 
This involves setting a PE width over multiple clock cycles that is less than the coprocessor instruction vector width. In view of the above official notice, QoS metrics are considered by the scheduler.).
As per claim 11:
Kesiraju, Krig, Di, and Greathouse disclosed the method of claim 9, further comprising:
deferring at least one of the first ALU operations to a subsequent clock cycle (Greathouse: Figures 4 and 6 elements 406, 412, 418, and 602-606, paragraphs 29, 31-32, and 54)(Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 elements 114-116 and 120-122, paragraphs 23 and 58-61)(Greathouse allows for large wavefronts with larger widths than SIMD execution widths to be executed over multiple execution cycles. The combination allows for coprocessor instructions of Kesiraju having larger vector widths than can be processed in a single cycle by the execute unit to be processed over multiple clock cycles. This involves setting a PE width over multiple clock cycles that is less than the coprocessor instruction vector width. Operations not executed in the first clock cycle are deferred to subsequent clock cycles.).
As per claim 16:
Kesiraju, Krig, and Di disclosed the method of claim 13.
Kesiraju, Krig, and Di failed to teach providing, to the program executing on the first core of the processor, the number of elements of the first vector processed by the first vector operations, wherein the number of elements of the first vector processed by the first vector operations is less than the first vector length.
However, Greathouse combined with Kesiraju, Krig, and Di disclosed providing, to the program executing on the first core of the processor, the number of elements of the first vector processed by the first vector operations, wherein the number of elements of the first vector processed by the first vector operations is less than the first vector length (Greathouse: Figures 4 and 6 elements 406, 412, 418, and 602-606, paragraphs 29, 31-32, and 54)(Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 elements 114-116 and 120-122, paragraphs 23 and 58-61)(Greathouse allows for large wavefronts with larger widths than SIMD execution widths to be executed over multiple execution cycles. The combination allows for coprocessor instructions of Kesiraju having larger vector widths than can be processed in a single cycle by the execute unit to be processed over multiple clock cycles. This involves setting a PE width over multiple clock cycles that is less than the coprocessor instruction vector width.).
The advantage of executing large vector width operations over multiple clock cycles is that execution unit costs can be reduced. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement executing large coprocessor vector width operations in Kesiraju over multiple clock cycles for the above advantage.
As per claim 17:
Kesiraju, Krig, and Di disclosed the method of claim 13.
Kesiraju, Krig, and Di failed to teach wherein the program executing on the first core determines an updated vector length that specifies a number of remaining elements to be processed, wherein the updated vector length is a difference of the number of elements of the first vector processed by the first vector operations and the first vector length.
However, Greathouse combined with Kesiraju, Krig, and Di disclosed wherein the program executing on the first core determines an updated vector length that specifies a number of remaining elements to be processed, wherein the updated vector length is a difference of the number of elements of the first vector processed by the first vector operations and the first vector length (Greathouse: Figures 4 and 6 elements 406, 412, 418, and 602-606, paragraphs 29, 31-32, and 54)(Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 elements 114-116 and 120-122, paragraphs 23 and 58-61)(Greathouse allows for large wavefronts with larger widths than SIMD execution widths to be executed over multiple execution cycles. The combination allows for coprocessor instructions of Kesiraju having larger vector widths than can be processed in a single cycle by the execute unit to be processed over multiple clock cycles. This involves setting a PE width over multiple clock cycles that is less than the coprocessor instruction vector width. Operations not executed in the first clock cycle are deferred to subsequent clock cycles. The updated vector length is determined based on the number of clock cycles completed/remaining and the number of PEs executing each clock cycle.).
The advantage of executing large vector width operations over multiple clock cycles is that execution unit costs can be reduced. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement executing large coprocessor vector width operations in Kesiraju over multiple clock cycles for the above advantage.
As per claim 18:
Kesiraju, Krig, and Di disclosed the method of claim 13.
Kesiraju, Krig, and Di failed to teach wherein the program executing on the first core executes a third instruction to specify an updated vector length corresponding to a third number of elements in the first vector, wherein the third number of elements corresponds to elements of the first vector not processed by the first vector operations specified by the first vector instruction.
However, Greathouse combined with Kesiraju, Krig, and Di disclosed wherein the program executing on the first core executes a third instruction to specify an updated vector length corresponding to a third number of elements in the first vector, wherein the third number of elements corresponds to elements of the first vector not processed by the first vector operations specified by the first vector instruction (Greathouse: Figures 4 and 6 elements 406, 412, 418, and 602-606, paragraphs 29, 31-32, and 54)(Krig: Figures 4B and 5 elements 420-424 and 502-506, paragraphs 44 and 50-52)(Kesiraju: Figure 10 elements 114-116 and 120-122, paragraphs 23 and 58-61)(Greathouse allows for large wavefronts with widths larger than SIMD execution widths to be executed over multiple execution cycles. The combination allows for coprocessor instructions of Kesiraju having vector widths larger than can be processed in a single cycle by the execute unit to be processed over multiple clock cycles. This involves setting a PE width over multiple clock cycles that is less than the coprocessor instruction vector width. Operations not executed in the first clock cycle are deferred to subsequent clock cycles (i.e., the third instruction and third number of elements).).
The advantage of executing large vector width operations over multiple clock cycles is that execution unit costs can be reduced. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement executing large coprocessor vector width operations in Kesiraju over multiple clock cycles for the above advantage.
Response to Arguments
The arguments presented by Applicant in the response, received on 11/20/2025, are not considered persuasive.
Applicant argues regarding claims 1, 13, and 19:
“However, paragraphs [0021], [0022], and [0024] do not suggest or disclose "a priority is assigned to the first ALU operations and the second ALU operations based on the time factor, wherein a processing allocation of the first subset of the set of ALUs and the second subset of the set of ALUs is based on the priority," as recited in claim 1. Di's teaching of the reservation station queue may dispatch instructions to two or more separate execution units of the same type, an ordered shift queue divided into two groups so that up to two ready instructions may be dispatched in parallel, or a grouping of even numbered entries and a grouping of odd numbered entries included in an ordered shift queue where an ordered select logic provides the oldest instruction from the odd and even numbered entries that is ready to be dispatched are unrelated to "a priority is assigned to the first ALU operations and the second ALU operations based on the time factor, wherein a processing allocation of the first subset of the set of ALUs and the second subset of the set of ALUs is based on the priority," as recited in claim 1. Thus, Di does not disclose the features of claim 1, and Di does not disclose the features missing in Kesiraju and Krig. Therefore, the cited references, alone or in combination, fail to suggest "a priority is assigned to the first ALU operations and the second ALU operations based on the time factor, wherein a processing allocation of the first subset of the set of ALUs and the second subset of the set of ALUs is based on the priority," as recited in claim 1.”
This argument is not found to be persuasive for the following reason. The added Di reference is used to teach the claimed limitation of a time factor associated with the first and second ALU operations, which was missing from the combination of Kesiraju and Krig. Di allows for the use of priority selection when multiple instructions in each group are ready, such that the oldest amongst the ready instructions are selected. Thus, the combination allows for scheduling instructions based on allocation criteria (i.e., ready), a time factor (i.e., oldest), and priority (i.e., ready older instructions have higher priority than ready younger instructions). Additionally, official notice was given that instructions themselves can be assigned priority levels when scheduling for execution. This also reads upon the newly added priority limitation.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988. The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on (571) 270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JACOB PETRANEK/Primary Examiner, Art Unit 2183