Prosecution Insights
Last updated: April 19, 2026
Application No. 18/309,177

Multi-Instruction Engine-Based Instruction Processing Method and Processor

Final Rejection: §101, §103
Filed: Apr 28, 2023
Examiner: HU, SELINA ELISA
Art Unit: 2193
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Huawei Technologies Co., Ltd.
OA Round: 2 (Final)
Grant Probability: 67% (Favorable)
OA Rounds: 3-4
To Grant: 3y 3m
With Interview: 99%

Examiner Intelligence

Grants 67% (above average)
Career Allow Rate: 67% (2 granted / 3 resolved; +11.7% vs TC avg)
Interview Lift: +100.0% (strong; based on resolved cases with interview)
Typical timeline: 3y 3m average prosecution; 32 currently pending
Career history: 35 total applications across all art units

Statute-Specific Performance

§101: 24.4% (-15.6% vs TC avg)
§103: 53.5% (+13.5% vs TC avg)
§102: 12.0% (-28.0% vs TC avg)
§112: 10.1% (-29.9% vs TC avg)
Tech Center averages are estimates. Based on career data from 3 resolved cases.
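For readers who want to re-derive the figures above, the sketch below shows one plausible way the dashboard arithmetic works. The metric definitions, and in particular the without-interview allowance rate, are assumptions made here for illustration; the report itself does not state its formulas.

/* Hedged sketch of the dashboard arithmetic above. The metric
 * definitions are assumed, not documented by this report. */
#include <stdio.h>

int main(void) {
    /* Career allowance rate: 2 granted of 3 resolved cases -> "67%". */
    double allow_rate = 2.0 / 3.0;

    /* Statute delta vs. Tech Center average, e.g. for section 103:
     * a 53.5% rate shown as "+13.5% vs TC avg" implies a 40.0% average. */
    double tc_avg_103 = 0.535 - 0.135;

    /* Interview lift, assumed to mean the relative gain in allowance
     * rate for cases with an interview. The without-interview rate is
     * hypothetical; +100% lift just means "with" is double "without". */
    double rate_with = 0.99, rate_without = 0.495; /* hypothetical */
    double lift = (rate_with - rate_without) / rate_without;

    printf("career allowance rate: %.0f%%\n", allow_rate * 100.0); /* 67 */
    printf("TC avg for 103: %.1f%%\n", tc_avg_103 * 100.0);        /* 40.0 */
    printf("interview lift: %+.0f%%\n", lift * 100.0);             /* +100 */
    return 0;
}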

Office Action

§101, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This Office action is in response to applicant’s amendment filed on 12/30/2025. Claims 1-18 and 20-21 are pending and examined. Claim 19 is cancelled.

Response to Arguments

Applicant’s arguments filed 12/30/2025 with respect to 35 U.S.C. 112(f) have been fully considered but they are not persuasive. Applicant argued that “claim 11 does recite sufficient structure to perform the recited function because claim 11 specifies that the program block dispatcher is included in a processor.” The examiner respectfully disagrees; see the 35 U.S.C. 112(f) interpretation below for a detailed analysis. While there is structure present for a processor in the preamble of claim 11, there is no structure for the program block dispatcher as currently recited. It is unclear whether the program block dispatcher is software, hardware, or a combination of both, and therefore the 35 U.S.C. 112(f) interpretation is maintained due to lack of sufficient structure to perform the functions as alleged by the applicant.

Applicant’s arguments filed 12/30/2025 with respect to 35 U.S.C. 101 have been fully considered but they are not persuasive. Applicant argued that the features of the amended claim such as “accessing a program block table of a computer program” set forth the use of technology and “that amended claim 1 is integrated into the practical application of improving the performance of the computer because the claimed features improve the ability of the computer to manage its processing resources.” The examiner respectfully disagrees; see the 35 U.S.C. 101 rejections below for a detailed analysis. The “determining…” limitation presented in claim 1 recites a judicial exception and is not integrated into a practical application by the new amendments. The “accessing…” limitation that was added is an additional element that is insignificant extra-solution activity. The limitation “accessing” in the context of this claim encompasses mere data gathering. Therefore, the 35 U.S.C. 101 rejections are maintained.

Applicant’s arguments filed 12/30/2025 with respect to 35 U.S.C. 103 have been fully considered but they are not persuasive. Applicant argued that “Therefore, the combination of Sakthivel, Hirota, Chang, Vincent, Rosen, and Smith fail to describe accessing a program block table of the computer program and determining from an instruction engine group of the processor, based on the instruction processing request, and based on the program block table, a first instruction engine for processing the first instruction set, wherein the program block table specifies a mapping relationship between the first instruction set and the first instruction engine.” While the combination of Sakthivel, Hirota, Chang, Vincent, Rosen, and Smith may not explicitly describe the amended limitations of claim 1, a new reference of Kipp is interpreted to disclose the amended limitations. See the 35 U.S.C. 103 rejections below for a detailed analysis. Examiner interprets Kipp’s hierarchy storing task groups which contain user tasks as a program block table, and the user adding the task group through calling an interface function as the program block table of the computer program. The scheduler automatically interacting with the hierarchy to coordinate worker threads to execute user tasks correlates to accessing the program block table. Additionally, the hierarchy being used to map task group queues in relation to assignment to worker threads, which are further mapped to underlying system hardware threads, correlates to determining a first instruction engine for processing the first instruction set, based on the program block table, wherein the program block table specifies a mapping relationship between the first instruction set and the first instruction engine. Therefore, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with Kipp because hierarchies allow utilization of a layered approach to synchronization where mechanisms with a much lower cost and complexity can be applied to multi-core systems to allow for parallel processing.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “a program block dispatcher configured to…” in claim 11. The “program block dispatcher” is a generic placeholder coupled to functional language “configured to receive/determine/send…” without sufficient structure to perform the recited function. While there is structure present for a processor, there is no structure for the program block dispatcher as currently recited. It is unclear whether the program block dispatcher is software, hardware, or a combination of both, and therefore a generic placeholder is coupled to functional language without sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-18 and 20-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to (an) abstract idea(s) without significantly more.
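As orientation for the claim-by-claim analysis that follows, the sketch below renders the table-driven dispatch recited in claims 1 and 11: a program block table that maps an instruction set (program block) to an instruction engine, with one instruction cache per engine. It is a minimal, hypothetical illustration; every identifier is invented here, and nothing in it is drawn from the application’s specification or asserted to be the claimed implementation.

/* Hypothetical rendering of the claimed dispatch flow: look up the
 * program block table, pick the mapped engine, queue the request at
 * that engine's instruction cache. Illustrative only. */
#include <stdio.h>

#define NUM_ENGINES 4

/* One program-block-table entry: maps an instruction-set (program
 * block) identifier to the engine designated to process it. */
struct block_entry {
    int instruction_set_id;
    int engine_id;
};

/* The "program block table": the claimed mapping relationship between
 * instruction sets and instruction engines. Contents are made up. */
static const struct block_entry block_table[] = {
    { 100, 0 }, { 101, 2 }, { 102, 1 }, { 103, 3 },
};

/* One instruction cache per engine: the one-to-one correspondence
 * recited in claim 11, modeled here as a pending-request counter. */
static int icache_pending[NUM_ENGINES];

/* Receive a request, determine the first instruction engine from the
 * table, and send the request to that engine's instruction cache. */
static int dispatch(int instruction_set_id) {
    for (int i = 0; i < (int)(sizeof block_table / sizeof block_table[0]); i++) {
        if (block_table[i].instruction_set_id == instruction_set_id) {
            int engine = block_table[i].engine_id;
            icache_pending[engine]++; /* request queued at that cache */
            return engine;
        }
    }
    return -1; /* no mapping in the table */
}

int main(void) {
    printf("instruction set 101 -> engine %d\n", dispatch(101)); /* 2 */
    return 0;
}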
Claims 1 and 21 recite:

A method implemented by a processor, the method comprising: receiving an instruction processing request to process a first instruction set of a computer program; accessing a program block table of the computer program; determining from an instruction engine group of the processor, based on the instruction processing request, and based on the program block table, a first instruction engine for processing the first instruction set, wherein the program block table specifies a mapping relationship between the first instruction set and the first instruction engine; sending the instruction processing request to a first instruction cache of an instruction cache group of the processor, wherein the first instruction cache corresponds to the first instruction engine; and obtaining the first instruction set from the first instruction cache.

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes. Claim 1 is a process.

Step 2A, Prong I: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes: (an) abstract idea(s). The ‘determining’ limitation in #3 above, as claimed and under broadest reasonable interpretation (BRI), is a mental process that covers performance of the limitation in the mind. The limitation “determining” in the context of this claim encompasses a person analyzing, evaluating, or determining a first instruction engine for processing the first instruction set, including comparison or judgement.

Step 2A, Prong II: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. The ‘receiving’ limitation in #1 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element that is insignificant extra-solution activity. The limitation “receiving” in the context of this claim encompasses mere data gathering. See MPEP 2106.05(g). The ‘accessing’ limitation in #2 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element that is insignificant extra-solution activity. The limitation “accessing” in the context of this claim encompasses mere data gathering. See MPEP 2106.05(g). The ‘sending’ limitation in #4 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element that is insignificant extra-solution activity. The limitation “sending” in the context of this claim encompasses mere data gathering. See MPEP 2106.05(g). The ‘obtaining’ limitation in #5 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element that is insignificant extra-solution activity. The limitation “obtaining” in the context of this claim encompasses mere data gathering. See MPEP 2106.05(g). Additionally, one or more of the claims recite the following additional elements: Processor (Claim 1). These additional elements are recited at a high level of generality (i.e., as generic computer components) such that they amount to no more than components comprising mere instructions to apply the exception. Accordingly, these additional elements do not integrate the abstract idea(s) into a practical application because they do not impose any meaningful limits on practicing the abstract idea(s).

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. As discussed above with respect to integration of the abstract idea(s) into a practical application, the aforementioned additional elements amount to no more than components for obtaining or gathering data and comprising mere instructions to apply the exception which is evidently seen in MPEP 2106.05(g). Mere instructions to apply an exception using generic computer components cannot provide an inventive concept.

Therefore, Claims 1 and 21 are directed to (an) abstract idea(s) without significantly more.

Claims 2 and 13 recite:

wherein determining the first instruction engine based on the instruction processing request comprises: obtaining, based on the instruction processing request, an alternative instruction engine for processing the first instruction set; and selecting the alternative instruction engine as the first instruction engine.

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes. Claim 2 is a process. Claim 13 is a machine.

Step 2A, Prong I: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes: (an) abstract idea(s). The ‘selecting’ limitation in #7 above, as claimed and under broadest reasonable interpretation (BRI), is a mental process that covers performance of the limitation in the mind. The limitation “selecting” in the context of this claim encompasses a person analyzing, evaluating, or determining an alternative instruction engine as the first instruction engine, including comparison or judgement.

Step 2A, Prong II: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. The ‘obtaining’ limitation in #6 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element that is insignificant extra-solution activity. The limitation “obtaining” in the context of this claim encompasses mere data gathering. See MPEP 2106.05(g).

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. As discussed above with respect to integration of the abstract idea(s) into a practical application, the aforementioned additional elements amount to no more than components for obtaining or gathering data and comprising mere instructions to apply the exception which is evidently seen in MPEP 2106.05(g). Mere instructions to apply an exception using generic computer components cannot provide an inventive concept.

Therefore, Claims 2 and 13 are directed to (an) abstract idea(s) without significantly more.

Claims 3 and 14 recite:

wherein the instruction engine group comprises a first alternative instruction engine group, and wherein obtaining the alternative instruction engine of the first instruction set based on the instruction processing request comprises using, when the first instruction set is an instruction set on a non-performance path, a selected instruction engine in the first alternative instruction engine group as the alternative instruction engine of the first instruction set.

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes. Claim 3 is a process. Claim 14 is a machine.

Step 2A, Prong II: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. The ‘using’ limitation in #8 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element as “apply it” that is mere instructions to apply an exception. The limitation “using” in the context of this claim encompasses merely using a selected instruction engine as the alternative instruction engine. See MPEP 2106.05(f).

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. As discussed above with respect to integration of the abstract idea(s) into a practical application, the aforementioned additional elements amount to no more than components for obtaining or gathering data and comprising mere instructions to apply the exception which is evidently seen in MPEP 2106.05(f). Mere instructions to apply an exception using generic computer components cannot provide an inventive concept.

Therefore, Claims 3 and 14 are directed to (an) abstract idea(s) without significantly more.

Claims 4 and 15 recite:

wherein the instruction engine group comprises a first alternative instruction engine group, and wherein obtaining the alternative instruction engine of the first instruction set based on the instruction processing request comprises using, when the first instruction set is an instruction set on a performance path, a first selected instruction engine in the first alternative instruction engine group or a second selected instruction engine in the second instruction engine group as the alternative instruction engine of the first instruction set.

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes. Claim 4 is a process. Claim 15 is a machine.

Step 2A, Prong II: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. The ‘using’ limitation in #9 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element as “apply it” that is mere instructions to apply an exception. The limitation “using” in the context of this claim encompasses merely using a selected instruction engine as the alternative instruction engine. See MPEP 2106.05(f).

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. As discussed above with respect to integration of the abstract idea(s) into a practical application, the aforementioned additional elements amount to no more than components for obtaining or gathering data and comprising mere instructions to apply the exception which is evidently seen in MPEP 2106.05(f). Mere instructions to apply an exception using generic computer components cannot provide an inventive concept.

Therefore, Claims 4 and 15 are directed to (an) abstract idea(s) without significantly more.

Claim 5 recites:

wherein using the first selected instruction engine in the first alternative instruction engine group or the second selected instruction engine in the second instruction engine group as the alternative instruction engine of the first instruction set comprises: using, when a first condition is met, the first selected instruction engine in the first alternative instruction engine group as the alternative instruction engine of the first instruction set, wherein the first condition is that a queue depth of an instruction processing request queue corresponding to at least one instruction engine in the first alternative instruction engine group is less than a first preset threshold.

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes. Claim 5 is a process.

Step 2A, Prong II: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. The ‘using’ limitation in #10 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element as “apply it” that is mere instructions to apply an exception. The limitation “using” in the context of this claim encompasses merely using a selected instruction engine as the alternative instruction engine when a first condition is met. See MPEP 2106.05(f).

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. As discussed above with respect to integration of the abstract idea(s) into a practical application, the aforementioned additional elements amount to no more than components for obtaining or gathering data and comprising mere instructions to apply the exception which is evidently seen in MPEP 2106.05(f). Mere instructions to apply an exception using generic computer components cannot provide an inventive concept.

Therefore, Claim 5 is directed to (an) abstract idea(s) without significantly more.

Claims 6 and 17 recite:

wherein the second instruction engine group comprises a second alternative instruction engine group and a third alternative instruction engine group, and wherein using the second selected instruction engine in the second instruction engine group as the alternative instruction engine of the first instruction set comprises: using the second selected instruction engine in the second alternative instruction engine group of the second instruction engine group as the alternative instruction engine of the first instruction set; and adding, when a third condition is met, at least one instruction engine in the third alternative instruction engine group to the second alternative instruction engine group, wherein the third condition is that the second alternative instruction engine group is empty, or queue depths of instruction processing request queues corresponding to all the instruction engines in the second alternative instruction engine group exceed a second preset threshold.

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes. Claim 6 is a process. Claim 17 is a machine.

Step 2A, Prong II: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. The ‘using’ limitation in #11 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element as “apply it” that is mere instructions to apply an exception. The limitation “using” in the context of this claim encompasses merely using a second selected instruction engine as the alternative instruction engine. See MPEP 2106.05(f). The ‘adding’ limitation in #12 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element as “apply it” that is mere instructions to apply an exception. The limitation “adding” in the context of this claim encompasses merely adding at least one instruction engine to the second alternative instruction engine group when a third condition is met. See MPEP 2106.05(f).

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. As discussed above with respect to integration of the abstract idea(s) into a practical application, the aforementioned additional elements amount to no more than components for obtaining or gathering data and comprising mere instructions to apply the exception which is evidently seen in MPEP 2106.05(f). Mere instructions to apply an exception using generic computer components cannot provide an inventive concept.

Therefore, Claims 6 and 17 are directed to (an) abstract idea(s) without significantly more.

Claim 7 recites:

selecting, in the third alternative instruction engine group, at least one instruction engine corresponding to an instruction processing request queue whose queue depth is less than a third preset threshold; and adding the at least one instruction engine to the second alternative instruction engine group.

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes. Claim 7 is a process.

Step 2A, Prong I: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes: (an) abstract idea(s). The ‘selecting’ limitation in #13 above, as claimed and under broadest reasonable interpretation (BRI), is a mental process that covers performance of the limitation in the mind. The limitation “selecting” in the context of this claim encompasses a person analyzing, evaluating, or determining at least one instruction engine whose queue depth is less than a third threshold, including comparison or judgement.

Step 2A, Prong II: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. The ‘adding’ limitation in #14 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element as “apply it” that is mere instructions to apply an exception. The limitation “adding” in the context of this claim encompasses merely adding at least one instruction engine to the second alternative instruction engine group. See MPEP 2106.05(f).

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. As discussed above with respect to integration of the abstract idea(s) into a practical application, the aforementioned additional elements amount to no more than components for obtaining or gathering data and comprising mere instructions to apply the exception which is evidently seen in MPEP 2106.05(g). Mere instructions to apply an exception using generic computer components cannot provide an inventive concept.

Therefore, Claim 7 is directed to (an) abstract idea(s) without significantly more.

Claim 8 recites:

recording an instruction engine selection difference; and deleting, when the instruction engine selection difference exceeds a fourth preset threshold, all the instruction engines in the second alternative instruction engine group, wherein the instruction engine selection difference indicates a quantity difference between a first quantity of times of selecting the first selected instruction engine from the first alternative instruction engine group and a second quantity of times of selecting the second selected instruction engine from the second alternative instruction engine group.

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes. Claim 8 is a process.

Step 2A, Prong II: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. The ‘recording’ limitation in #15 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element that is insignificant extra-solution activity. The limitation “recording” in the context of this claim encompasses merely storing information in memory. See MPEP 2106.05(g). The ‘deleting’ limitation in #16 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element as “apply it” that is mere instructions to apply an exception. The limitation “deleting” in the context of this claim encompasses merely deleting all the instruction engines in the second alternative instruction engine group. See MPEP 2106.05(f).

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. As discussed above with respect to integration of the abstract idea(s) into a practical application, the aforementioned additional elements amount to no more than components for obtaining or gathering data and comprising mere instructions to apply the exception which is evidently seen in MPEP 2106.05(g) and (f). Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. Additionally, with regards to #15 above, per MPEP 2106.05(d)(II), the courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity: storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93.

With regards to Claim 18, the methods of Claims 7 and 8 perform the same steps as the machine of Claim 18, and Claim 18 is therefore rejected using the same rationale set forth above in the rejection of Claims 7 and 8.

Therefore, Claims 8 and 18 are directed to (an) abstract idea(s) without significantly more.

Claims 9 and 20 recite:

wherein selecting the alternative instruction engine as the first instruction engine comprises selecting the alternative instruction engine corresponding to an instruction processing request queue with a minimum queue depth as the first instruction engine.

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes. Claim 9 is a process. Claim 20 is a machine.

Step 2A, Prong I: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes: (an) abstract idea(s). The ‘selecting’ limitation in #17 above, as claimed and under broadest reasonable interpretation (BRI), is a mental process that covers performance of the limitation in the mind. The limitation “selecting” in the context of this claim encompasses a person analyzing, evaluating, or determining the alternative instruction engine corresponding to an instruction processing request queue with a minimum queue depth, including comparison or judgement.

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. As discussed above with respect to integration of the abstract idea(s) into a practical application, the aforementioned additional elements amount to no more than components for obtaining or gathering data and comprising mere instructions to apply the exception which is evidently seen in MPEP 2106.05(f). Mere instructions to apply an exception using generic computer components cannot provide an inventive concept.

Therefore, Claims 9 and 20 are directed to (an) abstract idea(s) without significantly more.

Claim 10 recites:

sending, when the first instruction cache detects an end indicator of the first instruction set, scheduling information, wherein the scheduling information indicates that the first instruction engine can process a next instruction processing request.

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes. Claim 10 is a process.

Step 2A, Prong II: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. The ‘sending’ limitation in #18 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element as “apply it” that is mere instructions to apply an exception. The limitation “sending” in the context of this claim encompasses merely sending scheduling information. See MPEP 2106.05(f).

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. As discussed above with respect to integration of the abstract idea(s) into a practical application, the aforementioned additional elements amount to no more than components for obtaining or gathering data and comprising mere instructions to apply the exception which is evidently seen in MPEP 2106.05(g). Mere instructions to apply an exception using generic computer components cannot provide an inventive concept.

Therefore, Claim 10 is directed to (an) abstract idea(s) without significantly more.

Claim 11 recites:

A processor comprising an instruction cache group comprising a plurality of instruction caches; an instruction engine group comprising a plurality of instruction engines, wherein the plurality of instruction caches in the instruction cache group are in a one-to-one correspondence with the plurality of instruction engines in the instruction engine group; and a program block dispatcher configured to: receive an instruction processing request to process a first instruction set of a computer program; access a program block table of the computer program; determine a first instruction engine based on the instruction processing request and based on the program block table, wherein the program block table specifies a mapping relationship between the first instruction set and the first instruction engine; and send the instruction processing request to a first instruction cache corresponding to the first instruction engine, wherein the first instruction engine processes the first instruction set in the instruction engine group, and wherein the first instruction engine is configured to obtain the first instruction set from the first instruction cache.

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes. Claim 11 is a machine.

Step 2A, Prong I: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes: (an) abstract idea(s). The ‘determining’ limitation in #21 above, as claimed and under broadest reasonable interpretation (BRI), is a mental process that covers performance of the limitation in the mind. The limitation “determining” in the context of this claim encompasses a person analyzing, evaluating, or determining a first instruction engine for processing the first instruction set, including comparison or judgement.

Step 2A, Prong II: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. The ‘receiving’ limitation in #19 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element that is insignificant extra-solution activity. The limitation “receiving” in the context of this claim encompasses mere data gathering. See MPEP 2106.05(g). The ‘accessing’ limitation in #20 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element that is insignificant extra-solution activity. The limitation “accessing” in the context of this claim encompasses mere data gathering. See MPEP 2106.05(g). The ‘sending’ limitation in #22 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element that is insignificant extra-solution activity. The limitation “sending” in the context of this claim encompasses mere data gathering. See MPEP 2106.05(g). The ‘obtaining’ limitation in #23 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element that is insignificant extra-solution activity. The limitation “obtaining” in the context of this claim encompasses mere data gathering. See MPEP 2106.05(g). Additionally, one or more of the claims recite the following additional elements: Processor (Claim 11). These additional elements are recited at a high level of generality (i.e., as generic computer components) such that they amount to no more than components comprising mere instructions to apply the exception. Accordingly, these additional elements do not integrate the abstract idea(s) into a practical application because they do not impose any meaningful limits on practicing the abstract idea(s).

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. As discussed above with respect to integration of the abstract idea(s) into a practical application, the aforementioned additional elements amount to no more than components for obtaining or gathering data and comprising mere instructions to apply the exception which is evidently seen in MPEP 2106.05(g). Mere instructions to apply an exception using generic computer components cannot provide an inventive concept.

Claim 12 merely further describes the instruction processing request queues of Claim 11. The claim does not include additional elements that integrate into a practical application or are sufficient to amount to significantly more than the judicial exception.

Therefore, Claims 11-12 are directed to (an) abstract idea(s) without significantly more.

Claim 16 recites:

use, when a second condition is met, the second instruction engine in the second instruction engine group as the alternative instruction engine of the first instruction set, wherein the second condition is that queue depths of instruction processing request queues corresponding to all instruction engines in the first alternative instruction engine group exceed a first preset threshold.

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes. Claim 16 is a machine.

Step 2A, Prong II: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. The ‘using’ limitation in #24 above, as claimed and under broadest reasonable interpretation (BRI), is an additional element as “apply it” that is mere instructions to apply an exception.
The limitation “using” in the context of this claim encompasses merely using the second instruction engine as the alternative instruction engine. See MPEP 2106.05(f).

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. As discussed above with respect to integration of the abstract idea(s) into a practical application, the aforementioned additional elements amount to no more than components for obtaining or gathering data and comprising mere instructions to apply the exception which is evidently seen in MPEP 2106.05(g). Mere instructions to apply an exception using generic computer components cannot provide an inventive concept.

Therefore, Claim 16 is directed to (an) abstract idea(s) without significantly more.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 10-11 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Sakthivel et al. (U.S. Patent Application Publication No. US 2021/0294649 A1), hereinafter “Sakthivel,” in view of Hirota et al. (U.S. Patent Application Publication No. US 2020/0401444 A1), hereinafter “Hirota,” and Kipp et al. (U.S. Patent No. US 11,249,807 B2), hereinafter “Kipp.”

With regards to Claim 1, Sakthivel teaches:

determining from an instruction engine group of the processor, based on the instruction processing request, a first instruction engine for processing the first instruction set (Paragraphs 68 and 142, “In one embodiment, the instruction cache 252 receives a stream of instructions to execute from the pipeline manager 232. The instructions are cached in the instruction cache 252 and dispatched for execution by the instruction unit 254. The instruction unit 254 can dispatch instructions as thread groups (e.g., warps), with each thread of the thread group assigned to a different execution unit within GPGPU core 262… The compiler/driver may identify and mark (i.e., encode) instructions that are more efficiently executed in the compute block 750. When these instructions are fetched out of the cache 752 they will be routed to the compute unit 752 for the operation instead of getting directed to the standard execution units.” The pipeline manager sending a stream of instructions to execute as thread groups, which are assigned to different execution units within the GPGPU core based on which instructions are more efficiently executed in a particular compute block, correlates to determining a first instruction engine from a group for processing the instruction set based on the instruction processing request);

sending the instruction processing request to a first instruction cache of an instruction cache group of the processor, wherein the first instruction cache corresponds to the first instruction engine (Paragraphs 68 and 246, “In one embodiment, the instruction cache 252 receives a stream of instructions to execute from the pipeline manager 232. The instructions are cached in the instruction cache 252 and dispatched for execution by the instruction unit 254. The instruction unit 254 can dispatch instructions as thread groups (e.g., warps), with each thread of the thread group assigned to a different execution unit within GPGPU core 262… In some embodiments, execution units 2352A-2352B have an attached L1 cache 2351 that is specific for each array or shared between the arrays. The cache can be configured as a data cache, an instruction cache, or a single cache that is partitioned to contain data and instructions in different partitions.” The pipeline manager sending a stream of instructions to the instruction cache for execution correlates to sending the instruction processing request to a first instruction cache. The execution units each having a specific L1 cache which can be configured as an instruction cache correlates to the first instruction cache corresponding to the first instruction engine of an instruction cache group of the processor);

Sakthivel does not explicitly teach that determining the first instruction engine for processing the first instruction set is based on a program block table, wherein the program block table specifies a mapping relationship between the first instruction set and the first instruction engine. However, determining the first instruction engine for processing the first instruction set based on a program block table, wherein the program block table specifies a mapping relationship between the first instruction set and the first instruction engine, is evidenced by Kipp (Figs. 1 and 4, Col. 10, lines 65-67, Col. 11, lines 1-5 and 18-28, and Col. 12, lines 18-27, “FIG. 4 shows an example hierarchy 400 of task group queues in relation to scheduling and assignment to worker threads in a worker pool such as the worker pool 104 in FIG. 1. The hierarchy 400 is used to assign the tasks to one of nine different task groups 410, 412, 414, 416, 418, 420, 422, 424, and 426 in this example. It is to be understood that the hierarchy may include any number of tasks assigned to any number of different task groups… The task group hierarchy 400 then acquires a task group queue that in this example is the highest priority task group queue 430. After the task group queue 430 is acquired, a task group in the task group queue 430 is acquired such as the task group 410. The scheduler logic 130 then assigns an available worker thread from the worker pool 440 to perform the acquired task from the acquired task group. This process continues until all of the task groups in the task group queue 430 have been assigned a worker thread in the worker pool… The worker thread pool or software thread pool 104 is responsible for providing a set of worker threads that map directly or indirectly to the underlying system hardware threads. For example, in FIG. 1, the workers 140-154 in the worker pool 104 are mapped directly to the hardware threads associated with one of the processing cores 110, 112, 114, and 116. These worker threads are coordinated by the scheduler logic 130 and are used to execute the user tasks that have been stored within the task groups in the hierarchy 400 shown in FIG. 4.” The user tasks stored within the task groups correlate to the first instruction set. The hierarchy being used to map task group queues in relation to assignment to worker threads correlates to determining a first instruction engine for processing the first instruction set based on the program block table. The scheduler logic assigning available worker threads, which are further mapped to underlying system hardware threads as seen in Fig. 1, to perform the acquired task from the task group across the entire hierarchy as seen in Fig. 4 therefore correlates to wherein the program block table specifies a mapping relationship between the first instruction set and the first instruction engine).

Sakthivel does not explicitly teach: A method implemented by a processor, the method comprising: receiving an instruction processing request to process a first instruction set of a computer program; accessing a program block table of the computer program; and obtaining the first instruction set from the first instruction cache.

However, Hirota teaches:

A method implemented by a processor, the method comprising: receiving an instruction processing request to process a first instruction set of a computer program (Paragraphs 23 and 73-74, “A task is a set of program instructions that may be loaded into any type of memory or cache… For instance, in some embodiments, to add the task descriptor 430(x) to the active list 420, the dependency/prefetch unit 410 may cache any portion of the task descriptor 430(x) across any number and type of caches and then indicate to the scheduling/launch unit 490 that the task descriptor 430(x) is active. After the dependency/prefetch unit 410 adds the task descriptor 430(x) to the active list 420, the scheduling/launch unit 490 initiates a task launch of task X as per the task descriptor 430(x). The task launch of task X is a setup activity that precedes the execution of task X.” The dependency/prefetch unit indicating to the scheduling/launch unit that the specific task descriptor 430(x) is active and added to the active list corresponds to receiving an instruction processing request. Each task such as 430(x) described by the respective task descriptor comprising a set of program instructions correlates to the instruction processing request for processing a first instruction set of a computer program);

and obtaining the first instruction set from the first instruction cache (Paragraphs 73-74, and 78, “For instance, in some embodiments, to add the task descriptor 430(x) to the active list 420, the dependency/prefetch unit 410 may cache any portion of the task descriptor 430(x) across any number and type of caches and then indicate to the scheduling/launch unit 490 that the task descriptor 430(x) is active… The task launch may involve, without limitation, any number of operations (e.g., CTA launch, warp launch, etc.) that prepare any number of the SMs 310 to participate in the task execution of task X… After the task launch of task X completes, the scheduling/launch unit 490 ensures that instructions for task X are cached. As described in greater detail below, instructions for task X may be cached in an instruction cache (or an instruction/constant cache) as the result of a previously-executed instruction prefetch... After the scheduling/launch unit 490 ensures that constants for task X are cached, the scheduling/launch unit 490 initiates a task execution of task X. During the task execution of task X, each of any number of SMs 310 executes one or more threads of task X.” The SM participating in the task execution of task X correlates to a first instruction engine. The task descriptor for task X, which includes instructions for task X, being cached in an instruction cache by the dependency/prefetch unit to prepare any number of SMs to participate in the task execution of task X correlates to obtaining the first instruction set from the first instruction cache).

Additionally, Kipp teaches:

accessing a program block table of the computer program (Fig. 4, Col. 10, lines 52-54, Col. 11, lines 47-54, Col. 12, lines 18-27, “A user does not typically need to interact directly with the task group queues or the task group queue hierarchy as it is automatically performed by the task scheduler 108… The following is example code for a user adding a task group to the scheduler as a basis for the user task group. To add a task group to be scheduled, a typical user will call an interface function from the scheduler such as using the command “AddTaskGroup( )” In this example the scheduler logic 130 acquires a user task group reading the priority level and the maximum number of worker threads assigned to the task group… The worker thread pool or software thread pool 104 is responsible for providing a set of worker threads that map directly or indirectly to the underlying system hardware threads. For example, in FIG. 1, the workers 140-154 in the worker pool 104 are mapped directly to the hardware threads associated with one of the processing cores 110, 112, 114, and 116. These worker threads are coordinated by the scheduler logic 130 and are used to execute the user tasks that have been stored within the task groups in the hierarchy 400 shown in FIG. 4.” The hierarchy storing task groups which contain user tasks correlates to a program block table. The user adding the task group through calling an interface function correlates to the program block table of the computer program. The scheduler automatically interacting with the hierarchy to coordinate worker threads to execute user tasks correlates to accessing the program block table);

Therefore, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with a method implemented by a processor, the method comprising: receiving an instruction processing request to process a first instruction set of a computer program; and obtaining the first instruction set from the first instruction cache as taught by Hirota because pre-fetching functionality can reduce the dependency resolution latency between producer and consumer tasks. Pre-fetching information stored in caches can also reduce the overall time required to execute the workload. Tasks comprising a set of program instructions can also be loaded into any type of memory or cache (Hirota: paragraphs 23 and 60).
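The Kipp passages cited above describe a concrete pattern: task groups are registered with the scheduler through an interface call, and the scheduler walks a hierarchy to bind groups to worker threads that map onto hardware threads. A hedged sketch of that pattern follows, using names that merely paraphrase Kipp's description (the registration call echoes the AddTaskGroup( ) example in the quoted text); it is not Kipp's actual code.

/* Hedged paraphrase of the scheduler pattern the Office Action reads
 * out of Kipp: groups registered by the user, then scheduled onto
 * worker/hardware threads. All names and sizes are invented here. */
#include <stdio.h>

#define MAX_GROUPS 8

struct task_group {
    int id;
    int priority;
    int max_workers;
};

/* The hierarchy of task groups: the structure the examiner equates
 * with the claimed "program block table". */
static struct task_group hierarchy[MAX_GROUPS];
static int group_count;

/* Analogue of Kipp's AddTaskGroup() interface function: the user adds
 * a task group to the scheduler's hierarchy. */
static int add_task_group(int id, int priority, int max_workers) {
    if (group_count >= MAX_GROUPS) return -1;
    hierarchy[group_count] = (struct task_group){ id, priority, max_workers };
    return group_count++;
}

/* The scheduler "accessing" the hierarchy: take the highest-priority
 * group and assign it to a worker mapped to a hardware thread. */
static void schedule_once(int hw_threads) {
    int best = -1;
    for (int i = 0; i < group_count; i++)
        if (best < 0 || hierarchy[i].priority > hierarchy[best].priority)
            best = i;
    if (best >= 0)
        printf("task group %d -> worker on hardware thread %d\n",
               hierarchy[best].id, best % hw_threads);
}

int main(void) {
    add_task_group(410, 2, 4); /* ids echo the numerals in Kipp's FIG. 4 */
    add_task_group(412, 1, 2);
    schedule_once(4); /* picks group 410, the higher priority */
    return 0;
}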
Additionally, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with accessing a program block table of the computer program; determining from an instruction engine group of the processor, and based on the program block table, a first instruction engine for processing the first instruction set, wherein the program block table specifies a mapping relationship between the first instruction set and the first instruction engine as taught by Kipp because hierarchies allow utilization of a layered approach to synchronization where mechanisms with a much lower cost and complexity can be applied to multi-core systems to allow for parallel processing (Kipp: Col. 6, lines 14-22).

With regards to Claim 21, the method of Claim 1 performs the same steps as the method of Claim 21, and Claim 21 is therefore rejected using the same rationale set forth above in the rejection of Claim 1.

With regards to Claim 10, Sakthivel in view of Hirota and Kipp teaches the method of Claim 1 above. Hirota further teaches:

sending, when the first instruction cache detects an end indicator of the first instruction set, scheduling information, wherein the scheduling information indicates that the first instruction engine can process a next instruction processing request (Paragraphs 73, 78, 86, 91, and 94, “In alternate embodiments, the active list 420 may be implemented in conjunction with any number of caches that are associated with the PPU 202 at any level of granularity… For instance, in some embodiments, to add the task descriptor 430(x) to the active list 420, the dependency/prefetch unit 410 may cache any portion of the task descriptor 430(x) across any number and type of caches and then indicate to the scheduling/launch unit 490 that the task descriptor 430(x) is active… As described in greater detail below, instructions for task X may be cached in an instruction cache (or an instruction/constant cache) as the result of a previously-executed instruction prefetch... After the memory flush for task X is complete, the dependency/prefetch unit 410 executes a count update for task X. For each consumer task descriptor 430 associated with task X, the dependency/prefetch unit 410 decrements the current count 444. If, after decrementing the current count 444 of the task descriptor 430(y), the current count 444 is zero, then the dependency/prefetch unit 410 adds the task descriptor 430(y) to the active list 420… After executing the count update for task X, the dependency/prefetch unit 410 removes the task descriptor 430(x) from the active list 420… In the same or other embodiments, any number of the host interface unit 232, the task management unit 234, the dependency prefetch unit 410, the scheduling/launch unit 490, and the work/distribution unit 236 in any combination may implement the prefetch and self-reset functionality described herein.” The task descriptor and active list are cached across any type of cache, including an instruction cache. The task descriptor for task X being removed from the active list correlates to the cache detecting an end indicator of the first instruction set. The count update decrementing to zero for a related consumer task, causing the dependency/prefetch unit to add a task descriptor Y to the active list, correlates to sending scheduling information indicating the first instruction engine can process a next instruction processing request to the instruction cache).
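The count-update mechanism quoted above is easy to miss inside the long citation, so a minimal sketch follows: when a producer task finishes (the asserted end indicator), each consumer's dependency count is decremented, and a count of zero activates the consumer, which is what the examiner maps to scheduling information that the engine can take the next request. Structure and names are illustrative paraphrases of Hirota's description, not its code.

/* Minimal sketch of Hirota's count update as characterized above:
 * completing task X decrements each consumer's current count; a count
 * of zero moves the consumer onto the active list. Names invented. */
#include <stdio.h>
#include <stdbool.h>

struct task_desc {
    int id;
    int current_count; /* outstanding producer dependencies */
    bool active;       /* on the active list, i.e., ready to launch */
};

/* Producer completion: decrement every consumer's count and activate
 * any consumer that reaches zero (the "scheduling information"). */
static void complete_producer(struct task_desc *consumers, int n) {
    for (int i = 0; i < n; i++) {
        if (--consumers[i].current_count == 0) {
            consumers[i].active = true;
            printf("task %d activated: next request can be processed\n",
                   consumers[i].id);
        }
    }
}

int main(void) {
    /* 430 waits on one producer, 431 on two; finishing one producer
     * activates 430 only. */
    struct task_desc consumers[] = { { 430, 1, false }, { 431, 2, false } };
    complete_producer(consumers, 2);
    return 0;
}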
Therefore, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with sending when the first instruction cache detects an end indicator of the first instruction set, scheduling information, wherein the scheduling information indicates that the first instruction engine can process a next instruction processing request as taught by Hirota because tracking dependencies in task graphs efficiently to indicate when a dependent task can run increases the overall performance of a conventional task management unit (Hirota: paragraph 59). With regards to Claim 11, Sakthivel teaches: A processor comprising an instruction cache group comprising a plurality of instruction caches (Paragraph 246, “In some embodiments, execution units 2352A-2352B have an attached L1 cache 2351 that is specific for each array or shared between the arrays. The cache can be configured as a data cache, an instruction cache, or a single cache that is partitioned to contain data and instructions in different partitions.” The execution units 2352A-2352B each having a specific L1 cache which can be configured as an instruction cache correlates to an instruction cache group comprising a plurality of instruction caches); an instruction engine group comprising a plurality of instruction engines, wherein the plurality of instruction caches in the instruction cache group are in a one-to- one correspondence with the plurality of instruction engines in the instruction engine group (Paragraph 246, “In some embodiments, execution units 2352A-2352B have an attached L1 cache 2351 that is specific for each array or shared between the arrays. The cache can be configured as a data cache, an instruction cache, or a single cache that is partitioned to contain data and instructions in different partitions.” The execution units 2352A-2352B correlate to an instruction engine group comprising a plurality of instruction engines. The execution units each having a specific L1 cache which can be configured as an instruction cache correlates to the plurality of instruction caches being in a one-to-one correspondence with the plurality of instruction engines); determine a first instruction engine based on the instruction processing request (Paragraphs 68 and 142, “In one embodiment, the instruction cache 252 receives a stream of instructions to execute from the pipeline manager 232. The instructions are cached in the instruction cache 252 and dispatched for execution by the instruction unit 254. The instruction unit 254 can dispatch instructions as thread groups (e.g., warps), with each thread of the thread group assigned to a different execution unit within GPGPU core 262… The compiler/driver may identify and mark (i.e., encode) instructions that are more efficiently executed in the compute block 750. 
When these instructions are fetched out of the cache 752 they will be routed to the compute unit 752 for the operation instead of getting directed to the standard execution units.” The pipeline manager sending a stream of instructions to execute as thread groups, which are assigned to different execution units within GPGPU core based on which instructions are more efficiently executed in a particular compute block, correlates to determining a first instruction engine based on the instruction processing request); and send the instruction processing request to a first instruction cache corresponding to the first instruction engine, wherein the first instruction engine processes the first instruction set in the instruction engine group (Paragraphs 68 and 246, “In one embodiment, the instruction cache 252 receives a stream of instructions to execute from the pipeline manager 232. The instructions are cached in the instruction cache 252 and dispatched for execution by the instruction unit 254. The instruction unit 254 can dispatch instructions as thread groups (e.g., warps), with each thread of the thread group assigned to a different execution unit within GPGPU core 262… In some embodiments, execution units 2352A-2352B have an attached L1 cache 2351 that is specific for each array or shared between the arrays. The cache can be configured as a data cache, an instruction cache, or a single cache that is partitioned to contain data and instructions in different partitions.” The pipeline manager sending a stream of instructions to the instruction cache for execution correlates to sending the instruction processing request to a first instruction cache for the first instruction engine to process. The execution units each having a specific L1 cache which can be configured as an instruction cache correlates to the first instruction cache corresponding to the first instruction engine of an instruction cache group of the processor),

Sakthivel does not explicitly teach that determining the first instruction engine for processing the first instruction set is based on a program block table, wherein the program block table specifies a mapping relationship between the first instruction set and the first instruction engine. However, program block tables are a popular method of determining first instruction engines for processing instruction sets as evidenced by Kipp above (Fig. 4, Col. 10, lines 65-67, Col. 11, lines 1-5, and Col. 12, lines 18-27).

Sakthivel does not explicitly teach: and a program block dispatcher configured to: receive an instruction processing request to process a first instruction set of a computer program; access a program block table of the computer program; and wherein the first instruction engine is configured to obtain the first instruction set from the first instruction cache. However, Hirota teaches: and a program block dispatcher configured to: receive an instruction processing request to process a first instruction set of a computer program (Paragraphs 23 and 73-74, “A task is a set of program instructions that may be loaded into any type of memory or cache… For instance, in some embodiments, to add the task descriptor 430(x) to the active list 420, the dependency/prefetch unit 410 may cache any portion of the task descriptor 430(x) across any number and type of caches and then indicate to the scheduling/launch unit 490 that the task descriptor 430(x) is active.
After the dependency/prefetch unit 410 adds the task descriptor 430(x) to the active list 420, the scheduling/launch unit 490 initiates a task launch of task X as per the task descriptor 430(x). The task launch of task X is a setup activity that precedes the execution of task X.” The dependency/prefetch unit indicating to the scheduling/launch unit that the specific task descriptor 430(x) is active and added to the active list corresponds to the program block dispatcher receiving an instruction processing request. Each task such as 430(x) described by the respective task descriptor comprising a set of program instructions correlates to the instruction processing request for processing a first instruction set of a computer program); and wherein the first instruction engine is configured to obtain the first instruction set from the first instruction cache (Paragraphs 73-74, and 78, “For instance, in some embodiments, to add the task descriptor 430(x) to the active list 420, the dependency/prefetch unit 410 may cache any portion of the task descriptor 430(x) across any number and type of caches and then indicate to the scheduling/launch unit 490 that the task descriptor 430(x) is active… The task launch may involve, without limitation, any number of operations (e.g., CTA launch, warp launch, etc.) that prepare any number of the SMs 310 to participate in the task execution of task X… After the task launch of task X completes, the scheduling/launch unit 490 ensures that instructions for task X are cached. As described in greater detail below, instructions for task X may be cached in an instruction cache (or an instruction/constant cache) as the result of a previously-executed instruction prefetch... After the scheduling/launch unit 490 ensures that constants for task X are cached, the scheduling/launch unit 490 initiates a task execution of task X. During the task execution of task X, each of any number of SMs 310 executes one or more threads of task X.” The SM participating in the task execution of task X correlates to a first instruction engine. The task descriptor for task X, which includes instructions for task X, being cached in an instruction cache by the dependency/prefetch unit to prepare any number of SMs to participate in the task execution of task X correlates to the first instruction engine obtaining the first instruction set from the first instruction cache). Additionally, Kipp teaches: access a program block table of the computer program (Fig. 4, Col. 10, lines 52-54, Col. 11, lines 47-54, Col. 12, lines 18-27, “A user does not typically need to interact directly with the task group queues or the task group queue hierarchy as it is automatically performed by the task scheduler 108… The following is example code for a user adding a task group to the scheduler as a basis for the user task group. To add a task group to be scheduled, a typical user will call an interface function from the scheduler such as using the command “AddTaskGroup( )” In this example the scheduler logic 130 acquires a user task group reading the priority level and the maximum number of worker threads assigned to the task group… The worker thread pool or software thread pool 104 is responsible for providing a set of worker threads that map directly or indirectly to the underlying system hardware threads. For example, in FIG. 1, the workers 140-154 in the worker pool 104 are mapped directly to the hardware threads associated with one of the processing cores 110, 112, 114, and 116. 
These worker threads are coordinated by the scheduler logic 130 and are used to execute the user tasks that have been stored within the task groups in the hierarchy 400 shown in FIG. 4.” The hierarchy storing task groups which contain user tasks correlates to a program block table. The user adding the task group through calling an interface function correlates to the program block table of the computer program. The scheduler automatically interacting with the hierarchy to coordinate worker threads to execute user tasks correlates to accessing the program block table);

Therefore, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with a program block dispatcher configured to: receive an instruction processing request to process a first instruction set and wherein the first instruction engine is configured to obtain the first instruction set from the first instruction cache as taught by Hirota because pre-fetching functionality can reduce the dependency resolution latency between producer and consumer tasks. Pre-fetching information stored in caches can also reduce the overall time required to execute the workload. Tasks comprising a set of program instructions can also be loaded into any type of memory or cache (Hirota: paragraphs 23 and 60).

Additionally, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with access a program block table of the computer program; determine a first instruction engine based on the program block table, wherein the program block table specifies a mapping relationship between the first instruction set and the first instruction engine as taught by Kipp because hierarchies allow utilization of a layered approach to synchronization where mechanisms with a much lower cost and complexity can be applied to multi-core systems to allow for parallel processing (Kipp: Col. 6, lines 14-22).

Claim(s) 2, 4, 9, 12-13, 15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Sakthivel in view of Hirota, Kipp and Chang et al. (U.S. Patent Application Publication No. US 2015/0324234 A1), hereinafter “Chang.”

With regards to Claim 2, Sakthivel in view of Hirota and Kipp teaches the method of Claim 1 above. Sakthivel in view of Hirota and Kipp does not explicitly teach: wherein determining the first instruction engine based on the instruction processing request comprises: obtaining based on the instruction processing request, an alternative instruction engine for processing the first instruction set; and selecting the alternative instruction engine as the first instruction engine. However, Chang teaches: wherein determining the first instruction engine based on the instruction processing request comprises: obtaining based on the instruction processing request, an alternative instruction engine for processing the first instruction set (Fig. 8, paragraphs 26-27, 33, 53 and 58, “Hence, in the present invention, a thread group may be defined as having a plurality of tasks sharing same specific data, for example, in the main memory 119 and/or accessing same specific memory address(es), for example, in the main memory 119.
A task can be a single-threaded process or a thread of a multi-threaded process… Based on above observation, the proposed task scheduling method may be aware of the cache coherence overhead when controlling one task to migrate from one cluster to another cluster. Thus, the proposed task scheduling method may be a thread group aware task scheduling scheme which checks characteristics of a thread group when dispatching a task of the thread group to one of the clusters… By way of example, but not limitation, the tasks may include programs, application program sub-components, or a combination thereof… FIG. 8 is a diagram illustrating a sixth task scheduling operation which makes one task that belongs to a thread group migrate from a run queue of a processor core (e.g., a heaviest-loaded processor core) in one cluster to a run queue of a processor core (e.g., an idle processor core) in another cluster… As shown in FIG. 8, the thread group includes a first task (e.g., task P82) selected as a candidate task for task migration... The distribution of the first task and the second tasks belonging to the same thread group is checked… The first task is included in one run queue of the cluster Cluster_0. Based on the checking result of the distribution of first task and second tasks, the scheduling unit 104 may judge that the candidate task should migrate from a current cluster to a different cluster. The scheduling unit 104 may make the task P82 migrate from the run queue RQ1 of the processor core CPU_1 (which is the heaviest-loaded processor core among the selected processor cores) to the run queue RQ5 of the processor core CPU_5 (which is the processor core that triggers the load balance procedure).” The processor cores CPU_1 and CPU_5 of clusters Cluster_0 and Cluster_1 correspond to the first and alternative instruction engines respectively. The task P82, which can include a combination of programs or application program sub-components, corresponds to the first instruction set. The task scheduling method checking characteristics of a thread group when dispatching or migrating tasks corresponds to obtaining an alternative instruction engine based on the instruction processing request. The scheduling unit judging that the candidate task should migrate from the processor core CPU_1 to the processor core CPU_5 corresponds to obtaining an alternative instruction engine for processing the first instruction set); and selecting the alternative instruction engine as the first instruction engine (Fig. 8, paragraph 58, “Based on the checking result of the distribution of first task and second tasks, the scheduling unit 104 may judge that the candidate task should migrate from a current cluster to a different cluster. The scheduling unit 104 may make the task P82 migrate from the run queue RQ1 of the processor core CPU_1 (which is the heaviest-loaded processor core among the selected processor cores) to the run queue RQ5 of the processor core CPU_5 (which is the processor core that triggers the load balance procedure).” The scheduling unit making the task P82 migrate from the processor core CPU_1 to the processor core CPU_5 corresponds to selecting the alternative instruction engine).
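For a concrete picture of the migration step the rejection draws from Chang's Fig. 8 (pulling a candidate task from the run queue of the heaviest-loaded core to an idle core), a minimal C++ sketch follows; the queue-depth load metric and all identifiers are editorial assumptions rather than Chang's disclosure:

    #include <algorithm>
    #include <deque>
    #include <vector>

    struct Core {
        int id = 0;                  // assumption: id equals its index in the vector
        std::deque<int> runQueue;    // queued task ids; depth stands in for core load
    };

    // Migrates one candidate task from the deepest run queue to the idle
    // core's queue; returns the migrated task id, or -1 if nothing moved.
    int balanceToIdleCore(std::vector<Core>& cores, int idleCoreId) {
        if (cores.empty()) return -1;
        auto busiest = std::max_element(
            cores.begin(), cores.end(),
            [](const Core& a, const Core& b) {
                return a.runQueue.size() < b.runQueue.size();
            });
        if (busiest->runQueue.empty() || busiest->id == idleCoreId)
            return -1;                        // no candidate task to migrate
        int task = busiest->runQueue.back();  // candidate (cf. Chang's task P82)
        busiest->runQueue.pop_back();
        cores[idleCoreId].runQueue.push_back(task);  // alternative engine chosen
        return task;
    }

In the examiner's mapping, the busiest core stands in for the first instruction engine and the idle core for the selected alternative instruction engine.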
Therefore, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with wherein determining the first instruction engine based on the instruction processing request comprises: obtaining based on the instruction processing request, an alternative instruction engine for processing the first instruction set; and selecting the alternative instruction engine as the first instruction engine as taught by Chang because load balance procedures can be used when determining whether to migrate one task to another processor. Effective load balancing such as moving tasks from busier processors to idle processors can cause cache coherence overhead reduction (Chang: paragraph 56). With regards to Claim 13, the method of Claim 2 performs the same steps as the machine of Claim 13, and Claim 13 is therefore rejected using the same rationale set forth above in the rejection of Claim 2. With regards to Claim 4, Sakthivel in view of Hirota, Kipp and Chang teaches the method of Claim 2 above. Chang further teaches: wherein the instruction engine group comprises a first alternative instruction engine group and a second instruction engine group (Paragraph 23, “Regarding the clusters 112 _ 1 - 112 _N, each cluster may be a group of processor cores. For example, the cluster 112 _ 1 may include one or more processor cores 117 , each having the same processor architecture with the same computing power; and the cluster 112 _N may include one or more processor cores 118 , each having the same processor architecture with the same computing power. In one example, the processor cores 117 may have different processor architectures with different computing power. In another example, the processor cores 118 may have different processor architectures with different computing power.” The clusters 112_1 to 112_N include multiple clusters and therefore at least correlate to a first alternative and second instruction engine group), and wherein obtaining the alternative instruction engine of the first instruction set based on the instruction processing request comprises using when the first instruction set is an instruction set on a performance path, a first selected instruction engine in the first alternative instruction engine group or a second selected instruction engine in the second instruction engine group as the alternative instruction engine of the first instruction set (Fig. 8, paragraphs 26-27, 33, 53 and 58, “Hence, in the present invention, a thread group may be defined as having a plurality of tasks sharing same specific data, for example, in the main memory 119 and/or accessing same specific memory address(es), for example, in the main memory 119. A task can be a single-threaded process or a thread of a multi-threaded process… Based on above observation, the proposed task scheduling method may be aware of the cache coherence overhead when controlling one task to migrate from one cluster to another cluster. Thus, the proposed task scheduling method may be a thread group aware task scheduling scheme which checks characteristics of a thread group when dispatching a task of the thread group to one of the clusters… By way of example, but not limitation, the tasks may include programs, application program sub-components, or a combination thereof… FIG. 
8 is a diagram illustrating a sixth task scheduling operation which makes one task that belongs to a thread group migrate from a run queue of a processor core (e.g., a heaviest-loaded processor core) in one cluster to a run queue of a processor core (e.g., an idle processor core) in another cluster… As shown in FIG. 8, the thread group includes a first task (e.g., task P82 ) selected as a candidate task for task migration... The distribution of the first task and the second tasks belonging to the same thread group is checked… The first task is included in one run queue of the cluster Cluster_ 0 . Based on the checking result of the distribution of first task and second tasks, the scheduling unit 104 may judge that the candidate task should migrate from a current cluster to a different cluster. The scheduling unit 104 may make the task P82 migrate from the run queue RQ1 of the processor core CPU_ 1 (which is the heaviest-loaded processor core among the selected processor cores) to the run queue RQ5 of the processor core CPU_ 5 (which is the processor core that triggers the load balance procedure).” The processor cores CPU_1 and CPU_5 of clusters Cluster_0 and Cluster_1 correspond to the first and alternative instruction engines in the instruction engine and alternative instruction engine group respectively. The task P82, which can include a combination of programs or application program sub-components, corresponds to the first instruction set on a performance path. The task scheduling method checking characteristics of a thread group when dispatching or migrating tasks corresponds to obtaining an alternative instruction engine based on the instruction processing request. The scheduling unit judging that the candidate task should migrate from the processor core CPU_1 to the processor core CPU_5 corresponds to using a first selected instruction engine in the alternative instruction engine group). Therefore, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with wherein the instruction engine group comprises a first alternative instruction engine group and a second instruction engine group, and wherein obtaining the alternative instruction engine of the first instruction set based on the instruction processing request comprises using when the first instruction set is an instruction set on a performance path, a first selected instruction engine in the first alternative instruction engine group or a second selected instruction engine in the second instruction engine group as the alternative instruction engine of the first instruction set as taught by Chang because load balance procedures can be used when determining whether to migrate one task to another processor. Effective load balancing such as moving tasks from busier processors to idle processors can cause cache coherence overhead reduction (Chang: paragraph 56). With regards to Claim 15, the method of Claim 4 performs the same steps as the machine of Claim 15, and Claim 15 is therefore rejected using the same rationale set forth above in the rejection of Claim 4. With regards to Claim 9, Sakthivel in view of Hirota, Kipp and Chang teaches the method of Claim 2 above. 
Chang further teaches: wherein selecting the alternative instruction engine as the first instruction engine comprises selecting the alternative instruction engine corresponding to an instruction processing request queue with a minimum queue depth as the first instruction engine (Fig. 8, paragraphs 53 and 56, “FIG. 8 is a diagram illustrating a sixth task scheduling operation which makes one task that belongs to a thread group migrate from a run queue of a processor core (e.g., a heaviest-loaded processor core) in one cluster to a run queue of a processor core (e.g., an idle processor core) in another cluster. Assume that the processor core CPU_5 triggers a load balance procedure due to empty run queue or timer expiration… Consider a case where the task P81 is selected as a candidate task to migrate from a current cluster Cluster_0 to a different cluster Cluster_1... Hence, the scheduling unit 104 may perform the proposed thread group aware task scheduling scheme to determine whether to make one task (e.g., P81 or P82) of the thread group migrate from the run queue RQ1 of the processor core CPU_1 (which is the busiest processor core among the selected processor cores) to the run queue RQ5 of the processor core CPU_5 (which is the processor core that triggers the load balance procedure, and is, for example, the idlest processor core) for cache coherence overhead reduction.” The scheduling unit making the task P81 migrate from the processor core CPU_1 to the processor core CPU_5 based on CPU_5 being the idlest processor core and having an empty run queue corresponds to selecting the alternative instruction engine with a minimum queue depth).

Therefore, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with wherein selecting the alternative instruction engine as the first instruction engine comprises selecting the alternative instruction engine corresponding to an instruction processing request queue with a minimum queue depth as the first instruction engine as taught by Chang because load balance procedures can be used when determining whether to migrate one task to another processor. Effective load balancing such as moving tasks from busier processors to idle processors can cause cache coherence overhead reduction (Chang: paragraph 56).

With regards to Claim 20, the method of Claim 9 performs the same steps as the machine of Claim 20, and Claim 20 is therefore rejected using the same rationale set forth above in the rejection of Claim 9.

With regards to Claim 12, Sakthivel in view of Hirota and Kipp teaches the method of Claim 11 above. Sakthivel further teaches: wherein the plurality of instruction processing request queues are further in a one-to-one correspondence with the plurality of caches (Paragraphs 91 and 104, “As mentioned, in the illustrated embodiment, one or more graphics memories 433-434, M are coupled to each of the graphics processing engines 431-432, N, respectively. The graphics memories 433-434, M store instructions and data being processed by each of the graphics processing engines 431-432, N.
The graphics memories 433-434, M may be volatile memories such as DRAMs (including stacked DRAMs), GDDR memory (e.g., GDDR5, GDDR6), or HBM, and/or may be non-volatile memories such as 3D XPoint or Nano-Ram… In one embodiment, each WD 484 is specific to a particular graphics acceleration module 446 and/or graphics processing engine 431-432, N. It contains all the information a graphics processing engine 431-432, N requires to do its work or it can be a pointer to a memory location where the application has set up a command queue of work to be completed.” The WDs, which point to a command queue and are specific to a particular graphics processing engine, correlate to the processing request queues being in a one-to-one correspondence with the instruction engine. The graphics memories coupled to each of the graphics processing engines which are volatile memories correlate to caches in a one-to-one correspondence with the instruction engine. Therefore, the WDs and graphics memories are in a one-to-one correspondence with the graphics processing engine and correlate to the instruction processing request queues in a one-to-one correspondence with the caches); the program block dispatcher is further configured to determine, based on an instruction processing request queue corresponding to the first instruction engine, the first cache corresponding to the first instruction engine (Paragraph 118, “One mechanism for changing the bias state employs an API call (e.g. OpenCL), which, in turn, calls the GPU's device driver which, in turn, sends a message (or enqueues a command descriptor) to the GPU directing it to change the bias state and, for some transitions, perform a cache flushing operation in the host. The cache flushing operation is required for a transition from host processor 405 bias to GPU bias, but is not required for the opposite transition.” The GPU driver enqueuing a command descriptor to the GPU correlates to an instruction processing request queue corresponding to the first instruction engine. The command descriptor directing the GPU to perform a cache flushing operation correlates to determining the first cache corresponding to the first instruction engine based on an instruction processing request queue).

Sakthivel does not explicitly teach that the caches are instruction caches. However, instruction caches are a popular type of cache as evidenced by Hirota above (Paragraph 78).

Sakthivel in view of Hirota and Kipp does not explicitly teach: a plurality of instruction processing request queues in a one-to-one correspondence with the plurality of instruction engines. However, Chang teaches: a plurality of instruction processing request queues in a one-to-one correspondence with the plurality of instruction engines (Paragraph 33, “Each processor core of the multi-core processor system 10 may be given a run queue managed by the scheduling unit 104. Hence, when the multi-core processor system 10 has M processor cores, the scheduling unit 104 may manage M run queues 105_1-105_M for the M processor cores, respectively, where M is a positive integer and may be adjusted based on actual design consideration.
The run queue may be a data structure which records a list of tasks, where the tasks may include a task that is currently running (e.g., a running task) and other task(s) waiting to run (e.g., runnable task(s)).” Each processor core given a run queue by the scheduling unit correlates to a plurality of instruction processing request queues in a one-to-one correspondence with the plurality of instruction engines).

Therefore, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with a plurality of instruction processing request queues in a one-to-one correspondence with the plurality of instruction engines as taught by Chang because load balance procedures can be used when determining whether to migrate one task to another processor. Effective load balancing such as moving tasks from busier processors to idle processors can cause cache coherence overhead reduction (Chang: paragraph 56).

Claim(s) 3 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Sakthivel in view of Hirota, Kipp, Chang and Vincent et al. (U.S. Patent No. 9,298,504 B1), hereinafter “Vincent.”

With regards to Claim 3, Sakthivel in view of Hirota, Kipp and Chang teaches the method of Claim 2 above. Chang further teaches: wherein the instruction engine group comprises a first alternative instruction engine group (Paragraph 23, “Regarding the clusters 112_1-112_N, each cluster may be a group of processor cores. For example, the cluster 112_1 may include one or more processor cores 117, each having the same processor architecture with the same computing power; and the cluster 112_N may include one or more processor cores 118, each having the same processor architecture with the same computing power. In one example, the processor cores 117 may have different processor architectures with different computing power. In another example, the processor cores 118 may have different processor architectures with different computing power.” Each of the clusters having a group of processor cores correlates to the instruction engine group comprising a first alternative instruction engine group),

Sakthivel in view of Hirota, Kipp and Chang does not explicitly teach: and wherein obtaining the alternative instruction engine of the first instruction set based on the instruction processing request comprises using when the first instruction set is an instruction set on a non-performance path, a selected instruction engine in the first alternative instruction engine group as the alternative instruction engine of the first instruction set. However, Vincent teaches: and wherein obtaining the alternative instruction engine of the first instruction based on the instruction processing request comprises using when the first instruction is an instruction on a non-performance path (Col. 2, line 67 and Col. 3, lines 1-12, “In addition, tasks may have relative priorities amongst themselves. For purposes of this discussion, tasks will be referred to as “low-priority” (Lo) tasks and “high-priority” (Hi) tasks. The terms “low-priority” and “high-priority” indicate relative priorities rather than absolute priorities. Thus, although a given “high-priority” task may have priority over a given “low-priority” task, other tasks may have yet higher or lower priorities. In certain situations, a task may be referred to as a “very-low-priority” (VLo) task, indicating that it has a priority lower than a “low-priority” task.
In some situations, a “very-low-priority” task may comprise an idle or null task, or a task with the lowest possible priority.” The low and very low priority tasks, which include idle or null tasks, correlate to an instruction on a non-performance path), a selected instruction engine in the first alternative instruction engine group as the alternative instruction engine of the first instruction (Fig. 1, Col. 3, lines 12-20, 23-27, 33-38 and 59-66, “FIG. 1 shows a scenario in which the processor 102(a) is executing a low-priority task 108. A high-priority task 110 has been placed in the task queue 104. For purposes of discussion, it will be assumed that the high-priority task 110 is a task that is to be executed by a particular one of the processors, such as the processor 102(a). Furthermore, the high-priority task 110 is not compatible with being executed by any of other processors 102(b) and 102(c)... Prior to executing the high-priority task 110, the processor 102(a) will preempt the currently executing low-priority task 108 and place it in the task queue 104. In this example, it is assumed that the other processors—processors 102(b) and 102(c)—are executing very-low-priority tasks 112… In order to execute the high-priority task 110 on the processor 102(a), the scheduler 106 sends an inter-processor interrupt (IPI) or other request to the processor 102(a), indicating that the processor 102(a) should interrupt its processing of the low-priority task 108 and inspect the task queue 104 for the presence of a high-priority task… Furthermore, the scheduler 106 may account for multiple levels of preemption that may be triggered by execution of the high-priority task 110. For example, the low-priority task 108 that is preempted by the high-priority task 110 may itself cause preemption of a very-low-priority task 112. In a situation like this, the scheduler 106 may send interrupts to any processors 102 that will eventually receive new tasks as a result of the initial preemption.” The low-priority task initially being executed on processor 102(a) correlates to the first instruction executing on a first instruction engine. The processor 102(a) being compatible with the high priority task but not the processors 102(b) or (c) shows that processors 102(b) and (c) are different from (a) and therefore correlate to an alternative engine group. The scheduler sending an interrupt to processor 102(a) to preempt the low priority task correlates to the first instruction not being executed on the first instruction engine anymore. The scheduler then sending interrupts to the other processors 102(b) and (c) which are currently executing even lower priority tasks and causing the processors 102(b) and (c) to receive new tasks including the low-priority task correlates to using a selected instruction engine in the first alternative engine group as the alternative instruction engine).

Vincent does not explicitly teach that the first instruction is a first instruction set. However, instruction sets are a popular unit of tasks as evidenced by Hirota above (paragraph 23).
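The multi-level preemption the rejection attributes to Vincent (a high-priority task with affinity to one processor displaces a low-priority task, which in turn displaces very-low-priority work elsewhere) can likewise be sketched. The priority levels, names, and single-level cascade below are editorial assumptions, not Vincent's code:

    #include <optional>
    #include <vector>

    enum class Prio { VeryLow, Low, High };   // relative, per Vincent's VLo/Lo/Hi

    struct Task { int id; Prio prio; };
    struct CoreState { std::optional<Task> running; };

    // A high-priority task bound to `affinityCore` preempts that core's
    // current task; the displaced task then bumps lower-priority work on
    // another core (one level of the cascading preemption described above).
    void dispatchWithPreemption(std::vector<CoreState>& cores,
                                int affinityCore, Task hi) {
        std::optional<Task> displaced = cores[affinityCore].running;
        cores[affinityCore].running = hi;     // only compatible processor
        if (!displaced) return;
        for (CoreState& core : cores) {
            if (!core.running || core.running->prio < displaced->prio) {
                core.running = displaced;     // alternative engine selected
                return;
            }
        }
    }

Under the examiner's reading, the core that absorbs the displaced low-priority task plays the role of the selected instruction engine in the first alternative instruction engine group.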
Therefore, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with wherein the instruction engine group comprises a first alternative instruction engine group as taught by Chang because processor cores may have the same architecture and computing power or different architecture and computing power, which offers flexibility through a heterogeneous multi-core architecture (Chang: paragraph 23).

Therefore, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with wherein obtaining the alternative instruction engine of the first instruction set based on the instruction processing request comprises using when the first instruction set is an instruction set on a non-performance path, a selected instruction engine in the first alternative instruction engine group as the alternative instruction engine of the first instruction set as taught by Vincent because tasks can have processor affinities that require execution through certain processors. Relative priorities within tasks can be used to determine when tasks should be executed by various processors (Vincent: Col. 2, lines 65-67 and Col. 3, line 1 and lines 55-58).

With regards to Claim 14, the method of Claim 3 performs the same steps as the machine of Claim 14, and Claim 14 is therefore rejected using the same rationale set forth above in the rejection of Claim 3.

Claim(s) 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Sakthivel in view of Hirota, Kipp, Chang and Rosen et al. (U.S. Patent Application Publication No. US 2020/0210230 A1), hereinafter “Rosen.”

With regards to Claim 5, Sakthivel in view of Hirota, Kipp and Chang teaches the method of Claim 4 above. Chang further teaches: wherein using the first selected instruction engine in the first alternative instruction engine group or the second selected instruction engine in the second instruction engine group as the alternative instruction engine of the first instruction set comprises: using when a first condition is met, the first selected instruction engine in the first alternative instruction engine group as the alternative instruction engine of the first instruction set (Fig. 8, paragraph 53, “FIG. 8 is a diagram illustrating a sixth task scheduling operation which makes one task that belongs to a thread group migrate from a run queue of a processor core (e.g., a heaviest-loaded processor core) in one cluster to a run queue of a processor core (e.g., an idle processor core) in another cluster. Assume that the processor core CPU_5 triggers a load balance procedure due to empty run queue or timer expiration.” Processor core CPU_5 triggering a load balance procedure due to an empty run queue or timer expiration corresponds to a first condition being met. The load balancing procedure causing one task to migrate from a run queue of a processor core in one cluster to a different run queue of a processor core in another cluster corresponds to using the first selected instruction engine in the first alternative instruction engine group),

Sakthivel in view of Hirota, Kipp and Chang does not explicitly teach: wherein the first condition is that a queue depth of an instruction processing request queue corresponding to at least one instruction engine in the first alternative instruction engine group is less than a first preset threshold.
However, Rosen teaches: wherein the first condition is that a queue depth of an instruction processing request queue corresponding to at least one instruction engine in the first alternative instruction engine group is less than a first preset threshold (Paragraphs 49 and 87, “According to the example embodiment of FIG. 1, an array of processors 116, configured as a mesh network, executes tasks that the array extracts from processor queues 114. A separate processor of processors-array 116 is allocates to each processor-queue… Q-Depth threshold—if queue length is above Q-Depth threshold, affinity is not preserved, and tasks will spill to other queues, reducing the load and latency on the present queue. If queue length is below the Q-Depth threshold, affinity will be observed, and more tasks will be directed to the present queue.” The array of processors corresponds to the first alternative instruction engine group. Each processor being allocated its own processor queue for executing tasks correlates to an instruction processing request queue corresponding to at least one instruction engine in the first alternative instruction engine group. The queue length being below the Q-Depth threshold causing more tasks to be directed to the present queue correlates to the first condition being that the queue depth is less than a first preset threshold). Therefore, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with wherein using the first selected instruction engine in the first alternative instruction engine group or the second selected instruction engine in the second instruction engine group as the alternative instruction engine of the first instruction set comprises: using, when a first condition is met, the first selected instruction engine in the first alternative instruction engine group as the alternative instruction engine of the first instruction set as taught by Chang because load balance procedures can be used when determining whether to migrate one task to another processor. Effective load balancing such as moving tasks from busier processors to idle processors can cause cache coherence overhead reduction (Chang: paragraph 56). Additionally, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with wherein the first condition is that a queue depth of an instruction processing request queue corresponding to at least one instruction engine in the first alternative instruction engine group is less than a first preset threshold as taught by Rosen because task processing with low latency and low drop rate is achieved through suitable load balancing of processors. Affinity and process priority are traded off when queue lengths grow, and processing tasks of the same flow is ideally done in the same processor or same two processors (Rosen: paragraph 45). With regards to Claim 16, Sakthivel in view of Hirota, Kipp and Chang teaches the method of Claim 15 above. Chang further teaches: wherein the program block dispatcher is further configured to: use, when a second condition is met, the second instruction engine in the second instruction engine group as the alternative instruction engine of the first instruction set (Fig. 8, paragraph 53, “FIG. 
8 is a diagram illustrating a sixth task scheduling operation which makes one task that belongs to a thread group migrate from a run queue of a processor core (e.g., a heaviest-loaded processor core) in one cluster to a run queue of a processor core (e.g., an idle processor core) in another cluster. Assume that the processor core CPU_ 5 triggers a load balance procedure due to empty run queue or timer expiration.” Processor core CPU_5 triggering a load balance procedure due to an empty run queue or timer expiration corresponds to a second condition being met. The load balancing procedure causing one task to migrate from a run queue of a processor core in one cluster to a different run queue of a processor core in another cluster corresponds to using the second selected instruction engine in the second alternative instruction engine group), Sakthivel in view of Hirota, Kipp and Chang does not explicitly teach: wherein the second condition is that queue depths of instruction processing request queues corresponding to all instruction engines in the first alternative instruction engine group exceed a first preset threshold. However, Rosen teaches: wherein the second condition is that queue depths of instruction processing request queues corresponding to all instruction engines in the first alternative instruction engine group exceed a first preset threshold. (Paragraphs 48 and 87, “The classifier generates descriptors of the tasks (comprising an indicator to the target processor, and other information), and sends the descriptors to main queue 110, which, in the example embodiment of FIG. 1, comprises a First-In-First-Out (FIFO) memory.… Q-Depth threshold—if queue length is above Q-Depth threshold, affinity is not preserved, and tasks will spill to other queues, reducing the load and latency on the present queue. If queue length is below the Q-Depth threshold, affinity will be observed, and more tasks will be directed to the present queue.” The target processors correspond to the first alternative instruction engine group. The main queue holding task descriptors for all of the target processors correlates to an instruction processing request queue corresponding to all instruction engines in the first alternative instruction engine group. The queue length being above the Q-Depth threshold causing tasks to spill to other queues correlates to the second condition being that the queue depth exceeds a first preset threshold). Therefore, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with wherein the program block dispatcher is further configured to: use, when a second condition is met, the second instruction engine in the second instruction engine group as the alternative instruction engine of the first instruction set as taught by Chang because load balance procedures can be used when determining whether to migrate one task to another processor. Effective load balancing such as moving tasks from busier processors to idle processors can cause cache coherence overhead reduction (Chang: paragraph 56). 
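For reference, the Q-Depth rule the rejection cites from Rosen (preserve affinity while the queue is below the threshold, spill to the least-loaded queue once it is above) reduces to a few lines of C++; the threshold value and all names are editorial assumptions:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    constexpr std::size_t kQDepthThreshold = 8;  // stand-in "first preset threshold"

    // Returns the index of the queue a task should join: affinity is kept
    // while the affine queue is shallower than the threshold; otherwise the
    // task spills to the least-loaded queue in the group.
    std::size_t routeTask(const std::vector<std::size_t>& queueDepths,
                          std::size_t affineQueue) {
        if (queueDepths[affineQueue] < kQDepthThreshold)
            return affineQueue;              // condition met: preserve affinity
        auto least = std::min_element(queueDepths.begin(), queueDepths.end());
        return static_cast<std::size_t>(least - queueDepths.begin());
    }

The below-threshold branch corresponds to the claimed first condition, and the spill branch corresponds to the claimed second condition in which all affine queue depths exceed the threshold.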
Additionally, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with wherein the second condition is that queue depths of instruction processing request queues corresponding to all instruction engines in the first alternative instruction engine group exceed a first preset threshold as taught by Rosen because task processing with low latency and low drop rate is achieved through suitable load balancing of processors. Affinity and process priority are traded off when queue lengths grow, and processing tasks of the same flow is ideally done in the same processor or same two processors (Rosen: paragraph 45).

Claim(s) 6-8 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Sakthivel in view of Hirota, Kipp, Chang, Rosen and Smith et al. (U.S. Patent Application Publication No. US 2020/0233706 A1), hereinafter “Smith.”

With regards to Claim 6, Sakthivel in view of Hirota, Kipp and Chang teaches the method of Claim 4 above. Chang further teaches: wherein the second instruction engine group comprises a second alternative instruction engine group and a third alternative instruction engine group (Paragraph 23, “Regarding the clusters 112_1-112_N, each cluster may be a group of processor cores. For example, the cluster 112_1 may include one or more processor cores 117, each having the same processor architecture with the same computing power; and the cluster 112_N may include one or more processor cores 118, each having the same processor architecture with the same computing power. In one example, the processor cores 117 may have different processor architectures with different computing power. In another example, the processor cores 118 may have different processor architectures with different computing power.” The clusters 112_1 to 112_N include multiple clusters and therefore at least correlate to a second and third alternative instruction engine group), and wherein using the second selected instruction engine in the second instruction engine group as the alternative instruction engine of the first instruction set comprises: using the second selected instruction engine in the second alternative instruction engine group of the second instruction engine group as the alternative instruction engine of the first instruction set (Fig. 8, paragraphs 51 and 53, “FIG. 8 is a diagram illustrating a sixth task scheduling operation which makes one task that belongs to a thread group migrate from a run queue of a processor core (e.g., a heaviest-loaded processor core) in one cluster to a run queue of a processor core (e.g., an idle processor core) in another cluster. Assume that the processor core CPU_5 triggers a load balance procedure due to empty run queue or timer expiration… In another case where the multi-core processor system 10 has more than two clusters and/or at least one of the clusters 117 and 118 has more than four processor cores, the scheduling unit 104 merely treats some processor cores included in the multi-core processor system 10 as the selected processor cores CPU_0-CPU_7 shown in FIG. 8-FIG.
11.” The load balancing procedure causing one task to migrate from a run queue of a processor core in one cluster to a different run queue of a processor core in another cluster corresponds to using the second selected instruction engine in the second alternative instruction engine group);

Sakthivel in view of Hirota, Kipp and Chang does not explicitly teach: and adding when a third condition is met, at least one instruction engine in the third alternative instruction engine group to the second alternative instruction engine group, wherein the third condition is that the second alternative instruction engine group is empty, or queue depths of instruction processing request queues corresponding to all the instruction engines in the second alternative instruction engine group exceed a second preset threshold. However, Smith teaches: and adding when a third condition is met, at least one engine in the third alternative engine group to the second alternative engine group (Paragraph 95, “In step 510, it is detected that the time remaining to complete the job is less than a threshold amount of time. For example, it may be detected that the time remaining to complete the job is less than thirty minutes. In step 512, a second set of nodes different from the first set of nodes for executing the plurality of tasks is determined in response to detecting that the time remaining to complete the job is less than the threshold amount of time. In one example, the second set of nodes may comprise the first set of nodes plus an additional one or more nodes within the cluster. The second set of nodes may comprise twice the number of nodes as the first set of nodes; for example, the first set of nodes may comprise two nodes within the cluster and the second set of nodes may comprise four nodes within the cluster.” The detection of the time remaining to complete the job being less than a threshold amount of time correlates to a third condition being met. The first and second sets of nodes correspond to the third and second alternative engine groups, respectively. The second set of nodes being determined in response to the detection, where the second set of nodes comprises the first set of nodes plus at least one additional node, correlates to adding at least one engine in the third alternative engine group to the second alternative engine group), Smith does not explicitly teach that the engine groups are instruction engine groups. However, instruction engine groups are a popular method for executing instructions as evidenced by Sakthivel above (paragraph 246).

Additionally, Rosen teaches: wherein the third condition is that the second alternative instruction engine group is empty, or queue depths of instruction processing request queues corresponding to all the instruction engines in the second alternative instruction engine group exceed a second preset threshold (Paragraphs 49 and 87, “According to the example embodiment of FIG. 1, an array of processors 116, configured as a mesh network, executes tasks that the array extracts from processor queues 114. A separate processor of processors-array 116 is allocates to each processor-queue… Q-Depth threshold—if queue length is above Q-Depth threshold, affinity is not preserved.” The array of processors corresponds to the second alternative instruction engine group. Each processor being allocated its own processor queue for executing tasks correlates to an instruction processing request queue.
In the configuration of the array of processors consisting of a single processor, the processor queue therefore corresponds to all instruction engines in the second alternative instruction engine group. The queue length being above the Q-Depth threshold correlates to the third condition being that the queue depth exceeds a second preset threshold).

Therefore, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with wherein the second instruction engine group comprises a second alternative instruction engine group and a third alternative instruction engine group, and wherein using the second selected instruction engine in the second instruction engine group as the alternative instruction engine of the first instruction set comprises: using the second selected instruction engine in the second alternative instruction engine group of the second instruction engine group as the alternative instruction engine of the first instruction set as taught by Chang because load balance procedures can be used when determining whether to migrate one task to another processor. Effective load balancing such as moving tasks from busier processors to idle processors can cause cache coherence overhead reduction (Chang: paragraph 56).

It would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with adding when a third condition is met, at least one engine in the third alternative engine group to the second alternative engine group as taught by Smith because adjusting the size of sets of nodes within the cluster for executing a plurality of tasks can be based on the number of healthy nodes or a time remaining to complete the job. Estimating the time to complete a job allows task migration to occur in order for the job to be completed within a set threshold amount of time (Smith: paragraphs 94-95).

Additionally, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with wherein the third condition is that the second alternative instruction engine group is empty, or queue depths of instruction processing request queues corresponding to all the instruction engines in the second alternative instruction engine group exceed a second preset threshold as taught by Rosen because task processing with low latency and low drop rate is achieved through suitable load balancing of processors. Affinity and process priority are traded off when queue lengths grow, and processing tasks of the same flow is ideally done in the same processor or same two processors (Rosen: paragraph 45).

With regards to Claim 17, the method of Claim 6 performs the same steps as the machine of Claim 17, and Claim 17 is therefore rejected using the same rationale set forth above in the rejection of Claim 6.

With regards to Claim 7, Sakthivel in view of Hirota, Kipp, Chang, Smith and Rosen teaches the method of Claim 6 above.
Smith further teaches: selecting in the third alternative engine group, at least one engine corresponding to an instruction processing request queue whose queue depth is less than a third preset threshold (Paragraphs 78 and 93, “A job may comprise a sequence of tasks that are to be executed using the plurality of nodes and the task queue may manage the execution of a subset of the tasks for the first node… In step 504, a number of healthy nodes within the cluster of data storage nodes is identified. In one example, the number of healthy nodes may comprise the total number of nodes within the cluster that are available for executing tasks. In another example, the number of healthy nodes may comprise the number of nodes within the cluster with a task queue length less than ten or less than an upper queue length threshold, such as the upper queue length threshold 418 in FIG. 4A.” The nodes and clusters correlate to the alternative engines and alternative engine groups respectively. The task queue for each node correlates to an instruction processing request queue. Determining a number of healthy nodes based on whether each node has a task queue length less than an upper queue length threshold correlates to selecting at least one engine corresponding to an instruction processing request queue whose queue depth is less than a third preset threshold); and adding the at least one engine to the second alternative engine group (Paragraphs 94-95, “In step 506, a first set of nodes within the cluster for executing the plurality of tasks is determined based on the number of healthy nodes and/or a time remaining to complete the job… In step 510, it is detected that the time remaining to complete the job is less than a threshold amount of time. For example, it may be detected that the time remaining to complete the job is less than thirty minutes. In step 512, a second set of nodes different from the first set of nodes for executing the plurality of tasks is determined in response to detecting that the time remaining to complete the job is less than the threshold amount of time. In one example, the second set of nodes may comprise the first set of nodes plus an additional one or more nodes within the cluster. The second set of nodes may comprise twice the number of nodes as the first set of nodes; for example, the first set of nodes may comprise two nodes within the cluster and the second set of nodes may comprise four nodes within the cluster.” The second set of nodes corresponds to the second alternative engine group. The second set of nodes being determined in response to the detection, where the second set of nodes comprises the first set of nodes plus at least one additional node, correlates to adding at least one engine to the second alternative engine group). It would have been obvious to one of ordinary skill in the art to which said subject matter pertains before the effective filing date of the claimed invention to combine Sakthivel with selecting in the third alternative engine group, at least one engine corresponding to an instruction processing request queue whose queue depth is less than a third preset threshold; and adding the at least one engine to the second alternative engine group as taught by Smith because adjusting the size of sets of nodes within the cluster for executing a plurality of tasks can be based on the number of healthy nodes or a time remaining to complete the job. 
Estimating the time to complete a job allows task migration to occur in order for the job to be completed within a set threshold amount of time (Smith: paragraphs 94-95).

With regards to Claim 8, Sakthivel in view of Hirota, Kipp, Chang, Smith and Rosen teaches the method of Claim 6 above. Rosen further teaches: recording an instruction engine difference (Fig. 7, paragraphs 126 and 130, “In case of a spill, tasks belonging to a common flow are likely to be spilled together to another processor, or, in case of a FAT flow, by the same two (or more) processors… Circuit 700 compares the activity of a FAT flow 702 to the activity of a Candidate-FAT Flow 704. The comparison is done by a counter 706, which increments whenever FAT Flow 702 submits a task (to any processor queue), and decrements whenever Candidate-FAT Flow 704 submits a task. Counter 706 counts until it reaches a preset positive threshold, or a preset negative threshold, wherein the counter will be reset, and a new comparison will start.” The FAT flow, which describes tasks belonging to a common flow across multiple processors, being compared to a candidate FAT flow correlates to an instruction engine difference. The difference being kept track of through an incrementing counter correlates to recording the instruction engine difference);

Smith further teaches: and deleting when the engine difference exceeds a fourth preset threshold, all the engines in the second alternative engine group (Paragraph 15, “As the time remaining to complete a job, the estimated time to complete the job, and the number of healthy nodes within the cluster may vary over time (e.g., due to nodes being added to or removed from the cluster or due to task failures), the distributed job scheduler may periodically adjust the number of nodes used to execute the plurality of tasks… In some cases, upon detection that the time remaining to complete a job minus the estimated time to complete the job has risen above a threshold amount of time (e.g., there is more than thirty minutes to complete the job), then the maximum node parallelism limit for the job may be decreased (e.g., cut in half).” The detection that the time remaining to complete a job minus the estimated time to complete the job has risen above a threshold amount of time correlates to the engine difference exceeding a fourth preset threshold. The distributed job scheduler removing nodes from the cluster in response to the time remaining rising above a threshold correlates to deleting all the engines in the second alternative engine group when the engine difference exceeds a fourth threshold),

Chang further teaches: wherein the instruction engine selection difference indicates a quantity difference between a first quantity of times of selecting the first selected instruction engine from the first alternative instruction engine group and a second quantity of times of selecting the second selected instruction engine from the second alternative instruction engine group (Paragraphs 50-52, “For example, a processor core load of the processor core that triggers the current load balance procedure may be compared with processor core loads of other processor cores in the selected processor cores. When a specific processor core of the selected processor cores has a processor core load heavier than that possessed by the processor core that triggers the load balance procedure...
In a case where the multi-core processor system 10 has only two clusters 112_1 and 112_N (N=2) denoted by Cluster_0 and Cluster_1, respectively; one cluster 112_1 denoted by Cluster_0 has only four processor cores 117 denoted by CPU_0, CPU_1, CPU_2, and CPU_3, respectively; and the other cluster 112_N denoted by Cluster_1 has only four processor cores 118 denoted by CPU_4, CPU_5, CPU_6, and CPU_7, respectively. In this case, all of the processor cores included in the multi-core processor system 10 may be treated as selected processor cores… In the examples of FIG. 3-FIG. 7, a load balance procedure may be executed when there is a new task or a resumed task (e.g., a waking task currently being woken up) that is not included in any run queue of the multi-core processor system 10 and thus required to be added to one run queue of the multi-core processor system 10 for execution… For example, when the task scheduler 100 finds that there are no task(s) in run queue(s) of the multi-core processor system 10, a load balance procedure may be executed to pull a task from a run queue of a busier processor core among the selected processor cores, such as a busiest processor core (i.e., a heaviest-loaded processor core) among the selected processor cores, to a run queue of an idle processor core with no running task and/or runnable task (which may be a processor core that triggers the load balance procedure due to its empty run queue).” The processor core load is based on the tasks in the run queue for the particular processor core. Tasks are added to a processor core’s run queue each time the processor core is chosen to execute a particular task. Each processor core belongs to a particular cluster and correlates to an instruction engine in an instruction engine group. When a load balance procedure is triggered, comparing the processor core load across the other processor cores in different clusters to find a more idle processor core correlates to an instruction engine selection difference indicating a quantity difference between a first quantity of times a first selected instruction engine was selected and a second quantity of times a second selected instruction engine was selected).

Rosen and Smith do not explicitly teach that the instruction engine difference and the engine difference are instruction engine selection differences, respectively. However, instruction engine selection differences are a common comparison made between instruction engines, as evidenced by Chang above (paragraphs 50-52). Therefore, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains, before the effective filing date of the claimed invention, to combine Sakthivel with recording an instruction engine difference as taught by Rosen, because distributing tasks to processors so that tasks of the same flow are executed by the same processors can help to achieve low average latency and a low rate of dropped tasks. Tracking and comparing the activity of different flows can be used to determine whether the preset thresholds for designating a flow as a FAT flow have been met (Rosen: paragraphs 126 and 130).
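Rosen's circuit 700, as quoted above, is essentially a bounded up/down counter. The sketch below models that behavior in Python; the class name, method names, and threshold values are hypothetical, while the reset-on-threshold behavior follows the quoted paragraphs 126 and 130.

```python
class FlowComparisonCounter:
    """Up/down counter in the spirit of Rosen's counter 706: it increments
    when the FAT flow submits a task, decrements when the candidate FAT flow
    submits one, and resets once either preset threshold is reached, at
    which point a new comparison starts. Names and values are hypothetical."""

    def __init__(self, positive_threshold: int = 8, negative_threshold: int = -8):
        self.positive_threshold = positive_threshold
        self.negative_threshold = negative_threshold
        self.value = 0  # the recorded difference between the two flows' activity

    def on_fat_flow_task(self):
        self.value += 1
        self._reset_if_threshold_reached()

    def on_candidate_flow_task(self):
        self.value -= 1
        self._reset_if_threshold_reached()

    def _reset_if_threshold_reached(self):
        # Reaching either preset threshold resets the counter and starts
        # a new comparison, per the quoted passage.
        if self.value >= self.positive_threshold or self.value <= self.negative_threshold:
            self.value = 0
```

The examiner's §103 theory in effect swaps what the counter compares: instead of task submissions per flow, the counted quantity would be how many times each engine was selected from its alternative group, which is the "instruction engine selection difference" for which Chang is cited.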
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains, before the effective filing date of the claimed invention, to combine Sakthivel with deleting, when the engine difference exceeds a fourth preset threshold, all the engines in the second alternative engine group as taught by Smith, because adjusting the size of sets of nodes within the cluster for executing a plurality of tasks can be based on the number of healthy nodes or on the time remaining to complete the job. Estimating the time to complete a job allows task migration to occur so that the job can be completed within a set threshold amount of time (Smith: paragraphs 94-95). Therefore, it would have been obvious to one of ordinary skill in the art to which said subject matter pertains, before the effective filing date of the claimed invention, to combine Sakthivel with wherein the instruction engine selection difference indicates a quantity difference between a first quantity of times of selecting the first selected instruction engine from the first alternative instruction engine group and a second quantity of times of selecting the second selected instruction engine from the second alternative instruction engine group as taught by Chang, because load balance procedures can be used when determining whether to migrate a task to another processor. Effective load balancing, such as moving tasks from busier processors to idle processors, can reduce cache coherence overhead (Chang: paragraph 56).

With regard to Claim 18, the machine of Claim 18 performs the same steps as the method of Claims 7 and 8, and Claim 18 is therefore rejected using the same rationale set forth above in the rejection of Claims 7 and 8.

Prior Art Made of Record

The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Liu et al. (CN Patent No. CN 109800064 A) teaches a method of thread processing for improving message processing efficiency. An input scheduling module distributes threads to the execution module based on a predetermined scheduling module. An output scheduling module determines, according to the context of the thread, whether to continue processing the thread and loop the thread back to the input scheduling module.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SELINA HU, whose telephone number is (571) 272-5428. The examiner can normally be reached Monday-Friday, 8:30-5:30.
Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chat Do, can be reached at (571) 272-3721. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SELINA ELISA HU/
Examiner, Art Unit 2193

/Chat C Do/
Supervisory Patent Examiner, Art Unit 2193

Prosecution Timeline

Apr 28, 2023
Application Filed
May 30, 2023
Response after Non-Final Action
Sep 29, 2025
Non-Final Rejection — §101, §103
Dec 30, 2025
Response Filed
Jan 26, 2026
Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585485
Warm migrations for virtual machines in a cloud computing environment
2y 5m to grant • Granted Mar 24, 2026
Patent 12563114
Content initialization method, electronic device and storage medium
2y 5m to grant • Granted Feb 24, 2026
Based on the examiner's 2 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
67%
Grant Probability
99%
With Interview (+100.0%)
3y 3m
Median Time to Grant
Moderate
PTA Risk
Based on 3 resolved cases by this examiner. Grant probability derived from career allow rate.
