Prosecution Insights
Last updated: April 19, 2026
Application No. 17/557,667

GENERAL PURPOSE REGISTER HIERARCHY SYSTEM AND METHOD

Action: Non-Final OA (§102, §103)
Filed: Dec 21, 2021
Examiner: HUISMAN, DAVID J
Art Unit: 2183
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Advanced Micro Devices, Inc.
OA Round: 5 (Non-Final)
Grant Probability: 58% (Moderate)
OA Rounds: 5-6
Time to Grant: 4y 8m
Grant Probability With Interview: 92%

Examiner Intelligence

Career Allow Rate: 58% of resolved cases (389 granted / 670 resolved; +3.1% vs TC avg)
Interview Lift: +33.8% for resolved cases with an interview versus without (strong)
Typical Timeline: 4y 8m average prosecution; 88 currently pending
Career History: 758 total applications across all art units

Statute-Specific Performance

§101: 6.1% (-33.9% vs TC avg)
§103: 33.6% (-6.4% vs TC avg)
§102: 21.5% (-18.5% vs TC avg)
§112: 31.7% (-8.3% vs TC avg)

Tech Center averages are estimates. Based on career data from 670 resolved cases.
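The dashboard figures appear to be simple ratios and differences. The minimal Python sketch below recovers the career allow rate from the granted and resolved counts. It also back-computes the Tech Center averages that the "vs TC avg" deltas imply. The page does not define those deltas, so reading them as percentage-point differences is an assumption.

```python
# Career allowance rate from the counts shown on the page.
granted, resolved = 389, 670
allow_rate = 100 * granted / resolved  # roughly 58.06%

# Statute-specific rates paired with their "vs TC avg" deltas.
# ASSUMPTION: each delta is in percentage points, so TC avg = rate - delta.
statute_rates = {
    "§101": (6.1, -33.9),
    "§103": (33.6, -6.4),
    "§102": (21.5, -18.5),
    "§112": (31.7, -8.3),
}
implied_tc_avg = {
    statute: round(rate - delta, 1)
    for statute, (rate, delta) in statute_rates.items()
}

print(f"allow rate: {allow_rate:.1f}%")
print(implied_tc_avg)
```

Under that assumption, each implied Tech Center average works out to 40.0%.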

Office Action

DETAILED ACTION

Claims 1-16 and 18-21 have been examined.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on November 12, 2024, has been entered.

Specification

The specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.

Claim Objections

Claim 16 is objected to because of the following informalities: In line 3, insert --and-- before “comprising” (similar to claim 1), to make clear that the second memory device, not the first memory device, comprises a second plurality of GPRs. Appropriate correction is required.

Claim Interpretation

At least one claim is identified as including non-limiting contingent limitations. “The broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) precedent are not met.” “The broadest reasonable interpretation of a system (or apparatus or product) claim having structure that performs a function, which only needs to occur if a condition precedent is met, requires structure for performing the function should the condition occur. The system claim interpretation differs from a method claim interpretation because the claimed structure must be present in the system regardless of whether the condition is met and the function is actually performed.” See MPEP 2111.04(II).

Regarding claim 13, when the remapping event does not occur, the method is not required to perform the transferring. Thus, claim 13, at its broadest, is not further limiting of the method of claim 9. The examiner recommends amending claim 13 to require the remapping event. For instance, applicant could claim an initial step such as --detecting a remapping event that comprises…--, and then follow with --transferring at least one variable…in response to the remapping event.--. This would require the presence of the remapping event and, thus, the transferring. Claim 14, under the broadest reasonable interpretation, is similarly non-limiting of the method of claim 9. Applicant should similarly amend claim 14 as recommended above for claim 13.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-2 and 4-8 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Mohammed et al., “Pilot Register File: Energy Efficient Partitioned Register File for GPUs” (as previously cited by the examiner).

Referring to claim 1, Mohammed has taught a system comprising: a first memory device comprising a first plurality of general purpose registers (GPRs) (see FIG.3(b), the abstract, and section III, first paragraph, and note the SRF partition, which comprises a large set of slow registers.
These registers are in a GPU adapted for general-purpose computing (see abstract, first sentence, and section I, first paragraph) and, thus, they are general purpose registers); a second memory device distinct from the first memory device and comprising a second plurality of GPRs, wherein: the second memory device has fewer GPRs than the first memory device (see FIGs.3(b) and 6, the abstract, and section III, first paragraph, and note the FRF partition, which comprises a small set of fast registers that are distinct/separate from the SRF registers. The FRF is also distinct from the SRF in that the former operates at super-threshold voltage (STV) while the latter operates at near-threshold voltage (NTV) (see p.590, left column, 2nd to last paragraph). The second plurality also comprises GPRs for the reasoning given above. FIG.6 shows an example of FRF and SRF sizes), and GPRs of the first plurality of GPRs share a design with GPRs of the second plurality of GPRs (the examiner notes the breadth of this limitation, which encompasses GPRs sharing any single characteristic. For instance, all GPRs are designed to store at least one bit of data (thus, GPRs of both pluralities share this design). From section IV(A) on p.595, GPRs of both pluralities are based on an SRAM design. From sections IV(B) and IV(C), GPRs of both pluralities are implemented using FinFET technology. From TABLE IV, GPRs of both pluralities are designed to have an access energy in a range of 7-8 pJ (see FRFhigh and SRF). These are all examples of a shared design and many more may exist in Mohammed); and a controller circuit configured to store data at the first plurality of GPRs, the second plurality of GPRs, or both based on an expected frequency of access associated with the data (see the description of FIG.6 on p.594.
Basically, a compiler does frequency analysis and causes a controller circuit (all actions in a GPU are ultimately performed by circuitry) to store data associated with frequently accessed registers to the FRF. All other register data is stored in the SRF. Mohammed understands that the compiler merely determines expected frequencies. Thus, Mohammed relies on additional runtime analysis to fine tune the allocations of data to FRF and SRF (see section II, 2nd paragraph, and the example of FIG.6). That is, the compiler causes a controller circuit to initially store data based on expected frequencies, and then further analysis at runtime can cause the controller circuit to change how the data is stored if the expected frequencies differ from those observed at runtime (“After the pilot warp finishes execution the highly accessed [registers] reported by the pilot warp profiling (say, R8, R9, R10 and R11 in this example) will replace the highly accessed registers reported by the compiler based profiling.”). Also, see section A1, which describes the compiler analysis differing from the runtime analysis (pilot profiling) by some percentage, meaning, the compiler simply determines expected frequencies). Referring to claim 2, Mohammed has taught the system of claim 1, wherein: the controller circuit is configured to receive the expected frequency of access associated with the data from a compiler that analyzes one or more programs that are to store data using the first memory device, the second memory device, or both (again, see the description of the example for FIG.6). Referring to claim 4, Mohammed has taught the system of claim 1, wherein: the controller circuit is further configured to store at least a portion of the data at the second plurality of GPRs based on GPR requests from programs that request allocation of GPRs of the second plurality of GPRs (see section A1 on p.592. A compiler counts instances of architected register accesses in the program. 
Thus, wherever the program includes a register access, it is making a request for allocation of a GPR. Since a program is given access to both the FRF and SRF, the requests are requests for allocation of the FRF and SRF (the compiler simply determines which requests are mapped with which of the partitions). Alternatively, there is a finite amount of FRF registers. See FIG.6, for instance, which shows an example FRF having four fast registers. Thus, as long as a program has at least four variables whose accesses are counted, the program can be said to be requesting the four FRF registers). Referring to claim 5, Mohammed has taught the system of claim 1, wherein: the controller circuit is further configured to store the data at the first plurality of GPRs, the second plurality of GPRs, or both based on register rules (see section A1 on p.592. An example rule 1 is the most frequently accessed data based on count is assigned to a register in FRF, or more specifically to register R0 in FRF (e.g. see the last paragraph on p.594, which states “as shown in the table since R8 is the most accessed register then the first entry stores the information that R0 is mapped to R8”). An example rule 2 is the next most frequently accessed data based on count is assigned to a register in FRF, etc. (see FIG.6 as well)). Referring to claim 6, Mohammed has taught the system of claim 5, wherein: the register rules comprise a global rule that no more than a specified number of the second plurality of GPRs be assigned to any one program (see section A1 on p.592 and FIG.6. A global rule is that no more than four most frequently accessed variables are assigned to FRF. This would apply to any program analyzed by the compiler. Another global rule that would apply to any compiled program, i.e., globally, is that the most frequently accessed variable is assigned to FRF, and, more specifically, to R0). 
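The compile-time profiling that the examiner reads on claims 1, 2, and 4-6 can be pictured with a short Python sketch. The compiler counts architected-register accesses and places the most frequently accessed registers in a four-entry FRF, starting with R0. It leaves the rest in the SRF and records which FRF slot holds which register. This is only an illustration under stated assumptions and is not Mohammed's implementation. The function name `allocate_frf`, the one-name-per-access input format, and tie-breaking by register name are all hypothetical.

```python
from collections import Counter

FRF_SIZE = 4  # number of fast registers in the FIG.6 example


def allocate_frf(register_accesses, frf_size=FRF_SIZE):
    """Hypothetical compile-time pass (illustrative, not Mohammed's code).

    register_accesses: one architected register name per access found in
    the program, e.g. ["R8", "R4", "R8", ...]. Static counts like these
    only estimate (are "expected") runtime access frequencies.

    Returns (frf_map, srf_registers): frf_map maps each FRF slot R0, R1, ...
    to the architected register whose data it holds (similar in spirit to
    the mapping table of FIG.7); srf_registers stay in the slow partition.
    """
    counts = Counter(register_accesses)
    # Rank by descending count; break ties by name so the result is deterministic.
    ranked = sorted(counts, key=lambda reg: (-counts[reg], reg))
    hot, cold = ranked[:frf_size], ranked[frf_size:]
    frf_map = {f"R{slot}": reg for slot, reg in enumerate(hot)}
    return frf_map, cold


if __name__ == "__main__":
    # Made-up access trace in which R8 is the most frequently accessed register.
    trace = ["R8"] * 9 + ["R9"] * 7 + ["R10"] * 6 + ["R11"] * 5 + ["R4"] * 2 + ["R5"]
    frf_map, srf = allocate_frf(trace)
    print(frf_map)  # {'R0': 'R8', 'R1': 'R9', 'R2': 'R10', 'R3': 'R11'}
    print(srf)      # ['R4', 'R5']
```

The fixed `frf_size` cap plays the role of the "no more than four FRF registers" rule that the examiner relies on for claims 6 and 7. The sketch ignores per-thread banking, which a real allocator would have to handle.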
Referring to claim 7, Mohammed has taught the system of claim 5, wherein: the register rules comprise a program-specific rule that no more than a specified number of the second plurality of GPRs be assigned to a program indicated by the program-specific rule (there is a rule built into the system that basically indicates that the compiled program is not to be allocated more than four FRF registers (because four is the number of registers in an example FRF shown in FIG.6). This rule is specific to the program being compiled and is, thus, program-specific. Additionally or alternatively, such a rule is specific to program use of variables and allocation therefor. Thus, it is a rule that is program-specific. This would be in contrast to some hardware (non-program-specific) rule, such as a rule to update the hardware swapping table of FIG.7, or a rule to set a mask upon kernel launch (p.593, section B). In general, a processor implements many rules that govern operation and a four FRF register maximum rule may be deemed program-specific (specific to programs and not to hardware operation)). Referring to claim 8, Mohammed has taught the system of claim 1, further comprising: a third memory device separate from the first memory device and from the second memory device and comprising a third plurality of GPRs, wherein the third memory device has fewer GPRs than the second memory device (from FIG.6, the second memory device may be all four registers in the FRF, the first memory device may be at least five registers of the SRF, and the third memory device may be any three or fewer registers of the SRF that are different from the five registers of the first memory device).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3, 9-10, and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Mohammed in view of Khailany et al., U.S. Patent Application Publication No. 2015/0143061 A1.

Referring to claim 3, Mohammed has taught the system of claim 1, but has not taught wherein: accessing one of the first plurality of GPRs consumes more power on average than accessing one of the second plurality of GPRs. Instead, Mohammed has taught the opposite, where the first plurality (slower registers) consumes less power than the second plurality (faster registers) (see section III, first paragraph, where the second plurality (FRF) operates at STV and the first plurality (SRF) operates at NTV to save leakage and dynamic energy. From FIG.1, STV is a higher voltage than NTV. Also, from p.589, right column, “One way to reduce the RF power is to design the RF to operate at a near-threshold voltage (NTV).”). However, Khailany, in a similar field of endeavor, has taught allocating frequently accessed data to a small number of low-power registers (as opposed to faster registers), and less frequently accessed data to a larger number of high-power registers (instead of slower registers) (see FIG.1 and paragraph [0017]). One of ordinary skill in the art would have recognized that Mohammed could be modified in view of Khailany so as to prioritize low power versus high speed for frequently accessed data.
As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mohammed’s register file to instead have a low-power partition for frequent data and a high-power partition for less frequent data to lower power (instead of increase speed) for the most frequent of data accesses. Referring to claim 9, Mohammed has taught a method comprising: receiving, at a compiler, program data of a program to be executed (see section A1 on p.592); sorting variables of the program into a first set of variables and a second set of variables, wherein the second set of variables are expected to be more frequently accessed by the program than the first set of variables (see FIG.6, section A2, and the description thereof along with the first two lines on p.593. At runtime, the pilot warp sorts access counts to determine updated allocations, if necessary. A second set of variables expected to be frequently accessed are placed into a set corresponding to the FRF and a first set of variables expected to be less-frequently accessed are placed into a set corresponding to the SRF. The frequencies at runtime are also expected frequencies simply because warps may differ in terms of instruction flow (e.g. due to branch divergence), which may yield up to a 5% discrepancy in the number of accesses (see “Code Dynamics” section on p.593). As such, even the frequency counts at run-time are merely expectation-based, and are not actual frequencies. Alternatively, where the claimed sorting is interpreted as being performed by the compiler, the compiler, for the configuration of FIG.6(b), must determine the four most frequently accessed data items so as to initially assign them to the FRF. This requires sorting access counts (involving comparing values against one another so as to generate the sets of values for the FRF and SRF, respectively) as collected in section A1 on p.592. 
As described above, a compiler does frequency analysis to store data associated with frequently accessed registers to the FRF. All other register data is stored in the SRF. Mohammed understands that the compiler merely determines expected frequencies, and then further analysis at runtime can change how the data is stored if the expected frequencies are inadequate (“After the pilot warp finishes execution the highly accessed [registers] reported by the pilot warp profiling (say, R8, R9, R10 and R11 in this example) will replace the highly accessed registers reported by the compiler based profiling.”). Also, see section A1, which describes the compiler analysis differing from the runtime analysis (pilot profiling) by some percentage, meaning, the compiler simply determines expected frequencies); indicating that the first set of variables are to be assigned to a first plurality of general purpose registers (GPRs) of a first memory device (see above and FIG.6, where the first set of variables are indicated as being assigned to registers of the SRF partition. These registers are in a GPU adapted for general-purpose computing (see abstract, first sentence, and section I, first paragraph) and, thus, they are general purpose registers); and indicating that the second set of variables are to be assigned to a second plurality of GPRs of a second memory device distinct from the first memory device (see above and FIG.6, where the second set of variables are indicated as being assigned to registers of the FRF partition, which are distinct/separate from SRF registers. The FRF is also distinct from the SRF in that the former operates at super-threshold voltage (STV) while the latter operates at near-threshold voltage (NTV) (see p.590, left column, 2nd to last paragraph).
The FRF registers are also GPRs based on the reasoning set forth above), wherein GPRs of the first plurality of GPRs share a design with GPRs of the second plurality of GPRs (the examiner notes the breadth of this limitation, which encompasses GPRs sharing any single characteristic. For instance, all GPRs are designed to store at least one bit of data (thus, GPRs of both pluralities share this design). From section IV(A) on p.595, GPRs of both pluralities are based on an SRAM design. From sections IV(B) and IV(C), GPRs of both pluralities are implemented using FinFET technology. From TABLE IV, GPRs of both pluralities are designed to have an access energy in a range of 7-8 pJ (see FRFhigh and SRF). These are all examples of a shared design and many more may exist in Mohammed). Mohammed has not taught wherein accessing one of the first plurality of GPRs consumes more power on average than accessing one of the second plurality of GPRs. Instead, Mohammed has taught that the first plurality (slower registers) consumes less power than the second plurality (faster registers) (see section III, first paragraph, where the second plurality (FRF) operates at STV and the first plurality (SRF) operates at NTV to save leakage and dynamic energy. From FIG.1, STV is a higher voltage than NTV. Also, from p.589, right column, “One way to reduce the RF power is to design the RF to operate at a near-threshold voltage (NTV).”). However, Khailany, in a similar field of endeavor, has taught allocating frequently accessed data to a small number of low-power registers (as opposed to faster registers), and less frequently accessed data to a larger number of high-power registers (instead of slower registers) (see FIG.1 and paragraph [0017]). One of ordinary skill in the art would have recognized that Mohammed could be modified in view of Khailany so as to prioritize low power versus high speed for frequently accessed data.
As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mohammed’s register file to instead have a low-power partition for frequent data and a high-power partition for less frequent data to lower power (instead of increase speed) for the most frequent of data accesses. In other words, it is obvious to modify Mohammed such that accessing one of the first plurality of GPRs consumes more power on average than accessing one of the second plurality of GPRs. Referring to claim 10, Mohammed, as modified, has taught the method of claim 9, wherein: sorting the variables of the program is based on a number of unassigned GPRs of the second plurality of GPRs (under the interpretation where the compiler is performing the sorting, the sorting is for determining which four values will be assigned to the unassigned four registers of the FRF (FIG.6). The system sorts values to find four frequently accessed values because there are four unassigned GPRs (prior to allocation). Thus, the sorting is based on a number of unassigned registers). Referring to claim 13, Mohammed, as modified, has taught the method of claim 9, but has not explicitly taught transferring at least one variable from the second plurality of GPRs to the first plurality of GPRs in response to a remapping event that comprises an indication of overallocation of GPRs of the second plurality of GPRs. However, recall that the compiler initially allocates registers, and additionally allocates the same number of registers to each of multiple threads (p.591, 2nd paragraph). The examiner asserts that any given thread may be allocated a GPR of the second plurality based on compiler analysis. 
However, as noted in this cited paragraph, the compiler has difficulty predicting which threads will access a particular register the most (this is because, even though the threads execute the same code, branch conditions may cause threads to execute differently, thereby accessing some registers more than others). As such, one of ordinary skill in the art would understand that the pilot warp of Mohammed, which tries to better determine access counts at runtime, may notice that a first thread that was allocated a GPR of the second plurality is not actually frequently accessing that GPR, while a second thread that was allocated a GPR of the first plurality is accessing that GPR with higher frequency. In other words, the system indicates that the compiler over-allocated a GPR of the second plurality to the first thread, and, in response, the register variables are swapped so that the second thread can access its data with lower power. As such, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mohammed to transfer at least one variable from the second plurality of GPRs to the first plurality of GPRs in response to a remapping event that comprises an indication of overallocation of GPRs of the second plurality of GPRs. Referring to claim 14, Mohammed, as modified, has taught the method of claim 9, further comprising: transferring at least one variable from the first plurality of GPRs to the second plurality of GPRs in response to a remapping event that comprises an indication of deallocation of GPRs of the second plurality of GPRs (see FIGs.6(b)-(c) and the description of these FIGs.
Basically, a compiler initially assigns data corresponding to registers R4-R7 in the program to the FRF (second plurality of) registers (meaning, for example, the data meant for R4 is actually stored in R0, which is a physically faster register than R4 (and this mapping is maintained in the table of FIG.7)). Requests for R4-R7 at runtime are then forwarded to the FRF. However, when the pilot warp finishes its access count collection, the allocations are updated, causing R8-R11, which are most frequently accessed, to be sent to the FRF. This happens in response to some indication, which is an indication of deallocation because registers R0-R3 (in the example above) are deallocated from storing the variables therein so that the values in R8-R11 can be respectively transferred to R0-R3). Alternatively, referring to claims 13-14, under broadest reasonable interpretation based on their contingent limitations, the claims set forth no limitation beyond those in claim 9. Thus, they are both rejected for similar reasoning as claim 9. Referring to claim 15, Mohammed, as modified, has taught the method of claim 9, wherein: the program indicates a requested number of the second plurality of GPRs to be assigned (a program, prior to compilation, includes a number of variables therein. As long as the program includes enough requests to utilize all four FRF registers (using the configuration of FIG.6), then the program can be said to indicate that it requests use of the maximum number of FRF registers), and wherein sorting the variables of the program is based on the requested number (if the program requests use of the maximum number of FRF registers, the sorting occurs to determine the maximum number of frequently-accessed variables to fully utilize the FRF). Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Mohammed in view of Khailany and MIT, “Sorting”. 
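The FIG.6(b)-(c) remapping relied on above for claim 14, and for claim 13 as modified, can be pictured with a short Python sketch. The compiler first places R4-R7 in FRF slots R0-R3. Once the pilot warp reports different hot registers (R8-R11 in Mohammed's example), the stale entries are evicted back to the SRF and the hot ones are transferred into the freed slots. The function `remap_after_pilot`, its input format, and the slot-by-slot replacement order are illustrative assumptions, not Mohammed's actual mechanism.

```python
from collections import Counter


def remap_after_pilot(frf_map, pilot_counts):
    """Hypothetical runtime remap (illustrative, not Mohammed's code).

    frf_map: FRF slot -> architected register chosen at compile time.
    pilot_counts: architected register -> accesses observed by the pilot warp.
    Returns (new_map, moves). Each move is (slot, evicted_reg, promoted_reg):
    the evicted value goes back to the SRF (its FRF slot is deallocated) and
    the promoted value is transferred into the freed slot.
    """
    k = len(frf_map)
    hot = sorted(pilot_counts, key=lambda reg: (-pilot_counts[reg], reg))[:k]
    keep = set(frf_map.values()) & set(hot)
    incoming = [reg for reg in hot if reg not in keep]
    new_map, moves = {}, []
    for slot in sorted(frf_map, key=lambda s: int(s[1:])):
        reg = frf_map[slot]
        if reg in keep or not incoming:
            new_map[slot] = reg  # still hot (or nothing to promote): no transfer
        else:
            promoted = incoming.pop(0)
            moves.append((slot, reg, promoted))
            new_map[slot] = promoted
    return new_map, moves


if __name__ == "__main__":
    # FIG.6(b)-style starting point: the compiler placed R4-R7 in the FRF.
    compiler_map = {"R0": "R4", "R1": "R5", "R2": "R6", "R3": "R7"}
    # Made-up pilot-warp counts in which R8-R11 turn out to be hottest.
    pilot = Counter({"R8": 12, "R9": 11, "R10": 10, "R11": 9,
                     "R4": 3, "R5": 2, "R6": 1, "R7": 1})
    new_map, moves = remap_after_pilot(compiler_map, pilot)
    print(new_map)   # {'R0': 'R8', 'R1': 'R9', 'R2': 'R10', 'R3': 'R11'}
    print(moves[0])  # ('R0', 'R4', 'R8')
```

Registers that remain hot keep their slots, so only the entries that actually change generate transfers.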
Referring to claim 11, Mohammed, as modified, has taught the method of claim 9, but has not explicitly taught wherein: sorting the variables of the program is based on comparing the respective expected frequency of accesses of the variables to an access frequency threshold. However, MIT has taught a number of sorting algorithms that include comparison of values to a threshold. For instance, selection sort compares values to a threshold (a[min]), which may change over time. However, even if the threshold changes depending on values to be sorted, values are still compared to a threshold. Alternatively, insertion sort compares values to a threshold (v), which stays fixed for each comparison in a given iteration of the for loop. Selection sort and insertion sort are both disclosed as being simple/easy to implement. As a result, for simple sorting, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mohammed’s sorting, either at compile-time or runtime, to be based on comparing the respective expected frequency of accesses of the variables to an access frequency threshold, as is the case in the described algorithms. Referring to claim 12, Mohammed, as modified, has taught the method of claim 11, further comprising: adjusting the access frequency threshold based on a number of unassigned GPRs of the second plurality of GPRs (again, under the interpretation that the sorting is performed by the compiler, the sorting (and associated adjustment of threshold) occurs because the system, as modified, includes a number of unassigned low-power GPRs. Thus, the threshold in the selected sorting algorithm is adjusted as a result of there being a number of unassigned GPRs to which sorted values are to be allocated). Claims 16-21 are rejected under 35 U.S.C. 103 as being unpatentable over Mohammed in view of Han et al., U.S. Patent Application Publication No. 2018/0018299 A1. 
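The threshold comparison discussed above for claims 11 and 12 can be illustrated with the following Python sketch. The first function is an ordinary ascending selection sort, in which each pass compares candidates against the running minimum `a[m]`. The second function is a hypothetical split of variables into fast and slow sets. It derives the access-frequency threshold from the number of unassigned fast registers, so the threshold moves as that number changes. The threshold rule (the k-th largest expected count) and all names are assumptions chosen for illustration. The examiner's reading requires only some comparison against a threshold.

```python
def selection_sort(values):
    """Ascending selection sort: each pass compares candidates against the
    running minimum a[m], which can change as smaller values are found."""
    a = list(values)
    n = len(a)
    for i in range(n - 1):
        m = i
        for j in range(i + 1, n):
            if a[j] < a[m]:
                m = j
        a[i], a[m] = a[m], a[i]
    return a


def split_by_threshold(expected_counts, unassigned_frf):
    """Hypothetical threshold-based split (illustrative only).

    expected_counts: variable name -> expected access count.
    unassigned_frf: number of fast (second-plurality) registers still free.
    The threshold is the k-th largest expected count, where k is the number
    of free fast registers (capped at the number of variables), so it is
    adjusted whenever that number changes. Returns (fast, slow, threshold).
    """
    if unassigned_frf <= 0 or not expected_counts:
        return [], list(expected_counts), None
    ascending = selection_sort(expected_counts.values())
    k = min(unassigned_frf, len(ascending))
    threshold = ascending[len(ascending) - k]
    # At most k - 1 variables can lie strictly above the k-th largest count.
    fast = [v for v, c in expected_counts.items() if c > threshold]
    # Variables tied at the threshold fill whatever fast slots remain.
    ties = [v for v, c in expected_counts.items() if c == threshold]
    fast += ties[: k - len(fast)]
    slow = [v for v in expected_counts if v not in fast]
    return fast, slow, threshold


if __name__ == "__main__":
    counts = {"x": 40, "y": 25, "z": 25, "w": 3, "v": 10}  # made-up expected counts
    print(split_by_threshold(counts, 2))  # (['x', 'y'], ['z', 'w', 'v'], 25)
    print(split_by_threshold(counts, 4))  # (['x', 'y', 'z', 'v'], ['w'], 10)
```

In the usage example, freeing more fast registers lowers the threshold from 25 to 10. That is the kind of threshold adjustment based on unassigned GPRs that the examiner maps to claim 12.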
Referring to claim 16, Mohammed has taught a processing unit comprising: a first memory device comprising a first plurality of general purpose registers (GPRs) (see FIG.3(b), the abstract, and section III, first paragraph, and note the FRF partition of registers, which comprises a small set of fast registers. These registers are in a GPU adapted for general-purpose computing (see abstract, first sentence, and section I, first paragraph) and, thus, they are general purpose registers. Alternatively, the first memory device may include one FRF bank (the first plurality of registers) and one SRF bank, where both banks correspond to a single core/thread (e.g. the leftmost FRF and SRF banks in FIG.3(b). Also, see p.592, first full paragraph)); a second memory device distinct from the first memory device comprising a second plurality of GPRs (see FIGs.3(b) and 6, the abstract, and section III, first paragraph, and note the SRF partition of registers (or a portion thereof), which comprises a large set of slow registers that are distinct from (separate from) the registers of the FRF. The FRF is also distinct from the SRF in that the former operates at super-threshold voltage (STV) while the latter operates at near-threshold voltage (NTV) (see p.590, left column, 2nd to last paragraph). The SRF registers are also GPRs for reasoning similar to that given above. Alternatively, the second memory device may include another FRF bank and another SRF bank (the second plurality of registers), which are physically distinct from the first memory device, and where both banks correspond to a single core (e.g. the 2nd leftmost FRF and SRF banks in FIG.3(b).
Also see p.592, first full paragraph)), wherein: accessing one of the first plurality of GPRs consumes more power on average than accessing one of the second plurality of GPRs (see section III, first paragraph, where the first plurality (FRF registers) operates at STV and the second plurality (SRF registers) operates at NTV to save leakage and dynamic energy. From FIG.1, STV is a higher voltage than NTV. Also, from p.589, right column, “One way to reduce the RF power is to design the RF to operate at a near-threshold voltage (NTV).”); and GPRs of the first plurality of GPRs share a design with GPRs of the second plurality of GPRs (the examiner notes the breadth of this limitation, which encompasses GPRs sharing any single characteristic. For instance, all GPRs are designed to store at least one bit of data (thus, GPRs of both pluralities share this design). From section IV(A) on p.595, GPRs of both pluralities are based on an SRAM design. From sections IV(B) and IV(C), GPRs of both pluralities are implemented using FinFET technology. From TABLE IV, GPRs of both pluralities are designed to have an access energy in a range of 7-8 pJ (see FRFhigh and SRF). These are all examples of a shared design and many more may exist in Mohammed); and a plurality of engines (FIG.3, cores (or any components therein to execute thousands of threads (p.589, section I))) configured to execute programs using data stored at the first memory device, the second memory device, or both (see the abstract. Frequently-accessed and less-frequently accessed data of programs are stored in these register file partitions (as shown in FIG.6 so that the engines may operate on this data (e.g. see FIG.3 where register data goes to cores for processing))). Mohammed has not taught that the processing unit is a shader processing unit, nor that the engines are shader engines. However, Han has taught a graphics processor (FIGs.1-2, GPU 12) with a shader core with multiple lanes/ALUs (e.g. 
shader engines) that access multiple data items in, and store results to, a general-purpose register file (56), to carry out parallel processing for various types of shading, or for non-graphics applications (see paragraphs [0080]-[0086]). That is, a shader unit is not limited to shading, but can be used to perform non-graphics workloads that are also highly parallel in nature (much like Mohammed uses a GPU to execute general-purpose applications that are highly-threaded). However, having a processing unit in Mohammed be a shader processing unit would give Mohammed the ability to perform various types of shading for graphics applications, when desired, while retaining the ability to perform non-graphics/general-purpose workloads that are highly parallel. As a result, in order to realize shading functionality in Mohammed, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mohammed such that the processing unit is a shader processing unit and the engines are shader engines. Referring to claim 17, Mohammed, as modified, has taught the shader processing unit of claim 16, further comprising: a shader controller to move the data between a system memory and the first plurality of GPRs, the second plurality of GPRs, or both based on an expected frequency of access associated with the data (the first plurality may be four fast registers, the second plurality may be the portion of the SRF corresponding to registers R4 to R7 (see FIG.6(b)), and the system memory may be the portion of SRF corresponding to registers R8 to R11. Based on run-time pilot warp information which determines run-time access frequencies (see description of FIG.6), data may be moved from system memory (the portion of the SRF corresponding to registers R8-R11) to the first plurality of fast registers (see FIG.6(c)). Whichever circuitry is involved in this movement is part of a “shader controller”). 
Referring to claim 18, Mohammed, as modified, has taught the shader processing unit of claim 17, wherein: the shader controller is further to move the data between the first and second memory devices and the plurality of shader engines (from FIG.3(b), the data in the registers is moved from the registers to the engines (cores or portions therein) so as to process the data. Whichever circuitry is responsible for moving data to/from registers is part of the “shader controller”. This could include the operand buffer shown in FIG.3(b), which collects operands from the registers before issuing them to the execute stage (see p.596, first full paragraph)). Referring to claim 19, Mohammed, as modified, has taught the shader processing unit of claim 18, wherein: the shader controller is to move first data from the first memory device to a first shader engine concurrently with moving second data from the second memory device to a second shader engine (see FIG.3(b) and p.592, first full paragraph. Using the alternative interpretation in the rejection of claim 16, one request is active for any bank (memory device), meaning there may be multiple concurrent requests across bank 0, bank 1, bank 2, etc. so that threads can operate simultaneously in parallel (as desired according to the abstract and section I). If multiple banks (devices) could not be accessed simultaneously, the parallel nature of the system would be greatly decreased). Referring to claim 20, Mohammed, as modified, has taught the shader processing unit of claim 17, further comprising: a shader compiler (the compiler of Mohammed, which compiles programs for execution on a GPU with shader functionality, as modified, is a shader compiler) to: compile one or more programs that use the data to be stored at the first memory device, the second memory device, or both (again, see section A1 on p.592); determine the expected frequency of access associated with the data based on a weighting process (see section A1 on p.592.
The compiler determines the expected frequencies according to access counts, which are weights, as higher counts/weights indicate more heavily accessed data, whereas lower counts/weights indicate lightly accessed data. As described above, see the description of FIG.6 on p.594. Basically, a compiler does frequency analysis so that data associated with frequently accessed registers can be stored in the FRF. All other register data is stored in the SRF. Mohammed understands that the compiler merely determines expected frequencies. Thus, Mohammed relies on additional runtime analysis to fine-tune the allocation to the FRF and SRF (see section II, 2nd paragraph, and the example of FIG.6). That is, the compiler causes initial storage of data based on expected frequencies, and then further analysis at runtime can cause a change in how the data is stored if the expected frequencies are inadequate (“After the pilot warp finishes execution the highly accessed reported by the pilot warp profiling (say, R8, R9, R10 and R11 in this example) will replace the highly accessed registers reported by the compiler based profiling.”)); and assign GPRs of the first plurality of GPRs, the second plurality of GPRs, or both to the one or more programs based on the expected frequency of access (see the description of FIG.6 on p.594). Referring to claim 21, Mohammed, as modified, has taught the shader processing unit of claim 20, further comprising: a third memory device separate from the first memory device and from the second memory device and comprising a third plurality of GPRs (the third memory device may comprise a subset of the FRF. For instance, the third plurality may be R2-R3, the second plurality may be R0-R1, and the first plurality may be R4-R15 (if using the example of Figure 6).
Thus, each plurality, by including different registers, is separate from the other pluralities), wherein accessing one of the second plurality of GPRs consumes more power on average than accessing one of the third plurality of GPRs (since the second and third pluralities are both implemented as part of the FRF (lower-power registers, as modified), the registers thereof that are accessed more will consume more power on average over numerous program runs. As shown in the Example on p.594, which includes the description of Figure 7, the most-accessed variable is assigned to register R0, and any other data in the FRF would be accessed less often. As such, R0 for a given program run is accessed more than R13, for instance, and will therefore consume more power on average over time. To illustrate, if accessing R0 consumes X power, and accessing R3 consumes X power (because both have the same hardware implementation), then if R0 is accessed on average 1000 times during each program run for a total power consumption of 1000X and R3 is accessed on average 800 times during each program run for a total power consumption of 800X, then it can be seen that, on average, accessing one of the second plurality consumes more power than accessing one of the third plurality), and wherein the shader compiler is further to assign GPRs of the third plurality of GPRs to the one or more programs based on the expected frequency of access (again, see the description of FIG.6 on p.594). --------------------------------------------------------------------------------------------------------------------- Claim Rejections - 35 USC § 103 The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1-10 and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Khailany et al., U.S. Patent Application Publication No. 2015/0143061 A1, in view of Intel, “IA-64 Application Developer’s Architecture Guide”. Referring to claim 1, Khailany has taught a system (FIG.1, 100) comprising: a first memory device comprising a first plurality of registers (see FIG.1, device 110 (or a portion thereof), also shown as device 204 (FIG.2), which has 240 registers); a second memory device distinct from the first memory device and comprising a second plurality of registers, wherein: the second memory device has fewer registers than the first memory device (see FIG.1, device 108 (or a portion thereof), also shown as device 202 (FIG.2), which has 16 registers. These 16 registers are not part of the 240 registers of the first device; thus, the first and second memory devices are distinct. Further, from column 1, line 61, to column 2, line 5, the first memory device comprises SRAM while the second memory device comprises an array of latches; thus, the memory devices are distinct for this reason as well); and a controller circuit configured to store data at the first plurality of registers, the second plurality of registers, or both based on an expected frequency of access associated with the data (see paragraph [0017]. 
The examiner notes that only when the program executes at runtime will the frequencies be actual frequencies. Prior to that, e.g. at compile time, the determined frequencies are simply what is expected at runtime. Even if the frequencies determined at compile time end up matching those determined at runtime, they are still only expected frequencies at the time of compilation. Furthermore, note that ultimately, all actions are performed by circuitry in a processor. When data is stored in registers at runtime, this is done by a “controller circuit”). Khailany has not explicitly taught that the registers of the first and second pluralities are general purpose registers (GPRs). However, Intel has taught a general-purpose register file (p.3-1, section 3.1, 1st bullet, and p.3-3, Figure 3-1) with general-purpose registers used to store integer values, e.g. operand values and results of various instructions shown in chapter 7 (e.g. an arithmetic ADD operation on p.7-3 will obtain operands from general-purpose registers and store a result to a general-purpose register. Similarly, a logical AND operation on p.7-6 will identify general-purpose registers to obtain operands and store a result). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Khailany’s register file 106 to be a general-purpose register file such that the first and second pluralities are general purpose registers (GPRs). The motivation to do so would be to provide storage for general-purpose processing while realizing lower energy consumption for more frequently accessed values used in general-purpose processing, such as arithmetic and logical processing indicated in paragraph [0012] of Khailany.
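As a rough illustration of the compile-time allocation relied on above (paragraph [0017]), the following hypothetical sketch counts each variable's static accesses, normalizes the counts into expected frequencies, and fills the 16 low-power registers (R0-R15 of device 202) before the 240 SRAM registers (R16-R255 of device 204). It is not Khailany's code; the count-and-divide weighting, the function names, and the tie-breaking are assumptions made for illustration.

```python
from collections import Counter

LOW_POWER_REGS = [f"R{i}" for i in range(16)]        # device 202: R0-R15 (latch array)
HIGH_POWER_REGS = [f"R{i}" for i in range(16, 256)]  # device 204: R16-R255 (SRAM)


def expected_frequencies(static_accesses: list) -> dict:
    """Weight each variable by its share of the accesses counted at compile time."""
    counts = Counter(static_accesses)
    total = sum(counts.values())
    return {var: n / total for var, n in counts.items()}


def assign_registers(static_accesses: list) -> dict:
    """Map variables to registers, most frequently accessed first (low-power bank first)."""
    freqs = expected_frequencies(static_accesses)
    ranked = sorted(freqs, key=lambda v: (-freqs[v], v))
    pool = LOW_POWER_REGS + HIGH_POWER_REGS
    if len(ranked) > len(pool):
        raise ValueError("more variables than registers; spilling is not modeled")
    return dict(zip(ranked, pool))


# Hypothetical trace: variable v<i> is referenced (20 - i) times, for 17 variables.
trace = [f"v{i}" for i in range(17) for _ in range(20 - i)]
mapping = assign_registers(trace)
print(mapping["v0"], mapping["v15"], mapping["v16"])  # R0 R15 R16
```

Because the ranking is computed before the program runs, the frequencies it uses are expected, not actual, frequencies, which is the distinction the examiner draws above.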
Khailany, as modified, has further taught GPRs of the first plurality of GPRs share a design with GPRs of the second plurality of GPRs (the examiner notes the breadth of this limitation, which encompasses GPRs sharing any single characteristic. For instance, all GPRs are designed to store at least one bit of data (thus, GPRs of both pluralities share this design). All GPRs are made of transistors and, thus, share a transistor-based design. These are just examples of a shared design and many more may exist in Khailany (e.g. both require voltage to operate and/or require energy to access)). Referring to claim 2, Khailany, as modified, has taught the system of claim 1, wherein: the controller circuit is configured to receive the expected frequency of access associated with the data from a compiler that analyzes one or more programs that are to store data using the first memory device, the second memory device, or both (see at least paragraphs [0017] and [0004]. The frequencies are generated to be received by something to control allocation. Again, only circuitry actually performs actions in a system. Thus, the circuitry receiving these frequency values to control allocation is part of “the controller circuit”). Referring to claim 3, Khailany, as modified, has taught the system of claim 1, wherein: accessing one of the first plurality of GPRs consumes more power on average than accessing one of the second plurality of GPRs (see paragraph [0015]). Referring to claim 4, Khailany, as modified, has taught the system of claim 1, wherein: the controller circuit is further configured to store at least a portion of the data at the second plurality of GPRs based on GPR requests from programs that request allocation of GPRs of the second plurality of GPRs (see paragraph [0035]. In one embodiment, 25% of each thread’s (sub-program’s) requests to store data will be mapped to the second plurality of GPRs).
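The 25% mapping cited for claim 4 from paragraph [0035], together with the 16-register size of device 202, can be expressed as a short hypothetical sketch of a per-thread allocation limit. Rounding the 25% share down, treating the share as exact, and the function names are assumptions made here for illustration, not details taken from Khailany.

```python
LOW_POWER_TOTAL = 16    # device 202 provides R0-R15
LOW_POWER_PERCENT = 25  # per-thread share from the cited embodiment (assumed exact)


def low_power_quota(regs_requested: int) -> int:
    """Low-power registers one thread may receive: 25% of its request, capped at 16."""
    per_thread_cap = regs_requested * LOW_POWER_PERCENT // 100  # rounds down (assumption)
    return min(per_thread_cap, LOW_POWER_TOTAL)


def split_allocation(ranked_vars: list) -> tuple:
    """Split a thread's frequency-ranked variables into (low-power, higher-power) groups."""
    quota = low_power_quota(len(ranked_vars))
    return ranked_vars[:quota], ranked_vars[quota:]


print(low_power_quota(8), low_power_quota(100))  # 2 16
```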
Referring to claim 5, Khailany, as modified, has taught the system of claim 1, wherein: the controller circuit is further configured to store the data at the first plurality of GPRs, the second plurality of GPRs, or both based on register rules (see paragraph [0035]. For a given thread, all of its data may be stored in the second plurality, or in the first and second pluralities, depending on the implemented rules governing system operation. Additionally, rules include “allocate frequently accessed data to low-power registers” and “allocate less-frequently accessed data to high-power registers”). Referring to claim 6, Khailany, as modified, has taught the system of claim 5, wherein: the register rules comprise a global rule that no more than a specified number of the second plurality of GPRs be assigned to any one program (based on FIG.2, one rule is that no more than 16 registers of the second plurality can be assigned to any one program (because there are only 16 registers (R0-R15) in the second plurality). Alternatively, from paragraph [0035], a rule applied to all threads (global rule) is that no more than a number of low-power registers corresponding to 25% for each thread may be assigned to that thread). Referring to claim 7, Khailany, as modified, has taught the system of claim 5, wherein: the register rules comprise a program-specific rule that no more than a specified number of the second plurality of GPRs be assigned to a program indicated by the program-specific rule (based on FIG.2, one rule is that no more than 16 registers of the second plurality can be assigned to any one program (because there are only 16 registers (R0-R15) in the second plurality). Alternatively, from paragraph [0035], a rule applied to a given thread (program-specific rule) is that no more than a number of low-power registers corresponding to 25% for that thread may be assigned to that thread. 
Alternatively, or in addition, because such a rule relates to allocation for a program, it is a program-specific rule, which may be different from a rule that does not necessarily apply to programs, but to hardware, e.g. a rule that 32 instructions can be issued each cycle to 32 lanes (paragraph [0032])). Referring to claim 8, Khailany, as modified, has taught the system of claim 1, further comprising: a third memory device separate from the first memory device and from the second memory device and comprising a third plurality of GPRs, wherein the third memory device has fewer GPRs than the second memory device (the third memory device is any subset of 110/204 that is not considered part of the first memory device. For example, the third memory device could include the final 16 registers (R240-R255). This third memory device is separate from the second memory device 108/202 and from the first memory device, which could include any register subset that includes 17 or more registers within R16-R239. As a basic example, the first device would include registers R16-R239, the second device would include registers R0-R15, and the third device would include registers R240-R255). Referring to claim 9, Khailany has taught a method comprising: receiving, at a compiler, program data of a program to be executed (from paragraph [0017], a compiler receives source code to be executed); sorting variables of the program into a first set of variables and a second set of variables, wherein the second set of variables are expected to be more frequently accessed by the program than the first set of variables (see paragraph [0004] and claim 6. The examiner notes that only when the program executes at runtime will the frequencies be actual frequencies. Prior to that, e.g. at compile time, the determined frequencies are simply what is expected at runtime.
Even if the frequencies determined at compile time match those determined at runtime, they are still expected frequencies at the time of compilation); indicating that the first set of variables are to be assigned to a first plurality of registers of a first memory device (see paragraphs [0004] and [0017] and claim 6. Less frequently accessed variables are assigned to later addresses in the namespace, i.e., to first memory device 204); and indicating that the second set of variables are to be assigned to a second plurality of registers of a second memory device distinct from the first memory device (see paragraphs [0004] and [0017] and claim 6. More frequently accessed variables are assigned to earlier addresses in the namespace, i.e., to second memory device 202. These 16 registers of the second memory device are not part of the 240 registers of the first memory device; thus, the first and second memory devices are distinct. Further, from column 1, line 61, to column 2, line 5, the first memory device comprises SRAM while the second memory device comprises an array of latches; thus, the memory devices are distinct for this reason as well), wherein accessing one of the first plurality of registers consumes more power on average than accessing one of the second plurality of registers (see paragraph [0015], the idea being to assign variables that are accessed more frequently to lower-power registers). Khailany has not explicitly taught that the registers of the first and second pluralities of registers are general purpose registers (GPRs). However, Intel has taught a general-purpose register file (p.3-1, section 3.1, 1st bullet, and p.3-3, Figure 3-1), which is used to store integer values, e.g. operand values and results of various instructions shown in chapter 7 (e.g. an arithmetic ADD operation on p.7-3 will obtain operands from general-purpose registers and store a result to a general-purpose register.
Similarly, a logical AND operation on p.7-6 will identify general-purpose registers to obtain operands and store a result). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Khailany’s register file 106 to be a general-purpose register file such that the first and second pluralities of registers are general purpose registers (GPRs). The motivation to do so would be to provide storage for general-purpose processing while realizing lower energy consumption for more frequently accessed values used in general-purpose processing (e.g. values used in arithmetic (e.g. ADD) and logical (e.g. AND) processing). Khailany, as modified, has further taught GPRs of the first plurality of GPRs share a design with GPRs of the second plurality of GPRs (the examiner notes the breadth of this limitation, which encompasses GPRs sharing any single characteristic. For instance, all GPRs are designed to store at least one bit of data (thus, GPRs of both pluralities share this design). All GPRs are made of transistors and, thus, share a transistor-based design. These are just examples of a shared design and many more may exist in Khailany (e.g. both require voltage to operate and/or require energy to access)). Referring to claim 10, Khailany, as modified, has taught the method of claim 9, wherein: sorting the variables of the program is based on a number of unassigned GPRs of the second plurality of GPRs (the sorting determines which 16 variables will be assigned to the 16 unassigned registers of the second plurality (R0-R15). The system sorts values to find 16 frequently accessed values because there are 16 unassigned GPRs (prior to allocation). Thus, the sorting is based on unassigned registers). Referring to claims 13-14, under the broadest reasonable interpretation based on their contingent limitations, the claims set forth no limitation beyond those in claim 9.
Thus, both are rejected for reasoning similar to that set forth for claim 9. Referring to claim 15, Khailany, as modified, has taught the method of claim 9, wherein: the program indicates a requested number of the second plurality of GPRs to be assigned (a program, prior to compilation, includes a number of variables therein. As long as the program includes enough requests to utilize all low-power registers, the program can be said to indicate that it requests use of the maximum number of low-power registers), and wherein sorting the variables of the program is based on the requested number (if the program requests use of the maximum number of low-power registers, the sorting occurs to determine the maximum number of frequently accessed variables to fully utilize the low-power registers). Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Khailany in view of Intel and MIT, “Sorting”. Referring to claim 11, Khailany, as modified, has taught the method of claim 9, but has not explicitly taught wherein: sorting the variables of the program is based on comparing the respective expected frequency of accesses of the variables to an access frequency threshold. However, MIT has taught a number of sorting algorithms that include comparison of values to a threshold. For instance, selection sort compares values to a threshold (a[min]), which may change over time. Even if the threshold changes depending on the values to be sorted, values are still compared to a threshold. Alternatively, insertion sort compares values to a threshold (v), which stays fixed for each comparison in a given iteration of the for loop. Selection sort and insertion sort are both disclosed as being simple/easy to implement.
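To make the threshold comparison concrete, the following hypothetical sketch (not MIT's listing) uses an insertion sort, in which each pass compares earlier entries against a fixed key value v, to rank variables by expected access frequency. It then sets an access frequency threshold from the number of unassigned low-power GPRs and keeps only variables that meet it. Deriving the threshold from the k-th ranked frequency, and all function names, are assumptions chosen to mirror the claim 11 and claim 12 language.

```python
def insertion_sort_desc(freqs: dict) -> list:
    """Insertion sort of variable names by descending expected frequency.

    In each pass, earlier entries are compared against the fixed key value v.
    """
    order = list(freqs)
    for i in range(1, len(order)):
        key = order[i]
        v = freqs[key]  # fixed comparison value ("threshold") for this pass
        j = i - 1
        while j >= 0 and freqs[order[j]] < v:
            order[j + 1] = order[j]
            j -= 1
        order[j + 1] = key
    return order


def threshold_for(freqs: dict, unassigned: int) -> float:
    """Hypothetical: set the cutoff to the frequency of the last variable that fits."""
    if unassigned <= 0:
        return float("inf")  # no free low-power GPRs, so nothing clears the cutoff
    ranked = insertion_sort_desc(freqs)
    return freqs[ranked[min(unassigned, len(ranked)) - 1]]


def select_for_low_power(freqs: dict, unassigned: int) -> list:
    """Variables whose expected frequency meets the threshold, at most `unassigned` of them."""
    cutoff = threshold_for(freqs, unassigned)
    return [name for name in insertion_sort_desc(freqs) if freqs[name] >= cutoff][:unassigned]


example = {"a": 0.1, "b": 0.5, "c": 0.3, "d": 0.1}
print(select_for_low_power(example, 2))  # ['b', 'c']
```

Changing the number of unassigned GPRs moves the cutoff, which is one way to read "adjusting the access frequency threshold" in claim 12.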
As a result, for simple sorting, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Khailany’s sorting of the variables to be based on comparing the respective expected frequency of accesses of the variables to an access frequency threshold, as is the case in the described algorithms. Referring to claim 12, Khailany, as modified, has taught the method of claim 11, further comprising: adjusting the access frequency threshold based on a number of unassigned GPRs of the second plurality of GPRs (from claim 6 of Khailany, the sorting (and associated adjustment of threshold) occurs because the system, as modified, includes a number of unassigned low-power GPRs. Thus, the threshold in the selected sorting algorithm is adjusted as a result of there being a number of unassigned GPRs to which sorted values are to be allocated). Claims 13-14 are alternatively rejected under 35 U.S.C. 103 as being unpatentable over Khailany in view of Intel and Pattnaik et al., U.S. Patent Application Publication No. 2019/0354371 A1. Referring to claim 13, Khailany, as modified, has taught the method of claim 9, but has not taught transferring at least one variable from the second plurality of GPRs to the first plurality of GPRs in response to a remapping event that comprises an indication of overallocation of GPRs of the second plurality of GPRs. However, note that Khailany does map different registers to different threads (column 2, lines 5-8). As such, Khailany is understood to analyze access frequencies of multiple threads at compile time and allocate registers accordingly. One of ordinary skill in the art would recognize that any thread could be allocated any combination of registers, as this is entirely dependent on the program written, the data being operated on, and other runtime conditions, all of which may vary in a practically infinite number of ways.
Furthermore, Pattnaik has taught tracking register usage for threads and stopping execution to move a variable from a higher-performance register (in a fully-ported register group) to a lower-performance register (in a fewer-ported register group) when that variable’s access frequency falls outside of the top N, where N is the number of high-performance registers. This allows for dynamic assignment of data to different types of registers based on actual frequency during execution so as to improve performance. See FIG.5 and the description thereof. As applied to Khailany, a similar performance improvement would be realized by stopping execution to move less-frequently accessed data from a low-power register to a higher-power register, to allow more-frequently accessed data to move into a low-power register. In other words, the system would realize that a particular thread was over-allocated at least one GPR from the second plurality, and, in response, the particular thread’s variables in the at least one GPR would be transferred to a GPR of the first plurality so as to reduce power for more frequently accessed items of the same or another thread. This is particularly useful in a dynamic environment, where actual data access frequency can be different than initially predicted by the compiler. For instance, some data may or may not be accessed as frequently as expected based on whether or not branches are taken. As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Khailany for transferring at least one variable from the second plurality of GPRs to the first plurality of GPRs in response to a remapping event that comprises an indication of overallocation of GPRs of the second plurality of GPRs. 
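The Pattnaik-style remapping applied to Khailany above can be sketched as follows: with execution paused, run-time access counts are ranked, variables held in low-power registers that fall outside the top N are demoted to higher-power registers (the over-allocation case), and top-N variables not yet in low-power registers are promoted. This is a hypothetical illustration; the function name, the counts, and the tie-breaking rule are assumptions, not code from either reference.

```python
from collections import Counter


def remap(low_power: set, runtime_counts: Counter, n: int) -> tuple:
    """Return (new low-power set, demoted variables, promoted variables).

    Intended to run while execution is paused. Ties are broken by name (assumption).
    """
    ranked = sorted(runtime_counts, key=lambda v: (-runtime_counts[v], v))
    top_n = set(ranked[:n])
    demoted = low_power - top_n   # over-allocated: move to higher-power GPRs
    promoted = top_n - low_power  # move into the freed low-power GPRs
    return top_n, demoted, promoted


# Illustrative counts: "y" was predicted hot at compile time but is rarely used.
new_set, demoted, promoted = remap({"x", "y"}, Counter(x=10, y=1, z=7), n=2)
print(sorted(demoted), sorted(promoted))  # ['y'] ['z']
```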
Referring to claim 14, Khailany, as modified, has taught the method of claim 9, but has not taught transferring at least one variable from the first plurality of GPRs to the second plurality of GPRs in response to a remapping event that comprises an indication of deallocation of GPRs of the second plurality of GPRs. However, Pattnaik has taught tracking register usage and stopping execution to move a variable from a higher-performance register (in a fully-ported register group) to a lower-performance register (in a fewer-ported register group) when that variable’s access frequency falls outside of the top N, where N is the number of high-performance registers. This allows for dynamic assignment of data to different types of registers based on actual frequency during execution so as to improve performance. See FIG.5 and the description thereof. As applied to Khailany, a similar performance improvement would be realized by stopping execution to move less-frequently accessed data from a low-power register to a higher-power register, to allow more-frequently accessed data to move into a low-power register. This movement would occur in response to a signal to perform the movement, which is also an indication of deallocation because the GPRs of the second plurality will be deallocated from storing their current data. This is particularly useful in a dynamic environment, where actual data access frequency can be different than initially predicted by the compiler. For instance, some data may or may not be accessed as frequently as expected based on whether or not branches are taken. As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Khailany for transferring at least one variable from the first plurality of GPRs to the second plurality of GPRs in response to a remapping event that comprises an indication of deallocation of GPRs of the second plurality of GPRs. 
Claim 15 is alternatively rejected under 35 U.S.C. 103 as being unpatentable over Khailany in view of Intel and Borole et al., U.S. Patent Application Publication No. 2021/0065779 A1. Referring to claim 15, Khailany, as modified, has taught the method of claim 9, but, under a second interpretation where the program includes an explicit indication that low-power registers are to be assigned, Khailany has not taught wherein: the program indicates a requested number of the second plurality of GPRs to be assigned, and wherein sorting the variables of the program is based on the requested number. However, Borole has taught that a program may indicate that it is high priority, which means it is assigned to the lowest-power registers. See paragraphs [0150]-[0154]. An indication by a program that it is high priority is an indication to the compiler that its registers (a requested number of registers) are to be low-power registers, as this will result in the task, whose data is to be accessed more frequently, i.e., with priority, running with lower power. If implemented in Khailany, this would affect sorting, because the sorting cannot just take into account the frequencies, but must also take into account priority, where lower-priority data may be frequently accessed but be mapped to a lower performance register to make room for high-priority data. Therefore, in order to allow high-priority tasks to run at low power, even if their data is not as frequently accessed as some data in lower-priority tasks, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Khailany such that the program indicates a requested number of the second plurality of GPRs to be assigned, and wherein sorting the variables of the program is based on the requested number. Claims 16-18 and 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over Khailany in view of Han et al., U.S. Patent Application Publication No. 2018/0018299 A1.
Referring to claim 16, Khailany has taught a processing unit comprising: a first memory device comprising a first plurality of registers (see FIG.4, device 356, also shown as device 204 (FIG.2), which has 240 registers); a second memory device distinct from the first memory device comprising a second plurality of registers (see FIG.4, at least a portion of device 354, also shown as device 202 (FIG.2), which has 16 registers. These 16 registers are not part of the 240 registers of the first device; thus, the first and second memory devices are distinct. Further, from column 1, line 61, to column 2, line 5, the first memory device comprises SRAM while the second memory device comprises an array of latches; thus, the memory devices are distinct for this reason as well), wherein accessing one of the first plurality of registers consumes more power on average than accessing one of the second plurality of registers (see paragraph [0015] and the abstract); a plurality of engines (FIG.4, engines 360 (or any portion thereof)) configured to execute programs using data stored at the first memory device, the second memory device, or both (engines execute program instructions using data in register file 352, which includes the aforementioned devices (also see paragraph [0034], 1st sentence)); and a system memory (e.g. memory 104; see paragraph [0013]). Khailany has not taught that the processing unit is a shader processing unit, nor that the engines are shader engines, nor that the registers of the first and second pluralities are general purpose registers (GPRs). However, Han has taught a graphics processor (FIGs.1-2, GPU 12) with a shader core with multiple shader engines (e.g. ALUs/lanes) that access multiple data items in, and store results to, a general-purpose register file (56), to carry out parallel processing for various types of shading (on graphics or otherwise) (see paragraphs [0080]-[0086]).
The examiner notes that Khailany’s invention applies to graphics processor architecture (e.g. see at least paragraphs [0042], [0002], and [0012]). As a result, in order to allow Khailany to perform shading for various applications (e.g. for graphics/image rendering) in a more power-efficient way using the inventive register file of Khailany, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Khailany such that the processing unit is a shader processing unit, the engines are shader engines, and the registers of the first and second pluralities are general purpose registers (GPRs). Khailany, as modified, has further taught GPRs of the first plurality of GPRs share a design with GPRs of the second plurality of GPRs (the examiner notes the breadth of this limitation, which encompasses GPRs sharing any single characteristic. For instance, all GPRs are designed to store at least one bit of data (thus, GPRs of both pluralities share this design). All GPRs are made of transistors and, thus, share a transistor-based design. These are just examples of a shared design and many more may exist in Khailany (e.g. both require voltage to operate and/or require energy to access)). Khailany, as modified, has also not explicitly taught a shader controller configured to move the data between the system memory and the first plurality of GPRs, the second plurality of GPRs, or both based on an expected frequency of access associated with the data. However, note that the system memory may include disk (paragraph [0013]). Storing to disk allows for long-term storage of data for later retrieval (e.g. for saving and/or resuming work). To get data to disk, it must be moved from the registers (by a “shader controller”) to a memory/disk controller 306, which would further transmit the data to memory/disk. Thus, storage to disk in Khailany is based on register contents.
And, the register contents are based on expected frequency of access (the examiner notes that only when the program executes at runtime will the frequencies be actual frequencies. Prior to that, e.g. at compile time, the determined frequencies are simply what is expected at runtime. Even if the frequencies determined at compile time match those determined at runtime, they are still expected frequencies at the time of compilation). Thus, storage to disk is based on expected frequency of access. Further, for a later retrieval from disk (e.g., to restore a system), the data from disk would be loaded into the registers according to their frequency of access, as taught by Khailany. As a result, to allow for long-term storage and subsequent retrieval, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Khailany to include a shader controller configured to move the data between the system memory and the first plurality of GPRs, the second plurality of GPRs, or both based on an expected frequency of access associated with the data. Referring to claim 18, Khailany, as modified, has taught the shader processing unit of claim 16, wherein: the shader controller is further to move the data between the first and second memory devices and the plurality of shader engines (the shader engines 360 operate on data in the registers (memory devices 354 and 356). As such, input data must be moved from the devices to the engines, and calculation results would be moved from the engines to the memory devices (also recall Han, paragraphs [0080]-[0086]). Again, whichever circuitry is responsible for moving data to/from registers (whether to the engines or to another memory controller) is part of the “shader controller”).
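The frequency-ordered placement relied on above (as this action characterizes Khailany, paragraph [0028]: the most frequently accessed variable gets R0, the next gets R1, and so on), together with the save-to-disk and restore rationale, can be sketched in Python. This is a minimal sketch under stated assumptions: the access counts stand in for compile-time expected frequencies, and the variable names, the `system_memory` dictionary, and every function name are hypothetical. Nothing here is an actual Khailany or applicant implementation.

```python
from typing import Dict, List, Tuple

NUM_REGS = 256
LATCH_COUNT = 16  # R0-R15 in the low-power device, per the mapping in this action


def assign_registers(expected_freq: Dict[str, int]) -> Dict[str, int]:
    """Give R0 to the most frequently accessed variable, R1 to the next, etc.

    Ties are broken by variable name so the result is deterministic.
    """
    ranked = sorted(expected_freq, key=lambda v: (-expected_freq[v], v))
    if len(ranked) > NUM_REGS:
        raise ValueError("more variables than registers; spilling is not modeled")
    return {var: reg for reg, var in enumerate(ranked)}


def in_low_power_device(reg: int) -> bool:
    """True if the register sits in the (assumed) latch-based device."""
    return reg < LATCH_COUNT


def save_to_memory(regs: List[int], allocation: Dict[str, int],
                   system_memory: Dict[str, int]) -> None:
    """Copy each variable's register value out to (hypothetical) system memory."""
    for var, reg in allocation.items():
        system_memory[var] = regs[reg]


def restore_from_memory(system_memory: Dict[str, int],
                        expected_freq: Dict[str, int]) -> Tuple[List[int], Dict[str, int]]:
    """Reload saved values, re-placing each variable by expected frequency of access."""
    allocation = assign_registers(expected_freq)
    regs = [0] * NUM_REGS
    for var, reg in allocation.items():
        regs[reg] = system_memory[var]
    return regs, allocation
```

With expected frequencies `{"a": 5, "b": 900, "c": 40}`, `b` lands in R0, `c` in R1, and `a` in R2. A save followed by a restore with the same frequencies puts every value back in the same register.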
Referring to claim 20, Khailany, as modified, has taught the shader processing unit of claim 16, further comprising: a shader compiler to: compile one or more programs that use the data to be stored at the first memory device, the second memory device, or both (see paragraphs [0004] and [0017] and claim 6); determine the expected frequency of access associated with the data based on a weighting process (see paragraph [0004]. The weighting process is the process used to determine the expected frequency (e.g. number determination and division). As described above, compiler-determined frequencies are expected frequencies, because actual frequencies are not determinable until runtime); and assign GPRs of the first plurality of GPRs, the second plurality of GPRs, or both to the one or more programs based on the expected frequency of access (see paragraphs [0004] and [0017] and claim 6). Referring to claim 21, Khailany, as modified, has taught the shader processing unit of claim 20, further comprising: a third memory device separate from the first memory device and from the second memory device and comprising a third plurality of GPRs (the third memory device may comprise a subset of structure 202. For instance, the third plurality may be R8-R15, the second plurality may be R0-R7, and the first plurality may be R16-R255. Thus, each plurality, by including different registers, is separate from the other pluralities), wherein accessing one of the second plurality of GPRs consumes more power on average than accessing one of the third plurality of GPRs (since the second and third pluralities are both implemented as part of structure 202 (with low-power latches), the registers thereof that are accessed more will consume more power on average over numerous program runs.
From paragraph [0028], the registers are allocated starting with R0 and working towards the other end, assigning R0 to the most frequently accessed variable, assigning R1 to the next most frequently accessed variable, and so on. As such, R0 for a given program run is accessed more than R15, for instance, and will therefore consume more power on average over time. For instance, if accessing R0 consumes X power, and accessing R15 consumes X power (due to both having the same hardware implementation), then if R0 is accessed on average 1000 times during each program run for a total power consumption of 1000X and R15 is accessed on average 800 times during each program run for a total power consumption of 800X, then it can be seen that, on average, accessing one of the second plurality consumes more power than accessing one of the third plurality), and wherein the shader compiler is further to assign GPRs of the third plurality of GPRs to the one or more programs based on the expected frequency of access (again, see paragraphs [0004] and [0017] and claim 6). Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Khailany in view of Han and Intel. Referring to claim 19, Khailany, as modified, has taught the shader processing unit of claim 18, but has not taught the shader controller is to move first data from the first memory device to a first shader engine concurrently with moving second data from the second memory device to a second shader engine. However, from paragraphs [0033]-[0035], multiple threads are executed in parallel, e.g. on different engines. Further, from paragraph [0035], a given thread may include some registers in the first memory device and other registers in the second memory device. While Khailany has taught arithmetic operations (e.g. paragraph [0012]), no specific instruction has been taught. However, Intel has taught arithmetic instructions, such as ADD (p.7-3), which read two source data items (e.g.
r2/r3) from the register file concurrently so as to accumulate them. In order to allow Khailany to perform addition, a well-known arithmetic operation, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Khailany to include a two-source ADD instruction to be executed by the threads. Furthermore, one of ordinary skill in the art would have recognized that two registers required at the same time by any given thread could be from either memory device. For instance, one piece of data to be added may be frequently used data whereas other data to be added may be infrequently used data (thus data from both memory devices would be moved to the engines, i.e., ALU inputs, concurrently). This is simply a matter of the program being run and the data being operated upon, but any permutation would have been apparent to one of skill in the art. As a result, in order to operate on frequently and infrequently accessed data at the same time (any given ADD could add two frequently used values, two infrequently used values, or one of each), it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Khailany such that the shader controller is to move first data from the first memory device to a first shader engine concurrently with moving second data from the second memory device to a second shader engine. Response to Arguments On page 8 of applicant’s response, applicant argues that Mohammed discloses partitioning a single register file into two partitions and not two distinct memory devices such as two distinct register files. The examiner disagrees that Mohammed has not taught two distinct memory devices for reasons set forth in the rejections.
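The claim 19 rationale set out above (a two-source ADD whose operands may come from different memory devices, delivered to the ALU inputs at the same time) can be pictured with a toy Python model. The model assumes that each memory device has one read port, so one operand per device can be read in the same cycle while two operands in the same device are serialized. The port model, the cycle counting, and the function names are illustrative assumptions. They are not taken from Khailany or Intel.

```python
from typing import Dict, List

LATCH_COUNT = 16  # R0-R15 -> second (latch) device; R16-R255 -> first (SRAM) device


def device_of(reg: int) -> str:
    """Name of the memory device holding `reg` under the assumed mapping."""
    return "second" if reg < LATCH_COUNT else "first"


def read_cycles(operands: List[int]) -> int:
    """Cycles needed to read all source operands, assuming one read port per device.

    Operands in different devices are read in the same cycle; operands that
    share a device are serialized on that device's single port.
    """
    per_device: Dict[str, int] = {}
    for reg in operands:
        dev = device_of(reg)
        per_device[dev] = per_device.get(dev, 0) + 1
    return max(per_device.values(), default=0)


def execute_add(regs: List[int], dst: int, src1: int, src2: int) -> int:
    """Two-source ADD (dst = src1 + src2); returns the operand-read cycle count."""
    cycles = read_cycles([src1, src2])
    regs[dst] = regs[src1] + regs[src2]
    return cycles
```

In this model, adding R2 (second device) and R100 (first device) needs a single read cycle because both operands move concurrently. Adding R2 and R3, both in the second device, needs two cycles.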
On page 9 of applicant’s response, applicant argues that Mohammed has not taught the third memory device as claimed in claim 8 because Mohammed makes no mention of first, second, and third memory devices that are separate from one another. This is not persuasive because applicant has not addressed the examiner’s reasoning set forth in the rejection of claim 8. The examiner has explained what the three memory devices are in Mohammed. With respect to the argument for claims 9 and 16 on pages 9-10 and 11-12, respectively, of applicant’s response, the argument is not persuasive for similar reasoning given above. As such, there are no deficiencies in Mohammed (with respect to separate memory devices and a shared design) for Khailany/Han to remedy. On pages 12-13 of applicant’s response, applicant argues that Khailany does not teach features of claim 1 as amended and Intel does not remedy Khailany’s deficiencies. The examiner disagrees that Khailany has not taught the distinct memory devices and a shared design for reasons set forth in the rejections. As such, Khailany is not deficient in this regard so as to require Intel to remedy the deficiencies. With respect to the argument for claim 16 on page 15 of applicant’s response, the argument is not persuasive for similar reasoning given above. As such, there are no deficiencies in Khailany (with respect to separate memory devices and a shared design) for Han to remedy. Conclusion Any inquiry concerning this communication or earlier communications from the examiner should be directed to David J. Huisman whose telephone number is 571-272-4168. The examiner can normally be reached on Monday-Friday, 9:00 am-5:30 pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /David J. Huisman/Primary Examiner, Art Unit 2183

Prosecution Timeline

Dec 21, 2021: Application Filed
Feb 23, 2023: Non-Final Rejection (§102, §103)
May 12, 2023: Interview Requested
May 19, 2023: Examiner Interview Summary
May 19, 2023: Applicant Interview (Telephonic)
Jul 26, 2023: Response Filed
Sep 23, 2023: Final Rejection (§102, §103)
Nov 17, 2023: Notice of Allowance
Feb 07, 2024: Response after Non-Final Action
Feb 13, 2024: Response after Non-Final Action
Apr 07, 2024: Non-Final Rejection (§102, §103)
Jul 11, 2024: Response Filed
Sep 25, 2024: Final Rejection (§102, §103)
Nov 12, 2024: Response after Non-Final Action
Jan 02, 2025: Request for Continued Examination
Jan 13, 2025: Response after Non-Final Action
Jan 12, 2026: Non-Final Rejection (§102, §103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602229: NEURAL NETWORK ACCELERATOR FOR OPERATING A CONSUMER PIPELINE STAGE USING A START FLAG SET BY A PRODUCER PIPELINE STAGE (granted Apr 14, 2026; 2y 5m to grant)
Patent 12530199: SYSTEMS AND METHODS FOR LOAD-DEPENDENT-BRANCH PRE-RESOLUTION (granted Jan 20, 2026; 2y 5m to grant)
Patent 12499078: IMAGE PROCESSOR AND METHODS FOR PROCESSING AN IMAGE (granted Dec 16, 2025; 2y 5m to grant)
Patent 12468540: TECHNOLOGIES FOR PREDICTION-BASED REGISTER RENAMING (granted Nov 11, 2025; 2y 5m to grant)
Patent 12399722: MEMORY DEVICE AND METHOD INCLUDING PROCESSOR-IN-MEMORY WITH CIRCULAR INSTRUCTION MEMORY QUEUE (granted Aug 26, 2025; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 58%
With Interview: 92% (+33.8%)
Median Time to Grant: 4y 8m
PTA Risk: High
Based on 670 resolved cases by this examiner. Grant probability derived from career allow rate.
