DETAILED ACTION
Response to Arguments
Applicant’s arguments with respect to the 101 rejections are persuasive. The amendments overcome the 101 rejections.
Applicant’s arguments with respect to the 103 rejections have been fully considered but are not persuasive.
Applicant argues: Claim 1 recites “executing a hash function that hashes an input value into a second value corresponding to a number of bins, wherein the number of bins is an integer equal to a product of a first integer and a second integer, the first integer being a prime number greater than two and the second integer being a power of two greater than one.” Claims 9 and 17 recite the same limitation. Ma does not teach this structure. Instead, Ma discloses hashing a virtual address using a mapping function that retains multiple prime factors, specifically 3, 5, and 7, in combination with powers of two. See Ma, col. 7, lines 50-60. The result is a flexible mapping architecture for various cache sizes, but Ma never teaches a hash function constrained to a product of only one prime number greater than two and only one power of two.
Applicant’s argument is not persuasive. Ma US 11416405 (Ma ‘405) expressly discloses an embodiment in which the cache size is factored into only a single prime greater than two and a power of two. See fig. 2B-1, ID=25 and fig. 2B-2, ID=31. Ma ‘405 explains “[i]t is possible to (trivially) re-combine values in the case where there is only one prime power besides the powers of two.” [emphasis added] Furthermore, it is readily apparent that a number whose prime factorization contains multiple powers of two is equivalent to the same set of prime factors with a single power of two, because the powers of two multiply together (e.g., 2^m * 2^n = 2^(m+n)).
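For illustration only (this sketch is not part of the record or of Ma ‘405’s disclosure; the example sizes are assumptions), the power-of-two identity relied on above can be verified directly:

```python
# Illustrative check: a factorization containing multiple power-of-two
# factors collapses to a single power of two, 2**m * 2**n == 2**(m+n).
for m in range(1, 6):
    for n in range(1, 6):
        assert 2**m * 2**n == 2**(m + n)

# E.g., a hypothetical cache size of 3 * 2**2 * 2**3 is the same integer
# as 3 * 2**5, i.e., one prime greater than two times one power of two.
assert 3 * 2**2 * 2**3 == 3 * 2**5 == 96
```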
Applicant argues: Further, claim 1 recites “performing a first modulo hashing operation on the input value into the first integer” and “performing a second hashing operation using less than all bits of the input value.” Ma discloses multiple modulo operations, mod 3, mod 5, and mod 7, but does not disclose any second hashing operation that uses less than all bits of the input value. Instead, each modulo operation in Ma uses the full address, and Ma relies on programmable multiplexing to select among the outputs, not on any two-stage hashing pipeline. See Ma, col. 7, lines 43-66; FIG. 9.
Applicant’s argument is not persuasive. To be clear, Applicant’s asserted “two-stage hashing pipeline” is not defined in the claims as a particular circuit arrangement, but rather as a functional workflow defined by numerical properties of a cache size and mathematical operations. The A′ mod 2^a operation as described by Ma ‘405 reads on the claimed limitations and is in fact substantially similar to applicant’s disclosed invention. Applicant’s specification describes segmenting the address into two portions: one for the modulo operation with the prime greater than 2, and the other for the modulo operation with the power of 2. The “truncated” portion comprises the bits that represent the modulo operation with the prime greater than 2. See paras. 0036-0037. Furthermore, applicant’s specification explains that the modulo operation with the power of 2 (i.e., 2^n) is equivalent to keeping only the n least significant bits of the number. See para. 0010. That is, the remaining truncated address is the result of the second hashing operation. In fig. 2B-1, ID=25 and fig. 2B-2, ID=31, Ma ‘405 discloses an embodiment whereby the modulo operation with the power of 2 [i.e., the second hashing operation] is implemented by masking (an AND operation in combination with zero) [i.e., using less than all bits of the input value] the input address, and the masked value is then combined with a bitwise-left-shifted modulus result. See Ma ‘405 col. 10, lines 22-25 and 41-48. The modulo 2^n operation is effectively the n least significant bits of the address. See also Ma ‘405 col. 5, lines 22-23.
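As a purely illustrative sketch (the prime, bit-width, and function name below are assumptions, not quoted from Ma ‘405 or the claims), the two-stage workflow discussed above can be rendered as:

```python
# Minimal sketch of the two-stage scheme: a modulo-prime hash on the
# input address, plus a power-of-two hash implemented by masking so
# that only the low bits are used.
P = 3                  # assumed prime factor greater than two
N = 4                  # assumed exponent; second factor is 2**N
MASK = (1 << N) - 1    # keeping the N least significant bits == mod 2**N

def two_stage_index(addr: int) -> int:
    first = addr % P               # first modulo hashing operation
    second = addr & MASK           # mod 2**N via masking (less than all bits)
    return (first << N) | second   # shift-and-OR combination of the results

# Because gcd(P, 2**N) == 1, the pair (addr % P, addr % 2**N) is unique
# per residue class mod P * 2**N, so all P * 2**N bins are reachable.
bins = {two_stage_index(a) for a in range(P * 2**N)}
assert len(bins) == P * 2**N
```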
Applicant argues: Moreover, claims 1, 9, and 17 recite “concatenating [a] result of the first modulo hashing operation with a result of the second hashing operation to form an output value corresponding to the hash of the input value into the second value.” Ma does not perform any such concatenation. Instead, Ma produces three separate outputs corresponding to three dimensions (e.g., row, column, and set) and combines those using multiplexers or interleaving logic to access specific cache entries. This is materially different from concatenating two specific hash outputs to generate a scalar index as required by the claims.
Applicant’s argument is not persuasive. The claims do not require “concatenating two specific hash outputs to generate a scalar index.” Instead, the claims require “concatenating result of the first modulo hashing operation with a result of the second hashing operation to form an output value corresponding to the hash of the input value into the second value.” The “output value” as claimed is not defined narrowly as a scalar value with an implied use, such as an integer for indexing a one-dimensional LUT or array, but broadly encompasses the result of concatenating the first and second hash operations as claimed. Furthermore, the claim defines that the output value is merely “used” to select a bin in memory or data structure. The organization of the memory/data structure with respect to the bin allocations is not defined in the claims, and hence, the BRI of the claimed “output value” encompasses a set or coordinate location such that the set elements/coordinates are concatenated. The rejection explains that the arrangement of the cache location value of Ma ‘405 (i.e., concatenating the result of two hash operations) is a matter of design choice depending on the downstream input format requirements. It is also noted that Ma ‘405 explicitly describes an embodiment whereby a modulo prime (greater than 2) is concatenated with a modulo power of 2 to compute a given dimension (see col. 5, lines 45-56; col. 5, lines 22-23: the lowest “a” bits of A’ are effectively the operation: A’ mod 2^a), and Ma ‘405 also expressly discloses their system can 1) have only one prime power besides the power of two (see col. 5, lines 30-32), and 2) be configured for various dimensions (see col. 5, lines 57-60). Hence, even assuming, arguendo, the claim recited a scalar index as the output value, such a feature is not a material difference as asserted by applicant, but an obvious variation of the cache indexes expressly disclosed by Ma.
In view of basic principles of volumetric multiplication and integer factorization, the 3-dimensional data structure embodiment disclosed by Ma can be trivially reorganized into other dimensions; e.g., an equivalent one-dimensional data structure would have a substantially similar modulo scheme (e.g., a data structure with height = 1 and length equal to the cache size, i.e., the product of its prime factors).
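The reorganization point above can be illustrated with a short sketch (the example dimensions are assumptions chosen for illustration, not values from Ma ‘405): a 3-dimensional structure of size X*Y*Z holds exactly the same number of bins as a 1-dimensional structure of length X*Y*Z.

```python
# Row-major flattening: every 3-D coordinate maps one-to-one onto a
# 1-D index in range(X * Y * Z), so the two organizations are
# equivalent in capacity and addressability.
X, Y, Z = 3, 4, 2   # assumed example dimensions

def flatten(x: int, y: int, z: int) -> int:
    return (x * Y + y) * Z + z   # standard row-major linearization

flat = {flatten(x, y, z) for x in range(X) for y in range(Y) for z in range(Z)}
assert flat == set(range(X * Y * Z))   # bijection onto a 1-D structure
```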
Applicant argues: Finally, the claims recite “performing a lookup operation by using the output value to select a bin in the memory or data structure” (claim 1) or “selects one of the bins in the memory in accordance with the output value” (claims 9 and 17). Ma’s output is not a single bin index used in a lookup operation, but instead a collection of multidimensional indices used to index an addressable cache space. Accordingly, Ma fails to teach or suggest the specific architecture of the claimed hashing and bin-selection system.
Applicant’s argument is not persuasive. As identified in the rejections, fig. 2B-1, ID=25 and fig. 2B-2, ID=31 of Ma ‘405 are relevant configurations because they define cache sizes that decompose into two prime factors, 3 and a power of 2. Both ID=25 and ID=31 correspond to cache sizes of 3*2^(A+B+C). See Ma ‘405 col. 5, lines 19-33. Moreover, Ma ‘405 describes that their invention is not fixed on a certain set of dimensions, and covers caches of varying configurations that are flexible (“With an appropriate muxing (i.e., multiplexer) circuit, the same hardware can be used to map input addresses to SA caches of varying configurations, that is, to SA caches of various sizes (e.g., various dimensions).” [col. 5, lines 55-60; see also col. 2, lines 39-51]). Hence, two alternative dimensions within the scope of Ma are 1-dimension (1 dimension = 3*2^(A+B+C)) and 2-dimension (first dimension = 3 and second dimension = 2^(A+B+C)) variations. In both of these scenarios, the hash operations performed using the respective primes (3 and 2) align with the claimed first and second integers.
For example, consider a hypothetical variant of Ma fig. 2B-2, ID=31, where the cache size is 3*2^(A+B+C), in a 2-dimensional representation using the circuit defined in Ma ‘405 fig. 2A. In this variant, let the first dimension (X) location value = 3, the second dimension (Y) location value = 2^(A+B+C), then the third dimension (Z) would be set to 0. The circuit of fig. 2A can realize this variant as follows:
First dimension: mod3 circuit (206-2) is selected by MUX (216) [prime factor = 3]; the bitwise-left shifter (232a) would be shifted 0 bits [zero powers of 2]; and the entire memory address 208 would be masked to 0 [again because zero powers of 2]. The first location (output of combiner 241a) would be the result of the mod3 operation on the input address.
Second dimension: mod1 circuit (206-1) is selected by MUX (216) [no prime factor greater than 2]; the bitwise-left shifter (232b) would be shifted (A+B+C) bits [A+B+C power of 2]; and the bits of the address except for the lowest (A+B+C) bits would be masked to 0 [again because (A+B+C) powers of 2]. The second location (output of combiner 241b) would be the result of the lowest (A+B+C) bits of the address, which is effectively a mod 2^(A+B+C) operation on the input address.
Third dimension: mod1 circuit (206-1) is selected by MUX (216) [no prime factor greater than 2]; the bitwise-left shifter (232c) would be shifted zero bits [zero powers of 2] and the entire address would be masked to 0 [again because zero powers of 2]. The third location (output of combiner 241c) would be 0.
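The hypothetical variant walked through above can be sketched numerically as follows (A, B, and C are assumed example values; the function name is chosen for illustration and does not appear in Ma ‘405):

```python
# Numerical sketch of the hypothetical 2-D variant described above:
# first location from the mod3 path, second location from the lowest
# A+B+C bits (effectively mod 2**(A+B+C)), third location fixed at 0.
A, B, C = 2, 2, 1
WIDTH = A + B + C   # exponent of the power-of-two dimension

def variant_locations(addr: int):
    first = addr % 3                     # mod3 circuit; shifter at 0 bits; address masked to 0
    second = addr & ((1 << WIDTH) - 1)   # lowest A+B+C bits == mod 2**(A+B+C)
    third = 0                            # mod1 circuit; entire address masked to 0
    return first, second, third

# The (first, second) pair addresses 3 * 2**(A+B+C) distinct bins.
pairs = {variant_locations(a)[:2] for a in range(3 * 2**WIDTH)}
assert len(pairs) == 3 * 2**WIDTH
```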
A similar type of variant can be readily ascertained in the circuit of fig. 2A for a 1-dimensional representation where the cache size is 3*2^(A+B+C).
Hence, a variation of Ma in a 1-dimensional or 2-dimensional representation where the prime factorization of the cache size is limited to a prime greater than 2 and a power of 2 would have been predictable to one of ordinary skill in the art before the effective filing date of the claimed invention.
Applicant argues: Because Parady performs only one hashing operation, it does not and cannot teach “performing a second hashing operation using less than all bits of the input value,” nor does it teach “concatenating” two hash results as recited in claims 1, 9, and 17. Even when combined with Ma, Parady’s use of a Mersenne modulus is fundamentally incompatible with the two-stage structure and split-bitwidth hashing required by the claims. Parady also does not teach or suggest bitwise XOR logic, as recited in the second-hash alternative in the claims.
Applicant’s argument is not persuasive. The rejection does not propose to modify the system of Ma ‘405 by incorporating a Mersenne pseudo-prime number addressing scheme. Instead, the rejection proposes to modify the primary reference’s addressing scheme by utilizing an XOR operation on respective address bits with an offset of the address bits as taught by Parady ‘143. As explained in the rejection, this mixing operation provides the benefit of reducing conflict misses in a cache, which is a benefit that is not limited to a Mersenne pseudo-prime number addressing scheme. See also In re Keller, 642 F.2d 413, 425, 208 USPQ 871, 881 (CCPA 1981) (“The test for obviousness is not whether the features of a secondary reference may be bodily incorporated into the structure of the primary reference.... Rather, the test is what the combined teachings of those references would have suggested to those of ordinary skill in the art.”); In re Sneed, 710 F.2d 1544, 1550, 218 USPQ 385, 389 (Fed. Cir. 1983) (“[I]t is not necessary that the inventions of the references be physically combinable to render obvious the invention under review.”); and In re Nievelt, 482 F.2d 965, 179 USPQ 224, 226 (CCPA 1973) (“Combining the teachings of references does not involve an ability to combine their specific structures.”)
With respect to applicant’s arguments that Adams, Yasue, and Chirca are unrelated to the claimed subject matter, Applicant argues against the references individually. The claimed features that applicant argues are not taught by Adams, Yasue, and Chirca are in fact taught by Ma, and the rejections provide a sufficient rationale as to why these combinations of references render obvious the claimed invention. One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See MPEP 2145.IV.
Claim Rejections - 35 USC § 112
Claims 1-6, 8-14 and 16-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation "the memory or data structure" in the last line. Claims 9 and 17 recite the limitation "the memory" in line 4 and the last line, respectively. Claim 9 also recites the limitation "the bins" in line 4. Claim 17 also recites the limitation "the number of bins" in line 6. There is insufficient antecedent basis for these limitations in the claims. Although the preambles of the independent claims recite “cache utilization during hash-based bin selection,” the preambles only recite a stated improvement of cache utilization and do not positively identify a memory or bins used in the claimed method or defined as part of the system.
Moreover, claims 9 and 17 are indefinite because the scope of the limitation “after exactly two hashing operations” is ambiguous. The limitation can have two mutually exclusive interpretations: “exactly two hashing operations” can refer to the prior recited first and second hashing operations or it can refer to the first and second hashing operations as a subset of a larger set of operations that constitute “exactly two hashing operations.”
Furthermore, claims 9 and 17 are indefinite because the scope of the limitation “a modulo operation on a truncated version of the input value modulo the second integer” is ambiguous. The limitation can have two mutually exclusive interpretations: the limitation can define two nested modulo operations or a single modulo operation, where the limitation “modulo the second integer” further defines the preceding limitation “a modulo operation.”
Applicant’s specification does not support a modulo operation on a truncated version of an input value modulo the second integer. Rather, the specification supports a modulo operation on a truncated version of the input value, or a truncated version of an input value modulo the second integer. See fig. 3 and related text.
The dependent claims inherit the defect of their parent claims and are rejected for the same reasons.
Claim Rejections - 35 USC § 103
Examiner’s note: Applicant has elected to amend the independent claims to define a second hashing operation as one of two operations: a modulo operation on a truncated version of the input value and a bitwise XOR operation on selected bits of the input value. For the purpose of compact prosecution, two sets of rejections under § 103 are presented. The first set of rejections is applied to the claims where the second hashing operation is the first operation, and the second set of rejections is applied to the claims where the second hashing operation is the second operation.
Claims 1, 4, 9, 12, 17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ma US 11416405 (hereinafter Ma ‘405).
As per claim 1, Ma ‘405 discloses a method for improving processor-level performance and cache utilization during hash-based bin selection, the method comprising:
executing a hash function that hashes an input value into a second value corresponding to a number of bins, wherein the number of bins is an integer equal to a product of a first integer and a second integer, the first integer being a prime number greater than two and the second integer being a power of two greater than one (Col. 4, lines 46-48, “A straightforward mapping of address blocks to sets is s=A mod S, where A is the block address and s is the target set. This mapping ensures that the same number of block addresses (give or take one) map onto each set.”; col. 4, lines 58-60, “Conventionally, A′ may be produced, for example, by applying a hash on A.”; col. 5, lines 19-32, “Any S can be decomposed into its prime factors, S=2^a3^b5^c7^d . . . Number theory tells that the computation of A′ mod S is congruent to the computation of: (A′ mod 2^a, A′ mod 3^b, A′ mod 5^c, A′ mod 7^d . . . ), where A′ mod 2^a is simply the lowest a bits of A′ … It is possible to (trivially) re-combine values in the case where there is only one prime power besides the powers of two”; fig. 2B-2, ID:31 [3*(2^A); 2^B; 2^C] and/or fig. 2B-1, ID:25 [depending on memory configuration]; “FIRST LOCATION CALCULATION (e.g., ROW): ((addr’%3<<A) | addr’[A-1:0],” “SECOND LOCATION CALCULATION (e.g., COLUMN): addr’[A+B-1:A],” “THIRD LOCATION CALCULATION (e.g., SET): addr’[A+B+C-1:A+B]”; for ID: 31, prime number = 3 and second integer is 2^(A+B+C));
wherein executing the hash function comprises:
performing a first modulo hashing operation on the input value into the first integer (col. 12, lines 16-18, “For the equations disclosed in table 250, “addr′” is the input memory address 208, % means modulo, << means left shift, | means OR, and [ ] means bit range.”; fig. 2B-2, ID: 31, “FIRST LOCATION CALCULATION (e.g., ROW): ((addr’%3<<A) | addr’[A-1:0]”);
performing a second hashing operation using less than all bits of the input value, wherein the second hashing operation comprises one of: a modulo operation on a truncated version of the input value or a bitwise XOR operation on selected bits of the input value (fig. 2B-2, ID: 31, “FIRST LOCATION CALCULATION (e.g., ROW): addr’[A-1:0]”; “SECOND LOCATION CALCULATION (e.g., COLUMN): addr’[A+B-1:A],” “THIRD LOCATION CALCULATION (e.g., SET): addr’[A+B+C-1:A+B]”; col. 5, lines 20-23, “Number theory tells that the computation of A′ mod S is congruent to the computation of: (A′ mod 2^a, A′ mod 3^b, A′ mod 5^c, A′ mod 7^d . . . ), where A′ mod 2^a is simply the lowest a bits of A′.” The operations addr’[A-1:0], addr’[A+B+C-1:A+B] and addr’[A+B-1:A] utilize only a subset of the address [i.e., a truncation] and effectively perform modulo power-of-two operations (assignment as a location index or a portion of a location index); examiner’s note: the limitation only requires one of the two recited operations);
concatenating result of the first modulo hashing operation with a result of the second hashing operation to form an output value corresponding to the hash of the input value into the second value (col. 5, lines 28-30, “Finally, modulus (i.e., “mod”) values can be re-combined to yield x, y, z, where x, y, z defines the cache location in terms of column, row, and set, respectively”; col. 13, lines 45-48, “The multiplexer circuit may include a plurality of multiplexers and the selecting may cause at least a portion of the plurality of modulus results to be steered among the plurality of multiplexers and combined with respective distinct portions of the input memory address.” Furthermore, the arrangement of the cache location value (whether as discrete values or concatenated) is a matter of design choice that depends on the downstream input format requirements);
performing a lookup operation by using the output value to select a bin in the memory or data structure (col. 6, lines 8-13, “Based on the varying n cache sizes that the programmable circuit is to support, that is, the respective S=X×Y×Z configurations, each S can be decomposed into its prime factors to work out n equations for mapping an input memory address onto a cache location”; fig. 4, reference no. 408).
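As an illustrative rendering of the ID:31 location equations quoted in the mapping above (the A, B, C values and helper names below are assumptions chosen for illustration, not Ma ‘405’s programmed values):

```python
# Sketch of Ma '405 fig. 2B-2, ID:31 location equations:
#   first  = ((addr % 3) << A) | addr[A-1:0]        (row)
#   second = addr[A+B-1:A]                          (column)
#   third  = addr[A+B+C-1:A+B]                      (set)
A, B, C = 3, 2, 2   # assumed example bit widths

def bits(v: int, hi: int, lo: int) -> int:
    """Verilog-style bit slice v[hi:lo]."""
    return (v >> lo) & ((1 << (hi - lo + 1)) - 1)

def locations(addr: int):
    first = ((addr % 3) << A) | bits(addr, A - 1, 0)
    second = bits(addr, A + B - 1, A)
    third = bits(addr, A + B + C - 1, A + B)
    return first, second, third

# The first location alone already distinguishes 3 * 2**A residues,
# since 3 and 2**A are coprime.
assert len({locations(a)[0] for a in range(3 * 2**A)}) == 3 * 2**A
```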
Ma ‘405 does not explicitly list an embodiment concatenating result of the first modulo hashing operation with a result of the second hashing operation to form an output value corresponding to the hash of the input value into the second value (i.e., such that the second value corresponds to the number of bins; e.g., although Ma ‘405 describes the cache size prime factorization can be only 3 and powers of 2 (see fig. 2B-1 and 2B-2, ID=25 and ID=31 respectively), Ma ‘405 does not expressly describe the addressable bin location in 1- or 2-dimensions). However, Ma ‘405 discloses their invention is configurable “to map input addresses to SA caches of varying configurations, that is, to SA caches of various sizes (e.g., various dimensions)” (col. 5, lines 58-60), and that “[v]alues for A, B, and C, disclosed above, may be programmable within respective ranges supported by the circuit 202, as defined by a designer (not shown) of the circuit 202” (col. 11, lines 1-3; note also the mod1 circuit of fig. 2A). See also, col. 2, lines 44-51: “The first, second, and third locations in the first, second, and third dimensions, respectively, may correspond to a row, column, and set, respectively, of the SA cache; however, the SA cache is not limited to having its cache locations identified by a row, column, and set. For example, a cache location may be identified by a row and set. Alternatively, the cache location may be identified [by] a chip, row, column, and set.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to try a limited range of natural numbers (0, 1, 2 …) for the inputs of A, B and C, and thereby to reorganize the system of Ma ‘405 as addressable in 1- or 2-dimensions such that concatenating the result of the first modulo hashing operation with a result of the second hashing operation forms an output value corresponding to the hash of the input value into the second value. One of ordinary skill in the art would have chosen from a finite number of identified, predictable solutions, with a reasonable expectation of success because 1) the system of Ma ‘405 is designed to be reconfigurable for different N-way set associative cache sizes/configurations, and Ma ‘405 expressly discloses their system can be modified into different dimensions, 2) the different reconfigurations based on different dimensions (especially in 1- or 2-dimensions) are predictable (see again Ma ‘405, col. 5, lines 19-34 and 60), and 3) one of ordinary skill in the art would have tried different variations in order to optimize cache performance for different cache sizes and for different types of information to be stored/retrieved.
As per claim 4, Ma ‘405 discloses the method of claim 1 (supra). In addition, Ma ‘405 discloses wherein the lookup operation comprises performing a set selection in accordance with the output value (col. 11, lines 63-67, “The table 250 includes respective equations for performing calculations 254 for computing the first location 226a, second location 226b, and third location 226c, that represent the coordinates for the cache location 222 within the SA cache 214”).
As per claim 9, Ma ‘405 discloses a system configured to improve processor level hashing efficiency and cache utilization during hash-based bin selection, the system comprising:
the memory that stores data arranged in the bins; and a processor operatively coupled to the memory (fig. 5) and configured to:
execute a hash function that hashes an input value into a second value corresponding to the number of bins, the number of bins being equal to a product of a first integer and a second integer, the first integer being a prime number greater than two and constituting only non-power-of-two factor of the number of bins, and the second integer being a power of two greater than one (Col. 4, lines 46-48, “A straightforward mapping of address blocks to sets is s=A mod S, where A is the block address and s is the target set. This mapping ensures that the same number of block addresses (give or take one) map onto each set.”; col. 4, lines 58-60, “Conventionally, A′ may be produced, for example, by applying a hash on A.”; col. 5, lines 19-32, “Any S can be decomposed into its prime factors, S=2^a3^b5^c7^d . . . Number theory tells that the computation of A′ mod S is congruent to the computation of: (A′ mod 2^a, A′ mod 3^b, A′ mod 5^c, A′ mod 7^d . . . ), where A′ mod 2^a is simply the lowest a bits of A′ … It is possible to (trivially) re-combine values in the case where there is only one prime power besides the powers of two”; fig. 2B-2, ID:31 [3*(2^A); 2^B; 2^C] and/or fig. 2B-1, ID:25 [depending on memory configuration]; “FIRST LOCATION CALCULATION (e.g., ROW): ((addr’%3<<A) | addr’[A-1:0],” “SECOND LOCATION CALCULATION (e.g., COLUMN): addr’[A+B-1:A],” “THIRD LOCATION CALCULATION (e.g., SET): addr’[A+B+C-1:A+B]”; for ID: 31, prime number = 3 and second integer is 2^(A+B+C));
perform a first modulo hashing operation on the input value to obtain a first hash result equal to the input value modulo the first integer (col. 12, lines 16-18, “For the equations disclosed in table 250, “addr′” is the input memory address 208, % means modulo, << means left shift, | means OR, and [ ] means bit range.”; fig. 2B-2, ID: 31, “FIRST LOCATION CALCULATION (e.g., ROW): ((addr’%3<<A) | addr’[A-1:0]”);
perform using less than all bits of the input value, a second hashing operation selected from: (a) a modulo operation on a truncated version of the input value modulo the second integer, or (b) a bitwise XOR of selected bits of the input value, to obtain a second hash result (fig. 2B-2, ID: 31, “FIRST LOCATION CALCULATION (e.g., ROW): addr’[A-1:0]”; “SECOND LOCATION CALCULATION (e.g., COLUMN): addr’[A+B-1:A],” “THIRD LOCATION CALCULATION (e.g., SET): addr’[A+B+C-1:A+B]”; col. 5, lines 20-23, “Number theory tells that the computation of A′ mod S is congruent to the computation of: (A′ mod 2^a, A′ mod 3^b, A′ mod 5^c, A′ mod 7^d . . . ), where A′ mod 2^a is simply the lowest a bits of A′.” The operations addr’[A-1:0], addr’[A+B+C-1:A+B] and addr’[A+B-1:A] utilize only a subset of the address [i.e., a truncation] and effectively perform modulo power-of-two operations (assignment as a location index or a portion of a location index); examiner’s note: the limitation only requires one of the two recited operations);
concatenate the first hash result and the second hash result to generate an output value that represents the hash of the input value into the second value after exactly two hashing operations (col. 5, lines 28-30, “Finally, modulus (i.e., “mod”) values can be re-combined to yield x, y, z, where x, y, z defines the cache location in terms of column, row, and set, respectively”; col. 13, lines 45-48, “The multiplexer circuit may include a plurality of multiplexers and the selecting may cause at least a portion of the plurality of modulus results to be steered among the plurality of multiplexers and combined with respective distinct portions of the input memory address.” Furthermore, the arrangement of the cache location value (whether as discrete values or concatenated) is a matter of design choice that depends on the downstream input format requirements);
and perform a lookup operation that selects one of the bins in the memory in accordance with the output value (col. 6, lines 8-13, “Based on the varying n cache sizes that the programmable circuit is to support, that is, the respective S=X×Y×Z configurations, each S can be decomposed into its prime factors to work out n equations for mapping an input memory address onto a cache location”; fig. 4, reference no. 408).
Ma ‘405 does not explicitly list an embodiment that concatenate[s] the first hash result and the second hash result to generate an output value that represents the hash of the input value into the second value after exactly two hashing operations (i.e., such that the second value corresponds to the number of bins; e.g., although Ma ‘405 describes the cache size prime factorization can be only 3 and powers of 2 (see fig. 2B-1 and 2B-2, ID=25 and ID=31 respectively), Ma ‘405 does not expressly describe the addressable bin location in 1- or 2-dimensions). However, Ma ‘405 discloses their invention is configurable “to map input addresses to SA caches of varying configurations, that is, to SA caches of various sizes (e.g., various dimensions)” (col. 5, lines 58-60), and that “[v]alues for A, B, and C, disclosed above, may be programmable within respective ranges supported by the circuit 202, as defined by a designer (not shown) of the circuit 202” (col. 11, lines 1-3; note also the mod1 circuit of fig. 2A). See also, col. 2, lines 44-51: “The first, second, and third locations in the first, second, and third dimensions, respectively, may correspond to a row, column, and set, respectively, of the SA cache; however, the SA cache is not limited to having its cache locations identified by a row, column, and set. For example, a cache location may be identified by a row and set. Alternatively, the cache location may be identified [by] a chip, row, column, and set.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to try a limited range of natural numbers (0, 1, 2 …) for the inputs of A, B and C, and thereby to reorganize the system of Ma ‘405 as addressable in 1- or 2-dimensions such that concatenating the result of the first modulo hashing operation with a result of the second hashing operation forms an output value corresponding to the hash of the input value into the second value. One of ordinary skill in the art would have chosen from a finite number of identified, predictable solutions, with a reasonable expectation of success because 1) the system of Ma ‘405 is designed to be reconfigurable for different N-way set associative cache sizes/configurations, and Ma ‘405 expressly discloses their system can be modified into different dimensions, 2) the different reconfigurations based on different dimensions (especially in 1- or 2-dimensions) are predictable (see again Ma ‘405, col. 5, lines 19-34 and 60), and 3) one of ordinary skill in the art would have tried different variations in order to optimize cache performance for different cache sizes and for different types of information to be stored/retrieved.
As per claim 12, Ma ‘405 discloses the system of claim 9 (supra). In addition, Ma ‘405 discloses wherein the lookup operation comprises performing a set selection for a set associative cache stored in the memory in accordance with the output value (col. 11, lines 63-67, “The table 250 includes respective equations for performing calculations 254 for computing the first location 226a, second location 226b, and third location 226c, that represent the coordinates for the cache location 222 within the SA cache 214”; examiner’s note: “SA” stands for set-associative).
Claims 17 and 20 are computer readable medium claims storing instructions that correspond to claims 9 and 12. Hence, claims 17 and 20 are rejected for the same reasons as claims 9 and 12 (Ma ‘405).
Claims 2, 10 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Ma ‘405 in view of Chirca et al. US 20140115279 (hereinafter Chirca ‘279) or in view of Adams US 5889996 (hereinafter Adams ‘996).
As per claim 2, Ma ‘405 discloses the method of claim 1 (supra). In addition, Ma ‘405 discloses the method further comprising forming a [truncated] input value [formed by truncating a plurality of least significant bits from the input value], wherein the second hashing operation comprises a second modulo hashing operation in which the [truncated] input value is hashed into the second integer (fig. 2B-2, ID: 31, “SECOND LOCATION CALCULATION (e.g., COLUMN): addr’[A+B-1:A],” “THIRD LOCATION CALCULATION (e.g., SET): addr’[A+B+C-1:A+B]”). Although Ma ‘405 does not disclose truncating a plurality of LSBs from the input value, where the truncated input value is hashed into the second integer, Ma ‘405 discloses that “[t]he first, second, and third locations in the first, second, and third dimensions, respectively, may correspond to a row, column, and set, respectively, of the SA cache … the SA cache is not limited to having its cache locations identified by a row, column, and set. For example, a cache location may be identified by a row and set. Alternatively, the cache location may be identified [by] a chip, row, column, and set.” (Col. 2, lines 44-51). Moreover, Chirca ‘279 discloses “[t]he following description mentions access addresses. It is known in the art that these addresses need not be the complete endpoint memory address [sic] a number of least significant bits of these addresses could be truncated so that the addresses refer to a larger quantity of data such as a whole cache line” (para. 0062). Furthermore, Adams ‘996 discloses truncating a plurality of LSBs from the input value, where the truncated input value is hashed into the second integer. See Adams ‘996, col. 14, lines 27-35 (“A significant feature of a direct-mapped cache is that every block of memory within the source memory device 20 has a specific cache line 140 to which it will be cached any time it is cached. In one scheme, the least significant bits in an address corresponding to a block within a memory device may be truncated to the same size as the address of a cache line 140. Thus, every block of memory 20 is assigned to a cache line 140 having the same least significant bit address.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to truncate a plurality of least significant bits from the input value and to perform the second modulo hashing operation on the truncated input value. One would have been motivated to do so for cache placement schemes whereby an access address for a set is larger than a retrievable datum (e.g., a byte), and based on the design criteria of the cache (e.g., a one-way set associative cache) to attain the benefits of such a design (e.g., placement policy can be simple and efficient).
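Examiner’s note (illustrative only): the truncation-then-modulo operation discussed above may be sketched in software as follows. This is a hypothetical sketch for exposition, not an implementation drawn from Ma ‘405, Chirca ‘279, or Adams ‘996; the names and parameter values (e.g., a 64-byte cache line, 2^4 sets) are assumed solely for illustration.

```python
# Hypothetical sketch: truncate a plurality of least significant bits
# (e.g., the byte offset within a cache line) from the input value, then
# perform the second modulo hashing operation into the second integer.
# Parameter values are assumed, not taken from the cited references.

LINE_OFFSET_BITS = 6   # assumed 64-byte cache lines (2**6 bytes per line)
K = 4                  # second integer is 2**K = 16

def truncated_hash(addr: int) -> int:
    truncated = addr >> LINE_OFFSET_BITS  # truncate the plurality of LSBs
    return truncated % (1 << K)           # second modulo hashing operation
```

Consistent with Chirca ‘279, all addresses within the same line produce the same result, so the truncated address refers to a whole cache line rather than a single byte.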
Claim 10 is a system claim comprising a memory and one or more processors that correspond to claim 2. Hence, claim 10 is rejected for the same reasons as claim 2 (Ma ‘405 in view of Chirca ‘279 or Adams ‘996).
Claim 18 is a computer readable medium claim storing instructions that correspond to claim 2. Hence, claim 18 is rejected for the same reasons as claim 2 (Ma ‘405 in view of Chirca ‘279 or Adams ‘996).
Claims 3, 11 and 19 are rejected under 103 as being unpatentable over Ma ‘405 in view of Parady US 5649143 (hereinafter Parady ‘143).
As per claim 3, Ma ‘405 discloses the method of claim 1 (supra). Ma ‘405 does not disclose, but Parady ‘143 discloses wherein the second hashing operation comprises: for each bit (bx) of a plurality of bits in the input value, performing the following XOR operation: bx XOR bx+k, where k is an integer greater than one (fig. 3a; col. 6, lines 34-50, “As shown in FIG. 3a, the logic circuitry 265 includes a plurality of logic gates 301-312 oriented in parallel to offset the set bits "A[16:5]" input into the cache while the most significant set bits "A[19:17]" remain unaffected since the address line is only 32-bits wide. Preferably, the plurality of logic gates 301-312 are dual-input XOR gates, although any logic gates providing identical functionality and minimal latency may be used. As shown, the first logic gate 301 is coupled to both the least significant set bit (i.e., A[5]) and the least significant tag bit (i.e., A[20]). The same coupling pattern continues until the most significant tag bit (i.e., A[31]) and A[16] are coupled to the logic gate 312. Upon the processor sub-system 220 generating a memory address, the logic circuitry 265 appropriately offsets the set address to produce the indexed set address preventing conflict misses between successive memory accesses associated with an identical cache line”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the modulo 2 hashing operation of Ma ‘405 with a variant of the XOR operation as disclosed by Parady ‘143 (i.e., apply a bitwise XOR operation on a contiguous interval of least significant bits with corresponding higher significant bits based on an integer offset greater than one). The cache indexing scheme described by Parady ‘143 has the benefit of reducing conflict misses. One would have been motivated to do so as a simple substitution of one known element for another to obtain predictable results.
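Examiner’s note (illustrative only): the bx XOR bx+k operation of the kind attributed to Parady ‘143 may be sketched as follows. The sketch is hypothetical; the width and offset parameters are chosen only to illustrate folding higher-order bits onto a run of lower-order bits (in Parady ‘143 the tag bits A[31:20] are XORed with the set bits A[16:5]).

```python
# Hypothetical sketch: for each bit b_x of the low `width` bits of the
# input value, compute b_x XOR b_(x+k), where k is a fixed integer
# offset greater than one. Parameters are illustrative, not cited.

def xor_fold(addr: int, width: int, k: int) -> int:
    low = addr & ((1 << width) - 1)           # bits b_(width-1) .. b_0
    high = (addr >> k) & ((1 << width) - 1)   # bits b_(width-1+k) .. b_k
    return low ^ high                         # per-bit b_x XOR b_(x+k)
```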
Claim 11 is a system claim comprising a memory and one or more processors that corresponds to claim 3. Hence, claim 11 is rejected for the same reasons as claim 3 (Ma ‘405 in view of Parady ‘143).
Claim 19 is a computer readable medium claim storing instructions that corresponds to claim 3. Hence, claim 19 is rejected for the same reasons as claim 3 (Ma ‘405 in view of Parady ‘143).
Claims 5-6, 8, 13-14 and 16 are rejected under 103 as being unpatentable over Ma ‘405 in view of Yasue US 20050268038 (hereinafter Yasue ‘038).
As per claim 5, Ma ‘405 discloses the method of claim 4 (supra). Ma ‘405 does not disclose, but Yasue ‘038 discloses wherein the input value corresponds to a virtual address in a cache associated with a graphics processor (para. 0028, “In accordance with one or more embodiments of the invention, however, the number of data accesses to the system memory 106 may be reduced by way of a software implemented cache within the local memory 104…. The software invoked cache memory areas 120, however, are not implemented using such hardware, rather these areas are formed using software code. For example, with reference to FIG. 3, the processor 102 may specify a number of parameters of the software cache 120A utilizing the API code. At action 300, the processor 102 may specify a cache entry size, which indicates a number of cache lines 122 to include in the software invoked cache memory area 120A. As illustrated in FIG. 2, there may be any number of lines specified, such as four. The processor 102 may also specify a line size using the API code, where the line size specifies the extent of each cache line 122 within the software invoked cache memory area(s) 120A-N.”; para. 0066, “The MMU 562 is preferably operable to translate effective addresses (taken from DMA commands) into real addresses for memory access”; para. 0054, “Some capabilities of the SPU 508 include graphics geometry pipelines, surface subdivision, Fast Fourier Transforms, image processing keywords, stream processing, MPEG encoding/decoding…”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ma ‘405 such that the input value corresponds to a virtual address in a cache associated with a graphics processor as taught by Yasue ‘038. One would have been motivated to do so to apply a configurable and efficient indexing scheme for mapping memory address as described by Ma ‘405 in a graphics processor with a software implemented memory (programmable cache) that requires a vast number of data accesses and computations. See Ma ‘405, col. 1, lines 54-61; Yasue ‘038, para. 0004 and 0010.
As per claim 6, Ma ‘405 in view of Yasue ‘038 discloses the method of claim 5 (supra). Moreover, Ma ‘405 discloses wherein the second value corresponds to a number of sets in the cache (col. 4, lines 39-45, “For example, and without loss of generality, an SA cache may be organized into rows and columns of banks. The number of sets in the SA cache may be given by S=X×Y×Z, where S is the total number of sets, X is the number of columns of banks, Y is the number of rows of banks, and Z is the number of sets within each bank. The sets in the SA cache can be numbered from 0 to S−1.”; fig. 2B-2, ID:31 [3*(2A); 2B; 2C]).
As per claim 8, Ma ‘405 discloses the method of claim 1 (supra). Ma ‘405 does not disclose, but Yasue ‘038 discloses wherein a plurality of SIMD units perform the hash function in parallel (fig. 4, reference nos. 310-322; figs. 5-7; para. 0051, “The PU 504 can be, e.g., a standard processor capable of stand-alone processing of data and applications. In operation, the PU 504 preferably schedules and orchestrates the processing of data and applications by the sub-processing units. The sub-processing units preferably are single instruction, multiple data (SIMD) processors. Under the control of the PU 504, the sub-processing units perform the processing of these data and applications in a parallel and independent manner”; para. 0052, “It is noted that the PU 504 may be implemented by one of the sub-processing units 508 taking on the role of a main processing unit that schedules and orchestrates the processing of data and applications by the sub-processing units 508”; para. 0055, “The sub-processing unit 508 includes two basic functional units, namely an SPU core 510A and a memory flow controller (MFC) 510B. The SPU core 510A performs program execution, data manipulation, etc., while the MFC 510B performs functions related to data transfers between the SPU core 510A and the DRAM 514 of the system”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ma ‘405 such that a plurality of SIMD units perform the hash function in parallel. One would have been motivated to do so to utilize the performance improvements afforded by SIMD parallel processing as disclosed by Yasue ‘038 and as known to one of ordinary skill in the art.
Claims 13-14 and 16 are system claims comprising a memory and one or more processors that correspond to claims 5-6 and 8. Hence, claims 13-14 and 16 are rejected for the same reasons as claims 5-6 and 8 (Ma ‘405 in view of Yasue ‘038).
Claims 1, 3-4, 9, 11-12, 17 and 19-20 are rejected under 103 as being unpatentable over Ma US 11416405 (hereinafter Ma ‘405) in view of Parady US 5649143 (hereinafter Parady ‘143).
As per claim 1, Ma ‘405 discloses a method for improving processor-level performance and cache utilization during hash-based bin selection, the method comprising:
executing a hash function that hashes an input value into a second value corresponding to a number of bins, wherein the number of bins is an integer equal to a product of a first integer and a second integer, the first integer being a prime number greater than two and the second integer being a power of two greater than one (Col. 4, lines 46-48, “A straightforward mapping of address blocks to sets is s=A mod S, where A is the block address and s is the target set. This mapping ensures that the same number of block addresses (give or take one) map onto each set.”; col. 4, lines 58-60, “Conventionally, A′ may be produced, for example, by applying a hash on A.”; col. 5, lines 19-32, “Any S can be decomposed into its prime factors, S=2a3b5c7d . . . Number theory tells that the computation of A′ mod S is congruent to the computation of: (A′ mod 2a, A′ mod 3b, A′ mod 5c, A′ mod 7d . . . ), where A′ mod 2a is simply the lowest a bits of A′ … It is possible to (trivially) re-combine values in the case where there is only one prime power besides the powers of two”; fig. 2B-2, ID:31 [3*(2A); 2B; 2C] and/or fig. 2B-1, ID:25 [depending on memory configuration]; “FIRST LOCATION CALCULATION (e.g., ROW): ((addr’%3<<A) | addr’[A-1:0],” “SECOND LOCATION CALCULATION (e.g., COLUMN): addr’[A+B-1:A],” “THIRD LOCATION CALCULATION (e.g., SET): addr’[A+B+C-1:A+B]”; for ID: 31, prime number = 3 and second integer is 2(A+B+C));
wherein executing the hash function comprises:
performing a first modulo hashing operation on the input value into the first integer (col. 12, lines 16-18, “For the equations disclosed in table 250, “addr′” is the input memory address 208, % means modulo, << means left shift, | means OR, and [ ] means bit range.”; fig. 2B-2, ID: 31, “FIRST LOCATION CALCULATION (e.g., ROW): ((addr’%3<<A) | addr’[A-1:0]”);
performing a second hashing operation using less than all bits of the input value, wherein the second hashing operation comprises one of: a modulo operation on a truncated version of the input value or a bitwise XOR operation on selected bits of the input value (fig. 2B-2, ID: 31, “FIRST LOCATION CALCULATION (e.g., ROW): addr’[A-1:0]”; “SECOND LOCATION CALCULATION (e.g., COLUMN): addr’[A+B-1:A],” “THIRD LOCATION CALCULATION (e.g., SET): addr’[A+B+C-1:A+B]”; col. 5, lines 20-23, “Number theory tells that the computation of A′ mod S is congruent to the computation of: (A′ mod 2a, A′ mod 3b, A′ mod 5c, A′ mod 7d . . . ), where A′ mod 2a is simply the lowest a bits of A′.” The operations addr’[A-1:0], addr’[A+B+C-1:A+B] and addr’[A+B-1:A] utilize only a subset of the address [i.e., truncation] and effectively perform mod 2 operations (assignment as a location index or a portion of a location index); examiner’s note: the limitation only requires one of the two recited operations);
concatenating a result of the first modulo hashing operation with a result of the second hashing operation to form an output value corresponding to the hash of the input value into the second value (col. 5, lines 28-30, “Finally, modulus (i.e., “mod”) values can be re-combined to yield x, y, z, where x, y, z defines the cache location in terms of column, row, and set, respectively”; col. 13, lines 45-48, “The multiplexer circuit may include a plurality of multiplexers and the selecting may cause at least a portion of the plurality of modulus results to be steered among the plurality of multiplexers and combined with respective distinct portions of the input memory address.” Furthermore, the arrangement of the cache location value (whether as discrete values or concatenated) is a matter of design choice that depends on the downstream input format requirements);
performing a lookup operation by using the output value to select a bin in the memory or data structure (col. 6, lines 8-13, “Based on the varying n cache sizes that the programmable circuit is to support, that is, the respective S=X×Y×Z configurations, each S can be decomposed into its prime factors to work out n equations for mapping an input memory address onto a cache location”; fig. 4, reference no. 408).
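Examiner’s note (illustrative only): the mapped two-stage hash discussed above may be sketched in software as follows. This is a hypothetical sketch for exposition only, not the circuit of Ma ‘405 or the claimed implementation; the names and parameter values (P, K, hash_to_bin) are assumed for illustration.

```python
# Hypothetical sketch of a two-stage hash into S = P * 2**K bins, where
# P is a prime number greater than two and 2**K is a power of two
# greater than one. Not a cited implementation; parameters are assumed.

P = 3               # first integer: prime number greater than two
K = 4               # second integer is 2**K = 16; S = 48 bins
S = P * (1 << K)

def hash_to_bin(addr: int) -> int:
    first = addr % P                 # first modulo hashing operation
    second = addr & ((1 << K) - 1)   # mod 2**K uses only the lowest K bits
    # concatenate the two results: prime residue in the high field,
    # power-of-two residue in the low field
    return (first << K) | second
```

Because gcd(P, 2**K) = 1, the residue pair (addr mod P, addr mod 2**K) uniquely determines addr mod S (Chinese Remainder Theorem), which is the “trivial re-combination” Ma ‘405 describes when there is only one prime power besides the powers of two.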
Ma ‘405 does not explicitly list an embodiment concatenating a result of the first modulo hashing operation with a result of the second hashing operation to form an output value corresponding to the hash of the input value into the second value (i.e., such that the second value corresponds to the number of bins; e.g., although Ma ‘405 describes the cache size prime factorization can be only 3 and powers of 2 (see fig. 2B-1 and 2B-2, ID=25 and ID=31 respectively), Ma ‘405 does not expressly describe the addressable bin location in 1- or 2-dimensions). However, Ma ‘405 discloses their invention is configurable “to map input addresses to SA caches of varying configurations, that is, to SA caches of various sizes (e.g., various dimensions)” (col. 5, lines 58-60), and that “[v]alues for A, B, and C, disclosed above, may be programmable within respective ranges supported by the circuit 202, as defined by a designer (not shown) of the circuit 202” (col. 11, lines 1-3; note also the mod1 circuit of fig. 2a). See also, col. 2, lines 44-51: “The first, second, and third locations in the first, second, and third dimensions, respectively, may correspond to a row, column, and set, respectively, of the SA cache; however, the SA cache is not limited to having its cache locations identified by a row, column, and set. For example, a cache location may be identified by a row and set. Alternatively, the cache location may be identified [by] a chip, row, column, and set.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to try a limited range of natural numbers (0, 1, 2 …) for the inputs of A, B and C, and thereby to reorganize the system of Ma ‘405 as addressable in 1- or 2-dimensions such that concatenating the result of the first modulo hashing operation with a result of the second hashing operation forms an output value corresponding to the hash of the input value into the second value. One of ordinary skill in the art would have chosen from a finite number of identified, predictable solutions, with a reasonable expectation of success because 1) the system of Ma ‘405 is designed to be reconfigurable for different N-way set associative cache sizes/configurations, and Ma ‘405 expressly discloses their system can be modified into different dimensions, 2) the different reconfigurations based on different dimensions (especially in 1- or 2-dimensions) are predictable (see again Ma ‘405, col. 5, lines 19-34 and 60), and 3) one of ordinary skill in the art would have tried different variations in order to optimize cache performance for different cache sizes and for different types of information to be stored/retrieved.
Moreover, Ma ‘405 does not disclose, but Parady ‘143 discloses performing a second hashing operation using less than all bits of the input value, wherein the second hashing operation comprises one of: … a bitwise XOR operation on selected bits of the input value (fig. 3a; col. 6, lines 34-50, “As shown in FIG. 3a, the logic circuitry 265 includes a plurality of logic gates 301-312 oriented in parallel to offset the set bits "A[16:5]" input into the cache while the most significant set bits "A[19:17]" remain unaffected since the address line is only 32-bits wide. Preferably, the plurality of logic gates 301-312 are dual-input XOR gates, although any logic gates providing identical functionality and minimal latency may be used. As shown, the first logic gate 301 is coupled to both the least significant set bit (i.e., A[5]) and the least significant tag bit (i.e., A[20]). The same coupling pattern continues until the most significant tag bit (i.e., A[31]) and A[16] are coupled to the logic gate 312. Upon the processor sub-system 220 generating a memory address, the logic circuitry 265 appropriately offsets the set address to produce the indexed set address preventing conflict misses between successive memory accesses associated with an identical cache line”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the modulo 2 hashing operation of Ma ‘405 with a variant of the XOR operation as disclosed by Parady ‘143 (i.e., apply a bitwise XOR operation on selected bits of the input value). The cache indexing scheme described by Parady ‘143 has the benefit of reducing conflict misses. One would have been motivated to do so as a simple substitution of one known element for another to obtain predictable results.
As per claim 3, Ma ‘405 in view of Parady ‘143 disclose the method of claim 1 (supra). Ma ‘405 does not disclose, but Parady ‘143 discloses wherein the second hashing operation comprises: for each bit (bx) of a plurality of bits in the input value, performing the following XOR operation: bx XOR bx+k, where k is an integer greater than one (fig. 3a; col. 6, lines 34-50, “As shown in FIG. 3a, the logic circuitry 265 includes a plurality of logic gates 301-312 oriented in parallel to offset the set bits "A[16:5]" input into the cache while the most significant set bits "A[19:17]" remain unaffected since the address line is only 32-bits wide. Preferably, the plurality of logic gates 301-312 are dual-input XOR gates, although any logic gates providing identical functionality and minimal latency may be used. As shown, the first logic gate 301 is coupled to both the least significant set bit (i.e., A[5]) and the least significant tag bit (i.e., A[20]). The same coupling pattern continues until the most significant tag bit (i.e., A[31]) and A[16] are coupled to the logic gate 312. Upon the processor sub-system 220 generating a memory address, the logic circuitry 265 appropriately offsets the set address to produce the indexed set address preventing conflict misses between successive memory accesses associated with an identical cache line”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the modulo 2 hashing operation of Ma ‘405 with a variant of the XOR operation as disclosed by Parady ‘143 (i.e., apply a bitwise XOR operation on a contiguous interval of least significant bits with corresponding higher significant bits based on an integer offset greater than one). The cache indexing scheme described by Parady ‘143 has the benefit of reducing conflict misses. One would have been motivated to do so as a simple substitution of one known element for another to obtain predictable results.
As per claim 4, Ma ‘405 in view of Parady ‘143 disclose the method of claim 1 (supra). In addition, Ma ‘405 discloses wherein the lookup operation comprises performing a set selection in accordance with the output value (col. 11, lines 63-67, “The table 250 includes respective equations for performing calculations 254 for computing the first location 226a, second location 226b, and third location 226c, that represent the coordinates for the cache location 222 within the SA cache 214”).
As per claim 9, Ma ‘405 discloses a system configured to improve processor level hashing efficiency and cache utilization during hash-based bin selection, the system comprising:
the memory that stores data arranged in the bins; and a processor operatively coupled to the memory (fig. 5) and configured to:
execute a hash function that hashes an input value into a second value corresponding to the number of bins, the number of bins being equal to a product of a first integer and a second integer, the first integer being a prime number greater than two and constituting the only non-power-of-two factor of the number of bins, and the second integer being a power of two greater than one (Col. 4, lines 46-48, “A straightforward mapping of address blocks to sets is s=A mod S, where A is the block address and s is the target set. This mapping ensures that the same number of block addresses (give or take one) map onto each set.”; col. 4, lines 58-60, “Conventionally, A′ may be produced, for example, by applying a hash on A.”; col. 5, lines 19-32, “Any S can be decomposed into its prime factors, S=2a3b5c7d . . . Number theory tells that the computation of A′ mod S is congruent to the computation of: (A′ mod 2a, A′ mod 3b, A′ mod 5c, A′ mod 7d . . . ), where A′ mod 2a is simply the lowest a bits of A′ … It is possible to (trivially) re-combine values in the case where there is only one prime power besides the powers of two”; fig. 2B-2, ID:31 [3*(2A); 2B; 2C] and/or fig. 2B-1, ID:25 [depending on memory configuration]; “FIRST LOCATION CALCULATION (e.g., ROW): ((addr’%3<<A) | addr’[A-1:0],” “SECOND LOCATION CALCULATION (e.g., COLUMN): addr’[A+B-1:A],” “THIRD LOCATION CALCULATION (e.g., SET): addr’[A+B+C-1:A+B]”; for ID: 31, prime number = 3 and second integer is 2(A+B+C));
perform a first modulo hashing operation on the input value to obtain a first hash result equal to the input value modulo the first integer (col. 12, lines 16-18, “For the equations disclosed in table 250, “addr′” is the input memory address 208, % means modulo, << means left shift, | means OR, and [ ] means bit range.”; fig. 2B-2, ID: 31, “FIRST LOCATION CALCULATION (e.g., ROW): ((addr’%3<<A) | addr’[A-1:0]”);
perform using less than all bits of the input value, a second hashing operation selected from: (a) a modulo operation on a truncated version of the input value modulo the second integer, or (b) a bitwise XOR of selected bits of the input value, to obtain a second hash result (fig. 2B-2, ID: 31, “FIRST LOCATION CALCULATION (e.g., ROW): addr’[A-1:0]”; “SECOND LOCATION CALCULATION (e.g., COLUMN): addr’[A+B-1:A],” “THIRD LOCATION CALCULATION (e.g., SET): addr’[A+B+C-1:A+B]”; col. 5, lines 20-23, “Number theory tells that the computation of A′ mod S is congruent to the computation of: (A′ mod 2a, A′ mod 3b, A′ mod 5c, A′ mod 7d . . . ), where A′ mod 2a is simply the lowest a bits of A′.” The operations addr’[A-1:0], addr’[A+B+C-1:A+B] and addr’[A+B-1:A] utilize only a subset of the address [i.e., truncation] and effectively perform mod 2 operations (assignment as a location index or a portion of a location index); examiner’s note: the limitation only requires one of the two recited operations);
concatenate the first hash result and the second hash result to generate an output value that represents the hash of the input value into the second value after exactly two hashing operations (col. 5, lines 28-30, “Finally, modulus (i.e., “mod”) values can be re-combined to yield x, y, z, where x, y, z defines the cache location in terms of column, row, and set, respectively”; col. 13, lines 45-48, “The multiplexer circuit may include a plurality of multiplexers and the selecting may cause at least a portion of the plurality of modulus results to be steered among the plurality of multiplexers and combined with respective distinct portions of the input memory address.” Furthermore, the arrangement of the cache location value (whether as discrete values or concatenated) is a matter of design choice that depends on the downstream input format requirements);
and perform a lookup operation that selects one of the bins in the memory in accordance with the output value (col. 6, lines 8-13, “Based on the varying n cache sizes that the programmable circuit is to support, that is, the respective S=X×Y×Z configurations, each S can be decomposed into its prime factors to work out n equations for mapping an input memory address onto a cache location”; fig. 4, reference no. 408).
Ma ‘405 does not explicitly list an embodiment that concatenate[s] the first hash result and the second hash result to generate an output value that represents the hash of the input value into the second value after exactly two hashing operations (i.e., such that the second value corresponds to the number of bins; e.g., although Ma ‘405 describes the cache size prime factorization can be only 3 and powers of 2 (see fig. 2B-1 and 2B-2, ID=25 and ID=31 respectively), Ma ‘405 does not expressly describe the addressable bin location in 1- or 2-dimensions). However, Ma ‘405 discloses their invention is configurable “to map input addresses to SA caches of varying configurations, that is, to SA caches of various sizes (e.g., various dimensions)” (col. 5, lines 58-60), and that “[v]alues for A, B, and C, disclosed above, may be programmable within respective ranges supported by the circuit 202, as defined by a designer (not shown) of the circuit 202” (col. 11, lines 1-3; note also the mod1 circuit of fig. 2a). See also, col. 2, lines 44-51: “The first, second, and third locations in the first, second, and third dimensions, respectively, may correspond to a row, column, and set, respectively, of the SA cache; however, the SA cache is not limited to having its cache locations identified by a row, column, and set. For example, a cache location may be identified by a row and set. Alternatively, the cache location may be identified [by] a chip, row, column, and set.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to try a limited range of natural numbers (0, 1, 2 …) for the inputs of A, B and C, and thereby to reorganize the system of Ma ‘405 as addressable in 1- or 2-dimensions such that concatenating the result of the first modulo hashing operation with a result of the second hashing operation forms an output value corresponding to the hash of the input value into the second value. One of ordinary skill in the art would have chosen from a finite number of identified, predictable solutions, with a reasonable expectation of success because 1) the system of Ma ‘405 is designed to be reconfigurable for different N-way set associative cache sizes/configurations, and Ma ‘405 expressly discloses their system can be modified into different dimensions, 2) the different reconfigurations based on different dimensions (especially in 1- or 2-dimensions) are predictable (see again Ma ‘405, col. 5, lines 19-34 and 60), and 3) one of ordinary skill in the art would have tried different variations in order to optimize cache performance for different cache sizes and for different types of information to be stored/retrieved.
Moreover, Ma ‘405 does not disclose, but Parady ‘143 discloses perform using less than all bits of the input value, a second hashing operation selected from: … (b) a bitwise XOR of selected bits of the input value (fig. 3a; col. 6, lines 34-50, “As shown in FIG. 3a, the logic circuitry 265 includes a plurality of logic gates 301-312 oriented in parallel to offset the set bits "A[16:5]" input into the cache while the most significant set bits "A[19:17]" remain unaffected since the address line is only 32-bits wide. Preferably, the plurality of logic gates 301-312 are dual-input XOR gates, although any logic gates providing identical functionality and minimal latency may be used. As shown, the first logic gate 301 is coupled to both the least significant set bit (i.e., A[5]) and the least significant tag bit (i.e., A[20]). The same coupling pattern continues until the most significant tag bit (i.e., A[31]) and A[16] are coupled to the logic gate 312. Upon the processor sub-system 220 generating a memory address, the logic circuitry 265 appropriately offsets the set address to produce the indexed set address preventing conflict misses between successive memory accesses associated with an identical cache line”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the modulo 2 hashing operation of Ma ‘405 with a variant of the XOR operation as disclosed by Parady ‘143 (i.e., apply a bitwise XOR operation on selected bits of the input value). The cache indexing scheme described by Parady ‘143 has the benefit of reducing conflict misses. One would have been motivated to do so as a simple substitution of one known element for another to obtain predictable results.
Claim 11 is a system claim comprising a memory and one or more processors that corresponds to claim 3. Hence, claim 11 is rejected for the same reasons as claim 3 (Ma ‘405 in view of Parady ‘143).
As per claim 12, Ma ‘405 in view of Parady ‘143 disclose the system of claim 9 (supra). In addition, Ma ‘405 discloses wherein the lookup operation comprises performing a set selection for a set associative cache stored in the memory in accordance with the output value (col. 11, lines 63-67, “The table 250 includes respective equations for performing calculations 254 for computing the first location 226a, second location 226b, and third location 226c, that represent the coordinates for the cache location 222 within the SA cache 214”; examiner’s note: “SA” stands for set-associative).
Claims 17 and 19-20 are computer readable medium claims storing instructions that correspond to claims 9 and 11-12. Hence, claims 17 and 19-20 are rejected for the same reasons as claims 9 and 11-12 (Ma ‘405 in view of Parady ‘143).
Claims 2, 10 and 18 are rejected under 103 as being unpatentable over Ma ‘405 in view of Parady ‘143, and further in view of Chirca et al. US 20140115279 (hereinafter Chirca ‘279) or Adams US 5889996 (hereinafter Adams ‘996).
As per claim 2, Ma ‘405 in view of Parady ‘143 disclose the method of claim 1 (supra). In addition, Ma ‘405 discloses the method further comprising forming a [truncated] input value [formed by truncating a plurality of least significant bits from the input value], wherein the second hashing operation comprises a second modulo hashing operation in which the [truncated] input value is hashed into the second integer (fig. 2B-2, ID: 31, “SECOND LOCATION CALCULATION (e.g., COLUMN): addr’[A+B-1:A],” “THIRD LOCATION CALCULATION (e.g., SET): addr’[A+B+C-1:A+B]”).
Although Ma ‘405 does not disclose truncating a plurality of LSBs from the input value, where the truncated input value is hashed into the second integer, Ma ‘405 discloses that “[t]he first, second, and third locations in the first, second, and third dimensions, respectively, may correspond to a row, column, and set, respectively, of the SA cache … the SA cache is not limited to having its cache locations identified by a row, column, and set. For example, a cache location may be identified by a row and set. Alternatively, the cache location may be identified [by] a chip, row, column, and set.” (Col. 2, lines 44-51). Moreover, Chirca ‘279 discloses “[t]he following description mentions access addresses. It is known in the art that these addresses need not be the complete endpoint memory address [sic] a number of least significant bits of these addresses could be truncated so that the addresses refer to a larger quantity of data such as a whole cache line” (para. 0062). Furthermore, Adams ‘996 discloses truncating a plurality of LSBs from the input value, where the truncated input value is hashed into the second integer. See Adams ‘996, col. 14, lines 27-35 (“A significant feature of a direct-mapped cache is that every block of memory within the source memory device 20 has a specific cache line 140 to which it will be cached any time it is cached. In one scheme, the least significant bits in an address corresponding to a block within a memory device may be truncated to the same size as the address of a cache line 140. Thus, every block of memory 20 is assigned to a cache line 140 having the same least significant bit address.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to truncate a plurality of least significant bits from the input value and to perform the second modulo hashing operation on the truncated input value. One would have been motivated to do so for cache placement schemes in which an access address for a set is larger than a retrievable datum (e.g., a byte), and based on the design criteria of the cache (e.g., a one-way set associative cache) to attain the benefits of such a design (e.g., a placement policy that is simple and efficient).
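For illustration only, the two-stage operation as mapped above may be sketched as follows. The constants (prime, power-of-two exponent, number of truncated bits) are hypothetical and are not taken from any cited reference; the sketch merely shows a hash into a number of bins equal to the product of a single prime greater than two and a single power of two greater than one, with the second modulo operation applied to a truncated input value.

```python
P = 3        # first integer: a prime greater than two (hypothetical value)
B = 4        # second integer is 2**B = 16, a power of two greater than one
TRUNC = 6    # number of least significant bits truncated (hypothetical value)

def two_stage_hash(addr: int) -> tuple[int, int]:
    first = addr % P                # first modulo hashing operation on the full input value
    truncated = addr >> TRUNC      # truncate a plurality of least significant bits
    second = truncated % (1 << B)  # second modulo hashing operation on the truncated value
    return first, second

def bin_index(addr: int) -> int:
    # Combine the two results to identify one of P * 2**B bins.
    first, second = two_stage_hash(addr)
    return first * (1 << B) + second
```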
Claim 10 is a system claim comprising a memory and one or more processors that correspond to claim 2. Hence, claim 10 is rejected for the same reasons as claim 2 (Ma ‘405 in view of Parady ‘143 and Chirca ‘279 or Adams ‘996).
Claim 18 is a computer readable medium claim storing instructions that correspond to claim 2. Hence, claim 18 is rejected for the same reasons as claim 2 (Ma ‘405 in view of Parady ‘143 and Chirca ‘279 or Adams ‘996).
Claims 5-6, 8, 13-14 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Ma ‘405 in view of Parady ‘143, and further in view of Yasue US 20050268038 (hereinafter Yasue ‘038).
As per claim 5, Ma ‘405 in view of Parady ‘143 disclose the method of claim 4 (supra). Ma ‘405 does not disclose, but Yasue ‘038 discloses wherein the input value corresponds to a virtual address in a cache associated with a graphics processor (para. 0028, “In accordance with one or more embodiments of the invention, however, the number of data accesses to the system memory 106 may be reduced by way of a software implemented cache within the local memory 104…. The software invoked cache memory areas 120, however, are not implemented using such hardware, rather these areas are formed using software code. For example, with reference to FIG. 3, the processor 102 may specify a number of parameters of the software cache 120A utilizing the API code. At action 300, the processor 102 may specify a cache entry size, which indicates a number of cache lines 122 to include in the software invoked cache memory area 120A. As illustrated in FIG. 2, there may be any number of lines specified, such as four. The processor 102 may also specify a line size using the API code, where the line size specifies the extent of each cache line 122 within the software invoked cache memory area(s) 120A-N.”; para. 0066, “The MMU 562 is preferably operable to translate effective addresses (taken from DMA commands) into real addresses for memory access”; para. 0054, “Some capabilities of the SPU 508 include graphics geometry pipelines, surface subdivision, Fast Fourier Transforms, image processing keywords, stream processing, MPEG encoding/decoding…”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ma ‘405 such that the input value corresponds to a virtual address in a cache associated with a graphics processor as taught by Yasue ‘038. 
One would have been motivated to do so to apply a configurable and efficient indexing scheme for mapping memory addresses, as described by Ma ‘405, in a graphics processor with a software-implemented memory (programmable cache) that requires a vast number of data accesses and computations. See Ma ‘405, col. 1, lines 54-61; Yasue ‘038, paras. 0004 and 0010.
As per claim 6, Ma ‘405 in view of Parady ‘143 and Yasue ‘038 disclose the method of claim 5 (supra). Moreover, Ma ‘405 discloses wherein the second value corresponds to a number of sets in the cache (col. 4, lines 39-45, “For example, and without loss of generality, an SA cache may be organized into rows and columns of banks. The number of sets in the SA cache may be given by S=X×Y×Z, where S is the total number of sets, X is the number of columns of banks, Y is the number of rows of banks, and Z is the number of sets within each bank. The sets in the SA cache can be numbered from 0 to S−1.”; fig. 2B-2, ID:31 [3*(2A); 2B; 2C]).
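The relationship quoted above (S = X×Y×Z, with sets numbered 0 to S−1) can be checked with a short computation. The bank dimensions below are hypothetical values chosen for illustration and do not come from Ma ‘405; the numbering function is likewise only one natural way to enumerate the sets.

```python
X = 3   # columns of banks (hypothetical)
Y = 4   # rows of banks (hypothetical)
Z = 8   # sets within each bank (hypothetical)

S = X * Y * Z   # total number of sets, numbered 0 to S - 1

def set_number(col: int, row: int, set_in_bank: int) -> int:
    # Enumerate banks column-major, then index the set within the bank.
    return (col * Y + row) * Z + set_in_bank
```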
As per claim 8, Ma ‘405 in view of Parady ‘143 disclose the method of claim 1 (supra). Ma ‘405 does not disclose, but Yasue ‘038 discloses wherein a plurality of SIMD units perform the hash function in parallel (fig. 4, reference nos. 310-322; figs. 5-7; para. 0051, “The PU 504 can be, e.g., a standard processor capable of stand-alone processing of data and applications. In operation, the PU 504 preferably schedules and orchestrates the processing of data and applications by the sub-processing units. The sub-processing units preferably are single instruction, multiple data (SIMD) processors. Under the control of the PU 504, the sub-processing units perform the processing of these data and applications in a parallel and independent manner”; para. 0052, “It is noted that the PU 504 may be implemented by one of the sub-processing units 508 taking on the role of a main processing unit that schedules and orchestrates the processing of data and applications by the sub-processing units 508”; para. 0055, “The sub-processing unit 508 includes two basic functional units, namely an SPU core 510A and a memory flow controller (MFC) 510B. The SPU core 510A performs program execution, data manipulation, etc., while the MFC 510B performs functions related to data transfers between the SPU core 510A and the DRAM 514 of the system”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ma ‘405 such that a plurality of SIMD units perform the hash function in parallel. One would have been motivated to do so to utilize the performance improvements afforded by SIMD parallel processing as disclosed by Yasue ‘038 and as known to one of ordinary skill in the art.
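The data-parallel character of the cited combination can be illustrated as follows. Because each input value is hashed independently of every other, lane i reads only input i and writes only output i, which is the access pattern SIMD units exploit. The sketch below uses a plain element-wise loop as a scalar stand-in for SIMD lanes, and the hash constants are hypothetical, not taken from any cited reference.

```python
P, B = 3, 4  # hypothetical constants: prime > 2 and power-of-two exponent

def hash_fn(addr: int) -> int:
    # Bins number P * 2**B; combine the modulo-P and modulo-2**B results.
    return (addr % P) * (1 << B) + (addr % (1 << B))

def simd_hash(addrs: list[int]) -> list[int]:
    # No cross-lane dependence: each element is hashed independently,
    # so the per-element work maps directly onto parallel SIMD lanes.
    return [hash_fn(a) for a in addrs]
```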
Claims 13-14 and 16 are system claims comprising a memory and one or more processors that correspond to claims 5-6 and 8. Hence, claims 13-14 and 16 are rejected for the same reasons as claims 5-6 and 8 (Ma ‘405 in view of Parady ‘143 and Yasue ‘038).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUNG W KIM whose telephone number is (571)272-3804. The examiner can normally be reached Monday-Friday, 10 a.m. - 6 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amy Cohen Johnson can be reached at 571-272-2238. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JUNG W KIM/Supervisory Patent Examiner, Art Unit 2494