DETAILED ACTION
The instant application having Application No. 17/884,777 filed on 8/10/2022 is presented for examination by the examiner. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-5, 7-12, 14-18, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Gueron (US 2015/0067302).
As per Claim 1, Gueron discloses a processor, comprising: an instruction fetch unit that fetches instructions to be executed (Figures 4A-B, 19A and Paragraph 0188, apparatus 1901 is part of architecture core 490 which includes instruction fetch unit 438);
an architected register file including a plurality of registers for storing source and destination operands (Figures 4B, 19A and Paragraphs 0091, 0117 and 0189, execution engine unit 450 includes physical register files 458 which are architectural registers for storing source and destination operands);
and an execution unit for executing a Galois multiply instruction (Figures 19A-B, 20D and Paragraphs 0188-0190 and 0196, apparatus 1901 is implemented by e.g. execution unit(s) 462 for executing a binary finite field multiplication instruction over a Galois field);
wherein the execution unit includes: a carryless multiplier configured to multiply operands of the Galois multiply instruction to generate a product (Figure 19A and Paragraph 0189, carry-less multiplier 1913);
and a modular reduction circuit configured to receive the product and determine, based on a logical combination of the product and a fixed polynomial, a reduced product having a fewer number of bits than the product (Figures 18A-19A and Paragraphs 0188-0189 and 0196, 15-bit product is reduced modulo an irreducible polynomial to an 8-bit reduced product via reduction unit 1917, which implements XORs and shifts, i.e. logical combinations, of the product and the irreducible polynomial in order to perform reduction, i.e. via one of 1801-1804);
wherein the execution unit is configured to store the reduced product to the architected register file as a result of the Galois multiply instruction (Figure 20D and Paragraphs 0117, 0189 and 0196, the reduced product is stored in an SIMD destination register, i.e. physical register file unit(s) 458, which are architectural registers).
As per Claim 2, Gueron discloses the processor of claim 1, wherein the fixed polynomial is g(x)=1+X+x^2+x^7+x^128 (Figure 18C and Paragraph 0183, the irreducible polynomial in GF(256) is p=x128+x7+x2+x+1).
As per Claim 3, Gueron discloses the processor of claim 1, wherein: the product includes a high part including high-order bits of the product and a low part including low-order bits of the product, the modular reduction circuit is configured to compute a first result equivalent to a carryless multiplication of the high part and the fixed polynomial, wherein the modular reduction circuit includes: shift circuitry that applies multiple different bit position shifts to the high part of the product consistent with asserted bits in the fixed polynomial, and bitwise exclusive OR (XOR) circuitry that logically combines multiple instances of the high part of the product having different respective bit position shifts applied by the shift circuitry (Figure 18A and Paragraphs 0179-0180, the product q[15:0] includes high part qH and low part qL, wherein reduction circuit includes shifting circuitry and XOR circuitry to perform shifts of the high part in accordance with the irreducible polynomial and XOR operations on the shifted high parts, i.e. qH<<4 XOR qH<<3 XOR qH<<1 XOR qH in accordance with polynomial p=x8+x4+x3+x+1 in order to compute q mod p).
As per Claim 4, Gueron discloses the processor of claim 3, wherein: the shift circuitry is further configured to apply multiple different bit position shifts to the high part of the first result consistent with asserted bits in the fixed polynomial; the bitwise exclusive OR (XOR) circuitry is further configured to logically combine multiple instances of the high part of the first result having different respective bit position shifts applied by the shift circuitry to obtain a second result, wherein the bitwise XOR circuitry generates the reduced product based on the first result, the second result, and the low part of the product (Figure 18A and Paragraphs 0179-0180, the first result T includes high part TH and low part TL, wherein reduction circuit includes shifting circuitry and XOR circuitry to perform shifts of the high part in accordance with the irreducible polynomial and XOR operations on the shifted high parts, i.e. TH<<4 XOR TH<<3 XOR TH<<1 XOR TH in accordance with polynomial p=x8+x4+x3+x+1 in order to compute the reduced product q mod p =
T
L
⨁
T
H
≪
4
⨁
T
H
≪
3
⨁
T
H
≪
1
⨁
T
H
wherein T is based on the low part of the product and the shifted versions of the high part of the product).
As per Claim 5, Gueron discloses the processor of claim 3, wherein the bitwise exclusive OR (XOR) circuitry includes at least two stages of bitwise XOR circuitry (Figure 18A, reduction circuit includes XOR stage 1825 and XOR stage 1835).
As per Claim 7, Gueron discloses the processor of claim 1, wherein: the carryless multiplier is a first multiply-multiply engine; the execution unit includes a second multiply-multiply engine, wherein the first and second multiply-multiply engines have a first data width; the operands include first and second operands having a second data width that is an integer multiple of the first data width; and the first and second multiply-multiply engines are configured to multiply subsets of the first and second operands in parallel (Figure 19B and Paragraph 0191, a plurality of carry-less multiply engines 1902 each compute 8-bit multiplication on subsets of the input operands 1920 and 1922 in parallel, i.e. g0∙h0 and g1∙h1, each input operand comprising e.g. 64 bits, i.e. eight 8-bit elements).
As per Claim 8, Gueron discloses a data processing system, comprising: multiple processors, including the processor of claim 1; a shared memory; and a system interconnect communicatively coupling the shared memory and the multiple processors (Figures 5-7 and Paragraphs 0026-0028, 0123-0124, 0129-0131 and 0138, the system may comprise multiple cores/processors comprising shared cache and/or memory coupled to the processors via a bus or interconnect).
As per Claim 9, Gueron discloses a method of data processing in a processor, said method comprising: fetching, by an instruction fetch unit, instructions to be executed by the processor wherein the instructions include a Galois multiply instruction (Figures 4A-B, 19A and Paragraph 0188, apparatus 1901 is part of architecture core 490 which includes instruction fetch unit 438);
and based on receiving the Galois multiply instruction, an execution unit of the processor executing the Galois multiply instruction (Figures 19A-B, 20D and Paragraphs 0188-0190 and 0196, apparatus 1901 is implemented by e.g. execution unit(s) 462 for executing a binary finite field multiplication instruction over a Galois field);
wherein the executing includes: multiplying, by a carryless multiplier, operands of the Galois multiply instruction to generate a product (Figure 19A and Paragraph 0189, carry-less multiplier 1913);
a modular reduction circuit receiving the product and determining, based on a logical combination of the product and a fixed polynomial, a reduced product having a fewer number of bits than the product (Figures 18A-19A and Paragraphs 0188-0189 and 0196, 15-bit product is reduced modulo an irreducible polynomial to an 8-bit reduced product via reduction unit 1917, which implements XORs and shifts, i.e. logical combinations, of the product and the irreducible polynomial in order to perform reduction, i.e. via one of 1801-1804);
and storing the reduced product to an architected register file of the processor as a result of the Galois multiply instruction (Figures 4B, 19A, 20D and Paragraphs 0091, 0117, 0189 and 0196, the reduced product is stored in an SIMD destination register, i.e. physical register file unit(s) 458, which are architectural registers).
As per Claims 10-12 and 14, they recite method(s) comprising the same limitations as recited in Claims 2-4 and 7. Thus, Claims 10-12 and 14 are rejected under the same rationale as presented in the rejection(s) of Claims 2-4 and 7 above.
As per Claim 15, Gueron discloses a design structure tangibly embodied in a machine-readable storage device for designing, manufacturing, or testing an integrated circuit (Paragraph 0205, machine-readable media comprising design data such as HDL);
the design structure comprising: a processor, including: an instruction fetch unit that fetches instructions to be executed (Figures 4A-B, 19A and Paragraph 0188, apparatus 1901 is part of architecture core 490 which includes instruction fetch unit 438);
an architected register file including a plurality of registers for storing source and destination operands (Figures 4B, 19A and Paragraphs 0091, 0117 and 0189, execution engine unit 450 includes physical register files 458 which are architectural registers for storing source and destination operands);
and an execution unit for executing a Galois multiply instruction (Figures 19A-B, 20D and Paragraphs 0188-0190 and 0196, apparatus 1901 is implemented by e.g. execution unit(s) 462 for executing a binary finite field multiplication instruction over a Galois field);
wherein the execution unit includes: a carryless multiplier configured to multiply operands of the Galois multiply instruction to generate a product (Figure 19A and Paragraph 0189, carry-less multiplier 1913);
and a modular reduction circuit configured to receive the product and determine, based on a logical combination of the product and a fixed polynomial, a reduced product having a fewer number of bits than the product (Figures 18A-19A and Paragraphs 0188-0189 and 0196, 15-bit product is reduced modulo an irreducible polynomial to an 8-bit reduced product via reduction unit 1917, which implements XORs and shifts, i.e. logical combinations, of the product and the irreducible polynomial in order to perform reduction, i.e. via one of 1801-1804);
wherein the execution unit is configured to store the reduced product to the architected register file as a result of the Galois multiply instruction (Figure 20D and Paragraphs 0117, 0189 and 0196, the reduced product is stored in an SIMD destination register, i.e. physical register file unit(s) 458, which are architectural registers).
As per Claims 16-18 and 20, they recite device(s) comprising the same limitations as recited in Claims 2-4 and 7. Thus, Claims 16-18 and 20 are rejected under the same rationale as presented in the rejection(s) of Claims 2-4 and 7 above.
Allowable Subject Matter
Claims 6, 13 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Suresh et al. (US 10,496,373) – discloses a 128-bit carry-less multiplier and reduction circuit, wherein a 255-bit Galois multiplication product is reduced to a 128-bit result
Bradbury (US 2014/0208079) – discloses a Vector Galois Field Multiply Sum and Accumulate instruction which performs carryless multiplication and uses a fetch unit and architected registers for storing input and output operands
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW SANDIFER whose telephone number is (571)270-5175. The examiner can normally be reached Mon-Fri 9:30am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James Trujillo can be reached at (571) 272-3677. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MATTHEW D SANDIFER/Primary Examiner, Art Unit 2151