Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Acknowledgment is made of applicant's claim for foreign priority based on application IN 202241044441, filed on 08/03/22. It is noted, however, that applicant has not filed a certified copy of the foreign priority application as required by 37 CFR 1.55.

Claim Objections

Claims 1-20 are objected to because of the following informalities.

Claim 1, line 7, and claims 4-6 each recite “the arithmetic operation”. This limitation lacks antecedent basis. Antecedent basis is present for “the arithmetic operation execution circuitry”. The first recitation of the arithmetic operation performed by the execution circuitry should read “an arithmetic operation”. Claims 2-7 inherit the same deficiency as claim 1 based on dependence. Claims 8, 11-13, 15-16, and 18-19 each recite substantially the same limitation and are objected to for the same reasons. Claims 9-14 inherit the same deficiency as claim 8 based on dependence. Claims 16-20 inherit the same deficiency as claim 15 based on dependence.

Claim 1, line 11, and claims 5-6 each recite “the execution circuitry”. This limitation lacks antecedent basis. Antecedent basis is present for “the arithmetic operation execution circuitry”. Claims 2-7 inherit the same deficiency as claim 1 based on dependence. Claims 8 and 12-13 each recite substantially the same limitation and are objected to for the same reasons. Claims 9-14 inherit the same deficiency as claim 8 based on dependence.

Claims 2-3 and 9-10 recite “the first source operand”. This limitation lacks antecedent basis. Antecedent basis is present for “the first packed data source operand”.

Appropriate correction is required.
Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection.
A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claims 1-3, 8-10, and 15 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-3, 8-10, and 15 of copending Application No. 17958370 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other. Claims 1-3, 8-10, and 15 of the reference application would anticipate claims 1-3, 8-10, and 15 of the present application: the FP8 floating point scale operation recited in the reference application is a species of the arithmetic operation recited in the present application, and a species anticipates the genus. This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented. See the representative claim comparison below.
Claim comparison: 17958373 (present application) vs. 17958370 (reference application)

17958373, claim 1:
1. An apparatus comprising: decode circuitry to decode an instance of a single instruction, the single instruction to include fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of location of a packed data destination operand, wherein the opcode is to indicate an arithmetic operation execution circuitry is to perform, for each data element position of the identified packed data source operands, the arithmetic operation on 8-bit floating point data elements in that data element position in 8-bit floating point format and store a result of each arithmetic operation into a corresponding data element position of the identified packed data destination operand; and the execution circuitry to execute the decoded instruction according to the opcode.

17958370, claim 1:
1. An apparatus comprising: decode circuitry to decode an instance of a single instruction, the single instruction to include fields for an having fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operands, a floating point scale operation of a FP8 data element of the first packed data source by multiplying the data element by a power of 2 value, wherein a value of the exponent of the power of 2 value is a floor value of a FP8 data element of the second packed data source, and store a result of the floating point scale operation into a corresponding data element position of the packed data destination operand; and the execution circuitry to execute the decoded instruction according to the opcode.

Claims 1 and 15 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 and 16 of copending Application No. 17958369 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other. Claims 1 and 16 of the reference application would anticipate claims 1 and 15 of the present application: the FP8 fused multiply-accumulate operation recited in the reference application is a species of the arithmetic operation recited in the present application. See the representative claim mapping below with respect to the reference application.

Claim comparison: 17958373 (present application) vs. 17958369 (reference application)

17958373, claim 1:
1. An apparatus comprising: decode circuitry to decode an instance of a single instruction, the single instruction to include fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of location of a packed data destination operand, wherein the opcode is to indicate an arithmetic operation execution circuitry is to perform, for each data element position of the identified packed data source operands, the arithmetic operation on 8-bit floating point data elements in that data element position in 8-bit floating point format and store a result of each arithmetic operation into a corresponding data element position of the identified packed data destination operand; and the execution circuitry to execute the decoded instruction according to the opcode.

17958369, claim 1:
1. An apparatus comprising: decoder circuitry to decode a single instruction, the single instruction to include fields for an opcode, an identification of location of a packed data source/destination operand (a first packed data source operand), an identification of a location of a second packed data source operand, an identification of a location of a third packed data source operand, and an identification of location of a packed data source/destination operand, wherein the opcode is to indicate operand ordering and that execution circuitry is to, per data element position, perform a FP8 value fused multiply-accumulate operation using the first, second, and third packed data source operands and store a result in a corresponding data element position of the source/destination operand, wherein the FP8 value has an 8-bit floating point format that comprises one bit for a sign, 4 bits for an exponent, and three bits for a fraction; and execution circuitry to execute the decoded single instruction according to the opcode.

This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Claims 1-4, 15, and 8 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-4, 17, and 25, respectively, of U.S. Patent No. 12517728 (reference patent). Although the claims at issue are not identical, they are not patentably distinct from each other. Claims 1-4, 17, and 25 of the reference patent would anticipate claims 1-4, 15, and 8, respectively, of the present application: the prefix sum operation recited in the reference patent is a species of the arithmetic operation recited in the present application. See the representative claim mapping below with respect to the reference patent.

Claim comparison: 17958373 (present application) vs. U.S. Patent No. 12517728 (reference patent)

17958373, claim 1:
1. An apparatus comprising: decode circuitry to decode an instance of a single instruction, the single instruction to include fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of location of a packed data destination operand, wherein the opcode is to indicate an arithmetic operation execution circuitry is to perform, for each data element position of the identified packed data source operands, the arithmetic operation on 8-bit floating point data elements in that data element position in 8-bit floating point format and store a result of each arithmetic operation into a corresponding data element position of the identified packed data destination operand; and the execution circuitry to execute the decoded instruction according to the opcode.

12517728, claim 1:
1. An apparatus comprising: decoder circuitry to decode an instance of a single instruction, the single instruction to include fields for an opcode, one or fields to reference a first source operand, one or fields to reference a second source operand, one or fields to reference a destination operand, wherein the opcode is to indicate that execution circuitry is, in response to a decoded instance of the single instruction, to at least: perform a prefix sum by for each non-masked data element position of the second source operand adding a data element of that data element position to each data element of preceding data element positions and adding at least one data element of a defined data element position of the first source operand, and store each prefix sum for each data element position of the second source operand into a corresponding data element position of the destination operand; and execution circuitry configured to execute the decoded instruction according to the opcode.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.— The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 1, lines 6-7, recites “the identified packed data source operands”. This limitation lacks antecedent basis. It is unclear whether the phrase refers to multiple first packed data source operands, multiple second packed data source operands, multiple first and multiple second packed data source operands, or something else. For purposes of examination, Examiner interprets the phrase as the identified first packed data source operand and the identified second packed data source operand. Claims 2-7 inherit the same deficiency as claim 1 based on dependence. Claims 8 and 15 each recite substantially the same limitation and are rejected for the same reasons. Claims 9-14 inherit the same deficiency as claim 8 based on dependence. Claims 16-20 inherit the same deficiency as claim 15 based on dependence.

Claims 5-6, 12-13, and 18-19 each recite “the 8-bit floating point data”. This limitation lacks antecedent basis. It is unclear whether the phrase refers to the “8-bit floating point data elements”, the “8-bit floating point format”, or something else. For purposes of examination, Examiner interprets the phrase as “the 8-bit floating point data elements”.
Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-3, 5-6, 8-10, 12-13, 15, and 18-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by US 20180308207 A1, Appu et al. (hereinafter “Appu”).
Regarding claim 1, Appu teaches the following: decode circuitry to decode an instance of a single instruction, the single instruction to include fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand and an identification of location of a packed data destination operand ([0083]; Fig. 22, element 2240, for decode circuitry to decode an instance of a single instruction; Fig. 22, 128-bit instruction 2210; opcode 2212; [0247], SIMD instructions include various data elements stored as a packed data type; SRC0 2220 for identification of a location of a first packed data source operand; SRC1 2222 for identification of a location of a second packed data source operand; DEST 2218 for identification of location of a packed data destination operand), wherein the opcode is to indicate an arithmetic operation execution circuitry is to perform, for each data element position of the identified packed data source operands, the arithmetic operation on 8-bit floating point data elements in that data element position in 8-bit floating point format ([0253], for each format, instruction opcode 2212 defines the operation that the execution unit is to perform; [0247], the various packed data elements include 8-bit data elements, with different vector widths possible; [0154], FP8 (8-bit floating point) operations, sorting for the arithmetic operations) and store a result of each arithmetic operation into a corresponding data element position of the identified packed data destination operand; and the execution circuitry to execute the decoded instruction according to the opcode ([0253]; Fig. 22, element 2240).

Regarding claim 2, in addition to the teachings addressed in the claim 1 analysis, Appu teaches the following: wherein the field for the identification of the first source operand is to identify a vector register ([0247]).
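For illustration only, the per-element operation recited in claim 1 (the same arithmetic operation applied independently at each data element position of two packed FP8 source operands, with each result stored in the corresponding destination position) can be sketched in Python. Because claim 1 does not fix the bit allocation, the sketch assumes the E4M3 encoding described in Micikevicius (cited in the § 103 rejections below): 1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, and S.1111.111 reserved for NaN. The helper names `decode_e4m3`, `encode_e4m3`, and `packed_fp8_op`, the round-to-nearest-even rounding on store, and saturation to ±448 are hypothetical choices of this sketch, not features attributed to the claims or to Appu.

```python
# Hypothetical sketch: packed FP8 (E4M3) element-wise arithmetic.
# Assumptions: E4M3 per Micikevicius (bias 7, no infinities, S.1111.111 = NaN),
# round-to-nearest-even on store, saturation to +/-448.
import math


def decode_e4m3(code: int) -> float:
    """Decode one E4M3 byte into a Python float."""
    sign = -1.0 if code & 0x80 else 1.0
    exp = (code >> 3) & 0xF   # 4-bit biased exponent
    man = code & 0x7          # 3-bit mantissa (fraction)
    if exp == 0xF and man == 0x7:
        return math.nan       # the NaN encoding (one per sign)
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6   # subnormal
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


# Every finite E4M3 value paired with its code, used for rounding on store.
_FINITE = [(decode_e4m3(c), c) for c in range(256)
           if not math.isnan(decode_e4m3(c))]


def encode_e4m3(x: float) -> int:
    """Round x to the nearest E4M3 code, ties to even mantissa, saturating."""
    if math.isnan(x):
        return 0x7F
    x = max(-448.0, min(448.0, x))   # saturate instead of overflowing
    # Nearest value wins; on a tie, prefer the code whose lowest bit is 0.
    return min(_FINITE, key=lambda vc: (abs(vc[0] - x), vc[1] & 1))[1]


def packed_fp8_op(src1: bytes, src2: bytes, op) -> bytes:
    """Apply op at each data element position and pack the FP8 results."""
    if len(src1) != len(src2):
        raise ValueError("source operands must have the same element count")
    return bytes(encode_e4m3(op(decode_e4m3(a), decode_e4m3(b)))
                 for a, b in zip(src1, src2))
```

For example, with an addition operation, source elements 1.0 and 2.0 (codes 0x38, 0x40) combined with 2.0 and 3.0 (codes 0x40, 0x44) produce 3.0 and 5.0 (codes 0x44, 0x4A). In this sketch the intermediate result is computed in Python's binary64 float and rounded once to E4M3; whether any particular hardware computes that way is not asserted here.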
Regarding claim 3, in addition to the teachings addressed in the claim 1 analysis, Appu teaches the following: wherein the field for the identification of the first source operand is to identify a memory location ([0247], register).

Regarding claim 5, in addition to the teachings addressed in the claim 1 analysis, Appu teaches the following: wherein the execution circuitry is to upscale the 8-bit floating point data prior to the arithmetic operation ([0157-0158], [0309], e.g., FP64, FP32, FP16, INT32, INT16).

Regarding claim 6, in addition to the teachings addressed in the claim 5 analysis, Appu teaches the following: wherein the execution circuitry is to downscale the 8-bit floating point data prior to the arithmetic operation ([0157-0158], [0309], [0208], 4-bit integer).

Regarding claim 8, Appu teaches the following: memory to store an instance of a single instruction ([0141]); decode circuitry to decode an instance of a single instruction, the single instruction to include fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand and an identification of location of a packed data destination operand ([0083]; Fig. 22, element 2240, for decode circuitry to decode an instance of a single instruction; Fig. 22, 128-bit instruction 2210; opcode 2212; [0247], SIMD instructions include various data elements stored as a packed data type; SRC0 2220 for identification of a location of a first packed data source operand; SRC1 2222 for identification of a location of a second packed data source operand; DEST 2218 for identification of location of a packed data destination operand), wherein the opcode is to indicate an arithmetic operation execution circuitry is to perform, for each data element position of the identified packed data source operands, the arithmetic operation on 8-bit floating point data elements in that data element position in 8-bit floating point format ([0253], for each format, instruction opcode 2212 defines the operation that the execution unit is to perform; [0247], the various packed data elements include 8-bit data elements, with different vector widths possible; [0154], FP8 (8-bit floating point) operations, sorting for the arithmetic operations) and store a result of each arithmetic operation into a corresponding data element position of the identified packed data destination operand; and the execution circuitry to execute the decoded instruction according to the opcode ([0253]; Fig. 22, element 2240).

Regarding claim 9, in addition to the teachings addressed in the claim 8 analysis, Appu teaches the following: wherein the field for the identification of the first source operand is to identify a vector register ([0247]).

Regarding claim 10, in addition to the teachings addressed in the claim 8 analysis, Appu teaches the following: wherein the field for the identification of the first source operand is to identify a memory location ([0247], register).

Regarding claim 12, in addition to the teachings addressed in the claim 8 analysis, Appu teaches the following: wherein the execution circuitry is to upscale the 8-bit floating point data prior to the arithmetic operation ([0157-0158], [0309], e.g., FP64, FP32, FP16, INT32, INT16).

Regarding claim 13, in addition to the teachings addressed in the claim 8 analysis, Appu teaches the following: wherein the execution circuitry is to downscale the 8-bit floating point data prior to the arithmetic operation ([0157-0158], [0309], [0208], 4-bit integer).

Claims 15, 18, and 19 are directed to methods that would be practiced by the apparatus of claims 1, 5, and 6, respectively, as configured. All steps recited in the methods of claims 15, 18, and 19 would be practiced by the apparatus of claims 1, 5, and 6, respectively, as configured. The analyses of claims 1, 5, and 6 apply equally to claims 15, 18, and 19, respectively.
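For illustration only, the upscaling recited in claims 5, 12, and 18 (widening the 8-bit floating point data before the arithmetic operation) can be sketched as a bit-level conversion from E4M3 to IEEE 754 binary16. The E4M3 encoding, the choice of binary16 as the wider format, and the function names `e4m3_to_fp16_bits` and `upscale_e4m3` are assumptions of this sketch; neither the claims nor Appu is represented as specifying them. Because binary16 has more exponent and mantissa bits than E4M3, every E4M3 value, including the E4M3 subnormals, maps to an exactly representable binary16 value, so this widening step loses no information.

```python
# Hypothetical sketch: widening ("upscaling") an E4M3 value to IEEE 754 binary16.
# Assumptions: E4M3 as in Micikevicius (bias 7, S.1111.111 = NaN); binary16 target.
import struct


def e4m3_to_fp16_bits(code: int) -> int:
    """Return the binary16 bit pattern holding exactly the same value as code."""
    sign = (code & 0x80) << 8      # sign moves from bit 7 to bit 15
    exp = (code >> 3) & 0xF        # E4M3 exponent, bias 7
    man = code & 0x7               # E4M3 3-bit mantissa
    if exp == 0xF and man == 0x7:
        return sign | 0x7E00       # E4M3 NaN -> binary16 quiet NaN
    if exp == 0:
        if man == 0:
            return sign            # signed zero
        # An E4M3 subnormal (man * 2**-9) is a normal binary16 number:
        # normalize so the leading 1 bit becomes the implicit bit.
        p = man.bit_length() - 1
        return sign | ((p + 6) << 10) | ((man - (1 << p)) << (10 - p))
    # Normal number: re-bias the exponent (bias 7 -> 15), widen 3 -> 10 mantissa bits.
    return sign | ((exp + 8) << 10) | (man << 7)


def upscale_e4m3(code: int) -> float:
    """Interpret the widened binary16 pattern as a Python float."""
    bits = e4m3_to_fp16_bits(code)
    return struct.unpack("<e", struct.pack("<H", bits))[0]
```

For example, `upscale_e4m3(0x7E)` returns 448.0, the largest finite E4M3 magnitude. A downscaling step such as conversion to a 4-bit integer, by contrast, generally cannot preserve every E4M3 value.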
Regarding claim 20, in addition to the teachings addressed in the claim 15 analysis, Appu teaches the following: translating the single instruction to one or more instructions of a different instruction set architecture, wherein the executing the decoded instruction according to the opcode comprises executing the one or more instructions of the different instruction set architecture ([0212]).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4, 11, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Appu.

Regarding claim 4, in addition to the teachings addressed in the claim 1 analysis, Appu discloses that conventional systems support converting operands to a common format, and further discloses supporting mixed precision fused multiply-accumulate (FMAC) operations on operands of different precisions and formats ([0157]). Appu does not, however, explicitly disclose that the arithmetic operation of claim 1 is one of addition, multiplication, division, and subtraction.
However, it would have been obvious to one of ordinary skill in the art before the effective filing date to use Appu's decode circuitry and execution circuitry to decode and execute an instance of a single instruction as in claim 1, wherein the opcode indicates that the arithmetic operation is the FMAC disclosed by Appu, executed in the 8-bit floating point format disclosed by Appu. An FMAC operation multiplies two operands and adds the product to a third, so such an instruction performs a multiplication and an addition, which are among the claimed arithmetic operations. One of ordinary skill would have been motivated to do so to obtain the benefit of supporting FMAC in various precisions, including the 8-bit floating point format ([0157], [0154]).

Regarding claim 11, in addition to the teachings addressed in the claim 8 analysis, Appu discloses that conventional systems support converting operands to a common format, and further discloses supporting mixed precision fused multiply-accumulate (FMAC) operations on operands of different precisions and formats ([0157]). Appu does not, however, explicitly disclose that the arithmetic operation of claim 8 is one of addition, multiplication, division, and subtraction. However, it would have been obvious to one of ordinary skill in the art before the effective filing date to use Appu's decode circuitry and execution circuitry to decode and execute an instance of a single instruction as in claim 8, wherein the opcode indicates that the arithmetic operation is the FMAC disclosed by Appu, executed in the 8-bit floating point format disclosed by Appu. An FMAC operation multiplies two operands and adds the product to a third, so such an instruction performs a multiplication and an addition, which are among the claimed arithmetic operations. One of ordinary skill would have been motivated to do so to obtain the benefit of supporting FMAC in various precisions, including the 8-bit floating point format ([0157], [0154]).

Claim 16 is directed to a method that would be practiced by the apparatus of claim 4 as configured. All steps recited in the method of claim 16 would be practiced by the apparatus of claim 4 as configured. The claim 4 analysis applies equally to claim 16.

Claims 7, 14, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Appu in view of P. Micikevicius et al., FP8 Formats for Deep Learning, arXiv:2209.05433v2 [cs.LG], 29 Sep 2022 (hereinafter “Micikevicius”).

Regarding claim 7, Appu discloses the claim 1 limitations. Appu discloses the 8-bit floating point format but is silent as to the allocation of bit fields. However, in the same field of endeavor, Micikevicius discloses an 8-bit floating point binary interchange format consisting of two encodings, E4M3 (4-bit exponent and 3-bit mantissa) and E5M2 (5-bit exponent and 2-bit mantissa) (abstract). It would have been obvious to one of ordinary skill in the art before the effective filing date to choose the E4M3 format for the allocation of the bit fields in the 8-bit floating point format to achieve the desired dynamic range for the required arithmetic operation (Section 2, first paragraph).

Regarding claim 14, Appu discloses the claim 8 limitations. Appu discloses the 8-bit floating point format but is silent as to the allocation of bit fields. However, in the same field of endeavor, Micikevicius discloses an 8-bit floating point binary interchange format consisting of two encodings, E4M3 (4-bit exponent and 3-bit mantissa) and E5M2 (5-bit exponent and 2-bit mantissa) (abstract). It would have been obvious to one of ordinary skill in the art before the effective filing date to choose the E4M3 format for the allocation of the bit fields in the 8-bit floating point format to achieve the desired dynamic range for the required arithmetic operation (Section 2, first paragraph).

Claim 17 is directed to a method that would be practiced by the apparatus of claim 7 as configured. All steps recited in the method of claim 17 would be practiced by the apparatus of claim 7 as configured. The claim 7 analysis applies equally to claim 17.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20200387351 A1, Agrawal et al. (hereinafter “Agrawal”), discloses a fused multiply-multiply-accumulate (FMMA) unit executed in a single instruction that includes FP8 operands ([0012], [0024-0025]). US 20200225948 A1, Sim et al. (hereinafter “Sim”), discloses an apparatus for compressing floating-point values, including fetching instructions from a memory, decoding a single instruction, and performing operations on packed data, wherein the floating point operands include FP8 (abstract, [0007], [0081], [0089], claim 6).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to EMILY E LAROCQUE whose telephone number is (469) 295-9289. The examiner can normally be reached 10:00 am - 12:00 pm and 2:00 pm - 8:00 pm ET, M-F.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Caldwell, can be reached at 571-272-3702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EMILY E LAROCQUE/ Primary Examiner, Art Unit 2182