DETAILED ACTION
Status of Application
Claims 21-43 are pending in the present application.
The Preliminary Amendment filed 12/27/2024 has been entered.
Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 12/31/2024, 04/07/2025, and 07/30/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 21, 33, and 40 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 2, 9, and 16, respectively, of U.S. Patent No. 12,135,968 B2. Although the claims at issue are not identical, they are not patentably distinct from each other because the instant claims and the patented claims recite the same functions, differing only in terminology.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 21, 25-27, 29-30, 33, 35-37, 39-40, and 42 are rejected under 35 U.S.C. 103 as being unpatentable over Heinecke et al. (hereinafter Heinecke), US 20190079762 A1, in view of Das Sarma, US 20200349216 A1.
Referring to claims 21 and 33, taking claim 21 as exemplary, Heinecke discloses an apparatus comprising:
decoder circuitry to decode a single instruction [fig. 1, element 109], the single instruction having a first field to identify a first source single instruction, multiple data (SIMD) register [fig. 3B, field 326; paragraph 72, “Here, the source vector locations can be either in memory or in registers”; paragraph 45, “In some embodiments, computing system 100 is a SIMD processor”], a second field to identify a second source SIMD register [fig. 3B, field 328], and a third field to identify a destination SIMD register [fig. 3B, field 324]; and
execution circuitry [fig. 1, element 117] to perform operations corresponding to the single instruction, including to:
convert to a first plurality of floating-point data elements [fig. 3B, see pseudocode “convert_fp32_to_bfloat16” and fig. 3C];
convert to a second plurality of floating-point data elements [fig. 3B, see pseudocode “convert_fp32_to_bfloat16” and fig. 3C]; and
store the first and second pluralities of floating-point data elements in corresponding data element positions of the destination SIMD register [fig. 2D, operating on first and second sources with execution circuitry to store in destination 278].
Heinecke discloses converting data to a 16-bit floating-point format [fig. 3C] but does not explicitly disclose the first source SIMD register to store a first plurality of half-precision floating-point data elements, the second source SIMD register to store a second plurality of half-precision floating-point data elements;
converting the first plurality of half-precision floating-point data elements to a first plurality of 8-bit floating-point data elements;
converting the second plurality of half-precision floating-point data elements to a second plurality of 8-bit floating-point data elements.
However, Das Sarma discloses the first source SIMD register to store a first plurality of half-precision floating-point data elements, the second source SIMD register to store a second plurality of half-precision floating-point data elements [paragraphs 36, 40, 41, 65, 72, fig. 3, “With respect to floating-point formats, the node is configurable to operate in multiple formats such as 8-bit, 16-bit, and 32-bit formats”; “Depending on the computational goal, a different format may be used to represent a number value”; “data can be read to and from memory into and from load registers 305 and post-processing unit register file 307. The connection to the registers allows data values to be quickly stored in a register, for example, as arguments for a matrix or vector computation”; “In some embodiments, a 16-bit floating-point operand is represented with a single bit for a sign bit, 8-bits for the exponent, and 7-bits for the mantissa”; “the elements of the vector may be converted/formatted to 8-bit, 16-bit, or 32-bit elements depending on the precision needed”];
converting the first plurality of half-precision floating-point data elements to a first plurality of 8-bit floating-point data elements [paragraphs 36, 40, 41, 65, 72, fig. 3, “With respect to floating-point formats, the node is configurable to operate in multiple formats such as 8-bit, 16-bit, and 32-bit formats”; “Depending on the computational goal, a different format may be used to represent a number value”; “data can be read to and from memory into and from load registers 305 and post-processing unit register file 307. The connection to the registers allows data values to be quickly stored in a register, for example, as arguments for a matrix or vector computation”; “In some embodiments, a 16-bit floating-point operand is represented with a single bit for a sign bit, 8-bits for the exponent, and 7-bits for the mantissa”; “the elements of the vector may be converted/formatted to 8-bit, 16-bit, or 32-bit elements depending on the precision needed”];
converting the second plurality of half-precision floating-point data elements to a second plurality of 8-bit floating-point data elements [paragraphs 36, 40, 41, 65, 72, fig. 3, “With respect to floating-point formats, the node is configurable to operate in multiple formats such as 8-bit, 16-bit, and 32-bit formats”; “Depending on the computational goal, a different format may be used to represent a number value”; “data can be read to and from memory into and from load registers 305 and post-processing unit register file 307. The connection to the registers allows data values to be quickly stored in a register, for example, as arguments for a matrix or vector computation”; “In some embodiments, a 16-bit floating-point operand is represented with a single bit for a sign bit, 8-bits for the exponent, and 7-bits for the mantissa”; “the elements of the vector may be converted/formatted to 8-bit, 16-bit, or 32-bit elements depending on the precision needed”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teachings of Das Sarma in the apparatus of Heinecke such that the first source SIMD register stores a first plurality of half-precision floating-point data elements, the second source SIMD register stores a second plurality of half-precision floating-point data elements, the first plurality of half-precision floating-point data elements is converted to a first plurality of 8-bit floating-point data elements, and the second plurality of half-precision floating-point data elements is converted to a second plurality of 8-bit floating-point data elements, in order to provide system flexibility, which improves the bandwidth and performance of the system [Das Sarma, paragraph 36].
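For illustration only, the combined operation relied upon above can be modeled by the minimal Python sketch below. It rests on assumptions not taken from Heinecke or Das Sarma: the 8-bit format is a 1-5-2 (sign, exponent, mantissa) format sharing the half-precision exponent bias, conversion uses round-to-nearest-even, and the first source's converted elements occupy the lower destination lanes. The names fp16_bits, fp16_to_e5m2, and convert_two_sources are hypothetical.

```python
import struct


def fp16_bits(x):
    """Round a Python float to IEEE 754 binary16 and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def fp16_to_e5m2(h):
    """Convert a binary16 bit pattern to an assumed 8-bit 1-5-2 pattern.

    The 1-5-2 format is assumed to share binary16's sign bit and 5-bit exponent,
    so the result is the upper byte of h after rounding off the low 8 mantissa
    bits with round-to-nearest-even.
    """
    if (h & 0x7C00) == 0x7C00 and (h & 0x03FF) != 0:
        # NaN input: keep the sign and set a mantissa bit so the result is still NaN.
        return ((h >> 8) & 0xFF) | 0x02
    # Round-to-nearest-even bias: 0x7F, plus 1 when the retained LSB is odd.
    bias = 0x7F + ((h >> 8) & 1)
    # A carry out of the mantissa increments the exponent, so the largest
    # finite values overflow to infinity as IEEE rounding requires.
    return ((h + bias) >> 8) & 0xFF


def convert_two_sources(src1, src2):
    """Hypothetical model of the claimed single instruction.

    src1 and src2 stand in for the half-precision lanes of the two source SIMD
    registers; the returned list holds the destination's 8-bit lanes, lane 0
    first. Placing src1 in the lower lanes is an illustrative assumption.
    """
    dst = [fp16_to_e5m2(fp16_bits(x)) for x in src1]
    dst += [fp16_to_e5m2(fp16_bits(x)) for x in src2]
    return dst


print([hex(b) for b in convert_two_sources([1.0, 1.375], [-2.0, 1.2])])
# ['0x3c', '0x3e', '0xc0', '0x3d']
```

Because the assumed 1-5-2 format keeps the half-precision sign and exponent fields, the conversion in this sketch reduces to rounding away the low eight mantissa bits; struct.pack rejects inputs outside the half-precision range with an error.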
Referring to claims 25 and 35, taking claim 25 as exemplary, the modified Heinecke discloses the apparatus of claim 21, wherein the execution circuitry is to convert a half-precision floating-point data element of the first plurality of half-precision floating-point data elements to an 8-bit floating-point data element of the first plurality of 8-bit floating-point data elements based on a value in a register [Heinecke, paragraph 74, “advantageously perform rounding of normal numbers and considers a rounding_bias. The code illustrates that the format-convert instruction has an improved rounding behavior than just truncating. The rounding behavior of disclosed embodiments facilitates more accurate computation than conversion by truncation. In some embodiments, execution circuitry adheres to rounding behavior according to rounding rules promulgates as IEEE 754, for example, ‘NE’ which indicates rounding to nearest even. In some embodiments, the rounding behavior is specified by the instruction, for example by including a suffix, ‘NE,’ in the opcode to indicate rounding to Nearest Even. In other embodiments, the rounding behavior adopts a default behavior, like ‘NE.’ In yet other embodiments, the rounding behavior is controlled by an architectural model-specific register (MSR) that is configured by software”].
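As a hedged illustration of conversion "based on a value in a register," the sketch below selects between round-to-nearest-even (implemented with a rounding bias, in the spirit of the rounding_bias discussed in Heinecke, paragraph 74) and truncation according to a control value. The control-value encodings (0 and 1), the constant names, and the application of the bias technique to a half-precision-to-1-5-2 conversion are assumptions for illustration, not Heinecke's disclosed code.

```python
# Hypothetical encodings of the control value held in a register.
ROUND_NEAREST_EVEN = 0
ROUND_TOWARD_ZERO = 1  # truncation of the discarded mantissa bits


def fp16_to_e5m2(h, rounding_control):
    """Convert a binary16 bit pattern to an assumed 1-5-2 8-bit pattern using
    the rounding behavior selected by rounding_control."""
    if (h & 0x7C00) == 0x7C00 and (h & 0x03FF) != 0:
        return ((h >> 8) & 0xFF) | 0x02  # NaN stays NaN in every mode
    if rounding_control == ROUND_TOWARD_ZERO:
        # Sign-magnitude encoding: dropping low bits rounds the magnitude toward zero.
        return (h >> 8) & 0xFF
    if rounding_control == ROUND_NEAREST_EVEN:
        bias = 0x7F + ((h >> 8) & 1)  # rounding bias, ties to even
        return ((h + bias) >> 8) & 0xFF
    raise ValueError("unsupported rounding control value")


h = 0x3CCD  # binary16 pattern of 1.2 (nearest representable value: 1.2001953125)
print(hex(fp16_to_e5m2(h, ROUND_TOWARD_ZERO)))   # 0x3c, i.e. 1.0
print(hex(fp16_to_e5m2(h, ROUND_NEAREST_EVEN)))  # 0x3d, i.e. 1.25
```

The example shows why the selected rounding behavior matters: the same half-precision input yields different 8-bit results depending only on the control value.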
Referring to claims 26 and 36, taking claim 26 as exemplary, the modified Heinecke discloses the apparatus of claim 25, wherein the value is an 8-bit value in the register [Heinecke, figs. 5A, 5B, values 518, 568].
Referring to claims 27, 37, and 42, taking claim 27 as exemplary, the modified Heinecke discloses the apparatus of claim 21, wherein the execution circuitry is to convert the first plurality of half-precision floating-point data elements to corresponding ones of the first plurality of 8-bit floating-point data elements, and convert the second plurality of half-precision floating-point data elements to corresponding ones of the second plurality of 8-bit floating-point data elements, using an associated value from a register [Heinecke, paragraph 74, “advantageously perform rounding of normal numbers and considers a rounding_bias. The code illustrates that the format-convert instruction has an improved rounding behavior than just truncating. The rounding behavior of disclosed embodiments facilitates more accurate computation than conversion by truncation. In some embodiments, execution circuitry adheres to rounding behavior according to rounding rules promulgates as IEEE 754, for example, ‘NE’ which indicates rounding to nearest even. In some embodiments, the rounding behavior is specified by the instruction, for example by including a suffix, ‘NE,’ in the opcode to indicate rounding to Nearest Even. In other embodiments, the rounding behavior adopts a default behavior, like ‘NE.’ In yet other embodiments, the rounding behavior is controlled by an architectural model-specific register (MSR) that is configured by software”; figs. 5A, 5B, values 518, 568].
Referring to claims 29 and 39, taking claim 29 as exemplary, the modified Heinecke discloses the apparatus of claim 21, wherein the first and second source SIMD registers are 128-bit registers, and wherein the destination SIMD register is a 128-bit register [Heinecke, paragraph 40, “all operands, be they source or destination operands, can be stored in the same type of vector registers, be they 128-bit, 256-bit, or 512-bit vector registers”].
Referring to claim 30, the modified Heinecke discloses the apparatus of claim 21, wherein the first and second source SIMD registers are 64-bit registers, and wherein the destination SIMD register is a 64-bit register [Heinecke, paragraph 94, disclosing that alternative embodiments may support more, less and/or different vector operand sizes with more, less, or different data element widths].
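The equal source and destination widths recited in claims 29, 30, and 39 follow from simple lane arithmetic: two sources of n half-precision (16-bit) elements convert to 2n 8-bit elements, which occupy the same number of bits as one source register. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def lanes_per_register(register_bits):
    """Return (half-precision lanes per source, 8-bit lanes per destination)."""
    fp16_lanes = register_bits // 16
    fp8_lanes = register_bits // 8
    return fp16_lanes, fp8_lanes


for bits in (128, 64):
    src, dst = lanes_per_register(bits)
    # Two sources of `src` elements each yield exactly `dst` 8-bit results.
    print(bits, src, dst, 2 * src == dst)
# 128 8 16 True
# 64 4 8 True
```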
Referring to claim 40, Heinecke discloses a system comprising:
a dynamic random access memory (DRAM) [paragraph 200]; and
a processor coupled with the DRAM [fig. 12], the processor comprising:
decoder circuitry to decode a single instruction [fig. 1, element 109], the single instruction having a first field to identify a first source single instruction, multiple data (SIMD) register [fig. 3B, field 326; paragraph 72, “Here, the source vector locations can be either in memory or in registers”; paragraph 45, “In some embodiments, computing system 100 is a SIMD processor”], a second field to identify a second source SIMD register [fig. 3B, field 328], and a third field to identify a destination SIMD register [fig. 3B, field 324]; and
execution circuitry [fig. 1, element 117] to perform operations corresponding to the single instruction, including to:
convert to a first plurality of floating-point data elements [fig. 3B, see pseudocode “convert_fp32_to_bfloat16” and fig. 3C];
convert to a second plurality of floating-point data elements [fig. 3B, see pseudocode “convert_fp32_to_bfloat16” and fig. 3C]; and
store the first and second pluralities of floating-point data elements in corresponding data element positions of the destination SIMD register [fig. 2D, operating on first and second sources with execution circuitry to store in destination 278].
Heinecke discloses converting data to a 16-bit floating-point format [fig. 3C] but does not explicitly disclose the first source SIMD register to store a first plurality of half-precision floating-point data elements, the second source SIMD register to store a second plurality of half-precision floating-point data elements;
converting the first plurality of half-precision floating-point data elements to a first plurality of 8-bit floating-point data elements;
converting the second plurality of half-precision floating-point data elements to a second plurality of 8-bit floating-point data elements.
However, Das Sarma discloses the first source SIMD register to store a first plurality of half-precision floating-point data elements, the second source SIMD register to store a second plurality of half-precision floating-point data elements [paragraphs 36, 40, 41, 65, 72, fig. 3, “With respect to floating-point formats, the node is configurable to operate in multiple formats such as 8-bit, 16-bit, and 32-bit formats”; “Depending on the computational goal, a different format may be used to represent a number value”; “data can be read to and from memory into and from load registers 305 and post-processing unit register file 307. The connection to the registers allows data values to be quickly stored in a register, for example, as arguments for a matrix or vector computation”; “In some embodiments, a 16-bit floating-point operand is represented with a single bit for a sign bit, 8-bits for the exponent, and 7-bits for the mantissa”; “the elements of the vector may be converted/formatted to 8-bit, 16-bit, or 32-bit elements depending on the precision needed”];
converting the first plurality of half-precision floating-point data elements to a first plurality of 8-bit floating-point data elements [paragraphs 36, 40, 41, 65, 72, fig. 3, “With respect to floating-point formats, the node is configurable to operate in multiple formats such as 8-bit, 16-bit, and 32-bit formats”; “Depending on the computational goal, a different format may be used to represent a number value”; “data can be read to and from memory into and from load registers 305 and post-processing unit register file 307. The connection to the registers allows data values to be quickly stored in a register, for example, as arguments for a matrix or vector computation”; “In some embodiments, a 16-bit floating-point operand is represented with a single bit for a sign bit, 8-bits for the exponent, and 7-bits for the mantissa”; “the elements of the vector may be converted/formatted to 8-bit, 16-bit, or 32-bit elements depending on the precision needed”];
converting the second plurality of half-precision floating-point data elements to a second plurality of 8-bit floating-point data elements [paragraphs 36, 40, 41, 65, 72, fig. 3, “With respect to floating-point formats, the node is configurable to operate in multiple formats such as 8-bit, 16-bit, and 32-bit formats”; “Depending on the computational goal, a different format may be used to represent a number value”; “data can be read to and from memory into and from load registers 305 and post-processing unit register file 307. The connection to the registers allows data values to be quickly stored in a register, for example, as arguments for a matrix or vector computation”; “In some embodiments, a 16-bit floating-point operand is represented with a single bit for a sign bit, 8-bits for the exponent, and 7-bits for the mantissa”; “the elements of the vector may be converted/formatted to 8-bit, 16-bit, or 32-bit elements depending on the precision needed”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teachings of Das Sarma in the system of Heinecke such that the first source SIMD register stores a first plurality of half-precision floating-point data elements, the second source SIMD register stores a second plurality of half-precision floating-point data elements, the first plurality of half-precision floating-point data elements is converted to a first plurality of 8-bit floating-point data elements, and the second plurality of half-precision floating-point data elements is converted to a second plurality of 8-bit floating-point data elements, in order to provide system flexibility, which improves the bandwidth and performance of the system [Das Sarma, paragraph 36].
Claims 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Heinecke in view of Das Sarma, as applied to claim 21 above, and further in view of Lai, US 20110082999 A1.
Referring to claim 22, the modified Heinecke discloses the apparatus of claim 21 but does not explicitly disclose wherein the execution circuitry is to store the second plurality of 8-bit floating-point data elements in most significant data element positions of the destination SIMD register.
However, Lai discloses wherein the execution circuitry is to store the second plurality of 8-bit floating-point data elements in most significant data element positions of the destination SIMD register [paragraph 5, “In the field of computer architecture, the term data endianness is the interpretation of data byte order for putting a sequence of byte data into a destination storage (such as register, memory, or data bus) that has data width more than one byte. The big-endian order and the little-endian order are the most common”; “the data byte D0 from the lowest address of the memory 150 is put on the least significant byte (LSB) of the destination storage, while data bytes with higher addresses go toward the most significant side of the destination storage. According to the big-endian byte order 120, the data byte D0 from the lowest address of the memory 150 is put on the most significant byte (MSB) of the destination storage, while data bytes with higher addresses go toward the least significant side of the destination storage”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teachings of Lai in the apparatus of the modified Heinecke to configure the execution circuitry to store the second plurality of 8-bit floating-point data elements in most significant data element positions of the destination SIMD register, in order to allow more flexible data endianness management and easier software development [Lai, paragraph 13].
Referring to claim 23, the modified Heinecke discloses the apparatus of claim 22, wherein the execution circuitry is to store the first plurality of 8-bit floating-point data elements in least significant data element positions of the destination SIMD register [Lai, paragraph 5, “In the field of computer architecture, the term data endianness is the interpretation of data byte order for putting a sequence of byte data into a destination storage (such as register, memory, or data bus) that has data width more than one byte. The big-endian order and the little-endian order are the most common”; “the data byte D0 from the lowest address of the memory 150 is put on the least significant byte (LSB) of the destination storage, while data bytes with higher addresses go toward the most significant side of the destination storage. According to the big-endian byte order 120, the data byte D0 from the lowest address of the memory 150 is put on the most significant byte (MSB) of the destination storage, while data bytes with higher addresses go toward the least significant side of the destination storage”].
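Lai, paragraph 5, explains that byte order determines whether the lowest-addressed byte D0 lands in the least or the most significant byte of the destination storage. A minimal Python sketch of the resulting lane placement for claims 22 and 23, assuming for illustration that the converted elements form an ordered byte sequence with the first source's elements first; pack_destination is a hypothetical name.

```python
def pack_destination(first, second, byteorder="little"):
    """Pack 8-bit lanes into a single register value.

    Lane order is first source, then second source. With byteorder="little",
    lane 0 lands in the least significant byte, so the first source fills the
    least significant positions and the second source the most significant ones.
    With byteorder="big", lane 0 lands in the most significant byte instead.
    """
    return int.from_bytes(bytes(list(first) + list(second)), byteorder)


print(hex(pack_destination([0x3C, 0x3E], [0xC0, 0x3D])))         # 0x3dc03e3c
print(hex(pack_destination([0x3C, 0x3E], [0xC0, 0x3D], "big")))  # 0x3c3ec03d
```

Under the little-endian interpretation, the second source's elements (0xC0, 0x3D) occupy the most significant bytes, which corresponds to the placement recited in claims 22 and 23.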
Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over Heinecke in view of Das Sarma, as applied to claim 21 above, and further in view of Van Dalen et al. (hereinafter Van Dalen), US 20160328233 A1.
Referring to claim 24, the modified Heinecke discloses the apparatus of claim 21 but does not explicitly disclose wherein the first plurality of 8-bit floating-point data elements have a format comprising a 1-bit sign, a 5-bit exponent, and a 2-bit mantissa.
However, Van Dalen discloses wherein the first plurality of 8-bit floating-point data elements have a format comprising a 1-bit sign, a 5-bit exponent, and a 2-bit mantissa [paragraph 45, “In some embodiments, a custom or internal floating point format may optionally be used. The custom or internal floating point format point may not be a standard floating-point format, such as 16-bit half precision, 32-bit single precision, or the like. Rather, the custom or internal floating point format point may optionally be a non-standard floating point format. In some embodiments, the floating-point format may have less than 16-bits, less than 12-bits, or 8-bit or less”; “In some embodiments, the filter coefficients may optionally have a floating point format in which they each have a mantissa, a sign, and an exponent or shift factor”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teachings of Van Dalen in the apparatus of the modified Heinecke such that the first plurality of 8-bit floating-point data elements have a format comprising a 1-bit sign, a 5-bit exponent, and a 2-bit mantissa, in order to help eliminate many data alignments and thereby improve overall performance [Van Dalen, paragraph 33].
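To make the recited 1-5-2 layout concrete, the minimal sketch below decodes such an 8-bit pattern into its numeric value. It assumes an exponent bias of 15 and IEEE 754-style subnormal, infinity, and NaN encodings (as in half precision); neither the claim nor Van Dalen, paragraph 45, specifies these details, and decode_e5m2 is a hypothetical name.

```python
import math


def decode_e5m2(b):
    """Decode an 8-bit pattern with a 1-bit sign, 5-bit exponent, and 2-bit mantissa.

    Assumes an exponent bias of 15 and IEEE 754-style special values, as in
    binary16; these details are illustrative assumptions.
    """
    sign = -1.0 if b & 0x80 else 1.0
    exponent = (b >> 2) & 0x1F
    mantissa = b & 0x03
    if exponent == 0x1F:
        # All-ones exponent: infinity if the mantissa is zero, otherwise NaN.
        return sign * math.inf if mantissa == 0 else math.nan
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 1 - 15 = -14.
        return sign * (mantissa / 4) * 2.0 ** -14
    return sign * (1 + mantissa / 4) * 2.0 ** (exponent - 15)


print([decode_e5m2(b) for b in (0x3C, 0x3E, 0xC0, 0x7B)])
# [1.0, 1.5, -2.0, 57344.0]
```

With only two mantissa bits, each binade holds four values (for example 1.0, 1.25, 1.5, and 1.75), while the five exponent bits keep roughly the same dynamic range as half precision.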
Claims 31, 34, and 41 are rejected under 35 U.S.C. 103 as being unpatentable over Heinecke in view of Das Sarma, as applied to claims 21, 33, and 40 above, and further in view of Lai, US 20110082999 A1, and Van Dalen et al. (hereinafter Van Dalen), US 20160328233 A1.
Referring to claim 31, the modified Heinecke discloses the apparatus of claim 21, wherein the execution circuitry is to convert a half-precision floating-point data element of the first plurality of half-precision floating-point data elements to an 8-bit floating-point data element of the first plurality of 8-bit floating-point data elements based on a value in a register [Heinecke, paragraph 74, “advantageously perform rounding of normal numbers and considers a rounding_bias. The code illustrates that the format-convert instruction has an improved rounding behavior than just truncating. The rounding behavior of disclosed embodiments facilitates more accurate computation than conversion by truncation. In some embodiments, execution circuitry adheres to rounding behavior according to rounding rules promulgates as IEEE 754, for example, ‘NE’ which indicates rounding to nearest even. In some embodiments, the rounding behavior is specified by the instruction, for example by including a suffix, ‘NE,’ in the opcode to indicate rounding to Nearest Even. In other embodiments, the rounding behavior adopts a default behavior, like ‘NE.’ In yet other embodiments, the rounding behavior is controlled by an architectural model-specific register (MSR) that is configured by software”; figs. 5A, 5B, values 518, 568], and wherein the first and second source SIMD registers are 128-bit registers, and wherein the destination SIMD register is a 128-bit register [Heinecke, paragraph 40, “all operands, be they source or destination operands, can be stored in the same type of vector registers, be they 128-bit, 256-bit, or 512-bit vector registers”].
The modified Heinecke does not explicitly disclose wherein the execution circuitry is to store the second plurality of 8-bit floating-point data elements in most significant data element positions of the destination SIMD register, wherein the execution circuitry is to store the first plurality of 8-bit floating-point data elements in least significant data element positions of the destination SIMD register.
However, Lai discloses wherein the execution circuitry is to store the second plurality of 8-bit floating-point data elements in most significant data element positions of the destination SIMD register, wherein the execution circuitry is to store the first plurality of 8-bit floating-point data elements in least significant data element positions of the destination SIMD register [paragraph 5, “In the field of computer architecture, the term data endianness is the interpretation of data byte order for putting a sequence of byte data into a destination storage (such as register, memory, or data bus) that has data width more than one byte. The big-endian order and the little-endian order are the most common”; “the data byte D0 from the lowest address of the memory 150 is put on the least significant byte (LSB) of the destination storage, while data bytes with higher addresses go toward the most significant side of the destination storage. According to the big-endian byte order 120, the data byte D0 from the lowest address of the memory 150 is put on the most significant byte (MSB) of the destination storage, while data bytes with higher addresses go toward the least significant side of the destination storage”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teachings of Lai in the apparatus of the modified Heinecke to configure the execution circuitry to store the second plurality of 8-bit floating-point data elements in most significant data element positions of the destination SIMD register and to store the first plurality of 8-bit floating-point data elements in least significant data element positions of the destination SIMD register, in order to allow more flexible data endianness management and easier software development [Lai, paragraph 13].
The modified Heinecke does not explicitly disclose wherein the first plurality of 8-bit floating-point data elements have a format comprising a 1-bit sign, a 5-bit exponent, and a 2-bit mantissa.
However, Van Dalen discloses wherein the first plurality of 8-bit floating-point data elements have a format comprising a 1-bit sign, a 5-bit exponent, and a 2-bit mantissa [paragraph 45, “In some embodiments, a custom or internal floating point format may optionally be used. The custom or internal floating point format point may not be a standard floating-point format, such as 16-bit half precision, 32-bit single precision, or the like. Rather, the custom or internal floating point format point may optionally be a non-standard floating point format. In some embodiments, the floating-point format may have less than 16-bits, less than 12-bits, or 8-bit or less”; “In some embodiments, the filter coefficients may optionally have a floating point format in which they each have a mantissa, a sign, and an exponent or shift factor”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teachings of Van Dalen in the apparatus of the modified Heinecke such that the first plurality of 8-bit floating-point data elements have a format comprising a 1-bit sign, a 5-bit exponent, and a 2-bit mantissa, in order to help eliminate many data alignments and thereby improve overall performance [Van Dalen, paragraph 33].
Referring to claims 34 and 41, taking claim 41 as exemplary, the modified Heinecke discloses the system of claim 40, further comprising a mass storage device coupled with the DRAM [Heinecke, fig. 13, element 1328 coupled with 1332/1334].
The modified Heinecke does not explicitly disclose wherein the execution circuitry is to store the second plurality of 8-bit floating-point data elements in most significant data element positions of the destination SIMD register, and wherein the execution circuitry is to store the first plurality of 8-bit floating-point data elements in least significant data element positions of the destination SIMD register.
However, Lai discloses wherein the execution circuitry is to store the second plurality of 8-bit floating-point data elements in most significant data element positions of the destination SIMD register, and wherein the execution circuitry is to store the first plurality of 8-bit floating-point data elements in least significant data element positions of the destination SIMD register [Lai, paragraph 5, “In the field of computer architecture, the term data endianness is the interpretation of data byte order for putting a sequence of byte data into a destination storage (such as register, memory, or data bus) that has data width more than one byte. The big-endian order and the little-endian order are the most common”; “the data byte D0 from the lowest address of the memory 150 is put on the least significant byte (LSB) of the destination storage, while data bytes with higher addresses go toward the most significant side of the destination storage. According to the big-endian byte order 120, the data byte D0 from the lowest address of the memory 150 is put on the most significant byte (MSB) of the destination storage, while data bytes with higher addresses go toward the least significant side of the destination storage”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teachings of Lai in the system of the modified Heinecke to implement, wherein the execution circuitry is to store the second plurality of 8-bit floating-point data elements in most significant data element positions of the destination SIMD register, and wherein the execution circuitry is to store the first plurality of 8-bit floating-point data elements in least significant data element positions of the destination SIMD register, in order to allow more flexible data endianness management and easier software development [Lai, paragraph 13].
The modified Heinecke does not explicitly disclose wherein the first plurality of 8-bit floating-point data elements have a format comprising a 1-bit sign, a 5-bit exponent, and a 2-bit mantissa.
However, Van Dalen discloses wherein the first plurality of 8-bit floating-point data elements have a format comprising a 1-bit sign, a 5-bit exponent, and a 2-bit mantissa [Van Dalen, paragraph 45, “In some embodiments, a custom or internal floating point format may optionally be used. The custom or internal floating point format point may not be a standard floating-point format, such as 16-bit half precision, 32-bit single precision, or the like. Rather, the custom or internal floating point format point may optionally be a non-standard floating point format. In some embodiments, the floating-point format may have less than 16-bits, less than 12-bits, or 8-bit or less”; “In some embodiments, the filter coefficients may optionally have a floating point format in which they each have a mantissa, a sign, and an exponent or shift factor”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teachings of Van Dalen in the system of the modified Heinecke to implement, wherein the first plurality of 8-bit floating-point data elements have a format comprising a 1-bit sign, a 5-bit exponent, and a 2-bit mantissa, in order to help to eliminate many data alignments and thereby improve overall performance [Van Dalen, paragraph 33].
Allowable Subject Matter
Claims 28, 32, 38, and 43 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: The prior art of record taken alone or in combination fails to teach and/or fairly suggest wherein the execution circuitry is to convert the first and second pluralities of half-precision floating-point data elements to the first and second pluralities of 8-bit floating-point data elements based on a plurality of values in a source register, wherein a first value of the plurality of values in the source register is to bias rounding of a first half-precision floating-point data element of the first plurality of half-precision floating-point data elements, in combination with other recited limitations in claim 28.
The prior art of record taken alone or in combination fails to teach and/or fairly suggest wherein the execution circuitry is to convert the first and second pluralities of half-precision floating-point data elements to the first and second pluralities of 8-bit floating-point data elements based on a plurality of values in a source register, wherein a first value of the plurality of values in the source register is to bias rounding of a first half-precision floating-point data element of the first plurality of half-precision floating-point data elements, in combination with other recited limitations in claim 32.
The prior art of record taken alone or in combination fails to teach and/or fairly suggest wherein the first and second pluralities of half-precision floating-point data elements are respectively converted to the first and second pluralities of 8-bit floating-point data elements based on a plurality of values in a source register, wherein a first value of the plurality of values in the source register biases rounding of a first half-precision floating-point data element of the first plurality of half-precision floating-point data elements, in combination with other recited limitations in claim 38.
The prior art of record taken alone or in combination fails to teach and/or fairly suggest a graphics processing unit coupled with the processor, wherein the first and second source SIMD registers are 128-bit registers, and wherein the destination SIMD register is a 128-bit register, and wherein the execution circuitry is to convert the first and second pluralities of half-precision floating-point data elements to the first and second pluralities of 8-bit floating-point data elements based on a plurality of values in a source register, wherein a first value of the plurality of values in the source register is to bias rounding of a first half-precision floating-point data element of the first plurality of half-precision floating-point data elements, in combination with other recited limitations in claim 43.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARLEY J ABAD whose telephone number is (571)270-3425. The examiner can normally be reached Mon-Fri 8:30 AM - 7 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Idriss Alrobaye, can be reached at (571) 270-1023. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Farley Abad/Primary Examiner, Art Unit 2181