Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Status of the Application
This Office Action is in response to Applicant’s continuation application filed on 11/21/2024.
Claims 1-20 are pending and are examined in this Office Action.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/21/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Objections
Claims 5 and 14 are objected to because of the following informalities:
In claim 5, line 2, “machining learning” should be amended to read as --machine learning-- to be consistent with claim language throughout.
In claim 14, line 2, “machining learning” should be amended to read as --machine learning-- to be consistent with claim language throughout.
Appropriate correction is required.
Nonstatutory Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 10-11, 13 and 19 are rejected on the ground of nonstatutory double patenting as being unpatentable over claim 3 of U.S. Patent No. 12,165,016 (parent application s/n 17/032,778). Although the claims at issue are not identical, they are not patentably distinct from each other because claims 10-11, 13 and 19 of the instant application respectively contain every element of claim 3 of U.S. Patent No. 12,165,016 (parent application s/n 17/032,778), as listed below (differences are underlined, and italicized text indicates a limitation rearranged to a different position in the claim for ease of visual comparison), and as such are anticipated by the patented claims:
Instant Application, independent claim 10:
A system comprising:
an input/output die;
a machine learning accelerator stacked on the input/output die,
a first die directly coupled to a second port of the input/output die,
wherein the machine learning accelerator is directly coupled to a first port of the input/output die; and
wherein the machine learning accelerator is configured to direct first communication to the first die via the input/output die; and
wherein the first die is configured to direct second communication to the machine learning accelerator via the input/output die.
U.S. Patent No. 12,165,016 (parent application s/n 17/032,778), independent claim 1 and dependent claim 3:
1. A processor, comprising:
one or more processor chiplets;
an input/output die; and
a machine learning accelerator, wherein the machine learning accelerator is stacked on the input/output die,
wherein the one or more processor chiplets are coupled to the input/output die via one or more processor ports of the input/output die; and
wherein the machine learning accelerator is coupled to the input/output die via a machine learning accelerator port of the input/output die.
3. The processor of claim 1, wherein the input/output die includes a data fabric that routes traffic between one or more ports of the input/output die.
Analysis
Examiner points out that the instant claims use “a first die,” “a first port,” and “a second port,” whereas the claims of U.S. Patent No. 12,165,016 (parent application s/n 17/032,778) use “one or more processor chiplets,” “machine learning accelerator port,” and “one or more processor ports,” respectively. The instant claims also include the limitations “the machine learning accelerator… direct first communication… via the input/output die” and “the first die… direct second communication… via the input/output die,” which are a rewording of what dependent claim 3 of U.S. Patent No. 12,165,016 (parent application s/n 17/032,778) recites, namely the routing of traffic between one or more ports of the I/O die. As such, Examiner finds the instant claims to be broader claims whose broader terms are encompassed by, read on by, and thus anticipated by the claims of U.S. Patent No. 12,165,016 (parent application s/n 17/032,778).
Instant Application, independent claim 19:
A system comprising:
an input/output die; and
a machine learning accelerator stacked on the input/output die,
wherein the machine learning accelerator is directly coupled to a first port of the input/output die,
wherein the machine learning accelerator is configured to direct first communication to a first die via the input/output die; and
wherein the first die is configured to direct second communication to the machine learning accelerator via the input/output die.
U.S. Patent No. 12,165,016 (parent application s/n 17/032,778), independent claim 1 and dependent claim 3:
1. A processor, comprising:
one or more processor chiplets;
an input/output die; and
a machine learning accelerator, wherein the machine learning accelerator is stacked on the input/output die,
wherein the one or more processor chiplets are coupled to the input/output die via one or more processor ports of the input/output die; and
wherein the machine learning accelerator is coupled to the input/output die via a machine learning accelerator port of the input/output die.
3. The processor of claim 1, wherein the input/output die includes a data fabric that routes traffic between one or more ports of the input/output die.
Analysis
Examiner points out that the instant claims use “a first die” and “a first port,” whereas the claims of U.S. Patent No. 12,165,016 (parent application s/n 17/032,778) use “one or more processor chiplets” and “machine learning accelerator port,” respectively. The instant claims also include the limitations “the machine learning accelerator… direct first communication… via the input/output die” and “the first die… direct second communication… via the input/output die,” which are a rewording of what dependent claim 3 of U.S. Patent No. 12,165,016 (parent application s/n 17/032,778) recites, namely the routing of traffic between one or more ports of the I/O die. As such, Examiner finds the instant claims to be broader claims anticipated by the claims of U.S. Patent No. 12,165,016 (parent application s/n 17/032,778).
Likewise, dependent claims 11 and 13 of the instant application contain elements already claimed in independent claim 1 and dependent claim 3 of U.S. Patent No. 12,165,016 (parent application s/n 17/032,778), such as directing the second communication utilizing a data fabric of the I/O die (dependent claim 3 claiming the “input/output die including a data fabric that routes traffic between one or more ports of the input/output die”) and the first die comprising a first central processing unit chiplet or a memory die (independent claim 1 claiming “one or more processor chiplets”).
Claim Rejections - 35 U.S.C. § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 4, 10-11, 13 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Bleiweiss et al. (US 2019/0205737), herein referred to as Bleiweiss ‘737, in view of Surugucchi (US 6,928,509), herein referred to as Surugucchi ‘509.
Referring to claim 1, Bleiweiss ‘737 teaches a method (see Abstract) comprising:
directing first communication (see Paragraphs 0140-0141, wherein bridge 1182 allows for routing of electrical signals such as I/O signals for chip-to-chip communications between logics 1172 and 1174, i.e. directing communications from the machine learning accelerator to the first die through the bridge 1182, where the first communication can be from chip 1174 to 1172, i.e. directing / routing chip-to-chip communication), by a machine learning accelerator (see Fig. 11B, logic 1174; see Paragraph 0140, where logics 1172 and 1174 may be implemented at least partly in configurable logic or fixed-functionality logic hardware and can include one or more portions of any of the processor core(s), graphics processor(s), or other accelerator device; see Paragraph 0026, wherein the hardware can include machine learning acceleration mechanisms) stacked on an input/output die (see Fig. 11B, wherein logic 1174 is stacked on top of bridge 1182 and substrate 1180), to a first die (see Fig. 11B, logic 1172), via the input/output die (see Fig. 11B, bridge 1182 that is a part of substrate 1180; see Paragraph 0141), wherein the machine learning accelerator is directly coupled to the input/output die (see Fig. 11B, interconnect structure 1173 connecting logics 1172 and 1174 to bridge 1182; see Paragraphs 0140-0141, wherein interconnect structure 1173 is configured to route electrical signals between logics 1172 and 1174, such as I/O signals, where bridge 1182 may be a dense interconnect structure that provides a route for electrical signals) and the first die is directly coupled to the input/output die (see Fig. 11B, interconnect structure 1173 connecting logics 1172 and 1174 to bridge 1182; see Paragraphs 0140-0141, wherein interconnect structure 1173 is configured to route electrical signals between logics 1172 and 1174, such as I/O signals, where bridge 1182 may be a dense interconnect structure that provides a route for electrical signals); and
directing second communication (see Paragraphs 0140-0141, wherein bridge 1182 allows for routing of electrical signals such as I/O signals for chip-to-chip communications between logics 1172 and 1174, i.e. directing communications from the first die to the machine learning accelerator through the bridge 1182, where the second communication can be from chip 1172 to 1174, i.e. directing / routing chip-to-chip communication), by the first die (see Fig. 11B, logic 1172), to the machine learning accelerator (see Fig. 11B, logic 1174; see Paragraph 0140, where logics 1172 and 1174 may be implemented at least partly in configurable logic or fixed-functionality logic hardware and can include one or more portions of any of the processor core(s), graphics processor(s), or other accelerator device; see Paragraph 0026, wherein the hardware can include machine learning acceleration mechanisms), via the input/output die (see Fig. 11B, bridge 1182 that is a part of substrate 1180; see Paragraph 0141).
However, Bleiweiss ‘737 does not specifically teach the input/output die having a first port and a second port for directly connecting the first die and the machine learning accelerator as claimed.
Surugucchi ‘509 teaches a system (see Abstract) where a data communication bridge (see Fig. 4, data communication bridge 430) has multiple communication ports (see Fig. 4, communication ports 460) for direct connection with controllers (see Fig. 4, wherein controllers 410 directly connect to ports 460 using data communication links 420) and has a second set of multiple ports (see Fig. 4, serial data communication ports 470) for direct connection with serial storage devices (see Fig. 4, storage devices 450 directly connecting to ports 470 using serial links 440).
Bleiweiss ‘737 and Surugucchi ‘509 qualify as analogous prior art, as both pertain to the same field of endeavor of connecting multiple hardware logics together through a communication bridge to allow communication between the connected logics through the bridge.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Bleiweiss ‘737 system as set forth above to have the bridge include a first port and a second port for direct connection to the different hardware logics, such that communications between the hardware logics are passed through the ports of the bridge, as taught by Surugucchi ‘509. A person of ordinary skill in the art would be motivated to include ports in a bridge as this would create smaller, more manageable segments of the bridge for purposes of addressing communications to the bridge interconnect and reducing the number of data collisions.
As to claim 2, Bleiweiss ‘737 teaches the method of claim 1, wherein the directing the first communication and the directing the second communication utilize a data fabric of the input/output die (see Paragraphs 0141-0142, wherein bridge 1182 is a dense interconnect structure providing routes for electrical signals to provide chip-to-chip connection between logics 1172 and 1174; i.e., a data fabric is a computer architecture for connecting and integrating different data silos in a distributed environment, which corresponds to a bridge interconnecting different processing devices such as a processor and an accelerator).
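For illustration only of the routing function equated above, the following minimal sketch in C models a data fabric as logic that directs a message from one port of an I/O die to another. The port names, message structure, and values are hypothetical and are not drawn from Bleiweiss ‘737, Surugucchi ‘509, or the claims.

    /* Illustrative sketch only: a data fabric modeled as logic that routes
     * messages between ports of an I/O die. Port names, message structure,
     * and values are hypothetical. */
    #include <stdio.h>

    #define PORT_ACCELERATOR 0  /* hypothetical machine learning accelerator port */
    #define PORT_FIRST_DIE   1  /* hypothetical first die (e.g., CPU chiplet) port */

    typedef struct {
        int src_port;  /* port the communication enters on */
        int dst_port;  /* port the communication leaves on */
        int payload;   /* stand-in for the routed data */
    } message_t;

    /* The fabric's role here is only to deliver a message from its source
     * port to its destination port, i.e., to route traffic between ports. */
    static void fabric_route(const message_t *m) {
        printf("fabric: port %d -> port %d (payload %d)\n",
               m->src_port, m->dst_port, m->payload);
    }

    int main(void) {
        /* "first communication": accelerator -> first die via the I/O die */
        message_t first  = { PORT_ACCELERATOR, PORT_FIRST_DIE, 42 };
        /* "second communication": first die -> accelerator via the I/O die */
        message_t second = { PORT_FIRST_DIE, PORT_ACCELERATOR, 7 };
        fabric_route(&first);
        fabric_route(&second);
        return 0;
    }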
As to claim 4, Bleiweiss ‘737 teaches the method of claim 1, wherein the first die comprises a first central processing unit chiplet or a memory die (see Fig. 12, which depicts a SoC integrated circuit, i.e. a package assembly 1170 as seen in Fig. 11B, comprising multiple processors, a memory die 1265, and a flash die 1260).
Referring to claim 10, Bleiweiss ‘737 teaches a system (see Fig. 11B, package assembly 1170) comprising:
an input/output die (see Fig. 11B, bridge 1182 that is a part of substrate 1180; see Paragraph 0141);
a machine learning accelerator (see Fig. 11B, logic 1174; see Paragraph 0140, where logics 1172 and 1174 may be implemented at least partly in configurable logic or fixed-functionality logic hardware and can include one or more portions of any of the processor core(s), graphics processor(s), or other accelerator device; see Paragraph 0026, wherein the hardware can include machine learning acceleration mechanisms) stacked on the input/output die (see Fig. 11B, wherein logic 1174 is stacked on top of bridge 1182 and substrate 1180), wherein the machine learning accelerator is directly coupled to the input/output die (see Fig. 11B, interconnect structure 1173 connecting logics 1172 and 1174 to bridge 1182; see Paragraphs 0140-0141, wherein interconnect structure 1173 is configured to route electrical signals between logics 1172 and 1174, such as I/O signals, where bridge 1182 may be a dense interconnect structure that provides a route for electrical signals); and
a first die (see Fig. 11B, logic 1172) directly coupled to the input/output die (see Fig. 11B, interconnect structure 1173 connecting logics 1172 and 1174 to bridge 1182; see Paragraphs 0140-0141, wherein interconnect structure 1173 is configured to route electrical signals between logics 1172 and 1174, such as I/O signals, where bridge 1182 may be a dense interconnect structure that provides a route for electrical signals),
wherein the machine learning accelerator is configured to direct first communication to the first die via the input/output die (see Paragraphs 0140-0141, wherein bridge 1182 allows for routing of electrical signals such as I/O signals for chip-to-chip communications between logics 1172 and 1174, i.e. directing communications from the machine learning accelerator to the first die through the bridge 1182, where the first communication can be from chip 1174 to 1172, i.e. directing / routing chip-to-chip communication); and
wherein the first die is configured to direct second communication to the machine learning accelerator via the input/output die (see Paragraphs 0140-0141, wherein bridge 1182 allows for routing of electrical signals such as I/O signals for chip-to-chip communications between logics 1172 and 1174, i.e. directing communications from the first die to the machine learning accelerator through the bridge 1182, where the second communication can be from chip 1172 to 1174, i.e. directing / routing chip-to-chip communication).
However, Bleiweiss ‘737 does not specifically teach the input/output die having a first port and a second port for directly connecting the first die and the machine learning accelerator as claimed.
Surugucchi ‘509 teaches a system (see Abstract) where a data communication bridge (see Fig. 4, data communication bridge 430) has multiple communication ports (see Fig. 4, communication ports 460) for direct connection with controllers (see Fig. 4, wherein controllers 410 directly connect to ports 460 using data communication links 420) and has a second set of multiple ports (see Fig. 4, serial data communication ports 470) for direct connection with serial storage devices (see Fig. 4, storage devices 450 directly connecting to ports 470 using serial links 440).
Bleiweiss ‘737 and Surugucchi ‘509 qualify as analogous prior art, as both pertain to the same field of endeavor of connecting multiple hardware logics together through a communication bridge to allow communication between the connected logics through the bridge.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Bleiweiss ‘737 system as set forth above to have the bridge include a first port and a second port for direct connection to the different hardware logics, such that communications between the hardware logics are passed through the ports of the bridge, as taught by Surugucchi ‘509. A person of ordinary skill in the art would be motivated to include ports in a bridge as this would create smaller, more manageable segments of the bridge for purposes of addressing communications to the bridge interconnect and reducing the number of data collisions.
As to claim 11, Bleiweiss ‘737 teaches the system of claim 10, wherein the directing the first communication and the directing the second communication utilize a data fabric of the input/output die (see Paragraphs 0141-0142, wherein bridge 1182 is a dense interconnect structure providing routes for electrical signals to provide chip-to-chip connection between logics 1172 and 1174; i.e., a data fabric is a computer architecture for connecting and integrating different data silos in a distributed environment, which corresponds to a bridge interconnecting different processing devices such as a processor and an accelerator).
As to claim 13, Bleiweiss ‘737 teaches the system of claim 10, wherein the first die comprises a first central processing unit chiplet or a memory die (see Fig. 12, which depicts a SoC integrated circuit, i.e. a package assembly 1170 as seen in Fig. 11B, comprising multiple processors, a memory die 1265, and a flash die 1260).
Referring to claim 19, Bleiweiss ‘737 teaches a system (see Fig. 11B, package assembly 1170) comprising:
an input/output die (see Fig. 11B, bridge 1182 that is a part of substrate 1180; see Paragraph 0141); and
a machine learning accelerator (see Fig. 11B, logic 1174; see Paragraph 0140, where logics 1172 and 1174 may be implemented at least partly in configurable logic or fixed-functionality logic hardware and can include one or more portions of any of the processor core(s), graphics processor(s), or other accelerator device; see Paragraph 0026, wherein the hardware can include machine learning acceleration mechanisms) stacked on the input/output die (see Fig. 11B, wherein logic 1174 is stacked on top of bridge 1182 and substrate 1180), wherein the machine learning accelerator is directly coupled to the input/output die (see Fig. 11B, interconnect structure 1173 connecting logics 1172 and 1174 to bridge 1182; see Paragraphs 0140-0141, wherein interconnect structure 1173 is configured to route electrical signals between logics 1172 and 1174, such as I/O signals, where bridge 1182 may be a dense interconnect structure that provides a route for electrical signals),
wherein the machine learning accelerator is configured to direct first communication to a first die (see Fig. 11B, logic 1172) via the input/output die (see Paragraphs 0140-0141, wherein bridge 1182 allows for routing of electrical signals such as I/O signals for chip-to-chip communications between logics 1172 and 1174, i.e. directing communications from the machine learning accelerator to the first die through the bridge 1182, where the first communication can be from chip 1174 to 1172, i.e. directing / routing chip-to-chip communication); and
wherein the first die is configured to direct second communication to the machine learning accelerator via the input/output die (see Paragraphs 0140-0141, wherein bridge 1182 allows for routing of electrical signals such as I/O signals for chip-to-chip communications between logics 1172 and 1174, i.e. directing communications from the first die to the machine learning accelerator through the bridge 1182, where the second communication can be from chip 1172 to 1174, i.e. directing / routing chip-to-chip communication).
However, Bleiweiss ‘737 does not specifically teach the input/output die having a first port and a second port for directly connecting the first die and the machine learning accelerator as claimed.
Surugucchi ‘509 teaches a system (see Abstract) where a data communication bridge (see Fig. 4, data communication bridge 430) has multiple communication ports (see Fig. 4, communication ports 460) for direct connection with controllers (see Fig. 4, wherein controllers 410 directly connect to ports 460 using data communication links 420) and has a second set of multiple ports (see Fig. 4, serial data communication ports 470) for direct connection with serial storage devices (see Fig. 4, storage devices 450 directly connecting to ports 470 using serial links 440).
Bleiweiss ‘737 and Surugucchi ‘509 qualify as analogous prior art, as both pertain to the same field of endeavor of connecting multiple hardware logics together through a communication bridge to allow communication between the connected logics through the bridge.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Bleiweiss ‘737 system as set forth above to have the bridge include a first port for direct connection to the different hardware logics, such that communications between the hardware logics are passed through the ports of the bridge, as taught by Surugucchi ‘509. A person of ordinary skill in the art would be motivated to include ports in a bridge as this would create smaller, more manageable segments of the bridge for purposes of addressing communications to the bridge interconnect and reducing the number of data collisions.
As to claim 20, Bleiweiss ‘737 teaches the system of claim 19, wherein the directing the first communication and the directing the second communication utilize a data fabric of the input/output die (see Paragraphs 0141-0142, wherein bridge 1182 is a dense interconnect structure providing routes for electrical signals to provide chip-to-chip connection between logics 1172 and 1174; i.e., a data fabric is a computer architecture for connecting and integrating different data silos in a distributed environment, which corresponds to a bridge interconnecting different processing devices such as a processor and an accelerator).
Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Bleiweiss ‘737, in view of Surugucchi ‘509 and Dorr et al. (US 2004/0017807), herein referred to as Dorr ‘807.
As to claim 3, Bleiweiss ‘737 and Surugucchi ‘509 do not specifically teach the method of claim 2, wherein the directing the first communication and the directing the second communication are performed utilizing full bandwidth of the data fabric.
Dorr ‘807 teaches an interconnect system connecting multiple hardware processing elements (see Abstract; see Fig. 1, processing elements 103 and interconnect fabric 110), where the interconnect is implemented with a selected maximum datum width for each configuration, corresponding to a maximum data width for a packet, as well as smaller data widths than the selected maximum data width of the interconnect (see Paragraph 0033). Examiner points out that if a packet being transmitted has a data width equal to the interconnect’s maximum data width, then it is by definition utilizing the full bandwidth of the interconnect.
Bleiweiss ‘737, Surugucchi ‘509, and Dorr ‘807 qualify as analogous prior art, as all of these references pertain to the same field of endeavor of connecting multiple hardware logics together through an interconnect to allow communication between the connected logics through the interconnect.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Bleiweiss ‘737 and Surugucchi ‘509 system as set forth above to utilize communication protocols for transmitting data across the bridge/interconnect with a selected maximum datum width for a configuration corresponding to a maximum datum width for the packet, which includes the possibility of the datum width of a packet being equal to the interconnect’s maximum data width, meaning full bandwidth would be utilized, as taught by Dorr ‘807. A person of ordinary skill in the art would recognize that communications between hardware logics all utilize a communication protocol in order for signals to be encoded / decoded, and would be motivated to utilize a selected datum width for packets equal to the interconnect’s maximum datum width in order to fully maximize hardware usage and the data transfer rates for the communications.
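For illustration only of the full-bandwidth point above, the following minimal sketch in C computes per-transfer utilization as the ratio of packet width to maximum datum width; the widths are hypothetical values, not taken from Dorr ‘807.

    /* Illustrative sketch only: a packet whose datum width equals the
     * interconnect's maximum datum width utilizes the full bandwidth of
     * that interconnect. Widths are hypothetical. */
    #include <stdio.h>

    int main(void) {
        const int max_datum_width_bits = 256;  /* interconnect maximum */
        const int packet_width_bits    = 256;  /* selected packet width */

        double utilization =
            (double)packet_width_bits / (double)max_datum_width_bits;

        /* packet width == maximum width -> utilization == 100% (full bandwidth) */
        printf("per-transfer utilization: %.0f%%\n", utilization * 100.0);
        return 0;
    }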
As to claim 12, Bleiweiss ‘737 and Surugucchi ‘509 do not specifically teach the system of claim 11, wherein the directing the first communication and the directing the second communication are performed utilizing full bandwidth of the data fabric.
Dorr ‘807 teaches an interconnect system connecting multiple hardware processing elements (see Abstract; see Fig. 1, processing elements 103 and interconnect fabric 110), where the interconnect is implemented with a selected maximum datum width for each configuration, corresponding to a maximum data width for a packet, as well as smaller data widths than the selected maximum data width of the interconnect (see Paragraph 0033). Examiner points out that if a packet being transmitted has a data width equal to the interconnect’s maximum data width, then it is by definition utilizing the full bandwidth of the interconnect.
Bleiweiss ‘737, Surugucchi ‘509, and Dorr ‘807 qualify as analogous prior art, as all of these references pertain to the same field of endeavor of connecting multiple hardware logics together through an interconnect to allow communication between the connected logics through the interconnect.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Bleiweiss ‘737 and Surugucchi ‘509 system as set forth above to utilize communication protocols for transmitting data across the bridge/interconnect with a selected maximum datum width for a configuration corresponding to a maximum datum width for the packet, which includes the possibility of the datum width of a packet being equal to the interconnect’s maximum data width, meaning full bandwidth would be utilized, as taught by Dorr ‘807. A person of ordinary skill in the art would recognize that communications between hardware logics all utilize a communication protocol in order for signals to be encoded / decoded, and would be motivated to utilize a selected datum width for packets equal to the interconnect’s maximum datum width in order to fully maximize hardware usage and the data transfer rates for the communications.
Claims 5-9 and 14-18 are rejected under 35 U.S.C. 103 as being unpatentable over Bleiweiss ‘737, in view of Surugucchi ‘509 and Nystad et al. (US 2019/0096025), herein referred to as Nystad ‘025.
As to claim 5, Bleiweiss ‘737 teaches the hardware logic (see Paragraph 0164) being used to perform convolution operations (see Paragraph 0165; also see Figs. 29-31).
However, Bleiweiss ‘737 and Surugucchi ‘509 do not specifically teach the method of claim 1, wherein the first die is a memory and the directing the first communication comprises fetching machine learning weights and inputs from the memory.
Nystad ‘025 teaches a graphics processing unit for performing convolution operations (see Abstract), wherein an instruction is provided to perform a convolution operation including a set of weight values and a set of input data values (see Paragraph 0017); a convolution instruction indicates where the input data values and the set or kernel of weight values for the convolution operation are stored, and the input data values and set or kernel of weight values are then fetched from where they are stored (see Paragraph 0082); and the apparatus receives a set or kernel of weight values as part of and/or with an instruction to perform the convolution operation by fetching the set or kernel of weight values from storage, such as from memory of the data processing system (see Paragraphs 0049-0050).
Bleiweiss ‘737, Surugucchi ‘509, and Nystad ‘025 qualify as analogous prior art, as all of these references pertain to the same field of endeavor of multiple hardware logics connected together and in communication with each other in order to execute / perform operations across the multiple hardware logics.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Bleiweiss ‘737 and Surugucchi ‘509 system as set forth above to have the convolution operation of Bleiweiss ‘737 include instructions that direct the hardware to fetch weight values and inputs from the memory, as taught by Nystad ‘025. A person of ordinary skill in the art would be motivated to fetch input values and weight values from memory, as input data and weight values are known in the art to be necessary data for performing convolution operations, especially within convolutional neural network systems such as that of Bleiweiss ‘737.
As to claim 6, Bleiweiss ‘737 teaches the method of claim 5, further comprising performing matrix multiplication or convolution operations in the machine learning accelerator (see Paragraphs 0162-0163, wherein hardware acceleration for machine learning application 1502 includes convolution operations; also see Paragraph 0165) using the weights and inputs (see Paragraph 0158).
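For illustration only of a convolution operation using weights and inputs of the kind discussed above, the following minimal sketch in C performs a 1-D convolution over arrays standing in for data fetched from memory; the sizes and values are hypothetical and are not drawn from any cited reference.

    /* Illustrative sketch only: a minimal 1-D convolution of the kind an
     * accelerator might perform after fetching weights and inputs from a
     * memory die. Sizes and values are hypothetical. */
    #include <stdio.h>

    #define IN_LEN  5
    #define K_LEN   3
    #define OUT_LEN (IN_LEN - K_LEN + 1)

    int main(void) {
        /* stand-ins for inputs and machine learning weights fetched from memory */
        const float input[IN_LEN] = { 1.f, 2.f, 3.f, 4.f, 5.f };
        const float weight[K_LEN] = { 0.5f, 1.0f, 0.5f };
        float output[OUT_LEN];

        /* slide the kernel over the input and accumulate products */
        for (int o = 0; o < OUT_LEN; ++o) {
            float acc = 0.f;
            for (int k = 0; k < K_LEN; ++k)
                acc += input[o + k] * weight[k];
            output[o] = acc;
        }

        for (int o = 0; o < OUT_LEN; ++o)
            printf("output[%d] = %.1f\n", o, output[o]);
        return 0;
    }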
As to claim 7, Bleiweiss ‘737 teaches a first logic, such as a CPU, directing operations (offloading) to a second logic, such as a GPU (see Paragraph 0324; also see Fig. 35), for execution / processing.
However, Bleiweiss ‘737 and Surugucchi ‘509 do not specifically teach the directing the second communication comprises the first die controlling machine learning operations on the machine learning accelerator.
Nystad ‘025 teaches a graphics processing unit for performing convolution operations (see Abstract), wherein an instruction is provided to perform a convolution operation, including where the input data values and the set or kernel of weight values for the convolution operation are stored (see Paragraph 0082), and where a host processor executing an application that requires data or graphics processing by a graphics processing unit can instruct / control the graphics processing unit accordingly (see Paragraph 0102).
Bleiweiss ‘737, Surugucchi ‘509, and Nystad ‘025 qualify as analogous prior art, as all of these references pertain to the same field of endeavor of multiple hardware logics connected together and in communication with each other in order to execute / perform operations across the multiple hardware logics.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Bleiweiss ‘737 and Surugucchi ‘509 system as set forth above to have one of the logics be a host processor, i.e. the first die, and the other logic be the logic that performs the convolution operation, i.e. the machine learning accelerator, where the host processor can control the second logic / graphics processing unit by sending instructions to perform convolution operations, as taught by Nystad ‘025. A person of ordinary skill in the art would be motivated to have a processor offload execution to a second logic, such as an accelerator or graphics processing unit, in order to reduce the primary workload of the first die / processor, allowing it to focus on critical tasks while allowing specialized hardware to handle processing-intensive jobs such as graphics processing.
As to claim 8, Bleiweiss ‘737 and Surugucchi ‘509 do not specifically teach the method of claim 7, wherein the controlling of the machine learning operations comprises instructing the machine learning accelerator regarding where to fetch inputs, what operations to perform, and where to store results.
Nystad ‘025 teaches a graphics processing unit for performing convolution operations (see Abstract), wherein an instruction is provided to perform a convolution operation, including where the input data values and the set or kernel of weight values for the convolution operation are stored (see Paragraph 0082), and where to store the results of a convolution operation (see Paragraph 0005, where a store instruction to store the results is included in a known convolution operation).
Bleiweiss ‘737, Surugucchi ‘509, and Nystad ‘025 qualify as analogous prior art, as all of these references pertain to the same field of endeavor of multiple hardware logics connected together and in communication with each other in order to execute / perform operations across the multiple hardware logics.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Bleiweiss ‘737 and Surugucchi ‘509 system as set forth above to have convolution operations include information about where to fetch inputs, what convolution operation to perform using the inputs, and what to do with the results, i.e. where to store the results, as taught by Nystad ‘025. A person of ordinary skill in the art would be motivated to have instructions include source, destination, and operation fields, as instructions are known in the art to be formatted with fields for conveying this information.
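For illustration only of an instruction carrying the three pieces of information discussed above (where to fetch inputs, what operation to perform, and where to store results), the following minimal sketch in C defines a hypothetical command descriptor; the field names, addresses, and opcode values are invented for illustration and are not drawn from Nystad ‘025.

    /* Illustrative sketch only: a hypothetical command descriptor that a
     * host die might send to an accelerator. Field names, addresses, and
     * opcodes are invented for illustration. */
    #include <stdint.h>
    #include <stdio.h>

    typedef enum {
        OP_CONVOLUTION = 0,
        OP_MATMUL      = 1
    } ml_opcode_t;

    typedef struct {
        uint64_t    input_addr;   /* where to fetch inputs */
        uint64_t    weight_addr;  /* where to fetch weights */
        uint64_t    result_addr;  /* where to store results */
        ml_opcode_t op;           /* what operation to perform */
    } ml_command_t;

    int main(void) {
        ml_command_t cmd = {
            .input_addr  = 0x1000,
            .weight_addr = 0x2000,
            .result_addr = 0x3000,
            .op          = OP_CONVOLUTION,
        };
        printf("op=%d in=0x%llx w=0x%llx out=0x%llx\n",
               (int)cmd.op,
               (unsigned long long)cmd.input_addr,
               (unsigned long long)cmd.weight_addr,
               (unsigned long long)cmd.result_addr);
        return 0;
    }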
As to claim 9, Bleiweiss ‘737 teaches the method of claim 8, wherein the controlling of the machine learning operations comprises performing operations for one layer of a machine learning model (see Paragraph 0158, where the machine learning algorithm used in Bleiweiss ‘737 is a neural network arranged in multiple layers, and where a convolutional neural network has computations including convolution mathematical operations done for convolution layers over a multidimensional array, i.e. at least one layer of the CNN would have convolution operations performed in the machine learning algorithm).
As to claim 14, Bleiweiss ‘737 teaches the hardware logic (see Paragraph 0164) being used to perform convolution operations (see Paragraph 0165; also see Figs. 29-31).
However, Bleiweiss ‘737 and Surugucchi ‘509 do not specifically teach the system of claim 10, wherein the first die is a memory and the directing the first communication comprises fetching machine learning weights and inputs from the memory.
Nystad ‘025 teaches a graphics processing unit for performing convolution operations (see Abstract), wherein an instruction is provided to perform a convolution operation including a set of weight values and a set of input data values (see Paragraph 0017); a convolution instruction indicates where the input data values and the set or kernel of weight values for the convolution operation are stored, and the input data values and set or kernel of weight values are then fetched from where they are stored (see Paragraph 0082); and the apparatus receives a set or kernel of weight values as part of and/or with an instruction to perform the convolution operation by fetching the set or kernel of weight values from storage, such as from memory of the data processing system (see Paragraphs 0049-0050).
Bleiweiss ‘737, Surugucchi ‘509, and Nystad ‘025 qualify as analogous prior art, as all of these references pertain to the same field of endeavor of multiple hardware logics connected together and in communication with each other in order to execute / perform operations across the multiple hardware logics.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Bleiweiss ‘737 and Surugucchi ‘509 system as set forth above to have the convolution operation of Bleiweiss ‘737 include instructions that direct the hardware to fetch weight values and inputs from the memory, as taught by Nystad ‘025. A person of ordinary skill in the art would be motivated to fetch input values and weight values from memory, as input data and weight values are known in the art to be necessary data for performing convolution operations, especially within convolutional neural network systems such as that of Bleiweiss ‘737.
As to claim 15, Bleiweiss ‘737 teaches the system of claim 14, wherein the machine learning accelerator is further configured to perform matrix multiplication or convolution operations (see Paragraphs 0162-0163, wherein hardware acceleration for machine learning application 1502 includes convolution operations; also see Paragraph 0165) using the weights and inputs (see Paragraph 0158).
As to claim 16, Bleiweiss ‘737 teaches a first logic, such as a CPU, directing operations (offloading) to a second logic, such as a GPU (see Paragraph 0324; also see Fig. 35), for execution / processing.
However, Bleiweiss ‘737 and Surugucchi ‘509 do not specifically teach the directing the second communication comprises the first die controlling machine learning operations on the machine learning accelerator.
Nystad ‘025 teaches a graphics processing unit for performing convolution operations (see Abstract), wherein an instruction is provided to perform a convolution operation, including where the input data values and the set or kernel of weight values for the convolution operation are stored (see Paragraph 0082), and where a host processor executing an application that requires data or graphics processing by a graphics processing unit can instruct / control the graphics processing unit accordingly (see Paragraph 0102).
Bleiweiss ‘737, Surugucchi ‘509, and Nystad ‘025 qualify as analogous prior art, as all of these references pertain to the same field of endeavor of multiple hardware logics connected together and in communication with each other in order to execute / perform operations across the multiple hardware logics.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Bleiweiss ‘737 and Surugucchi ‘509 system as set forth above to have one of the logics be a host processor, i.e. the first die, and the other logic be the logic that performs the convolution operation, i.e. the machine learning accelerator, where the host processor can control the second logic / graphics processing unit by sending instructions to perform convolution operations, as taught by Nystad ‘025. A person of ordinary skill in the art would be motivated to have a processor offload execution to a second logic, such as an accelerator or graphics processing unit, in order to reduce the primary workload of the first die / processor, allowing it to focus on critical tasks while allowing specialized hardware to handle processing-intensive jobs such as graphics processing.
As to claim 17, Bleiweiss ‘737 and Surugucchi ‘509 do not specifically teach the system of claim 16, wherein the controlling of the machine learning operations comprises instructing the machine learning accelerator regarding where to fetch inputs, what operations to perform, and where to store results.
Nystad ‘025 teaches a graphics processing unit for performing convolution operations (see Abstract), wherein an instruction is provided to perform a convolution operation, including where the input data values and the set or kernel of weight values for the convolution operation are stored (see Paragraph 0082), and where to store the results of a convolution operation (see Paragraph 0005, where a store instruction to store the results is included in a known convolution operation).
Bleiweiss ‘737, Surugucchi ‘509, and Nystad ‘025 qualify as analogous prior art, as all of these references pertain to the same field of endeavor of multiple hardware logics connected together and in communication with each other in order to execute / perform operations across the multiple hardware logics.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Bleiweiss ‘737 and Surugucchi ‘509 system as set forth above to have convolution operations include information about where to fetch inputs, what convolution operation to perform using the inputs, and what to do with the results, i.e. where to store the results, as taught by Nystad ‘025. A person of ordinary skill in the art would be motivated to have instructions include source, destination, and operation fields, as instructions are known in the art to be formatted with fields for conveying this information.
As to claim 18, Bleiweiss ‘737 teaches the system of claim 17, wherein the controlling of the machine learning operations comprises performing operations for one layer of a machine learning model (see Paragraph 0158, where the machine learning algorithm used in Bleiweiss ‘737 is a neural network arranged in multiple layers, and where a convolutional neural network has computations including convolution mathematical operations done for convolution layers over a multidimensional array, i.e. at least one layer of the CNN would have convolution operations performed in the machine learning algorithm).
Relevant Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Daga et al. (US 10,769,526) teaches a machine learning accelerator architecture with weight buffer and input buffer to perform matrix multiplication for platforms that involve complex convolution neural network operations that need to be calculated in systems with multiple cores connected to a graphics engine and shared memory.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL SUN whose telephone number is (571)270-1724. The examiner can normally be reached Monday-Friday 8am-4pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL SUN/Primary Examiner, Art Unit 2183