DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-4, 8-9, 14-19, and 21-23 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Pechanek [US 2015/0039855 A1].
Regarding Claim 1, Pechanek teaches “A processor-implemented method for task processing comprising: accessing an array of compute elements, wherein each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements, wherein the array of compute elements is coupled to at least one data cache, wherein the data cache provides memory storage for the array of compute elements;” as “the organization of nodes is arranged by plane in a three dimension (3D) 4×4×4 topology with the nodes combined as discussed in FIGS. 20A-20E to produce the WAM 4×4 Quad Core array 2100 as shown in FIG. 21. Sixteen quad core nodes each structured similar to the exemplary quad core node symbol 2080 of FIG. 20E are arranged in a four quad core nodes by four quad core nodes (4×4) arrangement of quad core nodes. The 3D physical layout form shown in FIG. 3 using WAM array memory for data memory, such as a data cache, and a vertical pipe between instruction memory and processing elements is extended in the implementation of the WAM 16 quad core network 2100.” [¶0167]
“providing control for the array of compute elements on a cycle-by-cycle basis, wherein the control is enabled by a stream of wide control words generated by the compiler;” as “FIG. 12B illustrates a pipeline diagram 1230 with instruction executions per cycle for the FFT signal flow graph of FIG. 10 in accordance with an embodiment of the present invention. The instruction execution cycles EX1-EX10 1241-1250, respectively, are listed across the first top row of the diagram 1230 with each labeled column representing an execution cycle.” [¶0127]
“generating a load address and a store address, wherein the load address and the store address comprise memory block move addresses, and wherein the memory block move addresses point to memory storage locations in the at least one data cache; and” as “the load and store networks may be extended to support larger arrays based on using the folding techniques described herein. Also, higher levels of adjacency may be used, such as using 1→5 level adjacency buses between the PRS nodes and between the LTZ nodes and then using 1→3 level adjacency buses between the S and V nodes and the M and Z nodes to create an arrangement of quad core nodes” [¶0167]
“executing a memory block move, based on the memory block move addresses, wherein data for the memory block move is transferred outside of the array of compute elements.” as “In Table 5, the Mg,h relative to M2,2 column indicates a horizontal movement of one or two steps followed by a vertical movement of one or two steps to reach the specified destination memory block to be selected.” [¶0116]
Regarding Claim 2, Pechanek teaches “wherein the load address and the store address are generated in a same cycle.” as “FIG. 20D illustrates an exemplary quad core node that supports store and load operations in parallel in accordance with an embodiment of the present invention” [¶0052]
Regarding Claim 3, Pechanek teaches “wherein the memory block move comprises a data cache to data cache transfer.” as “FIG. 3 illustrates a nine node processing system 300 in a 3D physical layout form using WAM array memory for data memory, such as including an array of data caches, and a vertical pipe between instruction memory and processing elements in accordance with an embodiment of the present invention.” [¶0074]
Regarding Claim 4, Pechanek teaches “wherein a control word from the stream of wide control words includes a load target start address, a store target start address, a block size, and a stride.” as “The load operation fetches a data element at an address according to an increment amount, stride, hold information, and the like which may be encoded in various opcode dependent fields 906, 912, and 913 and interpreted according to a data type stored in Dtype 907.” [¶0117]
Regarding Claim 8, Pechanek teaches “further comprising coupling load buffers located adjacent to at least one edge of the array of compute elements.” as “Input data, also referred to as operands, may be loaded to an arithmetic unit over a Wings array memory (WAM) load network at connection point 560 through an input interface 561 which may contain buffer storage according to requirements of a processor.” [¶0077]
Regarding Claim 9, Pechanek teaches “wherein the memory block move that is transferred outside of the array of compute elements is enabled by the load buffers.” as “Data may also be transferred from data bus 564 to output interface 567 to a WAM store network 568 for storage in one or more memory blocks of the processor memory.” [¶0077]
Regarding Claim 14, Pechanek teaches “wherein the array of compute elements comprises a two- dimensional (2D) array.” as “FIG. 19 illustrates a WAM 4×4×4 network for store operations that is a reorganized WAM 4×4×4 network with 4×4 PRS planes and 4×4 VM planes each arranged in a 2 dimensional (2D) organization” [¶0048]
Regarding Claim 15, Pechanek teaches “wherein the 2D array includes rows of compute elements and columns of compute elements.” as “2 dimensional (2D) organization interconnected by a 1 to 3 level adjacency networks in the rows and in the columns in accordance with an embodiment of the present invention;” [¶0048]
Regarding Claim 16, Pechanek teaches “wherein the generating a load address and a store address is performed by one or more compute elements within a column of compute elements.” as “A column select 1020 identifies load instructions associated with each row to load the X and W values to the appropriate execution unit in each row. A column select 1022 identifies a first and a second groupfun instructions associated with each row to provide a complex multiplication and a move result function in each row. A column select 1024 identifies add or subtract instructions according to the row. ” [¶0123]
Regarding Claim 17, Pechanek teaches “wherein successful completion of the memory block move occurs within one architectural cycle.” as “FIG. 12B illustrates a pipeline diagram 1230 with instruction executions per cycle for the FFT signal flow graph of FIG. 10 in accordance with an embodiment of the present invention.” [¶0127]
Regarding Claim 18, Pechanek teaches “wherein the architectural cycle includes a plurality of clock cycles.” as “The clock used in FIGS. 26 and 27 may be a clock that is slower than the clock used in FIGS. 24 and 25, such as a 500 MHz clock.” [¶0178]
Regarding Claim 19, Pechanek teaches “wherein the memory block move implements a load-to-store forwarding operation.” as “FIG. 12A illustrates a pipeline diagram for Row 3 of the FFT signal flow graph of FIG. 10 in accordance with an embodiment of the present invention;” [¶0039]
Regarding Claim 21, Pechanek teaches “wherein the stream of wide control words comprises variable length control words generated by the compiler.” as “The Li.a0 instruction 607 is coded by a compiler or programmer to indicate a chained link to a destination instruction based on register linkage between instructions and placement of a linked instruction in a CEP, such as in the row 0 CEP 606.” [¶0081]
Claims 22 and 23 are anticipated by Pechanek under the same rationale as the anticipation of claim 1.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Pechanek [US 2015/0039855 A1] in view of Ingalls et al. [US 2021/0263854 A1].
Claim 5 is rejected over Pechanek and Ingalls.
Pechanek does not explicitly teach wherein the generating a load address and a store address encompasses physical address translation of the load target start address and the store target start address, respectively.
However, Ingalls teaches “wherein the generating a load address and a store address encompasses physical address translation of the load target start address and the store target start address, respectively.” as “Each processor core 1100 can include a L1 instruction cache 1500 which is associated with a L1 translation lookaside buffer (TLB) 1510 for virtual-to-physical address translation.” [¶0022]
Pechanek and Ingalls are analogous arts because they teach storage system and cache management.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Pechanek and Ingalls before him/her, to modify the teachings of Pechanek to include the teachings of Ingalls with the motivation that the prediction hint can provide a disable of load data return on an unknown Read-After-Write hazard, a speculative store bypass disable, or other alternate behavior. [Ingalls, ¶0011]
Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Pechanek [US 2015/0039855 A1] in view of Brewer [US 2022/0121450 A1].
Claim 6 is rejected over Pechanek and Brewer.
Pechanek does not explicitly teach wherein the memory block move is executed as a pseudo-atomic operation.
However, Brewer teaches “wherein the memory block move is executed as a pseudo-atomic operation.” as “Built-in atomic operators can also involve requests for a “standard” atomic operator on the requested data, such as comparatively simple, single cycle, integer atomics, such as fetch-and-increment or compare-and-swap, which will occur with the same throughput as a regular memory read or write operation not involving an atomic operator.” [¶0044]
Pechanek and Brewer are analogous arts because they teach storage system and cache management.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Pechanek and Brewer before him/her, to modify the teachings of Pechanek to include the teachings of Brewer with the motivation that a chiplet system offers advantages in allowing adaptation to different memory storage technologies and different memory interfaces, through updated chiplet configurations, without requiring redesign of the remainder of the system structure. [Brewer, ¶0032]
Claim 7 is rejected over Pechanek and Brewer.
Pechanek does not explicitly teach wherein the pseudo-atomic operation uses memory hazard detection and mitigation.
However, Brewer teaches “wherein the pseudo-atomic operation uses memory hazard detection and mitigation.” as “Following the writing of the resulting data to the cache 210, any corresponding hazard bit which was set will be cleared by the memory hazard unit 260.” [¶0044]
Claims 12-13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Pechanek [US 2015/0039855 A1] in view of Chinnakonda et al. [US 2003/0196072 A1].
Claim 12 is rejected over Pechanek and Chinnakonda.
Pechanek does not explicitly teach further comprising coupling a crossbar switch between the load buffers and the at least one data cache.
However, Chinnakonda teaches “further comprising coupling a crossbar switch between the load buffers and the at least one data cache.” as “DSP core 10 communicates with memory 12 via load buses L00 and L01, a store bus S0 and an instruction bus IO. Memory 12 includes a store buffer 300, a load skid buffer 302, prioritization logic 310, bank conflict detection and handling logic 312, control logic 314, SRAM megabanks 320 and 322 and a data crossbar 330.” [¶0048]
Pechanek and Chinnakonda are analogous arts because they teach storage system and cache management.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Pechanek and Chinnakonda before him/her, to modify the teachings of Pechanek to include the teachings of Chinnakonda with the motivation that an advantage of the pipelined architecture is increased operating speed, since multiple instructions may be in process simultaneously, with different instructions being in different states of completion. [Chinnakonda, ¶0030]
Claim 13 is rejected over Pechanek and Chinnakonda.
Pechanek does not explicitly teach wherein the crossbar switch enables memory access anywhere within the at least one data cache.
However, Chinnakonda teaches “wherein the crossbar switch enables memory access anywhere within the at least one data cache.” as “Data crossbar 330 routes data from megabanks 320 and 322 to DSP core 10, DSP core 14 and a DMA requester in accordance with control signals derived from the instruction being executed.” [¶0052]
Claim 20 is rejected over Pechanek and Chinnakonda.
Pechanek does not explicitly teach wherein the load-to-store forwarding operation enables hazard detection and mitigation.
However, Chinnakonda teaches “wherein the load-to-store forwarding operation enables hazard detection and mitigation.” as “Data address generator 22 may also include a P register file 74, a future file 76, hazard detection circuitry 78 and a TLB 80.” [¶0027]
Pechanek and Chinnakonda are analogous arts because they teach storage system and cache management.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Pechanek and Chinnakonda before him/her, to modify the teachings of Pechanek to include the teachings of Chinnakonda with the motivation that an advantage of the pipelined architecture is increased operating speed, since multiple instructions may be in process simultaneously, with different instructions being in different states of completion. [Chinnakonda, ¶0030]
Allowable Subject Matter
Claims 10 and 11 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MASUD K KHAN whose telephone number is (571)270-0606. The examiner can normally be reached Monday-Friday (8am-5pm).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam can be reached at (571) 272-3978. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MASUD K KHAN/Primary Examiner, Art Unit 2132