DETAILED ACTION
Claims 1-21 are pending in this application.
Claims 9-12 and 20 are objected to.
Claims 1-8, 13-19 and 21 are rejected.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-7, 17-18 and 21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hakura et al. (U.S. PGPub No. 2019/0235915) in view of Nickolls et al. (U.S. PGPub No. 2011/0078692).
Claim 1
Hakura (2019/0235915) teaches:
A computer-implemented method for performing memory store operations, the method comprising:
receiving a first store operation from a first processor; FIG. 3 streaming multiprocessors (SMs) 310; P. 0073-74 L2 slice 270 receives an atomic operation from the SM 310; P. 0038 the operation may be a store operation
determining that the first store operation comprises an ordered store operation of a first type; P. 0073-74 ordering unit 420 determines the ordering number 418 included in the ordered atomic operation 410 is equal to the current ordering number 494 associated with the target memory address 412
wherein the first store operation causes data stored in a memory by the first store operation and data stored in the memory by prior store operations to be visible in an ordered view of the memory; and P. 0049 and FIG. 4 When a memory address 490(x) is orderable, another memory address 490(y) stores the current ordering number 494(x) associated with the memory address 490(x). Each address 490 stores a data value 492 or an ordering number 494
maintaining ordering of the first store operation relative to a second store operation comprising an ordered store operation of a second type, […] P. 0076-77 a received ordered atomic operation 410 may not have an equal ordering number to the current ordering number 494, in which case the ordered atomic operation 510 in the CAM
Hakura does not explicitly teach continuing to execute operations while a first ordered operation is pending.
Nickolls (2011/0078692) teaches:
wherein the first store operation causes data stored in a memory by the first store operation and data stored in the memory by prior store operations to be visible in an ordered view of the memory; and […] P. 0065 a memory transaction is considered “performed” when it has been committed to memory order and is visible to all other threads
[…] wherein the first processor continues to execute operations while the first store operation is pending. P. 0094 a “triggering” MEMBAR instruction (a MEMBAR instruction that opens the coalescing window) is received by the memory barrier instruction execution unit 500 within the L1 cache 320, the MEMBAR accept/retry unit 505 records the MEMBAR instruction and enters a coalescing window; P. 0095 While coalescing, the memory barrier instruction execution unit 500 in the L1 cache 320 will continue to process requests from threads that have not reached a MEMBAR instruction
It would have been obvious to a person with ordinary skill in the art at the effective filing date of the application to modify the invention of Hakura to include continuing to execute operations while a first ordered operation is pending, as taught by Nickolls.
The motivation is that it provides a many-core high performance compute platform (see Nickolls P. 0061).
The systems of Hakura and Nickolls are analogous because they are from the “same field of endeavor” and from the same “problem solving area.” Namely, they are both from the field of memory systems.
Therefore it would have been obvious to combine Hakura with Nickolls to obtain the invention as recited in claims 1-7 and 17-18.
Claim 2
Nickolls (2011/0078692) teaches:
The computer-implemented method of claim 1, wherein maintaining ordering of the first store operation relative to the second store operation comprises delaying execution of the first store operation pending receiving an acknowledgement that data for a second store operation comprising an ordered store operation of a second type is visible in memory. P. 0062 a memory barrier (MEMBAR) instruction is used to ensure that all memory transactions issued before the MEMBAR instruction are sufficiently performed so that their results are visible; P. 0088 MEMBAR.GL instructions commit all of the memory transactions before the MEMBAR.GL to the L2 cache 350, MEMBAR.SYS instructions commit all of the memory transactions before the MEMBAR.SYS to system memory 104 [either is analogous to a first type of store operation]
Claim 3
Nickolls (2011/0078692) teaches:
The computer-implemented method of claim 2, wherein: the ordered store operation of the first type comprises a first strong ordered store operation; and P. 0070 There are multiple levels of MEMBAR instructions … MEMBAR.CTA enforces memory ordering among threads in the CTA, MEMBAR.GL enforces ordering at the global level (e.g. among the memory interface 214 clients), and MEMBAR.SYS enforces ordering at the system level (e.g. including system and peer memory).
the ordered store operation of the second type comprises a second strong ordered store operation. P. 0062 LOAD and STORE operations from any one thread to the same memory address must be performed with respect to just that thread in program order
Claim 4
Nickolls (2011/0078692) teaches:
The computer-implemented method of claim 1, wherein: the ordered store operation of the first type comprises a strong ordered store operation; and P. 0070 There are multiple levels of MEMBAR instructions … MEMBAR.CTA enforces memory ordering among threads in the CTA, MEMBAR.GL enforces ordering at the global level (e.g. among the memory interface 214 clients), and MEMBAR.SYS enforces ordering at the system level (e.g. including system and peer memory).
the ordered store operation of the second type comprises a weak ordered store operation. P. 0062 LOAD and STORE operations from any one thread to the same memory address must be performed with respect to just that thread in program order [weak]
Claim 5
Nickolls (2011/0078692) teaches:
The computer-implemented method of claim 1, further comprising: receiving acknowledgement that data for the second store operation is visible in memory; and executing the first store operation. P. 0062 a memory barrier (MEMBAR) instruction is used to ensure that all memory transactions issued before the MEMBAR instruction are sufficiently performed so that their results are visible; P. 0097 The MEMBAR detection circuit 512 is configured to wait for “ACCEPT” from the L1 cache 320 for all previous LD/ST instructions in the same warp before outputting a MEMBAR instruction
Claim 6
Nickolls (2011/0078692) teaches:
The computer-implemented method of claim 1, wherein: the ordered store operation of the first type comprises a first weak ordered store operation; P. 0070 There are multiple levels of MEMBAR instructions … MEMBAR.CTA enforces memory ordering among threads in the CTA [weak]
the ordered store operation of the second type comprises a second weak ordered store operation; and P. 0062 LOAD and STORE operations from any one thread to the same memory address must be performed with respect to just that thread in program order [weak]
execution of the first store operation is not delayed due to pendency of the second store operation. P. 0094 a “triggering” MEMBAR instruction (a MEMBAR instruction that opens the coalescing window) is received by the memory barrier instruction execution unit 500 within the L1 cache 320, the MEMBAR accept/retry unit 505 records the MEMBAR instruction and enters a coalescing window; P. 0095 While coalescing, the memory barrier instruction execution unit 500 in the L1 cache 320 will continue to process requests from threads that have not reached a MEMBAR instruction
Claim 7
Hakura (2019/0235915) teaches:
The computer-implemented method of claim 1, further comprising: prior to receiving the first store operation, receiving a third store operation from the first processor; and determining that the third store operation comprises an unordered store operation, wherein execution of the first store operation is not further delayed due to pendency of the third store operation. P. 0067 If the atomic processing circuit 480 receives a conventional atomic operation for the SM 310, then the atomic operation unit 430 executes the conventional atomic operation and returns the result to the SM 310
Claim 17
Nickolls (2011/0078692) teaches:
The computer-implemented method of claim 1, wherein delaying execution of the first store operation maintains ordering between the first store operation and the second store operation relative to a physical memory aperture. P. 0070 There are multiple levels of MEMBAR instructions … MEMBAR.CTA enforces memory ordering among threads in the CTA, MEMBAR.GL enforces ordering at the global level (e.g. among the memory interface 214 clients), and MEMBAR.SYS enforces ordering at the system level (e.g. including system and peer memory)
Claim 18
Nickolls (2011/0078692) teaches:
The computer-implemented method of claim 17, wherein the physical memory aperture comprises at least one of a system memory aperture, a peer memory aperture, or a video memory aperture. P. 0070 MEMBAR.SYS enforces ordering at the system level (e.g. including system and peer memory)
Claim 21
Hakura (2019/0235915) teaches:
A system comprising: a first processor that: generates a first store operation; and FIG. 3 streaming multiprocessors (SMs) 310
a memory management unit that: FIG. 4 Atomic Processing Circuit 480
receives the first store operation from the first processor, determines that the first store operation comprises an ordered store operation of a first type, P. 0073-74 L2 slice 270 receives an atomic operation from the SM 310, determining the ordering number 418 included in the ordered atomic operation 410 is equal to the current ordering number 494 associated with the target memory address 412; P. 0038 the operation may be a store operation
wherein the first store operation causes data stored in a memory by the first store operation and data stored in the memory by prior store operations to be visible in an ordered view of the memory; and P. 0049 When a memory address 490(x) is orderable, another memory address 490(y) stores the current ordering number 494(x) associated with the memory address 490(x). Each address 490 stores a data value 492 or an ordering number 494
[…] a second store operation comprising an ordered store operation of a second type […] P. 0076-77 a received ordered atomic operation 410 may not have an equal ordering number to the current ordering number 494, in which case the ordered atomic operation 510 in the CAM
Hakura does not explicitly teach continuing to execute operations while a first ordered operation is pending.
Nickolls (2011/0078692) teaches:
wherein the first store operation causes data stored in a memory by the first store operation and data stored in the memory by prior store operations to be visible in an ordered view of the memory; and P. 0065 a memory transaction is considered “performed” when it has been committed to memory order and is visible to all other threads
delays execution of the first store operation pending receiving an acknowledgement that a second store operation comprising an ordered store operation of a second type, P. 0062 a memory barrier (MEMBAR) instruction is used to ensure that all memory transactions issued before the MEMBAR instruction are sufficiently performed so that their results are visible; P. 0088 MEMBAR.GL instructions commit all of the memory transactions before the MEMBAR.GL to the L2 cache 350, MEMBAR.SYS instructions commit all of the memory transactions before the MEMBAR.SYS to system memory 104 [either is analogous to a first type of store operation]
wherein the first processor continues to execute operations while the first store operation is pending. P. 0094 a “triggering” MEMBAR instruction (a MEMBAR instruction that opens the coalescing window) is received by the memory barrier instruction execution unit 500 within the L1 cache 320, the MEMBAR accept/retry unit 505 records the MEMBAR instruction and enters a coalescing window; P. 0095 While coalescing, the memory barrier instruction execution unit 500 in the L1 cache 320 will continue to process requests from threads that have not reached a MEMBAR instruction
It would have been obvious to a person with ordinary skill in the art at the effective filing date of the application to modify the invention of Hakura to include continuing to execute operations while a first ordered operation is pending, as taught by Nickolls.
The motivation is that it provides a many-core high performance compute platform (see Nickolls P. 0061).
The systems of Hakura and Nickolls are analogous because they are from the “same field of endeavor” and from the same “problem solving area.” Namely, they are both from the field of memory systems.
Therefore it would have been obvious to combine Hakura with Nickolls to obtain the invention as recited in claim 21.
Claim(s) 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hakura et al. (U.S. PGPub No. 2019/0235915) in view of Nickolls et al. (U.S. PGPub No. 2011/0078692) in view of Basu et al. (U.S. PGPub No. 2019/0384722).
Claim 8
The systems of Hakura and Nickolls do not explicitly teach delaying an address translation for the first store operation pending completion of an address translation for the second store operation.
Basu (2019/0384722) teaches:
The computer-implemented method of claim 1, further comprising: delaying a first virtual address to physical address translation for the first store operation pending completion of a second virtual address to physical address translation for the second store operation. P. 0038 L2 TLB 440 selectively dispatches address translation requests to high priority queue 446 and low priority queue 448; P. 0040 address translation requests are allocated from the high priority queue 446 until it is empty, before translation requests in the low priority queue 448 are allocated
It would have been obvious to a person with ordinary skill in the art at the effective filing date of the application to modify the combination of Hakura and Nickolls to include delaying an address translation for the first store operation pending completion of an address translation for the second store operation, as taught by Basu.
The motivation is that it decreases address translation request congestion at any one hardware resource and maintains a balance for high-throughput address translation needs (see Basu P. 0043).
The systems of Hakura, Nickolls and Basu are analogous because they are from the “same field of endeavor” and from the same “problem solving area.” Namely, they are all from the field of memory systems.
Therefore it would have been obvious to combine Hakura and Nickolls with Basu to obtain the invention as recited in claim 8.
Claim(s) 13-16 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hakura et al. (U.S. PGPub No. 2019/0235915) in view of Nickolls et al. (U.S. PGPub No. 2011/0078692) in view of Adler et al. (U.S. PGPub No. 2018/0181432).
Claim 13
The systems of Hakura and Nickolls do not explicitly teach an interface that does not support non-posted operations receiving a non-posted operation.
Adler (2018/0181432) teaches:
The computer-implemented method of claim 1, further comprising: receiving a third store operation from the first processor; determining that the third store operation comprises an ordered store operation and is directed to a first interface that does not support non-posted store operations; and P. 0039 a non-posted write transaction that is to be routed to a PCI-based fabric that does not support a non-posted write
executing the third store operation without delaying the third store operation to receive an acknowledgement that data for the first store operation is visible in memory. P. 0039 the non-posted write may be converted; P. 0041 a bridge agent may map the non-posted memory write transaction to a posted memory write transaction (e.g., MWr32 or MWr64), and generate a completion to send to the original requester; P. 0025 the completion message provides the requested data
It would have been obvious to a person with ordinary skill in the art at the effective filing date of the application to modify the combination of Hakura and Nickolls to include an interface that does not support non-posted operations receiving a non-posted operation, as taught by Adler.
The motivation is that it ensures interoperability and backwards compatibility (see Adler P. 0041).
The systems of Hakura, Nickolls and Adler are analogous because they are from the “same field of endeavor” and from the same “problem solving area.” Namely, they are all from the field of memory systems.
Therefore it would have been obvious to combine Hakura and Nickolls with Adler to obtain the invention as recited in claims 13-15.
Claim 14
Adler (2018/0181432) teaches:
The computer-implemented method of claim 13, wherein the first store operation is directed to the first interface that does not support non-posted store operations. P. 0039 a non-posted write transaction that is to be routed to a PCI-based fabric that does not support a non-posted write
Claim 15
Adler (2018/0181432) teaches:
The computer-implemented method of claim 13, wherein the first interface comprises a peripheral component interconnect express (PCIe) interface. P. 0039 a non-posted write transaction is routed to a PCIe switch fabric that does not support a non-posted write
Claim 16
The systems of Hakura and Nickolls do not explicitly teach an interface that does not support non-posted operations receiving a non-posted operation.
Adler (2018/0181432) teaches:
The computer-implemented method of claim 1, wherein the first store operation is directed to a first interface that does not support non-posted store operations, and further comprising: P. 0039 a non-posted write transaction is routed to a PCIe switch fabric that does not support a non-posted write
receiving a third store operation from the first processor; P. 0039 a non-posted write transaction is routed to a PCIe switch fabric that does not support a non-posted write
determining that previously sent ordered store operations are visible in memory; and P. 0025 the completion message provides the requested data
executing the third store operation without delaying to receive an acknowledgement that data for the first store operation is visible in memory. P. 0039 the non-posted write may be converted; P. 0041 a bridge agent may map the non-posted memory write transaction to a posted memory write transaction (e.g., MWr32 or MWr64), and generate a completion to send to the original requester; P. 0025 the completion message provides the requested data
It would have been obvious to a person with ordinary skill in the art at the effective filing date of the application to modify the combination of Hakura and Nickolls to include an interface that does not support non-posted operations receiving a non-posted operation, as taught by Adler.
The motivation is that it ensures interoperability and backwards compatibility (see Adler P. 0041).
The systems of Hakura, Nickolls and Adler are analogous because they are from the “same field of endeavor” and from the same “problem solving area.” Namely, they are all from the field of memory systems.
Therefore it would have been obvious to combine Hakura and Nickolls with Adler to obtain the invention as recited in claim 16.
Claim 19
The systems of Hakura and Nickolls do not explicitly teach an interface that does not support non-posted operations receiving a non-posted operation.
Adler (2018/0181432) teaches:
The computer-implemented method of claim 1, wherein the second store operation is directed to a first interface that does not support non-posted store operations and P. 0039 a non-posted write transaction is routed to a PCIe switch fabric that does not support a non-posted write
the first store operation is directed to a second interface that supports non-posted store operations. P. 0024 and FIG. 2 IP agents 130, 140, and 150 may each include a corresponding primary interface, a sideband interface and a DFx interface; P. 0048-49 posted write transactions may be stored in a posted queue of an endpoint (also referred to as an agent); P. 0025 and FIG. 1 primary interface may support non-posted requests
It would have been obvious to a person with ordinary skill in the art at the effective filing date of the application to modify the combination of Hakura and Nickolls to include an interface that does not support non-posted operations receiving a non-posted operation, as taught by Adler.
The motivation is that it ensures interoperability and backwards compatibility (see Adler P. 0041).
The systems of Hakura, Nickolls and Adler are analogous because they are from the “same field of endeavor” and from the same “problem solving area.” Namely, they are all from the field of memory systems.
Therefore it would have been obvious to combine Hakura and Nickolls with Adler to obtain the invention as recited in claim 19.
Allowable Subject Matter
Claims 9-12 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is an examiner’s statement of reasons for allowance:
Claim 9 recites the limitation “determining that the second virtual address to physical address translation for the second store operation has completed; and performing the first virtual address to physical address translation for the first store operation.”
Said limitation is taught by the specification of the instant application as originally filed at least at [P. 0064-65]. Said limitations, in combination with the other recited limitations of claim 9, are not taught or suggested by the prior art of record.
The closest prior art of record is Basu (2019/0384722) which teaches dispatching address translation requests to a high priority queue and low priority queue, and executing translation requests from the high priority queue until it is empty before executing translation requests in a low priority queue but does not teach performing a low-priority translation request after determining a particular high priority translation request has completed.
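As an illustrative sketch only, the two-queue behavior attributed to Basu above can be modeled as follows. The function name `allocation_order` and the request labels are hypothetical and do not appear in Basu; the sketch assumes only what is stated above, namely that requests in the low priority queue are allocated once the high priority queue is empty.

```python
from collections import deque


def allocation_order(high, low):
    """Return the order in which translation requests are allocated.

    Hypothetical model: `high` and `low` stand in for a high priority
    queue and a low priority queue of address translation requests.
    """
    order = []
    while high or low:
        # A low priority request is allocated only when the high
        # priority queue is empty; no individual request is tracked.
        queue = high if high else low
        order.append(queue.popleft())
    return order


# Hypothetical request labels, used for illustration only.
order = allocation_order(deque(["H0", "H1"]), deque(["L0"]))
```

In this model the low priority request is gated on the high priority queue being empty, not on a determination that any particular high priority request has completed, which is the distinction drawn above.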
Claims 10-12 depend from claim 9, and are considered allowable for at least the same reasons as claim 9.
Claim 20 recites the limitation “the first store operation is directed to a first aperture that supports non-posted store operations; and
the second store operation is directed to a second aperture that does not support non-posted store operations; and
further comprising: delaying the first store operation by inserting an aperture switch input/output flush operation on the second aperture, wherein the aperture switch input/output flush operation comprises a dummy read operation;
receiving a read operation response to the dummy read operation indicating that data associated with the second store operation is visible in memory; and
in response to receiving the read operation response, allowing the first store operation to proceed”
Said limitation is taught by the specification of the instant application as originally filed at least at [P. 0071-72]. Said limitations, in combination with the other recited limitations of claim 20, are not taught or suggested by the prior art of record.
The closest prior art of record includes Nickolls (2011/0078692) and Adler (2018/0181432).
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Response to Arguments
Applicant's arguments filed 9/15/2025 have been fully considered but they are not persuasive.
The applicant argues that neither Hakura nor Nickolls teaches ordered store operations causing the associated data to be visible in an ordered view of memory. The examiner respectfully notes that Hakura FIG. 4 shows an L2 slice with cached data 440 identified by memory addresses 490(0 ~ C-1). Hakura P. 0049 states that memory addresses 490 may be orderable, where orderable addresses 490(x) have their associated ordering number 494 stored in the following address 490(y), and that all of the addresses 490(0 ~ C-1) store either data 492 or an ordering number 494. The memory addresses 490(0 ~ C-1) are analogous to an ordered view, as particular addresses 490 have an associated ordering number. Additionally, Nickolls P. 0065 explicitly states “a memory transaction is considered ‘performed’ when it has been committed to memory order and is visible to all other threads”, indicating there is a memory order which is visible to other threads.
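For illustration only, the ordering-number comparison relied upon above can be sketched as follows. The class `OrderedSlice` and its members are hypothetical names that do not appear in Hakura. The sketch assumes that an ordered operation executes when its ordering number equals the current ordering number for the target address and is otherwise held; advancing the current ordering number by one after each executed operation is an assumption made for the sketch and is not drawn from the cited paragraphs.

```python
class OrderedSlice:
    """Hypothetical model of per-address ordering numbers."""

    def __init__(self):
        self.data = {}      # address -> data value
        self.current = {}   # address -> current ordering number
        self.held = []      # operations waiting for their ordering number

    def receive(self, address, ordering_number, value):
        """Execute the operation if it is next in order, else hold it."""
        if ordering_number != self.current.get(address, 0):
            self.held.append((address, ordering_number, value))
            return False
        self._execute(address, value)
        self._release(address)
        return True

    def _execute(self, address, value):
        self.data[address] = value
        # ASSUMPTION: the current ordering number advances by one.
        self.current[address] = self.current.get(address, 0) + 1

    def _release(self, address):
        # Re-check held operations until none matches the current number.
        progressed = True
        while progressed:
            progressed = False
            for op in list(self.held):
                if op[0] == address and op[1] == self.current[address]:
                    self.held.remove(op)
                    self._execute(op[0], op[2])
                    progressed = True


slice_model = OrderedSlice()
first = slice_model.receive(0x10, 1, "b")   # arrives early, so it is held
second = slice_model.receive(0x10, 0, "a")  # next in order, so it executes
```

Under these assumptions the operation numbered 1 is applied only after the operation numbered 0, regardless of arrival order, so the stored values become visible in ordering-number order.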
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Riocreux (2011/0125944) teaches generating barrier transaction requests indicating to the interconnect that an ordering of transaction requests within a stream should be maintained.
Hofmann (2008/0301342) teaches using a data synchronization barrier (DSB) to enforce an ordering requirement, where requests can be strongly ordered or weakly ordered.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to STEPHANIE WU whose telephone number is (571)272-0257. The examiner can normally be reached 1pm to 6pm, and 10pm to 1am Eastern time (10am to 3pm, and 7pm to 10pm Pacific time).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Rocio Del Mar Perez-Velez can be reached at (571) 270-5935. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/STEPHANIE WU/ Primary Examiner, Art Unit 2133