Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner notes the entry of the following papers:
Amended claims filed 10/24/2025.
Applicant’s arguments/remarks made in amendment filed 10/24/2025.
Claims 1 and 10 are amended. Claims 3, 5, 9, 12, 15, and 19-20 are cancelled.
Claims 1-2, 4, 6-8, 10-11, 13-14, 16-18, and 21-23 are presented for examination.
Response to Arguments
Applicant presents several arguments; each is addressed in turn below.
Applicant remarks that “The Applicant’s specification explains that a key objective of the disclosed technology is to reduce overhead (due to duplicative look-ups for the same information, see Id. at paragraph [0033]) by adapting the above-described system architecture in a way that allows the multiple different AI accelerators to access a shared memory space that stores training data or model parameters and do so without altering driver logic of these AI accelerators that, as purchased from 3rd party vendors, may require separate host memory spaces. See, e.g., Id. at paragraph [0034].” (Remarks, page 8, paragraph 2, line 6.) Examiner notes that these features are not recited in the claim, particularly “without altering driver logic of these AI accelerators”. However, if a limitation were drafted that specifically recites the quoted features, such a limitation would not be taught by the prior art of record.
Applicant argues “As discussed below, Lin and AMD do not render claim 1 obvious because neither reference contemplates a solution that make it possible to provide a shared memory space to multiple accelerators that themselves lack the logic to mutually access a same (shared) host memory space.” (Remarks, page 9, paragraph 3, line 1.) Examiner acknowledges that the description above, as understood, is not taught by the prior art of record. However, the claim does not specifically recite that description, including “…multiple accelerators that themselves lack the logic to mutually access a same (shared) host memory space”. Therefore, the rejection is proper and maintained.
Applicant argues “AMD does not appear to disclose or suggest any possible motivation for mapping the host memory spaces used by different peripheral devices to a share memory space.” (Remarks, page 9, paragraph 4, line 6.) However, AMD is relied upon to teach “multiple different dedicated host memory spaces with the host device” and “separate from the host device”. The motivation to combine AMD is “to efficiently translate addresses for dynamic memory and protect memory from disallowed actions”, something AMD specifically recites (AMD, page 13, paragraph 1, line 1; see Office action, page 9). Therefore, the rejection is proper and maintained.
Applicant argues that “Likewise, Lim does not disclose or suggest any mechanism for mapping those different memory spaces in the host to a shared memory space (e.g., so that it is possible for those accelerator devices to operate as if they are accessing their respective different memory space in host memory).” (Remarks, page 10, paragraph 3, line 3.) However, Lim discloses “multiple accelerators accessing a shared memory space”. Lim does not disclose “multiple different dedicated host memory spaces within the host device”, whereas AMD discloses “multiple different dedicated host memory spaces within the host device” (see Office action, page 8). This is why AMD is combined with Lim. Therefore, the rejection is proper and maintained.
Applicant argues “At most, Lin discloses workers configured to directly access a shared memory – a solution that could only be achieved by altering driver logic in the workers. The claimed solution, in contrast makes it possible for different AI accelerators to share a memory space without altering their respective driver logic.” (Remarks, page 11, paragraph 2, line 6.) However, there is no mention of “without altering their respective driver logic” in the claims. Therefore, the rejection is proper and maintained.
Applicant argues “Claims 10 and 18 recite features substantially similar to those discussed above with respect to claim 1….As such, the Applicant believes that claims 10 and 18 are allowable over the art of record.” (Remarks, page 11, paragraph 4, line 1.) However, claim 1 remains rejected. Therefore, claims 10 and 18 remain rejected. The dependent claims remain rejected at least for depending from rejected base claims.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 7-8, 10-11, 18, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Lim et al. (Accelerating Training of DNN in Distributed Machine Learning System with Shared Memory, herein Lim) and Advanced Micro Devices, Inc. (AMD I/O Virtualization Technology (IOMMU) Architectural Specification, herein AMD).
Regarding claim 1,
Lim teaches a computer system comprising: one or more processors of a host device that store model parameters or training data for an artificial intelligence model (Lim, Figures 1 and 2, and page 1209, column 2, paragraph 4, line 1 “In this paper, we use remote shared memory to maintain parameters of distributed deep learning model. We modified the remote memory extension system we developed in our previous study [7,10]. Memory server connected via high-speed network exports its own main memory, which is accessible through low-latency and high-speed RDMA and shared across multiple worker nodes.”
[Lim, Figures 1 and 2, reproduced as images in the original Office action.]
Examiner notes that “Host device” is a broad phrase. Examiner is interpreting that a “Host device” is a computer that has some particular computer function, such as an application, processing power, or memory that can be shared by users or remote devices. In other words,
the memory server is the host device, the memory server has one or more processors, the parameters are the model parameters, and the distributed deep learning model is the artificial intelligence model.);
a plurality of artificial intelligence accelerators that process data for the artificial intelligence model using the model parameters or the training data (Lim, See above mapping. In other words, from Figures 1 and 2, workers are plurality of artificial intelligence accelerators that process data, parameters are model parameters, and, from Fig. 1, device driver is device driver configured to access host memory space on the memory server.),
each of the plurality of artificial intelligence accelerators being configured to access a different one of [multiple different dedicated host memory spaces] within the host device (Lim, See above mapping. In other words, from Figures 1 and 2, the workers are the plurality of artificial intelligence accelerators, and, from Fig. 1, the device driver is the device driver configured to access a host memory space within the memory server on the host device.);
a memory agent device [separate from the host device] that serves as a communication interface between the one or more processors of the host device and the plurality of artificial intelligence accelerators (Lim, Figures 2 and 3. Examiner notes the specification of the instant application recites “In such embodiments, the memory agent device may comprise a shared buffer and may be configured to cache the training data or the model parameters in the shared buffer when a first accelerator of the plurality of artificial intelligence accelerators accesses the training data or the model parameters until each of the plurality of artificial intelligence accelerators has accessed the training data or the model parameters.” Therefore, Examiner is interpreting the memory agent device is a shared buffer. From Figure 2, each worker (artificial intelligence accelerator) has an SMDevice that interfaces with the buffer.
[Lim figure reproduced as an image in the original Office action.]
In other words, the Buffer in the Shared Memory of the Memory Server is a memory agent device that interfaces with each of the artificial intelligence accelerators.), the memory agent device configured to:
provide a shared memory space to the plurality of artificial intelligence accelerators, the shared memory space being mapped, via a many-to-one mapping, to the [multiple different dedicated host memory spaces] of the plurality of artificial intelligence accelerators (Lim, Figure 1. In other words, the memory server is the host; from Figure 1, the shared memory region is the shared memory space, the workers are the plurality of artificial intelligence accelerators, and the shared memory region is mapped, via a many-to-one mapping, from the dedicated host memory spaces to the respective artificial intelligence accelerators. Examiner notes that the bracketed multiple different dedicated host memory spaces are mapped to AMD below.);
cache, at an address block of the shared memory space, a subset of the training data or the model parameters obtained from the host device (Lim, Figure 1, Figure 3, and abstract, line 10 “Our framework can accelerate training of DNN by speeding up the parameter sharing in every training iteration in distributed model training.” In other words, the parameters are a subset of the model parameters; from Figure 3, the buffer is the shared memory space, the Shmem table identifies an address block of the shared memory space, and the memory server is the host device.); and
return the subset of the training data or the model parameters from the address block of the shared memory space when responding to requests from the plurality of artificial intelligence accelerators that are directed to the [multiple different dedicated host memory spaces] within the host device (Lim, Figure 3. In other words, the read/write arrows from the buffer of the shared memory in the memory server return a subset of the training data or the model parameters from the address block of the shared memory space in response to requests from the plurality of artificial intelligence accelerators (workers). Examiner notes that the bracketed multiple different dedicated host memory spaces are mapped to AMD below.).
Thus far, Lim does not explicitly teach multiple different dedicated host memory spaces within the host device.
AMD teaches multiple different dedicated host memory spaces within the host device (AMD, page 18, paragraph 2, line 1 “In summary, the IOMMU is very similar to the processor's MMU, except that it provides address translation and page protection to memory accesses by peripheral devices rather than memory accesses by the processor and that it provides an interrupt remapping capability.” And, page 22, paragraph 6, line 1 “An IOMMU with two-level translation enforces system protection policies while allowing arbitrary I/O devices to be properly controlled by user-level processes in a virtualized system. As noted in Section 2.2.4 [User Mode Device Accesses], I/O devices whose memory accesses are translated by the IOMMU can only access pages that are explicitly mapped by the associated I/O page tables as granted by the hypervisor and operating system.” And, page 151, paragraph 4, line 1 “To avoid deadlocks, the IOMMU requires a dedicated virtual channel for its I/O page table walk requests.” In other words, address translation and page protection for memory accesses by peripheral devices provides multiple different dedicated host memory spaces within the host device.).
AMD also teaches a memory agent device [separate from the host device] (AMD, Figure 1,
[AMD, Figure 1, reproduced as an image in the original Office action.]
In other words, the IOMMU is a memory agent device separate from the host device.).
Both Lim and AMD are directed to memory accessed by external devices, among other things. Lim teaches a computer system comprising one or more processors of a host device that store model parameters or training data for an artificial intelligence model, a plurality of artificial intelligence accelerators that process data for the artificial intelligence model using the model parameters or the training data, each of the plurality of artificial intelligence accelerators being configured to access memory within the host device, and a memory agent device; but Lim does not explicitly teach multiple different dedicated host memory spaces, or a memory agent device that is separate from the host device. AMD teaches multiple different dedicated host memory spaces and a memory agent device that is separate from the host device.
In view of the teaching of Lim, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of AMD into Lim. This would result in a computer system comprising one or more processors of a host device that store model parameters or training data for an artificial intelligence model, a plurality of artificial intelligence accelerators configured to access a different one of multiple different dedicated host memory spaces, and a memory agent device that is separate from the host device.
One of ordinary skill in the art would be motivated to do this in order to efficiently translate addresses for dynamic memory and protect memory from disallowed actions. (AMD, page 13, paragraph 1, line 1 “The I/O Memory Management Unit (IOMMU) is a system function that translates addresses used in DMA transactions, protects memory from disallowed access by I/O devices, and remaps peripheral interrupts.”)
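For illustration only and not part of the record, the mapped-to combination can be sketched in a few lines of Python, with all names hypothetical: requests that the accelerators direct at their dedicated host memory spaces are resolved, many-to-one, onto a single shared memory space held by a memory agent.

    # Illustrative sketch only; all names are hypothetical and appear in
    # neither Lim, AMD, nor the claims.
    class MemoryAgent:
        """Sits between the accelerators and the host's dedicated memory spaces."""

        def __init__(self, dedicated_spaces):
            # Many-to-one map: every dedicated space id resolves to one shared space.
            self.shared_space = {}  # address block -> cached data
            self.space_map = {space_id: self.shared_space for space_id in dedicated_spaces}

        def cache(self, address_block, subset):
            # Cache a subset of the training data / model parameters from the host.
            self.shared_space[address_block] = subset

        def read(self, space_id, address_block):
            # A request aimed at a dedicated host memory space is served
            # from the single shared space instead.
            return self.space_map[space_id][address_block]

    agent = MemoryAgent(dedicated_spaces=["accel0", "accel1", "accel2"])
    agent.cache(0x1000, [0.01, 0.02])  # parameters fetched once from the host
    assert agent.read("accel0", 0x1000) == agent.read("accel1", 0x1000)

Because every entry of space_map references the same dictionary, both reads in the assertion hit the same cached block, mirroring the many-to-one mapping recited in claim 1.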
Regarding claim 2,
The combination of Lim and AMD teaches the computer system of claim 1 wherein the computer system further comprises
one or more communication links between accelerators of the plurality of artificial intelligence accelerators (Lim, page 1210, column 1, paragraph 4, line 3 “We modified TensorFlow, an open-source deep learning framework developed by Google. In TensorFlow, parameters are shared between workers through parameter servers in distributed training. TensorFlow uses gRPC (over TCP) protocol to move data between workers and parameter servers.” In other words, moving data between workers is one or more communication links between accelerators.), wherein the memory agent device is configured to:
initiate communication of at least a portion of the training data or at least a portion of the model parameters over the one or more communication links (Lim, Figure 3. See above mapping. In other words, the parameter server is the memory server; from the prior mapping, the Buffer within the Shared Memory of the Memory Server is the memory agent device; and sharing parameters between workers through parameter servers in distributed training is communication of at least a portion of the training data or the model parameters over the one or more communication links.).
Regarding claim 7,
The combination of Lim and AMD teaches the computer system of claim 1, wherein
the memory agent device stores a mapping of virtual page numbers used by the plurality of artificial intelligence accelerators to physical page numbers of one or more memory circuits of the memory agent device (Lim, Figure 3, Table 1, and page 1210, column 2, paragraph 3, line 1 “In order for multiple workers to share parameters stored in shared memory, mapping table is required to manage the memory mapping information of the parameters. Because workers should share the mapping table; the mapping table is maintained in the shared memory. Since shared memory is mapped to the virtual address space of a process, workers share and access parameters and mapping table through their respective process address spaces.” In other words, the Shmem table is a mapping of virtual page numbers to physical page numbers.).
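For illustration only (not part of the record), a minimal Python sketch of a virtual-to-physical page-number mapping of the kind the Shmem table maintains; the dictionary and all names are hypothetical stand-ins:

    # Illustrative sketch only; a plain dict stands in for the mapping table.
    class PageTable:
        def __init__(self):
            self.vpn_to_ppn = {}  # virtual page number -> physical page number
            self.next_ppn = 0

        def map(self, vpn):
            # Allocate a physical page the first time a virtual page is seen;
            # repeated look-ups return the same physical page.
            if vpn not in self.vpn_to_ppn:
                self.vpn_to_ppn[vpn] = self.next_ppn
                self.next_ppn += 1
            return self.vpn_to_ppn[vpn]

    table = PageTable()
    assert table.map(42) == table.map(42)  # duplicative look-ups resolve identically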
Regarding claim 8,
The combination of Lim and AMD teaches the computer system of claim 1, wherein
the memory agent device comprises a shared buffer and is configured to cache the subset of the training data or model parameters in the shared buffer when a first accelerator of the plurality of artificial intelligence accelerators accesses the training data or the model parameters (Lim, Figures 2 and 3. In other words, the memory server buffer is the memory agent device comprising a shared buffer; the model parameters and AI accelerators have been previously mapped; the shared memory caches the model parameters; and, from Figure 2, each of the workers (accelerators) accesses the shared memory to access a subset of the training data or the model parameters.).
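For illustration only (not part of the record; all names hypothetical), a minimal Python sketch of a shared buffer that caches a subset of the parameters on the first accelerator's access and serves every later access from the cache:

    # Illustrative sketch only.
    host_memory = {"params": [0.5, 0.25]}  # hypothetical host-side parameters
    shared_buffer = {}

    def access(key):
        if key not in shared_buffer:               # first accelerator's access
            shared_buffer[key] = host_memory[key]  # single fetch from the host
        return shared_buffer[key]                  # later accelerators hit the cache

    assert access("params") is access("params")  # one cached copy serves all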
Claim 10 is a method claim having the same relevant limitations as computer system claim 1. Otherwise, they are the same. The combination of Lim and AMD teaches a method. (Lim, page 1210, column 1, paragraph 3, line 1 “In this paper, we propose the distributed deep learning framework that shares parameters by using the remote shared memory described in chapter II.” In other words, framework that shares parameters by using the remote shared memory is a method.) Therefore, claim 10 is rejected for the same reasons as claim 1.
Claim 11 is a method claim depending from claim 10 corresponding to a subset of computer system claim 2. Otherwise, they are substantially the same. Therefore, claim 11 is rejected for the same reasons as claims 10 and 2.
Claim 18 is a non-transitory computer-readable storage medium claim having the same relevant limitations as computer system claim 1. Otherwise, they are the same. The combination of Lim and AMD teaches a non-transitory computer readable storage medium (AMD, page 13, paragraph 4, line 20 “Device Exclusion Vector (DEV). Contiguous arrays of bits in physical memory. Each bit in the DEV table represents a 4KB page of physical memory (including system memory and MMIO).” In other words, physical memory is a non-transitory computer readable storage medium.). Therefore, claim 18 is rejected for the same reasons as claim 1.
Regarding claim 21,
The combination of Lim and AMD teaches the computer system of claim 1 wherein the memory agent:
receives a plurality of addresses from the plurality of artificial intelligence accelerators (Lim, Figures 1 and 3, and Table 1,
[Lim figure reproduced as an image in the original Office action.]
In other words, worker is AI accelerator, and Shared memory region receives a plurality of addresses from the plurality of AI accelerators.);
stores the addresses as virtual page numbers (Lim, Table 1, “Allocate memory server’s centralized memory on the process’s virtual address space. Return shared memory identifier.” In other words, returning the shared memory identifier is storing the addresses as virtual page numbers.); and
maps the virtual page numbers to a physical page of one or more memory circuits (Lim Figure 3, and Table 1. See mapping of Claim 7, office action page 10. In other words, Shmem table is a mapping of virtual page numbers to physical page numbers.).
Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Lim, AMD, and Stumm et al (Algorithms Implementing Distributed Shared Memory, herein Stumm).
Regarding claim 4,
The combination of Lim and AMD teaches the computer system of claim 1 wherein
the one or more processors are configured to write to the shared memory space (Lim, Figure 3. See above mapping. In other words, the read/write arrows show that the one or more processors are configured to write to the shared memory space.) and
Thus far, the combination of Lim and AMD does not explicitly teach the plurality of artificial intelligence accelerators are not configured to write to the shared memory space.
Stumm teaches the plurality of artificial intelligence accelerators are not configured to write to the shared memory (Stumm, page 56, column 2, line 1 “Central-server algorithm. The simplest strategy for implementing distributed shared memory uses a central server that is responsible for servicing all accesses to shared data and maintains the only copy of the shared data. Both read and write operations involve the sending of a request message to the data server by the process executing the operation, as depicted in Figure 2.”
[Stumm, Figure 2, reproduced as an image in the original Office action.]
In other words, the central server being responsible for servicing all accesses means the plurality of AI accelerators are not configured to write to the shared memory.).
Both Stumm and the combination of Lim and AMD are directed to shared memory. The combination of Lim and AMD teaches a computer system with a shared memory space that is readable by a plurality of artificial intelligence accelerators, but does not explicitly teach that the artificial intelligence accelerators are not configured to write to the shared memory. Stumm teaches algorithms for implementing shared memory, including the central host controlling what is written to the shared memory.
In view of the teaching of the combination of Lim and AMD, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Stumm into the combination of Lim and AMD. This would result in a computer system with a shared memory space that is readable by a plurality of artificial intelligence accelerators, where the artificial intelligence accelerators are not configured to write to the shared memory.
One of ordinary skill in the art would be motivated to do this to ensure consistency of data, i.e., avoiding the readers-writers problem in which writes occur during reads. (Stumm, page 56, column 2, paragraph 3, line 1 “More formally, the result of applications using shared data should be the same as if the memory operations of all hosts were executing in some sequential order, and the operations of each individual host appear in sequence in the order specified by its program, in which case the shared memory is said to be consistent.”)
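For illustration only (not part of the record; all names hypothetical), a minimal Python sketch of Stumm's central-server strategy: every read and write is a request message to the server, so the clients (here, the accelerators) never write the shared data themselves:

    # Illustrative sketch only; the server holds the only copy of the data.
    class CentralServer:
        def __init__(self):
            self._data = {}

        def handle(self, op, key, value=None):
            # The server serializes all operations on the single copy,
            # so a reader can never observe a half-finished write.
            if op == "write":
                self._data[key] = value
            return self._data.get(key)

    server = CentralServer()
    server.handle("write", "weights", [1.0, 2.0])  # only the server mutates data
    print(server.handle("read", "weights"))        # accelerators only send requests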
Claim 13 is a method claim corresponding to the second limitation of computer system claim 4. Otherwise, they are substantially the same. Therefore, claim 13 is rejected for the same reasons as claim 4.
Claims 6, 14, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Lim, AMD, and Ma et al (A Hypervisor for Shared-Memory FPGA Platforms, herein Ma).
Regarding claim 6,
The combination of Lim and AMD teaches the computer system of claim 1 wherein
Thus far, the combination of Lim and AMD does not explicitly teach the memory agent device is a field-programmable gate array or an application-specific integrated circuit.
Ma teaches the memory agent device is a field-programmable gate array or an application-specific integrated circuit (Ma, abstract, paragraph 2, line 1 “This paper presents OPTIMUS, the first hypervisor that supports scalable shared-memory FPGA virtualization.” In other words, the shared-memory FPGA is a field-programmable gate array memory agent device.).
Both Ma and the combination of Lim and AMD are directed to shared memory systems. The combination of Lim and AMD teaches a computer system with one or more processors with a shared memory space that is readable by a plurality of artificial intelligence accelerators, but does not explicitly teach a memory agent device that is a field-programmable gate array or an application-specific integrated circuit. Ma teaches a memory agent device that is a field-programmable gate array or an application-specific integrated circuit.
In view of the teaching of the combination of Lim and AMD, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Ma into the combination of Lim and AMD. This would result in a computer system with one or more processors with a shared memory space that is readable by a plurality of artificial intelligence accelerators, where the memory agent device is a field-programmable gate array or an application-specific integrated circuit.
One of ordinary skill in the art would be motivated to do this because host-centric shared memory models incur high runtime overhead for workloads that exhibit pointer chasing. (Ma, abstract, line 7 “The host-centric model incurs high runtime overhead for workloads that exhibit pointer chasing. Thus, FPGAs are beginning to support a shared-memory programming model in which accelerators can issue DMAs.”)
Claim 14 is a method claim corresponding to computer system claim 6. Otherwise, they are substantially the same. Therefore, claim 14 is rejected for the same reasons as claim 6.
Claim 16 is a method claim depending from method claim 14 that has the same relevant limitations as computer system claim 8. Otherwise, they are substantially the same. Therefore, claim 16 is rejected for the same reasons as claims 14 and 8.
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Lim, AMD, Ma, and Archer et al. (Broadcasting Data in a Hybrid Computing Environment, US 2010/0191822 A1, herein Archer).
Regarding claim 17,
The combination of Lim, AMD, and Ma teaches the method of claim 16, wherein the memory agent device
Thus far, the combination of Lim, AMD, and Ma does not explicitly teach increments a counter when each of the plurality of artificial intelligence accelerators access the training data or the model parameters and resets the counter when it is equal to a number of accelerators in the plurality of artificial intelligence accelerators.
Archer teaches increments a counter when each of the plurality of artificial intelligence accelerators access the training data or the model parameters and resets the counter when it is equal to a number of accelerators in the plurality of artificial intelligence accelerators (Archer, FIG. 5, and, paragraph [0075], line 7 “In the method of FIG. 5, notifying (506) the host computer (110) that the accelerators (104) have read the data includes incrementing (510) remotely, by each accelerator (104) upon reading the data, a counter (512) residing in the shared local memory (159) of the host computer (110). The method of FIG. 5 also includes determining (516), by the host computer (110), whether the value of the counter (512) exceeds a predetermined threshold (514). If the value of the counter exceeds the predetermined threshold, the broadcast operation is complete (518). If the value of the counter does not exceed the predetermined threshold (514), the broadcast operation is incomplete (520) and the host computer (110) may returns to determining (516) whether the value of the counter exceeds the predetermined threshold. Incrementing (510) the counter (512) remotely by each accelerator (104) upon reading the data may be carried out by locking by each accelerator the counter…”
[Archer, FIG. 5, reproduced as an image in the original Office action.]
In other words, the counter is the claimed counter; incrementing the counter remotely by each accelerator upon reading the data is incrementing a counter when each of the plurality of AI accelerators accesses the training data; and restarting the counter for each broadcast is resetting the counter.).
Both Archer and the combination of Lim, AMD, and Ma are directed to shared memory systems. The combination of Lim, AMD, and Ma teaches a computer system with one or more processors of a host device with a shared memory space that is shared by a plurality of artificial intelligence accelerators; but does not explicitly teach incrementing a counter for each accelerator that accesses the training data or the model parameters for updates or retaining the data in shared memory until the accelerators are updated. Archer teaches a shared memory system where a counter is incremented for each accelerator that accesses the training data or shared parameters for updates and retains the data in shared memory until all the accelerators are updated.
In view of the teaching of the combination of Lim, AMD, and Ma, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Archer into the combination of Lim, AMD, and Ma. This would result in a computer system with one or more processors of a host device with a shared memory space that is shared by a plurality of artificial intelligence accelerators, and incrementing a counter for each accelerator that accesses the training data or the model parameters for updates or retaining the data in shared memory until the accelerators are updated.
One of ordinary skill in the art would be motivated to do this because hybrid environments are more powerful than non-hybrid computing environments but still present substantial challenges, such as sharing data in a distributed environment, and using a counter can help mitigate those challenges. (Archer, paragraph [0005], line 4 “Such computing environments are described in this specification as hybrid environments, denoting that such environments include host computers and accelerators having different architectures. Although hybrid computing environments are more computationally powerful and efficient in data processing than many non-hybrid computing environments, such hybrid computing environments still present substantial challenges to the science of automated computing machinery.”)
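For illustration only (not part of the record; all names hypothetical), a minimal Python sketch of Archer's counter: each accelerator increments it upon reading the broadcast data, and it resets once it equals the number of accelerators in the plurality:

    # Illustrative sketch only.
    NUM_ACCELERATORS = 4
    counter = 0

    def notify_read():
        global counter
        counter += 1                     # one increment per accelerator read
        if counter == NUM_ACCELERATORS:  # every accelerator has been updated
            counter = 0                  # reset for the next broadcast
            return True                  # broadcast complete; data may be evicted
        return False

    assert [notify_read() for _ in range(4)] == [False, False, False, True]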
Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Lim, AMD, and Cohen et al. (Theory of Multi Core Hypervisor Verification, herein Cohen).
Regarding claim 22,
The combination of Lim and AMD teaches the computer system of claim 21, wherein
Thus far, the combination of Lim and AMD does not explicitly teach the memory agent device further classifies addresses as one of shared or unique.
Cohen teaches the memory agent device further classifies addresses as one of shared or unique (Cohen, page 10, paragraph 3, line 3 “A slightly more sophisticated discipline is to classify each address as either shared or owned by a single processor.” In other words, classifying each address as either shared or owned by a single processor is classifying addresses as one of shared or unique.), and wherein when an address received from one of the plurality of artificial intelligence accelerators does not match an address stored as the virtual page numbers,
the memory agent device accesses a physical page of one or more memory circuits of the memory agent device that is not shared (Lim, Figure 3. And, page 1210, column 2, paragraph 2, line 1 “As shown in Figure 3, SMAllocator in SMDevice allocates and deallocates share memory for parameters from memory server. Newly added operation kernels for SMDevice read and write parameters from (to) allocated shared memory.” And, page 1210, column 1, paragraph 2, line 1 “If an application write data to allocated memory, it is applied to its local temporal memory region. Application can issue transferring data from temporal memory region to remote shared memory explicitly by using mb_wsync().” In other words, SMAllocator in SMDevice allocates memory on a physical page; otherwise, it transfers data to shared memory using mb_wsync().).
Both Cohen and the combination of Lim and AMD are directed to shared memory systems and model parallelism, among other things. The combination of Lim and AMD teaches a computer system with one or more processors with a shared memory space that is readable by a plurality of artificial intelligence accelerators and a memory agent device; but does not explicitly teach the memory agent device further classifies addresses as one of shared or unique. Cohen teaches the memory agent device further classifies addresses as one of shared or unique.
In view of the teaching of the combination of Lim and AMD, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Cohen into the combination of Lim and AMD. This would result in a computer system with one or more processors with a shared memory space that is readable by a plurality of artificial intelligence accelerators and a memory agent device, where the memory agent device further classifies addresses as one of shared or unique.
One of ordinary skill in the art would be motivated to do this because multi-core systems improve speed through parallelism, and verifying hypervisor correctness is important for confidence in the hypervisor execution. (Cohen, Abstract, line 1 “From 2007 to 2010, researchers from Microsoft and the Verisoft XT project verified code from Hyper-V, a multi-core x-64 hypervisor, using VCC, a verifier for concurrent C code. However, there is a significant gap between code verification of a kernel (such as a hypervisor) and a proof of correctness of a real system running the code. When the project ended in 2010, crucial and tricky portions of the hypervisor product were formally verified, but one was far from having an overall theory of multi core hypervisor correctness even on paper. For example, the kernel code itself has to set up low-level facilities such as its call stack and virtual memory map, and must continue to use memory in a way that justifies the memory model assumed by the compiler and verifier, even though these assumptions are not directly guaranteed by the hardware.”)
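For illustration only (not part of the record; all names hypothetical), a minimal Python sketch of the combined classification: an address found among the stored virtual page numbers is treated as shared, and any non-matching address resolves to a private, unshared physical page:

    # Illustrative sketch only.
    shared_vpns = {0x10: "shared-page-0", 0x11: "shared-page-1"}

    def resolve(address):
        # Classify the address as one of shared or unique.
        if address in shared_vpns:
            return "shared", shared_vpns[address]
        # No match among the stored virtual page numbers: serve a physical
        # page that is not shared with the other accelerators.
        return "unique", f"private-page-{address:#x}"

    print(resolve(0x10))  # ('shared', 'shared-page-0')
    print(resolve(0x99))  # ('unique', 'private-page-0x99')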
Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Lim, AMD, Cohen, and Ma.
Regarding claim 23,
The combination of Lim, AMD, and Cohen teaches the computer system of claim 22, wherein
Thus far, the combination of Lim, AMD, and Cohen does not explicitly teach the one or more processors are part of a host system.
Ma teaches the one or more processors are part of a host system (Ma, page 827, column 2, paragraph 4, line 1 “The key difference between host-centric and shared-memory FPGA programming models is whether or not accelerators can issue direct memory accesses (DMAs, via which an I/O device obtains data from system memory). In host-centric models, the host issues all DMAs via a CPU-configured DMA engine, which passes the accessed data to the necessary accelerator; the accelerators themselves cannot issue DMAs.” In other words, a host-centric system is the one or more processors being part of a host system.), and wherein, in response to receiving a first address stored as a first virtual page number in the memory agent device from one of the plurality of artificial intelligence accelerators,
Ma teaches the memory agent device retrieves data from the host system and stores the data in the one or more memory circuits (Ma, See above mapping. And, page 827, column 2, paragraph 3, line 4 “Although multi-tenant FPGA hypervisors and operating systems exist [15, 18, 21, 37, 40, 53, 55, 72-74, 86], these solutions are restricted to FPGA platforms that expose a host-centric programming model, as opposed to a shared-memory model.” In other words, the host issuing all DMAs means the memory agent device retrieves data from the host system and stores the data in host memory, which is one or more memory circuits.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Ma into the combination of Lim, AMD, and Cohen, at least for the reasons used to combine Ma into the combination of Lim and AMD described in the mapping of claim 6.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached at 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/B.I.R./Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124