DETAILED ACTION
Re Application No. 18/915697, this action responds to the amended claims dated 03/09/2026.
At this point, claims 1-20 have been cancelled. Claims 34 and 36 have been amended. Claims 21-40 are pending.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
Examiner notes Applicant’s amended claims dated 03/09/2026. In view of the amended claims, Examiner’s prior rejections under 35 USC § 112(b) have been rendered moot, and are accordingly withdrawn.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 21-24 and 34-39 are rejected under 35 U.S.C. 103 as being unpatentable over Hendry et al (US 2011/0252200 A1) in view of Dixon et al (US 2014/0281196 A1), further in view of Banerjee et al (US 2006/0206686).
Re claim 21, Hendry discloses the following:
An apparatus comprising […] a central processing unit (CPU) […] comprising: a plurality of CPU cores to execute instructions […] a CPU cache associated with one or more of the CPU cores (Fig. 3, CPU 34, CPU cores 46, caches 48 and 50). The apparatus comprises a CPU with multiple CPU cores and respective CPU caches, as well as a shared CPU cache;
a graphics processing unit (GPU) […] comprising: a plurality of GPU cores to execute instructions […] and a GPU cache associated with one or more of the GPU cores (Fig. 3, GPU 36, GPU cores 56, caches 58 and 60). The GPU contains a plurality of GPU cores to execute instructions, each GPU core having a respective GPU cache, as well as a shared GPU cache;
translation circuitry to manage virtual-to-physical address mappings […] to access a memory (Figs. 3-4; ¶ 32-33 and 37). The cache coherence components may be part of the MMU/TLB (¶ 33 and 37). The MMU/TLB/coherence components store a plurality of page entries (virtual-to-physical mappings) (¶ 37) which allow for translating from virtual addresses to physical addresses, as well as determining ownership of addresses by components (¶ 32). Since the MMU/TLB/coherence components contain entries for pages, these entries can be considered “page table structures”;
at least a portion of the virtual-to-physical address mappings comprising shared virtual memory addresses to be shared by the CPU cores and the GPU cores to access the memory at corresponding physical memory addresses; and (¶ 32). In addition to the caches, the virtual-to-physical translation may also be used for accessing shared memory;
a system cache coupled to the CPU cores and the GPU cores, the system cache to store cache line data corresponding to the shared virtual memory addresses (Fig. 3, shared memory 42; ¶ 26-27 and 32). The system comprises a shared memory (shared cache) which stores data corresponding to shared virtual memory addresses (Fig. 3; ¶ 32). While the shared memory is not explicitly referred to as a “cache”, it is used to “temporarily” store instructions; moreover, it may be used to store data coherently with more persistent storage such as nonvolatile storage (¶ 26-27). Accordingly, the shared memory may be considered a “shared cache”, and the data stored therein may be considered “cache line data”;
wherein both the CPU cores and the GPU cores are to be provided access to the cache line data and wherein a coherency control structure is to be accessed to maintain the cache line data in a coherent state (¶ 32). Both the CPU and GPU cores have access to the shared memory (shared cache), and coherency is maintained through the MMU/TLB/cache coherence components (coherency control structure).
Hendry does not explicitly disclose the particular claimed structure of chips, packages, and instruction sets. Furthermore, while Hendry discloses a shared memory, which can broadly be considered a “shared cache”, it does not explicitly refer to it as such. For both these reasons, in the interest of furthering compact prosecution, Examiner has provided Dixon.
Dixon discloses the following:
a package device comprising a plurality of integrated circuit (IC) dies, the plurality of IC dies including (¶ 41). The system may implement an integrated circuit including a package comprising multiple chips, including the CPU and GPU;
a central processing unit (CPU) die comprising: a plurality of CPU cores to execute instructions of a first instruction set architecture; and […] a graphics processing unit (GPU) die comprising: a plurality of GPU cores to execute instructions of a second instruction set architecture; and (¶ 26-28). The processor may be implemented as any number of logical processors, including CPUs and GPUs (¶ 26). Each of these may include a plurality of cores (¶ 27). The various CPUs/GPUs/cores may share the same instruction set architecture, or may utilize different instruction sets (¶ 28);
a system cache coupled to the CPU cores and the GPU cores, the system cache to store cache line data corresponding to the shared virtual memory addresses, wherein both the CPU cores and the GPU cores are to be provided access to the cache line data and wherein a coherency control structure (Fig. 11, shared cache units 1106; ¶ 89). The shared cache units are shared across all the processor cores, which includes the GPU and CPU cores (Fig. 11). The caches maintain coherency to shared data (¶ 85).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA) to implement the multiprocessor system of Hendry using ICs with packages and chips, as in Dixon, because it would be applying a known technique to a known apparatus ready for improvement, to yield predictable results. Hendry discloses a multiprocessor system, which is ready to be implemented in ICs, packages, and chips. Dixon discloses implementing a multiprocessor system using ICs, packages, and chips, which is applicable to Hendry. It would have been obvious to integrate the multiprocessor of Hendry using ICs, packages, and chips, because it would yield the predictable result of utilizing known manufacturing techniques to implement the multiprocessor system. Additionally, it would have been obvious to modify the shared memory of Hendry to include a shared cache, as in Dixon, because it would be improving a similar apparatus in the same way. Hendry discloses a shared memory. Dixon also discloses a shared memory, which has been improved in a similar way to the claimed invention, to utilize caching. It would have been obvious to implement the shared cache of Dixon into the shared memory of Hendry, because it would yield the predictable improvement of speeding up access to shared memory.
Hendry (combined with Dixon) discloses an MMU/TLB which manages virtual-to-physical translations for memory locations; while it is well known in the art that a TLB typically works in conjunction with a page table, it is not explicitly disclosed whether the apparatus of Hendry (combined with Dixon) contains such a page table. Accordingly, in the interest of furthering compact prosecution, Examiner has provided Banerjee.
Banerjee discloses translation circuitry to manage virtual-to-physical address mappings stored in page table structures to access a memory (¶ 5-6 and 88). The memory apparatus contains a page table which works in conjunction with the TLB, wherein the page table stores entries including virtual-to-physical translations for memory locations.
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify the TLB of Hendry (combined with Dixon) to work with a page table, as in Banerjee, because it would be applying a known technique to a known apparatus ready for improvement in order to yield predictable results. Hendry (combined with Dixon) discloses memory translation which is managed by an MMU/TLB, which is ready for the improvement of using a page table. Banerjee discloses utilizing a page table in conjunction with a TLB, which is applicable to the TLB of Hendry (combined with Dixon). It would have been obvious to combine the page table of Banerjee with the TLB of Hendry (combined with Dixon), because it would yield the predictable result of allowing memory requests directed at locations outside the TLB to nonetheless be serviced, albeit at a higher cost.
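For purely illustrative purposes (this sketch forms no part of the prior-art record, and all identifiers are hypothetical), the TLB-with-page-table arrangement described above — a small cache of recent translations that falls back to a page-table walk on a miss — can be sketched as follows:

```python
# Hypothetical sketch of a TLB backed by a page table: a hit uses the
# cached translation; a miss walks the full page table and caches the entry.
PAGE_SIZE = 4096

class AddressTranslator:
    def __init__(self, page_table, tlb_capacity=4):
        self.page_table = page_table      # full virtual-page -> frame mapping
        self.tlb = {}                     # small cache of recent translations
        self.tlb_capacity = tlb_capacity

    def translate(self, vaddr):
        page, offset = divmod(vaddr, PAGE_SIZE)
        if page in self.tlb:              # TLB hit: fast path
            return self.tlb[page] * PAGE_SIZE + offset
        # TLB miss: walk the page table (higher cost), then cache the entry
        frame = self.page_table[page]
        if len(self.tlb) >= self.tlb_capacity:
            self.tlb.pop(next(iter(self.tlb)))   # evict an older entry
        self.tlb[page] = frame
        return frame * PAGE_SIZE + offset
```

The sketch reflects the rationale stated above: a request outside the TLB is still serviced via the page table, just more slowly.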
Re claim 22, Hendry, Dixon, and Banerjee disclose the apparatus of claim 21, and Hendry further discloses that the coherency control structure comprises a first coherency control structure, the apparatus further comprising: a second coherency control structure (Fig. 3, MMU/TLB 52 and 62, cache coherence components 54 and 64). The multiprocessor system comprises respective MMU/TLB/cache coherence components (collectively first and second coherency control structures).
Re claim 23, Hendry, Dixon, and Banerjee disclose the apparatus of claim 22, and Hendry further discloses that the first coherency control structure is to be implemented in circuitry of the GPU (Fig. 3). The MMU/TLB 62 and cache coherence component 64 (collectively the first coherency control structure) are implemented in the circuitry of the GPU.
Dixon discloses a GPU die (¶ 41).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA) to combine Hendry, Dixon, and Banerjee, for the reasons noted in claim 21 above.
Re claim 24, Hendry, Dixon, and Banerjee disclose the apparatus of claim 23, and Hendry further discloses that the first and second coherency control structures are to store tracking data to be dynamically updated responsive to accesses to the cache line data by the GPU cores or the CPU cores, respectively (Fig. 4; ¶ 36-38). The MMU/TLB/cache coherence components (first and second coherency control structures) include page tables listing page addresses owned by respective CPUs/GPUs; these track ownership, and are dynamically updated when a CPU/GPU accesses memory and takes ownership of a piece of data (cache line data).
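For purely illustrative purposes (this sketch forms no part of the prior-art record, and all identifiers are hypothetical), the dynamically updated ownership tracking described above — per-processor coherency control structures that record which unit owns a page, updated on each access — can be sketched as follows:

```python
# Hypothetical sketch of per-processor coherency control structures whose
# tracking data (owned pages) is updated dynamically when a unit takes
# ownership of data, revoking ownership from its peers.
class CoherencyControl:
    def __init__(self, owner_name):
        self.owner_name = owner_name
        self.owned_pages = set()          # tracking data: pages this unit owns

    def record_access(self, page, peers):
        # Taking ownership of a page revokes it from the peer structures.
        for peer in peers:
            peer.owned_pages.discard(page)
        self.owned_pages.add(page)
```

On this model, a GPU access updates the first structure and a CPU access updates the second, consistent with the claim mapping above.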
Re claims 34-35, Hendry, Dixon, and Banerjee disclose the apparatuses of claims 21 and 23 above, respectively; accordingly, they also disclose methods implemented by those apparatuses, as in claims 34-35, respectively (see Hendry, abstract). Furthermore, it is noted that while claim 34 largely mirrors claim 21, it is slightly broader, in that it only requires “one or more” CPU cores and GPU cores, rather than “a plurality”. Nevertheless, Hendry discloses a plurality of CPU and GPU cores, which encompasses “one or more” as well.
Re claim 36, Hendry, Dixon, and Banerjee disclose the method of claim 35, and Hendry further discloses storing tracking data to be dynamically updated responsive to accesses to the cache line data by the GPU cores in the first coherency control structure and storing tracking data to be dynamically updated responsive to accesses to the cache line data by the one or more CPU cores in a second coherency control structure (Fig. 4; ¶ 36-38). The MMU/TLB/cache coherence components (first and second coherency control structures) include page tables listing page addresses owned by respective CPUs/GPUs; these track ownership, and are dynamically updated in the respective structures when a CPU/GPU accesses memory and takes ownership of a piece of data (cache line data).
Re claims 37-39, Hendry, Dixon, and Banerjee disclose the apparatuses of claims 21 and 23-24 above, respectively; accordingly, they also disclose systems implementing similar functionality, as in claims 37-39, respectively (see Hendry, abstract).
Claims 25-33 and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Hendry in view of Dixon, further in view of Banerjee, and further in view of Ramanathan et al (US 2007/0117348 A1).
Re claim 25, Hendry, Dixon, and Banerjee disclose the apparatus of claim 24, but do not specifically disclose stacked 3D IC dies.
Ramanathan discloses that the plurality of IC dies include stacked 3D IC dies (¶ 2 and 28). The integrated circuit comprises a plurality of 3D stacked IC dies, including stacking components such as logical circuitry (CPUs, graphics processors) and memory.
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify the multiprocessor and memory components of Hendry (combined with Dixon and Banerjee) into a 3D stack of dies, as in Ramanathan, because Ramanathan suggests that 3D stacking brings improvements such as improved form factors, lower costs, enhanced performance, and greater integration through SOC solutions (¶ 2).
Re claim 26, Hendry, Dixon, Banerjee, and Ramanathan disclose the apparatus of claim 25, and Ramanathan further discloses that the memory comprises a high-bandwidth memory (¶ 2). The 3D stack of dies includes high-bandwidth memory chips.
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify the multiprocessor and memory components of Hendry (combined with Dixon and Banerjee) to utilize a high-bandwidth memory, as in Ramanathan, because Ramanathan suggests that 3D stacking enables higher bandwidth memory, which would in turn improve performance (¶ 2).
Re claim 27, Hendry, Dixon, Banerjee, and Ramanathan disclose the apparatus of claim 26, and Hendry further discloses that the translation circuitry further comprises one or more translation lookaside buffers (TLBs) to cache at least a portion of the virtual-to-physical address mappings (¶ 37). The MMU/TLB/cache coherence component (translation circuitry) includes a TLB component, which stores a set of page translations.
Re claim 28, Hendry, Dixon, Banerjee, and Ramanathan disclose the apparatus of claim 27, and Banerjee further discloses that the translation circuitry further comprises page table walker circuitry to perform a page walk through the page table structures to determine a virtual-to-physical address translation in response to a TLB miss (¶ 5-6 and 88). In conjunction with the TLB, there is a page table walker which can walk a page table to find a missing virtual-to-physical translation when there is a TLB miss.
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA) to combine Hendry, Dixon, Banerjee, and Ramanathan, for the reasons noted in claim 21 above.
Re claim 29, Hendry, Dixon, Banerjee, and Ramanathan disclose the apparatus of claim 28, and Hendry further discloses that the one or more TLBs comprise a first TLB associated with at least one CPU core and a second TLB associated with at least one GPU core (Fig. 3, MMU/TLB 52 and 62). Each CPU/GPU has respective MMU/TLB circuitry associated with it.
Re claim 30, Hendry, Dixon, Banerjee, and Ramanathan disclose the apparatus of claim 29, and Hendry further discloses that the translation circuitry is integral to the GPU […] and at least one CPU […] (Fig. 3). The MMU/TLB/cache coherence components are integrated into the respective CPU and GPU.
Dixon discloses a CPU die [and a ] GPU die (¶ 41).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention (AIA) to combine Hendry, Dixon, Banerjee, and Ramanathan, for the reasons noted in claim 21 above.
Re claim 31, Hendry, Dixon, Banerjee, and Ramanathan disclose the apparatus of claim 30, and Hendry further discloses that the tracking data comprises ownership data (Fig. 4). The cache coherence component includes ownership data.
Re claim 32, Hendry, Dixon, Banerjee, and Ramanathan disclose the apparatus of claim 31, and Hendry further discloses that the one or more TLBs comprise a first TLB associated with at least one CPU core and a second TLB associated with at least one GPU core (Fig. 3, MMU/TLB 52 and 62). The one or more TLBs include MMU/TLBs in each of the GPU and CPU cores.
Re claim 33, Hendry, Dixon, Banerjee, and Ramanathan disclose the apparatus of claim 32, and Hendry further discloses that the coherency tracking data is to be stored in the one or more TLBs (Figs. 3-4; ¶ 33). The coherency tracking data of the cache coherence components (Fig. 4) may be part of the MMU/TLB (¶ 33).
Re claim 40, Hendry, Dixon, Banerjee, and Ramanathan disclose the apparatus of claim 25; accordingly, they also disclose a system implementing similar functionality, as in claim 40 (see Hendry, abstract).
ACKNOWLEDGEMENT OF ISSUES RAISED BY THE APPLICANT
Response to Amendment
Applicant’s arguments with respect to claims 21-40 filed on 03/09/2026 have been fully considered.
As required by M.P.E.P. § 707.07(f), a response to these arguments appears below.
ARGUMENTS CONCERNING PRIOR ART REJECTIONS
Claims must be given the broadest reasonable interpretation during examination and limitations appearing in the specification but not recited in the claim are not read into the claim (See M.P.E.P. 2111 [R-1]).
Re claims 21 and 34, Applicant argues that Hendry, Dixon, and Banerjee do not disclose separate CPU and GPU dies within a single package. In response, Applicant’s argument has been fully considered, but is not deemed persuasive. Dixon discloses that “[p]rocessor cores may be implemented in different ways, for different purposes, and in different processors. For instance, implementations of such cores may include: 1) a general purpose in-order core intended for general-purpose computing; 2) a high performance general purpose out-of-order core intended for general-purpose computing; 3) a special purpose core intended primarily for graphics and/or scientific (throughput) computing. Implementations of different processors may include: 1) a CPU including one or more general purpose in-order cores intended for general-purpose computing and/or one or more general purpose out-of-order cores intended for general-purpose computing; and 2) a coprocessor including one or more special purpose cores intended primarily for graphics and/or scientific (throughput). Such different processors lead to different computer system architectures, which may include: 1) the coprocessor on a separate chip from the CPU; 2) the coprocessor on a separate die in the same package as a CPU; 3) the coprocessor on the same die as a CPU (in which case, such a coprocessor is sometimes referred to as special purpose logic, such as integrated graphics and/or scientific (throughput) logic, or as special purpose cores); and 4) a system on a chip that may include on the same die the described CPU (sometimes referred to as the application core(s) or application processor(s)), the above described coprocessor, and additional functionality. Exemplary core architectures are described next, followed by descriptions of exemplary processors and computer architectures” [emphasis added] (¶ 72).
Accordingly, not only does Dixon explicitly describe a GPU and a CPU being on separate dies in the same package, but it also suggests that a variety of different combinations of CPUs and coprocessors (GPUs) could be used for different purposes.
Re claims 22-33 and 35-40, Applicant argues that the claims are allowable by virtue of their dependence upon one of claims 21 and 34 above. Accordingly, Applicant is directed to Examiner’s comments regarding claims 21 and 34 above.
All arguments by the Applicant are believed to be covered in the body of the Office action; thus, this action constitutes a complete response to the issues raised in the remarks dated 03/09/2026.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Rao et al (US 2015/0206277 A1), which discloses a unified memory architecture applied to a GPU and CPU (¶ 88), a feature that was claimed in parent application 18/531432 but which has not yet been claimed in the instant application.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Per the instant Office action, claims 21-40 have received an action on the merits and are subject to a final rejection.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CRAIG S GOLDSCHMIDT whose telephone number is (571)270-3489. The examiner can normally be reached M-F 10-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam can be reached on 571-272-3978. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CRAIG S GOLDSCHMIDT/Primary Examiner, Art Unit 2132