Last updated: April 19, 2026

Application No. 17/379,121

PERFORMING GLOBAL MEMORY ATOMICS IN A PRIVATE CACHE OF A SUB-CORE OF A GRAPHICS PROCESSING UNIT

Final Rejection §103

Filed: Jul 19, 2021
Examiner: BROWN, SHEREE N
Art Unit: 2612
Tech Center: 2600 (Communications)
Assignee: Intel Corporation
OA Round: 4 (Final)

Interview Optional

+27.0% interview lift. This examiner has a relatively high allow rate; a written response may suffice.

Based on 738 resolved cases, 2023–2026

Examiner Intelligence

BROWN, SHEREE N

Grants 65% — above average

Career Allow Rate

481 granted / 738 resolved

+3.2% vs TC avg

Strong +27% interview lift

Interview Lift: +27.0% (chart comparing resolved cases without and with an interview)

Typical timeline: 3y 7m average prosecution; 34 currently pending

Career history: 772 total applications across all art units

Statute-Specific Performance

§101: 14.3% (-25.7% vs TC avg)
§103: 25.0% (-15.0% vs TC avg)
§102: 32.7% (-7.3% vs TC avg)
§112: 22.0% (-18.0% vs TC avg)

Tech Center averages are estimates. Based on career data from 738 resolved cases.

Office Action

§103

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Application Status
This office action is responsive to the amendments filed on 10/01/2025.  
This action has been made FINAL.  
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 10/02/2025 is being considered by the examiner.  A signed IDS is hereby attached.
Response to Arguments
Applicant's arguments filed 10/01/2025 have been fully considered but they are not persuasive. 
Applicant alleged the following on page 6 of the remarks, “As explained earlier during prosecution of the present application, examples of the limitations of current atomic operations are described in the background and in the beginning of the detailed description. For example, as noted in the background, at present, graphics processing unit (GPU) application programming interfaces (APIs) and hardware support two types of memory atomic operations (i.e., global atomic operations and local memory atomic operations). See Specification at [0003] and [0039]. Global atomic operations are performed in the last-level cache or the memory controller and local atomic operations are performed in a shared local memory (SLM) of a sub-core of the GPU. See Specification at [0003] and [0039]. The methods and systems described in the above-captioned patent application, enable what is referred to as "L1 atomics" - the performance of global memory atomics in a private cache (e.g., an L1 cache) of a sub-core of a GPU. See Specification at [0039]. L1 atomics are thought to mitigate various inefficiencies of local (or SLM atomics) and L3 atomics, including the complexities associated with management of the SLM, higher latency, and lower bandwidth. See, e.g., Specification at [0039]”. The examiner is not persuaded. In response to applicant's argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., those described in Paragraphs 0003 and 0039 of the specification) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). Accordingly, the rejection is maintained.
Applicant alleged the following, “Addressing the claim amendments proposed herein (i.e., the second wherein clause of the receive limitations - "wherein a shared local memory (SLM) of the load/store pipeline and the primary data cache each represent a partition of a common random access memory") first, the undersigned respectfully noted Li's L1 cache is not taught or reasonably suggested to represent one of at least two partitions of a common random access memory in which a shared local memory represents another partition within the common random access memory.” The examiner is not persuaded. The examiner maintains that the combination of Ray and Li discloses the Applicant’s claim language. Moreover, Ray’s teachings in Paragraphs 0004 and 0165 explicitly disclose the Applicant’s claim language of “wherein a shared local memory (SLM) of the load/store pipeline”. Additionally, Ray discloses “the primary data cache” in Paragraph 0054 and goes on to disclose the Applicant’s claim limitation of “each represent a partition of a common random access memory” in Paragraphs 0053-0054. Accordingly, the rejection is maintained.
The Applicant alleges the following: “Notably, while the above-quoted portion of Ray does use the word "partition," there is no teaching or reasonable suggestion that an SLM and a primary data cache representing a private L1 cache of a sub-core of a GPU should "each represent a partition of a common random access memory" as recited. Rather, this portion of Ray discusses with reference to FIG. 2 of Ray partition units 220A-N of a memory interface 218 and the potential one-to-one correspondence of the partition units 220A-N to memory units 224A-N of a parallel processing memory 222. Nowhere in the relied upon portions of Ray is there any specific reference to either an SLM or a primary data cache, let alone any teaching regarding an SLM and a primary data cache each representing a partition of a common random access memory. As the Examiner should appreciate, the mere use of the word "partition" is clearly insufficient to meet the limitations at issue. For at least this reason, independent claim 1 (as amended) and its dependent claims, which add further limitations are thought to be clearly distinguishable over the Examiner's proposed combination of Ray and Li.” The examiner is not persuaded. The examiner maintains that the combination of Ray and Li discloses the Applicant’s claim language. Moreover, Ray’s teachings in Paragraphs 0004 and 0165 explicitly disclose the Applicant’s claim language of “wherein a shared local memory (SLM) of the load/store pipeline”. In addition, Ray discloses “the primary data cache” in Paragraph 0054 and goes on to disclose the Applicant’s claim limitation of “each represent a partition of a common random-access memory” in Paragraphs 0053-0054. Additionally, the Applicant provides no special definition for the term “represents”, and as such the term may be broadly interpreted by the examiner. Moreover, the examiner notes that the term “should” is not recited in the claims. The examiner suggests providing clearer and more concise claim language.
Because "applicants may amend claims to narrow their scope, a broad construction during prosecution creates no unfairness to the applicant or patentee." In re ICON Health and Fitness, Inc., 496 F.3d 1374, 1379 (Fed. Cir. 2007) (citing In re Am. Acad. of Sci. Tech Ctr., 367 F.3d 1359, 1364 (Fed. Cir. 2004)).  Accordingly, the rejection is maintained.
The Applicant alleges the following on page 10 of the remarks: “The undersigned finds no teachings in either of Ray or Li purporting to achieve this result of performing a global memory atomic operation within a private L1 cache of a sub-core of a GPU, let alone associating a local scope with a global memory atomic operation. For at least these reasons, independent claim 1 (as amended) and its dependent claims, which add further limitations, are thought to be clearly distinguishable over the Examiner's proposed combination of Ray and Li”. The examiner is not persuaded. The examiner maintains that the combination of Ray and Li discloses the Applicant’s claim language. Moreover, Li’s teachings in Column 95, Lines 4-20 disclose the Applicant’s claim language of “private level-1 (L1) cache.” Moreover, Ray discloses the Applicant’s claim limitation of “wherein the atomic operation comprises a global memory atomic operation with local scope” in Paragraphs 0064, 0148, and 0165. Paragraph 0148 recites “Embodiments provide for a novel technique for merging multiple atomic operations to memory into a single atomic operation. In one embodiment, this novel and innovative technique may be implemented in hardware and used to detect such cases and merge any multiple same-address atomic operations in to a single atomic operation. Thus, an atomic SIMD16 operation may be completed in a single cycle, resulting in significantly speeding up these applications. This idea can be applied to both SLM and global memory systems”. More specifically, Ray explicitly states “… can be applied to both SLM and global memory systems” in Paragraph 0148.
Accordingly, the rejection is maintained.
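For illustration only, the merging idea quoted above from Ray Paragraph 0148 can be sketched in Python. This is a minimal sketch under stated assumptions, not Ray's hardware design: the helper names `merge_same_address_adds` and `apply_adds`, the modeling of SIMD lanes as (address, operand) pairs, and the restriction to atomic add are all hypothetical choices made for the example.

```python
from collections import OrderedDict


def merge_same_address_adds(lanes):
    """Merge per-lane atomic adds that target the same address.

    lanes: list of (address, operand) pairs, one per SIMD lane (assumed model).
    Returns one (address, combined_operand) pair per distinct address,
    in first-seen order.
    """
    merged = OrderedDict()
    for address, operand in lanes:
        merged[address] = merged.get(address, 0) + operand
    return list(merged.items())


def apply_adds(memory, ops):
    """Apply each (address, operand) add to a dict-based memory model."""
    for address, operand in ops:
        memory[address] = memory.get(address, 0) + operand
    return memory


# Hypothetical SIMD16 batch in which every lane increments the same counter.
simd16 = [(0x100, 1)] * 16
merged = merge_same_address_adds(simd16)

# A mixed batch: two addresses, several lanes each.
mixed = [(0x100, 1), (0x200, 4), (0x100, 2), (0x200, 1)]
merged_mixed = merge_same_address_adds(mixed)
```

In this model the sixteen same-address adds collapse into a single add of 16, and the final memory contents match those produced by applying the lanes one at a time. The sketch does not model the per-lane return values that hardware atomics usually provide.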
Applicant alleged the following, “The undersigned finds no reference to a "parameter" in Ray that indicates a global memory atomic instruction has a local scope in the portions of Ray relied upon by the Examiner or elsewhere in Ray.” The examiner is not persuaded. Ray’s teachings of “state parameters and commands defining how the data is to be processed” in Paragraph 0052 disclose the Applicant’s claim language of “parameters”. Moreover, Ray’s teachings in Paragraphs 0064, 0148, and 0165 disclose the Applicant’s claim language. Additionally, Ray’s teachings of “in one embodiment, the following algorithm may be implemented in one or more of SLM and data port controllers for handling local memory and global memory, respectively. For example, in one implementation, certain multiplexers may be needed to replace the atomic opcode and atomic operand per slot” in Paragraph 0165 explicitly disclose the Applicant’s claim limitation. Accordingly, the rejection is maintained.
Applicant alleged the following, “Regarding dependent claim 19, the Examiner relied on ¶¶ [0047], [0054], [0057], and [0252]-[0253] of Ray. The undersigned acknowledges a compiler or "compilation" are mentioned in various portions of Ray; however, the undersigned finds no teaching or reasonable suggestion that the recited "parameter" should be "set by a compiler based on a determination made by the compiler regarding relative efficiencies of performing the global memory atomic instruction in a last-level cache versus the primary data cache." For at least this additional reason dependent claim 19 is further distinguishable over the Examiner's proposed combination of Ray and Li.” The examiner is not persuaded. The Applicant is rehashing arguments already addressed above. Nevertheless, the examiner asserts that Ray’s teachings of “state parameters and commands defining how the data is to be processed” in Paragraph 0052 disclose the Applicant’s claim language. Moreover, Ray’s teachings in Paragraphs 0064, 0148, and 0165 disclose the Applicant’s claim language. Additionally, Ray’s teachings of “in one embodiment, the following algorithm may be implemented in one or more of SLM and data port controllers for handling local memory and global memory, respectively. For example, in one implementation, certain multiplexers may be needed to replace the atomic opcode and atomic operand per slot” in Paragraph 0165 explicitly disclose the Applicant’s claim limitation. Accordingly, the rejection is maintained.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 2, 4, 6-8, 10, 12-14, 16 and 18-23 are rejected under 35 U.S.C. 103 as being unpatentable over Ray, US Patent Application No.: 20180300846 in view of Li, US Patent No.: 12,072,954.
Claim 1:
Ray discloses a graphics processing unit (GPU) (See Abstract & Paragraphs 0034; 0041; 0068-0071) [1]. Ray fails to explicitly disclose a private level-1 (L1) cache. However, Li discloses this feature in Column 95, Lines 4-20. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have further modified Ray by the teachings of Li to enable improved maintaining of GPUs by more effectively incorporating a private level-1 cache (See Li Abstract & Column 95, Lines 4-20). In addition, both references, Ray and Li, teach features that are directed to analogous art and to the same field of endeavor, namely graphics processing units (GPUs). This close relation between the references strongly suggests an expectation of success.
As modified: 
The combination of Ray and Li discloses the following:
a plurality of sub-cores each including a load/store pipeline operable to (See Ray Paragraph 0202) [2]:
receive information specifying an atomic operation to be performed within a primary data cache (See Ray Paragraphs 0054; 0057) of the load/store pipeline (See Ray Figures 9A; 9C; Paragraphs 0015; 0017; 0033-0034), wherein the primary data cache represents a private level-1 (L1) cache (“a level one (“L1”) cache wherein L1 cache is private memory” See Li Column 95, Lines 4-20) of a sub-core of the plurality of sub-cores that includes the load/store pipeline (See Ray Figures 9A; 9C; Paragraphs 0015; 0017; 0033-0034), wherein a shared local memory (SLM) of the load/store pipeline (See Ray Paragraphs 0004; 0165) and the primary data cache (See Ray Paragraph 0054) each represent a partition of a common random access memory (See Ray Paragraphs 0053-0054), and wherein the atomic operation comprises a global memory atomic operation with local scope (See Ray Paragraphs 0064; 0148; 0165);
read data to be modified (“updates to the cache 438 related to modifications” See Ray Paragraphs 0081; 0088; 0151) by the atomic operation (See Ray Paragraph 0151) into the primary data cache from a memory hierarchy (See Ray Paragraph 0081; 0084; 0178) shared by the plurality of sub-cores (See Ray Paragraph 0202); 
and produce an atomic result of the atomic operation by modifying the data within the primary data cache (“updates to the cache 438 related to modifications” See Ray Paragraphs 0081; 0088; 0151) based on the atomic operation (See Ray Paragraph 0151).
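As a reading aid, the three steps recited in claim 1 (receive the atomic operation, read the data into the private primary data cache, modify it there) can be modeled in a few lines of Python. This is a minimal sketch of the claim language only; it is not the applicant's implementation and not a disclosure of Ray or Li. The class names `SharedMemoryHierarchy` and `SubCore`, the dict-based cache, and the use of atomic add are hypothetical, and the `write_fence` method is borrowed from the wording of claim 20.

```python
class SharedMemoryHierarchy:
    """Assumed stand-in for the memory hierarchy shared by all sub-cores."""

    def __init__(self):
        self.data = {}

    def read(self, address):
        return self.data.get(address, 0)

    def write(self, address, value):
        self.data[address] = value


class SubCore:
    """Assumed model of one sub-core with a private L1 (primary data) cache."""

    def __init__(self, shared):
        self.shared = shared
        self.l1 = {}  # private primary data cache: address -> value

    def atomic_add_local_scope(self, address, operand):
        # Step 2 of the claim: read the data to be modified into the
        # primary data cache from the shared memory hierarchy.
        if address not in self.l1:
            self.l1[address] = self.shared.read(address)
        # Step 3: produce the atomic result by modifying the data
        # within the primary data cache.
        old = self.l1[address]
        self.l1[address] = old + operand
        return old

    def write_fence(self):
        # Wording of claim 20: make results coherent throughout the hierarchy.
        for address, value in self.l1.items():
            self.shared.write(address, value)


shared = SharedMemoryHierarchy()
shared.write(0x40, 10)
core = SubCore(shared)
previous = core.atomic_add_local_scope(0x40, 5)
```

In this model the updated value lives only in the sub-core's private cache until `write_fence` is called, which is the sense in which the operation targets global memory but has local scope.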
Claim 2:
The combination of Ray and Li discloses wherein said modifying (“updates to the cache 438 related to modifications” See Ray Paragraph 0088) is performed by an atomic Arithmetic Logic Unit (ALU) (See Ray Paragraphs 0071; 0149; 0157; 0207; 0275; 0282) of the load/store pipeline that is accessible to the primary data cache and the SLM (See Ray Paragraphs 0004; 0273; 0280; 0287).
Claim 4:
The combination of Ray and Li discloses wherein a size of the partition of the primary data cache is greater than a size of the partition of the SLM (See Ray Paragraphs 0004; 0208; 0273; 0280; 0287).
Claim 6:
The combination of Ray and Li discloses wherein the information specifying the atomic operation is generated by an execution unit (EU) of a plurality of EUs of the sub-core responsive to receipt by the EU of a global memory atomic instruction having a parameter indicating the global memory atomic instruction has a local scope (See Ray Figures 15-16; Paragraphs 0023-0024; 0052; 0054; 0069; 0087).
Claims 7, 8, 10 and 12:
Claims 7, 8, 10 and 12 are rejected on the same basis as claims 1, 2, 4 and 6.
Claims 13, 14, 16 and 18:
Claims 13, 14, 16 and 18 are rejected on the same basis as claims 1, 2, 4 and 6.
Claim 19:
The combination of Ray and Li discloses wherein the parameter is set by a compiler based on a determination made by the compiler regarding relative efficiencies of performing the global memory atomic instruction in a last-level cache versus the primary data cache (See Ray Paragraphs 0047; 0052; 0054; 0057; 0252-0253).
Claim 20:
The combination of Ray and Li discloses making the atomic result coherent throughout the memory hierarchy by using a write fence (“writes to particular cache lines” See Ray Paragraph 0151).
Claim 21:
The combination of Ray and Li discloses wherein the parameter is set by a compiler based on a determination made by the compiler regarding relative efficiencies of performing the global memory atomic instruction in a last-level cache versus the primary data cache (See Ray Paragraphs 0047; 0052; 0054; 0057; 0252-0253).
Claim 22:
The combination of Ray and Li discloses wherein the parameter is encoded within a cache-override field (See Ray Paragraph 0110) of the global memory atomic instruction (See Ray Paragraphs 0047; 0052; 0054; 0057; 0252-0253).
Claim 23:
Claim 23 is rejected on the same basis as claim 21.
Pertinent Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US Patent Application No.: 20160350262 discloses a cache memory hierarchy.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
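The reply dates described above are simple calendar arithmetic and can be sketched in Python. This is a minimal sketch, assuming the mailing date is December 13, 2025 (the date shown on the signature line and in the prosecution timeline; the actual mailing date should be confirmed from the record). The helper names `add_months` and `reply_dates` are hypothetical, and the sketch does not model weekend or holiday rollover or the advisory-action exception.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month when needed.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_dates(mailing_date):
    """Key dates measured from the mailing date of a final action."""
    return {
        "two_month_date": add_months(mailing_date, 2),
        "shortened_statutory_period_ends": add_months(mailing_date, 3),
        "six_month_statutory_limit": add_months(mailing_date, 6),
    }


# Assumed mailing date; confirm against the official record.
dates = reply_dates(date(2025, 12, 13))
```

Under that assumption the two-month date is February 13, 2026, the three-month shortened statutory period ends March 13, 2026, and the six-month limit falls on June 13, 2026.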
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHEREE N BROWN whose telephone number is (571)272-4229. The examiner can normally be reached M-F 5:30-2:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SAID BROOME can be reached on (571) 272-2931. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SHEREE N BROWN/
Primary Examiner, Art Unit 2612
December 13, 2025

[1] Ray Paragraph 0034 recites “a graphics processing unit (GPU) is communicatively coupled to host/processor cores to accelerate graphics operations, machine-learning operations, pattern analysis operations, and various general purpose GPU (GPGPU) functions. The GPU may be communicatively coupled to the host processor/cores over a bus or another interconnect (e.g., a high-speed interconnect such as PCIe or NVLink). In other embodiments, the GPU may be integrated on the same package or chip as the cores and communicatively coupled to the cores over an internal processor bus/interconnect (i.e., internal to the package or chip). Regardless of the manner in which the GPU is connected, the processor cores may allocate work to the GPU in the form of sequences of commands/instructions contained in a work descriptor. The GPU then uses dedicated circuitry/logic for efficiently processing these commands/instructions.”
[2] Ray Paragraph 0202 recites “each having multiple sub-cores 1450A-1450N, 1460A-1460N (sometimes referred to as core sub-slices)”.


Prosecution Timeline

Jul 19, 2021: Application Filed
Nov 23, 2021: Response after Non-Final Action
Sep 05, 2024: Non-Final Rejection (§103)
Dec 05, 2024: Response Filed
Jan 31, 2025: Final Rejection (§103)
May 05, 2025: Request for Continued Examination
May 07, 2025: Response after Non-Final Action
Jun 04, 2025: Examiner Interview Summary
Jun 04, 2025: Applicant Interview (Telephonic)
Jun 27, 2025: Non-Final Rejection (§103)
Oct 01, 2025: Response Filed
Dec 13, 2025: Final Rejection (§103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/551,297 (Patent 12593956): METHOD FOR BUILDING IMAGE READING MODEL BASED ON CAPSULE ENDOSCOPE, DEVICE, AND MEDIUM. 2y 5m to grant; granted Apr 07, 2026.

18/236,338 (Patent 12573130): METHOD AND SYSTEM PROVIDING TEMPORARY TEXTURE APPLICATION TO ENHANCE 3D MODELING. 2y 5m to grant; granted Mar 10, 2026.

17/303,651 (Patent 12548204): NEURAL FRAME EXTRAPOLATION RENDERING MECHANISM. 2y 5m to grant; granted Feb 10, 2026.

17/696,737 (Patent 12541487): Method for Constructing Database, Method for Retrieving Document and Computer Device. 2y 5m to grant; granted Feb 03, 2026.

18/678,908 (Patent 12541539): METHODS AND SYSTEMS FOR A COMPLIANCE FRAMEWORK DATABASE SCHEMA. 2y 5m to grant; granted Feb 03, 2026.

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 65%
With Interview (+27.0%): 92%
Median Time to Grant: 3y 7m
PTA Risk: High

Based on 738 resolved cases by this examiner. Grant probability derived from career allow rate.