Prosecution Insights
Last updated: April 19, 2026
Application No. 18/359,002

GRAPHICS PROCESSORS

Non-Final OA: §102, §103
Filed: Jul 26, 2023
Examiner: CHU JOY, JORGE A
Art Unit: 2195
Tech Center: 2100 — Computer Architecture & Software
Assignee: Arm Limited
OA Round: 1 (Non-Final)

Grant Probability: 77% (Favorable)
Expected OA Rounds: 1-2
Median Time to Grant: 3y 1m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 77% (above average; 314 granted / 408 resolved; +22.0% vs TC avg)
Interview Lift: strong, +37.3% on resolved cases with an interview
Typical Timeline: 3y 1m average prosecution; 41 applications currently pending
Career History: 449 total applications across all art units

Statute-Specific Performance

§101: 11.0% (-29.0% vs TC avg)
§103: 55.3% (+15.3% vs TC avg)
§102: 3.2% (-36.8% vs TC avg)
§112: 19.6% (-20.4% vs TC avg)

Deltas are measured against a Tech Center average estimate. Based on career data from 408 resolved cases.
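The per-statute deltas above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch that assumes each "vs TC avg" figure is a simple percentage-point difference (examiner rate minus Tech Center average); that reading is an assumption, since the page does not define the delta.

```python
# Recover the implied Tech Center average for each statute, assuming the
# "vs TC avg" delta is a percentage-point difference (rate - TC average).
stats = {
    "101": (11.0, -29.0),
    "103": (55.3, +15.3),
    "102": (3.2, -36.8),
    "112": (19.6, -20.4),
}

for statute, (rate, delta) in stats.items():
    tc_avg = rate - delta  # implied baseline, in percentage points
    print(f"S{statute}: examiner {rate:.1f}%, implied TC avg {tc_avg:.1f}%")
```

Under this assumption every implied baseline comes out to 40.0%, which suggests the dashboard may compare each statute's rate against a single pooled Tech Center figure rather than per-statute baselines.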

Office Action

DETAILED ACTION

Claims 1-20 are pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 09/06/2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Specification

The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-4, 7, 8, 10, 12-14, 16, 17, and 19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Calidas et al. (US 2022/0058476 A1).

Regarding claim 1, Calidas teaches a graphics processor (Fig. 2 GPU 205; [0053] As further shown in FIG. 2, GPU 205 includes the one or more shader units 210) comprising:

a programmable execution unit operable to execute programs to perform graphics processing operations ([0024] As used herein, the term "shader" refers to a program configured to execute on a GPU; [0053]; [0046] programmable shader; [0054] In a typical operation, GPU 205 may designate the one or more shader units 210 to perform a variety of shading operations such as vertex shading, hull shading, domain shading, geometry shading, pixel shading, and the like by sending commands to shader units 210 to execute one or more of a vertex shader stage, a hull shader stage, a domain shader stage, a geometry shader stage, and a pixel shader stage in graphics processing pipeline 107.); and

a machine learning processing circuit operable to perform processing operations for machine learning processing tasks ([0054] For an ML operation, the GPU 205 may similarly designate the one or more shader units 210 to perform one or more ML operations using shaders. Example ML operations include: convolution, batch normalization, pooling, concatenation, fully connected, softmax, reshape, permute, LSTM, GRU, depthwise separable convolution, transpose convolution, and depth to space.) and in communication with the programmable execution unit internally to the graphics processor, the graphics processor configured such that machine learning processing tasks can be performed by the programmable execution unit, the machine learning processing circuit, or a combination of both ([0025] another sequence may use a first conversion shader to change the input data type, a core shader to perform the ML operation, and a second conversion shader to change an output type.; [0054] In some implementations, each ML operation may be implemented with a specific shader or a sequence of more general purpose shaders. The GPU 205 may send commands to the shader units 210 to execute a sequence of shaders for the ML operation.; [0051] Moreover, GPU 205 may include internal memory 240 (e.g., corresponding to internal memory 121 of FIG. 1), such that GPU 205 may read data from and write data directly to internal memory 240 without using a bus. In other words, GPU 205 may process data locally using a local storage, instead of off-chip memory. This configuration will enable GPU 205 to operate in a more efficient manner by eliminating the need of GPU 205 to read and write data via a bus, which may experience heavy bus traffic. That is, all shaders are internal to the GPU 205.).

Regarding claim 2, Calidas teaches being configured such that, when the execution unit is executing a program including an instruction that relates to a set of machine learning operations to be performed by the machine learning processing circuit, in response to the execution unit executing the instruction, the programmable execution unit is caused to message the machine learning processing circuit to cause the machine learning processing circuit to perform the set of machine learning processing operations ([0024] A group of interdependent shaders may be referred to as a sequence of shaders. A sequence of shaders may include one or more shaders.; [0025] use a first conversion shader to change the input data type, a core shader to perform the ML operation, and a second conversion shader to change an output type.; [0057] In some implementations, the shaders 362, 364, 366 each may be an OpenCL shader, but in other implementations may be provided by a different library. In the illustrated example, shader 362 may perform a pre-processing operation to convert an input into a format for shader 364. The shader 364 may be a core shader configured to perform a core operation. The shader 366 may convert the output of the shader 364 to an output form for the ML operation 350.; [0054]).

Regarding claim 3, Calidas teaches wherein the machine learning processing circuit is configured to return a result of its processing to the execution unit for further processing ([0024] A group of interdependent shaders may be referred to as a sequence of shaders. A sequence of shaders may include one or more shaders.; [0025] use a first conversion shader to change the input data type, a core shader to perform the ML operation, and a second conversion shader to change an output type).

Regarding claim 4, Calidas teaches wherein the machine learning processing circuit, when performing a machine learning processing task, is operable to cause the execution unit to perform one or more processing operations for the machine learning processing task being performed by the machine learning processing circuit ([0054] In some implementations, each ML operation may be implemented with a specific shader or a sequence of more general purpose shaders).

Regarding claim 7, Calidas teaches wherein the graphics processor includes a cache system for transferring data to and from an external memory, and wherein the machine learning processing circuit has access to the graphics processor's cache system ([0038] Memory external to the processing unit 120 and the content encoder/decoder 122, such as system memory 124, may be accessible to the processing unit 120 and the content encoder/decoder 122. For example, the processing unit 120 and the content encoder/decoder 122 may be configured to read from and/or write to external memory, such as the system memory 124. The processing unit 120 and the content encoder/decoder 122 may be communicatively coupled to the system memory 124 over a bus. In some examples, the processing unit 120 and the content encoder/decoder 122 may be communicatively coupled to each other over the bus or a different connection.; [0039]-[0040]; [0051] As shown, GPU 205 may include a plurality of processing elements, such as one or more shader units (i.e., shader unit 210), that are configured to operate on multiple vertices or pixels in a parallel manner. Moreover, GPU 205 may include internal memory 240 (e.g., corresponding to internal memory 121 of FIG. 1), such that GPU 205 may read data from and write data directly to internal memory 240 without using a bus. In other words, GPU 205 may process data locally using a local storage, instead of off-chip memory. This configuration will enable GPU 205 to operate in a more efficient manner by eliminating the need of GPU 205 to read and write data via a bus, which may experience heavy bus traffic. In some instances, however, GPU 205 may not include a separate memory, but instead utilize system memory 124 via a bus.).

Regarding claim 8, Calidas teaches wherein when a machine learning processing task is to be performed using the graphics processor, the graphics processor is operable to fetch required input data for the machine learning processing task via the cache system, and write an output of the machine learning processing task to memory via the cache system ([0051]; [0053] As further shown in FIG. 2, GPU 205 includes the one or more shader units 210, graphics processing pipeline 107, and texture pipeline 230. Moreover, one or more shader programs may execute on shader units 210 in GPU 205. Shader units 210 may also include one or more shader processors 220, each of which may include one or more components for fetching and decoding operations, one or more arithmetic logic units (ALUs) 250 for carrying out arithmetic calculations, one or more caches 260, or more generally other types of memory and/or registers.; [0054]-[0055]).

Regarding claim 10, Calidas teaches comprising a plurality of programmable execution units, arranged as respective shader cores, with each shader core having its own respective machine learning processing circuit, and wherein an overall job controller of the graphics processor is operable to distribute processing tasks between the different shader cores ([0047] Referring again to FIG. 1, the processing unit 120 can include a ML sequence selection 198 configured to select a sequence of shaders for performing an ML operation using the graphics processing pipeline 107. That is, the processing unit 120 may be configured to perform a ML operation such as a layer of a ML model using the graphics processing pipeline 107. For example, the graphics processing pipeline 107 may be configured with a plurality of shaders for performing ML operations. In some implementations, the plurality of shaders may be described by an application programming interface (API). The ML sequence selection 198 may determine a plurality of sequences of shaders that are capable of performing the machine-learning operation. The ML sequence selection 198 may determine a cost of each sequence of the plurality of sequences of shaders based on a cost function associated with each shader. The ML sequence selection 198 may execute a selected sequence of shaders having a lowest cost of the plurality of sequences of shaders. In one or more example implementations, the ML sequence selection 198 may be implemented in hardware, software, or any combination thereof.).

Regarding claims 12, 13, 14, 16, 17, and 19, these are method claims having limitations similar to claims 1, 2, 4, 3, 7, and 10, respectively. They are therefore rejected under the same rationales set forth above.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Calidas as applied to claim 1, in further view of Nurvitadhi et al. (US 2022/0114495 A1).

Regarding claim 5, Calidas does not explicitly teach wherein the machine learning processing circuit is operable to trigger the generation of threads for execution by the programmable execution unit to cause the execution unit to perform the one or more processing operations for the machine learning processing task being performed by the machine learning processing circuit. However, Nurvitadhi teaches this limitation ([0073] FIG. 3 is a block diagram of example ML system configuration circuitry 300 to compose an ML compute node (e.g., the ML compute node 217 of FIG. 2) to execute a workload (e.g., the workload(s) 216 of FIG. 2). In some examples, the ML system configuration circuitry 300 of FIG. 3 can implement the ML system configurator 102 of FIGS. 1 and/or 2. The ML system configuration circuitry 300 of FIG. 3 may be instantiated (e.g., creating an instance of, bring into being for any length of time, materialize, implement, etc.) by processor circuitry such as a CPU executing instructions. Additionally and/or alternatively, the ML system configuration circuitry 300 of FIG. 3 may be instantiated by an ASIC or an FPGA structured to perform operations corresponding to the instructions. It should be understood that some or all of the ML system configuration circuitry 300 of FIG. 3 may, thus, be instantiated at the same or different times. Some or all of the ML system configuration circuitry 300 may be instantiated, for example, in one or more threads executing concurrently on hardware and/or in series on hardware.; [0221] In some examples, the machine code corresponding to the firmware program, the embedded software program, or the software program is split into threads and executed in parallel by two or more of the cores 1502.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Nurvitadhi, of identifying an optimal combination of hardware/software to execute a compute workload, with the teachings of Calidas, of determining how to schedule an ML sequence among shader cores. The modification would have been motivated by the desire to combine known elements to yield predictable results.

Regarding claim 15, it is a method claim having limitations similar to claim 5, and is rejected under the same rationale.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Calidas as applied to claim 1, in further view of Nguyen et al. (US 2020/0098422 A1).

Regarding claim 6, Calidas does not teach wherein the machine learning processing circuit comprises one or more multiply-and-accumulate circuits. However, Nguyen teaches this limitation ([0026] ML processors contain a large array of multiply-accumulate (MAC) units). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of MAC units in ML processors, as taught by Nguyen, with the ML processor/GPU shader core teachings of Calidas, to allow for parallel execution and achieve high throughput.

Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Calidas as applied to claim 1, in further view of Park et al. (US 2018/0096513 A1).

Regarding claim 9, Calidas does not teach further comprising compression and decompression circuits for compressing and decompressing data as it is transferred between the graphics processor and the external memory. However, Park teaches this limitation ([0037] The compression/decompression apparatus 220 according to at least one example embodiment of the inventive concepts may determine the number of bits assigned to each channel based on a variation of each channel. Further, the compression/decompression apparatus 220 may compress channels based on the determined number of bits and transmit the compressed channels to the on-chip memory 210 or the external memory 30; [0082] Referring to FIG. 11, the compression/decompression apparatus 221 may be interposed between the GPU 20 and the external memory 30. As described with reference to FIG. 3, the compression/decompression apparatus 220 may be embedded in the GPU 20. However, as shown in FIG. 11, the compression/decompression apparatus 221 may be arranged outside the GPU 20 as an independent apparatus.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Park with the teachings of Calidas to place a compression/decompression apparatus between the GPU and the external memory. The modification would have been motivated by the desire to prevent or, alternatively, reduce damage to or loss of information about a channel with a great variation, to reduce the amount of storage assigned to the on-chip memory or the external memory, and to reduce the amount of computation performed by the GPU.

Regarding claim 18, it is a method claim having limitations similar to claim 9, and is rejected under the same rationale.

Claims 11 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Calidas as applied to claim 1, in further view of Nystad et al. (US 2014/0327671 A1).

Regarding claim 11 (wherein the graphics processor is configured to perform tile-based rendering, in which graphics data is stored in one or more tile buffers, and wherein when performing a machine learning processing task at least some data for the machine learning processing task is stored using the tile buffers), Calidas teaches a graphics processor that stores and renders graphics data ([0050] In addition, FIG. 2 shows an aspect in which the processing unit 120 comprises a GPU 205 to perform the respective image processing operations. Specifically, in an aspect, GPU 205 may be configured to perform graphics operations to render one or more graphics to display 131, as described above. In a typical operation, when one of the software applications executing on the device 104 requires graphics processing, the processing unit 120 may provide graphics commands and graphics data to GPU 205 for rendering to display 131. The graphics data may include texture information, drawing commands, and the like.), but does not explicitly teach tile-based rendering or tile buffers. However, Nystad teaches tile-based rendering and tile buffers ([0016] A tile-based graphics processing pipeline will also include one or more so-called tile buffers that store rendered fragment data at the end of the pipeline until a given tile is completed and written out to an external memory, such as a frame buffer, for use. This local, pipeline memory is used to retain fragment data locally before the data is finally exported to external memory.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Nystad with the teachings of Calidas to render using tile-based rendering. The modification would have been motivated by the desire for reduced memory bandwidth.

Regarding claim 20, it is a method claim having limitations similar to claim 11, and is rejected under the same rationale.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JORGE A CHU JOY-DAVILA, whose telephone number is (571) 270-0692. The examiner can normally be reached Monday-Friday, 6:00 am-5:00 pm.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Aimee J Li, can be reached at (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JORGE A CHU JOY-DAVILA/
Primary Examiner, Art Unit 2195

Prosecution Timeline

Jul 26, 2023: Application Filed
Dec 20, 2025: Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602244: OFFLOADING PROCESSING TASKS TO DECOUPLED ACCELERATORS FOR INCREASING PERFORMANCE IN A SYSTEM ON A CHIP (2y 5m to grant; granted Apr 14, 2026)
Patent 12596565: USER ASSIGNED NETWORK INTERFACE QUEUES (2y 5m to grant; granted Apr 07, 2026)
Patent 12591821: DYNAMIC ADJUSTMENT OF WELL PLAN SCHEDULES ON DIFFERENT HIERARCHICAL LEVELS BASED ON SUBSYSTEMS ACHIEVING A DESIRED STATE (2y 5m to grant; granted Mar 31, 2026)
Patent 12585490: MIGRATING VIRTUAL MACHINES WHILE PERFORMING MIDDLEBOX SERVICE OPERATIONS AT A PNIC (2y 5m to grant; granted Mar 24, 2026)
Patent 12579065: LIGHTWEIGHT KERNEL DRIVER FOR VIRTUALIZED STORAGE (2y 5m to grant; granted Mar 17, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 77%
With Interview: 99% (+37.3%)
Median Time to Grant: 3y 1m
PTA Risk: Low

Based on 408 resolved cases by this examiner. Grant probability derived from career allow rate.
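The headline grant probability can be reproduced directly from the raw counts shown in Examiner Intelligence. A minimal sketch; the interview-lift combination is deliberately left out, since the page does not state whether +37.3% is a percentage-point or a relative figure:

```python
# Career allow rate from the raw counts: 314 granted of 408 resolved.
granted, resolved = 314, 408
allow_rate = granted / resolved
print(f"Career allow rate: {allow_rate:.1%}")  # ~76.96%, shown as 77%
```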
