Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1 and 3-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ligowski et al. (U.S. Patent No. 11,568,523) in view of Shibayama (U.S. PGPUB 20190129913).
With respect to claim 1, Ligowski et al. disclose an apparatus for graphics processing, comprising: at least one memory; and at least one processor coupled to the at least one memory and, based at least in part on information stored in the at least one memory, the at least one processor, is configured to:
allocate at least one data set in a plurality of data sets to at least one work item in a set of work items (column 35, lines 27-33, A process element 1783 contains process state for corresponding application 1780. A work descriptor ("WD") 1784 contained in process element 1783 can be a single job requested by an application or may contain a pointer to a queue of jobs. In at least one embodiment, WD 1784 is a pointer to a job request queue in application effective address space 1782);
load the allocated at least one data set to a set of registers associated with the set of work items (column 35, lines 49-56, In operation, a WD fetch unit 1791 in accelerator integration slice 1790 fetches next WD 1784 which includes an indication of work to be done by one or more graphics processing engines of graphics acceleration module 1746. Data from WD 1784 may be stored in registers 1745 and used by a memory management unit ("MMU") 1739, interrupt management circuit 1747 and/or context management circuit 1748 as illustrated);
compute, based on the loaded at least one data set, a fast Fourier Transform (FFT) operation (column 36, lines 53-59, one or more systems depicted in FIG. 17 are utilized to implement an API in connection with a library that enables an entity to indicate various aspects of an FFT operation, including FFT implementation properties, and cause a determination of an optimal FFT implementation based at least on said FFT implementation properties to perform said FFT operation) using a sequence of iterations for each of the set of registers (column 35, lines 25-31, process elements 1783 are stored in response to GPU invocations 1781 from applications 1780 executed on processor 1707. A process element 1783 contains process state for corresponding application 1780. A work descriptor (“WD”) 1784 contained in process element 1783 can be a single job requested by an application or may contain a pointer to a queue of jobs, column 36, lines 31-36, each WD 1784 is specific to a particular graphics acceleration module 1746 and/or a particular graphics processing engine. It contains all information required by a graphics processing engine to do work or it can be a pointer to a memory location where an application has set up a command queue of work to be completed). The data from WD 1784 stored in registers 1745, as depicted in FIG. 17, is used to implement an API indicating various aspects of an FFT operation, thus computing the FFT operation. The implementation of a command queue of work to be completed implies multiple workloads, i.e., a sequence of iterations for the registers storing those workloads. However, Ligowski et al. do not expressly disclose arranging an order of the at least one data set based on the computation of the operation for each of the set of registers; and storing the at least one data set based on the arrangement of the order of the at least one data set.
Shibayama, who also deals with data processing, discloses a method for arranging an order of the at least one data set based on the computation of the operation for each of the set of registers (paragraph 54, The first data rearrangement process circuit 211 rearranges a data sequence, based on a relationship of dependency of data on an algorithm in the FFT process); and storing the at least one data set based on the arrangement of the order of the at least one data set (paragraph 54, The first data rearrangement process circuit 211 outputs the rearranged data to the first butterfly operation process circuit 212).
Ligowski et al. and Shibayama are in the same field of endeavor, namely data processing systems capable of handling computer graphics.
Before the effective filing date of the claimed invention, it would have been obvious to apply the method of arranging an order of the at least one data set based on the computation of the operation for each of the set of registers; and storing the at least one data set based on the arrangement of the order of the at least one data set, as taught by Shibayama to the Ligowski et al. system, because when the number of points of FFT is large, circuits are not configured to correspond to all processes of the data flow 100, but a part of the processes of the data flow 100 is assigned to any one of circuits. Specifically, in accordance with a necessary processing performance, the circuit, which achieves a part of the processes of the data flow 100, is repeatedly used, and thereby the entirety of the FFT processing may be achieved (paragraph 47 of Shibayama).
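For orientation only, the combined limitations at issue (computing an FFT using a sequence of iterations over a set of registers, arranging the order of the data set, and storing the rearranged result) follow the standard iterative radix-2 decimation-in-time FFT pattern. The sketch below is illustrative background for the record, not a reproduction of either cited reference's implementation, and the function name is hypothetical:

```python
import cmath

def fft_iterative(x):
    """Minimal in-place radix-2 DIT FFT over a power-of-two-length list."""
    n = len(x)
    assert n & (n - 1) == 0, "length must be a power of two"
    # Arrange: bit-reverse permutation of the input order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]
    # Compute: a sequence of butterfly iterations over the register-like slots.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1.0
            for k in range(half):
                a = x[start + k]
                b = x[start + k + half] * w  # twiddle-factor scaling
                x[start + k] = a + b
                x[start + k + half] = a - b
                w *= w_step
        size *= 2
    return x  # Store: the reordered, transformed data set.
```

In this generic pattern, the data rearrangement precedes the butterfly stages; Shibayama's circuit performs the analogous rearrangement in hardware before the first butterfly operation process circuit.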
With respect to claim 3, Ligowski et al. as modified by Shibayama disclose the apparatus of claim 1, wherein the FFT operation includes one or more of: at least one butterfly computation, or a scaling operation associated with a set of twiddle factors (Shibayama: paragraph 44, using FIG. 2, a data flow 100 is illustrated in which 64-point FFT, which is decomposed into radix-8 butterfly operation processes of two stages, is executed by a Prime Factor method, paragraph 45, The data flow 100 of FIG. 2 includes a data rearrangement process 101, a butterfly operation process 102, a butterfly operation process 103, and a twiddle multiplication process 104).
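For reference, the butterfly computation and twiddle-factor scaling recited in claim 3 are standard FFT building blocks. A minimal radix-2 sketch follows; it is illustrative only (the cited references use hardware circuits, and the function names here are hypothetical):

```python
import cmath

def twiddle(k, n):
    """Twiddle factor W_N^k = exp(-2j*pi*k/N) for an N-point FFT."""
    return cmath.exp(-2j * cmath.pi * k / n)

def butterfly(a, b, w):
    """One radix-2 butterfly: scale b by the twiddle factor w, then combine."""
    t = b * w
    return a + t, a - t
```

A radix-8 decomposition such as Shibayama's 64-point example applies the same combine-and-scale pattern with eight inputs per butterfly.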
With respect to claim 4, Ligowski et al. as modified by Shibayama disclose the apparatus of claim 1, wherein the sequence of iterations includes one or more of: a sequence of butterfly iterations, or at least one radix iteration (Shibayama: paragraph 45, The data flow 100 of FIG. 2 includes a data rearrangement process 101, a butterfly operation process 102, a butterfly operation process 103, and a twiddle multiplication process 104).
With respect to claim 5, Ligowski et al. as modified by Shibayama disclose the apparatus of claim 1, wherein the plurality of data sets comprises at least one of: a plurality of signals (Ligowski et al.: column 3, lines 18-21, an FFTDx library provides functionality to perform FFT operations on various inputs, including audio signals, images, data objects, and/or variations thereof), a plurality of source signals, a plurality of radio detection and rangings (radars), or a plurality of light detection and rangings (lidars).
With respect to claim 6, Ligowski et al. as modified by Shibayama disclose the apparatus of claim 5, wherein at least one of the plurality of signals, the plurality of source signals, the plurality of radars, or the plurality of lidars corresponds to one of: a time domain, a spatial domain, or a frequency domain (Ligowski et al.: column 3, lines 18-21, an FFTDx library provides functionality to perform FFT operations on various inputs, including audio signals, images, data objects, and/or variations thereof).
With respect to claim 7, Ligowski et al. as modified by Shibayama disclose the apparatus of claim 5, wherein each data set of the plurality of data sets is associated with a source of one of the plurality of signals (Ligowski et al.: column 3, lines 18-21, an FFTDx library provides functionality to perform FFT operations on various inputs, including audio signals, images, data objects, and/or variations thereof), the plurality of source signals, the plurality of radars, or the plurality of lidars.
With respect to claim 8, Ligowski et al. as modified by Shibayama disclose the apparatus of claim 5, wherein each data set of the plurality of data sets is a continuous data stream for at least one of the plurality of signals, the plurality of source signals, the plurality of radars, or the plurality of lidars (Ligowski et al.: column 3, lines 18-21, an FFTDx library provides functionality to perform FFT operations on various inputs, including audio signals, images, data objects, and/or variations thereof).
With respect to claim 9, Ligowski et al. as modified by Shibayama disclose the apparatus of claim 1, wherein to allocate the at least one data set to the at least one work item, the at least one processor, is configured to: perform a one-to-one mapping of the at least one data set to the at least one work item (Ligowski et al.: column 35, lines 28-31, A work descriptor ("WD") 1784 contained in process element 1783 can be a single job requested by an application or may contain a pointer to a queue of jobs).
With respect to claim 10, Ligowski et al. as modified by Shibayama. disclose the apparatus of claim 1, wherein to allocate the at least one data set to the at least one work item, the at least one processor, is configured to: select the at least one data set for the at least one work item (Ligowski et al.: column 35, lines 28-31, A work descriptor ("WD") 1784 contained in process element 1783 can be a single job requested by an application or may contain a pointer to a queue of jobs); or divide the at least one data set amongst the at least one work item.
With respect to claim 11, Ligowski et al. as modified by Shibayama disclose the apparatus of claim 1, wherein to arrange the order of the at least one data set, the at least one processor, is configured to: perform a bit reverse operation on the at least one data set if a length of the at least one data set is a power of two (2) (Shibayama: paragraph 57, The bit reverse order illustrated in FIG. 5 corresponds to input data sets to the first-stage radix-8 butterfly operation process 102 in the data flow 100 illustrated in FIG. 2); or perform a bit manipulation on the at least one data set if the length of the at least one data set is not the power of two (2). It would have been obvious that, to arrange the order of the at least one data set, the at least one processor, individually or in any combination, is configured to: perform a bit reverse operation on the at least one data set if a length of the at least one data set is a power of two (2); or perform a bit manipulation on the at least one data set if the length of the at least one data set is not the power of two (2), because a sufficient filter performance is obtained with respect to the processing of the desired frequency range, and the complex multiplier, which performs the filtering of the frequency range that is not important, can be simplified. In short, according to the present example embodiment, the circuit scale and power consumption can be reduced without degrading the filter performance (paragraph 129 of Shibayama).
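As background for the bit reverse ordering discussed in claim 11, the permutation applied to a power-of-two-length data set can be sketched as follows; this is an illustrative sketch only, not the cited rearrangement circuit, and the function names are hypothetical:

```python
def bit_reverse_index(i, num_bits):
    """Reverse the low num_bits bits of index i."""
    r = 0
    for _ in range(num_bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def bit_reverse_order(data):
    """Permute a power-of-two-length sequence into bit-reversed order."""
    n = len(data)
    assert n & (n - 1) == 0, "length must be a power of two"
    bits = n.bit_length() - 1
    return [data[bit_reverse_index(i, bits)] for i in range(n)]
```

For example, an eight-point sequential input order 0-7 maps to the bit-reversed order 0, 4, 2, 6, 1, 5, 3, 7, which corresponds to the reordering shown between FIG. 4 and FIG. 5 of Shibayama.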
With respect to claim 12, Ligowski et al. as modified by Shibayama disclose the apparatus of claim 1, wherein the at least one processor, is further configured to: obtain an indication of the at least one data set in the plurality of data sets, wherein the allocation of the at least one data set is based on the indication (Ligowski et al.: column 35, lines 25-27, process elements 1783 are stored in response to GPU invocations 1781 from applications 1780 executed on processor 1707). The indication of the process element is from the executed application.
With respect to claim 13, Ligowski et al. as modified by Shibayama disclose the apparatus of claim 12, wherein to obtain the indication of the at least one data set, the at least one processor, is configured to: receive at least one input signal associated with the at least one data set; or obtain the indication of the at least one input signal associated with the at least one data set (Ligowski et al.: column 35, lines 25-27, process elements 1783 are stored in response to GPU invocations 1781 from applications 1780 executed on processor 1707, column 87, lines 60-64, process of providing, outputting, transmitting, sending, or presenting analog or digital data can be accomplished by transferring data as an input or output parameter of a function call, a parameter of an application programming interface or interprocess communication mechanism).
With respect to claim 14, Ligowski et al. as modified by Shibayama disclose the apparatus of claim 1, wherein to arrange the order of the at least one data set, the at least one processor, is configured to: arrange the order of the at least one data set in the set of registers (Shibayama: paragraph 4, a storage means such as a random access memory (RAM) or a register is used for rearrangement of data); and wherein to store the at least one data set, the at least one processor, is configured to: store the at least one data set in a memory (Shibayama: paragraph 54, The first data rearrangement process circuit 211 outputs the rearranged data to the first butterfly operation process circuit 212, circuit comprises a memory).
With respect to claim 15, Ligowski et al. as modified by Shibayama disclose the apparatus of claim 1, wherein the set of work items is included in a compute unit located in a graphics processing unit (GPU) (Ligowski et al.: column 35, lines 4-13, FIG. 17 illustrates an exemplary accelerator integration slice 1790, in accordance with at least one embodiment. As used herein, a "slice" comprises a specified portion of processing resources of an accelerator integration circuit. In at least one embodiment, the accelerator integration circuit provides cache management, memory access, context management, and interrupt management services on behalf of multiple graphics processing engines included in a graphics acceleration module. The graphics processing engines may each comprise a separate GPU); and wherein each work item of the set of work items corresponds to a lane of a single-instruction multiple-data (SIMD) unit (Ligowski et al.: column 32, each compute unit 1550 includes, without limitation, any number of SIMD units 1552 and a shared memory 1554. In at least one embodiment, each SIMD unit 1552 implements a SIMD architecture and is configured to perform operations in parallel).
With respect to claim 16, Ligowski et al. as modified by Shibayama disclose the apparatus of claim 1, wherein to load the allocated at least one data set to the set of registers, the at least one processor, is configured to: load the allocated at least one data set to the set of registers in a linear fashion or in a contiguous data set (Ligowski et al.: column 35, lines 53-56, Data from WD 1784 may be stored in registers 1745 and used by a memory management unit ("MMU") 1739, interrupt management circuit 1747 and/or context management circuit 1748 as illustrated).
With respect to claim 17, Ligowski et al. as modified by Shibayama disclose the apparatus of claim 1, wherein the at least one processor, is further configured to: output an indication of the stored at least one data set based on the arrangement of the order of the at least one data set (Shibayama: paragraph 55, Specifically, the first data rearrangement process circuit 211 rearranges the input data x(n) from the sequential order (FIG. 4), which is an input order, to the bit reverse order (FIG. 5) which is an output order to the first butterfly operation process circuit 212).
With respect to claim 18, Ligowski et al. as modified by Shibayama disclose the apparatus of claim 17, wherein to output the indication of the stored at least one data set, the at least one processor, is configured to: transmit at least one output signal associated with the stored at least one data set; or store the indication of the at least one output signal associated with the stored at least one data set (Shibayama: paragraph 55, Specifically, the first data rearrangement process circuit 211 rearranges the input data x(n) from the sequential order (FIG. 4), which is an input order, to the bit reverse order (FIG. 5) which is an output order to the first butterfly operation process circuit 212, data is stored in the circuit).
With respect to claim 19, Ligowski et al. as modified by Shibayama disclose a method of graphics processing, as executed by the system of claim 1; see rationale for rejection of claim 1.
With respect to claim 20, Ligowski et al. as modified by Shibayama disclose a non-transitory computer-readable medium storing computer executable code for graphics processing (Ligowski et al.: column 18, lines 56-64, some or all of process 800 (or any other processes described herein, or variations and/or combinations thereof) is performed under control of one or more computer systems configured with computer-executable instructions and may be implemented as code (e.g., computer-executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, software, or combinations thereof), the code when executed by at least one processor causes the at least one processor to execute the system of claim 1; see rationale for rejection of claim 1.
Response to Arguments
Applicant’s arguments with respect to claims 1 and 19-20 have been considered but are moot because of the new ground(s) of rejection. Ligowski computes an FFT operation (column 36, lines 53-59, one or more systems depicted in FIG. 17 are utilized to implement an API in connection with a library that enables an entity to indicate various aspects of an FFT operation, including FFT implementation properties, and cause a determination of an optimal FFT implementation based at least on said FFT implementation properties to perform said FFT operation) as shown by the process executed by the system of FIG. 17. It is inferred that the computation uses a sequence of iterations for the registers in order to account for the multiple jobs of the workloads submitted through the work descriptor.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. PGPUB 20250217438 to Zhou et al. for an FFT butterfly operation method.
U.S. PGPUB 20250111006 to Ibrahim et al. for a method of FFTs lane-aligned in register files.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW GUS YANG whose telephone number is (571)272-5514. The examiner can normally be reached M-F 9 AM - 5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang can be reached at (571)272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANDREW G YANG/Primary Examiner, Art Unit 2614
3/9/26