Prosecution Insights
Last updated: April 19, 2026
Application No. 18/335,854

CONVOLUTION ACCELERATION USING ACTIVATION STATIC VECTORIZATION

Non-Final OA: §103, §112

Filed: Jun 15, 2023
Examiner: BITAR, NANCY
Art Unit: 2664
Tech Center: 2600 (Communications)
Assignee: Qualcomm Incorporated
OA Round: 1 (Non-Final)

Grant Probability: 83% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 11m
With Interview: 91%

Examiner Intelligence

Career Allow Rate: 83% (above average): 786 granted / 946 resolved, +21.1% vs TC avg
Interview Lift: +8.2% (moderate), measured over resolved cases with an interview
Typical Timeline: 2y 11m average prosecution, 32 applications currently pending
Career History: 978 total applications across all art units
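The headline numbers above are simple ratios. As a quick sanity check (assuming, as the layout suggests, that the interview lift is applied additively in percentage points):

```python
granted, resolved = 786, 946           # examiner's career record
allow_rate = 100 * granted / resolved  # career allow rate, in percent
interview_lift = 8.2                   # reported lift, percentage points

print(round(allow_rate, 1))                # 83.1, displayed as 83%
print(round(allow_rate + interview_lift))  # 91, the "With Interview" figure
```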

Statute-Specific Performance

§101: 13.3% (-26.7% vs TC avg)
§103: 62.1% (+22.1% vs TC avg)
§102: 6.4% (-33.6% vs TC avg)
§112: 8.9% (-31.1% vs TC avg)
Compared against Tech Center average estimates • Based on career data from 946 resolved cases

Office Action

Rejections: §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1 and 16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claim 1 recites “an additional second value not included in the respective first values”; it is unclear and confusing which value is excluded and what constitutes an additional second value. Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claims 1-30 are rejected under 35 U.S.C. 103 as being unpatentable over Shafiee Ardestani et al. (US 2021/0182025) in view of Muller et al. (GB 2601466).

As to claim 1, Shafiee Ardestani et al.
teaches an apparatus for processing image data, comprising: at least one memory; and at least one processor coupled to the at least one memory, the at least one processor (the processor includes a plurality of multiply-accumulate (MAC) tiles, each MAC tile including an array of processing elements (PEs) including a plurality of PE rows and a plurality of PE columns, each PE column of the plurality of PE columns including a plurality of PEs and an adder tree; paragraph [0010][0012]) configured to: obtain a respective first value (Time 510, figure 5A) enclosed by a convolution kernel in each position of a plurality of positions of the convolution kernel along a row of the image data (a simultaneous multiplication of the remaining value with a corresponding subset of element values of the first matrix; for each simultaneous multiplication, storing, by the processor, the result of the simultaneous multiplication in an accumulator connected to the processor; and outputting, by the processor, the values of the accumulator as a first row of an output feature map; paragraph [0016]); store each respective first value using a respective memory location associated with each position of the plurality of positions of the convolution kernel (The cache 529 (e.g., a Zigzag cache) broadcasts the weight value W0 at position [0,0] of the kernel 504. The multipliers in each column (e.g., a PE column of a MAC tile) may multiply the IFM values in the IFM matrix 502 positions [0,0]-[0,7] loaded in the weight registers 528 with W0, and load the results into a first set of accumulators; figures 5B and 5C and paragraph [0056-0057]); update, based on each respective first value stored using the respective memory location, an accumulated value corresponding to a convolution output for each position of the plurality of positions of the convolution kernel (accumulator register 122 of FIG.
2; The multipliers in each column (e.g., a PE column of a MAC tile) may multiply the IFM values in the IFM matrix 502 positions [0,0]-[0,7] loaded in the weight registers 528 with W0, and load the results into a first set of accumulators (e.g., accumulator register 122 of FIG. 2); figure 5C and paragraph [0051][0056]); obtain a plurality of second values (the striped square number 8 is the second value, figure 5A) enclosed by the convolution kernel in each position of the plurality of positions, wherein the plurality of second values includes a subset of the respective first values and an additional second value not included in the respective first values (paragraph [0051]); and update a memory location used to store (one read channel loads data from the position [0,8] of the first row of the IFM 502 matrix from SRAM into the weight register 528 for the eighth column of the IFM matrix 502, while the IFM values loaded in the previous cycle (e.g., the IFM values in the IFM matrix 502 positions [0,0]-[0,7]) may be shifted to the left to the adjacent columns of the weight registers 528, as shown in FIG. 5D; figure 5D and paragraph [0057]). While Shafiee Ardestani et al. teaches the limitation above, Shafiee Ardestani et al. fails to teach “a first value not included in the plurality of second values with the additional second value, wherein the updated memory location stores the additional second value”. However, Muller teaches in page 6, second paragraph: an input array of values (e.g. an array of pixel values for an image) may be convolved with one or more filters (also called kernels). This comprises determining an inner (dot) product of the filter and a subset of the input array called the receptive field. An inner product is determined for each receptive field as the filter is passed over the input array. Each inner product, so determined, represents a different value in an output array called an activation map or a feature map.
Hence, the convolution layer can be considered to be the multiplication of a vector (representing the filter) with a matrix (each row of the matrix representing a different receptive field). This operation may be performed more than once, e.g. on each individual color channel of an RGB image. In general, determining the product of a vector V and a matrix M involves determining multiple inner products, as each element in the output vector A corresponds to the inner product of the input vector V with a (different) row of the input matrix M. On a traditional micro-processor, the computation to calculate an inner product comprises a loop that multiplies and accumulates each item in turn. For instance, this may be expressed as:

def vectorXvector(V1, V2):
    sum = 0
    for i in range(len(V1)):
        sum += V1[i] * V2[i]
    return sum

Here, the "for" loop implements the element-wise multiplication and accumulation to produce the final inner product value ("sum"). Muller clearly teaches: “If the sequence is not equal to an integer multiple of the number of components in the output vector, the output vector may be rotated until the total number of rotations equals an integer multiple of the number of components” (Abstract). It would have been obvious to one skilled in the art before the filing of the claimed invention to use matrix-vector operations as taught by Muller in Shafiee Ardestani et al. in order to design an instruction and micro-architectural structure which can be used to efficiently implement matrix-vector operations. Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

As to claim 2, Shafiee Ardestani et al.
teaches the apparatus of claim 1, wherein, to update the accumulated value, the at least one processor is configured to: perform, using a compute unit (CU) (PE, figure 3) for each respective memory location, a respective multiplier-accumulator (MAC) operation using the respective first value stored using the respective memory location; and provide an output of each respective MAC operation to an accumulator buffer associated with the CU for each respective memory location (output feature map 508, figure 5A, and FIG. 5D illustrates the details of the step 512. As shown in FIG. 5D, one read channel loads data from the position [0,8] of the first row of the IFM 502 matrix from SRAM into the weight register 528 for the eighth column of the IFM matrix 502, while the IFM values loaded in the previous cycle (e.g., the IFM values in the IFM matrix 502 positions [0,0]-[0,7]) may be shifted to the left to the adjacent columns of the weight registers 528, as shown in FIG. 5D. The cache 529 broadcasts the weight value W1 at position [0,1] of the weight matrix 504. The multipliers in each column (e.g., a PE column of a MAC tile) may multiply the IFM values in the IFM matrix 502 positions [0,1]-[0,8] loaded in the weight registers 528 with W1, and the results of the multiplications may be accumulated into the first set of accumulators (e.g., accumulator register 122 of FIG. 2); paragraph [0056-0057]).

As to claim 3, Shafiee Ardestani et al. teaches the apparatus of claim 1, wherein the at least one processor is configured to determine the convolution output for each position of the plurality of positions of the convolution kernel based on a plurality of convolution cycles, each convolution cycle of the plurality of convolution cycles corresponding to a different location within the convolution kernel (FIG.
5B, at 510, the four read channels 527 (e.g., the segmented IFM delivery fabric) load data from the positions [0,0], [0,2], [0,4], and [0,6] of the first row of the IFM 502 matrix from SRAM (e.g., SRAM bank set including four SRAM blocks) into four weight registers from a plurality (e.g., eight) of weight registers 528 (e.g., weight buffer 116 of FIGS. 1-2); paragraph [0055]).

As to claim 4, Shafiee Ardestani et al. teaches the apparatus of claim 3, wherein: a first convolution cycle corresponds to the respective first value enclosed by the convolution kernel, each respective first value corresponding to a first location within the convolution kernel; and a second convolution cycle corresponds to the plurality of second values enclosed by the convolution kernel, each respective second value of the plurality of second values corresponding to a second location within the convolution kernel, the second location adjacent to the first location (FIG. 5D illustrates the details of the step 512. As shown in FIG. 5D, one read channel loads data from the position [0,8] of the first row of the IFM 502 matrix from SRAM into the weight register 528 for the eighth column of the IFM matrix 502, while the IFM values loaded in the previous cycle (e.g., the IFM values in the IFM matrix 502 positions [0,0]-[0,7]) may be shifted to the left to the adjacent columns of the weight registers 528, as shown in FIG. 5D. The cache 529 broadcasts the weight value W1 at position [0,1] of the weight matrix 504; paragraph [0056-0057]).

As to claim 5, Shafiee Ardestani et al. teaches the apparatus of claim 3, wherein the convolution output for each position of the plurality of positions of the convolution kernel along the row of the image data is associated with a different multiplier compute unit (CU) for each convolution cycle of the plurality of convolution cycles (for each weight of the convolution kernel, the elements (e.g., activations) of the IFM are loaded in the weight registers.
The loaded elements of the IFM are multiplied by the current weight of the convolution kernel, and added to the accumulator. Next, the loaded elements are shifted to be multiplied by the next weight of the kernel, and additional elements from the IFM are loaded to the weight registers to be multiplied by the next weight of the convolution kernel. As a result, multiple values of the IFM that are multiplied by the same weight of the convolution kernel, are multiplied by that same weight in parallel, thereby increasing the number of operations completed per clock cycle, and previously loaded values are reused for multiplication by the next weight, thereby reducing the amount of data that is transferred; paragraph [0051]).

As to claim 6, Shafiee Ardestani et al. teaches the apparatus of claim 5, wherein the convolution output for a first position of the plurality of positions is associated with: a first multiplier CU for a first convolution cycle of the plurality of convolution cycles, wherein the first multiplier CU is associated with a first memory location used to store the respective first value for the first position of the convolution kernel; and a second multiplier CU for a second convolution cycle of the plurality of convolution cycles, wherein the second multiplier CU is associated with a second memory location used to store a respective second value included in the subset of first values (time 510-506; figure 5A; for each weight of the convolution kernel, the elements (e.g., activations) of the IFM are loaded in the weight registers. The loaded elements of the IFM are multiplied by the current weight of the convolution kernel, and added to the accumulator. Next, the loaded elements are shifted to be multiplied by the next weight of the kernel, and additional elements from the IFM are loaded to the weight registers to be multiplied by the next weight of the convolution kernel.
As a result, multiple values of the IFM that are multiplied by the same weight of the convolution kernel, are multiplied by that same weight in parallel, thereby increasing the number of operations completed per clock cycle, and previously loaded values are reused for multiplication by the next weight, thereby reducing the amount of data that is transferred; paragraph [0051]).

As to claim 7, Shafiee Ardestani et al. teaches the apparatus of claim 6, wherein an output of the first multiplier CU is stored in a first accumulator buffer (ACC0, ACC1, figure 5B) for each convolution cycle of the plurality of convolution cycles, and wherein an output of the second multiplier CU is stored in a second accumulator buffer for each convolution cycle of the plurality of convolution cycles (The cache 529 broadcasts the weight value W1 at position [0,1] of the weight matrix 504. The multipliers in each column (e.g., a PE column of a MAC tile) may multiply the IFM values in the IFM matrix 502 positions [0,1]-[0,8] loaded in the weight registers 528 with W1, and the results of the multiplications may be accumulated into the first set of accumulators (e.g., accumulator register 122 of FIG. 2); see paragraph [0057]).

As to claim 8, Muller teaches the apparatus of claim 7, wherein: an accumulated value stored in the first accumulator buffer at an end of each convolution cycle replaces an accumulated value stored in the second accumulator buffer at the end of each convolution cycle; and the accumulated value stored in the second accumulator buffer at the end of each convolution cycle replaces an accumulated value stored in a third accumulator buffer at the end of each convolution cycle (Hence, this step is a rotation operation (also called a circular shift) on the plurality of elements of the output register in which the result value is added to a component of the output vector which moves between the first end element and the second end element.
In other examples the operation performed on the result value and the component of the output vector which moves between the first and second end element is another mathematical or logical operation, e.g. a multiplication, XOR, subtraction, etc.; page 14, second paragraph).

As to claim 9, Muller teaches the apparatus of claim 7, wherein during each convolution cycle of the plurality of convolution cycles, each accumulator buffer of a plurality of accumulator buffers corresponding to a plurality of multiplier CUs receives: a first input from a corresponding memory location, the first input indicative of a pixel value within the row of image data; and a second input from an adjacent accumulator buffer of the plurality of accumulator buffers, the second input indicative of an accumulated value stored in the adjacent accumulator buffer (Hence, this step is a rotation operation (also called a circular shift) on the plurality of elements of the output register in which the result value is added to a component of the output vector which moves between the first end element and the second end element. In other examples the operation performed on the result value and the component of the output vector which moves between the first and second end element is another mathematical or logical operation, e.g. a multiplication, XOR, subtraction, etc. As the values stored to the elements in the output register rA are added to (one at a time, as the output register rA rotates), the elements may be referred to as accumulators. The output register rA itself may be referred to as an accumulator for similar reasons; page 14, second paragraph).

As to claim 10, Muller teaches the apparatus of claim 9, wherein each accumulator buffer receives the first input at a beginning of each convolution cycle and receives the second input at an end of each convolution cycle (The first input may be a register. The second input may be from memory. The output may be a register.
The registers may be specified in the instruction or may be implicit. A sequence of instructions may be executed using the same first input register and output register and with different second input vectors from memory. If the sequence is not equal to an integer multiple of the number of components in the output vector, the output vector may be rotated until the total number of rotations equals an integer multiple of the number of components; Abstract).

As to claim 11, Shafiee Ardestani et al. teaches the apparatus of claim 6, wherein the at least one processor is configured to: update an accumulated value corresponding to the convolution output for the first position of the plurality of positions of the convolution kernel using a first accumulator buffer at an end of each convolution cycle of the plurality of convolution cycles (The multipliers in each column (e.g., a PE column of a MAC tile) may multiply the IFM values in the IFM matrix 502 positions [0,0]-[0,7] loaded in the weight registers 528 with W0, and load the results into a first set of accumulators (e.g., accumulator register 122 of FIG. 2); paragraph [0056]).

As to claim 12, Shafiee Ardestani et al. teaches the apparatus of claim 11, wherein the at least one processor is configured to: update the first accumulator buffer based on an output of the first multiplier CU for the first convolution cycle of the plurality of convolution cycles; and update the first accumulator buffer based on an output of the second multiplier CU for the second convolution cycle of the plurality of convolution cycles (The multipliers in each column (e.g., a PE column of a MAC tile) may multiply the IFM values in the IFM 502 matrix position [3,1]-[3,8] loaded in the weight registers with W7, and the results of the multiplications may be accumulated into the first set of accumulators (e.g., accumulator register 122 of FIG. 2).
At this stage, the processor has completed the next set of calculations for the convolution to determine the OFM 508. Data from the position [3,0]-[3,9] of the IFM matrix 502 will be used in the next calculations and may be saved as shown in FIG. 5V. Data from the positions [2,2]-[2,9] may not be used in the next calculations and therefore will be replaced by the data in the IFM 502 data position [3,2]-[3,9]; paragraph [0075-0076]).

As to claim 13, Shafiee Ardestani et al. teaches the apparatus of claim 11, wherein the at least one processor is configured to: update the accumulated value of the first accumulator buffer based on a respective output of a different multiplier CU for each convolution cycle of the plurality of convolution cycles (The multipliers in each column (e.g., a PE column of a MAC tile) may multiply the IFM values in the IFM 502 matrix position [3,2]-[3,9] loaded in the weight registers with W8, and the results of the multiplications may be accumulated into a second set of accumulators (e.g., accumulator register 122 of FIG. 2), and the second set of accumulators may be written back into SRAM banks (e.g., SRAM bank sets) during the next cycle. In some embodiments, the first set of accumulators may be used to accumulate the multiplications results starting the next cycle; paragraph [0076]).

As to claim 14, Shafiee Ardestani et al. teaches the apparatus of claim 13, wherein the at least one processor is configured to: update the accumulated value of the first accumulator buffer based on a respective output of a different multiplier CU for each location of a plurality of locations within the first position of the convolution kernel along the row of the image data (The results from 532 to 548 may be added to determine the second row 530 of the OFM 508. While the computation of the second row 530 is described above in reference to FIGS.
5N-5W, one of skill in the art would understand that the remaining rows of the OFM 508 can be computed based on the data in the remaining rows of the IFM 502 using substantially similar operations. Similarly, portions of the OFM 508 corresponding to the remaining eight-column-wide sections (e.g., the second eight-column-wide section 404) of the IFM 502 can be computed using similar operations, starting with computing the first row of the OFM 508 as described above in reference to FIGS. 5A-5M, and computing the remaining rows of the OFM 508 as described above in reference to FIGS. 5N-5W; paragraph [0076-0077]).

As to claim 15, Shafiee Ardestani et al. teaches the apparatus of claim 14, wherein each convolution cycle of the plurality of convolution cycles corresponds to a respective location of the plurality of locations within the first position of the convolution kernel along the row of the image data (time step 510, figure 5A).

The limitations of claims 16-30 have been addressed with respect to claims 1-15 above.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANCY BITAR, whose telephone number is (571) 270-1041. The examiner can normally be reached Monday-Friday from 8:00 am to 5:00 pm. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Ms. Jennifer Mehmood, can be reached at 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NANCY BITAR/
Primary Examiner, Art Unit 2664

Prosecution Timeline

Jun 15, 2023
Application Filed
Aug 27, 2024
Response after Non-Final Action
Mar 20, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12599437
PRE-PROCEDURE PLANNING, INTRA-PROCEDURE GUIDANCE FOR BIOPSY, AND ABLATION OF TUMORS WITH AND WITHOUT CONE-BEAM COMPUTED TOMOGRAPHY OR FLUOROSCOPIC IMAGING
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12597132
IMAGE PROCESSING METHOD AND APPARATUS
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12597240
METHOD AND SYSTEM FOR AUTOMATED CENTRAL VEIN SIGN ASSESSMENT
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12597189
METHODS AND APPARATUS FOR SYNTHETIC COMPUTED TOMOGRAPHY IMAGE GENERATION
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12591982
MOTION DETECTION ASSOCIATED WITH A BODY PART
Granted Mar 31, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 83%
With Interview: 91% (+8.2%)
Median Time to Grant: 2y 11m
PTA Risk: Low
Based on 946 resolved cases by this examiner. Grant probability derived from career allow rate.
