Prosecution Insights
Last updated: April 18, 2026
Application No. 18/305,871

ON-THE-FLY PADDING FOR CNN FEATURE MAPS

Non-Final OA: §103, §112
Filed
Apr 24, 2023
Examiner
JAYAKUMAR, CHAITANYA R
Art Unit
2128
Tech Center
2100 — Computer Architecture & Software
Assignee
Texas Instruments Incorporated
OA Round
1 (Non-Final)
Grant Probability: 26% (At Risk)
Expected OA Rounds: 1-2
Time to Grant: 4y 6m
With Interview: 48%

Examiner Intelligence

Grants only 26% of cases
Career Allow Rate: 26% (13 granted / 51 resolved; -29.5% vs Tech Center average)
Interview Lift: +22.5% among resolved cases with interview
Typical Timeline: 4y 6m average prosecution; 18 applications currently pending
Career History: 69 total applications across all art units

Statute-Specific Performance

§101: 29.1% (-10.9% vs TC avg)
§103: 45.6% (+5.6% vs TC avg)
§102: 8.7% (-31.3% vs TC avg)
§112: 13.8% (-26.2% vs TC avg)
Based on career data from 51 resolved cases; comparisons are against a Tech Center average estimate.

Office Action

§103 §112
DETAILED ACTION

This action is in response to the submission filed 24 April 2023 for application 18/305,871. Claims 1-20 are currently pending and have been examined.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 2, 10, and 18 recite the limitation "… the vector …" in line 3. There is insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-6, 8-14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 20190164037 A1) in view of Udayakumaran et al. (US 9329867 B2).
Regarding claim 1: Kim teaches: A system comprising: a processing unit, wherein the processing unit comprises registers ([0063] A plurality of processor units may store the operation result in an internal register); memory operatively coupled with the processing unit; and wherein the processing unit is configured to at least ([0051] As shown in FIG. 2, according to an exemplary embodiment of the present invention, the CNN processor 200 may include a memory controller 210 connected to an external memory 201, an address generator 220, a CNN accelerator 230, a plurality of processing cores 240, other interface devices 250, and a bus 260 for connecting them): identify a padding schema for a feature map ([0006] However, by loading the input feature map of the systolic array into the on-chip memory of each systolic array row with a padding area added and if the output feature map is stored in the on-chip memory without the padding area, the output of the previous layer cannot be used as an input in the next layer that requires padding. In order to use the output feature map of the previous layer as an input feature map, the padding area must be arranged in the address to be stored in the external memory through direct memory access (DMA). Note: This entire plan corresponds to the padding schema); identify a feature vector from the feature map currently in the memory ([0006] However, by loading the input feature map of the systolic array into the on-chip memory of each systolic array row. Note: Array row corresponds to the feature vectors); determine a padding for the feature vector based on the padding schema ([0005] A systolic array (SA) is made up of many PEs (processing elements) that perform the same operation, and many operations may be performed simultaneously by inputting data to each PE.
[0006] However, by loading the input feature map of the systolic array into the on-chip memory of each systolic array row with a padding area added and if the output feature map is stored in the on-chip memory without the padding area, the output of the previous layer cannot be used as an input in the next layer that requires padding. In order to use the output feature map of the previous layer as an input feature map, the padding area must be arranged in the address to be stored in the external memory through direct memory access (DMA). In addition, when the output feature map is stored in the feature map memory in consideration of the memory space for the padding area, the calculation result of one PE row must be stored in the feature map memory of the next PE row, and there is also a drawback that memory space is wasted); and apply the padding to the feature vector ([0006] However, by loading the input feature map of the systolic array into the on-chip memory of each systolic array row with a padding area added).

However, Kim does not explicitly disclose: while the feature vector is transferred from the memory to the registers of the processing unit. Udayakumaran teaches, in an analogous system: while the feature vector is transferred from the memory to the registers of the processing unit ([Column 1, Line 58] transferring the register values to and from memory). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Kim to incorporate the teachings of Udayakumaran to use while the feature vector is transferred from the memory to the registers of the processing unit. One would have been motivated to make this modification because doing so would give the benefit of designing register allocators to reduce spilling and produce efficient code, as taught by Udayakumaran [Column 1, Lines 59-60].
Regarding claim 2: The system of Kim and Udayakumaran teaches: The system of claim 1 (as shown above). Kim further teaches: wherein the feature map comprises multiple feature vectors at different positions within the feature map, and wherein the padding schema defines the padding for the vectors based on a position of each of the multiple feature vectors within the feature map ([0089] If the input feature map is loaded into the feature map memory and there is a padding area, if the next layer is the convolution layer and padding is required, and if the convolution result may be disposed of considering the padding position, it is not necessary to transfer and reload it from the external memory, and it is possible to get very high performance because convolution is performed right away. [0090] However, in order for the input feature map of the feature map memory to include the padding area and the output feature map to be created in the feature map memory to include the padding area, the result must be stored so that the position of the center of K*K weights as shown in FIG. 7 does not change, and in the top and bottom rows, three output rows must be generated. However, since the second and third lines in the middle must produce four output lines, that is, the addresses generated by the address generator may not be used as they are propagated to the lower bank, they fall outside the systolic array condition).

Regarding claim 3: The system of Kim and Udayakumaran teaches: The system of claim 1 (as shown above). Kim further teaches: wherein the memory comprises level 2 memory (L2 memory) of the processing unit, and wherein the processing unit is further configured to iteratively transfer the feature map from external memory to the L2 memory on a per-feature vector basis ([Page 3, 0058] For one layer, the input feature map is stored in SA_H memory banks.
[Page 5, 0082] Therefore, the row of each processor unit processes a small input feature map with N number of input channels and a height of BH and a width of BW. When actually reading data for processing, actual data of BH*BW data is read by each bank in this case, so that it is possible to read by the same pattern on entire banks with the difference of one clock (or instruction processing cycle difference), and processing by the systolic array method is available. Note: Figure 2 shows the transfer from external memory. Figure 5 shows FM (feature map) being transferred iteratively to bank 0 .. bank SA_H-1 (bank 2), corresponding to iteratively transferring the feature map from external memory to the L2 memory on a per-feature vector basis, where each of the N input rectangles corresponds to a per-feature vector).

Regarding claim 4: The system of Kim and Udayakumaran teaches: The system of claim 3 (as shown above). Kim further teaches: further comprising a streaming unit configured to transfer the feature vector from the L2 memory to the registers of the processing unit ([Page 3, 0063] A plurality of processor units may store the operation result in an internal register, and transmit the stored output feature map to a processor unit. [Page 5, 0084] When loading an input feature map from an external memory via memory loading. [Page 5, 0082] Therefore, the row of each processor unit processes a small input feature map with N number of input channels and a height of BH and a width of BW. When actually reading data for processing, actual data of BH*BW data is read by each bank in this case, so that it is possible to read by the same pattern on entire banks with the difference of one clock (or instruction processing cycle difference), and processing by the systolic array method is available. Note: Also see Figure 2 showing from memory to processing core).

Regarding claim 5: The system of Kim and Udayakumaran teaches: The system of claim 4 (as shown above).
Kim further teaches: wherein, to apply the padding to the feature vector, the processing unit is configured to instruct the streaming unit to apply the padding to the feature vector while transferring the feature vector from the L2 memory to the registers of the processing unit ([Page 5, 0082] Therefore, the row of each processor unit processes a small input feature map with N number of input channels and a height of BH and a width of BW. When actually reading data for processing, actual data of BH*BW data is read by each bank in this case, so that it is possible to read by the same pattern on entire banks with the difference of one clock (or instruction processing cycle difference), and processing by the systolic array method is available. [Page 5, 0084] the input feature map is filled with zeros when transmitting to each processor unit. Note: Filled with zeros corresponds to applying the padding).

Regarding claim 6: The system of Kim and Udayakumaran teaches: The system of claim 5 (as shown above). Kim further teaches: wherein to apply the padding to the feature vector the streaming unit is configured to insert zeros into the feature vector as instructed by the processing unit, resulting in a padded feature vector ([Page 5, 0084] If the original size of the input tile is H, W, and weights of 3×3 are used, the feature map data of (H+2)*(W+2) is placed by adding padding one by one to the top, bottom, left, and right. When loading an input feature map from an external memory via memory loading, the padding is not filled, leaving just a space, and the input feature map is filled with zeros when transmitting to each processor unit).

Regarding claim 8: The system of Kim and Udayakumaran teaches: The system of claim 1 (as shown above).
Kim further teaches: wherein, to identify the padding schema, the processing unit is configured to identify a source of the feature map and select a one of a group of padding schemas that corresponds to the source of the feature map ([0006] However, by loading the input feature map of the systolic array into the on-chip memory of each systolic array row with a padding area added and if the output feature map is stored in the on-chip memory without the padding area, the output of the previous layer cannot be used as an input in the next layer that requires padding. In order to use the output feature map of the previous layer as an input feature map, the padding area must be arranged in the address to be stored in the external memory through direct memory access (DMA). In addition, when the output feature map is stored in the feature map memory in consideration of the memory space for the padding area, the calculation result of one PE row must be stored in the feature map memory of the next PE row, and there is also a drawback that memory space is wasted. Also, since the output feature map, which is the result calculated with the input feature map, is stored separately in the feature map memory, the memory is used inefficiently. [0010] The processor applies the second weight group of the second layer, which is the next layer after the first layer, to the first output feature map to generate a final output feature map, and the address generator loads the input feature map from an external memory and transmits the final output feature map to the external memory. [0011] The address generator obtains the address information of the input feature map and a plurality of input pixels contained in the input feature map, determines the second position based on the address information of the first position and the size of the first weight group among the address information of the plurality of input pixels, and transmits the second position to the processor. 
[0012] The address generator obtains address information of the plurality of adjacent pixels, and configures part of the plurality of adjacent pixels to padding based on a result of comparing the address information of the plurality of adjacent pixels and the address information of the plurality of input pixels. Note: The address information of the feature map corresponds to the source of the feature map).

Regarding claim 9: Kim teaches: A computing apparatus comprising: one or more computer-readable storage media; a processing unit comprising a memory and registers and wherein the processing unit is operatively coupled with the one or more computer-readable storage media; and program instructions stored on the one or more computer-readable storage media that, when executed by the processing unit, direct the processing unit to at least ([Page 3, 0062] Each processor unit may receive an input feature map value and an instruction to process from a processor unit [Page 3, 0063] A plurality of processor units may store the operation result in an internal register [Page 4, 0071] As shown in FIG. 4, the operation that each processor unit 434 should perform is determined by the instruction, which includes receiving the first instruction and passing it to the next processor unit. Note: Also see Figure 2, which shows memory and processing cores, etc.): The remaining limitations of claim 9 are substantially similar to claim 1, and claim 9 is therefore rejected on similar grounds as claim 1.

Regarding claim 10: Claim 10 is substantially similar to claim 2 and is therefore rejected on similar grounds as claim 2.

Regarding claim 11: Claim 11 is substantially similar to claim 3 and is therefore rejected on similar grounds as claim 3.

Regarding claim 12: Claim 12 is substantially similar to claim 4 and is therefore rejected on similar grounds as claim 4.

Regarding claim 13: Claim 13 is substantially similar to claim 5 and is therefore rejected on similar grounds as claim 5.
Regarding claim 14: Claim 14 is substantially similar to claim 6 and is therefore rejected on similar grounds as claim 6.

Regarding claim 16: Claim 16 is substantially similar to claim 8 and is therefore rejected on similar grounds as claim 8.

Regarding claim 17: Claim 17 is substantially similar to claim 1 and is therefore rejected on similar grounds as claim 1.

Regarding claim 18: Claim 18 is substantially similar to claim 2 and is therefore rejected on similar grounds as claim 2.

Regarding claim 19: The system of Kim and Udayakumaran teaches: The method of claim 17 (as shown above). Kim further teaches: further comprising iteratively transferring the feature map from external memory to the memory of the processing unit on a per-feature vector basis ([Page 4, 0074] As shown in FIG. 5, according to an exemplary embodiment of the present invention, processor units located at the far left and top receive weights, input feature maps, and instructions directly from the address generator (AG) and command generator, and the other processor units may receive input feature maps and weight values from their left and top processor units, respectively. Note: Figure 2 shows the transfer from external memory to the processing core).

Regarding claim 20: The system of Kim and Udayakumaran teaches: The method of claim 17 (as shown above). Kim further teaches: wherein to apply the padding to the feature vector, the method further comprises inserting zeros into the feature vector based on the padding schema ([Page 5, 0084] If the original size of the input tile is H, W, and weights of 3×3 are used, the feature map data of (H+2)*(W+2) is placed by adding padding one by one to the top, bottom, left, and right. When loading an input feature map from an external memory via memory loading, the padding is not filled, leaving just a space, and the input feature map is filled with zeros when transmitting to each processor unit).

Claims 7 and 15 are rejected under 35 U.S.C.
103 as being unpatentable over Kim et al. (US 20190164037 A1) in view of Udayakumaran et al. (US 9329867 B2) and further in view of Wang.

Regarding claim 7: The system of Kim and Udayakumaran teaches: The system of claim 6 (as shown above). However, the system of Kim and Udayakumaran does not explicitly disclose: wherein the processing unit is further configured to execute a deep neural network (DNN) having a convolutional neural network (CNN) layer that takes the padded feature vector as input and produces a convolved feature vector as output. Wang teaches, in an analogous system: wherein the processing unit is further configured to execute a deep neural network (DNN) having a convolutional neural network (CNN) layer that takes the padded feature vector as input and produces a convolved feature vector as output ([Page 3, 0046] Now that a feature vector can be computed and localized, dense neural patterns can be obtained by "network-convolution". Therefore, the activation of a neuron on the fifth convolutional layer may have been calculated on zero padded values. [Page 4, 0046] In order to produce the dense neural patterns map for the whole image using the fifth convolutional layer, we convolve the deep CNN model every 80 pixels in both x and y direction. Given a 640×480 image, it outputs 40×30 feature points, which involves 8×6 model convolutions). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Kim and Udayakumaran to incorporate the teachings of Wang to use wherein the processing unit is further configured to execute a deep neural network (DNN) having a convolutional neural network (CNN) layer that takes the padded feature vector as input and produces a convolved feature vector as output.
One would have been motivated to make this modification because doing so would give the benefit of convolving the deep CNN model every 80 pixels in both x and y direction, as taught by Wang [Page 4, 0046].

Regarding claim 15: Claim 15 is substantially similar to claim 7 and is therefore rejected on similar grounds as claim 7.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Nair et al. (US 11501147 B1) discloses Systems And Methods For Handling Padding Regions In Convolution Operations. The method may include maintaining, within a local memory device (LMD) included in a hardware accelerator, (1) a filter matrix corresponding to a filter location included in each of a set of filters of a convolutional layer of an artificial neural network (ANN), and (2) a set of activation vectors corresponding to an active region of an activation volume input into the convolutional layer. The method may also include determining that the active region of the activation volume is contiguous with a padding region associated with at least a portion of the activation volume. Bai et al. (US 11295205 B2) discloses Neural Processing Unit (NPU) Direct Memory Access (NDMA) Memory Bandwidth Optimization. The NPU includes an NPU direct memory access (NDMA) core. The NDMA core includes a read engine having a read buffer. The NDMA core also includes a write engine having a write buffer. The NPU also includes a controller. The controller is configured to direct the NDMA core to perform hardware memory bandwidth optimization for reading/writing NDMA data in the read buffer and/or NDMA data in the write buffer. The NDMA core is also configured to transparently combine NDMA transaction requests for a data stripe to increase local access to available tensors in artificial neural networks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHAITANYA RAMESH JAYAKUMAR, whose telephone number is (571) 272-3369. The examiner can normally be reached Mon-Fri 9am-1pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Omar Fernandez Rivas, can be reached at (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/C.R.J./ Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/ Supervisory Patent Examiner, Art Unit 2128
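For context on the technique at issue in the rejections: the claims cover inserting zeros into each feature vector as it streams from memory toward the processing unit's registers, rather than storing a pre-padded copy of the feature map, which matches the behavior Kim's [0084] describes (padding space left empty in memory, zeros filled in only on transmit). The sketch below is illustrative only; the function name, shapes, and NumPy representation are assumptions for exposition and are not drawn from the application or the cited references:

```python
import numpy as np

def stream_with_padding(feature_map, pad=1):
    """Yield padded feature vectors (rows) one at a time, inserting zeros
    on the fly instead of materializing a padded copy of the whole map."""
    h, w = feature_map.shape
    zero_row = np.zeros(w + 2 * pad, dtype=feature_map.dtype)
    for _ in range(pad):
        yield zero_row.copy()                  # top boundary rows are all zeros
    for r in range(h):
        padded = np.zeros(w + 2 * pad, dtype=feature_map.dtype)
        padded[pad:pad + w] = feature_map[r]   # zeros added on left and right
        yield padded
    for _ in range(pad):
        yield zero_row.copy()                  # bottom boundary rows are all zeros

# A 2x3 feature map streamed with pad=1 for a 3x3 kernel:
fm = np.arange(6, dtype=np.float32).reshape(2, 3)
rows = list(stream_with_padding(fm, pad=1))
```

Stacking the streamed rows reproduces what `np.pad(fm, 1)` would produce, but no padded buffer ever needs to exist in the source memory, which is the storage saving the cited Kim reference is aiming at.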

Prosecution Timeline

Apr 24, 2023
Application Filed
Apr 09, 2024
Response after Non-Final Action
Mar 23, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12293260
GENERATING AND DEPLOYING PACKAGES FOR MACHINE LEARNING AT EDGE DEVICES
2y 5m to grant Granted May 06, 2025
Patent 12147915
SYSTEMS AND METHODS FOR MODELLING PREDICTION ERRORS IN PATH-LEARNING OF AN AUTONOMOUS LEARNING AGENT
2y 5m to grant Granted Nov 19, 2024
Patent 11770571
Matrix Completion and Recommendation Provision with Deep Learning
2y 5m to grant Granted Sep 26, 2023
Patent 11769074
COLLECTING OBSERVATIONS FOR MACHINE LEARNING
2y 5m to grant Granted Sep 26, 2023
Patent 11741693
SYSTEM AND METHOD FOR SEMI-SUPERVISED CONDITIONAL GENERATIVE MODELING USING ADVERSARIAL NETWORKS
2y 5m to grant Granted Aug 29, 2023
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 26%
With Interview: 48% (+22.5%)
Median Time to Grant: 4y 6m
PTA Risk: Low
Based on 51 resolved cases by this examiner. Grant probability derived from career allow rate.
