Prosecution Insights
Last updated: April 19, 2026
Application No. 18/301,272

DATA PROCESSING METHOD FOR A CONVOLUTIONAL NEURAL NETWORK

Non-Final OA: §103, §112
Filed
Apr 17, 2023
Examiner
BOSTWICK, SIDNEY VINCENT
Art Unit
2124
Tech Center
2100 — Computer Architecture & Software
Assignee
Montage Technology Co. Ltd.
OA Round
1 (Non-Final)
Grant Probability: 52% (Moderate)
OA Rounds: 1-2
To Grant: 4y 7m
With Interview: 90%

Examiner Intelligence

Career Allow Rate: 52% (grants 52% of resolved cases; 71 granted / 136 resolved; -2.8% vs TC avg)
Interview Lift: +38.2% (strong)
Typical Timeline: 4y 7m avg prosecution; 68 currently pending
Career History: 204 total applications across all art units

Statute-Specific Performance

§101: 24.4% (-15.6% vs TC avg)
§103: 40.9% (+0.9% vs TC avg)
§102: 12.0% (-28.0% vs TC avg)
§112: 21.9% (-18.1% vs TC avg)
Tech Center averages are estimates. Based on career data from 136 resolved cases.

Office Action

Grounds: §103, §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action

This action is in response to the claims filed 4/17/2023. Claims 1-11 are pending. Claim 1 is independent.

Specification

The disclosure is objected to because of the following informalities: the title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.

Claim Objections

Claim 1 is objected to because of the following informalities: "performing batch convolution operation to the configured input" should read "performing a batch convolution operation to the configured input". Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 5 and 8 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding claim 5, "a space occupied by continuously distributed same indexed sections" is indefinite. It is unclear what the "same indexed sections" are the same as. Similarly, whether "continuously distributed" suggests a probability distribution or a continuous mechanical process is indefinite.
Since the interpretations of "continuously distributed" are contradictory, and since "same as" is relative terminology without a basis for comparison, the scope of the claim cannot reasonably be determined. In the interest of further examination, the claim limitation is interpreted as "a space occupied by indexed sections of multiple convolution kernels", where "space occupied" is also interpreted very broadly.

Regarding claim 8, "a self-attention module of a BERT or a Transformer" is indefinite. First, "BERT" is an undefined acronym. One of ordinary skill in the art would recognize that "BERT" is a common abbreviation for Bidirectional Encoder Representations from Transformers; however, the claim appears to suggest that BERT is not a Transformer model, such that it would not be clear what BERT actually stands for. In the interest of further examination, the claim is interpreted as "a self-attention module of a Transformer".

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.
Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-7, 10, and 11 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Ma ("WeightNet: Revisiting the Design Space of Weight Networks", 2020) and Nvidia ("NVDLA Documentation", 2018).

[Image: FIG. 2 of Ma]

[Image: Interpretation of FIG. 2 of Ma]

Regarding claim 1, Ma teaches A data processing method for a convolutional neural network ([p. 1 §1] "Designing convolution weight is a key issue in convolution networks (CNNs)"),

wherein the convolutional neural network comprises a first convolutional layer and a second convolutional layer ([Abstract] "We use the WeightNet, composed entirely of (grouped) fully-connected layers, to directly output the convolutional weight" [p. 12] "Ablation study on different stages. The ✓ means the convolutions in that stage is integrated with our proposed WeightNet" Ma explicitly states that the WeightNet is integrated with the convolution in the same stage/layer. FIG. 2 shows multiple sequential Conv/WN layers. As these layers explicitly comprise convolution, they are interpreted as convolutional layers),

wherein an output tensor of the first convolutional layer is used as a weight matrix for the second convolutional layer, the data processing method comprising: (See FIG. 2; [Abstract] "We use the WeightNet, composed entirely of (grouped) fully-connected layers, to directly output the convolutional weight" [p. 4] "The convolutional weight is generated by WeightNet" Examiner notes that one of ordinary skill in the art would recognize that the FC-based WeightNet that outputs a kernel tensor is indistinguishable at the architectural level from a convolutional sublayer that outputs the same kernel tensor. Ma explicitly states that the output of the WeightNet sublayer is a convolutional weight)

setting the first convolutional layer to a batch convolution mode, and configuring a parameter of a batch convolution operation and parameters of an input tensor to be processed by the first convolutional layer ([p. 8] "Training with batch dimension. The weight generated by WeightNet has a dimension of batch size, here we briefly introduce the training method related to the batch dimension. We denote B as batch size and reshape the input X of the convolution layer to (1,B×C,h,w). Thus X has B×C channel numbers, which means we regard different samples in the same batch as different channels. Next, we reshape the generated weight W to (B,C,C,kh,kw). Then it becomes a group convolution, with a group number of B, the inputs and the outputs in each group are both equal to C." Ma explicitly configured both the operation parameter (group number B) and the input tensor parameters by reshaping the input tensor X),

wherein the configuring comprises: configuring the parameter of the batch convolution operation based on a first parameter of the weight matrix for the second convolutional layer ([p. 8] "The weight generated by WeightNet has a dimension of batch size […] we reshape the generated weight W to (B,C,C,kh,kw). Then it becomes a group convolution, with a group number of B" The parameter B is chosen/used based on parameters of the weight tensor (B,C,C,kh,kw) for the second layer),

and configuring the parameters of the input tensor based on a second parameter of the weight matrix for the second convolutional layer and a first parameter [of direct memory accesses (DMAs)] where the output tensor of the first convolutional layer is stored ([p. 8] "Training with batch dimension. The weight generated by WeightNet has a dimension of batch size, here we briefly introduce the training method related to the batch dimension. We denote B as batch size and reshape the input X of the convolution layer to (1,B×C,h,w). Thus X has B×C channel numbers, which means we regard different samples in the same batch as different channels. Next, we reshape the generated weight W to (B,C,C,kh,kw). Then it becomes a group convolution, with a group number of B, the inputs and the outputs in each group are both equal to C." h and w are interpreted as a first and second parameter of direct memory accesses (see Combination) based on kh and kw (the second and third parameter of the weight matrix for the second convolutional layer)),

performing batch convolution operation to the configured input tensor of the first convolutional layer ([p. 8] "The weight generated by WeightNet has a dimension of batch size […] we reshape the generated weight W to (B,C,C,kh,kw). Then it becomes a group convolution, with a group number of B, the inputs and the outputs in each group are both equal to C." See also FIG. 3),

and configuring output parameters of the first convolutional layer based on a third parameter of the weight matrix for the second convolutional layer and a second parameter [of the DMAs], such that a format of the output tensor of the first convolutional layer is consistent with a format of the weight matrix for the second convolutional layer ([p. 8] "Training with batch dimension. The weight generated by WeightNet has a dimension of batch size, here we briefly introduce the training method related to the batch dimension. We denote B as batch size and reshape the input X of the convolution layer to (1,B×C,h,w). Thus X has B×C channel numbers, which means we regard different samples in the same batch as different channels. Next, we reshape the generated weight W to (B,C,C,kh,kw). Then it becomes a group convolution, with a group number of B, the inputs and the outputs in each group are both equal to C." h and w are interpreted as a first and second parameter (based on kh and kw, the second and third parameter of the weight matrix for the second convolutional layer) used to ensure that the format of the output tensor of the first convolutional layer is consistent with the format of the weight matrix for the second convolutional layer (through reshaping)),

wherein each channel of the output tensor of the first convolutional layer is used as a convolution kernel of the weight matrix for the second convolutional layer ([p. 4] "We denote a convolution operation with the input feature map X ∈ R^(C×h×w), the output feature map Y ∈ R^(C×h×w), and the convolution weight W ∈ R^(C×C×kh×kw)" [p. 6] "the block is used before or after a convolution layer, α is computed right before a convolution (on the input feature X): Yc = Wc * (X·α)" This shows that each output tensor (feature map) channel is explicitly used as a kernel in W ∈ R^(C×C×kh×kw), where Yc = Wc * (X) [see p. 5]. In other words, the output tensor of the first convolutional layer has a channel dimension C; that tensor is reshaped while preserving the channel indexing C; and after the reshape, each channel-indexed slice of that tensor corresponds to one kernel Wc in the weight matrix of the next convolution, which WeightNet explicitly applies on p. 5).

However, Ma does not explicitly teach using the second and third parameter for direct memory access.
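The batch-dimension passage quoted repeatedly above from Ma (p. 8) is easy to verify numerically: convolving each sample with its own generated weights gives the same result as one group convolution over the channel-stacked batch with group number B. A minimal pure-Python sketch of that equivalence; the sizes and helper names are ours, not from either reference:

```python
# Sketch of Ma's "training with batch dimension" trick (p. 8): a per-sample
# convolution with per-sample weights W[b] equals a single group convolution
# over the channel-concatenated batch with groups = B. Illustrative sizes.
import random

def conv2d(x, w):
    """Valid cross-correlation. x: [Cin][H][W], w: [Cout][Cin][kh][kw]."""
    cin, h, wid = len(x), len(x[0]), len(x[0][0])
    cout, kh, kw = len(w), len(w[0][0]), len(w[0][0][0])
    oh, ow = h - kh + 1, wid - kw + 1
    return [[[sum(x[c][i + di][j + dj] * w[o][c][di][dj]
                  for c in range(cin) for di in range(kh) for dj in range(kw))
              for j in range(ow)] for i in range(oh)] for o in range(cout)]

def group_conv2d(x, w, groups):
    """x: [G*Cin][H][W], w: [G][Cout][Cin][kh][kw] -> [G*Cout][OH][OW]."""
    cin = len(x) // groups
    out = []
    for g in range(groups):
        out.extend(conv2d(x[g * cin:(g + 1) * cin], w[g]))
    return out

random.seed(0)
B, C, H, W, KH, KW = 2, 3, 5, 5, 3, 3
xs = [[[[random.random() for _ in range(W)] for _ in range(H)]
       for _ in range(C)] for _ in range(B)]                      # (B,C,h,w)
ws = [[[[[random.random() for _ in range(KW)] for _ in range(KH)]
        for _ in range(C)] for _ in range(C)] for _ in range(B)]  # (B,C,C,kh,kw)

per_sample = [conv2d(xs[b], ws[b]) for b in range(B)]  # sample-wise path
stacked = [ch for b in range(B) for ch in xs[b]]       # reshape to (1, B*C, h, w)
grouped = group_conv2d(stacked, ws, groups=B)          # group number = B

flat = [ch for y in per_sample for ch in y]
assert all(abs(grouped[c][i][j] - flat[c][i][j]) < 1e-9
           for c in range(B * C)
           for i in range(H - KH + 1) for j in range(W - KW + 1))
print("grouped conv matches per-sample conv")
```

The equivalence holds because each group g sees exactly the C channels of sample g and exactly the kernels W[g], so the two computations are term-for-term identical.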
Nvidia, in the same field of endeavor, teaches using the second and third parameter for direct memory access ([p. 46] "WEIGHT_SIZE_EXT (S/R/C is the width/height/channel of input weight cube" [p. 75] "Dilation is an option that enlarges the kernel in R and S dimensions with zero values. This function can be enabled by SW according as needed." [p. 44] "<CDMA|CSC>.WEIGHT_BYTES: It should be configured as: weight_size=R*S*C*BPE*K. Regardless of weight compress mode or uncompressed mode" Nvidia explicitly discloses kernel height and width parameters R and S for DMA).

Ma as well as Nvidia are directed towards convolutional neural networks. Therefore, Ma as well as Nvidia are reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Ma with the teachings of Nvidia by implementing Ma's dynamically generated convolution weights using NVDLA's weight-compression and batching mechanisms when executing WeightNet layers on hardware. Nvidia provides as additional motivation for combination ([p. 3] "NVDLA supports two memory bandwidth optimization features that can significantly help to reduce memory bandwidth requirements for CNN layer(s) that require huge data exchange, e.g. fully connected layers." [p. 133] "Weight compression. NVDLA has a mechanism to reduce memory bandwidth by sparsely storing convolution weights."). This motivation for combination also applies to the remaining claims which depend on this combination.

Regarding claim 2, the combination of Ma and Nvidia teaches The data processing method according to claim 1, wherein configuring the parameter of the batch convolution operation based on a first parameter of the weight matrix for the second convolutional layer comprises: configuring a batch number of the batch convolution operation according to a number of convolution kernel groups of the weight matrix (Ma [p. 8] "Training with batch dimension. The weight generated by WeightNet has a dimension of batch size, here we briefly introduce the training method related to the batch dimension. We denote B as batch size and reshape the input X of the convolution layer to (1,B×C,h,w). Thus X has B×C channel numbers, which means we regard different samples in the same batch as different channels. Next, we reshape the generated weight W to (B,C,C,kh,kw). Then it becomes a group convolution, with a group number of B, the inputs and the outputs in each group are both equal to C." Batch size is interpreted as synonymous with the batch number of the group convolution (convolution kernel groups) of the weight matrices W and (B,C,C,kh,kw)).

Regarding claim 3, the combination of Ma and Nvidia teaches The data processing method according to claim 1, wherein configuring the parameters of the input tensor based on a second parameter of the weight matrix for the second convolutional layer and a first parameter of DMAs comprises: configuring a width and a height of the input tensor according to a number of convolution kernels of each convolution kernel group distributed on one of the DMAs and a number of the DMAs, respectively (Nvidia [p. 39] "D_DATAIN_SIZE_0/1: The input W/H/C;" [p. 70] "the output data composes a W x H x K' output surface. Here K' refers to kernel size in a kernel group, with one kernel group being the number of kernels processed at a time" [p. 63] "Bridge DMA has two DMA interfaces, one connects to external DRAM, and the other connects to internal SRAM. Both two interfaces support read and write requests" [p. 77] "CDMA_WT is simple compared to other DMA engines, except that it can support three read streams at a time" The NVDLA programming guide explicitly configures the input feature cube size using W/H/C. Each DMA path and stream is interpreted as a number of DMAs).
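The WEIGHT_BYTES rule quoted above from the NVDLA documentation (weight_size = R*S*C*BPE*K) can be evaluated directly. A tiny sketch; the concrete register values below are illustrative, not taken from any shipped configuration:

```python
# Evaluating the NVDLA weight-size formula quoted above:
#   weight_size = R * S * C * BPE * K
# R/S = kernel height/width, C = input channels, BPE = bytes per element,
# K = number of kernels. The numbers below are made up for illustration.

def weight_bytes(R, S, C, BPE, K):
    return R * S * C * BPE * K

# e.g. 3x3 kernels, 64 input channels, 2-byte weights, 32 kernels:
size = weight_bytes(R=3, S=3, C=64, BPE=2, K=32)
assert size == 36864   # 3 * 3 * 64 * 2 * 32 bytes
print(size)
```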
Regarding claim 4, the combination of Ma and Nvidia teaches The data processing method according to claim 1, wherein configuring output parameters of the first convolutional layer based on a third parameter of the weight matrix for the second convolutional layer and a second parameter of the DMAs comprises: configuring a batch stride and a line stride of the output tensor of the first convolutional layer according to a space occupied by convolution kernels of each convolution kernel group of the weight matrix distributed on one of the DMAs and a buffer size of each of the DMAs, respectively (Nvidia [p. 25] "D_LINE_UV_STRIDE 0x5044 Line stride of input cube’s UV plane […] D_BATCH_STRIDE 0x505c The stride of input data cubes when batches > 1" [p. 59] "Multi-batch The start address of each input feature cube has to be carefully arranged to make sure their offset is a fixed number as BATCH_STRIDE." [p. 58] "WEIGHT_BANK should be allocated large enough to store one kernel group weight plus 128Bytes; For compression mode, BANK for WMB is fixed as 1, this means WMB for one kernel group should always less than 32KB-128B so that additional 128Bytes can be stored in that bank" [p. 77] "CDMA consists of three sub-modules to fetch pixel data or feature data for convolution [...] Check status of convolution buffer for enough free space").

Regarding claim 5, the combination of Ma and Nvidia teaches The data processing method according to claim 4, wherein each convolution kernel group of the weight matrix for the second convolutional layer comprises a convolution kernel subgroup stored on one of the DMAs (Nvidia [p. 115] "Bit tags of one kernel group compose a weight mask bit group, or WMB. WMBs reside in a dedicate memory surface"), and configuring output parameters of the first convolutional layer further comprises: configuring a surface stride of the output tensor of the first convolutional layer according to a surface size of the convolution kernel subgroup of the weight matrix (Nvidia [p. 29] "D_SRC_SURFACE_STRIDE 0xa024 Surface stride of input cube"), wherein the surface size represents a space occupied by continuously distributed same indexed sections of multiple convolution kernels of one convolution kernel subgroup (Nvidia [p. 112] "Scan the 1x1xC’ small cubes in a group with C’->K->W->H->C sequence. Here C’ changes fastest and C changes slowest. And map them compactly as scanning sequence. Map the weight groups compactly. Do not append any 0s between group boundaries" Because the scan order places K immediately after C’, the implementation stores, for a given section (same C’ slice and same spatial indices), the corresponding sections of multiple kernels adjacent in memory (K varies while the other indices are held)).

Regarding claim 6, the combination of Ma and Nvidia teaches The data processing method according to claim 1, wherein each convolution kernel group of the weight matrix for the second convolutional layer comprises a convolution kernel subgroup stored on one of the DMAs, and corresponding convolution kernel subgroups of a plurality of convolution kernel groups on the same DMA are stored sequentially (Nvidia [p. 65] "Convolution stride and zero padding" [p. 116] "Store WGS, WMB and compressed weight into three separated memory surfaces" [p. 77] "CDMA_WT. CDMA_WT is simple compared to other DMA engines, except that it can support three read streams at a time [...] If the input weight format is compressed, weight, WMB, and WGS are all fetched" See FIG. 40 showing sequential storage of kernel groups. WGS, WMB, and compressed weight are interpreted as kernel subgroups).
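The claim-5 reading above leans on the quoted NVDLA scan order: 1x1xC' cubes visited in C'->K->W->H->C sequence, with C' fastest. A small sketch, with made-up dimension sizes, showing that this ordering does place the same-indexed cube of successive kernels back-to-back in memory:

```python
# Sketch of the NVDLA weight scan order quoted above: 1x1xC' cubes are
# visited in C' -> K -> W -> H -> C sequence (C' fastest, C slowest), so for
# a fixed (c_slice, w, h) position, the cubes of successive kernels K are
# laid out back-to-back. Dimension sizes below are illustrative only.
from itertools import product

Cp, K, W, H, C = 4, 3, 2, 2, 2   # C' = atoms per cube; K = kernels

# Enumerate flattened addresses in the documented order:
# loops nest slowest-to-fastest as C, H, W, K, C'.
addr = {}
for n, (c, h, w, k, cp) in enumerate(product(range(C), range(H), range(W),
                                             range(K), range(Cp))):
    addr[(c, h, w, k, cp)] = n

# Same-indexed sections of consecutive kernels are contiguous: kernel k+1's
# cube starts right after kernel k's cube ends.
for c, h, w, k in product(range(C), range(H), range(W), range(K - 1)):
    assert addr[(c, h, w, k + 1, 0)] == addr[(c, h, w, k, Cp - 1)] + 1
print("cubes of successive kernels are adjacent")
```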
Regarding claim 7, the combination of Ma and Nvidia teaches The data processing method according to claim 1, wherein between the first convolutional layer and the second convolutional layer, the convolutional neural network further comprises one or more layers of pooling, batch normalization or activation (Ma [p. 4] "The symbol (>) represents the dimension reduction (global average pool) from feature space (C×H×W) to kernel space (C)" [p. 7] "For the activation vector α’s generating step, since α is a (M×C)-dimensional vector, it may be large and parameter-consuming, therefore, we use two fully-connected layers with a reduction ratio r. It has a similar process with the two methods: a global average pool, two fully-connected layer with non-linear sigmoid (σ): α = σ(Wfc2 × Wfc1 × (1/hw) Σ_{i∈h,j∈w} X_{c,i,j}), here Wfc1 ∈ R^(C/r×C), Wfc2 ∈ R^(MC×C/r), (×) denotes the matrix multiplication, r has a default setting of 16." [p. 12] "We compare the cases: one global average pooling in 1) each stage, 2) each block, and 3) each layer" See FIG. 2).

Regarding claim 10, the combination of Ma and Nvidia teaches The data processing method according to claim 1, wherein the input tensor to be processed by the first convolutional layer has three axes, wherein a length of a first one of the three axes corresponds to a number of channels of the input tensor, a length of a second one of the three axes corresponds to a width of the input tensor, and a length of a third one of the three axes corresponds to a height of the input tensor (Ma [p. 4] "We denote a convolution operation with the input feature map X ∈ R^(C×h×w)" [p. 8] "Training with batch dimension. The weight generated by WeightNet has a dimension of batch size, here we briefly introduce the training method related to the batch dimension. We denote B as batch size and reshape the input X of the convolution layer to (1,B×C,h,w). Thus X has B×C channel numbers, which means we regard different samples in the same batch as different channels. Next, we reshape the generated weight W to (B,C,C,kh,kw). Then it becomes a group convolution, with a group number of B, the inputs and the outputs in each group are both equal to C.").

Regarding claim 11, the combination of Ma and Nvidia teaches The data processing method according to claim 1, wherein the input tensor to be processed by the first convolutional layer has four axes, wherein a length of a first one of the four axes corresponds to a number of channels of the input tensor, a length of a second one of the four axes corresponds to a width of the input tensor, a length of a third one of the four axes corresponds to a height of the input tensor, and a length of a fourth one of the four axes corresponds to a batch size of the input tensor (Ma [p. 8] "reshape the input X of the convolution layer to (1,B×C,h,w)" [p. 8] "Training with batch dimension. The weight generated by WeightNet has a dimension of batch size, here we briefly introduce the training method related to the batch dimension. We denote B as batch size and reshape the input X of the convolution layer to (1,B×C,h,w). Thus X has B×C channel numbers, which means we regard different samples in the same batch as different channels. Next, we reshape the generated weight W to (B,C,C,kh,kw). Then it becomes a group convolution, with a group number of B, the inputs and the outputs in each group are both equal to C.").

Claim 8 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Ma and Nvidia and in further view of Cordonnier ("ON THE RELATIONSHIP BETWEEN SELF-ATTENTION AND CONVOLUTIONAL LAYERS", 2020).

Regarding claim 8, the combination of Ma and Nvidia teaches The data processing method according to claim 1. However, the combination of Ma and Nvidia does not explicitly teach wherein a convolution operation of the first convolutional layer and a convolution operation of the second convolutional layer are two linear mappings of a self-attention module of a BERT or a Transformer.
Cordonnier, in the same field of endeavor, teaches a convolution operation of the first convolutional layer and a convolution operation of the second convolutional layer are two linear mappings of a self-attention module of a BERT or a Transformer ([Abstract] "This work provides evidence that attention layers can perform convolution and, indeed, they often learn to do so in practice. Specifically, we prove that a multi-head self-attention layer with sufficient number of heads is at least as expressive as any convolutional layer. Our numerical experiments then show that self-attention layers attend to pixel-grid patterns similarly to CNN layers").

The combination of Ma and Nvidia as well as Cordonnier are directed towards convolutional neural networks. Therefore, the combination of Ma and Nvidia as well as Cordonnier are reasonably pertinent analogous art. Cordonnier explicitly discloses that self-attention layers can be parameterized to perform convolution, and often are, such that it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that the convolutions in Ma can be substituted with transformer self-attention layers. Cordonnier provides as additional motivation for combination ([Abstract] "recently, Bello et al. (2019) augmented CNNs by replacing some convolutional layers with self-attention layers, leading to improvements on image classification and object detection tasks. Interestingly, Ramachandran et al. (2019) noticed that, even though state-of-the art results are reached when attention and convolutional features are combined, under same computation and model size constraints, self-attention-only architectures also reach competitive image classification accuracy.").

Claim 9 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Ma and Nvidia and Naghizadeh ("Multidimensional convolution via a 1D convolution algorithm", 2009).

Regarding claim 9, the combination of Ma and Nvidia teaches The data processing method according to claim 1. However, the combination of Ma and Nvidia does not explicitly teach wherein the input tensor to be processed by the first convolutional layer has two axes, wherein a length of one of the two axes corresponds to a number of channels of the input tensor, and a length of the other of the two axes corresponds to a width of the input tensor.

Naghizadeh, in the same field of endeavor, teaches The data processing method according to claim 1, wherein the input tensor to be processed by the first convolutional layer has two axes, wherein a length of one of the two axes corresponds to a number of channels of the input tensor, and a length of the other of the two axes corresponds to a width of the input tensor ([p. 1] "Figure 2 portrays a numerical example showing all the steps required to convolve two 2D signals via 1D convolution. The above strategy can be easily extended to higher dimensions. Figure 3 shows how one can reduce a 3D convolution to a 1D convolution").

The combination of Ma and Nvidia as well as Naghizadeh are directed towards performing convolution. Therefore, the combination of Ma and Nvidia as well as Naghizadeh are reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Ma and Nvidia with the teachings of Naghizadeh by reducing the dimensions of convolution. Naghizadeh provides as additional motivation for combination ([p. 1] "To further improve the proposed algorithm, one can use sparse 1D convolution algorithms to speed up computation time (Claerbout, 1998). For instance, in Figures 1e and 3e the signal contains an excessive amount of zeros (gray cells). Therefore, a convolution sum that avoids multiplication and summation of zeros is desirable. The solution is to save the indices of the nonzero values in Figures 1e and 3e (black cells) and perform operations that only use these values").

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Brabandere ("Dynamic Weight Networks", 2017), where convolutional layer weights are dynamically generated from feature maps.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK, whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Miranda Huang, can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SIDNEY VINCENT BOSTWICK/
Examiner, Art Unit 2124

/MIRANDA M HUANG/
Supervisory Patent Examiner, Art Unit 2124
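The Naghizadeh reference cited for claim 9 above reduces multidimensional convolution to a single 1D convolution by zero-padding each row of both signals to a common width and flattening row-major. A minimal sketch of that reduction, checked against a direct 2D convolution; the helper names are ours, not the paper's:

```python
# Sketch of the 2D-to-1D convolution reduction cited for claim 9
# (Naghizadeh): pad every row of both 2D signals to width n + q - 1 with
# zeros, flatten row-major, 1D-convolve once, and read the 2D result back
# out row by row. Pure Python; sizes are illustrative.

def conv1d(a, b):
    """Full 1D convolution of two lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def conv2d_direct(A, B):
    """Full 2D convolution, used only to check the 1D route."""
    m, n, p, q = len(A), len(A[0]), len(B), len(B[0])
    C = [[0.0] * (n + q - 1) for _ in range(m + p - 1)]
    for i in range(m):
        for j in range(n):
            for u in range(p):
                for v in range(q):
                    C[i + u][j + v] += A[i][j] * B[u][v]
    return C

def conv2d_via_1d(A, B):
    m, n, p, q = len(A), len(A[0]), len(B), len(B[0])
    width = n + q - 1                      # padded row width prevents aliasing
    a = [x for row in A for x in row + [0.0] * (width - n)]
    b = [x for row in B for x in row + [0.0] * (width - q)]
    c = conv1d(a, b)
    return [c[r * width:(r + 1) * width] for r in range(m + p - 1)]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
assert conv2d_via_1d(A, B) == conv2d_direct(A, B)
print("1D route matches direct 2D convolution")
```

The row padding to width n + q - 1 guarantees that column index sums never spill into the next row of the flattened signal, which is why a single 1D convolution reproduces the 2D result exactly.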

Prosecution Timeline

Apr 17, 2023
Application Filed
Dec 20, 2025
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561604
SYSTEM AND METHOD FOR ITERATIVE DATA CLUSTERING USING MACHINE LEARNING
Granted Feb 24, 2026 (2y 5m to grant)
Patent 12547878
Highly Efficient Convolutional Neural Networks
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12536426
Smooth Continuous Piecewise Constructed Activation Functions
Granted Jan 27, 2026 (2y 5m to grant)
Patent 12518143
FEEDFORWARD GENERATIVE NEURAL NETWORKS
Granted Jan 06, 2026 (2y 5m to grant)
Patent 12505340
STASH BALANCING IN MODEL PARALLELISM
Granted Dec 23, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 52%
With Interview: 90% (+38.2%)
Median Time to Grant: 4y 7m
PTA Risk: Low
Based on 136 resolved cases by this examiner. Grant probability derived from career allow rate.
