Prosecution Insights
Last updated: April 19, 2026
Application No. 18/008,237

BINARY NEURAL NETWORKS WITH GENERALIZED ACTIVATION FUNCTIONS

Non-Final OA • §101, §103

Filed: Dec 05, 2022
Examiner: BOSTWICK, SIDNEY VINCENT
Art Unit: 2124
Tech Center: 2100 — Computer Architecture & Software
Assignee: Carnegie Mellon University
OA Round: 3 (Non-Final)

Grant Probability: 52% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 4y 7m
With Interview: 90%

Examiner Intelligence

Career Allow Rate: 52% (71 granted / 136 resolved • -2.8% vs TC avg)
Interview Lift: +38.2% (strong; resolved cases with an interview vs. without)
Avg Prosecution: 4y 7m typical timeline
Career History: 204 total applications across all art units • 68 currently pending

Statute-Specific Performance

Statute   Allow Rate   vs TC Avg
§101      24.4%        -15.6%
§103      40.9%        +0.9%
§102      12.0%        -28.0%
§112      21.9%        -18.1%

Tech Center average is an estimate • Based on career data from 136 resolved cases

Office Action

§101, §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 2/17/2026 has been entered.

Remarks

This Office Action is responsive to Applicant's Amendment filed on February 17, 2026, in which claims 11, 13-14, 16-18, and 20 are currently amended. Claims 1-10, 12, 15, and 19 are canceled. Claims 21-23 are newly added. Claims 11, 13-14, 16-18, and 20-23 are currently pending.

Response to Arguments

Applicant's arguments with respect to the rejection of claims 11, 13-14, 16-18, and 20-23 under 35 U.S.C. 101 based on amendment have been considered but are not persuasive. Examiner notes that, by directing the claims entirely to a binary neural network, the claims appear to be directed to software per se, which is non-statutory subject matter. This interpretation is explicitly supported by the instant specification ([¶0054] "the methods described herein can be implemented by a system comprising a processor and memory, storing software that, when executed by the processor, performs the functions comprising the method."). For at least these reasons it is reasonable and appropriate to maintain the rejection of claims 11, 13-14, 16-18, and 20-23 under 35 U.S.C. 101.

Applicant's arguments with respect to the rejection of claims 11, 13-14, 16-18, and 20-23 under 35 U.S.C. 103 based on amendment have been considered. With respect to Applicant's argument on pp. 8-9 of the Remarks submitted 2/17/2026 that Ngoc does not disclose that the input and output distributions are shifted along the same axis, Examiner respectfully disagrees. While the interpretation of Ngoc in the Final Office Action mailed 12/18/2025 relied upon the "bias" as the third coefficient, Examiner notes that it would be reasonable to interpret β as a third learned coefficient for shifting along the x-axis, or the compound function β(x − threshold) as the third learnable coefficient for shifting along the x-axis. See also FIG. 1, FIG. 2, and Eqn. 5 of Ngoc, which show x,y translation. It also follows, where Ngoc explicitly states that (x − threshold) shifts along the x-axis, that the compound term β(x − threshold) must also necessarily shift along the x-axis. For this reason Examiner asserts that it would be appropriate to interpret either β or the compound term β(x − threshold) as the third coefficient that shifts an output distribution along the x-axis. For at least these reasons Examiner asserts that it would be reasonable and appropriate to maintain the rejection under 35 U.S.C. 103 in view of Lin and Ngoc.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

The claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because claims 11, 13-14, 16-18, and 20-23 are directed to a binary neural network, which is interpreted as software per se, which is non-statutory subject matter. This interpretation is explicitly supported by the instant specification ([¶0054] "the methods described herein can be implemented by a system comprising a processor and memory, storing software that, when executed by the processor, performs the functions comprising the method."). Claims 11, 13-14, 16-18, and 20-23 are rejected as software per se.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 11, 16, 17, 22, and 23 are rejected under 35 U.S.C. §103 as being unpatentable over the combination of Lin ("Rotated Binary Neural Network", 2020) and Ngoc ("DPReLU: Dynamic Parametric Rectified Linear Unit", 2020).

Regarding claim 11, Lin teaches a binary neural network comprising: a backbone comprising one or more reduction blocks, each reduction block comprising: ([Abstract] "In this paper, for the first time, we explore the influence of angular bias on the quantization error and then introduce a Rotated Binary Neural Network (RBNN)"; [p. 7] "All the experiments are conducted on top of ResNet-20 with Bi-Real structure on CIFAR-10"; [p. 6] "we further devise the following training-aware approximation function to replace the sign function". ResNet interpreted as a backbone comprising reduction blocks.)

Lin further teaches one or more generalized Sign activation functions which shift a threshold between negative and positive results away from zero, the threshold being learnable for each channel ([Abstract] "we further introduce a bi-rotation formulation that learns two smaller rotation matrices"; [p. 8 §4.3] "for XNOR-Net and our RBNN. It can be seen that the weight values for XNOR-Net are mixed up tightly around zero center and the value magnitude remains far less than 1. Thus it causes large quantization error when being pushed to the binary values of -1 and +1. On the contrary, our RBNN results in two-mode distributions, each of which is centered around -1/+1. Besides, there exist few weights around the zero, which creates a clear boundary between the two distributions. Thus, by the weight rotation, our RBNN effectively reduces quantization error"; [pp. 6-7] "The derivative of the sign function is almost zero everywhere, which makes the training unstable and degrades the accuracy performance. To solve it, various gradient approximations in the literature have been proposed to enable the gradient updating [...] we further devise the following training-aware approximation function to replace the sign function". In RBNN, the effective threshold between negative and positive binary weights is the learned hyperplane where the rotated weight vector changes sign, so the rotation matrix itself parametrizes that threshold surface instead of a fixed axis-aligned zero. Lin explicitly states that they "devise an angle alignment scheme by learning a rotation matrix that rotates the full-precision weight vector to its geometrical vertex of the binary hypercube at the beginning of each training epoch", meaning these learned rotation parameters determine where weights fall relative to the sign-flip boundary (and explicitly away from zero), the learned rotation matrix having learned weights for each respective channel.)
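For readers less familiar with RBNN, the mechanism the Examiner maps onto the generalized Sign limitation can be sketched briefly. This is a minimal illustration under stated assumptions: a single learned square matrix stands in for Lin's bi-rotation pair, the orthogonality constraint and scaling factors are omitted, and the class and parameter names are ours rather than anything from Lin's code.

```python
import torch
import torch.nn as nn

class RotatedSignBinarizer(nn.Module):
    """Minimal sketch: binarize weights as sign(R @ w), where R is learned.

    In RBNN the learned rotation moves full-precision weights toward the
    vertices of the binary hypercube, so the sign-flip boundary is set by
    the rotation parameters rather than a fixed axis-aligned zero.
    """
    def __init__(self, dim: int):
        super().__init__()
        # Stand-in for Lin's bi-rotation (two smaller matrices); RBNN also
        # constrains the rotation to be orthogonal, which is omitted here.
        self.R = nn.Parameter(torch.eye(dim))

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        rotated = self.R @ w  # w: (dim, n_filters) full-precision weights
        # Straight-through estimator: sign() in the forward pass, identity
        # gradient in the backward pass, in the spirit of the gradient
        # approximations Lin discusses.
        return rotated + (torch.sign(rotated) - rotated).detach()
```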
However, Lin does not explicitly teach and one or more generalized PReLU activation functions that learn a first coefficient to shift an input distribution along an x-axis, a second coefficient used to fold the input distribution, and a third coefficient to shift an output distribution along the x-axis.

[FIG. 2 of Ngoc]

Ngoc, in the same field of endeavor, teaches one or more generalized PReLU activation functions that learn ([p. 2 §3.1] "DPReLU Rather than using a fixed ReLU activation function, our objective is to make the ReLU function more flexible and learnable. Our activation function is expressed in Equation (5)."; [p. 3] "These four parameters are all learnable and interact with each other during the training process. When α = 0, β = 1, threshold = 0, and bias = 0, this formula becomes the vanilla ReLU". See Eqn. 5 and FIG. 2. The learnable threshold is interpreted as the first coefficient to shift an input distribution.);

a first coefficient to shift an input distribution along an x-axis ([p. 3] "The (x − threshold) part makes our DPReLU to be shift on the x-axis by an interval of threshold");

a second coefficient used to fold the input distribution ([p. 2 §3.1] and [p. 3], as quoted above; α is interpreted as the second learnable coefficient used to fold the input distribution.); and

a third coefficient to shift an output distribution along the x-axis ([p. 3] "The (x − threshold) part makes our DPReLU to be shift on the x-axis by an interval of threshold"; see also FIG. 1, FIG. 2, and Eqn. 5 of Ngoc, which show x,y translation. It also follows, where Ngoc explicitly states that (x − threshold) shifts along the x-axis, that the compound term β(x − threshold) must also necessarily shift along the x-axis. For this reason Examiner asserts that it would be appropriate to interpret either β or the compound term β(x − threshold) as the third coefficient that shifts an output distribution along the x-axis.).

Lin as well as Ngoc are directed towards neural network model optimization. Therefore, Lin as well as Ngoc are reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to substitute the ReLU activation in the RBNN ResNet model used in Lin with the DPReLU activation taught by Ngoc. Ngoc explicitly replaces ReLU in ResNet models and provides as additional motivation for combination ([p. 2 §1] "the convergence speed and accuracy of the model can be improved if its formulation is learned from training rather than determined by a person. In this paper, we propose Dynamic Parametric ReLU (DPReLU), which has four trainable parameters, including alpha of PReLU and bias of FReLU, under the aforementioned assumptions"; [p. 5 §5] "the DPReLU outperformed other ReLU variants in terms of convergence speed and accuracy"). This motivation for combination also applies to the remaining claims which depend on this combination.
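Because the §103 dispute turns on which of DPReLU's four parameters supplies each claimed coefficient, a short sketch of the function may help. The branch structure of Ngoc's Eqn. (5) is reconstructed here from the quoted defaults (α = 0, β = 1, threshold = 0, bias = 0 reducing to vanilla ReLU), so treat the exact piecewise form as an assumption:

```python
import torch
import torch.nn as nn

class DPReLU(nn.Module):
    """Sketch of Ngoc's four-parameter DPReLU as described in the quotes:
    threshold shifts the input along the x-axis, alpha sets the slope of
    the negative part (the "fold"), beta scales the positive part, and
    bias shifts the output. All four are learnable."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.25))     # negative-side slope
        self.beta = nn.Parameter(torch.tensor(1.0))       # positive-side slope
        self.threshold = nn.Parameter(torch.tensor(0.0))  # input shift (x-axis)
        self.bias = nn.Parameter(torch.tensor(0.0))       # output shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shifted = x - self.threshold  # the (x - threshold) term from Ngoc
        return torch.where(shifted > 0,
                           self.beta * shifted,
                           self.alpha * shifted) + self.bias

# With alpha=0, beta=1, threshold=0, bias=0 this reduces to vanilla ReLU,
# matching the passage quoted from Ngoc p. 3.
```

Under the Examiner's reading, threshold is the first claimed coefficient, alpha the second, and beta (or the compound β(x − threshold)) the third.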
Regarding claim 16, the combination of Lin and Ngoc teaches the binary neural network of claim 11 wherein the generalized Sign function learns a coefficient to shift an input distribution to obtain an optimal distribution for taking a sign (Lin [Abstract], [p. 8 §4.3], and [pp. 6-7], as quoted above regarding claim 11. In RBNN, the effective threshold between negative and positive binary weights is the learned hyperplane where the rotated weight vector changes sign, so the rotation matrix itself parametrizes that threshold surface instead of a fixed axis-aligned zero. Lin explicitly states that they "devise an angle alignment scheme by learning a rotation matrix that rotates the full-precision weight vector to its geometrical vertex of the binary hypercube at the beginning of each training epoch", meaning these learned rotation parameters determine where weights fall relative to the sign-flip boundary (and explicitly away from zero), the learned rotation matrix having learned weights for each respective channel.).

Regarding claim 17, the combination of Lin and Ngoc teaches the binary neural network of claim 11 wherein the first, second, and third learned coefficients adjust activation distributions to obtain binary features (Lin [Abstract], [p. 8 §4.3], and [pp. 6-7], as quoted above regarding claim 11. Using DPReLU in the RBNN ResNet-20 architecture will necessarily adjust the activation distributions used to obtain binary features through the layer sign binarization function.).

Regarding claim 22, the combination of Lin and Ngoc teaches the binary neural network of claim 17 wherein the generalized PReLU activation function is specified by:

f(x_i) = x_i − γ_i + ζ_i, if x_i > γ_i
f(x_i) = β_i(x_i − γ_i) + ζ_i, if x_i ≤ γ_i

wherein x_i is an input of the function f on the ith channel; wherein γ_i and ζ_i are learnable shifts for moving the distribution; and wherein β_i is a learnable coefficient controlling a slope of a negative part of the distribution (Ngoc [p. 2 §3.1] and [p. 3], as quoted above regarding claim 11. See Eqn. 5.).

Regarding claim 23, the combination of Lin and Ngoc teaches the binary neural network of claim 11 wherein the backbone further comprises one or more normal blocks (Lin [p. 3] "the convolution can be achieved by using the efficient xnor and bitcount logics"; convolution interpreted as a normal block.).
Claims 13, 18, and 21 are rejected under 35 U.S.C. §103 as being unpatentable over the combination of Lin and Ngoc and Mishra ("A Survey on Deep Neural Network Compression: Challenges, Overview, and Solutions", 2020).

Regarding claim 13, the combination of Lin and Ngoc teaches the binary neural network of claim 11. However, the combination of Lin and Ngoc does not explicitly teach wherein a cross-entropy loss function for training is replaced with a distributional loss function defined between outputs of the binary neural network and a real-valued reference network. Mishra, in the same field of endeavor, teaches a cross-entropy loss function for training replaced with a distributional loss function defined between outputs of the binary neural network and a real-valued reference network ([p. 13] "Yun et. al. [62] distilled the predictive distribution among the same class labels for training. This distillation results in the regularization of knowledge obtained from wrong predictions, in a single DNN model. In other words, the model improves its performance using self-knowledge distillation by rapidly training and employing more precise predictions. Thus, the authors have tried to improve the performance of a build classifier in self-improving mode using dark knowledge of wrong predictions. The authors have combined two loss functions, i.e., cross-entropy and Kullback-Leibler (KL) divergence loss, using a regularization parameter λ. The combined loss function Lcomb(·) is defined as Lcomb(x, x′, y, φ, T) = LCE(x, y, φ) + λT²LKL(x, x′, φ, T), where LCE(·) and LKL(·) are the cross-entropy and KL-divergence losses". See also FIG. 7, which shows a compressed (binary) neural network trained alongside a standard model.).

The combination of Lin and Ngoc as well as Mishra are directed towards neural networks. Therefore, the combination of Lin and Ngoc as well as Mishra are reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Lin and Ngoc with the teachings of Mishra by using the combined loss function described in Mishra. Mishra provides as additional motivation for combination that the combined loss function serves ([p. 13] "to improve the performance of a build classifier in self-improving mode using dark knowledge of wrong predictions").

Regarding claim 18, the combination of Lin, Ngoc, and Mishra teaches the binary neural network of claim 13 wherein the distributional loss function is a KL divergence between the output of a real-valued reference network and the output of the binary neural network (Mishra [p. 13], as quoted above regarding claim 13; the combined loss function Lcomb includes the Kullback-Leibler divergence term λT²LKL. See also FIG. 7, which shows a compressed (binary) neural network trained alongside a standard model.).
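The Lcomb expression quoted from Mishra is concrete enough to sketch. The version below casts it in the teacher/student form that claims 13 and 18 contemplate (a binary network trained against a real-valued reference); note that Yun et al.'s original is a self-distillation setup, and the temperature and weighting values here are illustrative only:

```python
import torch
import torch.nn.functional as F

def combined_loss(student_logits: torch.Tensor,
                  teacher_logits: torch.Tensor,
                  targets: torch.Tensor,
                  T: float = 4.0, lam: float = 0.5) -> torch.Tensor:
    """Lcomb = LCE + lam * T^2 * LKL, per the passage quoted from Mishra.

    student_logits: outputs of the binary neural network
    teacher_logits: outputs of the real-valued reference network
    """
    ce = F.cross_entropy(student_logits, targets)
    # KL divergence between temperature-softened output distributions;
    # the T^2 factor keeps gradient magnitudes comparable across T.
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean")
    return ce + lam * (T ** 2) * kl
```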
Regarding claim 21, the combination of Lin and Ngoc teaches the binary neural network of claim 16. However, the combination of Lin and Ngoc does not explicitly teach wherein the generalized Sign activation function is specified by:

x_i^b = h(x_i^r) = +1, if x_i^r > α_i
x_i^b = h(x_i^r) = −1, if x_i^r ≤ α_i

wherein x_i^r is a real-valued input of the function h on the ith channel; wherein x_i^b is the binary output; and wherein α_i is a learnable coefficient controlling a threshold which varies for different channels.

Mishra, in the same field of endeavor, teaches the generalized Sign activation function so specified ([p. 11] "The authors in the paper have highlighted two binarization techniques, i.e., deterministic and stochastic. In deterministic binarization, the real-valued number r is transformed into binary value b using Sign(·) function, which is defined as [Eqn. 7]"). The combination of Lin and Ngoc as well as Mishra are directed towards neural networks. Therefore, the combination of Lin and Ngoc as well as Mishra are reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Lin and Ngoc with the teachings of Mishra by using the deterministic Sign(·) binarization described in Mishra at [p. 11].
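A per-channel learnable-threshold sign function matching the claim 21 formula is short enough to sketch. This follows the claim language rather than any cited reference's implementation, and the straight-through gradient is an added assumption needed to make α_i trainable at all:

```python
import torch
import torch.nn as nn

class GeneralizedSign(nn.Module):
    """Claim-21-style activation: binarize to +1/-1 around a learnable
    per-channel threshold alpha_i instead of a fixed zero."""
    def __init__(self, channels: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(channels))  # threshold per channel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W); shift each channel by its threshold, then take sign
        shifted = x - self.alpha.view(1, -1, 1, 1)
        binary = torch.where(shifted > 0,
                             torch.ones_like(shifted),
                             -torch.ones_like(shifted))
        # Straight-through estimator so gradients reach both x and alpha
        return shifted + (binary - shifted).detach()
```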
Claim 14 is rejected under 35 U.S.C. §103 as being unpatentable over the combination of Lin and Ngoc and Gope (US11561767B2).

Regarding claim 14, the combination of Lin and Ngoc teaches the binary neural network of claim 11. However, the combination of Lin and Ngoc does not explicitly teach wherein the backbone is MobileNet. Gope, in the same field of endeavor, teaches wherein the backbone is MobileNet ([Col. 11 ll. 24-40] "Quantizing the weights of MobileNets to binary (−1,1) or ternary (−1,0,1) values may achieve significant improvement in energy savings and possibly overall throughput especially on custom hardware, such as ASICs and FPGAs while reducing the resultant model size considerably. This is attributed to the replacement of multiplications by additions in binary-weight and ternary-weight networks"). The combination of Lin and Ngoc as well as Gope are directed towards convolutional neural networks. Therefore, the combination of Lin and Ngoc as well as Gope are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to substitute the Inception model of the teachings of the combination of Lin and Ngoc with the binary MobileNet in Gope. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that both the Inception model and the MobileNet model are convolutional neural networks which can be used for binary classification (see Cao and Gope), and while each model has advantages and disadvantages, the substitution would lead to obvious and expected results (binary classification).

Claim 20 is rejected under 35 U.S.C. §103 as being unpatentable over the combination of Lin, Ngoc, Chollet ("Xception: Deep Learning with Depthwise Separable Convolutions", 2017), and Szegedy ("Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning", 2017).

Regarding claim 20, the combination of Lin and Ngoc teaches the binary neural network of claim 11. However, the combination of Lin and Ngoc does not explicitly teach wherein: depth-wise and point-wise convolutions of the normal block are replaced with generic convolutions in parallel with an identity shortcut using average pooling; and input activations of the normal block are duplicated and concatenated with outputs when input and output channel numbers of the normal block are unequal.

Chollet, in the same field of endeavor, teaches wherein depth-wise and point-wise convolutions of the normal block are replaced with generic convolutions ([p. 1252 §1.2] "An "extreme" version of an Inception module, based on this stronger hypothesis, would first use a 1x1 convolution to map cross-channel correlations, and would then separately map the spatial correlations of every output channel. This is shown in figure 4. We remark that this extreme form of an Inception module is almost identical to a depthwise separable convolution, an operation that has been used in network design as early as 2014 [15] and has become more popular since its inclusion in the TensorFlow framework [1] in 2016"; [p. 1253 §1.2] "Having made these observations, we suggest that it may be possible to improve upon the Inception family of architectures by replacing Inception modules with depthwise separable convolutions, i.e. by building models that would be stacks of depthwise separable convolutions". It would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that the reverse process can be performed to reduce the depthwise separable convolutions in Chollet to standard convolutions to arrive at the Inception architecture.).

The combination of Lin and Ngoc as well as Chollet are directed towards ResNet model optimizations. Therefore, the combination of Lin and Ngoc as well as Chollet are reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Lin and Ngoc with the teachings of Chollet by reversing the depthwise separable convolution block in Chollet to use standard convolutions to arrive at the Inception architecture in Chollet. Chollet explicitly relies upon the teachings of Szegedy.

Szegedy, in the same field of endeavor, teaches in parallel with an identity shortcut using average pooling (see the blocks in FIG. 3, which each utilize parallel standard convolutions and an average-pooling identity shortcut) and input activations of the normal block are duplicated and concatenated with outputs when input and output channel numbers of the normal block are unequal ([pp. 4279-4280] "Each Inception block is followed by filter-expansion layer (1 × 1 convolution without activation) which is used for scaling up the dimensionality of the filter bank before the residual compensate for the dimensionality reduction induced by the Inception block". See also the filter concat in FIG. 3 and the expected output-channel change for each block in FIG. 2 (the combined Inception-A and Reduction-A block takes 384 input channels and outputs 1024 channels).).

The combination of Lin and Ngoc, and the combination of Chollet and Szegedy, are directed towards ResNet model optimizations. Therefore, the combination of Lin, Ngoc, and Chollet as well as Szegedy are reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Lin, Ngoc, and Chollet with the teachings of Szegedy by using the Inception-ResNet model as the model binarized by Lin, and converting between the Inception-ResNet model in Szegedy and the Xception model in Chollet. Chollet explicitly teaches that Xception is based on using depthwise separable convolutions in the Inception network, such that using the Inception network would lead to obvious and expected results.
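The structural substitution that claim 20 turns on, replacing a depthwise-plus-pointwise pair with a single generic convolution, is easiest to see side by side. A minimal sketch, with kernel sizes and layer shapes chosen only for illustration:

```python
import torch.nn as nn

# Depthwise separable convolution, as in Chollet's Xception: a per-channel
# spatial convolution (groups=cin) followed by a 1x1 pointwise convolution.
def depthwise_separable(cin: int, cout: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(cin, cin, kernel_size=3, padding=1, groups=cin),  # depthwise
        nn.Conv2d(cin, cout, kernel_size=1),                        # pointwise
    )

# The "generic" convolution the claim substitutes in: one dense 3x3 layer
# that mixes spatial and cross-channel correlations in a single operation.
def generic_conv(cin: int, cout: int) -> nn.Conv2d:
    return nn.Conv2d(cin, cout, kernel_size=3, padding=1)
```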
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Phan ("MoBiNet: A Mobile Binary Network for Image Classification", 2020) is directed towards a binary neural network with reduction blocks and learnable PReLU activation.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571) 272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SIDNEY VINCENT BOSTWICK/
Examiner, Art Unit 2124

Prosecution Timeline

Dec 05, 2022
Application Filed
Aug 16, 2025
Non-Final Rejection — §101, §103
Nov 18, 2025
Response Filed
Dec 10, 2025
Final Rejection — §101, §103
Feb 17, 2026
Request for Continued Examination
Feb 25, 2026
Response after Non-Final Action
Mar 11, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561604
SYSTEM AND METHOD FOR ITERATIVE DATA CLUSTERING USING MACHINE LEARNING
2y 5m to grant • Granted Feb 24, 2026

Patent 12547878
Highly Efficient Convolutional Neural Networks
2y 5m to grant • Granted Feb 10, 2026

Patent 12536426
Smooth Continuous Piecewise Constructed Activation Functions
2y 5m to grant • Granted Jan 27, 2026

Patent 12518143
FEEDFORWARD GENERATIVE NEURAL NETWORKS
2y 5m to grant • Granted Jan 06, 2026

Patent 12505340
STASH BALANCING IN MODEL PARALLELISM
2y 5m to grant • Granted Dec 23, 2025

Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 52%
With Interview: 90% (+38.2%)
Median Time to Grant: 4y 7m
PTA Risk: High

Based on 136 resolved cases by this examiner. Grant probability derived from career allow rate.
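The projection arithmetic appears to be a simple percentage-point sum of the career allow rate and the interview lift; the derivation below is our reading of how the numbers relate, not a formula stated by the tool:

```latex
\frac{71}{136} \approx 52.2\% \quad \text{(career allow rate)}
52.2\% + 38.2\ \text{pts} \approx 90.4\% \approx 90\% \quad \text{(projected with interview)}
```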
