DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-8 are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter, namely software per se. Claim 1 recites a machine learning system comprising a plurality of layers, and the claims do not recite any computer hardware. The machine learning system and the plurality of layers are therefore interpreted as software components; thus, the claims are directed to software per se.
Claim Interpretation
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation is: “A training system configured to train a machine learning system including a plurality of layers, the training system configured to: provide an output signal based on an input signal by forwarding the input signal through the plurality of layers of the machine learning system, at least one of the layers of the plurality of layers being configured to receive a layer input, which is based on the input signal, and provide a layer output based on which the output signal is determined, the layer being configured to determine the layer output using a non-linear normalization of the layer input” in claim 11.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2 and 8-12 are rejected under 35 U.S.C. 103 as being unpatentable over Chun-Fu Chen et al. (US 2019/0122113 A1, hereinafter Chen) in view of Jiu-Che Lin et al. (US 2023/0342016 A1, hereinafter Lin).
In regard to claim 1:
Chen discloses:
- A computer-implemented machine learning system, the machine learning system comprising: a plurality of layers, the machine learning system being configured to provide an output signal based on an input signal by forwarding the input signal through the plurality of layers of the machine learning system, wherein at least one of the layers of the plurality of layers is configured to receive a layer input, which is based on the input signal, and to provide a layer output based on which the output signal is determined,
In [0021]:
FIG. 2 illustrates a convolutional neural network, according to one embodiment described herein.
As shown, the CNN 200 includes an input layer 210, a convolutional layer 215, a subsampling layer 220, a convolutional layer 225, subsampling layer 230, fully connected layers 235 and 240, and an output layer 245. The input layer 210 in the depicted embodiment is configured to accept a 32×32 pixel image. The convolutional layer 215 generates 6 28×28 feature maps from the input layer, and so on. While a particular CNN 200 is depicted, more generally a CNN is composed of one or more convolutional layers, frequently with a subsampling step, and then one or more fully connected layers.
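The layer sizes Chen recites in [0021] (a 32×32 input, 6 feature maps of 28×28, and so on) follow from standard convolution and subsampling arithmetic. A minimal sketch, assuming 5×5 kernels with stride 1 and 2×2 non-overlapping subsampling (a classic LeNet-style topology; the kernel and pooling sizes are assumptions, not stated in the quoted passage):

```python
# Illustrative sketch only (not from Chen): layer-by-layer spatial sizes of
# a CNN like the CNN 200 of Chen's FIG. 2, assuming 5x5 "valid" convolutions
# with stride 1 and 2x2 non-overlapping subsampling.

def conv_out(size, kernel=5, stride=1):
    """Spatial size after a 'valid' convolution."""
    return (size - kernel) // stride + 1

def subsample_out(size, factor=2):
    """Spatial size after non-overlapping subsampling (pooling)."""
    return size // factor

size = 32                   # input layer 210: 32x32 pixel image
size = conv_out(size)       # convolutional layer 215: 6 feature maps
assert size == 28           # matches the 28x28 feature maps Chen recites
size = subsample_out(size)  # subsampling layer 220: 14x14
size = conv_out(size)       # convolutional layer 225: 10x10
size = subsample_out(size)  # subsampling layer 230: 5x5
print(size)
```

The 28×28 figure matches Chen's stated feature-map size, which is consistent with the assumed 5×5 kernel.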
In [0037]:
Generally, performing ISBP from 3-way tensor to 3-way tensor tends to be more complicated than the above two cases, as the operations of the forward propagation between the input and output
In [0027]:
Accordingly, DCNN optimization component 140 can perform Importance Score Back Propagation (ISBP) for optimizing a CNN.
In [0041]:
If BP_conv^fn(i, j) ≠ 1, this can indicate that the i-th position in the output layer comes from a convolution operation involving the j-th position in the input layer.
(BRI: the layer output based on which the final output signal of a neural network is determined is the output layer, which is the final, topmost layer in the network architecture)
In [0023]:
As discussed above, typical CNNs consist of convolutional (Conv) layer, pooling layer, non-linear layer (e.g. ReLU), normalization layer (e.g. local response normalization (LRN)) and fully connected (FC) layer, etc. The convolutional layer generally includes a set of trainable kernels, which extract local features from a small spatial region but cross-depth volume of the input tensor. Each kernel can be trained as a feature extractor for some specific visual features, such as an edge or a color in the first layer
(BRI: normalization layers provide an output that is a non-linear transformation of their input)
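Local response normalization (LRN), the normalization layer Chen names in [0023], is itself a non-linear function of its input. A minimal one-dimensional sketch (illustrative only; the parameter values k, alpha, beta and the window size are common defaults assumed here, not taken from Chen):

```python
import numpy as np

# Illustrative sketch (not from Chen): local response normalization (LRN)
# over a 1-D vector of channel activations. Each value is divided by a power
# of the local sum of squares, so the output is a non-linear function of the
# input. Parameters k, alpha, beta, n are assumed common defaults.

def lrn(a, k=2.0, alpha=1e-4, beta=0.75, n=5):
    """Normalize each value by (k + alpha * local sum of squares)**beta."""
    a = np.asarray(a, dtype=float)
    out = np.empty_like(a)
    for i in range(len(a)):
        lo, hi = max(0, i - n // 2), min(len(a), i + n // 2 + 1)
        out[i] = a[i] / (k + alpha * np.sum(a[lo:hi] ** 2)) ** beta
    return out

x = np.array([1.0, 2.0, 3.0, 4.0])
# The mapping is non-linear: scaling the input by 2 does not scale the
# output by 2, because the divisor itself depends on the input.
assert not np.allclose(lrn(2 * x), 2 * lrn(x))
```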
Chen does not explicitly disclose:
- wherein the layer is configured to determine the layer output using a non-linear normalization of the layer input.
However, Lin discloses:
- wherein the layer is configured to determine the layer output using a non-linear normalization of the layer input.
In [0055]:
The machine-learning model processes the input to generate an output. An artificial neural network includes an input layer that consists of values in a data point. The next layer is called a hidden layer, and nodes at the hidden layer each receive one or more of the input values. Each node contains parameters (e.g., weights) to apply to the input values. Each node therefore essentially inputs the input values into a multivariate function (e.g., a non-linear mathematical transformation) to produce an output value. A next layer can be another hidden layer or an output layer. In either case, the nodes at the next layer receive the output values from the nodes at the previous layer, and each node applies weights to those values and then generates its own output value.
(BRI: the art is analogous to Chen, as both systems use machine learning to process input data, recognize patterns, and make autonomous, optimized decisions (outputs), and it solves a reasonably pertinent problem: an ML system used for, say, quality control in manufacturing (control variables as “knobs”) is highly pertinent to an AI system in a non-manufacturing sector, such as automated medical imaging or text analysis. Each node (or neuron) in a hidden or output layer contains learnable parameters, specifically weights and a bias, that are applied to input values to produce an output.)
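The per-node computation Lin describes in [0055] (weights applied to the input values, then a multivariate non-linear function) can be sketched as follows; the particular weights, bias, and the choice of tanh as the non-linearity are hypothetical illustrations, not taken from Lin:

```python
import math

# Illustrative sketch (not from Lin): one hidden-layer node as described in
# Lin's [0055]. The node applies weights and a bias to its input values and
# passes the result through a non-linear transformation (tanh, assumed here).

def node(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a non-linearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(z)  # a multivariate, non-linear function of the inputs

# Hypothetical input values, weights, and bias for illustration.
y = node([0.5, -1.0, 2.0], weights=[0.1, 0.4, -0.2], bias=0.05)
# A node at the next layer would receive y as one of its input values.
```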
The examiner interprets the theme of the invention as training a machine learning system using “normalized layers,” specifically providing a non-linear transformation to allow for increased performance of the machine learning system.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Chen and Lin.
Chen teaches a machine learning system that receives layer inputs and determines the layer output, and also teaches non-linear normalization of the layer input within the context of a CNN consisting of a “normalization layer”.
Lin teaches a non-linear normalization of the layer input and determining the layer output using sorting and smoothing within the context of a probability distribution.
One of ordinary skill would have been motivated to combine Chen and Lin to provide optimized model parameters and an improvement to the model in terms of its accuracy (Lin [0059]).
In regard to claim 2:
Chen does not explicitly disclose:
- wherein for determining the layer output, the layer is configured to normalize at least one group of values of the layer input, wherein the group includes all values of the layer input or a subset of the values of the layer input.
However, Lin discloses:
- wherein for determining the layer output, the layer is configured to normalize at least one group of values of the layer input, wherein the group includes all values of the layer input or a subset of the values of the layer input.
In [0088]:
At operation 412, processing logic performs one or more preprocessing operations on the input data. In some embodiments, the preprocessing operations can include a smoothing operation, a normalization operation, a dimensions reduction operations, a sort features operation, or any other operation configured to prepare data for training a machine-learning model.
In [0055]:
The machine-learning model processes the input to generate an output. An artificial neural network includes an input layer that consists of values in a data point. The next layer is called a hidden layer, and nodes at the hidden layer each receive one or more of the input values. Each node contains parameters (e.g., weights) to apply to the input values. Each node therefore essentially inputs the input values into a multivariate function (e.g., a non-linear mathematical transformation) to produce an output value
(BRI: the input layer acts as a container for individual data point features, and each node in the hidden layer receives all or a weighted subset of these input values, effectively aggregating them to learn complex patterns)
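Claim 2's "group of values" limitation, as mapped to Lin's normalization preprocessing, can be illustrated with a short sketch. Softmax is used here as one example of a non-linear normalization applied to either the whole layer input or a subset of it; the choice of softmax and the group boundaries are assumptions for illustration, not taken from Lin:

```python
import numpy as np

# Illustrative sketch (not from either reference): normalizing a "group" of
# values of a layer input, where the group is either all values of the layer
# input or a subset of them. Softmax is one example of a non-linear
# normalization; the group boundaries below are hypothetical.

def softmax(v):
    """Non-linear normalization: outputs are positive and sum to 1."""
    e = np.exp(v - np.max(v))  # subtract the max for numerical stability
    return e / e.sum()

layer_input = np.array([1.0, 2.0, 3.0, 4.0])

whole = softmax(layer_input)   # group = all values of the layer input
halves = np.concatenate(       # groups = two subsets, normalized separately
    [softmax(layer_input[:2]), softmax(layer_input[2:])]
)
assert np.isclose(whole.sum(), 1.0)
assert np.isclose(halves[:2].sum(), 1.0) and np.isclose(halves[2:].sum(), 1.0)
```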
The examiner interprets the theme of the invention as training a machine learning system using “normalized layers,” specifically providing a non-linear transformation to allow for increased performance of the machine learning system.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Chen and Lin.
Chen teaches a machine learning system that receives layer inputs and determines the layer output using a non-linear normalization of the layer input.
Lin teaches determining the layer output using sorting and smoothing within the context of a probability distribution.
One of ordinary skill would have been motivated to combine Chen and Lin to provide optimized model parameters and an improvement to the model in terms of its accuracy (Lin [0059]).
In regard to claim 8:
Chen does not explicitly disclose:
- wherein the input signal characterizes a signal obtained from a sensor.
However, Lin discloses:
- wherein the input signal characterizes a signal obtained from a sensor.
In [0061]:
running trained machine-learning model 190 on the current sensor data input to obtain one or more outputs.
In [0057]:
After one or more rounds of training, processing logic can determine whether a stopping criterion has been met. A stopping criterion can be a target level of accuracy, a target number of processed images from the training dataset
(BRI: the art is considered analogous in the context of a “sensor” since it addresses the same problem of using sensor data obtained as an input signal, in which the input signals are images)
The examiner interprets the theme of the invention as training a machine learning system using “normalized layers,” specifically providing a non-linear transformation to allow for increased performance of the machine learning system.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Chen and Lin.
Chen teaches a machine learning system that receives layer inputs and determines the layer output using a non-linear normalization of the layer input.
Lin teaches determining the layer output using sorting and smoothing within the context of a probability distribution.
One of ordinary skill would have been motivated to combine Chen and Lin to provide optimized model parameters and an improvement to the model in terms of its accuracy (Lin [0059]).
In regard to claim 9:
Chen discloses:
- A computer-implemented method for training a machine learning system, the machine learning system including a plurality of layers, the method comprising the following:
providing an output signal based on an input signal by forwarding the input signal through the plurality of layers of the machine learning system, at least one of the layers of the plurality of layers receiving a layer input, which is based on the input signal, and providing a layer output based on which the output signal is determined,
In [0021]:
FIG. 2 illustrates a convolutional neural network, according to one embodiment described herein.
As shown, the CNN 200 includes an input layer 210, a convolutional layer 215, a subsampling layer 220, a convolutional layer 225, subsampling layer 230, fully connected layers 235 and 240, and an output layer 245. The input layer 210 in the depicted embodiment is configured to accept a 32×32 pixel image. The convolutional layer 215 generates 6 28×28 feature maps from the input layer, and so on. While a particular CNN 200 is depicted, more generally a CNN is composed of one or more convolutional layers, frequently with a subsampling step, and then one or more fully connected layers.
In [0037]:
Generally, performing ISBP from 3-way tensor to 3-way tensor tends to be more complicated than the above two cases, as the operations of the forward propagation between the input and output
In [0027]:
Accordingly, DCNN optimization component 140 can perform Importance Score Back Propagation (ISBP) for optimizing a CNN.
In [0041]:
If BP_conv^fn(i, j) ≠ 1, this can indicate that the i-th position in the output layer comes from a convolution operation involving the j-th position in the input layer.
(BRI: the layer output based on which the final output signal of a neural network is determined is the output layer, which is the final, topmost layer in the network architecture)
In [0023]:
As discussed above, typical CNNs consist of convolutional (Conv) layer, pooling layer, non-linear layer (e.g. ReLU), normalization layer (e.g. local response normalization (LRN)) and fully connected (FC) layer, etc. The convolutional layer generally includes a set of trainable kernels, which extract local features from a small spatial region but cross-depth volume of the input tensor. Each kernel can be trained as a feature extractor for some specific visual features, such as an edge or a color in the first layer
(BRI: normalization layers provide an output that is a non-linear transformation of their input)
In [0020]:
The deep convolutional neural network (DCNN) optimization component 140 is generally configured to optimize the structure of the trained DCNN model 138. In order to achieve a balance between the predictive power and model redundancy of CNNs, the DCNN optimization component 140 can learn the importance of convolutional kernels and neurons in FC layers from feature selection perspective. The DCNN optimization component 140 can optimize a CNN by pruning less important kernels and neurons based on their importance scores. The DCNN optimization component 140 can further fine-tune the remaining kernels and neurons to achieve a minimum loss of accuracy in the optimized DCNN.
Chen does not explicitly disclose:
- the layer determining the layer output using a non-linear normalization of the layer input.
However, Lin discloses:
- wherein the layer is configured to determine the layer output using a non-linear normalization of the layer input.
In [0055]:
The machine-learning model processes the input to generate an output. An artificial neural network includes an input layer that consists of values in a data point. The next layer is called a hidden layer, and nodes at the hidden layer each receive one or more of the input values. Each node contains parameters (e.g., weights) to apply to the input values. Each node therefore essentially inputs the input values into a multivariate function (e.g., a non-linear mathematical transformation) to produce an output value. A next layer can be another hidden layer or an output layer. In either case, the nodes at the next layer receive the output values from the nodes at the previous layer, and each node applies weights to those values and then generates its own output value.
(BRI: each node (or neuron) in a hidden or output layer contains learnable parameters—specifically weights and a bias—that are applied to input values to produce an output)
The examiner interprets the theme of the invention as training a machine learning system using “normalized layers,” specifically providing a non-linear transformation to allow for increased performance of the machine learning system.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Chen and Lin.
Chen teaches a machine learning system that receives layer inputs and determines the layer output, and also teaches non-linear normalization of the layer input within the context of a CNN consisting of a “normalization layer”.
Lin teaches a non-linear normalization of the layer input and determining the layer output using sorting and smoothing within the context of a probability distribution.
One of ordinary skill would have been motivated to combine Chen and Lin to provide optimized model parameters and an improvement to the model in terms of its accuracy (Lin [0059]).
In regard to claim 10:
Chen discloses:
- A computer-implemented method for determining an output signal based on an input signal, the method comprising: determining the output signal by providing the input signal to a machine learning system, the machine learning system including a plurality of layers, the machine learning system providing the output signal based on an input signal by forwarding the input signal through the plurality of layers of the machine learning system, at least one of the layers of the plurality of layers receiving a layer input, which is based on the input signal, and providing a layer output based on which the output signal is determined,
In [0021]:
FIG. 2 illustrates a convolutional neural network, according to one embodiment described herein.
As shown, the CNN 200 includes an input layer 210, a convolutional layer 215, a subsampling layer 220, a convolutional layer 225, subsampling layer 230, fully connected layers 235 and 240, and an output layer 245. The input layer 210 in the depicted embodiment is configured to accept a 32×32 pixel image. The convolutional layer 215 generates 6 28×28 feature maps from the input layer, and so on. While a particular CNN 200 is depicted, more generally a CNN is composed of one or more convolutional layers, frequently with a subsampling step, and then one or more fully connected layers.
In [0037]:
Generally, performing ISBP from 3-way tensor to 3-way tensor tends to be more complicated than the above two cases, as the operations of the forward propagation between the input and output
In [0027]:
Accordingly, DCNN optimization component 140 can perform Importance Score Back Propagation (ISBP) for optimizing a CNN.
In [0041]:
If BP_conv^fn(i, j) ≠ 1, this can indicate that the i-th position in the output layer comes from a convolution operation involving the j-th position in the input layer.
(BRI: the layer output based on which the final output signal of a neural network is determined is the output layer, which is the final, topmost layer in the network architecture)
In [0023]:
As discussed above, typical CNNs consist of convolutional (Conv) layer, pooling layer, non-linear layer (e.g. ReLU), normalization layer (e.g. local response normalization (LRN)) and fully connected (FC) layer, etc. The convolutional layer generally includes a set of trainable kernels, which extract local features from a small spatial region but cross-depth volume of the input tensor. Each kernel can be trained as a feature extractor for some specific visual features, such as an edge or a color in the first layer
(BRI: normalization layers provide an output that is a non-linear transformation of their input)
In [0020]:
The deep convolutional neural network (DCNN) optimization component 140 is generally configured to optimize the structure of the trained DCNN model 138. In order to achieve a balance between the predictive power and model redundancy of CNNs, the DCNN optimization component 140 can learn the importance of convolutional kernels and neurons in FC layers from feature selection perspective. The DCNN optimization component 140 can optimize a CNN by pruning less important kernels and neurons based on their importance scores. The DCNN optimization component 140 can further fine-tune the remaining kernels and neurons to achieve a minimum loss of accuracy in the optimized DCNN.
Chen does not explicitly disclose:
- the layer determining the layer output using a non-linear normalization of the layer input.
However, Lin discloses:
- wherein the layer is configured to determine the layer output using a non-linear normalization of the layer input.
In [0055]:
The machine-learning model processes the input to generate an output. An artificial neural network includes an input layer that consists of values in a data point. The next layer is called a hidden layer, and nodes at the hidden layer each receive one or more of the input values. Each node contains parameters (e.g., weights) to apply to the input values. Each node therefore essentially inputs the input values into a multivariate function (e.g., a non-linear mathematical transformation) to produce an output value. A next layer can be another hidden layer or an output layer. In either case, the nodes at the next layer receive the output values from the nodes at the previous layer, and each node applies weights to those values and then generates its own output value.
(BRI: using the nonlinear transformation provides the direct method for determining an output signal based on the input signal)
The examiner interprets the theme of the invention as training a machine learning system using “normalized layers,” specifically providing a non-linear transformation to allow for increased performance of the machine learning system.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Chen and Lin.
Chen teaches a machine learning system that receives layer inputs and determines the layer output, and also teaches non-linear normalization of the layer input within the context of a CNN consisting of a “normalization layer”.
Lin teaches a non-linear normalization of the layer input and determining the layer output using sorting and smoothing within the context of a probability distribution.
One of ordinary skill would have been motivated to combine Chen and Lin to provide optimized model parameters and an improvement to the model in terms of its accuracy (Lin [0059]).
In regard to claim 11:
Chen discloses:
- A training system configured to train a machine learning system including a plurality of layers, the training system configured to: provide an output signal based on an input signal by forwarding the input signal through the plurality of layers of the machine learning system, at least one of the layers of the plurality of layers being configured to receive a layer input, which is based on the input signal, and provide a layer output based on which the output signal is determined, the layer being configured to determine the layer output using a non-linear normalization of the layer input.
In [0021]:
FIG. 2 illustrates a convolutional neural network, according to one embodiment described herein.
As shown, the CNN 200 includes an input layer 210, a convolutional layer 215, a subsampling layer 220, a convolutional layer 225, subsampling layer 230, fully connected layers 235 and 240, and an output layer 245. The input layer 210 in the depicted embodiment is configured to accept a 32×32 pixel image. The convolutional layer 215 generates 6 28×28 feature maps from the input layer, and so on. While a particular CNN 200 is depicted, more generally a CNN is composed of one or more convolutional layers, frequently with a subsampling step, and then one or more fully connected layers.
In [0037]:
Generally, performing ISBP from 3-way tensor to 3-way tensor tends to be more complicated than the above two cases, as the operations of the forward propagation between the input and output
In [0027]:
Accordingly, DCNN optimization component 140 can perform Importance Score Back Propagation (ISBP) for optimizing a CNN.
In [0041]:
If BP_conv^fn(i, j) ≠ 1, this can indicate that the i-th position in the output layer comes from a convolution operation involving the j-th position in the input layer.
(BRI: the layer output based on which the final output signal of a neural network is determined is the output layer, which is the final, topmost layer in the network architecture)
In [0023]:
As discussed above, typical CNNs consist of convolutional (Conv) layer, pooling layer, non-linear layer (e.g. ReLU), normalization layer (e.g. local response normalization (LRN)) and fully connected (FC) layer, etc. The convolutional layer generally includes a set of trainable kernels, which extract local features from a small spatial region but cross-depth volume of the input tensor. Each kernel can be trained as a feature extractor for some specific visual features, such as an edge or a color in the first layer
(BRI: normalization layers provide an output that is a non-linear transformation of their input)
In [0020]:
The deep convolutional neural network (DCNN) optimization component 140 is generally configured to optimize the structure of the trained DCNN model 138. In order to achieve a balance between the predictive power and model redundancy of CNNs, the DCNN optimization component 140 can learn the importance of convolutional kernels and neurons in FC layers from feature selection perspective. The DCNN optimization component 140 can optimize a CNN by pruning less important kernels and neurons based on their importance scores. The DCNN optimization component 140 can further fine-tune the remaining kernels and neurons to achieve a minimum loss of accuracy in the optimized DCNN.
Chen does not explicitly disclose:
- the layer determining the layer output using a non-linear normalization of the layer input.
However, Lin discloses:
- wherein the layer is configured to determine the layer output using a non-linear normalization of the layer input.
In [0055]:
The machine-learning model processes the input to generate an output. An artificial neural network includes an input layer that consists of values in a data point. The next layer is called a hidden layer, and nodes at the hidden layer each receive one or more of the input values. Each node contains parameters (e.g., weights) to apply to the input values. Each node therefore essentially inputs the input values into a multivariate function (e.g., a non-linear mathematical transformation) to produce an output value. A next layer can be another hidden layer or an output layer. In either case, the nodes at the next layer receive the output values from the nodes at the previous layer, and each node applies weights to those values and then generates its own output value.
(BRI: using the nonlinear transformation provides the direct method for determining an output signal based on the input signal)
The examiner interprets the theme of the invention as training a machine learning system using “normalized layers,” specifically providing a non-linear transformation to allow for increased performance of the machine learning system.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Chen and Lin.
Chen teaches a machine learning system that receives layer inputs and determines the layer output, and also teaches non-linear normalization of the layer input within the context of a CNN consisting of a “normalization layer”.
Lin teaches a non-linear normalization of the layer input and determining the layer output using sorting and smoothing within the context of a probability distribution.
One of ordinary skill would have been motivated to combine Chen and Lin to provide optimized model parameters and an improvement to the model in terms of its accuracy (Lin [0059]).
In regard to claim 12:
Chen discloses:
- A non-transitory machine-readable storage medium on which is stored a computer program
In [0050]:
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device
- for determining an output signal based on an input signal, the computer program, when executed by a computer, causing the computer to perform: determining the output signal by providing the input signal to a machine learning system, the machine learning system including a plurality of layers, the machine learning system providing the output signal based on an input signal by forwarding the input signal through the plurality of layers of the machine learning system, at least one of the layers of the plurality of layers receiving a layer input, which is based on the input signal, and providing a layer output based on which the output signal is determined,
In [0050]:
A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
In [0021]:
FIG. 2 illustrates a convolutional neural network, according to one embodiment described herein.
As shown, the CNN 200 includes an input layer 210, a convolutional layer 215, a subsampling layer 220, a convolutional layer 225, subsampling layer 230, fully connected layers 235 and 240, and an output layer 245. The input layer 210 in the depicted embodiment is configured to accept a 32×32 pixel image. The convolutional layer 215 generates 6 28×28 feature maps from the input layer, and so on. While a particular CNN 200 is depicted, more generally a CNN is composed of one or more convolutional layers, frequently with a subsampling step, and then one or more fully connected layers.
In [0037]:
Generally, performing ISBP from 3-way tensor to 3-way tensor tends to be more complicated than the above two cases, as the operations of the forward propagation between the input and output
In [0027]:
Accordingly, DCNN optimization component 140 can perform Importance Score Back Propagation (ISBP) for optimizing a CNN.
In [0041]:
If BP.sup.convfn.sub.i,j≠1, this can indicate that the i.sup.th position in the output layer comes from a convolution operation involving the j.sup.th position in the input layer.
(BRI: The layer output based on which the final output signal of a neural network is determined is the output layer, which is the final, topmost layer in the network architecture)
In [0023]:
As discussed above, typical CNNs consist of convolutional (Conv) layer, pooling layer, non-linear layer (e.g. ReLU), normalization layer (e.g. local response normalization (LRN)) and fully connected (FC) layer, etc. The convolutional layer generally includes a set of trainable kernels, which extract local features from a small spatial region but cross-depth volume of the input tensor. Each kernel can be trained as a feature extractor for some specific visual features, such as an edge or a color in the first layer
(BRI: normalization layers provide an output that is a nonlinear transformation of their input)
In [0020]:
The deep convolutional neural network (DCNN) optimization component 140 is generally configured to optimize the structure of the trained DCNN model 138. In order to achieve a balance between the predictive power and model redundancy of CNNs, the DCNN optimization component 140 can learn the importance of convolutional kernels and neurons in FC layers from feature selection perspective. The DCNN optimization component 140 can optimize a CNN by pruning less important kernels and neurons based on their importance scores. The DCNN optimization component 140 can further fine-tune the remaining kernels and neurons to achieve a minimum loss of accuracy in the optimized DCNN.
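The pruning step described in [0020] can be sketched as follows. This is a hypothetical illustration of keeping the highest-scoring kernels by importance score; the function name, the ranking scheme, and the keep ratio are assumptions, not Chen's implementation.

```python
# Hypothetical sketch of importance-score pruning: kernels with the lowest
# importance scores are removed, and the surviving kernels would then be
# fine-tuned. The keep_ratio threshold is illustrative only.
def prune_by_importance(scores, keep_ratio=0.5):
    """Return indices of kernels to keep, ranked by importance score."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_keep = max(1, int(len(scores) * keep_ratio))
    return sorted(ranked[:n_keep])

print(prune_by_importance([0.9, 0.1, 0.5, 0.7], keep_ratio=0.5))  # [0, 3]
```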
Chen does not explicitly disclose:
- the layer determining the layer output using a non-linear normalization of the layer input.
However, Lin discloses:
- wherein the layer is configured to determine the layer output using a non-linear normalization of the layer input.
In [0055]:
The machine-learning model processes the input to generate an output. An artificial neural network includes an input layer that consists of values in a data point. The next layer is called a hidden layer, and nodes at the hidden layer each receive one or more of the input values. Each node contains parameters (e.g., weights) to apply to the input values. Each node therefore essentially inputs the input values into a multivariate function (e.g., a non-linear mathematical transformation) to produce an output value. A next layer can be another hidden layer or an output layer. In either case, the nodes at the next layer receive the output values from the nodes at the previous layer, and each node applies weights to those values and then generates its own output value.
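The hidden-layer node described in Lin [0055] can be sketched in a few lines. The choice of tanh as the non-linear transformation is an assumption for illustration; Lin describes only a generic non-linear multivariate function.

```python
import math

# Minimal sketch of the node in Lin [0055]: each node applies weights to its
# input values and produces an output value through a non-linear
# transformation (tanh here, chosen only for illustration).
def node_output(inputs, weights, bias=0.0):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(z)  # non-linear multivariate function of the inputs

print(node_output([1.0, 2.0], [0.5, -0.25]))  # tanh(0.0) = 0.0
```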
(BRI: Using the nonlinear transformation provides the direct method for determining an output signal based on input signal)
The examiner interprets the theme of the invention as training a machine learning system using "normalized layers," and specifically providing a non-linear transformation to allow for increased performance of the machine learning system.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Chen and Lin.
Chen teaches a machine learning system configured to receive layer inputs and determine the layer output, and also teaches non-linear normalization of the layer input within the context of a CNN that includes a "normalization layer".
Lin teaches a non-linear normalization of the layer input and determining the layer output using sorting and smoothing within the context of a probability distribution.
One of ordinary skill would have been motivated to combine Chen and Lin because the combination can provide optimized model parameters and improve the model's accuracy (Lin [0059]).
Claims 3-7 are rejected under 35 U.S.C. 103 as being unpatentable over
Chun-Fu CHEN et al. (hereinafter Chen), US 2019/0122113 A1,
in view of Jiu-Che Lin et al. (hereinafter Lin), US 2023/0342016 A1,
further in view of Seyedeh Sahar Sadrizadeh et al. (hereinafter Sadri), US 2021/0209735 A1.
In regard to claim 3:
Chen and Lin do not explicitly disclose:
- wherein the non-linear normalization includes mapping empirical percentiles of values from the group to percentiles of a predefined probability distribution.
However, Sadri discloses:
- wherein the non-linear normalization includes mapping empirical percentiles of values from the group to percentiles of a predefined probability distribution.
In [0035]:
For further detail with respect to step 112, in an exemplary embodiment, a lower bound of a support set of the noise probability distribution may be equal to p.sub.l and an upper bound of the support set may be equal to p.sub.u, where 0≤p.sub.l≤p.sub.u≤1,
[Drawing: media_image2.png (greyscale) — equation]
where μ is the mean of the noise probability distribution. In an exemplary embodiment, when μ is close to upper bound p.sub.u, a ratio of a number of images that are corrupted by higher levels of impulsive noise to a number of plurality of training images 202 may increase.
The examiner interprets the theme of the invention as training a machine learning system using "normalized layers," and specifically providing a non-linear transformation to allow for increased performance of the machine learning system.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Chen, Lin and Sadri.
Chen teaches a machine learning system configured to receive layer inputs and determine the layer output using a non-linear normalization of the layer input.
Lin teaches determining the layer output using sorting and smoothing within the context of a probability distribution.
Sadri teaches that the probability distribution is a normal distribution.
One of ordinary skill would have been motivated to combine Chen, Lin and Sadri because the combination can provide a minimized loss function using gradient descent (Sadri [0052]).
In regard to claim 4:
Chen and Lin do not explicitly disclose:
- wherein the predefined probability distribution is a standard normal distribution.
However, Sadri discloses:
- wherein the predefined probability distribution is a standard normal distribution.
In [0049]:
In an exemplary embodiment, step 140 may include generating (l+1).sup.th plurality of training feature maps 216.
In [0049]:
In an exemplary embodiment, implementing l.sup.th non-linear activation function 228 may include implementing one of a rectified linear unit (ReLU) function or an exponential linear unit (ELU) function. In an exemplary embodiment, implementing l.sup.th non-linear activation function 228 may include implementing other types of non-linear activation functions such as leaky ReLU, scaled ELU, parametric ReLU, etc.
In [0035]:
In an exemplary embodiment, random variable p may be generated from a truncated Gaussian probability distribution defined within (p.sub.l, p.sub.u). As a result, in an exemplary embodiment, a minimum level of impulsive noise in plurality of training images 202 may be equal to p.sub.l. In contrast, in an exemplary embodiment, a maximum level of impulsive noise in plurality of training images 202 may be equal to p.sub.u.
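Sampling from a distribution truncated to (p.sub.l, p.sub.u), as described in Sadri [0035], can be sketched by rejection sampling. The sampling method, parameter values, and function name below are assumptions for illustration; Sadri does not specify how the draw is implemented.

```python
import random

# Illustrative sketch of drawing noise level p from a Gaussian truncated to
# the support set (p_l, p_u). Rejection sampling is an assumption here; the
# mean/std values are arbitrary illustration, not Sadri's parameters.
def sample_truncated_gaussian(mu, sigma, p_l, p_u, rng=random.Random(0)):
    while True:
        p = rng.gauss(mu, sigma)
        if p_l < p < p_u:       # accept only samples inside the support set
            return p

p = sample_truncated_gaussian(mu=0.3, sigma=0.2, p_l=0.1, p_u=0.5)
assert 0.1 < p < 0.5            # noise level stays within the claimed bounds
```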
In regard to claim 5:
Chen does not explicitly disclose:
- wherein to determine the layer output, the layer is configured to:
receive a group of values of the layer input;
- sort the received values;
- compute percentile values for each position of the sorted values;
- compute interpolation targets using a quantile function of the predefined probability distribution;
- determine a function characterizing a linear interpolation of the sorted values and the interpolation targets;
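For context only, the claimed sequence of steps — sort, compute per-position percentiles, map them through the quantile function of the predefined distribution, and linearly interpolate — can be sketched as below. This is the examiner's illustrative reading, assuming a standard normal as the predefined distribution (per claim 4); it is not the applicant's or any reference's implementation, and the percentile convention used is an assumption.

```python
from statistics import NormalDist

# Hedged sketch of the claimed steps: sort the group of values, assign each
# sorted position an empirical percentile strictly inside (0, 1), convert the
# percentiles to interpolation targets via the quantile (inverse-CDF) function
# of N(0, 1), then evaluate the piecewise-linear interpolation at each value.
def nonlinear_normalize(values):
    sorted_vals = sorted(values)
    n = len(sorted_vals)
    percentiles = [(i + 0.5) / n for i in range(n)]          # per-position percentiles
    targets = [NormalDist().inv_cdf(p) for p in percentiles]  # quantile-function targets

    def interp(x):  # linear interpolation of (sorted_vals, targets)
        if x <= sorted_vals[0]:
            return targets[0]
        for lo in range(n - 1):
            x0, x1 = sorted_vals[lo], sorted_vals[lo + 1]
            if x <= x1:
                t = (x - x0) / (x1 - x0) if x1 != x0 else 0.0
                return targets[lo] + t * (targets[lo + 1] - targets[lo])
        return targets[-1]

    return [interp(v) for v in values]

print(nonlinear_normalize([3.0, 1.0, 2.0]))  # roughly [0.97, -0.97, 0.0]
```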
However, Lin discloses:
- wherein to determine the layer output, the layer is configured to:
receive a group of values of the layer input;
in [0050]:
Deep learning is a class of machine learning algorithms that use a cascade of multiple layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input. Deep neural networks can learn in a supervised (e.g., classification) and/or unsupervised (e.g., pattern analysis) manner.
in [0020]:
generate a virtual knob for each feature that is used to train the machine learning model. Each virtual knob can be capable of adjusting the representative value of the corresponding feature, thus adjusting the output generated by the machine-learning model.
- sort the received values;
In [0041]:
Data interaction tool 154 can perform one or more preprocessing operations on the received input data. In some embodiments, the preprocessing operations can include a smoothing operation, a normalization operation, a dimensions reduction operations, a sort features operation, or any other operation configured to prepare data for training a machine-learning model.
In [0041]:
the sort features operation can include ranking the selected feature based on an order of importance.
- compute percentile values for each position of the sorted values;
In [0091]:
determine confidence interval for each virtual knob. A confidence interval displays the probability that a parameter (e.g., virtual knob value) will fall between a pair of values around a mean
In [0091]:
in one example, processing logic can determine the confidence level of the intercept value using Formula 4, expressed below.
[Drawing: media_image3.png (greyscale) — Formula 4]
In [0092], where:
CI.sub.0.95 is the confidence interval with a 95% confidence level.
In [0093]:
β̂.sub.0 is the estimator of a parameter (e.g., a virtual knob value).
In [0094]:
t.sub.0.95 is the t-statistic (e.g., ratio of the departure of the estimated value of a parameter from its hypothesized value to its standard error).
In [0095]:
SE is the standard error of the estimator.
In [0096]:
k.sub.i,0.95 is the i-th knob value with 95% confidence level.
In [0097]:
w.sub.i is the i-th knob weighting.
(BRI: computing percentile values for each position of sorted data is a recognized method for determining a confidence interval (CI) for a parameter. This approach, often called a percentile bootstrap, calculates the confidence interval by sorting simulated sample data and selecting specific percentiles (e.g., the 2.5th and 97.5th percentiles for a 95% CI) as the lower and upper bounds.)
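The percentile-bootstrap procedure referenced in the BRI note can be sketched as follows. This is a generic illustration of the recognized technique, not code from Lin; the resampling count and seed are arbitrary.

```python
import random

# Sketch of a percentile-bootstrap confidence interval: resample the data,
# sort the simulated statistics, and read the 2.5th and 97.5th percentile
# positions off the sorted list as the 95% CI bounds.
def percentile_bootstrap_ci(data, n_boot=2000, alpha=0.05, rng=random.Random(0)):
    means = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]      # resample with replacement
        means.append(sum(sample) / len(sample))
    means.sort()                                       # sort simulated statistics
    lo = means[int((alpha / 2) * n_boot)]              # 2.5th percentile
    hi = means[int((1 - alpha / 2) * n_boot) - 1]      # 97.5th percentile
    return lo, hi

lo, hi = percentile_bootstrap_ci([2.0, 4.0, 6.0, 8.0, 10.0])
assert lo < 6.0 < hi   # the sample mean falls inside the interval
```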
- compute interpolation targets using a quantile function of the predefined probability distribution;
In [0091]:
In some embodiments, processing logic can determine confidence interval for each virtual knob. A confidence interval displays the probability that a parameter (e.g., virtual knob value) will fall between a pair of values around a mean. Confidence intervals measure the degree of uncertainty or certainty in a sampling method. In one embodiment, to determine a confidence interval for a virtual knob, processing logic can first determine a confidence interval of the intercept value (e.g., Po)
(BRI: the probability that a parameter falls between a pair of values around the mean represents interpolation targets when using a quantile function (or its associated Cumulative Distribution Function, CDF) of a predefined probability distribution)
- determine a function characterizing a linear interpolation of the sorted values and the interpolation targets;
In [0090]:
Each virtual knob can be capable of adjusting the representative value of the corresponding feature, thus adjusting the machine-learning model. In some embodiments, to generate a virtual knob, processing logic can use a transform function (or any other applicable function) to modify one or more values representing each feature in the machine-learning model.
In [0042]:
The machine-learning model can include a representative function for each selected feature. In an illustrative example, machine-learning model 190 can be trained using linear regression, and expressed as seen in Formula 1 below, where the x value(s) represent each selected feature, the c value(s) represent corresponding coefficients, and the P.sub.0 value represents the intercept value, and y represents the label value:
[Drawing: media_image4.png (greyscale) — Formula 1]
(BRI: a virtual knob (or user-adjustable parameter) can be implemented in a machine-learning system to process and transform input features, which in turn can alter the model's output probabilities. Formula 1 represents a linear interpolation where x represents features, c represents the corresponding coefficients (slope), P.sub.0 represents the intercept, and y is the label. Linear regression in this context creates a best-fit straight line y = P.sub.0 + c.sub.1x.sub.1 + c.sub.2x.sub.2 + …)
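The linear form described in the BRI note can be stated in one line of code. This is a generic illustration of Formula 1 as characterized above, with invented coefficient and feature values; it is not Lin's implementation.

```python
# Minimal sketch of Formula 1 as characterized in the BRI note:
# y = P0 + c1*x1 + c2*x2 + ..., with illustrative values only.
def formula_1(intercept, coeffs, features):
    return intercept + sum(c * x for c, x in zip(coeffs, features))

print(formula_1(1.0, [2.0, 3.0], [1.0, 1.0]))  # 1 + 2 + 3 = 6.0
```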
- determine the layer output by processing the received values with the function.
In [0042]:
The machine-learning model can include a representative function for each selected feature. In an illustrative example, machine-learning model 190 can be trained using linear regression, and expressed as seen in Formula 1 below, where the x value(s) represent each selected feature, the c value(s) represent corresponding coefficients, and the P.sub.0 value represents the intercept value, and y represents the label value:
[Drawing: media_image4.png (greyscale) — Formula 1]
(BRI: y is the layer output in Formula 1)
The examiner interprets the theme of the invention as training a machine learning system using "normalized layers," and specifically providing a non-linear transformation to allow for increased performance of the machine learning system.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Chen and Lin.
Chen teaches a machine learning system configured to receive layer inputs and determine the layer output using a non-linear normalization of the layer input.
Lin teaches determining the layer output using sorting and smoothing within the context of a probability distribution.
One of ordinary skill would have been motivated to combine Chen and Lin because the combination can provide optimized model parameters and improve the model's accuracy (Lin [0059]).
In regard to claim 6:
Chen does not explicitly disclose:
- wherein to determine the layer output, before determining the function, the layer is configured to smooth the sorted values using a smoothing operation.
However, Lin discloses:
- wherein to determine the layer output, before determining the function, the layer is configured to smooth the sorted values using a smoothing operation.
In [0088]:
At operation 412, processing logic performs one or more preprocessing operations on the input data. In some embodiments, the preprocessing operations can include a smoothing operation, a normalization operation, a dimensions reduction operations, a sort features operation, or any other operation configured to prepare data for training a machine-learning model.
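The smoothing operation that Lin [0088] names among its preprocessing steps can be sketched as a moving average. Lin does not specify a method, so the window size and averaging scheme below are assumptions for illustration.

```python
# Illustrative moving-average smoothing of an already-sorted sequence; the
# window of 3 and edge handling (shrinking window at the boundaries) are
# assumed, since the reference names only a generic "smoothing operation".
def smooth(sorted_values, window=3):
    half = window // 2
    out = []
    for i in range(len(sorted_values)):
        lo, hi = max(0, i - half), min(len(sorted_values), i + half + 1)
        out.append(sum(sorted_values[lo:hi]) / (hi - lo))  # local average
    return out

print(smooth([1.0, 2.0, 10.0, 11.0]))  # [1.5, 4.33..., 7.66..., 10.5]
```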
In regard to claim 7:
Chen does not explicitly disclose:
- wherein, to determine the layer output, the layer is further configured to scale and/or shift the values obtained after processing the received values with the function.
However, Lin discloses:
- wherein, to determine the layer output, the layer is further configured to scale and/or shift the values obtained after processing the received values with the function.
In [0041]:
A normalization operation can bring the numerical data to a common or balanced scale without distorting the data.
In [0045]:
The scaling constant can be configured to increase or decrease the adjustment factor of the virtual knob. This allows for calibrating the virtual knobs without needing to retrain the machine-learning model. The scaling constant can be generated manually (e.g., user input), or automatically (e.g., based on a predefined value associated with a particular feature) using, for example, a data table, optimization tool 160, etc.
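The scale-and/or-shift step of claim 7 can be sketched as an affine transformation applied after the normalization function, analogous to the learnable affine step in common normalization layers. The specific scale and shift values below are invented for illustration.

```python
# Sketch of the claimed scale/shift step: values produced by processing the
# received values with the normalization function are multiplied by a scale
# and offset by a shift. The constants here are illustrative only.
def scale_shift(values, scale=2.0, shift=0.5):
    return [scale * v + shift for v in values]

print(scale_shift([-1.0, 0.0, 1.0]))  # [-1.5, 0.5, 2.5]
```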
The examiner interprets the theme of the invention as training a machine learning system using "normalized layers," and specifically providing a non-linear transformation to allow for increased performance of the machine learning system.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Chen and Lin.
Chen teaches a machine learning system configured to receive layer inputs and determine the layer output using a non-linear normalization of the layer input.
Lin teaches determining the layer output using sorting and smoothing within the context of a probability distribution.
One of ordinary skill would have been motivated to combine Chen and Lin because the combination can provide optimized model parameters and improve the model's accuracy (Lin [0059]).
Conclusion
Any inquiry concerning this communication or earlier communications from the
examiner should be directed to TIRUMALE KRISHNASWAMY RAMESH whose telephone number is (571)272-4605. The examiner can normally be reached by phone.
Examiner interviews are available via telephone, in-person, and video conferencing
using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at
http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on phone (571-272-3768). The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be
obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit:
https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for
information about filing in DOCX format.
For additional questions, contact the Electronic
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO
Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TIRUMALE K RAMESH/Examiner, Art Unit 2121
/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121