Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 01/10/2024 and 02/09/2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 6-9, 12-15, 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (Low-Dimensional Gradient Helps Out-of-Distribution Detection, hereinafter Wu) in view of Ndiour et al. (SUBSPACE MODELING FOR FAST OUT-OF-DISTRIBUTION AND ANOMALY DETECTION, hereinafter Ndiour).
Regarding claims 1, 13, 20, Wu discloses
Claim 1: A computer-implemented method for detecting out-of-distribution data for a neural network, the method comprising:
Claim 13: A system for detecting out-of-distribution data for a neural network, the system comprising: a processor; and memory having instructions that, when executed by the processor, cause the processor to perform the following:
Claim 20: A non-transitory computer-readable medium having stored thereon computer-readable instructions that, when executed by a processor, cause the processor to execute a method for detecting out-of-distribution data for a neural network, the method comprising:
receiving a training dataset, wherein the training dataset includes in-distribution data including image data associated with one or more images (P. 7 A Evaluation on Large-scale ImageNet Benchmark: “We first evaluate our method on the large-scale ImageNet benchmark, which poses significant challenges due to the substantial volume of training data and model parameters. The comparison results are presented in Table II, where we evaluate six different detection methods using either features or our low-dimensional gradients as inputs. To control our computational overhead, we randomly select 50000 training samples as the training dataset for our detection algorithms”);
training a neural network on the in-distribution data, wherein the neural network has a plurality of layers (P. 3 A Preliminary: “The framework of OOD detection can be described as follows. We consider a classification problem with C classes, where X stands for the input space and Y for the label space. The joint data distribution over X × Y is denoted as D_XY. Let f_θ : X → Y be a model trained on samples drawn i.i.d. from D_XY with parameter θ”);
receiving image data associated with a sample image for the neural network (P. 7 A Evaluation on Large-scale ImageNet Benchmark: “We first evaluate our method on the large-scale ImageNet benchmark, which poses significant challenges due to the substantial volume of training data and model parameters. The comparison results are presented in Table II, where we evaluate six different detection methods using either features or our low-dimensional gradients as inputs. To control our computational overhead, we randomly select 50000 training samples as the training dataset for our detection algorithms”);
executing the neural network with the image data associated with the sample image to determine a gradient associated with the sample image; projecting the gradient into the subspace to derive a projection of the gradient (P. 2 Fig. 1: “Then, using a pre-extracted subspace where the principal components of training data gradients reside, we obtain low-dimensional representations through a projection operation. Finally, we feed the representations into the detection branch, where diverse score functions are designed based on these representations.”, P. 5 1) Combination with output-based methods: “Output-based approaches aim to capture the dissimilarities in feature representations between ID and OOD data by modulating the network output using a linear layer. Consequently, we introduce an auxiliary linear network, trained on our reduced gradient, to generate the necessary output for computing score functions. The network architecture is: g → BN → FC → y”); and
determining that the image data associated with the sample image is out of distribution (OOD) based on a magnitude of the projection of the gradient (P. 5 1) Combination with output-based methods: “Output-based approaches aim to capture the dissimilarities in feature representations between ID and OOD data by modulating the network output using a linear layer. Consequently, we introduce an auxiliary linear network, trained on our reduced gradient, to generate the necessary output for computing score functions. The network architecture is: g → BN → FC → y”, P. 6 3) Combination with distance-based methods: “Features of ID data tend to cluster together and be away from those of OOD data; thus, previous works designed various distance measurements as score functions to detect OOD data. For our low-dimensional gradients, these methods should be equally effective since gradients share the same aggregation and separation properties as features, as shown in Figure 2. One typical measurement is the Mahalanobis distance [6] as follows:
[equation image: Mahalanobis distance score function]
where μ̂ and Σ̂ are the empirical class mean and covariance of training samples. The Mahalanobis distance-based method imposes a class-conditional Gaussian distribution assumption about the underlying gradient space, while another distance-based approach named KNN [7] is more flexible and general without any distributional assumptions. It utilizes the Euclidean distance to the k-th nearest neighbor of training data as its score function, which can be expressed as follows:
[equation image: KNN score function]
Both score functions exhibit larger values on ID data and smaller values on OOD data since they take the negative value of the distance measurements.”).
However Wu does not explicitly disclose
generating a subspace of in-distribution data of the training dataset based on a sample of one of the layers trained with the in-distribution data.
Ndiour teaches
generating a subspace of in-distribution data of the training dataset based on a sample of one of the layers trained with the in-distribution data (P. 2 Linear Subspace Modeling: “For instance, we see that for Layer 0, the subspace dimension after applying PCA with 99.5% variability retention drops from 512 to 29, indicating that 99.5% of the information in the 512-dimensional features is actually contained within a 29-dimensional subspace!”).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu with creating subspace by dimensionally reducing one of the layers of a neural network of Ndiour to effectively reduce the resources needed when analyzing images.
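For context only, the combined technique described in the rejection above (extracting a principal subspace from training-data gradients, projecting a sample's gradient into that subspace, and scoring the sample by the projection's magnitude) can be sketched as follows. This is an illustrative Python sketch, not drawn from the cited references; all names, dimensions, and the random stand-in data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-sample training gradients (n samples x d dimensions).
train_gradients = rng.normal(size=(200, 64))

# Extract the principal subspace of the training gradients: the top-k right
# singular vectors of the centered data matrix.
k = 8
mean = train_gradients.mean(axis=0)
_, _, vt = np.linalg.svd(train_gradients - mean, full_matrices=False)
basis = vt[:k]  # (k, d) orthonormal rows spanning the subspace

def projection_magnitude(gradient: np.ndarray) -> float:
    """Norm of the gradient's projection onto the training subspace."""
    coords = basis @ (gradient - mean)  # low-dimensional representation
    return float(np.linalg.norm(coords))

def is_ood(gradient: np.ndarray, threshold: float) -> bool:
    """Flag the sample as OOD when the projection magnitude falls below threshold."""
    return projection_magnitude(gradient) < threshold

sample = rng.normal(size=64)
print(is_ood(sample, threshold=1e-6))
```

The threshold here is a free parameter; in practice it would be calibrated on held-in-distribution data so that a chosen fraction of ID samples passes.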
Regarding claims 2 and 14, dependent upon claims 1 and 13 respectively, Wu in view of Ndiour teaches everything regarding claims 1 and 13.
Wu further teaches
the projection is parallel to the subspace (Fig. 1: Low-dimensional Gradient).
Regarding claims 3 and 15, dependent upon claims 2 and 14 respectively, Wu in view of Ndiour teaches everything regarding claims 2 and 14.
Wu further discloses
the determining the image data associated with the sample image is OOD is based on the magnitude of the projection being below a threshold (P. 6 3) Combination with distance-based methods: “Features of ID data tend to cluster together and be away from those of OOD data; thus, previous works designed various distance measurements as score functions to detect OOD data. For our low-dimensional gradients, these methods should be equally effective since gradients share the same aggregation and separation properties as features, as shown in Figure 2. One typical measurement is the Mahalanobis distance [6] as follows:
[equation image: Mahalanobis distance score function]
where μ̂ and Σ̂ are the empirical class mean and covariance of training samples. The Mahalanobis distance-based method imposes a class-conditional Gaussian distribution assumption about the underlying gradient space, while another distance-based approach named KNN [7] is more flexible and general without any distributional assumptions. It utilizes the Euclidean distance to the k-th nearest neighbor of training data as its score function, which can be expressed as follows:
[equation image: KNN score function]
Both score functions exhibit larger values on ID data and smaller values on OOD data since they take the negative value of the distance measurements.”).
Regarding claims 6 and 18, dependent upon claims 1 and 13 respectively, Wu in view of Ndiour teaches everything regarding claims 1 and 13.
Ndiour further teaches
the subspace is generated based on the last layer of the one or more layers (P. 3 Experimental setup and evaluation metrics: “We tested our method on three layers of each of the networks, with layers chosen to be located uniformly along the network path. The layers are labelled as 0, 1, and 2, with 0 being the (semantic) outermost layer, and 1, 2 being progressively deeper within the network”).
Regarding claim 7, dependent upon claim 1, Wu in view of Ndiour teaches everything regarding claim 1.
Ndiour further teaches
the generating the subspace is based on singular value decomposition (SVD) (P. 2 Linear Subspace Modeling: “One popular choice for dimensionality reduction is principal component analysis (PCA) [17]. In this framework, H and L are, respectively, Euclidean spaces ℝ^d and ℝ^m with m ≪ d. T can then be calculated from the singular value decomposition (SVD) [18] of the data matrix D.”) and
wherein the subspace is a significant representation of the layer (P. 2 Linear Subspace Modeling: “For instance, we see that for Layer 0, the subspace dimension after applying PCA with 99.5% variability retention drops from 512 to 29, indicating that 99.5% of the information in the 512-dimensional features is actually contained within a 29-dimensional subspace!”).
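The variance-retention rule quoted above (e.g., 99.5% of the information in 512-dimensional features lying in a 29-dimensional subspace) can be illustrated with a short sketch: choose the smallest subspace dimension whose leading singular values explain a given fraction of total variance. This is an illustrative Python sketch under assumed stand-in data, not code from the cited reference.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in features whose variance is concentrated in a few directions:
# a rank-5 signal with large scale plus tiny isotropic noise.
latent = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 64)) * 10.0
features = latent + rng.normal(size=(500, 64)) * 0.01

def retained_dimension(data: np.ndarray, retention: float = 0.995) -> int:
    """Smallest m such that the top-m principal components explain
    at least `retention` of the total variance."""
    centered = data - data.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    cumulative = np.cumsum(variance) / variance.sum()
    # First index where the cumulative explained variance reaches `retention`.
    return int(np.searchsorted(cumulative, retention) + 1)

m = retained_dimension(features)
print(m)  # far smaller than the ambient dimension of 64
```

Because the stand-in signal has rank 5, the retained dimension collapses to a handful of components, mirroring the 512-to-29 reduction the reference reports for real features.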
Regarding claim 8, dependent upon claim 1, Wu in view of Ndiour teaches everything regarding claim 1.
Wu further discloses
the layer is one of a linear layer and a convolution layer (P. 6 Models and Hyper-parameters: “In the case of CIFAR10, we adopt the ResNet18 architecture as our base model. We train it using the SGD algorithm with weight decay 0.0005, momentum 0.9, cosine schedule, initial learning rate 0.1, label smoothing 0.1, epoch 200, and batch size 128.”).
Regarding claim 9, dependent upon claim 1, Wu in view of Ndiour teaches everything regarding claim 1.
Wu further discloses
the one or more images are at least one of: numbers, text, audio, vector image, bitmap image, and sensor signal (P. 7 A Evaluation on Large-scale ImageNet Benchmark: “We first evaluate our method on the large-scale ImageNet benchmark, which poses significant challenges due to the substantial volume of training data and model parameters. The comparison results are presented in Table II, where we evaluate six different detection methods using either features or our low-dimensional gradients as inputs. To control our computational overhead, we randomly select 50000 training samples as the training dataset for our detection algorithms”).
Regarding claim 12, dependent upon claim 1, Wu in view of Ndiour teaches everything regarding claim 1.
Wu further discloses
the gradient is determined based on a comparison between the image data associated with the sample image and the layer of the one or more layers.
Ndiour further teaches generating a subspace that represents the training images based on one of the layers of the trained neural network (P. 2 Linear Subspace Modeling: “For instance, we see that for Layer 0, the subspace dimension after applying PCA with 99.5% variability retention drops from 512 to 29, indicating that 99.5% of the information in the 512-dimensional features is actually contained within a 29-dimensional subspace!”).
Regarding claim 19, dependent upon claim 13, Wu in view of Ndiour teaches everything regarding claim 13.
Ndiour further teaches
the generating the subspace is based on singular value decomposition (SVD) (P. 2 Linear Subspace Modeling: “One popular choice for dimensionality reduction is principal component analysis (PCA) [17]. In this framework, H and L are, respectively, Euclidean spaces ℝ^d and ℝ^m with m ≪ d. T can then be calculated from the singular value decomposition (SVD) [18] of the data matrix D.”).
Claims 4-5, 10-11, 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (Low-Dimensional Gradient Helps Out-of-Distribution Detection, hereinafter Wu) in view of Ndiour et al. (SUBSPACE MODELING FOR FAST OUT-OF-DISTRIBUTION AND ANOMALY DETECTION, hereinafter Ndiour) and Wilson et al. (Hyperdimensional Feature Fusion for Out-of-Distribution Detection, hereinafter Wilson).
Regarding claims 4 and 16, dependent upon claims 3 and 15 respectively, Wu in view of Ndiour teaches everything regarding claims 3 and 15.
Wilson further teaches
the determining the image data associated with the sample image is OOD is further based upon an angle between the gradient and the projection being greater than a second threshold (Figure 1, P. 3 Hyperdimensional Feature Fusion: “During deployment, we repeat the projection and bundling steps for a new input image, and use the cosine similarity to the class representatives to identify OOD samples.”, P. 4 Out-of-Distribution Detection: “During testing or deployment, an image x can be identified as OOD by obtaining its image descriptor y according to (4), and calculating the cosine similarity to each of the class-specific descriptors d_c. Let θ be the angle to the class descriptor d_c that is most similar to y:
[equation image: angle to the most similar class descriptor]
The input x is then treated as OOD if θ is bigger than a threshold: θ > θ*.”).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu in view of Ndiour with determining if an image is out-of-distribution based on its cosine similarity to one of the class descriptor and other functions of Wilson to effectively reduce the training and inference time needed for determining if an image is out-of- distribution.
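Wilson's angle test, as quoted in the rejection above, reduces to a simple computation: take the cosine similarity between a descriptor and every class descriptor, convert the best match to an angle θ, and flag the input as OOD when θ > θ*. The following is an illustrative Python sketch with hypothetical toy descriptors, not code from the cited reference.

```python
import numpy as np

def min_angle(descriptor: np.ndarray, class_descriptors: np.ndarray) -> float:
    """Smallest angle (radians) between the descriptor and any class descriptor."""
    sims = class_descriptors @ descriptor / (
        np.linalg.norm(class_descriptors, axis=1) * np.linalg.norm(descriptor)
    )
    return float(np.arccos(np.clip(sims.max(), -1.0, 1.0)))

def is_ood(descriptor, class_descriptors, theta_star: float) -> bool:
    """OOD when even the best-matching class descriptor is too far away."""
    return min_angle(descriptor, class_descriptors) > theta_star

classes = np.eye(3)                  # three orthogonal class descriptors
aligned = np.array([0.9, 0.1, 0.0])  # close to class 0 -> in-distribution
far = np.array([1.0, 1.0, 1.0])      # equidistant from all classes

print(is_ood(aligned, classes, theta_star=np.pi / 6))  # False
print(is_ood(far, classes, theta_star=np.pi / 6))      # True
```

The same comparison read the other way (cosine similarity above a threshold, rather than angle below it) is what the rejection of claims 10-11 maps to the "magnitude ... above a threshold" limitation.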
Regarding claims 5 and 17, dependent upon claims 4 and 16 respectively, Wu in view of Ndiour and Wilson teaches everything regarding claims 4 and 16.
Wilson further teaches
the executing the neural network generates a first vector associated with the sample image, the projecting of the gradient into the subspace generates a second vector, and the determining the image data associated with the sample image is OOD is based on a magnitude of the second vector (Figure 1, P. 3 Bundling: “Concretely, given random vectors a, b and c, the vector will be similar to the bundles a⊕b, a⊕c, and a⊕b⊕c, although in the final case it will be less similar as the bundled vector needs to be similar to all 3 input vectors.”, P. 3 Hyperdimensional Feature Fusion: “During deployment, we repeat the projection and bundling steps for a new input image, and use the cosine similarity to the class representatives to identify OOD samples.”, P. 4 Out-of-Distribution Detection: “During testing or deployment, an image x can be identified as OOD by obtaining its image descriptor y according to (4), and calculating the cosine similarity to each of the class-specific descriptors d_c. Let θ be the angle to the class descriptor d_c that is most similar to y:
[equation image: angle to the most similar class descriptor]
The input x is then treated as OOD if θ is bigger than a threshold: θ > θ*.”).
Regarding claim 10, dependent upon claim 1, Wu in view of Ndiour teaches everything regarding claim 1.
However Wu in view of Ndiour does not explicitly teach
determining the image data associated with a second sample image is in distribution (ID) based on a magnitude of a second projection of a second gradient being above a threshold.
Wilson teaches
determining the image data associated with a second sample image is in distribution (ID) based on a magnitude of a second projection of a second gradient being above a threshold (P. 4 Out-of-Distribution Detection: “During testing or deployment, an image x can be identified as OOD by obtaining its image descriptor y according to (4), and calculating the cosine similarity to each of the class-specific descriptors d_c. Let θ be the angle to the class descriptor d_c that is most similar to y:
[equation image: angle to the most similar class descriptor]
The input x is then treated as OOD if θ is bigger than a threshold: θ > θ*.”).
Regarding claim 11, dependent upon claim 1, Wu in view of Ndiour teaches everything regarding claim 1.
However Wu in view of Ndiour does not explicitly teach
receiving image data associated with a second sample image for the neural network;
executing the neural network with the image data associated with the second sample image to determine a second gradient associated with the second sample image;
projecting the second gradient into the subspace to derive a second projection of the second gradient; and
determining the image data associated with the second sample image is in distribution (ID) based on a magnitude of the second projection of the second gradient being above a threshold.
Wilson teaches
receiving image data associated with a second sample image for the neural network; executing the neural network with the image data associated with the second sample image to determine a second gradient associated with the second sample image (P. 1 Figure 1: “During testing, we repeat feature extraction, projection and bundling to obtain a hyperdimensional image descriptor. OOD samples will produce descriptors with a large angular distance to all class descriptors (red vector).”, P. 3 Bundling: “Concretely, given random vectors a, b and c, the vector will be similar to the bundles a⊕b, a⊕c, and a ⊕b⊕c, although in the final case it will be less similar as the bundled vector needs to be similar to all 3 input vectors.”);
projecting the second gradient into the subspace to derive a second projection of the second gradient; and determining the image data associated with the second sample image is in distribution (ID) based on a magnitude of the second projection of the second gradient being above a threshold (P. 1 Figure 1: “During testing, we repeat feature extraction, projection and bundling to obtain a hyperdimensional image descriptor. OOD samples will produce descriptors with a large angular distance to all class descriptors (red vector).”, P. 4 Out-of-Distribution Detection: “During testing or deployment, an image x can be identified as OOD by obtaining its image descriptor y according to (4), and calculating the cosine similarity to each of the class-specific descriptors d_c. Let θ be the angle to the class descriptor d_c that is most similar to y:
[equation image: angle to the most similar class descriptor]
The input x is then treated as OOD if θ is bigger than a threshold: θ > θ*.”).
Relevant Prior Art Directed to State of Art
Dadalto et al. (A Functional Data Perspective and Baseline On Multi-Layer Out-of-Distribution Detection, hereinafter Dadalto) is prior art not applied in the rejection(s) above. Dadalto discloses a method for identifying out-of-distribution data based on a functional view of the network that exploits the sample’s trajectories through the various layers and their statistical dependencies.
D’Agostino et al. (Learning Active Subspaces and Discovering Important Features with Gaussian Radial Basis Functions Neural Networks, hereinafter D’Agostino) is prior art not applied in the rejection(s) above. D’Agostino discloses a modification of the Radial Basis Function Neural Network model by equipping its Gaussian kernel with a learnable precision matrix.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOSHUA CHEN whose telephone number is (703)756-5394. The examiner can normally be reached M-Th 9:30 am - 4:30 pm ET; F 9:30 am - 2:30 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, STEPHEN R KOZIOL can be reached at (408)918-7630. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J. C./Examiner, Art Unit 2665
/Stephen R Koziol/Supervisory Patent Examiner, Art Unit 2665