Prosecution Insights
Last updated: April 19, 2026
Application No. 18/053,538

EFFICIENT LEARNING AND USING OF TOPOLOGIES OF NEURAL NETWORKS IN MACHINE LEARNING

Final Rejection (§101, §103)
Filed: Nov 08, 2022
Examiner: DAY, ROBERT N
Art Unit: 2122
Tech Center: 2100 — Computer Architecture & Software
Assignee: Intel Corporation
OA Round: 2 (Final)

Grant Probability: 23% (At Risk)
Predicted OA Rounds: 3-4
Predicted Time to Grant: 4y 3m
Grant Probability With Interview: 46%

Examiner Intelligence

Career Allow Rate: 23% (5 granted / 22 resolved; -32.3% vs TC avg) — grants only 23% of cases
Interview Lift: +23.2% across resolved cases with interview (strong lift)
Typical Timeline: 4y 3m average prosecution
Career History: 60 total applications across all art units; 38 currently pending

Statute-Specific Performance

§101: 32.6% (-7.4% vs TC avg)
§103: 35.3% (-4.7% vs TC avg)
§102: 12.9% (-27.1% vs TC avg)
§112: 18.3% (-21.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 22 resolved cases.

Office Action

§101, §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

This action is in response to the amendments filed 12 November 2025. Claims 1, 5, 8, 12, 15, 19, 21, and 25 are amended. Claims 1-25 are pending and have been examined.

Response to Arguments

Applicant's arguments, see page 10, filed 12 November 2025, with respect to the objection to Claims 1, 8, 15, and 21 for informalities have been fully considered and are persuasive. The objection to Claims 1, 8, 15, and 21 for informalities has been withdrawn.

APPLICANT'S ARGUMENT: Applicant appears to assert (page 10, paragraph 3) that "In response to the claim objection, claims 1, 8, 15, and 21 have been amended. ... Applicant respectfully requests the withdrawal of the objection."

EXAMINER'S RESPONSE: Examiner agrees. The objection to Claims 1, 8, 15, and 21 for informalities has been withdrawn.

Applicant's arguments, see page 10, filed 12 November 2025, with respect to the rejection of Claims 1-25 under 35 U.S.C. 101 for abstract idea have been fully considered but they are not persuasive.

APPLICANT'S ARGUMENT: Applicant appears to argue (page 10, paragraph 5) that "The claims have been amended to address the current §101 rejections. As such, applicant submits that the present §101 rejections have been overcome."

EXAMINER'S RESPONSE: Examiner respectfully disagrees. As indicated in the 35 U.S.C. 101 rejection below, amended Claim 1 recites limitations that appear to recite mental process steps, such as observation, evaluation, judgment, opinion, at the claimed levels of generality: learning a probabilistic model, determining an inverse of a probabilistic model, converting a model into a discriminative model, converting a model into a DNN, performing dropout based on statistical data, and decomposing a DNN by generating parallel and sequential execution schedules for memory sharing.
Amended Claim 1 recites additional elements to the claimed mental process steps: a graphics processor, training a DNN using labeled data, and setting precision of neuron weights. At the claimed levels of generality, these additional elements do not appear to integrate the mental process steps into a practical application or provide significantly more according to Step 2A Prong Two and Step 2B of the Alice/Mayo test. Therefore amended Claim 1 is directed to the recited mental processes and does not improve the functioning of a computer or other technology or technological field. The dependent claims are rejected under the rationales given below.

Applicant's arguments, see pages 11-12, filed 12 November 2025, with respect to the rejection of Claims 1-3, 5, 6, 8-10, 12, 13, 15-17, 19, 21-23, and 25 under 35 U.S.C. 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

APPLICANT'S ARGUMENT: Applicant appears to argue (page 12, paragraph 3) that "The cited references do not teach or suggest an arrangement in which a graphics processor is to train a DNN using labeled data, where the DNN is decomposed by generating parallel and sequential execution schedules for memory sharing at sub-network precision levels of the one or more of the neural networks, as recited in claim 1 as amended herein. Applicant can find no teaching or suggestion of such a feature anywhere in the cited references. Therefore, the cited references do not teach or suggest claim 1."

EXAMINER'S RESPONSE: Examiner notes that Applicant's arguments are moot in light of the new ground of rejection of amended Claims 1-3, 5, 6, 8-10, 12, 13, 15-17, 19, 21-23, and 25, which are now rejected under 35 U.S.C. 103 as being unpatentable over Patel in view of Chai in view of Barrow in view of Chen.
The argued features, wherein the DNN is decomposed by generating parallel and sequential execution schedules for memory sharing at sub-network precision levels of the one or more of the neural networks, are taught by Chen.

Claim Objections

The objection to Claims 1, 8, 15, and 21 for informalities is withdrawn.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding Claim 1

Step 1: Claim 1 recites an apparatus, and thus the claimed machine falls within a statutory category of invention.

Step 2A Prong 1: The claim recites learn a structure of a generative probabilistic model, which is a mental process. The claim recites determine, based on the structure, a stochastic inverse of the generative probabilistic model, which is a mental process. The claim recites convert the stochastic inverse into a discriminative model, which is a mental process. The claim recites convert the discriminative model into a deep neural network (DNN), which is a mental process. The claim recites performing methodological dropout of the neurons, wherein the methodological dropout is performed in accordance with a predictivity based on historical statistical data relating to the neurons, which is a mental process. The claim recites wherein the DNN is decomposed by generating parallel and sequential execution schedules for memory sharing at sub-network precision levels of the one or more of the neural networks, which is a mental process. Thus, the claim recites an abstract idea.
Step 2A Prong 2, Step 2B: The additional element a graphics processor invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element trained using labeled data invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element train the DNN using labeled data invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element setting a bit-precision of weights in neurons of the DNN independently from one another invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.

Regarding Claim 2

Step 1: Regarding Claim 2, the rejection of Claim 1 is incorporated.

Step 2A Prong 1: The claim recites learn a structure of a generative probabilistic model (as recited by Claim 1), wherein the generative model is unsupervised and based on unlabeled data, which is a mental process. The claim recites convert the stochastic inverse into a discriminative model (as recited by Claim 1), wherein the discriminative model is supervised and based on labeled data, which is a mental process. The claim recites convert the stochastic inverse into a discriminative model (as recited by Claim 1), wherein the discriminative model is learned from the generative probabilistic model, which is a mental process. Thus, the claim recites an abstract idea.

Step 2A Prong 2, Step 2B: The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.

Regarding Claim 3

Step 1: Regarding Claim 3, the rejection of Claim 1 is incorporated.
Step 2A Prong 1: The claim recites to inverse the generative probabilistic model into multiple inverse models, which is a mental process. The claim recites wherein a bidirectional connection is added to connect latent variables having a common parent in each of the multiple inverse models to consolidate the multiple inverse models into a single inverse model, which is a mental process. Thus, the claim recites an abstract idea.

Step 2A Prong 2, Step 2B: The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.

Regarding Claim 4

Step 1: Regarding Claim 4, the rejection of Claim 3 is incorporated.

Step 2A Prong 1: The claim recites to convert the inverse model into the discriminative model by removing the bidirectional connection and adding a class node serving as a child node to latent leaves, which is a mental process. Thus, the claim recites an abstract idea.

Step 2A Prong 2, Step 2B: The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.

Regarding Claim 5

Step 1: Regarding Claim 5, the rejection of Claim 1 is incorporated.

Step 2A Prong 1: The claim recites perform on-the-fly learning and updating of network topologies of the neural networks based on at least one of currently available data and historically available data relating to the topologies of the neural networks, which is a mental process. Thus, the claim recites an abstract idea.

Step 2A Prong 2, Step 2B: The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.

Regarding Claim 6

Step 1: Regarding Claim 6, the rejection of Claim 1 is incorporated.
Step 2A Prong 1: The claim recites facilitate at least one of an end-to-end structure learning and a sub-network structure learning, which is a mental process. The claim recites facilitate feature bagging or coping with large scale data, which is a mental process. Thus, the claim recites an abstract idea.

Step 2A Prong 2, Step 2B: The additional element training large training sets invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.

Regarding Claim 7

Step 1: Regarding Claim 7, the rejection of Claim 1 is incorporated.

Step 2A Prong 1: Claim 7 recites the abstract ideas recited by parent Claim 1.

Step 2A Prong 2, Step 2B: The additional element wherein the graphics processor is co-located with an application processor on a common semiconductor package invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.

Regarding Claim 8

Step 1: Claim 8 recites a method, and thus the claimed process falls within a statutory category of invention.

Step 2A Prong 1: The claim recites learning a structure of a generative probabilistic model, which is a mental process. The claim recites determining, based on the structure, a stochastic inverse of the generative probabilistic model, which is a mental process. The claim recites converting the stochastic inverse into a discriminative model, which is a mental process. The claim recites converting the discriminative model into a deep neural network (DNN), which is a mental process.
The claim recites performing methodological dropout of the neurons, wherein the methodological dropout is performed in accordance with a predictivity based on historical statistical data relating to the neurons, which is a mental process. The claim recites wherein the DNN is decomposed by generating parallel and sequential execution schedules for memory sharing at sub-network precision levels of the one or more of the neural networks, which is a mental process. Thus, the claim recites an abstract idea.

Step 2A Prong 2, Step 2B: The additional element a graphics processor invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element trained using labeled data invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element training the DNN using labeled data invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element setting a bit-precision of weights in neurons of the DNN independently from one another invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.

Claims 9-14, dependent on Claim 8, incorporate the rejection of Claim 8. Claims 9-14 incorporate substantively all the limitations of Claims 2-7, respectively, in method form and are rejected under the same rationales.

Regarding Claim 15

Step 1: Claim 15 recites at least one non-transitory machine-readable medium comprising instructions that when executed by a computing device, cause the computing device to perform operations, and thus the claimed manufacture falls within a statutory category of invention.
Step 2A Prong 1: The claim recites learning a structure of a generative probabilistic model, which is a mental process. The claim recites determining, based on the structure, a stochastic inverse of the generative probabilistic model, which is a mental process. The claim recites converting the stochastic inverse into a discriminative model, which is a mental process. The claim recites converting the discriminative model into a deep neural network (DNN), which is a mental process. The claim recites performing methodological dropout of the neurons, wherein the methodological dropout is performed in accordance with a predictivity based on historical statistical data relating to the neurons, which is a mental process. The claim recites wherein the DNN is decomposed by generating parallel and sequential execution schedules for memory sharing at sub-network precision levels of the one or more of the neural networks, which is a mental process. Thus, the claim recites an abstract idea.

Step 2A Prong 2, Step 2B: The additional element a graphics processor invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element trained using labeled data invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element training the DNN using labeled data invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element setting a bit-precision of weights in neurons of the DNN independently from one another invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.
Claims 16-19, dependent on Claim 15, incorporate the rejection of Claim 15. Claims 16-19 incorporate substantively all the limitations of Claims 2-5, respectively, in non-transitory machine-readable medium form and are rejected under the same rationales.

Regarding Claim 20

Step 1: Regarding Claim 20, the rejection of Claim 15 is incorporated.

Step 2A Prong 1: The claim recites facilitating at least one of an end-to-end structure learning and a sub-network structure learning, which is a mental process. The claim recites facilitating feature bagging or coping with large scale data, which is a mental process. Thus, the claim recites an abstract idea.

Step 2A Prong 2, Step 2B: The additional element training large training sets invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element wherein the graphics processor is co-located with an application processor on a common semiconductor package invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.

Regarding Claim 21

Step 1: Claim 21 recites a system, and thus the claimed machine falls within a statutory category of invention.

Step 2A Prong 1: The claim recites learn a structure of a generative probabilistic model, which is a mental process. The claim recites determine, based on the structure, a stochastic inverse of the generative probabilistic model, which is a mental process. The claim recites convert the stochastic inverse into a discriminative model, which is a mental process. The claim recites convert the discriminative model into a deep neural network (DNN), which is a mental process.
The claim recites performing methodological dropout of the neurons, wherein the methodological dropout is performed in accordance with a predictivity based on historical statistical data relating to the neurons, which is a mental process. The claim recites wherein the DNN is decomposed by generating parallel and sequential execution schedules for memory sharing at sub-network precision levels of the one or more of the neural networks, which is a mental process. Thus, the claim recites an abstract idea.

Step 2A Prong 2, Step 2B: The additional element a memory; and a graphics processor communicably coupled to the memory invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element trained using labeled data invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element train the DNN using labeled data invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element setting a bit-precision of weights in neurons of the DNN independently from one another invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.

Claims 22-25, dependent on Claim 21, incorporate the rejection of Claim 21. Claims 22-25 incorporate substantively all the limitations of Claims 2-5, respectively, in system form and are rejected under the same rationales.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-3, 5, 6, 8-10, 12, 13, 15-17, 19, 21-23, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Patel et al. (US 2018/0082172, hereinafter "Patel") in view of Chai et al.
(US 2019/0258917 A1, hereinafter "Chai"), in view of Barrow et al., "Selective Dropout for Deep Neural Networks" (hereinafter "Barrow"), in view of Chen et al., "Training deep nets with sublinear memory cost" (hereinafter "Chen").

Regarding Claim 1, Patel teaches:

An apparatus comprising: a graphics processor (Patel, [0465]: "the computer system 1100 may include other devices, e.g., devices such as one or more graphics accelerators" and [0468]: "The computer system 1100 may be configured with a software infrastructure including an operating system, and perhaps also, one or more graphics APIs (such as OpenGL®, Direct3D, Java 3D™)") to:

learn a structure of a generative probabilistic model (Patel, Fig. 10, step 1010, "receive input that specifies a generative probabilistic model, wherein the generative probabilistic model characterizes a conditional probability distribution for measurement data given a set of latent variables," where Patel's conditional probability distribution corresponds to the instant structure, and where Patel's distribution is learned from measurement data, as in [0018]: "We answer these questions by developing a new probabilistic framework for deep learning based on a Bayesian generative probabilistic model that explicitly captures variation due to nuisance variables.
The graphical structure of the model enables it to be learned from data using classical expectation-maximization techniques");

determine, based on the structure, a stochastic inverse of the generative probabilistic model (Patel, [0271]: "Every generative model for labels and features p(c, x | θ_g) is naturally associated with a generative classifier defined by its class posterior, namely argmax_c p(c | x, θ_g)," where Patel's generative classifier is the inverse of the generative model, as it inverts the conditional probability of feature x, and stochastic as it may involve random features, as in [0020]: "the generative probabilistic model characterizes a conditional probability distribution for measurement data given a set of latent variables. The measurement data may be a random vector variable, whose components represent elements or features of interest in a given application");

convert the stochastic inverse into a discriminative model (Patel, [0280]: "we call the discriminative classifier p̃(c | x, θ_d = ρ(θ_g)) a discriminative counterpart (or relaxation) of the generative classifier," where Patel's relaxation corresponds to the instant convert);

convert the discriminative model into a deep neural network (DNN) (Patel, [0138]: "starting with a generative classifier with learning objective L_gen(θ), we ... arrive at a discriminative classifier with learning objective L_dis(η). We refer to this process as a discriminative relaxation of a generative classifier and the resulting classifier is a discriminative counterpart to the generative classifier" and [0134]: "applying this procedure to the generative DRM classifier (with constrained weights) yields the discriminative DCN [deep convolutional neural network] classifier," where Patel's DCN classifier corresponds to the instant DNN) that is trained using labeled data (Patel, [0208]: "DCNs are purely discriminative techniques and thus cannot benefit from unlabeled data.
However, armed with a generative model we can perform hybrid discriminative-generative training (31) that enables training to benefit from both labeled and unlabeled data in a principled manner. This should dramatically increase the power of pre-training, by encouraging representations of the input that have disentangled factors of variation. This hybrid generative-discriminative learning is achieved by the optimization of a novel objective function for learning, that relies on both the generative model and its discriminative relaxation");

and train the DNN using labeled data (Patel, Fig. 8, "Use ORM to do hybrid generative-discriminative learning that simultaneously incorporates labeled, unlabeled, and weakly labeled data" and Patel, [0208]: "This hybrid generative-discriminative learning is achieved by the optimization of a novel objective function for learning, that relies on both the generative model and its discriminative relaxation").

Patel does not explicitly teach that training the DNN comprises at least setting a bit-precision of weights in neurons of the DNN independently from one another. However, Chai teaches:

training the DNN comprises at least setting a bit-precision of weights in neurons of the DNN independently from one another (Chai, [0028]: "It is noted that bit precision can be adjusted on a layer-by-layer basis (such that a different number of bits is allocated to different layers), and need not be maintained at the same level for every layer of a neural network. ...
It is understood that trimming or otherwise adjusting bits (215) is a relative terminology, such that the process 215 can be generically used to define a method to adjust the appropriate number of bits during training phase," where Chai's layerwise bit precision corresponds to the instant a bit-precision of weights, and where Chai's neural network may be a DNN, as in [0007]: "The subbands can be fed with different weights to a neural network for training of a neural network with multiple hidden layers").

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Patel regarding training the DNN using labeled data with those of Chai regarding training the DNN comprising at least setting a bit-precision of weights in neurons of the DNN independently from one another. The motivation to do so would be to facilitate lower memory requirements for a trained network (Chai, [0028]: "the process 215 can be used to add additional bits to the subband in one specific layer, while simultaneously an overall number of memory for the network can be reduced (e.g., by reducing the number of parameters, nodes, and synaptic connections in the neural network)").

The Patel/Chai combination does not explicitly teach performing methodological dropout of the neurons, wherein the methodological dropout is performed in accordance with a predictivity based on historical statistical data relating to the neurons. However, Barrow teaches:

performing methodological dropout of the neurons (Barrow, Abstract: "We present 3 new alternative methods for performing dropout on a deep neural network ....
These methods select neurons to be dropped through statistical values calculated using a neuron's change in weight, the average size of a neuron's weights, and the output variance of a neuron"), wherein the methodological dropout is performed in accordance with a predictivity based on historical statistical data relating to the neurons (Barrow, p. 1, 1 Introduction: "We propose a new method of dropout that selectively chooses the best neurons (neurons which will have the biggest positive effect on the network if switched off) to be given a higher probability of being switched off on the assumption that dropout could be made more effective and efficient by not dropping neurons that should be forced to continue to learn," where Barrow's statistical values correspond to the instant historical statistical data and Barrow's selective choosing on an assumption corresponds to the instant predictivity).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the Patel/Chai combination regarding training the DNN using labeled data with those of Barrow regarding performing methodological dropout of the neurons, wherein the methodological dropout is performed in accordance with a predictivity based on historical statistical data relating to the neurons. The motivation to do so would be to improve model accuracy resulting from training (Barrow, Abstract: "increasing the probability of dropping neurons with smaller values of these statistics and decreasing the probability of those with larger statistics gave an improved result in training over 10,000 epochs. The most effective of these was found to be the Output Variance method, giving an average improvement of 1.17% accuracy over traditional dropout methods").
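For readers tracing the Barrow mapping, the general idea of biasing per-neuron drop probabilities by a historical statistic can be sketched as follows. This is a minimal illustration, not Barrow's published algorithm: the use of output variance follows Barrow's abstract, but the rank-normalization, the scaling around the base rate, and all function and parameter names here are assumptions for illustration.

```python
import numpy as np

def selective_dropout_mask(activation_history, base_rate=0.5):
    """Bias per-neuron drop probabilities by a historical statistic.

    Illustrative sketch (assumed scheme): the statistic is each neuron's
    output variance over past activations; neurons with a smaller
    statistic get a higher drop probability, as in Barrow's abstract.
    """
    variance = np.var(activation_history, axis=0)   # per-neuron statistic
    n = variance.size
    # Rank-normalize the statistic to [0, 1] (0 = smallest variance).
    ranks = np.argsort(np.argsort(variance)) / max(n - 1, 1)
    # Smaller statistic -> higher drop probability, centered on base_rate.
    drop_prob = np.clip(base_rate + 0.5 * (0.5 - ranks), 0.0, 1.0)
    keep = np.random.rand(n) >= drop_prob           # sample the dropout mask
    return keep, drop_prob

# Example: 100 recorded activation vectors for a layer of 8 neurons.
history = np.random.randn(100, 8) * np.linspace(0.1, 2.0, 8)
keep, p = selective_dropout_mask(history)  # low-variance neurons get higher p
```

The examiner's mapping hinges on exactly this structure: the statistic is computed from historical data, and the drop decision is a prediction about which neurons are most useful to switch off.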
The Patel/Chai/Barrow combination teaches training a DNN using labeled data by setting a bit-precision of weights in neurons of the DNN and performing methodological dropout of the neurons. The Patel/Chai/Barrow combination does not explicitly teach wherein the DNN is decomposed by generating parallel and sequential execution schedules for memory sharing at sub-network precision levels of the one or more of the neural networks. However, Chen teaches:

wherein the DNN is decomposed by generating parallel and sequential (Chen, p. 3, 3 Memory Optimization with Computation Graph: "When training a deep convolutional/recurrent network, a great proportion of the memory is usually used to store the intermediate outputs and gradients. ... A smart allocation algorithm is able to assign the least amount of memory to these nodes by sharing memory when possible. ... ¶ We can only share memory between the nodes whose lifetime do not overlap. ... One option is to construct the conflicting graph G with each variable as node and edges between variables with overlapping lifespan .... An inplace operation can happen when there is no other pending operations that depend on its input.
Memory sharing happens when a recycled tag is used by another node," where Chen's operations with non-overlapping lifetimes correspond to the instant sequential execution, and operations with overlapping lifetimes correspond to the instant parallel execution) execution schedules (Chen, Figure 2, depicting step-by-step memory allocation for both shared and unshared memory, including a "Final Memory Plan," where Chen's step-by-step corresponds to the instant execution schedule) for memory sharing (Chen, Figure 1, "Memory allocation for each output op, same color indicates shared memory") at sub-network precision levels of the one or more of the neural networks (Chen, Figure 1, "A Possible Allocation Plan," which depicts the various individual layers of a deep neural network during training, which correspond to the instant sub-network precision levels).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the Patel/Chai/Barrow combination regarding training a DNN using labeled data by setting a bit-precision of weights in neurons of the DNN and performing methodological dropout of the neurons with those of Chen regarding wherein the DNN is decomposed by generating parallel and sequential execution schedules for memory sharing at sub-network precision levels of the one or more of the neural networks. The motivation to do so would be to reduce the memory consumption of deep neural network training (Chen, p. 1, Abstract: "As many of the state-of-the-art models hit the upper bound of the GPU memory, our algorithm allows deeper and more complex models to be explored .... Our experiments show that we can reduce the memory cost of a 1,000-layer deep residual network from 48G to 7G on ImageNet problems. Similarly, significant memory cost reduction is observed in training complex recurrent neural networks on very long sequences").
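The "recycled tag" mechanism quoted from Chen — a buffer may be reused once no pending operation depends on it — can be illustrated with a toy allocator over a linear chain of layers. This is a simplified sketch under assumed unit-size buffers and a simple free list, not Chen's published algorithm (which operates on general computation graphs and also covers in-place operations and gradient checkpointing); the function and variable names are hypothetical.

```python
def plan_memory(num_layers, last_use):
    """Toy memory plan for a chain of layers: a buffer tag is recycled
    once its last consumer has executed, so outputs with non-overlapping
    lifetimes share memory (the core idea behind Chen's allocation plan)."""
    free = []       # tags of buffers whose contents are no longer needed
    live = {}       # layer -> tag, for outputs some later layer still needs
    plan = {}       # layer -> assigned buffer tag
    next_tag = 0    # next fresh buffer tag
    for layer in range(num_layers):
        # Recycle buffers with no remaining (pending) consumers.
        for prev in [p for p in live if last_use[p] < layer]:
            free.append(live.pop(prev))
        if free:
            tag = free.pop()    # share memory with a dead buffer
        else:
            tag = next_tag      # allocate a new buffer
            next_tag += 1
        plan[layer] = tag
        live[layer] = tag
    return plan, next_tag       # allocation plan and peak buffer count

# A 6-layer chain where each output is consumed only by the next layer:
last_use = {i: i + 1 for i in range(6)}
plan, peak = plan_memory(6, last_use)   # peak == 2: outputs alternate between two buffers
```

Because consecutive outputs here have overlapping lifetimes, they must occupy distinct buffers (the "parallel" case in the examiner's mapping), while outputs whose lifetimes have ended are reassigned sequentially, which is why six layers need only two buffers.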
Regarding Claim 8, Patel teaches: A method (Patel, [0134]: "Having motivated the distinction between the two types of models, in this section we will define a method for transforming one into the other that we call a discriminative relaxation") comprising: precisely those steps recited by Claim 1. Claim 8 is rejected under the same rationale as Claim 1. Regarding Claim 15, Patel teaches: At least one non-transitory machine-readable medium comprising instructions that when executed by a computing device, cause the computing device to perform operations (Patel, [0470]: "a non-transitory computer-readable memory medium may be configured so that it stores program instructions and/or data, where the program instructions, if executed by a computer system, cause the computer system to perform a method") comprising: precisely those steps recited by Claim 1. Claim 15 is rejected under the same rationale as Claim 1. Regarding Claim 21, Patel teaches: A system (Patel, [0457]: "FIG. 11 illustrates one embodiment of a computer system 1100 that may be used to perform any of the method embodiments described herein") comprising: a memory (Patel, [0458]: "Computer system 1100 may include a processing unit 1110, a system memory 1112, a set 1115 of one or more storage devices, a communication bus 1120, a set 1125 of input devices, and a display system 1230"); and a graphics processor communicably coupled to the memory (Patel, [0465]: "the computer system 1100 may include other devices, e.g., devices such as one or more graphics accelerators" and [0468]: "The computer system 1100 may be configured with a software infrastructure including an operating system, and perhaps also, one or more graphics APIs (such as OpenGL®, Direct3D, Java 3DTM)"), the graphics processor to: perform precisely those steps recited by Claim 1. Claim 21 is rejected under the same rationale as Claim 1. Regarding Claim 2, the rejection of Claim 1 is incorporated. 
The Patel/Chai/Barrow/Chen combination teaches: wherein the generative model is unsupervised and based on unlabeled data, and wherein the discriminative model is supervised and based on labeled data (Patel, [0208]: "DCNs are purely discriminative techniques and thus cannot benefit from unlabeled data. However, armed with a generative model we can perform hybrid discriminative-generative training (31) that enables training to benefit from both labeled and unlabeled data in a principled manner"), wherein the discriminative model is learned from the generative probabilistic model (Patel, [0290]: "When the data come from a generative model, the corresponding discriminative relaxation of the generative classifier will learn it, given enough data"). Claims 9, 16, and 22 incorporate substantively all the limitations of Claim 2 in method, non-transitory machine-readable medium, and system forms, respectively, and are rejected under the same rationale. Regarding Claim 3, the rejection of Claim 1 is incorporated. The Patel/Chai/Barrow/Chen combination teaches: wherein the graphics processor is further to inverse the generative probabilistic model into multiple inverse models (Patel, [0280]: "Suppose that a set of parameters θ_d = ρ(θ_g) is sufficient for computing the class posterior: i.e. the function p(c|x, θ_g) depends on θ_g only through the (potentially smaller) set of parameters θ_d. Then we call the discriminative classifier p̃(c|x, θ_d = ρ(θ_g)) a discriminative counterpart (or relaxation) of the generative classifier. 
We denote the relationship between the generative classifier and its discriminative relaxation as p →_d p̃," where Patel's generative and discriminative classifiers are both inverse models of the generative probabilistic model, given as [0269]: "we can define a generative model for labels and features p(c, x|θ_g), where x contains all features except the class labels"), wherein a bidirectional connection is added (Patel, [0065]: "This section develops the RM, a generative probabilistic model that explicitly captures nuisance transformations as latent variables. ... Finally, we show that, after the application of a discriminative relaxation, inference and learning in the DRM correspond to feedforward propagation and back propagation training in the DCN," where Patel's feedforward and backpropagation corresponds to the instant bidirectional connection) to connect latent variables having a common parent in each of the multiple inverse models (Patel, Figs. 2B and 2C, depicting latent variables of the Max-Sum Message and Input Feature Map layers of the generative and discriminative networks, which have a common input) to consolidate the multiple inverse models into a single inverse model (Patel, Fig. 2C, depicting a DCN, a consolidation of the generative and discriminative models, as in [0018]: "by relaxing the generative model to a discriminative one, we can recover two of the current leading deep learning systems, deep convolutional neural networks (DCNs) and random decision forests"). Claims 10, 17, and 23 incorporate substantively all the limitations of Claim 3 in method, non-transitory machine-readable medium, and system forms, respectively, and are rejected under the same rationale. Regarding Claim 5, the rejection of Claim 1 is incorporated. 
The Patel/Chai/Barrow/Chen combination teaches: perform on-the-fly learning and updating of network topologies of the neural networks based on at least one of currently available data and historically available data relating to the topologies of the neural networks (Patel, [0203]: "consider the problem of determining the number of filters in a convolutional layer for a DCN. ... For our first prototypes, we will focus on the AIC and BIC scoring algorithms (45), which reward a trained model's goodness-of-fit (e.g. log-likelihood) and penalize its complexity (e.g. number of parameters)" and [0204]: "We will use AIC criterion to score models with different filter sizes per layer and pick the best one," where Patel's AIC criterion corresponds to the instant currently available data). Claims 12, 19, and 25 incorporate substantively all the limitations of Claim 5 in method, non-transitory machine-readable medium, and system forms, respectively, and are rejected under the same rationale. Regarding Claim 6, the rejection of Claim 1 is incorporated. The Patel/Chai/Barrow/Chen combination teaches: facilitate at least one of an end-to-end structure learning and a sub-network structure learning (Patel, [0203]: "consider the problem of determining the number of filters in a convolutional layer for a DCN. … [W]e will focus on the AIC and BIC scoring algorithms (45), which reward a trained model's goodness-of-fit (e.g. log-likelihood) and penalize its complexity (e.g. number of parameters)," where Patel's determining the number of filters corresponds to the instant sub-network structure learning); and facilitate feature bagging or coping with large scale data by training large training sets (Patel, [0190]: "most of the limitations of the DCN framework can be traced back to the fact that it is a discriminative classifier whose underlying generative model was not known. ... Finally, it is unable to learn from unlabeled data and to generalize from few examples. 
As a result, DCNs require enormous amounts of labeled data for training" and [0191]: "These limitations can be overcome by designing new deep networks based on new model structures (extended DRMs), new message-passing inference algorithms, and new learning rules, as summarized in Table 2," where Patel's methods avoid requirements for large-scale labeled training data). Claim 13 incorporates substantively all the limitations of Claim 6 in method form and is rejected under the same rationale. Claims 7, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Patel et al. (US 2018/0082172, hereinafter "Patel") in view of Chai, et al. (US 2019/0258917 A1, hereinafter "Chai") in view of Barrow et al., "Selective Dropout for Deep Neural Networks" (hereinafter "Barrow") in view of Chen, et al. "Training deep nets with sublinear memory cost" (hereinafter "Chen") in view of Foley, et al., "A Low-Power Integrated x86-64 and Graphics Processor for Mobile Computing Devices" (hereinafter "Foley"). Regarding Claim 7, the rejection of Claim 1 is incorporated. The Patel/Chai/Barrow/Chen combination teaches an apparatus comprising a graphics processor to learn a structure of a generative probabilistic model. The Patel/Chai/Barrow/Chen combination does not explicitly teach the graphics processor is co-located with an application processor on a common semiconductor package. However, Foley teaches: the graphics processor is co-located with an application processor on a common semiconductor package (Foley, Fig. 4, depicting a CPU and GPU on an integrated die). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the Patel/Chai/Barrow/Chen combination regarding an apparatus comprising a graphics processor to learn a structure of a generative probabilistic model with those of Foley regarding the graphics processor being co-located with an application processor on a common semiconductor package. The motivation to do so would be to enable the apparatus to operate with reduced memory latency, improved request ordering, and reduced area and power (Foley, p. 222, VI. Fusion Basics: "The traditional model of a processor chip (with integrated NB) coupled with an integrated graphics processor has a number of shortfalls. The high-speed PHY coupling the two processors (shown in red in Fig. 4) occupies significant area and consumes power. ... Additionally, the link may present a bandwidth bottleneck. When the two dies are integrated, a wide (256 bits in each direction) data path from the graphics memory controller to the NB is added, allowing for full access to system memory from the GMC. This path provides GMC clients with a low latency path to non-snooped regions of system memory, reducing the minimum read latency by up to 40%. Compared to two-chip solutions, use of the on-die integrated GPU significantly reduces memory latency, improves request ordering, and reduces area and power"). Claim 14 incorporates substantively all limitations of Claim 7 in method form and is rejected under the same rationale. Regarding Claim 20, the rejection of Claim 15 is incorporated. The Patel/Chai/Barrow/Chen combination teaches: facilitate at least one of an end-to-end structure learning and a sub-network structure learning (Patel, [0203]: "consider the problem of determining the number of filters in a convolutional layer for a DCN. … [W]e will focus on the AIC and BIC scoring algorithms (45), which reward a trained model's goodness-of-fit (e.g. 
log-likelihood) and penalize its complexity (e.g. number of parameters)," where Patel's determining the number of filters corresponds to the instant sub-network structure learning); and facilitate feature bagging or coping with large scale data by training large training sets (Patel, [0190]: "most of the limitations of the DCN framework can be traced back to the fact that it is a discriminative classifier whose underlying generative model was not known. ... Finally, it is unable to learn from unlabeled data and to generalize from few examples. As a result, DCNs require enormous amounts of labeled data for training" and [0191]: "These limitations can be overcome by designing new deep networks based on new model structures (extended DRMs), new message-passing inference algorithms, and new learning rules, as summarized in Table 2," where Patel's methods avoid requirements for large-scale labeled training data). The Patel/Chai/Barrow/Chen combination does not explicitly teach the graphics processor is co-located with an application processor on a common semiconductor package. However, Foley teaches: the graphics processor is co-located with an application processor on a common semiconductor package (Foley, Fig. 4, depicting a CPU and GPU on an integrated die). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the Patel/Chai/Barrow/Chen combination regarding an apparatus comprising a graphics processor to learn a structure of a generative probabilistic model with those of Foley regarding the graphics processor being co-located with an application processor on a common semiconductor package. The motivation to do so would be to enable the apparatus to operate with reduced memory latency, improved request ordering, and reduced area and power (Foley, p. 222, VI. 
Fusion Basics: "The traditional model of a processor chip (with integrated NB) coupled with an integrated graphics processor has a number of shortfalls. The high-speed PHY coupling the two processors (shown in red in Fig. 4) occupies significant area and consumes power. ... Additionally, the link may present a bandwidth bottleneck. When the two dies are integrated, a wide (256 bits in each direction) data path from the graphics memory controller to the NB is added, allowing for full access to system memory from the GMC. This path provides GMC clients with a low latency path to non-snooped regions of system memory, reducing the minimum read latency by up to 40%. Compared to two-chip solutions, use of the on-die integrated GPU significantly reduces memory latency, improves request ordering, and reduces area and power"). 

Conclusion

Claims 4, 11, 18, and 24 are rejected as reciting subject matter that is patent ineligible. After a thorough search of the art, no reference was found to teach or fairly suggest: wherein the graphics processor is further to convert the inverse model into the discriminative model by removing the bidirectional connection and adding a class node serving as a child node to latent leaves. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. 
In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT N DAY whose telephone number is (703)756-1519. The examiner can normally be reached M-F 9-5. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /R.N.D./Examiner, Art Unit 2122 /KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122

Prosecution Timeline

Nov 08, 2022
Application Filed
Aug 20, 2025
Non-Final Rejection — §101, §103
Nov 12, 2025
Response Filed
Feb 09, 2026
Final Rejection — §101, §103
Mar 31, 2026
Applicant Interview (Telephonic)
Mar 31, 2026
Examiner Interview Summary
Apr 08, 2026
Response after Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12406181
METHOD, DEVICE, AND COMPUTER PROGRAM PRODUCT FOR UPDATING MODEL
2y 5m to grant Granted Sep 02, 2025
Patent 12229685
MODEL SUITABILITY COEFFICIENTS BASED ON GENERATIVE ADVERSARIAL NETWORKS AND ACTIVATION MAPS
2y 5m to grant Granted Feb 18, 2025


Prosecution Projections

3-4
Expected OA Rounds
23%
Grant Probability
46%
With Interview (+23.2%)
4y 3m
Median Time to Grant
Moderate
PTA Risk
Based on 22 resolved cases by this examiner. Grant probability derived from career allow rate.
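The headline projections above follow from simple arithmetic on the examiner's career data. A minimal sketch of the likely derivation, assuming the interview lift is applied additively (the tool's exact model is not disclosed):

```python
# Reconstructing the dashboard's projection figures from its stated inputs.
# Assumes the +23.2% interview lift is simply added to the base rate.
granted, resolved = 5, 22      # "5 granted / 22 resolved"
interview_lift = 0.232         # "+23.2% Interview Lift"

base_grant_prob = granted / resolved            # career allow rate
with_interview = base_grant_prob + interview_lift

print(f"{base_grant_prob:.0%}")   # 23%
print(f"{with_interview:.0%}")    # 46%
```

Both printed values match the dashboard's "Grant Probability" and "With Interview" figures, which supports the additive-lift reading.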
