Last updated: May 29, 2026
Application No. 18/322,218
ON-THE-FLY DEEP LEARNING IN MACHINE LEARNING AT AUTONOMOUS MACHINES

Final Rejection §103
Filed: May 23, 2023
Priority: May 05, 2017 (provisional 62/502,294, +4 more)
Examiner: HAUSMANN, MICHELLE M
Art Unit: 2671
Tech Center: 2600 (Communications)
Assignee: Intel Corporation
OA Round: 6 (Final)
Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 76% grant rate with a +21.3% interview lift. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 870 resolved cases, 2023–2026
Examiner Intelligence

HAUSMANN, MICHELLE M
Career Allowance Rate: 76% (663 granted / 870 resolved), above average, +14.2% vs TC avg
Interview Lift: +21.3% (strong), comparing resolved cases with and without an interview
Avg Prosecution (typical timeline): 3y 0m; 22 currently pending
Total Applications (career history): 895 across all art units
Statute-Specific Performance

§101: 1.1% (-38.9% vs TC avg)
§103: 94.8% (+54.8% vs TC avg)
§102: 0.6% (-39.4% vs TC avg)
§112: 0.9% (-39.1% vs TC avg)
Based on career data from 870 resolved cases
Office Action

§103
DETAILED ACTION
Response to Amendment
Claims 1-20 are pending. Claims 2, 11, and 17 are amended.
Response to Arguments
Applicant’s arguments, see page 7, filed 11 March 2026, with respect to the 35 USC 112(a) rejections of claims 2, 11, and 17, and the priority note, along with the accompanying amendments received on the same date, have been fully considered and are persuasive. The 35 USC 112(a) rejections have been withdrawn.
Applicant's remaining arguments have been fully considered but they are not persuasive. 
Applicant argues on page 9: This describes architectural connections between sub-networks, not the extraction of features learned by a first DNN for use in training a second DNN. Matsuda further explains that "the process of training the DNN formed by connecting independent sub-network 120 and dependent sub-network 122 using Japanese training data, and the process of training the DNN formed by connecting independent sub-network 120 and dependent sub-network 124 using English training data are repeated alternately." Matsuda et al., paragraph [0044]. This describes simultaneous training of connected sub-networks using training data, not extracting a feature learned by a first DNN and using that extracted feature to train a second DNN.
Matsuda et al. disclose "extracting, via the deep learning framework, a feature learned by a first deep neural network (DNN) model": In this respect, for image recognition, if there is any category that can clearly distinguish objects, learning of DNNs for image recognition can efficiently be done category by category in place of the languages of the examples above, using the present invention, [0084], training a first DNN formed by connecting the second sub-network to an output side of the first sub-network with training data belonging to the first category, [0011], the computer storing a category-independent sub-network used commonly for the plurality of categories, [0014], separating the first sub-network from other sub-networks and storing it as a category-independent sub-network in a storage medium, [0017], connecting it to the output stage of independent sub-network 230, user already has an independent sub-network, [0047] [indicates user independent], fixing independent sub-network 120, the DNN consisting of independent sub-network 120, [0048]
The interpretation that Matsuda et al. must use features is supported by the following: features common to multiple languages ([0035]); “For example, detection of basic features of images such as edge detection as a basis for image recognition is conducted commonly, regardless of the nature of objects. On the other hand, identification of specific objects in images is considered to be based on higher features. Therefore, it is possible to form DNNs for identifying objects in images by a sub-network independent of any category of images (independent sub-network) and sub-networks dependent on categories (dependent sub-networks) dependent on categories prepared for each category of images” ([0085]); and the general knowledge of neural networks that some sort of “feature,” when interpreted very broadly, must be used for training. It is likely the applicant means something more specific by the word “features”; however, in the very broad form claimed, this is currently indicated by Matsuda. The order of the networks is not just architecture; the order informs how the data from one network is used by another.
Applicant argues on pages 9-10: Additionally, while Matsuda mentions "image recognition" in paragraphs [0001], [0009], and [0032], these are merely generic boilerplate statements with no technical disclosure. Paragraph [0001] states that "DNNs have been applied to many applications such as image recognition and speech recognition," which is a generic statement only. Matsuda et al., paragraph [0001]. Paragraph [0032] states that "[t]hough the embodiments below mainly relate to speech recognition, application of the present invention is not limited thereto. By way of example, the present invention is also applicable to image recognition." Matsuda et al., paragraph [0032]. The entire detailed embodiment in Matsuda focuses on multi-language speech recognition (Japanese, English, Chinese) with no technical details about how the invention would be applied to image recognition or computer vision.
Examiner notes the claims do not demonstrate through the claim language how the neural networks are specific for image processing. Therefore, in the broad sense claimed, Matsuda indicates computer vision by stating: the DNNs may be context-dependent DNNs or context-independent DNNs, [0016], camera(s) to provide device optimized functions such as speech recognition, image recognition and search, and speech synthesis, [0017], personal video recorders, [0020], first neural network, [0092]. While the reference does not disclose a video frame, it is implied through the use of video recorder data and a camera that is used for image recognition purposes.
Applicant argues on page 10: This describes architectural connections between neural networks for maintaining history, not extracting learned features from one network to train another.
He et al. also teach "extracting, via the deep learning framework, a feature learned by a first deep neural network (DNN) model": (DNNs may be context-dependent DNNs or context-independent DNNs, [0016], camera(s) to provide device optimized functions such as speech recognition, image recognition and search, and speech synthesis, [0017], personal video recorders, [0020], first neural network, [0092]).
That this means features are learned is supported by: A processing unit can acquire datasets from respective data sources, each having a respective unique data domain. The processing unit can determine values of a plurality of features based on the plurality of datasets. The processing unit can modify input-specific parameters or history parameters of a computational model based on the values of the features. In some examples, the processing unit can determine an estimated value of a target feature based at least in part on the modified computational model and values of one or more reference features. In some examples, the computational model can include neural networks for several input sets. An output layer of at least one of the neural networks can be connected to the respective hidden layer(s) of one or more other(s) of the neural networks. In some examples, the neural networks can be operated to provide transformed feature value(s) for respective times (abstract); “Accordingly, in some examples, the computational model 224 includes a first neural network (e.g., neural network 402(3)) and a second neural network (e.g., neural network 402(3)) having respective input layers 404, respective hidden layers 406, and respective output layers 408. The estimating module 236 can be configured to determine the estimated value of the target feature based at least in part on an output of the hidden layer 406(3) of the second neural network 402(3) and to adjust the determined estimated value of the target feature based at least in part on an output of the hidden layer 406(2) of the first neural network 402(2)” ([0092]); At block 604, values of a plurality of features can be determined based at least in part on the plurality of datasets. Some examples are described above with reference to the extraction module 232 ([0096]). Again, while the applicant likely has more details in the specification of what exactly the features are, as written, Matsuda and He teach these limitations.
Applicant argues on page 10: Dong teaches GPU acceleration primitives but does not teach extracting a feature learned by a first DNN and using that extracted feature to train a second DNN. Examiner notes Dong is only used to teach GPU acceleration primitives. Together with Matsuda and He however, the limitation is taught in full: As Dong et al. teach a graphics execution environment including instructions to provide a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors of the computing device or the deep learning framework is to provide machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors, and training a DNN model includes executing one or more primitives provided by the deep learning framework to cause the one or more general-purpose graphics processors to perform operations to train a DNN model, and Matsuda et al. and He et al. disclose/teach training a second DNN, in combination Matsuda et al. and He et al. and Dong et al. teach a graphics execution environment including instructions to provide a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors of the computing device or the deep learning framework is to provide machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors, and training the second DNN model includes executing one or more primitives provided by the deep learning framework to cause the one or more general-purpose graphics processors to perform operations to train the second DNN model.
Applicant’s arguments with respect to claim(s) 2, 11, and 17 have been considered but are moot because the new ground of rejection does not rely on the combination of references applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 4, 10, 12, 14, 15, 16, and 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Matsuda et al. (US 20160110642 A1) in view of He et al. (US 20160379112 A1) in view of Dong et al. (“DNNMark: A Deep Neural Network Benchmark Suite for GPUs” February, 2017).

Regarding claims 1, 10, and 16, Matsuda et al. disclose a data processing system on a computing device, the data processing system comprising: method comprising, and non-transitory machine-readable medium storing instructions which, when executed by one or more processors including one or more general-purpose graphics processors, cause the one or more processors to perform operations comprising: one or more storage devices comprising a graphics execution environment including instructions to provide a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors of the computing device (acceleration of DNN learning for a specific application, [0001]), the deep learning framework to cause the one or more general-purpose graphics processors to perform operations comprising: extracting, via the deep learning framework, a feature learned by a first deep neural network (DNN) model, wherein the first DNN model is a pre-trained DNN model for computer vision to enable user-independent classification of an object within an input video frame (image recognition, [0001], [0032], In this respect, for image recognition, if there is any category that can clearly distinguish objects, learning of DNNs for image recognition can efficiently be done category by category in place of the languages of the examples above, using the present invention, [0084], training a first DNN formed by connecting the second sub-network to an output side of the first sub-network with training data belonging to the first category, [0011], the computer storing a category-independent sub-network used commonly for the plurality of categories, [0014], separating the first sub-network from other sub-networks and storing it as a category-independent sub-network in a storage medium, [0017], connecting it to the output stage of independent sub-network 230, user already has an independent sub-network, [0047] [indicates user independent], fixing independent sub-network 120, the DNN consisting of independent sub-network 120, [0048]); and training, via the deep learning framework, a second DNN model for computer vision based on the extracted feature and a dataset including user-dependent data, the second DNN model an update of the first DNN model (training a second DNN formed by connecting the third sub-network to an output side of the first sub-network with training data belonging to the second category, and thereby realizing learning of the first and second DNNs, [0011], a category-dependent sub-network used for a specific category, computer training the sub-network used for a specific category using training data belonging to the specific category while fixing parameters of the category-independent sub-network, [0014], a deep neural network training device, training a first deep neural network formed by connecting the second sub-network to an output side of the first sub-network with training data belonging to the first category, and training a second deep neural network formed by connecting the third sub-network to an output side of the first sub-network with training data belonging to the second category, and thereby realizing training of the first and second deep neural networks, [0018], obtaining dependent sub-network 124 for English, [0047], DNN, not-yet-learned dependent sub-network of a new language (for example, Chinese) (dependent sub-network for Chinese) 234 is connected to the output side of independent sub-network 120, [0048]). Matsuda et al. partly disclose the deep learning framework is to provide machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors, and training the second DNN model includes executing one or more primitives provided by the deep learning framework to cause the one or more general-purpose graphics processors to perform operations to train the second DNN model (various programming tool kits or program library installed in computer 340 [0065]).
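
For illustration only, the arrangement cited from Matsuda et al. at [0014] and [0048] (a category-dependent sub-network trained on category-specific data while the parameters of the category-independent sub-network stay fixed) can be sketched as follows. This is a minimal sketch and not code from any cited reference: the function names, the toy weights, the toy dataset, and the squared-error update rule are all assumptions chosen for brevity.

```python
def shared_forward(x, w_shared):
    """Category-independent sub-network (assumed: one linear layer plus ReLU)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_shared]


def head_forward(h, w_head):
    """Category-dependent sub-network (assumed: a single linear output unit)."""
    return sum(w * hi for w, hi in zip(w_head, h))


def train_head(data, w_shared, w_head, lr=0.05, epochs=200):
    """Update only the dependent sub-network; w_shared is read but never written."""
    w_head = list(w_head)
    for _ in range(epochs):
        for x, y in data:
            h = shared_forward(x, w_shared)
            err = head_forward(h, w_head) - y
            # Gradient step on squared error with respect to the head weights only.
            w_head = [w - lr * err * hi for w, hi in zip(w_head, h)]
    return w_head


# Hypothetical "already trained" shared weights and a toy new-category dataset.
w_shared = [[1.0, 0.0], [0.0, 1.0]]
frozen_copy = [row[:] for row in w_shared]
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]

w_head = train_head(data, w_shared, [0.0, 0.0])
```

In this sketch the shared weights are identical before and after training, which is the property the cited passages describe as fixing the independent sub-network.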

Matsuda et al. do not explicitly disclose a graphics execution environment including instructions to provide a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors of the computing device, and do not entirely teach user dependent/independent. Matsuda et al. do not explicitly disclose the deep learning framework is to provide machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors, and training the second DNN model includes executing one or more primitives provided by the deep learning framework to cause the one or more general-purpose graphics processors to perform operations to train the second DNN model. Matsuda et al. do not explicitly disclose train the second DNN model based on the extracted feature and the dataset including the user-dependent data.

He et al. teach a data processing system on a computing device, the data processing system comprising: one or more storage devices comprising a graphics execution environment including instructions to provide a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors of the computing device (parallelize the training of the DNNs across multiple processing units, e.g., cores of a multi-core processor or multiple general-purpose graphics processing units (GPGPUs), [0016], Processing unit(s) 112 can be or include one or more single-core processors, multi-core processors, CPUs, GPUs, GPGPUs, or hardware logic components configured, e.g., via specialized programming from modules or APIs, to perform functions described herein, processing units 112 in computing device 102(3) can be a combination of one or more GPGPUs and one or more FPGAs, [0032]), the deep learning framework to cause the one or more general-purpose graphics processors to perform operations comprising: extracting, via the deep learning framework, a feature learned by a first deep neural network (DNN) model, wherein the first DNN model is a pre-trained DNN model for computer vision to enable user-independent classification of an object within an input video frame (DNNs may be context-dependent DNNs or context-independent DNNs, [0016], camera(s) to provide device optimized functions such as speech recognition, image recognition and search, and speech synthesis, [0017], personal video recorders, [0020], first neural network, [0092]); and training, via the deep learning framework, a second DNN model for computer vision based on the extracted feature and a dataset including user-dependent data (DNNs may be context-dependent DNNs or context-independent DNNs, [0016], determine the estimated value of the target feature based at least in part on an output of the hidden layer 406(3) of the second neural network 402(3) and to adjust the determined estimated value of the target feature based at least in part on an output of the hidden layer 406(2) of the first neural network 402(2), [0092]).

As a user can be a type of context, an interpretation partly supported by, for instance, paragraph 73 of He et al., the context-independent and context-dependent networks are interpreted as the user-independent and user-dependent networks.

Matsuda et al. and He et al. are in the same art of deep neural networks (Matsuda et al., [0001]; He et al., [0016]). The combination of He et al. with Matsuda et al. will enable the use of context-independent and context-dependent DNNs. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the context-independent and context-dependent DNNs of He et al. with the invention of Matsuda et al. as this was known at the time of filing, the combination would have predictable results, and as He et al. indicate, “Various DNN training and operation techniques described herein can permit more efficiently analyzing data from disparate data sources.  Various examples can provide more effective ongoing training of neural networks, e.g., based on sensor readings, providing improved accuracy with reduced computational power compared to repeatedly retraining the neural networks.  Various examples operate multiple neural networks, permitting the operation of those neural networks to be carried out in parallel.  This parallel operation can permit operating the neural network with reduced computational load and memory requirements compared to operating a monolithic neural network” ([0148]), indicating the accuracy and efficiency advantage to having the context dependent and independent DNNs of He incorporated into the DNN configuration of Matsuda.

Matsuda et al. and He et al. do not explicitly disclose a graphics execution environment including instructions to provide a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors of the computing device or the deep learning framework is to provide machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors, and training the second DNN model includes executing one or more primitives provided by the deep learning framework to cause the one or more general-purpose graphics processors to perform operations to train the second DNN model. Matsuda et al. and He et al. do not explicitly disclose train the second DNN model based on the extracted feature and the dataset including the user-dependent data.

Dong et al. teach a graphics execution environment including instructions to provide a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors of the computing device (execution behavior on a Nvidia K40 GPU, abstract, “We can leverage a Graphic Processing Unit (GPU), versus relying on a CPU, to accelerate DNN processing… Thus, heterogeneous computer systems composed of both CPUs and GPUs are becoming the defacto standard platform for deep learning algorithms. In such systems, the computation associated with each layer in the DNN is offloaded to the GPU device”, part 1, A number of research studies have reported 2 to 3 orders of speedup when moving their application from a CPU to a GPU, part 2.2) the deep learning framework is to provide machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors (“CuDNN [3] is a highly optimized CUDA library that provides primitives for computing deep neural networks. Given that this library can produce high performance and has been widely used in several deep learning frameworks, e.g. Caffe, we consider elements of cuDNN in our proposed benchmark suite. In addition to that, cuDNN provides a selection of algorithms and modes for certain primitives such as convolution, pooling, local response normalization, and etc., cuBLAS [7] is an implementation of BLAS (Basic Linear Algebra Subprograms) that run on top of the CUDA runtime. Similar to cuDNN, it is a finely-tuned library that provides API functions for matrix or vector operations”, part 2.3), and training the DNN model includes executing one or more primitives provided by the deep learning framework to cause the one or more general-purpose graphics processors to perform operations to train the DNN model (In this paper, we present DNNMark, a GPU benchmark suite that consists of a collection of deep neural network primitives, covering a rich set of GPU computing patterns, abstract, each of DNN primitive workloads can be easily invoked separately, without any sacrifice on configurability, part 1, CuDNN [3] is a highly optimized CUDA library that provides primitives for computing deep neural networks, forward and backward functions are where the actual deep learning computation takes place, part 2.3, Training and inference, Training deep neural networks, detailed mechanism for training can be further divided into a forward and a backward propagation, part 2.1.1, SUPPORTED DNN PRIMITIVES, part 3, convolution algorithm, part 3.1, Pooling, part 3.2, Local Response Normalization, part 3.3, Activation, part 3.4) [convolution, pooling, normalization, activation are all “executable” primitives].
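
As a point of reference for the primitives named in the citation above (convolution, activation, pooling), the following is a minimal CPU-only sketch of what each primitive computes in its forward pass. It does not use or reproduce cuDNN or DNNMark; the function names, the 5x5 input, and the 2x2 filter are hypothetical and chosen only so the result is easy to check by hand.

```python
def conv2d(image, kernel):
    """Valid cross-correlation, the operation deep learning libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Element-wise activation: negative responses are clamped to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling with stride 2."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


# Hypothetical input and filter, not taken from any cited reference.
image = [[1, 2, 3, 0, 1],
         [0, 1, 2, 3, 0],
         [1, 0, 1, 2, 3],
         [2, 1, 0, 1, 2],
         [3, 2, 1, 0, 1]]
kernel = [[1, -1],
          [-1, 1]]

features = max_pool2x2(relu(conv2d(image, kernel)))
```

A GPU library would implement each of these three functions as an accelerated kernel; the sketch only shows the arithmetic that such a primitive performs.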

As Dong et al. teach a graphics execution environment including instructions to provide a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors of the computing device or the deep learning framework is to provide machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors, and training a DNN model includes executing one or more primitives provided by the deep learning framework to cause the one or more general-purpose graphics processors to perform operations to train a DNN model, and Matsuda et al. and He et al. disclose/teach training a second DNN, in combination Matsuda et al. and He et al. and Dong et al. teach a graphics execution environment including instructions to provide a deep learning framework to accelerate deep learning operations via one or more general-purpose graphics processors of the computing device or the deep learning framework is to provide machine learning primitives accelerated via instructions executed by the one or more general-purpose graphics processors, and training the second DNN model includes executing one or more primitives provided by the deep learning framework to cause the one or more general-purpose graphics processors to perform operations to train the second DNN model.

Matsuda et al. and He et al. and Dong et al. are in the same art of deep neural networks (Matsuda et al., [0001]; He et al., [0016]; Dong et al., abstract). The combination of Dong et al. with Matsuda et al. and He et al. will enable the use of a library of primitives. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the primitives of Dong et al. with the invention of Matsuda et al. and He et al. as this was known at the time of filing, the combination would have predictable results, and as Dong et al. indicate “In this paper, we present DNNMark, a GPU benchmark suite that consists of a collection of deep neural network primitives, covering a rich set of GPU computing patterns. This suite is designed to be a highly configurable, extensible, and flexible framework, in which benchmarks can run either individually or collectively” (abstract) indicating the customizability and flexibility benefit when Dong et al. is incorporated into the invention of Matsuda et al. and He et al.

Regarding claims 3, 14 and 19, Matsuda et al., He et al., and Dong et al. disclose the data processing system, method, and CRM of claims 1, 10, and 16. Matsuda et al. and Dong et al. further disclose machine learning primitives includes primitives to perform tensor convolution, at least one activation function, and a pooling operation (Matsuda et al. further disclose various programming tool kits or program library installed in computer 340 [0065]; Dong et al., tensor data, part 2.1, supported DNN primitives: convolution, activation, pooling, part 3-3.4).

Regarding claims 4, 15, and 20, Matsuda et al., He et al., and Dong et al. disclose the data processing system, method, and CRM of claims 3, 14, and 19. Dong et al. further teach the machine learning primitives includes primitives to implement basic linear algebra subprograms associated with respective layers of the second DNN model, the respective layers including a fully connected layer (Multiple neurons and activation functions can be grouped together to apply a series of the transformations on the input data. Several groups of data transformations can be concatenated to form a fully-connected feed forward network, All nodes are fully connected through edges associated with a set of learnable weights and biases, part 2.1, cuBLAS [7] is an implementation of BLAS (Basic Linear Algebra Subprograms) that run on top of the CUDA runtime, Our benchmark suite leverages this library to implement computation of the fully-connected layer, as the computation can be easily represented as a matrix-matrix multiplication, part 2.3) [As already taught above, Matsuda and He disclose a second DNN: Matsuda, second DNN, [0048], He et al., second DNN, [0092]].
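
The statement quoted from Dong et al., part 2.3, that a fully-connected layer "can be easily represented as a matrix-matrix multiplication" can be illustrated with a short sketch. The code below is a naive pure-Python stand-in for a BLAS GEMM call, not the cuBLAS API; the function names and the example batch, weights, and bias are assumptions for illustration.

```python
def gemm(a, b):
    """Naive matrix product C = A x B (the operation a BLAS GEMM routine provides)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def fully_connected(batch, weights, bias):
    """Fully connected layer over a batch of row vectors: Y = X x W + b."""
    return [[v + bj for v, bj in zip(row, bias)] for row in gemm(batch, weights)]


# Hypothetical values: a batch of two 2-dimensional inputs mapped to 3 outputs.
batch = [[1.0, 2.0],
         [3.0, 4.0]]
weights = [[1.0, 0.0, -1.0],
           [0.5, 1.0, 0.0]]
bias = [0.0, 1.0, 2.0]

outputs = fully_connected(batch, weights, bias)
# outputs == [[2.0, 3.0, 1.0], [5.0, 5.0, -1.0]]
```

Each row of the batch is one input sample, so the whole batch passes through the layer in a single matrix product, which is why such layers map naturally onto a linear algebra library.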

Regarding claims 12 and 18, Matsuda et al., He et al., and Dong et al. disclose the data processing method and CRM of claims 10 and 17. Matsuda et al. and He et al. further indicate detecting an output associated with the first DNN model; generating training data based on the output associated with the first DNN; and training the second DNN model based on the training data (Matsuda, training a first DNN formed by connecting the second sub-network to an output side of the first sub-network with training data belonging to the first category, [0011]; He, determine the estimated value of the target feature based at least in part on an output of the hidden layer 406(3) of the second neural network 402(3) and to adjust the determined estimated value of the target feature based at least in part on an output of the hidden layer 406(2) of the first neural network 402(2), [0092]).

Claim(s) 2, 11, and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Matsuda et al. (US 20160110642 A1) and He et al. (US 20160379112 A1) and Dong et al. (“DNNMark: A Deep Neural Network Benchmark Suite for GPUs” February, 2017) as applied to claims 1, 10 and 16 above, further in view of Georgescu et al. (US 20160174902 A1).

Regarding claims 2, 11, and 17, Matsuda et al., He et al., and Dong et al. disclose the data processing system, method, and CRM as in claims 1, 10, and 16. Matsuda et al., He et al., Dong et al. partly indicate the feature learned by the first DNN comprises a feature vector output by an intermediate layer of the first DNN and is used as training input for the second DNN (Matsuda et al., “Preferably, each of the first, second and third sub-networks includes an input layer and an output layer. The DNN training step includes: an initialization step of the computer initializing the first, second and third sub-networks; a first training step of the computer connecting neurons of the output layer of the first sub-network and neurons of the input layer of the second sub-network to form a first DNN, and training the first DNN with training data belonging to the first category; a second training step of the computer connecting neurons of the output layer of the first sub-network and neurons of the input layer of the third sub-network to form a second DNN, and training the second DNN with training data belonging to the second category; and an execution step of the computer executing the first and second training steps alternately until an end condition is satisfied”, [0012], Referring to FIG. 7, when independent sub-network 120 and dependent sub-network 122 are to be connected, each of the neurons of output layer 164 of independent sub-network 120 are connected to corresponding neurons of input layer 180 of dependent sub-network 122, to form neuron pairs 220, 222, . . . , 224, [0040], 429 dimensional feature vectors were used as the input to the DNNs, [0070]; He et al., An output layer of at least one of the neural networks can be connected to the respective hidden layer(s) of one or more other(s) of the neural networks. 
In some examples, the neural networks can be operated to provide transformed feature value(s) for respective times, abstract, The computing device operates a plurality of neural networks to provide an estimated value of a target feature based at least in part on the feature values, wherein each of the neural networks corresponds to a respective relative time period and includes a respective hidden layer communicatively connected with the hidden layer of another of the neural networks having a later relative time period. The computing device determines an error value of the estimated value of the target feature based at least in part on a corresponding training value, and trains the plurality of neural networks based at least in part on the error value and the feature values having times in the corresponding relative time period. The training includes adjusting parameters of the respective hidden layers of at least two of the neural networks, [0002], Individual ones of the neural networks 402(1)-402(W) have respective sets 404(1)-404(W) of one or more input layers (individually or collectively referred to herein as “input layers” with reference 404). As shown, individual ones of the neural networks 402 also have respective sets 406(1)-406(W) of one or more hidden layers (individually or collectively referred to herein as “hidden layers” with reference 406), and respective sets 408(1)-408(W) of one or more output layers (individually or collectively referred to herein as “output layers” with reference 408). In some examples, one or more of the neural networks 402, or one or more of the layers or sets of layers 404, 406, or 408, can be combined into combination neural networks, layers, or sets of layers. As used herein, the term “neural network” encompasses connected, independently-operable subnetworks of a larger neural network. 
In some of the examples, the neural networks 402 have respective, different neuron parameters of the respective input layers 404 and respective, different neuron parameters of the respective hidden layers 406, [0079]) however it is not clear the features referred to are necessarily in the form of feature vectors.

Georgescu et al. teach the feature learned by the first DNN comprises a feature vector output by a hidden layer of the first DNN and is used as training input for the second DNN (“In a possible implementation, the second deep neural network may be a discriminative deep neural network that inputs image patches of an image corresponding to the hypotheses in the position-orientation search space and for each image patch calculates a probability that the image patch is the object of interest” “The second deep neural network (either discriminative or regressor) can be trained in two stages of unsupervised pre-training of the hidden layers (e.g., using stacked DAE) for learning complex features from input image patches corresponding to the position-orientation hypotheses and supervised training of the output layer based on the features extracted by the hidden layers and the position-orientation hypotheses. Accordingly, the second deep neural network is trained based only on the position-orientation hypotheses that are generated from the position candidates detected using the first trained deep neural network” [0056], “Embodiments of the present invention utilize deep learning for 3D anatomical landmark detection. Embodiments of the present invention provide significantly accelerated detection speed, resulting in an efficient method that can detect an anatomical landmark in less than one second. Embodiments of the present invention utilize apply a two-stage classification strategy. In the first stage, a shallow network is trained with only one small hidden layer (e.g., with 64 hidden nodes). This network is applied to test all voxels in the volume in a sliding-window process to generate a number of candidates (e.g., 2000) for the second stage of classification. The second network is much bigger. In exemplary embodiment, the second network is a deep neural network that has three hidden layers, each with 2000 hidden nodes to obtain more discriminative power”, [0116]).

Matsuda et al. and Khoury et al. and Georgescu et al. are in the same art of neural networks (Matsuda et al., [0001]; Georgescu et al., [0056]). The combination of Georgescu et al. with Matsuda et al., He et al., and Dong et al., will enable the use of a feature vector. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the feature vectors of Georgescu et al. with the invention of Matsuda et al., He et al., and Dong et al. as this was known at the time of filing, the combination would have predictable results, and as Georgescu et al. indicate “Embodiments of the present invention provide significantly accelerated detection speed, resulting in an efficient method that can detect an anatomical landmark in less than one second” ([0116]) implying an improvement to robustness when the inventions are combined.

Claim(s) 5-7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Matsuda et al. (US 20160110642 A1) and He et al. (US 20160379112 A1) and Dong et al. (“DNNMark: A Deep Neural Network Benchmark Suite for GPUs” February, 2017) as applied to claims 1, 10 and 16 above, further in view of Chakraborty et al. (US 20130093776 A1).

Regarding claim 5, Matsuda et al., He et al., and Dong et al. disclose the data processing system of claim 1. Matsuda et al., He et al., and Dong et al. do not disclose the graphics execution environment is a virtualized environment.

Chakraborty et al. teach the graphics execution environment is a virtualized environment (The host partition 1020 can be configured to provide remote desktop virtual graphics management (RDVGM) functions to the virtual machines 1011, [0079]).

The combination of Chakraborty et al. with Matsuda et al., He et al., and Dong et al., will enable the use of a virtualized environment. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the VMs of Chakraborty et al. with the invention of Matsuda et al., He et al., and Dong et al. as this was known at the time of filing, the combination would have predictable results, and as Chakraborty et al. indicate “Systems, methods, and computer readable media are disclosed for optimizing the processing of data, such as graphics data, received from clients in a remote computing system environment. Compared to current architectures, such optimization includes a reduction in usage of memory and CPU resources hosted and a reduction in data delivery latency to the clients” ([0005]), indicating the computational advantages to using the GPGPUs in the invention of Matsuda et al., He et al., and Dong et al.

Regarding claim 6, Matsuda et al., He et al., Dong et al., disclose the data processing system of claim 1. Matsuda et al., He et al., Dong et al. do not disclose the one or more general-purpose graphics processors are configurable into partitions and the graphics execution environment is to execute as a virtualized environment by one or more partitions of the general-purpose graphics processors.

Chakraborty et al. teach the one or more general-purpose graphics processors are configurable into partitions and the graphics execution environment is to execute as a virtualized environment by one or more partitions of the general-purpose graphics processors (“The host partition 1020 can be configured to provide remote desktop virtual graphics management (RDVGM) functions to the virtual machines 1011. The RDVGM can manage resource assignment and process control between physical resources of the remote computer server 1000 and vGPU 1016 resource assignment into each virtual machine guest operating system (OS) 1014. The RDVGM functions can include: managing the rendering, capturing, and compressing (RCC) processes, assigning GPU 112 resources to virtual machines 1011 through the vGPU 1016, assigning resource policies to virtual machines 1011, and load-balancing GPU 112 resources across multiple virtual machines 1011(A-N). The RDVGM can also assign appropriate GPU 112 resources to virtual machines 1011(A-N) at boot time”, [0079]).

The combination of Chakraborty et al. with Matsuda et al., He et al., and Dong et al., will enable the use of a virtualized environment. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the VMs of Chakraborty et al. with the invention of Matsuda et al., He et al., and Dong et al. as this was known at the time of filing, the combination would have predictable results, and as Chakraborty et al. indicate “Systems, methods, and computer readable media are disclosed for optimizing the processing of data, such as graphics data, received from clients in a remote computing system environment. Compared to current architectures, such optimization includes a reduction in usage of memory and CPU resources hosted and a reduction in data delivery latency to the clients” ([0005]), indicating the computational advantages to using the GPGPUs in the invention of Matsuda et al., He et al., and Dong et al.

Regarding claim 7, Matsuda et al., He et al., Dong et al., disclose the data processing system of claim 1. Dong et al. partly teach a network interface to enable communication with an external system, the external system including one or more general-purpose graphics processors; and wherein training the second DNN model for computer vision via the deep learning framework includes interfacing with an instance of the deep learning framework on the external system and training the second DNN model via the one or more general-purpose graphics processors of the external system (As indicated in Figure 6, we first build up the DNNMark library libdnnmark with external CUDA libraries, The Config Parser is in charge of interpreting information from the external configuration files and adding layers based on the configuration parameters to the layer pool, and a data structure that holds all of the configured layers supporting various DNN primitive types, part 4).

Chakraborty et al. teach a network interface to enable communication with an external system, the external system including one or more general-purpose graphics processors; and wherein training the second DNN model for computer vision via the deep learning framework includes interfacing with an instance of the deep learning framework on the external system and training the second DNN model via the one or more general-purpose graphics processors of the external system (Remote computing systems may enable users to access resources hosted by the remote computing systems. Servers on the remote computing systems can execute programs and transmit signals indicative of a user interface to clients that can connect by sending signals over a network conforming to a communication protocol such as TCP/IP, UDP, or other protocols, [0001], When used in a LAN networking environment, the computer 100 can be connected to the LAN 51 through a network interface controller (NIC) 114 or adapter, [0033], Each host partition can be configured as a graphics server 1020 that has access to physical GPU resources of the remote computer server 1000. Each host partition can also include management components for graphics rendering, capturing, and encoding. Each host partition can also include device drivers 1026 that provide an interface to physical GPUs 112 and to host-based encoders, such as ASICS (not shown). The device drivers 1026 can include GPU, CPU, and encoder specific drivers, [0075]).

As Matsuda and He taught a second DNN model (Matsuda, second DNN, [0048], He et al., second DNN, [0092]), together Matsuda, He, Dong and Chakraborty et al. teach a network interface to enable communication with an external system, the external system including one or more general-purpose graphics processors; and wherein training the second DNN model for computer vision via the deep learning framework includes interfacing with an instance of the deep learning framework on the external system and training the second DNN model via the one or more general-purpose graphics processors of the external system.

The combination of Chakraborty et al. with Matsuda et al., He et al., and Dong et al., will enable the use of a virtualized environment. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the VMs of Chakraborty et al. with the invention of Matsuda et al., He et al., and Dong et al. as this was known at the time of filing, the combination would have predictable results, and as Chakraborty et al. indicate “Systems, methods, and computer readable media are disclosed for optimizing the processing of data, such as graphics data, received from clients in a remote computing system environment. Compared to current architectures, such optimization includes a reduction in usage of memory and CPU resources hosted and a reduction in data delivery latency to the clients” ([0005]), indicating the computational advantages to using the GPGPUs in the invention of Matsuda et al., He et al., and Dong et al.

Claim(s) 8 and 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Matsuda et al. (US 20160110642 A1) and He et al. (US 20160379112 A1) and Dong et al. (“DNNMark: A Deep Neural Network Benchmark Suite for GPUs” February, 2017) as applied to claims 1, 10 and 16 above, further in view of Chan et al. (US 20170301109 A1).

Regarding claims 8 and 13, Matsuda et al., He et al., Dong et al. disclose the data processing system and method of claims 7 and 12. Matsuda and He et al. partly teach training the second DNN model includes training the second DNN model to perform computer vision operations for autonomous navigation (Matsuda, second DNN, [0048], He et al., satellite-based navigation system devices, [0020], second DNN, [0092]) however another reference is added to make this more explicit.

Chan et al. teach training the DNN model includes training the DNN model to perform computer vision operations for autonomous navigation (Systems and methods described herein incorporate autonomous navigation using a vision-based guidance system. The vision-based guidance system enables autonomous trajectory planning and motion execution by the described systems and methods without feedback or communication with external operators, abstract, In some embodiments, the vision-based guidance system 150 can automatically detect the object of interest in the sequence of images. In some embodiments, the vision-based guidance system 150 can apply background subtraction performed on registered images or can use specialized object detectors. In other embodiments, the vision-based guidance system 150 can utilize a convolutional neural network to detect the object of interest in one or more images, [0057], In various embodiments, the tracking confidence score can include a score output by the discriminative learning-based tracking algorithm. For example, the tracking confidence score can include raw scores output by a support virtual machine (SVM) or a rate of change of the scores output by a SVM, [0068]).

As Matsuda and He taught a second DNN model (Matsuda, second DNN, [0048], He et al., second DNN, [0092]), together Matsuda, He, Dong and Chan et al. teach training the second DNN model to perform computer vision operations for autonomous navigation.

The combination of Chan et al. with Matsuda et al., He et al., and Dong et al., will enable the use of autonomous navigation. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the navigation of Chan et al. with the invention of Matsuda et al., He et al., and Dong et al. as this was known at the time of filing, the combination would have predictable results, and as Chan et al. indicate “The vision-based guidance system enables autonomous trajectory planning and motion execution by the described systems and methods without feedback or communication with external operators,” (abstract) thereby indicating a commercial benefit to the combination of inventions.

Claim(s) 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Matsuda et al. (US 20160110642 A1) and He et al. (US 20160379112 A1) and Dong et al. (“DNNMark: A Deep Neural Network Benchmark Suite for GPUs” February, 2017) as applied to claims 1, 10 and 16 above, further in view of Chang et al. (US 9658861 B2).

Regarding claim 9, Matsuda et al., He et al., Dong et al. disclose the data processing system of claim 1. Matsuda et al., He et al., Dong et al., do not disclose the one or more general-purpose graphics processors include multiple general-purpose graphics processors, the multiple general-purpose graphics processors interconnected via peer-to-peer links between the multiple general-purpose graphics processors.

Chang et al. teach the one or more general-purpose graphics processors include multiple general-purpose graphics processors, the multiple general-purpose graphics processors interconnected via peer-to-peer links between the multiple general-purpose graphics processors (“This disclosure is directed to assignment of a boot strap processor (BSP) to a processing core in a multi-core processor. The multi-core processor may include many cores, which may be central processing units (CPUs), graphical processing units (GPUs), general processing graphical processing units (GPGPUs), other processing logic, or a combination thereof. The various cores may be in communication with each other and/or an initialization core via an interconnect. The interconnect may be arranged as a mesh interconnect, a shared interconnect, a peer-to-peer (P2P) interconnect, or a ring interconnect”, col. 2, lines 15-25).

The combination of Chang et al. with Matsuda et al., He et al., and Dong et al., will enable the use of multiple general-purpose graphics processors. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the processors of Chang et al. with the invention of Matsuda et al., He et al., and Dong et al. as this was known at the time of filing, the combination would have predictable results, and as Chang et al. indicate “Until recently, computing devices typically included a single processing unit for each socket available on the computing device's main circuit board. More recently, the single processing unit has been improved to include multiple cores, which enable the processor to execute instructions in parallel using the various cores. An ability to include additional processors or cores on a chip becomes more readily available as the footprint of the cores continues to decrease through advancements in manufacturing” (col. 1, lines 20-30) demonstrating a computational efficiency benefit to combining inventions. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: US 20170206892 A1 (The input network component and the speaker-specific adaptive model component (together referred to as the “first stage”) primarily act as a feature extractor, to provide input for the speaker-adaptive DNN (“second stage”). The number of neurons in the hidden layers of the first stage, and particularly the number of neurons in the adaptive model component, can be much smaller than the dimension of the hidden layers in the speaker-adaptive DNN (second-stage DNN). This means that, there are fewer parameters for estimation and can be very helpful for online recognition (e.g. during recognition of the test speaker, the system can be tuned to perform better, using as little as one minute of speech data from the test speaker) The resulting output of the bottleneck layer 19i is combined with the succeeding five frames and five preceding frames for the same training speaker to form a feature vector 26. This is input to the each neuron of the first layer 32a of the stage-2 DNN 32.) US 20160210551 A1 (In accordance with another embodiment, there is provided an apparatus for recognizing a language, the apparatus includes an input data preprocessor configured to generate a first input feature vector sequence and a second input feature vector sequence from input data; and an input data recognizer configured to perform forward estimation using the first input feature vector sequence based on first hidden layers of a neural network, and perform backward estimation using the second input feature vector sequence based on second hidden layers of the neural network, wherein the first hidden layers are separate from the second hidden layers.)
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084. The examiner can normally be reached 10-7 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent M Rudolph can be reached at (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MICHELLE M ENTEZARI HAUSMANN/Primary Examiner, Art Unit 2671
Prosecution Timeline

Feb 03, 2025: Response Filed
Oct 01, 2025: Final Rejection mailed (§103)
Dec 01, 2025: Response after Non-Final Action
Dec 18, 2025: Non-Final Rejection mailed (§103)
Mar 11, 2026: Response Filed
Apr 03, 2026: Final Rejection mailed (§103)
May 22, 2026: Applicant Interview (Telephonic)
May 22, 2026: Examiner Interview Summary
Precedent Cases

Applications granted by this same examiner with similar technology

18/267,598; Patent 12638400; Method for monitoring and/or controlling phase separation in chemical processes and samples; 2y 11m to grant; granted May 26, 2026
18/348,495; Patent 12639803; SYSTEMS AND METHODS FOR MATERIAL ACCRETION DETECTION AND REMOVAL; 2y 10m to grant; granted May 26, 2026
18/136,006; Patent 12629121; METHOD OF DETERMINING VESSEL FLUID FLOW VELOCITY; 3y 1m to grant; granted May 19, 2026
18/034,833; Patent 12626375; HOMOGRAPHY MATRIX GENERATION APPARATUS, CONTROL METHOD, AND COMPUTER-READABLE MEDIUM; 3y 0m to grant; granted May 12, 2026
18/179,635; Patent 12620252; INFORMATION SOURCE DETECTION USING UNIQUE WATERMARKS; 3y 2m to grant; granted May 05, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 7-8
Grant Probability: 76%
With Interview (+21.3%): 98%
Median Time to Grant: 3y 0m (~0m remaining)
PTA Risk: High
Based on 870 resolved cases by this examiner. Grant probability derived from career allowance rate.