Last updated: May 29, 2026
Application No. 17/029,725
TRANSFER LEARNING FOR NEURAL NETWORKS

Final Rejection §103
Filed: Sep 23, 2020
Priority: Sep 25, 2019 (provisional 62/906,054)
Examiner: TRAN, AMY NMN
Art Unit: 2126
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Nvidia Corporation
OA Round: 6 (Final)
Interview Optional

Interview already conducted in this application's prosecution history. This examiner has a 37% grant rate with a +47.2% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 30 resolved cases, 2023–2026
Examiner Intelligence

TRAN, AMY NMN
Career Allowance Rate: 37% (11 granted / 30 resolved; -18.3% vs TC avg)
Interview Lift: +47.2%
Avg Prosecution (typical timeline): 4y 9m
Currently pending: 14
Career history
Total Applications
across all art units
Statute-Specific Performance

§101: 1.1% (-38.9% vs TC avg)
§103: 91.8% (+51.8% vs TC avg)
§102: 1.1% (-38.9% vs TC avg)
§112: 5.5% (-34.5% vs TC avg)
Based on career data from 30 resolved cases
Office Action

§103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 03/11/2026 has been entered. The status of the claims is as follows:
Claims 1-20 are pending in the application.
Claims 1, 10 and 16 are amended.
Response to Arguments
In reference to the rejections under 35 USC § 103:
Argument 1:
Applicant asserts in Remarks pg. 10-11 that the cited references, taken alone or in hypothetical combination, at least fail to teach or suggest “determining the type of inference, which the machine learning model is pre-trained to perform, excludes one or more target classes for an application associated with the client device and to be used with the machine learning model”.  Applicant further stated that the cited Wierzynsky only allegedly discloses target classes, however, and is silent toward target classes for an application associated with the client device and to be used with the machine learning model, as recited in claim 1.
Response to Argument 1:
Applicant’s argument is not persuasive. The reference Wierzynsky discloses an AI application 402 associated with a client device, such as smartphone 502, that uses a neural network/machine learning model to classify input data into recognized scene categories. In particular, ¶[0058] discloses that the AI application performs detection and recognition of a scene and configures device components differently depending on whether the recognized scene is an office, lecture hall, restaurant, or outdoor setting. ¶[0060] further discloses that, during runtime operation of the AI application on smartphone 502, image data is processed by a classify application and a deep neural network to detect and classify scenes based on visual input. These recognized scene categories correspond to the claimed classes for the application associated with the client device. Further, ¶[0061] and ¶[0066-0067] disclose that the machine learning model/neural network receives data and generates labels/classes based on training data. According to ¶[0074], the “target classes” are being interpreted as the specific car model categories that the neural network is trained to classify, such as “car model A” or a newly added class for “car model B”. Wierzynsky expressly states that a specific car model is associated with a specific label/class, and ¶[0076-0077] further disclose adding a new class to account for a new car model. Thus, the cited reference Wierzynsky teaches or at least reasonably suggests target classes for an application associated with a client device and used with the machine learning model.
Argument 2:
Applicant asserts in Remarks pg. 12-13 that the cited references, taken alone or in hypothetical combination, at least fail to teach or suggest “updating, after the pruning, one or more parameters of the pruned machine learning model using a second portion of the set of additional training data associated with the one or more target classes determined to be excluded from the type of inference, wherein the second portion is separate from the first portion of the set of additional training data used to update the one or more parameters before the pruning”. Applicant further asserts that Wierzynsky only allegedly discloses updating using a second portion at the same time as updating using a first portion, however, and is silent toward updating using a second portion after updating using a first portion, as recited by the claims.
Response to Argument 2:
Applicant’s argument is not persuasive because Fig. 1 of the cited reference Dai (see pg. 2 in Dai) teaches updating the pruned machine learning model after pruning using additional/new training data. In particular, Fig. 1 shows a base network undergoing growth, followed by recoverable pruning, to produce a model. The figure then shows “new data” being provided to “growth on new data”, with the output fed back to the model. Thus, after the pruning step, the model is further grown/updated using new data, which reasonably teaches updating parameters of the pruned machine learning model using a second portion of additional training data. Further, because the framework is an incremental learning framework, the new data corresponds to additional training data used to adapt the model to newly encountered or excluded classes, including target classes not previously retained for the type of inference. Therefore, the cited references teach or at least reasonably suggest the claimed post-pruning updating step. To the extent that the Applicant contends that Dai does not expressly disclose “target classes determined to be excluded from the type of inference”, Wierzynsky has already been cited to teach target classes being excluded from the type of inference. Therefore, it would have been reasonable under 35 U.S.C. 103 to combine Dai’s post-pruning model update using additional data with Wierzynsky’s teaching of target classes being excluded from the type of inference, because both references relate to updating a neural network model based on additional classes or training data in the incremental learning framework. Accordingly, the combined teachings render the claimed limitation obvious.
Argument 3:
Applicant asserts in Remarks pg. 19-21 that the cited references, taken alone or in hypothetical combination, at least fail to teach or suggest "providing, in response to a second request for additional training of the at least one pre-trained model corresponding to an application to be used with the at least one pre-trained model" as recited in claim 16. The Applicant states that Anjaneyapura only allegedly discloses receiving a second request for additional training, however, and fails to disclose a second request for additional training corresponding to an application to be used with the at least one pre-trained model, as recited in claim 16.
Response to Argument 3:
Applicant’s argument is not persuasive. Examiner respectfully notes that Anjaneyapura was not cited to teach the limitation “providing, in response to a second request for additional training of the at least one pre-trained model corresponding to an application to be used with the at least one pre-trained model”. Examiner cited Wierzynsky to teach this limitation in pg. 40-41 of the previous office action mailed 12/29/2025. Examiner notes that the cited paragraphs of Wierzynsky in the mapping below teach or suggest the claimed additional training of a pre-trained model. Specifically, ¶[0033] discloses that a neural network may already be trained on a labeled training set to classify objects from an input, such as identifying different types of cars, thereby teaching a pre-trained model corresponding to a classification application. ¶[0033] further explains that, after the network has been trained, it may be desirable to add new classes and/or modify boundaries of existing classes. ¶[0070] then discloses that the original training set may be augmented with additional data and labels to perform incremental learning, which teaches providing additional training to the previously trained model. Therefore, a subsequent request to add new classes or modify class boundaries for the classification application would reasonably correspond to a second request for additional training of the pre-trained model used with that application.
Argument 4: 
Applicant asserts in Remarks pg. 19-21 that the cited references, taken alone or in hypothetical combination, at least fail to teach or suggest “wherein the one or more classes, for the application, are separate from and in addition to one or more original classes of the pre-trained model”. Applicant further states that Wierzynsky only allegedly discloses one or more classes, however, and fails to disclose one or more classes for the application to be used with the pre-trained model, as recited in claim 16.
Response to Argument 4:
Applicant’s argument is not persuasive. Examiner has already explained above, in the Response to Argument 1, how Wierzynsky teaches “one or more classes for the application to be used with the pre-trained model”. The reference Wierzynsky discloses an AI application 402 associated with a client device, such as smartphone 502, that uses a neural network/machine learning model to classify input data into recognized scene categories. In particular, ¶[0058] discloses that the AI application performs detection and recognition of a scene and configures device components differently depending on whether the recognized scene is an office, lecture hall, restaurant, or outdoor setting. ¶[0060] further discloses that, during runtime operation of the AI application on smartphone 502, image data is processed by a classify application and a deep neural network to detect and classify scenes based on visual input. These recognized scene categories correspond to the claimed classes for the application associated with the client device. Further, ¶[0061] and ¶[0066-0067] disclose that the machine learning model/neural network receives data and generates labels/classes based on training data.
Claim Rejections - 35 USC § 103 
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-4, 7, 9, 17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wierzynsky (US 2017/0024641 A1) in view of Anjaneyapura et al. (WO 2019/104149 A1) (hereafter referred to as “Anjaneyapura”), Dai et al. (“Incremental Learning Using a Grow-and-Prune Paradigm with Efficient Neural Networks”) (hereafter referred to as “Dai”) and in further view of Nagaraju et al. (US 2018/0032915 A1).
Regarding Claim 1, Wierzynsky explicitly discloses:
wherein the stored machine learning model is pre-trained to perform a type of inference for one or more original classes; (Wierzynsky, ¶[0011]: “In one configuration, the first network has been previously trained on first labels for first data.”, ¶[0033]: “Neural networks may be trained on a training set that includes labels and corresponding data to classify objects from an input. For example, a first neural network may be trained on labeled images of cars to identify different types of cars. In some cases, it may be desirable to add new classes and/or modify the boundaries of existing classes after a network has been trained.”)
determining the type of inference, which the machine learning model is pre-trained to perform, excludes one or more target classes for an application associated with the [[a]] client device and to be used with the machine learning model; (Wierzynsky, ¶[0060]: “FIG. 5 is a block diagram illustrating the run-time operation 500 of an AI application on a smartphone 502. The AI application may include a pre-process module 504 that may be configured (using for example, the JAVA programming language) to convert the format of an image 506 and then crop and/or resize the image 508. The pre-processed image may then be communicated to a classify application 510 that contains a SceneDetect Backend Engine 512 that may be configured (using for example, the C programming language) to detect and classify scenes based on visual input.”, ¶[0061]: “In one configuration, a model, such as a machine learning model, is configured for receiving second data that may be unlabeled. Additionally, the model may be configured to generate via a first network, second labels for the second data. Moreover, the first network may have been previously trained on first labels for first data. Furthermore, the model may be configured to train a second network on the second data and the second labels. It should be noted the first network and the second network may be defined on the same device or may be defined on different devices.”, ¶[0066]: “For example, the back propagation may use labeled images of cars to train a neural network to identify different car models.”, ¶[0074]: “For example, a specific car model, such as car model A, may be associated with a specific label (e.g., class). That is, images of car model A may be labeled as car model A. In this example, the specific car model may receive an update to its design, such as an update to the tail light design. Still, the second training set D' may not include labeled data for the updated car design. Therefore, the boundary of the existing class may be modified to account for the updated design so that the network still labels the updated car model as the specific car model. That is, in this example, the boundary of the car model A class is modified to categorize car model A with the updated tail light design as car model A rather than incorrectly categorizing the car model A with the updated tail light.”) [Examiner’s note: the type of inference which the machine learning model is pre-trained to perform i.e., identifying car models, “excludes one or more target classes” is being interpreted as an update to the tail light design of car model A which is not included in the labeled data (i.e., the pre-trained model)]
updating, within a container executing on the [[a]] client device associated with the request, one or more parameters of the pre-trained machine learning model using a first portion of a set of additional training data associated with the one or more target classes determined to be excluded from the type of inference; (Wierzynsky, ¶[0046]: “The weights may then be adjusted so as to reduce the error. This manner of adjusting the weights may be referred to as "back propagation" as it involves a "backward pass" through the neural network.”, ¶[0066]: “Machine learning networks, such as neural networks may be trained to classify items from an input, such as an image input and/or an audio input. In some cases, the neural network is trained via back propagation on labeled data. For example, the back propagation may use labeled images of cars to train a neural network to identify different car models.”, ¶[0070]: “In one configuration, the original training set may be augmented with additional data and labels to perform the incremental learning. That is, for incremental learning, it is desirable to augment the original training set with a new training set to avoid forgetting the classifications of the original training set. Incremental learning is not limited to augmenting classes or modifying the boundaries of existing classes as other incremental learning functions are also contemplated.”) [Examiner’s note: “updating one or more parameters of the machine learning model” is being interpreted as “the neural network is trained via back propagation”; “first portion set of the additional training data” is being interpreted as augmenting the original training set with a new training set; “target classes determined to be excluded from the type of inference” is being interpreted as the process of back propagation using labeled images of cars to train a neural network to identify different car models]
using a second portion of the set of additional training data associated with the one or more target classes determined to be excluded from the type of inference, wherein the second portion is separate from the first portion of the set of additional training data used to update the one or more parameters before the pruning; (Wierzynsky, ¶[0070]: “In one configuration, the original training set may be augmented with additional data and labels to perform the incremental learning. That is, for incremental learning, it is desirable to augment the original training set with a new training set to avoid forgetting the classifications of the original training set.”, ¶[0071]: “In one configuration, when the first training set D is no longer available after training the first neural network F, a second neural network F' is specified to approximate the first neural network F. Specifically, when the first training set D is no longer available, the first neural network F may be applied to second data x'i, that does not include second labels y'i. In one configuration, the second data x'i, is substantially similar or identical to the first data x of the first training set D. Alternatively, the second data xi', may not be related to the first data x.”) [Examiner’s note: “a second portion of the set of additional training data” is being interpreted as the second data xi’, which is not related to (i.e., separate from) the first data x (i.e., the first portion set of additional training data)]
exporting the trained machine learning model for use in performing the type of inference for the one or more target classes represented in the set of additional training data, wherein the one or more target classes are separate from and in addition to the one or more original classes for the machine learning model. (Wierzynsky, ¶[0048]: “After learning, the DCN may be presented with new images 326 and a forward pass through the network may yield an output 328 that may be considered an inference or a prediction of the DCN.”, ¶[0033]: “Neural networks may be trained on a training set that includes labels and corresponding data to classify objects from an input. For example, a first neural network may be trained on labeled images of cars to identify different types of cars. In some cases, it may be desirable to add new classes and/or modify the boundaries of existing classes after a network has been trained.”) [Examiner’s note: Wierzynsky discloses performing inference or prediction with new images, which aligns with the concept of performing the type of inference for one or more target classes separated from the original classes.]
Wierzynsky fails to disclose:
providing, from an edge server to a client device in response to a request, a stored machine learning model obtained from a provider environment,
within a container on the [[a]] client device configured to perform the application
pruning, after the updating using the first portion of the set of additional training data, the machine learning model after the updating to form a trained machine learning model;
updating, after the pruning, one or more parameters of the pruned machine learning model
determining that the trained machine learning model, after the pruning and the updating using the second portion of the set of additional training data separate from the first portion of the set of additional training data, satisfies a specified accuracy criterion;
	However, Nagaraju explicitly discloses:
providing, from an edge server to a client device in response to a request, a stored machine learning model obtained from a provider environment, (Nagaraju, ¶[0061]: “The edge devices 12 discussed above can represent a broader category of computing devices commonly referred to as "client devices," which can each be operated under the control of a user. For example, FIG. 1 shows a client device 26 that can communicate with the components of the system 10 (e.g., the edge devices 12 or the server computer system 14) to receive or exchange information over the network 16. For example, a communication between the client device 26 and the components of the system 10 can include sending various requests and receiving data packets”, ¶[0062]: “the client device 26 or applications 28 running on the client device 26 may initiate communications with applications running on the edge devices 12 or the server computer system 14 to request specific content (e.g., edge data), and the applications at the edge devices 12 or the server computer system 14 may respond with the requested content stored in one or more data packets. Hence, the components of the system 10 can also represent a broader category of computing devices referred to as "host devices," which can host each other.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Wierzynsky and Nagaraju. Wierzynsky discloses a method of transfer learning in neural networks. Nagaraju discloses transmitting machine learning models to edge devices. One of ordinary skill would have had motivation to combine Wierzynsky and Nagaraju because MPEP 2143 sets forth the Supreme Court rationales for obviousness, including: (D) applying a known technique to a known device (method, or product) ready for improvement to yield predictable results; (E) “obvious to try”, i.e., choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success; and (F) known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.
However, Anjaneyapura explicitly discloses:
within a container on the [[a]] client device configured to perform the application (Anjaneyapura, [0020]: “In some embodiments, users can create or utilize relatively simple containers adhering to a specification of a provider network, where the containers include code for how a machine learning model is to be trained and/or executed”, [0029]: “The user devices 102 can interact with the model training system 120 via frontend 129 of the model training system 120. For example, a user device 102 can provide a training request to the frontend 129 that includes a container image ( or multiple container images, or an identifier of one or multiple locations where container images are stored)”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Wierzynsky and Anjaneyapura. Wierzynsky discloses a method of transfer learning in neural networks. Anjaneyapura teaches techniques for packaging and deploying algorithms utilizing containers for training flexible machine learning to perform inference tasks. One of ordinary skill would have had motivation to combine Wierzynsky and Anjaneyapura because edge servers have the benefit of low latency, as they are located closer to the end-users or devices, and applying edge servers in training helps reduce the time it takes for data to travel between the server and the users (Anjaneyapura, ¶[00117]).
However, Dai explicitly discloses:
pruning, after the updating using the first portion of the set of additional training data, the machine learning model after the updating to form a trained machine learning model; (Dai, Page 5, Figure 3 [image: media_image1.png], Page 5, Col. 2, Section 4.3.1: “In each iteration, we prune the weights that have the smallest values (e.g., smallest 5%), and retrain the network to recover its accuracy. Once the desired accuracy is achieved, we start the next pruning iteration.”)
updating, after the pruning, one or more parameters of the pruned machine learning model (Dai, Pg. 2, Fig. 1 [image: media_image2.png]) [Examiner’s note: Fig. 1 shows a base network undergoing growth, followed by recoverable pruning, to produce a model. The figure then shows “new data” being provided to “growth on new data”, with the output fed back to the model. Thus, after the pruning step, the model is further grown/updated using new data, which reasonably teaches updating parameters of the pruned machine learning model using a second portion of additional training data. Further, because the framework is an incremental learning framework, the new data corresponds to additional training data used to adapt the model to newly encountered or excluded classes, including target classes not previously retained for the type of inference.]
determining that the trained machine learning model, after the pruning and the updating using the second portion of the set of additional training data separate from the first portion of the set of additional training data, satisfies a specified accuracy criterion; (Dai, Pg. 5, Col. 2, Section 4.3.1: “In the pruning process, we remove a connection w by setting its value as well as the value of its corresponding mask to 0 if and only if the following condition is satisfied: [equation image: media_image3.png] where β is a pre-defined pruning ratio. Typically, we use 3 ≤ β ≥ 5 in our experiments. Note that connection pruning is an iterative process. In each iteration, we prune the weights that have the smallest values (e.g., smallest 5%), and retrain the network to recover its accuracy. Once the desired accuracy is achieved, we start the next pruning iteration.”)
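For orientation only, the iterative prune-and-retrain loop described in the quoted Dai passage can be sketched as follows. This is a minimal illustration and not Dai's implementation: the function names `prune_smallest` and `iterative_prune`, the `retrain` and `accuracy` callbacks, the stopping rule (stop when retraining does not recover the target accuracy), and the flat list-of-floats weight representation are all assumptions made for the example. The exact pruning condition in Dai is an equation image that is not reproduced in this record, so the sketch uses the "smallest fraction of weights" wording from the quote instead.

```python
from typing import Callable, List, Tuple

Weights = List[float]
Mask = List[int]


def prune_smallest(weights: Weights, mask: Mask, fraction: float) -> Tuple[Weights, Mask]:
    """Zero the smallest-magnitude `fraction` of the still-active weights.

    A pruned connection has both its value and its mask entry set to 0,
    mirroring the wording of the quoted passage.
    """
    active = [i for i, m in enumerate(mask) if m == 1]
    count = max(1, int(len(active) * fraction)) if active else 0
    victims = sorted(active, key=lambda i: abs(weights[i]))[:count]
    new_weights, new_mask = list(weights), list(mask)
    for i in victims:
        new_weights[i] = 0.0
        new_mask[i] = 0
    return new_weights, new_mask


def iterative_prune(
    weights: Weights,
    retrain: Callable[[Weights, Mask], Weights],  # hypothetical retraining callback
    accuracy: Callable[[Weights], float],         # hypothetical evaluation callback
    target_accuracy: float,
    fraction: float = 0.05,                       # "e.g., smallest 5%" in the quote
    max_iterations: int = 20,
) -> Tuple[Weights, Mask]:
    """Prune, retrain, and continue only while the target accuracy is recovered."""
    mask = [1] * len(weights)
    for _ in range(max_iterations):
        cand_w, cand_m = prune_smallest(weights, mask, fraction)
        # Retrain, then re-apply the mask so pruned connections stay at zero.
        cand_w = [w * m for w, m in zip(retrain(cand_w, cand_m), cand_m)]
        if accuracy(cand_w) < target_accuracy:
            break  # assumed stopping rule: keep the last model that met the target
        weights, mask = cand_w, cand_m
    return list(weights), mask
```

In this sketch the accuracy check after each retraining step plays the role of the "specified accuracy criterion" discussed in the mapping; whether that correspondence holds for the claim is a legal question the code does not address.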
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Wierzynsky and Dai. Wierzynsky teaches a method of transfer learning in neural networks. Dai teaches an incremental learning framework based on a grow-and-prune neural network synthesis paradigm. One of ordinary skill would have had motivation to combine Wierzynsky and Dai to produce a smaller model for the fine-tuning process because a pruned model is smaller and can be fine-tuned more efficiently on new tasks, allowing for faster convergence and reduced training time (Dai, Page 5, Col. 2, Section 4.3.1: “Thus, we prune away redundant connections for compactness and to ensure efficient inference after the growth phase.”)
	Regarding Claim 2, the combination of Wierzynsky, Dai, Nagaraju and Anjaneyapura discloses all the limitations of Claim 1 (as shown in the rejections above).
	Wierzynsky in view of Dai, Nagaraju and Anjaneyapura further discloses:
re-training the trained machine learning model after the pruning to increase an accuracy of the trained machine learning model. (Dai, Page 5, Col. 1, ¶[1]: “To reach the same target accuracy of 98.67%, our proposed method only requires 15 and 20 training epochs first on new data and then on all data, respectively.”, Page 5, Col. 2, Section 4.3.1, ¶[2]: “In each iteration, we prune the weights that have the smallest values (e.g., smallest 5%), and retrain the network to recover its accuracy. Once the desired accuracy is achieved, we start the next pruning iteration.”) 
Regarding Claim 3, the combination of Wierzynsky, Nagaraju, Dai and Anjaneyapura discloses all the limitations of Claim 1 (as shown in the rejections above).
Wierzynsky in view of Dai, Nagaraju and Anjaneyapura further discloses:
wherein the. (Dai, Page 7, Col. 1, ¶[2]: “We split the training set (with 55K images) randomly into ten different parts of equal size. In the incremental learning experiments, we start with one part to train the initial model for subsequent updates. We then add one part as new data each time in the incremental learning scenario. For each update, we perform growth on new data and all data for 15 epochs and 20 epochs in the growth phase, respectively. Then, we prune the post-growth network for compactness.”, Page 7, Col. 2, ¶[1]: “whenever a pre-trained model with existing knowledge is available, our incremental learning approach always produces reduced training cost due to its capability of preserving existing knowledge effectively and distilling knowledge from new data efficiently”, and Page 2, Col. 1, ¶[1]: “we employ a pruning phase to remove redundant parameters to obtain a compact inference model.”) [Examiner’s note: the dataset used for the pre-training step is being interpreted as the training set with 55K images, and the additional training dataset is being interpreted as the one part added as new data each time]
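The data schedule in the quoted Dai passage (a random split into ten equal parts, one part for the initial model, then one new part per update, with growth first on the new data and then on all data) can be illustrated with a short sketch. The helper names `split_into_parts` and `incremental_schedule`, the fixed seed, and the choice to drop any remainder when the set does not divide evenly are assumptions for illustration and are not taken from Dai.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def split_into_parts(samples: Sequence, num_parts: int = 10, seed: int = 0) -> List[list]:
    """Randomly split `samples` into `num_parts` parts of equal size.

    Any remainder that does not divide evenly is dropped (an assumption;
    55K samples divide evenly into ten parts).
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // num_parts
    return [shuffled[i * size:(i + 1) * size] for i in range(num_parts)]


def incremental_schedule(parts: List[list]) -> Iterator[Tuple[list, list]]:
    """Yield (new_data, all_data) for each update after the initial part.

    The first part trains the initial model; each later part arrives as
    new data, and all_data is everything seen so far including that part.
    """
    seen = list(parts[0])
    for part in parts[1:]:
        new_data = list(part)
        seen = seen + new_data
        yield new_data, list(seen)
```

Each yielded pair corresponds to one update in the quoted experiment: growth on `new_data`, then growth on `all_data`, followed by pruning.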
Regarding Claim 4, the combination of Wierzynsky, Nagaraju, Dai and Anjaneyapura discloses all the limitations of Claim 1 (as shown in the rejections above).
	Wierzynsky in view of Dai, Nagaraju and Anjaneyapura further discloses:
wherein the type of inference includes at least one of classification, object detection, image segmentation, or medical image diagnostics. (Wierzynsky, ¶[0066]: “Machine learning networks, such as neural networks may be trained to classify items from an input, such as an image input and/or an audio input. In some cases, the neural network is trained via back propagation on labeled data. For example, the back propagation may use labeled images of cars to train a neural network to identify different car models.”)
Regarding Claim 7, the combination of Wierzynsky, Nagaraju, Dai and Anjaneyapura discloses all the limitations of Claim 1 (as shown in the rejections above).
	Wierzynsky in view of Dai, Nagaraju and Anjaneyapura further discloses:
performing augmentation of the additional training data before performing the updating before the pruning of the machine learning model, (Dai, Page 7, Col. 1, ¶[2]: “We split the training set (with 55K images) randomly into ten different parts of equal size. In the incremental learning experiments, we start with one part to train the initial model for subsequent updates. We then add one part as new data each time in the incremental learning scenario. For each update, we perform growth on new data and all data for 15 epochs and 20 epochs in the growth phase, respectively. Then, we prune the post-growth network for compactness.”) [Examiner’s note: performing augmentation of the additional training data i.e., add one part as new data each time]
the augmentation increasing an amount of additional training data through adjustment of at least one of orientation, color, resolution, or noise. (Wierzynsky, ¶[0060]: “The SceneDetect Backend Engine 512 may be configured to further preprocess 514 the image by scaling 516 and cropping 518. For example, the image may be scaled and cropped so that the resulting image is 224 pixels by 224 pixels. These dimensions may map to the input dimensions of a neural network. The neural network may be configured by a deep neural network block 520 to cause various processing blocks of the SOC 100 to further process the image pixels with a deep neural network.”)
Regarding Claim 9, the combination of Wierzynsky, Nagaraju, Dai and Anjaneyapura discloses all the limitations of Claim 1 (as shown in the rejections above).
	Wierzynsky in view of Dai, Nagaraju and Anjaneyapura further discloses:
wherein the toolkit is provided in container for execution on the client device. (Anjaneyapura, [00123]: “In some embodiments, the machine learning models may be "custom" algorithms developed by users, and/or use custom code to train using existing algorithms such as deep learning frameworks (e.g., TensorFlow, Apache MXNet, etc.).”, [00125]: “Accordingly, in some embodiments, the training and/or hosting of machine learning models can be performed without needing significant knowledge on the part of users as to how these models are to be trained or used. For example, in some embodiments users can select or create a container including machine learning related code - potentially using any language(s)/package(s) that the user desires”) [Examiner’s note: the toolkit is interpreted as the TensorFlow as it contains the custom algorithms and it is included in a container created by the users]
Regarding Claim 17, the combination of Wierzynsky, Nagaraju and Anjaneyapura discloses all the limitations of Claim 16 (as shown in the rejections above).
	Wierzynsky in view of Anjaneyapura and Nagaraju fails to disclose:
pruning each machine learning model after the additional training is performed; and
	However, Dai explicitly discloses:
pruning each machine learning model after the additional training is performed; and (Dai, Page 2, Col. 1, ¶[1]: “We first grow and prune a model with the initial data. When new data arrive, the network undergoes a growth phase (first, based on new data and then on all available data) that increases its size to accommodate new data and knowledge. Then, we employ a pruning phase to remove redundant parameters to obtain a compact inference model”)
retraining the pruned machine learning models using the additional training data. (Dai, Page 4, Col. 2, Section 4.2.2, ¶[1]: “To reduce the training cost of a model update, we introduce a mechanism to speed up the growth phase. Specifically, we first employ connection growth and parameter training only on the previously unseen data for a pre-defined number of epochs whenever new data become available. Then, we merge the new data with all the previously available training data, and perform growth and training on all existing data.”) [Examiner’s note: a portion of the additional training data that was omitted from the first portion of additional training data i.e., all existing data]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Wierzynsky, Anjaneyapura, Nagaraju and Dai. Wierzynsky teaches a method of transfer learning in neural networks. Dai teaches an incremental learning framework based on a grow-and-prune neural network synthesis paradigm. Nagaraju discloses transmitting machine learning models to edge devices. Anjaneyapura teaches techniques for packaging and deploying algorithms utilizing containers for training flexible machine learning to perform inference tasks. One of ordinary skill would have been motivated to combine Wierzynsky, Anjaneyapura, Nagaraju and Dai to produce a smaller model for the fine-tuning process because a pruned model is smaller and can be fine-tuned more efficiently on new tasks, allowing for faster convergence and reduced training time (Dai, Page 5, Col. 2, Section 4.3.1: “Thus, we prune away redundant connections for compactness and to ensure efficient inference after the growth phase.”)
Regarding Claim 20, the combination of Wierzynsky, Dai, Nagaraju and Anjaneyapura discloses all the limitations of Claim 16 (as shown in the rejections above).
	Wierzynsky in view of Dai, Nagaraju and Anjaneyapura further discloses:
performing augmentation of the additional training data before performing additional training of the machine learning model, (Dai, Page 7, Col. 1, ¶[2]: “We split the training set (with 55K images) randomly into ten different parts of equal size. In the incremental learning experiments, we start with one part to train the initial model for subsequent updates. We then add one part as new data each time in the incremental learning scenario. For each update, we perform growth on new data and all data for 15 epochs and 20 epochs in the growth phase, respectively. Then, we prune the post-growth network for compactness.”) [Examiner’s note: performing augmentation of the additional training data i.e., add one part as new data each time]
the augmentation increasing an amount of the additional training data through adjustment of at least one of orientation, color, resolution, or noise. (Wierzynsky, ¶[0060]: “The SceneDetect Backend Engine 512 may be configured to further preprocess 514 the image by scaling 516 and cropping 518. For example, the image may be scaled and cropped so that the resulting image is 224 pixels by 224 pixels. These dimensions may map to the input dimensions of a neural network. The neural network may be configured by a deep neural network block 520 to cause various processing blocks of the SOC 100 to further process the image pixels with a deep neural network.”)
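For illustration only, the scaling and cropping to 224 by 224 pixels quoted from Wierzynsky ¶[0060] can be sketched as simple geometry. The function name, the choice to scale the shorter side to the target size, and the use of a center crop are assumptions made for this sketch; the quoted passage does not state how the scale factor or crop position is chosen.

```python
def scale_and_center_crop_box(width, height, target=224):
    """Return the scaled image size and a crop box (left, top, right, bottom).

    Assumption (not from the cited reference): the shorter side is scaled
    to `target` pixels and the crop is taken from the center.
    """
    scale = target / min(width, height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    # Center the target-sized window inside the scaled image.
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)


if __name__ == "__main__":
    size, box = scale_and_center_crop_box(640, 480)
    print(size, box)
```

The crop box always spans exactly `target` pixels in each dimension, which matches the fixed input dimensions described in the quoted paragraph.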

Claim(s) 5 is rejected under 35 U.S.C. 103 as being unpatentable over Wierzynsky (US 2017/0024641 A1) in view of Dai et al. (“Incremental Learning Using a Grow-and-Prune Paradigm with Efficient Neural Networks”) (hereafter referred to as “Dai”), Anjaneyapura et al. (WO 2019/104149 A1) (hereafter referred to as “Anjaneyapura”), Alexiuk et al. (WO 2015/188275 A1) (hereafter referred to as “Alexiuk”) and further in view of Nagaraju et al.
Regarding Claim 5, the combination of Wierzynsky, Dai, Nagaraju and Anjaneyapura discloses all the limitations of Claim 1 (as shown in the rejections above).
Wierzynsky in view of Dai, Nagaraju and Anjaneyapura fails to disclose:
encrypting the trained machine learning model before exporting the trained machine learning model for use in performing the type of inference
	However, Alexiuk explicitly discloses:
encrypting the trained machine learning model before exporting the trained machine learning model for use in performing the type of inference. (Alexiuk [00113], lines 7 – 9: “Once a model is trained, the classification model may be encrypted, compressed, logged for auditing, anonymized and/or associated with a jurisdictional identifier before transfer to/from the cloud.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Wierzynsky, Dai, Nagaraju, Anjaneyapura and Alexiuk. Wierzynsky teaches a method of transfer learning in neural networks. Dai teaches an incremental learning framework based on a grow-and-prune neural network synthesis paradigm. Nagaraju discloses transmitting machine learning models to edge devices. Anjaneyapura teaches techniques for packaging and deploying algorithms utilizing containers for training flexible machine learning to perform inference tasks. Alexiuk teaches a system and method for network-based application development and implementation. One of ordinary skill would have been motivated to combine Wierzynsky, Dai, Nagaraju, Anjaneyapura and Alexiuk because Alexiuk discloses that in collaborative environments or when sharing models with external parties, encryption ensures that only authorized users with the appropriate decryption keys can access and utilize the model. This secures collaborations and partnerships. (Alexiuk, page 53, lines 6-8)

Claim(s) 6, 8, 10-15, 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Wierzynsky (US 2017/0024641 A1) in view of Dai et al. (“Incremental Learning Using a Grow-and-Prune Paradigm with Efficient Neural Networks”) (hereafter referred to as “Dai”), Anjaneyapura et al. (WO 2019/104149 A1) (hereafter referred to as “Anjaneyapura”), Ragesh & Rajesh (“Pedestrian Detection in Automotive Safety: Understanding State-of-the-Art”) (hereafter referred to as “Ragesh”) and in further view of Nagaraju et al. (US 2018/0032915 A1).
Regarding Claim 6, the combination of Wierzynsky, Dai, Nagaraju and Anjaneyapura discloses all the limitations of Claim 1 (as shown in the rejections above).
Wierzynsky in view of Dai, Nagaraju and Anjaneyapura fails to disclose:
optimizing the trained machine learning model for specific hardware before the exporting, the specific hardware including one or more graphics processing units, one or more central processing units, or a combination thereof
	However, Ragesh explicitly discloses:
optimizing the trained machine learning model for specific hardware before the exporting, the specific hardware including one or more graphics processing units, one or more central processing units, or a combination thereof. (Ragesh, Page 47865, Col. 2, ¶[5]: “However, current trend is to combine In Vehicle Infotainment (IVI), Instrument Cluster (IC), and ADAS to be driven by a single module called eCockpit, to have the best integration and synchronization. The hardware platform should be carefully chosen to facilitate multi-core Central Preprocessing Unit (CPU) and Multicore Graphics Processing Unit (GPU) support for computationally complex ADAS processing. The ADAS algorithms should be optimized for the hardware to provide real-time response to the driver.”) [Examiner’s note: The highlights describe the process of optimizing machine learning model and ADAS algorithms for specific hardware configurations CPUs and GPUs]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Wierzynsky, Dai, Nagaraju, Anjaneyapura and Ragesh. Wierzynsky teaches a method of transfer learning in neural networks. Dai teaches an incremental learning framework based on a grow-and-prune neural network synthesis paradigm. Nagaraju discloses transmitting machine learning models to edge devices. Ragesh teaches different techniques used in pedestrian detection specific to the automotive application, along with a description of generic pedestrian detection solution architecture. Anjaneyapura teaches techniques for packaging and deploying algorithms utilizing containers for training flexible machine learning to perform inference tasks. One of ordinary skill would have been motivated to combine Wierzynsky, Dai, Anjaneyapura, Nagaraju and Ragesh because exporting the model for use with additional target classes provides flexibility in deploying the model across various applications and environments, which allows the model to be easily adapted and reused for different tasks without the need for retraining from scratch (Ragesh, Page 47882, Col. 1, Section VIII, ¶[1])
Regarding Claim 8, the combination of Wierzynsky, Nagaraju, Dai and Anjaneyapura discloses all the limitations of Claim 1 (as shown in the rejections above).
Wierzynsky in view of Dai, Nagaraju and Anjaneyapura further discloses:
for performing at least one of the updating before the pruning of the machine learning model, the pruning of the machine learning model, and the exporting of the trained machine learning model. (Dai, Page 2, Col. 1, ¶[1]: “We first grow and prune a model with the initial data. When new data arrive, the network undergoes a growth phase (first, based on new data and then on all available data) that increases its size to accommodate new data and knowledge. Then, we employ a pruning phase to remove redundant parameters to obtain a compact inference model”)
Wierzynsky in view of Dai, Nagaraju and Anjaneyapura fails to disclose:
providing a toolkit including at least a common interface and one or more modules
	However, Ragesh explicitly discloses:
providing a toolkit including at least a common interface and one or more modules (Ragesh, Page 47880, Col. 2, ¶[1]: “Many of these architectures can be freely downloaded and used with DL programming frameworks like TensorFlow, Caffe, CNTK, PyTorch, Keras, Deeplearning4j, Matlab Deep learning toolkit etc. to implement different DL solutions based on our requirement. We can either freshly train these networks for a new problem or can use the existing knowledge from the pre-trained models and additionally train for the new problem using transfer learning [93].”) [Examiner’s note: a common interface i.e., the frameworks, the toolkit i.e., Matlab Deep learning toolkit, one or more modules of the toolkit is interpreted as the pre-trained model as it is an important component of the toolkit]
	Regarding Claim 10, Wierzynsky explicitly discloses:
A system for performing transfer learning, comprising: at least one processor; and (Wierzynsky, ¶[0013]: “Another aspect of the present disclosure is directed to an apparatus for transfer learning having a memory unit and one or more processors coupled to the memory.”)
memory including instructions that, when executed by the at least one processor, cause the system to: (Wierzynsky, ¶[0101]: “A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.”)
determining the type of inference, which the machine learning model is pre-trained to perform, excludes one or more target classes for an application associated with the [[a]] client device and to be used with the machine learning model; (Wierzynsky, ¶[0060]: “FIG. 5 is a block diagram illustrating the run-time operation 500 of an AI application on a smartphone 502. The AI application may include a pre-process module 504 that may be configured (using for example, the JAVA programming language) to convert the format of an image 506 and then crop and/or resize the image 508. The pre-processed image may then be communicated to a classify application 510 that contains a SceneDetect Backend Engine 512 that may be configured (using for example, the C programming language) to detect and classify scenes based on visual input.”, ¶[0061]: “In one configuration, a model, such as a machine learning model, is configured for receiving second data that may be unlabeled. Additionally, the model may be configured to generate via a first network, second labels for the second data. Moreover, the first network may have been previously trained on first labels for first data. Furthermore, the model may be configured to train a second network on the second data and the second labels. It should be noted the first network and the second network may be defined on the same device or may be defined on different devices.”, ¶[0066]: “For example, the back propagation may use labeled images of cars to train a neural network to identify different car models.”, ¶[0074]: “For example, a specific car model, such as car model A, may be associated with a specific label (e.g., class). That is, images of car model A may be labeled as car model A. In this example, the specific car model may receive an update to its design, such as an update to the tail light design. Still, the second training set D' may not include labeled data for the updated car design. Therefore, the boundary of the existing class may be modified to account for the updated design so that the network still labels the updated car model as the specific car model. That is, in this example, the boundary of the car model A class is modified to categorize car model A with the updated tail light design as car model A rather than incorrectly categorizing the car model A with the updated tail light.”) [Examiner’s note: the type of inference which the machine learning model is pre-trained to perform i.e., identifying car models, “excludes one or more target classes” is being interpreted as an update to the tail light design of car model A which is not included in the labeled data (i.e., the pre-trained model)]
perform, within a container on the [[a]] client device configured to perform additional training, the additional training of the pre-trained selected machine learning model using a first portion of a set of additional training data for the type of inference and for the one or more target classes determined to be excluded from the type of inference; (Wierzynsky, [0046]: “The weights may then be adjusted so as to reduce the error. This manner of adjusting the weights may be referred to as "back propagation" as it involves a "backward pass" through the neural network.”, ¶[0066]: “Machine learning networks, such as neural networks may be trained to classify items from an input, such as an image input and/or an audio input. In some cases, the neural network is trained via back propagation on labeled data. For example, the back propagation may use labeled images of cars to train a neural network to identify different car models.”, ¶[0070]: “In one configuration, the original training set may be augmented with additional data and labels to perform the incremental learning. That is, for incremental learning, it is desirable to augment the original training set with a new training set to avoid forgetting the classifications of the original training set. Incremental learning is not limited to augmenting classes or modifying the boundaries of existing classes as other incremental learning functions are also contemplated.”) [Examiner’s note: “updating one or more parameters of the machine learning model” is being interpreted as “the neural network is trained via back propagation”, “first portion set of the additional training data” i.e., augmenting the original training set with a new training set, “target classes determined to be excluded from the type of inference” is being interpreted as the process of back propagation using labeled images of cars to train a neural network to identify different car models]
using at least a second portion of the set of additional training data for the one or more target classes determined to be excluded from the type of inference, wherein the second portion is separate from the first portion of the set of additional training data used to perform the additional training; (Wierzynsky, ¶[0070]: “In one configuration, the original training set may be augmented with additional data and labels to perform the incremental learning. That is, for incremental learning, it is desirable to augment the original training set with a new training set to avoid forgetting the classifications of the original training set.”, ¶[0071]: “In one configuration, when the first training set D is no longer available after training the first neural network F, a second neural network F' is specified to approximate the first neural network F. Specifically, when the first training set D is no longer available, the first neural network F may be applied to second data x'i, that does not include second labels y'i. In one configuration, the second data x'i, is substantially similar or identical to the first data x of the first training set D. Alternatively, the second data xi', may not be related to the first data x.”) [Examiner’s note: “a second portion of the set of additional training data” is being interpreted as the second data xi’, which is not related to (i.e., separate from) the first data x (i.e., the first portion set of additional training data)]
provide the retrained pruned machine learning model for the type of inference for the one or more target classes represented in the set of additional training data, wherein the one or more target classes are separate from and in addition to the one or more original classes of the selected machine learning model. (Wierzynsky, ¶[0048]: “After learning, the DCN may be presented with new images 326 and a forward pass through the network may yield an output 328 that may be considered an inference or a prediction of the DCN.”, ¶[0033]: “Neural networks may be trained on a training set that includes labels and corresponding data to classify objects from an input. For example, a first neural network may be trained on labeled images of cars to identify different types of cars. In some cases, it may be desirable to add new classes and/or modify the boundaries of existing classes after a network has been trained.”) [Examiner’s note: Wierzynsky discloses performing inference or prediction with new images, which aligns with the concept of performing the type of inference for one or more target classes separated from the original classes.]
	Wierzynsky fails to disclose:
select, from a set of stored pre-trained models for two or more different types of inference, a machine learning model pre-trained for a type of inference for one or more original classes to be provided to a client device;
within a container on the [[a]] client device configured to perform additional training
prune, after the additional training is performed, the selected machine learning model after the additional training;
retrain, after the selected machine learning model is pruned, the pruned machine learning model
determine that the retrained pruned machine learning model satisfies at least one performance criterion; and
However, Ragesh explicitly discloses:
select, from a set of stored pre-trained models for two or more different types of inference, a machine learning model pre-trained for a type of inference; (Ragesh, Page 47865, Col. 1, ¶[3]: “In feature classifier (FC) approach, predefined features are used to represent pedestrian objects and a classifier classifies an object to a pedestrian or not based on the similarity of the features of the object-of-interest to that of the pre-trained model. The focus here will be how best a feature can be derived to distinguish a pedestrian object from other objects as well as how we arrive at a best pre-trained model to match the features of the object-of-interest.”) [Examiner’s note: Arriving at the best pre-trained model is the same concept as selecting the best pre-trained model from the set of pre-trained models]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Wierzynsky and Ragesh. Wierzynsky teaches a method of transfer learning in neural networks. Ragesh teaches different techniques used in pedestrian detection specific to the automotive application, along with a description of generic pedestrian detection solution architecture. One of ordinary skill would have been motivated to combine Wierzynsky and Ragesh to obtain a domain adaptation function in transfer learning because, if the selected task shares similarities with the model’s pre-training objectives, the pre-trained features can be repurposed with minimal adaptation.
However, Nagaraju explicitly discloses:
from a set of stored pre-trained models for two or more different types of inference… to be provided to a client device; (Nagaraju, ¶[0061]: “The edge devices 12 discussed above can represent a broader category of computing devices commonly referred to as "client devices," which can each be operated under the control of a user. For example, FIG. 1 shows a client device 26 that can communicate with the components of the system 10 (e.g., the edge devices 12 or the server computer system 14) to receive or exchange information over the network 16. For example, a communication between the client device 26 and the components of the system 10 can include sending various requests and receiving data packets”, ¶[0062]: “the client device 26 or applications 28 running on the client device 26 may initiate communications with applications running on the edge devices 12 or the server computer system 14 to request specific content (e.g., edge data), and the applications at the edge devices 12 or the server computer system 14 may respond with the requested content stored in one or more data packets. Hence, the components of the system 10 can also represent a broader category of computing devices referred to as "host devices," which can host each other.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Wierzynsky and Nagaraju. Wierzynsky discloses a method of transfer learning in neural networks. Nagaraju discloses transmitting machine learning models to edge devices. One of ordinary skill would have been motivated to combine Wierzynsky and Nagaraju because MPEP 2143 sets forth the Supreme Court rationales for obviousness including: (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results; (E) “Obvious to try”: choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success; (F) Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.
However, Anjaneyapura explicitly discloses:
within a container on the [[a]] client device configured to perform additional training (Anjaneyapura, [0020]: “In some embodiments, users can create or utilize relatively simple containers adhering to a specification of a provider network, where the containers include code for how a machine learning model is to be trained and/or executed”, [0029]: “The user devices 102 can interact with the model training system 120 via frontend 129 of the model training system 120. For example, a user device 102 can provide a training request to the frontend 129 that includes a container image ( or multiple container images, or an identifier of one or multiple locations where container images are stored)”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Wierzynsky and Anjaneyapura. Wierzynsky teaches a method of transfer learning in neural networks. Anjaneyapura teaches techniques for packaging and deploying algorithms utilizing containers for training flexible machine learning to perform inference tasks. One of ordinary skill would have been motivated to combine Wierzynsky and Anjaneyapura because edge servers have the benefit of low latency, as they are located closer to the end-users or devices; applying edge servers in training helps reduce the time it takes for data to travel between the server and the users (Anjaneyapura, ¶[00117])
However, Dai explicitly discloses:
prune, after the additional training is performed, the selected machine learning model after the additional training; (Dai, Page 5, Figure 3 [image: media_image1.png], Page 5, Col. 2, Section 4.3.1: “In each iteration, we prune the weights that have the smallest values (e.g., smallest 5%), and retrain the network to recover its accuracy. Once the desired accuracy is achieved, we start the next pruning iteration.”)
retrain, after the selected machine learning model is pruned, the pruned machine learning model (Dai, Pg. 2, Fig. 1 [image: media_image2.png]) [Examiner’s note: Fig. 1 shows a base network undergoing growth, followed by recoverable pruning, to produce a model. The figure then shows “new data” being provided to “growth on new data”, with the output fed back to the model. Thus, after the pruning step, the model is further grown/updated using new data, which reasonably teaches updating parameters of the pruned machine learning model using a second portion of additional training data. Further, because the framework is an incremental learning framework, the new data corresponds to additional training data used to adapt the model to newly encountered or excluded classes, including target classes not previously retained for the type of inference.]
determine that the retrained pruned machine learning model satisfies at least one performance criterion; and (Dai, Pg. 5, Col. 2, Section 4.3.1: “In the pruning process, we remove a connection w by setting its value as well as the value of its corresponding mask to 0 if and only if the following condition is satisfied: [image: media_image3.png] where β is a pre-defined pruning ratio. Typically, we use 3 ≤ β ≥ 5 in our experiments. Note that connection pruning is an iterative process. In each iteration, we prune the weights that have the smallest values (e.g., smallest 5%), and retrain the network to recover its accuracy. Once the desired accuracy is achieved, we start the next pruning iteration.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Wierzynsky and Dai. Wierzynsky teaches a method of transfer learning in neural networks. Dai teaches an incremental learning framework based on a grow-and-prune neural network synthesis paradigm. One of ordinary skill would have been motivated to combine Wierzynsky and Dai to produce a smaller model for the fine-tuning process because a pruned model is smaller and can be fine-tuned more efficiently on new tasks, allowing for faster convergence and reduced training time (Dai, Page 5, Col. 2, Section 4.3.1: “Thus, we prune away redundant connections for compactness and to ensure efficient inference after the growth phase.”)
Regarding Claim 11, the combination of Wierzynsky, Nagaraju, Dai, Ragesh and Anjaneyapura discloses all the limitations of Claim 10 (as shown in the rejections above).
Wierzynsky in view of Dai, Ragesh, Nagaraju and Anjaneyapura further discloses:
wherein the additional training utilizes training data for at least one additional classification for the type of inference than was used to pre-train the selected machine learning model. (Dai, Page 7, Col. 1, ¶[2]: “We split the training set (with 55K images) randomly into ten different parts of equal size. In the incremental learning experiments, we start with one part to train the initial model for subsequent updates. We then add one part as new data each time in the incremental learning scenario. For each update, we perform growth on new data and all data for 15 epochs and 20 epochs in the growth phase, respectively. Then, we prune the post-growth network for compactness.”, Page 7, Col. 2, ¶[1]: “whenever a pre-trained model with existing knowledge is available, our incremental learning approach always produces reduced training cost due to its capability of preserving existing knowledge effectively and distilling knowledge from new data efficiently”, and Page 2, Col. 1, ¶[1]: “we employ a pruning phase to remove redundant parameters to obtain a compact inference model.”) [Examiner’s note: The dataset used for pre-train step i.e., the training set with 55k images, additional training dataset i.e., adds one part as a new data each time]
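For illustration only, the incremental schedule quoted from Dai (a training set split randomly into ten equal parts, one part used for the initial model, and one further part added per update) can be sketched as follows. The function names, the fixed random seed, and the use of integer indices in place of images are assumptions made for this sketch and do not come from the cited reference.

```python
import random


def split_into_parts(samples, num_parts=10, seed=0):
    """Randomly split samples into num_parts parts of equal size."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seed is an illustrative assumption
    size = len(shuffled) // num_parts
    return [shuffled[i * size:(i + 1) * size] for i in range(num_parts)]


def incremental_schedule(parts):
    """Yield (new_data, all_data) for each update after the initial part."""
    seen = list(parts[0])  # the first part trains the initial model
    for part in parts[1:]:
        all_data = seen + part
        # Per the quoted passage, growth runs first on new_data, then on all_data.
        yield part, all_data
        seen = all_data


if __name__ == "__main__":
    parts = split_into_parts(range(55000))
    for step, (new_data, all_data) in enumerate(incremental_schedule(parts), 1):
        print(step, len(new_data), len(all_data))
```

With 55,000 samples this produces nine updates of 5,500 new samples each, and the final update covers the full training set.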
Regarding Claim 12, the combination of Wierzynsky, Dai, Nagaraju, Ragesh and Anjaneyapura discloses all the limitations of Claim 10 (as shown in the rejections above).
Wierzynsky in view of Dai, Ragesh, Nagaraju and Anjaneyapura further discloses:
iteratively prune and re-train the selected machine learning model as long as the pruned model continues to satisfy the at least one performance criterion. (Dai, Page 5, Col. 1, ¶[1]: “To reach the same target accuracy of 98.67%, our proposed method only requires 15 and 20 training epochs first on new data and then on all data, respectively.”, Page 5, Col. 2, Section 4.3.1, ¶[2]: “In each iteration, we prune the weights that have the smallest values (e.g., smallest 5%), and retrain the network to recover its accuracy. Once the desired accuracy is achieved, we start the next pruning iteration.”)
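For illustration only, the iterative prune-and-retrain loop described in the quoted Dai passage (prune the smallest-magnitude weights, retrain, and continue once the desired accuracy is achieved) can be sketched as below. The function names, the `retrain` and `evaluate` callbacks, the iteration cap, and the rollback to the last passing state are assumptions made for this sketch; the quoted passage does not specify what happens when accuracy is not recovered.

```python
def prune_smallest(weights, mask, fraction):
    """Zero the given fraction of still-active weights with the smallest magnitude."""
    active = [i for i, m in enumerate(mask) if m == 1]
    k = int(len(active) * fraction)
    for i in sorted(active, key=lambda j: abs(weights[j]))[:k]:
        weights[i] = 0.0
        mask[i] = 0
    return k


def iterative_prune(weights, retrain, evaluate, target_accuracy,
                    fraction=0.05, max_iters=10):
    """Prune and retrain while the model keeps meeting target_accuracy."""
    mask = [1] * len(weights)
    for _ in range(max_iters):
        snapshot = (list(weights), list(mask))
        if prune_smallest(weights, mask, fraction) == 0:
            break  # nothing left to prune at this fraction
        retrain(weights, mask)  # hypothetical callback: recover accuracy
        if evaluate(weights) < target_accuracy:
            # Assumption: restore the last state that met the criterion.
            weights[:], mask[:] = snapshot
            break
    return weights, mask


if __name__ == "__main__":
    w = [0.1 * (i + 1) for i in range(20)]
    toy_accuracy = lambda ws: sum(1 for x in ws if x != 0.0) / len(ws)
    w, m = iterative_prune(w, lambda ws, ms: None, toy_accuracy, 0.5, fraction=0.25)
    print(sum(m), "weights remain active")
```

The toy accuracy function in the usage example simply counts non-zero weights; a real criterion would be measured on held-out data.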
Regarding Claim 13, the combination of Wierzynsky, Nagaraju, Dai, Ragesh and Anjaneyapura discloses all the limitations of Claim 10 (as shown in the rejections above).
Wierzynsky in view of Dai, Ragesh, Nagaraju and Anjaneyapura further discloses:
augment the additional training data before performing the additional training.  (Dai, Page 7, Col. 1, ¶[2]: “We split the training set (with 55K images) randomly into ten different parts of equal size. In the incremental learning experiments, we start with one part to train the initial model for subsequent updates. We then add one part as new data each time in the incremental learning scenario. For each update, we perform growth on new data and all data for 15 epochs and 20 epochs in the growth phase, respectively. Then, we prune the post-growth network for compactness.”) [Examiner’s note: performing augmentation of the additional training data i.e., add one part as new data each time]
Regarding Claim 14, the combination of Wierzynsky, Nagaraju, Dai, Ragesh and Anjaneyapura discloses all the limitations of Claim 10 (as shown in the rejections above).
Wierzynsky in view of Dai, Ragesh, Nagaraju and Anjaneyapura further discloses:
evaluate performance of two or more of the set of pre-trained models on at least a subset of the additional training data before selecting the machine learning model. (Ragesh, Page 47865, Col. 1, ¶[3]: “In feature classifier (FC) approach, predefined features are used to represent pedestrian objects and a classifier classifies an object to a pedestrian or not based on the similarity of the features of the object-of-interest to that of the pre-trained model. The focus here will be how best a feature can be derived to distinguish a pedestrian object from other objects as well as how we arrive at a best pre-trained model to match the features of the object-of-interest.”) [Examiner’s note: Arriving at the best pre-trained model is the same concept as selecting the best pre-trained model from the set of pre-trained models]
Regarding Claim 15, the combination of Wierzynsky, Nagaraju, Dai, Ragesh and Anjaneyapura discloses all the limitations of Claim 10 (as shown in the rejections above).
Wierzynsky in view of Dai, Ragesh, Nagaraju and Anjaneyapura further discloses:
provide a toolkit including at least a common interface and one or more modules (Ragesh, Page 47880, Col. 2, ¶[1]: “Many of these architectures can be freely downloaded and used with DL programming frameworks like TensorFlow, Caffe, CNTK, PyTorch, Keras, Deeplearning4j, Matlab Deep learning toolkit etc. to implement different DL solutions based on our requirement. We can either freshly train these networks for a new problem or can use the existing knowledge from the pre-trained models and additionally train for the new problem using transfer learning [93].”) [Examiner’s note: a common interface i.e., the frameworks, the toolkit i.e., Matlab Deep learning toolkit, one or more modules of the toolkit is interpreted as the pre-trained model as it is an important component of the toolkit]
for performing at least one of the additional training, the model pruning, data augmentation, and the model export (Dai, Page 2, Col. 1, ¶[1]: “We first grow and prune a model with the initial data. When new data arrive, the network undergoes a growth phase (first, based on new data and then on all available data) that increases its size to accommodate new data and knowledge. Then, we employ a pruning phase to remove redundant parameters to obtain a compact inference model”)
wherein the toolkit is provided in the container on the client device. (Anjaneyapura, [00123]: “In some embodiments, the machine learning models may be "custom" algorithms developed by users, and/or use custom code to train using existing algorithms such as deep learning frameworks (e.g., TensorFlow, Apache MXNet, etc.).”, [00125]: “Accordingly, in some embodiments, the training and/or hosting of machine learning models can be performed without needing significant knowledge on the part of users as to how these models are to be trained or used. For example, in some embodiments users can select or create a container including machine learning related code - potentially using any language(s)/package(s) that the user desires”) [Examiner’s note: the toolkit is interpreted as the TensorFlow as it contains the custom algorithms and it is included in a container created by the users]
Regarding Claim 18, the combination of Wierzynsky, Nagaraju, Dai, Ragesh and Anjaneyapura discloses all the limitations of Claim 16 (as shown in the rejections above).
Wierzynsky in view of Dai, Ragesh, Nagaraju and Anjaneyapura further discloses:
for performing at least one of the additional training, model pruning, data augmentation, and model export, (Dai, Page 2, Col. 1, ¶[1]: “We first grow and prune a model with the initial data. When new data arrive, the network undergoes a growth phase (first, based on new data and then on all available data) that increases its size to accommodate new data and knowledge. Then, we employ a pruning phase to remove redundant parameters to obtain a compact inference model”)
wherein the toolkit is provided in a software container for execution on a target computing device. (Anjaneyapura, [00123]: “In some embodiments, the machine learning models may be "custom" algorithms developed by users, and/or use custom code to train using existing algorithms such as deep learning frameworks (e.g., TensorFlow, Apache MXNet, etc.).”, [00125]: “Accordingly, in some embodiments, the training and/or hosting of machine learning models can be performed without needing significant knowledge on the part of users as to how these models are to be trained or used. For example, in some embodiments users can select or create a container including machine learning related code - potentially using any language(s)/package(s) that the user desires”) [Examiner’s note: the toolkit is interpreted as the TensorFlow as it contains the custom algorithms and it is included in a container created by the users]
providing a toolkit including at least a common interface and one or more modules for performing at least one of the additional training, model pruning, data augmentation, and model export, (Ragesh, Page 47880, Col. 2, ¶[1]: “Many of these architectures can be freely downloaded and used with DL programming frameworks like TensorFlow, Caffe, CNTK, PyTorch, Keras, Deeplearning4j, Matlab Deep learning toolkit etc. to implement different DL solutions based on our requirement. We can either freshly train these networks for a new problem or can use the existing knowledge from the pre-trained models and additionally train for the new problem using transfer learning [93].”) [Examiner’s note: a common interface i.e., the frameworks, the toolkit i.e., Matlab Deep learning toolkit, one or more modules of the toolkit is interpreted as the pre-trained model as it is an important component of the toolkit]
Regarding Claim 19, the combination of Wierzynsky, Nagaraju, Dai, Ragesh and Anjaneyapura discloses all the limitations of Claim 16 (as shown in the rejections above).
Wierzynsky in view of Dai, Ragesh, Nagaraju and Anjaneyapura further discloses:
selecting the at least one pre-trained model from a set of pre-trained models based at least in part upon the at least one type of inference to be performed. (Ragesh, Page 47865, Col. 1, ¶[3]: “In feature classifier (FC) approach, predefined features are used to represent pedestrian objects and a classifier classifies an object to a pedestrian or not based on the similarity of the features of the object-of-interest to that of the pre-trained model. The focus here will be how best a feature can be derived to distinguish a pedestrian object from other objects as well as how we arrive at a best pre-trained model to match the features of the object-of-interest.”) [Examiner’s note: Arriving at the best pre-trained model is the same concept as selecting the best pre-trained model from the set of pre-trained models]

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Wierzynsky (US 2017/0024641 A1) in view of Anjaneyapura et al. (WO 2019/104149 A1) (hereafter referred to as “Anjaneyapura”) and further in view of Nagaraju et al. (US 2018/0032915 A1).
Regarding Claim 16, Wierzynsky explicitly discloses:
providing, in response to the request for the stored pre-trained model, at least one pre-trained model stored in a local storage location of the edge server; and (Wierzynsky, ¶[0036]: “In an aspect of the present disclosure, the instructions loaded into the general-purpose processor 102 may comprise code for receiving second labels generated by a first network using second data. The first network was previously trained on first labels and first data. The instructions loaded into the general-purpose processor 102 may also comprise code for training a second network on the second labels and the second data.”, ¶[0038]: “As illustrated in FIG. 2, the system 200 may have multiple local processing units 202 that may perform various operations of methods described herein. Each local processing unit 202 may comprise a local state memory 204 and a local parameter memory 206 that may store parameters of a neural network.”)
providing, in response to a second request for additional training corresponding to an application to be used with the at least one pre-trained model, additional training data to cause the additional training of the at least one pre-trained model using the additional training data for the at least one type of inference for one or more classes represented in the additional training data, (Wierzynsky, ¶[0011]: “In one configuration, the first network has been previously trained on first labels for first data.”, ¶[0033]: “Neural networks may be trained on a training set that includes labels and corresponding data to classify objects from an input. For example, a first neural network may be trained on labeled images of cars to identify different types of cars. In some cases, it may be desirable to add new classes and/or modify the boundaries of existing classes after a network has been trained.”, ¶[0070]: “In one configuration, the original training set may be augmented with additional data and labels to perform the incremental learning. That is, for incremental learning, it is desirable to augment the original training set with a new training set to avoid forgetting the classifications of the original training set. Incremental learning is not limited to augmenting classes or modifying the boundaries of existing classes as other incremental learning functions are also contemplated.”)
wherein the one or more classes, for the application, are separate from and in addition to one or more original classes for the pre-trained model, wherein the second request specifies at least one of the one or more classes. (Wierzynsky, ¶[0060]: “FIG. 5 is a block diagram illustrating the run-time operation 500 of an AI application on a smartphone 502. The AI application may include a pre-process module 504 that may be configured (using for example, the JAVA programming language) to convert the format of an image 506 and then crop and/or resize the image 508. The pre-processed image may then be communicated to a classify application 510 that contains a SceneDetect Backend Engine 512 that may be configured (using for example, the C programming language) to detect and classify scenes based on visual input.”, ¶[0070]: “In one configuration, the original training set may be augmented with additional data and labels to perform the incremental learning. That is, for incremental learning, it is desirable to augment the original training set with a new training set to avoid forgetting the classifications of the original training set.”, ¶[0071]: “In one configuration, when the first training set D is no longer available after training the first neural network F, a second neural network F' is specified to approximate the first neural network F. Specifically, when the first training set D is no longer available, the first neural network F may be applied to second data x'i, that does not include second labels y'i. In one configuration, the second data x'i, is substantially similar or identical to the first data x of the first training set D. Alternatively, the second data xi', may not be related to the first data x.”) [Examiner’s note: “a second portion of the set of additional training data” is being interpreted as the second data xi’, which is not related to (i.e., separate from) the first data x (i.e., the first portion set of additional training data)]
Wierzynsky fails to disclose:
receiving, through an interface associated with an edge server, a request for a stored pre-trained model to be provided which is able to perform at least one type of inference;
in response to the request for the stored pre-trained model
in response to a second request for additional training corresponding to an application to be used with the at least one pre-trained model
However, Nagaraju explicitly discloses:
receiving, through an interface associated with an edge server, a request for a stored pre-trained model to be provided (Nagaraju, ¶[0061]: “The edge devices 12 discussed above can represent a broader category of computing devices commonly referred to as "client devices," which can each be operated under the control of a user. For example, FIG. 1 shows a client device 26 that can communicate with the components of the system 10 (e.g., the edge devices 12 or the server computer system 14) to receive or exchange information over the network 16. For example, a communication between the client device 26 and the components of the system 10 can include sending various requests and receiving data packets”, ¶[0062]: “the client device 26 or applications 28 running on the client device 26 may initiate communications with applications running on the edge devices 12 or the server computer system 14 to request specific content (e.g., edge data), and the applications at the edge devices 12 or the server computer system 14 may respond with the requested content stored in one or more data packets. Hence, the components of the system 10 can also represent a broader category of computing devices referred to as "host devices," which can host each other.”)
in response to the request for the stored pre-trained model (Nagaraju, ¶[0061]: “The edge devices 12 discussed above can represent a broader category of computing devices commonly referred to as "client devices," which can each be operated under the control of a user. For example, FIG. 1 shows a client device 26 that can communicate with the components of the system 10 (e.g., the edge devices 12 or the server computer system 14) to receive or exchange information over the network 16. For example, a communication between the client device 26 and the components of the system 10 can include sending various requests and receiving data packets”, ¶[0062]: “the client device 26 or applications 28 running on the client device 26 may initiate communications with applications running on the edge devices 12 or the server computer system 14 to request specific content (e.g., edge data), and the applications at the edge devices 12 or the server computer system 14 may respond with the requested content stored in one or more data packets. Hence, the components of the system 10 can also represent a broader category of computing devices referred to as "host devices," which can host each other.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Wierzynsky and Nagaraju. Wierzynsky discloses a method of transfer learning in neural networks. Nagaraju discloses transmitting machine learning models to edge devices. One of ordinary skill would have had motivation to combine Wierzynsky and Nagaraju because MPEP 2143 sets forth the Supreme Court rationales for obviousness, including: (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results; (E) “Obvious to try”: choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success; (F) Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.
However, Anjaneyapura explicitly discloses:
which is able to perform at least one type of inference; (Anjaneyapura, [0033]: “The model training system 120 can use the information provided by the user device 102 to train a machine learning model in one or more pre-established virtual machine instances 122 in some embodiments… The model training system 120 can automatically scale up and down based on the volume of training requests received from user devices 102 via frontend 129, thereby relieving the user from the burden of having to worry about over-utilization ( e.g., acquiring too little computing resources and suffering performance issues) or under-utilization (e.g., acquiring more computing resources than necessary to train the machine learning models, and thus overpaying).”, [00124]: “With these hosted models (e.g., inference code 1024 executed by a container 1022), client applications 1008 - whether hosted within the provider network 199 or external to the provider network 199 - can issue requests via one or more inference endpoints 1010 (e.g., as HTTP requests) to perform inference using the model”) [Examiner’s note: a pre-trained model i.e., the pre-established virtual machine instances]
in response to a second request for additional training corresponding to an application to be used with the at least one pre-trained model (Anjaneyapura, [0033]: “The model training system 120 can use the information provided by the user device 102 to train a machine learning model in one or more pre-established virtual machine instances 122 in some embodiments… The model training system 120 can automatically scale up and down based on the volume of training requests received from user devices 102 via frontend 129, thereby relieving the user from the burden of having to worry about over-utilization ( e.g., acquiring too little computing resources and suffering performance issues) or under-utilization (e.g., acquiring more computing resources than necessary to train the machine learning models, and thus overpaying).”, [00124]: “With these hosted models (e.g., inference code 1024 executed by a container 1022), client applications 1008 - whether hosted within the provider network 199 or external to the provider network 199 - can issue requests via one or more inference endpoints 1010 (e.g., as HTTP requests) to perform inference using the model”) [Examiner’s note: a pre-trained model i.e., the pre-established virtual machine instances]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Wierzynsky, Nagaraju and Anjaneyapura. Wierzynsky teaches a method of transfer learning in neural networks. Anjaneyapura teaches techniques for packaging and deploying algorithms utilizing containers for training flexible machine learning to perform inference tasks. Nagaraju discloses transmitting machine learning models to edge devices. One of ordinary skill would have had motivation to combine Wierzynsky, Nagaraju and Anjaneyapura because edge servers have the benefit of low latency, as they are located closer to the end users or devices; applying edge servers in training helps reduce the time it takes for data to travel between the server and the users (Anjaneyapura, ¶[00117]).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMY TRAN whose telephone number is (571)270-0693. The examiner can normally be reached Monday - Friday 7:30 am - 5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi, can be reached at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/AMY TRAN/Examiner, Art Unit 2126
/DAVID YI/Supervisory Patent Examiner, Art Unit 2126
Prosecution Timeline

Jan 28, 2025: Applicant Interview (Telephonic)
Apr 11, 2025: Request for Continued Examination
Apr 16, 2025: Response after Non-Final Action
Oct 27, 2025: Non-Final Rejection mailed, §103
Mar 03, 2026: Examiner Interview Summary
Mar 03, 2026: Applicant Interview (Telephonic)
Mar 11, 2026: Response Filed
May 11, 2026: Final Rejection mailed, §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/200,331
Patent 12639615
ENTANGLEMENT FORGING FOR QUANTUM SIMULATIONS
5y 2m to grant Granted May 26, 2026
17/173,605
Patent 12626120
AUTOMATED PIXEL-WISE LABELING OF ROCK CUTTINGS BASED ON CONVOLUTIONAL NEURAL NETWORK-BASED EDGE DETECTION
5y 3m to grant Granted May 12, 2026
17/226,399
Patent 12602582
DYNAMIC DISTRIBUTED TRAINING OF MACHINE LEARNING MODELS
5y 0m to grant Granted Apr 14, 2026
17/137,588
Patent 12468932
IDENTIFYING RELATED MESSAGES IN A NATURAL LANGUAGE INTERACTION
4y 10m to grant Granted Nov 11, 2025
16/996,310
Patent 12462185
SCENE GRAMMAR BASED REINFORCEMENT LEARNING IN AGENT TRAINING
5y 2m to grant Granted Nov 04, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 7-8
Grant Probability: 37%
With Interview (+47.2%): 84%
Median Time to Grant: 4y 9m (~0m remaining)
PTA Risk: High
Based on 30 resolved cases by this examiner. Grant probability derived from career allowance rate.