Last updated: May 29, 2026
Application No. 17/845,063
ACCELERATED TRANSFER LEARNING AS A SERVICE FOR NEURAL NETWORKS

Non-Final OA §103
Filed
Jun 21, 2022
Examiner
RYLANDER, BART I
Art Unit
2124
Tech Center
2100 — Computer Architecture & Software
Assignee
Microsoft Technology Licensing, LLC
OA Round
3 (Non-Final)
Interview Optional

— +13.6% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 65% grant rate with +13.6% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.
Based on 117 resolved cases, 2023–2026
Examiner Intelligence

RYLANDER, BART I View full profile →
Grants 65% — above average
Career Allowance Rate
76 granted / 117 resolved
+10.0% vs TC avg
Moderate +14% lift
Without
With
+13.6%
Interview Lift
resolved cases with interview
Typical timeline
3y 11m
Avg Prosecution
15 currently pending
Career history
142
Total Applications
across all art units
Statute-Specific Performance

§101
2.0%
-38.0% vs TC avg
§103
95.1%
+55.1% vs TC avg
§102
1.5%
-38.5% vs TC avg
§112
0.7%
-39.3% vs TC avg
Black line = Tech Center average estimate • Based on career data from 117 resolved cases
Office Action

§103
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Examiner notes the entry of the following papers:
Amended claims filed 9/11/2025.
Applicant’s remarks made in amendment filed 9/11/2025.
Claims 1, 10, and 17 are amended. Claims 1-20 are presented for examination.
Response to Arguments
Applicant presents arguments.  Each is addressed.
Applicant argues “Given these details, Applicant respectfully submits that the additional limitations are more than abstract idea grouping.  It is impractical for the human mind to perform these additional limitations” as amended. (Remarks, page 10, paragraph 2, line 5.) Examiner agrees.  The rejections under 35 U.S.C. § 101 are withdrawn.
Applicant argues “Applicant respectfully traverses the § 103 rejections because the Examiner failed to state a prima facie case of obviousness and/or the current amendments to the claims now render the Examiner’s arguments moot.” (Remarks, page 10, paragraph 5, line 2.)  The argument is moot in view of new areas of the cited prior art found that teach the amended limitations.
Applicant argues “The other independent claims, i.e., claims 10 and 17 recite similar limitations and are allowable over Whatmough in view of Singh and further in view of Feuz for at least the same or similar reasons.” (Remarks, page 14, paragraph 2, line 2.)  However, claim 1 remains rejected.  Claims 10 and 17 which recite the same substantive limitations as claim 1 remain rejected as well. The dependent claims remain rejected at least for depending from rejected base claims.
Claim Interpretation
The specification includes a special definition of a computer storage medium as non-transitory.  See specification paragraph [0052].
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-14, and 17-20 are rejected under 35 U.S.C. § 103 as being unpatentable over Whatmough, et al (FixyNN: Efficient Hardware for Mobile Computer Vision Via Transfer Learning, herein Whatmough), Singh, et al (LEAPER: Fast and Accurate FPGA-based System Performance Prediction via Transfer Learning, herein Singh) and Feuz et al (Ranking and automatic selection of machine learning models, herein Feuz).
Regarding claim 1,
	Whatmough teaches a method for processing input data using one or more accelerated machine learning models and one or more scenario specific machine learning models  (Whatmough, Figure 1, and, page 1, abstract, line 2 “This paper proposes FixyNN, which consists of a fixed-weight feature extractor that generates ubiquitous CNN features, and a conventional programmable CNN accelerator which processes a dataset-specific CNN. Image classification models for FixyNN are trained end-to-end via transfer learning, with the common feature extractor representing the transferred part, and the programmable part being learnt on the target dataset.”

    PNG
    media_image1.png
    566
    549
    media_image1.png
    Greyscale

In other words, “FixyNN” which consists of a fixed-weight feature extractor that generates ubiquitous CNN features… is a method for processing input data, input is input data, fixed weight feature extractor (FFE) is accelerated machine learning model, and Task Specific CNN Back-End tasks is one or more scenario specific machine learning models.) , the method comprising: 
	receiving input data, wherein the input data specifies a scenario of a task for processing the input data (Whatmough, Figure 1. And, page 1, abstract, line 4 “Image classification models for FixyNN are trained end-to-end via transfer learning, with the common feature extractor representing the transferred part, and the programmable part being learnt on the target dataset.” In other words, from Figure 1, input is receiving input, the output of the shared front end is input to the task specific CNNs, and the programmable part being learnt on the target dataset is the input data specifies a scenario of a task for processing the input data.); 
	 generating, based on the input data, a set of input [feature vectors] (Whatmough, Figures 1 and 3.

    PNG
    media_image2.png
    218
    936
    media_image2.png
    Greyscale

 In other words, from Figure 1, input picture is input data, and from Figure 3,  1 x 1 x C is an input vector.) ; 
automatically generating, based on the set of input feature vectors and a [resource usage condition] as specified by the scenario of the task for processing, the set of input feature vectors (Whatmough, Figure 1, and, page 1, abstract, line 2 “This paper proposes FixyNN, which consists of a fixed-weight feature extractor that generates ubiquitous CNN features…” and, page 2, column 1, paragraph 2, line 10 “The front-end layers are implemented as a heavily optimized fixed-weight feature extractor (FFE) hardware accelerator. The second part of the network is unique for each dataset, and hence needs to be implemented on a canonical programmable CNN hardware accelerator (Nvidia; Arm). Following this system architecture, FixyNN diverts a significant portion of the computational load from the CNN accelerator to the highly-efficient FFE, enabling much greater performance and energy efficiency.” In other words, feature extractor generates ubiquitous CNN features, is from input data automatically generating the set of input feature vectors.), by 
a programmable hardware accelerator of a plurality of distinct types of programmable hardware accelerators using an accelerated machine learning model, a set of scenario specific feature vectors (Whatmough, Figure 1, and page 2, column 1, paragraph 2, line 12  “The second part of the network is unique for each dataset, and hence needs to be implemented on a canonical programmable CNN hardware accelerator (Nvidia; Arm).” In other words, canonical programmable CNN hardware accelerator, e.g. Nvidia, Arm is a programmable hardware accelerator of the plurality of distinct types of programmable hardware accelerators.),  wherein 
[the accelerated machine learning model is executable as a common model by respective distinct types of programmable hardware accelerators of the plurality of distinct types of programmable hardware accelerators] , 
the set of scenario specific feature vectors is specific to a scenario for further processing the input data using a scenario specific machine learning model of a plurality of scenario specific machine learning models (Whatmough, Figure 1, and page 2, column 1, paragraph 2, line 3  “Our approach (Figure 1) divides a CNN into two parts. The first part of the network implements a set of layers that are common for all CV tasks, essentially producing a set of universal low-level CNN features that are shared for multiple different tasks or datasets. The second part of the network provides a task-specific CNN back-end. These two CNN parts are then processed on different customized hardware.” In other words, the first part of the model provides scenario specific feature vectors for processing by the second part of the model, the task-specific CNN models is a plurality of scenario specific machine learning models.) , and 
the scenario specific machine learning model is smaller in resource consumption than the accelerated machine learning model (Whatmough, page 2, column 1, paragraph 2, line 10 “The front-end layers are implemented as a heavily optimized fixed-weight feature extractor (FFE) hardware accelerator. The second part of the network is unique for each dataset, and hence needs to be implemented on a canonical programmable CNN hardware accelerator (Nvidia; Arm). Following this system architecture, FixyNN diverts a significant portion of the computational load from the CNN accelerator to the highly-efficient FFE, enabling much greater performance and energy efficiency.”  In other words, a significant proportion of the computational load from the CNN accelerator to the highly-efficient FFE is the scenario specific machine learning models will have a lessor load which is smaller in resource consumption.) ; 
generating, based on the set of scenario specific feature vectors, [by a central processing unit (CPU)] using the scenario specific machine learning model, output data, wherein [the CPU] executes operations slower in processing speed than the programmable hardware accelerator (Whatmough, Figure 1, and, page 2, column 1, paragraph 2, line 10 “The front-end layers are implemented as a heavily optimized fixed-weight feature extractor (FFE) hardware accelerator. The second part of the network is unique for each dataset, and hence needs to be implemented on a canonical programmable CNN hardware accelerator (Nvidia; Arm). Following this system architecture, FixyNN diverts a significant portion of the computational load from the CNN accelerator to the highly-efficient FFE, enabling much greater performance and energy efficiency.” In other words, front end layers are heavily optimized fixed weight feature extractor (FFE) hardware enabling much greater performance and energy efficiency, back end layers which are the second part, i.e. which is unique for each dataset is slower in operating speed, and, from Figure 1, outputting “CAT” is generating output data.).
Thus far, Whatmough does not explicitly teach feature vector.
Singh teaches feature vector (Sing, Figure 2, Figure 3, and, page 501, column 2,  line 1 “Accelerator optimization options. Table 2 describes commonly used high-level synthesis (HLS) [40] pragmas to optimize an accelerator design on an FPGA. These optimization options constitute a part of our ML feature vector for training.”

    PNG
    media_image3.png
    289
    553
    media_image3.png
    Greyscale

In other words, ML feature vector is feature vector.)
	Singh teaches based on resource usage condition as specified by the scenario of the task for processing (Singh, page 500, column 1, paragraph 2, line 7 “Second, LEAPER greatly reduces (up to 10×) the training overhead by transferring a base model, trained for a low-end, edge FPGA platform, to predict performance or resource usage for an accelerator implementation on a new, unknown high-end environment rather than building a new model from scratch.” In other words, to predict performance and resource usage is based on resource usage condition.)
	Singh teaches the accelerated machine learning model is executable as a common model by respective distinct types of programmable hardware accelerators of the plurality of distinct types of programmable hardware accelerators (Singh, page 500, column 1, paragraph 2, line 1 “We demonstrate LEAPER across five state-of-the-art, high-end, cloud FPGA-based platforms with three different interconnect technologies on six real-world applications. We present two keys results. First, LEAPER achieves, on average across six evaluated workloads and five FPGAs, 85% accuracy when we use our transferred model for prediction in a cloud environment. Second, LEAPER greatly reduces (up to 10×) the training overhead by transferring a base model, trained for a low-end, edge FPGA platform, to predict performance or resource usage for an accelerator implementation on a new, unknown high-end environment rather than building a new model from scratch.” In other words, transferred model is accelerated machine learning model and demonstrate LEAPER across five… FPGA-based platforms is executable as a common model by respective distinct type of programmable hardware accelerators.)
Both Whatmough and Singh are directed to transfer learning of machine learning models, among other things. Whatmough teaches a method for processing input data using one or more accelerated machine learning models and one or more scenario specific machine learning models, the method comprising receiving input data, wherein the input data specifies a scenario of a task for processing the input data, generating, based on the input data, a set of input vectors, automatically generating, based on the set of input vectors, the set of input feature vectors; but does not explicitly teach feature vectors, based on a resource usage condition as specified by the scenario of the task for processing, or the accelerated machine learning model is executable as a common model by respective distinct types of programmable hardware accelerators of the plurality of distinct types of programmable hardware accelerators.  Singh teaches feature vectors, and based on resource usage condition as specified by the scenario of the task for processing, and the accelerated machine learning model is executable as a common model by respective distinct types of programmable hardware accelerators of the plurality of distinct types of programmable hardware accelerators.  
	In view of the teaching of Whatmough it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Singh into Whatmough.  This would result in a method for processing input data using one or more accelerated machine learning models and one or more scenario specific machine learning models, the method comprising receiving input data, wherein the input data specifies a scenario of a task for processing the input data, generating, based on the input data, a set of input feature vectors, automatically generating, based on the set of input feature vectors and on a resource usage condition as specified by the scenario of the task for processing, the set of input feature vectors, and the accelerated machine learning model is executable as a common model by respective distinct types of programmable hardware accelerators of the plurality of distinct types of programmable hardware accelerators.
	One of ordinary skill in the art would be motivated to do this because training is expensive in terms of computation and resources, and improved transfer learning is a way to reduce this expense. (Singh, page 499, column  1, line 6 “First, training requires large amounts of data (features extracted from design synthesis and implementation tools), which is cost-inefficient because of the time-consuming accelerator design and implementation process. Second, a model trained for a specific environment cannot predict performance or resource usage for a new, unknown environment. In a cloud system, renting a platform for data collection to build an ML model can significantly increase the total-cost ownership (TCO) of a system. Third, ML-based models trained using a limited number of samples are prone to overfitting. To overcome these limitations, we propose LEAPER, a transfer learning-based approach for prediction of performance and resource usage in FPGA-based systems.”)
Thus far, the combination of Whatmough and Singh does not explicitly teach by a central processing unit (CPU).
	Feuz teaches by a central processing unit (CPU) (Feuz, abstract, line 2 “In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to provide to a software application one or more machine learning models from different providers. The trained models are suited to a task or data type specified by the developer. The one or more models are selected from a registry of machine learning models, their task specialties, cost, and performance, such that the application specified cost and performance requirements are met.” And, page 25, paragraph 4, line 1 “Each computing device can also include one or more processing devices that implement some or all of the machine-learned model and/or perform other related operations. Example processing devices include one or more of: a central processing unit (CPU); a visual processing unit (VPU); a graphics processing unit (GPU); a tensor processing unit (TPU); a neural processing unit (NPU); a neural processing engine; a core of a CPU, VPU, GPU, TPU, NPU or other  processing device; an application specific integrated circuit (ASIC); a field programmable gate array (FPGA); a co-processor; a controller; or combinations of the processing devices described above. Processing devices can be embedded within other hardware components such as, for example, an image sensor, accelerometer, etc.”  In other words, CPU is by a central processing unit.)	Both Feuz and the combination of Whatmough and Singh are directed to machine learning models, among other things.  The combination of Whatmough and Singh teaches a method for processing input data using one or more accelerated machine learning models and one or more scenario specific machine learning models, the method comprising receiving input data, wherein the input data specifies a scenario of a task for processing the input data, generating, based on the input data, a set of input feature vectors, automatically generating, based on the set of input feature vectors and on a resource usage condition as specified by the scenario of the task for processing, the set of input feature vectors, and the accelerated machine learning model is executable as a common model by respective distinct types of programmable hardware accelerators of the plurality of distinct types of programmable hardware accelerators; but does not explicitly teach using a central processing unit.  Feuz teaches using a central processing unit.
	In view of the teaching of the combination of Whatmough and Singh, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Feuz into the combination of Whatmough and Singh. This would result in a method for processing input data using one or more accelerated machine learning models and one or more scenario specific machine learning models, the method comprising receiving input data, wherein the input data specifies a scenario of a task for processing the input data, generating, based on the input data, a set of input feature vectors, automatically generating, based on the set of input feature vectors and on a resource usage condition as specified by the scenario of the task for processing, the set of input feature vectors, and the accelerated machine learning model is executable as a common model by respective distinct types of programmable hardware accelerators of the plurality of distinct types of programmable hardware accelerators, and using a central processing unit.  
	One of ordinary skill in the art would be motivated to do this because it can be difficult to service multiple machine learning needs with only one machine learning model and allowing for generic CPUs in addition to accelerators would be effective and helpful. (Feuz, page 3, paragraph 2, line 1 “Machine learning models are available as a service from various cloud-based or on-device providers. In situations where competing machine learning models are available, e.g., in a marketplace of models, developers that wish to employ machine learning in their software applications do not currently have a mechanism to automatically select and use the model that is most suited for the problem specific to their software applications.”)
Regarding claim 4,
	The combination of Whatmough, Singh, and Feuz teaches the method of claim 1, further comprising 
	determining, based upon the input data the accelerated machine learning model from the one or more accelerated machine learning models (Whatmough, Figure 1, and, abstract, line 2 “This paper proposes FixyNN, which consists of a fixed-weight feature extractor that generates ubiquitous CNN features, and a conventional programmable CNN accelerator which processes a dataset-specific CNN.” In other words, processes a dataset is input data and dataset-specific CNN is Task Specific CNN Back-end is chosen based on the input data.).  
Regarding claim 5,
	The combination of Whatmough, Singh, and Feuz teaches the method of claim 4, wherein 
	the determination is based upon a task associated with the input data (Whatmough, Figure 1, and, abstract, line 2 “This paper proposes FixyNN, which consists of a fixed-weight feature extractor that generates ubiquitous CNN features, and a conventional programmable CNN accelerator which processes a dataset-specific CNN.” In other words, dataset is input data, and, Task specific is task associated with dataset.).  
Regarding claim 6,
	The combination of Whatmough, Singh, and Feuz teaches the method of claim 4, wherein 
	the determination is based upon a type of specific machine learning model associated with the input data (Whatmough,  Figure 1, and, abstract, line 2 “ This paper proposes FixyNN, which consists of a fixed-weight feature extractor that generates ubiquitous CNN features, and a conventional programmable CNN accelerator which processes a dataset-specific CNN. Image classification models for FixyNN are trained end-to-end via transfer learning, with the common feature extractor representing the transferred part, and the programmable part being learnt on the target dataset.” In other words, from Figure 1, Task specific CNN back-ends are specific machine learning models, and dataset-specific CNN is a machine learning model associated with the input data.).  
Regarding claim 7,
	The combination of Whatmough, Singh, and Feuz teaches the method of claim 1, wherein 
the hardware accelerator comprises one or more of: a field-programable gate array (FPGA); a graphics processing unit (GPU); a tensor processing unit (TPU); or an application-specific integrated circuit (ASIC) (Whatmough, Figure 4, and page 3, column 1, paragraph 5, line 1 “FixyNN consists of two hardware components: the FFE, and a programmable CNN accelerator. The FFE is generated using our DeepFreeze tool (Section 3.3). We use 8-bit precision for weights and activation data, and 32-bit for accumulators. For ASIC implementation experiments, we use Synopsys Design Compiler with TSMC 16nm Fin-FET process technology to characterize silicon area.”

    PNG
    media_image4.png
    329
    548
    media_image4.png
    Greyscale

In other words, ASIC implementation is the hardware accelerator comprises of one or more of: a field-programable gate array (FPGA); a graphics processing unit (GPU); a tensor processing unit (TPU); or an application-specific integrated circuit (ASIC). ). 
Regarding claim 8,
	The combination of Whatmough, Singh, and Feuz teaches the method of claim 1, wherein 
	the accelerated machine learning model comprises common portions of a plurality of machine learning models, the common portions being executed by the plurality of machine learning models, such that the accelerated machine learning model can be used with the plurality of machine learning models (Whatmough, page 5, column 2, paragraph 2, line 10 “Therefore, in FixyNN we propose to fix only a portion of the front-end of the network, and use a canonical programmable accelerator to process the remainder (Figure 1). The fixed portion provides a set of more universal CNN features specific to the application domain of vision tasks, whereas the programmable portion of the network specific to a given a dataset.” In other words, the fixed portion provides a set of more universal CNN features is the accelerated machine learning model comprises common portions of a plurality of machine learning models, and, fixing only the more universal features specific to the domain is the accelerated machine learning model can be used with the plurality of machine learning models because it is processing the fixed portion common to all the back-end tasks.).  
Regarding claim 9,
	The combination of Whatmough, Singh, and Feuz teaches the method of claim 8, wherein 
	the accelerated machine learning model is generated based upon the common portions, wherein the common portions are compiled for execution on the hardware accelerator (Whatmough, page 5, column 2, paragraph 2, line 10. See above mapping. And, page 2, column 1, paragraph 2, line 1 “In this paper we describe FixyNN, which builds upon both of these trends, by means of a hardware/CNN co-design approach to CNN inference for CV on mobile devices. Our approach (Figure 1) divides a CNN into two parts. The first part of the network implements a set of layers that are common for all CV tasks, essentially producing a set of universal low-level CNN features that are shared for multiple different tasks or datasets.” In other words, FFE is accelerated machine leaning model, FixyNN hardware is hardware accelerator, and the first part of the network implements a set of layers that are common for all CV tasks is the common portions are compiled for execution on the hardware accelerator.).
16. 	Claims 10-12 are systems comprising at least one processor, at least one field-programmable gate array (FPGA), and memory claims corresponding to method claims 1, 8, and 9, respectively.  Otherwise, they are the same.  The combination of Whatmough, Singh, and Feuz teaches a system comprising at least one processor, at least one FPGA, and memory (Singh, page 3, column 1, paragraph 3, line 1 “Figure 3b shows the execution timeline from our host CPU to the FPGA board. We make use of CAPI in a coarse-grained way as we offload the entire application to the FPGA. CAPI ensures that a PE accesses the entire CPU memory.” In other words, CPU is processor, FPGA is FPGA, and memory is memory.) Therefore, claims 10-12 are rejected for the same reasons as claims 1, 8, and 9, respectively.
17. 	Regarding claim 13,
	The combination of Whatmough, Singh, and Feuz teaches the system of claim 10, wherein 
	the one or more scenario specific machine learning models are trained using a set of training data generated by the accelerated machine learning model (Whatmough, page 6, column 1, paragraph 2, line 1 “The procedure for training an image classification model
on a given dataset is as follows. We start by assuming the fixed feature extractor has already been defined, using the MobileNet architecture trained on the ImageNet data. The early-layer weights are fixed for the feature extractor, while the remainder of the network is fine-tuned on the target dataset.” In other words, the early layer weights are fixed for the feature extractor, is the accelerated machine learning model is trained, and the remainer of the network is fine-tuned on the target dataset is using the training dataset generated by the accelerated machine learning model.) .  
18.	Regarding claim 14,
	The combination of Whatmough, Singh, and Feuz teaches the system of claim 10, wherein 
	the one or more scenario specific machine learning models are trained using a set of training data (Whatmough, page 6, column 1, paragraph 2, line 1 “The procedure for training an image classification model on a given dataset is as follows. We start by assuming the fixed feature extractor has already been defined, using the MobileNet architecture trained on the ImageNet data. The early-layer weights are fixed for the feature extractor, while the remainder of the network is fine-tuned on the target dataset.” In other words, image classification model is one or more scenario specific machine learning models, and fine-tuned on the target dataset is trained using a set of training data.), wherein 
	portions of the one or more scenario specific machine learning models that correspond to the accelerated machine learning model are locked during the training process 
(Whatmough, page 6, column 1, paragraph 2, line 1 “The procedure for training an image classification model on a given dataset is as follows. We start by assuming the fixed feature extractor has already been defined, using the MobileNet architecture trained on the ImageNet data. The early-layer weights are fixed for the feature extractor, while the remainder of the network is fine-tuned on the target dataset.” In other words, early-layer weights are fixed for the feature extractor is portions of the one or more scenarios specific machine learning models that correspond to the accelerated machine learning model are locked during the training process.). 
19.	Claims 17-20 are computer storage medium claims that contain instructions that are executed by at least one processor, that correspond to method claims 1, and 7-9, respectively.  Otherwise, they are the same.  The combination of Whatmough, Singh, and Feuz teaches a computer storage medium that contains instructions that are executed on at least on processor. (Singh,  page 3, column 1, paragraph 3, line 1 “Figure 3b shows the execution timeline from our host CPU to the FPGA board. We make use of CAPI in a coarse-grained way as we offload the entire application to the FPGA. CAPI ensures that a PE accesses the entire CPU memory.” In other words, cpu is processor, and memory is a computer storage medium that contains instructions that are executed on at least one processor.) Therefore, claims 17-20 are rejected for the same reasons as claims 1, and 7-9, respectively. 
20.	Claims 2-3, and 15-16 are rejected under 35 U.S.C. § 103 as being unpatentable over Whatmough, Singh, and Feuz, further in view of Feuz.
Regarding claim 2
The combination of Whatmough, Singh, and Feuz teaches the method of claim 1, wherein 
the one or more scenario specific machine learning models are trained to perform a specific task [based upon a request] associated with the input data (Whatmough, Figure 1, and, page 2, column 1, paragraph 3, subparagraph 3 “Demonstration of the use of transfer learning to generalize a single common FFE to train a number of different back-end models for different datasets.” In other words, different back-end models are one or more scenario specific machine learning models, and train… for different datasets is trained to perform a specific task based upon a specific task associated with the input data. ).  
Thus far, the combination of Whatmough, Singh, and Feuz does not explicitly teach based upon a request.
Feuz teaches based upon a request (Feuz, page 3, paragraph 2, line 2 “In situations where competing machine learning models are available, e.g., in a marketplace of models, developers that wish to employ machine learning in their software applications do not currently have a mechanism to automatically select and use the model that is most suited for the problem specific to their software applications.” And, page 6, paragraph 2, line 8 “Selection of the ML service provider is transparent to the app developer, e.g., the API enables the app to request and obtain a suitable model for each supported ML task type.”  In other words, app to request is based upon a request.)
It would be obvious to combine Feuz into the combination of Whatmough, Singh, and Feuz at least for the reasons used to combine Feuz into the combination of Whatmough and Singh used in the claim 1.
Regarding claim 3,
The combination of Whatmough, Singh, Feuz, and Feuz teaches the method of claim 2, wherein providing the output determination comprises 
performing a task determined by the one or more scenario specific machine learning models (Whatmough, Figure 1, In other words, from Figure 1, the specific task of identifying what is pictured in an input image, in this case a cat, is performing a task determined by the one or more task specific back-end models. ).  
Claims 15-16 are system claims that correspond to method claims 2-3, respectively.  Otherwise, they are the same. Therefore, claims 15-16 are rejected for the same reasons as claims 2-3, respectively.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached at 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/B.I.R./Examiner, Art Unit 2124                                                                                                                                                                                                        

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124
Read full office action
Prosecution Timeline

Show 5 earlier events
Sep 11, 2025
Response Filed
Oct 28, 2025
Final Rejection mailed — §103
Dec 24, 2025
Interview Requested
Jan 13, 2026
Applicant Interview (Telephonic)
Jan 13, 2026
Examiner Interview Summary
Feb 25, 2026
Request for Continued Examination
Mar 09, 2026
Response after Non-Final Action
Mar 20, 2026
Non-Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/617,994
Patent 12608632
ERROR DETECTION DEVICE, ERROR DETECTION METHOD, AND ERROR DETECTION PROGRAM
4y 4m to grant Granted Apr 21, 2026
17/509,582
Patent 12555002
RULE GENERATION FOR MACHINE-LEARNING MODEL DISCRIMINATORY REGIONS
4y 3m to grant Granted Feb 17, 2026
17/204,188
Patent 12530572
Method for Configuring a Neural Network Model
4y 10m to grant Granted Jan 20, 2026
18/072,677
Patent 12530622
GENERATING NEW DATA BASED ON CLASS-SPECIFIC UNCERTAINTY INFORMATION USING MACHINE LEARNING
3y 1m to grant Granted Jan 20, 2026
17/956,120
Patent 12493826
AUTOMATIC MACHINE LEARNING FEATURE BACKWARD STRIPPING
3y 2m to grant Granted Dec 09, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
65%
Grant Probability
79%
With Interview (+13.6%)
3y 11m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 117 resolved cases by this examiner. Grant probability derived from career allowance rate.