Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1–4 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claims 1–4 recite “a plurality of second output layers,” but later recite “the second output layer,” singular (in “training only the first output layer and the second output layer of the machine learning model using the downstream task; and training the entire machine learning model that includes the first output layer and the second output layer using the downstream task”). A person having ordinary skill in the art would be unable to determine whether “the second output layer” references A) any one layer of, B) a particular single layer of, or C) all of the layers of, the “plurality of second output layers.” The claims are therefore indefinite. In the examination below, “the second output layer” will be interpreted as any single layer of the “plurality of second output layers.”

Further, the claims recite “for a machine learning model that includes a plurality of preliminarily trained layers” and then “the plurality of layers.” A person having ordinary skill in the art would be unable to determine whether a reference to “the plurality of layers” is intended to be identified with the “plurality of preliminarily trained layers,” or whether “the plurality of layers” may include layers that are not preliminarily trained. In the examination below, “the plurality of layers” will be identified with the “plurality of preliminarily trained layers.”
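By way of illustration only, the claimed arrangement under the interpretations adopted above can be expressed as a minimal Python (PyTorch) sketch. All names (MultiHeadModel, hidden_dim, and so on) are hypothetical and are not drawn from the claims or the cited art, and the sketch assumes every layer produces features of the same width so that identically configured output layers can be attached.

    import torch
    import torch.nn as nn

    class MultiHeadModel(nn.Module):
        def __init__(self, pretrained_layers, hidden_dim, num_classes):
            super().__init__()
            # plurality of preliminarily trained layers
            self.layers = nn.ModuleList(pretrained_layers)
            # first output layer, formed according to the downstream task
            # and coupled to the final layer of the plurality of layers
            self.first_head = nn.Linear(hidden_dim, num_classes)
            # plurality of second output layers, coupled to the respective
            # outputs of the non-final layers, each with the same
            # configuration as the first output layer
            self.second_heads = nn.ModuleList(
                [nn.Linear(hidden_dim, num_classes)
                 for _ in range(len(pretrained_layers) - 1)]
            )

        def forward(self, x):
            outputs = []
            for i, layer in enumerate(self.layers):
                x = layer(x)
                if i < len(self.layers) - 1:
                    outputs.append(self.second_heads[i](x))  # a second output layer
            outputs.append(self.first_head(x))               # the first output layer
            return outputs

    # Hypothetical usage: three stand-in "preliminarily trained" layers.
    layers = [nn.Linear(16, 16) for _ in range(3)]
    model = MultiHeadModel(layers, hidden_dim=16, num_classes=4)
    outs = model(torch.randn(2, 16))  # one output per head (two second, one first)

Under the interpretation adopted above, “the second output layer” recited in the training steps is read as any single element of model.second_heads.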
Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claim 3 is rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kumar et al., “Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution,” February 2022, https://arxiv.org/abs/2202.10054v1 (hereafter Kumar).

Kumar teaches:

“A machine learning method comprising”: Kumar, section 4, paragraph 1, “We run experiments on ten benchmark datasets with deep neural networks and see that given good pretrained features, fine-tuning (FT) does better ID but worse OOD than linear probing (LP). As predicted by the theory, we find that LP-FT does better than both methods [a machine learning method].”

“for a machine learning model that includes a plurality of preliminarily trained layers”: Kumar, section 4, paragraph 3, “We use a CLIP pretrained ViT-B/16 for ImageNet. For the other datasets we use a ResNet-50 architecture [i.e., a ResNet with 50 layers] and consider a diverse range of pretraining methods [a machine learning model that includes a plurality of preliminarily trained layers] and datasets.”

“a first output layer formed according to a downstream task and coupled to a final layer of the plurality of layers, and a plurality of second output layers that is coupled to respective outputs of layers other than the final layer of the plurality of layers and has a same configuration as the first output layer”: Kumar, section 4, paragraph 3, “We use a CLIP pretrained ViT-B/16 for ImageNet. For the other datasets we use a ResNet-50 architecture [i.e., a ResNet with 50 layers; the first output layer interpreted as the layer coupled to the “head,” the layer preceding this layer interpreted as a final layer of the plurality of layers; the last layer and one other layer not coupled to the final layer interpreted as the plurality of second output layers] and consider a diverse range of pretraining methods and datasets”; Kumar, section B.2, paragraph 3, “DomainNet: We use the dataset splits in Tan et al. (2020) which is also used by follow-up work, e.g., in Prabhu et al. (2021). This is different from the original version of the DomainNet dataset (Peng et al., 2019), specifically Tan et al. (2020) note that some domains and classes contain many mislabeled outliers, so they select the 40 most common classes from the ‘sketch’, ‘real’, ‘clipart’ and ‘painting’ domains. We use the ‘sketch’ domain as ID, and all other domains (‘real’, ‘clipart’, ‘painting’) as OOD, and in the main paper we report the average accuracies across the OOD domains. In Table 3 we see that the same trends hold for each of the three OOD domains. We use a CLIP (Radford et al., 2021) pretrained ResNet-50 model [the layers in the model, including the first output layer and the plurality of second output layers, are designed to be combined to produce model output, hence producing a first output layer formed according to a downstream task and a plurality of second output layers that … has a same configuration as the first output layer] and fine-tune for 50 epochs (since this is a smaller dataset).”

“training only the first output layer and the second output layer of the machine learning model using the downstream task”: Kumar, Fig. 1 [showing, in step b, training using backpropagation between two layers at the “head,” interpreted as the first output layer and the second output layer, hence training only the first output layer and the second output layer of the machine learning model using the downstream task].

“and training the entire machine learning model that includes the first output layer and the second output layer using the downstream task”: Kumar, Fig. 1 [showing, in step a, end-to-end training using backpropagation that includes the two layers trained in part b, hence training the entire machine learning model that includes the first output layer and the second output layer using the downstream task].
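By way of illustration only, the two-stage procedure mapped above onto Kumar, Fig. 1 (head-only training followed by end-to-end training) can be expressed as a minimal Python (PyTorch) sketch, continuing the hypothetical MultiHeadModel sketch above. The optimizer, learning rate, and epoch counts are assumptions, not taken from Kumar, and the sketch trains all second output layers; under the interpretation adopted in the 112 rejection, training any single second output layer would also read on the limitation.

    import torch

    def train_lp_then_ft(model, loader, loss_fn, lp_epochs=5, ft_epochs=5, lr=1e-3):
        # Stage 1 (cf. Kumar, Fig. 1, step b): train only the output
        # layers; the preliminarily trained layers are frozen.
        for p in model.layers.parameters():
            p.requires_grad = False
        head_params = (list(model.first_head.parameters())
                       + list(model.second_heads.parameters()))
        opt = torch.optim.SGD(head_params, lr=lr)
        for _ in range(lp_epochs):
            for x, y in loader:
                loss = sum(loss_fn(out, y) for out in model(x))  # one loss per head
                opt.zero_grad()
                loss.backward()
                opt.step()

        # Stage 2 (cf. Kumar, Fig. 1, step a): unfreeze and train the
        # entire model, including both output layers, end to end.
        for p in model.parameters():
            p.requires_grad = True
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(ft_epochs):
            for x, y in loader:
                loss = sum(loss_fn(out, y) for out in model(x))
                opt.zero_grad()
                loss.backward()
                opt.step()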
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1–2 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Kumar in view of Garcia et al., US Pre-Grant Publication No. 2019/0258928 (hereafter Garcia).

Regarding claim 1:

Claim 1 is analogous to claim 3, except that claim 1 is to “[a] non-transitory computer-readable recording medium storing a machine learning program for causing a computer to execute a process comprising” the method steps of claim 3, which are taught by Kumar, as detailed in the 102(a)(1) rejection above.

Kumar does not teach “[a] non-transitory computer-readable recording medium storing a machine learning program for causing a computer to execute a process comprising.”

Garcia teaches “[a] non-transitory computer-readable recording medium storing a machine learning program for causing a computer to execute a process comprising”: Garcia, paragraph 0110, “The techniques may be implemented by computer software which, when executed by a computer, causes the computer to implement the method described above and/or to implement the resulting ANN. Such computer software may be stored by a non-transitory machine-readable medium such as a hard disk, optical disk, flash memory or the like, and implemented by data processing apparatus comprising one or more processing elements [a non-transitory computer-readable recording medium storing a machine learning program for causing a computer to execute a process comprising].”

Garcia and Kumar are analogous arts, as they are both related to the training of machine learning models. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the non-transitory computer-readable recording medium of Garcia with the teachings of Kumar to arrive at the present invention, in order to provide a mechanism for storing the program instructions for execution on a processor, as stated in Garcia, paragraph 0110, “The techniques may be implemented by computer software which, when executed by a computer, causes the computer to implement the method described above and/or to implement the resulting ANN.”

Regarding claim 2:

Kumar as modified by Garcia teaches “[t]he non-transitory computer-readable recording medium according to claim 1.”

Kumar further teaches “for a preliminarily trained model that includes the plurality of preliminarily trained layers”:
Kumar, section 4, paragraph 3, “We use a CLIP pretrained ViT-B/16 for ImageNet. For the other datasets we use a ResNet-50 architecture [i.e., a ResNet with 50 layers] and consider a diverse range of pretraining methods [a preliminarily trained model that includes the plurality of preliminarily trained layers] and datasets.”

Garcia further teaches:

“the recording medium storing the machine learning program for causing the computer to execute the process further comprising”: Garcia, paragraph 0110, “The techniques may be implemented by computer software which, when executed by a computer, causes the computer to implement the method described above and/or to implement the resulting ANN. Such computer software may be stored by a non-transitory machine-readable medium such as a hard disk, optical disk, flash memory or the like, and implemented by data processing apparatus comprising one or more processing elements [the recording medium storing the machine learning program for causing the computer to execute the process further comprising].”

“replacing an output layer coupled to the final layer of the plurality of layers with the first output layer formed according to the downstream task”: Garcia, Figs. 12a-12d [showing layer 1240 acting as a final layer of the plurality of layers]; Garcia, paragraph 0106, “FIG. 12b: the layers 1220, 1230, are replaced by a replacement layer 1225 [replacing an output layer coupled to the final layer of the plurality of layers with the first output layer formed according to the downstream task; “formed according to the downstream task” interpreted as designed to be part of the same model]. Here, the step 1100 involves detecting data signals (when training data is applied) for a first position x_1, . . . , x_N (such as the input to the layer 1220) and a second position y_1, . . . , y_N (such as the output of the layer 1230) in the ordered series of layers of neurons; the insertion layer is the layer 1225 and the step 1120 involves deriving an initial approximation of at least a set of weights (W_init and/or b_init) for the insertion layer 1225 using a least squares approximation from the data signals detected for the first position and a second position. This provides an example of providing the insertion layer to replace one or more layers of the base ANN.”

“and coupling the respective second output layers that have the same configuration as the first output layer to the respective outputs of the layers other than the final layer of the plurality of layers to generate the machine learning model”: Garcia, Figs. 12a-12d [showing replacement layer 1225 in Fig. 12b connected to previous layer 1210 and further layer 1240, hence all layers in the model remain connected, hence coupling the respective second output layers that have the same configuration as the first output layer to the respective outputs of the layers other than the final layer of the plurality of layers to generate the machine learning model].

Garcia and Kumar are analogous arts, as they are both related to the training of machine learning models. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the layer replacement of Garcia with the teachings of Kumar to arrive at the present invention, in order to provide an efficient mechanism for trying different model variants, as stated in Garcia, paragraph 0004, “Designing and training such a DNN is typically very time consuming. When a new DNN is developed for a given task, many so-called hyper-parameters (parameters related to the overall structure of the network) must be chosen empirically. For each possible combination of structural hyper-parameters, a new network is typically trained from scratch and evaluated. While progress has been made on hardware (such as Graphical Processing Units providing efficient single instruction multiple data (SIMD) execution) and software (such as a DNN library developed by NVIDIA called cuDNN) to speed-up the training time of a single structure of a DNN, the exploration of a large set of possible structures remains still potentially slow.”
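By way of illustration only, the least-squares derivation of initial insertion-layer weights described in Garcia, paragraph 0106 (quoted above), can be expressed as a minimal Python (NumPy) sketch. The function name and the arrangement of the detected data signals into matrices are assumptions made for illustration, not Garcia's implementation.

    import numpy as np

    def init_insertion_layer(X, Y):
        # X: (N, d_in) signals detected at the first position x_1, ..., x_N
        # Y: (N, d_out) signals detected at the second position y_1, ..., y_N
        ones = np.ones((X.shape[0], 1))
        A = np.hstack([X, ones])                     # augmented with a bias column
        sol, *_ = np.linalg.lstsq(A, Y, rcond=None)  # least-squares fit
        W_init = sol[:-1].T                          # weights, shape (d_out, d_in)
        b_init = sol[-1]                             # bias, shape (d_out,)
        return W_init, b_init

    # The initialized insertion layer then computes approximately
    #   Y_hat = X @ W_init.T + b_init
    # i.e., a linear layer approximating the detected input-to-output mapping.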
Regarding claim 4:

Claim 4 is analogous to claim 2, except that it lacks the non-transitory computer-readable recording medium. Claim 4 is therefore rejected by the same reasoning.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Tan, US Patent No. 11,604,993, discloses methods for modifying trained models, including replacing model layers. Sandler, US Pre-Grant Publication No. 2020/0104706, discloses a method of transfer learning including replacing a convolution layer or filter in one model with a replacement derived from another model.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VINCENT SPRAUL whose telephone number is (703) 756-1511. The examiner can normally be reached M-F 9:00 am - 5:00 pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, MICHAEL HUNTLEY, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/VAS/
Examiner, Art Unit 2129

/MICHAEL J HUNTLEY/
Supervisory Patent Examiner, Art Unit 2129