Office Action Analysis: 18173347 — Proxy Task Design Tools for Neural Architecture Search

Office Action

§103 §112
DETAILED ACTION
Status of Claims
This Office action is responsive to communications filed on 2023-02-23. Claim(s) 1-20 is/are pending and are examined herein.
Claim(s) 5-8, 10, 13-15, and 19 is/are objected to. 
Claim(s) 8-9, 15-16, and 20 is/are rejected under 35 USC 112(b).
Claim(s) 1-20 is/are rejected under 35 USC 103.

Notice of Pre-AIA or AIA Status
The present application, filed on or after 2013-03-16, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The attached information disclosure statement(s) (IDS), submitted on 2023-02-23, is/are in compliance with the provisions of 37 CFR 1.97. Accordingly, the attached information disclosure statement(s) is/are being considered by the examiner.

Claim Objections
Claim(s) 5-8, 10, 13-15, and 19 is/are objected to because of the following informalities: 
Claims 5 and 13 recite a minimum amount of candidate models [emphasis added] but this should be “a minimum number of candidate models” for grammaticality (because “models” is a count noun, not a mass noun). 
Claims 6, 14, and 19 recite the correlation score [emphasis added] but this should be “each correlation score” for proper antecedent basis (since the parent claim introduces a plurality of correlation scores, one for each of the plurality of proxy task choices). Dependent claims 7-8 inherit the objection. 
Claims 8 and 15 recite the shortest amount of time [emphasis added] but this should be “a shortest amount of time” for proper antecedent basis (since neither the claims nor their parents previously introduce a “shortest amount of time”). 
Claim 10 recites lower the variance [emphasis added] but this should be “lower the score variance or smoothness” for consistency of nomenclature and proper antecedent basis. 

Appropriate correction is required.
	
Claim Rejections - 35 USC 112(b)
The following is a quotation of 35 USC 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 USC 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim(s) 8-9, 15-16, and 20 is/are rejected under 35 USC 112(b) or 35 USC 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 USC 112, the applicant), regards as the invention.

Claims 8 and 15 recite selecting a proxy task choice that obtained the threshold correlation score within the shortest amount of time [emphasis added]. This is indefinite because it is not clear whether “the threshold correlation score” has proper antecedent basis and, if it does, how the limitation recited in the dependent claim changes the interpretation of the respective parent claim (i.e., claims 7 and 14, respectively). The parent claim enumerates three distinct stopping criteria in the alternative, so the broadest reasonable interpretation of the parent claim requires only one of the three stopping criteria to be satisfied. Only one of those criteria introduces a “threshold correlation score”. Stated differently, the parent claim appears to encompass at least three distinct species, and in two of those three species there is no “threshold correlation score” to provide antecedent basis for “the threshold correlation score” recited in the dependent claim. It is therefore unclear whether “the threshold correlation score” in the dependent claim actually has antecedent basis and, if so, whether it further requires the parent claim to be read as directed specifically to the species which introduces the “threshold correlation score”. The specification provides no clear guidance regarding this point. For the purpose of compact prosecution, the claim is interpreted broadly so that the “threshold correlation score” recited by this claim is not necessarily bound in scope by any elements introduced in the parent claim. 

Claims 9, 16, and 20 recite a reduced period of time without indicating the period of time with respect to which the “reduced period of time” is “reduced”. MPEP 2173.05(b) indicates that a “claim may be rendered indefinite when a limitation of the claim is defined by reference to an object and the relationship between the limitation and the object is not sufficiently defined” and, in the present instance, the claim element “a reduced period of time” appears to be defined by reference to an insufficiently specified object. Consequently, the claim element is indefinite. For the purpose of compact prosecution, the claim is interpreted broadly as encompassing any period of time (since any period of time is “reduced” with respect to any longer period). Dependent claim 10 inherits the rejection. 

Claim Rejections - 35 USC 103
The following is a quotation of 35 USC 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 USC 102(b)(2)(C) for any potential 35 USC 102(a)(2) prior art against the later invention.

Claim(s) 1-7, 11-14, and 17-19 is/are rejected under 35 USC 103 as being unpatentable over Jivko SINAPOV et al. (Learning Inter-Task Transferability in the Absence of Target Task Samples, published 2015-05-04; hereafter, “Sinapov”) in view of Ruochen WANG et al. (RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving, published 2021-08-18; hereafter, “Wang”). 

Claim 1
Sinapov discloses: 
A method for automatically determining a proxy task ([Sinapov, section 4.1]: Sinapov discloses a system whose goal, given a target task T_j in T_{target}, is to “select a task T_i in T_{source} such that T_i serves as an effective source for learning T_j” [Sinapov, section 4.1 second paragraph]. Selecting T_i in T_{source} maps to “determining a proxy task” as recited by the claim.)
comprising: determining, by one or more processors, a plurality of correlation candidate models to evaluate each of a plurality of proxy task choices ([Sinapov, sections 4.1 and 5.2]: As noted above, Sinapov discloses a set T_{source} of source tasks [Sinapov, section 4.1]. It also discloses an agent “learn[ing] to play each task” [Sinapov, section 5.2 paragraph beginning “Varying”]. The set T_{source} of source tasks maps to the “plurality of proxy task choices” of the claim, and agents performing the source tasks map to the “plurality of correlation candidate models” of the claim. Sinapov further discloses implementing the methods on a Condor Cluster system [Sinapov, section 5.2 paragraph beginning “All told”]. The processors in the cluster map to the “one or more processors” of the claim.)
generating, by the one or more processors, full-training scores for each of the plurality of correlation candidate models; ([Sinapov, figure 2]: Sinapov discloses computing rewards after each episode of training [Sinapov, figure 2; see also, section 5.2 paragraph beginning “Varying”]. The rewards for each source task after the last training episode for that task map to the “full-training scores” of the claim.)
generating, by the one or more processors, a correlation score for each of the plurality of proxy task choices using the plurality of correlation candidate models and the full-training scores; ([Sinapov, sections 4.2 and 5.2]: Sinapov discloses that the system can compute, for each T_i in T_{source}, the value “hat{B}(T_i, T_j), i.e., the expected benefit of transferring T_i to T_j” [Sinapov, section 4.2 first paragraph]. To estimate this benefit of transfer, the agent is trained on task T_j starting with the policy learned on task T_i [Sinapov, section 5.2 paragraph beginning “Once the baseline curves”]. The quantity hat{B}(T_i, T_j) for each source task T_i maps to the “correlation score for each of the plurality of proxy task choices” of the claim. Since estimating these quantities uses the models trained on the source tasks, it “us[es] the plurality of correlation candidate models and the full-training scores” with the “plurality of correlation candidate models” and the “full-training scores” as mapped above.)
ranking, by the one or more processors, the plurality of proxy task choices based on the correlation scores and training time; ([Sinapov, sections 4.3 and 5.2]: Sinapov discloses creating a ranked list R_j = [T_{{1}}, T_{{2}}, …, T_{{P}}] of source tasks according to the expected benefits, i.e., hat{B}(T_{{k}}, T_j) ≥ hat{B}(T_{{k+1}}, T_j) for all k [Sinapov, section 4.3.1 second paragraph]. In other words, the ranked list R_j is “based on the correlation scores” as mapped above. It is also based on “training time” since the estimates hat{B}(T_i, T_j) are computed after the baseline models for each task have been trained [Sinapov, section 5.2 paragraph beginning “Once the baseline curves”].)
selecting, by the one or more processors, a proxy task choice of the plurality of proxy task choices based on the ranking; and outputting, by the one or more processors, instructions associated with the selected proxy task choice. ([Sinapov, section 4.3]: Sinapov indicates that the best possible source task is defined as T^* = argmax_{T_i in T_{source}} B(T_i, T_j). In other words, T^* = T_{{1}} maps to the “selected proxy task choice” of the claim. The use of T^* for transfer learning maps to the “instructions associated with the selected proxy task choice” of the claim.)
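The ranking and selection steps mapped above can be sketched as follows. This is a minimal illustration, not Sinapov's implementation: the function name `rank_source_tasks`, the task labels, and the benefit values are hypothetical, and the estimates hat{B}(T_i, T_j) are assumed to be already available as plain numbers for one fixed target task T_j.

```python
from typing import Dict, List, Tuple


def rank_source_tasks(estimated_benefit: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort source tasks by estimated transfer benefit, highest first.

    `estimated_benefit` maps a source task label T_i to a hypothetical value
    of hat{B}(T_i, T_j) for one fixed target task T_j.
    """
    return sorted(estimated_benefit.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical benefit estimates for three source tasks (illustrative numbers only).
benefits = {"T_a": 0.12, "T_b": 0.47, "T_c": 0.30}

ranked = rank_source_tasks(benefits)   # plays the role of the ranked list R_j
best_task, best_value = ranked[0]      # first entry of the ranking is the selected task
```

In this sketch the selection is simply the head of the ranked list, which is the sense in which the selected source task is treated as T_{{1}} in the mapping above.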

While Sinapov discusses source task selection in the general context of transfer learning, it does not describe a specific application to neural architecture search. In other words, Sinapov does not distinctly disclose:
for a neural architecture search… for the neural architecture search;

Wang is in the field of machine learning. Moreover, Sinapov in view of Wang discloses: 
for a neural architecture search… for the neural architecture search; ([Wang, abstract and algorithm 1]: Wang discloses a method of neural architecture search called “NOn-uniform Successive Halving (NOSH)” [Wang, abstract] which includes training steps [Wang, algorithm 1; see also, section 3.2 paragraph beginning “Initialization”]. In the combination, the source task selection procedure of Sinapov is used to determine the training tasks used for the neural architecture search of Wang. The applicant is also invited to consult YLi and Zoph as cited in the conclusion of this Office action.)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the source/proxy task selection method of Sinapov with the neural architecture search method of Wang because the latter “reduces the search budget by ~5x while achieving competitive or even better than previous state-of-the-art predictor based methods” [Wang, abstract], thereby resulting in an efficient system overall. 

Claim 2
Sinapov in view of Wang discloses the elements of the parent claim(s). It also discloses: 
[The method of claim 1, further comprising] receiving, by the one or more processors, the plurality of proxy task choices for the neural architecture search. ([Sinapov, section 4.1]: As noted above, the set T_{source} of source tasks maps to the “plurality of proxy task choices” of the claim.)

The same motivation to combine applies. 

Claim 3
Sinapov in view of Wang discloses the elements of the parent claim(s). It also discloses: 
[The method of claim 1, wherein determining the plurality of correlation candidate models further comprises:] randomly sampling a first plurality of models from a search space for the neural architecture search; ([Wang, section 3.4]: Wang discloses “initializ[ing] the pool by randomly sampling K_{init} architectures from the search space” [Wang, section 3.4 first paragraph]. In other words, the pool of K_{init} architectures from the search space maps to the “first plurality” of the claim.)
training each of the first plurality of models for a first fraction of a full training time for the neural architecture search; ([Wang, section 3.2 and algorithm 1]: Part of the input to the algorithm is the “schedule E = {e^{(l)}}_{l = 1}^N [which] represents the training epoch for every architecture at each level, where e^{(i)} < e^{(i+1)}, i = 1 ~ (N – 1), and e^{(N)} is the maximum number of epochs (fully trained)” [Wang, section 3.2 paragraph beginning “We introduce”; see also, algorithm 1]. In other words, the initial pool of K_{init} architectures is trained for e^{(1)} epochs [Wang, section 3.2 paragraph beginning “Initialization”; see also, algorithm 1 line 7]. Thus e^{(N)} maps to the “full training time” of the claim, and e^{(1)} maps to a “first fraction” of the full training time of the claim.)
and rejecting a first portion of the first plurality of models which do not add to a score distribution for one or more metrics among the first plurality of models, wherein a second plurality of models corresponds to the first portion subtracted from the first plurality of models. ([Wang, section 3.2 and algorithm 1]: Wang discloses that “architectures in with the validation accuracy in the bottom K_{init}(1-r) will be terminated (kept in level 1), while the top K_{init}r architectures will be trained further to e^{(2)} epochs and upgrade to level 2” [Wang, section 3.2 paragraph beginning “Initialization”]. In other words, the bottom K_{init}(1-r) map to the “first portion” of the claim, and the top K_{init}r map to the “second plurality” of the claim.)

The same motivation to combine applies. 

Claim 4
Sinapov in view of Wang discloses the elements of the parent claim(s). It also discloses: 
[The method of claim 3, wherein determining the plurality of correlation candidate models further comprises:] training each of the second plurality of models for a second fraction of the full training time for the neural architecture search, ([Wang, section 3.2 and algorithm 1]: As noted above, Wang discloses that “the top K_{init}r architectures will be trained further to e^{(2)} epochs” [Wang, section 3.2 paragraph beginning “Initialization”]. In other words, e^{(2)} maps to the “second fraction” of the full training time of the claim.)
the second fraction being greater than the first fraction; ([Wang, section 3.2]: As noted above, Wang discloses that e^{(i)} < e^{(i+1)} for all i [Wang, section 3.2 paragraph beginning “We introduce”]. Since e^{(1)} maps to the “first fraction” and e^{(2)} to the “second fraction” of the claim, it is indeed the case that the “second fraction [is] greater than the first fraction” as required by the claim.)
and rejecting a second portion of the second plurality of models which do not add to a score distribution for the one or more metrics among the second plurality of models. ([Wang, section 3.2]: Wang indicates that the “process repeats until the maximum training epoch e^{(N)} is reached” [Wang, section 3.2 paragraph beginning “Initialization”]. In other words, the K_{init}r(1-r) architectures which terminate at level 2 map to the “second portion” of the claim.)

The same motivation to combine applies. 

Claim 5
Sinapov in view of Wang discloses the elements of the parent claim(s). It also discloses: 
[The method of claim 4, wherein determining the plurality of correlation candidate models further comprises] iteratively repeating training and rejecting of models until meeting a minimum amount of candidate models. ([Wang, section 3.2]: As noted above, Wang indicates that the “process repeats until the maximum training epoch e^{(N)} is reached” [Wang, section 3.2 paragraph beginning “Initialization”]. The K_{init}r^{N-1} level-N candidates map to the “minimum amount of candidate models” of the claim.)
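The train-and-reject loop mapped onto claims 3-5 can be sketched as below. This is a simplified, uniform successive-halving illustration under stated assumptions, not Wang's NOSH algorithm (which, per its title, is non-uniform and predictor-based). The function `successive_halving`, the stand-in `toy_evaluate`, the integer architecture ids, the schedule, and the keep ratio are all hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def successive_halving(
    pool: Sequence[int],
    schedule: Sequence[int],
    keep_ratio: float,
    evaluate: Callable[[int, int], float],
) -> List[List[int]]:
    """Return the pool of architectures that reaches each level.

    pool:       initial architectures (here just integer ids)
    schedule:   increasing epoch budgets e^(1) < e^(2) < ... < e^(N)
    keep_ratio: fraction r of the current pool promoted to the next level
    evaluate:   hypothetical (architecture, epochs) -> validation score
    """
    levels = [list(pool)]
    # After training to each budget except the last, reject the bottom (1 - r).
    for epochs in schedule[:-1]:
        current = levels[-1]
        scored = sorted(current, key=lambda a: evaluate(a, epochs), reverse=True)
        n_keep = max(1, math.floor(len(current) * keep_ratio))
        levels.append(scored[:n_keep])
    return levels


def toy_evaluate(arch: int, epochs: int) -> float:
    """Toy stand-in for training: score grows with architecture id and epochs."""
    return arch * 0.01 + epochs * 0.001


rng = random.Random(0)
k_init = 16
initial_pool = rng.sample(range(100), k_init)   # random sample of a toy search space
levels = successive_halving(initial_pool, schedule=[1, 4, 16], keep_ratio=0.5,
                            evaluate=toy_evaluate)
level_sizes = [len(level) for level in levels]
```

With K_{init} = 16, r = 0.5, and N = 3 levels, the pool sizes follow K_{init}r^{l-1}, so the last level holds K_{init}r^{N-1} = 4 candidates.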

The same motivation to combine applies. 

Claim 6
Sinapov in view of Wang discloses the elements of the parent claim(s). It also discloses: 
[The method of claim 1, wherein generating the correlation scores for each of the plurality of proxy task choices further comprises:] training each of the plurality of correlation candidate models; ([Sinapov, section 5.2]: As noted under the parent claim, Sinapov discloses agents “learn[ing] to play each task” [Sinapov, section 5.2 paragraph beginning “Varying”], the agents playing source tasks mapping to the “plurality of correlation candidate models” of the claim. In other words, the learning of the agents maps to the “training” of the claim.)
and during the training: monitoring one or more metrics and training time; ([Sinapov, figure 1 and section 5.2]: Sinapov discloses 2500 training episodes for each task [Sinapov, section 5.2 paragraph beginning “Varying”] and it tracks rewards as the number of training episodes changes [Sinapov, figure 1]. In other words, keeping track of present reward maps to “monitoring one or more metrics” as recited by the claim, and keeping track of the training episode counter maps to “monitoring… training time” as recited by the claim. Both of these are performed “during the training” as required by the claim.)
and continuously computing the correlation score based on the full-training scores. ([Sinapov, sections 4.2 and 5.2, and figure 3]: As noted under the parent claim, an expected benefit hat{B}(T_i, T_j) maps to a “correlation score” of the claim, and these are “based on the full-training scores” because their computation uses the models trained on the source tasks. Sinapov also shows computing the expected benefit of transfer hat{B}(T_i, T_j) “continuously” as training progresses [Sinapov, figure 3].)

The same motivation to combine applies. 

Claim 7
Sinapov in view of Wang discloses the elements of the parent claim(s). It also discloses: 
[The method of claim 6, wherein generating the correlation scores for each of the plurality of proxy task choices further comprises] stopping the training once a threshold correlation score is obtained or at least one of a threshold amount of time or a threshold for the one or more metrics is exceeded. ([Sinapov, section 5.2]: As noted above, Sinapov discloses 2500 training episodes for each task [Sinapov, section 5.2 paragraph beginning “Varying”]. In other words, 2500 training episodes maps to (at least) the “threshold amount of time” of the claim. Training in Sinapov stops when the training episode counter exceeds 2500, i.e., when the “threshold amount of time… is exceeded” as recited by the claim.)
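The disjunctive reading of the stopping criteria can be illustrated with a short sketch. The function `should_stop` and its parameter names are hypothetical and are not taken from the application or from Sinapov. The point is only that satisfying any one criterion ends training, so a time-only configuration never involves a threshold correlation score.

```python
from typing import Optional


def should_stop(
    correlation: float,
    elapsed: float,
    metric: float,
    corr_threshold: Optional[float] = None,
    time_threshold: Optional[float] = None,
    metric_threshold: Optional[float] = None,
) -> bool:
    """Return True when any configured stopping criterion is satisfied.

    Each threshold is optional; a criterion whose threshold is None is ignored.
    """
    if corr_threshold is not None and correlation >= corr_threshold:
        return True   # threshold correlation score obtained
    if time_threshold is not None and elapsed > time_threshold:
        return True   # threshold amount of time exceeded
    if metric_threshold is not None and metric > metric_threshold:
        return True   # threshold for the monitored metric exceeded
    return False


# Time-only configuration, analogous to a fixed cap of 2500 training episodes.
stopped_at = next(
    episode for episode in range(1, 5000)
    if should_stop(correlation=0.0, elapsed=episode, metric=0.0, time_threshold=2500)
)
```

Here training stops at the first episode whose counter exceeds the cap, without any correlation threshold being configured.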

The same motivation to combine applies.

Claim 11
Sinapov discloses: 
A system comprising: one or more processors; and one or more storage devices coupled to the one or more processors and storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations for automatically ([Sinapov, section 5.2]: Sinapov discloses that the methods disclosed therein are implemented on a Condor Cluster system [Sinapov, section 5.2 paragraph beginning “All told”]. The cluster maps to the “system” of the claim, the processors in the cluster map to the “one or more processors” of the claim, and memory or hard drives in the cluster to the “one or more storage devices” of the claim.)
determining a proxy task ([Sinapov, section 4.1]: Sinapov discloses a system whose goal, given a target task T_j in T_{target}, is to “select a task T_i in T_{source} such that T_i serves as an effective source for learning T_j” [Sinapov, section 4.1 second paragraph]. Selecting T_i in T_{source} maps to “determining a proxy task” as recited by the claim.)
the operations comprising: determining a plurality of correlation candidate models to evaluate each of a plurality of proxy task choices ([Sinapov, sections 4.1 and 5.2]: As noted above, Sinapov discloses a set T_{source} of source tasks [Sinapov, section 4.1]. It also discloses an agent “learn[ing] to play each task” [Sinapov, section 5.2 paragraph beginning “Varying”]. The set T_{source} of source tasks maps to the “plurality of proxy task choices” of the claim, and agents performing the source tasks map to the “plurality of correlation candidate models” of the claim.)
generating full-training scores for each of the plurality of correlation candidate models; ([Sinapov, figure 2]: Sinapov discloses computing rewards after each episode of training [Sinapov, figure 2; see also, section 5.2 paragraph beginning “Varying”]. The rewards for each source task after the last training episode for that task map to the “full-training scores” of the claim.)
generating a correlation score for each of the plurality of proxy task choices using the plurality of correlation candidate models and the full-training scores; ([Sinapov, sections 4.2 and 5.2]: Sinapov discloses that the system can compute, for each T_i in T_{source}, the value “hat{B}(T_i, T_j), i.e., the expected benefit of transferring T_i to T_j” [Sinapov, section 4.2 first paragraph]. To estimate this benefit of transfer, the agent is trained on task T_j starting with the policy learned on task T_i [Sinapov, section 5.2 paragraph beginning “Once the baseline curves”]. The quantity hat{B}(T_i, T_j) for each source task T_i maps to the “correlation score for each of the plurality of proxy task choices” of the claim. Since estimating these quantities uses the models trained on the source tasks, it “us[es] the plurality of correlation candidate models and the full-training scores” with the “plurality of correlation candidate models” and the “full-training scores” as mapped above.)
ranking the plurality of proxy task choices based on the correlation scores and training time; ([Sinapov, sections 4.3 and 5.2]: Sinapov discloses creating a ranked list R_j = [T_{{1}}, T_{{2}}, …, T_{{P}}] of source tasks according to the expected benefits, i.e., hat{B}(T_{{k}}, T_j) ≥ hat{B}(T_{{k+1}}, T_j) for all k [Sinapov, section 4.3.1 second paragraph]. In other words, the ranked list R_j is “based on the correlation scores” as mapped above. It is also based on “training time” since the estimates hat{B}(T_i, T_j) are computed after the baseline models for each task have been trained [Sinapov, section 5.2 paragraph beginning “Once the baseline curves”].)
selecting a proxy task choice of the plurality of proxy task choices based on the ranking; and outputting instructions associated with the selected proxy task choice. ([Sinapov, section 4.3]: Sinapov indicates that the best possible source task is defined as T^* = argmax_{T_i in T_{source}} B(T_i, T_j). In other words, T^* = T_{{1}} maps to the “selected proxy task choice” of the claim. The use of T^* for transfer learning maps to the “instructions associated with the selected proxy task choice” of the claim.)

While Sinapov discusses source task selection in the general context of transfer learning, it does not describe a specific application to neural architecture search. In other words, Sinapov does not distinctly disclose:
for a neural architecture search… for the neural architecture search;

Wang is in the field of machine learning. Moreover, Sinapov in view of Wang discloses: 
for a neural architecture search… for the neural architecture search; ([Wang, abstract and algorithm 1]: Wang discloses a method of neural architecture search called “NOn-uniform Successive Halving (NOSH)” [Wang, abstract] which includes training steps [Wang, algorithm 1; see also, section 3.2 paragraph beginning “Initialization”]. In the combination, the source task selection procedure of Sinapov is used to determine the training tasks used for the neural architecture search of Wang. The applicant is also invited to consult YLi and Zoph as cited in the conclusion of this Office action.)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the source/proxy task selection method of Sinapov with the neural architecture search method of Wang because the latter “reduces the search budget by ~5x while achieving competitive or even better than previous state-of-the-art predictor based methods” [Wang, abstract], thereby resulting in an efficient system overall. 

Claims 12, 13, and 14 inherit limitations from claim 11 and recite additional limitations which are substantially similar to those recited by claims 3, 5, and 6-7, respectively, so they are rejected by the same rationale. 

Claim 17
Sinapov discloses: 
A non-transitory computer readable medium for storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations for automatically ([Sinapov, section 5.2]: Sinapov discloses that the methods disclosed therein are implemented on a Condor Cluster system [Sinapov, section 5.2 paragraph beginning “All told”]. The processors in the cluster map to the “one or more processors” of the claim, and any hard drive in the cluster maps to the “non-transitory computer readable medium” of the claim.)
determining a proxy task ([Sinapov, section 4.1]: Sinapov discloses a system whose goal, given a target task T_j in T_{target}, is to “select a task T_i in T_{source} such that T_i serves as an effective source for learning T_j” [Sinapov, section 4.1 second paragraph]. Selecting T_i in T_{source} maps to “determining a proxy task” as recited by the claim.)
the operations comprising: determining a plurality of correlation candidate models to evaluate each of a plurality of proxy task choices ([Sinapov, sections 4.1 and 5.2]: As noted above, Sinapov discloses a set T_{source} of source tasks [Sinapov, section 4.1]. It also discloses an agent “learn[ing] to play each task” [Sinapov, section 5.2 paragraph beginning “Varying”]. The set T_{source} of source tasks maps to the “plurality of proxy task choices” of the claim, and agents performing the source tasks map to the “plurality of correlation candidate models” of the claim.)
generating full-training scores for each of the plurality of correlation candidate models; ([Sinapov, figure 2]: Sinapov discloses computing rewards after each episode of training [Sinapov, figure 2; see also, section 5.2 paragraph beginning “Varying”]. The rewards for each source task after the last training episode for that task map to the “full-training scores” of the claim.)
generating a correlation score for each of the plurality of proxy task choices using the plurality of correlation candidate models and the full-training scores; ([Sinapov, sections 4.2 and 5.2]: Sinapov discloses that the system can compute, for each T_i in T_{source}, the value “hat{B}(T_i, T_j), i.e., the expected benefit of transferring T_i to T_j” [Sinapov, section 4.2 first paragraph]. To estimate this benefit of transfer, the agent is trained on task T_j starting with the policy learned on task T_i [Sinapov, section 5.2 paragraph beginning “Once the baseline curves”]. The quantity hat{B}(T_i, T_j) for each source task T_i maps to the “correlation score for each of the plurality of proxy task choices” of the claim. Since estimating these quantities uses the models trained on the source tasks, it “us[es] the plurality of correlation candidate models and the full-training scores” with the “plurality of correlation candidate models” and the “full-training scores” as mapped above.)
ranking the plurality of proxy task choices based on the correlation scores and training time; ([Sinapov, sections 4.3 and 5.2]: Sinapov discloses creating a ranked list R_j = [T_{{1}}, T_{{2}}, …, T_{{P}}] of source tasks according to the expected benefits, i.e., hat{B}(T_{{k}}, T_j) ≥ hat{B}(T_{{k+1}}, T_j) for all k [Sinapov, section 4.3.1 second paragraph]. In other words, the ranked list R_j is “based on the correlation scores” as mapped above. It is also based on “training time” since the estimates hat{B}(T_i, T_j) are computed after the baseline models for each task have been trained [Sinapov, section 5.2 paragraph beginning “Once the baseline curves”].)
selecting a proxy task choice of the plurality of proxy task choices based on the ranking; and outputting instructions associated with the selected proxy task choice. ([Sinapov, section 4.3]: Sinapov indicates that the best possible source task is defined as T^* = argmax_{T_i in T_{source}} B(T_i, T_j). In other words, T^* = T_{{1}} maps to the “selected proxy task choice” of the claim. The use of T^* for transfer learning maps to the “instructions associated with the selected proxy task choice” of the claim.)

While Sinapov discusses source task selection in the general context of transfer learning, it does not describe a specific application to neural architecture search. In other words, Sinapov does not distinctly disclose:
for a neural architecture search… for the neural architecture search;

Wang is in the field of machine learning. Moreover, Sinapov in view of Wang discloses: 
for a neural architecture search… for the neural architecture search; ([Wang, abstract and algorithm 1]: Wang discloses a method of neural architecture search called “NOn-uniform Successive Halving (NOSH)” [Wang, abstract] which includes training steps [Wang, algorithm 1; see also, section 3.2 paragraph beginning “Initialization”]. In the combination, the source task selection procedure of Sinapov is used to determine the training tasks used for the neural architecture search of Wang. The applicant is also invited to consult YLi and Zoph as cited in the conclusion of this Office action.)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the source/proxy task selection method of Sinapov with the neural architecture search method of Wang because the latter “reduces the search budget by ~5x while achieving competitive or even better than previous state-of-the-art predictor based methods” [Wang, abstract], thereby resulting in an efficient system overall. 

Claims 18 and 19 inherit limitations from claim 17 and recite additional limitations which are substantially similar to those recited by claims 3 and 6-7, respectively, so they are rejected by the same rationale. 

Claim(s) 8 and 15 is/are rejected under 35 USC 103 as being unpatentable over Sinapov in view of Wang, further in view of Benjamin WILSON et al. (US20230244982A1, effectively filed 2022-01-28; hereafter, “Wilson”). 

Claim 8
Sinapov in view of Wang discloses the elements of the parent claim(s). It might not distinctly disclose: 
[The method of claim 7, wherein selecting the proxy task choice of the plurality of proxy tasks further comprises] selecting a proxy task choice that obtained the threshold correlation score within the shortest amount of time.  

Wilson is in the field of machine learning. Moreover, Sinapov in view of Wang and Wilson discloses:
[The method of claim 7, wherein selecting the proxy task choice of the plurality of proxy tasks further comprises] selecting a proxy task choice that obtained the threshold correlation score within the shortest amount of time. ([Wilson, 0076]: Wilson discloses making a selection of a model based on predetermined criteria, such as the criterion of “a speed by which a model provides… a prediction that satisfies a minimum threshold” [Wilson, 0076]. In the combination, the models of Wilson are the models trained on source tasks on which transfer learning is performed, as disclosed in Sinapov, and the threshold of Wilson is taken to be the “threshold correlation score” of the claim. With these mappings, selecting a proxy task based on the speed by which a prediction satisfying a threshold is provided maps to the step of “selecting a proxy task choice” as recited by the claim.)
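
For illustration only, the selection step recited in claim 8 can be sketched as follows. This is a minimal sketch and is not taken from Wilson, Sinapov, or the application: the function name select_fastest_proxy_task, the checkpoint layout (elapsed seconds paired with a correlation score), and the sample values are all hypothetical assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: task name -> time-ordered (elapsed_seconds, correlation_score) checkpoints.
Checkpoints = Dict[str, List[Tuple[float, float]]]


def select_fastest_proxy_task(results: Checkpoints, threshold: float) -> Optional[str]:
    """Return the task that first reaches `threshold` in the least elapsed time.

    Returns None when no task ever reaches the threshold.
    """
    best_task: Optional[str] = None
    best_time = float("inf")
    for task, checkpoints in results.items():
        for elapsed, score in checkpoints:
            if score >= threshold:
                # Only the first crossing matters for this task.
                if elapsed < best_time:
                    best_task, best_time = task, elapsed
                break
    return best_task


# Made-up numbers, purely to exercise the function.
example: Checkpoints = {
    "task_a": [(10.0, 0.40), (20.0, 0.75), (30.0, 0.90)],
    "task_b": [(5.0, 0.30), (15.0, 0.82)],
    "task_c": [(8.0, 0.50), (16.0, 0.60)],
}
selected = select_fastest_proxy_task(example, threshold=0.8)
```

With these assumed values, task_b crosses the threshold of 0.8 at 15 seconds, ahead of task_a at 30 seconds, and task_c never crosses it.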

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the method of source task selection for neural architecture search as disclosed by Sinapov in view of Wang with the selection of source tasks which quickly achieve threshold performance as described in Wilson because this would ensure that the system runs quickly.

Claim 15 inherits limitations from claim 11 and recites additional limitations which are substantially similar to those recited by claim 8, so it is rejected by the same rationale. 

Claim(s) 9, 16, and 20 is/are rejected under 35 USC 103 as being unpatentable over Sinapov in view of Wang, further in view of Tomoyuki OKUDA et al. (Non-parametric Prediction Interval Estimate for Uncertainty Quantification of the Prediction of Road Pavement Deterioration, published 2018; hereafter, “Okuda”).

Claim 9
Sinapov in view of Wang discloses the elements of the parent claim(s). It also discloses: 
[The method of claim 1, further comprising:] randomly sampling, by the one or more processors, the search space to find a model for testing variance ([Wang, section 3.4]: Wang discloses “randomly sampling K_{init} architectures from the search space” [Wang, section 3.4 first paragraph]. Any one of the K_{init} architectures thus sampled maps to the “model” of the claim. The examiner notes that “for testing variance” is a field of use limitation which is disclosed by the combination as proposed below.)

The same motivation to combine applies. 

Sinapov does not distinctly disclose performing a bootstrapping procedure on a model. In other words, Sinapov in view of Wang does not distinctly disclose: 
running, by the one or more processors, training for a plurality of copies of the model for a reduced period of time; 
and measuring, by the one or more processors, at least one of a score variance or smoothness of the plurality of copies of the model.  

Okuda is in the field of machine learning. It discloses a bootstrapping procedure for neural networks [Okuda, section III.B]. In particular, Sinapov in view of Wang and Okuda discloses: 
running, by the one or more processors, training for a plurality of copies of the model for a reduced period of time; ([Okuda, sections III.B and IV.D]: Okuda discloses resampling the training set B times to produce B bootstrap samples [Okuda, section III.B part 2)] and, “[f]or each of the B [neural network] models, n_{ae} epochs are learned using each bootstrap sample” [Okuda, section III.B part 3)]. In other words, the B learning steps of [Okuda, section III.B part 3)] map to the “training for a plurality of copies of the model” of the claim. The examiner notes that, in the specific examples described in Okuda, n_{ae} is either 100 or 10 and is, in particular, less than (i.e., “reduced” relative to) the 400 training epochs used for the original model [Okuda, section IV.D].)
and measuring, by the one or more processors, at least one of a score variance or smoothness of the plurality of copies of the model. ([Okuda, section III.B]: Okuda discloses computing a confidence interval using the bootstrapped models [Okuda, section III.B part 4)]. This confidence interval (or, alternatively, its width) falls under the broadest reasonable interpretation of a “score variance or smoothness” as recited by the claim.)
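
For illustration only, the bootstrapping idea mapped above (train several copies on resampled data, then measure the spread of their scores) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Okuda's procedure: Okuda trains neural networks and derives a confidence interval, whereas this sketch substitutes a toy "model" (a sample mean) and reports the sample variance of the scores. The names bootstrap_score_variance and toy_train_and_score, and the data values, are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def bootstrap_score_variance(
    data: Sequence[float],
    train_and_score: Callable[[List[float]], float],
    num_copies: int = 20,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Train `num_copies` copies on bootstrap resamples and return (score variance, scores)."""
    rng = random.Random(seed)
    scores: List[float] = []
    for _ in range(num_copies):
        # Resample the training set with replacement, keeping its size.
        sample = [rng.choice(data) for _ in data]
        scores.append(train_and_score(sample))
    return statistics.variance(scores), scores


def toy_train_and_score(sample: List[float]) -> float:
    """Stand-in for a short training run (assumption): fit a mean, score by negative MSE."""
    mean = sum(sample) / len(sample)
    return -sum((x - mean) ** 2 for x in sample) / len(sample)


data = [1.0, 2.0, 2.5, 3.0, 4.5, 5.0, 6.0, 7.5]
variance, scores = bootstrap_score_variance(data, toy_train_and_score)
```

A fixed seed is used so that repeated runs resample identically; in practice the training function would be a reduced-length training run of the sampled model.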

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the method of source task selection for neural architecture search as disclosed by Sinapov in view of Wang with the bootstrapping method disclosed by Okuda because it decreases the computational cost “to about 1/38 that of the usual bootstrap method” and because it avoids overestimating [Okuda, section 1 last paragraph], thereby resulting in an effective and efficient method of simulating the distribution of an estimator. 

Claims 16 and 20 inherit limitations from claims 11 and 17, respectively, and recite additional limitations which are substantially similar to those recited by claim 9, so they are rejected by the same rationale. 

Claim(s) 10 is/are rejected under 35 USC 103 as being unpatentable over Sinapov in view of Wang and Okuda, further in view of Matthew BJONERUD et al. (US20190102835A1, published 2019-04-04; hereafter, “Bjonerud”).

Claim 10
Sinapov in view of Wang and Okuda discloses the elements of the parent claim(s). It does not distinctly disclose: 
[The method of claim 9, further comprising:] determining, by the one or more processors, the score variance or smoothness is above a threshold; and outputting one or more instructions to lower the variance.  

Bjonerud is in the field of machine learning. Moreover, Sinapov in view of Wang, Okuda, and Bjonerud discloses: 
[The method of claim 9, further comprising:] determining, by the one or more processors, the score variance or smoothness is above a threshold; and outputting one or more instructions to lower the variance. ([Bjonerud, 0134]: Bjonerud discloses that if “the variance exceeds the variance thresholds in input 710 then artificial intelligence system 730 modifies its decision criteria to reduce the variance” [Bjonerud, 0134]. In the combination, the input of Bjonerud corresponds to the intervals produced by the bootstrapping method disclosed in Okuda. The variance thresholds of Bjonerud map to the “threshold” of the claim, and modification of decision criteria to reduce variance maps to the “instructions to lower the [score] variance [or smoothness]” of the claim.)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the method of source task selection for neural architecture search as disclosed by Sinapov in view of Wang and Okuda with reduction of variances as described by Bjonerud because reducing variances means having more certainty in model estimates, thereby resulting in a more robust system.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Yuhong LI et al. (Generic Neural Architecture Search via Regression, published 2021-11-17; hereafter, “YLi”) discloses a method of neural architecture search [YLi, abstract] which includes a proxy task search which searches a proxy task search space for a task that maximizes a correlation with the target task [YLi, section 3.2]. It discloses many of the same features described in the independent claim, including an explicit link between proxy task selection and neural architecture search. 
Barrett ZOPH et al. (Learning Transferable Architectures for Scalable Image Recognition, published 2018-04-11; hereafter, “Zoph”) discloses a method of neural architecture search which “search[es] for a good architecture on a proxy dataset” [Zoph, section 1 paragraph beginning “In this paper”]. In other words, Zoph could be used in place of Wang to link the method of source task selection in the general transfer learning context disclosed by Sinapov with a specific application to neural architecture search. 
Liam LI et al. (A System for Massively Parallel Hyperparameter Tuning, published 2020-03-16; hereafter “LLi”) gives a description of the Successive Halving Algorithm (SHA) [LLi, section 3.1 and algorithm 1]. This description could be used to map, for example, the limitations in claims 3-5. LLi further discloses a more robustly parallelizable variant of this algorithm called Asynchronous SHA (ASHA) [LLi, section 3.2 and algorithm 2].
Héctor ALLENDE et al. (Robust Bootstrapping Neural Networks, published 2004; hereafter, “Allende”) is one of many references to be found in the prior art discussing the use of bootstrapping to estimate the distribution of an estimator derived from a neural network (and, in particular, quantities derived from such a distribution, such as variances or confidence intervals). For example, the B replicates in the algorithm described in [Allende, section 4.3] correspond to the “plurality of copies” of claim 9. 
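
For illustration only, the general idea behind the Successive Halving Algorithm cited above from LLi can be sketched as follows. This is a minimal, generic sketch and not a reproduction of LLi's algorithm 1, ASHA, or Wang's NOSH: the function names, the reduction factor eta, the resource schedule, and the toy scoring function are hypothetical assumptions.

```python
from typing import Callable, List, Sequence


def successive_halving(
    candidates: Sequence[float],
    evaluate: Callable[[float, int], float],
    min_resource: int = 1,
    eta: int = 2,
) -> float:
    """Repeatedly keep the top 1/eta of candidates while multiplying the resource by eta."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    survivors: List[float] = list(candidates)
    resource = min_resource
    while len(survivors) > 1:
        # Score every survivor at the current budget, best first.
        ranked = sorted(survivors, key=lambda c: evaluate(c, resource), reverse=True)
        survivors = ranked[: max(1, len(ranked) // eta)]
        resource *= eta
    return survivors[0]


def toy_evaluate(candidate: float, resource: int) -> float:
    """Stand-in score (assumption): approaches the candidate's value as resource grows."""
    return candidate * (1.0 - 1.0 / (resource + 1))


best = successive_halving([0.2, 0.9, 0.5, 0.7, 0.1, 0.3, 0.8, 0.4], toy_evaluate)
```

With eight candidates and eta set to 2, the sketch runs three rounds (eight to four to two to one), doubling the per-candidate budget each round.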

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Shishir AGRAWAL whose telephone number is +1 703-756-1183. The examiner can normally be reached Monday through Thursday, 08:30-14:30 Pacific Time.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey SHMATOV, can be reached at +1 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is +1 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at +1 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call +1 800-786-9199 (IN USA OR CANADA) or +1 571-272-1000.

/S.A./Examiner, Art Unit 2123

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123