DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 01/05/2026 have been fully considered but they are not persuasive.
Regarding applicant’s remarks directed to the rejection of claims under 35 USC § 103,
Alleged lack of teaching of assignment of the same probability to each edge
In Remarks p. 9, Applicant contends:
“Nothing in Figure 2 or in the uniform sampling in the quote taken from Section 3 of Li
discusses the assignment of the same probability to each edge. In addition, the gloss that the
Patent Office places on this aspect of Li ("wherein sampling uniformly is randomly sampling
from the list of options such that each option (ie directed edge from node i to subsequent nodes)
has the same probability") is unproven because the Patent Office fails to provide any reasoning
establishing that "sampling uniformly" is interchangeable in meaning with the claimed
assignment of the same probability to each edge.”
The relevant claim limitations appear to be “each respective edge of the edges being assigned a probability which characterizes with which probability the respective edge is selected” in claim 1.
As noted in the previous Office Action, Li teaches (emphasis added):
(Li, Section 3, …”3. Finally, moving from node to node, we sample uniformly from the set of possible choices for each decision that needs to be made [each respective edge of the edges being assigned a probability which characterizes with which probability the respective edge is selected; wherein sampling uniformly is randomly sampling from the list of options such that each option (ie directed edge from node i to subsequent nodes) has the same probability].”)
After careful consideration, the argument is considered unpersuasive because Li discloses “moving from node to node, we sample uniformly from the set of possible choices for each decision that needs to be made.” Examiner notes that the disclosed algorithm of uniformly sampling the possible choices (edges) at each node reads upon the limitation, as Examiner breaks down the interpretation of uniform sampling at each node:
Sampling from the possible choices at each node teaches selecting (“…with which probability the respective edge is selected”).
Uniformly teaches selecting with equal probability, “uniform” meaning in the same way/equally/evenly.
Performing uniform sampling at each node is thus interpreted as a sampling rule for selecting the path through the directed graph that initializes an equal probability across the edges. In other words, choosing this particular algorithm assigns the probabilities of the edges.
Examiner further notes that uniform sampling differs from an exemplary algorithm of always selecting the first edge, which would not be uniform (i.e., selecting with equal probability) because the selection is biased toward an arbitrary first edge.
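The distinction drawn above can be illustrated with a minimal sketch (a hypothetical illustration, not drawn from any cited reference; the edge labels and sample count are illustrative only):

```python
import random
from collections import Counter

def sample_uniform(edges):
    # Uniform sampling: each outgoing edge is drawn with probability 1/len(edges).
    return random.choice(edges)

def sample_first(edges):
    # Biased rule for contrast: always take the first edge, i.e., probability 1
    # for edge 0 and probability 0 for every other edge.
    return edges[0]

edges = ["e0", "e1", "e2", "e3"]
counts = Counter(sample_uniform(edges) for _ in range(100_000))
freqs = {e: counts[e] / 100_000 for e in edges}
# Empirical frequencies cluster near 0.25 per edge (equal probability),
# whereas sample_first always returns "e0".
```

Under uniform sampling each of the four edges is selected with the same probability, which is the interpretation applied to Li above.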
The examiner refers to the rejection under 35 USC § 103 in the current office action for more details.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1, 3, 5-6 and 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over US Pub. No. US20200175362A1 Zhang et al. (“Zhang”) in view of Li, Liam, and Ameet Talwalkar. "Random search and reproducibility for neural architecture search." (“Li”) in further view of Veniat, Tom, and Ludovic Denoyer. "Learning time/memory-efficient deep architectures with budgeted super networks." (“Veniat”)
In regards to claim 1,
Zhang teaches A computer-implemented method for creating a machine learning system, which is configured for segmentation and object description, the machine learning system including an input for receiving an image and two outputs, a first output outputting the segmentation of the image and a second output outputting the object description, comprising the following steps:
(Zhang, “[0024] Various embodiments of the present disclosure provide an efficient AutoML algorithm for lifelong learning. In some embodiments, this efficient AutoML algorithm is referred to as a Regularize, Expand and Compress (REC). In these embodiments, REC involves first searching a best new neural network architecture for the given tasks in a continuous learning mode. Tasks may include image classification, image segmentation [segmentation], object detection [object description] and/or many other computer vision tasks. The best neural network architecture can solve multiple different tasks simultaneously [a first output outputting the segmentation of the image and a second output outputting the object description; ie solving (providing outputs for) image segmentation and object detection], without catastrophic forgetting of old tasks' information, even when there is no access to old tasks' training data.”)
However, Zhang does not explicitly teach providing a directed graph, the directed graph including an input node, an output node, and a plurality of further nodes, the input node and the output node being connected via the further nodes using directed edges, the input node, output node, and further nodes representing data and the edges representing operations, which convert a first node of each respective edge into a further node connected to the respective edge, each respective edge of the edges being assigned a probability which characterizes with which probability the respective edge is selected; selecting a path through the directed graph, a subset of nodes being determined from the plurality of further nodes, all of which satisfy a predefined property with respect to a data resolution, at least one additional node being selected from the subset, which serves as the second output, the path through the directed graph from the input node along the edges via the additional node up to the output node being selected as a function of the probability assigned to the edges; creating the machine learning system as a function of the selected path and training the created machine learning system, adapted parameters of the trained machine learning system being stored in the corresponding edges of the directed graph and the probabilities of the edges of the path being adapted; multiple repeating of the selecting a path step and the creating and training a machine learning system step; and creating the machine learning system as a function of the directed graph; wherein the probabilities of the edges are set initially to one value, so that all paths through the directed graph are selected with equal probability.
Li teaches providing a directed graph, the directed graph including an input node, an output node, and a plurality of further nodes, the input node and the output node being connected via the further nodes using directed edges, the input node, output node, and further nodes representing data and the edges representing operations, which convert a first node of each respective edge into a further node connected to the respective edge, each respective edge of the edges being assigned a probability which characterizes with which probability the respective edge is selected;
(Li, Section 3, “Our algorithm is designed for an arbitrary search space with a DAG representation [providing a directed graph, the directed graph including an input node, an output node, and a plurality of further nodes, the input node and the output node being connected via the further nodes using directed edges; see figure 2], and in our experiments in Section 4, we use the same search spaces as that considered by DARTS [34] for the standard CIFAR-10 and PTB NAS benchmarks…
1. For each node in the DAG, determine what decisions must be made. In the case of the PTB search space, we need to choose a node as input and a corresponding operation to apply to generate the output of the node.
2. For each decision, identify the possible choices for the given node. In the case of the PTB search space, if we number the nodes from 1 to N, node i can take the outputs of nodes 0 to node i − 1 as input (the initial input to the cell is index 0 and is also a possible input). Additionally, we can choose an operation from {tanh, relu, sigmoid, and identity} to apply to the output of node i [the input node, output node, and further nodes representing data and the edges representing operations, which convert a first node of each respective edge into a further node connected to the respective edge].
3. Finally, moving from node to node, we sample uniformly from the set of possible choices for each decision that needs to be made [each respective edge of the edges being assigned a probability which characterizes with which probability the respective edge is selected; wherein sampling uniformly is randomly sampling from the list of options such that each option (ie directed edge from node i to subsequent nodes) has the same probability].”)
[image: media_image1.png]
Li teaches selecting a path through the directed graph, a subset of nodes being determined from the plurality of further nodes, all of which satisfy a predefined property with respect to a data resolution, at least one additional node being selected from the subset, which serves as the second output, the path through the directed graph from the input node along the edges via the additional node up to the output node being selected as a function of the probability assigned to the edges;
(Li, Figure 2: Recurrent Cell on PTB Benchmark. The best architecture found by random search with weight-sharing in Section A.3 is depicted [selecting a path through the directed graph, a subset of nodes being determined from the plurality of further nodes, all of which satisfy a predefined property with respect to a data resolution; wherein the completion of the path from input node to the output node is interpreted to be the “predefined property with respect to a data resolution” as resolution is interpreted to mean completion and all of the subset of nodes in the path (all of which) satisfy said predefined property ie a completed path]. Each numbered square is a node of the DAG and each edge represents the flow of data from one node to another after applying the indicated operation along the edge. Nodes with multiple incoming edges (i.e., node 0 and output node h_{t}) [at least one additional node being selected from the subset, which serves as the second output] concatenate the inputs to form the output of the node [the path through the directed graph from the input node along the edges via the additional node up to the output node being selected as a function of the probability assigned to the edges; wherein the path is selected from uniform sampling ie a function of probabilities assigned to the edges]”).
Li teaches creating the machine learning system as a function of the selected path and training the created machine learning system, adapted parameters of the trained machine learning system being stored in the corresponding edges of the directed graph and the probabilities of the edges of the path being adapted;
multiple repeating of the selecting a path step and the creating and training a machine learning system step; and creating the machine learning system as a function of the directed graph;
(Li, Section 3, “In order to combine random search with weight-sharing, we simply use randomly sampled architectures to train the shared weights. Shared weights are updated by selecting a single architecture for a given minibatch and updating the shared weights by back-propagating through the network with only the edges and operations as indicated by the architecture activated [creating a machine learning system ie the network provided after training as a function of the selected path ie architecture activated and training the created machine learning system ie updating the shared weights by backpropagation, adapted parameters of the trained machine learning system being stored in the corresponding edges of the directed graph wherein since each edge is an operation, the weights of the particular operation are updated per the shared weights and the probabilities of the edges of the path being adapted wherein since the architecture is selected, the probability of the edges of the path is 100% and thus adapted to update the shared weights]. Hence, the number of architectures used to update the shared weights is equivalent to the total number of minibatch training iterations [multiple repeating of the selecting a path step and the creating and training a machine learning system step and creating the machine learning system as a function of the directed graph].”)
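The weight-sharing procedure characterized above can be sketched as follows (a hypothetical simplification, not Li's actual implementation; the dictionary of "weights" and the counting `step` function are illustrative stand-ins for a real network and gradient step):

```python
import random

def train_with_weight_sharing(shared, architectures, num_minibatches, step):
    # Per the quoted passage: one architecture is sampled uniformly per
    # minibatch, and only that architecture's active edges and operations
    # receive an update of the shared weights.
    for _ in range(num_minibatches):
        arch = random.choice(architectures)  # uniform path sampling
        step(shared, arch)                   # stand-in for backprop on the active subgraph
    return shared

# Toy demonstration: count how often each "operation" is updated.
shared = {"op_a": 0, "op_b": 0}
def step(weights, arch):
    weights[arch] += 1  # placeholder for a gradient step
trained = train_with_weight_sharing(shared, ["op_a", "op_b"], 1000, step)
```

As in the quoted passage, the number of sampled architectures equals the total number of minibatch training iterations.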
Li teaches wherein the probabilities of the edges are set initially to one value, so that all paths through the directed graph are selected with equal probability.
(Li, Section 3, 2. For each decision, identify the possible choices for the given node. In the case of the PTB search space, if we number the nodes from 1 to N, node i can take the outputs of nodes 0 to node i − 1 as input (the initial input to the cell is index 0 and is also a possible input). Additionally, we can choose an operation from {tanh, relu, sigmoid, and identity} to apply to the output of node i.
3. Finally, moving from node to node, we sample uniformly from the set of possible choices for each decision that needs to be made [wherein the probabilities of the edges are set initially to one value, so that all paths through the directed graph are selected with equal probability; wherein sampling uniformly is randomly sampling from the list of options such that each option has the same probability].”)
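As an arithmetic illustration of the equal-probability interpretation (a hypothetical layered graph, not taken from Li): when every decision point offers the same number of choices, uniform sampling gives every complete path the same probability, namely the product of one over the number of choices at each decision.

```python
# Hypothetical layered DAG: three decision points with two choices each.
choices_per_decision = [2, 2, 2]

path_prob = 1.0
num_paths = 1
for k in choices_per_decision:
    path_prob *= 1.0 / k  # uniform sampling at each decision
    num_paths *= k

# Each of the 8 complete paths has probability 1/8, and the path
# probabilities sum to 1 over all paths.
```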
However, Zhang and Li do not explicitly teach wherein during training of the machine learning system, a cost function is optimized, the cost function including one first function, which evaluates an efficiency of the machine learning system with respect to its outputs, and includes one second function, which estimates a latency and/or a computer resource consumption of the machine learning system as a function of a length of the path and of the operations of the edges.
Veniat teaches wherein during training of the machine learning system, a cost function is optimized, the cost function including one first function, which evaluates an efficiency of the machine learning system with respect to its outputs, and includes one second function, which estimates a latency and/or a computer resource consumption of the machine learning system as a function of a length of the path and of the operations of the edges
(Veniat, Supplemental Material Stochastic costs in the REINFORCE algorithm, “Distributed computation cost Taking the real-life example of a network which will, once optimized, have to run on a given computing infrastructure, the distributed computation cost is a measure of how ”parallelizable” an architecture is. This cost function takes the following three elements as inputs (i)A network architecture (represented as a graph for instance) [one second function, which estimates a latency and/or a computer resource consumption of the machine learning system as a function of a length of the path and of the operations of the edges; see fig. 8], (ii)An allocation algorithm and (iii) a maximum number of concurrent possible operations. The cost function then returns the number of computation cycles required to run the architecture given the allocation strategy [a cost function is optimized, the cost function including one first function, which evaluates an efficiency of the machine learning system with respect to its outputs].
[image: media_image2.png]
”)
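The two-term cost structure mapped above can be sketched as follows (a hypothetical simplification, not Veniat's actual implementation; the operation costs and the weighting factor `lam` are illustrative assumptions):

```python
def total_cost(task_loss, path_ops, op_cost, lam=0.1):
    # First term: evaluates the efficiency of the system with respect to its
    # outputs (here, a task loss). Second term: estimates resource consumption
    # as a function of the length of the path and the operations on its edges.
    resource = sum(op_cost[op] for op in path_ops)
    return task_loss + lam * resource

op_cost = {"conv3x3": 9.0, "conv1x1": 1.0, "identity": 0.0}  # illustrative costs
c = total_cost(0.5, ["conv3x3", "identity", "conv1x1"], op_cost)
# 0.5 + 0.1 * (9.0 + 0.0 + 1.0) = 1.5
```

Longer paths and costlier operations increase the second term, which is the sense in which the cost is a function of path length and edge operations.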
Zhang and Li are both considered to be analogous to the claimed invention because they are in the same field of neural architecture search. Zhang is further reasonably pertinent to the problem the inventor faced (multi-task learning). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhang to incorporate the teachings of Li in order to provide a novel random search with weight-sharing algorithm that outperforms random search with early-stopping (Li, Abstract, “Neural architecture search (NAS) is a promising research direction that has the potential to replace expert-designed networks with learned, task-specific architectures. In this work, in order to help ground the empirical results in this field, we propose new NAS baselines that build off the following observations: (i) NAS is a specialized hyperparameter optimization problem; and (ii) random search is a competitive baseline for hyperparameter optimization. Leveraging these observations, we evaluate both random search with early-stopping and a novel random search with weight-sharing algorithm on two standard NAS benchmarks—PTB and CIFAR-10. Our results show that random search with early-stopping is a competitive NAS baseline, e.g., it performs at least as well as ENAS [41], a leading NAS method, on both benchmarks. Additionally, random search with weight-sharing outperforms random search with early-stopping, achieving a state-of-the-art NAS result on PTB and a highly competitive result on CIFAR-10. Finally, we explore the existing reproducibility issues of published NAS results. We note the lack of source material needed to exactly reproduce these results, and further discuss the robustness of published results given the various sources of variability in NAS experimental setups. 
Relatedly, we provide all information (code, random seeds, documentation) needed to exactly reproduce our results, and report our random search with weight-sharing results for each benchmark on multiple runs.”)
Veniat is considered to be analogous to the claimed invention because it is in the same field of neural architecture search and is further reasonably pertinent to a problem the inventor faced (making efficient use of computational resources). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhang and Li to incorporate the teachings of Veniat in order to provide a real-world computation cost of the network and a means for optimizing the network per the obtained cost (Veniat, Supplemental Material Stochastic costs in the REINFORCE algorithm, “Distributed computation cost Taking the real-life example of a network which will, once optimized, have to run on a given computing infrastructure, the distributed computation cost is a measure of how ”parallelizable” an architecture is.”)
In regards to claim 3,
Zhang and Li and Veniat teach The method as recited in claim 1,
Li teaches wherein the nodes of the subset, which all satisfy a predefined property with respect to a data resolution, are each also assigned a probability, the probabilities of the nodes of the subset being normalized.
Examiner interprets “normalized” to mean “that a drawing of the respective elements is equally probable, i.e., initially there is no preference for certain NOIs and/or edges and/or paths present” in light of the specification of the instant application (specification, pg. 8 line 20 - pg. 9 line 4).
(Li, Section 3, 2. For each decision, identify the possible choices for the given node. In the case of the PTB search space, if we number the nodes from 1 to N, node i can take the outputs of nodes 0 to node i − 1 as input (the initial input to the cell is index 0 and is also a possible input). Additionally, we can choose an operation from {tanh, relu, sigmoid, and identity} to apply to the output of node i.
3. Finally, moving from node to node, we sample uniformly from the set of possible choices for each decision that needs to be made [wherein the nodes of the subset, which all satisfy a predefined property with respect to a data resolution, are each also assigned a probability, the probabilities of the nodes of the subset being normalized; wherein uniform sampling is used to obtain the nodes of the subset (completed path which satisfy the predefined property)].”)
In regards to claim 5,
Zhang and Li and Veniat teach The method as recited in claim 3,
Li teaches wherein the probabilities of the nodes of the subset are initially set to a probability that all nodes of the subset are initially selected with equal probability.
(Li, Section 3, 2. For each decision, identify the possible choices for the given node. In the case of the PTB search space, if we number the nodes from 1 to N, node i can take the outputs of nodes 0 to node i − 1 as input (the initial input to the cell is index 0 and is also a possible input). Additionally, we can choose an operation from {tanh, relu, sigmoid, and identity} to apply to the output of node i.
3. Finally, moving from node to node, we sample uniformly from the set of possible choices for each decision that needs to be made [wherein the probabilities of the nodes of the subset are initially set to a probability that all nodes of the subset are initially selected with equal probability; wherein uniform sampling is used to set the probabilities of the nodes equally].”)
In regards to claim 6,
Zhang and Li and Veniat teach The method as recited in claim 1,
Zhang teaches wherein when selecting the path, at least two additional nodes are selected, a path through the directed graph including at least two paths, each of which extends via one of the additional nodes to the output node, and the two paths from the input node to the additional nodes being created separately from one another starting at the additional nodes up to the input node.
(Zhang, “[0077] The system then adaptively trains a network architecture of the machine learning model to generate an adapted machine learning model based on incorporating inherent correlations between the new task and the existing task (step 710). For example, in various embodiments, in step 710, the system may generate and identify an adapted network architecture based on MWC as discussed above. In using MWC, the system may incorporate inherent correlations between the existing task and the new task and identify the added layer as a task-specific layer for the new task. Also, for example, the system may train the ML model to perform the new task using training data for the new task without access to the training data for the old task.
[0078] In some embodiments, to adapt the network architecture in step 710 [wherein when selecting the path], the system may expand the network architecture for the ML model to perform the new task using AutoML, for example, by training child network architectures using wider and deeper operators as discussed with regard to FIG. 5 above. The expanded network architecture may include adding a layer to the network architecture and expanding one or more existing layers of the network architecture [at least two additional nodes are selected, a path through the directed graph including at least two paths, each of which extends via one of the additional nodes to the output node, and the two paths from the input node to the additional nodes being created separately from one another starting at the additional nodes up to the input node; wherein Zhang provides deeper and wider operators to expand the network architecture (see fig. 5)].”)
[image: media_image3.png]
Zhang, Li, and Veniat are all considered to be analogous to the claimed invention because they are in the same field of neural architecture search. Zhang is further reasonably pertinent to the problem the inventor faced (multi-task learning). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li and Veniat to incorporate the teachings of Zhang in order to provide a mechanism for multi-task based lifelong learning to improve the network for better performance adaptive to new tasks (Zhang, “[0004] In many real-world applications, batches of data arrive periodically (e.g., daily, weekly, or monthly) with the data distribution changing over time. This presents an opportunity (or demand) for lifelong learning or continual learning and is an important issue in improving artificial intelligence. The primary goal of lifelong learning is to learn consecutive tasks without forgetting the knowledge learned from previously trained tasks and leverage the previous knowledge to obtain better performance or faster convergence on the newly coming task. One simple way is to finetune the model for every new task. However, such retraining typically degenerates the model performance on both new tasks and the old ones. If the new tasks are largely different from the old ones, it might not be possible to learn the optimal model for the new tasks. Meanwhile, the retrained representations may adversely affect the old tasks, causing them to drift from their optimal solution. This can cause “catastrophic forgetting”—a phenomenon where training a model to perform new tasks interferes the previously learned old knowledge. This leads to a performance degradation or even overwriting of the old knowledge by the new knowledge. Another issue for lifelong learning is resource consumption. 
A model that is continually trained may increase dramatically in terms of consumed resources (e.g., model size), which may be disadvantageous in applications where resources are limited, for example, in mobile device or mobile computing applications.”)
Claims 8 and 9 are rejected on the same rationale under 35 U.S.C. 103 as claim 1.
Claim(s) 4 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Li and Veniat in further view of Bram28 (2018, December 12th). Re: What is the probability of passing through a node in a directed graph [Discussion post]. {Link: https://math.stackexchange.com/questions/3036994/what-is-the-probability-of-passing-through-a-node-in-a-directed-graph} (“Bram28”)
In regards to claim 4,
Zhang and Li and Veniat teach The method as recited in claim 3,
Bram28 teaches wherein the probabilities of the nodes of the subset are initially set to a probability that a first number of paths is set by the respective node of the subset divided by a total number of paths through the directed graph.
Examiner’s note: Since the algorithm of Li sets each node to the probability of 1/[number of available paths from that node] ie uniform sampling on each node, Li must teach the probabilities of the nodes of the subset are initially set to a probability that a first number of paths is set by the respective node of the subset divided by a total number of paths through the directed graph; however, for clarity, Examiner provides Bram to teach the probability of a respective node in view of the total number of paths.
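The notion of a number of available paths from a node, referenced in the note above, can be illustrated with a minimal path-counting sketch (the DAG and its successor lists below are hypothetical and illustrative only, not drawn from any cited reference):

```python
from functools import lru_cache

# Hypothetical DAG as successor lists: paths in -> out are
# in-a-out, in-b-a-out, and in-b-out.
succ = {"in": ["a", "b"], "a": ["out"], "b": ["a", "out"], "out": []}

def count_paths(graph, src, dst):
    # Counts distinct directed paths from src to dst by summing the path
    # counts of each successor (memoized dynamic programming).
    @lru_cache(maxsize=None)
    def n(u):
        if u == dst:
            return 1
        return sum(n(v) for v in graph[u])
    return n(src)
```

For example, `count_paths(succ, "in", "out")` counts the three paths listed in the comment.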
(Bram28, “OK, so then just compute the probability of getting to a node by computing the probability of getting to any of its predecessors, and multiplying that by the probability of following the edge from that predecessor to the node in question. The image below shows the results (green means the probability of taking the edge, while red means the probability of getting to the node [the probabilities of the nodes of the subset are initially set to a probability that a first number of paths is set by the respective node of the subset divided by a total number of paths through the directed graph; see red]):
[image: media_image4.png]
Bram28 further provides an exemplary calculation:
Just as an example: the probability of going through node 7 is the probability of going through either of nodes 4, 5, or 6, respectively multiplied by the probability of taking the edge from that node to node 7. Thus:
[image: media_image5.png]
Also, just for a sanity check, let's make sure the probability of getting to node
[image: media_image6.png]
”)
Bram28 is considered to be analogous to the claimed invention because it is in the same field of probabilities. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhang and Li to incorporate the teachings of Bram28 in order to provide clarity to the probabilities of each respective node having equal likelihood of going to any available paths in view of the total number of paths (John Slaine, first reply under Bram28 response,
[image: media_image7.png]
”)
Claim(s) 2 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Li and Veniat in further view of Su, Xiu, et al. "Prioritized architecture sampling with monto-carlo tree search." (“Su”)
In regards to claim 2,
Zhang and Li and Veniat teach The method as recited in claim 1,
Su teaches wherein for each respective node of the subset, a total number of first subpaths from the respective node of the subset up to the input node and a total number of second subpaths from the respective node of the subset up to the output node are counted, the probabilities of those edges contained in the first subpaths are each initially set to a number of possible paths which connect the input node to the respective node of the subset and extend over those edges contained in the first subpaths, divided by the total number of the first subpaths, and the probabilities of those edges contained in the second subpaths are each initially set to a number of possible paths which connect the output node to the respective node of the subset and extend over those edges contained in the second subpaths, divided by the total number of the second subpaths.
(Su, Section 3.1, “However, we argue that in a chain-structured network, the selection of operation at each layer should depend on operations in the previous layers.
To capture the dependency among layers and leverage the limited combinations of operations for better understanding of the search space, we replace P (o(l)) in Eq.(1) with a conditional distribution for each 2 ≤ l ≤ L. Therefore, we reformulate Eq. (1) as follows:
[Equation (2): media_image8.png]
where P (o(l)|o(1), … , o(l−1)) is the conditional probability distribution of the operation selection in the layer l conditioned on its previous layers 1 to l − 1. Note that l = 1 has no previous layer, so P (o(1)) is still independent.
Inspired by Eq.(2), we find this conditional probability distribution [the probabilities of those edges contained in the first subpaths ie conditional probabilities of the ancestor nodes are each initially set to a number of possible paths which connect the input node to the respective node of the subset and extend over those edges contained in the first subpaths, divided by the total number of the first subpaths, and the probabilities of those edges contained in the second subpaths ie conditional probabilities of subsequent nodes from the respective node to the output node are each initially set to a number of possible paths which connect the output node to the respective node of the subset and extend over those edges contained in the second subpaths, divided by the total number of the second subpaths] of search space can be naturally modeled into a tree-based structure; the MCTS is targeting this structure for a better exploration-exploitation trade-off. As a result, we propose to model the search space with a MCT T. In MCT, each node v(l)i∈T [for each respective node of the subset] corresponds to selecting an operation o(l)i∈O for the layer l under the condition of its ancestor nodes [a total number of first subpaths from the respective node of the subset up to the input node ie ancestor nodes], so the architecture representation α={O(l)}l∈{1,…,L} can also be uniquely identified in the MCT [a total number of second subpaths from the respective node of the subset up to the output node are counted]. As Figure 2 shows, the architectures are independently represented by paths in the MCT, and different choices of operations lead to different child trees; thus, the dependencies of all the operation selections can be naturally formed.”)
Su is considered analogous to the claimed invention because both are in the same field of neural architecture search with a particular focus on the probabilities of paths. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhang, Li, and Veniat to incorporate the teachings of Su in order to provide a method that considers previous layers by incorporating a Monte Carlo tree search to capture the dependency among layers, as doing so provides the benefit of improved search efficiency and performance (Su, Abstract, “One-shot neural architecture search (NAS) methods significantly reduce the search cost by considering the whole search space as one network, which only needs to be trained once. However, current methods select each operation independently without considering previous layers. Besides, the historical information obtained with huge computation cost is usually used only once and then discarded. In this paper, we introduce a sampling strategy based on Monte Carlo tree search (MCTS) with the search space modeled as a Monte Carlo tree (MCT), which captures the dependency among layers. Furthermore, intermediate results are stored in the MCT for future decisions and a better exploration-exploitation balance. Concretely, MCT is updated using the training loss as a reward to the architecture performance; for accurately evaluating the numerous nodes, we propose node communication and hierarchical node selection methods in the training and search stages, respectively, which make better uses of the operation rewards and hierarchical information. Moreover, for a fair comparison of different NAS methods, we construct an open-source NAS benchmark of a macro search space evaluated on CIFAR-10, namely NAS-Bench-Macro. Extensive experiments on NAS-Bench-Macro and ImageNet demonstrate that our method significantly improves search efficiency and performance.
For example, by only searching 20 architectures, our obtained architecture achieves 78.0% top-1 accuracy with 442M FLOPs on ImageNet.”)
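To illustrate the examiner's interpretation above, the following minimal sketch (not taken from Su or the claims; the layered search space and operation names are hypothetical) shows that initializing each edge probability as the number of input-to-output paths traversing that edge, divided by the total number of paths, yields equal probabilities for the options at each decision, consistent with uniform sampling at each node:

```python
# Hypothetical illustration: a two-layer search space where an
# architecture is one path choosing one operation per layer.
from itertools import product

layers = [["op_a", "op_b"], ["op_c", "op_d", "op_e"]]

# Enumerate every complete path through the layered DAG.
paths = list(product(*layers))
total = len(paths)  # 2 * 3 = 6 paths

# Initialize each (layer, option) edge probability as the count of
# complete paths using that edge, divided by the total path count.
edge_prob = {}
for l, options in enumerate(layers):
    for op in options:
        uses = sum(1 for p in paths if p[l] == op)
        edge_prob[(l, op)] = uses / total

# Within each layer, the options are equiprobable and sum to 1,
# matching uniform sampling of the possible choices at each node.
print(edge_prob[(0, "op_a")])  # 0.5
print(edge_prob[(1, "op_c")])  # about 0.333
```

The equal per-layer values reflect that path counting through a fully connected layered space and uniform per-decision sampling assign the same probability to each edge leaving a node.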
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US Pub. No. US20200265315A1: Zoph et al. teaches Neural architecture search
US Pub. No. US20210142166A1: Chu et al. teaches Hypernetwork training method and device, electronic device and storage medium
US Pub. No. US20190370648A1: Zoph et al. teaches Neural architecture search for dense image prediction tasks
NPL: Guo, Zichao, et al. "Single path one-shot neural architecture search with uniform sampling." Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVI. Springer International Publishing, 2020.
NPL: Casale, Francesco Paolo, Jonathan Gordon, and Nicolo Fusi. "Probabilistic neural architecture search." arXiv preprint arXiv:1902.05116 (2019).
NPL: Cenciarelli, Pietro, Daniele Gorla, and Ivano Salvo. "A Polynomial-time Algorithm for Detecting the Possibility of Braess Paradox in Directed Graphs." arXiv preprint arXiv:1610.09320 (2016).
US Pub. No. US20080052692A1: LinkedIn teaches System, Method and Computer Program Product for Checking a Software Entity
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASMINE THAI whose telephone number is (703)756-5904. The examiner can normally be reached M-F 8-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.T.T./Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129