Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/02/2026 has been entered.
Response to Arguments
Applicant's arguments filed 01/02/2026 ("Arguments/Remarks") have been fully considered but they are not persuasive.
Argument 1 (pages 5–7): Applicant contends: “Applicant disagrees that the statements from [0007] and [0008] of the published version of the specification, on which the previous Amendment relied on in its Prong Two analysis, presents an improvement only in a conclusory manner because these statements are at the same level of specificity as the specification statements of technological improvement that the Appeal Review Panel credited in its Ex Parte Desjardins, Appeal No. 2024-000567 (PTAB September 26, 2025, Appeals Review Panel Decision).
… Here, as noted above, the specification explains in [0007] that "The inventors provide that the probabilities of the outgoing edges are not selected to be the same for each edge, but in such a way that every possible path has the same probability as a result of the one-shot model." According to [0008], "[t]his has the advantage that more superior architectures may ultimately be found that would not have been found in the case of a conventional initialization of the edges." Thus, the specification links the stated technological improvement concerning the ability to find more superior architectures, at least in part, to the feature of initializing path selection so that all paths have the same probability of being drawn. The claims recite this feature (and therefore integrate the alleged judicial exception into a practical application) by reciting "the probabilities being initially set to a value that paths are drawn at the same probability starting from the edge up to the output node." Given these statements of technological improvement in the area of machine learning system creation, a proper Prong Two analysis would have found that the claims represent an integration of the alleged judicial exception found in Prong One into a practical application. Based on this discussion, withdrawal of the Section 101 rejection is requested.”
Regarding the above argument, the Examiner notes that in Ex parte Desjardins, the claim is directed to a specific enhancement in how a machine learning model trains: how the model estimates the importance of parameters learned from a first task and then updates those parameters during training on a second task using a penalty term, so that performance improves on the second task while performance on the first task is preserved, rather than to an abstract idea or mathematical concept. The specification in that case provides the required technical detail explaining how model parameters are adjusted to optimize performance on a new task while protecting performance on a prior task, which addresses the technical problem of knowledge degradation, and the claim reflects the improvement disclosed in the specification. Unlike Ex parte Desjardins, in the instant application the paragraphs cited by the Applicant (¶[0016] and ¶[0017]) merely state that edge probabilities in a one-shot model are initialized such that all input-to-output paths have equal sampling probability, with the asserted benefit that superior architectures may ultimately be found that would not have been found in the case of a conventional initialization of the edges. However, the paragraphs fail to explain how such probabilities are computed or how the parameters of the machine learning system are adjusted, and they do not disclose any specific mechanisms, formulas, or training procedures for achieving or maintaining this equalized path sampling. As a result, the alleged improvement is directed to the abstract idea of probabilistic model organization or evaluation, not to a technical improvement in the functioning of a machine learning system, and therefore does not meaningfully limit the claim beyond the abstract concept itself.
Argument 2 (page 8): Applicant contends: “Page 11 of the Office Action addresses the claimed cost function as follows: so that a cost function is optimized; (Houlsby, "[0046] By repeatedly updating the values of the controller parameters in this manner, the system 100 can train the controller neural network 110 to generate new paths that result in task neural networks that have increased performance on the particular task, i.e.: to maximize the expected accuracy [so that a cost function is optimized] on the validation set of the architectures proposed by the controller neural network 110.") Although unclear, it seems that the maximization of the expected accuracy on a validation set of proposed architectures in Housby somehow represents a cost function that is being optimized, according to the patent Office's telling. Applicant disagrees that this disclosure in Housby represents any sort of cost function, whether optimized or not, and in order to clarify this distinction, Applicant has amended the claims to recite "the cost function includes a term”
Regarding the above argument, the Examiner notes that Houlsby, ¶[0046], describes training the controller neural network 110 by repeatedly updating its parameters to maximize the expected accuracy of the task neural networks on a validation set. This process effectively adjusts the controller so that an objective metric (i.e., a cost function) is optimized.
Applicant’s arguments with respect to the amended claim(s) have been considered but are moot because they are directed to the amended claim limitations, which are addressed by the rejections set forth in the current Office action.
As to the remaining dependent claims, Applicant argues that they are allowable due to their respective direct and indirect dependencies upon one of the aforementioned independent claims. The Examiner respectfully disagrees: the independent claims are not allowable, as stated in the paragraphs above in this “Response to Arguments” section of this Office action.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim(s) 1–6 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., an abstract idea) without significantly more.
In Step 1 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the claims recite a process that, under the broadest reasonable interpretation, falls within a statutory category (a process).
In Step 2A, Prong One, of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following limitations recite a process that, under the broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components:
2. Regarding claim 1,
randomly drawing a plurality of paths through the graph
(i.e., under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves making an observation of a given graph and making a judgment about randomly drawing paths within the graph; see MPEP 2106.04).
randomly drawing a path as a function of the adjusted probabilities
(i.e., under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves making an observation of a given graph and iteratively modifying the probabilities associated with selecting specific edges or paths in the graph to draw a random path; see MPEP 2106.04).
adjusting parameters of the machine learning system and the probabilities of the edges of the path during training, so that a cost function is optimized
(i.e., under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves observing system performance, evaluating how parameter values affect a cost measure, and modifying those values to improve the outcome; see MPEP 2106.04).
If the claim limitations, under their broadest reasonable interpretation, cover performance of the limitations as a mental process but for the recitation of generic computer components, then they fall within the mental processes grouping. Accordingly, the claim recites an abstract idea.
In Step 2A, Prong Two, of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application:
As evaluated below:
• The preamble is deemed insufficient to transform the judicial exception into a patentable invention because the preamble generally links the use of the judicial exception to a particular technological environment or field of use; see MPEP 2106.05(h).
providing a directed graph having an input node and an output node that are connected via a plurality of edges and nodes, each edge of the edges being assigned a probability that characterizes at which probability the edge is drawn by being sampled or selected, the probabilities being initially set to a value that paths are drawn at the same probability starting from the edge up to the output node;
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere data transmission, which is considered insignificant extra-solution activity; see MPEP 2106.05(g)).
training the machine learning systems corresponding to the paths
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the limitation amounts to no more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer; see MPEP 2106.05(f)).
wherein the cost function includes a term that characterizes a cost for carrying out the machine learning system using a hardware configuration;
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the limitation simply links the judicial exception to a field of use and/or technological environment; see MPEP 2106.05(h)).
after the cost function is optimized … and creating the machine learning system corresponding to the drawn path
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the limitation amounts to no more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer; see MPEP 2106.05(f)).
In Step 2B of the 101 analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception:
Regarding limitations (V) and (VII): these limitations recite mere application of the abstract idea, or mere instructions to implement the abstract idea on a computer, and are deemed insufficient to transform the judicial exception into a patentable invention because they generally apply a generic computer and/or generic processing to the judicial exception; see MPEP 2106.05(f).
Regarding limitation (VI): the additional elements are deemed insufficient to transform the judicial exception into a patentable invention because they generally link the judicial exception to a technological environment; see MPEP 2106.05(h).
Regarding limitation (IV): the recited additional elements, considered extra-/post-solution activity as analyzed above, are well-understood, routine, and conventional; specifically, the courts have recognized the following computer functions as well-understood, routine, and conventional:
Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network). See MPEP 2106.05(d)(II).
As analyzed above, the additional elements do not integrate the noted judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.
3. Claim(s) 5 and 6 recite limitations analogous to those of claim 1 and are rejected under a similar rationale.
4. Regarding claim 2: claim 2 depends upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or by introducing significantly more than the judicial exception. The claim recites:
wherein starting from a selected node, all possible paths to the output node are counted, a value of the probability of each edge of those edges that are connected proceeding from the selected node is initially set to a number of the possible paths running via the edge, divided by a number of the counted possible paths.
(i.e., under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves counting possible paths in a graph, evaluating the counts, and assigning relative values based on those counts; see MPEP 2106.04).
5. Regarding claim 3: claim 3 depends upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or by introducing significantly more than the judicial exception. The claim recites:
wherein all possible paths up to the output node are counted for each node of the directed graph, a value of the probability of each edge of the edges is initially set to a number of the possible paths from the output node of the edge divided by a number of the possible paths of an input node of the edge.
(i.e., under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves counting the possible paths, comparing quantities for those paths, and assigning probabilities based on the comparison; see MPEP 2106.04).
6. Regarding claim 4: claim 4 depends upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or by introducing significantly more than the judicial exception. The claim recites:
wherein in the process of drawing the path, the path is iteratively created, the subsequent edge being randomly selected at each node from the possible subsequent edges, which are connected to the node, as a function of its assigned probability.
(i.e., under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves observing available choices, considering assigned likelihoods, and making a selection at each step; see MPEP 2106.04).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
7. Claim(s) 1, 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Houlsby et al., Pub. No.: US20220092416A1, in view of Chockler et al., Pub. No.: US20080052692A1, Scarselli et al., "The graph neural network model," and Howard et al., "Mobilenets: Efficient convolutional neural networks for mobile vision applications."
8. Regarding claim 1, Houlsby teaches: A computer-implemented method for creating a machine learning system, the method comprising the following steps:
having an input node and an output node that are connected via a plurality of edges and nodes,
(Houlsby, “[0034] The search space of possible architectures for the task neural network is represented as a graph of nodes connected by edges [that are connected via a plurality of edges and nodes]. Each node in the graph represents a decision point in selecting the architecture and each edge in the graph represents an action. In particular, the actions represented by outgoing edges from a given node are the possible decisions that can be made at the decision point represented by the given node. Each path through the graph that starts at an initial node [having an input node] in the graph and ends at a terminal node [an output node that are connected via a plurality of edges and nodes] in the graph determines or represents an architecture for the task neural network.”)
and training the machine learning systems corresponding to the paths;
(Houlsby, “[0043] In particular, during an iteration of the training procedure [training the machine learning systems], the system 100 generates a batch of paths 112 [corresponding to the paths] using the controller neural network 110 in accordance with current values of the controller parameters.”)
adjusting parameters of the machine learning system and the probabilities of the edges of the path during training, so that a cost function is optimized; and creating the machine learning system corresponding to the drawn path.
(Houlsby, “[0042] Generally, the system 100 determines the final architecture for the task neural network by training [during training] the controller neural network 110 to iteratively adjust the values of the controller parameters [adjusting parameters of the machine learning system and the probabilities of the edges of the path]. At each iteration, the controller neural network can generate one or more paths [drawing a path as a function of the adjusted probabilities], each representing an architecture to be trained and evaluated, and the controller parameters can be updated or adjusted based on the results of each evaluation. In this way, the controller neural network can be used to design an improved task neural network for a specific task [and creating the machine learning system corresponding to the drawn path], such as image classification or other forms of image processing.”)
so that a cost function is optimized;
(Houlsby, “[0046] By repeatedly updating the values of the controller parameters in this manner, the system 100 can train the controller neural network 110 to generate new paths that result in task neural networks that have increased performance on the particular task, i.e.: to maximize the expected accuracy [so that a cost function is optimized] on the validation set of the architectures proposed by the controller neural network 110.”)
after the cost function is optimized,
(Houlsby, “[0046] By repeatedly updating the values of the controller parameters in this manner, the system 100 can train the controller neural network 110 [creating the machine learning system] to generate new paths [corresponding to the drawn path] that result in task neural networks that have increased performance on the particular task, i.e., to maximize the expected accuracy on the validation set of the architectures proposed by the controller neural network 110.
[0047] Once the controller neural network 110 has been trained [after the cost function is optimized] (i.e.: the controller parameters (which govern path generation), are updated iteratively during training), the system 100 can select the architecture that had the best (for example, highest) performance measure as the final architecture of the task neural network or can generate a new path [drawing a path as a function of the adjusted probabilities] (i.e.: after training (optimization), the system can use the trained controller to generate new architecture paths, which is equivalent to drawing random paths according to the learned edge probabilities) through the search space in accordance with the trained values of the controller parameters (i.e. using the trained controller neural network) and use the architecture defined by the new path as the final architecture of the task neural network.”)
Houlsby does not teach:
providing a directed graph,
each edge of the edges being assigned a probability that characterizes at which probability the edge is drawn by being sampled or selected, the probabilities being initially set to a value that paths are drawn at the same probability starting from the edge up to the output node;
randomly drawing a plurality of paths through the graph
randomly drawing a path
wherein the cost function includes a term that characterizes a cost for carrying out the machine learning system using a hardware configuration
Chockler teaches: providing a directed graph,
(Chockler, “[0021]Method 200 starts by stage 220 of providing a direct acyclic graph [providing a directed graph] representative of possible execution paths of the software entity. Multiple successor nodes that succeed a certain parent node are associated with different execution probabilities. It is noted that the direct acyclic graph can include multiple parents nodes and each of these parent nodes can be associated with multiple successor nodes and that the execution probability associated with one successor node can differ from the execution probability of another successor node. It is noted that some of the successor nodes that succeed the same parent node can have the same probability.”)
each edge of the edges being assigned a probability that characterizes at which probability the edge is drawn by being sampled or selected,
(Chockler, “[0008] System, method and computer program product for checking a software entity, the method includes: providing a direct acyclic graph representative of possible execution paths of the software entity; wherein multiple successor nodes [each edge of the edges] (i.e.: a node with multiple successor nodes (or edges), each successor node (or edge) is associated with a different execution probability) that succeed a certain parent node are associated with different execution probabilities; [being assigned a probability that characterizes at which probability the edge] (i.e.: edges in the graph are assigned probabilities that characterize their likelihood of being selected during execution) randomly selecting a successor node out of the multiple successor nodes in response the execution probabilities; [is drawn by being sampled or selected] (i.e.: selects one of the successor nodes based on the probabilities assigned to each edge) and checking the software entity in response to the selection.”)
the probabilities being initially set to a value that paths are drawn at the same probability starting from the edge up to the output node;
(Chockler, “[0058] Conveniently, processor 160 is adapted to determine a number (E) of software entity check iterations required for testing a predefined number (n) of execution paths at a certain probability (p). If the execution probabilities of successor nodes of the direct acyclic graph are defined such that different paths have the same execution probability [the probabilities being initially set to a value that paths are drawn at the same probability starting from the edge up to the output node], then p is bounded by e−c and E equals n*1n(n)+c*n, wherein c is a positive integer.”)
Chockler and Houlsby are related to the same field of endeavor (i.e., an architecture that utilizes neural networks for architecture search, involving obtaining a computational graph with nodes, edges, and associated weightings, representing various candidate models as subgraphs). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Chockler with the teachings of Houlsby to allow for the random selection of successor nodes based on execution probabilities, potentially enhancing the exploration of diverse architectures by prioritizing paths that are more likely to yield optimal results (Chockler, ¶[0008]).
Houlsby in view of Chockler do not teach:
randomly drawing a plurality of paths through the graph
randomly drawing a path
wherein the cost function includes a term that characterizes a cost for carrying out the machine learning system using a hardware configuration
Scarselli teaches:
randomly drawing a plurality of paths through the graph
(Scarselli, page: 73, “The presented results were averaged over five different runs. In each run, the data set was a collection of random graphs constructed [randomly drawing a plurality of paths through the graph] by the following procedure: each pair of nodes was connected with a certain probability δ; the resulting graph was checked to verify whether it was connected and if it was not, random edges were inserted until the condition was satisfied.")
randomly drawing a path
(Scarselli, page: 74, “In our experiments, the data set consisted of 600 connected random graphs [randomly drawing a path] (constructed using δ = 0.2), equally divided into a training set, a validation set, and a test set. A smaller subgraph S, which was randomly generated in each trial, was inserted into every graph of the data set. Thus, each graph contained at least a copy of S, even if more copies might have been included by the random construction procedure.”)
Scarselli, Houlsby and Chockler are related to the same field of endeavor (i.e., an architecture that utilizes neural networks for architecture search, involving obtaining a computational graph with nodes, edges, and associated weightings, representing various candidate models as subgraphs). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Scarselli with the teachings of Houlsby and Chockler to allow for a more comprehensive assessment of the GNN’s ability to generalize and optimize across different graph structures, leading to more robust and efficient neural architectures (Scarselli, page: 14, “The subgraph matching problem was used also to evaluate the performance of the GNN model and to experimentally verify the findings about the computational cost of the model described in Section III. For this purpose, some experiments have been carried out varying the number of nodes, the number of edges in the data set, the number of hidden units in the neural networks implementing the GNN, and the dimensionality of the state. In the base case, the training set contained ten random graphs, each one made of 20 nodes and 40 edges, the networks implementing the GNN had five hidden neurons, and the state dimension was 2. The GNN was trained for 1000 epochs and the results were averaged over ten trials.”)
Houlsby in view of Chockler and Scarselli do not teach:
wherein the cost function includes a term that characterizes a cost for carrying out the machine learning system using a hardware configuration
Howard teaches:
wherein the cost function includes a term that characterizes a cost for carrying out the machine learning system using a hardware configuration
(Howard, page: 4, “Width multiplier has the effect of reducing computational cost [wherein the cost function includes a term that characterizes a cost for carrying out the machine learning system] and the number of parameters quadratically by roughly α 2 . Width multiplier can be applied to any model structure to define a new smaller model with a reasonable accuracy, latency and size trade off [using a hardware configuration]. It is used to define a new reduced structure that needs to be trained from scratch.”)
Howard, Houlsby, Chockler and Scarselli are related to the same field of endeavor (i.e., an architecture that utilizes neural networks for architecture search, involving obtaining a computational graph with nodes, edges, and associated weightings, representing various candidate models as subgraphs). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Howard with the teachings of Houlsby, Chockler and Scarselli to efficiently balance computational resources and performance, allowing model builders to tailor architectures to meet application-specific constraints (Howard, Abstract).
9. Claim 6 recites limitations analogous to those of claim 1 and is rejected under a similar rationale.
10. Regarding claim 5, Houlsby teaches: A non-transitory machine-readable memory medium on which is stored a computer program for creating a machine learning system, the method comprising the following steps:
(Houlsby, “[0089] Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded [on which is stored a computer program for creating a machine learning system] on a tangible non transitory storage medium [A non-transitory machine-readable memory medium] for execution by, or to control the operation of, data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them. Alternatively or in addition, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.”)
The remaining limitations are analogous to those of claim 1 and are rejected under a similar rationale.
11. Claim(s) 2, 3 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Houlsby in view of Chockler, Scarselli and Howard, and further in view of Meunier et al., Pub. No.: US20200394249A1.
12. Regarding claim 2, Houlsby in view of Chockler, Scarselli and Howard teach the method of claim 1.
Houlsby in view of Chockler, Scarselli and Howard do not teach:
wherein starting from a selected node, all possible paths to the output node are counted, a value of the probability of each edge of those edges that are connected proceeding from the selected node is initially set to a number of the possible paths running via the edge, divided by a number of the counted possible paths.
Meunier teaches:
wherein starting from a selected node, all possible paths to the output node are counted, a value of the probability of each edge of those edges that are connected proceeding from the selected node is initially set to a number of the possible paths running via the edge, divided by a number of the counted possible paths.
(Meunier, “[0070] In order to illustrate the general problem setting, reference is made to FIG. 1. Consider the network depicted in FIG. 1, where the task is to maximize the information flow from node Q to other nodes given a limited budget of edges to be used. In contrast to the general problem defined later, this example assumes equal weights of all nodes. Each edge of the network is labeled with a probability value [a value of the probability of each edge of those edges] denoting the probability of a successful communication [that are connected proceeding from the selected node is initially set to a number of the possible paths running via the edge]. A straightforward solution to this problem, is to activate all edges. Assuming each node to have one unit of information, the expected information flow of this solution can be shown to be ≈2.51. While maximizing the information flow, this solution incurs the maximum possible communication cost. A traditional trade-off between these single-objective solutions is using a probability maximizing Dijkstra spanning tree, as depicted in FIG. 2. The expected information flow in this setting can be shown to aggregate to 1.59 units, while requiring six edges to be activated. Yet, it can be shown that the solution depicted in FIG. 3 dominates this solution: Only five edges are used, thus further reducing the communication cost, while achieving a higher expected information flow of ≈2.02 units of information to Q.”)
divided by a number of the counted possible paths.
(Meunier, “[0137] A Monte-Carlo Sampling is controlled by a parameter Samplesize which corresponds to the number of samples taken to approximate the information flow of a cyclic component to its hub vertex. In each iteration, we can reduce the amount of samples by introducing a confidence interval for the information flow for each edge e∈candList that is probed. The idea is to prune the sampling of any probed edge e for which we can conclude that, at a sufficiently large level of significance α, there must exist another edge e′≠e in candList such that e′ is guaranteed to have a higher information flow than e, based on the current number of samples only. To generate these confidence intervals, we recall that, following Equation 4, the expected information flow to Q is the sample-average of the sum of information flow of each individual vertex [divided by a number of the counted possible paths] (i.e.: once the total sum of information flow from all trials is obtained, divide it by the total number of samples to get the average amount of information flow from a vertex to Q across all trials). For each vertex v, the random event of being connected to Q in a random possible graph follows a binomial distribution, with an unknown success probability p. To estimate p, given a number S of samples and a number 0≤s≤S of ‘successful’ samples in which Q is reachable from v, we borrow techniques from statistics to obtain a two-sided 1−α confidence interval of the true probability p. A simple way of obtaining such a confidence interval is by applying the Central Limit Theorem of Statistics to approximate a binomial distribution by a normal distribution.”)
Meunier, Houlsby, Chockler, Scarselli and Howard are related to the same field of endeavor (i.e., neural-network architecture search that obtains a computational graph with nodes, edges, and associated weightings, representing the various candidate models as subgraphs). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Meunier with those of Houlsby, Chockler, Scarselli and Howard to add the ability to optimize data propagation in networks by dynamically balancing efficiency and information flow based on runtime requirements, enabling the system to adaptively choose between faster, less information-intensive routes or more comprehensive data paths that require more computation (Meunier, ¶[0010]).
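For illustration only, and not part of the record, the initialization recited in claim 2 can be sketched in Python on a hypothetical toy graph (the graph and all node names are invented): each outgoing edge of a selected node is assigned the number of possible paths running via that edge divided by the number of all counted paths from that node to the output node, so that every complete path is ultimately drawn with the same probability.

```python
# Hypothetical DAG as an adjacency list; "out" is the output node.
graph = {"in": ["a", "b"], "a": ["out"], "b": ["a", "out"], "out": []}

def count_paths(node, graph, output="out"):
    """Count all possible paths from `node` to the output node."""
    if node == output:
        return 1
    return sum(count_paths(succ, graph, output) for succ in graph[node])

def init_edge_probs(graph, output="out"):
    """Set each edge probability to (paths via the edge) / (all counted paths)."""
    probs = {}
    for node, succs in graph.items():
        total = count_paths(node, graph, output)
        for succ in succs:
            # every path running via edge (node, succ) continues with a path from succ
            probs[(node, succ)] = count_paths(succ, graph, output) / total
    return probs

probs = init_edge_probs(graph)
# e.g. probs[("in", "b")] == 2/3, because 2 of the 3 paths from "in" run via "b"
```

On this toy graph each of the three complete paths (in→a→out, in→b→a→out, in→b→out) is then drawn with probability 1/3, the equal-probability effect that the specification attributes to the claimed initialization.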
13. Regarding claim 3, Houlsby in view of Chockler, Scarselli and Howard teach the method of claim 1.
Houlsby in view of Chockler, Scarselli and Howard do not teach:
wherein all possible paths up to the output node are counted for each node of the directed graph, a value of the probability of each edge of the edges is initially set to a number of the possible paths from the output node of the edge divided by a number of the possible paths of an input node of the edge.
Meunier teaches: wherein all possible paths up to the output node are counted
(Meunier Fig. 1)
[AltContent: textbox ([wherein starting from a selected node, all possible paths to the output node are counted] (i.e.: all edges starting from input node to all other nodes are counted.))]
for each node of the directed graph,
(Meunier, “[0072] A probabilistic directed graph [of the directed graph] is given by G=(V, E, W, P), where V is a set of vertices, E⊆V×V is a set of edges, W: V→ℝ+ is a function that maps each vertex to a positive value representing the information weight of the corresponding vertex and P: E→(0, 1] is a function that maps each edge to its corresponding probability of existing in G.”)
a value of the probability of each edge of the edges is initially set to a number of the possible paths from the output node of the edge divided by a number of the possible paths of an input node of the edge.
(Meunier, “[0139] Let S be a set of possible graphs drawn from the probabilistic graph G [a value of the probability of each edge of the edges is initially set to a number of the possible paths from the output node of the edge], and let p̂ be the fraction of possible graphs [divided by a number of the possible paths] in S in which Q is reachable from v [of an input node of the edge]. With a likelihood of 1−α, the true probability E( (Q, v, G)) that Q is reachable from v in the probabilistic graph G is in the interval:
p̂ ± z·√(p̂(1−p̂)),  (6)
where z is the 100·(1−0.5·α) percentile of the standard normal distribution. We denote the lower bound as E_lb( (Q, v, G)) and the upper bound as E_ub( (Q, v, G)). We use α=0.05.”)
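For illustration only, and not part of the record, the interval in the passage above appears to be the standard normal-approximation (Wald) confidence interval for a binomial proportion; a minimal Python sketch follows. The division by the sample count S is the standard Central Limit Theorem form, which the quoted text as reproduced may elide; the numbers are invented.

```python
import math

def wald_interval(successes, samples, z=1.96):
    """Two-sided normal-approximation (Wald) interval for a binomial
    success probability; z=1.96 corresponds to alpha=0.05."""
    p_hat = successes / samples
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / samples)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

# e.g. Q reachable from v in 40 of 100 sampled possible graphs
lb, ub = wald_interval(40, 100)
```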
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Meunier with those of Houlsby, Chockler, Scarselli and Howard for the same reasons disclosed for claim 2.
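For illustration only, and not part of the record, the formulation of claim 3 can be sketched in Python on a hypothetical toy graph (the graph and all node names are invented): the possible paths up to the output node are counted once for each node of the directed graph, and each edge u→v is assigned the path count of the edge's output node divided by the path count of the edge's input node.

```python
def count_all_paths(graph, output="out"):
    """Memoized count, for each node of the directed graph, of all
    possible paths up to the output node."""
    counts = {output: 1}
    def count(node):
        if node not in counts:
            counts[node] = sum(count(s) for s in graph[node])
        return counts[node]
    for node in graph:
        count(node)
    return counts

# Hypothetical DAG as an adjacency list; "out" is the output node.
graph = {"in": ["a", "b"], "a": ["out"], "b": ["a", "out"], "out": []}
counts = count_all_paths(graph)          # {"out": 1, "a": 1, "b": 2, "in": 3}
probs = {(u, v): counts[v] / counts[u]   # edge u->v gets counts[v] / counts[u]
         for u in graph for v in graph[u]}
```

On this toy graph the result coincides with the claim 2 initialization, e.g. probs[("in", "b")] == 2/3.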
14. Regarding claim 4, Houlsby in view of Chockler, Scarselli and Howard teach the method of claim 1.
Houlsby in view of Chockler, Scarselli and Howard do not teach:
wherein in the process of drawing the path, the path is iteratively created, the subsequent edge being randomly selected at each node from the possible subsequent edges, which are connected to the node, as a function of its assigned probability.
Meunier teaches:
wherein in the process of drawing the path, the path is iteratively created,
(Meunier, “[0159] In step 5 a list of candidate edges to be potentially added iteratively [the path is iteratively created] to the component tree CT is generated.”)
the subsequent edge being randomly selected at each node from the possible subsequent edges,
(Meunier, “[0073] In a probabilistic graph G, the existence of each edge is a random variable. Thus, the topology of G is a random variable, too. The sample space of this random variable is the set of all possible graphs. A possible graph g=(Vg, Eg) of a probabilistic graph G is a deterministic graph which is a possible outcome of the random variables representing the edges of G [the subsequent edge being randomly selected at each node from the possible subsequent edges]. The graph g contains a subset of edges of G, i.e., Eg⊆E. The total number of such possible graphs is 2^|E<1|, where |E<1| represents the number of edges e∈E having P(e)<1, because for each such edge, we have two cases as to whether or not that edge is present in the graph.”)
which are connected to the node, as a function of its assigned probability.
(Meunier, “[0076] Let G=(V, E, W, P) be a probabilistic graph and let va, vb∈V be two nodes such that va≠vb. An (acyclic) path(va, vb)=va, v1, v2, . . . , vb be a sequence of vertices [which are connected to the node, as a function of its assigned probability], such that ∀vi ∈path(va, vb): (vi∈V) and ∀vi, vj∈path(va, vb): vi≠vj.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Meunier with those of Houlsby, Chockler, Scarselli and Howard for the same reasons disclosed for claim 2.
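For illustration only, and not part of the record, the iterative path drawing of claim 4 can be sketched in Python (the graph, the edge probabilities, and all node names are invented): starting from the input node, a subsequent edge is randomly selected at each node from the possible subsequent edges connected to that node, weighted by its assigned probability, until the output node is reached.

```python
import random

# Hypothetical DAG and pre-assigned edge probabilities (invented values).
graph = {"in": ["a", "b"], "a": ["out"], "b": ["a", "out"], "out": []}
probs = {("in", "a"): 1/3, ("in", "b"): 2/3,
         ("a", "out"): 1.0, ("b", "a"): 0.5, ("b", "out"): 0.5}

def draw_path(graph, probs, start="in", output="out", rng=random):
    """Iteratively create a path: at each node, randomly select the
    subsequent edge as a function of its assigned probability."""
    path = [start]
    node = start
    while node != output:
        succs = graph[node]
        weights = [probs[(node, s)] for s in succs]
        node = rng.choices(succs, weights=weights)[0]
        path.append(node)
    return path

path = draw_path(graph, probs, rng=random.Random(0))
```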
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Xu et al., "FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search" (2019).
Xu et al. propose a differentiable neural architecture search (DNAS) framework that uses gradient-based methods to optimize ConvNet architectures, avoiding enumerating and training individual architectures.
Stamoulis et al., "Single-Path NAS: Designing Hardware-Efficient ConvNets in Less Than 4 Hours" (2019).
Stamoulis et al. propose Single-Path NAS, a NAS method for designing hardware-efficient ConvNets in less than 4 hours. Their key insight, illustrated in Figure 1 (right) of that paper, is that the different candidate convolutional operations in NAS can be viewed as subsets of a single "superkernel".
Any inquiry concerning this communication or earlier communications from the examiner
should be directed to MATIYAS T MARU whose telephone number is (571)270-0902 or via email: matiyas.maru@uspto.gov. The examiner can normally be reached Monday 8:00am - Friday 4:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a
USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to
use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor,
Michelle Bechtold can be reached on (571)431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from
Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit
https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and
https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional
questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like
assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA)
or 571-272-1000.
/M.T.M./ Examiner, Art Unit 2148
/MICHELLE T BECHTOLD/ Supervisory Patent Examiner, Art Unit 2148