DETAILED ACTION
This action is responsive to the amendment filed on 10/24/2025. Claims 1-4 and 6-9 are pending in the case. Claim 5 has been cancelled. Claims 1, 8, and 9 are independent claims. Claims 1, 8, and 9 are amended.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-4 and 6-9 are rejected under 35 U.S.C. 101 because the claims are directed to an abstract idea/mental process.
Regarding claim 1:
Subject Matter Eligibility Analysis Step 2A Prong 1:
The claim recites "randomly drawing a multitude of subgraphs by the directed graph as a function of the respective variables… in which edges are drawn as a function of the respective variables assigned to the edges, according to the exploration probability and edges are drawn based on a probability sampled from a uniform distribution of probabilities, according to the exploration probability," which, under the broadest reasonable interpretation, covers performance of the limitation in the mind with the aid of pencil and paper. The limitations encompass a user randomly drawing a graph with respect to variables. See MPEP 2106.04(a)(2), subsection III.C.
The claim recites "the respective variables being changed in the directed graph as a function of a distribution of values of the respective variables… wherein the change of the respective variables takes place as a function of an exploration probability," which, under the broadest reasonable interpretation, covers performance of the limitation in the mind. The limitations encompass a user changing values based on a distribution. See MPEP 2106.04(a)(2), subsection III.C.
The claim recites "and drawing a last subgraph, as a function of the adapted respective variables," which, under the broadest reasonable interpretation, covers performance of the limitation in the mind. The limitation encompasses a user randomly selecting a subgraph. See MPEP 2106.04(a)(2), subsection III.C.
The claim recites "during the training, parameters of the machine learning system and the respective variables are adapted so that a cost function is optimized," which is an abstract idea (mathematical relationships; see MPEP 2106.04(a)(2)(I)(A)).
The claim recites "providing a directed graph including…," which recites insignificant extra-solution activity of data gathering (see MPEP 2106.05(g)) and which, under the broadest reasonable interpretation, covers performance of the limitation in the mind with the aid of pen and paper. The limitation encompasses a user drawing out nodes and connecting them. See MPEP 2106.04(a)(2), subsection III.C.
Subject Matter Eligibility Analysis Step 2A Prong 2:
(a) "computer-implemented" (merely recites a generic computer on which to perform the abstract idea, e.g., "apply it on a computer" (see MPEP 2106.05(f)))
(b) "one or multiple input and output nodes, which are connected via a multitude of edges and nodes, a respective variable being assigned to each respective edge of the edges, which characterizes a probability with which the respective edge is drawn" (merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (see MPEP 2106.05(h)))
(c) "training a machine learning system corresponding to a drawn subgraph of the multitude of subgraphs, wherein … creating the machine learning system corresponding to the last subgraph" (merely recites a generic computer on which to perform the abstract idea, e.g., "apply it on a computer" (see MPEP 2106.05(f)))
Subject Matter Eligibility Analysis Step 2B:
Additional elements (a) and (c) do not integrate the abstract idea into a practical application, nor do these additional limitations provide significantly more than the abstract idea, because the limitations amount to no more than mere instructions to apply the exception using a generic computer component. Please see MPEP § 2106.05(f).
Additional element (b) does not integrate the abstract idea into a practical application, nor does the additional limitation provide significantly more than the abstract idea, because the limitation merely specifies a field of use in which the abstract idea is to take place (see MPEP 2106.05(h)).
Additional elements (a), (b), and (c) of claim 1, whether considered separately or in combination, do not amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception, for the reasons set forth in the Step 2A Prong 2 analysis above. The claim is not patent eligible.
Regarding claim 2:
The rejection of claim 1 is incorporated, and the claim further recites the following additional elements/limitations:
Subject Matter Eligibility Analysis Step 2A Prong 1:
The claim recites "wherein, when a measure of the distribution of the values of the respective variables relative to a predefined target measure of a target distribution is greater," which is an abstract idea (mathematical relationships; see MPEP 2106.04(a)(2)(I)(A)).
The claim recites "the respective variables are changed in such a way that edges having an essentially equal probability are drawn," which, under the broadest reasonable interpretation, covers performance of the limitation in the mind. The limitations encompass a user deciding probabilities for a set of data/a probability distribution. See MPEP 2106.04(a)(2), subsection III.C.
Subject Matter Eligibility Analysis Step 2A Prong 2:
The claim does not contain elements that would warrant a Step 2A Prong 2 analysis.
Subject Matter Eligibility Analysis Step 2B:
The claim does not include any additional elements that, whether considered separately or in combination, amount to an integration of the judicial exception into a practical application or to significantly more than the judicial exception. The claim is not patent eligible.
Regarding claim 3:
The rejection of claim 1 is incorporated, and the claim further recites the following additional elements/limitations:
Subject Matter Eligibility Analysis Step 2A Prong 1:
The claim recites "wherein the change of the respective variables takes place as a function of an entropy of the directed graph, and a number of training steps which have already been carried out," which, under the broadest reasonable interpretation, covers performance of the limitation in the mind. The limitations encompass a user changing variables based on judging a variable and on how far along the training process is. See MPEP 2106.04(a)(2), subsection III.C.
Subject Matter Eligibility Analysis Step 2A Prong 2:
The claim does not contain elements that would warrant a Step 2A Prong 2 analysis.
Subject Matter Eligibility Analysis Step 2B:
The claim does not include any additional elements that, whether considered separately or in combination, amount to an integration of the judicial exception into a practical application or to significantly more than the judicial exception. The claim is not patent eligible.
Regarding claim 4:
The rejection of claim 3 is incorporated, and the claim further recites the following additional elements/limitations:
Subject Matter Eligibility Analysis Step 2A Prong 1:
The claim recites "when the entropy is greater than a predefined target entropy," which is an abstract idea (mathematical relationships; see MPEP 2106.04(a)(2)(I)(A)).
The claim recites "a parameter by which the respective variables are changed is changed in such a way that it changes values of the respective variables so that the probability distribution characterizing the respective variables has a lesser similarity to a uniform distribution," which, under the broadest reasonable interpretation, covers performance of the limitation in the mind. The limitations encompass a user choosing a probability distribution that is less uniform. See MPEP 2106.04(a)(2), subsection III.C.
The claim recites "and when the ascertained entropy is smaller than the predefined target entropy," which is an abstract idea (mathematical relationships; see MPEP 2106.04(a)(2)(I)(A)).
The claim recites "the parameter is changed in such a way that it changes values of the respective variables, so that the probability distribution characterizing the respective variables characterizes a uniform distribution," which, under the broadest reasonable interpretation, covers performance of the limitation in the mind. The limitations encompass a user choosing a probability distribution that is more uniform. See MPEP 2106.04(a)(2), subsection III.C.
Subject Matter Eligibility Analysis Step 2A Prong 2:
The claim does not contain elements that would warrant a Step 2A Prong 2 analysis.
Subject Matter Eligibility Analysis Step 2B:
The claim does not include any additional elements that, whether considered separately or in combination, amount to an integration of the judicial exception into a practical application or to significantly more than the judicial exception. The claim is not patent eligible.
Regarding claim 6:
The rejection of claim 1 is incorporated, and the claim further recites the following additional elements/limitations:
Subject Matter Eligibility Analysis Step 2A Prong 1:
The claim recites "wherein the change of the respective variables takes place using a temperature scaling," which is an abstract idea (mathematical calculations; see MPEP 2106.04(a)(2)(I)(C)).
Subject Matter Eligibility Analysis Step 2A Prong 2:
The claim does not contain elements that would warrant a Step 2A Prong 2 analysis.
Subject Matter Eligibility Analysis Step 2B:
The claim does not include any additional elements that, whether considered separately or in combination, amount to an integration of the judicial exception into a practical application or to significantly more than the judicial exception. The claim is not patent eligible.
Regarding claim 7:
The rejection of claim 6 is incorporated, and the claim further recites the following additional elements/limitations:
Subject Matter Eligibility Analysis Step 2A Prong 1:
The claim recites "wherein, during the temperature scaling, the respective variables are scaled as a function of a temperature which is changed as a function of the distribution of the values of the respective variables," which is an abstract idea (mathematical calculations; see MPEP 2106.04(a)(2)(I)(C)).
Subject Matter Eligibility Analysis Step 2A Prong 2:
The claim does not contain elements that would warrant a Step 2A Prong 2 analysis.
Subject Matter Eligibility Analysis Step 2B:
The claim does not include any additional elements that, whether considered separately or in combination, amount to an integration of the judicial exception into a practical application or to significantly more than the judicial exception. The claim is not patent eligible.
Regarding claim 8:
Subject Matter Eligibility Analysis Step 2A Prong 1:
The claim recites "randomly drawing a multitude of subgraphs by the directed graph as a function of the respective variables… in which edges are drawn as a function of the respective variables assigned to the edges, according to the exploration probability and edges are drawn based on a probability sampled from a uniform distribution of probabilities, according to the exploration probability," which, under the broadest reasonable interpretation, covers performance of the limitation in the mind with the aid of pencil and paper. The limitations encompass a user randomly drawing a graph with respect to variables. See MPEP 2106.04(a)(2), subsection III.C.
The claim recites "the respective variables being changed in the directed graph as a function of a distribution of values of the respective variables… wherein the change of the respective variables takes place as a function of an exploration probability," which, under the broadest reasonable interpretation, covers performance of the limitation in the mind. The limitations encompass a user changing values based on a distribution. See MPEP 2106.04(a)(2), subsection III.C.
The claim recites "and drawing a last subgraph, as a function of the adapted respective variables," which, under the broadest reasonable interpretation, covers performance of the limitation in the mind. The limitation encompasses a user randomly selecting a subgraph. See MPEP 2106.04(a)(2), subsection III.C.
The claim recites "during the training, parameters of the machine learning system and the respective variables are adapted so that a cost function is optimized," which is an abstract idea (mathematical relationships; see MPEP 2106.04(a)(2)(I)(A)).
The claim recites "provide a directed graph including…," which recites insignificant extra-solution activity of data gathering (see MPEP 2106.05(g)) and which, under the broadest reasonable interpretation, covers performance of the limitation in the mind with the aid of pen and paper. The limitation encompasses a user drawing out nodes and connecting them. See MPEP 2106.04(a)(2), subsection III.C.
Subject Matter Eligibility Analysis Step 2A Prong 2:
(a) "A non-transitory machine-readable memory element on which is stored a computer program for creating a machine learning system, the computer program, when executed by a computer, causing the computer to perform the following" (merely recites a generic computer on which to perform the abstract idea, e.g., "apply it on a computer" (see MPEP 2106.05(f)))
(b) "one or multiple input and output nodes, which are connected via a multitude of edges and nodes, a respective variable being assigned to each respective edge of the edges, which characterizes a probability with which the respective edge is drawn" (merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (see MPEP 2106.05(h)))
(c) "training a machine learning system corresponding to a drawn subgraph of the multitude of subgraphs, wherein … creating the machine learning system corresponding to the last subgraph" (merely recites a generic computer on which to perform the abstract idea, e.g., "apply it on a computer" (see MPEP 2106.05(f)))
Subject Matter Eligibility Analysis Step 2B:
Additional elements (a) and (c) do not integrate the abstract idea into a practical application, nor do these additional limitations provide significantly more than the abstract idea, because the limitations amount to no more than mere instructions to apply the exception using a generic computer component. Please see MPEP § 2106.05(f).
Additional element (b) does not integrate the abstract idea into a practical application, nor does the additional limitation provide significantly more than the abstract idea, because the limitation merely specifies a field of use in which the abstract idea is to take place (see MPEP 2106.05(h)).
Additional elements (a), (b), and (c) of claim 8, whether considered separately or in combination, do not amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception, for the reasons set forth in the Step 2A Prong 2 analysis above. The claim is not patent eligible.
Regarding claim 9:
Subject Matter Eligibility Analysis Step 2A Prong 1:
The claim recites "randomly drawing a multitude of subgraphs by the directed graph as a function of the respective variables… in which edges are drawn as a function of the respective variables assigned to the edges, according to the exploration probability and edges are drawn based on a probability sampled from a uniform distribution of probabilities, according to the exploration probability," which, under the broadest reasonable interpretation, covers performance of the limitation in the mind with the aid of pencil and paper. The limitations encompass a user randomly drawing a graph with respect to variables. See MPEP 2106.04(a)(2), subsection III.C.
The claim recites "the respective variables being changed in the directed graph as a function of a distribution of values of the respective variables… wherein the change of the respective variables takes place as a function of an exploration probability," which, under the broadest reasonable interpretation, covers performance of the limitation in the mind. The limitations encompass a user changing values based on a distribution. See MPEP 2106.04(a)(2), subsection III.C.
The claim recites "and drawing a last subgraph, as a function of the adapted respective variables," which, under the broadest reasonable interpretation, covers performance of the limitation in the mind. The limitation encompasses a user randomly selecting a subgraph. See MPEP 2106.04(a)(2), subsection III.C.
The claim recites "during the training, parameters of the machine learning system and the respective variables are adapted so that a cost function is optimized," which is an abstract idea (mathematical relationships; see MPEP 2106.04(a)(2)(I)(A)).
The claim recites "providing a directed graph including…," which recites insignificant extra-solution activity of data gathering (see MPEP 2106.05(g)) and which, under the broadest reasonable interpretation, covers performance of the limitation in the mind with the aid of pen and paper. The limitation encompasses a user drawing out nodes and connecting them. See MPEP 2106.04(a)(2), subsection III.C.
Subject Matter Eligibility Analysis Step 2A Prong 2:
(a) "device configured to create a machine learning system" (merely recites a generic computer on which to perform the abstract idea, e.g., "apply it on a computer" (see MPEP 2106.05(f)))
(b) "one or multiple input and output nodes, which are connected via a multitude of edges and nodes, a respective variable being assigned to each respective edge of the edges, which characterizes a probability with which the respective edge is drawn" (merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (see MPEP 2106.05(h)))
(c) "train a machine learning system corresponding to a drawn subgraph of the multitude of subgraphs, wherein … create the machine learning system corresponding to the last subgraph" (merely recites a generic computer on which to perform the abstract idea, e.g., "apply it on a computer" (see MPEP 2106.05(f)))
Subject Matter Eligibility Analysis Step 2B:
Additional elements (a) and (c) do not integrate the abstract idea into a practical application, nor do these additional limitations provide significantly more than the abstract idea, because the limitations amount to no more than mere instructions to apply the exception using a generic computer component. Please see MPEP § 2106.05(f).
Additional element (b) does not integrate the abstract idea into a practical application, nor does the additional limitation provide significantly more than the abstract idea, because the limitation merely specifies a field of use in which the abstract idea is to take place (see MPEP 2106.05(h)).
Additional elements (a), (b), and (c) of claim 9, whether considered separately or in combination, do not amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception, for the reasons set forth in the Step 2A Prong 2 analysis above. The claim is not patent eligible.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1 and 6-9 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Xie et al. (SNAS: STOCHASTIC NEURAL ARCHITECTURE SEARCH, henceforth known as Xie).
Regarding claim 1:
Xie discloses a computer-implemented method (Xie, Page 6, Footnote 2, “All the experiments were performed using NVIDIA TITAN Xp GPUs”) for creating a machine learning system (Xie, Page 5, Paragraph 7, “First, SNAS is applied to search for convolutional cells in a small parent network on CIFAR-10 and we choose the best cells based on their search validation accuracy. Then, a larger network is constructed by stacking the learned cells (child graphs) and is retrained on CIFAR-10 to compare the performance of SNAS with other state-of-the-art methods.”)
Xie discloses providing a directed graph including one or multiple input and output nodes, which are connected via a multitude of edges and nodes (Xie, Figure 1, where Figure 1 shows the nodes 0, 1, 2, and 3, multiple edges connecting the nodes, and multiple input and output nodes), a respective variable being assigned to each respective edge of the edges, which characterizes a probability with which the respective edge is drawn (Xie, Page 3, Paragraph 2, “As shown in the left of Figure 1, the search space, i.e. a cell, is represented using a directed acyclic graph (DAG), which is called parent graph. Nodes xi in this DAG represent latent representation, whose dimensions are simply ignored to avoid abuse of notations. In convolutional networks, they are feature maps. Edges (i, j) represent information flows and possible operations Oi,j to be selected between two nodes xi and xj.”, where Zi,j, a variable representing an architecture distribution over the edges, is considered a respective variable being assigned to each respective edge which characterizes a probability with which the respective edge is drawn (see also: Xie, Page 3, Paragraph 4, “Thanks to the fact that the volume of structural decisions, which pick ~O i;j for edge (i; j), is generally tractable in a cell, we represent it with a distribution p(Z). Multiplying each one-hot random variable Zi;j to each edge (i; j) in the DAG, we obtain a child graph”))
Xie discloses randomly drawing a multitude of subgraphs by the directed graph (Xie, Page 3, Paragraph 1 and Equation 2, “…the search space is represented with a set of one-hot random variables from a fully factorizable joint distribution, multiplied as a mask to select operations in the graph”, where the parent DAG has edges (i, j) with multiple operations and one operation is sampled per edge to form a child graph) as a function of the respective variables (Xie, Page 3, Equation 2, where in the child graph the variables Zi,j indicate what operation is to be used on each edge and the selection is stochastic, which is considered drawing random subgraphs as a function of the respective variables), the respective variables being changed in the directed graph as a function of a distribution of values of the respective variables (Xie, Page 3, Paragraph 5, “In SNAS, we simply assume that p(Z) is fully factorizable, whose factors are parameterized with α and learnt along with operation parameters θ”, where the distribution p(Z) parameterized with α, with the operation parameters θ being updated using gradient descent, corresponds to updating the respective variables of the original graph), wherein the change of the respective variables takes place as a function of an exploration probability (Xie, Page 2, Paragraph 1, “Sampling from this search space is made differentiable by relaxing the architecture distribution with concrete distribution”, where the parent DAG has edges (i, j) with multiple operations and one operation is sampled per edge to form a child graph using the concrete distribution as a continuous relaxation of pα(Z), which corresponds to changing the respective variables and drawing edges as a function of the respective variables according to an exploration probability), in which edges are drawn as a function of the respective variables assigned to the edges, according to the exploration probability (Xie, Page 4, Equation 5, where Zki,j determines how that edge contributes to the sampled child graph and Equation 5 shows explicitly that sampling depends on αi,j, with the SoftMax applied to produce the Concrete (Gumbel-SoftMax) distribution, which corresponds to edges drawn as a function of the variables assigned to the edges according to an exploration probability) and edges are drawn based on a probability sampled from a uniform distribution of probabilities, according to the exploration probability (Xie, Page 4, Equation 5 and Paragraph 2, “Uki,j is a uniform random variable”, where a uniform random variable corresponds to a random variable with a uniform distribution, which is considered to have edges drawn based on a probability sampled from a uniform distribution according to the exploration probability)
Xie discloses training a machine learning system corresponding to a drawn subgraph of the multitude of subgraphs (Xie, Page 6, Paragraph 11, “…we follow this assumption in evaluation stage, stacking more cells (child graphs) to build a deeper network. This network is trained from scratch…”), wherein during the training, parameters of the machine learning system and the respective variables are adapted so that a cost function is optimized (Xie, Page 4, Equation 6, where Equation 6 provides gradients with respect to the operation parameters (θ) and the architecture parameters (α), showing adaptation during training based on the respective variables) and drawing a last subgraph, as a function of the adapted respective variables, and creating the machine learning system corresponding to the last subgraph (Xie, Page 5, Paragraph 7, “First, SNAS is applied to search for convolutional cells in a small parent network on CIFAR-10 and we choose the best cells based on their search validation accuracy. Then, a larger network is constructed by stacking the learned cells (child graphs) and is retrained on CIFAR-10 to compare the performance of SNAS with other state-of-the-art methods.”, where stacking the learned cells (child graphs) is considered drawing a subgraph as a function of the adapted respective variables and creating a machine learning system corresponding to that subgraph)
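For illustration only, the Concrete (Gumbel-SoftMax) sampling mechanism cited above from Xie (Equation 5) can be sketched as follows. This is a minimal, non-limiting sketch assuming NumPy; the names (sample_edge_operation, alpha, lam) are the examiner's own illustration and are not Xie's notation:

import numpy as np

def sample_edge_operation(alpha, lam, rng=np.random.default_rng()):
    # alpha: positive architecture weights for the candidate operations on one edge (i, j)
    # lam:   SoftMax temperature, steadily annealed toward zero during the search in SNAS
    u = rng.uniform(1e-10, 1.0, size=len(alpha))   # uniform random variables (the cited Uki,j)
    g = -np.log(-np.log(u))                        # Gumbel noise derived from the uniform samples
    logits = (np.log(alpha) + g) / lam             # perturbed, temperature-scaled logits
    z = np.exp(logits - logits.max())
    return z / z.sum()                             # relaxed (near one-hot) edge selection Zi,j

# Example: three candidate operations on an edge; a smaller lam yields a harder selection.
z = sample_edge_operation(np.array([0.2, 0.5, 0.3]), lam=0.5)

In this sketch the uniform random variables u correspond to the claimed probability sampled from a uniform distribution, and the learned weights alpha correspond to the respective variables assigned to the edges.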
Regarding claim 6:
The rejection of claim 1 with prior art Xie is incorporated and further:
Xie discloses wherein the change of the respective variables takes place using a temperature scaling (Xie, Page 4, Equation 5 and Paragraph 2, “λ is the temperature of the SoftMax, which is steadily annealed to be close to zero in SNAS”, where Equation 5 shows that sampling uses the temperature λ with the architecture distribution during training, which is considered a change in the respective variables taking place using temperature scaling)
Regarding claim 7:
The rejection of claim 6 with prior art Xie is incorporated and further:
Xie discloses wherein, during the temperature scaling, the respective variables are scaled as a function of a temperature (Xie, Page 4, Paragraph 2, “λ is the temperature of the SoftMax, which is steadily annealed to be close to zero in SNAS”) which is changed as a function of the distribution of the values of the respective variables (Xie, Page 4, Equation 5, where Zki,j uses 1/λ in the SoftMax and the steady annealing of λ during training is tied to the evolving architecture distribution, which is considered the temperature changing as a function of the distribution of values of the respective variables: early α values are uncertain and correspond to higher λ values, and, as the α distribution sharpens from training, λ is decreased to encourage hard selections reflecting the learned distribution)
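As a further non-limiting illustration of the annealing relied upon above: the cited passages state that λ is steadily annealed toward zero but do not give an explicit schedule, so the step-based schedule below is the examiner's assumption only:

def annealed_temperature(step, lam_init=1.0, lam_min=0.03, decay=0.999):
    # The temperature decreases as training proceeds, sharpening the sampled edge selections.
    return max(lam_min, lam_init * (decay ** step))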
Regarding claim 8:
Xie discloses a non-transitory machine-readable memory element on which is stored a computer program (Xie, Page 6, Footnote 2, “All the experiments were performed using NVIDIA TITAN Xp GPUs”) for creating a machine learning system (Xie, Page 5, Paragraph 7, “First, SNAS is applied to search for convolutional cells in a small parent network on CIFAR-10 and we choose the best cells based on their search validation accuracy. Then, a larger network is constructed by stacking the learned cells (child graphs) and is retrained on CIFAR-10 to compare the performance of SNAS with other state-of-the-art methods.”)
Xie discloses providing a directed graph including one or multiple input and output nodes, which are connected via a multitude of edges and nodes (Xie, Figure 1, where Figure 1 shows the nodes 0, 1, 2, and 3, multiple edges connecting the nodes, and multiple input and output nodes), a respective variable being assigned to each respective edge of the edges, which characterizes a probability with which the respective edge is drawn (Xie, Page 3, Paragraph 2, “As shown in the left of Figure 1, the search space, i.e. a cell, is represented using a directed acyclic graph (DAG), which is called parent graph. Nodes xi in this DAG represent latent representation, whose dimensions are simply ignored to avoid abuse of notations. In convolutional networks, they are feature maps. Edges (i, j) represent information flows and possible operations Oi,j to be selected between two nodes xi and xj”, where Zi,j, a variable representing an architecture distribution over the edges, is considered a respective variable being assigned to each respective edge which characterizes a probability with which the respective edge is drawn (see also: Xie, Page 3, Paragraph 4, “Thanks to the fact that the volume of structural decisions, which pick ~O i;j for edge (i; j), is generally tractable in a cell, we represent it with a distribution p(Z). Multiplying each one-hot random variable Zi;j to each edge (i; j) in the DAG, we obtain a child graph”))
Xie discloses randomly drawing a multitude of subgraphs by the directed graph (Xie, Page 3, Paragraph 1 and Equation 2, “…the search space is represented with a set of one-hot random variables from a fully factorizable joint distribution, multiplied as a mask to select operations in the graph”, where the parent DAG has edges (i, j) with multiple operations and one operation is sampled per edge to form a child graph) as a function of the respective variables (Xie, Page 3, Equation 2, where in the child graph the variables Zi,j indicate what operation is to be used on each edge and the selection is stochastic, which is considered drawing random subgraphs as a function of the respective variables), the respective variables being changed in the directed graph as a function of a distribution of values of the respective variables (Xie, Page 3, Paragraph 5, “In SNAS, we simply assume that p(Z) is fully factorizable, whose factors are parameterized with α and learnt along with operation parameters θ”, where the distribution p(Z) parameterized with α, with the operation parameters θ being updated using gradient descent, corresponds to updating the respective variables of the original graph), wherein the change of the respective variables takes place as a function of an exploration probability (Xie, Page 2, Paragraph 1, “Sampling from this search space is made differentiable by relaxing the architecture distribution with concrete distribution”, where the parent DAG has edges (i, j) with multiple operations and one operation is sampled per edge to form a child graph using the concrete distribution as a continuous relaxation of pα(Z), which corresponds to changing the respective variables and drawing edges as a function of the respective variables according to an exploration probability), in which edges are drawn as a function of the respective variables assigned to the edges, according to the exploration probability (Xie, Page 4, Equation 5, where Zki,j determines how that edge contributes to the sampled child graph and Equation 5 shows explicitly that sampling depends on αi,j, with the SoftMax applied to produce the Concrete (Gumbel-SoftMax) distribution, which corresponds to edges drawn as a function of the variables assigned to the edges according to an exploration probability) and edges are drawn based on a probability sampled from a uniform distribution of probabilities, according to the exploration probability (Xie, Page 4, Equation 5 and Paragraph 2, “Uki,j is a uniform random variable”, where a uniform random variable corresponds to a random variable with a uniform distribution, which is considered to have edges drawn based on a probability sampled from a uniform distribution according to the exploration probability)
Xie discloses training a machine learning system corresponding to a drawn subgraph of the multitude of subgraphs (Xie, Page 6, Paragraph 11, “…we follow this assumption in evaluation stage, stacking more cells (child graphs) to build a deeper network. This network is trained from scratch…”), wherein during the training, parameters of the machine learning system and the respective variables are adapted so that a cost function is optimized (Xie, Page 4, Equation 6, where Equation 6 provides gradients with respect to the operation parameters (θ) and the architecture parameters (α), showing adaptation during training based on the respective variables) and drawing a last subgraph, as a function of the adapted respective variables, and creating the machine learning system corresponding to the last subgraph (Xie, Page 5, Paragraph 7, “First, SNAS is applied to search for convolutional cells in a small parent network on CIFAR-10 and we choose the best cells based on their search validation accuracy. Then, a larger network is constructed by stacking the learned cells (child graphs) and is retrained on CIFAR-10 to compare the performance of SNAS with other state-of-the-art methods.”, where stacking the learned cells (child graphs) is considered drawing a subgraph as a function of the adapted respective variables and creating a machine learning system corresponding to that subgraph)
Regarding claim 9:
Xie discloses a device (Xie, Page 6, Footnote 2, “All the experiments were performed using NVIDIA TITAN Xp GPUs”) configured to create a machine learning system (Xie, Page 5, Paragraph 7, “First, SNAS is applied to search for convolutional cells in a small parent network on CIFAR-10 and we choose the best cells based on their search validation accuracy. Then, a larger network is constructed by stacking the learned cells (child graphs) and is retrained on CIFAR-10 to compare the performance of SNAS with other state-of-the-art methods.”)
Xie discloses providing a directed graph including one or multiple input and output nodes, which are connected via a multitude of edges and nodes (Xie, Figure 1, where Figure 1 shows the nodes 0, 1, 2, and 3, multiple edges connecting the nodes, and multiple input and output nodes), a respective variable being assigned to each respective edge of the edges, which characterizes a probability with which the respective edge is drawn (Xie, Page 3, Paragraph 2, “As shown in the left of Figure 1, the search space, i.e. a cell, is represented using a directed acyclic graph (DAG), which is called parent graph. Nodes xi in this DAG represent latent representation, whose dimensions are simply ignored to avoid abuse of notations. In convolutional networks, they are feature maps. Edges (i, j) represent information flows and possible operations Oi,j to be selected between two nodes xi and xj”, where Zi,j, a variable representing an architecture distribution over the edges, is considered a respective variable being assigned to each respective edge which characterizes a probability with which the respective edge is drawn (see also: Xie, Page 3, Paragraph 4, “Thanks to the fact that the volume of structural decisions, which pick ~O i;j for edge (i; j), is generally tractable in a cell, we represent it with a distribution p(Z). Multiplying each one-hot random variable Zi;j to each edge (i; j) in the DAG, we obtain a child graph”))
Xie discloses randomly drawing a multitude of subgraphs by the directed graph (Xie, Page 3, Paragraph 1 and Equation 2, “…the search space is represented with a set of one-hot random variables from a fully factorizable joint distribution, multiplied as a mask to select operations in the graph”, where the parent DAG has edges (i, j) with multiple operations and one operation is sampled per edge to form a child graph) as a function of the respective variables (Xie, Page 3, Equation 2, where in the child graph the variables Zi,j indicate what operation is to be used on each edge and the selection is stochastic, which is considered drawing random subgraphs as a function of the respective variables), the respective variables being changed in the directed graph as a function of a distribution of values of the respective variables (Xie, Page 3, Paragraph 5, “In SNAS, we simply assume that p(Z) is fully factorizable, whose factors are parameterized with α and learnt along with operation parameters θ”, where the distribution p(Z) parameterized with α, with the operation parameters θ being updated using gradient descent, corresponds to updating the respective variables of the original graph), wherein the change of the respective variables takes place as a function of an exploration probability (Xie, Page 2, Paragraph 1, “Sampling from this search space is made differentiable by relaxing the architecture distribution with concrete distribution”, where the parent DAG has edges (i, j) with multiple operations and one operation is sampled per edge to form a child graph using the concrete distribution as a continuous relaxation of pα(Z), which corresponds to changing the respective variables and drawing edges as a function of the respective variables according to an exploration probability), in which edges are drawn as a function of the respective variables assigned to the edges, according to the exploration probability (Xie, Page 4, Equation 5, where Zki,j determines how that edge contributes to the sampled child graph and Equation 5 shows explicitly that sampling depends on αi,j, with the SoftMax applied to produce the Concrete (Gumbel-SoftMax) distribution, which corresponds to edges drawn as a function of the variables assigned to the edges according to an exploration probability) and edges are drawn based on a probability sampled from a uniform distribution of probabilities, according to the exploration probability (Xie, Page 4, Equation 5 and Paragraph 2, “Uki,j is a uniform random variable”, where a uniform random variable corresponds to a random variable with a uniform distribution, which is considered to have edges drawn based on a probability sampled from a uniform distribution according to the exploration probability)
Xie discloses training a machine learning system corresponding to a drawn subgraph of the multitude of subgraphs (Xie, Page 6, Paragraph 11, “…we follow this assumption in evaluation stage, stacking more cells (child graphs) to build a deeper network. This network is trained from scratch…”), wherein during the training, parameters of the machine learning system and the respective variables are adapted so that a cost function is optimized (Xie, Page 4, Equation 6, where Equation 6 provides gradients with respect to the operation parameters (θ) and the architecture parameters (α), showing adaptation during training based on the respective variables) and drawing a last subgraph, as a function of the adapted respective variables, and creating the machine learning system corresponding to the last subgraph (Xie, Page 5, Paragraph 7, “First, SNAS is applied to search for convolutional cells in a small parent network on CIFAR-10 and we choose the best cells based on their search validation accuracy. Then, a larger network is constructed by stacking the learned cells (child graphs) and is retrained on CIFAR-10 to compare the performance of SNAS with other state-of-the-art methods.”, where stacking the learned cells (child graphs) is considered drawing a subgraph as a function of the adapted respective variables and creating a machine learning system corresponding to that subgraph)
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Xie et al. (SNAS: STOCHASTIC NEURAL ARCHITECTURE SEARCH, henceforth known as Xie) in view of Chen et al. (DRNAS: DIRICHLET NEURAL ARCHITECTURE SEARCH, henceforth known as Chen).
Regarding claim 2:
The rejection of claim 1 with prior art Xie is incorporated and further:
Chen discloses wherein, when a measure of the distribution of the values of the respective variables (Chen, “we select Dirichlet distribution to model its behavior, i.e., q(θ|β) ~ Dir(β), where β represents the Dirichlet concentration parameter”, where β is the learned Dirichlet concentration parameter controlling sampling and is considered a measure of a distribution) relative to a predefined target measure of a target distribution (Chen, Page 3, Paragraph 5, “Therefore, we add a penalty term in the objective (2) to regularize the distance between [β] and the anchor β* = 1, which corresponds to a symmetric Dirichlet”, where β* is considered a predefined target measure of a target distribution) is greater (Chen, Page 4, Equation 5 and Proposition 1, where, operationally, if β > 1 the penalty term is increased to bring β closer to 1), the respective variables are changed in such a way that edges having an essentially equal probability are drawn (Chen, Page 3, Paragraph 5, “Therefore, we add a penalty term in the objective (2) to regularize the distance between [β] and the anchor β* = 1, which corresponds to a symmetric Dirichlet”, where a symmetric Dirichlet with all concentration parameters equal to 1 corresponds to a uniform distribution and is considered drawing edges having an essentially equal probability)
References Xie and Chen are analogous art because they are from the same field of endeavor of automated design and optimization of neural architecture search (NAS) for improved performance of machine learning tasks.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Xie and Chen before him or her, to modify the Gumbel-SoftMax of Xie to include the Dirichlet sampling of Chen to model probabilities on each edge as a distribution and maintain more control over exploration, exploitation and sparse sampling. The suggestion/motivation for doing so would have been “The concentration parameter β controls the sampling behavior of Dirichlet distribution and is crucial in balancing exploration and exploitation during the search phase. Let βo denote the concentration parameter assign to operation o. When βo << 1 for most o = 1 ~ |O|, Dirichlet tends to produce sparse samples with high variance, reducing the training stability; when βo >> 1 for most o = 1 ~ |O|, the samples will be dense with low variance, leading to insufficient exploration.”(Chen, Page 3, Paragraph 5)
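For context, the Dirichlet sampling and anchor regularization relied upon from Chen can be illustrated with the following non-limiting sketch; the function names and the specific squared-distance penalty are the examiner's assumptions and are not asserted to be Chen's implementation:

import numpy as np

def sample_edge_weights(beta, rng=np.random.default_rng()):
    # Draw relaxed operation weights for one edge from Dir(beta).
    return rng.dirichlet(beta)

def anchor_penalty(beta, beta_anchor=1.0):
    # Illustrative penalty pulling the learned concentration toward the symmetric anchor beta* = 1,
    # under which all operations on an edge are drawn with essentially equal probability.
    return float(np.sum((np.asarray(beta) - beta_anchor) ** 2))

beta = np.array([2.0, 0.7, 1.3])      # learned concentration parameters for three operations
theta = sample_edge_weights(beta)     # sampled operation probabilities for the edge
penalty = anchor_penalty(beta)        # added to the search objective as a regularizer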
Claims 3 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Xie et al. (SNAS: STOCHASTIC NEURAL ARCHITECTURE SEARCH, henceforth known as Xie) in view of Haarnoja et al. (Soft Actor-Critic Algorithms and Applications, henceforth known as Haarnoja).
Regarding claim 3:
The rejection of claim 1 with prior art Xie is incorporated and further:
Haarnoja discloses wherein the change of the respective variables takes place as a function of an entropy of the directed graph (Haarnoja, Page 7, Paragraph 2, “Our aim is to find a stochastic policy with maximal expected return that satisfies a minimum expected entropy constraint”, where the minimum expected entropy constraint is considered a predefined target entropy), and a number of training steps which have already been carried out (Haarnoja, Page 2, Paragraph 2, “To resolve this issue, we devise an automatic gradient-based temperature tuning method that adjusts the expected entropy over the visited states to match a target value”, where the temperature (entropy control) adapting continuously as training proceeds is considered changing the respective variables as a function of the entropy and of the number of training steps already carried out)
Regarding claim 4:
The rejection of claim 3 with prior art Xie-Haarnoja is incorporated and further:
Haarnoja discloses when the entropy is greater than a predefined target entropy (Haarnoja, Page 12, Equation 18, where H is the predefined target entropy and −log πt(at|st) is the instantaneous/current entropy for the sampled action at in state st, and the gradient is based on the difference between the predefined target entropy and the instantaneous entropy), a parameter by which the respective variables are changed is changed in such a way that it changes values of the respective variables (Haarnoja, Page 12, Equation 18, where, when the current entropy is greater than the target entropy, the gradient step decreases the temperature (α), which decreases the emphasis on the entropy term in the policy objective and is considered changing the respective variables, as this changes training/sampling by altering the exploration behavior and learning trajectory), so that the probability distribution characterizing the respective variables has a lesser similarity to a uniform distribution (Haarnoja, Page 12, Equation 18, where a lower temperature produces a sharper probability distribution, which is considered less similar to a uniform distribution)
Haarnoja discloses and when the ascertained entropy is smaller than the predefined target entropy (Haarnoja, Page 12, Equation 18, where H is the predefined target entropy and −log πt(at|st) is the instantaneous/current entropy for the sampled action at in state st, and the gradient is based on the difference between the predefined target entropy and the instantaneous entropy), the parameter is changed in such a way that it changes values of the respective variables (Haarnoja, Page 12, Equation 18, where, when the current entropy is less than the target entropy, the gradient step increases the temperature (α), which increases the emphasis on the entropy term in the policy objective and is considered changing the respective variables, as this changes training/sampling by altering the exploration behavior and learning trajectory), so that the probability distribution characterizing the respective variables characterizes a uniform distribution (Haarnoja, Page 12, Equation 18, where a higher temperature produces a flatter probability distribution, which is considered more similar to a uniform distribution)
References Xie and Haarnoja are analogous art because they are from the same field of endeavor of gradient-based optimization for training deep networks.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Xie and Haarnoja before him or her, to modify the Gumbel-SoftMax of Xie to include the entropy evaluation of Haarnoja to maximize randomness while achieving a goal to encourage trying various actions instead of narrowing in on a deterministic strategy. The suggestion/motivation for doing so would have been “The maximum entropy objective has a number of conceptual and practical advantages. First, the policy is incentivized to explore more widely, while giving up on clearly unpromising avenues. Second, the policy can capture multiple modes of near-optimal behavior. In problem settings where multiple actions seem equally attractive, the policy will commit equal probability mass to those actions”(Haarnoja, Page 4, Paragraph 3)
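For context, the temperature-adjustment behavior relied upon from Haarnoja (Equation 18) can be illustrated with the following non-limiting sketch; the update is simplified and the names are the examiner's assumptions, not Haarnoja's implementation:

import numpy as np

def update_temperature(alpha, log_probs, target_entropy, lr=1e-3):
    # log_probs: log pi(a_t | s_t) for sampled actions, so -mean(log_probs) is the current entropy.
    current_entropy = -float(np.mean(log_probs))
    # When the current entropy exceeds the target, alpha decreases (sharper, less uniform policy);
    # when it falls below the target, alpha increases (flatter, more nearly uniform policy).
    grad = current_entropy - target_entropy
    return max(1e-8, alpha - lr * grad)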
Response to Arguments
Applicant's arguments filed 10/24/2025 have been fully considered but they are not persuasive. A breakdown can be found below:
112:
The Examiner finds that the amended language has overcome the previous 112 rejections.
101:
Applicant appears to argue on pages 6-7 that the claims provide a technological improvement, citing to the specification and to the use of an exploration probability for decision making in subgraph/edge selection and generation.
Examiner respectfully disagrees as, although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Applicant appears to be interpreting a narrower claim, as the current claims do not positively recite additional elements that provide a technological improvement. The Examiner's review of the arguments interprets the improvement proposed by Applicant as being provided by the claimed abstract idea of choosing options based on equal probability, which does not result in an improvement in technology. Applicant's example does not explain how any additional elements reflect this improvement, or how the improvement is affected by any claimed additional elements. At best, Applicant's example describes an improvement provided by the claimed abstract idea of making decisions.
102/103:
Applicant appears to argue on pages 8-10 that Xie does not disclose randomly drawing a multitude of subgraphs by the directed graph as a function of respective variables, the respective variables being changed in the directed graph as a function of a distribution of values of the respective variables, wherein the change of the respective variables takes place as a function of an exploration probability in which: edges are drawn as a function of the respective variables assigned to the edges according to the exploration probability and edges are drawn based on a probability sampled from a uniform distribution of probabilities according to the exploration probability, as recited in claim 1.
Examiner respectfully disagrees, as Xie discloses “randomly drawing a multitude of subgraphs by the directed graph as a function of respective variables”: Xie, Page 3, Paragraph 4, “Thanks to the fact that the volume of structural decisions, which pick ~O i;j for edge (i; j)…Multiplying each one-hot random variable Zi;j to each edge (i; j) in the DAG, we obtain a child graph” shows that for each edge (i, j) an operation is selected to obtain the child graphs.
Xie discloses "the respective variables being changed in the directed graph as a function of a distribution of values of the respective variables” as Xie, Page 3, Paragraph 5, “In SNAS, we simply assume that p(Z) is fully factorizable, whose factors are parameterized with α and learnt along with operation parameters θ” shows that the persistent variables in the original graph are parameterized with the distribution pα(Z) and are changed as the sampled graphed edges influence the loss that produces computed gradients and the gradient descent updates the original variables(See Xie, Page 3, Equation 3, where Equation 3 shows the expected loss over the sample and Xie, Page 4, Paragraph 3 and Equation 6, Here with the surrogate loss L for each sample, we provide its gradient w.r.t xj , θki,j and αki,j” where Equation 6 and Paragraph 3 explains the gradients are computer after evaluating loss and computed with respect to original distribution parameters α)
Xie discloses “wherein the change of the respective variables takes place as a function of an exploration probability in which: edges are drawn as a function of the respective variables assigned to the edges according to the exploration probability and edges are drawn based on a probability sampled from a uniform distribution of probabilities according to the exploration probability”: Xie, Page 2, Paragraph 1, “Sampling from this search space is made differentiable by relaxing the architecture distribution with concrete distribution” shows that the parent DAG has edges (i, j) with multiple operations and one operation is sampled per edge to form a child graph using the concrete distribution as a continuous relaxation drawn from pα(Z), which corresponds to an exploration probability, and Xie, Page 4, Equation 5 and Paragraph 2, “Uki,j is a uniform random variable” shows a uniform random variable, which corresponds to a random variable with a uniform distribution (every value between 0 and 1 is equally likely, i.e., (0,1)) and is considered to have edges drawn based on a probability sampled from a uniform distribution according to the exploration probability, as the concrete distribution uses the uniform random variable and the concrete distribution is the mechanism to sample the edge-selection variables that determine the sampled child graphs.
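For completeness, the relaxation cited above (Xie, Equation 5) has the general Gumbel-SoftMax (Concrete) form reproduced below in the examiner's transcription; the notation is approximated and is provided for illustration only:

Z^k_{i,j} = \frac{\exp\big((\log \alpha^k_{i,j} + G^k_{i,j})/\lambda\big)}{\sum_{l} \exp\big((\log \alpha^l_{i,j} + G^l_{i,j})/\lambda\big)}, \qquad G^k_{i,j} = -\log\big(-\log U^k_{i,j}\big), \quad U^k_{i,j} \sim \mathrm{Uniform}(0,1),

where the uniform random variables U^k_{i,j} supply the claimed probability sampled from a uniform distribution, the learned α_{i,j} are the respective variables assigned to the edges, and λ is the annealed temperature.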
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES JEFFREY JONES JR whose telephone number is (703)756-1414. The examiner can normally be reached Monday - Friday 8:00 - 5:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached at 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/C.J.J./Examiner, Art Unit 2122
/KAKALI CHAKI/ Supervisory Patent Examiner, Art Unit 2122