Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1, 7, 8, and 9-11 have been amended by Applicant. Claim 2 has been cancelled, and no new claims have been added. Claims 1 and 3-11 are currently pending.
Response to Arguments
Claim Rejections under 35 U.S.C. 112(a)
The rejection of claims 9-11 under 35 U.S.C. 112(a) has been maintained. Claims 9-11 (as amended) have been further rejected under 35 U.S.C. 112(a) for lack of written description supporting the amendments to said claims.
Applicant's arguments filed 10/17/2025 have been fully considered but they are not persuasive.
Applicant’s arguments as to the standing rejections of claims 9-11 under 35 U.S.C. 112 appear to address only the rejection of claims 9-11 under 35 U.S.C. 112(b) for lack of clarity as to the unamended limitation reciting “as determined by each type of facility included in the system”. However, no arguments or remarks were provided as to the rejection under 35 U.S.C. 112(a) for lack of written description supporting the claim limitation “each respective convolution function is implemented as an activation function of a convolution layer of the graph convolution neural network”, as recited in unamended claims 9-11. Hence, the rejection of claims 9-11 under 35 U.S.C. 112(a) as to this limitation has been maintained.
Applicant has further amended claims 9-11 to similarly recite “and the activation function assigned to each facility type is determined based on the operational characteristics of the facility type, such that at least two different facility types are assigned different activation functions”. Nevertheless, said amended limitation also lacks support in Applicant’s original disclosure. Thus, claims 9-11 have been further rejected under 35 U.S.C. 112(a).
Applicant has pointed to paragraphs [0010], [0016], and [0032] as supporting the amendments to claims 1, 7, 8, and 9-11. However, upon examination of the cited paragraphs, they contain no disclosure as to the claimed activation function nor the rest of the claimed features in claims 9-11 (as amended).
As stated in the rejection of claims 9-11 (as amended) under 35 U.S.C. 112(a), the only reference to “activation function” that Examiner could find is in Paragraph [0030] of the Specification. To this effect, Paragraph [0030] of the Specification only contains a generic disclosure of “inputting vectors of an output layer to an activation function, such as a sigmoid function, an ReLU, and a softmax function”. Hence, neither Paragraph [0030] of the Specification nor the rest of Applicant’s original disclosure supports the limitations “each respective convolution function is implemented as an activation function of a convolution layer of the graph convolution neural network” or “and the activation function assigned to each facility type is determined based on the operational characteristics of the facility type, such that at least two different facility types are assigned different activation functions”. Furthermore, there are several references to the “convolution function” in the Specification, but it is not clear how it is defined. Hence, claims 9-11 (as amended) have been rejected under 35 U.S.C. 112(a), as they contain subject matter that was not described in Applicant’s original disclosure in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor had possession of the claimed invention at the time the application was filed.
Claim Rejections under 35 U.S.C. 112(b)
The rejection of claims 9-11 (as amended) under 35 U.S.C. 112(b), as to the limitation “as determined by each type of facility included in the system”, has been withdrawn in view of Applicant’s amendment to claims 9-11 eliminating said limitation.
Claim Rejections under 35 U.S.C. 103
The rejection of claims 1-11 under 35 U.S.C. 103 has been withdrawn in view of Applicant’s amendments to independent claims 1, 7, and 8. However, upon further consideration and in view of said amendments, a new ground of rejection has been made under 35 U.S.C. 103.
Applicant's arguments filed 10/17/2025 have been fully considered but they are not persuasive.
Applicant argues that the combination of Lee in view of Shelhamer and Li does not teach or suggest the selection and optimization of the activation function “per facility type” (in page 10 of Applicant’s remarks), in claims 1, 7, and 8 (as amended).
Examiner respectfully disagrees with Applicant’s argument, as it is directly contradicted by the Li reference itself. As stated in the instant Office action in the rejection of claim 1 (as amended), Li has been shown to teach the limitation “wherein a selection of the activation function for each facility type is optimized… based on a facility-specific performance metric”. To this effect, Li, Paragraph [0062], has been cited as teaching that the recommendation engine can update the weights using a backpropagation technique, e.g., backpropagation using stochastic gradient descent [i.e., optimization]. For each layer, the recommendation engine can compute a gradient of the activation function against a ground-truth value. The ground-truth can be the historical performance metric of a computational graph on each type of computing device. Because Li teaches optimizing (using stochastic gradient descent) the activation function for each type of computing device, Li (Paragraph [0062]) has been understood to read on the argued limitation. Hence, claims 1, 7, and 8 (as amended) have been rejected under 35 U.S.C. 103.
Examiner notes that claims 1, 7, and 8 have been further rejected under 35 U.S.C. 103 over the new grounds in further view of Schmidhuber, based on the amended limitation reciting “wherein a selection of the activation function for each facility type is optimized by reinforcement learning based on a facility-specific performance metric”. To that effect, Schmidhuber, Paragraph [0082], teaches we apply the basic idea to the incremental skill training of ONE [i.e., an LSTM with a front-end in form of a convolutional neural net (CNN) implemented on fast graphics processing units GPUs]. Both the predictive skills acquired by gradient descent and the task-specific control skills acquired by black box optimization are collapsed into one single network (namely, ONE itself) through pure gradient descent, by retraining ONE on all input-output traces of all previously learned behaviors that are still deemed useful. Towards this end, ONE is retrained to reproduce control behaviors of successful past versions of ONE, but without really executing the behaviors in the environment (usually the expensive part). Simultaneously, all input-output traces ever observed (including those of failed trials) can be used to train ONE to become a better predictor of future inputs, given previous inputs and actions. Of course, this requires storing input-output traces of all trials (e.g., in a computer-based memory storage device, not shown in FIG. 1). That is, once a new skill has been learned, e.g., by a copy of ONE (or even by another machine learning device), e.g., through slow trial and error-based evolution or reinforcement learning, ONE can be retrained through gradient-based methods on stored input/output traces of all previously learned control and prediction skills still considered worth memorizing.
In particular, standard gradient descent through backpropagation in discrete graphs of nodes with differentiable activation functions can be used to squeeze many expensively evolved skills into the limited computational resources of ONE. [Note: [0082] understood to teach optimization of an activation function by stochastic gradient descent and reinforcement learning based on limited computational resources of the machine learning device]; Schmidhuber, Paragraph [0004] further teaches the LSTM may have a front-end in form of a convolutional neural net (CNN) implemented on fast graphics processing units GPUs. Such a CNN-LSTM combination may be considered an RNN for purposes of the current disclosure.; Schmidhuber [0109] further teaches ONE's agent may be virtually any kind of physical system, component, or process facilitated or performed by a physical system or component. ONE's agent may include any one or more of a variety of different kinds of sensors, etc. Moreover, ONE's agent may include any one or more of a variety of different kinds of devices or components that are able to perform, or cause to be performed, actions. These devices or components may be or include any one or more of a variety of motors, actuators, etc. [Note: [0082] in view of [0109] understood to read on optimization of an activation function per facility type].
Hence, claims 1, 7, and 8 (as amended) have been rejected under the new grounds of rejection under 35 U.S.C. 103 over the combination of Lee in view of Shelhamer, Li, and Schmidhuber.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 9-11 (as amended) are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claims 9-11 analogously recite “wherein the graph structure is a graph convolution neural network, and each respective convolution function is implemented as an activation function of a convolution layer of the graph convolution neural network, wherein the facility type is defined as a class of nodes in the graph structure corresponding to a specific physical or logical function in the system, and the activation function assigned to each facility type is determined based on the operational characteristics of the facility type, such that at least two different facility types are assigned different activation functions”. However, the Specification does not support the claimed limitations “each respective convolution function is implemented as an activation function of a convolution layer of the graph convolution neural network” or “and the activation function assigned to each facility type is determined based on the operational characteristics of the facility type, such that at least two different facility types are assigned different activation functions”. The only reference to “activation function” that Examiner could find is in Paragraph [0030] of the Specification, which only states as follows:
For example, the neural network generator 100 determines a coefficient a_ij in accordance with a rule based on a graph attention network. FIG. 8 is a diagram for explaining a method in which the neural network generator 100 determines a coefficient a_ij. The neural network generator 100 derives a coefficient a_ij by inputting a vector (Wh_i, Wh_j) obtained by combining a vector Wh_i obtained by multiplying an amount of feature h_i of an assumption node RN_i which is a propagation source by a propagation matrix W with a vector Wh_j obtained by multiplying an amount of feature h_j of an assumption node RN_j which is a propagation destination by the propagation matrix W to an individual neural network a (attention), inputting vectors of an output layer to an activation function, such as a sigmoid function, an ReLU, and a softmax function, normalizing the vectors, and adding the vectors. The individual neural network a includes parameters and the like obtained in advance for an event to be analyzed.
As can be seen, Paragraph [0030] of the Specification only contains a generic disclosure of “inputting vectors of an output layer to an activation function, such as a sigmoid function, an ReLU, and a softmax function”. Neither Paragraph [0030] of the Specification nor the rest of Applicant’s original disclosure supports the limitations “each respective convolution function is implemented as an activation function of a convolution layer of the graph convolution neural network” or “and the activation function assigned to each facility type is determined based on the operational characteristics of the facility type, such that at least two different facility types are assigned different activation functions”. Furthermore, there are several references to the “convolution function” in the Specification, but it is not clear how it is defined. Hence, claims 9-11 (as amended) have been rejected under 35 U.S.C. 112(a), as they contain subject matter that was not described in Applicant’s original disclosure in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor had possession of the claimed invention at the time the application was filed.
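For reference only, the attention-coefficient computation described in Paragraph [0030] is consistent with a standard graph attention mechanism, which may be sketched as follows; the notation below is an assumption introduced for clarity and does not appear in the Specification:

```latex
% Assumed sketch (not from the Specification): attention network a scores the
% concatenation of transformed features Wh_i and Wh_j; an activation \sigma
% (e.g., sigmoid, ReLU) and softmax normalization then yield a_{ij}.
e_{ij} = a\left(\left[\, W h_i \,\|\, W h_j \,\right]\right), \qquad
a_{ij} = \frac{\exp\!\big(\sigma(e_{ij})\big)}{\sum_{k \in \mathcal{N}(i)} \exp\!\big(\sigma(e_{ik})\big)}
```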
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1, 3, 7, and 8 (as amended) are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (US 20200285944 A1, filed Mar. 8, 2019 and published Sep. 10, 2020) in view of Shelhamer et al., “Loss is its own Reward: Self-Supervision for Reinforcement Learning” (2017), Li et al. (US 20210073028 A1, filed Oct. 11, 2019 and published Mar. 11, 2021), and Schmidhuber (US 20190197403 A1, filed Dec. 21, 2018 and published Jun. 27, 2019).
Regarding claim 1, Lee teaches an information processing device (Lee, Paragraph [0146] teaches a computing system, such as one including computing system 1500 of FIG. 15, can be configured to perform the illustrative flows and techniques described above according to some embodiments… As stored, the instructions represent programmable modules that include code or data executable by a processor(s) of the computer system.), comprising: a processor (Lee, Paragraph [0146] teaches processor(s)) configured to: associate a node and an edge with attributes and to define a convolution function associated with a model representing data of a graph structure representing a system structure on the basis of data regarding the graph structure (Lee, Paragraph [0003] teaches performing operations by one or more processing devices based on a graph convolutional neural network model that includes one or more graph convolutional layers. The operations include, by at least one graph convolutional layer of the graph convolutional neural network model, receiving a dataset that identifies a set of entities representable by nodes in a graph, features for each respective entity that are representable by attributes of the corresponding node in the graph, and connections among the set of entities, where the connections are representable by edges connecting the nodes in the graph.); input a state of the system into the model, the processor being configured to obtain, for each time step, a policy function as a probability distribution of a structural change and a state value function for reinforcement learning for a system of one or more structurally changed models which have been changed with assumable structural changes from the model for each time step (Lee, Paragraph [0027] further teaches the attention mechanism uses a node state matrix and two trainable functions to produce a probability vector indicating the relevancy of different motifs and a probability vector indicating the relevancy
of different step sizes for each respective target node.; Lee, Paragraph [0026] teaches each graph convolutional layer of the MCN is configured to select, for each node in a set of nodes in the graph, a respective type of motif from multiple pre-defined types of motifs (e.g., edges, triangles, etc.) and a respective step size k from a set of step sizes (e.g., 1 to K) using an attention mechanism. The attention mechanism uses a node state matrix and two trainable functions to produce a probability vector indicating the relevancy of different motifs and a probability vector indicating the relevancy of different step sizes for each respective target node, and then select the motif and step size for each respective target node based on the probability vectors.; Lee, Paragraph [0100] further teaches The outputs of the two functions ƒ.sub.l and ƒ′.sub.l are softmaxed to form probability distributions over {1, . . . , T} and {1, . . . , K}, respectively. As such, from a node i's state, the functions recommend the most relevant type of motif t and step size k for node i to integrate information from, based on the probability distribution.; Lee, Paragraph [0105] further teaches in some embodiments, the attention mechanism is trained using a second loss function based on reinforcement learning.; Lee, Paragraph [0032] further teaches as used herein, the term “graph convolutional neural network model” refers to a neural network model configured to perform graph convolution on graph-structured data), …
However, Lee does not distinctly disclose the remaining limitations.
Nevertheless, Shelhamer teaches:
… and the processor being configured to evaluate the structural changes in the system on the basis of the policy function (Shelhamer, Section 4.4 teaches tracking the changing policy distribution which so helps more than any potential inference among the RL and auxiliary losses); perform reinforcement learning by using a reward value as a cost generated when the structural change is applied to the system, the state value function, and the model, to optimize the structural change in the system (Shelhamer, Abstract, teaches Reinforcement learning optimizes policies for expected cumulative reward… To augment reward, we consider a range of self-supervised tasks that incorporate states, actions, and successors to provide auxiliary losses; Shelhamer, Section 1, Col. 1, teaches end-to-end reinforcement learning (RL) addresses representation learning at the same time as policy optimization.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the graph convolutional networks with motif-based attention, as taught by Lee, with the reinforcement learning teachings, as taught by Shelhamer in order to improve the data efficiency and policy returns of end-to-end reinforcement learning. (Shelhamer, Abstract).
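As context for the combination, the Shelhamer objective cited above (a reinforcement learning loss augmented with self-supervised auxiliary losses) can be sketched minimally as follows; the function name and weighting coefficient are assumptions for illustration only and are not taken from the reference:

```python
# Minimal, assumed sketch of an RL objective augmented with self-supervised
# auxiliary losses (cf. Shelhamer, Abstract); the name `total_loss` and the
# weighting coefficient are illustrative assumptions, not from the reference.
def total_loss(rl_loss, aux_losses, aux_weight=0.1):
    """Policy-optimization loss plus weighted auxiliary losses computed
    from states, actions, and successors."""
    return rl_loss + aux_weight * sum(aux_losses)
```

The auxiliary terms change the learned representation while the reinforcement learning term still drives policy optimization.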
The combination at the least suggests the limitation “define a respective convolution function for each type of facility included in the system” (Lee, Paragraph [0050] teaches performing an activation function on the outputs of the graph convolution layer and further teaches several activation functions may be used.; Lee, Paragraph [0003] further teaches The operations include, by at least one graph convolutional layer of the graph convolutional neural network model, receiving a dataset that identifies a set of entities representable by nodes in a graph, features for each respective entity that are representable by attributes of the corresponding node in the graph, … The operations further include classifying an entity in the set of entities or determining a connection between two entities in the set of entities, based on outputs of a graph convolutional layer of the graph convolutional neural network model.). However, Li more clearly teaches the limitation as provided below.
Li teaches “define a respective convolution function for each type of facility included in the system” (Li, Paragraph [0058] teaches the graph convolutional neural network can be trained jointly with a neural collaborative filtering network—described below—to generate embeddings for similar computational graphs according to an objective function. When the graph convolutional neural network and the neural collaborative filtering network are trained jointly, resultant output from the neural collaborative filtering network is a set of performance metrics for each type of computing device; Li, Paragraph [0059] further teaches at each layer, the GCN can execute one or more activation functions from respective input received at the layer.; Li, Paragraph [0062] further teaches for each layer, the recommendation engine can compute a gradient of the activation function against a ground-truth value. The ground-truth can be the historical performance metric of a computational graph on each type of computing device. The recommendation engine can update weights based on the computed gradient of activation outputs [i.e., used to compute the activation function output].; Li, Paragraph [0009] further teaches the recommendation engine can predict a performance metric for each of the different types of computing device present in the distributed computing network. The recommendation engine is configured to generate recommendations for one or more types of computing device to schedule the job to, as well as a quantity specifying the amount of computing resources of each type that is recommended to be assigned. The recommendation can be provided as input to a scheduling system; Li, Paragraph [0015] teaches the machine learning model is configured to receive, as input, a computational graph representing the operations of an input job, and generate, as output, a set of performance metrics measuring predicted performance of the computational graph on each type of computing device in the distributed computing network.; Li, Paragraph [0061] teaches to extract features from inputs and outputs of a given node in the computational graph separately, the GCN can separately aggregate activation function outputs for outputs and inputs of the node, respectively, to learn whether performance of the computational graph on a particular type of computing device is dominated by the inputs or the outputs.; Li, Paragraph [0016] teaches the machine learning model can include a graph convolutional neural network (“GCN”).; [Note: each type of computing device, as disclosed in Li, has been understood to read on each type of facility in the system]; [Note: the output of the activation function in Li has been understood to maintain the state of the facility.]).
Li further teaches the limitation “wherein a selection of the activation function for each facility type is optimized… based on a facility-specific performance metric”, i.e., optimizing the activation functions for each layer (Li, Paragraph [0062] teaches the recommendation engine can update the weights using a backpropagation technique, e.g., backpropagation using stochastic gradient descent [i.e., optimization]. For each layer, the recommendation engine can compute a gradient of the activation function against a ground-truth value. The ground-truth can be the historical performance metric of a computational graph on each type of computing device.).
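To illustrate Examiner's reading of Li, Paragraph [0062], a minimal sketch of per-type optimization follows. All names and numbers below are hypothetical assumptions for illustration, not Li's implementation: a weight is updated by stochastic gradient descent against a separate ground-truth (historical) performance metric for each type of computing device, so each type's activation output is optimized independently:

```python
# Illustrative sketch only: an assumed, toy rendering of the mechanism read
# onto Li [0062] -- per-device-type ground-truth performance metrics driving
# SGD updates of a layer weight. No names or values come from the reference.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sgd_step(weight, feature, ground_truth, lr=0.1):
    """One SGD update: gradient of the squared error of the activation
    output against the ground-truth (historical) performance metric."""
    activation = sigmoid(weight * feature)
    error = activation - ground_truth
    # d(error^2)/d(weight) = 2 * error * sigmoid'(w*x) * x
    grad = 2.0 * error * activation * (1.0 - activation) * feature
    return weight - lr * grad

# Hypothetical historical performance metrics, kept separate per device type,
# so each type's weight converges toward its own facility-specific target.
ground_truth_by_type = {"cpu": 0.6, "gpu": 0.9}
weights = {t: 0.0 for t in ground_truth_by_type}
for _ in range(1000):
    for dev_type, gt in ground_truth_by_type.items():
        weights[dev_type] = sgd_step(weights[dev_type], 1.0, gt)
```

After training, the two device types end up with different weights, mirroring the per-type optimization Examiner reads onto the reference.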
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the graph convolutional networks with motif-based attention, as taught by Lee, with the reinforcement learning teachings, as taught by Shelhamer, to further include the features of the recommendation engine comprising a graph convolutional neural network, as taught by Li, in order to predict a performance metric for each of the different types of computing device present in the distributed computing network. (Li, Paragraph [0009]).
However, the combination in view of Li does not distinctly disclose “wherein a selection of the activation function for each facility type is optimized by reinforcement learning based on a facility-specific performance metric”.
Nevertheless, Schmidhuber teaches “wherein a selection of the activation function for each facility type is optimized by reinforcement learning based on a facility-specific performance metric” (Schmidhuber, Paragraph [0082] teaches we apply the basic idea to the incremental skill training of ONE. Both the predictive skills acquired by gradient descent and the task-specific control skills acquired by black box optimization are collapsed into one single network (namely, ONE itself) through pure gradient descent, by retraining ONE on all input-output traces of all previously learned behaviors that are still deemed useful. Towards this end, ONE is retrained to reproduce control behaviors of successful past versions of ONE, but without really executing the behaviors in the environment (usually the expensive part). Simultaneously, all input-output traces ever observed (including those of failed trials) can be used to train ONE to become a better predictor of future inputs, given previous inputs and actions. Of course, this requires storing input-output traces of all trials (e.g., in a computer-based memory storage device, not shown in FIG. 1). That is, once a new skill has been learned, e.g., by a copy of ONE (or even by another machine learning device), e.g., through slow trial and error-based evolution or reinforcement learning, ONE can be retrained through gradient-based methods on stored input/output traces of all previously learned control and prediction skills still considered worth memorizing. In particular, standard gradient descent through backpropagation in discrete graphs of nodes with differentiable activation functions can be used to squeeze many expensively evolved skills into the limited computational resources of ONE.
[Note: [0082] understood to teach optimization of an activation function by stochastic gradient descent and reinforcement learning based on limited computational resources of the machine learning device]; Schmidhuber, Paragraph [0004] further teaches the LSTM may have a front-end in form of a convolutional neural net (CNN) implemented on fast graphics processing units GPUs. Such a CNN-LSTM combination may be considered an RNN for purposes of the current disclosure.; Schmidhuber [0109] further teaches ONE's agent may be virtually any kind of physical system, component, or process facilitated or performed by a physical system or component. ONE's agent may include any one or more of a variety of different kinds of sensors, etc. Moreover, ONE's agent may include any one or more of a variety of different kinds of devices or components that are able to perform, or cause to be performed, actions. These devices or components may be or include any one or more of a variety of motors, actuators, etc. [Note: [0082] in view of [0109] understood to read on optimization of an activation function per facility type]).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the graph convolutional networks with motif-based attention, as taught by Lee in view of Shelhamer and Li, to further include the training process of the recurrent neural network, as taught by Schmidhuber. During the recurrent neural network’s training and evolution (described herein), gradient-based compression of policies and data streams simplifies the recurrent neural network, squeezing the essence of the recurrent neural network’s previously learned skills and knowledge into the code implemented within the recurrent weight matrix of the recurrent neural network itself. This can improve the recurrent neural network’s ability to generalize and quickly learn new, related tasks. (Schmidhuber, Paragraph [0010]).
Regarding claim 3, the combination of Lee in view of Shelhamer, Li, and Schmidhuber teaches all of the limitations of claim 1, and the combination further teaches wherein the processor is further configured to: output a set of parameters as coefficients of the convolution function obtained as a result of the reinforcement learning, update the set of parameters of the convolution function on the basis of the set of parameters output by the reinforcement learner, reflect the updated set of parameters in the model, and evaluate the model obtained by reflecting the updated set of parameters (Lee, Paragraph [0033] teaches the term trainable function refers to a function, at least some parameters of which are determined using techniques such as regression, risk minimization, back propagation, clustering, and the like with or without using training data; Lee, Paragraph [0128] further teaches once the model is trained, the parameters are loaded and prediction is performed.).
Regarding claim 7, Lee teaches a computer-implemented method for processing information by one or more hardware device, the method comprising: associating a node and an edge with attributes; defining a convolution function associated with a model representing data of a graph structure representing a system structure on the basis of data regarding the graph structure (Lee, Paragraph [0003] teaches performing operations by one or more processing devices based on a graph convolutional neural network model that includes one or more graph convolutional layers. The operations include, by at least one graph convolutional layer of the graph convolutional neural network model, receiving a dataset that identifies a set of entities representable by nodes in a graph, features for each respective entity that are representable by attributes of the corresponding node in the graph, and connections among the set of entities, where the connections are representable by edges connecting the nodes in the graph.); inputting a state of the system into the model (Lee, Paragraph [0027] further teaches the attention mechanism uses a node state matrix and two trainable functions to produce a probability vector indicating the relevancy of different motifs and a probability vector indicating the relevancy of different step sizes for each respective target node); obtaining, for each time step, a policy function as a probability distribution of a structural change and a state value function for reinforcement learning for a system of one or more structurally changed models which have been changed with assumable structural changes from the model for each time step, and a processor (Lee, Paragraph [0146] teaches processor(s)) being configured (Lee, Paragraph [0026] teaches each graph convolutional layer of the MCN is configured to select, for each node in a set of nodes in the graph, a respective type of motif from multiple pre-defined types of motifs (e.g., edges, triangles, etc.) 
and a respective step size k from a set of step sizes (e.g., 1 to K) using an attention mechanism. The attention mechanism uses a node state matrix and two trainable functions to produce a probability vector indicating the relevancy of different motifs and a probability vector indicating the relevancy of different step sizes for each respective target node, and then select the motif and step size for each respective target node based on the probability vectors.; Lee, Paragraph [0100] further teaches The outputs of the two functions ƒ.sub.l and ƒ′.sub.l are softmaxed to form probability distributions over {1, . . . , T} and {1, . . . , K}, respectively. As such, from a node i's state, the functions recommend the most relevant type of motif t and step size k for node i to integrate information from, based on the probability distribution.; Lee, Paragraph [0105] further teaches in some embodiments, the attention mechanism is trained using a second loss function based on reinforcement learning.; Lee, Paragraph [0032] further teaches as used herein, the term “graph convolutional neural network model” refers to a neural network model configured to perform graph convolution on graph-structured data);
However, Lee does not distinctly disclose the remaining limitations.
Nevertheless, Shelhamer teaches:
evaluating the structural changes in the system on the basis of the policy function (Shelhamer, Section 4.4 teaches tracking the changing policy distribution, which helps more than any potential interference among the RL and auxiliary losses); and performing reinforcement learning by using a reward value as a cost generated when the structural change is applied to the system, the state value function, and the model, to optimize the structural change in the system (Shelhamer, Abstract, teaches Reinforcement learning optimizes policies for expected cumulative reward… To augment reward, we consider a range of self-supervised tasks that incorporate states, actions, and successors to provide auxiliary losses; Shelhamer, Section 1, Col. 1, teaches end-to-end reinforcement learning (RL) addresses representation learning at the same time as policy optimization);
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the graph convolutional networks with motif-based attention, as taught by Lee, with the reinforcement learning teachings, as taught by Shelhamer in order to improve the data efficiency and policy returns of end-to-end reinforcement learning. (Shelhamer, Abstract).
The combination at the least suggests the limitation defining a respective convolution function for each type of facility included in the system (Lee, Paragraph [0050] teaches performing an activation function on the outputs of the graph convolution layer and further teaches several activation functions may be used.; Lee, Paragraph [0003] further teaches The operations include, by at least one graph convolutional layer of the graph convolutional neural network model, receiving a dataset that identifies a set of entities representable by nodes in a graph, features for each respective entity that are representable by attributes of the corresponding node in the graph, … The operations further include classifying an entity in the set of entities or determining a connection between two entities in the set of entities, based on outputs of a graph convolutional layer of the graph convolutional neural network model.). However, Li more clearly teaches the limitation as provided below.
Li teaches defining a respective convolution function for each type of facility included in the system (Li, Paragraph [0058] teaches the graph convolutional neural network can be trained jointly with a neural collaborative filtering network—described below—to generate embeddings for similar computational graphs according to an objective function. When the graph convolutional neural network and the neural collaborative filtering network are trained jointly, resultant output from the neural collaborative filtering network is a set of performance metrics for each type of computing device; Li, Paragraph [0059] further teaches at each layer, the GCN can execute one or more activation functions from respective input received at the layer.; Li, Paragraph [0009] further teaches the recommendation engine can predict a performance metric for each of the different types of computing device present in the distributed computing network. The recommendation engine is configured to generate recommendations for one or more types of computing device to schedule the job to, as well as a quantity specifying the amount of computing resources of each type that is recommended to be assigned. 
The recommendation can be provided as input to a scheduling system; Li, Paragraph [0015] teaches the machine learning model is configured to receive, as input, a computational graph representing the operations of an input job, and generate, as output, a set of performance metrics measuring predicted performance of the computational graph on each type of computing device in the distributed computing network.; Li, Paragraph [0061] teaches to extract features from inputs and outputs of a given node in the computational graph separately, the GCN can separately aggregate activation function outputs for outputs and inputs of the node, respectively, to learn whether performance of the computational graph on a particular type of computing device is dominated by the inputs or the outputs.; Li, Paragraph [0016] teaches the machine learning model can include a graph convolutional neural network (“GCN”).).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the graph convolutional networks with motif-based attention, as taught by Lee, with the reinforcement learning teachings, as taught by Shelhamer, to further include the features of the recommendation engine comprising a graph convolutional neural network, as taught by Li, in order to predict a performance metric for each of the different types of computing device present in the distributed computing network. (Li, Paragraph [0009]).
Li further teaches wherein a selection of the activation function for each facility type is optimized… based on a facility-specific performance metric, by optimizing the activation functions for each layer (Li, Paragraph [0062] teaches the recommendation engine can update the weights using a backpropagation technique, e.g., backpropagation using stochastic gradient descent [i.e., optimization]. For each layer, the recommendation engine can compute a gradient of the activation function against a ground-truth value. The ground-truth can be the historical performance metric of a computational graph on each type of computing device.).
However, the combination in view of Li does not distinctly disclose wherein a selection of the activation function for each facility type is optimized by reinforcement learning based on a facility-specific performance metric.
Nevertheless, Schmidhuber teaches wherein a selection of the activation function for each facility type is optimized by reinforcement learning based on a facility-specific performance metric (Schmidhuber, Paragraph [0082] teaches we apply the basic idea to the incremental skill training of ONE. Both the predictive skills acquired by gradient descent and the task-specific control skills acquired by black box optimization are collapsed into one single network (namely, ONE itself) through pure gradient descent, by retraining ONE on all input-output traces of all previously learned behaviors that are still deemed useful. Towards this end, ONE is retrained to reproduce control behaviors of successful past versions of ONE, but without really executing the behaviors in the environment (usually the expensive part). Simultaneously, all input-output traces ever observed (including those of failed trials) can be used to train ONE to become a better predictor of future inputs, given previous inputs and actions. Of course, this requires storing input-output traces of all trials (e.g., in a computer-based memory storage device, not shown in FIG. 1). That is, once a new skill has been learned, e.g., by a copy of ONE (or even by another machine learning device), e.g., through slow trial and error-based evolution or reinforcement learning, ONE can be retrained in through gradient-based methods on stored input/output traces of all previously learned control and prediction skills still considered worth memorizing. In particular, standard gradient descent through backpropagation in discrete graphs of nodes with differentiable activation functions can be used to squeeze many expensively evolved skills into the limited computational resources of ONE. 
[Note: [0082] understood to teach optimization of an activation function by stochastic gradient descent and reinforcement learning based on limited computational resources of the machine learning device]; Schmidhuber, Paragraph [0004] further teaches the LSTM may have a front-end in form of a convolutional neural net (CNN) implemented on fast graphics processing units GPUs. Such a CNN-LSTM combination may be considered an RNN for purposes of the current disclosure.; Schmidhuber [0109] further teaches ONE's agent may be virtually any kind of physical system, component, or process facilitated or performed by a physical system or component. ONE's agent may include any one or more of a variety of different kinds of sensors, etc. Moreover, ONE's agent may include any one or more of a variety of different kinds of devices or components that are able to perform, or cause to be performed, actions. These devices or components may be or include any one or more of a variety of motors, actuators, etc. [Note: [0082] in view of [0109] understood to read on optimization of an activation function per facility type]).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the graph convolutional networks with motif-based attention, as taught by Lee in view of Shelhamer and Li, to further include the training process of the recurrent neural network, as taught by Schmidhuber. During the recurrent neural network’s training and evolution (described herein), gradient-based compression of policies and data streams simplifies the recurrent neural network, squeezing the essence of the recurrent neural network’s previously learned skills and knowledge into the code implemented within the recurrent weight matrix of the recurrent neural network itself. This can improve the recurrent neural network’s ability to generalize and quickly learn new, related tasks. (Schmidhuber, Paragraph [0010]).
Regarding claim 8, Lee teaches a non-transitory computer-readable storage medium that stores computer-executable instructions that cause one or more computers, when executed by the one or more computers (Lee, Paragraph [0146] teaches a computing system, such as one including computing system 1500 of FIG. 15, can be configured to perform the illustrative flows and techniques described above according to some embodiments. Instructions for performing the operations of the illustrative flows can be stored as computer-readable instructions on a non-transitory computer-readable medium of the computer system.), to at least: associate a node and an edge with attributes; define a convolution function associated with a model representing data of a graph structure representing a system structure on the basis of data regarding the graph structure (Lee, Paragraph [0003] teaches performing operations by one or more processing devices based on a graph convolutional neural network model that includes one or more graph convolutional layers. 
The operations include, by at least one graph convolutional layer of the graph convolutional neural network model, receiving a dataset that identifies a set of entities representable by nodes in a graph, features for each respective entity that are representable by attributes of the corresponding node in the graph, and connections among the set of entities, where the connections are representable by edges connecting the nodes in the graph.); input a state of the system into the model (Lee, Paragraph [0027] further teaches the attention mechanism uses a node state matrix and two trainable functions to produce a probability vector indicating the relevancy of different motifs and a probability vector indicating the relevancy of different step sizes for each respective target node); obtain, for each time step, a policy function as a probability distribution of a structural change and a state value function for reinforcement learning for a system of one or more structurally changed models which have been changed with assumable structural changes from the model for each time step, and a processor (Lee, Paragraph [0146] teaches processor(s)) being configured (Lee, Paragraph [0026] teaches each graph convolutional layer of the MCN is configured to select, for each node in a set of nodes in the graph, a respective type of motif from multiple pre-defined types of motifs (e.g., edges, triangles, etc.) and a respective step size k from a set of step sizes (e.g., 1 to K) using an attention mechanism. 
The attention mechanism uses a node state matrix and two trainable functions to produce a probability vector indicating the relevancy of different motifs and a probability vector indicating the relevancy of different step sizes for each respective target node, and then select the motif and step size for each respective target node based on the probability vectors.; Lee, Paragraph [0100] further teaches The outputs of the two functions ƒ.sub.l and ƒ′.sub.l are softmaxed to form probability distributions over {1, . . . , T} and {1, . . . , K}, respectively. As such, from a node i's state, the functions recommend the most relevant type of motif t and step size k for node i to integrate information from, based on the probability distribution.; Lee, Paragraph [0105] further teaches in some embodiments, the attention mechanism is trained using a second loss function based on reinforcement learning.; Lee, Paragraph [0032] further teaches as used herein, the term “graph convolutional neural network model” refers to a neural network model configured to perform graph convolution on graph-structured data);
However, Lee does not distinctly disclose the remaining limitations.
Nevertheless, Shelhamer teaches:
evaluate the structural changes in the system on the basis of the policy function (Shelhamer, Section 4.4 teaches tracking the changing policy distribution, which helps more than any potential interference among the RL and auxiliary losses); and perform reinforcement learning by using a reward value as a cost generated when the structural change is applied to the system, the state value function, and the model, to optimize the structural change in the system (Shelhamer, Abstract, teaches Reinforcement learning optimizes policies for expected cumulative reward… To augment reward, we consider a range of self-supervised tasks that incorporate states, actions, and successors to provide auxiliary losses; Shelhamer, Section 1, Col. 1, teaches end-to-end reinforcement learning (RL) addresses representation learning at the same time as policy optimization.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the graph convolutional networks with motif-based attention, as taught by Lee, with the reinforcement learning teachings, as taught by Shelhamer in order to improve the data efficiency and policy returns of end-to-end reinforcement learning. (Shelhamer, Abstract).
The combination at the least suggests the limitation defining a respective convolution function for each type of facility included in the system (Lee, Paragraph [0050] teaches performing an activation function on the outputs of the graph convolution layer and further teaches several activation functions may be used.; Lee, Paragraph [0003] further teaches The operations include, by at least one graph convolutional layer of the graph convolutional neural network model, receiving a dataset that identifies a set of entities representable by nodes in a graph, features for each respective entity that are representable by attributes of the corresponding node in the graph, … The operations further include classifying an entity in the set of entities or determining a connection between two entities in the set of entities, based on outputs of a graph convolutional layer of the graph convolutional neural network model.). However, Li more clearly teaches the limitation as provided below.
Li teaches define a respective convolution function for each type of facility included in the system (Li, Paragraph [0058] teaches the graph convolutional neural network can be trained jointly with a neural collaborative filtering network—described below—to generate embeddings for similar computational graphs according to an objective function. When the graph convolutional neural network and the neural collaborative filtering network are trained jointly, resultant output from the neural collaborative filtering network is a set of performance metrics for each type of computing device; Li, Paragraph [0059] further teaches at each layer, the GCN can execute one or more activation functions from respective input received at the layer.; Li, Paragraph [0009] further teaches the recommendation engine can predict a performance metric for each of the different types of computing device present in the distributed computing network. The recommendation engine is configured to generate recommendations for one or more types of computing device to schedule the job to, as well as a quantity specifying the amount of computing resources of each type that is recommended to be assigned. 
The recommendation can be provided as input to a scheduling system; Li, Paragraph [0015] teaches the machine learning model is configured to receive, as input, a computational graph representing the operations of an input job, and generate, as output, a set of performance metrics measuring predicted performance of the computational graph on each type of computing device in the distributed computing network.; Li, Paragraph [0061] teaches to extract features from inputs and outputs of a given node in the computational graph separately, the GCN can separately aggregate activation function outputs for outputs and inputs of the node, respectively, to learn whether performance of the computational graph on a particular type of computing device is dominated by the inputs or the outputs.; Li, Paragraph [0016] teaches the machine learning model can include a graph convolutional neural network (“GCN”).).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the graph convolutional networks with motif-based attention, as taught by Lee, with the reinforcement learning teachings, as taught by Shelhamer, to further include the features of the recommendation engine comprising a graph convolutional neural network, as taught by Li, in order to predict a performance metric for each of the different types of computing device present in the distributed computing network. (Li, Paragraph [0009]).
Li further teaches wherein a selection of the activation function for each facility type is optimized… based on a facility-specific performance metric, by optimizing the activation functions for each layer (Li, Paragraph [0062] teaches the recommendation engine can update the weights using a backpropagation technique, e.g., backpropagation using stochastic gradient descent [i.e., optimization]. For each layer, the recommendation engine can compute a gradient of the activation function against a ground-truth value. The ground-truth can be the historical performance metric of a computational graph on each type of computing device.).
However, the combination in view of Li does not distinctly disclose wherein a selection of the activation function for each facility type is optimized by reinforcement learning based on a facility-specific performance metric.
Nevertheless, Schmidhuber teaches wherein a selection of the activation function for each facility type is optimized by reinforcement learning based on a facility-specific performance metric (Schmidhuber, Paragraph [0082] teaches we apply the basic idea to the incremental skill training of ONE. Both the predictive skills acquired by gradient descent and the task-specific control skills acquired by black box optimization are collapsed into one single network (namely, ONE itself) through pure gradient descent, by retraining ONE on all input-output traces of all previously learned behaviors that are still deemed useful. Towards this end, ONE is retrained to reproduce control behaviors of successful past versions of ONE, but without really executing the behaviors in the environment (usually the expensive part). Simultaneously, all input-output traces ever observed (including those of failed trials) can be used to train ONE to become a better predictor of future inputs, given previous inputs and actions. Of course, this requires storing input-output traces of all trials (e.g., in a computer-based memory storage device, not shown in FIG. 1). That is, once a new skill has been learned, e.g., by a copy of ONE (or even by another machine learning device), e.g., through slow trial and error-based evolution or reinforcement learning, ONE can be retrained in through gradient-based methods on stored input/output traces of all previously learned control and prediction skills still considered worth memorizing. In particular, standard gradient descent through backpropagation in discrete graphs of nodes with differentiable activation functions can be used to squeeze many expensively evolved skills into the limited computational resources of ONE. 
[Note: [0082] understood to teach optimization of an activation function by stochastic gradient descent and reinforcement learning based on limited computational resources of the machine learning device]; Schmidhuber, Paragraph [0004] further teaches the LSTM may have a front-end in form of a convolutional neural net (CNN) implemented on fast graphics processing units GPUs. Such a CNN-LSTM combination may be considered an RNN for purposes of the current disclosure.; Schmidhuber [0109] further teaches ONE's agent may be virtually any kind of physical system, component, or process facilitated or performed by a physical system or component. ONE's agent may include any one or more of a variety of different kinds of sensors, etc. Moreover, ONE's agent may include any one or more of a variety of different kinds of devices or components that are able to perform, or cause to be performed, actions. These devices or components may be or include any one or more of a variety of motors, actuators, etc. [Note: [0082] in view of [0109] understood to read on optimization of an activation function per facility type]).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the graph convolutional networks with motif-based attention, as taught by Lee in view of Shelhamer and Li, to further include the training process of the recurrent neural network, as taught by Schmidhuber. During the recurrent neural network’s training and evolution (described herein), gradient-based compression of policies and data streams simplifies the recurrent neural network, squeezing the essence of the recurrent neural network’s previously learned skills and knowledge into the code implemented within the recurrent weight matrix of the recurrent neural network itself. This can improve the recurrent neural network’s ability to generalize and quickly learn new, related tasks. (Schmidhuber, Paragraph [0010]).
Claims 4, 5, and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Shelhamer, Li, and Schmidhuber, as applied to claim 1, and further in view of Morris et al. (US 20190378010 A1, filed Jun. 12, 2018 and published Dec. 12, 2019).
Regarding claim 4, the combination of Lee in view of Shelhamer, Li, and Schmidhuber teaches all of the limitations of claim 1, however the combination does not distinctly disclose the remaining limitations.
Nevertheless, Morris teaches the remaining limitations of wherein the processor is further configured to: incorporate a candidate for the structural change as a candidate node into the graph structure in the system and to configure the candidate node as the convolution function of a unidirectional connection, and configure the model using the convolution function of the unidirectional connection (Morris, Paragraph [0099] teaches a first node may be associated with the function x.sup.2, a second node may be associated with the function x.sup.3, and the relationship between the nodes may have a weight value of one-half. If the number 2 is input into the first node, it may become 4. It may then be passed to the second node and multiplied by the weight value, becoming 2. It may then be cubed to result in 8. A relationship between one or more nodes may be unidirectional or multi-directional such that, for example, a value from a first node to a second node may be returned by the second node to the first node after processing.; Morris, Paragraph [0121] teaches the machine learning system calculates and assigns, in step 612 and step 614, one or more customized functions to nodes in the graph structure based on the entity type of the node.; Morris, Abstract, teaches machine learning models, semantic networks, adaptive systems, artificial neural networks, convolutional neural networks, and other forms of knowledge processing systems are disclosed. An ensemble machine learning system is coupled to a graph module storing a graph structure, wherein a collection of entities and the relationships between those entities forms nodes and connection arcs between the various nodes.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the graph convolutional networks with motif-based attention, as taught by Lee in view of Shelhamer, Li, and Schmidhuber, to further include the unsupervised machine learning system to automate functions on a graph structure wherein an ensemble machine learning system is coupled to a graph structure, wherein a collection of entities and the relationships between those entities forms nodes and connection arcs between the various nodes, as taught by Morris. Moreover, with the transaction data 708 and historical data 710 being voluminous, a feature vector assists in making the machine learning systems 700A, 700B, 700C more efficient by reducing the number of parameters separately analyzed by the machine learning system. (Morris, Abstract and Paragraphs [0001], [0010], and [0126]).
Regarding claim 5, the combination of Lee in view of Shelhamer, Li, Schmidhuber, and Morris teaches all of the limitations of claim 4, and the combination further teaches wherein the processor is further configured to evaluate, by parallel processing, the model for each combination of the candidate node with a node connected to the candidate node, using the model in which the candidate node is connected to the graph structure (Morris, Paragraph [0060] teaches the processing nodes a-n comprise parallel processes executing on multiple servers in a data center.; Morris, Paragraph [0074] further teaches the ensemble may use parallel ensemble techniques.; Morris, Paragraph [0161] further teaches The output of one machine learning model may be used as the input of another machine learning model, and/or multiple machine learning models may execute in parallel, such that decision-making may comprise the use of a limitless number of machine learning models. [Note: Lee, Paragraph [0158] teaches the order of the blocks presented in the examples above can be varied—for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.]).
Regarding claim 6, the combination of Lee in view of Shelhamer, Li, and Schmidhuber teaches all of the limitations of claim 1, however the combination does not distinctly disclose the remaining limitations.
Nevertheless, Morris teaches the remaining limitations of wherein the processor is further configured to present a structural change of the system evaluated by the evaluator, together with a cost associated with the structural change of the system (Morris, Paragraph [0090] teaches the system may check for updates in contracts. Contracts specifying a relationship between two entities may be modified, causing a corresponding change in a graph representation of the two entities. As such, if there is a change in a contract, an entity, a relationship between one or more entities.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the graph convolutional networks with motif-based attention, as taught by Lee in view of Shelhamer, Li, and Schmidhuber, to further include the unsupervised machine learning system to automate functions on a graph structure wherein an ensemble machine learning system is coupled to a graph structure, wherein a collection of entities and the relationships between those entities forms nodes and connection arcs between the various nodes, as taught by Morris. Moreover, with the transaction data 708 and historical data 710 being voluminous, a feature vector assists in making the machine learning systems 700A, 700B, 700C more efficient by reducing the number of parameters separately analyzed by the machine learning system. (Morris, Abstract and Paragraphs [0001], [0010], and [0126]).
Claims 9, 10, and 11 (as amended) are rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Shelhamer, Li, and Schmidhuber, as applied to claim 1, and further in view of Qian et al., “Adaptive activation functions in convolutional neural networks” (July 6, 2017).
Regarding claim 9, the combination of Lee in view of Shelhamer, Li, and Schmidhuber teaches all of the limitations of claim 1, and the combination further teaches wherein the graph structure is a graph convolution neural network, and each respective convolution function is implemented as an activation function of a convolution layer of the graph convolution neural network, wherein the facility type is defined as a class of nodes in the graph structure corresponding to a specific physical or logical function in the system, and the activation function assigned to each facility type is determined based on the operational characteristics of the facility type, such that at least two different facility types are assigned different activation functions. (Li, Paragraph [0058] teaches the graph convolutional neural network can be trained jointly with a neural collaborative filtering network—described below—to generate embeddings for similar computational graphs according to an objective function. When the graph convolutional neural network and the neural collaborative filtering network are trained jointly, resultant output from the neural collaborative filtering network is a set of performance metrics for each type of computing device; Li, Paragraph [0059] further teaches at each layer, the GCN can execute one or more activation functions from respective input received at the layer.; Li, Paragraph [0009] further teaches the recommendation engine can predict [i.e., determine] a performance metric for each of the different types of computing device present in the distributed computing network. The recommendation engine is configured to generate recommendations for one or more types of computing device to schedule the job to, as well as a quantity specifying the amount of computing resources of each type that is recommended to be assigned.
The recommendation can be provided as input to a scheduling system; Li, Paragraph [0015] teaches the machine learning model is configured to receive, as input, a computational graph representing the operations of an input job, and generate, as output, a set of performance metrics measuring predicted performance of the computational graph on each type of computing device in the distributed computing network.; Li, Paragraph [0061] teaches to extract features from inputs and outputs of a given node in the computational graph separately, the GCN can separately aggregate activation function outputs for outputs and inputs of the node, respectively, to learn whether performance of the computational graph on a particular type of computing device is dominated by the inputs or the outputs.; Li, Paragraph [0016] teaches the machine learning model can include a graph convolutional neural network (“GCN”).; [Note: each type of computing device, as disclosed in Li, has been understood to read on each type of facility in the system]; [Note: the output of the activation function in Li is maintaining the state of the facility.]).
However, the combination does not distinctly disclose the activation function assigned to each facility type is determined … such that at least two different facility types are assigned different activation functions.
Nevertheless, Qian teaches the activation function assigned to each facility type is determined … such that at least two different facility types are assigned different activation functions (Qian, pg. 205, col. 1, par. 1 teaches when choosing activation functions for a specific CNN, activation functions in all activation layers in this CNN are confined to only one of the above types. Since their functional forms are designed as convex functions, it has restricted the representation ability of learning nonlinear transformation to some degree. By utilizing this rectified unit family, we aim to design an activation function to improve the ability of learning non-linear transformation and be adaptive to the inputs. We expect the designed activation function to have more flexible forms which can be determined in a data-driven way; Qian, pg. 205, col. 1, par. 2, further teaches in order to make the learned activation operation be adapted to the specific inputs, the other is gated activation, in which the activation operation is learned by nonlinearly combining basic activation functions. For the sake of further improving the ability of learning non-linear transformation, we extend the above structures and propose the hierarchical activation as the third strategy… The goal of hierarchical activation is to allow basic activation functions being combined to be learned directly from the data and be adapted to the specific inputs; Qian, pg. 206, col. 1, par. 1, teaches the second strategy is adapted to the specific inputs. The learning process in this strategy leads to a learned gating mask which determines an adaptive mixture of LReLU and ELU to adapt to the specific inputs. To note the importance of the gating mask, this strategy is referred to as gated activation. Both of these two strategies involve the combination of activation functions with predefined parameters.
It is a further extension to these strategies that these activation functions themselves can be learned, which means parameters of activation functions can be determined in a data-driven way; Qian, Section 4, teaches we evaluate the proposed activation strategies with multiple deep CNNs.) [Note: the activation function adapted to specific inputs and/or data-driven activation functions, as disclosed in Qian, have been understood to read on "such that at least two different facility types are assigned different activation functions", wherein the facility type has been understood to be the input in Qian.]
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the graph convolutional networks with motif-based attention, as taught by Lee in view of Shelhamer, Li, and Schmidhuber, to further include the adaptive activation functions in convolutional neural networks, as taught by Qian, in order to improve the ability of activation functions to learn non-linear transformations, adapt to the inputs, and achieve better performance than ReLU and its variants on benchmarks of various scales (Qian, Abstract and Section 5).
Regarding claim 10 (as amended),
Claim 10 recites analogous limitations as those recited in claim 9. Therefore, claim 10 is rejected under the same rationale and motivation as claim 9.
Regarding claim 11 (as amended),
Claim 11 recites analogous limitations as those recited in claim 9. Therefore, claim 11 is rejected under the same rationale and motivation as claim 9.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BEATRIZ RAMIREZ BRAVO whose telephone number is 571-272-2156. The examiner can normally be reached Mon.-Fri., 7:30 a.m.-5:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, USMAAN SAEED can be reached at 571-272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/B.R.B./Examiner, Art Unit 2146
/USMAAN SAEED/Supervisory Patent Examiner, Art Unit 2146