Prosecution Insights
Last updated: April 19, 2026
Application No. 18/319,472

DYNAMIC INTENT-BASED NETWORK COMPUTING JOB ASSIGNMENT USING REINFORCEMENT LEARNING

Non-Final OA: §101, §103, §112
Filed
May 17, 2023
Examiner
LAI, DYLAN HONG
Art Unit
2144
Tech Center
2100 — Computer Architecture & Software
Assignee
NEC Laboratories America Inc.
OA Round
1 (Non-Final)
Grant Probability
Favorable
OA Rounds
1-2
To Grant
3y 3m

Examiner Intelligence

Career Allow Rate
0% of cases granted (0 granted / 0 resolved; -55.0% vs TC avg)
Interview Lift
+0.0% (minimal; with vs. without interview, across resolved cases)
Avg Prosecution (typical timeline)
3y 3m; 5 currently pending
Total Applications (career history)
5, across all art units

Statute-Specific Performance

§101: 16.7% (-23.3% vs TC avg)
§103: 54.2% (+14.2% vs TC avg)
§102: 12.5% (-27.5% vs TC avg)
§112: 8.3% (-31.7% vs TC avg)
Baseline: Tech Center average estimate. Based on career data from 0 resolved cases.

Office Action

§101, §103, §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification

The disclosure is objected to because of the following informalities: the specification occasionally uses the plural form for value neural networks and policy neural networks, but appears to implement only one neural network of each type. Either change the pluralized networks of each type to singular, or remove references to singular neural networks and change mentions of "the two neural networks" to "the two types of neural networks".

Paragraph [0026] of the specification defines a reward function as gaining 1 if a request is accepted and 0 if a request is rejected. However, Paragraph [0066] of the specification defines a reward function as gaining 1 if the request is accepted and -1 if a request is rejected, and does not make clear that it is different from the reward in Paragraph [0026]. Either make clear that the two reward functions are different, or change one to match the other. Appropriate correction is required.

Claim Objections

Claim 1 is objected to because of the following informalities: there are two steps labeled k). Either make the first k) and l) steps part of step j), or change the bottom step k) to m). In the bottom step k), the "j" should be followed by a closing parenthesis. Steps in a method claim should begin the limitations with gerunds; please change each limitation action to start with a verb ending in -ing. Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 1 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being incomplete for omitting essential steps, such omission amounting to a gap between the steps, and as having indefinite language. See MPEP § 2172.01.

The omitted steps are: a step to end the method and return the information. Currently, the claim appears to endlessly repeat steps e)-j), even if a size of the batch is equal to a threshold, due to the last step.

The indefinite language being: "…deploy the action, and update physical resources if the request is accepted"; it is unclear whether the condition "if the request is accepted" applies only to "update physical resources" or also applies to "deploy the action". For the purposes of this application, the examiner interprets the language to mean that both activities are dependent on the condition "if the request is accepted".

The indefinite language being: "g) use request r and a current network state s as input to policy NN and predict a reward distribution over the action space"; it is unclear what policy NN is referring to. For the purposes of this application, the examiner interprets the language "policy NN" to mean the policy NN that was created in step c). It is also unclear if the policy NN is doing the predicting. For the purposes of this application, the examiner interprets the language to mean that the policy NN is not doing the predicting.
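To make the looping issue concrete, here is a minimal, hypothetical sketch of claim 1's steps as the examiner appears to read them. Every name here (TinyNN, assign_jobs, the toy action space) is illustrative rather than taken from the application, and the final comment marks the termination step the rejection says is missing.

```python
# Hypothetical sketch of claim 1 as interpreted in this Office action; all
# names are illustrative and not taken from the application.
import random

ACTIONS = ["assign_node_A", "assign_node_B", "reject"]  # b) a toy action space

class TinyNN:
    """Stand-in for the policy NN of step c) and the value NN of step d)."""
    def predict(self, request, state):
        # g) a reward distribution over the action space (random placeholder)
        return {a: random.random() for a in ACTIONS}
    def train(self, batch):
        pass  # k)/l) training elided in this sketch

def assign_jobs(requests, batch_threshold=4):
    policy_nn, value_nn = TinyNN(), TinyNN()    # c), d)
    batch = []
    for r in requests:                          # e) while there exists a new request r
        s = {"load": random.random()}           # a stand-in current network state s
        batch.append((r, s))                    # f) add request r and state s to a batch
        rewards = policy_nn.predict(r, s)       # g) predict reward distribution
        action = max(rewards, key=rewards.get)  # h) action with maximum predicted reward
        if action != "reject":                  # i) accept or reject the request r; per the
            pass                                #    examiner, "deploy the action" and "update
                                                #    physical resources" are both contingent
                                                #    on acceptance
        if len(batch) == batch_threshold:       # j) batch size equals a threshold
            value_nn.train(batch)               # k) train the value NN
            policy_nn.train(batch)              # l) train the policy NN
    # As drafted, the claim recites no step such as `return results` to end the
    # method, which is the gap the 112(b) rejection identifies.
```

Adding an explicit exit condition and a returning step would be one way to address the omitted-step ground, consistent with the rejection's own framing.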
Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The analysis of the claims will follow the 2019 Revised Patent Subject Matter Eligibility Guidance ("2019 PEG").

Claim 1

Step 1: The claim recites "A dynamic, intent-based network computing job assignment method…". Therefore, this claim is directed to the statutory category of a process.

Step 2A, Prong 1: The claim recites, inter alia:

"define a discounted cumulative reward function": This limitation recites a mental process using evaluation, judgment and opinion, with the aid of pen and paper, to think of an algorithm to reward successful actions. See MPEP 2106.04(a)(2)(III).

"define an action space": This limitation recites a mental process using evaluation, judgment and opinion, with the aid of pen and paper, to think of possible actions.

"f) add the request r and a current network state s to a batch": This limitation recites a mental process using evaluation, judgment and opinion, with the aid of pen and paper, to put a request and a network state into a grouping.

"g) use the request r and current network state s as input to policy NN and predict a reward distribution over the action space": This limitation recites a mental process using evaluation, judgment and opinion, with the aid of pen and paper, to guess, using data to inform, the rewards for a set of possible actions.

"h) select an action that has a maximum predicted reward relative to other actions": This limitation recites a mental process using evaluation, judgment and opinion, with the aid of pen and paper, to choose the action with the greatest reward.

"i) using the selected action, determine to accept or reject the request r, deploy the action, and update physical resources if the request is accepted": This limitation recites a mental process using evaluation, judgment and opinion, with the aid of pen and paper, to choose whether to accept the request r. Then, the action is deployed, which could mean that the action is simply written down on paper, and physical resources are updated, which could mean updating written records. In addition, "deploy the action, and update physical resources" is a contingent limitation, so it does not necessarily need to be considered, as it is interpreted as dependent on the condition "if the request is accepted", which is not guaranteed to occur.
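For reference, the "discounted cumulative reward function" limitation, together with the two reward definitions contrasted in the specification objection above, can be illustrated with a short sketch. The discount factor and function names are assumptions, and the standard G = Σ γ^t·r_t form is used because the application's exact formula is not reproduced in this Office action.

```python
def reward_0026(accepted):
    # Paragraph [0026]: reward is 1 if a request is accepted, 0 if rejected
    return 1 if accepted else 0

def reward_0066(accepted):
    # Paragraph [0066]: reward is 1 if accepted, -1 if rejected
    return 1 if accepted else -1

def discounted_cumulative_reward(rewards, gamma=0.99):
    # Standard RL form G = sum over t of gamma**t * r_t; an assumption, since
    # the application's exact formula is not quoted here.
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

outcomes = [True, False, True]  # accept, reject, accept
print(discounted_cumulative_reward([reward_0026(a) for a in outcomes]))  # 1.9801
print(discounted_cumulative_reward([reward_0066(a) for a in outcomes]))  # 0.9901
```

The divergent totals for the same outcome sequence show why the examiner asks that the two definitions either be reconciled or be expressly distinguished.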
Step 2A, Prong 2: This judicial exception is not integrated into a practical application. The additional elements of the claim are as follows:

"create a policy neural network (policy NN)": This limitation is recited at a high level of generality and recites use of a generic computer algorithm to perform an abstract idea. Mere recitation that a judicial exception is to be performed using a generic computer algorithm in its ordinary capacity cannot meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f).

"create a value neural network (value NN)": This limitation is recited at a high level of generality and recites use of a generic computer algorithm to perform an abstract idea. Mere recitation that a judicial exception is to be performed using a generic computer algorithm in its ordinary capacity cannot meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f).

"k) train the value NN": This limitation, being part of creating a value neural network, is an insignificant extra-solution activity of machine learning training. See MPEP 2106.05(g).

"l) train the policy NN": This limitation, being part of creating a policy neural network, is an insignificant extra-solution activity of machine learning training. See MPEP 2106.05(g).

When viewed in combination with the abstract idea limitations claimed, the additional elements do not integrate the abstract idea into a practical application and do not amount to significantly more than the judicial exception. The additional elements are directed to using generic methods to train the neural networks. Merely including instructions to train a neural network is insufficient to demonstrate integration into a practical application.

Step 2B: As discussed above with respect to integration into a practical application, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself.

Training the Neural Network: The training of a neural network, unless explicitly and specifically given structure or purpose, does not provide significantly more, as it is well-understood, routine, and conventional activity. See Recentive Analytics, Inc. v. Fox Corp., Fox Broadcasting Company, LLC, Fox Sports Productions, LLC.

When considering the elements as a whole, the claim amounts to instructions to apply abstract ideas using generic computer algorithms. For the reasons given above, the claim is not patent eligible.
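As a point of comparison for the "generic training" characterization, the following is a minimal sketch of what a batch-triggered value and policy update can look like, here using a linear value function and a softmax policy trained with plain gradient steps (numpy only). None of this comes from the application; it stands in for steps j) through l) only at the level of generality the rejection describes.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, N_ACTIONS = 4, 3
W_policy = np.zeros((N_ACTIONS, STATE_DIM))  # softmax "policy NN" stand-in
w_value = np.zeros(STATE_DIM)                # linear "value NN" stand-in

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_on_batch(batch, lr=0.05, gamma=0.99):
    """One generic gradient pass over (state, action, reward, next_state) tuples."""
    global W_policy, w_value
    for s, a, r, s_next in batch:
        td_target = r + gamma * (w_value @ s_next)  # bootstrapped return
        td_error = td_target - (w_value @ s)        # advantage estimate
        w_value += lr * td_error * s                # k) train the value NN
        probs = softmax(W_policy @ s)
        grad = -np.outer(probs, s)                  # d log pi / dW, all rows...
        grad[a] += s                                # ...plus the chosen-action row
        W_policy += lr * td_error * grad            # l) train the policy NN

# Toy batch, trained only once its size reaches a threshold (step j)
batch = [(rng.random(STATE_DIM), int(rng.integers(N_ACTIONS)), 1.0,
          rng.random(STATE_DIM)) for _ in range(8)]
if len(batch) == 8:
    train_on_batch(batch)
```

A real system would presumably replace these linear stand-ins with the claimed value and policy neural networks; the sketch only shows the shape of the update the rejection characterizes as well-understood, routine, and conventional.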
Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over Training Reinforcement Learning Agents To Learn Farsighted Behaviors By Predicting In Latent Space (US 20210158162 A1) by Hafner et al., hereafter Hafner, in view of TRAINING A POLICY FOR MANAGING A COMMUNICATION NETWORK ENVIRONMENT (US 20240275691 A1) by Kattepur et al., hereafter Kattepur.

Regarding claim 1, Hafner teaches:

"A dynamic, intent-based network computing job assignment method using reinforcement learning, the method comprising the steps of:" ((Hafner) Paragraph [0021]: "This specification describes a training system implemented as computer programs on one or more computers in one or more locations for training a policy neural network that can be used to control a reinforcement learning agent interacting with an environment by, at each of multiple time steps, processing a policy network input derived from data characterizing the current state of the environment at the time step (i.e., an 'observation') to generate an action selection output specifying an action to be performed by the agent.")

"define a discounted cumulative reward function;" ((Hafner) Paragraph [0027]: "To generate each trajectory of latent representations, the system 100 makes use of a reward neural network 130, a value neural network 140, and a transition neural network 150." A reward neural network is a discounted cumulative reward function.)

"define an action space;" ((Hafner) Paragraph [0021]: "…to generate an action selection output specifying an action to be performed by the agent." An action space is an action selection output.)

"create a policy neural network (policy NN);" ((Hafner) Paragraph [0021]: "…training a policy neural network…")

"create a value neural network (value NN);" ((Hafner) Paragraph [0027]: "…a value neural network 140…")

"while there exists a new request r: f) add the request r and a current network state s to a batch;" ((Hafner) Paragraph [0025]: "Once trained, the representation neural network 110 and the policy neural network 120 can be deployed and used to control the agent interacting with the environment, i.e., by generating a latent representation from each new observation and then selecting an action to be performed by the agent in response to the new observation using the action selection output generated from the latent representation." A representation neural network is a current network state and each new observation is a new request r.)

"g) use request r and current network state s as input to policy NN and predict a reward distribution over the action space; h) select an action that has a maximum predicted reward relative to other actions;"
((Hafner) Paragraph [0039]: "Either during or after the dynamics learning of the system, the training engine 160 trains, by using reinforcement learning techniques, the policy neural network 120 and the value neural network 130 on the 'imagined' trajectory data generated using the representation, reward and transition neural networks and based on processing information contained in the training tuple set. In particular, the training engine 160 trains the policy neural network 120 to generate action selection outputs that can be used to select actions that maximize a cumulative measure of rewards received by the agent and that cause the agent to accomplish an assigned task." Using a representation and processing information contained in a training tuple set is using a current network state and a request. These are used as input to a policy neural network to generate action selection outputs (an action space) that select actions that maximize cumulative reward, which means it must also predict the reward distribution. In addition, the agent accomplishing an assigned task is deploying an action.)

"j) if a size of the batch is equal to a threshold then k) train the value NN; and l) train the policy NN;" ((Hafner) Paragraph [0033]: "A training engine 160 can train the neural networks to determine, e.g., from initial values, trained values of the network parameters 158, including respective network parameters of the representation neural network 110, policy neural network 120, value neural network 130, reward neural network 140, and transition neural network 150.")

"else repeat steps e) – j); k) repeat steps e) – j)."

However, Hafner does not explicitly teach: "using the selected action, determine to accept or reject the request r, deploy the action, and update physical resources if the request is accepted."

Kattepur teaches:

Using the selected action, determining to accept or reject the request ((Kattepur) Paragraph [0116]: "…after generating an explanation for the queried action at step 356, the training node sends the explanation to the entity in step 358. In step 360, the training node receives from the entity feedback on the explanation for the queried action, the feedback comprising at least one of acceptance or rejection of the explanation for the queried action." A request is being interpreted as including an explanation for a queried action from Kattepur.), and

deploying the action, and updating physical resources if the request is accepted ((Kattepur) Paragraph [0166]: "…whenever a is selected by the policy, what is executed is <a, e>: action a is output to the managed system or environment for execution, and explanation tree e is output to a memory or other storage function. The explanation tree will serve as the basis for an explanation of the action a, should that action be queried." Action a is selected by the policy when action a is accepted. Action a being output to the system or environment is deploying the action. Outputting explanation tree e to memory is updating physical resources.)

Kattepur also teaches that explanation feedback (accepting and rejecting requests) improves the accuracy of the belief states (action space) and assists with adapting a policy to changes in an environment. Kattepur is in the analogous art of using reinforcement learning with a policy to predict and select actions. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kattepur with the teachings of Hafner.

One having ordinary skill in the art would have been motivated to combine the feedback mechanism (using a selected action to accept or reject a request) and updating of physical resources as in Kattepur with the method of defining a reward function and an action space, creating a policy neural network and a value neural network, adding a request and current network state to a batch, using the request and network state as input to the policy neural network, predicting a reward distribution over the action space, selecting an action that has a maximum predicted reward relative to other actions, and deploying the action as in Hafner, in order to improve the accuracy of the action space and assist with adapting the policy to changes in an environment. This combination would cause the predictable result that is the invention described in the instant application.
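Read on the combination, step i) under the examiner's 112(b) interpretation (both "deploy the action" and "update physical resources" contingent on acceptance) reduces to an if-gated block like the hypothetical sketch below; the resource table and the handle/deploy names are invented for illustration and appear in neither reference.

```python
# Hypothetical reading of step i): both deployment and the resource update
# occur only when the request is accepted, per the examiner's interpretation.
physical_resources = {"cpu": 16, "mem_gb": 64}  # toy resource table

def deploy(action):
    print("deploying", action)  # stand-in for executing on the managed network

def handle(action, request):
    accepted = action["kind"] != "reject"  # determine to accept or reject request r
    if accepted:
        deploy(action)                               # deploy the action ...
        physical_resources["cpu"] -= request["cpu"]          # ... and update
        physical_resources["mem_gb"] -= request["mem_gb"]    # physical resources
    return accepted

handle({"kind": "assign", "node": "A"}, {"cpu": 2, "mem_gb": 4})
```

The single if-block encodes the examiner's stated reading that the acceptance condition governs both acts, rather than only the resource update.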
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Patents and/or related publications are cited in the Notice of References Cited (Form PTO-892) attached to this action to further show the state of the art with respect to creating and training combinations of neural networks using reinforcement learning.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DYLAN H LAI, whose telephone number is (571) 272-8628. The examiner can normally be reached Monday - Friday, 7:30am-5:00pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Tamara Kyle, can be reached at (571) 252-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/D. H. L./
Examiner, Art Unit 2144

/TAMARA T KYLE/
Supervisory Patent Examiner, Art Unit 2144

Prosecution Timeline

May 17, 2023
Application Filed
Mar 10, 2026
Non-Final Rejection — §101, §103, §112 (current)


Prosecution Projections

Expected OA Rounds
1-2
Grant Probability
Favorable
Median Time to Grant
3y 3m
PTA Risk
Low
Based on 0 resolved cases by this examiner. Grant probability derived from career allow rate.
