DETAILED ACTION
Claims 1-20 are pending in this action.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Allowable Subject Matter
Claims 9 and 10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5, 12, 15-17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Green (US PGPUB No. 2018/0253512) in view of Lee et al. (US PGPUB No. 2010/0023307) [hereinafter “Lee”].
As per claim 1, Green teaches a computing system comprising: a computation engine comprising processing circuitry ([0030], server system with processors), wherein the computation engine is configured to obtain interaction data generated by a reinforcement learning agent ([0060], obtaining stimuli data from a reinforcement learning agent in a simulation environment), the interaction data characterizing one or more tasks in an environment ([0050], tests and stimuli are generated by the reinforcement agent to interact with the training and simulation environment) and characterizing one or more interactions of the reinforcement learning agent previously performed with respect to the environment ([0058]-[0060], these tests and stimuli are further generated and interacted with in the simulation environment), the interaction data being agnostic to the environment and the one or more interactions previously performed with respect to the environment ([0060], the reinforcement agent generates interaction data for a simulation environment, which means the stimuli are agnostic to the environment) and the one or more interactions previously performed with respect to the environment having been performed according to trained policies for the reinforcement learning agent ([0085], during testing, the reinforcement agent operates based on policies it was trained on, i.e., expert policy, reward functions, etc.), wherein the computation engine is configured to process the interaction data to apply a first analysis function to the one or more tasks to generate first elements ([0062], applying various algorithms post-training during simulation, which are used to analyze and learn from the simulation environment), wherein the computation engine is configured to process the interaction data to apply a second analysis function to the one or more interactions to generate second elements ([0062], various algorithms including reward functions are also used by the reinforcement agent), the first analysis function different than the second analysis function ([0062], various algorithms can be applied by the RL agent), wherein the computation engine is configured to process at least one of the first elements and the second elements to generate third elements denoting one or more characteristics of the one or more interactions ([0064], algorithms and tests are used to determine various characteristics about the interaction with the simulated environment, including whether various goals are met), and wherein the computation engine is configured to output an indication of the third elements, wherein the indication of the third elements provides, to a user, an explanation of the one or more interactions of the reinforcement learning agent with the environment ([0059], providing the simulation coverage result information regarding goals and generated tests).
Green does not explicitly teach wherein to generate the first elements, the computation engine applies a transition analysis function to the one or more interactions to identify a reinforcement learning agent transition based on a calculated evenness of a state distribution of the reinforcement learning agent transition. Lee teaches wherein to generate the first elements, the computation engine applies a transition analysis function to the one or more interactions to identify a reinforcement learning agent transition ([0061], identifying optimal actions based on perceived states, where different states are calculated for the reinforcement agent; see [0046] and [0057]) based on a calculated evenness of a state distribution of the reinforcement learning agent transition ([0057], a calculated standard deviation is used to determine states, which determine actions, i.e., transitions, based on the optimal policy; see also [0056]).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Green with the teachings of Lee, wherein to generate the first elements, the computation engine applies a transition analysis function to the one or more interactions to identify a reinforcement learning agent transition based on a calculated evenness of a state distribution of the reinforcement learning agent transition, to continuously reinforce and correct movements and/or decisions made by agents associated with devices including vehicles.
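To illustrate the disputed limitation, the following is a minimal sketch of a transition analysis keyed to the evenness of a state distribution. The data format (per-episode state sequences), the sliding window, and the use of normalized Shannon entropy as the evenness measure are assumptions of the sketch, not teachings of Green or Lee (Lee's measure is a standard deviation).

```python
import math
from collections import Counter

def evenness(visit_counts):
    """Normalized Shannon entropy of a state-visit distribution:
    1.0 for perfectly uniform visitation, approaching 0.0 as the
    agent concentrates on a single state."""
    total = sum(visit_counts.values())
    probs = [c / total for c in visit_counts.values()]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(visit_counts))

def find_transitions(state_sequences, window=20, threshold=0.5):
    """Scan each logged state sequence with a sliding window and flag
    time steps where the evenness of the recent state distribution
    drops below the threshold, i.e., where the agent appears to have
    transitioned into a narrow region of the state space."""
    flagged = []
    for states in state_sequences:
        for t in range(window, len(states) + 1):
            recent = Counter(states[t - window:t])
            if len(recent) > 1 and evenness(recent) < threshold:
                flagged.append((states[t - 1], t - 1))
    return flagged
```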
As per claim 2, the combination of Green and Lee teaches the computing system of claim 1, wherein the computation engine is configured to output a request for a decision for one or more actions to perform by the reinforcement learning agent within the environment (Green; [0078]-[0079], the system takes user input identifying which of a set of verification goals must be achieved by the RL agent), wherein the computation engine is configured to receive, from the user, decision data indicating a decision of the user responsive to the request for the decision (Green; [0078], the user’s selection of a coverage goal is interpreted as a decision), and wherein the computation engine is configured to process the decision data to modify the trained policies for the reinforcement learning agent (Green; [0070], the RL agent is constantly modifying its trained policy), retrain the reinforcement learning agent (Examiner Note: this is an optional feature but a citation is provided to expedite prosecution) (Green; [0052], retraining is interpreted to be the same as modifying the RL agent policy), or provide control to the reinforcement learning agent (Green; [0060], the RL agent is given control to generate stimulus and learn after goals and analysis are performed).
As per claim 3, the combination of Green and Lee teaches the computing system of claim 1, wherein the computation engine is configured to execute the reinforcement learning agent to perform the one or more tasks to generate the trained policies (Green; [0060], the RL agent is executed to generate stimulus and meet goals using an internal policy formed during training; see also [0070]).
As per claim 4, the combination of Green and Lee teaches the computing system of claim 1, wherein the first elements comprise an indication of the identified reinforcement learning agent transition (Green; [0052], RL agent policy includes state/action pairs, i.e. transitions).
As per claim 5, the combination of Green and Lee teaches the computing system of claim 1, wherein the first elements comprise an indication of the identified interaction (Green; [0052], the action of the state/action pair is interpreted as an interaction).
As per claim 12, the combination of Green and Lee teaches the computing system of claim 1, wherein the computation engine is configured to generate, based on the third elements, one or more training scenarios (Green; [0050], the RL agent/learning test generator is configured to train itself, which includes generating stimulus; see [0068]).
As per claim 15, the substance of the claimed invention is identical or substantially similar to that of claim 1. Accordingly, this claim is rejected under the same rationale.
As per claim 16, the combination of Green and Lee teaches the method of claim 15, wherein the first analysis function comprises one of a transition analysis function or a reward analysis function (Nakada; [0039], movement policy analysis and reward basis function both applied).
As per claim 17, the combination of Green and Lee teaches the method of claim 15, wherein the second analysis function comprises an interaction analysis function (Nakada; [0158]-[0160], the agent can exist in VR, where it interacts with the environment and the movement policy is analyzed).
As per claim 20, the substance of the claimed invention is identical or substantially similar to that of claim 1. Accordingly, this claim is rejected under the same rationale.
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Green and Lee in view of Arel et al. (US Patent No. 9,536,191) [hereinafter “Arel”].
As per claim 6, the combination of Green and Lee teaches the computing system of claim 1.
The combination of Green and Lee does not explicitly teach wherein the second analysis function comprises an interaction analysis function, wherein the second elements comprise at least one of observation frequency data, outlier data, or certainty data. Arel teaches wherein the second analysis function comprises an interaction analysis function, wherein the second elements comprise at least one of observation frequency data (Col. 12, lines 10-20, frequency of visitation to a particular state), outlier data (Examiner Note: this is an optional feature but would likely overcome the current rejection if included as a required feature), or certainty data (Abstract, confidence scores used to facilitate reinforcement learning).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Green and Lee with the teachings of Arel, wherein the second analysis function comprises an interaction analysis function, wherein the second elements comprise at least one of observation frequency data, outlier data, or certainty data, to provide the reinforcement and correction models more layers of data and metadata to make more precise modifications to the underlying learning models.
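To illustrate the mapped limitation, the following is a minimal sketch of an interaction analysis that derives observation-frequency and certainty data from a log of (state, action) pairs. The log format and the certainty measure (the modal action's share of visits) are assumptions of the sketch, not Arel's implementation.

```python
from collections import Counter, defaultdict

def interaction_analysis(interactions):
    """From a log of (state, action) pairs, derive observation-frequency
    data (how often each state was seen) and certainty data (how
    strongly the agent favored one action in each state)."""
    state_freq = Counter(s for s, _ in interactions)
    actions_by_state = defaultdict(Counter)
    for s, a in interactions:
        actions_by_state[s][a] += 1
    # Certainty of 1.0 means the agent always took the same action
    # in that state; values near 1/len(actions) mean near-random choice.
    certainty = {
        s: max(acts.values()) / sum(acts.values())
        for s, acts in actions_by_state.items()
    }
    return state_freq, certainty
```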
Claims 7 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Green and Lee in view of Nag et al. (US PGPUB No. 2020/0065128) [hereinafter “Nag”].
As per claim 7, the combination of Green and Lee teaches the computing system of claim 1, wherein the first analysis function is a function included in an environmental analysis level of a multi-level introspection framework (Green; [0060], RL-based agent analyzes an environment including a simulation environment), wherein the second analysis function is a function included in an interaction analysis level of the multi-level introspection framework (Green; [0060], RL-based agent also analyzes actions including interactions with stimuli in the simulation environment).
The combination of Green and Lee does not explicitly teach wherein to process the first elements and the second elements the computation engine is configured to apply a meta-analysis function of a meta-analysis level of the multi-level introspection framework. Nag teaches wherein to process the first elements and the second elements the computation engine is configured to apply a meta-analysis function of a meta-analysis level of the multi-level introspection framework (Abstract, using observational data and metadata to facilitate reinforcement learning).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Green and Lee with the teachings of Nag, wherein to process the first elements and the second elements the computation engine is configured to apply a meta-analysis function of a meta-analysis level of the multi-level introspection framework, to provide the reinforcement and correction models more layers of data and metadata to make more precise modifications to the underlying learning models.
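To illustrate how a meta-analysis level might consume first and second elements, the following minimal sketch cross-references flagged transitions (first elements, as in the first sketch above) with frequency and certainty data (second elements, as in the second sketch); the function names, inputs, and threshold are assumptions, not Nag's disclosed framework.

```python
def meta_analysis(flagged_transitions, state_freq, certainty,
                  certainty_floor=0.6):
    """Cross-reference first elements (flagged transitions) with second
    elements (frequency and certainty data) to produce third elements:
    plain-language notes a user could read as an explanation."""
    notes = []
    for state, step in flagged_transitions:
        # Only surface transitions where the agent also acted with
        # low certainty, since those are the hardest to explain.
        if certainty.get(state, 1.0) < certainty_floor:
            notes.append(
                f"step {step}: agent narrowed onto state {state!r} with "
                f"low action certainty ({certainty[state]:.2f}); state "
                f"observed {state_freq.get(state, 0)} times overall"
            )
    return notes
```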
As per claim 19, the combination of Green and Lee teaches the method of claim 15.
The combination of Green and Lee does not explicitly teach wherein processing the first elements and the second elements comprises applying a meta-analysis function. Nag teaches wherein processing the first elements and the second elements comprises applying a meta-analysis function (Abstract, using observational data and metadata to facilitate reinforcement learning).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Green and Lee with the teachings of Nag, wherein processing the first elements and the second elements comprises applying a meta-analysis function, to provide the reinforcement and correction models more layers of data and metadata to make more precise modifications to the underlying learning models.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Green and Lee in view of Van Seijen et al. (US PGPUB No. 2018/0165603) [hereinafter “Van Seijen”].
As per claim 8, the combination of Green and Lee teaches the computing system of claim 1.
The combination of Green and Lee does not explicitly teach wherein the first analysis function comprises a value function, wherein the first elements comprise respective values indicating expected respective rewards for one or more states of the environment, wherein the second analysis function comprises a transition probability function, wherein the second elements comprise transition probability values each indicating a probability of a transition to a new state of the environment given a state of the environment and an action, and the computation engine is configured to compute at least one of local minima or maxima, absolute minima or maxima, observation variance outliers, or strict-difference variance outliers based on the values and the transition probability values. Van Seijen teaches wherein the first analysis function comprises a value function, wherein the first elements comprise respective values indicating expected respective rewards for one or more states of the environment ([0073], expected value function for sum of rewards), wherein the second analysis function comprises a transition probability function, wherein the second elements comprise transition probability values each indicating a probability of a transition to a new state of the environment given a state of the environment and an action ([0072], probability values indicating transition to a state based on a current state and action), and the computation engine is configured to compute at least one of local minima or maxima ([0150]-[0151], calculating a local max to greedily adapt the reinforcement agent to local conditions/environment), absolute minima or maxima (Examiner Note: this is an optional feature but would likely overcome the current rejection if included as a required feature), observation variance outliers (Examiner Note: this is an optional feature but would likely overcome the current rejection if included as a required feature), or strict-difference variance outliers based on the values and the transition probability values (Examiner Note: this is an optional feature but would likely overcome the current rejection if included as a required feature).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Green and Lee with the teachings of Van Seijen, wherein the first analysis function comprises a value function, wherein the first elements comprise respective values indicating expected respective rewards for one or more states of the environment, wherein the second analysis function comprises a transition probability function, wherein the second elements comprise transition probability values each indicating a probability of a transition to a new state of the environment given a state of the environment and an action, and the computation engine is configured to compute at least one of local minima or maxima, absolute minima or maxima, observation variance outliers, or strict-difference variance outliers based on the values and the transition probability values, to provide the reinforcement and correction models more layers of data and metadata to make more precise modifications to the underlying learning models.
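To illustrate the claimed computation, the following minimal sketch classifies states as local minima or maxima of a value function relative to the states reachable from them under a transition-probability function. The tabular representation and the reachability-based neighborhood are assumptions of the sketch, not Van Seijen's formulation.

```python
def local_extrema(values, transitions):
    """Classify states as local maxima/minima of the value function.
    `values`: dict state -> expected return.
    `transitions`: dict (state, action) -> {next_state: probability}.
    A state's neighborhood is the set of states reachable from it."""
    reachable = {}
    for (s, _a), dist in transitions.items():
        reachable.setdefault(s, set()).update(dist)  # dist keys = next states
    maxima, minima = [], []
    for s, neighbors in reachable.items():
        neighbor_vals = [values[n] for n in neighbors if n in values]
        if s not in values or not neighbor_vals:
            continue
        if values[s] >= max(neighbor_vals):
            maxima.append(s)
        if values[s] <= min(neighbor_vals):
            minima.append(s)
    return maxima, minima

values = {"s0": 0.1, "s1": 0.9, "s2": 0.4}
transitions = {("s0", "a"): {"s1": 0.7, "s2": 0.3},
               ("s1", "a"): {"s2": 1.0},
               ("s2", "a"): {"s0": 1.0}}
print(local_extrema(values, transitions))  # (['s1', 's2'], ['s0'])
```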
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Green and Lee in view of Taylor et al. (US PGPUB No. 2019/0236458) [hereinafter “Taylor”].
As per claim 11, the combination of Green and Lee teaches the computing system of claim 1.
The combination of Green and Lee does not explicitly teach wherein to output the indication of the third elements the computation engine is configured to: compute, based on the third elements, summary data for a plurality of analysis functions, the summary data comprising one or more of: a maxima state, a state-action pair with associated certainty, a state with associated frequency value, a most likely sequence from a minima state to a maxima state, or a most likely sequence from a maxima state to a minima state; and output, to a display device, the summary data. Taylor teaches wherein to output the indication of the third elements the computation engine is configured to: compute, based on the third elements, summary data for a plurality of analysis functions, the summary data comprising one or more of (Examiner Note: though only one alternative is required, citations are provided where available to expedite prosecution): a maxima state ([0130], finding a maximum state based on the current state and action), a state-action pair with associated certainty ([0130], a state-action pair with a confidence score), a state with associated frequency value (Examiner Note: this is an optional feature but would likely overcome the current rejection if included as a required feature), a most likely sequence from a minima state to a maxima state (Examiner Note: this is an optional feature but would likely overcome the current rejection if included as a required feature), or a most likely sequence from a maxima state to a minima state (Examiner Note: this is an optional feature but would likely overcome the current rejection if included as a required feature); and output, to a display device, the summary data ([0102] and [0208], outputs are available for display on a computer device).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Green and Lee with the teachings of Taylor, wherein to output the indication of the third elements the computation engine is configured to: compute, based on the third elements, summary data for a plurality of analysis functions, the summary data comprising one or more of: a maxima state, a state-action pair with associated certainty, a state with associated frequency value, a most likely sequence from a minima state to a maxima state, or a most likely sequence from a maxima state to a minima state; and output, to a display device, the summary data, to provide the reinforcement and correction models more layers of data and metadata to make more precise modifications to the underlying learning models.
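To illustrate the "most likely sequence" summary data, the following minimal sketch finds the most probable state sequence between two states (e.g., from a minima state to a maxima state) by running Dijkstra's algorithm over -log(probability) edge weights, so the lowest-cost path is the highest-probability path. The input format is the same assumed tabular transition function as in the sketch above, not Taylor's implementation.

```python
import heapq
import math
from itertools import count

def most_likely_sequence(transitions, start, goal):
    """Most probable state sequence from `start` to `goal`.
    `transitions`: dict (state, action) -> {next_state: probability}.
    Returns (sequence, probability), or (None, 0.0) if unreachable."""
    # Collapse (state, action) pairs into the single most probable
    # edge between each pair of states.
    edges = {}
    for (s, _a), dist in transitions.items():
        for ns, p in dist.items():
            if p > 0.0:
                edges.setdefault(s, {})[ns] = max(
                    edges.get(s, {}).get(ns, 0.0), p)
    tie = count()  # tie-breaker so the heap never compares states
    heap = [(0.0, next(tie), start, [start])]
    visited = set()
    while heap:
        cost, _, s, path = heapq.heappop(heap)
        if s == goal:
            return path, math.exp(-cost)
        if s in visited:
            continue
        visited.add(s)
        for ns, p in edges.get(s, {}).items():
            if ns not in visited:
                heapq.heappush(
                    heap, (cost - math.log(p), next(tie), ns, path + [ns]))
    return None, 0.0
```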
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Green and Lee in view of Cella et al. (US PGPUB No. 2019/0171187) [hereinafter “Cella”].
As per claim 13, the combination of Green and Lee teaches the computing system of claim 1.
The combination of Green and Lee does not explicitly teach wherein the interaction data is for one of: an autonomous vehicle, a conversational assistant, a medical system, a network automation system, a home automation system, or an industrial control system. Cella teaches wherein the interaction data is for one of: an autonomous vehicle ([0231], data collection for industrial vehicles, with support for autonomous control of the system; see [0801]), a conversational assistant ([0235], natural language/speech processing), a medical system ([0229], medical diagnostics), a network automation system ([0135], network transport system with automated action), a home automation system ([0229], home automation), or an industrial control system (Abstract, industrial IoT systems).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Green and Lee with the teachings of Cella, wherein the interaction data is for one of: an autonomous vehicle, a conversational assistant, a medical system, a network automation system, a home automation system, or an industrial control system, to provide the efficiency and productivity inducing features of multi-level data analysis to multiple relevant data fields and environments.
Claims 14 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Green and Lee in view of Dey et al. (US PGPUB No. 2010/0106603) [hereinafter “Dey”].
As per claim 14, the combination of Green and Lee teaches the computing system of claim 1.
The combination of Green and Lee does not explicitly teach wherein the computation engine is configured to receive a query for a most likely sequence for the reinforcement learning agent, wherein the third elements comprise the most likely sequence. Dey teaches wherein the computation engine is configured to receive a query for a most likely sequence for the reinforcement learning agent, wherein the third elements comprise the most likely sequence ([0065], determining the probabilities of a number of paths or sequences of actions, including the greatest probability, i.e., most likely).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Green and Lee with the teachings of Dey, wherein the computation engine is configured to receive a query for a most likely sequence for the reinforcement learning agent, wherein the third elements comprise the most likely sequence, to provide the reinforcement and correction models further predictive capabilities with respect to future actions beyond the next state.
As per claim 18, the combination of Green and Lee teaches the method of claim 15, and further teaches a value analysis function (Green; [0062] and [0085], reward function regarding state and action).
The combination of Green and Lee does not explicitly teach wherein the second analysis function comprises one of an observation frequency analysis function, an observation-action frequency analysis function, or a value analysis function, and wherein the second elements comprise at least one of observation frequency data, outlier data, or certainty data. Dey teaches wherein the second analysis function comprises one of an observation frequency analysis function or an observation-action frequency analysis function ([0065]-[0067], future action predicted based on past observed behavior), and wherein the second elements comprise at least one of observation frequency data ([0008], observation of the past behavior of an agent, including the frequency of what has occurred historically; see [0005]), outlier data (Examiner Note: this is an optional feature but would likely overcome the current rejection if included as a required feature), or certainty data ([0031]-[0033], probability data based on observed context-aware behavior).
At the time of filing, it would have been obvious to one of ordinary skill in the art to combine Green and Lee with the teachings of Dey, wherein the second analysis function comprises one of an observation frequency analysis function, an observation-action frequency analysis function, or a value analysis function, and wherein the second elements comprise at least one of observation frequency data, outlier data, or certainty data, to provide the reinforcement and correction models more layers of data and metadata to make more precise modifications to the underlying learning models.
Response to Arguments
Applicant's arguments with respect to the rejection of claims 1-20 under 35 U.S.C. 103 have been fully considered and are persuasive. New prior art references, Green, Arel, Nag, Taylor, Van Seijen and Dey, have been introduced and cited above.
Examiner notes the action remains non-final.
To expedite prosecution, Examiner is open to conducting an interview to discuss claim amendments to overcome the current rejection and/or place the application in condition for allowance.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Nagaraja (US PGPUB No. 2018/0260700), Naghshvar et al. (US PGPUB No. 2020/0150672), Devasia et al. (US PGPUB No. 2020/0192370), Shibuya et al. ("Suggestion of probabilistic reward-independent knowledge for dynamic environment in reinforcement learning," 2011 International Symposium on Micro-NanoMechatronics and Human Science, Nagoya, Japan, 2011, pp. 140-145, doi: 10.1109/MHS.2011.6102175), Chen et al. ("Reinforcement Learning on Computational Resource Allocation of Cloud-based Wireless Networks," 2020 IEEE 6th World Forum on Internet of Things (WF-IoT), New Orleans, LA, USA, 2020, pp. 1-6, doi: 10.1109/WF-IoT48130.2020.9221234) and Davoodi et al. ("Feature-Based Interpretable Reinforcement Learning based on State-Transition Models," arXiv:2105.07099, May 23, 2021) all disclose various aspects of transition-state analysis using graph type and shape distributions, including domain-independent test data.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PETER C SHAW whose telephone number is (571)270-7179. The examiner can normally be reached on a Maxiflex schedule.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Carl Colin, can be reached on 571-272-3862. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PETER C SHAW/
Primary Examiner, Art Unit 2493
January 17, 2026