Last updated: May 04, 2026

Application No. 18/228,742

ONLINE LEARNING SYSTEM WITH CONTEXTUAL BANDITS FEEDBACK AND LATENT STATE DYNAMICS

Non-Final OA §101 §103

Filed

Aug 01, 2023

Examiner

LUO, KATE H

Art Unit

6216

Tech Center

6200

Assignee

International Business Machines Corporation

OA Round

1 (Non-Final)

Interview Optional

Examiner has a relatively high allowance rate (78%) and a +33.3% interview lift. A written response may suffice.

Based on 498 resolved cases, 2023–2026

Examiner Intelligence

LUO, KATE H

Grants 78% — above average

Career Allowance Rate

387 granted / 498 resolved

+17.7% vs TC avg

Strong +33% interview lift

Interview Lift: +33.3% (allowance without vs. with interview, among resolved cases with interview)

Typical timeline

2y 11m

Avg Prosecution

2 currently pending

Career history

500

Total Applications

across all art units

Statute-Specific Performance

§101

10.1%

-29.9% vs TC avg

§103

64.5%

+24.5% vs TC avg

§102

10.3%

-29.7% vs TC avg

§112

6.5%

-33.5% vs TC avg

Black line = Tech Center average estimate • Based on career data from 498 resolved cases
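The "vs TC avg" figures above can be inverted to recover the Tech Center baseline each statistic is compared against. The sketch below assumes the deltas are simple percentage-point differences (examiner value minus TC average); the page does not state how they are computed, and the dictionary name `examiner_stats` is purely illustrative.

```python
# Assumption: "vs TC avg" = examiner value - Tech Center average, in percentage points.
# Values are copied from the Statute-Specific Performance figures above.
examiner_stats = {
    "§101": (10.1, -29.9),
    "§103": (64.5, +24.5),
    "§102": (10.3, -29.7),
    "§112": (6.5, -33.5),
}

# Implied Tech Center average for each statute.
implied_tc_avg = {
    statute: round(value - delta, 1)
    for statute, (value, delta) in examiner_stats.items()
}

# Same inversion for the career allowance rate (78%, +17.7% vs TC avg).
implied_tc_allowance = round(78.0 - 17.7, 1)

for statute, avg in implied_tc_avg.items():
    print(statute, avg)
print("allowance", implied_tc_allowance)
```

Under that assumption every statute implies the same 40.0% baseline, which is consistent with the legend describing the Tech Center figure as an estimate rather than a per-statute measurement.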

Office Action

§101 §103

DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

1. Claims 1-20 are presented for examination.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.

Claim 1: A method for triggering actions in a sequence of time steps within a multi-armed bandit process, said method comprising:
sequentially performing, by one or more processors of a computer system, time steps t (t = 0, 1, ..., N), wherein N>2,
wherein performing time step 0 comprises providing: an initial value p0 of a latent state probability vector pt of dimension Z respectively associated with Z specified latent states, wherein Z>2; an initial value (Ɵ, ϕ) of Hidden Markov Model (HMM) parameters (Ɵ, ϕ); and, for each action (a) of K specified actions wherein K>2, an initial value of a mean reward vector μ̂(a) of dimension Z,
wherein performing time step t (t = 1, 2, ..., N) comprises:
receiving, from an external system that is external to the computer system, a context (xt), said context xt being one context of X specified contexts, wherein X>2;
executing a HMM parameter transformation to compute pt, using a conditional probability distribution p(xt|z) and inputs comprising xt or {xt}, wherein {xt} is x1, x2, ... and xt;
selecting an action (at) from the K actions, said action (at) maximizing a function F(a) having a dependence on a reward estimate vector of dimension Z comprising the mean reward estimate or a stochastic reward estimate vector ((a));
sending an electromagnetic signal to a hardware machine, said electromagnetic signal directing the hardware to perform the selected action at;
receiving an identification of a dynamic reward (rt) resulting from the hardware machine having performed the selected action at;
updating the mean reward estimate μ̂(at) as a function of rt and pt; and
computing an update of the latent state probability vector pt(z) for each latent state z (z = 1, 2, ..., Z), said update of pt(z) comprising a dependence on rt or {rt}, at, and wherein {rt} is r1, r2, ... and rt.

The claim limitations in the abstract idea have been highlighted in bold above. The remaining limitations are "additional elements". Similar limitations comprise the abstract ideas of claims 19 and 20.

MPEP 2106 III provides a flowchart for the subject matter eligibility test for products and processes. The claim analysis following the flowchart is as follows:

Step 1: Is the claim to a process, machine, manufacture or composition of matter? Yes. Claims 1 and 19 recite methods, which are processes. Claim 20 recites a system, which is a machine.

Step 2A, Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes. The highlighted claim limitations constitute an abstract idea because the broadest reasonable interpretation of these steps falls within the mathematical concepts and mental process groupings of abstract ideas.

Step 2A, Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application? No.
The additional elements recite routine data exchange or data gathering between generic computer devices and the use of a generic computer to perform mere instructions of mathematical concepts and mental processes. This amounts to no more than mere instructions to apply the exception using a generic computer. Even when viewed in combination, these additional elements do not integrate the recited judicial exception into a practical application, and the claim is directed to the judicial exception. (Step 2A: Yes).

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. As explained with respect to Step 2A, Prong Two, there are two additional elements. The additional elements cannot provide an inventive concept because they conduct routine data exchange or data gathering between generic computer devices and use a generic computer to perform mere instructions of mathematical concepts and mental processes. Therefore, claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claims 19 and 20, all claimed limitations are set forth and rejected as per the discussion for claim 1.

Claims 2-13 constitute an abstract idea because the broadest reasonable interpretation of these claim limitations falls within the mathematical concepts and mental process groupings of abstract ideas.

Claims 14-18 recite generic computer components. Here, the computer is used as a tool to perform mathematical concepts and mental processes or to conduct routine data gathering. This amounts to no more than mere instructions to apply the exception using a generic computer.

Thus, claims 1-20 are not eligible subject matter under 35 USC 101.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1. Claims 1-5, 12-15 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chandak et al. (US Publication No. 2020/0241878) in view of Luo (US Publication No. US10,755,026).

Regarding claim 1, Chandak et al. meets the claim limitations, as follows:

A method for triggering actions in a sequence of time steps within a multi-armed bandit process (Fig. 1, i.e. a digital action proposal system), said method comprising:

sequentially performing, by one or more processors of a computer system, time steps t (t = 0, 1, ..., N), wherein N>2 (Fig. 1, para[0043]-[0047], i.e. Server(s) 102 includes analytics system 104 and digital action proposal system 106 to determine proposed digital actions),

wherein performing time step 0 comprises providing: an initial value p0 of a latent state probability vector pt of dimension Z respectively associated with Z specified latent states, wherein Z>2; an initial value (Ɵ, ϕ) of Hidden Markov Model (HMM) parameters (Ɵ, ϕ); and, for each action (a) of K specified actions wherein K>2, an initial value of a mean reward vector μ̂(a) of dimension Z (Fig. 2, para[0053]-[0057], [0085], [0133], i.e. the digital action proposal system 106 operates to utilize a state-based digital action proposal policy using the initial states and the initial distribution d0 of a discrete-time Markov decision process (MDP). Each time step is associated with the tuple (s_t, a_t, r_t, s_{t+1}), where the reward r_t is given by the function R(s,a)),

wherein performing time step t (t = 1, 2, ..., N) comprises:

receiving, from an external system that is external to the computer system, a context (xt), said context xt being one context of X specified contexts, wherein X>2 (Fig. 1 and 4A, i.e. the training next state 404 includes the states of one or more client devices that have interacted with the environment in the past);

executing a HMM parameter transformation to compute pt, using a conditional probability distribution p(xt|z) and inputs comprising xt or {xt}, wherein {xt} is x1, x2, ... and xt (Fig. 1 and 4A, para[0055]-[0057], [0066], i.e. a latent representation generator 406 that generates a latent representation 408 based on the training current state 402 and the training next state 404 as inputs. Here, the probability of the digital action is a conditional probability distribution as expressed in equations 2 and 3);

selecting an action (at) from the K actions, said action (at) maximizing a function F(a) having a dependence on a reward estimate vector of dimension Z comprising the mean reward estimate or a stochastic reward estimate vector ((a)) (Fig. 4A, para[0056], [0084], i.e. the digital action proposal system 106 can learn a policy by optimizing for its performance function shown in equation 11);

sending an electromagnetic signal to a hardware machine, said electromagnetic signal directing the hardware to perform the selected action at (Fig. 1, para[0050], i.e. the digital action proposal system 106 sends a proposed digital action for a client device 112a);

receiving an identification of a dynamic reward (rt) resulting from the hardware machine having performed the selected action at (Fig. 4A, para[0067], i.e. the ground truth action 418 includes a digital action performed by one or more client devices in past interactions with the environment to transition from the training current state 402 to the training next state 404);

updating the mean reward estimate μ̂(at) as a function of rt and pt (Fig. 4A, para[0067], i.e. the loss function 416 compares the predicted action 414 to the ground truth action 418 to determine the loss); and

computing an update of the latent state probability vector pt(z) for each latent state z (z = 1, 2, ..., Z), said update of pt(z) comprising a dependence on rt or {rt}, at, and wherein {rt} is r1, r2, ... and rt (Fig. 4A, para[0068]-[0074], i.e. the digital action proposal system 106 back propagates the loss to the supervised machine learning model 412 (as indicated by the dashed line 420) to modify its parameters and then computes the estimated probability of the action using the Boltzmann distribution over the score of z).

Chandak et al. does not explicitly disclose the following claim limitations: an initial value of a mean reward vector μ̂(a) of dimension Z.

However, in the same field of endeavor Luo discloses the deficient claim limitations, as follows: an initial value of a mean reward vector μ̂(a) of dimension Z (Col. 5, ll. 1-15, i.e. the average total reward for an action which starts from state 0).

Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chandak with Luo to use the average total reward for an action as a reward, the motivation being to improve the quality of the policy gradient model learning of the MDP.
Regarding claim 2, the rejection of claim 1 is incorporated herein. Chandak et al. meets the claim limitations, as follows: The method of claim 1, wherein performing time step 0 comprises providing an initial value of one or more aggregation parameters and wherein said executing the HMM parameter transformation computes p, using inputs comprising x, p and ϕ (Fig. 2, para[0053]-[0057], [0085], [0133], i.e. the digital action proposal system 106 operates to utilize a state-based digital action proposal policy using the initial states and the initial distribution d0 of a discrete-time Markov decision process (MDP). Each time step is associated with the tuple (s_t, a_t, r_t, s_{t+1}), where the reward r_t is given by the function R(s,a)).

Regarding claim 3, the rejection of claim 2 is incorporated herein. Chandak et al. meets the claim limitations, as follows: The method of claim 2, wherein said executing the HMM parameter transformation comprises executing an Online Expectation-Maximization (EM) algorithm (Fig. 4A, para[0056], [0084], i.e. the digital action proposal system 106 can learn a policy by optimizing for its performance function shown in equation 11).

Regarding claim 4, the rejection of claim 1 is incorporated herein. Chandak et al. meets the claim limitations, as follows: The method of claim 1, wherein the inputs used to execute the HMM parameter transformation comprise xt, pt-1, Ɵt-1, ϕt-1 (para[0066], i.e. a latent representation generator 406 that generates a latent representation 408 based on the training current state 402 and the training next state 404).

Regarding claim 5, the rejection of claim 1 is incorporated herein. Chandak et al. meets the claim limitations, as follows: The method of claim 1, wherein the inputs used to execute the HMM parameter transformation comprise {xt}, pt-1, Ɵt-1, ϕt-1 (para[0066], i.e. a latent representation generator 406 that generates a latent representation 408 based on the training current state 402 and the training next state 404).

Regarding claim 12, the rejection of claim 1 is incorporated herein. Chandak et al. meets the claim limitations, as follows: The method of claim 1, wherein the update of pt(z) comprises a dependence on rt, at, and p2 a t) (Fig. 4A, para[0068]-[0074], i.e. the digital action proposal system 106 back propagates the loss to the supervised machine learning model 412 (as indicated by the dashed line 420) to modify its parameters and then computes the estimated probability of the action using the Boltzmann distribution over the score of z).

Regarding claim 13, the rejection of claim 1 is incorporated herein. Chandak et al. meets the claim limitations, as follows: The method of claim 1, wherein the update of pt(z) comprises a dependence on {rt} (Fig. 4A, para[0068]-[0074], i.e. the digital action proposal system 106 back propagates the loss to the supervised machine learning model 412 (as indicated by the dashed line 420) to modify its parameters and then computes the estimated probability of the action using the Boltzmann distribution over the score of z).

Regarding claim 14, the rejection of claim 1 is incorporated herein. Chandak et al. meets the claim limitations, as follows: The method of claim 1, wherein the hardware machine is not a generic computer (para[0156], Fig. 1 and 12, i.e. the client device can be a camera, a tracker, a watch, a wearable device).

Regarding claim 15, the rejection of claim 1 is incorporated herein. Chandak et al. meets the claim limitations, as follows: The method of claim 1, wherein the hardware machine is a computing device (para[0156], Fig. 1 and 12, i.e. the client device can be a computing device).

Regarding claim 17, the rejection of claim 1 is incorporated herein. Chandak et al. meets the claim limitations, as follows: The method of claim 1, wherein the external system comprises the hardware machine (para[0156], Fig. 1 and 12, i.e. the client device can be a computing device).

Regarding claim 18, the rejection of claim 1 is incorporated herein. Chandak et al. meets the claim limitations, as follows: The method of claim 16, wherein said sending the signal comprises transmitting the electromagnetic signal indirectly to the hardware machine in the external system via a computing device in the external system, said computing device configured to receive the transmitted electromagnetic signal and to subsequently send the transmitted electromagnetic signal to the hardware machine (Fig. 1, para[0050], i.e. the digital action proposal system 106 sends a proposed digital action for a client device 112a).

Regarding claims 19 and 20, all claimed limitations are set forth and rejected as per the discussion for claim 1.

2. Claims 10 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Chandak et al. (US Publication No. 2020/0241878) in view of Luo (US Publication No. US10,755,026) and further in view of Koseki et al. (US Publication No. 2020/0250269).

Regarding claim 10, Chandak et al. and Luo do not explicitly disclose the following claim limitations: The method of claim 1, wherein p(xt|z, Ɵt-1) is a multinomial context distribution. However, in the same field of endeavor Koseki et al. discloses a multinomial context distribution (para[0041], i.e. multinomial distribution). Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chandak and Luo with Koseki to use a multinomial distribution as the probability distribution, the motivation being to improve the quality of the policy gradient model learning of the MDP.

Regarding claim 11, Chandak et al. and Luo do not explicitly disclose the following claim limitations: The method of claim 1, wherein p(xt|z, Ɵt-1) is a Gaussian context distribution. However, in the same field of endeavor Koseki et al. discloses a Gaussian context distribution (para[0041], i.e. a Gaussian distribution with fixed variance and means). Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chandak and Luo with Koseki to use a Gaussian distribution as the probability distribution, the motivation being to improve the quality of the policy gradient model learning of the MDP.

3. Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Chandak et al. (US Publication No. 2020/0241878) in view of Luo (US Publication No. US10,755,026) and further in view of Morlot et al. (US Publication No. 2023/0108874).

Regarding claim 16, Chandak et al. and Luo do not explicitly disclose the following claim limitations: The method of claim 1, wherein the hardware machine is an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Neural Processing Unit (NPU), a Tensor Processing Unit (TPU), Graphics Processing Unit (GPU), or Digital Signal Processor (DSP). However, in the same field of endeavor Morlot et al. discloses a Graphics Processing Unit (GPU) (para[0239], i.e. Graphical Processing Units (or GPUs)). Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Chandak and Luo with Morlot to use a GPU as the hardware machine, the motivation being to improve processing speed.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KATE H LUO, whose telephone number is (571)270-5635.
The examiner can normally be reached 8:00-5:00PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Alejandro Rivero, can be reached at (571)270-3641. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KATE H LUO/
Primary Examiner, Art Unit 6216
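For readers mapping the rejection onto the claim language, the per-time-step loop recited in claim 1 can be sketched in code. This is only an illustrative reading under stated assumptions, not the applicant's implementation: the claim does not fix the update formulas, so the sketch assumes Ɵ parameterizes a discrete context distribution p(x|z) (consistent with claims 10 and 11), assumes ϕ is a latent-state transition matrix, takes F(a) to be the belief-weighted mean reward, and uses a Gaussian reward likelihood for the belief update. The names `LatentStateBandit`, `run_steps`, and `perform_action` are hypothetical.

```python
import math


def normalize(v):
    """Scale a non-negative vector to sum to 1 (uniform if it sums to 0)."""
    s = sum(v)
    return [x / s for x in v] if s > 0 else [1.0 / len(v)] * len(v)


class LatentStateBandit:
    def __init__(self, theta, phi, p0, num_actions, sigma=1.0):
        # Time step 0: initial belief p0, HMM parameters, and mean reward vectors.
        self.theta = theta        # theta[z][x] = p(x | z)  (assumed role)
        self.phi = phi            # phi[z][z2] = p(z2 | z)  (assumed role)
        self.p = list(p0)         # latent state probability vector
        z_dim = len(p0)
        self.mu = [[0.0] * z_dim for _ in range(num_actions)]  # mean reward per (a, z)
        self.n = [[0.0] * z_dim for _ in range(num_actions)]   # soft visit counts
        self.sigma = sigma        # assumed reward noise scale

    def observe_context(self, x):
        """Compute p_t from the context x_t using p(x_t | z)."""
        z_dim = len(self.p)
        pred = [sum(self.p[z] * self.phi[z][z2] for z in range(z_dim))
                for z2 in range(z_dim)]
        self.p = normalize([pred[z] * self.theta[z][x] for z in range(z_dim)])
        return self.p

    def select_action(self):
        """Pick the action maximizing F(a) = sum_z p_t(z) * mu[a][z] (assumed F)."""
        scores = [sum(pz * m for pz, m in zip(self.p, mu_a)) for mu_a in self.mu]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, a, r):
        """Update the mean reward estimate, then the belief, from reward r_t."""
        z_dim = len(self.p)
        for z in range(z_dim):
            self.n[a][z] += self.p[z]
            if self.n[a][z] > 0:
                # Belief-weighted running mean (assumed update rule).
                self.mu[a][z] += self.p[z] * (r - self.mu[a][z]) / self.n[a][z]
        lik = [math.exp(-((r - self.mu[a][z]) ** 2) / (2 * self.sigma ** 2))
               for z in range(z_dim)]
        self.p = normalize([self.p[z] * lik[z] for z in range(z_dim)])


def run_steps(bandit, contexts, perform_action):
    """Time steps t = 1..N; perform_action stands in for the hardware machine."""
    history = []
    for x in contexts:
        bandit.observe_context(x)
        a = bandit.select_action()
        r = perform_action(a)     # "send signal", then receive the reward
        bandit.update(a, r)
        history.append((x, a, r))
    return history
```

A call such as `run_steps(LatentStateBandit(theta, phi, [0.5, 0.5], 2), [0, 1, 0], reward_fn)` returns the (context, action, reward) triple for each step. In this sketch the hardware interaction is reduced to a callback, which is the portion of the claim the §101 analysis treats as an additional element.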

Prosecution Timeline

Aug 01, 2023

Application Filed

Mar 05, 2026

Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/028,349

Patent 11265536

Granted Mar 01, 2022

16/925,423

Patent 11259055

Granted Feb 22, 2022

16/990,553

Patent 11259017

Granted Feb 22, 2022

17/040,965

Patent 11245905

Granted Feb 08, 2022

16/714,403

Patent 11234017

Granted Jan 25, 2022

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

78%

Grant Probability

99%

With Interview (+33.3%)

2y 11m (~2m remaining)

Median Time to Grant

Low

PTA Risk

Based on 498 resolved cases by this examiner. Grant probability derived from career allowance rate.
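The projection figures above can be reproduced with simple arithmetic. The sketch below assumes the interview lift is added in percentage points and that the result is capped at 99%; the page shows a 99% figure but does not state either rule, so both the additive model and the `cap` parameter are assumptions, and the function names are illustrative.

```python
def allowance_rate(granted, resolved):
    """Career allowance rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved


def with_interview(base_pct, lift_points, cap=99.0):
    """Add the interview lift in percentage points, capped (assumed rule)."""
    return min(base_pct + lift_points, cap)


rate = allowance_rate(387, 498)            # 387 granted / 498 resolved
print(round(rate, 1))                      # 77.7, shown on the page as 78%
print(with_interview(round(rate), 33.3))   # 99.0 under the assumed cap
```

Note that the uncapped sum (78 + 33.3) exceeds 100%, so the displayed 99% only follows if some cap of this kind is applied.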
