Prosecution Insights
Last updated: April 19, 2026
Application No. 17/846,062

CHESS SELF-LEARNING METHOD AND DEVICE BASED ON MACHINE LEARNING

Non-Final OA — §101, §103
Filed
Jun 22, 2022
Examiner
KHAN, SHAHID K
Art Unit
2146
Tech Center
2100 — Computer Architecture & Software
Assignee
Nanjing University of Posts and Telecommunications
OA Round
1 (Non-Final)
74%
Grant Probability (Favorable)
1-2
OA Rounds
2y 11m
To Grant
90%
With Interview

Examiner Intelligence

Grants 74% — above average
74%
Career Allow Rate
287 granted / 389 resolved
+18.8% vs TC avg
Strong +15.7% interview lift
+15.7%
Interview Lift
among resolved cases with interview
Typical timeline
2y 11m
Avg Prosecution
31 currently pending
Career history
420
Total Applications
across all art units

Statute-Specific Performance

§101: 10.0% (-30.0% vs TC avg)
§103: 55.7% (+15.7% vs TC avg)
§102: 16.5% (-23.5% vs TC avg)
§112: 15.2% (-24.8% vs TC avg)
Tech Center averages are estimates • Based on career data from 389 resolved cases

Office Action

§101 §103
DETAILED ACTION

This communication is in response to the application filed 6/22/22 in which claims 1-10 were presented for examination.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 6/22/22 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and

(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) are: a network constructing module, which is configured to construct a neural network and randomly initialize parameters of the neural network; a data generating module, which is configured to construct a Monte Carlo tree, initialize nodes of the Monte Carlo tree using the neural network, self-play by Monte Carlo tree search, generate game data, and store the game data; a training module, which is configured to train the neural network using the stored game data; a converging module, which is configured to control the data generating module to stop generating game data and control the converging module to stop training when the neural network converges in claim 8.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
The claim does not fall within at least one of the four categories of patent eligible subject matter because the broadest reasonable interpretation of a claim drawn to a computer readable storage medium (also called machine readable medium and other such variations) typically covers forms of non-transitory tangible media and transitory propagating signals per se in view of the ordinary and customary meaning of computer readable media, particularly when the specification is silent. See MPEP 2111.01. When the broadest reasonable interpretation of a claim covers a signal per se, the claim must be rejected under 35 U.S.C. 101 as covering non-statutory subject matter. See In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007). A claim drawn to such a computer readable storage medium that covers both transitory and non-transitory embodiments may be amended to narrow the claim to cover only statutory embodiments to avoid a rejection under 35 U.S.C. 101 by adding the limitation “non-transitory” to the claim.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.

Claims 1-9 are rejected under 35 U.S.C. 103 as being unpatentable over Silver, David, et al. "Mastering chess and shogi by self-play with a general reinforcement learning algorithm." arXiv preprint arXiv:1712.01815 (2017) (“Silver (A)”) in view of Silver, David, et al. "Mastering the game of Go without human knowledge." Nature 550.7676 (2017): 354-359 (“Silver (B)”).

Regarding claim 1, Silver (A) discloses [a] chess self-learning method based on machine learning, comprising the following steps:

step A, constructing a neural network and randomly initializing parameters of the neural network; (Silver (A) pg. 3 (“The parameters θ of the deep neural network in AlphaZero are trained by self-play reinforcement learning, starting from randomly initialized parameters θ.”))

step B, constructing a Monte Carlo tree, initializing nodes of the Monte Carlo tree using the neural network, self-playing by Monte Carlo tree search, (Silver (A) pg. 3 (“Instead of an alpha-beta search with domain-specific enhancements, AlphaZero uses a general purpose Monte-Carlo tree search (MCTS) algorithm. Each search consists of a series of simulated games of self-play that traverse a tree from root s_root to leaf. Each simulation proceeds by selecting in each state s a move a with low visit count, high move probability and high value (averaged over the leaf states of simulations that selected a from s) according to the current neural network fθ. The search returns a vector π representing a probability distribution over moves, either proportionally or greedily with respect to the visit counts at the root state.”))

step C, training the neural network using the stored game data; (Silver (A) pg. 3 (“The parameters of the deep neural network in AlphaZero are trained by self-play reinforcement learning, starting from randomly initialised parameters θ. Games are played by selecting moves for both players by MCTS, at ∼ πt. At the end of the game, the terminal position sT is scored according to the rules of the game to compute the game outcome z: −1 for a loss, 0 for a draw, and +1 for a win. The neural network parameters are updated so as to minimise the error between the predicted outcome vt and the game outcome z, and to maximise the similarity of the policy vector pt to the search probabilities πt.”))

step D, repeating the processes from step B to step C until the neural network converges (Silver (A) pg. 4 (“We applied the AlphaZero algorithm to chess, shogi, and also Go. Unless otherwise specified, the same algorithm settings, network architecture, and hyper-parameters were used for all three games. We trained a separate instance of AlphaZero for each game. Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs (15) to generate self-play games and 64 second-generation TPUs to train the neural networks.”)).

Silver (A) does not expressly disclose generating game data, and storing the game data; (but see Silver (B) pg. 8, 2nd col. (“At the end of the search AlphaGo Zero selects a move a to play in the root position s0, proportional to its exponentiated visit count, π(a|s0) = N(s0, a)^(1/τ) / Σb N(s0, b)^(1/τ), where τ is a temperature parameter that controls the level of exploration. The search tree is reused at subsequent timesteps: the child node corresponding to the played action becomes the new root node; the subtree below this child is retained along with all its statistics, while the remainder of the tree is discarded. AlphaGo Zero resigns if its root value and best child value are lower than a threshold value vresign.”)).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silver (A) to incorporate the teachings of Silver (B) to retain the statistics of the subtree below the current child, at least because doing so would enable using a single neural network instead of separate policy and value networks.
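Taken together, steps A through D as mapped above form a single self-play outer loop. The following is a minimal, runnable sketch of that loop; ToyGame, mcts_policy, and the fixed three-iteration cutoff are illustrative stand-ins, not names from the application or from Silver (A)/(B):

```python
import random
from collections import deque

class ToyGame:
    """Illustrative stand-in for a chess environment (not from the filing)."""
    def __init__(self):
        self.moves = 0
    def legal(self):
        return [0, 1, 2]
    def play(self, a):
        self.moves += 1
    def over(self):
        return self.moves >= 10
    def outcome(self):
        return random.choice([+1, -1])  # value tag from the first mover's view

def mcts_policy(game, net):
    # Step B stand-in: a real search would initialize tree nodes with the network.
    k = len(game.legal())
    return [1.0 / k] * k                # uniform stand-in for the decision vector pi_t

net = None                              # Step A: randomly initialized network (stub)
buffer = deque(maxlen=10_000)           # container: oldest target pair evicted when full

for iteration in range(3):              # Step D: in practice, loop until convergence
    game, trajectory = ToyGame(), []
    while not game.over():              # Step B: self-play guided by MCTS
        pi = mcts_policy(game, net)
        a = random.choices(range(len(pi)), weights=pi)[0]
        trajectory.append(pi)
        game.play(a)
    z = game.outcome()
    for t, pi in enumerate(trajectory):                 # tag winner's moves +1, loser's -1
        buffer.append((pi, z if t % 2 == 0 else -z))
    batch = random.sample(list(buffer), min(32, len(buffer)))   # Step C: train on stored data
    # a real train_step(net, batch) would minimize (z - v)^2 - pi . log(p) + c*||theta||^2
print(len(buffer))                      # 30 target pairs stored after three toy games
```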
Regarding claim 2, Silver (A), in view of Silver (B), discloses the invention of claim 1 as discussed above. Silver (A) further discloses wherein the neural network comprises an input layer, a hidden layer and an output layer; the input layer matches the size of the chess board to be trained; (Silver (A) pg. 13 (“The input to the neural network is an N x N x (MT + L) image stack that represents state using a concatenation of T sets of M planes of size N x N.”))

the hidden layer is used to complete the extraction and processing of position features; (Silver (A) pg. 12 (“AlphaZero evaluates positions using non-linear function approximation based on a deep neural network, rather than the linear function approximation used in typical chess programs.”), Silver (A) pg. 13 (“Each set of planes represents the board position at a time-step t − T + 1, ..., t, and is set to zero for time-steps less than 1. The board is oriented to the perspective of the current player. The M feature planes are composed of binary feature planes indicating the presence of the player’s pieces, with one plane for each piece type, and a second set of planes indicating the presence of the opponent’s pieces.”))

the output layer comprises a game decision maker for outputting a move vector and a value evaluator of a value function for outputting the current position; and the game decision maker and the value evaluator of the value function share the same input layer and hidden layer (Silver (A) pgs. 2-3 (“Instead of a handcrafted evaluation function and move ordering heuristics, AlphaZero utilises a deep neural network (p, v) = fθ(s) with parameters θ. This neural network takes the board position s as an input and outputs a vector of move probabilities p with components pa = Pr(a|s) for each action a, and a scalar value v estimating the expected outcome z from position s, v ≈ E[z|s]. AlphaZero learns these move probabilities and value estimates entirely from self-play; these are then used to guide its search.”)).

Regarding claim 3, Silver (A), in view of Silver (B), discloses the invention of claim 2 as discussed above. Silver (A) does not expressly disclose wherein the method of constructing the neural network comprises: setting the structure of the input layer and the decision output layer according to the size of the trained chess board, so that the sizes of the input layer and the decision output layer match the size of the chess board (but see Silver (B) pg. 7, 2nd col. (“The input features describing the position are structured as a 19 × 19 image; that is, the neural network architecture is matched to the grid structure of the board.”)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silver (A) to incorporate the teachings of Silver (B) to match the structure of the neural network to the grid structure of the chess board, at least because doing so would enable the AlphaGo Zero algorithm to be adapted “to learn a different (alternating Markov) game.” See Silver (B) pg. 7, 2nd col.
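The architecture mapped for claims 2 and 3, an input layer matched to the board, a shared hidden trunk, and two output heads (move vector p, scalar value v), is compact enough to sketch directly. A minimal NumPy sketch; the board size, hidden width, single dense layer, and toy move space are assumptions for illustration only:

```python
import numpy as np

BOARD = 8                                                # assumed N x N board
IN, HIDDEN, MOVES = BOARD * BOARD, 128, BOARD * BOARD    # toy move space

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (IN, HIDDEN))       # shared trunk (random init, step A)
Wp = rng.normal(0.0, 0.1, (HIDDEN, MOVES))    # policy head -> move vector p
Wv = rng.normal(0.0, 0.1, (HIDDEN, 1))        # value head  -> scalar v in [-1, 1]

def forward(board):
    """board: (BOARD, BOARD) array of piece codes; returns (p, v)."""
    h = np.tanh(board.reshape(-1) @ W1)       # hidden layer: position features
    logits = h @ Wp
    p = np.exp(logits - logits.max())
    p /= p.sum()                              # softmax over the move vector
    v = float(np.tanh(h @ Wv)[0])             # expected-outcome estimate
    return p, v

p, v = forward(np.zeros((BOARD, BOARD)))
print(p.shape, round(v, 3))                   # (64,) 0.0
```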
Regarding claim 4, Silver (A), in view of Silver (B), discloses the invention of claim 1 as discussed above. Silver (A) teaches using MCTS to train a neural network to play chess based on the AlphaGo Zero reinforcement learning approach, but does not expressly disclose wherein the method of constructing a Monte Carlo tree, initializing nodes of the Monte Carlo tree using the neural network, self-playing by Monte Carlo tree search, and generating game data comprises:

constructing a Monte Carlo tree; (but see Silver (B) Figure 1a (initial Monte Carlo tree at state s1))

self-playing by Monte Carlo tree search, and controlling both players to conduct a round of Monte Carlo tree search based on the last move of the opponent as a root node; (but see Silver (B) Figure 1 (“The program plays a game s1, ..., sT against itself. In each position st, an MCTS αθ is executed (see Fig. 2) using the latest neural network fθ.”))

after the Monte Carlo tree search is completed, obtaining the corresponding decision vector πt according to the selected proportion of each move α under the root node, and then selecting the move in the self-play according to πt; (but see Silver (B) Figure 1a (“Moves are selected according to the search probabilities computed by the MCTS, at ∼ πt. The terminal position sT is scored according to the rules of the game to compute the game winner z.”))

after completing a game of self-play, attaching a value tag Z to each decision according to the ending outcome, that is, attaching a tag +1 to all decisions of the winner and a tag −1 to all decisions of the loser, generating a target pair (πt, Z), and storing the target pair (πt, Z) in a container; (but see Silver (B) Figure 1a (“The terminal position sT is scored according to the rules of the game to compute the game winner z.”); see also Silver (B) pg. 2, 2nd col. (“A game terminates at step T when both players pass, when the search value drops below a resignation threshold or when the game exceeds a maximum length; the game is then scored to give a final reward of rT ∈ {−1, +1} (see Methods for details). The data for each timestep t is stored as (st, πt, zt), where zt = ± rT is the game winner from the perspective of the current player at step t.”))

when the container is full, discarding the target pair first placed in the container (but see Silver (B) pg. 8, 2nd col. (“The search tree is reused at subsequent timesteps: the child node corresponding to the played action becomes the new root node; the subtree below this child is retained along with all its statistics, while the remainder of the tree is discarded.”)).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silver (A) to incorporate the teachings of Silver (B) to use the AlphaGo Zero strategy for game play, at least because the AlphaZero taught by Silver (A) is based on the AlphaGo Zero strategy.
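Claim 4's container behaviour (store target pairs (πt, Z); when full, discard the pair placed first) is a bounded FIFO queue. A minimal sketch using Python's collections.deque, with made-up target pairs:

```python
from collections import deque

container = deque(maxlen=3)                   # capacity of 3 just for demonstration
pairs = [([0.9, 0.1], +1), ([0.5, 0.5], -1), ([0.2, 0.8], +1), ([0.7, 0.3], -1)]
for pair in pairs:                            # (pi_t, Z) target pairs, values made up
    container.append(pair)                    # 4th append silently evicts the 1st
print(list(container))                        # the first-placed ([0.9, 0.1], +1) is gone
```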
Regarding claim 5, Silver (A), in view of Silver (B), discloses the invention of claim 4 as discussed above. Silver (A) does not expressly disclose wherein the method of training the neural network using the stored game data comprises: randomly selecting the target pair (πt, Z) from the container to train the neural network, where the loss function of the neural network is as follows:

loss = (Z − Vt)^2 − πt · log pt + c‖θ‖^2

in which the move vector pt = (pt0, pt1, . . . , pta, . . . , ptT−1) is a 1×T-dimensional vector, the decision vector πt = (πt0, πt1, . . . , πta, . . . , πtT−1) is a 1×T-dimensional vector, the scalar value Vt has a value range [−1, 1], which indicates the possibility that the mover wins in the current position, the larger value Vt indicates that the current player wins more likely, and the smaller value Vt indicates that the current player loses more likely; c is a stable constant, θ = (θ0, θ1, θ2, . . . ) is a vector consisting of all parameters of the neural network, c‖θ‖^2 is a regular term, and log pt means taking the logarithm of each component of pt; enabling the move vector pt output by the neural network to be close to the decision vector πt, so that the value judgment Vt solves the final game result Z, that is, the loss function loss decreases as much as possible

(but see Silver (B) pg. 2, 2nd col. (“The neural network is trained by a self-play reinforcement learning algorithm that uses MCTS to play each move. First, the neural network is initialized to random weights θ0. At each subsequent iteration i ≥ 1, games of self-play are generated (Fig. 1a). At each timestep t, an MCTS search πt = αθi−1(st) is executed using the previous iteration of neural network fθi−1 and a move is played by sampling the search probabilities πt. A game terminates at step T when both players pass, when the search value drops below a resignation threshold or when the game exceeds a maximum length; the game is then scored to give a final reward of rT ∈ {−1, +1} (see Methods for details). The data for each timestep t is stored as (st, πt, zt), where zt = ± rT is the game winner from the perspective of the current player at step t. In parallel (Fig. 1b), new network parameters θi are trained from data (s, π, z) sampled uniformly among all timesteps of the last iteration(s) of self-play. The neural network (p, v) = fθi(s) is adjusted to minimize the error between the predicted value v and the self-play winner z, and to maximize the similarity of the neural network move probabilities p to the search probabilities π. Specifically, the parameters θ are adjusted by gradient descent on a loss function l that sums over the mean squared error and cross entropy losses, respectively: (p, v) = fθ(s) and l = (z − v)^2 − π^T log p + c||θ||^2, where c is a parameter controlling the level of L2 weight regularization (to prevent overfitting).”))

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silver (A) to incorporate the teachings of Silver (B) to train the neural network according to the loss function for the AlphaGo Zero game, at least because the same MCTS technique is applicable to chess.
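The loss quoted from Silver (B), l = (z − v)^2 − π^T log p + c||θ||^2, which the claimed loss mirrors, can be checked numerically. A small sketch; the constant c and all vectors are arbitrary stand-ins:

```python
import numpy as np

def loss(z, v, pi, p, theta, c=1e-4):
    # (z - v)^2  -  pi . log(p)  +  c * ||theta||^2
    return (z - v) ** 2 - pi @ np.log(p) + c * (theta @ theta)

pi = np.array([0.7, 0.2, 0.1])      # MCTS decision vector (training target)
p = np.array([0.6, 0.3, 0.1])       # network move vector (prediction)
theta = np.array([0.5, -0.3])       # stand-in parameter vector
print(round(float(loss(1.0, 0.8, pi, p, theta)), 4))   # 0.8687
```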
Regarding claim 6, Silver (A), in view of Silver (B), discloses the invention of claim 4 as discussed above. Silver (A) does not expressly disclose wherein the method of the Monte Carlo tree search comprises: selecting according to the following formula:

at = argmax_a ( Q[a] + Ct · pta · √NUM / (1 + N[a]) ), with Ct = log((NUM + cbase + 1) / cbase) + cinit

in which the scalar NUM represents the total number of times that the node St is accessed; cinit and cbase are two constants; Ct represents the exploration rate, the larger value indicates that the current Monte Carlo tree search trends to explore, the smaller value indicates that the current Monte Carlo tree search trends to select the best move according to the existing result; the move vector pt = (pt0, pt1, . . . , pta, . . . , ptT−1) is a 1×T-dimensional vector; the average value Q[a] indicates the average value obtained by selecting the move α in the current state St; the array of the access number is N[a] with length T; every Monte Carlo tree search returns when encountering unexpanded nodes, and recursively updates each node to the root node

(but see Silver (B) pg. 8 (“AlphaGo Zero uses a much simpler variant of the asynchronous policy and value MCTS algorithm (APV-MCTS) used in AlphaGo Fan and AlphaGo Lee. Each node s in the search tree contains edges (s, a) for all legal actions a ∈ A(s). Each edge stores a set of statistics, {N(s, a), W(s, a), Q(s, a), P(s, a)}, where N(s, a) is the visit count, W(s, a) is the total action value, Q(s, a) is the mean action value and P(s, a) is the prior probability of selecting that edge. Multiple simulations are executed in parallel on separate search threads. The algorithm proceeds by iterating over three phases (Fig. 2a–c), and then selects a move to play (Fig. 2d). Select (Fig. 2a). The selection phase is almost identical to AlphaGo Fan; we recapitulate here for completeness. The first in-tree phase of each simulation begins at the root node of the search tree, s0, and finishes when the simulation reaches a leaf node sL at timestep L. At each of these timesteps, t < L, an action is selected according to the statistics in the search tree, at = argmax_a (Q(st, a) + U(st, a)), using a variant of the PUCT algorithm, U(s, a) = cpuct P(s, a) √(Σb N(s, b)) / (1 + N(s, a)), where cpuct is a constant determining the level of exploration; this search control strategy initially prefers actions with high prior probability and low visit count, but asymptotically prefers actions with high action value.”)).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silver (A) to incorporate the teachings of Silver (B) to use the AlphaGo Zero tree search parameters for MCTS, at least because doing so initially prefers actions with high prior probability and low visit count, but asymptotically prefers actions with high action value.
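The selection rule quoted above, at = argmax_a (Q(st, a) + U(st, a)) with U scaling the prior by total root visits and down-weighting already-visited moves, is easy to sketch. The constant c_puct and the toy statistics below are made-up illustrations:

```python
import numpy as np

def select(Q, N, P, c_puct=1.25):
    # U(s, a) = c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))
    U = c_puct * P * np.sqrt(N.sum()) / (1.0 + N)
    return int(np.argmax(Q + U))

Q = np.array([0.10, 0.30, -0.20])   # mean action values Q(s, a)
N = np.array([10.0, 2.0, 0.0])      # visit counts N(s, a)
P = np.array([0.20, 0.30, 0.50])    # network priors P(s, a)
print(select(Q, N, P))              # 2: the unvisited, high-prior move wins early
```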
Regarding claim 7, Silver (A), in view of Silver (B), discloses the invention of claim 4 as discussed above. Silver (A) does not expressly disclose wherein the method of obtaining the corresponding decision vector πt according to the selected proportion of each move α under the root node after the Monte Carlo tree search is completed comprises: calculating the decision vector πt according to the following formula:

πta = N[a]^(1/τ) / Σb N[b]^(1/τ)

where τ ≤ 1 is the parameter that controls the degree of exploration, the larger value τ indicates that the current Monte Carlo tree trends to search, the smaller value τ indicates that the current Monte Carlo tree trends to select the best strategy; the decision vector πt = (πt0, πt1, . . . , πta, . . . , πtT−1) is a 1×T-dimensional vector; the array of the access number is N[a] with length T; and the scalar NUM represents the total number of times that the node St is accessed

(but see Silver (B) pg. 8, 2nd col. (“At the end of the search AlphaGo Zero selects a move a to play in the root position s0, proportional to its exponentiated visit count, π(a|s0) = N(s0, a)^(1/τ) / Σb N(s0, b)^(1/τ), where τ is a temperature parameter that controls the level of exploration. The search tree is reused at subsequent timesteps: the child node corresponding to the played action becomes the new root node; the subtree below this child is retained along with all its statistics, while the remainder of the tree is discarded. AlphaGo Zero resigns if its root value and best child value are lower than a threshold value vresign.”)).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silver (A) to incorporate the teachings of Silver (B) to select a move to play in the root position proportional to its exponentiated visit count, at least because doing so would enable using a single neural network instead of separate policy and value networks.
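The exponentiated-visit-count rule recited in claim 7 and quoted from Silver (B), πta = N[a]^(1/τ) / Σb N[b]^(1/τ), behaves as described: τ = 1 plays proportionally to visits, while small τ plays near-greedily. A short sketch with made-up visit counts:

```python
import numpy as np

def decision_vector(N, tau):
    # pi_a = N[a]^(1/tau) / sum_b N[b]^(1/tau)
    x = np.power(np.asarray(N, dtype=float), 1.0 / tau)
    return x / x.sum()

visits = [40, 25, 10]                       # visit counts under the root
print(decision_vector(visits, tau=1.0))     # ~[0.533 0.333 0.133]: proportional
print(decision_vector(visits, tau=0.1))     # ~[0.991 0.009 0.000]: near-greedy
```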
Regarding claim 8, Silver (A) discloses [a] chess self-learning device based on machine learning, wherein the device comprises: (see Silver (A) pg. 15 (“Each MCTS was executed on a single machine with 4 TPUs.”))

a network constructing module, which is configured to construct a neural network and randomly initialize parameters of the neural network; (see Silver (A) pg. 3 (“The parameters θ of the deep neural network in AlphaZero are trained by self-play reinforcement learning, starting from randomly initialized parameters θ.”))

a data generating module, which is configured to construct a Monte Carlo tree, initialize nodes of the Monte Carlo tree using the neural network, self-play by Monte Carlo tree search, generate game data, and store the game data; (see Silver (A) pg. 3 (“Instead of an alpha-beta search with domain-specific enhancements, AlphaZero uses a general purpose Monte-Carlo tree search (MCTS) algorithm. Each search consists of a series of simulated games of self-play that traverse a tree from root s_root to leaf. Each simulation proceeds by selecting in each state s a move a with low visit count, high move probability and high value (averaged over the leaf states of simulations that selected a from s) according to the current neural network fθ. The search returns a vector π representing a probability distribution over moves, either proportionally or greedily with respect to the visit counts at the root state.”))

a training module, which is configured to train the neural network using the stored game data; (see Silver (A) pg. 3 (“The parameters of the deep neural network in AlphaZero are trained by self-play reinforcement learning, starting from randomly initialised parameters θ. Games are played by selecting moves for both players by MCTS, at ∼ πt. At the end of the game, the terminal position sT is scored according to the rules of the game to compute the game outcome z: −1 for a loss, 0 for a draw, and +1 for a win. The neural network parameters are updated so as to minimise the error between the predicted outcome vt and the game outcome z, and to maximise the similarity of the policy vector pt to the search probabilities πt.”))

a converging module, which is configured to control the data generating module to stop generating game data and control the converging module to stop training when the neural network converges (see Silver (A) pg. 4 (“We applied the AlphaZero algorithm to chess, shogi, and also Go. Unless otherwise specified, the same algorithm settings, network architecture, and hyper-parameters were used for all three games. We trained a separate instance of AlphaZero for each game. Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs (15) to generate self-play games and 64 second-generation TPUs to train the neural networks.”)).

Silver (A) does not expressly disclose generating game data, and storing the game data; (but see Silver (B) pg. 8, 2nd col. (“At the end of the search AlphaGo Zero selects a move a to play in the root position s0, proportional to its exponentiated visit count, π(a|s0) = N(s0, a)^(1/τ) / Σb N(s0, b)^(1/τ), where τ is a temperature parameter that controls the level of exploration. The search tree is reused at subsequent timesteps: the child node corresponding to the played action becomes the new root node; the subtree below this child is retained along with all its statistics, while the remainder of the tree is discarded. AlphaGo Zero resigns if its root value and best child value are lower than a threshold value vresign.”)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silver (A) to incorporate the teachings of Silver (B) to retain the statistics of the subtree below the current child, at least because doing so would enable using a single neural network instead of separate policy and value networks.

Regarding claim 9, Silver (A) discloses [a] chess self-learning device based on machine learning, comprising a processor and a storage medium; wherein the storage medium is configured to store instructions; (see Silver (A) pg. 15 (“Each MCTS was executed on a single machine with 4 TPUs.”) (executing an MCTS algorithm on a machine with tensor processing units necessarily requires a medium configured to store instructions for the MCTS)) the processor is configured to operate according to the instructions to execute the steps of the method according to claim 1 (see detailed rejection of claim 1 above).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Silver (A) and Silver (B) as applied to claim 1 above, and further in view of Gorban (US 2018/0150726 A1; published May 31, 2018).

Regarding claim 10, Silver (A) discloses [a] computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, (see Silver (A) pg. 15 (“Each MCTS was executed on a single machine with 4 TPUs.”)) implements the steps of the method according to claim 1 (see detailed rejection of claim 1 above).
Silver (A) does not expressly disclose a computer readable storage medium on which a computer program is stored (but see Gorban ¶ 25 (“Other implementations may include a non-transitory computer readable storage medium storing instructions executable by a processor (e.g., a central processing unit (CPU), graphics processing unit (GPU), and/or Tensor Processing Unit (TPU)) to perform a method such as one or more of the methods described above. Yet another implementation may include a system of one or more computers that include one or more processors operable to execute stored instructions to perform a method such as one or more of the methods described above.”)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Silver (A) to incorporate the teachings of Gorban to store the MCTS program on a computer readable storage medium, at least because doing so would enable a processor to execute the stored instructions of the program.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Maddison, Chris J., et al. "Move evaluation in Go using deep convolutional neural networks." arXiv preprint arXiv:1412.6564 (2014).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAHID KHAN whose telephone number is (571) 270-0419. The examiner can normally be reached M-F, 9-5 EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed, can be reached at (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAHID K KHAN/
Primary Examiner, Art Unit 2146

Prosecution Timeline

Jun 22, 2022
Application Filed
Jan 24, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591768
DEEP LEARNING ACCELERATION WITH MIXED PRECISION
2y 5m to grant · Granted Mar 31, 2026
Patent 12579516
System and Method for Organizing and Designing Comment
2y 5m to grant · Granted Mar 17, 2026
Patent 12566813
SYSTEMS AND METHODS FOR RENDERING INTERACTIVE WEB PAGES
2y 5m to grant · Granted Mar 03, 2026
Patent 12547298
Display Method and Electronic Device
2y 5m to grant · Granted Feb 10, 2026
Patent 12530916
MULTIMODAL MULTITASK MACHINE LEARNING SYSTEM FOR DOCUMENT INTELLIGENCE TASKS
2y 5m to grant · Granted Jan 20, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
74%
Grant Probability
90%
With Interview (+15.7%)
2y 11m
Median Time to Grant
Low
PTA Risk
Based on 389 resolved cases by this examiner. Grant probability derived from career allow rate.
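For reference, the headline figures above combine as follows; the round-then-add order is an assumption inferred from the caption, not a documented formula:

```python
# Assumed derivation of the dashboard projections (rounding order inferred):
granted, resolved = 287, 389
base = round(100 * granted / resolved)        # 74  -> "74% Grant Probability"
lift = 15.7                                   # "+15.7% Interview Lift"
print(base, round(base + lift))               # 74 90 -> "90% With Interview"
```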
