DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1, 4, 12, 15, and 20 have been amended.
Claim 10 has been cancelled.
Claims 1 – 9 and 11 – 20 are pending.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
This subject matter eligibility analysis follows the latest Patent Subject Matter Eligibility Guidance.
Claims 1 - 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Step 1:
Claims 1 – 11 are drawn to a method.
Claims 12 – 19 are drawn to a system.
Claim 20 is drawn to a computer-readable medium (CRM).
Thus, initially, under Step 1 of the analysis, it is noted that the claims are directed towards eligible categories of subject matter.
Step 2A:
Prong 1: Does the Claim recite an Abstract idea, Law of Nature, or Natural Phenomenon?
Claims 12 - 20 are exemplary because they require substantially the same operative limitations as the remaining claims; claim 12 is reproduced below. The Examiner has italicized the claim limitations which recite the abstract idea, discussed in detail in the paragraphs that follow.
12. A decision model training apparatus comprising:
at least one memory configured to store program code; and
at least one processor configured to read the program code and operate as instructed by the program code, the program code comprising:
model obtaining code configured to cause the at least one processor to obtain model pools of virtual characters, the model pools comprising decision models of the virtual characters, and the decision models being used for indicating battle policies adopted by the virtual characters in battles;
model training code configured to cause the at least one processor to update and train nth decision models of the virtual characters based on battle data of a battle between the virtual characters in an nth iteration process to obtain n+1th decision models of the virtual characters, and add the n+1th decision models to model pools of corresponding virtual characters; and
model determination code configured to cause the at least one processor to determine, based on an iterative training end condition is satisfied as a result of battle winning variations of the virtual characters being smaller than a threshold, decision models obtained by the last round of training in the model pools as application decision models of the virtual characters.
The claims recite italicized limitations that fall within at least one of the groupings of abstract ideas enumerated in the 2019 PEG, namely, Mental Processes.
More specifically, under this grouping, the italicized limitations represent the training of decision models by means of iterative and mathematical optimization: obtaining model pools, updating and training decision models based on battle data, and determining an end condition based upon iterative training. These limitations fall under mental processes such as observing, evaluating, and judging the performance of a decision model.
Prong 2: Does the Claim recite additional elements that integrate the exception into a practical application of the exception?
Although the claims recite additional limitations, these limitations do not integrate the exception into a practical application of the exception. For example, the claims recite the following additional limitations (emphasis added): processor, memory, computer readable medium.
These additional limitations do not represent an improvement to the functioning of a computer, or to any other technology or technical field, (MPEP 2106.05(a)). Nor do they apply the exception using a particular machine, (MPEP 2106.05(b)). Furthermore, they do not effect a transformation. (MPEP 2106.05(c)). Rather, these additional limitations amount to an instruction to “apply” the judicial exception using a computer as a tool to perform the abstract idea. Therefore, since the additional limitations, individually or in combination, are indistinguishable from a computer used as a tool to perform the abstract idea, the analysis continues to Step 2B, below.
Step 2B:
Under Step 2B, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because they amount to conventional and routine computer implementation and mere instructions for implementing the abstract idea on generic computing devices.
For example, as pointed out above, the claimed invention recites additional elements facilitating implementation of the abstract idea. Applicant has claimed computer processors, memory, and storage mediums. However, all of these elements, viewed individually and as a whole, are indistinguishable from conventional computing elements known in the art. Therefore, the additional elements fail to yield significantly more than the underlying abstract idea.
As the Alice court cautioned, citing Flook, patent eligibility cannot depend simply on the draftsman’s art. Here, amending the claims with generic computing elements does not, in this Examiner’s opinion, confer eligibility.
Regarding the Berkheimer decision, Price (US 2022/0188623) establishes that these additional elements are generic:
[0034] The above-described methods can be implemented on a computer using well-known computer processors, memory units, storage devices, computer software, and other components. A high-level block diagram of such a computer is illustrated in FIG. 6. Computer 600 contains a processor 610, which controls the overall operation of the computer 600 by executing computer program instructions which define such operation. It is to be understood that the processor 610 can include any type of device capable of executing instructions. For example, the processor 610 may include one or more of a central processing unit (CPU), a graphical processing unit (GPU), a field-programmable gate array (FPGA), and an application-specific integrated circuit (ASIC). The computer program instructions may be stored in a storage device 620 (e.g., magnetic disk) and loaded into memory 630 when execution of the computer program instructions is desired. Thus, the steps of the methods described herein may be defined by the computer program instructions stored in the memory 630 and controlled by the processor 710 executing the computer program instructions. The computer 600 may include one or more network interfaces 650 for communicating with other devices via a network. The computer 600 also includes a user interface 660 that enable user interaction with the computer 600. The user interface 660 may include I/O devices 662 (e.g., keyboard, mouse, speakers, buttons, etc.) to allow the user to interact with the computer. Such input/output devices 662 may be used in conjunction with a set of computer programs to receive visual input and display the human understandable output in accordance with embodiments described herein. The user interface also includes a display 664. The computer may also include a receiver 615 configured to receive visual input from the user interface 660 or from the storage device 620. According to various embodiments, FIG. 
6 is a high-level representation of possible components of a computer for illustrative purposes and the computer may contain other components.
Therefore, taken alone or in combination, these additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea).
Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation.
Moreover, the claims do not recite improvements to another technology or technical field. Nor do the claims improve the functioning of the underlying computer itself; they merely recite generic computing elements. Furthermore, they do not effect a transformation of a particular article to a different state or thing: the underlying computing elements remain the same.
Concerning preemption, the Federal Circuit stated in Ariosa Diagnostics, Inc. v. Sequenom, Inc. (Fed. Cir. June 12, 2015):
The Supreme Court has made clear that the principle of preemption is the basis for the judicial exceptions to patentability. Alice, 134 S. Ct. at 2354 (“We have described the concern that drives this exclusionary principle as one of pre-emption”). For this reason, questions on preemption are inherent in and resolved by the § 101 analysis. The concern is that “patent law not inhibit further discovery by improperly tying up the future use of these building blocks of human ingenuity.” Id. (internal quotations omitted). In other words, patent claims should not prevent the use of the basic building blocks of technology—abstract ideas, naturally occurring phenomena, and natural laws. While preemption may signal patent ineligible subject matter, the absence of complete preemption does not demonstrate patent eligibility. In this case, Sequenom’s attempt to limit the breadth of the claims by showing alternative uses of cffDNA outside of the scope of the claims does not change the conclusion that the claims are directed to patent ineligible subject matter. Where a patent’s claims are deemed only to disclose patent ineligible subject matter under the Mayo framework, as they are in this case, preemption concerns are fully addressed and made moot. (Emphasis added.)
For these reasons, the claims are not patent-eligible under 35 U.S.C. § 101.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1 – 3, 8, 11 – 14, 19 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over “Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning” by Vinyals et al. (hereinafter “Vinyals”) in view of “Mastering the Game of Go without Human Knowledge” by Silver et al. (2017) (hereinafter “Silver”).
As per claim 1, Vinyals discloses:
obtaining model pools of virtual characters, the model pools comprising decision models of the virtual characters, and the decision models being used for indicating battle policies adopted by the virtual characters in battles; (Vinyals discloses the use of a reinforcement learning system that utilizes pools of agents comprising decision models of characters that are trained to battle against other agents. Vinyals discloses: “c, Three pools of agents, each initialized by supervised learning, were subsequently trained with reinforcement learning. As they train, these agents intermittently add copies of themselves—‘players’ that are frozen at a specific point—to the league. The main agents train against all of these past players, as well as themselves. The league exploiters train against all past players. The main exploiters train against the main agents. Main exploiters and league exploiters can be reset to the supervised agent when they add a player to the league. Images from StarCraft reproduced with permission from Blizzard Entertainment.”) (Vinyals page 351, Fig 1 Caption)
updating and training nth decision models of the virtual characters based on battle data of a battle between the virtual characters in an nth iteration process to obtain n+1th decision models of the virtual characters; (Vinyals discloses iterative training of decision models that are associated with agents based upon battle data results) (Vinyals page 351, Fig 1 Caption, Page 352, par 3)
adding the n+1th decision models to model pools of corresponding virtual characters; and (Vinyals discloses adding trained decision models to the model pools corresponding to the virtual characters by means of updating the models and adding copies of the models) (Vinyals page 351, Fig 1 Caption, page 356, par 8, 9 and 11 “Reinforcement Learning”)
determining, based on an iterative training end condition being satisfied,… decision models obtained by the last round of training in the model pools as application decision models of the virtual characters. (Vinyals discloses the determination of a match outcome such as a loss, draw or win when training the agent decision policies and further updating them) (Vinyals page 356, par 8, “Reinforcement Learning”)
Vinyals fails to disclose:
…as a result of battle winning rate variations of the virtual characters being smaller than a threshold…
However, in a similar field of endeavor, Silver discloses the selection of a decision model to be used as a baseline (end condition) for subsequent self-play generation iterations based upon a win rate meeting a threshold: “Each evaluation consists of 400 games, using an MCTS with 1,600 simulations to select each move, using an infinitesimal temperature τ→ 0 (that is, we deterministically select the move with maximum visit count, to give the strongest possible play). If the new player wins by a margin of > 55% (to avoid selecting on noise alone) then it becomes the best player αθ∗, and is subsequently used for self-play generation, and also becomes the baseline for subsequent comparisons.” (Silver page 8).
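For illustration only, Silver's evaluation gate can be outlined as a simple predicate; the name promote_if_better is an assumption for this sketch and is not code from the reference.

```python
def promote_if_better(candidate_wins, games_played, margin=0.55):
    """Silver's gating rule in outline: the candidate replaces the best
    player, and becomes the baseline for later comparisons, only if its
    win fraction exceeds the margin (55%), avoiding selection on noise."""
    return games_played > 0 and candidate_wins / games_played > margin
```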
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Vinyals in view of Silver to use a known technique to improve similar devices in the same way by specifying a win-rate threshold to determine when to stop the iterative training. This would be beneficial as it would enable the system to arrive at a final decision model with acceptable performance without expending processing power unnecessarily.
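For illustration only, the combined teaching relied upon above can be sketched as a training loop that stops once the win-rate variation between successive iterations falls below a threshold. All identifiers (train_one_iteration, evaluate_win_rate, and so on) are hypothetical and not drawn from Vinyals or Silver; the update step is a placeholder.

```python
import random

def train_one_iteration(model):
    # Placeholder for a reinforcement-learning update from battle data;
    # here the model is simply a win-rate estimate nudged upward with noise.
    return min(1.0, model + random.uniform(0.0, 0.05))

def evaluate_win_rate(model):
    # Placeholder evaluation: treat the model value itself as its win rate.
    return model

def train_until_converged(initial_model, threshold=0.01, max_iters=1000):
    """Iteratively train; stop when the win-rate variation between
    successive iterations is smaller than the threshold (the claimed
    end condition), and return the last-trained model."""
    model_pool = [initial_model]
    prev_rate = evaluate_win_rate(initial_model)
    for _ in range(max_iters):
        new_model = train_one_iteration(model_pool[-1])
        model_pool.append(new_model)           # add the n+1th model to the pool
        rate = evaluate_win_rate(new_model)
        if abs(rate - prev_rate) < threshold:  # end condition satisfied
            break
        prev_rate = rate
    return model_pool[-1]                      # application decision model
```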
As per claim 2, Vinyals discloses:
wherein the updating and training an nth decision model of the ith virtual character comprises:
performing mth model sampling from a model pool of a battle virtual character to obtain an mth battle decision model, the battle virtual character being a virtual character other than the ith virtual character among the virtual characters; controlling the ith virtual character based on an nth decision model optimized at an m-1th time and the mth battle decision model to battle against an mth battle virtual character to which the mth battle decision model belongs to obtain an mth battle result; performing parameter optimization on the nth decision model optimized at the m-1th time based on the mth battle result to obtain an nth decision model of the ith virtual character optimized at an mth time; stopping parameter optimization on the nth decision model of the ith virtual character based on a policy convergence condition being satisfied; and determining an nth decision model optimized at a last time as the n+1th decision model of the ith virtual character. (Vinyals discloses the iterative process of updating and training agent decision policies in a successive manner, comprising n+1th agent decision models of an ith character that are trained in battles against multiple different opponent characters. Vinyals discloses the updating of the agent decision policies to obtain an n+1th decision policy that is added to the pool of agent decision policies; Vinyals even copies an updated agent decision policy and adds that copy to the agent decision policy pool.) (Vinyals page 351, Fig 1 Caption, page 356, par 8, 9 and 11 “Reinforcement Learning”)
As per claim 3, Vinyals discloses:
wherein the updating and training an nth decision model of the ith virtual character comprises: performing mth model sampling from a model pool of a battle virtual character to obtain an mth battle decision model, the battle virtual character being a virtual character other than the ith virtual character among the virtual characters; controlling the ith virtual character based on an nth decision model optimized at an m-1th time and the mth battle decision model to battle against an mth battle virtual character to which the mth battle decision model belongs to obtain an mth battle result; performing parameter optimization on the nth decision model optimized at the m-1th time based on the mth battle result to obtain an nth decision model of the ith virtual character optimized at an mth time; stopping parameter optimization on the nth decision model of the ith virtual character based on a policy convergence condition being satisfied; and determining an nth decision model optimized at a last time as the n+1th decision model of the ith virtual character. (Vinyals discloses the use of game characters such as main agents and battle characters such as main exploiter agents that battle one another with each iteration. Vinyals discloses the sampling of the main exploiter agents from a pool of main exploiter agents, wherein, upon the outcome of the battle being realized, the main exploiter agents are also trained. The “main agents are encouraged to address their weaknesses” (i.e., optimizing the main agents’ decision policy to overcome their weaknesses) in each iteration.) (Vinyals pg. 352, par 3)
As per claim 8, Vinyals discloses:
wherein the controlling the ith virtual character to battle against an mth battle virtual character comprises: creating at least two battles; controlling the ith virtual character, based on the nth decision model optimized at the m-1th time and the mth battle decision model, to battle against the mth battle virtual character in the at least two battles to obtain at least two mth battle results; and performing parameter optimization on the nth decision model optimized at the m-1th time based on the at least two mth battle results to obtain the nth decision model of the ith virtual character optimized at the mth time. (Vinyals discloses the creation of successive battles (i.e., two battles between an agent and an opponent, wherein the agent character is controlled by an agent decision policy). Each battle has an outcome of a win, draw, or loss, and the agent’s decision policy is updated from the results of the first battle and optimized for the second battle against the opponent character.) (Vinyals page 351, Fig 1 Caption; page 351, par 5 – page 352, par 3; page 356, par 8, 9 and 11 “Reinforcement Learning”)
As per claim 11, Vinyals discloses:
the updating and training nth decision models comprises: updating and training first decision models of the virtual characters in a first iteration process to obtain second decision models of the virtual characters, the first decision models of the virtual characters being the general decision models. (Vinyals discloses the updating and training of decision models of characters in multiple iterations to obtain updated decision models (second models), wherein the first iteration trains a base decision model (i.e., general model) determined by a supervised learning process) (Vinyals page 351, par 4 – 5; page 351, Fig 1 Caption)
Independent claim(s) 12 and 20 is/are obvious over Vinyals and Silver based on the same analysis set forth for claim 1, which is similar in claim scope. Regarding the hardware limitations that recite memory, processors, and a CRM, the Examiner notes that Vinyals anticipates these limitations, as the system of Vinyals operates based upon the game application StarCraft II, which runs on a PC that inherently comprises a processor, memory, and a CRM. (Vinyals page 355, “Game and interface, Game Environment”)
Dependent claim(s) 13 is/are obvious over Vinyals and Silver based on the same analysis set forth for claim(s) 2, which are similar in claim scope.
Dependent claim(s) 14 is/are obvious over Vinyals and Silver based on the same analysis set forth for claim(s) 3, which are similar in claim scope.
Dependent claim(s) 19 is/are obvious over Vinyals and Silver based on the same analysis set forth for claim(s) 8, which are similar in claim scope.
Claim(s) 4 – 6 and 15 – 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over “Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning” by Vinyals et al. in view of “Mastering the Game of Go without Human Knowledge” by Silver et al. (2017) (hereinafter “Silver”), and further in view of “Regret Minimization in Games with Incomplete Information” by Zinkevich et al. (hereinafter “Zinkevich”).
As per claim 4, Vinyals discloses: performing mth character sampling from the battle virtual character to obtain the mth battle virtual character; and (Vinyals page 352, par 1 – 3)
Vinyals fails to disclose:
performing counterfactual regret minimization (CFR) sampling from a model pool of the mth battle virtual character to obtain the mth battle decision model.
However, in a similar field of endeavor, Zinkevich discloses the utilization of sampling by means of counterfactual regret minimization. (Zinkevich pages 3 – 4)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Vinyals in view of Zinkevich to use CFR sampling when determining an optimal decision model. As Zinkevich teaches, “The fundamental idea of our approach is to decompose overall regret into a set of additive regret terms, which can be minimized independently. In particular, we introduce a new regret concept for extensive games called counterfactual regret, which is defined on an individual information set. We show that overall regret is bounded by the sum of counterfactual regret, and also show how counterfactual regret can be minimized at each information set independently.” (Zinkevich page 4)
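As a rough illustration of the technique Zinkevich names (a sketch, not code from the reference), regret matching at a single information set selects actions in proportion to positive cumulative regret:

```python
def regret_matching(cumulative_regret):
    """Return a strategy (action probabilities) from cumulative regrets,
    as used at each information set in counterfactual regret minimization:
    actions are weighted by their positive regret; if no action has
    positive regret, play uniformly."""
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    n = len(cumulative_regret)
    if total > 0.0:
        return [p / total for p in positive]
    return [1.0 / n] * n
```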
As per claim 5, Vinyals discloses: wherein the performing mth character sampling comprises: sampling from the battle virtual character to obtain the mth battle virtual character based on an mth character weight of the battle virtual character; and sampling from the model pool of the mth battle virtual character to obtain the mth battle decision model based on mth model weights of decision models of the mth battle virtual character, the character weights and the model weights being positively correlated with a battle losing rate of the ith virtual character. (Vinyals discloses the use of a sampling probability wherein a higher losing rate equates to a higher sampling rate) (Vinyals page 358, “Prioritized fictitious self-play” and equation)
As per claim 6, Vinyals discloses: updating a first losing rate of the mth battle virtual character and a second losing rate of the mth battle decision model based on the mth battle result, the first losing rate referring to a losing rate of the ith virtual character based on the ith virtual character battling against a battle virtual character, and the second losing rate referring to a losing rate of the ith virtual character based on a battle decision model controlling the battle virtual character to battle against the ith virtual character; updating the mth character weight based on the first losing rate to obtain an m+1th character weight; and updating the mth model weight based on the second losing rate to obtain an m+1th model weight. (Vinyals discloses the updating of weights based upon simulated game outcomes) (Vinyals page 357)
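The cited prioritized sampling can be illustrated by a minimal sketch in which an opponent's sampling weight grows with the learner's losing rate against it. The linear weighting below is an assumption for illustration, not the equation of Vinyals page 358.

```python
import random

def sample_opponent(opponents, losing_rates):
    """Sample an opponent with weight positively correlated with the
    learner's losing rate against it, so harder opponents are faced
    more often (in the spirit of prioritized fictitious self-play)."""
    weights = [max(rate, 1e-6) for rate in losing_rates]  # avoid zero mass
    return random.choices(opponents, weights=weights, k=1)[0]
```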
Dependent claim(s) 15 – 17 is/are obvious over Vinyals, Silver, and Zinkevich based on the same analysis set forth for claim(s) 4 – 6, which are similar in claim scope.
Claim(s) 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over “Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning” by Vinyals et al. in view of “Mastering the Game of Go without Human Knowledge” by Silver et al. (2017) (hereinafter “Silver”), and further in view of Wu et al., CN115193053A (machine translation) (hereinafter “Wu”).
As per claim 9,
Vinyals fails to disclose:
wherein the stopping parameter optimization comprises: determining that the policy convergence condition is satisfied based on a battle winning rate variation of the ith virtual character being smaller than a first threshold; and stopping parameter optimization on the nth decision model of the ith virtual character. [claim 9]
However, in a similar field of endeavor, Wu discloses a system that uses a convergence condition that is satisfied based upon the win rate of the iterative training reaching a predetermined level, thus stopping the iterative training. (Wu machine translation, page 13, par 11)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Vinyals in view of Wu to provide a convergence condition that is based upon the win rate of the decision model. This would ensure that the system is able to determine the most efficient decision model that can be used to provide the optimal outcome in terms of strategy.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1 - 20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Please see above rejection in view of Silver addressing the newly added claim limitations.
With respect to the Applicant’s arguments that the claims are not directed towards a Mathematical Concept, the Examiner finds the Applicant’s arguments persuasive.
Regarding the rejection of the claims under 35 U.S.C. 101, the Applicant states: “Applicant respectfully submits that a human mind cannot practically update and train nth decision models of the virtual characters based on battle data of a battle between the virtual characters in an nth iteration process to obtain n+1th decision models of the virtual characters. Even with the aid of a pen and paper, at least a human mind cannot practically update and train nth decision models in an nth iteration process.” (Remarks page 12). The Examiner respectfully disagrees and notes that a human using pen and paper can indeed use iterative steps to simulate game outcomes to train and update decision models.
Applicant further states: “The claim as a whole integrates the alleged exception into a practical application, as evidenced by the Applicant's specification as originally filed ("Specification"). One way to demonstrate such integration is when the claimed invention improves the functioning of a computer or improves another technology or technical field." MPEP § 2106.04(d)(1). The Specification describes difficulties of adapting decision models for machine-based game play as virtual characters have different characteristics. See e.g., paragraphs [0003]-[0004]. Claim 1, when read in light of the Specification, solves the problem and provide certain advantages such as improvement of battle policies for different virtual characters, and contributing to improving the battle winning rates of the virtual characters in the battle based on the decision models by iteratively training decision models corresponding to different virtual characters for multiple rounds based on battle data of the virtual characters in a battle process. See id., at paragraphs [0043], [0067], and [0141].” (Remarks page 13). The Examiner respectfully disagrees: Applicant fails to provide any substantive reasoning as to how the actual functioning of the computer, or any other technology or technical field, is improved. The mere improving of the accuracy of an abstract decision model does not provide evidence of a practical application; this amounts to a mere improvement in the quality of the outcome by means of improved accuracy. This is a business or analytical improvement, not a technological improvement. The Examiner maintains the rejection.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROSS A WILLIAMS whose telephone number is (571)272-5911. The examiner can normally be reached Mon-Fri 8am - 4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kang Hu can be reached at (571)270-1344. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RAW/Examiner, Art Unit 3715 3/1/2026
/KANG HU/Supervisory Patent Examiner, Art Unit 3715