DETAILED ACTION
Claims 1-20 have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 U.S.C. § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. § 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 5, 7, 8, 12, 15, and 19 are rejected under 35 U.S.C. § 102(a)(1) as being anticipated by Dehesa, et al., Touché: Data-Driven Interactive Sword Fighting in Virtual Reality, CHI '20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 25–30 April 2020, pp. 1-15, in its entirety. Specifically:
Claim 1
Claim 1's “an interface; and” is anticipated by Dehesa, et al., page 7, right column, last partial paragraph and page 8, left column, first partial paragraph, where it recites:
We first conducted a questionnaire study, where participants were presented with short videos (one minute each) of sword fighting sequences against the characters described by each of the conditions, driven by similar player interactions. The videos are from the point of view of the player and show the character walking towards the player and engaging in sword fighting, attacking and defending.
Claim 1's “processing circuitry configured to execute a first neural network to:” is anticipated by Dehesa, et al., page 6, left column, second full paragraph, where it recites:
We chose to use a neural network as a model for this system, given their convenience and their success in comparable problems. We used an architecture inspired by the phase-functioned neural network by Holden et al. [27]. Instead, however, of training a single neural network, we train a collection, each specialised in particular aspects of the problem. We use a fixed network architecture, but train for multiple sets of weights (parameterisations).
Claim 1's “receive, via the interface, indications of movement of a player controlled by a user playing a video game application;” is anticipated by Dehesa, et al., page 1, caption for Figure 1 (images on the right), where it recites:
Figure 1: Left: Our framework splits the problem of simulating interactive VR sword fighting characters into a “physical” level, relying on data-driven models, and a “semantic” level, where designers can configure the behaviour of the character. Right: The framework generates responsive animations against player attacks (top), avoiding nonreactive behaviour from the character (bottom left). A neural network parameterised by the position of the player’s sword synthesises the animation (bottom right).
Claim 1's “program a first distance from the player and a second distance from the player, wherein the second distance is greater than the first distance; and” is anticipated by Dehesa, et al., page 7, right column, Figure 4, where it shows distance differences between the bones of different characters.
Claim 1's “implement a movement scheme for a first non-player character (NPC) indicating that the first NPC is to remain within a region created as a result of a difference between the first distance and the second distance from the player as the player moves; and” is anticipated by Dehesa, et al., page 5, left column, first full paragraph, where it recites:
Initially, the character is at a distance from the player and starts walking in their direction until the distance is small enough (IsClose), from where it can either defend or attack. While defending, the character will simply try to put their sword across the trajectory of the player’s sword. There are two circumstances that puts the character into attacking state: if no strikes from the user are detected for some random amount of time (RandomTimer), the character will proactively initiate an attack, or when a strike from the player is detected (PlayerAttacked), the character may react by counterattacking with a strike following the same direction, in an attempt to hit the unguarded area. For example, if the player performs a strike from right to left, then the right-hand side will be left unprotected, so the character may try to strike there.
Further, the distance differences are anticipated by Dehesa, et al., page 7, right column, Figure 4, where it shows distance differences between the bones of different characters.
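For illustration of the claim language only, the region created by the difference between the first distance and the second distance can be pictured as a ring around the player. The following Python sketch is not taken from Dehesa, et al. or from the application; the function name clamp_to_ring, the 2-D coordinates, and the nearest-point rule are all assumptions made for the example.

```python
import math


def clamp_to_ring(player, npc, first_distance, second_distance):
    """Move a hypothetical NPC to the nearest point whose distance from the
    player lies between first_distance and second_distance (inclusive)."""
    if not 0 <= first_distance < second_distance:
        raise ValueError("expected 0 <= first_distance < second_distance")

    dx = npc[0] - player[0]
    dy = npc[1] - player[1]
    dist = math.hypot(dx, dy)

    if dist == 0.0:
        # Direction is undefined when the NPC sits on the player; as an
        # arbitrary assumption, push it out along the +x axis.
        return (player[0] + first_distance, player[1])

    # Keep the direction from the player, but limit the distance to the ring.
    target = min(max(dist, first_distance), second_distance)
    scale = target / dist
    return (player[0] + dx * scale, player[1] + dy * scale)
```

Calling this once per frame with the player's current position would keep the NPC inside the ring as the player moves, which is one possible reading of the recited movement scheme.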
Claim 1's “wherein the apparatus is configured to render the first NPC into a user interface (UI) alongside the player following the movement scheme.” is anticipated by Dehesa, et al., page 5, right column, first full paragraph, where it recites:
Animation Synthesis
The animation synthesis module is responsible for determining the pose that the virtual character shall adopt in each frame. This module is driven by the directives emitted by the behaviour planning module, and can be seen as a translator of these directives into “physical” actions. Animation synthesis is also a data-driven component: it does not use complex state machines, but rather offers a menu of actions it may perform and acts them out, depending on the context. There are two kinds of actions that the model may perform: defending and attacking. Defending is the more complicated, as it is reactive and depends on what the player does. Therefore, the defence animation synthesis uses a machine learning model capable of generating the motion necessary to block arbitrary strikes from the user.
Claim 5
Claim 5's “receive feedback on whether behavior of the first NPC is appropriate; and” is anticipated by Dehesa, et al., page 6, right column, fourth full paragraph, where it recites:
Gesture Recognition
We can assess the accuracy of the gesture recognition system by measuring the mispredictions of the model. However, we also want to have an understanding of the cases in which those mispredictions are more frequent. We therefore analysed the errors of the neural network per gesture and also across the duration of each gesture. In particular, we want to make sure that gesture classification is more reliable in the middle sections of a gesture, allowing more room for error at the beginning and end, where the boundary of the gesture may not be as precisely defined.
Further, this limitation is anticipated by Dehesa, et al., page 5, right column, first full paragraph, where it recites:
Therefore, the defence animation synthesis uses a machine learning model capable of generating the motion necessary to block arbitrary strikes from the user. Attacks, on the other hand, are initiated by the character by request of the behaviour planner, which also indicates the kind of attack to perform. For this, we can simply use a collection of animation clips that are played out as necessary. This simplifies the system and gives designers precise control of what happens when the character attacks the user. The animation synthesis system smoothly blends between the machine learning model and the clips, fading the weight of each one in and out over a short period of time as the character transitions between attack and defence, so the overall animation appears as a continuous action.
For the defence animation synthesis, we start by collecting a set of motion data to train the model. We used Vicon Bonita equipment to record several sword fighting attacks and blocks at different angles. We produced about 15 minutes of training data in total, recorded at 30 frames per second (over 26000 frames), with each frame containing the pose of a 24-joint skeleton and the position of the tip of both swords. The data is split leaving 80% for training and the rest for evaluation.
Defence animation is generated one frame at a time. The model continuously predicts the next character pose using the current pose and the perceived control information from the user. In summary, the collection of features used as input is:
Claim 5's “train one or more parameters of the first neural network responsive to receiving the feedback on behavior of the first NPC.” is anticipated by Dehesa, et al., page 6, left column, second full paragraph, where it recites:
We chose to use a neural network as a model for this system, given their convenience and their success in comparable problems. We used an architecture inspired by the phase-functioned neural network by Holden et al. [27]. Instead, however, of training a single neural network, we train a collection, each specialised in particular aspects of the problem. We use a fixed network architecture, but train for multiple sets of weights (parameterisations).
Claim 7
Claim 7's “7. The apparatus as recited in claim 1, wherein the first neural network uses the first programmable amount (‘IsClose’?) and the second programmable amount of distance (‘NPC starting point’?) as parameters of the movement scheme that define (i) a region proximate to the player that the first NPC does not enter and (ii) a surrounding region bounded between the first and second amounts, and generates movement control data to keep the first NPC within the surrounding region as the player moves (‘change gestures’?).” is taught by Dehesa, et al., page 5, left column, first and second paragraphs, where it recites:
Initially, the character is at a distance from the player (i.e., the claimed “second programmable amount of distance”) and starts walking in their direction until the distance is small enough (IsClose) (i.e., the claimed “first programmable amount”), from where it can either defend or attack. While defending, the character will simply try to put their sword across the trajectory of the player’s sword. There are two circumstances that puts the character into attacking state: if no strikes from the user are detected for some random amount of time (RandomTimer), the character will proactively initiate an attack, or when a strike from the player is detected (PlayerAttacked), the character may react by counterattacking with a strike following the same direction, in an attempt to hit the unguarded area. For example, if the player performs a strike from right to left, then the right-hand side will be left unprotected, so the character may try to strike there.
When the character is attacking, it will usually continue the attack until it is completed (AttackFinished), and then come back to defend again, but the attack may also be aborted early. This will always happen when the player hits the character in the middle of an attack (HitByPlayer). But the character may also interrupt an attack willingly if an incoming strike from the player is detected (PlayerAttacking) (i.e., the claimed “as the player moves”), quickly coming back to defending in an attempt to block the attack.
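The behaviour described in the two quoted paragraphs can be read as a small state machine. The Python sketch below is offered only as an aid to reading the passage and is not code from Dehesa, et al. The event names (IsClose, RandomTimer, PlayerAttacked, AttackFinished, HitByPlayer, PlayerAttacking) come from the quoted text; the state names, the transition table, and the helper functions are assumptions. Two simplifications are also assumed: the character always begins defending once IsClose fires, and every transition the passage describes as something the character "may" do is treated as always taken.

```python
from enum import Enum, auto


class State(Enum):
    APPROACHING = auto()  # walking towards the player
    DEFENDING = auto()    # placing the sword across the player's trajectory
    ATTACKING = auto()    # performing a strike


# (current state, event) -> next state; any pair not listed leaves the
# state unchanged.
TRANSITIONS = {
    (State.APPROACHING, "IsClose"): State.DEFENDING,
    (State.DEFENDING, "RandomTimer"): State.ATTACKING,
    (State.DEFENDING, "PlayerAttacked"): State.ATTACKING,
    (State.ATTACKING, "AttackFinished"): State.DEFENDING,
    (State.ATTACKING, "HitByPlayer"): State.DEFENDING,
    (State.ATTACKING, "PlayerAttacking"): State.DEFENDING,
}


def step(state, event):
    """Return the state that follows `state` when `event` is observed."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.APPROACHING):
    """Apply a sequence of events in order and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

For example, run(["IsClose", "PlayerAttacked", "HitByPlayer"]) ends in State.DEFENDING under these assumptions, because the counterattack is aborted when the player hits the character.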
Claim 8
Claim 8's “receiving, by processing circuitry executing a first neural network, indications of movement of a player controlled by a user playing a video game application;” is anticipated by Dehesa, et al., page 1, caption for Figure 1 (images on the right), where it recites:
Figure 1: Left: Our framework splits the problem of simulating interactive VR sword fighting characters into a “physical” level, relying on data-driven models, and a “semantic” level, where designers can configure the behaviour of the character. Right: The framework generates responsive animations against player attacks (top), avoiding nonreactive behaviour from the character (bottom left). A neural network parameterised by the position of the player’s sword synthesises the animation (bottom right).
Further, the claimed neural network is anticipated by Dehesa, et al., page 6, left column, second full paragraph, where it recites:
We chose to use a neural network as a model for this system, given their convenience and their success in comparable problems. We used an architecture inspired by the phase-functioned neural network by Holden et al. [27]. Instead, however, of training a single neural network, we train a collection, each specialised in particular aspects of the problem. We use a fixed network architecture, but train for multiple sets of weights (parameterisations).
Claim 8's “programming, by the processing circuitry, a first distance from the player and a second distance from the player, wherein the second distance is greater than the first distance; and” is anticipated by Dehesa, et al., page 7, right column, Figure 4, where it shows distance differences between the bones of different characters.
Claim 8's “implementing, by the processing circuitry, a movement scheme for a first non-player character (NPC), indicating that the first NPC is to remain within a region created as a result of a difference between the first distance and the second distance from the player as the player moves; and” is anticipated by Dehesa, et al., page 5, left column, first full paragraph, where it recites:
Initially, the character is at a distance from the player and starts walking in their direction until the distance is small enough (IsClose), from where it can either defend or attack. While defending, the character will simply try to put their sword across the trajectory of the player’s sword. There are two circumstances that puts the character into attacking state: if no strikes from the user are detected for some random amount of time (RandomTimer), the character will proactively initiate an attack, or when a strike from the player is detected (PlayerAttacked), the character may react by counterattacking with a strike following the same direction, in an attempt to hit the unguarded area. For example, if the player performs a strike from right to left, then the right-hand side will be left unprotected, so the character may try to strike there.
Further, the distance differences are anticipated by Dehesa, et al., page 7, right column, Figure 4, where it shows distance differences between the bones of different characters.
Claim 8's “implementing a movement scheme for a first non-player character (NPC) to remain in relatively close proximity to the player without invading a first programmable amount of distance from the player;” is anticipated by Dehesa, et al., page 5, left column, first full paragraph, where it recites:
Initially, the character is at a distance from the player and starts walking in their direction until the distance is small enough (IsClose), from where it can either defend or attack. While defending, the character will simply try to put their sword across the trajectory of the player’s sword. There are two circumstances that puts the character into attacking state: if no strikes from the user are detected for some random amount of time (RandomTimer), the character will proactively initiate an attack, or when a strike from the player is detected (PlayerAttacked), the character may react by counterattacking with a strike following the same direction, in an attempt to hit the unguarded area. For example, if the player performs a strike from right to left, then the right-hand side will be left unprotected, so the character may try to strike there.
Claim 8's “causing the first NPC to not exceed a second programmable amount of distance from the player, wherein the second programmable amount of distance is greater than the first programmable amount of distance; and” is anticipated by Dehesa, et al., page 5, left column, first full paragraph, where it recites:
Initially, the character is at a distance from the player and starts walking in their direction until the distance is small enough (IsClose), from where it can either defend or attack. While defending, the character will simply try to put their sword across the trajectory of the player’s sword. There are two circumstances that puts the character into attacking state: if no strikes from the user are detected for some random amount of time (RandomTimer), the character will proactively initiate an attack, or when a strike from the player is detected (PlayerAttacked), the character may react by counterattacking with a strike following the same direction, in an attempt to hit the unguarded area. For example, if the player performs a strike from right to left, then the right-hand side will be left unprotected, so the character may try to strike there.
Note that the claimed second distance is taught by the distance at which one sword crosses the other.
Claim 8's “rendering, by the processing circuitry, the first NPC into a user interface (UI) alongside the player following the movement scheme.” is anticipated by Dehesa, et al., page 5, left column, first full paragraph, where it recites:
Initially, the character is at a distance from the player and starts walking in their direction until the distance is small enough (IsClose), from where it can either defend or attack. While defending, the character will simply try to put their sword across the trajectory of the player’s sword. There are two circumstances that puts the character into attacking state: if no strikes from the user are detected for some random amount of time (RandomTimer), the character will proactively initiate an attack, or when a strike from the player is detected (PlayerAttacked), the character may react by counterattacking with a strike following the same direction, in an attempt to hit the unguarded area. For example, if the player performs a strike from right to left, then the right-hand side will be left unprotected, so the character may try to strike there.
Claim 12
Claim 12's “receiving feedback on whether behavior of the first NPC is appropriate; and” is anticipated by Dehesa, et al., page 6, right column, fourth full paragraph, where it recites:
Gesture Recognition
We can assess the accuracy of the gesture recognition system by measuring the mispredictions of the model. However, we also want to have an understanding of the cases in which those mispredictions are more frequent. We therefore analysed the errors of the neural network per gesture and also across the duration of each gesture. In particular, we want to make sure that gesture classification is more reliable in the middle sections of a gesture, allowing more room for error at the beginning and end, where the boundary of the gesture may not be as precisely defined.
Further, this limitation is anticipated by Dehesa, et al., page 5, right column, first full paragraph, where it recites:
Therefore, the defence animation synthesis uses a machine learning model capable of generating the motion necessary to block arbitrary strikes from the user. Attacks, on the other hand, are initiated by the character by request of the behaviour planner, which also indicates the kind of attack to perform. For this, we can simply use a collection of animation clips that are played out as necessary. This simplifies the system and gives designers precise control of what happens when the character attacks the user. The animation synthesis system smoothly blends between the machine learning model and the clips, fading the weight of each one in and out over a short period of time as the character transitions between attack and defence, so the overall animation appears as a continuous action.
For the defence animation synthesis, we start by collecting a set of motion data to train the model. We used Vicon Bonita equipment to record several sword fighting attacks and blocks at different angles. We produced about 15 minutes of training data in total, recorded at 30 frames per second (over 26000 frames), with each frame containing the pose of a 24-joint skeleton and the position of the tip of both swords. The data is split leaving 80% for training and the rest for evaluation.
Defence animation is generated one frame at a time. The model continuously predicts the next character pose using the current pose and the perceived control information from the user. In summary, the collection of features used as input is:
Claim 12's “training one or more parameters of the first neural network responsive to receiving the feedback on behavior of the first NPC.” is anticipated by Dehesa, et al., page 6, left column, second full paragraph, where it recites:
We chose to use a neural network as a model for this system, given their convenience and their success in comparable problems. We used an architecture inspired by the phase-functioned neural network by Holden et al. [27]. Instead, however, of training a single neural network, we train a collection, each specialised in particular aspects of the problem. We use a fixed network architecture, but train for multiple sets of weights (parameterisations).
Further, this limitation is anticipated by Dehesa, et al., page 5, right column, first full paragraph, where it recites:
Therefore, the defence animation synthesis uses a machine learning model capable of generating the motion necessary to block arbitrary strikes from the user. Attacks, on the other hand, are initiated by the character by request of the behaviour planner, which also indicates the kind of attack to perform. For this, we can simply use a collection of animation clips that are played out as necessary. This simplifies the system and gives designers precise control of what happens when the character attacks the user. The animation synthesis system smoothly blends between the machine learning model and the clips, fading the weight of each one in and out over a short period of time as the character transitions between attack and defence, so the overall animation appears as a continuous action.
For the defence animation synthesis, we start by collecting a set of motion data to train the model. We used Vicon Bonita equipment to record several sword fighting attacks and blocks at different angles. We produced about 15 minutes of training data in total, recorded at 30 frames per second (over 26000 frames), with each frame containing the pose of a 24-joint skeleton and the position of the tip of both swords. The data is split leaving 80% for training and the rest for evaluation.
Defence animation is generated one frame at a time. The model continuously predicts the next character pose using the current pose and the perceived control information from the user. In summary, the collection of features used as input is:
Claim 15
Claim 15's “receive indications of movement of a player controlled by a user playing a video game application;” is anticipated by Dehesa, et al., page 1, caption for Figure 1 (images on the right), where it recites:
Figure 1: Left: Our framework splits the problem of simulating interactive VR sword fighting characters into a “physical” level, relying on data-driven models, and a “semantic” level, where designers can configure the behaviour of the character. Right: The framework generates responsive animations against player attacks (top), avoiding nonreactive behaviour from the character (bottom left). A neural network parameterised by the position of the player’s sword synthesises the animation (bottom right).
Claim 15's “program a first distance from the player and a second distance from the player, wherein the second distance is greater than the first distance; and” is anticipated by Dehesa, et al., page 7, right column, Figure 4, where it shows distance differences between the bones of different characters.
Claim 15's “implement a movement scheme for a first non-player character (NPC) indicating that the first NPC is to remain within a region created as a result of a difference between the first distance and the second distance from the player as the player moves; and” is anticipated by Dehesa, et al., page 5, left column, first full paragraph, where it recites:
Initially, the character is at a distance from the player and starts walking in their direction until the distance is small enough (IsClose), from where it can either defend or attack. While defending, the character will simply try to put their sword across the trajectory of the player’s sword. There are two circumstances that puts the character into attacking state: if no strikes from the user are detected for some random amount of time (RandomTimer), the character will proactively initiate an attack, or when a strike from the player is detected (PlayerAttacked), the character may react by counterattacking with a strike following the same direction, in an attempt to hit the unguarded area. For example, if the player performs a strike from right to left, then the right-hand side will be left unprotected, so the character may try to strike there.
Further, the distance differences are anticipated by Dehesa, et al., page 7, right column, Figure 4, where it shows distance differences between the bones of different characters.
Claim 15's “a rendering engine configured to render the first NPC into a user interface (UI) alongside the player following the movement scheme enforced by the first neural network.” is anticipated by Dehesa, et al., page 5, left column, first full paragraph, where it recites:
Initially, the character is at a distance from the player and starts walking in their direction until the distance is small enough (IsClose), from where it can either defend or attack. While defending, the character will simply try to put their sword across the trajectory of the player’s sword. There are two circumstances that puts the character into attacking state: if no strikes from the user are detected for some random amount of time (RandomTimer), the character will proactively initiate an attack, or when a strike from the player is detected (PlayerAttacked), the character may react by counterattacking with a strike following the same direction, in an attempt to hit the unguarded area. For example, if the player performs a strike from right to left, then the right-hand side will be left unprotected, so the character may try to strike there.
Further, the claimed neural network is anticipated by Dehesa, et al., page 6, left column, second full paragraph, where it recites:
We chose to use a neural network as a model for this system, given their convenience and their success in comparable problems. We used an architecture inspired by the phase-functioned neural network by Holden et al. [27]. Instead, however, of training a single neural network, we train a collection, each specialised in particular aspects of the problem. We use a fixed network architecture, but train for multiple sets of weights (parameterisations).
Claim 19
Claim 19's “receive feedback on whether behavior of the first NPC is appropriate; and” is anticipated by Dehesa, et al., page 6, right column, fourth full paragraph, where it recites:
Gesture Recognition
We can assess the accuracy of the gesture recognition system by measuring the mispredictions of the model. However, we also want to have an understanding of the cases in which those mispredictions are more frequent. We therefore analysed the errors of the neural network per gesture and also across the duration of each gesture. In particular, we want to make sure that gesture classification is more reliable in the middle sections of a gesture, allowing more room for error at the beginning and end, where the boundary of the gesture may not be as precisely defined.
Further, this limitation is anticipated by Dehesa, et al., page 5, right column, first full paragraph, where it recites:
Therefore, the defence animation synthesis uses a machine learning model capable of generating the motion necessary to block arbitrary strikes from the user. Attacks, on the other hand, are initiated by the character by request of the behaviour planner, which also indicates the kind of attack to perform. For this, we can simply use a collection of animation clips that are played out as necessary. This simplifies the system and gives designers precise control of what happens when the character attacks the user. The animation synthesis system smoothly blends between the machine learning model and the clips, fading the weight of each one in and out over a short period of time as the character transitions between attack and defence, so the overall animation appears as a continuous action.
For the defence animation synthesis, we start by collecting a set of motion data to train the model. We used Vicon Bonita equipment to record several sword fighting attacks and blocks at different angles. We produced about 15 minutes of training data in total, recorded at 30 frames per second (over 26000 frames), with each frame containing the pose of a 24-joint skeleton and the position of the tip of both swords. The data is split leaving 80% for training and the rest for evaluation.
Defence animation is generated one frame at a time. The model continuously predicts the next character pose using the current pose and the perceived control information from the user. In summary, the collection of features used as input is:
Claim 19's “train one or more parameters of a first neural network responsive to receiving the feedback on behavior of the first NPC.” is anticipated by Dehesa, et al., page 6, left column, second full paragraph, where it recites:
We chose to use a neural network as a model for this system, given their convenience and their success in comparable problems. We used an architecture inspired by the phase-functioned neural network by Holden et al. [27]. Instead, however, of training a single neural network, we train a collection, each specialised in particular aspects of the problem. We use a fixed network architecture, but train for multiple sets of weights (parameterisations).
Further, this limitation is anticipated by Dehesa, et al., page 5, right column, first full paragraph, where it recites:
Therefore, the defence animation synthesis uses a machine learning model capable of generating the motion necessary to block arbitrary strikes from the user. Attacks, on the other hand, are initiated by the character by request of the behaviour planner, which also indicates the kind of attack to perform. For this, we can simply use a collection of animation clips that are played out as necessary. This simplifies the system and gives designers precise control of what happens when the character attacks the user. The animation synthesis system smoothly blends between the machine learning model and the clips, fading the weight of each one in and out over a short period of time as the character transitions between attack and defence, so the overall animation appears as a continuous action.
For the defence animation synthesis, we start by collecting a set of motion data to train the model. We used Vicon Bonita equipment to record several sword fighting attacks and blocks at different angles. We produced about 15 minutes of training data in total, recorded at 30 frames per second (over 26000 frames), with each frame containing the pose of a 24-joint skeleton and the position of the tip of both swords. The data is split leaving 80% for training and the rest for evaluation.
Defence animation is generated one frame at a time. The model continuously predicts the next character pose using the current pose and the perceived control information from the user. In summary, the collection of features used as input is:
Claim Rejections - 35 U.S.C. § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 6, 13-14, and 20 are rejected under 35 U.S.C. § 103 as being unpatentable over Dehesa, et al., Touché: Data-Driven Interactive Sword Fighting in Virtual Reality, CHI '20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 25–30 April 2020, pp. 1-15, in view of Lee, et al., Precomputing Avatar Behavior From Human Motion Data, Graphical Models, Volume 68, Issue 2, March 2006, Pages 158-174, in their entireties. Specifically:
Claim 6
Claim 6's ''The apparatus as recited in claim 1, wherein the processing circuitry is configured to execute a reinforcement learning application to:'' is not expressly taught by the structure of Dehesa, et al.
It is, however, taught by Lee, et al., page 165, last full paragraph, where it recites:
4.2. Dynamic programming
Computing a control policy is a simple iterative process of sampling states and applying a local update rule to incrementally refine values in the table. On each iteration, we randomly choose a pose s of the avatar and a target e among grid points. The avatar needs to decide which action to take among a set of actions immediately available to the avatar at state s. A greedy policy is to select the one that gains the highest reward in one step. Taking action a brings the avatar to state s′ and the target to a new location e′ (since the location is represented with respect to a local moving coordinate system). According to the greedy policy, the value at (s, e) should be updated to reflect the immediate reward and the value of the next state (s′, e′), using the best available action. This process is called value iteration in reinforcement learning community and can be shown to converge to the optimal values. We repeat the process until all states have been visited dozens of times.
Rationale -- It would have been obvious for one of ordinary skill in the art, as of the effective filing date, to substitute the reinforcement learning system of Lee, et al., for the neural network of Dehesa, et al. because both AI systems are suitable for making the decisions required of the system.
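For illustration only, the sampled value-iteration update described in the Lee, et al. passage quoted above may be sketched in Python as follows. The one-dimensional toy world with a fixed target, the reward of 1 for reaching the target, the discount factor, and every function name are assumptions made for this sketch; none of them appears in Lee, et al., whose states are (pose, target) pairs.

```python
import random

def value_iteration(states, actions, step, reward, gamma=0.9, visits=50, seed=0):
    """Sample states at random and apply the local update rule to a value table.

    `visits` is the average number of updates per state (assumed value).
    """
    rng = random.Random(seed)
    value = {s: 0.0 for s in states}
    for _ in range(visits * len(states)):
        s = rng.choice(states)
        # Immediate reward plus discounted value of the next state,
        # using the best available action.
        value[s] = max(reward(s, a) + gamma * value[step(s, a)] for a in actions(s))
    return value

def greedy_action(value, s, actions, step, reward, gamma=0.9):
    """Select the action with the highest one-step lookahead value."""
    return max(actions(s), key=lambda a: reward(s, a) + gamma * value[step(s, a)])

# Hypothetical toy world: positions 0..4 on a line, target fixed at 4.
STATES = list(range(5))
TARGET = 4

def actions(s):
    return (1, -1)  # move toward or away from the target

def step(s, a):
    return min(max(s + a, 0), TARGET)  # clamp to the line

def reward(s, a):
    return 1.0 if step(s, a) == TARGET else 0.0

VALUES = value_iteration(STATES, actions, step, reward)
```

In this toy setting, calling greedy_action(VALUES, 0, actions, step, reward) selects the move toward the target; the table lookup replaces any search at run time.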
Claim 6's ''access, from the first neural network, data associated with features based on game scenarios encountered in an environment sequence; and'' is anticipated by Dehesa, et al., page 3, left column, first partial paragraph, where it recites:
Given the limited complexity of our features (two hand sensors and one head sensor), we chose neural networks to implement gesture recognition in our framework for their simplicity and proven effectiveness.
Claim 6's ''select a next action for the first NPC based on the accessed data.'' is anticipated by Dehesa, et al., page 4, left column, third full paragraph, where it recites:
We split the problem into two levels. The physical level represents actual events and information in the virtual world. For our purposes, this means essentially input from the player and character animation. It is difficult to reason directly with this data, as it is mostly 3D geometrical information with little structure. We therefore use data-driven models to project it onto a semantic level, where the information is seen as discrete labels for virtual world events. On the player’s side, a gesture recognition system interprets raw 3D input as sword fighting gestures. It becomes now simple to define custom logic to decide how to react to the actions of the user. This is done by the human-designed behaviour planning module.
Claim 13
Claim 13's ''executing, by the processing circuitry a reinforcement learning engine, to access, from the first neural network, data associated with features based on the game scenarios encountered in an environment sequence; and'' is not expressly taught by the structure of Dehesa, et al.
It is, however, taught by Lee, et al., page 165, last full paragraph, where it recites:
4.2. Dynamic programming
Computing a control policy is a simple iterative process of sampling states and applying a local update rule to incrementally refine values in the table. On each iteration, we randomly choose a pose s of the avatar and a target e among grid points. The avatar needs to decide which action to take among a set of actions immediately available to the avatar at state s. A greedy policy is to select the one that gains the highest reward in one step. Taking action a brings the avatar to state s′ and the target to a new location e′ (since the location is represented with respect to a local moving coordinate system). According to the greedy policy, the value at (s, e) should be updated to reflect the immediate reward and the value of the next state (s′, e′), using the best available action. This process is called value iteration in reinforcement learning community and can be shown to converge to the optimal values. We repeat the process until all states have been visited dozens of times.
Rationale -- It would have been obvious for one of ordinary skill in the art, as of the effective filing date, to substitute the reinforcement learning system of Lee, et al., for the neural network of Dehesa, et al. because both AI systems are suitable for making the decisions required of the system.
Claim 13's ''selecting a next action for the first NPC based on the features.'' is anticipated by Dehesa, et al., page 4, left column, third full paragraph, where it recites:
We split the problem into two levels. The physical level represents actual events and information in the virtual world. For our purposes, this means essentially input from the player and character animation. It is difficult to reason directly with this data, as it is mostly 3D geometrical information with little structure. We therefore use data-driven models to project it onto a semantic level, where the information is seen as discrete labels for virtual world events. On the player’s side, a gesture recognition system interprets raw 3D input as sword fighting gestures. It becomes now simple to define custom logic to decide how to react to the actions of the user. This is done by the human-designed behaviour planning module.
Claim 14
Claim 14's ''receiving a personality score generated based on whether behavior of the first NPC matches an assigned personality and mood; and'' is anticipated by Dehesa, et al., page 3, left column, last full paragraph, where it recites:
For our purposes, we implement a basic behavioural planner based on a simple state machine in our framework, although our approach could support more sophisticated reasoning models to model scenarios that require them.
Claim 14's ''training one or more parameters of the first neural network based on the personality score.'' is anticipated by Dehesa, et al., page 4, right column, first partial paragraph, where it recites:
We trained a neural network as the basis of our recogniser. To this end, we first collected a dataset of gestural actions. This was done in a separate VR environment, where the player was presented with a signal indicating a gesture to perform, which they then did while pressing a button on the hand controller.
Claim 20
Claim 20's ''The system as recited in claim 15, wherein the processor is further to:'' is not expressly taught by the structure of Dehesa, et al.
It is, however, taught by Lee, et al., page 165, last full paragraph, where it recites:
4.2. Dynamic programming
Computing a control policy is a simple iterative process of sampling states and applying a local update rule to incrementally refine values in the table. On each iteration, we randomly choose a pose s of the avatar and a target e among grid points. The avatar needs to decide which action to take among a set of actions immediately available to the avatar at state s. A greedy policy is to select the one that gains the highest reward in one step. Taking action a brings the avatar to state s′ and the target to a new location e′ (since the location is represented with respect to a local moving coordinate system). According to the greedy policy, the value at (s, e) should be updated to reflect the immediate reward and the value of the next state (s′, e′), using the best available action. This process is called value iteration in reinforcement learning community and can be shown to converge to the optimal values. We repeat the process until all states have been visited dozens of times.
Rationale -- It would have been obvious for one of ordinary skill in the art, as of the effective filing date, to substitute the reinforcement learning system of Lee, et al., for the neural network of Dehesa, et al. because both AI systems are suitable for making the decisions required of the system.
Claim 20's ''execute a reinforcement learning application to access, from the first neural network, data associated with features based on the game scenarios encountered in an environment sequence; and'' is not expressly taught by Dehesa, et al.
It is, however, taught by Lee, et al., page 165, last full paragraph, where it recites:
4.2. Dynamic programming
Computing a control policy is a simple iterative process of sampling states and applying a local update rule to incrementally refine values in the table. On each iteration, we randomly choose a pose s of the avatar and a target e among grid points. The avatar needs to decide which action to take among a set of actions immediately available to the avatar at state s. A greedy policy is to select the one that gains the highest reward in one step. Taking action a brings the avatar to state s′ and the target to a new location e′ (since the location is represented with respect to a local moving coordinate system). According to the greedy policy, the value at (s, e) should be updated to reflect the immediate reward and the value of the next state (s′, e′), using the best available action. This process is called value iteration in reinforcement learning community and can be shown to converge to the optimal values. We repeat the process until all states have been visited dozens of times.
Rationale -- It would have been obvious for one of ordinary skill in the art, as of the effective filing date, to substitute the reinforcement learning system of Lee, et al., for the neural network of Dehesa, et al. because both AI systems are suitable for making the decisions required of the system.
Claim 20's ''select a next action for the first NPC based on the accessed data.'' is anticipated by Dehesa, et al., page 4, left column, third full paragraph, where it recites:
We split the problem into two levels. The physical level represents actual events and information in the virtual world. For our purposes, this means essentially input from the player and character animation. It is difficult to reason directly with this data, as it is mostly 3D geometrical information with little structure. We therefore use data-driven models to project it onto a semantic level, where the information is seen as discrete labels for virtual world events. On the player’s side, a gesture recognition system interprets raw 3D input as sword fighting gestures. It becomes now simple to define custom logic to decide how to react to the actions of the user. This is done by the human-designed behaviour planning module.
Allowable Subject Matter
Claims 2-4, 9-11 and 16-18 are allowed. Specifically, for:
Claims 2, 9, and 16, the cited prior art does not teach a score representative of truthfulness of information contained in the output, and wherein the score includes metadata indicating a time when the output was received and information about the first NPC;
Claims 3, 10, and 17, the cited prior art does not teach the second neural network has a different complexity level from the first neural network;
Claims 4, 11, and 18, the cited prior art does not teach “a friend threshold”, nor “a foe threshold”.
Response to Arguments
Applicant's arguments filed 04 SEP 2025 have been fully considered but they are not persuasive. Specifically, Applicant argues:
Argument 1
Claim 1 recites an apparatus comprising circuitry configured to program two distances from a player, a first distance and a second distance that is greater than the first distance, and implement a movement scheme that causes the NPC to remain within a region created by the difference between those two distances as the player moves. The present Office Action maps the recited "region" to Dehesa's "IsClose" condition and maps the "two distances" to Figure 4 of Dehesa. See Office Action at pp. 12-13 (IsClose) and pp. 20-21 (IsClose + Fig. 4). The Office Action further states: "[t]he region in the prior art is '(IsClose)."' and relies again on Figure 4 for the "distance differences." Applicant respectfully submits the cited disclosures of Dehesa are not equivalent to the claim features for at least the following reasons.
Dehesa discloses a single threshold ("IsClose"), not two programmed distances with a region (band/annulus) in which the NPC must remain. Dehesa's behavior diagram and disclosure explicitly describe the character "walking in [the player's] direction until the distance is small enough (IsClose)"-a single threshold to engage, not a bounded region defined by the difference between two programmed distances that is continuously enforced as the player moves.
In the prior art, there are two distances the opponent stays between: 1) the initial location of the character from which it approaches in order to fight; 2) the “IsClose” distance at which the characters begin to engage.
In the claims, there is no limit on the claimed “first distance.” It may be as small as zero. There is no limit on the claimed “second distance.” It may be infinite. There is no limit on the difference between the claimed “first distance” and “second distance” other than that the second is greater than the first. That difference may be as small as a pixel…or smaller.
Further, regarding the argued limitation “band/annulus,” neither “band” nor “annulus” appears in the Specification. Not even a “radius” appears in the Specification. Nor are these terms claimed. There is no indication that the claimed distances refer to a circular planar segment.
Applicant's argument is unpersuasive.
The rejections stand.
Argument 2
In addition, Dehesa Figure 4 is not NPC-to-player distance at all. Rather, it defines a pose- difference metric measured as areas between corresponding bones of the same character across poses. Accordingly, it says nothing about programming two distances from the player or maintaining an NPC within an annulus relative to a moving player.
Accordingly, Dehesa does not disclose (i) programming two distances from the player nor (ii) and does not disclose a movement scheme that keeps the NPC within the region created by their difference. For at least these reasons claim 1 is not anticipated by Dehesa.
Claims 8 and 15 are similarly not anticipated by Dehesa.
Neither “band” nor “annulus” appears in the Specification. Not even a “radius” appears in the Specification. Nor are these terms claimed. There is no indication that the claimed distances refer to a circular planar segment.
Further, Applicant does not limit in the claim what the two distances pertain to. They cannot be limited to an annulus, since no annulus was claimed. Further, the claim does not specify the part of the character from which the distances are measured or what governs the limits of those distances. The limits may be infinite, or they may be as close as the end of the character’s weapons or appendages. Under the broadest reasonable interpretation of the claim, the distances may be measured to different parts of the same character.
Further, Applicant argues that the claim contains the following limitation:
…a movement scheme that keeps the NPC within the region created by their difference…
That is not a direct quote of the claim and has a different scope than the actual claim. The claim actually recites:
… the first NPC is to remain within a region created as a result of a difference between the first distance and the second distance…
Many regions may be created “as a result of” a difference. In the prior art, the difference pertains to a pose of the user. As a result of that pose, the NPC approaches in one of many ways to interact with the player. The NPC remains within its initial starting point and the “IsClose” distance “as a result of” the pose distances.
Again, there is no teaching of an annulus in the Specification. Applicant argues matter that is not in the claims.
Applicant's argument is unpersuasive.
The rejections stand.
Argument 3
Further, Applicant respectfully submits claims 5, 12 and 19 are not anticipated by Dehesa. For example, claim 5 further recites:
"The apparatus as recited in claim 1, wherein the processing circuitry is configured to execute the first neural network to:
receive feedback on whether behavior of the first NPC is appropriate; and
train one or more parameters of the first neural network responsive to receiving the feedback on behavior of the first NPC."
As seen from the above, claim 5 recites receiving feedback on whether behavior of the first NPC is appropriate and training one or more parameters of the first neural network responsive to receiving the feedback. The Office Action cites Dehesa passages about training datasets, evaluation, and misprediction analysis for gesture recognition as being equivalent to the claim features. However, the cited disclosures describe offline model training/evaluation and measuring classification errors. They do not disclose receiving behavioral appropriateness feedback about the NPC's behavior and then training responsive to that feedback as claimed. The claim language ties training to feedback on behavior appropriateness of the NPC, not to dataset curation or error distribution analysis for a recognizer. Accordingly, Dehesa does not anticipate claims 5, 12, or 19 for at least these reasons.
Applicant's argument is conclusory. The prior art cited by Examiner teaches “defence animation synthesis,” which pertains to the animation of the NPC relative to the actions of the player. It trains by feeding back error/appropriateness of its actions. The art cited in the rejection teaches:
…the defence animation synthesis uses a machine learning model capable of generating the motion necessary to block arbitrary strikes from the user.
Further, Dehesa, et al., page 6, left column, second full paragraph recites:
We chose to use a neural network as a model for this system, given their convenience and their success in comparable problems. We used an architecture inspired by the phase-functioned neural network by Holden et al. [27]. Instead, however, of training a single neural network, we train a collection, each specialised in particular aspects of the problem. We use a fixed network architecture, but train for multiple sets of weights (parameterisations).
Applicant's argument is unpersuasive.
The rejections stand.
Argument 4
Section 103 Rejections
Claims 6-7, 13-14, and 20 are rejected under 35 U.S.C. § 103 as being unpatentable over Dehesa in view of Lee, et al., Precomputing Avatar Behavior From Human Motion Data, Graphical Models, Volume 68, Issue 2, March 2006, Pages 158-174, in their entireties ("Lee"). However, Applicant disagrees and submits the claims are patentable over the proposed combination for at least the following reasons.
Claim 6 recites the apparatus as recited in claim 1,
"wherein the processing circuitry is configured to execute a reinforcement learning application to:
access, from the first neural network, data associated with features based on game scenarios encountered in an environment sequence; and
select a next action for the first NPC based on the accessed data."
Claims 6 recites circuitry configured to execute a reinforcement-learning (RL) to access, from the first neural network, data associated with features (based on game scenarios encountered in an environment sequence) and selecting a next action based on those features. The Office Action acknowledges Dehesa does not teach these features and suggests substituting Lee's RL/value- iteration for Dehesa's neural network "because both AI systems are suitable for making the decisions required." See Office Action rationale at pp. 27, 31, 33. However, the proposed rationale replaces the NN with RL, which contradicts the claimed cooperation where the RL engine accesses data from the first NN. The Office Action cites no teaching (in Lee or elsewhere) of an RL module consuming features produced by Dehesa's first NN, and offers no reasoned motivation to retain the NN while adding RL in the claimed manner.
An obviousness rejection on under 35 U.S.C. 103 requires a reasoned explanation to modify/arrange the references to meet the claim as a whole. Here, the Office Action's "substitute RL for NN" rationale not only lacks such an explanation but runs contrary to the claimed architecture that uses both (NN producing feature data that feeds RL). The cited Dehesa passages further confirm that decision logic is human-designed "behaviour planning" plus a recognizer and animation model. There is no RL or NN to RL handoff disclosed or suggested. Accordingly, for at least these reasons, claims 6, 13, and 20 are not rendered obvious by the cited combination of art.
Applicant states that the structure formed by the 35 U.S.C. § 103 rejection fails to contain “…an [sic] RL module consuming features produced by Dehesa's first NN…”
By saying “…Dehesa's first NN…,” Applicant admits that Dehesa contains more than one neural network. Actually, there are two sets of cascaded neural networks under the following two categories: 1) gesture recognition and 2) animation synthesis. One of the many places in Dehesa that teach this is Dehesa, page 6, right column, first full paragraph, number 3, where it recites:
“3. The gesture recognition and animation synthesis models are trained as described, completing the physical level of the framework.”
The second model set (i.e., the “animation synthesis” neural network) may be substituted by reinforcement learning, as discussed in the rationale of the rejection.
Applicant's argument is unpersuasive.
The rejections stand.
Argument 5
In addition to the above, claim 7 recites further features neither disclosed nor suggested by the cited combination of art. Claim 7 recites the features "wherein for a given first distance and second distance, the first NPC is permitted to move both closer to, and further from, the player while remaining within the region." At least the features directed to moving closer and farther while remaining within the region are missing. Claim 7 recites that, for a given first and second distance, the NPC is permitted to move both closer to and farther from the player while remaining within the region. The Office Action cites Lee's crowd simulation ("approach-the-target"/"avoid- the-obstacle") as being equivalent to this. However, Lee's behaviors are goal-seeking and obstacle- avoiding; they do not disclose maintaining the NPC within a region defined by two programmed distances from the player as the player moves.
The claim actually recites:
… the first NPC is to remain within a region created as a result of a difference between the first distance and the second distance…
Many regions may be created “as a result of” a difference. In the prior art, the difference pertains to a pose of the user. As a result of that pose, the NPC approaches in one of many ways to interact with the player. The NPC remains within its initial starting point and the “IsClose” distance “as a result of” the pose distances.
Again, there is no teaching of an annulus in the Specification. Applicant argues matter that is not in the claims.
Applicant's argument is unpersuasive.
The rejections stand.
Argument 6
Further, Dehesa never supplied the prerequisite two programmed distances/region in the first place. Thus, the combination still lacks the claimed "remain-within-region" behavior. Accordingly, claim 7 is distinguishable from the cited combination of art for at least these reasons.
The claim actually recites:
… the first NPC is to remain within a region created as a result of a difference between the first distance and the second distance…
Many regions may be created “as a result of” a difference. In the prior art, the difference pertains to a pose of the user. As a result of that pose, the NPC approaches in one of many ways to interact with the player. The NPC remains within its initial starting point and the “IsClose” distance “as a result of” the pose distances.
Again, there is no teaching of an annulus in the Specification. Applicant argues matter that is not in the claims.
Applicant's argument is unpersuasive.
The rejections stand.
Argument 7
Claim 14 recites receiving a personality score based on whether the NPC's behavior matches an assigned personality and mood, and training NN parameters based on the personality score. The Office Action maps this to (i) Dehesa's generic statement about using a simple behavior planner and (ii) Dehesa's generic recognizer training procedure. However, neither teaches a personality score, nor training based on that score. Nothing in Lee remedies this deficiency. The specific "score" construct and its use for training the NN parameters is not found in the art cited.
In view of the foregoing, Applicant submits the application is now in condition for allowance, and an early notice to that effect is requested.
Applicant's Specification does not define a “personality score.” The prior art has “scores” used in behavior planning that anticipate the argued “personality score.” Specifically, they are two scores that are related to defensive actions of the character. See Dehesa, et al., page 4, right column, last partial paragraph; page 5, entire left column; and page 5, right column, first partial paragraph, which recite:
Behaviour Planning
The behaviour planning module is in charge of determining the actions that the virtual character shall perform, given the identified actions of the player. Unlike the other two modules, which are based on data-driven processes, this one is human-designed, albeit at a very high level. The behaviour planning module implements the state diagram shown in fig. 2.
Initially, the character is at a distance from the player and starts walking in their direction until the distance is small enough (IsClose), from where it can either defend or attack. While defending, the character will simply try to put their sword across the trajectory of the player’s sword. There are two circumstances that puts the character into attacking state: if no strikes from the user are detected for some random amount of time (RandomTimer), the character will proactively initiate an attack, or when a strike from the player is detected (PlayerAttacked), the character may react by counterattacking with a strike following the same direction, in an attempt to hit the unguarded area. For example, if the player performs a strike from right to left, then the right-hand side will be left unprotected, so the character may try to strike there.
When the character is attacking, it will usually continue the attack until it is completed (AttackFinished), and then come back to defend again, but the attack may also be aborted early. This will always happen when the player hits the character in the middle of an attack (HitByPlayer). But the character may also interrupt an attack willingly if an incoming strike from the player is detected (PlayerAttacking), quickly coming back to defending in an attempt to block the attack.
Though simple, this model exposes a few adjustable parameters that a designer may tune to direct the behaviour of the character. These define how exactly these transitions should take place. The first parameter is the average attack rate (s⁻¹), which regulates the aggressiveness of the character. This rate regulates the random time that the character waits between attacks (RandomTimer), sampled from an exponential distribution. The second parameter is the probability that the character reacts to an attack from the player with a counterattack (PlayerAttacked). This leads to a third parameter determining the average reaction time (s) between when a player’s strike is completed and a counterattack is actually started. The actual delay is also sampled from an exponential distribution. Finally, a fourth parameter expresses the probability that an attack by the character is aborted due to an incoming attack from the player (PlayerAttacking).
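For illustration only, the four adjustable parameters described in the Dehesa, et al. passage quoted above may be sketched in Python as follows. The class name, field names, function names, and any numeric values supplied by a caller are hypothetical and do not appear in Dehesa, et al.; the sketch shows only how a rate and a mean time each parameterise an exponential distribution.

```python
import random
from dataclasses import dataclass

@dataclass
class PlannerParams:
    attack_rate: float          # average attacks per second (s^-1)
    counter_probability: float  # chance of counterattacking after a player strike
    mean_reaction_time: float   # average delay before a counterattack (s)
    abort_probability: float    # chance of aborting an attack when the player attacks

def sample_attack_wait(p: PlannerParams, rng: random.Random) -> float:
    """Waiting time between attacks (RandomTimer); its mean is 1 / attack_rate."""
    return rng.expovariate(p.attack_rate)

def sample_reaction_delay(p: PlannerParams, rng: random.Random) -> float:
    """Delay before a counterattack; expovariate takes a rate, so invert the mean."""
    return rng.expovariate(1.0 / p.mean_reaction_time)

def reacts_with_counterattack(p: PlannerParams, rng: random.Random) -> bool:
    """Bernoulli draw for the PlayerAttacked transition."""
    return rng.random() < p.counter_probability

def aborts_attack(p: PlannerParams, rng: random.Random) -> bool:
    """Bernoulli draw for the PlayerAttacking transition."""
    return rng.random() < p.abort_probability
```

Under these assumptions, a higher attack_rate shortens the average wait between attacks, which corresponds to the passage's statement that the rate regulates the aggressiveness of the character.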
Further, the use of the score to train a model is taught at Dehesa, et al., page 5, right column, first two full and last partial paragraphs, where it recites:
Animation Synthesis
The animation synthesis module is responsible for determining the pose that the virtual character shall adopt in each frame. This module is driven by the directives emitted by the behaviour planning module, and can be seen as a translator of these directives into “physical” actions. Animation synthesis is also a data-driven component: it does not use complex state machines, but rather offers a menu of actions it may perform and acts them out, depending on the context. There are two kinds of actions that the model may perform: defending and attacking. Defending is the more complicated, as it is reactive and depends on what the player does. Therefore, the defence animation synthesis uses a machine learning model capable of generating the motion necessary to block arbitrary strikes from the user. Attacks, on the other hand, are initiated by the character by request of the behaviour planner, which also indicates the kind of attack to perform. For this, we can simply use a collection of animation clips that are played out as necessary. This simplifies the system and gives designers precise control of what happens when the character attacks the user. The animation synthesis system smoothly blends between the machine learning model and the clips, fading the weight of each one in and out over a short period of time as the character transitions between attack and defence, so the overall animation appears as a continuous action.
For the defence animation synthesis, we start by collecting a set of motion data to train the model. We used Vicon Bonita equipment to record several sword fighting attacks and blocks at different angles. We produced about 15 minutes of training data in total, recorded at 30 frames per second (over 26000 frames), with each frame containing the pose of a 24-joint skeleton and the position of the tip of both swords. The data is split leaving 80% for training and the rest for evaluation.
Defence animation is generated one frame at a time. The model continuously predicts the next character pose using the current pose and the perceived control information from the user. In summary, the collection of features used as input is:…
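For illustration only, the fading of weights between the machine learning model and the animation clips described in the Dehesa, et al. passage quoted above may be sketched in Python as follows. The linear ramp, the representation of a pose as a flat list of numbers, and both function names are assumptions made for this sketch; Dehesa, et al. does not state the blending formula, and joint rotations in practice are often interpolated differently.

```python
def fade_weight(elapsed: float, duration: float) -> float:
    """Weight of the incoming animation source, ramping linearly from 0 to 1."""
    if duration <= 0:
        return 1.0  # no fade period: switch immediately
    return min(1.0, max(0.0, elapsed / duration))

def blend_poses(model_pose, clip_pose, clip_weight):
    """Weighted average of two poses given as equal-length lists of numbers."""
    return [(1.0 - clip_weight) * m + clip_weight * c
            for m, c in zip(model_pose, clip_pose)]

# Halfway through a hypothetical 0.2-second fade into an attack clip:
halfway = blend_poses([0.0, 2.0], [1.0, 4.0], fade_weight(0.1, 0.2))
```

Fading in the other direction, from attack back to defence, would use a clip weight of 1 minus the fade weight under the same assumptions.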
Applicant's argument is unpersuasive.
The rejections stand.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiries concerning this communication or earlier communications from the examiner should be directed to Wilbert L. Starks, Jr., who may be reached Monday through Friday, between 8:00 a.m. and 5:00 p.m. EST, via telephone at (571) 272-3691 or via email at Wilbert.Starks@uspto.gov.
If you need to send an Official facsimile transmission, please send it to (571) 273-8300.
If attempts to reach the examiner are unsuccessful, the Examiner’s Supervisor (SPE), Kakali Chaki, may be reached at (571) 272-3719.
Hand-delivered responses should be delivered to the Receptionist at the Customer Service Window, Randolph Building, 401 Dulany Street, Alexandria, VA 22313, located on the first floor of the south side of the Randolph Building.
Finally, information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Moreover, status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have any questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) toll-free @ 1-866-217-9197.
/WILBERT L STARKS/
Primary Examiner, Art Unit 2122
WLS
15 DEC 2025