Prosecution Insights
Last updated: April 19, 2026
Application No. 17/822,227

MACHINE LEARNING DEVICE, MACHINE LEARNING METHOD, AND COMPUTER PROGRAM PRODUCT

Final Rejection (§101, §103)
Filed: Aug 25, 2022
Examiner: CHOI, YUK TING
Art Unit: 2164
Tech Center: 2100 — Computer Architecture & Software
Assignee: Kabushiki Kaisha Toshiba
OA Round: 2 (Final)
Grant Probability: 72% (Favorable)
Predicted OA Rounds: 3-4
Predicted Time to Grant: 3y 3m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 72% (above average; 466 granted / 652 resolved; +16.5% vs TC avg)
Interview Lift: +37.4% among resolved cases with interview
Avg Prosecution: 3y 3m (29 applications currently pending)
Career History: 681 total applications across all art units

Statute-Specific Performance

§101: 16.8% (-23.2% vs TC avg)
§103: 55.0% (+15.0% vs TC avg)
§102: 13.5% (-26.5% vs TC avg)
§112: 6.8% (-33.2% vs TC avg)
Based on career data from 652 resolved cases.

Office Action

Rejections: §101, §103
DETAILED ACTION

1. This Office action is in response to applicant's communication filed on 09/10/2025, which responds to the PTO Office Action mailed on 06/10/2025. The applicant's remarks and amendments to the claims and/or the specification were considered, with the results as follows.

2. In response to the last Office Action, claims 1-9 and 17 are amended. Claim 25 is added. As a result, claims 1-25 are pending in this Office action.

Response to Arguments

3. Applicant's arguments with respect to the §101 rejections have been fully considered but are not persuasive, for the following reasons. Applicant argues: "Claim 1 recites an ordered combination that uses usually derived control information from the discount rate corrected using the travel distance and a learned control policy…This is a specific technical improvement to controlling control target points…a robot, by using a corrected discount rate in accordance with a travel amount of the control target point providing a specific improvement over conventional system". In response to applicant's argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., controlling control target points of a robot) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

Claim 1 is directed to the abstract idea of a machine learning method of outputting control information. The claim is recited at a high level of generality and adds no more to the claimed invention than a computer that performs an abstract idea. The additional feature merely uses a computer/device as a tool to generate result data after a series of data gathering steps.
The output of result data and the data gathering steps are insignificant extra-solution activity; thus, the judicial exception is not integrated into a practical application. The additional limitation does not appear to improve the functioning of a computer or any other technology or technical field. Thus, taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements individually. Therefore, the 35 USC 101 rejection is maintained.

Applicant's arguments with respect to the §103 rejections have been fully considered but are moot in view of the new grounds of rejection.

Claim Rejections - 35 USC § 101

4. 35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., an abstract idea) without significantly more. Claim 1 is directed to the abstract idea of a machine learning method of outputting control information, as explained in detail below. The claims do not include elements sufficient to amount to significantly more than the judicial exception because the elements can be concepts performed in the human mind, which do not add meaningful limits to practicing the abstract idea.
Claim 1 recites a machine learning system comprising, at least in part: processing circuitry configured to: acquire observation information including information on a speed of a control target point at a control target time (e.g., observing information on a speed of a control point at a specific time can be performed in the human mind using pen and paper); calculate a reward for the observation information; calculate a corrected discount rate obtained by correcting a discount rate of the reward in accordance with a travel distance of the control target point represented by the observation information, wherein the travel distance is a distance along a target trajectory and the corrected discount rate is a parameter used for evaluating the reward earned in the future at a greater discount (e.g., computing a reward based on the observed information and computing a discount rate using the observed information can be performed in the human mind including observation and evaluation with mathematical calculations); learn a control policy by reinforcement learning from the observation information, the reward, and the corrected discount rate (e.g., learning a policy from the observed information and the obtained calculations can be performed in the human mind including observation and evaluation with mathematical calculations); and output control information including information on speed control of the control target point that is determined in accordance with the observation information and the control policy (e.g., outputting control information including the learned policy and the observations/calculations can be performed in the human mind including observation, evaluation and judgement using pen and paper).

Claim 1 as recited falls within two of the groupings of abstract ideas (mental processes and mathematical concepts) enumerated in the 2019 PEG.
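For orientation, the claimed sequence (acquire a speed observation, compute a reward for it, correct the discount rate in accordance with the travel distance along the target trajectory, learn by reinforcement learning, output speed control) can be sketched as below. The linear correction formula and every name here are illustrative assumptions, not the applicant's disclosed method:

```python
def corrected_discount(base_gamma, travel_distance, trajectory_length):
    """Hypothetical correction: discount future rewards more heavily as the
    control target point advances along the target trajectory."""
    progress = min(travel_distance / trajectory_length, 1.0)
    return base_gamma * (1.0 - 0.5 * progress)  # illustrative formula only

def speed_reward(observed_speed, target_speed):
    """Reward for the observation: highest when the speed tracks the target."""
    return -abs(observed_speed - target_speed)
```

Halfway along the trajectory, `corrected_discount(0.99, 5.0, 10.0)` yields 0.7425 under this assumed formula, so rewards earned later in the trajectory are evaluated at a greater discount.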
The recited concept can be performed using mathematical calculations in the human mind, including observation, evaluation, judgement, and opinion. That is, other than reciting a machine learning method, nothing in the claim elements precludes the steps from practically being performed in the mind. The claim is recited at a high level of generality and adds no more to the claimed invention than a computer that performs an abstract idea. The additional feature merely uses a computer/device as a tool to generate result data after a series of data gathering steps. The output of result data and the data gathering steps are insignificant extra-solution activity; thus, the judicial exception is not integrated into a practical application. The additional limitation does not appear to improve the functioning of a computer or any other technology or technical field. Thus, taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements individually. There is no indication that the combination of elements improves the functioning of a computer or any other technology; their collective functions merely provide conventional computer implementation. Therefore, claim 1 is not patent eligible.

Claims 2-8 and 25 are similar to claim 1, as they fall within two of the groupings of abstract ideas (mental processes and mathematical concepts) enumerated in the 2019 PEG. The recited concept can be performed using mathematical calculations in the human mind, including observation, evaluation, judgement, and opinion. Claims 2-8 and 25 merely use a computer/device as a tool to generate result data after a series of data gathering steps and data calculations.
The output of result data and the data gathering steps are insignificant extra-solution activity; thus, the judicial exception is not integrated into a practical application. Claims 2-8 and 25 do not appear to improve the functioning of a computer or any other technology or technical field. Thus, taken alone, the features recited in claims 2-8 and 25 do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements individually. There is no indication that the combination of elements improves the functioning of a computer or any other technology; their collective functions merely provide conventional computer implementation. Therefore, claims 2-8 and 25 are not patent eligible.

Claim 9 recites a method comprising, at least in part: acquiring observation information including information on a speed of a control target point at a control target time (e.g., observing information on a speed of a control point at a specific time can be performed in the human mind using pen and paper); first calculating a reward for the observation information; second calculating a corrected discount rate obtained by correcting a discount rate of the reward in accordance with a travel distance of the control target point represented by the observation information, wherein the travel distance is a distance along a target trajectory and the corrected discount rate is a parameter used for evaluating the reward earned in the future at a greater discount (e.g., computing a reward based on the observed information and computing a discount rate using the observed information can be performed in the human mind including observation and evaluation with mathematical calculations); learning a control policy by reinforcement learning from the observation information, the reward, and the corrected discount rate (e.g., learning a policy from the observed information and the obtained calculations can be performed in the human mind including observation and evaluation with mathematical calculations); and outputting control information to a control target having the control target point, the control information including information on speed control of the control target point that is determined in accordance with the observation information and the control policy, and controlling the control target point according to the control information (e.g., outputting control information including the learned policy and the observations/calculations can be performed in the human mind including observation, evaluation and judgement using pen and paper).

Claim 9 as recited falls within two of the groupings of abstract ideas (mental processes and mathematical concepts) enumerated in the 2019 PEG. The recited concept can be performed using mathematical calculations in the human mind, including observation, evaluation, judgement, and opinion. That is, other than reciting a machine learning method, nothing in the claim elements precludes the steps from practically being performed in the mind. The claim is recited at a high level of generality and adds no more to the claimed invention than a computer that performs an abstract idea. The additional feature merely uses a computer/device as a tool to generate result data after a series of data gathering steps. The output of result data and the data gathering steps are insignificant extra-solution activity; thus, the judicial exception is not integrated into a practical application. The additional limitation does not appear to improve the functioning of a computer or any other technology or technical field. Thus, taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea).
Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements individually. There is no indication that the combination of elements improves the functioning of a computer or any other technology; their collective functions merely provide conventional computer implementation. Therefore, claim 9 is not patent eligible.

Claims 10-15 are similar to claim 9, as they fall within two of the groupings of abstract ideas (mental processes and mathematical concepts) enumerated in the 2019 PEG. The recited concept can be performed using mathematical calculations in the human mind, including observation, evaluation, judgement, and opinion. Claims 10-15 merely use a computer/device as a tool to generate result data after a series of data gathering steps and data calculations. The output of result data and the data gathering steps are insignificant extra-solution activity; thus, the judicial exception is not integrated into a practical application. Claims 10-15 do not appear to improve the functioning of a computer or any other technology or technical field. Thus, taken alone, the features recited in claims 10-15 do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements individually. There is no indication that the combination of elements improves the functioning of a computer or any other technology; their collective functions merely provide conventional computer implementation. Therefore, claims 10-15 are not patent eligible.

Claim 16 recites similar features as claim 9 and also falls within the mental processes grouping of abstract ideas enumerated in the 2019 PEG. The recited concept can be performed in the human mind, including observation, evaluation, judgement, and opinion.
Claim 16 recites an additional limitation of displaying correspondence information indicating a correspondence between the corrected discount rate and the travel distance. This is a mere instruction to implement an abstract idea on a computer, and merely uses a computer as a tool to display data after a series of data gathering steps that collect the inputs necessary to perform the abstract idea. The displaying and data gathering steps are insignificant extra-solution activity, so the judicial exception is not integrated into a practical application. Further, the additional element (or combination of elements) is well-understood, routine, or conventional activity; see "Presenting offers and gathering statistics," OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93, cited in MPEP 2106.05(d)(II). For these reasons, there is no inventive concept in claim 16, and thus it is ineligible.

Claim 17 recites a computer program product comprising, at least in part: acquiring observation information including information on a speed of a control target point at a control target time (e.g., observing information on a speed of a control point at a specific time can be performed in the human mind using pen and paper); first calculating a reward for the observation information; second calculating a corrected discount rate obtained by correcting a discount rate of the reward in accordance with a travel distance of the control target point represented by the observation information, wherein the travel distance is a distance along a target trajectory and the corrected discount rate is a parameter used for evaluating the reward earned in the future at a greater discount (e.g., computing a reward based on the observed information and computing a discount rate using the observed information can be performed in the human mind including observation and evaluation with mathematical calculations); learning a control policy by reinforcement learning from the observation information, the reward, and the corrected discount rate (e.g., learning a policy from the observed information and the obtained calculations can be performed in the human mind including observation and evaluation with mathematical calculations); and outputting control information to a control target having the control target point, the control information including information on speed control of the control target point that is determined in accordance with the observation information and the control policy, and controlling the control target point according to the control information (e.g., outputting control information including the learned policy and the observations/calculations can be performed in the human mind including observation, evaluation and judgement using pen and paper).

Claim 17 as recited falls within two of the groupings of abstract ideas (mental processes and mathematical concepts) enumerated in the 2019 PEG. The recited concept can be performed using mathematical calculations in the human mind, including observation, evaluation, judgement, and opinion. That is, other than reciting a machine learning method, nothing in the claim elements precludes the steps from practically being performed in the mind. The claim is recited at a high level of generality and adds no more to the claimed invention than a computer that performs an abstract idea. The additional feature merely uses a computer/device as a tool to generate result data after a series of data gathering steps. The output of result data and the data gathering steps are insignificant extra-solution activity; thus, the judicial exception is not integrated into a practical application. The additional limitation does not appear to improve the functioning of a computer or any other technology or technical field. Thus, taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea).
Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements individually. There is no indication that the combination of elements improves the functioning of a computer or any other technology; their collective functions merely provide conventional computer implementation. Therefore, claim 17 is not patent eligible.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 7-10, 15-18, 23 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Kanemaru (US 2018/0356793 A1) in view of Sagasaki (US 2022/0043426 A1), and further in view of Tajima (DE 102018214276 A1).

Referring to claims 1, 9 and 17, Kanemaru discloses a machine learning system (See para. [0002], para. [0041] and Figures and 3: a controller device performs arithmetic processes in parallel at multiple operation units to control drive of a machine tool; the controller device comprises a processor including multiple cores configured to perform machine learning) comprising: acquiring observation information (See para. [0125], para. [0126] and Figure 6: in step S11, the state information acquisition unit acquires state information) including information on a speed of a control target point at a control target time (See para. [0003], para. [0004] and para. [0016]: the state information of a machine tool can be based on real-time control over a position or a speed of each axis in a machine tool in a fixed cycle or a fixed period of time); calculating a reward for the observation information (See Figure 6: in steps S13-S15, the state information in the new state s' is acquired from the controller device and a reward value is calculated based on the determination information in the state information s'); calculating a corrected discount rate obtained by correcting a discount rate of the reward in accordance with a travel distance of the control target point represented by the observation information (See para. [0058] and para. [0083]-para. [0085]: the controller device computes a discount rate γ to maximize a reward; the discount rate is used in computing Q(s, a) = E[Σ(γ^t) r_t], where E[] is an expected value, t is time, γ is a parameter called a discount rate described later, r_t is a reward at time t, and Σ is the sum over time t. An expected value obtained from this formula is an expected value resulting from a change of state [e.g., a movement distance of a tool] generated by an optimal behavior; note in para.
[0069], the controller device is responsible for a control process to be performed in a fixed cycle or a fixed period of time for achieving real-time control over the position or speed of an axis); a learning module configured to learn a control policy by reinforcement learning from the observation information, the reward, and the corrected discount rate (See para. [0131]-para. [0133] and Figure 6, step S20: the learning unit 120 is configured to perform reinforcement learning based on the determination information in the state information and the reward; note in para. [0083]-para. [0085] the reward also includes a discount rate γ; the learning unit 120 determines whether or not a condition for finishing the reinforcement learning has been satisfied, the reinforcement learning being finished on the condition that the foregoing processes have been repeated a predetermined number of times or for a predetermined period of time); and an output module configured to output control information including information on speed control of the control target point that is determined in accordance with the observation information and the control policy, and controlling the control target point according to the control information (See para. [0135]-para. [0137] and Figures 6 and 7: the optimized behavior information output unit 150 generates optimized behavior information based on a matching condition in the state information acquired in step S31 and the value function Q acquired in step S32).

Kanemaru discloses outputting control information but does not explicitly disclose that the control information includes information on speed control of the control target point determined in accordance with the observation information and a control policy. Sagasaki discloses control information including information on speed control of the control target point that is determined in accordance with the observation information and a control policy (See para. [0046] and para. [0096]: the amount-of-travel calculation outputs the calculated amount of travel of the tool per unit time for each of the drive shafts).

Therefore, it would have been obvious to a person of ordinary skill in the computer art before the effective filing date of the claimed invention to modify the output of Kanemaru to include speed control of a control target point, as taught by Sagasaki. A skilled artisan would have been motivated to include a travel speed in the control information in order to control more behavior of the tool (See Sagasaki, para. [0041]). In addition, both references (Kanemaru and Sagasaki) are analogous art and are directed to the same field of endeavor, such as training a machine learning model to control a device or a tool. This close relation between the references highly suggests an expectation of success.

Kanemaru in view of Sagasaki does not explicitly disclose that the travel distance is a distance along a target trajectory and that the corrected discount rate is a parameter used for evaluating the reward earned in the future at a greater discount.
Tajima discloses that the travel distance is a distance along a target trajectory (See Figure 2: the rotation amount detection unit 222 refers to the amount of rotation, i.e., the travel distance during acceleration, from the beginning of the acceleration rotation of the spindle motor 101 until reaching the maximum speed V0. The travel distance at acceleration Sa is an integrated value, the integral of v * t, of the rotational speed v of the spindle motor and the time t from the beginning of the acceleration rotation of the spindle motor 101 until reaching the maximum speed V0. When the remaining rotation amount Sr of the spindle motor 101 detected by the residual rotation amount detection unit 223 corresponds to the travel distance at acceleration, the rotation amount at acceleration Sa, the positioning operation control unit 225 causes a deceleration rotation by setting a deceleration acceleration so that the tip end of the tool stops at the target screw depth. The deceleration acceleration is obtained from the residual rotation amount Sr and the actual speed Vc: a rotational period Tr of the deceleration is obtained by (the residual rotation amount Sr) / (the actual rotational speed Vc), and the deceleration acceleration is obtained by (the actual rotational speed Vc) / (the rotational period Tr of the deceleration). The travel distance at deceleration Sd is an integrated value, the integral of v * t, of the rotational speed v of the spindle motor and the time t from the beginning of the deceleration rotation of the spindle motor 101 until the spindle motor 101 stops. When a transition from the acceleration rotation to the deceleration rotation is performed before the actual speed Vc of the spindle motor 101 reaches the maximum speed, the trajectory at acceleration at the transition to the deceleration rotation is the trajectory at deceleration, that is, the residual rotation amount Sr. The positioning operation control unit 225 can obtain the deceleration acceleration similarly to the above) and that the corrected discount rate is a parameter used for evaluating the reward earned in the future at a greater discount (See para. [0066] and Figure 3: the agent attempts to maximize Q(S, A) = E[Σ(γ^t) r_t] in order to maximize the total reward that can be achieved in the future. E[] represents an expected value, t represents a time, γ represents a parameter called a discount rate, r_t represents a reward at time t, and Σ represents the sum over time t. The expected value in this formula is the expected value in a case where the state is changed according to the optimal action. However, the optimal action is not apparent in the process of Q-learning, so the agent takes various actions and performs the reinforcement learning while searching. An updating formula for such a value function Q(S, A) can be represented, for example, by the following formula 2: Q(S_t, A_t) ← Q(S_t, A_t) + α(r_{t+1} + γ max_A Q(S_{t+1}, A) - Q(S_t, A_t))).

Therefore, it would have been obvious to a person of ordinary skill in the computer art before the effective filing date of the claimed invention to modify the travel distance to be a distance along a target trajectory and the corrected discount rate to be a parameter used for evaluating the reward earned in the future at a greater discount, as taught by Tajima. A skilled artisan would have been motivated to observe an environmental condition and learn to select an action that maximizes the total reward in the future (See Tajima, para. [0062]). In addition, all three references (Tajima, Kanemaru and Sagasaki) are analogous art and are directed to the same field of endeavor, such as training a machine learning model to control a device or a tool. This close relation between the references highly suggests an expectation of success.
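Formula 2 quoted from Tajima is the standard tabular Q-learning update. A minimal sketch follows; the names (q_update, actions) and the default values of alpha and gamma are chosen for illustration and are not drawn from the references:

```python
def q_update(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    q is a dict mapping (state, action) pairs to values."""
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return q[(s, a)]
```

Starting from an empty table, one step toward a reward of 1.0 moves the entry halfway there: `q_update({}, "s0", "a0", 1.0, "s1", ["a0", "a1"])` returns 0.5, which is the propagation of a best behavior value to the previous state that Kanemaru's paragraphs describe.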
As to claims 2, 10 and 18, Kanemaru discloses wherein the learning module learns the control policy based on experience data in which at least the corrected discount rate and the reward are associated with each other (See para. [0079], para. [0080] and para. [0084]: s_t is the state of the environment at time t, and a_t is a behavior at the time t. The state is changed to s_{t+1} by the behavior a_t, and r_{t+1} is a reward given by this change of state. The term with max shows a value obtained by multiplying by γ the Q value determined by selecting a behavior a known to result in the highest Q value in the state s_{t+1}. Here, γ is a parameter in a range of 0 < γ ≤ 1 and is called a discount rate, and α is a learning coefficient in a range of 0 < α ≤ 1).

As to claims 7, 15 and 23, Kanemaru discloses wherein the second calculation module is configured to calculate the corrected discount rate obtained by correcting the discount rate in accordance with an input corrected discount rate for an input travel distance, input of which has been accepted, in accordance with the travel distance (See para. [0058] and para. [0083]-para. [0085]: to maximize a reward, the controller 200 computes a discount rate γ; the discount rate is used in computing Q(s, a) = E[Σ(γ^t) r_t], where E[] is an expected value, t is time, γ is a parameter called a discount rate described later, r_t is a reward at time t, and Σ is the sum over time t. An expected value obtained from this formula is an expected value resulting from a change of state [e.g., a movement distance of a tool] generated by an optimal behavior; note in para. [0069], the controller 200 is responsible for a control process to be performed in a fixed cycle or a fixed period of time for achieving real-time control over the position or speed of an axis).
As to claims 8, 16 and 24, Kanemaru discloses a display control module configured to display correspondence information indicating a correspondence between the corrected discount rate and the travel distance (See para. [0083]-para. [0085]: updating a value Q(s_t, a_t) of the behavior a_t in the state s_t based on the reward r_{t+1} given in response to doing the behavior a_t tentatively. This update formula shows that, if the best behavior value max_a Q(s_{t+1}, a) determined by the behavior a_t in the subsequent state s_{t+1} becomes larger than the value Q(s_t, a_t) determined by the behavior a_t in the state s_t, Q(s_t, a_t) is increased; conversely, if the best behavior value max_a Q(s_{t+1}, a) is smaller than the value Q(s_t, a_t), Q(s_t, a_t) is reduced. In other words, the value of a certain behavior in a certain state is approximated to the best behavior value determined by the same behavior in a subsequent state. The difference between these values is changed by the way of determining the discount rate γ and the reward r_{t+1}, but the basic mechanism is that the best behavior value in a certain state is propagated to the behavior value in the state previous to it).

As to claim 25, Kanemaru in view of Tajima discloses calculating the corrected discount rate at each of a plurality of control times (See Tajima, FIG. 12, a graph illustrating the relationship between the speed v of the spindle motor at deceleration and the time t when the ratio of the travel distance at acceleration Sa to the travel distance at deceleration after correction Sd' is 1:0.7; the cycle time changes from a time t1 to a time t2 that is shorter than t1. When step S16 ends, processing returns to step S12).
Therefore, it would have been obvious to a person of ordinary skill in the computer art before the effective filing date of the claimed invention to modify the travel distance to include a distance along a target trajectory and the corrected discount rate as a parameter used for evaluating a reward earned in the future at a greater discount, as taught by Tajima. A skilled artisan would have been motivated to observe an environmental condition and learn to select an action that maximizes the total future reward (See Tajima, para. [0062]). In addition, all references (Tajima, Kanemaru and Sagasaki) are directed to analogous art and to the same field of endeavor, such as training a machine learning model to control a device or a tool. This close relation between the references highly suggests an expectation of success.

Claims 4-6, 12-14 and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Kanemaru (US 2018/0356793 A1) in view of Sagasaki (US 2022/0043426 A1) and Tajima (DE 102018214276 A1), and further in view of Kawai (US 2017/0154283 A1).

As to claims 4, 12 and 20, Kanemaru does not explicitly disclose wherein the first calculation module is configured to calculate a first error between the control target point and a target trajectory using information on a position of the control target point included in the observation information and to calculate a higher reward as the first error is smaller. Kawai discloses this limitation (See para.
[0051]: the reward calculation unit 21 calculates a reward based on the number of errors between the position command relative to the rotor of the motor which is drive-controlled by the motor control apparatus and the actual position of the rotor. The smaller the number of errors, the higher the reward the reward calculation unit 21 provides, recognizing that the number of corrections used to correct any of the position command, the speed command, or the current command in the motor control apparatus has a favorable influence. For example, the reward calculation unit 21 may be configured to increase the reward when the number of errors observed by the state observation unit 11 is smaller than the number of errors observed previously, and to reduce the reward when it is larger. Further, for example, the reward calculation unit 21 may be configured to increase the reward when the number of errors observed by the state observation unit 11 is inside a specified range, and to reduce the reward when the number of errors is outside the specified range. The specified range may be set as appropriate by the operator, taking into account various factors such as the manufacturing cost of the motor and the machine tool, the use environment, or the like).

Therefore, it would have been obvious to a person of ordinary skill in the computer art before the effective filing date of the claimed invention to modify the system of Kanemaru to calculate a first error between the control target point and a target trajectory using information on a position of the control target point included in the observation information and to calculate a higher reward as the first error is smaller, as taught by Kawai. A skilled artisan would have been motivated to update a learning model for calculating the number of errors based on the state variable and the errors in order to output more accurate results (See Kawai para. [0020]).
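The error-based reward shaping the examiner quotes from Kawai can be illustrated with a short sketch. The function name, reward magnitudes, and range bounds below are hypothetical, chosen only to show the two quoted behaviors: rewarding a drop in the error count relative to the previous observation, and rewarding an error count inside an operator-set range:

```python
def error_reward(error_count, prev_error_count, lo=0, hi=10):
    """Illustrative reward per the quoted Kawai passage: fewer errors
    than previously observed earn a higher reward, and an error count
    inside the operator-specified range [lo, hi] also earns a bonus."""
    reward = 0.0
    # Reward improvement (or penalize regression) vs. the prior observation.
    if error_count < prev_error_count:
        reward += 1.0
    elif error_count > prev_error_count:
        reward -= 1.0
    # Reward staying inside the specified range; penalize leaving it.
    reward += 1.0 if lo <= error_count <= hi else -1.0
    return reward

error_reward(3, 5)   # fewer errors than before, and inside the range
error_reward(12, 5)  # more errors than before, and outside the range
```

A smaller error therefore maps monotonically to a higher reward, which is the property the rejection relies on for the "calculate a higher reward as the first error is smaller" limitation.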
In addition, all references (Tajima, Kawai, Kanemaru and Sagasaki) are directed to analogous art and to the same field of endeavor, such as training a machine learning model to control a device or a tool. This close relation between the references highly suggests an expectation of success.

As to claims 5, 13 and 21, Kawai also discloses wherein the first calculation module is configured to set an error calculation target position to a position away from a position of the control target point represented by the observation information by a certain distance or more or a certain time period or more along a trajectory of the control target point, and to calculate, as the first error, a second error between the target trajectory and the error calculation target position (See para. [0085] and para. [0122]: the machine learning apparatus automatically learns and adjusts the number of corrections relative to a command for minimizing the number of errors between a rotor position command and the rotor's actual position, so that regardless of changes in the surrounding environment, each number of corrections can be changed in real time and the number of errors can be appropriately minimized). Therefore, it would have been obvious to a person of ordinary skill in the computer art before the effective filing date of the claimed invention to modify the system of Kanemaru to calculate these errors, as taught by Kawai. A skilled artisan would have been motivated to update a learning model for calculating the number of errors based on the state variable and the errors in order to output more accurate results (See Kawai para. [0020]). In addition, all references (Kawai, Kanemaru and Sagasaki) are directed to analogous art and to the same field of endeavor, such as training a machine learning model to control a device or a tool.
This close relation between the references highly suggests an expectation of success.

As to claims 6, 14 and 22, Kawai discloses wherein the first calculation module is configured to set the error calculation target position to a position away from a position of the control target point represented by the observation information by the certain distance or more or the certain time period or more, input of which has been accepted, along a trajectory of the control target point (See para. [0085] and para. [0122]: the machine learning apparatus automatically learns and adjusts the number of corrections relative to a command for minimizing the number of errors between a rotor position command and the rotor's actual position, so that regardless of changes in the surrounding environment, each number of corrections can be changed in real time and the number of errors can be appropriately minimized). Therefore, it would have been obvious to a person of ordinary skill in the computer art before the effective filing date of the claimed invention to modify the system of Kanemaru to calculate these errors, as taught by Kawai. A skilled artisan would have been motivated to update a learning model for calculating the number of errors based on the state variable and the errors in order to output more accurate results (See Kawai para. [0020]). In addition, all references (Kawai, Tajima, Kanemaru and Sagasaki) are directed to analogous art and to the same field of endeavor, such as training a machine learning model to control a device or a tool. This close relation between the references highly suggests an expectation of success.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YUK TING CHOI whose telephone number is (571) 270-1637. The examiner can normally be reached Monday-Friday, 9am-6pm. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, AMY NG, can be reached at (571) 270-1698. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.
For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/YUK TING CHOI/
Primary Examiner, Art Unit 2164

Prosecution Timeline

Aug 25, 2022: Application Filed
Jun 06, 2025: Non-Final Rejection (§101, §103)
Aug 28, 2025: Applicant Interview (Telephonic)
Aug 28, 2025: Examiner Interview Summary
Sep 10, 2025: Response Filed
Oct 02, 2025: Final Rejection (§101, §103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591610: SYSTEMS AND METHODS FOR REMOVING NON-CONFORMING WEB TEXT (granted Mar 31, 2026; 2y 5m to grant)
Patent 12579156: SYSTEMS AND METHODS FOR VISUALIZING ONE OR MORE DATASETS (granted Mar 17, 2026; 2y 5m to grant)
Patent 12562753: SYSTEM AND METHOD FOR MULTI-TYPE DATA COMPRESSION OR DECOMPRESSION WITH A VIRTUAL MANAGEMENT LAYER (granted Feb 24, 2026; 2y 5m to grant)
Patent 12536282: METHODS AND APPARATUS FOR MACHINE LEARNING BASED MALWARE DETECTION AND VISUALIZATION WITH RAW BYTES (granted Jan 27, 2026; 2y 5m to grant)
Patent 12511258: DYNAMIC STORAGE OF SEQUENCING DATA FILES (granted Dec 30, 2025; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 72%
With Interview: 99% (+37.4%)
Median Time to Grant: 3y 3m
PTA Risk: Moderate
Based on 652 resolved cases by this examiner. Grant probability derived from career allow rate.
