DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This action is non-final.
Claims 1-13 are pending in the case. Claims 1, 6, and 10 are independent claims.
Priority
Acknowledgement is made of applicant’s claim for the benefit of PCT/JP2020/018767, filed on 05/11/2020, in application 17/922,029.
Specification
The abstract of the disclosure is objected to because it merely recites the claim language and is not a concise statement of the technical disclosure of the patent. A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. See MPEP § 608.01(b).
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
Claim Rejections - 35 USC § 112
Claims 2 and 3 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 2 recites the limitation "the direct change instruction" in line 3. There is insufficient antecedent basis for this limitation in the claim.
Claim 3 recites the limitation "explanatory variable" in line 4. There is insufficient antecedent basis for this limitation in the claim.
For purposes of examination, “the direct change instruction” will be interpreted as the change instruction, and the “explanatory variables” will be interpreted as input features of the objective function.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
To determine if a claim is directed to patent ineligible subject matter, the Court has guided the Office to apply the Alice/Mayo test, which requires:
Step 1: Determining if the claim falls within a statutory category.
Step 2A: Determining if the claim is directed to a patent-ineligible judicial exception, i.e., a law of nature, a natural phenomenon, or an abstract idea. Step 2A is a two-prong inquiry. MPEP 2106.04(II)(A). Under the first prong, examiners evaluate whether a law of nature, natural phenomenon, or abstract idea is set forth or described in the claim. Abstract ideas include mathematical concepts, certain methods of organizing human activity, and mental processes. MPEP 2106.04(a)(2). Under the second prong, examiners evaluate whether the claim integrates the judicial exception into a practical application. MPEP 2106.04(d).
Step 2B: If the claim is directed to a judicial exception, determining if the claim recites limitations or elements that amount to significantly more than the judicial exception. (See MPEP 2106).
Claims 1-13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: Claims 1-5 are directed to a learning device (a machine), Claims 6-9 are directed to a method (a process), and Claims 10-13 are directed to a non-transitory computer-readable medium (a manufacture). Therefore, claims 1-13 are directed to a process, machine, manufacture or composition of matter.
Regarding Claim 1
Step 2A, Prong 1
Claim 1 recites the following mental processes that, in each case under the broadest reasonable interpretation, cover performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper, but for the recitation of generic computer components (e.g., “memory”, “processor”, and “inverse reinforcement learning”) [see MPEP 2106.04(a)(2)(III)].
“output the actual change from the second target to the third target as decision making history data” (e.g., a human can compare different timetables and note how the times change over time)
“learn the objective function using the decision making history data” (e.g., a human can observe time changes across different timetables and discern a trend from those tables)
Accordingly, at Step 2A, prong one, the claim recites an abstract idea.
Step 2A, Prong 2
The judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of a “memory”, “processor”, and “inverse reinforcement learning”, which are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)). In particular, the recited “inverse reinforcement learning” is merely a generic computer component: it is recited only to perform the functions of producing the “objective function” and the “optimization result for a first target”, and the claim does not recite any particular structure for how the “inverse reinforcement learning” is implemented.
Regarding the “output a second target, which is an optimization result for a first target using an objective function generated in advance by inverse reinforcement learning based on decision making history data indicating an actual change to the target” limitation, this additional element is recited at a high level of generality and amounts to specifying how the optimized output is generated, indicating a field of use (see MPEP 2106.05(h)).
Regarding the “output a third target indicating a target resulting from further changing of the second target based on a change instruction regarding the second target accepted from the user” limitation, this additional element is recited at a high-level of generality and amounts to extra-solution activity of generating output, i.e. post-solution activity of outputting data for use in the claimed system (see MPEP 2106.05(g)).
Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, the additional elements of a “memory”, “processor”, and “inverse reinforcement learning” are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)).
Regarding the “output a second target, which is an optimization result for a first target using an objective function generated in advance by inverse reinforcement learning based on decision making history data indicating an actual change to the target” limitation, this additional element is recited at a high level of generality and amounts to specifying how the optimized output is generated, indicating a field of use (see MPEP 2106.05(h)).
Regarding the “output a third target indicating a target resulting from further changing of the second target based on a change instruction regarding the second target accepted from the user” limitation, as discussed above, this additional element of generating output is recited at a high level of generality and amounts to insignificant extra-solution activity, i.e., post-solution activity of outputting data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, “electronic recordkeeping”, and “storing and retrieving information in memory”).
Accordingly, at Step 2B, the additional elements, individually or in combination, do not amount to significantly more than the judicial exception.
Regarding Claim 2
Step 2A, Prong 1
Claim 2 recites the following mental processes that, in each case under the broadest reasonable interpretation, cover performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper, but for the recitation of generic computer components (e.g., “memory”, “processor”, and “inverse reinforcement learning”) [see MPEP 2106.04(a)(2)(III)].
“accept the direct change instruction from the user for the output second target, and output the resulting target based on the accepted change instruction as the third target” (e.g., a human can update information in a table according to received edits)
Accordingly, at Step 2A, prong one, the claim recites an abstract idea.
Step 2A, Prong 2
In accordance with Step 2A, Prong 2, the claim does not include any additional elements, and therefore the judicial exception is not integrated into a practical application.
Step 2B
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 3
Step 2A, Prong 1
Claim 3 recites the following mental processes that, in each case under the broadest reasonable interpretation, cover performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper, but for the recitation of generic computer components (e.g., “memory”, “processor”, and “inverse reinforcement learning”) [see MPEP 2106.04(a)(2)(III)].
“accept the change instruction from the user for the weights of explanatory variables included in the objective function represented by a linear expression” (e.g., a human can update the weights of factors using pencil-and-paper mathematics)
Accordingly, at Step 2A, prong one, the claim recites an abstract idea.
Step 2A, Prong 2
The judicial exception is not integrated into a practical application. In particular, the claim recites the additional “output a third target as a result of changing the second target by optimization using the changed objective function” limitation, which is recited at a high level of generality and amounts to extra-solution activity of generating output, i.e., post-solution activity of outputting data for use in the claimed system (see MPEP 2106.05(g)).
Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, the additional “output a third target as a result of changing the second target by optimization using the changed objective function” limitation is recited at a high level of generality and amounts to insignificant extra-solution activity, i.e., post-solution activity of outputting data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, “electronic recordkeeping”, and “storing and retrieving information in memory”).
Accordingly, at Step 2B, the additional elements, individually or in combination, do not amount to significantly more than the judicial exception.
Regarding Claim 4
Step 2A, Prong 1
Claim 4 recites the following mental processes that, in each case under the broadest reasonable interpretation, cover performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper, but for the recitation of generic computer components (e.g., “memory”, “processor”, and “inverse reinforcement learning”) [see MPEP 2106.04(a)(2)(III)].
“accept the change instruction from the user to add an explanatory variable to the objective function” (e.g., a human can add a new factor to a formula written on paper)
Accordingly, at Step 2A, prong one, the claim recites an abstract idea.
Step 2A, Prong 2
The judicial exception is not integrated into a practical application. In particular, the claim recites the additional “output a third target as a result of changing the second target by optimization using the changed objective function” limitation, which is recited at a high level of generality and amounts to extra-solution activity of generating output, i.e., post-solution activity of outputting data for use in the claimed system (see MPEP 2106.05(g)).
Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, the additional “output a third target as a result of changing the second target by optimization using the changed objective function” limitation is recited at a high level of generality and amounts to insignificant extra-solution activity, i.e., post-solution activity of outputting data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, “electronic recordkeeping”, and “storing and retrieving information in memory”).
Accordingly, at Step 2B, the additional elements, individually or in combination, do not amount to significantly more than the judicial exception.
Regarding Claim 5
Step 2A, Prong 1
Claim 5 does not introduce any new abstract ideas, but recites the abstract idea identified in claim 1.
Accordingly, at Step 2A, prong one, the claim recites an abstract idea.
Step 2A, Prong 2
The judicial exception is not integrated into a practical application. In particular, the claim recites the additional “learn the objective function including the added explanatory variable” limitation, which is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)).
Accordingly, at Step 2A, prong two, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, the additional “learn the objective function including the added explanatory variable” limitation is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)).
Accordingly, at Step 2B, the additional elements, individually or in combination, do not amount to significantly more than the judicial exception.
Regarding claims 6-9
Claims 6-9 recite a method that corresponds directly to the system steps recited in claims 1-4, respectively, with the addition of instructions and computer-executable instructions, which are insufficient to render the claims subject matter eligible for the same reasons described above.
Specifically:
Claim 6 corresponds to claim 1, with the added recitation of method steps executing instructions to perform the same abstract system steps of claim 1.
Claim 7 corresponds to claim 2, with the added recitation of method steps executing instructions to perform the same abstract system steps of claim 2.
Claim 8 corresponds to claim 3, with the added recitation of method steps executing instructions to perform the same abstract system steps of claim 3.
Claim 9 corresponds to claim 4, with the added recitation of method steps executing instructions to perform the same abstract system steps of claim 4.
Regarding claims 10-13
Claims 10-13 recite a non-transitory computer readable information recording medium that corresponds directly to the system steps recited in claims 1-4, respectively, with the addition of instructions and computer-executable instructions, which are insufficient to render the claims subject matter eligible for the same reasons described above.
Specifically:
Claim 10 corresponds to claim 1, with the added recitation of a medium storing instructions executable to perform the same abstract system steps of claim 1.
Claim 11 corresponds to claim 2, with the added recitation of a medium storing instructions executable to perform the same abstract system steps of claim 2.
Claim 12 corresponds to claim 3, with the added recitation of a medium storing instructions executable to perform the same abstract system steps of claim 3.
Claim 13 corresponds to claim 4, with the added recitation of a medium storing instructions executable to perform the same abstract system steps of claim 4.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 6 and 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Koseki et al. (US 20180293514 A1, referred to as Koseki), in view of Ziebart et al. ("Maximum entropy inverse reinforcement learning.", referred to as Ziebart), in view of Fails et al. ("Interactive machine learning.", referred to as Fails), in view of Clement et al. ("Design of a deep space network scheduling application.", referred to as Clement).
Regarding claim 1, Koseki teaches a learning device comprising:
a memory storing instructions ([0027]: Describes storage devices to store instructions for the system.); and
one or more processors configured to execute the instructions to ([0029-0030]: Describes a system containing one or more processors to assist in execution of the systems instructions.):
output a second target, which is an optimization result for a first target using an objective function generated in advance by inverse reinforcement learning (FIG. 5, [0046-0057]: Describes creating a reward function using inverse reinforcement learning (step 510). After learning T, the system computes an optimal policy (step 540), updates the rule engine (step 550), and then applies it to change the object's state (step 560), which generates an optimized output from the learned objective.)
Although Koseki teaches a learning device comprising a memory storing instructions and
one or more processors configured to execute the instructions to output a second target, which is an optimization result for a first target using an objective function generated in advance by inverse reinforcement learning, it does not teach "based on decision making history data indicating an actual change to the target" or "learn the objective function using the decision making history data."
Ziebart teaches based on decision making history data indicating an actual change to the target (Page 1-2, Background: Describes learning a linear reward function from demonstrated trajectories, which correspond to historical decision sequences.);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the inverse reinforcement learning system of Koseki with the history-based optimization of Ziebart. Doing so would allow the system to effectively and efficiently adjust model algorithms with historic data, generating optimized outputs.
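For context on the technique relied upon from Ziebart, the cited teaching can be illustrated with a minimal sketch of a reward (objective) function that is linear in state features, with weights adjusted so that expected feature counts match demonstrated trajectories. All function names and toy data below are illustrative assumptions, not drawn from the cited references.

```python
# Illustrative sketch only: a linear reward function learned from
# demonstrated (historical) trajectories, in the style of maximum
# entropy inverse reinforcement learning. Names and data are hypothetical.

def feature_expectations(trajectories):
    """Average feature counts over the demonstrated trajectories."""
    n = len(trajectories[0][0])
    total = [0.0] * n
    for traj in trajectories:
        for state_features in traj:
            total = [t + f for t, f in zip(total, state_features)]
    return [t / len(trajectories) for t in total]

def gradient_step(weights, demo_fe, model_fe, lr=0.1):
    """One update: move weights toward matching the demonstrations."""
    return [w + lr * (d - m) for w, d, m in zip(weights, demo_fe, model_fe)]

def reward(weights, state_features):
    """Linear reward: a weighted sum of explanatory features."""
    return sum(w * f for w, f in zip(weights, state_features))

# Toy decision-making history: two demonstrated trajectories over
# three-dimensional state features.
demos = [
    [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]],
    [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]],
]
demo_fe = feature_expectations(demos)
weights = gradient_step([0.0, 0.0, 0.0], demo_fe, [0.0, 0.0, 0.0])
```

In this sketch, each gradient step raises the weights of features that appear more often in the historical decisions than the current model predicts, which is the sense in which the reward function is "learned from decision making history data."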
Although Koseki, in view of Ziebart, teaches a learning device comprising a memory storing instructions and one or more processors configured to execute the instructions to output a second target, which is an optimization result for a first target using an objective function generated in advance by inverse reinforcement learning based on decision making history data indicating an actual change to the target, it does not teach "output a third target indicating a target resulting from further changing of the second target based on a change instruction regarding the second target accepted from the user" or "learn the objective function using the decision making history data."
Fails teaches output a third target indicating a target resulting from further changing of the second target based on a change instruction regarding the second target accepted from the user (Figure 4, Page 2: Describes that a user can perform manual classification from a displayed result, which the system accepts and then outputs an updated result.);
learn the objective function using the decision making history data (Fails, Figure 4, Page 2: Describes that the system takes in the user corrections, which triggers re-training and a refreshed result, thus re-learning the model with new user-provided data each iteration.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the inverse reinforcement learning system of Koseki with the history-based optimization of Ziebart and the user interactions of Fails. Doing so would allow an interactive system to effectively and efficiently adjust model outputs to generate desired outcomes from a user who knows the preferred, optimized outcome.
Although Koseki, in view of Ziebart, in view of Fails, teaches a learning device comprising a memory storing instructions and one or more processors configured to execute the instructions to output a second target, which is an optimization result for a first target using an objective function generated in advance by inverse reinforcement learning based on decision making history data indicating an actual change to the target, and to learn the objective function using the decision making history data, it does not teach "output the actual change from the second target to the third target as decision making history data."
Clement teaches output the actual change from the second target to the third target as decision making history data (Page 6, Data Management: Describes that the difference between two schedule states can be output and kept as change history/checkpoints tied to the schedule for traceability.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the inverse reinforcement learning system of Koseki with the history-based optimization of Ziebart, the user interactions of Fails, and the logging of Clement. Doing so would allow the interactive system to display and log changes as save points so a user could verify the change data and update the model for preferred optimization output runs.
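The role of the Clement teaching in the combination, recording the actual change from the second target to the third target as decision making history data, can be illustrated with a short sketch. The target fields and structure below are hypothetical examples, not taken from any of the cited references.

```python
# Illustrative sketch only: record the per-field difference between the
# second target and the third target as decision making history data.
# The "depart"/"arrive" fields are hypothetical examples.

def change_history(second_target, third_target):
    """Return the fields that actually changed, kept as history records."""
    return [
        {"field": key, "before": second_target[key], "after": third_target[key]}
        for key in second_target
        if second_target[key] != third_target[key]
    ]

# A user's change instruction moved one departure time.
second = {"depart": "08:00", "arrive": "09:10"}
third = {"depart": "08:15", "arrive": "09:10"}
history = change_history(second, third)
```

Only the changed field is recorded, so the stored history reflects the user's actual decision rather than the entire target, which is the data the combination would then feed back into learning.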
Regarding claim 2, Koseki, in view of Ziebart, in view of Fails, in view of Clement, further teaches the learning device according to claim 1, wherein the processor is configured to execute the instructions to
accept the direct change instruction from the user for the output second target, and output the resulting target based on the accepted change instruction as the third target (Fails, Page 2, Image Processing With Crayons: Describes the interactive loop in which the user makes edits, the system re-trains, immediately displays the updated result, and repeats the process.).
Regarding claim 6, which recites substantially the same limitations as claim 1: claim 6 further recites a learning method (Koseki, FIG. 5 and [0032]: Show both a method and detail that the system uses a method of steps to execute the instructions for the system.) to perform the system steps of claim 1, and is therefore rejected on the same premise.
Regarding claim 7, which recites substantially the same limitations as claim 2: claim 7 further recites a learning method (Koseki, FIG. 5 and [0032]: Discussed above) to perform the system steps of claim 2, and is therefore rejected on the same premise.
Regarding claim 10, which recites substantially the same limitations as claim 1: claim 10 further recites a non-transitory computer readable information recording medium (Koseki, [0006]: Describes use of a non-transitory computer readable storage medium) to perform the system steps of claim 1, and is therefore rejected on the same premise.
Regarding claim 11, which recites substantially the same limitations as claim 2: claim 11 further recites a non-transitory computer readable information recording medium (Koseki, [0006]: Discussed above) to perform the system steps of claim 2, and is therefore rejected on the same premise.
Claim(s) 3, 4, 5, 8, 9, 12, and 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Koseki, in view of Ziebart, in view of Fails, in view of Clement, in view of Li et al. (“Interactive Machine learning by Visualization: A Small Data Solution”, referred to as Li).
Regarding claim 3, Koseki, in view of Ziebart, in view of Fails, in view of Clement teaches the learning device according to claim 1.
Although Koseki, in view of Ziebart, in view of Fails, in view of Clement teaches the learning device, they do not teach wherein the processor is configured to execute the instructions to accept the change instruction from the user for the weights of explanatory variables included in the objective function represented by a linear expression.
Li teaches wherein the processor is configured to execute the instructions to
accept the change instruction from the user for the weights of explanatory variables included in the objective function represented by a linear expression (Page 6, Equation 3: "We may express the reward function as a linear sum of weighted features," showing that the reward/objective function is a linear expression using weighted variables; Page 12, Algorithm 1: Describes using these weights to facilitate training that produces sequential target outputs.), and output a third target as a result of changing the second target by optimization using the changed objective function (the examiner notes that this is a combination of the previous teachings of claims 1 and 2 with the added teaching of Li).
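The claimed behavior for which Li is cited, an objective function represented as a linear sum of weighted explanatory variables, where a user's change instruction edits a weight and the target is re-optimized, can be sketched as follows. The candidate targets, weights, and variable names are illustrative assumptions only, not taken from Li or the claims.

```python
# Illustrative sketch only: a linear objective over weighted explanatory
# variables; the user changes a weight and the target is re-optimized.
# All values and names here are hypothetical.

def objective(weights, features):
    """Linear objective: weighted sum of explanatory variables."""
    return sum(w * f for w, f in zip(weights, features))

def optimize(weights, candidates):
    """Select the candidate target that maximizes the objective."""
    return max(candidates, key=lambda c: objective(weights, c))

candidates = [(1.0, 0.0), (0.0, 1.0)]
weights = [0.8, 0.2]
second_target = optimize(weights, candidates)  # optimized output
weights[1] = 1.5  # user's change instruction: raise the second weight
third_target = optimize(weights, candidates)   # re-optimized output
```

Because the objective is linear, a single weight edit immediately changes which candidate is selected, which is the sense in which the third target results from "optimization using the changed objective function."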
Regarding claim 4, Koseki, in view of Ziebart, in view of Fails, in view of Clement teaches the learning device according to claim 1.
Although Koseki, in view of Ziebart, in view of Fails, in view of Clement teaches the learning device, they do not teach wherein the processor is configured to execute the instructions to accept the change instruction from the user to add an explanatory variable to the objective function.
Li teaches wherein the processor is configured to execute the instructions to
accept the change instruction from the user to add an explanatory variable to the objective function (Page 3, Section 3: Describes how a user can input new features or other details into a training system to create optimal training sets, to generate optimized training outputs.), and output a third target as a result of changing the second target by optimization using the changed objective function (the examiner notes that this is a combination of the previous teachings of claims 1 and 2 with the added teaching of Li).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the inverse reinforcement learning system of Koseki with the history-based optimization of Ziebart, the user interactions of Fails, the logging of Clement, and the objective function changes of Li. Doing so would allow the user to update model parameters during runs of the model with the outputs generated, enabling the user to obtain optimized results faster.
Regarding claim 5, Koseki, in view of Ziebart, in view of Fails, in view of Clement, in view of Li teaches the learning device according to claim 4, wherein the processor is configured to execute the instructions to
learn the objective function (Fails, Figure 4, Page 2: Describes that the system takes in the user corrections, which triggers re-training and a refreshed result, thus re-learning the model with new user-provided data each iteration.).
Although Fails teaches learning the objective function, it does not teach including an added explanatory variable.
Li teaches including the added explanatory variable (Page 3, Section 3: Describes how a user can input new features or other details into a training system to create optimal training sets, to generate optimized training outputs.).
Regarding claim 8, which recites substantially the same limitations as claim 3: claim 8 further recites a learning method (Koseki, FIG. 5 and [0032]: Show both a method and detail that the system uses a method of steps to execute the instructions for the system.) to perform the system steps of claim 3, and is therefore rejected on the same premise.
Regarding claim 9, which recites substantially the same limitations as claim 4: claim 9 further recites a learning method (Koseki, FIG. 5 and [0032]: Discussed above) to perform the system steps of claim 4, and is therefore rejected on the same premise.
Regarding claim 12, which recites substantially the same limitations as claim 3: claim 12 further recites a non-transitory computer readable information recording medium (Koseki, [0006]: Describes use of a non-transitory computer readable storage medium) to perform the system steps of claim 3, and is therefore rejected on the same premise.
Regarding claim 13, which recites substantially the same limitations as claim 4: claim 13 further recites a non-transitory computer readable information recording medium (Koseki, [0006]: Discussed above) to perform the system steps of claim 4, and is therefore rejected on the same premise.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-13 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 7-10 of U.S. Patent Application Publication No. US 2023/0186099 A1. Although the claims at issue are not identical, they are not patentably distinct from each other because every aspect of the instant claims is already covered in the claims of US 2023/0186099 A1 and is anticipated by the reference claims as follows:
Instant Application
Patent No. 20230186099
Claim 1
A learning device comprising:
a memory storing instructions; and
one or more processors configured to execute the instructions to:
output a second target, which is an optimization result for a first target using an objective function generated in advance by inverse reinforcement learning based on decision making history data indicating an actual change to the target;
output a third target indicating a target resulting from further changing of the second target based on a change instruction regarding the second target accepted from the user;
output the actual change from the second target to the third target as decision making history data; and
learn the objective function using the decision making history data.
Claim 2
The learning device according to claim 1, wherein the processor is configured to execute the instructions to
accept the direct change instruction from the user for the output second target, and output the resulting target based on the accepted change instruction as the third target.
Claim 3
The learning device according to claim 1, wherein the processor is configured to execute the instructions to
accept the change instruction from the user for the weights of explanatory variables included in the objective function represented by a linear expression, and output a third target as a result of changing the second target by optimization using the changed objective function.
Claim 4
The learning device according to claim 1, wherein the processor is configured to execute the instructions to
accept the change instruction from the user to add an explanatory variable to the objective function, and output a third target as a result of changing the second target by optimization using the changed objective function.
Claim 5
The learning device according to claim 4, wherein the processor is configured to execute the instructions to
learn the objective function including the added explanatory variable.
Claim 6
A learning method comprising: outputting a second target, which is an optimization result for a first target using an objective function generated in advance by inverse reinforcement learning based on decision making history data indicating an actual change to the target; outputting a third target indicating a target resulting from further changing of the second target based on a change instruction regarding the second target accepted from the user; outputting the actual change from the second target to the third target as decision making history data; and learning the objective function using the decision making history data.
Claim 7 (Instant Application)
A learning method according to claim 6 further comprising accepting the direct change instruction from the user for the output second target, and outputting the resulting target based on the accepted change instruction as the third target.
Claim 7 (Patent No. 20230186099)
The learning device according to any one of claim 1, wherein the processor is configured to execute the instructions to comprising a change target output means which output a third target indicating a target resulting from further changing of the second target based on a change instruction regarding the second target accepted from a user; and output the actual change from the second target to the third target as decision making history data.
Claim 8 (Instant Application)
A learning method according to claim 6 further comprising accepting the direct change instruction from the user for the output second target, and outputting the resulting target based on the accepted change instruction as the third target.
Claim 8 (Patent No. 20230186099)
A learning method comprising: outputting a plurality of second targets, which are optimization results for a first target using one or more objective functions generated in advance by inverse reinforcement learning based on decision making history data indicating an actual change to a target; accepting a selection instruction from a user for a plurality of the output second targets; outputting the actual change from the first target to the accepted second target as the decision making history data; and learning the objective function using the decision making history data.
Claim 9
A learning method according to claim 6 further comprising accepting the change instruction from the user to add an explanatory variable to the objective function, and outputting a third target as a result of changing the second target by optimization using the changed objective function.
Claim 10 (Instant Application)
A non-transitory computer readable information recording medium storing when executed by a processor, that performs a method for: outputting a second target, which is an optimization result for a first target using an objective function generated in advance by inverse reinforcement learning based on decision making history data indicating an actual change to the target; outputting a third target indicating a target resulting from further changing of the second target based on a change instruction regarding the second target accepted from the user; outputting the actual change from the second target to the third target as decision making history data; and learning the objective function using the decision making history data.
Claim 10 (Patent No. 20230186099)
A non-transitory computer readable information recording medium storing a learning program, when executed by a processor, that performs a method for; outputting a plurality of second targets, which are optimization results for a first target using one or more objective functions generated in advance by inverse reinforcement learning based on decision making history data indicating an actual change to a target; accepting a selection instruction from a user for a plurality of the output second targets; outputting the actual change from the first target to the accepted second target as the decision making history data; and learning the objective function using the decision making history data.
Claim 11
The non-transitory computer readable information recording medium according to claim 10, wherein the direct change instruction from the user for the output second target is accepted, and the resulting target based on the accepted change instruction is output as the third target.
Claim 12
The non-transitory computer readable information recording medium according to claim 10, wherein the change instruction from the user for the weights of explanatory variables included in the objective function represented by a linear expression is accepted, and a third target as a result of changing the second target is output by optimization using the changed objective function.
Claim 13
The non-transitory computer readable information recording medium according to claim 10, wherein the change instruction from the user to add an explanatory variable to the objective function is accepted, and a third target as a result of changing the second target is output by optimization using the changed objective function.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See attached PTO-892 for additional prior art including:
US 20220083884 A1: Decision History.
US 11308401 B2: Inverse reinforcement learning.
US 20220318917 A1: IRL with decision history.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DONALD T RODEN whose telephone number is (571)272-6441. The examiner can normally be reached Mon-Thur 8:00-5:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached at (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/D.T.R./Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128