DETAILED ACTION
This communication is in response to Application No. 18/173,873, filed on February 24, 2023, in which claims 1-18 are presented for examination.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in Japan on 08/19/2022. Acknowledgment is also made of receipt of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file.
Information Disclosure Statement
The information disclosure statements submitted on 02/24/2023 and 07/08/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements were considered by the examiner.
Specification
The contents of the specification are sufficient for examination purposes.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination.—An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are:
“first evaluation unit” (Claims 1-6, 17, and 18).
“first selection unit” (Claims 1, 3-4, 15, and 17-18).
“candidate data transmission unit” (Claims 1 and 17-18).
“candidate data reception unit” (Claims 1 and 17-18).
“second evaluation unit” (Claims 1, 7-13, and 17-18).
“second selection unit” (Claims 1, 8-13, 15, and 17-18).
“feedback unit” (Claim 14).
“input generation unit” (Claim 15).
“inference unit” (Claim 15).
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Objections
Claims 9 and 16 are objected to because of the following informalities:
The recitations of “calculated by a first arithmetic processing device as hardware having first arithmetic accuracy” (Claim 9, ln. 6-7), “calculated by a second arithmetic processing device as hardware having second arithmetic accuracy” (Claim 9, ln. 8-9), and “a first information processing device including a first arithmetic processing device as hardware; and a second information processing device that is hardware different from the first arithmetic processing device, and includes a second arithmetic processing device” (Claim 16, ln. 2-6) are confusing, as currently formulated, because the relationships between the “information processing device[s]” and their associated “arithmetic processing device[s]” are not presented in a straightforward manner. The claims should be amended to more clearly articulate the relationships between the “information processing device[s]” and their associated “arithmetic processing device[s]”.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claims 1-18 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Regarding Claim 1, the claim recites the limitations “a first evaluation unit configured to”, “a first selection unit configured to”, “a candidate data transmission unit configured to”, “a candidate data reception unit configured to”, “a second evaluation unit configured to”, and “a second selection unit configured to”, which invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Specifically, each of the above units is discussed in regard to its functional result, without sufficient description of a linked structure, material, or acts for carrying out the functionality to arrive at the result. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Additionally, the claim recites the limitation “determined in advance” (ln. 15, ln. 20, ln. 32, and ln. 39) to describe multiple elements in the claim. In each instance, it is unclear what the described element is “determined in advance” of. As a result, the scope of the claim is indefinite because one cannot reasonably ascertain what qualifies as a determination of the element “in advance”. Therefore, the claim is rejected. The claim should be amended to clarify, for each instance where “determined in advance” is recited, what the associated element is “determined in advance” of.
Regarding Claim 2, the claim recites the limitation “first evaluation unit”, which, as discussed above, invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, as also discussed above, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Specifically, the first evaluation unit is discussed in regard to its functional result, without sufficient description of a linked structure, material, or acts for carrying out the functionality to arrive at the result. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. Applicant may amend or respond in the manner described in regard to the rejection of Claim 1 above.
Additionally, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claims 3-5, the claims recite at least one of the limitations “first evaluation unit” (Claims 3-5) and “first selection unit” (Claims 3-4), which, as discussed above, invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, as also discussed above, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Specifically, each of the above-mentioned units is discussed in regard to its functional result, without sufficient description of a linked structure, material, or acts for carrying out the functionality to arrive at the result. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. Applicant may amend or respond in the manner described in regard to the rejection of Claim 1 above.
Additionally, each of the claims recites the limitation “determined in advance” (Claim 3, ln. 11; Claim 4, ln. 11; Claim 5, ln. 9), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claims are similarly rejected and should be amended in a similar manner.
Additionally, the claims are rejected because they are dependent upon a rejected claim.
Regarding Claims 6-7, the claims recite either a “first evaluation unit” (Claim 6) or a “second evaluation unit” (Claim 7), which, as discussed above, invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, as also discussed above, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Specifically, each of the above-mentioned units is discussed in regard to its functional result, without sufficient description of a linked structure, material, or acts for carrying out the functionality to arrive at the result. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. Applicant may amend or respond in the manner described in regard to the rejection of Claim 1 above.
Additionally, the claims are rejected because they are dependent upon a rejected claim.
Regarding Claim 8, the claim recites “a classification probability of each of one or a plurality of the classes” (ln. 12-14). The use of “each” indicates that the “classification probability” applies to both the “one” and the “plurality”. Conversely, use of “or” indicates the “classification probability” applies to only one of the “one” and the “plurality”. As a result, the scope of the claim is indefinite. Therefore, the claim is rejected. The claim should be amended to clarify the meaning of “each of one or a plurality”, such as “one or each of a plurality”.
Additionally, the claim recites “second evaluation unit” and “second selection unit”, which, as discussed above, invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, as also discussed above, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Specifically, each of the above-mentioned units is discussed in regard to its functional result, without sufficient description of a linked structure, material, or acts for carrying out the functionality to arrive at the result. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. Applicant may amend or respond in the manner described in regard to the rejection of Claim 1 above.
Furthermore, the claim recites the limitation “determined in advance” (ln. 17-18), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Finally, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claims 9-11, each of the claims recites “the second evaluation data includes data corresponding to the first evaluation data, which is any of the output data of the first machine learning model and the intermediate data output from the predetermined position in the first machine learning model” (Claim 9, ln. 21-25; Claim 10, ln. 19-23; and Claim 11, ln. 19-23). For each claim, it is unclear whether “which” is referencing “the second evaluation data” or “the first evaluation data”. As a result, the scope of each claim is indefinite because it is not clear whether the additional limitations regarding “the first machine learning model” apply to “the second evaluation data” or “the first evaluation data”. Therefore, the claims are rejected. The claims should be amended to clarify which “data” is being referenced by “which”.
Next, the claims each recite the limitation “predetermined” to describe a “position” (Claim 9, ln. 17 and ln. 24; Claim 10, ln. 17 and ln. 22; Claim 11, ln. 17 and ln. 22). However, for each claim, it is unclear what the “position” is “predetermined” in advance of. As a result, the scope of the claims is indefinite because one cannot reasonably ascertain what qualifies as a “predetermined” “position”. Therefore, the claims are rejected. The claims should be amended to clarify what qualifies as a “predetermined” “position”.
Additionally, the claims each recite both the “second evaluation unit” and the “second selection unit”, which, as discussed above, invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, as also discussed above, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Specifically, each of the above-mentioned units is discussed in regard to its functional result, without sufficient description of a linked structure, material, or acts for carrying out the functionality to arrive at the result. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. Applicant may amend or respond in the manner described in regard to the rejection of Claim 1 above.
Furthermore, each of the claims recites the limitation “determined in advance” (Claim 9, ln. 12-13; Claim 10, ln. 12-13; Claim 11, ln. 12-13), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claims are similarly rejected and should be amended in a similar manner.
Finally, the claims are rejected because they are dependent upon a rejected claim.
Regarding Claims 12-13, both claims recite each of the “second evaluation unit” and the “second selection unit”, which, as discussed above, invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, as also discussed above, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Specifically, each of the above-mentioned units is discussed in regard to its functional result, without sufficient description of a linked structure, material, or acts for carrying out the functionality to arrive at the result. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. Applicant may amend or respond in the manner described in regard to the rejection of Claim 1 above.
Additionally, each of the claims recites the limitation “determined in advance” (Claim 12, ln. 7-8; Claim 13, ln. 8-9), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claims are similarly rejected and should be amended in a similar manner.
Furthermore, the claims are rejected because they are dependent upon a rejected claim.
Regarding Claim 14, the claim recites “makes a probability of selecting, as the candidate data, the input data acquired in a time range determined in advance after the input data indicated by the employment information to be higher than a probability of selecting another time range” (ln. 10-14). As currently formulated, the claim recites selecting data, “the input data”, which happens to be “acquired in a time range”. However, “selecting another time range” indicates that the time range, instead of the data itself, is being selected. As a result, it is not clear whether “a probability of selecting” applies to “selecting . . . data” or “selecting . . . [a] time range” that contains data, and the scope of the claim is indefinite. Therefore, the claim is rejected. The claim should be amended to clarify whether “a probability of selecting” applies to “selecting . . . data” or “selecting . . . [a] time range” that contains data.
Additionally, the claim recites the limitation “a feedback unit configured to” (ln. 4), which invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Specifically, the “feedback unit” is discussed in regard to its functional result, without sufficient description of a linked structure, material, or acts for carrying out the functionality to arrive at the result. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. Applicant may amend or respond in the manner described in regard to the rejection of Claim 1 above.
Furthermore, the claim recites the limitation “determined in advance” (ln. 12), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Finally, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 15, the claim recites the limitations “an input generation unit configured to” (ln. 5) and “an inference unit configured to” (ln. 9), which invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. Additionally, the claim recites “first selection unit” (ln. 14) and “second selection unit” (ln. 18), which, as discussed above, invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Specifically, each of the above-mentioned units is discussed in regard to its functional result, without sufficient description of a linked structure, material, or acts for carrying out the functionality to arrive at the result. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. Applicant may amend or respond in the manner described in regard to the rejection of Claim 1 above.
Additionally, the claim is rejected because it is dependent upon a rejected claim.
Regarding Claim 16, the claim recites the limitation “determined in advance” (ln. 29), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Additionally, the claim recites the limitation “predetermined” (ln. 14 and 21), which is indefinite for substantially the same reasoning as articulated in the rejection of Claims 9-11 above. Therefore, the claim is similarly rejected and should be amended in a similar manner.
Regarding Claims 17-18, the claims recite the limitations “a first evaluation unit configured to”, “a first selection unit configured to”, “a candidate data transmission unit configured to”, “a candidate data reception unit configured to”, “a second evaluation unit configured to”, and “a second selection unit configured to”, which, as discussed above, invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, as also discussed above, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Specifically, each of the above-mentioned units is discussed in regard to its functional result, without sufficient description of a linked structure, material, or acts for carrying out the functionality to arrive at the result. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. Applicant may amend or respond in the manner described in regard to the rejection of Claim 1 above.
Furthermore, each of the claims recites the limitation “determined in advance” (Claim 17, ln. 12, ln. 17, ln. 29, ln. 35; Claim 18, ln. 13, ln. 18, ln. 30, ln. 37), which is indefinite for substantially the same reasoning as articulated in the rejection of Claim 1 above. Therefore, the claims are similarly rejected and should be amended in a similar manner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to abstract ideas without significantly more.
Regarding Claim 1:
Step 1: Claim 1 is a machine claim. Therefore, Claim 1, along with its dependent Claims 2-15, is directed to a statutory category of eligible subject matter.
Step 2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Here, elements of the claimed subject matter are mental processes. Specifically, the claim recites:
“select a plurality of pieces of learning data . . . from among a plurality of pieces of input data” (mental process – amounts to exercising judgment to form an opinion on known or observed information, which may be aided by pen and paper);
“calculate a first evaluation value representing effectiveness of each of the pieces of input data . . . based on a first evaluation standard determined in advance” (mental process – amounts to exercising judgment to determine a value associated with known or observed information, with reference to other known or observed data and/or standards with specific constraints, which may be aided by pen and paper);
“select whether each of the pieces of input data is included in a plurality of pieces of candidate data by comparing the first evaluation value of each of the pieces of input data with a value determined in advance” (mental process – amounts to exercising judgment to form an opinion on known or observed information, with reference to other known or observed data and/or standards with specific constraints, which may be aided by pen and paper);
“calculate a second evaluation value indicating effectiveness of each of the pieces of candidate data . . . based on a second evaluation standard determined in advance, the second evaluation standard being different from the first evaluation standard” (mental process – amounts to exercising judgment to determine a value associated with known or observed information, with reference to other known or observed data and/or standards with specific constraints, which may be aided by pen and paper); and
“select whether each of the pieces of candidate data is included in the pieces of learning data by comparing the second evaluation value of each of the pieces of candidate data with a value determined in advance” (mental process – amounts to exercising judgment to form an opinion on known or observed information, with reference to other known or observed data and/or standards with specific constraints, which may be aided by pen and paper).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“A machine learning system configured to . . . for causing a first machine learning model to perform learning . . . a first evaluation unit configured to . . . a first selection unit configured to . . . and a candidate data transmission unit configured to . . . a candidate data reception unit configured to . . . a second evaluation unit configured to . . . and a second selection unit configured to” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea);
“the machine learning system comprising: a first information processing device; and a second information processing device connected to the first information processing device via a network, wherein the first information processing device comprises . . . when being used for learning of the first machine learning model . . . and the second information processing device comprises: . . . when being used for learning of the first machine learning model” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea); and
“transmit each of the pieces of candidate data to the second information processing device via the network . . . receive each of the pieces of candidate data from the first information processing device via the network” (transmitting and receiving data amounts to extra-solution activity because transmission of data over a network is incidental to the claimed subject matter).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“A machine learning system configured to . . . for causing a first machine learning model to perform learning . . . a first evaluation unit configured to . . . a first selection unit configured to . . . and a candidate data transmission unit configured to . . . a candidate data reception unit configured to . . . a second evaluation unit configured to . . . and a second selection unit configured to” (mere instructions to apply the exception using generic computer components does not provide an inventive concept);
“the machine learning system comprising: a first information processing device; and a second information processing device connected to the first information processing device via a network, wherein the first information processing device comprises . . . when being used for learning of the first machine learning model . . . and the second information processing device comprises: . . . when being used for learning of the first machine learning model” (merely generally linking the use of the judicial exception to a particular technological environment or field of use does not provide an inventive concept); and
“transmit each of the pieces of candidate data to the second information processing device via the network . . . receive each of the pieces of candidate data from the first information processing device via the network” (transmitting data over a network is well-understood, routine, and conventional, see Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362; see also buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014); therefore the limitation, which is recited with a high level of generality, remains insignificant extra-solution activity even upon reconsideration).
For the reasons above, Claim 1 is rejected as being directed to an abstract idea without significantly more. This rejection applies equally to dependent claims 2-15. The additional limitations of the dependent claims are addressed below.
Regarding Claim 2:
Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 2 depends on. Here, the claim recites additional limitations that are mental processes. Specifically, the claim recites:
“calculates the first evaluation value for first input data among the pieces of input data based on a relation between the first input data and data different from the first input data” (mental process – amounts to exercising judgment to determine a value associated with known or observed information, with reference to known or observed relations between the information and other information, which may be aided by pen and paper).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“wherein the first evaluation unit” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“wherein the first evaluation unit” (mere instructions to apply the exception using generic computer components does not provide an inventive concept).
Accordingly, Claim 2 is rejected as being directed to an abstract idea without significantly more.
Regarding Claim 3:
Step 2A Prong 1: See the rejection of Claim 2 above, which Claim 3 depends on. Here, the claim recites additional limitations that are mental processes. Specifically, the claim recites:
“calculates, as the first evaluation value, a value according to a time difference between . . . time of the first input data and . . . time of second input data that is selected as one of the pieces of candidate data . . . among one or more pieces of the second input data” (mental process – amounts to exercising judgment to determine a value associated with known or observed information, with reference to known or observed times associated with the information, which may be aided by pen and paper) and
“compares the first evaluation value with a standard value determined in advance, and selects the first input data as one of the pieces of candidate data” (mental process – amounts to exercising judgment to form an opinion on known or observed information, with reference to other known or observed data and/or standards with specific constraints, which may be aided by pen and paper).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“wherein the first evaluation unit . . . and the first selection unit” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea);
“immediately before the first input data . . . different from the first input data” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea); and
“acquisition . . . acquisition” (in the event that the recitations of acquisition times implicitly require data gathering, the gathering of data is extra-solution activity because it is incidental to the claimed subject matter).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“wherein the first evaluation unit . . . and the first selection unit” (mere instructions to apply the exception using generic computer components does not provide an inventive concept);
“immediately before the first input data . . . different from the first input data” (merely generally linking the use of the judicial exception to a particular technological environment or field of use does not provide an inventive concept); and
“acquisition . . . acquisition” (data gathering is well-understood, routine and conventional, see OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93; therefore the limitation, which, at most, is implicitly recited with a high level of generality, remains insignificant extra-solution activity even upon reconsideration).
Accordingly, Claim 3 is rejected as being directed to an abstract idea without significantly more.
Regarding Claim 4:
Step 2A Prong 1: See the rejection of Claim 2 above, which Claim 4 depends on. Here, the claim recites additional limitations that are mental processes. Specifically, the claim recites:
“calculates, as the first evaluation value, a value according to a degree of difference representing a difference between the first input data and k pieces of the candidate data . . . among the pieces of candidate data, k being an integral number equal to or larger than 1” (mental process – amounts to exercising judgment to determine a value associated with known or observed information, with reference to known or observed differences associated with the information, which may be aided by pen and paper) and
“compares the first evaluation value with a standard value determined in advance, and selects the first input data as one of the pieces of candidate data” (mental process – amounts to exercising judgment to form an opinion on known or observed information, with reference to other known or observed data and/or standards with specific constraints, which may be aided by pen and paper).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“wherein the first evaluation unit . . . and the first selection unit” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea) and
“immediately before the first input data” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“wherein the first evaluation unit . . . and the first selection unit” (mere instructions to apply the exception using generic computer components does not provide an inventive concept) and
“immediately before the first input data” (merely generally linking the use of the judicial exception to a particular technological environment or field of use does not provide an inventive concept).
Accordingly, Claim 4 is rejected as being directed to an abstract idea without significantly more.
Regarding Claim 5:
Step 2A Prong 1: See the rejection of Claim 2 above, which Claim 5 depends on. Here, the claim recites additional limitations that are mental processes. Specifically, the claim recites:
“calculates, as the first evaluation value, a value according to a degree of difference representing a difference between the first input data and one or more pieces of data” (mental process – amounts to exercising judgment to determine a value associated with known or observed information, with reference to known or observed differences associated with the information, which may be aided by pen and paper) and
“compares the first evaluation value with a standard value determined in advance, and selects the first input data as one of the pieces of candidate data” (mental process – amounts to exercising judgment to form an opinion on known or observed information, with reference to other known or observed data and/or standards with specific constraints, which may be aided by pen and paper).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“wherein the first evaluation unit . . . and the first selection unit” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea) and
“used for training of the first machine learning model” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“wherein the first evaluation unit . . . and the first selection unit” (mere instructions to apply the exception using generic computer components does not provide an inventive concept) and
“used for training of the first machine learning model” (merely generally linking the use of the judicial exception to a particular technological environment or field of use does not provide an inventive concept).
Accordingly, Claim 5 is rejected as being directed to an abstract idea without significantly more.
Regarding Claim 6:
Step 2A Prong 1: See the rejection of Claim 2 above, which Claim 6 depends on. Here, the claim recites additional limitations that are mental processes. Specifically, the claim recites:
“wherein the first evaluation value represents a binary value of effectiveness or ineffectiveness” (mental process – amounts to exercising judgment to assign known or observed information with one of two descriptions, based on an opinion on their effectiveness, which may be aided by pen and paper);
“calculates the first evaluation value based on a random number such that the effectiveness or the ineffectiveness occurs with a probability set in advance” (mental process – amounts to exercising judgment to determine a value associated with known or observed information, with reference to a known or imagined random number to arbitrarily form opinions, which may be aided by pen and paper); and
“selects the first input data as one of the pieces of candidate data in a case in which the first evaluation value indicates selection” (mental process – amounts to exercising judgment to form an opinion on known or observed information, with reference to other known or observed data and/or standards with specific constraints, which may be aided by pen and paper).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“the first evaluation unit . . . and the first selection unit” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“the first evaluation unit . . . and the first selection unit” (mere instructions to apply the exception using generic computer components does not provide an inventive concept).
Accordingly, Claim 6 is rejected as being directed to an abstract idea without significantly more.
Regarding Claim 7:
Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 7 depends on. Here, the claim recites additional limitations that are mental processes. Specifically, the claim recites:
“calculates the second evaluation value for first candidate data among the pieces of candidate data by analyzing an inference result or an intermediate result” (mental process – amounts to exercising judgment to determine a value associated with known or observed information, with reference to a known or determined initial inference or intermediate result, which may be aided by pen and paper).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“wherein the first evaluation unit . . . obtained by inputting the first candidate data to a machine learning model” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“wherein the first evaluation unit . . . obtained by inputting the first candidate data to a machine learning model” (mere instructions to apply the exception using generic computer components does not provide an inventive concept).
Accordingly, Claim 7 is rejected as being directed to an abstract idea without significantly more.
Regarding Claim 8:
Step 2A Prong 1: See the rejection of Claim 7 above, which Claim 8 depends on. Here, the claim recites additional limitations that are mental processes. Specifically, the claim recites:
“classifies input data into any of a plurality of classes . . . and calculates, as the second evaluation value, a value according to a degree of difference representing a difference between a classification probability of a class into which the first candidate data is classified as belonging among the classes and a classification probability of each of one or a plurality of the classes into which the first candidate data is classified as not belonging” (mental process – amounts to exercising judgment to determine a value associated with known or observed information, with reference to a difference between known or determined data and descriptions associated with the information, which may be aided by pen and paper) and
“compares the second evaluation value with a standard value determined in advance, and selects the first candidate data as one of the pieces of learning data” (mental process – amounts to exercising judgment to form an opinion on known or observed information, with reference to other known or observed data and/or standards with specific constraints, which may be aided by pen and paper).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“wherein the first machine learning model . . . the second evaluation unit . . . obtained by inputting the first candidate data to the first machine learning model . . . and the second selection unit” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea) and
“acquires a classification probability of belonging to each of the classes” (acquiring classification probabilities is mere extra-solution activity because it is data gathering that is incidental to the claimed subject matter).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“wherein the first machine learning model . . . the second evaluation unit . . . obtained by inputting the first candidate data to the first machine learning model . . . and the second selection unit” (mere instructions to apply the exception using generic computer components does not provide an inventive concept) and
“acquires a classification probability of belonging to each of the classes” (data gathering is well-understood, routine, and conventional, see OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93; therefore the limitation, which is recited with a high level of generality, remains insignificant extra-solution activity even upon reconsideration).
Accordingly, Claim 8 is rejected as being directed to an abstract idea without significantly more.
Regarding Claim 9:
Step 2A Prong 1: See the rejection of Claim 7 above, which Claim 9 depends on. Here, the claim recites additional limitations that are mental processes. Specifically, the claim recites:
“calculates, as the second evaluation value, a value according to a degree of difference representing a difference between first evaluation data . . . and second evaluation data calculated” (mental process – amounts to exercising judgment to determine a value associated with known or observed information, with reference to known or observed differences associated with the information, which may be aided by pen and paper) and
“compares the second evaluation value with a standard value determined in advance, and selects the first candidate data as one of the pieces of learning data” (mental process – amounts to exercising judgment to form an opinion on known or observed information, with reference to other known or observed data and/or standards with specific constraints, which may be aided by pen and paper).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“wherein the second evaluation unit . . . calculated by a first arithmetic processing device as hardware . . . by a second arithmetic processing device as hardware . . . the second selection unit . . . output from a predetermined position in the first machine learning model obtained by inputting the first candidate data to the first machine learning model . . . output from the predetermined position in the first machine learning model obtained by inputting the first candidate data to the first machine learning model” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea) and
“having first arithmetic accuracy . . . having second arithmetic accuracy higher than the first arithmetic accuracy . . . the first evaluation data includes at least one of output data of the first machine learning model and intermediate data . . . and the second evaluation data includes data corresponding to the first evaluation data, which is any of the output data of the first machine learning model and the intermediate data” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“wherein the second evaluation unit . . . calculated by a first arithmetic processing device as hardware . . . by a second arithmetic processing device as hardware . . . the second selection unit . . . output from a predetermined position in the first machine learning model obtained by inputting the first candidate data to the first machine learning model . . . output from the predetermined position in the first machine learning model obtained by inputting the first candidate data to the first machine learning model” (mere instructions to apply the exception using generic computer components does not provide an inventive concept) and
“having first arithmetic accuracy . . . having second arithmetic accuracy higher than the first arithmetic accuracy . . . the first evaluation data includes at least one of output data of the first machine learning model and intermediate data . . . and the second evaluation data includes data corresponding to the first evaluation data, which is any of the output data of the first machine learning model and the intermediate data” (merely generally linking the use of the judicial exception to a particular technological environment or field of use does not provide an inventive concept).
Accordingly, Claim 9 is rejected as being directed to an abstract idea without significantly more.
Regarding Claim 10:
Step 2A Prong 1: See the rejection of Claim 7 above, which Claim 10 depends on. Here, the claim recites additional limitations that are mental processes. Specifically, the claim recites:
“calculates, as the second evaluation value, a value according to a degree of difference representing a difference between first evaluation data . . . and second evaluation data . . . that is obtained by partially changing the first candidate data” (mental process – amounts to exercising judgment to determine a value associated with known or observed information, with reference to an altered version of the information, which may be aided by pen and paper) and
“compares the second evaluation value with a standard value determined in advance, and selects the first candidate data as one of the pieces of learning data” (mental process – amounts to exercising judgment to form an opinion on known or observed information, with reference to other known or observed data and/or standards with specific constraints, which may be aided by pen and paper).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“wherein the second evaluation unit . . . obtained by inputting the first candidate data to the first machine learning model . . . obtained by inputting data . . . to the first machine learning model, the second selection unit . . . output from a predetermined position in the first machine learning model . . . output from the predetermined position in the first machine learning model” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea) and
“the first evaluation data includes at least one of output data of the first machine learning model and intermediate data . . . and the second evaluation data includes data corresponding to the first evaluation data, which is any of the output data of the first machine learning model and the intermediate data” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which does not impose any meaningful limits on practicing the abstract idea).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“wherein the second evaluation unit . . . obtained by inputting the first candidate data to the first machine learning model . . . obtained by inputting data . . . to the first machine learning model, the second selection unit . . . output from a predetermined position in the first machine learning model . . . output from the predetermined position in the first machine learning model” (mere instructions to apply the exception using generic computer components does not provide an inventive concept) and
“the first evaluation data includes at least one of output data of the first machine learning model and intermediate data . . . and the second evaluation data includes data corresponding to the first evaluation data, which is any of the output data of the first machine learning model and the intermediate data” (merely generally linking the use of the judicial exception to a particular technological environment or field of use does not provide an inventive concept).
Accordingly, Claim 10 is rejected as being directed to an abstract idea without significantly more.
Regarding Claim 11:
Step 2A Prong 1: See the rejection of Claim 7 above, which Claim 11 depends on. Here, the claim recites additional limitations that are mental processes. Specifically, the claim recites:
“calculates, as the second evaluation value, a value according to a degree of difference representing a difference between first evaluation data . . . and second evaluation data” (mental process – amounts to exercising judgment to determine a value associated with known or observed information, with reference to other known or observed information, which may be aided by pen and paper) and
“compares the second evaluation value with a standard value determined in advance, and selects the first candidate data as one of the pieces of learning data” (mental process – amounts to exercising judgment to form an opinion on known or observed information, with reference to other known or observed data and/or standards with specific constraints, which may be aided by pen and paper).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“wherein the second evaluation unit . . . obtained by inputting the first candidate data to the first machine learning model . . . obtained by inputting the first candidate data to a second machine learning model that is obtained by partially changing the first machine learning model, the second selection unit . . . output from a predetermined position in the first machine learning model . . . output from the predetermined position in the first machine learning model” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea) and
“the first evaluation data includes at least one of output data of the first machine learning model and intermediate data . . . and the second evaluation data includes data corresponding to the first evaluation data, which is any of the output data of the first machine learning model and the intermediate data” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which do not impose any meaningful limits on practicing the abstract idea).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“wherein the second evaluation unit . . . obtained by inputting the first candidate data to the first machine learning model . . . obtained by inputting the first candidate data to a second machine learning model that is obtained by partially changing the first machine learning model, the second selection unit . . . output from a predetermined position in the first machine learning model . . . output from the predetermined position in the first machine learning model” (mere instructions to apply the exception using generic computer components does not provide an inventive concept) and
“the first evaluation data includes at least one of output data of the first machine learning model and intermediate data . . . and the second evaluation data includes data corresponding to the first evaluation data, which is any of the output data of the first machine learning model and the intermediate data” (merely generally linking the use of the judicial exception to a particular technological environment or field of use does not provide an inventive concept).
Accordingly, Claim 11 is rejected as being directed to an abstract idea without significantly more.
Regarding Claim 12:
Step 2A Prong 1: See the rejection of Claim 7 above, which Claim 12 depends on. Here, the claim recites additional limitations that are mental processes. Specifically, the claim recites
“calculates, as the second evaluation value, a value representing variation among a plurality of pieces of output data” (mental process – amounts to exercising judgment to determine a value associated with known or observed information, with reference to known or determined variance among the observed information, which may be aided by pen and paper) and
“compares the second evaluation value with a standard value determined in advance, and selects the first candidate data as one of the pieces of learning data” (mental process – amounts to exercising judgment to form an opinion on known or observed information, with reference to other known or observed data and/or standards with specific constraints, which may be aided by pen and paper).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“wherein the second evaluation unit . . . the second selection unit . . . obtained by inputting the first candidate data to a plurality of machine learning models” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea) and
“and the pieces of output data are a plurality of inference results . . . learned with learning parameters different from learning parameters of the first machine learning model” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which do not impose any meaningful limits on practicing the abstract idea).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“wherein the second evaluation unit . . . the second selection unit . . . obtained by inputting the first candidate data to a plurality of machine learning models” (mere instructions to apply the exception using generic computer components does not provide an inventive concept) and
“and the pieces of output data are a plurality of inference results . . . learned with learning parameters different from learning parameters of the first machine learning model” (merely generally linking the use of the judicial exception to a particular technological environment or field of use does not provide an inventive concept).
Accordingly, Claim 12 is rejected as being directed to an abstract idea without significantly more.
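For illustration only, the variation-based calculation recited in Claim 12 and characterized above can be expressed with elementary arithmetic; the following minimal Python sketch is hypothetical (the output values, the standard value, and the function names are assumptions for illustration, not a characterization of applicant's implementation):

    # Hypothetical sketch of the Claim 12 limitation: the second evaluation
    # value is the variation among output data of a plurality of models.
    from statistics import pvariance

    def second_evaluation_value(outputs):
        # "a value representing variation among a plurality of pieces of output data"
        return pvariance(outputs)

    def select_candidate_data(outputs, standard_value):
        # "compares the second evaluation value with a standard value determined
        # in advance, and selects the first candidate data"
        return second_evaluation_value(outputs) > standard_value

    # Example: three hypothetical inference results for the same candidate input.
    print(select_candidate_data([0.2, 0.9, 0.4], standard_value=0.05))  # prints True

The brevity of the calculation is consistent with the characterization above that it may be performed with the aid of pen and paper.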
Regarding Claim 13:
Step 2A Prong 1: See the rejection of Claim 7 above, which Claim 13 depends on. Here, the claim recites additional limitations that are mental processes. Specifically, the claim recites
“calculates, as the second evaluation value, a value based on a degree of difference representing a difference between first output data and each of one or more pieces of second output data” (mental process – amounts to exercising judgment to determine a value associated with known or observed information, with reference to known or determined differences between the observed information, which may be aided by pen and paper) and
“compares the second evaluation value with a standard value determined in advance, and selects the first candidate data as one of the pieces of learning data” (mental process – amounts to exercising judgment to form an opinion on known or observed information, with reference to other known or observed data and/or standards with specific constraints, which may be aided by pen and paper).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“wherein the second evaluation unit . . . the second selection unit . . . obtained by inputting the first candidate data to the first machine learning model . . . obtained by inputting the first candidate data to one or more machine learning models” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea) and
“the first output data is an inference result . . . and the one or more pieces of second output data are respectively one or more inference results . . . learned with learning parameters different from learning parameters of the first machine learning model” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which do not impose any meaningful limits on practicing the abstract idea).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“wherein the second evaluation unit . . . the second selection unit . . . obtained by inputting the first candidate data to the first machine learning model . . . obtained by inputting the first candidate data to one or more machine learning models” (mere instructions to apply the exception using generic computer components does not provide an inventive concept) and
“the first output data is an inference result . . . and the one or more pieces of second output data are respectively one or more inference results . . . learned with learning parameters different from learning parameters of the first machine learning model” (merely generally linking the use of the judicial exception to a particular technological environment or field of use does not provide an inventive concept).
Accordingly, Claim 13 is rejected as being directed to an abstract idea without significantly more.
Regarding Claim 14:
Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 14 depends on. Here, the claim recites additional limitations that are mental processes. Specifically, the claim recites
“makes a probability of selecting, as the candidate data, the input data acquired in a time range determined in advance . . . to be higher than a probability of selecting another time range.” (mental process – amounts to exercising judgment to form an opinion that information within a previously determined time range should be prioritized over information in another time range).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“wherein the second information processing device further comprises . . . indicating that corresponding input data is selected as the learning data . . . each time the learning data is selected . . . after the input data indicated by the employment information” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which do not impose any meaningful limits on practicing the abstract idea);
“a feedback unit configured to” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea); and
“transmit employment information . . . to the first information processing device . . . and the first information processing device receives the employment information” (transmitting and receiving data amounts to extra-solution activity because transmission of data over a network is incidental to the claimed subject matter).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“wherein the second information processing device further comprises . . . indicating that corresponding input data is selected as the learning data . . . each time the learning data is selected . . . after the input data indicated by the employment information” (merely generally linking the use of the judicial exception to a particular technological environment or field of use does not provide an inventive concept);
“a feedback unit configured to” (mere instructions to apply the exception using generic computer components does not provide an inventive concept); and
“transmit employment information . . . to the first information processing device . . . and the first information processing device receives the employment information” (transmitting data over a network is well‐understood, routine, and conventional, see Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362; see also buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014); therefore the limitation, which is recited with a high level of generality, remains insignificant extra-solution activity even upon reconsideration).
Accordingly, Claim 14 is rejected as being directed to an abstract idea without significantly more.
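For illustration only, the probability weighting recited in Claim 14 and characterized above may be sketched as follows; the 0.9/0.1 weights, the timestamp representation, and the function names are assumptions for illustration, not applicant's implementation:

    # Hypothetical sketch of the Claim 14 limitation: input data acquired in a
    # predetermined time range is selected as candidate data with a higher
    # probability than input data acquired in another time range.
    import random

    def selection_probability(timestamp, preferred_start, preferred_end):
        # the 0.9/0.1 weights are assumptions for illustration
        return 0.9 if preferred_start <= timestamp <= preferred_end else 0.1

    def select_as_candidate(timestamp, preferred_start, preferred_end):
        return random.random() < selection_probability(timestamp, preferred_start, preferred_end)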
Regarding Claim 15:
Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 15 depends on. Here, the claim recites additional limitations that are mental processes. Specifically, the claim recites
“perform inference processing on the respective pieces of input data on a time-series basis . . . and output an inference result obtained by performing the inference processing on a time-series basis” (mental process – amounts to exercising judgment to form opinions on known or observed information, on a sequential basis, which may be aided by pen and paper);
“determines whether to select each of the pieces of input data as a candidate based on a corresponding first evaluation value on a time-series basis” (mental process – amounts to exercising judgment to form an opinion on known or observed information, with reference to other known or observed data on a sequential basis, which may be aided by pen and paper); and
“determines whether each of the pieces of candidate data is included in the pieces of learning data based on a corresponding second evaluation value on a time-series basis” (mental process – amounts to exercising judgment to form an opinion on known or observed information, with reference to other known or observed data on a sequential basis, which may be aided by pen and paper).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“wherein the first information processing device further comprises” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which do not impose any meaningful limits on practicing the abstract idea);
“an input data generation unit configured to . . . an inference unit configured to . . . based on the first machine learning model . . . the first selection unit . . . the second selection unit” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea); and
“collect observation results obtained by observing surroundings, and generate pieces of time-series input data” (collection of observation results to generate time-series data is extra-solution activity because it is mere gathering of data that is incidental to the claimed subject matter).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“wherein the first information processing device further comprises” (merely generally linking the use of the judicial exception to a particular technological environment or field of use does not provide an inventive concept);
“an input data generation unit configured to . . . an inference unit configured to . . . based on the first machine learning model . . . the first selection unit . . . the second selection unit” (mere instructions to apply the exception using generic computer components does not provide an inventive concept); and
“collect observation results obtained by observing surroundings, and generate pieces of time-series input data” (data gathering is well-understood, routine, and conventional, see OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93; therefore the limitation, which is recited with a high level of generality, remains insignificant extra-solution activity even upon reconsideration).
Accordingly, Claim 15 is rejected as being directed to an abstract idea without significantly more.
Regarding Claim 16:
Step 1: Claim 16 is a machine claim. Therefore, it is directed to a statutory category of eligible subject matter.
Step 2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Here, elements of the claimed subject matter are mental processes. Specifically, the claim recites
“generates . . . first evaluation data . . . generates . . . second evaluation data” (mental process – apart from a potential step of data gathering, which may be implicitly required, amounts to exercising judgment to form opinions on known or observed information, for use in subsequent evaluations, which may be aided by pen and paper) and
“selects, as learning data . . . input data for which a difference between the first evaluation data and the second evaluation data is larger than a standard value determined in advance among the pieces of input data” (mental process – amounts to exercising judgment to form an opinion on known or observed information, with reference to other known or observed data or standards with specific constraints, which may be aided by pen and paper).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“A machine learning system comprising: a first information processing device including a first arithmetic processing device as hardware; and a second information processing device that is hardware different from the first arithmetic processing device, and includes a second arithmetic processing device . . . with higher arithmetic accuracy than the first arithmetic processing device . . . including at least one of output data of a first machine learning model and intermediate data output from a predetermined position in the first machine learning model . . . including at least one of the output data of the first machine learning model and the intermediate data output from the predetermined position in the first machine learning model . . . for training the first machine learning model” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which do not impose any meaningful limits on practicing the abstract idea);
“a second arithmetic processing device configured to execute information processing . . . , wherein the first information processing device . . . using the first arithmetic processing device . . . obtained by inputting each of a plurality of pieces of input data to the first machine learning model, the second information processing device . . . using the second arithmetic processing device . . . obtained by inputting each of the pieces of input data to the first machine learning model, and the second information processing device” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea); and
“generates . . . first evaluation data . . . generates . . . second evaluation data” (in the event that data gathering were implicitly required, gathering of data is extra-solution activity because it is incidental to the claimed subject matter).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“A machine learning system comprising: a first information processing device including a first arithmetic processing device as hardware; and a second information processing device that is hardware different from the first arithmetic processing device, and includes a second arithmetic processing device . . . with higher arithmetic accuracy than the first arithmetic processing device . . . including at least one of output data of a first machine learning model and intermediate data output from a predetermined position in the first machine learning model . . . including at least one of the output data of the first machine learning model and the intermediate data output from the predetermined position in the first machine learning model . . . for training the first machine learning model” (merely generally linking the use of the judicial exception to a particular technological environment or field of use does not provide an inventive concept) and
“a second arithmetic processing device configured to execute information processing . . . , wherein the first information processing device . . . using the first arithmetic processing device . . . obtained by inputting each of a plurality of pieces of input data to the first machine learning model, the second information processing device . . . using the second arithmetic processing device . . . obtained by inputting each of the pieces of input data to the first machine learning model, and the second information processing device” (mere instructions to apply the exception using generic computer components does not provide an inventive concept); and
“generates . . . first evaluation data . . . generates . . . second evaluation data” (data gathering is well-understood, routine, and conventional, see OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93; therefore the limitation, which, at most, is implicitly recited with a high level of generality, remains insignificant extra-solution activity even upon reconsideration).
For the reasons above, Claim 16 is rejected as being directed to an abstract idea without significantly more.
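For illustration only, the Claim 16 arrangement of a first arithmetic processing device and a second arithmetic processing device with higher arithmetic accuracy may be sketched as follows; the use of float16 and float64, the toy computation, and the standard value are assumptions for illustration, not applicant's implementation:

    # Hypothetical sketch: the same input is evaluated on a lower-accuracy first
    # device and a higher-accuracy second device, and the input is selected as
    # learning data when the two evaluations differ by more than a standard value.
    import numpy as np

    def first_evaluation(x, w):
        # first arithmetic processing device (assumed low-precision float16)
        return float(np.float16(x) * np.float16(w))

    def second_evaluation(x, w):
        # second arithmetic processing device (assumed higher-precision float64)
        return float(np.float64(x) * np.float64(w))

    def select_as_learning_data(x, w, standard_value=1e-3):
        # "a difference between the first evaluation data and the second
        # evaluation data is larger than a standard value determined in advance"
        return abs(second_evaluation(x, w) - first_evaluation(x, w)) > standard_value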
Regarding Claim 17:
Step 1: Claim 17 is a machine claim. Therefore, it is directed to a statutory category of eligible subject matter.
Step 2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Here, the claim recites elements that are substantially the same as the limitations of Claim 1. As a result, and as elaborated above, these limitations are abstract ideas because they are mental processes.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“An edge device . . . that . . . a first evaluation unit configured to . . . a first selection unit configured to . . . and a candidate data transmission unit configured to . . . a candidate data reception unit configured to . . . a second evaluation unit configured to . . . ; and a second selection unit configured to” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea);
“in a machine learning system that comprises the edge device and an information processing device connected to the edge device via a network . . . for causing a first machine learning model to perform learning . . . the edge device comprising . . . when being used for learning of the first machine learning model . . . wherein the information processing device comprises: . . . when being used for learning of the first machine learning model” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which do not impose any meaningful limits on practicing the abstract idea); and
“transmit each of the pieces of candidate data to the information processing device via the network . . . receive each of the pieces of candidate data from the edge device via the network” (transmitting and receiving data amounts to extra-solution activity because transmission of data over a network is incidental to the claimed subject matter).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“An edge device . . . that . . . a first evaluation unit configured to . . . a first selection unit configured to . . . and a candidate data transmission unit configured to . . . a candidate data reception unit configured to . . . a second evaluation unit configured to . . . ; and a second selection unit configured to” (mere instructions to apply the exception using generic computer components does not provide an inventive concept);
“in a machine learning system that comprises the edge device and an information processing device connected to the edge device via a network . . . for causing a first machine learning model to perform learning . . . the edge device comprising . . . when being used for learning of the first machine learning model . . . wherein the information processing device comprises: . . . when being used for learning of the first machine learning model” (merely generally linking the use of the judicial exception to a particular technological environment or field of use does not provide an inventive concept); and
“transmit each of the pieces of candidate data to the information processing device via the network . . . receive each of the pieces of candidate data from the edge device via the network” (transmitting data over a network is well‐understood, routine, and conventional, see Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362; see also buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014); therefore the limitation, which is recited with a high level of generality, remains insignificant extra-solution activity even upon reconsideration).
For the reasons above, Claim 17 is rejected as being directed to an abstract idea without significantly more.
Regarding Claim 18:
Step 1: Claim 18 is a machine claim. Therefore, it is directed to a statutory category of eligible subject matter.
Step 2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Here, the claim recites elements that are substantially the same as the limitations of Claim 1. As a result, and as elaborated above, these limitations are abstract ideas because they are mental processes.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites the additional elements:
“An information processing device . . . that . . . a first evaluation unit configured to . . . a first selection unit configured to . . . and a candidate data transmission unit configured to . . . a candidate data reception unit configured to . . . a second evaluation unit configured to . . . ; and a second selection unit configured to” (amounts to mere instructions to apply the judicial exception on generic and unspecialized computer components, which do not impose any meaningful limits on practicing the abstract idea);
“in a machine learning system that comprises the edge device and an information processing device connected to the edge device via a network . . . for causing a first machine learning model to perform learning . . . the edge device comprising . . . when being used for learning of the first machine learning model . . . the information processing device comprises: . . . when being used for learning of the first machine learning model” (amounts to merely generally linking the use of the judicial exception to a particular technological environment or field of use, which do not impose any meaningful limits on practicing the abstract idea); and
“transmit each of the pieces of candidate data to the information processing device via the network . . . receive each of the pieces of candidate data from the edge device via the network” (transmitting and receiving data amounts to extra-solution activity because transmission of data over a network is incidental to the claimed subject matter).
Step 2B: The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception.
The claim recites the additional elements:
“An information processing device . . . that . . . a first evaluation unit configured to . . . a first selection unit configured to . . . and a candidate data transmission unit configured to . . . a candidate data reception unit configured to . . . a second evaluation unit configured to . . . ; and a second selection unit configured to” (mere instructions to apply the exception using generic computer components does not provide an inventive concept);
“in a machine learning system that comprises the edge device and an information processing device connected to the edge device via a network . . . for causing a first machine learning model to perform learning . . . the edge device comprising . . . when being used for learning of the first machine learning model . . . the information processing device comprises: . . . when being used for learning of the first machine learning model” (merely generally linking the use of the judicial exception to a particular technological environment or field of use does not provide an inventive concept); and
“transmit each of the pieces of candidate data to the information processing device via the network . . . receive each of the pieces of candidate data from the edge device via the network” (transmitting data over a network is well‐understood, routine, and conventional, see Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362; see also buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014); therefore the limitation, which is recited with a high level of generality, remains insignificant extra-solution activity even upon reconsideration).
For the reasons above, Claim 18 is rejected as being directed to an abstract idea without significantly more.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-2, 5, 15, and 17-18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Karpathy et al. (hereinafter Karpathy) (Patent Pub. No. US 2021/0271259 A1).
Regarding Claim 1, Karpathy teaches a machine learning system configured to select a plurality of pieces of learning data for causing a first machine learning model to perform learning from among a plurality of pieces of input data (Abstract, “Systems and methods for obtaining training data . . . includes . . . applying a neural network to the sensor data. A trigger classifier is applied to an intermediate result of the neural network to determine a classifier score for the sensor data. Based at least in part on the classifier score, a determination is made whether to transmit via a computer network at least a portion of the sensor data. Upon a positive determination, the sensor data is transmitted and used to generate training data”, where “Systems” that include “applying a neural network” are machine learning systems, which are configured to select, “determination is made whether to transmit . . . [and use] to generate training data”, a plurality of pieces of learning data from a plurality of pieces of input data, where the selected “portion of the sensor data” are the learning data, selected from the input data of all “sensor data”; see also Para. [0081], “sensor data received is processed to create training data for training a machine learning model”, where the “training data” is used for causing a first “machine learning model” to perform learning through “training”),
the machine learning system comprising: a first information processing device; and a second information processing device connected to the first information processing device via a network (Para. [0007], “FIG. 1B is a block diagram illustrating one embodiment of a system for generating training data” and FIG. 1B, where the machine learning system, as indicated by the inclusion of a “Deep Learning System 700” component, includes a vehicle “102” as a first device and a “Training Data Generating System 120” as a second device, which are connected via the “Network”; see also Para. [0023], “classifiers may be uploaded to a computer system within a vehicle, such that the classifier may be used to recognize specific image features or objects associated with the classifiers. The captured images that are designated by the classifier as including the particular feature or object can then be transmitted to a central server system and used as training data for neural network systems”, where both the “vehicle”, which as discussed above is “102”, and the “server”, which is 120, see Para. [0080], “a computer server (e.g., the training data generation system 120)”, are information processing devices because the “vehicle” processes “image features or objects” and the “server” processes “training data”),
wherein the first information processing device comprises (Para. [0071]-[0072], “FIG. 4 is a flow diagram illustrating an embodiment of a process for identifying potential training data using . . . deep learning analysis of an autonomous driving system . . . [of] a vehicle” and Fig. 4, where, as discussed above, the “vehicle” is the first information processing device, which executes the functionality of first evaluation, “Determine Trigger Classification Score 411”, first selection, “Score Exceeds Threshold And Conditions Met 413”, and candidate data transmission, “Transmit Identified Sensor Data 415”, which each require a unit of associated hardware and software functionality, respectively the first evaluation unit, the first selection unit, and the candidate data transmission unit, see generally Para. [0103], “The various aspects, embodiments, implementations or features of the described embodiments can be used separately or in any combination. Various aspects of the described embodiments can be implemented by software, hardware or a combination of hardware and software . . . The computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion”, where the units of “hardware and software” could be within a computer system or “distributed over network-coupled computer systems”):
a first evaluation unit configured to calculate a first evaluation value representing effectiveness of each of the pieces of input data when being used for learning of the first machine learning model based on a first evaluation standard determined in advance (Fig. 4, where, as discussed above, the associated software and hardware required to execute the “Determine Trigger Classification Score 411” functionality are collectively the first evaluation unit, which is configured to calculate a first evaluation value, “classification score”, see Para. [0077], “At 411, a trigger classifier score is determined”; see also Para. [0071], “The trigger classifier analyzes sensor data at least partially analyzed by the deep learning system to identify whether the sensor data meets particular use cases that warrant retaining the sensor data”, where the “trigger classifier” determines the trigger score as a representation of the effectiveness of each piece of input data in machine learning of the first model, “analyzes sensor data at least partially analyzed by the deep learning system to identify whether the sensor data meets particular use cases”, based on a first evaluation standard, whether “the sensor data” “warrant[s] retaining”; see also Para. [0102], “trigger classifier module 713 determines a classifier score for a data captured by one or more sensors of sensors”, where each “classifier score” is for each of the pieces of data because it is for “a data captured”; see also Fig. 3, where the first evaluation standard is determined in advance because the steps of “Train Trigger Classifier 305” and “Determine Trigger Properties 307” occur in advance of “Deploy Trigger Classifier And Properties 309”);
a first selection unit configured to select whether each of the pieces of input data is included in a plurality of pieces of candidate data by comparing the first evaluation value of each of the pieces of input data with a value determined in advance (Fig. 4, where, as discussed above, the associated software and hardware required to execute the “Score Exceeds Threshold And Conditions Met 413” functionality are collectively the first selection unit, which is configured to compare the first evaluation value of each of the pieces of input data, which as discussed above is the “classifier score”, with a value, “threshold value”, which determines whether the pieces of input data associated with the “classifier score” will be selected for inclusion in “continue[d]” “processing” at “415”, see Para. [0078], “At 413, a determination is made whether the classifier score exceeds a threshold . . . In the event the classifier score exceeds the threshold value, processing continues to 415”; see also Fig. 3, where the value is determined in advance because the steps of “Train Trigger Classifier 305” and “Determine Trigger Properties 307” occur in advance of “Deploy Trigger Classifier And Properties 309”; see also Para. [0060], “the intermediate results of the deep learning analysis at 205 are utilized for identifying training data at 207 and transmitting the identified sensor data at 209”, where the plurality of pieces of “sensor data” “identified” for inclusion in future processing are candidate data for “training data”); and
a candidate data transmission unit configured to transmit each of the pieces of candidate data to the second information processing device via the network (Fig. 4, where, as discussed above, the associated software and hardware required to execute the “Transmit Identified Sensor Data 415” functionality are collectively the candidate data transmission unit, which is configured to “transmit” each of the pieces of candidate data, “identified sensor data”, to the second information processing device, “server (e.g., the training data generation system 120)”, see Para. [0080], “At 415, the identified sensor data is transmitted. For example, the sensor data identified is transmitted to a computer server (e.g., the training data generation system 120) where it may be used to create training data”; see also Para. [0044], “the vehicle may transmit the sensor data 108 over a network”, where the “transmi[ssion]” is “over a network”), and
the second information processing device comprises (Para. [0071], “The sensor data is then transmitted to a computer server and may be used to create training data for a revised machine learning model” and Para. [0081], “FIG. 5 is a flow diagram illustrating an embodiment of a process for creating training data”, where the “server”, which as discussed above is the second information processing device, performs the functionality of “FIG. 5[’s] . . . flow diagram”; see also Fig. 5, where the functionality of candidate data reception, “Receive Sensor Data Meeting Trigger Conditions 501”, second evaluation, “Convert Sensor Data Into Training Data 503”, and second selection, “Prepare Training And Validation Data Sets 505” and “Train Machine Learning Model 507”, each require a unit of associated hardware and software functionality, respectively the candidate data reception unit, the second evaluation unit, and the second selection unit, see generally Para. [0103], “The various aspects, embodiments, implementations or features of the described embodiments can be used separately or in any combination. Various aspects of the described embodiments can be implemented by software, hardware or a combination of hardware and software . . . The computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion”, where the units of “hardware and software” could be within a computer system or “distributed over network-coupled computer systems”):
a candidate data reception unit configured to receive each of the pieces of candidate data from the first information processing device via the network (Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Receive Sensor Data Meeting Trigger Conditions 501” functionality are collectively the candidate data reception unit, which is configured to “receive” pieces of “sensor data”, see Para. [0082], “At 501, sensor data meeting trigger conditions is received”; see also Para. [0081], “the sensor data is received using the process of FIG. 4”, where, as discussed above, “the process of FIG. 4”, includes the transmission of each of the pieces of candidate data from the first information processing device; see also Para. [0093], “The sensor data that triggers retention for transmittal by trigger classifier module 713 is sent via network interface 711 . . . [by] the vehicle”, where the transmission is “via network”);
a second evaluation unit configured to calculate a second evaluation value indicating effectiveness of each of the pieces of candidate data when being used for learning of the first machine learning model based on a second evaluation standard determined in advance, the second evaluation standard being different from the first evaluation standard (Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Convert Sensor Data Into Training Data 503” functionality are collectively the second evaluation unit, which is configured to evaluate whether each of the pieces of candidate data, “the sensor data received at 501 includes data identified as potentially useful training data” is effective when being used for learning of the first machine learning model, “confirm whether the sensor data represents the targeted use case”, by “a highly accurate machine learning model”, see Para. [0083], “At 503, . . . the sensor data received at 501 includes data identified as potentially useful training data . . . In some embodiments, the data is reviewed to determine whether the sensor data accurately represents the target use case . . . For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where the use of “a highly accurate machine learning model” for data evaluation requires the output of an evaluation value, the second evaluation value in this instance, which is output based on the configuration of the “machine learning model”, the second evaluation standard in this instance, which is determined in advance by training the model to be “highly accurate”; see also Para. [0045], “The classifiers 110A-110N may, as an example, use classifier scores which cause transmission of a multitude of sensor data 108 to the outside system 120. For example, a portion of images transmitted to the system 120 may not include tires. In some embodiments, the entity may thus rapidly review and discard certain of the images”, where the second evaluation standard must be different from the first evaluation standard for “a portion of images transmitted to the system” to be “discard[ed]” based on the second evaluation standard, despite the determination of “sensor data” for “transmission” being based on the first evaluation standard, see Para. [0071], “The trigger classifier analyzes sensor data at least partially analyzed by the deep learning system to identify whether the sensor data meets particular use cases that warrant retaining the sensor data”); and
a second selection unit configured to select whether each of the pieces of candidate data is included in the pieces of learning data by comparing the second evaluation value of each of the pieces of candidate data with a value determined in advance (Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Prepare Training And Validation Data Sets 505” and “Train Machine Learning Model 507” functionality are collectively the second selection unit, which is configured to select whether each of the pieces of candidate data, “the training data of 503”, is included in the pieces of learning data, “merged into existing training data sets”, by comparing “the training data of 503” with a value, “a particular use case”, see Para. [0084] – [0085], “At 505 . . . the training data of 503 is merged into existing training data sets. For example, an existing training data set applicable for most use cases is merged with the newly converted training data for improved coverage of a particular use case. The newly converted training data is useful for improving the accuracy of the model in identifying the particular use case. At 507, a machine learning model is trained . . . using the data prepared at 505”; see also Para. [0011], “FIG. 5 is a flow diagram illustrating an embodiment of a process for deploying training data from data corresponding to use cases identified by a trigger classifier”, where the “use cases” must be determined in order to be “identified by a trigger classifier”, which occurs in advance of the “flow diagram” functionality of “FIG. 5”; see also Para. [0083], “For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where using the output from “the machine learning model”, which, as discussed above, is the second evaluation value, to “confirm whether the sensor data represents the targeted use case”, which, as discussed above, is the value, is within the broadest reasonable interpretation of comparing the second evaluation value with the value).
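For illustration only, the two-stage selection mapped above (an edge-side trigger classifier score compared against a threshold, followed by a server-side review against a different standard) may be sketched as follows; the function names, model callables, and thresholds are hypothetical, and the sketch is not Karpathy's code:

    # Hypothetical sketch of the two-stage pipeline mapped to Karpathy Figs. 4-5.
    def edge_select(sensor_data, trigger_classifier, threshold):
        # first evaluation unit / first selection unit (cf. Fig. 4, steps 411/413)
        candidates = []
        for x in sensor_data:
            score = trigger_classifier(x)   # first evaluation value
            if score > threshold:           # value determined in advance
                candidates.append(x)        # candidate data to transmit (step 415)
        return candidates

    def server_select(candidates, accurate_model, standard_value):
        # second evaluation unit / second selection unit (cf. Fig. 5, steps 503/505)
        learning_data = []
        for x in candidates:
            score = accurate_model(x)       # second evaluation value
            if score > standard_value:      # second, different evaluation standard
                learning_data.append(x)     # merged into the training data set
        return learning_data

The two thresholds being distinct reflects the mapping above that the second evaluation standard differs from the first.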
Regarding Claim 2, Karpathy teaches the machine learning system according to claim 1, wherein the first evaluation unit calculates the first evaluation value for first input data among the pieces of input data (Fig. 4, where, as discussed above, the associated software and hardware required to execute the “Determine Trigger Classification Score 411” functionality are collectively the first evaluation unit, which, as also discussed above, is configured to calculate a first evaluation value, “classification score”, see Para. [0077], “At 411, a trigger classifier score is determined”; see also Para. [0071], “The trigger classifier analyzes sensor data at least partially analyzed by the deep learning system to identify whether the sensor data meets particular use cases that warrant retaining the sensor data”, where the “trigger classifier” determines the trigger score as a representation of the effectiveness of each piece of input data in machine learning of the first model, “analyzes sensor data at least partially analyzed by the deep learning system to identify whether the sensor data meets particular use cases”; see also Para. [0102], “trigger classifier module 713 determines a classifier score for a data captured by one or more sensors of sensors”, where each “classifier score” is for each of the pieces of data because it is for “a data captured”)
based on a relation between the first input data and data different from the first input data (Para. [0079], “the trigger classifier used in the process of FIG. 4 is trained using the process of FIG. 3”, where the first input data is “used in the process of FIG. 4” and different data is used for the “train[ing] . . . process of FIG. 3”, and the output, the “identif[ication of] the likelihood an input”, generated for a given piece of first input data, “input from sensor data . . . [during] deployment”, will necessarily be based on a relationship between the first input data and different training data, data for “offline” “train[ing]”, see Para. [0064], “By using positive and negative examples, the trigger classifier is trained to identify the likelihood an input (for example, an input from sensor data) is a match for the particular use case, such as a tunnel exit . . . In some embodiments, the trigger classifier is trained using an offline neural network that matches the neural network deployed on a vehicle”, where the output of the “trigger classifier” is based on a shared relationship of features determined to “match a particular use case, such as a tunnel exit”, where the more representative the relationship between the first input data and the different data, the more useful the model output will be for a particular use case, see generally Para. [0084], “The newly converted training data is useful for improving the accuracy of the model in identifying the particular use case”).
Regarding Claim 5, Karpathy teaches the machine learning system according to claim 2, wherein the first evaluation unit calculates, as the first evaluation value (Fig. 4, where, as discussed above, the associated software and hardware required to execute the “Determine Trigger Classification Score 411” functionality are collectively the first evaluation unit, which, as also discussed above, is configured to calculate a first evaluation value, “classification score”, see Para. [0077], “At 411, a trigger classifier score is determined”),
a value according to a degree of difference representing a difference between the first input data and one or more pieces of data used for training (Para. [0068], “Classifier scores lie between −1.0 and 1.0 to indicate how likely the raw input is a positive or negative example of the targeted use case”, where the value, “Classifier score”, is according to a degree, “how likely”, of difference from “the target use case”, which represents the difference between the first input data and one or more pieces of data used for training the classifier, see Para. [0079], “the trigger classifier used in the process of FIG. 4 is trained using the process of FIG. 3”, where the first input data is “used in the process of FIG. 4” to generate an output, “identif[ication of] the likelihood an input”, where the output will necessarily represent a difference between the first input data and the different data used for the “train[ing] . . . process of FIG. 3”, with a larger difference translating to outputs closer to the middle and less different inputs resulting in more confident scores near the extremes of “−1.0 and 1.0”, see Para. [0064], “By using positive and negative examples, the trigger classifier is trained to identify the likelihood an input (for example, an input from sensor data) is a match for the particular use case, such as a tunnel exit . . . In some embodiments, the trigger classifier is trained using an offline neural network that matches the neural network deployed on a vehicle”, where the output of the “trigger classifier” is based on a difference relationship of features determined to “match a particular use case, such as a tunnel exit”, where the more similar the relationship between the first input data and the different data, the more useful the model output will be for a particular use case, see generally Para. [0084], “The newly converted training data is useful for improving the accuracy of the model in identifying the particular use case” and Para. [0057], “For example a higher classified score indicates a higher likelihood the sensor data is representative of the use case. In some embodiments, the classifier score is a number between negative one and positive one. A score closer to positive one is more likely to be representative of the targeted use case”)
of the first machine learning model (Para. [0026], “an initial data set representative of the targeted use case is created and used to create a trigger classifier”, where the “initial data set” used to train the “trigger classifier” is “representative of the target use case”, which is present in some quantity in the training set of the first machine learning model, and therefore, the “initial data set” is representative of one or more pieces of training data used for the first machine learning model, see Para. [0084], “an existing training data set applicable for most use cases is merged with the newly converted training data for improved coverage of a particular use case” and Para. [0030], “A new machine learning model is trained using the newly curated data set to improve the autonomous vehicle neural network, and is then deployed to vehicles as an update to the autonomous vehicle system. The newly deployed machine learning model has an improved ability to detect the particular use case (for example, tunnel exit) targeted by the trigger classifier”),
and the first selection unit compares the first evaluation value with a standard value determined in advance, and selects the first input data as one of the pieces of candidate data (Fig. 4, where, as discussed above, the associated software and hardware required to execute the “Score Exceeds Threshold And Conditions Met 413” functionality are collectively the first selection unit, which, as also discussed above, is configured to compare the first evaluation value of each of the pieces of input data, which as discussed above is the “classifier score”, with a value, “threshold value”, which is a standard value because it establishes a “threshold” standard and determines whether the pieces of input data, first input data, associated with the “classifier score” will be selected for inclusion in “continue[d]” “processing” at “415”, see Para. [0078], “At 413, a determination is made whether the classifier score exceeds a threshold . . . In the event the classifier score exceeds the threshold value, processing continues to 415”; see also Fig. 3, where the value is determined in advance because the steps of “Train Trigger Classifier 305” and “Determine Trigger Properties 307” occur in advance of “Deploy Trigger Classifier And Properties 309”; see also Para. [0060], “the intermediate results of the deep learning analysis at 205 are utilized for identifying training data at 207 and transmitting the identified sensor data at 209”, where the plurality of pieces of “sensor data” “identified” for inclusion in future processing are candidate data for “training data”).
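For illustration only, a classifier score in the range of −1.0 to 1.0 read as a degree of difference from the targeted use case (cf. Karpathy Para. [0068]) may be sketched as follows; the mapping from difference to score and the standard value are assumptions for illustration, not Karpathy's code:

    # Hypothetical sketch: a score near +1.0 indicates the input closely matches
    # the targeted use case (small difference from the training examples), and a
    # score near -1.0 indicates a large difference.
    def classifier_score(degree_of_difference):
        # degree_of_difference assumed normalized to [0.0, 1.0]
        return max(-1.0, min(1.0, 1.0 - 2.0 * degree_of_difference))

    def select_as_candidate_data(degree_of_difference, standard_value=0.5):
        # "compares the first evaluation value with a standard value determined
        # in advance, and selects the first input data"
        return classifier_score(degree_of_difference) > standard_value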
Regarding Claim 15, Karpathy teaches the machine learning system according to claim 1, wherein the first information processing device further comprises (Para. [0052], “the process of FIG. 2 is implemented on a vehicle” and Fig. 2, where, as discussed above, the “vehicle” is the first information processing device, which executes the functionality of input data generation, “Receive Sensor Data 201” and “Perform Data Pre-processing 203”, and inference, “Initiate Deep Learning Analysis 205”, “Perform Data Post-processing 211”, and “Provide Results To Vehicle Control 213”, which each require a unit of associated hardware and software functionality, respectively the input data generation unit and the inference unit, see generally Para. [0103], “The various aspects, embodiments, implementations or features of the described embodiments can be used separately or in any combination. Various aspects of the described embodiments can be implemented by software, hardware or a combination of hardware and software . . . The computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion”, where the units of “hardware and software” could be within a computer system or “distributed over network-coupled computer systems”):
an input data generation unit configured to collect observation results obtained by observing surroundings, and generate pieces of time-series input data (Fig. 2, where, as discussed above, the associated software and hardware required to execute the “Receive Sensor Data 201” and “Perform Data Pre-processing 203” functionality are collectively the input data generation unit, which is configured to collect observation results, “capture sensor data”, by observing surroundings, “the surrounding environment”, to generate pieces of input data, “the captured image is provided for deep learning analysis”, see Para. [0054], “At 201, sensor data is received. For example, a vehicle equipped with sensors captures sensor data and provides the sensor data to a neural network running on the vehicle . . . to capture data of the surrounding environment . . . the captured image is provided for deep learning analysis”; see also Para. [0021], “the sensor information may be captured in the normal course of operation of the vehicles. The sensor information may be used by the vehicles for certain automated driving features, such as lane navigation” and Para. [0080], “the sensor data transmitted includes metadata. Examples of metadata may include the time of data, a timestamp . . . compression of multiple images of sensor data is performed and a series of sensor data is transmitted together”, where the generated input data, “sensor data transmitted”, includes “timestamp[s]”, can be organized as “a series of sensor data”, and is “captured” over the course of time, “normal course of operation”, which is within the broadest reasonable interpretation of time series); and
an inference unit configured to perform inference processing on the respective pieces of input data on a time-series basis based on the first machine learning model, and output an inference result obtained by performing the inference processing on a time-series basis (Fig. 2, where, as discussed above, the associated software and hardware required to execute the “Initiate Deep Learning Analysis 205”, “Perform Data Post-processing 211”, and “Provide Results To Vehicle Control 213” functionality are collectively the inference unit, which is configured to perform inference processing on the respective pieces of input data, “deep learning analysis of the sensor data”, based on the first machine learning model, “a convolutional neural network (CNN)”, which must be on a time-series basis to “identify . . . moving vehicles”, see Para. [0056], “At 205, deep learning analysis of the sensor data is initiated . . . using a neural network such as a convolutional neural network (CNN). In various embodiments, the machine learning model is trained offline and installed onto the vehicle for performing inference on the sensor data. For example, the model may be trained to identify road lane lines, obstacles, pedestrians, moving vehicles, parked vehicles, drivable space, etc., as appropriate”, and output inference “results”, which must be on a time-series basis to allow for “autonomous driving”, see Para. [0060], “At 213, the results of the deep learning analysis are provided to vehicle control. For example, the results are used by a vehicle control module to control the vehicle for autonomous driving”; see also Para. [0030], “A new machine learning model is trained using the newly curated data set to improve the autonomous vehicle neural network, and is then deployed to vehicles as an update to the autonomous vehicle system”, where the first machine learning model, “the autonomous vehicle neural network”, is the model “deployed to [the] vehicles”),
the first selection unit determines whether to select each of the pieces of input data as a candidate based on a corresponding first evaluation value on a time-series basis (Fig. 4, where, as discussed above, the associated software and hardware required to execute the “Score Exceeds Threshold And Conditions Met 413” functionality are collectively the first selection unit, which, as also discussed above, is configured to compare the first evaluation value of each of the pieces of input data, which as discussed above is the “classifier score”, with a value, “threshold value”, which determines whether the pieces of input data associated with the “classifier score” will be selected for inclusion in “continue[d]” “processing” at “415”, on a time-series basis, “within the last 10 minutes may be retained”, see Para. [0078], “At 413, a determination is made whether the classifier score exceeds a threshold . . . In the event the classifier score exceeds the threshold value, processing continues to 415 . . . only sensor data with the highest score from the same location within the last 10 minutes may be retained as potential data”), and
the second selection unit determines whether each of the pieces of candidate data is included in the pieces of learning data based on a corresponding second evaluation value on a time-series basis (Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Prepare Training And Validation Data Sets 505” and “Train Machine Learning Model 507” functionality are collectively the second selection unit, which, as also discussed above, is configured to select whether each of the pieces of candidate data, “the training data of 503”, is included in the pieces of learning data, “merged into existing training data sets”, by comparing “the training data of 503” with a value, “a particular use case”, see Para. [0084] – [0085], “At 505 . . . the training data of 503 is merged into existing training data sets. For example, an existing training data set applicable for most use cases is merged with the newly converted training data for improved coverage of a particular use case. The newly converted training data is useful for improving the accuracy of the model in identifying the particular use case. At 507, a machine learning model is trained . . . using the data prepared at 505”; see also Para. [0011], “FIG. 5 is a flow diagram illustrating an embodiment of a process for deploying training data from data corresponding to use cases identified by a trigger classifier”, where the “use cases” must be determined in order to be “identified by a trigger classifier”, which occurs in advance of the “flow diagram” functionality of “FIG.5”; see also Para. [0083], “For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where using the output from “the machine learning model”, which, as discussed above, is the second evaluation value, to “confirm whether the sensor data represents the targeted use case”, which, as discussed above, is the value, is within the broadest reasonable interpretation of comparing the second evaluation value with the value; see also Para. [0021], “the sensor information may be captured in the normal course of operation of the vehicles. The sensor information may be used by the vehicles for certain automated driving features, such as lane navigation”, Para. [0080], “the sensor data transmitted includes metadata. Examples of metadata may include the time of data, a timestamp . . . compression of multiple images of sensor data is performed and a series of sensor data is transmitted together”, and Para. [0078], “only sensor data with the highest score from the same location within the last 10 minutes may be retained as potential data”, where determinations of the second selection unit are on a time-series basis because, as discussed above, it selects time-series data and receives the time-series data on a time-series basis).
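For purposes of illustration only, the score-thresholding and time-windowed retention cited above from Karpathy, Paras. [0077]-[0078], may be sketched as follows. Karpathy discloses this logic in prose rather than code, so all names, the threshold value, and the data structure below (SensorSample, THRESHOLD, RETENTION_WINDOW) are assumptions introduced solely for illustration:

    # Illustrative sketch only; not Karpathy's disclosed implementation.
    from dataclasses import dataclass

    THRESHOLD = 0.5           # assumed value; Karpathy discloses only "a threshold"
    RETENTION_WINDOW = 600.0  # "within the last 10 minutes", expressed in seconds

    @dataclass
    class SensorSample:
        timestamp: float  # acquisition-time metadata, Para. [0080]
        location: str     # "from the same location", Para. [0078]
        score: float      # trigger classifier score, Para. [0077]

    def select_candidates(samples):
        """Retain, per location, only the highest-scoring sample whose score
        exceeds the threshold within the retention window (Para. [0078])."""
        best = {}
        for s in sorted(samples, key=lambda x: x.timestamp):
            if s.score <= THRESHOLD:
                continue  # 413: the score must exceed the threshold to continue to 415
            prior = best.get(s.location)
            if (prior is None
                    or s.timestamp - prior.timestamp > RETENTION_WINDOW
                    or s.score > prior.score):
                best[s.location] = s
        return list(best.values())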
Regarding Claim 17, Karpathy teaches an edge device in a machine learning system that comprises the edge device and an information processing device connected to the edge device via a network (Para. [0007], “FIG. 1B is a block diagram illustrating one embodiment of a system for generating training data” and FIG. 1B, where the machine learning system, as indicated by the inclusion of a “Deep Learning System 700” component, includes a vehicle “102” as a first device and a “Training Data Generating System 120” as a second device, which are connected via the “Network”; see also Para. [0023], “classifiers may be uploaded to a computer system within a vehicle, such that the classifier may be used to recognize specific image features or objects associated with the classifiers. The captured images that are designated by the classifier as including the particular feature or object can then be transmitted to a central server system and used as training data for neural network systems”, where the “vehicle”, which as discussed above is “102”, is an edge device because it is at the edge of the “system”, distant from the “central server system”, and where the “server”, which is 120, see Para. [0080], “a computer server (e.g., the training data generation system 120)”, is an information processing device because it processes “training data”),
and selects a plurality of pieces of learning data for causing a first machine learning model to perform learning from among a plurality of pieces of input data (Abstract, “Systems and methods for obtaining training data . . . includes . . . applying a neural network to the sensor data. A trigger classifier is applied to an intermediate result of the neural network to determine a classifier score for the sensor data. Based at least in part on the classifier score, a determination is made whether to transmit via a computer network at least a portion of the sensor data. Upon a positive determination, the sensor data is transmitted and used to generate training data”, where the “Systems” selects, “determination is made whether to transmit . . . [and use] to generate training data”, a plurality of pieces of learning data from a plurality of pieces of input data, where the selected “portion of the sensor data” are the learning data, selected from the input data of all “sensor data”; see also Para. [0081], “sensor data received is processed to create training data for training a machine learning model”, where the “training data” is used for causing a first “machine learning model” to perform learning through “training”; see also Para. [0087], “the vehicle may determine classifier scores”, where the “vehicle” edge device, in combination with other elements of the system, performs the selecting because, as a nonexclusive example, it “determine[s] classifier scores”),
the edge device comprising: a first evaluation unit . . . a first selection unit . . . and a candidate data transmission unit . . . (Para. [0071]-[0072], “FIG. 4 is a flow diagram illustrating an embodiment of a process for identifying potential training data using . . . deep learning analysis of an autonomous driving system . . . [of] a vehicle” and Fig. 4, where, as discussed above, the “vehicle” is the edge device, which executes the functionality of first evaluation, “Determine Trigger Classification Score 411”, first selection, “Score Exceeds Threshold And Conditions Met 413”, and candidate data transmission, “Transmit Identified Sensor Data 415”, which each require a unit of associated hardware and software functionality, respectively the first evaluation unit, the first selection unit, and the candidate data transmission unit, see generally Para. [0103], “The various aspects, embodiments, implementations or features of the described embodiments can be used separately or in any combination. Various aspects of the described embodiments can be implemented by software, hardware or a combination of hardware and software . . . The computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion”, where the units of “hardware and software” could be within a computer system or “distributed over network-coupled computer systems”)
. . . from the edge device via the network . . . (Para. [0093], “The sensor data that triggers retention for transmittal by trigger classifier module 713 is sent via network interface 711 . . . [by] the vehicle”).
The remaining limitations are substantially the same as limitations of Claim 1, therefore it is rejected under the same rationale.
Regarding Claim 18, Karpathy teaches an information processing device in a machine learning system that comprises an edge device and the information processing device connected to the edge device via a network (Para. [0007], “FIG. 1B is a block diagram illustrating one embodiment of a system for generating training data” and FIG. 1B, where the machine learning system, as indicated by the inclusion of a “Deep Learning System 700” component, includes a vehicle “102” as a first device and a “Training Data Generating System 120” as a second device, which are connected via the “Network”; see also Para. [0023], “classifiers may be uploaded to a computer system within a vehicle, such that the classifier may be used to recognize specific image features or objects associated with the classifiers. The captured images that are designated by the classifier as including the particular feature or object can then be transmitted to a central server system and used as training data for neural network systems”, where the “vehicle”, which as discussed above is “102”, is an edge device because it is at the edge of the “system”, distant from the “central server system”, and where the “server”, which is 120, see Para. [0080], “a computer server (e.g., the training data generation system 120)”, is an information processing device because it processes “training data”), and
selects a plurality of pieces of learning data for causing a first machine learning model to perform learning from among a plurality of pieces of input data . . . (Abstract, “Systems and methods for obtaining training data . . . includes . . . applying a neural network to the sensor data. A trigger classifier is applied to an intermediate result of the neural network to determine a classifier score for the sensor data. Based at least in part on the classifier score, a determination is made whether to transmit via a computer network at least a portion of the sensor data. Upon a positive determination, the sensor data is transmitted and used to generate training data”, where the “Systems” selects, “determination is made whether to transmit . . . [and use] to generate training data”, a plurality of pieces of learning data from a plurality of pieces of input data, where the selected “portion of the sensor data” are the learning data, selected from the input data of all “sensor data”; see also Para. [0081], “sensor data received is processed to create training data for training a machine learning model”, where the “training data” is used for causing a first “machine learning model” to perform learning through “training”; see also Para. [0080], “the sensor data identified is transmitted to a computer server (e.g., the training data generation system 120) where it may be used to create training data”, where the “server” information processing device, in combination with other elements of the system, performs the selecting because, as a nonexclusive example, the “sensor data” “identified” by the classifier scores are used to “create training data”).
The remaining limitations are substantially the same as limitations of Claim 17, therefore it is rejected under the same rationale.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Karpathy in view of Teinemaa et al. (hereinafter Teinemaa) (“Temporal stability in predictive process monitoring”).
Regarding Claim 3, Karpathy teaches the machine learning system according to claim 2, wherein the first evaluation unit calculates, as the first evaluation value, a value (Fig. 4, where, as discussed above, the associated software and hardware required to execute the “Determine Trigger Classification Score 411” functionality are collectively the first evaluation unit, which, as also discussed above, is configured to calculate a first evaluation value, “classification score”, see Para. [0077], “At 411, a trigger classifier score is determined”) . . .
[according to a time-based analysis, involving] acquisition time of the first input data and acquisition time of second input data that is selected as one of the pieces of candidate data . . . [wherein, the] first input data among one or more pieces of the second input data different from the first input data (Para. [0021], “the sensor information may be captured in the normal course of operation of the vehicles. The sensor information may be used by the vehicles for certain automated driving features, such as lane navigation”, where the “sensor information” “captured in the normal course of operation” requires multiple stages of input data generation, including first input data and second input data, which must be different from the first to detect changing “features” for “automated driving”; see also Para. [0078], “At 413, a determination is made whether the classifier score exceeds a threshold . . . In the event the classifier score exceeds the threshold value, processing continues to 415 . . . only sensor data with the highest score from the same location within the last 10 minutes may be retained as potential data”, where different first and second input data are also analyzed for multiple vehicles “from the same location”, and the selection of that data as pieces of candidate data, “In the event the classifier score exceeds the threshold value, processing continues to 415”, is in part according to a time-based analysis of the acquisition times of the different input data, “within the last 10 minutes”),
and the first selection unit compares the first evaluation value with a standard value determined in advance, and selects the first input data as one of the pieces of candidate data (Fig. 4, where, as discussed above, the associated software and hardware required to execute the “Score Exceeds Threshold And Conditions Met 413” functionality are collectively the first selection unit, which, as also discussed above, is configured to compare the first evaluation value of each of the pieces of input data, which as discussed above is the “classifier score”, with a value, “threshold value”, which is a standard value because it establishes a “threshold” standard and determines whether the pieces of input data, including first input data, associated with the “classifier score” will be selected for inclusion in “continue[d]” “processing” at “415”, see Para. [0078], “At 413, a determination is made whether the classifier score exceeds a threshold . . . In the event the classifier score exceeds the threshold value, processing continues to 415”; see also Fig. 3, where the value is determined in advance because the steps of “Train Trigger Classifier 305” and “Determine Trigger Properties 307” occur in advance of “Deploy Trigger Classifier And Properties 309”; see also Para. [0060], “the intermediate results of the deep learning analysis at 205 are utilized for identifying training data at 207 and transmitting the identified sensor data at 209”, where the plurality of pieces of “sensor data” “identified” for inclusion in future processing are candidate data for “training data”).
Karpathy does not explicitly disclose . . . according to a time difference between . . . immediately before the . . . .
However, Teinemaa teaches . . . [calculating a value] according to a time difference between . . . [the time of first data and the time of second data, where the second data is] immediately before the [first data] . . . . (Pg. 1308, Abstract, “Predictive process monitoring is concerned with the analysis of events produced during the execution of a business process . . . this paper defines a notion of temporal stability for binary classification tasks in predictive process monitoring”, where “analysis of events” requires data for a plurality of “events”, including first “event” data and second “event” data, where a value is calculated for the first event according to a time difference between it and the event immediately before it, “time since last event”, see Pg. 1318, “we apply some preprocessing on the raw datasets. In general, we use all the available case and event attributes without doing any feature extraction before encoding. Still, a few extra features are added to each event based on the timestamps, namely, hour, weekday, month, time since case start, and time since last event”).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the calculation of a first evaluation value for use in selection of candidate data pieces as training data, where the value is calculated according to a time-based analysis of the acquisition times of first input data and different input data of Karpathy with the calculation of a value according to a time difference between the time of first data and the time of second data, where the second data is immediately before the first data of Teinemaa in order to incorporate differences in acquisition time into the data selection process (Teinemaa, Pg. 1327, Para. 3, “Temporal stability characterizes how much successive prediction scores obtained for the same case (sequence of events) differ from each other”), which allows for data processing to remove volatility (Teinemaa, Pg. 1327, Para. 3, “For a temporally stable classifier, such successive prediction scores are similar to each other, resulting in a smooth time series, while in case of an unstable classifier, the resulting time series is volatile”; Teinemaa, Pg. 1308, Para. 1, “volatile predictions can mislead users of the system”) and allows for context-dependent decisions on whether input data should be incorporated into the training data (Karpathy, Para. [0078], “only sensor data with the highest score from the same location within the last 10 minutes may be retained as potential data”).
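For purposes of illustration only, the “time since last event” feature quoted above from Teinemaa (p. 1318) may be sketched as follows, computing for each piece of data the time difference from the piece immediately before it; the event representation and field names below are assumptions introduced solely for illustration:

    # Illustrative sketch only; not Teinemaa's disclosed implementation.
    def add_time_since_last_event(events):
        """Annotate each event with the time difference (in seconds) from the
        event immediately before it; the first event receives 0.0."""
        annotated = []
        prev_ts = None
        for e in sorted(events, key=lambda e: e["timestamp"]):
            delta = 0.0 if prev_ts is None else e["timestamp"] - prev_ts
            annotated.append({**e, "time_since_last_event": delta})
            prev_ts = e["timestamp"]
        return annotated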
Claims 4 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Karpathy in view of Kanno et al. (hereinafter Kanno) (Patent Pub. No. US 2021/0004723 A1).
Regarding Claim 4, Karpathy teaches the machine learning system according to claim 2, wherein the first evaluation unit calculates, as the first evaluation value, a value . . . (Fig. 4, where, as discussed above, the associated software and hardware required to execute the “Determine Trigger Classification Score 411” functionality are collectively the first evaluation unit, which, as also discussed above, is configured to calculate a first evaluation value, “classification score”, see Para. [0077], “At 411, a trigger classifier score is determined”),
and the first selection unit compares the first evaluation value with a standard value determined in advance, and selects the first input data as one of the pieces of candidate data (Fig. 4, where, as discussed above, the associated software and hardware required to execute the “Score Exceeds Threshold And Conditions Met 413” functionality are collectively the first selection unit, which is configured to compare the first evaluation value of each of the pieces of input data, which as discussed above is the “classifier score”, with a value, “threshold value”, which is a standard value because it establishes a “threshold” standard, and which determines whether the pieces of input data associated with the “classifier score” will be selected for inclusion in “continue[d]” “processing” at “415”, see Para. [0078], “At 413, a determination is made whether the classifier score exceeds a threshold . . . In the event the classifier score exceeds the threshold value, processing continues to 415”; see also Fig. 3, where the value is determined in advance because the steps of “Train Trigger Classifier 305” and “Determine Trigger Properties 307” occur in advance of “Deploy Trigger Classifier And Properties 309”; see also Para. [0060], “the intermediate results of the deep learning analysis at 205 are utilized for identifying training data at 207 and transmitting the identified sensor data at 209”, where the plurality of pieces of “sensor data” “identified” for inclusion in future processing are candidate data for “training data”).
Karpathy does not explicitly disclose . . . according to a degree of difference representing a difference between the first input data and k pieces of the candidate data immediately before the first input data among the pieces of candidate data, k being an integral number equal to or larger than 1 . . . .
However, Kanno teaches . . . [calculating an evaluation value] according to a degree of difference representing a difference between the first input data and k pieces of the candidate data (Para. [0036], “the selecting unit 4 determines a category to which each of training data belongs by applying each of the training data to the first model . . . the selecting unit 4 calculates a difference between a category determined for training data and correct answer data corresponding to the training data, and further sorts the training data based on the difference. Here, it is assumed that the selecting unit 4 sorts the training data in ascending order based on a value indicating the difference”, where the calculated evaluation value, the rank used to “sort . . . in ascending order”, is according to a degree of difference, “indicating the difference”, which represents a difference, the variability in “difference between a category determined for training data and correct answer data” between the candidate data, the “each of training data”, where “sort[ing] . . . in ascending order” requires a plurality of data, in a quantity representable by an integral number greater than 1, where one input is the first input data and the remaining are the k pieces)
immediately before the first input data among the pieces of candidate data, k being an integral number equal to or larger than 1 . . . (Para. [0036], “the selecting unit 4 determines a category to which each of training data belongs by applying each of the training data to the first model . . . the selecting unit 4 calculates a difference between a category determined for training data and correct answer data corresponding to the training data, and further sorts the training data based on the difference. Here, it is assumed that the selecting unit 4 sorts the training data in ascending order based on a value indicating the difference”, where, as discussed above, “sort[ing] . . . in ascending order” “each of training data” requires a plurality of data, in a quantity representable by an integral number greater than 1, where one input is the first input data and the remaining are the k pieces, and where, in this instance, the first input data is the data “in [the] ascending order”, which is immediately after the k pieces, which are immediately before it).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the first evaluation unit, which calculates a first evaluation value for use by the first selection unit in data selection of Karpathy with the calculating of an evaluation value according to a degree of difference representing a difference between the first input data and at least one other candidate piece, which are arranged immediately before the first input data and are of a quantity representable by an integral number greater than or equal to 1 of Kanno in order to utilize an evaluation value that clearly organizes data pieces based on whether each should be considered for further evaluation (Kanno, Para. [0038], “It can be said that training data having a small difference from correct answer data is appropriate as training data used for learning the first model. In addition, it can be said that training data having a large difference from correct answer data is inappropriate as training data used for learning the first model”, where converting model inferences to a usable value allows for clear metrics that can easily be compared against the data immediately before and after it, see Kanno, Para. [0036], “the selecting unit 4 calculates a difference between a category determined for training data and correct answer data corresponding to the training data, and further sorts the training data based on the difference”), which allows for improved model training and increased inference accuracy of trained models (Kanno, Para. [0003], “When training data includes training data not having a characteristic affecting determination of a category, accuracy of determining a learned model is reduced, or learning of a model is adversely affected. Therefore, it is necessary to remove training data not having a characteristic affecting determination of a category from collected training data”) by allowing transmission to the server of only those candidate data pieces which conform to a desired level of training data appropriateness (Kanno, Para. [0038], “Therefore, in the training data sorted in ascending order based on a value indicating the difference, higher training data can be said to be appropriate training data, and lower training data can be said to be inappropriate training data”; Karpathy, Para. [0080], “At 415, the identified sensor data is transmitted. For example, the sensor data identified is transmitted to a computer server (e.g., the training data generation system 120) where it may be used to create training data”).
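For purposes of illustration only, the Kanno Para. [0036] functionality relied on above, sorting training data in ascending order of difference from correct answer data so that the k pieces immediately before a given piece are those with smaller differences, may be sketched as follows; the numeric difference metric and the model interface are assumptions introduced solely for illustration:

    # Illustrative sketch only; not Kanno's disclosed implementation.
    def sort_training_data(samples, model):
        """Sort training data ascending by the difference between the model's
        determination and the correct answer (Kanno, Para. [0036]); a smaller
        difference indicates more appropriate training data (Para. [0038])."""
        ranked = sorted(
            samples,
            key=lambda s: abs(model(s["input"]) - s["correct_answer"]))
        # For any index i >= k, ranked[i - k:i] are the k pieces of candidate
        # data immediately before ranked[i] in the sorted order.
        return ranked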
Regarding Claim 7, Karpathy teaches the machine learning system according to claim 1, wherein the second evaluation unit calculates the second evaluation value for first candidate data among the pieces of candidate data . . . by inputting the first candidate data to a machine learning model (Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Convert Sensor Data Into Training Data 503” functionality are collectively the second evaluation unit, which, as also discussed above, is configured to evaluate whether each of the pieces of candidate data, “the sensor data received at 501 includes data identified as potentially useful training data” is effective when being used for learning of the first machine learning model, “confirm whether the sensor data represents the targeted use case”, by “a highly accurate machine learning model”, see Para. [0083], “At 503, . . . the sensor data received at 501 includes data identified as potentially useful training data . . . In some embodiments, the data is reviewed to determine whether the sensor data accurately represents the target use case . . . For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where the use of “a highly accurate machine learning model” for data evaluation requires a calculated evaluation value, the second evaluation value in this instance, which is output based on the configuration of the “machine learning model”, the second evaluation standard in this instance, which receives the first candidate data as input, “data identified as potentially useful training data”).
Karpathy does not explicitly disclose . . . by analyzing an inference result or an intermediate result obtained . . . .
However, Kanno teaches . . . [calculating an evaluation value for data] by analyzing an inference result or an intermediate result obtained [by inputting the data into a machine learning model] . . . (Para. [0036], “the selecting unit 4 determines a category to which each of training data belongs by applying each of the training data to the first model . . . the selecting unit 4 calculates a difference between a category determined for training data and correct answer data corresponding to the training data . . . a value indicating the difference”, where the “value indicating the difference” between the inference “category” of the data and the “correct answer” for the data, is the evaluation value, which is calculated by analyzing the “category” inference of the “model”).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the second evaluation unit, which calculates the second evaluation value for the first candidate data among the pieces of candidate data by inputting the data into a machine learning model of Karpathy with the calculation of an evaluation value for a set of data by analyzing an inference result obtained by inputting the data into a machine learning model of Kanno in order to convert model outputs into evaluation metrics that clearly determine whether candidate data should be used as training data (Kanno, Para. [0038], “It can be said that training data having a small difference from correct answer data is appropriate as training data used for learning the first model. In addition, it can be said that training data having a large difference from correct answer data is inappropriate as training data used for learning the first model”, where converting model inferences to a usable value allows for clear metrics that can easily be compared, see Kanno, Para. [0036], “the selecting unit 4 calculates a difference between a category determined for training data and correct answer data corresponding to the training data, and further sorts the training data based on the difference”), which allows for improved model training and increased inference accuracy of trained models (Kanno, Para. [0003], “When training data includes training data not having a characteristic affecting determination of a category, accuracy of determining a learned model is reduced, or learning of a model is adversely affected. Therefore, it is necessary to remove training data not having a characteristic affecting determination of a category from collected training data”).
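For purposes of illustration only, the combination relied on above, calculating an evaluation value for a single candidate by analyzing the inference result obtained from a machine learning model, may be sketched as follows; the reviewer model interface and the binary difference used here are assumptions introduced solely for illustration, not the disclosure of either reference:

    # Illustrative sketch only; the reviewer model and metric are assumptions.
    def second_evaluation_value(candidate, reviewer_model):
        """Input the candidate to a machine learning model and analyze the
        inference result against the correct answer, in the manner of Kanno
        Para. [0036]; a smaller value suggests more appropriate training data."""
        predicted = reviewer_model.predict(candidate["input"])
        return 0.0 if predicted == candidate["correct_answer"] else 1.0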
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Karpathy in view of PSU Online Contributors (hereinafter PSU Online) (“Lesson 34: Creating Randomness”).
Regarding Claim 6, Karpathy teaches the machine learning system according to claim 2, wherein the first evaluation value represents a binary value of effectiveness or ineffectiveness (Para. [0079], “Classifier scores between +0.5 and 1.0 are used to identify positive examples and classifier scores between −1.0 and −0.5 are used to identify negative examples. In some embodiments, only positive examples are retained for transmittal”, where, as discussed above, the “Classifier scores” are the first evaluation value, which is a number that represents a binary value of “positive” or “negative”, where only “positive” are considered effective as training data, “relevant” data “retained for transmittal” to “improve performance”, see Para. [0026], “A neural network training technique for identifying additional training data relevant to particular use cases . . . to improve its performance . . . the existing machine learning model is utilized with a trigger classifier to identify relevant training data. The relevant training data is then transmitted”),
the first evaluation unit calculates the first evaluation value based on . . . the effectiveness or the ineffectiveness . . . (Fig. 4, where, as discussed above, the associated software and hardware required to execute the “Determine Trigger Classification Score 411” functionality are collectively the first evaluation unit, which, as also discussed above, is configured to calculate a first evaluation value, “classification score”, see Para. [0077], “At 411, a trigger classifier score is determined”; see also Para. [0026], “A neural network training technique for identifying additional training data relevant to particular use cases . . . to improve its performance . . . the existing machine learning model is utilized with a trigger classifier to identify relevant training data. The relevant training data is then transmitted”, where “data relevant to particular use cases” that “improve[s] . . . performance” is assigned a higher score based on effectiveness for “transmission” for incorporation into “training data”, otherwise it is considered ineffective),
and the first selection unit selects the first input data as one of the pieces of candidate data in a case in which the first evaluation value indicates selection (Fig. 4, where, as discussed above, the associated software and hardware required to execute the “Score Exceeds Threshold And Conditions Met 413” functionality are collectively the first selection unit, which, as also discussed above, is configured to compare the first evaluation value of each of the pieces of input data, which as discussed above is the “classifier score”, with a value, “threshold value”, to determine whether the pieces of input data, including first input data, associated with the “classifier score” will be selected for inclusion in “continue[d]” “processing” at “415”, which occurs in the case where the first evaluation value indicates selection, “the classifier score exceeds the threshold value”, see Para. [0078], “At 413, a determination is made whether the classifier score exceeds a threshold . . . In the event the classifier score exceeds the threshold value, processing continues to 415”).
Karpathy does not explicitly disclose . . . a random number such that . . . occurs with a probability set in advance . . . .
However, PSU Online teaches [generation of a binary value based on] . . . a random number (Pg. 5, Para. 2, “If the random number generated is less than or equal to 0.30, then the observation is selected for inclusion in the sample”, where a binary value of “selected for inclusion” or not “selected for inclusion” is generated based on a “random number”)
such that . . . [the binary value] occurs with a probability set in advance . . . (Pg. 4-5, Para. 1-2, “the variable random contains only values that are smaller than 0.30, as should be expected in light of the WHERE= option attached to the DATA statement . . . Since the mailing data set has 50 observations, about 30% of the observations should be selected”, where the probability of “0.30” is set in advance for the chance that input data, “the mailing data”, will be assigned the binary value of “selected”).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the generation of an evaluation value for use in selection of training data candidates, wherein the evaluation value represents a binary value of effectiveness or ineffectiveness of Karpathy with the generation of a binary value based on a random number such that the binary value occurs with a probability set in advance of PSU Online in order to introduce a degree of randomness into the selection of training data candidates, which will allow for a more uniform distribution of samples (PSU Online, Pg. 4, Para. 1, “Launch and run the SAS program. Then, review the resulting output to see the random sample that SAS selected from the mailing data set. You should note a couple of things. First, the people that appear in the random sample appear to be fairly uniformly distributed across the 50 possible Num values”) and can mitigate processing constraints associated with large datasets (PSU Online, Pg. 2, Para. 2, “Randomly selecting records from a large data set may be helpful if your data set is so large as to prevent slow processing”).
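For purposes of illustration only, the PSU Online sampling relied on above, generating a binary selected/not-selected value so that selection occurs with a probability set in advance, may be sketched as follows; the 0.30 probability mirrors the cited lesson, and the function name is an assumption introduced solely for illustration:

    # Illustrative sketch only; the cited lesson uses SAS, not Python.
    import random

    SELECTION_PROBABILITY = 0.30  # set in advance, per the cited lesson

    def binary_selection_value():
        """Return True ("selected for inclusion") when the random number
        generated is less than or equal to the preset probability."""
        return random.random() <= SELECTION_PROBABILITY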
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Karpathy in view of Kanno and Luo et al. (hereinafter Luo) (“Confident Learning: Estimating Uncertainty in Dataset Labels”).
Regarding Claim 8, Karpathy in view of Kanno teach the machine learning system according to claim 7, wherein the first machine learning model classifies input data into any of a plurality of classes (Karpathy, Para. [0056], “At 205, deep learning analysis of the sensor data is initiated . . . In various embodiments, the machine learning model is trained offline and installed onto the vehicle for performing inference on the sensor data. For example, the model may be trained to identify road lane lines, obstacles, pedestrians, moving vehicles, parked vehicles, drivable space, etc., as appropriate”, where the first machine learning model, “the machine learning model” used in “205, deep learning analysis”, classifies input data, “identif[ies]” “sensor data”, into a plurality of classes, “road lane lines, obstacles, pedestrians, moving vehicles, parked vehicles, drivable space, etc.,” for use in “autonomous driving”, see Karpathy, Para. [0060], “the results of the deep learning analysis are provided to vehicle control. For example, the results are used by a vehicle control module to control the vehicle for autonomous driving” and Karpathy, Para. [0057], “possible use cases may involve identifying: a curved road, an on ramp, an off ramp, the entrance to a tunnel, the exit of a tunnel, an obstacle in the road, a fork in the road, road lane lines or markers, drivable space, road signage, contents of signs (e.g., words, numbers, symbols, etc.), and/or other features as appropriate for autonomous driving”),
the second evaluation unit acquires a classification probability of belonging to each of the classes (Karpathy, Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Convert Sensor Data Into Training Data 503” functionality are collectively the second evaluation unit, which is configured to receive “sensor data”, “the sensor data received at 501 includes data identified as potentially useful training data”, which includes the “classifier score”, see Karpathy, Para. [0029], “additional metadata is collected and retained along with the sensor data such as . . . the classifier score”, which is the probability of belonging to a class, see Karpathy, Para. [0057], “a higher classified score indicates a higher likelihood the sensor data is representative of the use case”; see also Karpathy, Para. [0023], “A multitude of these classifiers may be uploaded to a computer system within a vehicle, such that the classifier may be used to recognize specific image features or objects associated with the classifiers. The captured images that are designated by the classifier as including the particular feature or object can then be transmitted to a central server system and used as training data for neural network systems”, where a “multitude” of classifier scores identifying each of the classes, “captured images that are designated by the classifier as including the particular feature or object”, can be provided to the “server”)
obtained by inputting the first candidate data to the first machine learning model (Karpathy, Para. [0071], “The trigger classifier analyzes sensor data at least partially analyzed by the deep learning system to identify whether the sensor data meets particular use cases that warrant retaining the sensor data”, where the first candidate data is part of the data input to the first machine learning model, “deep learning system”, and is analyzed by the “trigger classifier”), and
calculates, as the second evaluation value, a value according to a degree of difference representing a difference between a classification probability of a class into which the first candidate data is classified as belonging among the classes and a classification probability of [comparison] . . . (Karpathy, Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Convert Sensor Data Into Training Data 503” functionality are collectively the second evaluation unit, which is configured to evaluate whether each of the pieces of candidate data, “the sensor data received at 501 includes data identified as potentially useful training data” is effective when being used for learning of the first machine learning model, “confirm whether the sensor data represents the targeted use case”, by “a highly accurate machine learning model”, see Karpathy, Para. [0083], “At 503, . . . the sensor data received at 501 includes data identified as potentially useful training data . . . In some embodiments, the data is reviewed to determine whether the sensor data accurately represents the target use case . . . For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where the use of “a highly accurate machine learning model” for data evaluation requires the output of an evaluation value, the second evaluation value in this instance, which, in view of Kanno, is according to a degree of difference, “a difference”, representing a difference between a classification probability of a class into which the first candidate data is classified among the classes of classification, “the selecting unit 4 determines a category to which each of training data belongs”, and the classification probability of comparison, “correct answer data”, see Kanno, Para. [0036], “the selecting unit 4 determines a category to which each of training data belongs by applying each of the training data to the first model . . . the selecting unit 4 calculates a difference between a category determined for training data and correct answer data corresponding to the training data . . . a value indicating the difference”), and
the second selection unit compares the second evaluation value with a standard value determined in advance, and selects the first candidate data as one of the pieces of learning data (Karpathy, Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Prepare Training And Validation Data Sets 505” and “Train Machine Learning Model 507” functionality are collectively the second selection unit, which is configured, as discussed above, to select whether each of the pieces of candidate data, “the training data of 503”, is included in the pieces of learning data, “merged into existing training data sets”, by comparing “the training data of 503” with a value, “a particular use case”, which is within the broadest reasonable interpretation of a standard value because it sets the standard for “a particular use case”, see Karpathy, Para. [0084] – [0085], “At 505 . . . the training data of 503 is merged into existing training data sets. For example, an existing training data set applicable for most use cases is merged with the newly converted training data for improved coverage of a particular use case. The newly converted training data is useful for improving the accuracy of the model in identifying the particular use case. At 507, a machine learning model is trained . . . using the data prepared at 505”; see also Karpathy, Para. [0011], “FIG. 5 is a flow diagram illustrating an embodiment of a process for deploying training data from data corresponding to use cases identified by a trigger classifier”, where the “use cases” must be determined in order to be “identified by a trigger classifier”, which occurs in advance of the “flow diagram” functionality of “FIG.5”; see also Karpathy, Para. [0083], “For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where using the output from “the machine learning model”, which, as discussed above, is the second evaluation value, to “confirm whether the sensor data represents the targeted use case”, which, as discussed above, is the value, is within the broadest reasonable interpretation of comparing the second evaluation value with the value).
The reasons for obviousness, in regard to the combination of Karpathy in view of Kanno, were discussed in regard to the rejection of Claim 7 above and remain applicable here.
Karpathy in view of Kanno do not explicitly disclose . . . each of one or a plurality of the classes into which the first candidate data is classified as not belonging . . . .
However, Luo teaches . . . [calculating a value according to a difference between a classification probability the first candidate data is classified into and] each of one or a plurality of the classes into which the first candidate data is classified as not belonging . . . (Pg. 594-595, Para. 4-1, “Suppose P(a) is the largest and P(b) is the second largest probability for example x, where a, b are class labels. “BT” tries to improve the P(a)−P(b). Intuitively, improving the value of P(a)−P(b) amounts to breaking the tie between P(a) and P(b), thus improving the classification confidence”, where “example x” is the first candidate data, which is classified into “class” “a” and not “class” “b”, which is one class that “x” is not classified as belonging to, and a value is calculated according to the difference between the probabilities of the classes, “P(a)−P(b)”).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the calculation of a second evaluation value for use by a second selection unit, wherein the second evaluation value is calculated according to a difference between a classification probability of an assigned class for a first candidate data and another classification probability of comparative interest of Karpathy in view of Kanno with the calculation of a value according to a difference between a classification probability that the first candidate data is classified into and each of one or a plurality of the classes into which the first candidate data is classified as not belonging of Luo in order to identify data associated with lower certainty model outputs (Luo, Pg. 595, Para. 1, “Intuitively, improving the value of P(a)−P(b) amounts to breaking the tie between P(a) and P(b), thus improving the classification confidence”, where a smaller “value of P(a)−P(b)” shows that the model’s “classification confidence” “between P(a) and P(b)” is lower), which allows for removal of training data candidates with reduced utility for particular use cases (Karpathy, Para. [0083], “At 503, . . . the sensor data received at 501 includes data identified as potentially useful training data . . . In some embodiments, the data is reviewed to determine whether the sensor data accurately represents the target use case . . . For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case” and Karpathy, Para. [0045], “Thus, the outside system 120 may receive sensor data 108 from a multitude of vehicles . . . For example, a portion of images transmitted to the system 120 may not include tires. In some embodiments, the entity may thus rapidly review and discard certain of the images. The remaining images may be aggregated into large training data sets and used to update the machine learning models executing on the vehicle”, where lower confidence scores, such as those incorrectly classified, “a portion of images transmitted to the system 120 may not include tires”, can be discarded based on their lower difference score).
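For purposes of illustration only, the Luo “P(a)−P(b)” margin relied on above, the difference between the largest and second-largest classification probabilities, may be computed as in the following sketch; the input representation is an assumption introduced solely for illustration:

    # Illustrative sketch only; not Luo's disclosed implementation.
    def classification_margin(probabilities):
        """Return P(a) - P(b), the difference between the largest and
        second-largest class probabilities; a small margin indicates low
        classification confidence (Luo, pp. 594-595)."""
        a, b = sorted(probabilities, reverse=True)[:2]
        return a - b

    # Example: probabilities [0.40, 0.35, 0.25] yield a margin of 0.05,
    # flagging a low-confidence classification.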
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Karpathy in view of Kanno and Mahajan et al. (hereinafter Mahajan) (“Towards Statistical Guarantees in Controlling Quality Tradeoffs for Approximate Acceleration”).
Regarding Claim 9, Karpathy in view of Kanno teach the machine learning system according to claim 7, wherein the second evaluation unit calculates, as the second evaluation value, a value according to a degree of difference . . . (Karpathy, Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Convert Sensor Data Into Training Data 503” functionality are collectively the second evaluation unit, which is configured to evaluate whether each of the pieces of candidate data, “the sensor data received at 501 includes data identified as potentially useful training data” is effective when being used for learning of the first machine learning model, “confirm whether the sensor data represents the targeted use case”, by “a highly accurate machine learning model”, see Karpathy, Para. [0083], “At 503, . . . the sensor data received at 501 includes data identified as potentially useful training data . . . In some embodiments, the data is reviewed to determine whether the sensor data accurately represents the target use case . . . For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where the use of “a highly accurate machine learning model” for data evaluation requires the output of an evaluation value, the second evaluation value in this instance, which, in view of Kanno, is calculated as a value, “a value indicating the difference”, representing a degree of difference between data, “correct answer data” and “a category to which each of training data belongs”, see Kanno, Para. [0036], “the selecting unit 4 determines a category to which each of training data belongs by applying each of the training data to the first model . . . the selecting unit 4 calculates a difference between a category determined for training data and correct answer data corresponding to the training data . . . a value indicating the difference”)
[associated with] first evaluation data calculated by a first arithmetic processing device as hardware having first arithmetic accuracy (Karpathy, Para. [0071]-[0072], “FIG. 4 is a flow diagram illustrating an embodiment of a process for identifying potential training data using . . . deep learning analysis of an autonomous driving system . . . [of] a vehicle” and Karpathy, Fig. 4, where, as discussed above, the “vehicle” is the first information processing device, which executes the functionality of Fig. 4, and where the first evaluation data is all data produced during this “process for identifying potential training data using . . . deep learning analysis of an autonomous driving system . . . [of] a vehicle” because it is data used to evaluate whether to “identify” data as “potential training data”; see also Karpathy, Para. [0039], “a deep learning system 700 of one or more processors, which is included in the vehicle 102”, where the first information processing device includes the first arithmetic processing device, “one or more processors”, used to perform arithmetic with first arithmetic accuracy, “complex . . . calculations”, see Karpathy, Para. [0106], “application-specific hardware or one or more physical computing devices (utilizing appropriate specialized executable instructions) may be necessary to perform the functionality, for example, due to the volume or complexity of the calculations involved or to provide results substantially in real-time”)
and second evaluation data calculated by a second arithmetic processing device as hardware . . . (Karpathy, Para. [0071], “The sensor data is then transmitted to a computer server and may be used to create training data for a revised machine learning model” and Karpathy, Para. [0081], “FIG. 5 is a flow diagram illustrating an embodiment of a process for creating training data”, where the “server”, which as discussed above is the second information processing device, performs the functionality of “FIG. 5[’s] . . . flow diagram”, and all the data generated during the “process” is the second evaluation data because it is used to evaluate “training data” and the “machine learning model” for “deploy[ment]”; see also Karpathy, Fig. 5 and Karpathy, Para. [0106], “application-specific hardware or one or more physical computing devices (utilizing appropriate specialized executable instructions) may be necessary to perform the functionality, for example, due to the volume or complexity of the calculations involved or to provide results substantially in real-time”, where the second arithmetic processing device, “hardware or one or more physical computing devices”, is “necessary to perform the functionality . . . [due to the] complexity of the calculations involved” in Fig. 5),
the second selection unit compares the second evaluation value with a standard value determined in advance, and selects the first candidate data as one of the pieces of learning data (Karpathy, Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Prepare Training And Validation Data Sets 505” and “Train Machine Learning Model 507” functionality are collectively the second selection unit, which is configured, as discussed above, to select whether each of the pieces of candidate data, “the training data of 503”, is included in the pieces of learning data, “merged into existing training data sets”, by comparing “the training data of 503” with a value, “a particular use case”, which is within the broadest reasonable interpretation of a standard value because it sets the standard for “a particular use case”, see Karpathy, Para. [0084] – [0085], “At 505 . . . the training data of 503 is merged into existing training data sets. For example, an existing training data set applicable for most use cases is merged with the newly converted training data for improved coverage of a particular use case. The newly converted training data is useful for improving the accuracy of the model in identifying the particular use case. At 507, a machine learning model is trained . . . using the data prepared at 505”; see also Karpathy, Para. [0011], “FIG. 5 is a flow diagram illustrating an embodiment of a process for deploying training data from data corresponding to use cases identified by a trigger classifier”, where the “use cases” must be determined in order to be “identified by a trigger classifier”, which occurs in advance of the “flow diagram” functionality of “FIG.5”; see also Karpathy, Para. [0083], “For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where using the output from “the machine learning model”, which, as discussed above, is the second evaluation value, to “confirm whether the sensor data represents the targeted use case”, which, as discussed above, is the value, is within the broadest reasonable interpretation of comparing the second evaluation value with the value),
the first evaluation data (Karpathy, Para. [0071]-[0072], “FIG. 4 is a flow diagram illustrating an embodiment of a process for identifying potential training data using . . . deep learning analysis of an autonomous driving system . . . [of] a vehicle” and Fig. 4, where the first evaluation data is all data produced during “Fig. 4[’s]” “process for identifying potential training data using . . . deep learning analysis of an autonomous driving system . . . [of] a vehicle” because it is data used to evaluate whether to “identify” data as “potential training data”)
includes at least one of output data of the first machine learning model and intermediate data output from a predetermined position in the first machine learning model (Karpathy, Para. [0071]-[0072], “FIG. 4 is a flow diagram illustrating an embodiment of a process for identifying potential training data using . . . deep learning analysis of an autonomous driving system . . . [of] a vehicle” and Karpathy, Fig. 4, where, as discussed above, data generated as part of the “process for identifying potential training data” is first evaluation data, including both “output” “of the final layer” or “output” of an “intermediate” layer, see Karpathy, Para. [0072] – [0076], “At 401, a deep learning analysis is initiated . . . with sensor data captured by sensors attached to a vehicle . . . At 403, inference using one layer of the deep learning analysis is completed. For example, a neural network . . . At 405, a determination is made whether the output of the layer analysis performed at 403 is a result of the final layer of the neural network. In the event the output is not the result of the final layer, for example, the output is an intermediate result . . . At 409, a determination is made whether the layer of the neural network and trigger conditions are appropriate for applying the trigger classifier”; see also Karpathy, Para. [0076], “For example, some use cases may be more efficient and produce high quality results using the intermediate result of a latter layer of the neural network. Other use cases may require an earlier intermediate result in order to identify useful examples of sensor data that meet the use case. In some cases, the trigger properties used to specify the conditions to apply the trigger classifier can be nested using multiple conditional checks and/or logical operators such as AND and OR operators”, where “conditional checks and/or logical operators” can be used to predetermine which “result” of “layer of the neural network” to evaluate in advance)
obtained by inputting the first candidate data to the first machine learning model (Karpathy, Para. [0071], “The trigger classifier analyzes sensor data at least partially analyzed by the deep learning system to identify whether the sensor data meets particular use cases that warrant retaining the sensor data”, where the first candidate data is part of the data input to the first machine learning model, “deep learning system”, which is analyzed by the “trigger classifier”), and
the second evaluation data (Karpathy, Para. [0071], “The sensor data is then transmitted to a computer server and may be used to create training data for a revised machine learning model” and Karpathy, Para. [0081], “FIG. 5 is a flow diagram illustrating an embodiment of a process for creating training data”, where the “server” performs the functionality of “FIG. 5[’s] . . . flow diagram”, and all the data generated during the “process” is the second evaluation data because it is used to evaluate “training data” and the “machine learning model” for “deploy[ment]”; see also Karpathy, Fig. 5 and Karpathy, Para. [0106], “application-specific hardware or one or more physical computing devices (utilizing appropriate specialized executable instructions) may be necessary to perform the functionality, for example, due to the volume or complexity of the calculations involved or to provide results substantially in real-time”, where, as discussed above, the second arithmetic processing device, “hardware or one or more physical computing devices”, is “necessary to perform the functionality . . . [due to the] complexity of the calculations involved” in Fig. 5)
includes data corresponding to the first evaluation data, which is any of the output data of the first machine learning model and the intermediate data output from the predetermined position in the first machine learning model (Karpathy, Para. [0078] – [0080], “At 413, a determination is made whether the classifier score exceeds a threshold . . . In the event the classifier score exceeds the threshold value, processing continues to 415 . . . At 415, the identified sensor data is transmitted. For example, the sensor data identified is transmitted to a computer server (e.g., the training data generation system 120) where it may be used to create training data”, where the second evaluation data includes “the identified sensor data” that is “transmitted to a computer server”, to perform the functionality of “FIG. 5[’s] . . . flow diagram”, see Karpathy, Para. [0081], “FIG. 5 is a flow diagram illustrating an embodiment of a process for creating training data”; see also Karpathy, Para. [0077], “At 411, a trigger classifier score is determined. For example, a trigger classifier score is determined by applying the trigger classifier to the intermediate results of the neural network”, where the “identified sensor data”, which is “transmitted” to be part of the second evaluation data, corresponds to the output data/intermediate output data, “for example . . . intermediate results”, of the first machine learning model, “the neural network”, because the “identified sensor data” is determined by a “classifier score” from “applying the trigger classifier to the intermediate results”; see also Karpathy, Para. [0076], “For example, some use cases may be more efficient and produce high quality results using the intermediate result of a latter layer of the neural network. Other use cases may require an earlier intermediate result in order to identify useful examples of sensor data that meet the use case. In some cases, the trigger properties used to specify the conditions to apply the trigger classifier can be nested using multiple conditional checks and/or logical operators such as AND and OR operators”, where “conditional checks and/or logical operators” can be used to predetermine which “result” of “layer of the neural network” to evaluate in advance)
obtained by inputting the first candidate data to the first machine learning model (Karpathy, Para. [0071], “The trigger classifier analyzes sensor data at least partially analyzed by the deep learning system to identify whether the sensor data meets particular use cases that warrant retaining the sensor data”, where the first candidate data is part of the data input to the first machine learning model, “deep learning system”, which is analyzed by the “trigger classifier”, so the first and second evaluation data are directly and indirectly, respectively, obtained from inputting the first candidate data to the first machine learning model).
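Solely to illustrate the trigger-classifier functionality mapped above (Karpathy, Paras. [0072]-[0080]), a minimal Python sketch of layer-by-layer inference follows, in which a classifier is applied to the output of a predetermined intermediate layer and the resulting score is compared with a threshold; all names (run_with_trigger, trigger_classifier, THRESHOLD) are the examiner's hypothetical illustrations and do not appear in the reference.

    THRESHOLD = 0.8  # hypothetical standard value determined in advance

    def run_with_trigger(layers, trigger_classifier, sensor_data, trigger_layer_idx):
        # Layer-by-layer inference (cf. Karpathy step 403); the trigger
        # classifier is applied at a predetermined position (cf. step 409).
        x = sensor_data
        for i, layer in enumerate(layers):
            x = layer(x)  # intermediate result after layer i
            if i == trigger_layer_idx:
                score = trigger_classifier(x)   # classifier score (cf. step 411)
                if score > THRESHOLD:           # threshold comparison (cf. step 413)
                    return x, score, True       # retain and transmit the data (cf. step 415)
        return x, None, False  # final-layer output; data not retained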
The reasons for obviousness, in regard to the combination of Karpathy in view of Kanno, were discussed in regard to the rejection of Claim 7 above and remain applicable here.
Karpathy in view of Kanno do not explicitly disclose . . . representing a difference between . . . having second arithmetic accuracy higher than the first arithmetic accuracy . . . .
However, Mahajan teaches . . . [performing an action when a value,] . . . representing a difference between . . . [data processed by a first arithmetic processing device and a second arithmetic processing device, is larger than another value] . . . (Pg. 2, Col. 2, Para. 2, “The difference between the imprecise accelerator output and the precise output is the accelerator error”; Pg. 4, Col. 1, Para. 5, “For each invocation, use the original precise result if the accelerator error exceeds the threshold”; see also Pg. 2, Col. 1, Para. 2, “The outputs from the accelerator are an approximation of the outputs that the core would have calculated”, where the “imprecise accelerator” is the first arithmetic processing device that produces the “imprecise” “output” and the “the core”, is the second arithmetic processing device that produces the “precise output”)
[wherein the second arithmetic processing device is configured as] having second arithmetic accuracy higher than the first arithmetic accuracy [of the first arithmetic processing device] . . . (Pg. 2, Col. 1, Para. 2, “Approximate accelerators trade small losses in output quality for significant performance and efficiency gains . . . When a processor core is augmented with an approximate accelerator, the core delegates the computation of frequently executed safe-to-approximate functions to the accelerator . . . The outputs from the accelerator are an approximation of the outputs that the core would have calculated”, where the “processor core” is an arithmetic processing device, which has higher arithmetic accuracy than the first arithmetic processing device, the “approximate accelerator”, with outputs that are “an approximation of the outputs that the core would have calculated”).
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the machine learning system comprising a first information processing device with a first arithmetic processing device and a second information processing device with a second arithmetic processing device, wherein the second information processing device selects training data based on evaluation values from data generated by each of the information processing devices of Karpathy in view of Kanno with a second arithmetic processing device configured to execute information processing with higher arithmetic accuracy than a first arithmetic processing device, wherein an action is taken when a difference between a value from the data processed by the first arithmetic processing device and the data processed by the second arithmetic processing device is larger than a value of Mahajan in order to balance data processing tradeoffs of using either of the two information processing devices (compare Karpathy, Para. [0023], “The captured images that are designated by the classifier . . . can then be transmitted to a central server system and used as training data for neural network systems. Since the classifiers may leverage existing machine learning models already being executed by the vehicles in typical operation, the classifiers may be efficient in terms of processing requirements”, where “efficient” allocation of processing resources is used to handle “processing requirements” associated with actions taken on the “vehicles”, with Karpathy, Para. [0083], “at [the server] . . . a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where the server must have the increased capacity to use “a highly accurate machine learning model . . . to confirm” the efficient calculations of the vehicles) by offloading model training to the second information processing device for training data instances associated with a significantly increased error rate on the first information processing device, while achieving energy and performance improvements by performing lower accuracy computations, where appropriate for the associated training data, using the first information processing device (Mahajan, Pg. 2, Col. 1, Para. 2-4, “When a processor core is augmented with an approximate accelerator, the core delegates the computation of . . . safe-to-approximate functions to the accelerator . . . Instead of executing the function, the core sends the function’s inputs to the accelerator and retrieves its outputs . . . MITHRA . . . provide[s] flexibility in controlling final quality loss and . . . maximize[s] the performance and energy benefits at any level of quality. MITHRA aims to only filter out those approximate accelerator invocations that cause relatively large quality degradation in the final output”).
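As a non-limiting illustration of the Mahajan functionality relied upon above, the following Python sketch compares the imprecise accelerator output with the precise output and falls back to the precise result when the accelerator error exceeds a threshold; the names precise_fn, approx_fn, and error_threshold are hypothetical.

    def select_result(precise_fn, approx_fn, inputs, error_threshold):
        # First arithmetic processing device: lower-accuracy approximation.
        approx = approx_fn(inputs)
        # Second arithmetic processing device: higher-accuracy computation.
        precise = precise_fn(inputs)
        # Value representing the difference, i.e., the accelerator error.
        error = abs(precise - approx)
        # Use the original precise result if the accelerator error exceeds
        # the threshold (cf. Mahajan, Pg. 4, Col. 1, Para. 5).
        return precise if error > error_threshold else approx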
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Karpathy in view of Kanno and Yi et al. (hereinafter Yi) (“Transform consistency for learning with noisy labels”).
Regarding Claim 10, Karpathy in view of Kanno teach the machine learning system according to claim 7, wherein the second evaluation unit calculates, as the second evaluation value, a value according to a degree of difference representing a difference between . . . [data, to evaluate] . . . the first candidate data . . . [for selection of pieces of] . . . the first candidate data (Karpathy, Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Convert Sensor Data Into Training Data 503” functionality are collectively the second evaluation unit, which is configured to evaluate whether each of the pieces of candidate data, “the sensor data received at 501 includes data identified as potentially useful training data”, is effective when being used for learning of the first machine learning model, “confirm whether the sensor data represents the targeted use case”, by “a highly accurate machine learning model”, see Karpathy, Para. [0083], “At 503, . . . the sensor data received at 501 includes data identified as potentially useful training data . . . In some embodiments, the data is reviewed to determine whether the sensor data accurately represents the target use case . . . For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where the use of “a highly accurate machine learning model” for data evaluation requires the output of an evaluation value, the second evaluation value in this instance, which, in view of Kanno, is calculated as “a value indicating the difference” representing a degree of difference between data, “correct answer data”, and “a category to which each of training data belongs”, see Kanno, Para. [0036], “the selecting unit 4 determines a category to which each of training data belongs by applying each of the training data to the first model . . . the selecting unit 4 calculates a difference between a category determined for training data and correct answer data corresponding to the training data . . . a value indicating the difference”),
the second selection unit compares the second evaluation value with a standard value determined in advance, and selects the first candidate data as one of the pieces of learning data (Karpathy, Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Prepare Training And Validation Data Sets 505” and “Train Machine Learning Model 507” functionality are collectively the second selection unit, which is configured, as discussed above, to select whether each of the pieces of candidate data, “the training data of 503”, is included in the pieces of learning data, “merged into existing training data sets”, by comparing “the training data of 503” with a value, “a particular use case”, which is within the broadest reasonable interpretation of a standard value because it sets the standard for “a particular use case”, see Karpathy, Para. [0084] – [0085], “At 505 . . . the training data of 503 is merged into existing training data sets. For example, an existing training data set applicable for most use cases is merged with the newly converted training data for improved coverage of a particular use case. The newly converted training data is useful for improving the accuracy of the model in identifying the particular use case. At 507, a machine learning model is trained . . . using the data prepared at 505”; see also Karpathy, Para. [0011], “FIG. 5 is a flow diagram illustrating an embodiment of a process for deploying training data from data corresponding to use cases identified by a trigger classifier”, where the “use cases” must be determined in order to be “identified by a trigger classifier”, which occurs in advance of the “flow diagram” functionality of “FIG. 5”; see also Karpathy, Para. [0083], “For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where using the output from “the machine learning model”, which, as discussed above, is the second evaluation value, to “confirm whether the sensor data represents the targeted use case”, which, as discussed above, is the value, is within the broadest reasonable interpretation of comparing the second evaluation value with the value),
the first evaluation data (Karpathy, Para. [0071]-[0072], “FIG. 4 is a flow diagram illustrating an embodiment of a process for identifying potential training data using . . . deep learning analysis of an autonomous driving system . . . [of] a vehicle” and Fig. 4, where the first evaluation data is all data produced during “Fig. 4[‘s]” “process for identifying potential training data using . . . deep learning analysis of an autonomous driving system . . . [of] a vehicle” because it is data used to evaluate whether to “identify” data as “potential training data”)
includes at least one of output data of the first machine learning model and intermediate data output from a predetermined position in the first machine learning model (Karpathy, Para. [0071]-[0072], “FIG. 4 is a flow diagram illustrating an embodiment of a process for identifying potential training data using . . . deep learning analysis of an autonomous driving system . . . [of] a vehicle” and Karpathy, Fig. 4, where, as discussed above, data generated as part of the “process for identifying potential training data” is first evaluation data, including either “output” “of the final layer” or “output” of an “intermediate” layer, see Karpathy, Para. [0072] – [0076], “At 401, a deep learning analysis is initiated . . . with sensor data captured by sensors attached to a vehicle . . . At 403, inference using one layer of the deep learning analysis is completed. For example, a neural network . . . At 405, a determination is made whether the output of the layer analysis performed at 403 is a result of the final layer of the neural network. In the event the output is not the result of the final layer, for example, the output is an intermediate result . . . At 409, a determination is made whether the layer of the neural network and trigger conditions are appropriate for applying the trigger classifier”; see also Karpathy, Para. [0076], “For example, some use cases may be more efficient and produce high quality results using the intermediate result of a latter layer of the neural network. Other use cases may require an earlier intermediate result in order to identify useful examples of sensor data that meet the use case. In some cases, the trigger properties used to specify the conditions to apply the trigger classifier can be nested using multiple conditional checks and/or logical operators such as AND and OR operators”, where “conditional checks and/or logical operators” can be used to predetermine which “result” of “layer of the neural network” to evaluate in advance), and
the second evaluation data (Karpathy, Para. [0071], “The sensor data is then transmitted to a computer server and may be used to create training data for a revised machine learning model” and Karpathy, Para. [0081], “FIG. 5 is a flow diagram illustrating an embodiment of a process for creating training data”, where the “server” performs the functionality of “FIG. 5[’s] . . . flow diagram”, and all the data generated during the “process” is the second evaluation data because it is used to evaluate “training data” and the “machine learning model” for “deploy[ment]”; see also Karpathy, Fig. 5 and Karpathy, Para. [0106], “application-specific hardware or one or more physical computing devices (utilizing appropriate specialized executable instructions) may be necessary to perform the functionality, for example, due to the volume or complexity of the calculations involved or to provide results substantially in real-time”, where, as discussed above, the second arithmetic processing device, “hardware or one or more physical computing devices”, is “necessary to perform the functionality . . . [due to the] complexity of the calculations involved” in Fig. 5)
includes data corresponding to the first evaluation data, which is any of the output data of the first machine learning model and the intermediate data output from the predetermined position in the first machine learning model (Karpathy, Para. [0078] – [0080], “At 413, a determination is made whether the classifier score exceeds a threshold . . . In the event the classifier score exceeds the threshold value, processing continues to 415 . . . At 415, the identified sensor data is transmitted. For example, the sensor data identified is transmitted to a computer server (e.g., the training data generation system 120) where it may be used to create training data”, where the second evaluation data includes “the identified sensor data” that is “transmitted to a computer server”, to perform the functionality of “FIG. 5[’s] . . . flow diagram”, see Karpathy, Para. [0081], “FIG. 5 is a flow diagram illustrating an embodiment of a process for creating training data”; see also Karpathy, Para. [0077], “At 411, a trigger classifier score is determined. For example, a trigger classifier score is determined by applying the trigger classifier to the intermediate results of the neural network”, where the “identified sensor data”, which is “transmitted” to be part of the second evaluation data, corresponds to the output data/intermediate output data, “for example . . . intermediate results”, of the first machine learning model, “the neural network”, because the “identified sensor data” is determined by a “classifier score” from “applying the trigger classifier to the intermediate results”; see also Karpathy, Para. [0076], “For example, some use cases may be more efficient and produce high quality results using the intermediate result of a latter layer of the neural network. Other use cases may require an earlier intermediate result in order to identify useful examples of sensor data that meet the use case. In some cases, the trigger properties used to specify the conditions to apply the trigger classifier can be nested using multiple conditional checks and/or logical operators such as AND and OR operators”, where “conditional checks and/or logical operators” can be used to predetermine which “result” of “layer of the neural network” to evaluate in advance).
The reasons for obviousness, in regard to the combination of Karpathy in view of Kanno, were discussed in regard to the rejection of Claim 7 above and remain applicable here.
Karpathy in view of Kanno do not explicitly disclose . . . first evaluation data obtained by inputting . . . to the first machine learning model and second evaluation data obtained by inputting data that is obtained by partially changing . . . to the first machine learning model . . . .
However, Yi teaches . . . [calculating a value representing a degree of difference between a] first evaluation data obtained by inputting [input data] . . . to the first machine learning model and second evaluation data obtained by inputting data that is obtained by partially changing [the input data] . . . to the first machine learning model . . . (Pg. 1, Col. 2, Para. 2, “we feed the original and transformed (horizontally flip) images into one single network, and observe the Kullback-Leibler (KL) Divergence”, where the first evaluation data is the first output of the “one single network”, which is obtained by inputting, “feed[ing]”, “the original” data into the “one single network”, and where the second evaluation data is the second output of the “one single network”, which is obtained by inputting, “feed[ing]”, partially changing the input data, “transformed (horizontally flip) images” into the “one single network”, and where the “the Kullback-Leibler (KL) Divergence” is a value representing a degree of difference between the data).
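Solely to illustrate the Yi teaching quoted above, a minimal PyTorch-style sketch of the transform-consistency measure follows; the name transform_consistency_score is hypothetical, and the horizontal flip stands in for the quoted transform.

    import torch
    import torch.nn.functional as F

    def transform_consistency_score(model, images):
        # First evaluation data: predictions on the original images.
        log_p = F.log_softmax(model(images), dim=-1)
        # Partially changed input data: horizontally flipped images.
        flipped = torch.flip(images, dims=[-1])
        # Second evaluation data: predictions on the transformed images.
        q = F.softmax(model(flipped), dim=-1)
        # KL divergence as the value representing the degree of difference;
        # a small value suggests a clean sample (cf. Yi, Pg. 1, Col. 2, Para. 2).
        return F.kl_div(log_p, q, reduction="batchmean")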
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the second evaluation unit to calculate a second evaluation value for use by the second selection unit in selecting data from the first candidate data for use as training data, wherein the second evaluation value is a value according to a degree of difference representing a difference between data of Karpathy in view of Kanno with the calculating of a value representing a degree of difference between first evaluation data obtained by inputting input data to a first machine learning model and second evaluation data obtained by inputting data that is obtained by partially changing the input data to the first machine learning model of Yi in order to utilize a simple and effective method to identify potential training data as clean or noisy (Yi, Pg. 1, Col. 2, Para. 2, “we propose a simple and effective method to distinguish clean samples only using one single network. We find that the prediction consistency under different image transforms (such as scaling, rotation, flipping) in one network is beneficial to select clean samples”), which prevents poor performance of trained models, due to overfitting on noisy labels (Yi, Pg. 1, Para. 2, “Unfortunately, the obtained annotations inevitably contain noisy labels. As DNNs have the capability to memorize all training samples, they will eventually overfit the noisy labels, leading to poor generalization performance”).
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Karpathy in view of Kanno and Liang et al. (hereinafter Liang) (“R-Drop: Regularized Dropout for Neural Networks”).
Regarding Claim 11, Karpathy in view of Kanno teach the machine learning system according to claim 7, wherein the second evaluation unit calculates, as the second evaluation value, a value according to a degree of difference representing a difference between [data] and second evaluation data obtained by inputting the first candidate data to a second machine learning model (Karpathy, Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Convert Sensor Data Into Training Data 503” functionality are collectively the second evaluation unit, which is configured to evaluate whether each of the pieces of candidate data, “the sensor data received at 501 includes data identified as potentially useful training data”, is effective when being used for learning of the first machine learning model, “confirm whether the sensor data represents the targeted use case”, by “a highly accurate machine learning model”, see Karpathy, Para. [0083], “At 503, . . . the sensor data received at 501 includes data identified as potentially useful training data . . . In some embodiments, the data is reviewed to determine whether the sensor data accurately represents the target use case . . . For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where the use of “a highly accurate machine learning model” for data evaluation requires the output of an evaluation value, the second evaluation value in this instance, which, in view of Kanno, is calculated as “a value indicating the difference” representing a degree of difference between data, “correct answer data”, and a plurality of pieces of second output data, “a category to which each of training data belongs”, see Kanno, Para. [0036], “the selecting unit 4 determines a category to which each of training data belongs by applying each of the training data to the first model . . . the selecting unit 4 calculates a difference between a category determined for training data and correct answer data corresponding to the training data . . . a value indicating the difference”),
the second selection unit compares the second evaluation value with a standard value determined in advance, and selects the first candidate data as one of the pieces of learning data (Karpathy, Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Prepare Training And Validation Data Sets 505” and “Train Machine Learning Model 507” functionality are collectively the second selection unit, which is configured, as discussed above, to select whether each of the pieces of candidate data, “the training data of 503”, is included in the pieces of learning data, “merged into existing training data sets”, by comparing “the training data of 503” with a value, “a particular use case”, which is within the broadest reasonable interpretation of a standard value because it sets the standard for “a particular use case”, see Karpathy, Para. [0084] – [0085], “At 505 . . . the training data of 503 is merged into existing training data sets. For example, an existing training data set applicable for most use cases is merged with the newly converted training data for improved coverage of a particular use case. The newly converted training data is useful for improving the accuracy of the model in identifying the particular use case. At 507, a machine learning model is trained . . . using the data prepared at 505”; see also Karpathy, Para. [0011], “FIG. 5 is a flow diagram illustrating an embodiment of a process for deploying training data from data corresponding to use cases identified by a trigger classifier”, where the “use cases” must be determined in order to be “identified by a trigger classifier”, which occurs in advance of the “flow diagram” functionality of “FIG. 5”; see also Karpathy, Para. [0083], “For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where using the output from “the machine learning model”, which, as discussed above, is the second evaluation value, to “confirm whether the sensor data represents the targeted use case”, which, as discussed above, is the value, is within the broadest reasonable interpretation of comparing the second evaluation value with the value),
the first evaluation data (Karpathy, Para. [0071]-[0072], “FIG. 4 is a flow diagram illustrating an embodiment of a process for identifying potential training data using . . . deep learning analysis of an autonomous driving system . . . [of] a vehicle” and Fig. 4, where the first evaluation data is all data produced during “Fig. 4[‘s]” “process for identifying potential training data using . . . deep learning analysis of an autonomous driving system . . . [of] a vehicle” because it is data used to evaluate whether to “identify” data as “potential training data”)
includes at least one of output data of the first machine learning model and intermediate data output from a predetermined position in the first machine learning model (Karpathy, Para. [0071]-[0072], “FIG. 4 is a flow diagram illustrating an embodiment of a process for identifying potential training data using . . . deep learning analysis of an autonomous driving system . . . [of] a vehicle” and Karpathy, Fig. 4, where, as discussed above, data generated as part of the “process for identifying potential training data” is first evaluation data, including either “output” “of the final layer” or “output” of an “intermediate” layer, see Karpathy, Para. [0072] – [0076], “At 401, a deep learning analysis is initiated . . . with sensor data captured by sensors attached to a vehicle . . . At 403, inference using one layer of the deep learning analysis is completed. For example, a neural network . . . At 405, a determination is made whether the output of the layer analysis performed at 403 is a result of the final layer of the neural network. In the event the output is not the result of the final layer, for example, the output is an intermediate result . . . At 409, a determination is made whether the layer of the neural network and trigger conditions are appropriate for applying the trigger classifier”; see also Karpathy, Para. [0076], “For example, some use cases may be more efficient and produce high quality results using the intermediate result of a latter layer of the neural network. Other use cases may require an earlier intermediate result in order to identify useful examples of sensor data that meet the use case. In some cases, the trigger properties used to specify the conditions to apply the trigger classifier can be nested using multiple conditional checks and/or logical operators such as AND and OR operators”, where “conditional checks and/or logical operators” can be used to predetermine which “result” of “layer of the neural network” to evaluate in advance), and
the second evaluation data (Karpathy, Para. [0071], “The sensor data is then transmitted to a computer server and may be used to create training data for a revised machine learning model” and Karpathy, Para. [0081], “FIG. 5 is a flow diagram illustrating an embodiment of a process for creating training data”, where the “server” performs the functionality of “FIG. 5[’s] . . . flow diagram”, and all the data generated during the “process” is the second evaluation data because it is used to evaluate “training data” and the “machine learning model” for “deploy[ment]”; see also Karpathy, Fig. 5 and Karpathy, Para. [0106], “application-specific hardware or one or more physical computing devices (utilizing appropriate specialized executable instructions) may be necessary to perform the functionality, for example, due to the volume or complexity of the calculations involved or to provide results substantially in real-time”, where, as discussed above, the second arithmetic processing device, “hardware or one or more physical computing devices”, is “necessary to perform the functionality . . . [due to the] complexity of the calculations involved” in Fig. 5)
includes data corresponding to the first evaluation data, which is any of the output data of the second machine learning model and the intermediate data output from the predetermined position in the second machine learning model (Karpathy, Para. [0083], “At 503, . . . the sensor data received at 501 includes data identified as potentially useful training data . . . In some embodiments, the data is reviewed to determine whether the sensor data accurately represents the target use case . . . For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where, as discussed above, data generated during the process of Fig. 5 is the second evaluation data, which includes output data of the second machine learning model, output from the “highly accurate machine learning model”, which corresponds with the first evaluation data, because the “sensor data” “confirm[ed]” by the “highly accurate machine learning model” is transmitted to the server based on the first evaluation data, “classifier score exceeds a threshold”, see Karpathy, Para. [0078] – [0080], “At 413, a determination is made whether the classifier score exceeds a threshold . . . In the event the classifier score exceeds the threshold value, processing continues to 415 . . . At 415, the identified sensor data is transmitted. For example, the sensor data identified is transmitted to a computer server (e.g., the training data generation system 120) where it may be used to create training data”, where the second evaluation data includes “the identified sensor data” that is “transmitted to a computer server”, to perform the functionality of “FIG. 5[’s] . . . flow diagram”).
The reasons for obviousness, in regard to the combination of Karpathy in view of Kanno, were discussed in regard to the rejection of Claim 7 above and remain applicable here.
Karpathy in view of Kanno do not explicitly disclose . . . the first evaluation data obtained by inputting the first candidate data to the first machine learning model . . . that is obtained by partially changing the first machine learning model . . . .
However, Liang teaches . . . [calculating a value representing a difference between] . . . the first evaluation data obtained by inputting the first candidate data to the first machine learning model . . . [and second evaluation data obtained by inputting the first candidate data to a second machine learning model] that is obtained by partially changing the first machine learning model . . . (Pg. 1, Para. 3, “each data sample goes through the forward pass twice, and each pass is processed by a different sub model”, where the first candidate data, “each data sample”, is “processed” “twice”, once by each of two “different sub model[s]”, where the first and second evaluation data are the outputs of the two “different sub model[s]”, the “two distributions of the model predictions”, see Pg. 3, Para. 4, “we can obtain two distributions of the model predictions, denoted as Pw1(yi|xi) and Pw2(yi|xi)”; see also Pg. 1, Abstract, “R-Drop minimizes the bidirectional KL-divergence between the output distributions of two sub models”, where a value of “KL-divergence” represents the difference between the first and second evaluation data, “KL-divergence between the output distributions of two sub models”; see also Pg. 3, Para. 4, “the two forward passes are indeed based on two different sub models (though in the same model)”, where the “two different sub models” are partially changed versions of each other, as versions of the “same model”).
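Solely to illustrate the Liang teaching quoted above, a minimal Python sketch of the R-Drop consistency value follows, assuming a PyTorch model containing dropout layers; the name rdrop_divergence is hypothetical.

    import torch.nn.functional as F

    def rdrop_divergence(model, x):
        # With dropout active, each forward pass samples a different sub model
        # of the same model (cf. Liang, Pg. 3, Para. 4).
        model.train()
        p = F.softmax(model(x), dim=-1)  # first sub-model prediction
        q = F.softmax(model(x), dim=-1)  # second sub-model prediction
        # Bidirectional KL divergence between the two output distributions
        # (cf. Liang, Abstract).
        kl_pq = F.kl_div(p.log(), q, reduction="batchmean")
        kl_qp = F.kl_div(q.log(), p, reduction="batchmean")
        return 0.5 * (kl_pq + kl_qp)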
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the use of the second evaluation unit to calculate the second evaluation for training data selection, wherein the first evaluation value is obtained by inputting data into the first machine learning model and the second evaluation value is a value according to a degree of difference between data and second evaluation data obtained from inputting the first candidate data into a second machine learning model of Karpathy in view of Kanno with the calculation of a value representing a difference between first evaluation data obtained by inputting first candidate data to a first machine learning model and second evaluation data obtained by inputting the first candidate data to a second machine learning model that is obtained by partially changing the first machine learning model of Liang in order to select training data which produces consistent outputs across similar models (Liang, Pg. 1-2, Para. 3-1, “R-Drop forces the two distributions for the same data sample outputted by the two sub models to be consistent with each other, through minimizing the bidirectional Kullback-Leibler (KL) divergence between the two distributions”, where the output “distributions” will be “consistent” for similar “sub models” when “minimizing the bidirectional Kullback-Leibler (KL) divergence”), which will lead to more predictably consistent results during training and increased applicability of the trained models to inference use cases (Liang, Pg. 2, Para. 4, “We theoretically show that our R-Drop can reduce the inconsistency between training and inference of the dropout based models”, where, though framed in regard to reducing inconsistency by a model-altering mechanism, a sampling method to minimize KL divergence would also “reduce inconsistency”; see also Karpathy, Para. [0084] – [0086], “the training data of 503 is merged into existing training data sets. For example, an existing training data set applicable for most use cases is merged with the newly converted training data for improved coverage of a particular use case . . . At 509, the trained machine learning model is deployed. For example, the trained machine learning model is installed on a vehicle as an update for the autonomous learning system”, where when “merged with the newly converted training data” the training data would benefit from training data associated with predictably consistent results).
Claims 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Karpathy in view of Kanno and Beluch et al. (hereinafter Beluch) (“The power of ensembles for active learning in image classification”).
Regarding Claim 12, Karpathy in view of Kanno teach the machine learning system according to claim 7, wherein the second evaluation unit calculates, as the second evaluation value, a value [from] . . . a plurality of pieces of output data (Karpathy, Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Convert Sensor Data Into Training Data 503” functionality are collectively the second evaluation unit, which is configured to evaluate whether each of the pieces of candidate data, “the sensor data received at 501 includes data identified as potentially useful training data”, is effective when being used for learning of the first machine learning model, “confirm whether the sensor data represents the targeted use case”, by “a highly accurate machine learning model”, see Karpathy, Para. [0083], “At 503, . . . the sensor data received at 501 includes data identified as potentially useful training data . . . In some embodiments, the data is reviewed to determine whether the sensor data accurately represents the target use case . . . For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where the use of “a highly accurate machine learning model” for data evaluation requires the output of an evaluation value, the second evaluation value in this instance, which, in view of Kanno, is calculated as “a value indicating the difference” from a plurality of pieces of output data, “a category to which each of training data belongs”, see Kanno, Para. [0036], “the selecting unit 4 determines a category to which each of training data belongs by applying each of the training data to the first model . . . the selecting unit 4 calculates a difference between a category determined for training data and correct answer data corresponding to the training data . . . a value indicating the difference”),
the second selection unit compares the second evaluation value with a standard value determined in advance, and selects the first candidate data as one of the pieces of learning data (Karpathy, Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Prepare Training And Validation Data Sets 505” and “Train Machine Learning Model 507” functionality are collectively the second selection unit, which is configured, as discussed above, to select whether each of the pieces of candidate data, “the training data of 503”, is included in the pieces of learning data, “merged into existing training data sets”, by comparing “the training data of 503” with a value, “a particular use case”, which is within the broadest reasonable interpretation of a standard value because it sets the standard for “a particular use case”, see Karpathy, Para. [0084] – [0085], “At 505 . . . the training data of 503 is merged into existing training data sets. For example, an existing training data set applicable for most use cases is merged with the newly converted training data for improved coverage of a particular use case. The newly converted training data is useful for improving the accuracy of the model in identifying the particular use case. At 507, a machine learning model is trained . . . using the data prepared at 505”; see also Karpathy, Para. [0011], “FIG. 5 is a flow diagram illustrating an embodiment of a process for deploying training data from data corresponding to use cases identified by a trigger classifier”, where the “use cases” must be determined in order to be “identified by a trigger classifier”, which occurs in advance of the “flow diagram” functionality of “FIG. 5”; see also Karpathy, Para. [0083], “For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where using the output from “the machine learning model”, which, as discussed above, is the second evaluation value, to “confirm whether the sensor data represents the targeted use case”, which, as discussed above, is the value, is within the broadest reasonable interpretation of comparing the second evaluation value with the value), and
the pieces of output data are a plurality of inference results obtained by inputting the first candidate data to a . . . machine learning [model] . . . different from . . . the first machine learning model (Karpathy, Para. [0083], “At 503, . . . the sensor data received at 501 includes data identified as potentially useful training data . . . In some embodiments, the data is reviewed to determine whether the sensor data accurately represents the target use case . . . For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where, as discussed above, the use of “a highly accurate machine learning model” for data evaluation requires the inputting of the candidate data, “data identified as potentially useful”, to the “machine learning model”, which, as demonstrated by the “highly accurate” language and its location in the server “120”, is different from the first machine learning model in the “Deep Learning System 700”, see Karpathy, Fig. 1B, and the output of an evaluation value, the second evaluation value in this instance, which, in view of Kanno, is calculated as “a value indicating the difference” from a plurality of pieces of inference result output data, “a category to which each of training data belongs”, see Kanno, Para. [0036], “the selecting unit 4 determines a category to which each of training data belongs by applying each of the training data to the first model . . . the selecting unit 4 calculates a difference between a category determined for training data and correct answer data corresponding to the training data . . . a value indicating the difference”).
The reasons for obviousness, in regard to the combination of Karpathy in view of Kanno, were discussed in regard to the rejection of Claim 7 above and remain applicable here.
Karpathy in view of Kanno do not explicitly disclose . . . representing variation among . . . plurality of . . . models . . . learned with learning parameters . . . learning parameters of . . . .
However, Beluch teaches . . . [calculating a value] representing variation among [a plurality of pieces of output data from a] . . . plurality of . . . models . . . learned with learning parameters [different from] learning parameters of [a reference model] (Pg. 9370, Col. 2, Para. 4, “the variance of the softmax output vectors within the ensemble or within T forward passes can also be used as an acquisition function”, where “the variance” is a value representing variance of a plurality of pieces of output, “output vectors”, from “the ensemble”; see also Pg. 9372, Col. 2, Para. 1, “an ensemble of five networks” and Pg. 9370, Col. 1, Para. 5, “all ensembles are trained with the same Dtrain and same network architecture, but different random weight initializations winit. One could also take additional measures to de-correlate the ensembles, such as bootstrapping or using different network architectures”, where “the ensemble” is a plurality of models, which will have different learning parameters than each other when “using different network architectures”, resulting in at least four of the “five networks” having different learning parameters than the learning parameters of a reference model).
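Solely to illustrate the Beluch teaching quoted above, a minimal numpy sketch of the ensemble-variance acquisition value follows; the name ensemble_variance is hypothetical, and each ensemble member is assumed to be a callable returning a softmax output vector.

    import numpy as np

    def ensemble_variance(models, x):
        # Plurality of pieces of output data: one softmax vector per ensemble
        # member, each trained with different random initializations or
        # architectures (cf. Beluch, Pg. 9370, Col. 1, Para. 5).
        outputs = np.stack([m(x) for m in models])
        # Value representing variation among the outputs: per-class variance
        # across the ensemble, summed (cf. Pg. 9370, Col. 2, Para. 4).
        return outputs.var(axis=0).sum()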
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the calculation of an evaluation value from a plurality of pieces of output data for use in training data selection, wherein the plurality of pieces of output data are obtained by inputting the first candidate data to a machine learning model different from the first machine learning model of Karpathy in view of Kanno with the calculation of a value representing variation among a plurality of pieces of output data from a plurality of models learned with learning parameters different from learning parameters of a reference model of Beluch in order to select desirable training data (Beluch, Pg. 9368, Col. 2, Para. 1, “Starting with an initial (small) data set to train a model, new data-points to be labeled (e.g. by a human expert) are selected with a so-called acquisition function. This function ranks unlabeled data by “how desirable” label information is expected to be for each data-point. Commonly used acquisition functions are based on criteria such as variance reduction”; see also Beluch, Pg. 9370, Col. 1, Para. 5, “all ensembles are trained with the same Dtrain and same network architecture, but different random weight initializations winit. One could also take additional measures to de-correlate the ensembles, such as bootstrapping or using different network architectures”, where “de-correlat[ing]” the models would further allow for an assessment of variance) using a method with demonstrated state-of-the-art performance at training data acquisition (Beluch, Pg. 9375, Col. 2, Para. 2, “We compare the performance of acquisition functions and uncertainty estimation methods for active learning with CNNs on image classification tasks. We show that ensemble-based uncertainties consistently outperform other methods of uncertainty estimation (in particular MC Dropout) and lead to state-of-the-art active learning performance”).
Regarding Claim 13, Karpathy in view of Kanno teach the machine learning system according to claim 7, wherein the second evaluation unit calculates, as the second evaluation value, a value based on a degree of difference representing a difference between . . . [data] and each of one or more pieces of second output data (Karpathy, Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Convert Sensor Data Into Training Data 503” functionality are collectively the second evaluation unit, which is configured to evaluate whether each of the pieces of candidate data, “the sensor data received at 501 includes data identified as potentially useful training data”, is effective when being used for learning of the first machine learning model, “confirm whether the sensor data represents the targeted use case”, by “a highly accurate machine learning model”, see Karpathy, Para. [0083], “At 503, . . . the sensor data received at 501 includes data identified as potentially useful training data . . . In some embodiments, the data is reviewed to determine whether the sensor data accurately represents the target use case . . . For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where the use of “a highly accurate machine learning model” for data evaluation requires the output of an evaluation value, the second evaluation value in this instance, which, in view of Kanno, is calculated as “a value indicating the difference” representing a degree of difference between data, “correct answer data”, and a plurality of pieces of second output data, “a category to which each of training data belongs”, see Kanno, Para. [0036], “the selecting unit 4 determines a category to which each of training data belongs by applying each of the training data to the first model . . . the selecting unit 4 calculates a difference between a category determined for training data and correct answer data corresponding to the training data . . . a value indicating the difference”),
the second selection unit compares the second evaluation value with a standard value determined in advance, and selects the first candidate data as one of the pieces of learning data (Karpathy, Fig. 5, where, as discussed above, the associated software and hardware required to execute the “Prepare Training And Validation Data Sets 505” and “Train Machine Learning Model 507” functionality are collectively the second selection unit, which is configured, as discussed above, to select whether each of the pieces of candidate data, “the training data of 503”, is included in the pieces of learning data, “merged into existing training data sets”, by comparing “the training data of 503” with a value, “a particular use case”, which is within the broadest reasonable interpretation of a standard value because it sets the standard for “a particular use case”, see Karpathy, Para. [0084] – [0085], “At 505 . . . the training data of 503 is merged into existing training data sets. For example, an existing training data set applicable for most use cases is merged with the newly converted training data for improved coverage of a particular use case. The newly converted training data is useful for improving the accuracy of the model in identifying the particular use case. At 507, a machine learning model is trained . . . using the data prepared at 505”; see also Karpathy, Para. [0011], “FIG. 5 is a flow diagram illustrating an embodiment of a process for deploying training data from data corresponding to use cases identified by a trigger classifier”, where the “use cases” must be determined in order to be “identified by a trigger classifier”, which occurs in advance of the “flow diagram” functionality of “FIG. 5”; see also Karpathy, Para. [0083], “For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where using the output from “the machine learning model”, which, as discussed above, is the second evaluation value, to “confirm whether the sensor data represents the targeted use case”, which, as discussed above, is the value, is within the broadest reasonable interpretation of comparing the second evaluation value with the value),
the first output data is an inference result obtained by inputting the first candidate data to the first machine learning model (Karpathy, Para. [0071], “The trigger classifier analyzes sensor data at least partially analyzed by the deep learning system to identify whether the sensor data meets particular use cases that warrant retaining the sensor data”, where the first candidate data is part of the data input to the first machine learning model, “deep learning system”, which is analyzed by the “trigger classifier”, and where the “trigger classifier result” is the first output data), and
the one or more pieces of second output data are respectively one or more inference results obtained by inputting the first candidate data to one or more machine learning models . . . different from . . . the first machine learning model (Karpathy, Para. [0083], “At 503, . . . the sensor data received at 501 includes data identified as potentially useful training data . . . In some embodiments, the data is reviewed to determine whether the sensor data accurately represents the target use case . . . For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where, as discussed above, the use of “a highly accurate machine learning model” for data evaluation requires the inputting of the candidate data, “data identified as potentially useful”, to the “machine learning model”, which, as demonstrated by the “highly accurate” language and its location in the server “120”, is different from the first machine learning model in the “Deep Learning System 700”, see Karpathy, Fig. 1B, and the output of an evaluation value, the second evaluation value in this instance, which, in view of Kanno, is calculated as “a value indicating the difference” from a plurality of pieces of inference result second output data, “a category to which each of training data belongs”, see Kanno, Para. [0036], “the selecting unit 4 determines a category to which each of training data belongs by applying each of the training data to the first model . . . the selecting unit 4 calculates a difference between a category determined for training data and correct answer data corresponding to the training data . . . a value indicating the difference”).
The reasons for obviousness, in regard to the combination of Karpathy in view of Kanno, were discussed in regard to the rejection of Claim 7 above and remain applicable here.
Karpathy in view of Kanno do not explicitly disclose . . . first output data . . . learned with learning parameters . . . learning parameters of . . . .
However, Beluch teaches . . . [calculating a value based on a degree of difference between] first output data . . . [and second output data from a machine learning model] learned with learning parameters [different from] learning parameters of [the model used to generate the first output data] (Pg. 9370, Col. 2, Para. 4, “the variance of the softmax output vectors within the ensemble or within T forward passes can also be used as an acquisition function”, where “the variance” is a value based on a degree of difference between two sets of output data, “output vectors”, output from “the ensemble”; Pg. 9370, Col. 1, Para. 5, “all ensembles are trained with the same Dtrain and same network architecture, but different random weight initializations winit. One could also take additional measures to de-correlate the ensembles, such as bootstrapping or using different network architectures”, where “the ensemble” is a plurality of models, necessarily requiring a first and second model associated with first and second output data, which will have different learning parameters than each other when “using different network architectures”).
The reasons for obviousness, in regard to the combination of Beluch with Karpathy in view of Kanno, were discussed in regard to the rejection of Claim 12 above and remain applicable here.
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Karpathy in view of Pemantle (“A survey of random processes with reinforcement”) and Derakhshan et al. (hereinafter Derakhshan) (“Continuous Deployment of Machine Learning Pipelines”).
Regarding Claim 14, Karpathy teaches the machine learning system according to claim 1, wherein the second information processing device further comprises a . . . unit configured to transmit . . . information . . . [for selection of data] as the learning data, to the first information processing device . . . (Para. [0048] – [0049], “The system 120 may then transmit information to a portion of vehicles to execute the same classifier when proximate to the particular real-world area. In this way, the system 120 may ensure that it is able to obtain a greater quantity of training data based on this same sensor. Furthermore, the system 120 may instruct vehicles to transmit sensor data even if the above-described classifier does not assign a classifier score greater than a threshold”, where the second information processing device, “The system 120”, transmits information for selection of data as the learning data, such as “information to . . . execute the same classifier when proximate to the particular real-world area”, to the first information processing device, the “vehicle”, which requires associated hardware and software, a unit, configured to perform the “transmit[ting]”),
and the first information processing device receives the . . . information, and makes a probability of selecting, as the candidate data, the input data acquired in a . . . range determined in advance after the input data indicated by the . . . information to be higher than a probability of selecting another . . . range (Para. [0046] – [0049], “each classifier may execute for a particular period of time before being swapped for another classifier . . . The system 120 may then transmit information to a portion of vehicles to execute the same classifier when proximate to the particular real-world area. In this way, the system 120 may ensure that it is able to obtain a greater quantity of training data based on this same sensor. Furthermore, the system 120 may instruct vehicles to transmit sensor data even if the above-described classifier does not assign a classifier score greater than a threshold . . . In this example, the system 120 may instruct any vehicle within a threshold distance of that real-world location to transmit sensor data (e.g., images) even if their classifiers do not generate a classifier score greater than a threshold . . . In this way, the outside system 120 may override the classifier and cause the particular vehicle to transmit sensor data”, where the “transmit[ted] information” causes the probability of selecting input data as candidate data to be higher for input data collected after transmission that is within a certain range, “ensure that it is able to obtain a greater quantity of training data . . . [by] overrid[ing] the classifier”, as compared with another range, “any vehicle” outside of the “threshold distance of that real-world location”, and where the range must be determined in advance for it to be indicated by the transmitted information, “the system 120 may instruct”).
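For illustration only, the location-conditioned selection probability attributed to Karpathy above might be sketched as follows; the threshold distance and probability values are hypothetical, not taken from the reference:

```python
import random

def selection_probability(distance_km: float, threshold_km: float = 5.0,
                          boosted_p: float = 0.9, base_p: float = 0.1) -> float:
    """Probability of selecting input data as candidate data: higher for data
    acquired within a predetermined range of the indicated real-world location
    than for data acquired in another range."""
    return boosted_p if distance_km <= threshold_km else base_p

def select_as_candidate(distance_km: float) -> bool:
    # Draw against the range-dependent probability.
    return random.random() < selection_probability(distance_km)
```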
Karpathy does not explicitly disclose . . . feedback . . . employment . . . indicating that corresponding input data is selected . . . each time the learning data is selected . . . employment . . . time . . . employment . . . time . . .
However, Pemantle teaches . . . feedback [comprising] . . . employment . . . indicating that corresponding input data is selected . . . each time the learning data is selected . . . [wherein the] . . . employment [increases the probability that data in the range of the data associated with the] . . . employment [will be selected] (Pg. 4, Para. 5, “The original Polya urn model . . . has an urn that begins with one red ball and one black ball. At each time step, a ball is chosen at random and put back in the urn along with one extra ball of the color drawn, this process being repeated”, where “At each step . . . repeated”, feedback is provided comprising employment indicating that corresponding data is selected, “a ball is chosen at random and put back in the urn along with one extra ball of the color drawn”; see also Pg. 4, Para. 2, “the probability of choosing a ball of a given type is equal to the proportion of that type in the urn”, where the employment, “choosing a ball”, increases the number of data in that range, “of the [same] color drawn” in this instance, which increases the “proportion of that type” and therefore “the probability of choosing”; see generally Pg. 25, Para. 2, “Urn models: applications . . . use reinforcement models (mostly urn models) . . . to provide quick and robust algorithms” and Pg. 35-36, Section “4.4 Learning”, where “urn model[s]” can be used for data sampling algorithms and in learning data applications).
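For illustration only, the Polya urn process quoted from Pemantle can be simulated in a few lines; the sketch below is hypothetical code, not Pemantle’s notation:

```python
import random

def polya_urn(steps: int) -> list[str]:
    """Polya urn: begin with one red and one black ball; at each step a ball is
    chosen uniformly at random and put back along with one extra ball of the
    color drawn, so the draw probability of a color equals its current
    proportion in the urn and each selection reinforces future selections."""
    urn = ["red", "black"]
    for _ in range(steps):
        drawn = random.choice(urn)  # probability of each color = its proportion in the urn
        urn.append(drawn)           # put back plus one extra ball of the color drawn
    return urn
```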
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the transmission of information for selection of data as learning data from the second information processing device to the first information processing device, wherein the information increases the probability of selecting input data as candidate data so that it is higher for a range, determined in advance, as indicated by the information, and as compared with the probability of selecting input data from another range, of Karpathy, with the feedback comprising employment indicating that corresponding input data is selected each time the learning data is selected, wherein the employment increases the probability that data in the range associated with the employment will be selected, of Pemantle, in order to utilize a quick and robust algorithm for data selection (Pemantle, Pg. 25, Para. 2, “Urn models: applications . . . use reinforcement models (mostly urn models) . . . to provide quick and robust algorithms”), which has achieved rigorously tested results across multiple domains (Pemantle, Pg. 43, Para. 4, “models now abound in a variety of social science disciplines, including psychology, sociology [BL03], public health [EL04], political science [OMH+04]. The discussion here will concentrate on a few game-theoretic applications in which rigorous results have been obtained”), and in order to prioritize selection of training data for specific use cases by incorporating reinforcement models to select data with similar attributes (Karpathy, Para. [0004], “Typically, the performance of the deep learning system is limited at least in part by the quality of the training set used to train the model. In many instances, significant resources are invested in collecting, curating, and annotating the training data. The effort required to create the training set can be significant and is often tedious. Moreover, it is often difficult to collect data for particular use cases that a machine learning model needs improvement on”).
Additionally, Derakhshan teaches . . . [selecting training data for one] time [range] . . . [at a higher probability compared with another] time [range] (Pg. 6, Col. 2, Para. 3, “The time-based sampling strategy assigns weights to every data chunk based on their timestamp such that recent chunks have a higher probability of being sampled. The window-based sampling strategy is similar to the uniform sampling, but instead of sampling from the entire historical data, the data manager samples the data from a given time range”, where “sampling” assigns a “higher probability” for some “time range” “chunk[s]” as compared to other “chunk[s]”).
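For illustration only, the time-based and window-based sampling strategies quoted from Derakhshan might be sketched as follows; the chunk representation and the use of raw timestamps as weights are assumptions, not Derakhshan’s implementation:

```python
import random
from typing import Sequence

def time_based_sample(chunks: Sequence[dict], k: int) -> list[dict]:
    """Time-based: weight each data chunk by its timestamp so that more
    recent chunks have a higher probability of being sampled."""
    weights = [chunk["timestamp"] for chunk in chunks]  # newer chunk -> larger weight (assumed scheme)
    return random.choices(list(chunks), weights=weights, k=k)

def window_based_sample(chunks: Sequence[dict], start: float, end: float, k: int) -> list[dict]:
    """Window-based: uniform sampling restricted to a given time range."""
    window = [c for c in chunks if start <= c["timestamp"] <= end]
    return random.sample(window, min(k, len(window)))
```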
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the selection of input data as candidate data, wherein, after each selection, feedback information is transmitted from the second information processing device to the first information processing device so that input data in a range of the selected data will have a higher probability of being selected than input data in another range, of Karpathy in view of Pemantle, with the selection of training data from one time range at a higher probability compared with another time range, of Derakhshan, in order to prioritize the selection of data from a specific time range in instances where a machine learning model will perform better on a particular use case if trained on data from a specific time range, such as data from vehicles traveling near a known, but impermanent, highway hazard (Derakhshan, Pg. 6, Col. 2, Para. 3, “Based on the specific use-case, the user chooses the appropriate sampling strategy. In many real-world use cases (e.g., e-commerce and online advertising), the deployed model should adapt to the more recent data. Therefore, the time-based and window-based sampling provide more appropriate samples for training”; see also Karpathy, Para. [0046] – [0049], “each classifier may execute for a particular period of time before being swapped for another classifier . . . The system 120 may then transmit information to a portion of vehicles to execute the same classifier when proximate to the particular real-world area. In this way, the system 120 may ensure that it is able to obtain a greater quantity of training data based on this same sensor. Furthermore, the system 120 may instruct vehicles to transmit sensor data even if the above-described classifier does not assign a classifier score greater than a threshold . . . In this example, the system 120 may instruct any vehicle within a threshold distance of that real-world location to transmit sensor data (e.g., images) even if their classifiers do not generate a classifier score greater than a threshold . . . In this way, the outside system 120 may override the classifier and cause the particular vehicle to transmit sensor data”).
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Karpathy in view of Mahajan.
Regarding Claim 16, Karpathy teaches a machine learning system (Abstract, “Systems and methods for obtaining training data . . . includes . . . applying a neural network to the sensor data. A trigger classifier is applied to an intermediate result of the neural network to determine a classifier score for the sensor data. Based at least in part on the classifier score, a determination is made whether to transmit via a computer network at least a portion of the sensor data. Upon a positive determination, the sensor data is transmitted and used to generate training data”, where “Systems” that include “applying a neural network” are machine learning systems)
comprising: a first information processing device including a first arithmetic processing device as hardware; and a second information processing device that is hardware different from the first arithmetic processing device, and includes a second arithmetic processing device configured to execute information processing . . . (Para. [0007], “FIG. 1B is a block diagram illustrating one embodiment of a system for generating training data” and FIG. 1B, where the machine learning system, as indicated by the inclusion of a “Deep Learning System 700” component, includes a vehicle “102” as a first device and a “Training Data Generating System 120” as a second device, which are connected via the “Network”; see also Para. [0023], “classifiers may be uploaded to a computer system within a vehicle, such that the classifier may be used to recognize specific image features or objects associated with the classifiers. The captured images that are designated by the classifier as including the particular feature or object can then be transmitted to a central server system and used as training data for neural network systems”, where both the “vehicle”, which as discussed above is “102”, and the “server”, which is 120, see Para. [0080], “a computer server (e.g., the training data generation system 120)”, are information processing devices because the “vehicle” processes “image features or objects” and the “server” processes “training data”; see also Fig. 1B and Para. [0039], “a deep learning system 700 of one or more processors, which is included in the vehicle 102”, where the first information processing device includes processing device hardware, “one or more processors” to perform arithmetic, “complex . . . calculations”, see Para. [0106], “application-specific hardware or one or more physical computing devices (utilizing appropriate specialized executable instructions) may be necessary to perform the functionality, for example, due to the volume or complexity of the calculations involved or to provide results substantially in real-time”; see also Para. [0023], “The captured images that are designated by the classifier as including the particular feature or object can then be transmitted to a central server system and used as training data for neural network systems” and Para. [0083], “At 503, . . . the sensor data received at [the server] . . . [is analyzed, where] a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where the “server” is hardware different from the first arithmetic processing device, the processor on the vehicle, and the “server” itself must have an arithmetic processing device, the second arithmetic processing device, “to perform the functionality, for example, due to the volume or complexity of the calculations involved” in analyzing data with a “machine learning model” and “training . . . neural network systems”),
wherein the first information processing device generates, using the first arithmetic processing device, first evaluation data (Para. [0071]-[0072], “FIG. 4 is a flow diagram illustrating an embodiment of a process for identifying potential training data using . . . deep learning analysis of an autonomous driving system . . . [of] a vehicle” and Fig. 4, where, as discussed above, the “vehicle” is the first information processing device, which executes the functionality of Fig. 4, and where the first evaluation data is all data produced during this “process for identifying potential training data using . . . deep learning analysis of an autonomous driving system . . . [of] a vehicle” because it is data used to evaluate whether to “identify” data as “potential training data”; see also Para. [0039], “a deep learning system 700 of one or more processors, which is included in the vehicle 102”, where, as discussed above, the first information processing device includes the first arithmetic processing device, “one or more processors”, used to perform arithmetic, “complex . . . calculations”, see Para. [0106], “application-specific hardware or one or more physical computing devices (utilizing appropriate specialized executable instructions) may be necessary to perform the functionality, for example, due to the volume or complexity of the calculations involved or to provide results substantially in real-time”)
including at least one of output data of a first machine learning model and intermediate data output from a predetermined position in the first machine learning model obtained by inputting each of a plurality of pieces of input data to the first machine learning model (Para. [0071]-[0072], “FIG. 4 is a flow diagram illustrating an embodiment of a process for identifying potential training data using . . . deep learning analysis of an autonomous driving system . . . [of] a vehicle” and Fig. 4, where, as discussed above, data generated as part of the “process for identifying potential training data” is first evaluation data, including both “output” “of the final layer” and “output” of an “intermediate” layer, see Para. [0072] – [0076], “At 401, a deep learning analysis is initiated . . . with sensor data captured by sensors attached to a vehicle . . . At 403, inference using one layer of the deep learning analysis is completed. For example, a neural network . . . At 405, a determination is made whether the output of the layer analysis performed at 403 is a result of the final layer of the neural network. In the event the output is not the result of the final layer, for example, the output is an intermediate result . . . At 409, a determination is made whether the layer of the neural network and trigger conditions are appropriate for applying the trigger classifier”, where “sensor data captured by sensors attached to a vehicle” comprises a plurality of pieces of input data, each of which is input to the first machine learning model, the “deep learn[er]” such as “a neural network”; see also Para. [0076], “For example, some use cases may be more efficient and produce high quality results using the intermediate result of a latter layer of the neural network. Other use cases may require an earlier intermediate result in order to identify useful examples of sensor data that meet the use case. In some cases, the trigger properties used to specify the conditions to apply the trigger classifier can be nested using multiple conditional checks and/or logical operators such as AND and OR operators”, where “conditional checks and/or logical operators” can be used to predetermine which “result” of “layer of the neural network” to evaluate in advance),
the second information processing device generates, using the second arithmetic processing device, second evaluation data (Para. [0071], “The sensor data is then transmitted to a computer server and may be used to create training data for a revised machine learning model” and Para. [0081], “FIG. 5 is a flow diagram illustrating an embodiment of a process for creating training data”, where the “server”, which as discussed above is the second information processing device, performs the functionality of “FIG. 5[’s] . . . flow diagram”, and all the data generated during the “process” is the second evaluation data because it is used to evaluate “training data” and the “machine learning model” for “deploy[ment]”; see also Fig. 5 and Para. [0106], “application-specific hardware or one or more physical computing devices (utilizing appropriate specialized executable instructions) may be necessary to perform the functionality, for example, due to the volume or complexity of the calculations involved or to provide results substantially in real-time”, where, as discussed above, the second arithmetic processing device, “hardware or one or more physical computing devices”, is “necessary to perform the functionality . . . [due to the] complexity of the calculations involved” in Fig. 5)
including at least one of the output data of the first machine learning model and the intermediate data output from the predetermined position in the first machine learning model obtained by inputting each of the pieces of input data to the first machine learning model (Para. [0071], “The sensor data is then transmitted to a computer server and may be used to create training data for a revised machine learning model” and Para. [0081], “FIG. 5 is a flow diagram illustrating an embodiment of a process for creating training data”, where, as discussed above, data generated as part of the “process for creating training data” and “a revised machine learning model” is second evaluation data, including output data of the first machine learning model, the “machine learning model” to be “installed on a vehicle”, during “train[ing]” and “validation”, see Para. [0085], “At 507, a machine learning model is trained. For example, a machine learning model is trained using the data prepared at 505 . . . the training model is validated using a validation data set created from the received sensor data . . . At 509, the trained machine learning model is deployed. For example, the trained machine learning model is installed on a vehicle”, where the “machine learning model” outputs are obtained by inputting the plurality of pieces of input data during “train[ing]” and “validation”, and where the input data is the “data prepared at 505” and the “validation data set created from the received sensor data”, which, as discussed above, are components of the “sensor data”), and
the second information processing device selects, as learning data for training the first machine learning model (Para. [0071], “The sensor data is then transmitted to a computer server and may be used to create training data for a revised machine learning model” and Para. [0081], “FIG. 5 is a flow diagram illustrating an embodiment of a process for creating training data”, where the “server”, which as discussed above is the second information processing device, performs the functionality of “FIG. 5[’s] . . . flow diagram”, which includes selection of learning data, “training data of 503 is merged . . . for improved coverage of a particular use case”, for training of the machine learning model, “improving the accuracy of the model”, see Para. [0084] – [0085], “At 505 . . . the training data of 503 is merged into existing training data sets. For example, an existing training data set applicable for most use cases is merged with the newly converted training data for improved coverage of a particular use case. The newly converted training data is useful for improving the accuracy of the model in identifying the particular use case. At 507, a machine learning model is trained . . . using the data prepared at 505”),
input data for which [are associated with sufficient evaluation values of] . . . the first evaluation data and the second evaluation data . . . (Fig. 4, where, as discussed above, the data generated during the process of “Fig. 4”, including the step of “Determine Trigger Classification Score 411”, is first evaluation data, which assigns an evaluation value, “classification score”, to the “sensor data” and transmits the data associated with a sufficient evaluation value, the “identified sensor data” comprising the input data, to the server, see Para. [0077] – [00], “At 411, a trigger classifier score is determined . . . At 413, a determination is made whether the classifier score exceeds a threshold . . . In the event the classifier score exceeds the threshold value, processing continues to 415 . . . At 415, the identified sensor data is transmitted. For example, the sensor data identified is transmitted to a computer server (e.g., the training data generation system 120) where it may be used to create training data”; Fig. 5 and Para. [0083], “At 503, . . . the sensor data received at 501 includes data identified as potentially useful training data . . . In some embodiments, the data is reviewed to determine whether the sensor data accurately represents the target use case . . . For example, a highly accurate machine learning model is used to confirm whether the sensor data represents the targeted use case”, where, as discussed above, the data generated during the process of Fig. 5 is the second evaluation data, which includes the evaluation value generated by “a highly accurate machine learning model” for the data “identified as potentially useful training data”, the data comprising the input data, where the data “confirmed” as having a sufficient evaluation value to “represent[] the targeted use case” are the input data)
[to satisfy] a standard value determined in advance among the pieces of input data (Para. [0071], “The trigger classifier analyzes sensor data at least partially analyzed by the deep learning system to identify whether the sensor data meets particular use cases that warrant retaining the sensor data” and Para. [0011], “FIG. 5 is a flow diagram illustrating an embodiment of a process for deploying training data from data corresponding to use cases identified by a trigger classifier”, where the standard to determine “whether the sensor data meets particular use cases that warrant retaining” is a standard value used to determine which of the plurality of pieces of “sensor data” will be “retained” as the plurality of pieces of input data, and which must be determined in advance because the steps of “Train Trigger Classifier 305” and “Determine Trigger Properties 307” occur in advance of “Deploy Trigger Classifier And Properties 309”, see Fig. 3).
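For illustration only, the trigger-classifier flow attributed to Karpathy above (layer-by-layer inference, a classifier applied at a predetermined intermediate position, and transmission only when the score satisfies a predetermined standard value) might be sketched as follows; the callables, layer index, and threshold are hypothetical assumptions:

```python
def run_with_trigger(layers, trigger_classifier, sensor_data,
                     trigger_layer: int, threshold: float):
    """Layer-by-layer inference with a trigger classifier at a predetermined position."""
    activation = sensor_data
    transmit = False
    for i, layer in enumerate(layers):
        activation = layer(activation)              # inference using one layer
        if i == trigger_layer:                      # predetermined intermediate position
            score = trigger_classifier(activation)  # trigger classifier score
            transmit = score > threshold            # compare with the standard value
    return activation, transmit                     # final output and transmit decision
```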
Karpathy does not explicitly disclose . . . with higher arithmetic accuracy than the first arithmetic processing device . . . a difference between . . . is larger than . . . .
However, Mahajan teaches . . . [a second arithmetic processing device configured to execute information processing] with higher arithmetic accuracy than the first arithmetic processing device . . . (Pg. 2, Col. 1, Para. 2, “Approximate accelerators trade small losses in output quality for significant performance and efficiency gains . . . When a processor core is augmented with an approximate accelerator, the core delegates the computation of frequently executed safe-to-approximate functions to the accelerator . . . The outputs from the accelerator are an approximation of the outputs that the core would have calculated”, where the “processor core” is an arithmetic processing device, which has higher arithmetic accuracy than the first arithmetic processing device, the “approximate accelerator”, whose outputs are “an approximation of the outputs that the core would have calculated”)
[wherein, an action is taken when] a difference between [a value from the data processed by the first arithmetic processing device and the second arithmetic processing device] . . . is larger than [a value] . . . (Pg. 2, Col. 2, Para. 2, “The difference between the imprecise accelerator output and the precise output is the accelerator error”; Pg. 4, Col. 1, Para. 5, “For each invocation, use the original precise result if the accelerator error exceeds the threshold”).
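For illustration only, the error-thresholded fallback quoted from Mahajan might be sketched as follows; a deployed system would not invoke the precise path on every call (which would forfeit the accelerator’s efficiency gains), so the sketch reflects only the quality check described in the quoted passage, with hypothetical names:

```python
def invoke_with_quality_check(precise_fn, approximate_fn, x, threshold: float):
    """Use the original precise result if the accelerator error exceeds the threshold."""
    approx = approximate_fn(x)     # low-accuracy approximate accelerator path
    precise = precise_fn(x)        # high-accuracy processor-core path
    error = abs(approx - precise)  # accelerator error: |imprecise output - precise output|
    return precise if error > threshold else approx
```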
The reasons for obviousness, in regard to the combination of Karpathy with Mahajan, were discussed in regard to the rejection of Claim 9 above and remain applicable here.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW BRYCE GOLAN whose telephone number is (571) 272-5159. The examiner can normally be reached Monday through Friday, 8:00 AM to 5:00 PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached at (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MATTHEW BRYCE GOLAN/Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123