DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1-20 are pending and examined herein.
Claims 1-20 are rejected under 35 U.S.C. 101.
Claims 1-2 are rejected under 35 U.S.C. 102.
Claims 3-7, 11-13, and 15-19 are rejected under 35 U.S.C. 103.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
MPEP § 2106(III) sets out steps for evaluating whether a claim is drawn to patent-eligible subject matter. The analysis of claims 1-20 in accordance with these steps follows.
Step 1 Analysis:
Step 1 is to determine whether the claim is directed to a statutory category (process, machine, manufacture, or composition of matter). Claims 1-10 and 18-20 are directed to a process, and claims 11-17 are directed to a machine. All claims are directed to statutory subject matter, and the analysis proceeds.
Step 2A Prong One, Step 2A Prong Two, and Step 2B Analysis:
Step 2A Prong One asks if the claim recites a judicial exception (abstract idea, law of nature, or natural phenomenon). If the claim recites a judicial exception, analysis proceeds to Step 2A Prong Two, which asks if the claim recites additional elements that integrate the abstract idea into a practical application. If the claim does not integrate the judicial exception, analysis proceeds to Step 2B, which asks if the claim amounts to significantly more than the judicial exception. If the claim does not amount to significantly more than the judicial exception, the claim is not eligible subject matter under 35 U.S.C. 101.
No claim recites an improvement to the functioning of a computer or to any other technology or technical field.
Regarding claim 1, the following claim elements are abstract ideas:
dividing the input data set into a plurality of input data categories; (Dividing data into categories can be practically performed in the human mind. This is a mental process.)
determining a regression curve equation for each of the plurality of input data categories based at least in part on the at least one exploratory training session; (Determining a regression curve equation is performing regression, which is a mathematical calculation, which is a mathematical concept.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
A method for training a machine learning algorithm, the method comprising: (This recites generic machine learning components and processes. This amounts to mere instructions to apply an exception.)
performing at least one exploratory training session of the machine learning algorithm using an input data set, wherein the input data set includes a first plurality of input data samples; (This recites generic machine learning components and processes. This amounts to mere instructions to apply an exception.)
collecting a second plurality of input data samples based at least in part on the regression curve equation for each of the plurality of input data categories; and (This is the insignificant extra-solution activity of “Mere Data Gathering.” See MPEP § 2106.05(g), ex. i-vi.)
training the machine learning algorithm using the second plurality of input data samples and the input data set. (This recites generic machine learning components and processes. This amounts to mere instructions to apply an exception.)
Regarding claim 2, the rejection of claim 1 is incorporated herein. Further, the following are abstract ideas:
wherein dividing the input data set into the plurality of input data categories further comprises: (Dividing data into categories can be practically performed in the human mind. This is a mental process.)
identifying at least one data set parameter by which to categorize the input data set; and (Identifying a parameter to categorize the data set can be practically performed in the human mind. This is a mental process.)
dividing the input data set into the plurality of input data categories based on the at least one data set parameter. (Dividing data into categories based on a parameter can be practically performed in the human mind. This is a mental process.)
Regarding claim 3, the rejection of claim 1 is incorporated herein. Further, the following are abstract ideas:
wherein determining the regression curve equation for each of the plurality of input data categories further comprises: (Determining a regression curve equation is performing regression, which is a mathematical calculation, which is a mathematical concept.)
generating an exploratory learning curve plot of a model loss of the machine learning algorithm versus a quantity of input data samples trained for each of the plurality of input data categories in the at least one exploratory training session; and (Plotting data and generating a curve can be practically performed in the human mind, and is therefore a mental process.)
determining the regression curve equation for each of the plurality of input data categories based at least in part on the exploratory learning curve plot for each of the plurality of input data categories. (Determining a regression curve equation is performing regression, which is a mathematical calculation, which is a mathematical concept.)
Regarding claim 4, the rejection of claim 3 is incorporated herein. Further, the following are abstract ideas:
wherein determining the regression curve equation for each of the plurality of input data categories further comprises: (Determining a regression curve equation is performing regression, which is a mathematical calculation, which is a mathematical concept.)
determining the regression curve equation for each of the plurality of input data categories based at least in part on the exploratory learning curve plot for each of the plurality of input data categories, wherein the regression curve equation for one of the plurality of input data categories is a power law equation having a form: (Determining a regression curve equation is performing regression, which is a mathematical calculation, which is a mathematical concept.)
ε_m = α·m^(β_g) + γ
wherein ε is the model loss of the machine learning algorithm for the one of the plurality of input data categories, α is a first constant factor for the one of the plurality of input data categories, m is a quantity of input data samples trained for the one of the plurality of input data categories, β_g is a steepness of a regression curve described by the regression curve equation for the one of the plurality of input data categories, and γ is a lower bound model loss of the machine learning algorithm for the one of the plurality of input data categories. (This is a mathematical equation, which is a mathematical concept.)
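For illustration only, and not as a characterization of the record, the power-law form recited above can be evaluated numerically; all constants in the sketch below (α = 2.0, β_g = −0.5, γ = 0.1) are assumed example values:

```python
# Hypothetical sketch of the recited power-law learning curve:
#   eps_m = alpha * m**beta_g + gamma
# The constants alpha, beta_g, and gamma are assumed example values,
# not values drawn from the claims or the specification.

def model_loss(m, alpha=2.0, beta_g=-0.5, gamma=0.1):
    """Model loss eps_m after training on m samples of one input data category."""
    return alpha * m ** beta_g + gamma

# Because the steepness beta_g is negative, the loss decreases toward the
# lower bound gamma as the sample count m grows:
losses = [model_loss(m) for m in (100, 400, 1600)]  # approx. [0.30, 0.20, 0.15]
```

Under these assumed constants, the excess loss above γ is halved each time the sample count is quadrupled.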
Regarding claim 5, the rejection of claim 4 is incorporated herein. Further, the following are abstract ideas:
identifying a first subset of the plurality of input data categories based at least in part on a quantity of input data samples in each of the plurality of input data categories; (Identifying a subset of data can be practically performed in the human mind. This is a mental process.)
determining a plurality of data collection quotas based at least in part on the regression curve equation for each of the plurality of input data categories, wherein each of the plurality of data collection quotas corresponds to one of the first subset of the plurality of input data categories; and (Determining a quota based on an equation can be practically performed in the human mind. This is a mental process.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
wherein collecting the second plurality of input data samples further comprises: (This is the insignificant extra-solution activity of “Mere Data Gathering.” See MPEP § 2106.05(g), ex. i-vi.)
collecting a second plurality of input data samples based at least in part on the plurality of data collection quotas. (This is the insignificant extra-solution activity of “Mere Data Gathering.” See MPEP § 2106.05(g), ex. i-vi.)
Regarding claim 6, the rejection of claim 5 is incorporated herein. Further, the following are abstract ideas:
wherein identifying the first subset of the plurality of input data categories further comprises: (Identifying a subset of data can be practically performed in the human mind. This is a mental process.)
comparing a quantity of input data samples in each of the plurality of input data categories to a previous data collection quota, wherein the previous data collection quota is one of the plurality of data collection quotas determined during a previous execution of the method; and (Comparing data can be practically performed in the human mind. This is a mental process.)
determining each of the plurality of input data categories having a quantity of input data samples less than the previous data collection quota to be one of the first subset of the plurality of input data categories. (Determining a category that has not met a quota can be practically performed in the human mind. This is a mental process.)
Regarding claim 7, the rejection of claim 5 is incorporated herein. Further, the following are abstract ideas:
wherein determining one of the plurality of data collection quotas corresponding to one of the first subset of the plurality of input data categories further comprises: (Determining a quota can be practically performed in the human mind. This is a mental process.)
identifying a second subset of the plurality of input data categories, wherein the second subset of the plurality of input data categories includes each of the plurality of input data categories not in the first subset of the plurality of input data categories; (Identifying a subset of data can be practically performed in the human mind. This is a mental process.)
determining an average steepness β̄_g of the regression curve equation for each of the second subset of the plurality of input data categories based at least in part on the regression curve equation of each of the second subset of the plurality of input data categories; (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
determining a constant factor α′ and a lower bound model loss γ′ of the one of the first subset of the plurality of input data categories based at least in part on the regression curve equation of the one of the first subset of the plurality of input data categories; and (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
determining the one of the plurality of data collection quotas using a predetermined equation based at least in part on the average steepness β̄_g, the constant factor α′, and the lower bound model loss γ′. (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
Regarding claim 8, the rejection of claim 7 is incorporated herein. Further, the following are abstract ideas:
wherein determining the one of the plurality of data collection quotas using the predetermined equation further comprises: (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
determining the one of the plurality of data collection quotas using the predetermined equation, wherein the predetermined equation includes: (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
m_(i+1) = ((0.5·ε_(m_i) − γ′) / α′)^(1/β̄_g)
wherein m_(i+1) is the one of the plurality of data collection quotas, m_i is a quantity of input data samples in the one of the first subset of the plurality of input data categories, α′ is a first constant factor of the one of the first subset of the plurality of input data categories, γ′ is a lower bound model loss of the one of the first subset of the plurality of input data categories, and β̄_g is the average steepness of the regression curve equation for each of the second subset of the plurality of input data categories. (This is a mathematical equation, which is a mathematical concept.)
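For illustration only, the predetermined equation of claim 8, as best understood, inverts the power-law learning curve to find the sample count at which a category's model loss would fall to half its current value; all numeric inputs below are assumed example values:

```python
# Hypothetical sketch of the claim 8 quota equation, as best understood:
#   m_{i+1} = ((0.5 * eps_mi - gamma_p) / alpha_p) ** (1 / beta_bar)
# All numeric inputs are assumed example values, not values from the record.

def data_collection_quota(eps_mi, alpha_p, gamma_p, beta_bar):
    """Quota m_{i+1}: sample count at which the category's loss reaches 0.5 * eps_mi."""
    return ((0.5 * eps_mi - gamma_p) / alpha_p) ** (1.0 / beta_bar)

# Example: current loss 0.3, constant factor alpha' = 2.0,
# lower bound gamma' = 0.1, average steepness beta_bar = -0.5:
quota = data_collection_quota(eps_mi=0.3, alpha_p=2.0, gamma_p=0.1, beta_bar=-0.5)
# quota is approx. 1600 samples
```

The claim 9 quantity s = m_(i+1) − m_i would then give the number of additional samples to collect for that category.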
Regarding claim 9, the rejection of claim 8 is incorporated herein. Further, the following are abstract ideas:
determining a quantity of additional input data samples to collect for each of the first subset of the plurality of input data categories using an equation: (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
s = m_(i+1) − m_i
wherein s is the quantity of additional input data samples to collect for the one of the first subset of the plurality of input data categories, m_(i+1) is the one of the plurality of data collection quotas for the one of the first subset of the plurality of input data categories, and m_i is the quantity of input data samples in the one of the first subset of the plurality of input data categories; and (This is a mathematical equation, which is a mathematical concept.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
wherein collecting the second plurality of input data samples further comprises: (This is the insignificant extra-solution activity of “Mere Data Gathering.” See MPEP § 2106.05(g), ex. i-vi.)
transmitting a data sample collection task to a vehicle, wherein the data sample collection task includes at least the quantity of additional input data samples to collect for each of the first subset of the plurality of input data categories. (Transmitting data is an existing process on a computer. This amounts to mere instructions to apply an exception.)
Regarding claim 10, the rejection of claim 9 is incorporated herein. Further, the following are abstract ideas:
generating an updated input data set, wherein the updated input data set includes the second plurality of input data samples and the input data set; and (Generating a combined dataset can be practically performed in the human mind. This is a mental process.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
wherein training the machine learning algorithm using the second plurality of input data samples and the input data set further comprises: (This recites generic machine learning components/processes; this amounts to mere instructions to apply an exception.)
receiving the second plurality of input data samples from the vehicle; (Receiving data is an existing process on a computer. This amounts to mere instructions to apply an exception.)
performing the method using the updated input data set. (This is the insignificant extra-solution activity of “Selecting a particular data source or type of data to be manipulated.” See MPEP § 2106.05(g), ex. i-iv.)
Regarding claim 11, the following are abstract ideas:
divide the input data set into a plurality of input data categories; (Dividing data into categories can be practically performed in the human mind. This is a mental process.)
generate an exploratory learning curve plot of a model loss of the machine learning algorithm versus a quantity of input data samples trained for each of the plurality of input data categories in the at least one exploratory training session; (Plotting data and generating a curve can be practically performed in the human mind, and is therefore a mental process.)
determine a regression curve equation for each of the plurality of input data categories based at least in part on the exploratory learning curve plot for each of the plurality of input data categories, wherein the regression curve equation for one of the plurality of input data categories is a power law equation having a form: (Determining a regression curve equation is performing regression, which is a mathematical calculation, which is a mathematical concept.)
ε_m = α·m^(β_g) + γ
wherein ε is the model loss of the machine learning algorithm for the one of the plurality of input data categories, α is a first constant factor for the one of the plurality of input data categories, m is a quantity of input data samples trained for the one of the plurality of input data categories, β_g is a steepness of a regression curve described by the regression curve equation for the one of the plurality of input data categories, and γ is a lower bound model loss of the machine learning algorithm for the one of the plurality of input data categories; (This is a mathematical equation, which is a mathematical concept.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
A system for training a machine learning algorithm for a vehicle, the system comprising: (This recites generic machine learning components and processes. This amounts to mere instructions to apply an exception.)
a server system including: (This recites a generic computer. This amounts to mere instructions to apply an exception.)
a server storage device; (This recites a generic computer component. This amounts to mere instructions to apply an exception.)
a server communication system; and (This recites a generic computer component. This amounts to mere instructions to apply an exception.)
a server controller in electrical communication with the server storage device and the server communication system, wherein the server controller is programmed to: (This recites a generic computer component and processes. This amounts to mere instructions to apply an exception.)
perform at least one exploratory training session of the machine learning algorithm using an input data set, wherein the input data set includes a first plurality of input data samples, and wherein the input data set is stored on the server storage device; (This recites generic machine learning components and processes. This amounts to mere instructions to apply an exception.)
collect a second plurality of input data samples from the vehicle using the server communication system, wherein the second plurality of input data samples is based at least in part on the regression curve equation for each of the plurality of input data categories; and (This is the insignificant extra-solution activity of “Mere Data Gathering.” See MPEP § 2106.05(g), ex. i-vi.)
train the machine learning algorithm using the second plurality of input data samples and the input data set. (This recites generic machine learning components and processes. This amounts to mere instructions to apply an exception.)
Regarding claim 12, the rejection of claim 11 is incorporated herein. The following are abstract ideas:
identify a first subset of the plurality of input data categories based at least in part on a quantity of input data samples in each of the plurality of input data categories; (Identifying a subset of data can be practically performed in the human mind. This is a mental process.)
determine a plurality of data collection quotas based at least in part on the regression curve equation for each of the plurality of input data categories, wherein each of the plurality of data collection quotas corresponds to one of the first subset of the plurality of input data categories; (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
determine a quantity of additional input data samples to collect for each of the first subset of the plurality of input data categories using an equation: (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
s = m_(i+1) − m_i
wherein s is the quantity of additional input data samples to collect for the one of the first subset of the plurality of input data categories, m_(i+1) is the one of the plurality of data collection quotas for the one of the first subset of the plurality of input data categories, and m_i is the quantity of input data samples in the one of the first subset of the plurality of input data categories; and (This is a mathematical equation, which is a mathematical concept.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
wherein to collect the second plurality of input data samples from the vehicle using the server communication system, the server controller is further programmed to: (This is the insignificant extra-solution activity of “Mere Data Gathering.” See MPEP § 2106.05(g), ex. i-vi.)
transmit a data sample collection task to the vehicle using the server communication system, wherein the data sample collection task includes at least the quantity of additional input data samples to collect for each of the first subset of the plurality of input data categories. (Transmitting data is an existing process on a computer. This amounts to mere instructions to apply an exception.)
Regarding claim 13, the rejection of claim 12 is incorporated herein. Further, the following are abstract ideas:
wherein to determine one of the plurality of data collection quotas corresponding to one of the first subset of the plurality of input data categories, the server controller is further programmed to: (Determining a quota can be practically performed in the human mind. This is a mental process.)
identify a second subset of the plurality of input data categories, wherein the second subset of the plurality of input data categories includes each of the plurality of input data categories not in the first subset of the plurality of input data categories; (Identifying a subset of data can be practically performed in the human mind. This is a mental process.)
determine an average steepness β̄_g of the regression curve equation for each of the second subset of the plurality of input data categories based at least in part on the regression curve equation of each of the second subset of the plurality of input data categories; (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
determine a constant factor α′ and a lower bound model loss γ′ of the one of the first subset of the plurality of input data categories based at least in part on the regression curve equation of the one of the first subset of the plurality of input data categories; and (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
determine the one of the plurality of data collection quotas using a predetermined equation based at least in part on the average steepness β̄_g, the constant factor α′, and the lower bound model loss γ′. (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
Regarding claim 14, the rejection of claim 13 is incorporated herein. Further, the following are abstract ideas:
wherein to determine the one of the plurality of data collection quotas using the predetermined equation, the server controller is further programmed to: (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
determine the one of the plurality of data collection quotas using the predetermined equation, wherein the predetermined equation includes: (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
m_(i+1) = ((0.5·ε_(m_i) − γ′) / α′)^(1/β̄_g)
wherein m_(i+1) is the one of the plurality of data collection quotas, m_i is a quantity of input data samples in the one of the first subset of the plurality of input data categories, α′ is a first constant factor of the one of the first subset of the plurality of input data categories, γ′ is a lower bound model loss of the one of the first subset of the plurality of input data categories, and β̄_g is the average steepness of the regression curve equation for each of the second subset of the plurality of input data categories. (This is a mathematical equation, which is a mathematical concept.)
Regarding claim 15, the rejection of claim 12 is incorporated herein. The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
wherein to transmit the data sample collection task to the vehicle using the server communication system, the server controller is further programmed to: (Transmitting data is an existing process on a computer. This amounts to mere instructions to apply an exception.)
transmit the data sample collection task to the vehicle using the server communication system, wherein the data sample collection task includes a validation algorithm describing one of the plurality of input data categories and at least one of: a task priority, a projected decrease in model loss, and the quantity of additional input data samples to collect. (Transmitting data is an existing process on a computer. This amounts to mere instructions to apply an exception.)
Regarding claim 16, the rejection of claim 15 is incorporated herein. The following are abstract ideas:
determine a priority of the data sample collection task based at least in part on at least one of the task priority, the projected decrease in model loss, and the quantity of additional input data samples to collect; and (Determining a priority can be practically performed in the human mind. This is a mental process.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
a vehicle system including: (This recites a generic system. This amounts to mere instructions to apply an exception.)
at least one vehicle sensor; (This recites a generic sensor. This amounts to mere instructions to apply an exception.)
a vehicle communication system; and (This recites a generic system. This amounts to mere instructions to apply an exception.)
a vehicle controller in electrical communication with the at least one vehicle sensor and the vehicle communication system, wherein the vehicle controller is programmed to: (This recites a generic computer component. This amounts to mere instructions to apply an exception.)
receive the data sample collection task from the server system using the vehicle communication system; (Receiving data is an existing process on a computer. This amounts to mere instructions to apply an exception.)
perform the data sample collection task using the at least one vehicle sensor. (This is the insignificant extra-solution activity of “Mere Data Gathering.” See MPEP § 2106.05(g), ex. i-vi.)
Regarding claim 17, the rejection of claim 16 is incorporated herein. Further, the following are abstract ideas:
determine a second plurality of input data samples based at least in part on the validation algorithm, wherein the second plurality of input data samples is a subset of the plurality of unvalidated input data samples; and (Determining a variable based on an algorithm is a mathematical calculation, which is a mathematical concept.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
wherein to perform the data sample collection task, the vehicle controller is further programmed to: (This is the insignificant extra-solution activity of “Mere Data Gathering.” See MPEP § 2106.05(g), ex. i-vi.)
record a plurality of unvalidated input data samples using the at least one vehicle sensor; (This is the insignificant extra-solution activity of “Mere Data Gathering.” See MPEP § 2106.05(g), ex. i-vi.)
transmit the second plurality of input data samples to the server communication system using the vehicle communication system. (Transmitting data is an existing process on a computer. This amounts to mere instructions to apply an exception.)
Regarding claim 18, the following are abstract ideas:
dividing the input data set into a plurality of input data categories; (Dividing data into categories can be practically performed in the human mind. This is a mental process.)
generating an exploratory learning curve plot of a model loss of the machine learning algorithm versus a quantity of input data samples trained for each of the plurality of input data categories in the at least one exploratory training session; and (Plotting data and generating a curve can be practically performed in the human mind, and is therefore a mental process.)
determining a regression curve equation for each of the plurality of input data categories based at least in part on the exploratory learning curve plot for each of the plurality of input data categories, wherein the regression curve equation for one of the plurality of input data categories is a power law equation having a form: (Determining a regression curve equation is performing regression, which is a mathematical calculation, which is a mathematical concept.)
ε_m = α·m^(β_g) + γ
wherein ε is the model loss of the machine learning algorithm for the one of the plurality of input data categories, α is a first constant factor for the one of the plurality of input data categories, m is a quantity of input data samples trained for the one of the plurality of input data categories, β_g is a steepness of a regression curve described by the regression curve equation for the one of the plurality of input data categories, and γ is a lower bound model loss of the machine learning algorithm for the one of the plurality of input data categories; (This is a mathematical equation, which is a mathematical concept.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
A method for training a machine learning algorithm for a vehicle, the method comprising: (This recites generic machine learning components and processes. This amounts to mere instructions to apply an exception.)
performing at least one exploratory training session of the machine learning algorithm using an input data set, wherein the input data set includes a first plurality of input data samples, and wherein the input data set is stored on a server storage device; (This recites generic machine learning components and processes. This amounts to mere instructions to apply an exception.)
transmitting a data sample collection task to a vehicle communication system of the vehicle using a server communication system; (Transmitting data is an existing process on a computer. This amounts to mere instructions to apply an exception.)
receiving the data sample collection task using a vehicle communication system; collecting a second plurality of input data samples using at least one vehicle sensor, wherein the second plurality of input data samples is based at least in part on the regression curve equation for each of the plurality of input data categories; (Receiving data is an existing process on a computer. This amounts to mere instructions to apply an exception.)
transmitting the second plurality of input data samples from the vehicle communication system to the server communication system; and (Transmitting data is an existing process on a computer. This amounts to mere instructions to apply an exception.)
training the machine learning algorithm using the second plurality of input data samples received from the vehicle communication system and the input data set. (This recites generic machine learning components and processes. This amounts to mere instructions to apply an exception.)
Regarding claim 19, the rejection of claim 18 is incorporated herein. Further, the following are abstract ideas:
identifying a first subset of the plurality of input data categories based at least in part on a quantity of input data samples in each of the plurality of input data categories; (Identifying a subset of data can be practically performed in the human mind. This is a mental process.)
determining a plurality of data collection quotas based at least in part on the regression curve equation for each of the plurality of input data categories, wherein each of the plurality of data collection quotas corresponds to one of the first subset of the plurality of input data categories; (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
determining a quantity of additional input data samples to collect for each of the first subset of the plurality of input data categories using an equation: (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
s = m_{i+1} − m_i
wherein s is the quantity of additional input data samples to collect for the one of the first subset of the plurality of input data categories, m_{i+1} is the one of the plurality of data collection quotas for the one of the first subset of the plurality of input data categories, and m_i is the quantity of input data samples in the one of the first subset of the plurality of input data categories; and (This is a mathematical equation, which is a mathematical concept.)
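For context only (hypothetical category names and counts, not from the application), the recited shortfall computation s = m_{i+1} − m_i amounts to subtracting the samples on hand from the quota for each category in the first subset:

```python
# Illustrative sketch of s = m_{i+1} - m_i per category.
# Category names and counts are hypothetical.
quotas = {"pedestrian": 5000, "cyclist": 3000}   # m_{i+1} per category
on_hand = {"pedestrian": 3200, "cyclist": 2500}  # m_i per category

# s: additional samples to collect for each category in the first subset
additional = {cat: quotas[cat] - on_hand[cat] for cat in quotas}
```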
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
transmitting the data sample collection task to the vehicle communication system using the server communication system, wherein the data sample collection task includes at least the quantity of additional input data samples to collect for each of the first subset of the plurality of input data categories. (Transmitting data is an existing process on a computer. This amounts to mere instructions to apply an exception.)
wherein collecting the second plurality of input data samples from the vehicle using the server communication system further comprises: (This is the insignificant extra-solution activity of “Mere Data Gathering”, See MPEP § 2106.05(g), ‘Mere Data Gathering’, ex. i-vi.)
Regarding claim 20, the rejection of claim 19 is incorporated herein. Further, the following are abstract ideas:
wherein determining one of the plurality of data collection quotas corresponding to one of the first subset of the plurality of input data categories further comprises: (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
identifying a second subset of the plurality of input data categories, wherein the second subset of the plurality of input data categories includes each of the plurality of input data categories not in the first subset of the plurality of input data categories; (Identifying a subset of data can be practically performed in the human mind. This is a mental process.)
determining an average steepness β̄_g of the regression curve equation for each of the second subset of the plurality of input data categories based at least in part on the regression curve equation of each of the second subset of the plurality of input data categories; (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
determining a constant factor α′ and a lower bound model loss γ′ of the one of the first subset of the plurality of input data categories based at least in part on the regression curve equation of the one of the first subset of the plurality of input data categories; and (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
determining the one of the plurality of data collection quotas using the predetermined equation, wherein the predetermined equation includes: (Determining a number based on an equation is a mathematical calculation, which is a mathematical concept.)
m_{i+1} = (0.5·(ε(m_i) − γ′) / α′)^(1/β̄_g)
wherein m_{i+1} is the one of the plurality of data collection quotas, m_i is a quantity of input data samples in the one of the first subset of the plurality of input data categories, α′ is a first constant factor of the one of the first subset of the plurality of input data categories, γ′ is a lower bound model loss of the one of the first subset of the plurality of input data categories, and β̄_g is the average steepness of the regression curve equation for each of the second subset of the plurality of input data categories. (This is a mathematical equation, which is a mathematical concept.)
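For context only, one reading of the recited quota update is that it targets halving the excess loss above the lower bound: solving 0.5·(ε(m_i) − γ′) = α′·m^(β̄_g) for m gives m_{i+1} = (0.5·(ε(m_i) − γ′)/α′)^(1/β̄_g). The sketch below uses hypothetical constants and is an interpretation, not the application's own implementation:

```python
# Sketch of the claimed quota update, read as halving the excess loss
# above the lower bound γ′. All constants are hypothetical.
def next_quota(m_i, alpha_p, beta_bar, gamma_p):
    eps_mi = alpha_p * m_i ** beta_bar + gamma_p        # current loss ε(m_i)
    return (0.5 * (eps_mi - gamma_p) / alpha_p) ** (1.0 / beta_bar)

# With β̄_g = -0.5, halving the excess loss quadruples the sample quota.
m_next = next_quota(m_i=1_000, alpha_p=2.0, beta_bar=-0.5, gamma_p=0.1)
```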
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-2 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Mahmood (“Optimizing Data Collection for Machine Learning”, 2022).
Regarding claim 1, Mahmood teaches
A method for training a machine learning algorithm, the method comprising: (Page 10 states "Improving data collection practices yields potentially positive and negative societal impacts. LOC reduces the collection of extraneous data, which can, in turn, reduce the environmental costs of training models.")
performing at least one exploratory training session of the machine learning algorithm using an input data set, wherein the input data set includes a first plurality of input data samples; (Page 6 states "Consider K ∈ ℕ data sources (e.g., K = 2 with labeled and unlabeled) and for each k ∈ {1, . . . , K}, let z_k ~ p_k(z_k) be data drawn from their distribution. We train a model with a data set D := ∪_{k=1}^{K} D_k where each D_k contains points of the k-th source." The first training is interpreted as the exploratory training session, and D is interpreted as the input data set.)
dividing the input data set into a plurality of input data categories; (As the dataset is split into D_k for each data source, the input data set is divided into a plurality of input data categories.)
determining a regression curve equation for each of the plurality of input data categories based at least in part on the at least one exploratory training session; (Page 19 states “In order to construct a PDF and CDF in the multi-variate setting, we follow the same general steps as in Algorithm 1. We first collect a data set of performance statistics R := {(q_r, V(D_{q_r}), …)}_{r=1}^{R} as before. We then use bootstrap resamples of this data set to fit parameters θ* to a regression model and then solve for q̂ … Finally, we fit a density estimation model over our data set of q̂.” Page 15, Section A, states that V(D) is the valuation of a model trained on data set D. Therefore, the regression model is based on the exploratory training session. Page 20 states "We propose an easy-to-implement baseline regression model by adding the contributions of each data set being used. Then, our additive regression model is v̂(q_1, …, q_K; θ) := Σ_{k=1}^{K} v̂_k(q_k; θ_k), where v̂_k(q_k; θ_k) can be any single-variate regression model for estimating score. For instance, consider K = 2 data types with power law regression models for each data type. Our multi-variate regression model becomes v̂(q_1, q_2; θ) = θ_{1,0}·q_1^{θ_{1,1}} + θ_{2,0}·q_2^{θ_{2,1}} + θ_3." Therefore, as there are regression models for each data source, each having its own parameters θ_k, a regression curve equation is determined for each of the categories based on the exploratory training session.)
collecting a second plurality of input data samples based at least in part on the regression curve equation for each of the plurality of input data categories; and (Page 15 states "Finally in the third step, we solve our optimization problem (4) via gradient descent. This problem yields the optimal data set sizes q_1*, . . . , q_T* that we should have at the end of each round. Furthermore, if we are in the t-th round for t > 1, we freeze the values for q_1, . . . , q_{t−1} to the data set sizes that we have observed in the previous rounds. Upon solving this problem, we then collect data until we have q_t samples, and then re-train our model to evaluate our current state." The multi-variate analogue of optimization problem (4), see page 6, involves the PDF and CDF fit using the regression models for the data categories. Therefore, a second plurality of input data is collected based on the regression curves for the input categories.)
training the machine learning algorithm using the second plurality of input data samples and the input data set. (Page 15 states “Upon solving this problem, we then collect data until we have q_t samples, and then re-train our model to evaluate our current state." Therefore, the collected data, interpreted as the second plurality of input data, is used to train the model, interpreted as the machine learning algorithm.)
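The collect-then-retrain procedure described in the quoted passage can be sketched as follows. This is a hypothetical stand-in loop for illustration only; the function names, quotas, and toy "model" are not Mahmood's implementation:

```python
# Sketch of the collect-then-retrain loop from the Mahmood quote:
# each round, collect up to the solved quota q_t, then retrain on the
# union of prior and newly collected data. All inputs are hypothetical.
def run_rounds(initial_data, quotas, collect, train):
    data = list(initial_data)
    model = train(data)                # exploratory training session
    for q_t in quotas:                 # q_t: target data set size per round
        needed = max(0, q_t - len(data))
        data.extend(collect(needed))   # second plurality of input samples
        model = train(data)            # re-train to evaluate current state
    return model, data

model, data = run_rounds(
    initial_data=[0.0] * 10,
    quotas=[20, 40],
    collect=lambda n: [1.0] * n,
    train=lambda d: sum(d) / len(d),   # toy "model": mean of the data
)
```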
Regarding claim 2, the rejection of claim 1 is incorporated herein. Mahmood teaches
wherein dividing the input data set into the plurality of input data categories further comprises: (Page 6 states "Consider K ∈ ℕ data sources (e.g., K = 2 with labeled and unlabeled) and for each k ∈ {1, . . . , K}, let z_k ~ p_k(z_k) be data drawn from their distribution. We train a model with a data set D := ∪_{k=1}^{K} D_k where each D_k contains points of the k-th source." D is interpreted as the input data set. As the dataset is split into D_k for each data source, the input data set is divided into a plurality of input data categories.)
identifying at least one data set parameter by which to categorize the input data set; and (The data source is interpreted as the data set parameter, which categorizes the input data set.)
dividing the input data set into the plurality of input data categories based on the at least one data set parameter. (As the dataset is split into D_k for each data source (the data set parameter), the input data set is divided into a plurality of input data categories.)
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 3-7 are rejected under 35 U.S.C. 103 as being unpatentable over Mahmood (“Optimizing Data Collection for Machine Learning”, 2022) as applied to claim 1 above, and further in view of Hestness (“Deep Learning Scaling is Predictable, Empirically”, 2017).
Regarding claim 3, the rejection of claim 1 is incorporated herein. Mahmood teaches
wherein determining the regression curve equation for each of the plurality of input data categories further comprises: (Page 19 states “In order to construct a PDF and CDF in the multi-variate setting, we follow the same general steps as in Algorithm 1. We first collect a data set of performance statistics R := {(q_r, V(D_{q_r}), …)}_{r=1}^{R} as before. We then use bootstrap resamples of this data set to fit parameters θ* to a regression model v̂(q_1, …, q_K; θ) and then solve for q̂ … Finally, we fit a density estimation model over our data set of q̂.” Page 15, Section A, states that V(D) is the valuation of a model trained on data set D. Therefore, the regression model is based on the exploratory training session. Page 20 states "We propose an easy-to-implement baseline regression model by adding the contributions of each data set being used. Then, our additive regression model is v̂(q_1, …, q_K; θ) := Σ_{k=1}^{K} v̂_k(q_k; θ_k), where v̂_k(q_k; θ_k) can be any single-variate regression model for estimating score. For instance, consider K = 2 data types with power law regression models for each data type. Our multi-variate regression model becomes v̂(q_1, q_2; θ) = θ_{1,0}·q_1^{θ_{1,1}} + θ_{2,0}·q_2^{θ_{2,1}} + θ_3." Therefore, as there are regression models for each data source, each having its own parameters θ_k, a regression curve equation is determined for each of the categories based on the exploratory training session.)
generating an exploratory learning curve plot of [a performance metric] of the machine learning algorithm versus a quantity of input data samples trained for each of the plurality of input data categories in the at least one exploratory training session; and (Page 24 states "To estimate F(q), we first create an ensemble of estimated learning curves, which we then invert to obtain an empirical distribution of estimated values for D_{q_0}. Figure 5 plots our bootstrap resampled estimated learning curves versus the ground truth performance for the first round of data collection when we have access to an initial D_{q_0} containing 10% of the full data set." Fig. 5 shows the exploratory curve plots, which show a performance metric, such as accuracy, versus a quantity of input data samples for the estimated power law learning curves, interpreted as the exploratory learning curves.)
determining the regression curve equation for each of the plurality of input data categories based at least in part on the exploratory learning curve plot for each of the plurality of input data categories. (Page 19 states “In order to construct a PDF and CDF in the multi-variate setting, we follow the same general steps as in Algorithm 1. We first collect a data set of performance statistics R := {(q_r, V(D_{q_r}), …)}_{r=1}^{R} as before. We then use bootstrap resamples of this data set to fit parameters θ* to a regression model v̂(q_1, …, q_K; θ) and then solve for q̂ … Finally, we fit a density estimation model over our data set of q̂.” Page 15, Section A, states that V(D) is the valuation of a model trained on data set D. Therefore, the regression model is based on the exploratory training session. Page 20 states "We propose an easy-to-implement baseline regression model by adding the contributions of each data set being used. Then, our additive regression model is v̂(q_1, …, q_K; θ) := Σ_{k=1}^{K} v̂_k(q_k; θ_k), where v̂_k(q_k; θ_k) can be any single-variate regression model for estimating score. For instance, consider K = 2 data types with power law regression models for each data type. Our multi-variate regression model becomes v̂(q_1, q_2; θ) = θ_{1,0}·q_1^{θ_{1,1}} + θ_{2,0}·q_2^{θ_{2,1}} + θ_3." Therefore, as there are regression models for each data source, each having its own parameters θ_k, a regression curve equation is determined for each of the categories based on the exploratory training session. As the regression curves are the estimated power law learning curves, the regression curves are determined based on the exploratory learning curve plots.)
Mahmood does not appear to explicitly teach
[generating an exploratory learning curve plot of] a model loss of [the machine learning algorithm versus a quantity of input data samples trained]
However, Hestness—directed to analogous art—teaches
[generating an exploratory learning curve plot of a] model loss of [the machine learning algorithm versus a quantity of input data samples trained] (Page 6 states "Figure 1: Neural machine translation learning curves. Left: the learning curves for separate models follow ε(m) = α·m^(β_g) + γ.” Figure 1 shows the learning curve plotted with minimum test loss on the y-axis and training data set size on the x-axis.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Mahmood with the teachings of Hestness because, as Hestness states on page 5, "Further, prior work predicts that as a model runs out of capacity on larger data sets, the error should plateau, resulting in a power-law + constant, ε(m) = α·m^(β_g) + γ, where γ is the error when the model family has exhausted its capacity. Indeed, we find that learning curves for a single model family can be closely represented by a power-law + constant."
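For context only, the power-law form Hestness describes is straightforward to fit empirically: when γ ≈ 0, taking logarithms makes ε(m) = α·m^(β_g) linear (log ε = log α + β_g·log m), so α and β_g can be recovered with an ordinary least-squares line fit. The sketch below uses synthetic data with arbitrary constants, not results from either reference:

```python
import math

# Sketch: recover α and β_g from a log-log least-squares line fit.
# Synthetic data generated with hypothetical α = 2.0, β_g = -0.5, γ = 0.
m = [100.0, 300.0, 1000.0, 3000.0, 10000.0]
eps = [2.0 * mi ** -0.5 for mi in m]

x = [math.log(mi) for mi in m]
y = [math.log(ei) for ei in eps]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Ordinary least squares slope (β_g) and intercept (log α)
beta_g = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
         / sum((xi - x_bar) ** 2 for xi in x)
alpha = math.exp(y_bar - beta_g * x_bar)
```

With a nonzero γ, the fit instead requires nonlinear least squares over (α, β_g, γ), since the constant term breaks the log-log linearity.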
Regarding claim 4, the rejection of claim 3 is incorporated herein. Mahmood teaches
wherein determining the regression curve equation for each of the plurality of input data categories further comprises: (Page 19 states “In order to construct a PDF and CDF in the multi-variate setting, we follow the same general steps as in Algorithm 1. We first collect a data set of performance statistics R := {(q_r, V(D_{q_r}), …)}_{r=1}^{R} as before. We then use bootstrap resamples of this data set to fit parameters θ* to a regression model v̂(q_1, …, q_K; θ) and then solve for q̂ … Finally, we fit a density estimation model over our data set of q̂.” Page 15, Section A, states that V(D) is the valuation of a model trained on data set D. Therefore, the regression model is based on the exploratory training session. Page 20 states "We propose an easy-to-implement baseline regression model by adding the contributions of each data set being used. Then, our additive regression model is v̂(q_1, …, q_K; θ) := Σ_{k=1}^{K} v̂_k(q_k; θ_k), where v̂_k(q_k; θ_k) can be any single-variate regression model for estimating score. For instance, consider K = 2 data types with power law regression models for each data type. Our multi-variate regression model becomes v̂(q_1, q_2; θ) = θ_{1,0}·q_1^{θ_{1,1}} + θ_{2,0}·q_2^{θ_{2,1}} + θ_3." Therefore, as there are regression models for each data source, each having its own parameters θ_k, a regression curve equation is determined for each of the categories based on the exploratory training session.)
determining the regression curve equation for each of the plurality of input data categories based at least in part on the exploratory learning curve plot for each of the plurality of input data categories, (Page 19 states “In order to construct a PDF and CDF in the multi-variate setting, we follow the same general steps as in Algorithm 1. We first collect a data set of performance statistics R := {(q_r, V(D_{q_r}), …)}_{r=1}^{R} as before. We then use bootstrap resamples of this data set to fit parameters θ* to a regression model v̂(q_1, …, q_K; θ) and then solve for q̂ … Finally, we fit a density estimation model over our data set of q̂.” Page 15, Section A, states that V(D) is the valuation of a model trained on data set D. Therefore, the regression model is based on the exploratory training session. Page 20 states "We propose an easy-to-implement baseline regression model by adding the contributions of each data set being used. Then, our additive regression model is v̂(q_1, …, q_K; θ) := Σ_{k=1}^{K} v̂_k(q_k; θ_k), where v̂_k(q_k; θ_k) can be any single-variate regression model for estimating score. For instance, consider K = 2 data types with power law regression models for each data type. Our multi-variate regression model becomes v̂(q_1, q_2; θ) = θ_{1,0}·q_1^{θ_{1,1}} + θ_{2,0}·q_2^{θ_{2,1}} + θ_3." Therefore, as there are regression models for each data source, each having its own parameters θ_k, a regression curve equation is determined for each of the categories based on the exploratory training session. As the regression curves are the estimated power law learning curves, the regression curves are determined based on the exploratory learning curve plots.)
Mahmood does not appear to explicitly teach
wherein the regression curve equation for one of the plurality of input data categories is a power law equation having a form:
ε(m) = α·m^(β_g) + γ
wherein ε is the model loss of the machine learning algorithm for the one of the plurality of input data categories, α is a first constant factor for the one of the plurality of input data categories, m is a quantity of input data samples trained for the one of the plurality of input data categories, β_g is a steepness of a regression curve described by the regression curve equation for the one of the plurality of input data categories, and γ is a lower bound model loss of the machine learning algorithm for the one of the plurality of input data categories.
However, Hestness—directed to analogous art—teaches
wherein the regression curve equation for one of the plurality of input data categories is a power law equation having a form:
ε(m) = α·m^(β_g) + γ
wherein ε is the model loss of the machine learning algorithm for the one of the plurality of input data categories, α is a first constant factor for the one of the plurality of input data categories, m is a quantity of input data samples trained for the one of the plurality of input data categories, β_g is a steepness of a regression curve described by the regression curve equation for the one of the plurality of input data categories, and γ is a lower bound model loss of the machine learning algorithm for the one of the plurality of input data categories. (Page 6 states "Figure 1: Neural machine translation learning curves. Left: the learning curves for separate models follow ε(m) = α·m^(β_g) + γ.” Figure 1 shows the learning curve plotted with minimum test loss on the y-axis and training data set size on the x-axis. Page 2 states "Here, ε is generalization error, m is the number of samples in the training set, α is a constant property of the problem, and β_g is the scaling exponent that defines the steepness of the learning curve—how quickly a model family can learn from adding more training samples." As Figure 1 shows the learning curve for the minimum test loss, one of ordinary skill in the art would understand that ε is the model loss of the machine learning algorithm. Page 5 states "Further, prior work predicts that as a model runs out of capacity on larger data sets, the error should plateau, resulting in a power-law + constant, ε(m) = α·m^(β_g) + γ, where γ is the error when the model family has exhausted its capacity. Indeed, we find that learning curves for a single model family can be closely represented by a power-law + constant." As γ is the error when the model family has exhausted its capacity and the equation solves for ε(m), the loss of the model as a function of input size, one of ordinary skill in the art would understand that the constant γ represents a lower bound on the model loss of the machine learning algorithm.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Mahmood and Hestness for the reasons given above in regards to claim 3.
Regarding claim 5, the rejection of claim 4 is incorporated herein. Mahmood teaches
wherein collecting the second plurality of input data samples further comprises: (Page 15 states "Finally in the third step, we solve our optimization problem (4) via gradient descent. This problem yields the optimal data set sizes q_1*, . . . , q_T* that we should have at the end of each round. Furthermore, if we are in the t-th round for t > 1, we freeze the values for q_1, . . . , q_{t−1} to the data set sizes that we have observed in the previous rounds. Upon solving this problem, we then collect data until we have q_t samples, and then re-train our model to evaluate our current state." The multi-variate analogue of optimization problem (4), see page 6, involves the PDF and CDF fit using the regression models for the data categories. Therefore, a second plurality of input data is collected based on the regression curves for the input categories.)
identifying a first subset of the plurality of input data categories based at least in part on a quantity of input data samples in each of the plurality of input data categories; (Page 20 states "For each t, let d_t = q_t − q_{t−1} be the additional data collected in each round." The vector d_t identifies a subset of categories based on the quantity of input data samples collected for each input data category, as when d_t ≤ 0, no more samples for that category are collected.)
determining a plurality of data collection quotas based at least in part on the regression curve equation for each of the plurality of input data categories, wherein each of the plurality of data collection quotas corresponds to one of the first subset of the plurality of input data categories; and (Page 19 states "The multi-variate data collection problem considers multiple sources delivering different types of data required to train a model. Consider K data sets with q_1, . . . , q_K points in each, respectively. Rather than collecting up to q_t data points in each round, we optimize a vector q_t ∈ ℝ_+^K where each element q_t^k refers to how much data we need from the k-th source." Therefore, the vector q_t ∈ ℝ_+^K contains a plurality of data collection quotas q_t^k for each of the input data categories (the sources). Page 20, Section D.2 shows the optimization problem to solve for q_t, which depends on the density estimation model F(·) found using the regression curve equations in v̂(q_1, …, q_K; θ).)
collecting a second plurality of input data samples based at least in part on the plurality of data collection quotas. (Page 20 states "For each t, let d_t = q_t − q_{t−1} be the additional data collected in each round." The additional data collected in each round is interpreted as the second plurality of input data samples, which is based on the quotas q_t.)
Regarding claim 6, the rejection of claim 5 is incorporated herein. Mahmood teaches
wherein identifying the first subset of the plurality of input data categories further comprises: (Page 20 states "For each t, let d_t = q_t − q_{t−1} be the additional data collected in each round." The vector d_t identifies a subset of categories based on the quantity of input data samples collected for each input data category, as when d_t ≤ 0, no more samples for that category are collected.)
comparing a quantity of input data samples in each of the plurality of input data categories to a previous data collection quota, wherein the previous data collection quota is one of the plurality of data collection quotas determined during a previous execution of the method; and (Page 20 states "For each t, let d_t = q_t − q_{t−1} be the additional data collected in each round." q_{t−1} is the previous data collection quota, as the current round is t. The vector d_t identifies a subset of categories based on the quantity of input data samples collected for each input data category, as when d_t ≤ 0, no more samples for that category are collected.)
determining each of the plurality of input data categories having a quantity of input data samples less than the previous data collection quota to be one of the first subset of the plurality of input data categories. (Page 20 states "For each t, let d_t = q_t − q_{t−1} be the additional data collected in each round." q_{t−1} is the previous data collection quota, as the current round is t. The vector d_t identifies a subset of categories based on the quantity of input data samples collected for each input data category, as when d_t ≤ 0, no more samples for that category are collected. Therefore, when the quantity of data in the current round q_t is less than the previous quota q_{t−1}, no samples will be collected.)
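For context only, the comparison step recited in claim 6 reduces to a per-category check of the current sample count against the previous round's quota. The category names and counts below are hypothetical:

```python
# Sketch of claim 6's comparison step: categories whose current sample
# count falls short of the previous round's quota form the first subset.
# Counts and quotas are hypothetical.
counts = {"day": 4000, "night": 1500, "rain": 900}       # samples on hand
prev_quota = {"day": 3500, "night": 2000, "rain": 1000}  # last round's quotas

first_subset = [c for c in counts if counts[c] < prev_quota[c]]
```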
Regarding claim 7, the rejection of claim 5 is incorporated herein. Mahmood teaches
wherein determining one of the plurality of data collection quotas corresponding to one of the first subset of the plurality of input data categories further comprises: (Page 19 states "The multi-variate data collection problem considers multiple sources delivering different types of data required to train a model. Consider K data sets with q_1, . . . , q_K points in each, respectively. Rather than collecting up to q_t data points in each round, we optimize a vector q_t ∈ ℝ_+^K where each element q_t^k refers to how much data we need from the k-th source." Therefore, the vector q_t ∈ ℝ_+^K contains a plurality of data collection quotas q_t^k for each of the input data categories (the sources). Page 20, Section D.2 shows the optimization problem to solve for q_t, which depends on the density estimation model F(·) found using the regression curve equations in v̂(q_1, …, q_K; θ).)
identifying a second subset of the plurality of input data categories, wherein the second subset of the plurality of input data categories includes each of the plurality of input data categories not in the first subset of the plurality of input data categories; (Page 20 states "For each t, let d_t = q_t - q_(t-1) be the additional data collected in each round." The vector d_t identifies a subset of categories based on the quantity of input data samples collected for each input data category, as when d_t > 0, the data is collected for the category. As the first subset was the categories where d_t ≤ 0, the categories where d_t > 0 form the second subset.)
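The subset bookkeeping in the cited passage can be sketched numerically. The category names and quota values below are hypothetical, not from Mahmood; the sketch only illustrates the d_t = q_t - q_(t-1) mapping quoted above:

```python
# Hypothetical sketch of the cited mapping: d_t = q_t - q_(t-1) per category.
# Categories with d_t <= 0 (the first subset in the mapping above) get no
# further samples; categories with d_t > 0 (the second subset) still do.
prev_quota = {"labeled": 500, "unlabeled": 300}   # q_(t-1), illustrative
curr_quota = {"labeled": 450, "unlabeled": 400}   # q_t, illustrative

d_t = {k: curr_quota[k] - prev_quota[k] for k in curr_quota}
first_subset = [k for k, v in d_t.items() if v <= 0]
second_subset = [k for k, v in d_t.items() if v > 0]
```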
[determining variables] of the regression curve equation for each of the second subset of the plurality of input data categories based at least in part on the regression curve equation of each of the second subset of the plurality of input data categories; (As all of the data categories are used to determine the regression curve equations, the variables from the regression curve equation are determined for each of the second subset of the categories. The regression model, see page 20, is v̂(q_1, ..., q_K; θ) := Σ_{k=1}^{K} v̂_k(q_k; θ_k). Therefore, θ_k are the variables determined.)
determining the one of the plurality of data collection quotas using a predetermined equation based at least in part on the [determined variables] (Page 20, Section D.2 shows the optimization problem, interpreted as the predetermined equation, to solve for q_t, which contains the data collection quotas and depends on the density estimation model F(·) found using the regression curve equations in v̂(q_1, ..., q_K; θ). Therefore, the quotas are found using a predetermined equation based at least in part on the determined variables.)
Mahmood does not appear to explicitly teach
determining an average steepness β̄_g
determining a constant factor α′ and a lower bound model loss γ′ of the one of the first subset of the plurality of input data categories based at least in part on the regression curve equation of the one of the first subset of the plurality of input data categories; and
[determining the one of the plurality of data collection quotas based at least in part on the] average steepness β̄_g, the constant factor α′ and the lower bound model loss γ′.
However, Hestness—directed to analogous art—teaches
determining an average steepness β̄_g (Page 6 states "Figure 1: Neural machine translation learning curves. Left: the learning curves for separate models follow ε(m) = α·m^(β_g) + γ." Figure 1 shows the learning curve plotted with minimum test loss on the y-axis and training data set size on the x-axis. Page 2 states "Here, ε is generalization error, m is the number of samples in the training set, α is a constant property of the problem, and β_g is the scaling exponent that defines the steepness of the learning curve—how quickly a model family can learn from adding more training samples.")
determining a constant factor α′ and a lower bound model loss γ′ of the one of the first subset of the plurality of input data categories based at least in part on the regression curve equation of the one of the first subset of the plurality of input data categories; and (Figure 1 shows the learning curve plotted with minimum test loss on the y-axis and training data set size on the x-axis. Page 2 states "Here, ε is generalization error, m is the number of samples in the training set, α is a constant property of the problem, and β_g is the scaling exponent that defines the steepness of the learning curve—how quickly a model family can learn from adding more training samples." As Figure 1 shows the learning curve for the minimum test loss, one of ordinary skill in the art would understand that ε is the model loss of the machine learning algorithm. Page 5 states "Further, prior work predicts that as a model runs out of capacity on larger data sets, the error should plateau, resulting in a power-law + constant, ε(m) = α·m^(β_g) + γ, where γ is the error when the model family has exhausted its capacity. Indeed, we find that learning curves for a single model family can be closely represented by a power-law + constant." As γ is the error when the model family has exhausted its capacity and the equation is solving for ε(m), the loss of a model ε as a function of input size, one of ordinary skill in the art would understand that the constant γ represents a lower bound on the model loss of the machine learning algorithm.)
[determining the one of the plurality of data collection quotas based at least in part on the] average steepness β̄_g, the constant factor α′ and the lower bound model loss γ′. (Page 6 states "Figure 1: Neural machine translation learning curves. Left: the learning curves for separate models follow ε(m) = α·m^(β_g) + γ." Figure 1 shows the learning curve plotted with minimum test loss on the y-axis and training data set size on the x-axis. Page 2 states "Here, ε is generalization error, m is the number of samples in the training set, α is a constant property of the problem, and β_g is the scaling exponent that defines the steepness of the learning curve—how quickly a model family can learn from adding more training samples." As Figure 1 shows the learning curve for the minimum test loss, one of ordinary skill in the art would understand that ε is the model loss of the machine learning algorithm. Page 5 states "Further, prior work predicts that as a model runs out of capacity on larger data sets, the error should plateau, resulting in a power-law + constant, ε(m) = α·m^(β_g) + γ, where γ is the error when the model family has exhausted its capacity. Indeed, we find that learning curves for a single model family can be closely represented by a power-law + constant." As γ is the error when the model family has exhausted its capacity and the equation is solving for ε(m), the loss of a model ε as a function of input size, one of ordinary skill in the art would understand that the constant γ represents a lower bound on the model loss of the machine learning algorithm.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Mahmood and Hestness for the reasons given above in regards to claim 3.
Claim(s) 11-13, 15-16, and 18-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mahmood (“Optimizing Data Collection for Machine Learning”, 2022), Jiang (US 2020/0342693 A1), and Hestness (“Deep Learning Scaling is Predictable, Empirically”, 2017).
Regarding claim 11, Mahmood teaches
A system for training a machine learning algorithm for a vehicle, the system comprising: (Page 3 states "Motivating Example. A startup is developing an object detector for use in autonomous vehicles within the next T = 5 years. Their model must achieve a mean Average Precision greater than V* = 95% on a pre-determined validation set or else they will lose an expected profit of P = $1,000,000. Collecting training data requires employing drivers to record videos and annotators to label the data, where the marginal cost of obtaining each image is approximately c = $1. In order to manage annual finances, the startup must plan how much data to collect at the beginning of each year.")
perform at least one exploratory training session of the machine learning algorithm using an input data set, wherein the input data set includes a first plurality of input data samples, and wherein the input data set is stored on the server storage device; (Page 6 states "Consider K ∈ ℕ data sources (e.g., K = 2 with labeled and unlabeled) and for each k ∈ {1, ..., K}, let z_k ~ p_k(z_k) be data drawn from their distribution. We train a model with a data set D := ∪_{k=1}^{K} D_k where each D_k contains points of the k-th source." The first training is interpreted as the exploratory training session, and D is interpreted as the input data set.)
divide the input data set into a plurality of input data categories; (As the dataset is split into D_k for each data source, the input data set is divided into a plurality of input data categories.)
generate an exploratory learning curve plot of [a performance metric] of the machine learning algorithm versus a quantity of input data samples trained for each of the plurality of input data categories in the at least one exploratory training session; (Page 24 states "To estimate F(q), we first create an ensemble of estimated learning curves, which we then invert to obtain an empirical distribution of estimated values for D_(q_0). Figure 5 plots our bootstrap resampled estimated learning curves versus the ground truth performance for the first round of data collection when we have access to an initial D_(q_0) containing 10% of the full data set." Fig. 5 shows the exploratory curve plots, which show a performance metric, such as accuracy, versus a quantity of input data samples for the estimated power law learning curves, interpreted as the exploratory learning curves.)
determine a regression curve equation for each of the plurality of input data categories based at least in part on the exploratory learning curve plot for each of the plurality of input data categories. (Page 19 states "In order to construct a PDF and CDF in the multi-variate setting, we follow the same general steps as in Algorithm 1. We first collect a data set of performance statistics R := {(q_r, V(D_(q_r)))}_{r=1}^{R} as before. We then use bootstrap resamples of this data set to fit parameters θ* to a regression model v̂(q_1, ..., q_K; θ) and then solve for q̂ … Finally, we fit a density estimation model over our data set of q̂." Page 15, Section A, states that V(D) is the valuation of the model trained on data set D. Therefore, the regression model is based on the exploratory training session. Page 20 states "We propose an easy-to-implement baseline regression model by adding the contributions of each data set being used. Then, our additive regression model is v̂(q_1, ..., q_K; θ) := Σ_{k=1}^{K} v̂_k(q_k; θ_k), where v̂_k(q_k; θ_k) can be any single-variate regression model for estimating score. For instance, consider K = 2 data types with power law regression models for each data type. Our multi-variate regression model becomes v̂(q_1, q_2; θ) = θ_{1,0}·q_1^(θ_{1,1}) + θ_{2,0}·q_2^(θ_{2,1}) + θ_3." Therefore, as there are regression models for each data source, each having its own parameters θ_k, a regression curve equation is determined for each of the categories based on the exploratory training session. As the regression curves are the estimated power law learning curves, the regression curves are determined based on the exploratory learning curve plots.)
collect a second plurality of input data samples … wherein the second plurality of input data samples is based at least in part on the regression curve equation for each of the plurality of input data categories; and (Page 15 states "Finally in the third step, we solve our optimization problem (4) via gradient descent. This problem yields the optimal data set sizes q_1*, ..., q_T* that we should have at the end of each round. Furthermore, if we are in the t-th round for t > 1, we freeze the values for q_1, ..., q_(t-1) to the data set sizes that we have observed in the previous rounds. Upon solving this problem, we then collect data until we have q_t samples, and then re-train our model to evaluate our current state." The multi-variate analogue of optimization problem (4), see page 6, involves the PDF and CDF fit using the regression models for the data categories. Therefore, a second plurality of input data is collected based on the regression curves for the input categories.)
train the machine learning algorithm using the second plurality of input data samples and the input data set. (Page 15 states "Upon solving this problem, we then collect data until we have q_t samples, and then re-train our model to evaluate our current state." Therefore, the collected data, interpreted as the second plurality of input data, is used to train the model, interpreted as the machine learning algorithm.)
Mahmood does not appear to explicitly teach
a server system including:
a server storage device;
a server communication system; and
a server controller in electrical communication with the server storage device and the server communication system, wherein the server controller is programmed to:
[generating an exploratory learning curve plot of] a model loss of [the machine learning algorithm versus a quantity of input data samples trained]
[collecting additional data] using the server communication system,
wherein the regression curve equation for one of the plurality of input data categories is a power law equation having a form:
ε(m) = α·m^(β_g) + γ
wherein ε is the model loss of the machine learning algorithm for the one of the plurality of input data categories, α is a first constant factor for the one of the plurality of input data categories, m is a quantity of input data samples trained for the one of the plurality of input data categories, β_g is a steepness of a regression curve described by the regression curve equation for the one of the plurality of input data categories, and γ is a lower bound model loss of the machine learning algorithm for the one of the plurality of input data categories;
However, Jiang—directed to analogous art—teaches
a server system including: ([0017] states "In a second aspect, a server is communicatively coupled to one or more autonomous driving vehicles (ADVs). The server includes a processor and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to receive driving data of one or more ADVs driven by a human driver in one or more driving categories, for one or more types of ADVs.”)
a server storage device; ([0017] states "In a second aspect, a server is communicatively coupled to one or more autonomous driving vehicles (ADVs). The server includes a processor and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to receive driving data of one or more ADVs driven by a human driver in one or more driving categories, for one or more types of ADVs.” The memory is interpreted as the server storage device.)
a server communication system; and ([0020] states "FIG. 1 is a block diagram illustrating an autonomous vehicle network configuration according to embodiment of the disclosure. Referring to FIG. 1, network configuration 100 includes autonomous vehicle 101 that may be communicatively coupled to one or more servers 103-104 over a network 102." The network is interpreted as the server communication system.)
a server controller in electrical communication with the server storage device and the server communication system, wherein the server controller is programmed to: ([0017] states "In a second aspect, a server is communicatively coupled to one or more autonomous driving vehicles (ADVs). The server includes a processor and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to receive driving data of one or more ADVs driven by a human driver in one or more driving categories, for one or more types of ADVs.” The server processor is interpreted as the server controller.)
[collecting additional data] using the server communication system ([0017] states "The server includes a processor and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to receive driving data of one or more ADVs driven by a human driver in one or more driving categories, for one or more types of ADVs. The instructions further cause the processor to select driving data for a specified type of ADV for one or more categories of driving." [0027] states "Referring back to FIG. 1, wireless communication system 112 is to allow communication between autonomous vehicle 101 and external systems, such as devices, sensors, other vehicles, etc. For example, wireless communication system 112 can wirelessly communicate with one or more devices directly or via a communication network, such as servers 103-104 over network 102." As the devices communicate using the network, when data is received/transmitted, the network is used. Hereinafter, this is considered to be the explanation for data transmission/receiving using the server communication system.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Mahmood and Jiang because, as Jiang states in [0003], "To operate an autonomous driving vehicle (ADV) in autonomous driving mode, a substantial amount of training data must be collected to train autonomous driving model(s) to safely operate the ADV in a driverless mode. Data is collected while a human driver is operating the ADV in manual (“human driver") mode. Data collection is crucial for generating, and thoroughly training, dynamic models and calibration tables for an autonomous driving vehicle. Collecting a data set that covers all necessary scenarios demands professional knowledge of machine learning and ADV control systems. A human driver does not know what data needs to be collected. Thus, an engineer typically accompanies the human driver to guide the human driver on the data that needs to be collected for training and / or calibrating an autonomous driving vehicle operation model. Such an approach is very inefficient."
The combination of Mahmood and Jiang does not appear to explicitly teach
wherein the regression curve equation for one of the plurality of input data categories is a power law equation having a form:
ε(m) = α·m^(β_g) + γ
wherein ε is the model loss of the machine learning algorithm for the one of the plurality of input data categories, α is a first constant factor for the one of the plurality of input data categories, m is a quantity of input data samples trained for the one of the plurality of input data categories, β_g is a steepness of a regression curve described by the regression curve equation for the one of the plurality of input data categories, and γ is a lower bound model loss of the machine learning algorithm for the one of the plurality of input data categories;
However, Hestness—directed to analogous art—teaches
[generating an exploratory learning curve plot of a] model loss of [the machine learning algorithm versus a quantity of input data samples trained] (Page 6 states "Figure 1: Neural machine translation learning curves. Left: the learning curves for separate models follow ε(m) = α·m^(β_g) + γ." Figure 1 shows the learning curve plotted with minimum test loss on the y-axis and training data set size on the x-axis.)
wherein the regression curve equation for one of the plurality of input data categories is a power law equation having a form:
ε(m) = α·m^(β_g) + γ
wherein ε is the model loss of the machine learning algorithm for the one of the plurality of input data categories, α is a first constant factor for the one of the plurality of input data categories, m is a quantity of input data samples trained for the one of the plurality of input data categories, β_g is a steepness of a regression curve described by the regression curve equation for the one of the plurality of input data categories, and γ is a lower bound model loss of the machine learning algorithm for the one of the plurality of input data categories; (Page 6 states "Figure 1: Neural machine translation learning curves. Left: the learning curves for separate models follow ε(m) = α·m^(β_g) + γ." Figure 1 shows the learning curve plotted with minimum test loss on the y-axis and training data set size on the x-axis. Page 2 states "Here, ε is generalization error, m is the number of samples in the training set, α is a constant property of the problem, and β_g is the scaling exponent that defines the steepness of the learning curve—how quickly a model family can learn from adding more training samples." As Figure 1 shows the learning curve for the minimum test loss, one of ordinary skill in the art would understand that ε is the model loss of the machine learning algorithm. Page 5 states "Further, prior work predicts that as a model runs out of capacity on larger data sets, the error should plateau, resulting in a power-law + constant, ε(m) = α·m^(β_g) + γ, where γ is the error when the model family has exhausted its capacity. Indeed, we find that learning curves for a single model family can be closely represented by a power-law + constant." As γ is the error when the model family has exhausted its capacity and the equation is solving for ε(m), the loss of a model ε as a function of input size, one of ordinary skill in the art would understand that the constant γ represents a lower bound on the model loss of the machine learning algorithm.)
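The qualitative behavior of the cited power-law + constant curve, in which the loss falls with training set size and plateaus at γ, can be checked with a quick numeric sketch; the α, β_g, and γ values below are invented for illustration and are not taken from Hestness:

```python
# Illustrative power-law + constant learning curve eps(m) = alpha*m**beta_g + gamma,
# with invented parameters. Because beta_g < 0, the loss decreases as the
# training set size m grows and approaches the lower bound gamma.
def eps(m, alpha=5.0, beta_g=-0.35, gamma=0.08):
    return alpha * m**beta_g + gamma

losses = [eps(m) for m in (10**2, 10**4, 10**6)]
```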
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Mahmood and Jiang with the teachings of Hestness because, as Hestness states on page 5, "Further, prior work predicts that as a model runs out of capacity on larger data sets, the error should plateau, resulting in a power-law + constant, ε(m) = α·m^(β_g) + γ, where γ is the error when the model family has exhausted its capacity. Indeed, we find that learning curves for a single model family can be closely represented by a power-law + constant."
Regarding claim 12, the rejection of claim 11 is incorporated herein. Mahmood teaches
wherein to collect the second plurality of input data samples (Page 15 states "Finally in the third step, we solve our optimization problem (4) via gradient descent. This problem yields the optimal data set sizes q_1*, ..., q_T* that we should have at the end of each round. Furthermore, if we are in the t-th round for t > 1, we freeze the values for q_1, ..., q_(t-1) to the data set sizes that we have observed in the previous rounds. Upon solving this problem, we then collect data until we have q_t samples, and then re-train our model to evaluate our current state." The multi-variate analogue of optimization problem (4), see page 6, involves the PDF and CDF fit using the regression models for the data categories. Therefore, a second plurality of input data is collected based on the regression curves for the input categories.)
identify a first subset of the plurality of input data categories based at least in part on a quantity of input data samples in each of the plurality of input data categories; (Page 20 states "For each t, let d_t = q_t - q_(t-1) be the additional data collected in each round." The vector d_t identifies a subset of categories based on the quantity of input data samples collected for each input data category, as when d_t ≤ 0, no more samples for that category are collected.)
determine a plurality of data collection quotas based at least in part on the regression curve equation for each of the plurality of input data categories, wherein each of the plurality of data collection quotas corresponds to one of the first subset of the plurality of input data categories; (Page 19 states "The multi-variate data collection problem considers multiple sources delivering different types of data required to train a model. Consider K data sets with q_1, ..., q_K points in each, respectively. Rather than collecting up to q_t data points in each round, we optimize a vector q_t ∈ R_+^K where each element q_t^k refers to how much data we need from the k-th source." Therefore, the vector q_t ∈ R_+^K contains a plurality of data collection quotas q_t^k for each of the input data categories (the sources). Page 20, Section D.2 shows the optimization problem to solve for q_t, which depends on the density estimation model F(·) found using the regression curve equations in v̂(q_1, ..., q_K; θ).)
determine a quantity of additional input data samples to collect for each of the first subset of the plurality of input data categories using an equation: (Page 20 states "For each t, let d_t = q_t - q_(t-1) be the additional data collected in each round." The vector d_t contains the quantity of additional input data samples to collect for each of the categories.)
s = m_(i+1) - m_i
wherein s is the quantity of additional input data samples to collect for the one of the first subset of the plurality of input data categories, m_(i+1) is the one of the plurality of data collection quotas for the one of the first subset of the plurality of input data categories, and m_i is the quantity of input data samples in the one of the first subset of the plurality of input data categories; and (Page 20 states "For each t, let d_t = q_t - q_(t-1) be the additional data collected in each round." The vector d_t contains the quantity of additional input data samples to collect for each of the categories. As t is the time/round and q contains the quotas, the equation in the next round will be d_(t+1) = q_(t+1) - q_t. In the next round, the quota from the previous round q_t will contain the quantity of input data samples in each category.)
Mahmood does not appear to explicitly teach
[collect data] from the vehicle using the server communication system, the server controller is further programmed to:
transmit a data sample collection task to the vehicle using the server communication system, wherein the data sample collection task includes at least the quantity of additional input data samples to collect for each of the first subset of the plurality of input data categories.
However, Jiang—directed to analogous art—teaches
[collect data] from the vehicle using the server communication system, the server controller is further programmed to:
transmit a data sample collection task to the vehicle using the server communication system, wherein the data sample collection task includes at least the quantity of additional input data samples to collect for each of the first subset of the plurality of input data categories. ([0032] states "Based on driving statistics 123, machine learning engine 122 generates or trains a set of rules, algorithms, and/or predictive models 124 for a variety of purposes. In one embodiment, algorithms 124 may include training of a specified model of ADV using human driving data that is pre-categorized for machine learning by server 103 machine learning engine 122. A plurality of autonomous driving vehicles (ADV) 101 can receive instructions to a human driver of the ADV to collect a specified amount of driving data in accordance with one or more pre-defined driving categories." The specified amount of driving data is interpreted as the quantity of additional input data samples. As the data is received by the ADV, the server must have transmitted the data sample collection task.)
Regarding claim 13, the rejection of claim 12 is incorporated herein. Mahmood teaches
wherein to determine one of the plurality of data collection quotas corresponding to one of the first subset of the plurality of input data categories, the server controller is further programmed to: (Page 19 states "The multi-variate data collection problem considers multiple sources delivering different types of data required to train a model. Consider K data sets with q_1, ..., q_K points in each, respectively. Rather than collecting up to q_t data points in each round, we optimize a vector q_t ∈ R_+^K where each element q_t^k refers to how much data we need from the k-th source." Therefore, the vector q_t ∈ R_+^K contains a plurality of data collection quotas q_t^k for each of the input data categories (the sources). Page 20, Section D.2 shows the optimization problem to solve for q_t, which depends on the density estimation model F(·) found using the regression curve equations in v̂(q_1, ..., q_K; θ).)
identify a second subset of the plurality of input data categories, wherein the second subset of the plurality of input data categories includes each of the plurality of input data categories not in the first subset of the plurality of input data categories; (Page 20 states "For each t, let d_t = q_t - q_(t-1) be the additional data collected in each round." The vector d_t identifies a subset of categories based on the quantity of input data samples collected for each input data category, as when d_t > 0, the data is collected for the category. As the first subset was the categories where d_t ≤ 0, the categories where d_t > 0 form the second subset.)
[determining variables] of the regression curve equation for each of the second subset of the plurality of input data categories based at least in part on the regression curve equation of each of the second subset of the plurality of input data categories; (As all of the data categories are used to determine the regression curve equations, the variables from the regression curve equation are determined for each of the second subset of the categories. The regression model, see page 20, is v̂(q_1, ..., q_K; θ) := Σ_{k=1}^{K} v̂_k(q_k; θ_k). Therefore, θ_k are the variables determined.)
determining the one of the plurality of data collection quotas using a predetermined equation based at least in part on the [determined variables] (Page 20, Section D.2 shows the optimization problem, interpreted as the predetermined equation, to solve for q_t, which contains the data collection quotas and depends on the density estimation model F(·) found using the regression curve equations in v̂(q_1, ..., q_K; θ). Therefore, the quotas are found using a predetermined equation based at least in part on the determined variables.)
The combination of Mahmood and Jiang does not appear to explicitly teach
determine an average steepness β̄_g of the regression curve equation for each of the second subset of the plurality of input data categories based at least in part on the regression curve equation of each of the second subset of the plurality of input data categories;
determine a constant factor α′ and a lower bound model loss γ′ of the one of the first subset of the plurality of input data categories based at least in part on the regression curve equation of the one of the first subset of the plurality of input data categories; and
[determine the one of the plurality of data collection quotas based at least in part on the] average steepness β̄_g, the constant factor α′ and the lower bound model loss γ′.
However, Hestness—directed to analogous art—teaches
determine an average steepness β̄_g (Page 6 states "Figure 1: Neural machine translation learning curves. Left: the learning curves for separate models follow ε(m) = α·m^(β_g) + γ." Figure 1 shows the learning curve plotted with minimum test loss on the y-axis and training data set size on the x-axis. Page 2 states "Here, ε is generalization error, m is the number of samples in the training set, α is a constant property of the problem, and β_g is the scaling exponent that defines the steepness of the learning curve—how quickly a model family can learn from adding more training samples.")
determine a constant factor α′ and a lower bound model loss γ′ of the one of the first subset of the plurality of input data categories based at least in part on the regression curve equation of the one of the first subset of the plurality of input data categories; and (Figure 1 shows the learning curve plotted with minimum test loss on the y-axis and training data set size on the x-axis. Page 2 states "Here, ε is generalization error, m is the number of samples in the training set, α is a constant property of the problem, and β_g is the scaling exponent that defines the steepness of the learning curve—how quickly a model family can learn from adding more training samples." As Figure 1 shows the learning curve for the minimum test loss, one of ordinary skill in the art would understand that ε is the model loss of the machine learning algorithm. Page 5 states "Further, prior work predicts that as a model runs out of capacity on larger data sets, the error should plateau, resulting in a power-law + constant, ε(m) = α·m^(β_g) + γ, where γ is the error when the model family has exhausted its capacity. Indeed, we find that learning curves for a single model family can be closely represented by a power-law + constant." As γ is the error when the model family has exhausted its capacity and the equation is solving for ε(m), the loss of a model ε as a function of input size, one of ordinary skill in the art would understand that the constant γ represents a lower bound on the model loss of the machine learning algorithm.)
average steepness β̄_g, the constant factor α' and the lower bound model loss γ'. (Page 6 states "Figure 1: Neural machine translation learning curves. Left: the learning curves for separate models follow ε(m) = αm^(β_g) + γ." Figure 1 shows the learning curve plotted with minimum test loss on the y-axis and training data set size on the x-axis. Page 2 states "Here, ε is generalization error, m is the number of samples in the training set, α is a constant property of the problem, and β_g is the scaling exponent that defines the steepness of the learning curve—how quickly a model family can learn from adding more training samples." As Figure 1 shows the learning curve for the minimum test loss, one of ordinary skill in the art would understand that ε is the model loss of the machine learning algorithm. Page 5 states "Further, prior work predicts that as a model runs out of capacity on larger data sets, the error should plateau, resulting in a power-law + constant, ε(m) = αm^(β_g) + γ, where γ is the error when the model family has exhausted its capacity. Indeed, we find that learning curves for a single model family can be closely represented by a power-law + constant." As γ is the error when the model family has exhausted its capacity and the equation is solving for ε(m), the loss of a model ε as a function of input size, one of ordinary skill in the art would understand that the constant γ represents a lower bound on the model loss of the machine learning algorithm.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Mahmood and Jiang with the teachings of Hestness for the reasons given above with regard to claim 11.
Regarding claim 15, the rejection of claim 12 is incorporated herein. Mahmood does not appear to explicitly teach
wherein to transmit the data sample collection task to the vehicle using the server communication system, the server controller is further programmed to:
transmit the data sample collection task to the vehicle using the server communication system, wherein the data sample collection task includes a validation algorithm describing one of the plurality of input data categories and at least one of: a task priority, a projected decrease in model loss, and the quantity of additional input data samples to collect.
However, Jiang—directed to analogous art—teaches
wherein to transmit the data sample collection task to the vehicle using the server communication system, the server controller is further programmed to: ([0032] states "Based on driving statistics 123, machine learning engine 122 generates or trains a set of rules, algorithms, and/or predictive models 124 for a variety of purposes. In one embodiment, algorithms 124 may include training of a specified model of ADV using human driving data that is pre-categorized for machine learning by server 103 machine learning engine 122. A plurality of autonomous driving vehicles (ADV) 101 can receive instructions to a human driver of the ADV to collect a specified amount of driving data in accordance with one or more pre-defined driving categories." The specified amount of driving data is interpreted as the quantity of additional input data samples. As the data is received by the ADV, the server must have transmitted the data sample collection task.)
transmit the data sample collection task to the vehicle using the server communication system, wherein the data sample collection task includes a validation algorithm describing one of the plurality of input data categories and at least one of: a task priority, a projected decrease in model loss, and the quantity of additional input data samples to collect. ([0032] states "Based on driving statistics 123, machine learning engine 122 generates or trains a set of rules, algorithms, and/or predictive models 124 for a variety of purposes. In one embodiment, algorithms 124 may include training of a specified model of ADV using human driving data that is pre-categorized for machine learning by server 103 machine learning engine 122. A plurality of autonomous driving vehicles (ADV) 101 can receive instructions to a human driver of the ADV to collect a specified amount of driving data in accordance with one or more pre-defined driving categories." The specified amount of driving data is interpreted as the quantity of additional input data samples. As the data is received by the ADV, the server must have transmitted the data sample collection task. [0067] states "In operation 505, instructions are received to collect driving data for one or more driving categories, e.g. driving categories 440. The instructions include an amount of data to collect for each driving category. As described above, the ADV can receive the instructions as to which driving the ADV in accordance with the specified driving interface (e.g. UI 400) to highlight the driving categories that the driver is to collect data for." The instructions are interpreted as the validation algorithm. As they include the amount of data per driving category, they include a description of an input data category and the quantity of additional input samples to collect.
When a category is transmitted to the vehicle, it indicates that data collection is a priority for that category. This is interpreted as the task priority.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Mahmood and Jiang for the reasons given above with regard to claim 11.
Regarding claim 16, the rejection of claim 15 is incorporated herein. Mahmood does not appear to explicitly teach
a vehicle system including:
at least one vehicle sensor;
a vehicle communication system; and
a vehicle controller in electrical communication with the at least one vehicle sensor and the vehicle communication system, wherein the vehicle controller is programmed to:
receive the data sample collection task from the server system using the vehicle communication system;
determine a priority of the data sample collection task based at least in part on at least one of the task priority, the projected decrease in model loss, and the quantity of additional input data samples to collect; and
perform the data sample collection task using the at least one vehicle sensor.
However, Jiang—directed to analogous art—teaches
a vehicle system including: ([0022] states "In one embodiment, autonomous vehicle 101 includes, but is not limited to, perception and planning system 110, vehicle control system 111, wireless communication system 112, user interface system 113, and sensor system 115. Autonomous vehicle 101 may further include certain common components included in ordinary vehicles, such as, an engine, wheels, steering wheel, transmission, etc., which may be controlled by vehicle control system 111 and/or perception and planning system 110 using a variety of communication signals and/or commands, such as, for example, throttle signal or commands, steering signals or commands, braking signals or commands, etc.")
at least one vehicle sensor; ([0025] states "Sensor system 115 may further include other sensors, such as, a sonar sensor, an infrared sensor, a steering sensor, a throttle sensor, a braking sensor, and an audio sensor (e.g., microphone).")
a vehicle communication system; and ([0027] states "Referring back to FIG. 1, wireless communication system 112 is to allow communication between autonomous vehicle 101 and external systems, such as devices, sensors, other vehicles, etc. For example, wireless communication system 112 can wirelessly communicate with one or more devices directly or via a communication network, such as servers 103-104 over network 102. Wireless communication system 112 can use any cellular communication network or a wireless local area network (WLAN), e.g., using WiFi to communicate with another component or system. Wireless communication system 112 could communicate directly with a device (e.g., a mobile device of a passenger, a display device, a speaker within vehicle 101), for example, using an infrared link, Bluetooth, etc.")
a vehicle controller in electrical communication with the at least one vehicle sensor and the vehicle communication system, wherein the vehicle controller is programmed to: ([0028] states "Some or all of the functions of autonomous vehicle 101 may be controlled or managed by perception and planning system 110, especially when operating in an autonomous driving mode. Perception and planning system 110 includes the necessary hardware (e.g., processor(s), memory, storage) and software (e.g., operating system, planning and routing programs) to receive information from sensor system 115, control system 111, wireless communication system 112, and/or user interface system 113, process the received information, plan a route or path from a starting point to a destination point, and then drive vehicle 101 based on the planning and control information. Alternatively, perception and planning system 110 may be integrated with vehicle control system 111.")
receive the data sample collection task from the server system using the vehicle communication system; ([0067] states "In operation 505, instructions are received to collect driving data for one or more driving categories, e.g. driving categories 440. The instructions include an amount of data to collect for each driving category. As described above, the ADV can receive the instructions as to which driving the ADV in accordance with the specified driving interface (e.g. UI 400) to highlight the driving categories that the driver is to collect data for.")
determine a priority of the data sample collection task based at least in part on at least one of the task priority, the projected decrease in model loss, and the quantity of additional input data samples to collect; and ([0067] states "In operation 505, instructions are received to collect driving data for one or more driving categories, e.g. driving categories 440. The instructions include an amount of data to collect for each driving category. As described above, the ADV can receive the instructions as to which driving the ADV in accordance with the specified driving interface (e.g. UI 400) to highlight the driving categories that the driver is to collect data for." As the categories are highlighted, those categories are prioritized. When a category is transmitted to the vehicle, it indicates that data collection is a priority for that category. This is interpreted as the task priority. Therefore, as the highlighted categories depend on the category transmitted, the priority is determined based on the task priority. [0069] states "In operation 515, in response to determining that human driving mode is active for the ADV, and that the human driver is driving the ADV in accordance with one of the driving categories for which the human driver is instructed to collect driving data: the user interface presents an indication to the human driver of the driving category that matches a current driving state of the ADV 101, logging of ADV human driving data for the driving category is performed, and a progress indicator is updated as to how much of the required amount of driving data to be collected has been collected." Therefore, as the progress indicator is displayed based on the quantity of data to collect, the priority is based on the quantity of data.)
perform the data sample collection task using the at least one vehicle sensor. ([0069] states "As described above, with reference to FIG. 3, driving data is obtained by driving monitor 308 as received messages from a plurality of sensors and modules within perception and planning system 110." [0072] states "In operation 605, one or more ADVs collect a default or predefined amount of driving data for each of all driving categories 440 on the ADV 101 in human driving mode. In operation 610, the one or more ADVs each transmit their collected driving data for each of the driving categories to the server.")
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Mahmood and Jiang for the reasons given above with regard to claim 11.
Regarding claim 18, Mahmood teaches
A method for training a machine learning algorithm for a vehicle, the method comprising: (Page 3 states "Motivating Example. A startup is developing an object detector for use in autonomous vehicles within the next T = 5 years. Their model must achieve a mean Average Precision greater than V* = 95% on a pre-determined validation set or else they will lose an expected profit of P = $1,000,000. Collecting training data requires employing drivers to record videos and annotators to label the data, where the marginal cost of obtaining each image is approximately c = $1. In order to manage annual finances, the startup must plan how much data to collect at the beginning of each year.")
performing at least one exploratory training session of the machine learning algorithm using an input data set, wherein the input data set includes a first plurality of input data samples, and wherein the input data set is stored on a server storage device; (Page 6 states "Consider K ∈ ℕ data sources (e.g., K = 2 with labeled and unlabeled) and for each k ∈ {1, ..., K}, let z_k ~ p_k(z_k) be data drawn from their distribution. We train a model with a data set D := ∪_{k=1}^{K} D_k, where each D_k contains points of the k-th source." The first training is interpreted as the exploratory training session, and D is interpreted as the input data set.)
dividing the input data set into a plurality of input data categories; (As the dataset is split into a D_k for each data source, the input data set is divided into a plurality of input data categories.)
generating an exploratory learning curve plot of a model loss of the machine learning algorithm versus a quantity of input data samples trained for each of the plurality of input data categories in the at least one exploratory training session; and (Page 24 states "To estimate F(q), we first create an ensemble of estimated learning curves, which we then invert to obtain an empirical distribution of estimated values for D_{q_0}. Figure 5 plots our bootstrap resampled estimated learning curves versus the ground truth performance for the first round of data collection when we have access to an initial D_{q_0} containing 10% of the full data set." Fig. 5 shows the exploratory curve plots, which show a performance metric, such as accuracy, versus a quantity of input data samples for the estimated power law learning curves, interpreted as the exploratory learning curves.)
determining a regression curve equation for each of the plurality of input data categories based at least in part on the exploratory learning curve plot for each of the plurality of input data categories, (Page 19 states "In order to construct a PDF and CDF in the multi-variate setting, we follow the same general steps as in Algorithm 1. We first collect a data set of performance statistics R := {(q_r, V(D_{q_r}), ...)}_{r=1}^{R} as before. We then use bootstrap resamples of this data set to fit parameters θ* to a regression model v̂(q_1, ..., q_K; θ) and then solve for q̂ … Finally, we fit a density estimation model over our data set of q̂." Page 15, Section A, states that V(D) is the valuation of a model trained on data set D. Therefore, the regression model is based on the exploratory training session. Page 20 states "We propose an easy-to-implement baseline regression model by adding the contributions of each data set being used. Then, our additive regression model is v̂(q_1, ..., q_K; θ) := Σ_{k=1}^{K} v̂_k(q_k; θ_k), where v̂_k(q_k; θ_k) can be any single-variate regression model for estimating score. For instance, consider K = 2 data types with power law regression models for each data type. Our multi-variate regression model becomes v̂(q_1, q_2; θ) = θ_{1,0} q_1^{θ_{1,1}} + θ_{2,0} q_2^{θ_{2,1}} + θ_3." Therefore, as there are regression models for each data source, each having its own parameters θ_k, a regression curve equation is determined for each of the categories based on the exploratory training session. As the regression curves are the estimated power law learning curves, the regression curves are determined based on the exploratory learning curve plots.)
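For illustration, the additive multi-variate regression model quoted above (a per-category power law term plus a shared constant) can be sketched as follows. This is a minimal sketch; the function name and parameter values are hypothetical and are not taken from Mahmood.

```python
import numpy as np

def additive_power_law(q, per_cat_theta, theta_const):
    """Additive multi-variate regression model (baseline form quoted above):
    v_hat(q_1, ..., q_K; theta) = sum_k theta_{k,0} * q_k**theta_{k,1} + theta_const.
    `per_cat_theta` is a (K, 2) array of per-category parameters."""
    q = np.asarray(q, dtype=float)
    coeffs = per_cat_theta[:, 0]     # theta_{k,0}: per-category constant factors
    exponents = per_cat_theta[:, 1]  # theta_{k,1}: per-category scaling exponents
    return float(np.sum(coeffs * q**exponents) + theta_const)

# Two data categories (K = 2), each contributing its own power-law term;
# the parameter values below are illustrative only.
theta = np.array([[0.4, 0.30],
                  [0.2, 0.25]])
score = additive_power_law([1000.0, 500.0], theta, theta_const=0.1)
print(f"estimated score: {score:.4f}")
```

Each category's data set size q_k enters only through its own power-law term, which is why fitting this model yields a separate regression curve equation per category.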
collect a second plurality of input data samples … wherein the second plurality of input data samples is based at least in part on the regression curve equation for each of the plurality of input data categories; and (Page 15 states "Finally in the third step, we solve our optimization problem (4) via gradient descent. This problem yields the optimal data set sizes q_1*, ..., q_T* that we should have at the end of each round. Furthermore, if we are in the t-th round for t > 1, we freeze the values for q_1, ..., q_{t-1} to the data set sizes that we have observed in the previous rounds. Upon solving this problem, we then collect data until we have q_t samples, and then re-train our model to evaluate our current state." The multi-variate analogue of optimization problem (4), see page 6, involves the PDF and CDF fit using the regression models for the data categories. Therefore, a second plurality of input data samples is collected based on the regression curves for the input categories.)
train the machine learning algorithm using the second plurality of input data samples … and the input data set. (Page 15 states "Upon solving this problem, we then collect data until we have q_t samples, and then re-train our model to evaluate our current state." Therefore, the collected data, interpreted as the second plurality of input data samples, is used to train the model, interpreted as the machine learning algorithm.)
Mahmood does not appear to explicitly teach
[generating an exploratory learning curve plot of a] model loss of [the machine learning algorithm versus a quantity of input data samples trained]
wherein the regression curve equation for one of the plurality of input data categories is a power law equation having a form:
ε(m) = αm^(β_g) + γ
wherein ε is the model loss of the machine learning algorithm for the one of the plurality of input data categories, α is a first constant factor for the one of the plurality of input data categories, m is a quantity of input data samples trained for the one of the plurality of input data categories, β_g is a steepness of a regression curve described by the regression curve equation for the one of the plurality of input data categories, and γ is a lower bound model loss of the machine learning algorithm for the one of the plurality of input data categories;
transmitting a data sample collection task to a vehicle communication system of the vehicle using a server communication system;
receiving the data sample collection task using a vehicle communication system;
[collecting a second plurality of input data samples] using at least one vehicle sensor,
transmitting the second plurality of input data samples from the vehicle communication system to the server communication system; and
[input data samples] received from the vehicle communication system
However, Jiang—directed to analogous art—teaches
transmitting a data sample collection task to a vehicle communication system of the vehicle using a server communication system; (Page 15 states "Finally in the third step, we solve our optimization problem (4) via gradient descent. This problem yields the optimal data set sizes q_1*, ..., q_T* that we should have at the end of each round. Furthermore, if we are in the t-th round for t > 1, we freeze the values for q_1, ..., q_{t-1} to the data set sizes that we have observed in the previous rounds. Upon solving this problem, we then collect data until we have q_t samples, and then re-train our model to evaluate our current state." The multi-variate analogue of optimization problem (4), see page 6, involves the PDF and CDF fit using the regression models for the data categories. Therefore, a second plurality of input data is collected based on the regression curves for the input categories.)
transmit a data sample collection task to the vehicle using the server communication system, wherein the data sample collection task includes at least the quantity of additional input data samples to collect for each of the first subset of the plurality of input data categories. ([0032] states "Based on driving statistics 123, machine learning engine 122 generates or trains a set of rules, algorithms, and/or predictive models 124 for a variety of purposes. In one embodiment, algorithms 124 may include training of a specified model of ADV using human driving data that is pre-categorized for machine learning by server 103 machine learning engine 122. A plurality of autonomous driving vehicles (ADV) 101 can receive instructions to a human driver of the ADV to collect a specified amount of driving data in accordance with one or more pre-defined driving categories." The specified amount of driving data is interpreted as the quantity of additional input data samples. As the data is received by the ADV, the server must have transmitted the data sample collection task.)
receiving the data sample collection task using a vehicle communication system; ([0067] states "In operation 505, instructions are received to collect driving data for one or more driving categories, e.g. driving categories 440. The instructions include an amount of data to collect for each driving category. As described above, the ADV can receive the instructions as to which driving the ADV in accordance with the specified driving interface (e.g. UI 400) to highlight the driving categories that the driver is to collect data for.")
[collecting a second plurality of input data samples] using at least one vehicle sensor, ([0017] states "The server includes a processor and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to receive driving data of one or more ADVs driven by a human driver in one or more driving categories, for one or more types of ADVs. The instructions further cause the processor to select driving data for a specified type of ADV for one or more categories of driving." [0027] states "Referring back to FIG. 1, wireless communication system 112 is to allow communication between autonomous vehicle 101 and external systems, such as devices, sensors, other vehicles, etc. For example, wireless communication system 112 can wirelessly communicate with one or more devices directly or via a communication network, such as servers 103-104 over network 102." As the devices communicate using the network, when data is received/transmitted, the network is used. Hereinafter, this is considered to be the explanation for data transmission/receiving using the server communication system.)
transmitting the second plurality of input data samples from the vehicle communication system to the server communication system; and ([0072] states "In operation 605, one or more ADVs collect a default or predefined amount of driving data for each of all driving categories 440 on the ADV 101 in human driving mode. In operation 610, the one or more ADVs each transmit their collected driving data for each of the driving categories to the server." As the vehicles transmit their message, they use their vehicle communication system.)
[input data samples] received from the vehicle communication system ([0072] states "In operation 615, the server performs training, described below with reference to FIG. 7, on a dynamic model for the ADV to a grading threshold, using the collected data received from the plurality of ADVs. In operation 620, it is determined whether the server has enough driving data to achieve a threshold grading value for a dynamic self-driving model for driving the ADV in self driving mode to greater than minimum threshold grading value, e.g. 65 %. If not, then method 600 continues at operation 625, otherwise method 600 continues at operation 630.")
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Mahmood and Jiang because, as Jiang states in [0003], "To operate an autonomous driving vehicle (ADV) in autonomous driving mode, a substantial amount of training data must be collected to train autonomous driving model(s) to safely operate the ADV in a driverless mode. Data is collected while a human driver is operating the ADV in manual ("human driver") mode. Data collection is crucial for generating, and thoroughly training, dynamic models and calibration tables for an autonomous driving vehicle. Collecting a data set that covers all necessary scenarios demands professional knowledge of machine learning and ADV control systems. A human driver does not know what data needs to be collected. Thus, an engineer typically accompanies the human driver to guide the human driver on the data that needs to be collected for training and/or calibrating an autonomous driving vehicle operation model. Such an approach is very inefficient."
The combination of Mahmood and Jiang does not appear to explicitly teach
[generating an exploratory learning curve plot of a] model loss of [the machine learning algorithm versus a quantity of input data samples trained]
wherein the regression curve equation for one of the plurality of input data categories is a power law equation having a form:
ε(m) = αm^(β_g) + γ
wherein ε is the model loss of the machine learning algorithm for the one of the plurality of input data categories, α is a first constant factor for the one of the plurality of input data categories, m is a quantity of input data samples trained for the one of the plurality of input data categories, β_g is a steepness of a regression curve described by the regression curve equation for the one of the plurality of input data categories, and γ is a lower bound model loss of the machine learning algorithm for the one of the plurality of input data categories;
However, Hestness—directed to analogous art—teaches
[generating an exploratory learning curve plot of a] model loss of [the machine learning algorithm versus a quantity of input data samples trained] (Page 6 states "Figure 1: Neural machine translation learning curves. Left: the learning curves for separate models follow ε(m) = αm^(β_g) + γ." Figure 1 shows the learning curve plotted with minimum test loss on the y-axis and training data set size on the x-axis.)
wherein the regression curve equation for one of the plurality of input data categories is a power law equation having a form:
ε(m) = αm^(β_g) + γ
wherein ε is the model loss of the machine learning algorithm for the one of the plurality of input data categories, α is a first constant factor for the one of the plurality of input data categories, m is a quantity of input data samples trained for the one of the plurality of input data categories, β_g is a steepness of a regression curve described by the regression curve equation for the one of the plurality of input data categories, and γ is a lower bound model loss of the machine learning algorithm for the one of the plurality of input data categories; (Page 6 states "Figure 1: Neural machine translation learning curves. Left: the learning curves for separate models follow ε(m) = αm^(β_g) + γ." Figure 1 shows the learning curve plotted with minimum test loss on the y-axis and training data set size on the x-axis. Page 2 states "Here, ε is generalization error, m is the number of samples in the training set, α is a constant property of the problem, and β_g is the scaling exponent that defines the steepness of the learning curve—how quickly a model family can learn from adding more training samples." As Figure 1 shows the learning curve for the minimum test loss, one of ordinary skill in the art would understand that ε is the model loss of the machine learning algorithm. Page 5 states "Further, prior work predicts that as a model runs out of capacity on larger data sets, the error should plateau, resulting in a power-law + constant, ε(m) = αm^(β_g) + γ, where γ is the error when the model family has exhausted its capacity. Indeed, we find that learning curves for a single model family can be closely represented by a power-law + constant." As γ is the error when the model family has exhausted its capacity and the equation is solving for ε(m), the loss of a model ε as a function of input size, one of ordinary skill in the art would understand that the constant γ represents a lower bound on the model loss of the machine learning algorithm.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Mahmood and Jiang with the teachings of Hestness because, as Hestness states on page 5, "Further, prior work predicts that as a model runs out of capacity on larger data sets, the error should plateau, resulting in a power-law + constant, ε(m) = αm^(β_g) + γ, where γ is the error when the model family has exhausted its capacity. Indeed, we find that learning curves for a single model family can be closely represented by a power-law + constant."
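For context, the power-law-plus-constant learning curve discussed above can be recovered empirically by curve fitting. The sketch below fits ε(m) = αm^(β_g) + γ to synthetic learning-curve data; the data, seed, and parameter values are hypothetical illustrations and are not drawn from Mahmood, Jiang, or Hestness.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law_loss(m, alpha, beta_g, gamma):
    """Power-law-plus-constant learning curve: eps(m) = alpha * m**beta_g + gamma."""
    return alpha * m**beta_g + gamma

# Synthetic learning-curve data: loss falls with training set size m and
# plateaus at the lower-bound loss gamma (illustrative values only).
rng = np.random.default_rng(0)
m = np.logspace(2, 6, 20)                       # training set sizes
true_alpha, true_beta_g, true_gamma = 5.0, -0.35, 0.08
eps = power_law_loss(m, true_alpha, true_beta_g, true_gamma)
eps_noisy = eps * (1 + 0.01 * rng.standard_normal(m.size))  # 1% noise

# Fit the regression curve equation to the observed (m, eps) pairs.
(alpha_hat, beta_hat, gamma_hat), _ = curve_fit(
    power_law_loss, m, eps_noisy, p0=(1.0, -0.5, 0.0), maxfev=10000)

print(f"alpha={alpha_hat:.3f}, beta_g={beta_hat:.3f}, gamma={gamma_hat:.4f}")
```

The fitted γ̂ is the estimated loss plateau, i.e., the lower bound model loss, while the fitted exponent gives the steepness of the curve.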
Regarding claim 19, the rejection of claim 18 is incorporated herein. Mahmood teaches
identifying a first subset of the plurality of input data categories based at least in part on a quantity of input data samples in each of the plurality of input data categories; (Page 20 states "For each t, let d_t = q_t - q_{t-1} be the additional data collected in each round." The vector d_t identifies a subset of categories based on the quantity of input data samples collected for each input data category, as when d_t ≤ 0, no more samples for that category are collected.)
determining a plurality of data collection quotas based at least in part on the regression curve equation for each of the plurality of input data categories, wherein each of the plurality of data collection quotas corresponds to one of the first subset of the plurality of input data categories; (Page 19 states "The multi-variate data collection problem considers multiple sources delivering different types of data required to train a model. Consider K data sets with q_1, ..., q_K points in each, respectively. Rather than collecting up to q_t data points in each round, we optimize a vector q_t ∈ ℝ_+^K where each element q_t^k refers to how much data we need from the k-th source." Therefore, the vector q_t ∈ ℝ_+^K contains a plurality of data collection quotas q_t^k, one for each of the input data categories (the sources). Page 20, Section D.2, shows the optimization problem to solve for q_t, which depends on the density estimation model F(·) found using the regression curve equations in v̂(q_1, ..., q_K; θ).)
determining a quantity of additional input data samples to collect for each of the first subset of the plurality of input data categories using an equation: (Page 20 states "For each t, let d_t = q_t - q_{t-1} be the additional data collected in each round." The vector d_t contains the quantity of additional input data samples to collect for each of the categories.)
s = m_(i+1) - m_i
wherein s is the quantity of additional input data samples to collect for the one of the first subset of the plurality of input data categories, m_(i+1) is the one of the plurality of data collection quotas for the one of the first subset of the plurality of input data categories, and m_i is the quantity of input data samples in the one of the first subset of the plurality of input data categories; and (Page 20 states "For each t, let d_t = q_t - q_(t-1) be the additional data collected in each round." The vector d_t contains the quantity of additional input data samples to collect for each of the categories. As t is the time/round and q contains the quotas, the equation in the next round will be d_(t+1) = q_(t+1) - q_t. In the next round, the quota from the previous round, q_t, will contain the quantity of input data samples in each category.)
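As a sketch only (the per-category counts below are hypothetical and appear in neither Mahmood nor the claims), the element-wise bookkeeping that the rejection maps between Mahmood's d_t = q_t - q_(t-1) and the claimed s = m_(i+1) - m_i can be written as:

```python
# Hypothetical per-category sample counts for K = 3 input data categories.
q_prev = [100, 250, 40]   # samples already collected (the claimed m_i)
q_next = [150, 250, 90]   # next-round quotas (the claimed m_(i+1))

# Mahmood's d_t = q_t - q_(t-1), computed element-wise; each entry is the
# claimed s = m_(i+1) - m_i for one category.
d = [nxt - prev for nxt, prev in zip(q_next, q_prev)]

# Categories whose entry of d is <= 0 need no further collection this round.
needs_more = [k for k, delta in enumerate(d) if delta > 0]
```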
Mahmood does not appear to explicitly teach
wherein collecting the second plurality of input data samples from the vehicle using the server communication system further comprises:
transmit a data sample collection task to the vehicle using the server communication system, wherein the data sample collection task includes at least the quantity of additional input data samples to collect for each of the first subset of the plurality of input data categories.
However, Jiang, directed to analogous art, teaches
wherein collecting the second plurality of input data samples from the vehicle using the server communication system further comprises: ([0072] states "In operation 615, the server performs training, described below with reference to FIG. 7, on a dynamic model for the ADV to a grading threshold, using the collected data received from the plurality of ADVs.")
transmit a data sample collection task to the vehicle using the server communication system, wherein the data sample collection task includes at least the quantity of additional input data samples to collect for each of the first subset of the plurality of input data categories. ([0032] states "Based on driving statistics 123, machine learning engine 122 generates or trains a set of rules, algorithms, and/or predictive models 124 for a variety of purposes. In one embodiment, algorithms 124 may include training of a specified model of ADV using human driving data that is pre-categorized for machine learning by server 103 machine learning engine 122. A plurality of autonomous driving vehicles (ADV) 101 can receive instructions to a human driver of the ADV to collect a specified amount of driving data in accordance with one or more pre-defined driving categories." The specified amount of driving data is interpreted as the quantity of additional input data samples. As the data is received by the ADV, the server must have transmitted the data sample collection task.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Mahmood and Jiang for the reasons given above in regards to claim 18.
Claim(s) 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mahmood (“Optimizing Data Collection for Machine Learning”, 2022), Jiang (US 2020/0342693 A1), and Hestness (“Deep Learning Scaling is Predictable, Empirically”, 2017) as applied to claim 11 above, and further in view of Petousis (US 2020/0410787 A1).
Regarding claim 17, the rejection of claim 16 is incorporated herein. Mahmood does not appear to explicitly teach
wherein to perform the data sample collection task, the vehicle controller is further programmed to:
record a plurality of unvalidated input data samples using the at least one vehicle sensor;
determine a second plurality of input data samples based at least in part on the validation algorithm, wherein the second plurality of input data samples is a subset of the plurality of unvalidated input data samples; and
transmit the second plurality of input data samples to the server communication system using the vehicle communication system.
However, Jiang, directed to analogous art, teaches
wherein to perform the data sample collection task, the vehicle controller is further programmed to: ([0069] states "As described above, with reference to FIG. 3, driving data is obtained by driving monitor 308 as received messages from a plurality of sensors and modules within perception and planning system 110." [0072] states "In operation 605, one or more ADVs collect a default or predefined amount of driving data for each of all driving categories 440 on the ADV 101 in human driving mode. In operation 610, the one or more ADVs each transmit their collected driving data for each of the driving categories to the server.")
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Mahmood and Jiang for the reasons given above in regards to claim 11.
The combination of Mahmood, Jiang, and Hestness does not appear to explicitly teach
record a plurality of unvalidated input data samples using the at least one vehicle sensor;
determine a second plurality of input data samples based at least in part on the validation algorithm, wherein the second plurality of input data samples is a subset of the plurality of unvalidated input data samples; and
transmit the second plurality of input data samples to the server communication system using the vehicle communication system.
However, Petousis, directed to analogous art, teaches
record a plurality of unvalidated input data samples using the at least one vehicle sensor; ([0025] - [0026] state "The vehicle sensor data is preferably received through wired connections between vehicle sensor(s) and the vehicle computing system (e.g., the CAN bus), but can additionally or alternatively be received over wireless connections, a combination of wireless and wired connections, or any other suitable connections. Vehicle sensor data is preferably raw data collected from a sensor of the vehicle, but can be any suitable form of data derived from vehicle sensors." [0029] states "Determining the prioritization scheme functions to establish the criteria against which the vehicle sensor data is prioritized. For example, the prioritization scheme specifies the rules and/or algorithms used to determine the priority (e.g., importance) a piece of data should have. Determining the prioritization scheme can be based on a remote query (e.g., a remote query specifies a prioritization scheme or range of possible priorities for each application), the data contents (e.g., the data type, the data values, etc.), a predetermined set of rules, or otherwise determined." [0034] states "Prioritizing the vehicle sensor data can optionally include categorizing the vehicle sensor data according to the data's respective priority. Such categories can include critical vehicle sensor data (e.g., data relevant to real-time nominal operation of the vehicle), operational vehicle sensor data (e.g., data relevant to operational purpose of the vehicle, navigation, destination, owner/renter of the vehicle), user application vehicle sensor data (e.g., relevant to any other applications as determined by a user, such as via a remote query), or any other suitable data. Critical vehicle sensor data, operational vehicle sensor data, and user application vehicle sensor data may be discrete non-overlapping categories, or they may be partly or wholly overlapping." 
Determining that the vehicle sensor data is critical is interpreted as validating the data samples. As all data is recorded, unvalidated sensor data is recorded.)
determine a second plurality of input data samples based at least in part on the validation algorithm, wherein the second plurality of input data samples is a subset of the plurality of unvalidated input data samples; and (Determining that the vehicle sensor data is critical is interpreted as validating the data samples. [0022] states "All or some of the processes described herein are preferably performed by a vehicle system in cooperation with a remote computing system, but can alternatively be performed entirely by the remote computing system, the vehicle system, or any other suitable system." Therefore, the determination is made based on a validation algorithm that categorizes data into critical, operational, or user application.)
transmit the second plurality of input data samples to the server communication system using the vehicle communication system. ([0045] states "In a related specific example, vehicle sensor data can be grouped in three divisions (e.g., critical, operations, and applications). In this example, a fraction (e.g., within a range from 0 to a value B, where B is less than or equal to 1) of the bandwidth is reserved for each division, and a dedicated scheduling module corresponds to each division." Therefore, all data, including the unvalidated data (data not categorized as critical) is transmitted to the server. [0063] states "Transmitting message data is preferably performed by a transmitter (e.g., transmission module ) of the vehicle using a wireless network (e.g., 3G, 5G, LTE, Wi-Fi, etc.), a wired connection (e.g., an Ethernet connection while the vehicle is parked), a combination of wireless networks and wired connections, or by any suitable communication module over any suitable data connection.")
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Mahmood, Jiang, and Hestness with the teachings of Petousis because, as Petousis states in [0014], "The inventors have discovered that components of autonomous vehicles (e.g., sensors, computing systems, navigation systems, etc.) cooperatively generate too much data during a driving session to practically transfer (e.g., economically, physically limited) to remote systems over conventional communication infrastructure. The amount of data can be so large that even on-board vehicle storage may be untenable." Additionally, [0016] states "The method can confer several benefits. First, the method optimizes use of the limited communication, prioritizing the data to be sent in real- or near-real time. In one variation, the data can be prioritized based on one or more requests or user queries (e.g., received from a remote computing system, wherein the remote computing system can determine the data priority using a cost function, optimization function, bidding system, contextual system, or other system) (or, based on a number of requests or a number of user queries, etc.). This enables an external system to request and receive data, previously demanded asynchronously, from the vehicle in real- or near-real time. In a second variation, the data can be prioritized based on the current or anticipated quality of service (e.g., as measured by latency, bandwidth, etc.) of available communication channel(s). This allows for critical or other high-value data to have preferential real-time transmission, precluding low-value data from consuming limited communication"
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSICA THUY PHAM whose telephone number is (571)272-2605. The examiner can normally be reached Monday - Friday, 9:00 A.M. - 5:00 P.M.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.T.P./Examiner, Art Unit 2121
/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121