Office Action Analysis: 16950017 — DATA PARTITIONING WITH NEURAL NETWORK

Examiner Intelligence

Examiner: LEY, SALLY THI
Career Allowance Rate: 19% (7 granted / 36 resolved), -35.6% vs TC avg
Interview Lift: +33.3% (resolved cases with interview)
Avg Prosecution: 4y 8m (typical timeline); 17 currently pending
Total Applications: 69 across all art units (career history)
Statute-Specific Performance

§101: 10.3% (-29.7% vs TC avg)
§103: 83.2% (+43.2% vs TC avg)
§102: 3.8% (-36.2% vs TC avg)
§112: 2.7% (-37.3% vs TC avg)
Based on career data from 36 resolved cases; TC avg is the Tech Center average estimate.
Office Action

§101 §103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
	This Office Action is in response to the communication filed on 25 Jul 2025.
	Claims 1-2, 6-12, 15-16, and 19-20 are being considered on the merits.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 13 May 2025, 04 Jun 2025, and 08 Aug 2025 have been considered. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, initialed and dated copies of Applicant's IDS forms 1449 are attached to the instant Office action.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-2, 6-12, 15-16, and 19-20 are rejected under 35 U.S.C. 101 because of the following: 
Regarding Claim 1:
Step 1: Independent claim 1 recites a computer-implemented method and therefore falls under one of the four statutory categories of patent-eligible subject matter. 
Step 2A Prong 1:
determining, by one or more processing units, a feature representative data set having a plurality of feature representative data records, each feature representative data record having values of a second number of feature representatives, (Mental process: Determining a data set is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “one or more processing units”, nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can contemplate a data set entirely in their mind or with the assistance of pen and paper)
computing, by one or more processing units, an influential weight for a feature representative in at least one of the feature representative data subsets, (Mental process: Computing a weight is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “one or more processing units”, nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can look at a representative data set, i.e., the second number of feature representatives, and evaluate an influential weight for one feature representative of the set).
wherein the influential weight of the feature representative quantifies an effect of the feature representative on prediction accuracy and is computed by: randomly changing, by one or more processing units, the value of one or more feature representative records and fixing values of other feature representatives records; (Mental process: Computing a weight that quantifies an effect of the feature representative by randomly changing the value of a feature representative is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “one or more processing units”, nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can look at a representative data set, change one datum, and evaluate the influential weight for the one feature representative of the set).
determining, by one or more processing units, an accuracy of prediction of the autoencoder utilizing the one or more changed feature representative records and the feature representative records; (Mental process: Determining an accuracy of prediction using changed and unchanged feature representative records is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “one or more processing units”, nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can look at the output of an autoencoder neural network and evaluate its accuracy).
evaluating, by one or more processing units, a quality of feature representative data subset based on the influential weight associated with the feature representative; and (Mental process: Evaluating a quality of feature representative data subset is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components; nothing in this claim element precludes the step from practically being performed in the mind. For example, a “quality” could be that the data all falls within a certain range or that the data is more than 8 bits).
Step 2A Prong 2: The additional elements do not integrate the judicial exception into a practical application.
obtaining, by one or more processing units, an original data set including a plurality of data records, each data record in the original data set having values of a first number of features; (insignificant extra-solution activity to the judicial exception: Receiving or transmitting data over a network – See MPEP § 2106.05(g))
wherein the second number of feature representatives is obtained by a trained autoencoder neural network with the values of the first number of features as inputs, and wherein the second number of feature representatives is smaller than the first number of feature representatives, and (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
wherein the autoencoder reduces a size of features associated with the original data set; (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
segmenting, by one or more processing units, the plurality of feature representative data records into two or more clusters based on the values of the second number of feature representatives; (insignificant extra-solution activity to the judicial exception: Arranging a hierarchy of groups or sorting information is well-understood, routine, conventional activity – See MPEP § 2106.05(g))
partitioning, by one or more processing units, the feature representative data records in the two or more clusters to form a number of stratified feature representative data subsets; and (insignificant extra-solution activity to the judicial exception: storing and retrieving data from memory – See MPEP § 2106.05(g))
wherein the feature representative data subsets provide feature distribution consistency as compared to the data set; (insignificant extra-solution activity to the judicial exception: storing and retrieving data from memory – See MPEP § 2106.05(g))
obtaining, by one or more processing units, the influential weight of the feature representative based on the accuracy of the prediction; and (insignificant extra-solution activity to the judicial exception: Receiving or transmitting data over a network – See MPEP § 2106.05(g))
responsive to the quality of feature representative data subset, training, by one or more processing units, a supervised machine learning model with the feature representative data subset. (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
obtaining, by one or more processing units, an original data set including a plurality of data records, each data record in the original data set having values of a first number of features; (Insignificant Extra Solution Activity: Receiving or transmitting data over a network is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))
wherein the second number of feature representatives is obtained by a trained autoencoder neural network with the values of the first number of features as inputs, and wherein the second number of feature representatives is smaller than the first number of feature representatives, and (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
wherein the autoencoder reduces a size of features associated with the original data set; (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
segmenting, by one or more processing units, the plurality of feature representative data records into two or more clusters based on the values of the second number of feature representatives; (Insignificant Extra Solution Activity: Arranging a hierarchy of groups or sorting information is well-understood, routine, conventional activity – see Berkheimer evidence from MPEP 2106.05(d))
partitioning, by one or more processing units, the feature representative data records in the two or more clusters to form a number of stratified feature representative data subsets; and (Insignificant Extra Solution Activity: storing and retrieving data from memory is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))
wherein the feature representative data subsets provide feature distribution consistency as compared to the data set;  (Insignificant Extra Solution Activity: storing and retrieving data from memory is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))
obtaining, by one or more processing units, the influential weight of the feature representative based on the accuracy of the prediction; and (Insignificant Extra Solution Activity: Receiving or transmitting data over a network is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))
responsive to the quality of feature representative data subset, training, by one or more processing units, a supervised machine learning model with the feature representative data subsets. (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
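
For illustration only, the influential-weight limitation of claim 1 (randomly changing the values of one feature representative while holding the others fixed, then measuring prediction accuracy) can be sketched as follows. This is a minimal sketch under stated assumptions and not the applicant's implementation: the names `accuracy`, `influential_weight`, and `predict` are hypothetical, the random change is assumed to be a shuffle of one column, and the weight is assumed to be the resulting drop in accuracy. The claim refers to the accuracy of prediction of the autoencoder; `predict` here is only a stand-in for whatever model produces that prediction.

```python
import random
from typing import Callable, List, Sequence

Record = List[float]


def accuracy(predict: Callable[[Record], int],
             records: Sequence[Record],
             labels: Sequence[int]) -> float:
    """Fraction of records whose predicted label matches the true label."""
    correct = sum(1 for r, y in zip(records, labels) if predict(r) == y)
    return correct / len(records)


def influential_weight(predict: Callable[[Record], int],
                       records: Sequence[Record],
                       labels: Sequence[int],
                       feature_idx: int,
                       seed: int = 0) -> float:
    """Assumed definition: accuracy drop after shuffling one feature column.

    All other feature values stay fixed, mirroring the claim language.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, records, labels)
    # Randomly change the chosen feature by shuffling its column.
    column = [r[feature_idx] for r in records]
    rng.shuffle(column)
    perturbed = [
        r[:feature_idx] + [v] + r[feature_idx + 1:]
        for r, v in zip(records, column)
    ]
    return baseline - accuracy(predict, perturbed, labels)
```

Under this assumed definition, a feature representative that the model ignores receives a weight of zero, because shuffling it cannot change any prediction.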

Regarding Claim 2:
Step 2A Prong 1: See the rejection of claim 1 above. The same rationale applies to this dependent claim. 
Step 2A Prong 2: The additional elements do not integrate the judicial exception into a practical application.
obtaining, by one or more processing units, data subsets of the original data set according to the number of feature representative data subsets. (insignificant extra-solution activity to the judicial exception: Receiving or transmitting data over a network – See MPEP § 2106.05(g))
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
obtaining, by one or more processing units, data subsets of the original data set according to the number of feature representative data subsets.  (Insignificant Extra Solution Activity: Receiving or transmitting data over a network is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))

Regarding Claim 6:
Step 2A Prong 1: See the rejection of claim 1 above. The same rationale applies to this dependent claim. 
wherein evaluating, by one or more processing units, a quality of data partition based on the influential weights and the partition of the feature representative data set further comprises: (Mental process: Evaluating a quality of data is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “one or more processing units”, nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can evaluate a quality of data partition just by looking at the partition and making an evaluation of the partition)
for each feature representative Fi, measuring a distribution similarity si of the feature representative Fi between the respective feature representative data subsets and the feature representative data set; and (Mental process: Evaluating a distribution similarity is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind; nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can evaluate a quality of data partition just by looking at the data subsets and evaluating the distribution similarity)
Step 2A Prong 2: The additional elements do not integrate the judicial exception into a practical application.
obtaining the quality of the data partition based on the distribution similarity si and the influential weight of the feature representative Fi. (insignificant extra-solution activity to the judicial exception: storing and retrieving data from memory – See MPEP § 2106.05(g))
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
obtaining the quality of the data partition based on the distribution similarity si and the influential weight of the feature representative Fi. (Insignificant Extra Solution Activity: storing and retrieving data from memory is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))
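
As a rough illustration of the two steps recited in claim 6, the sketch below measures a distribution similarity si for one feature representative Fi and then combines similarities with influential weights into a single quality value. Everything specific here is an assumption made for the example: the Office Action does not say how similarity is measured, so histogram overlap is used as a stand-in, and the combining rule (a weight-normalized sum) is a guess, since the claimed formula appears only as an image. The names `histogram`, `distribution_similarity`, and `partition_quality` are hypothetical.

```python
from typing import List, Sequence


def histogram(values: Sequence[float], lo: float, hi: float,
              bins: int) -> List[float]:
    """Normalized histogram of values over [lo, hi] with equal-width bins."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data
    for v in values:
        idx = int((v - lo) / width)
        counts[max(0, min(idx, bins - 1))] += 1
    return [c / len(values) for c in counts]


def distribution_similarity(subset_values: Sequence[float],
                            full_values: Sequence[float],
                            bins: int = 10) -> float:
    """Assumed measure: histogram overlap in [0, 1]; 1 means identical."""
    lo, hi = min(full_values), max(full_values)
    h_sub = histogram(subset_values, lo, hi, bins)
    h_full = histogram(full_values, lo, hi, bins)
    return sum(min(a, b) for a, b in zip(h_sub, h_full))


def partition_quality(similarities: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Assumed combination: weighted average of per-feature similarities."""
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, similarities)) / total
```

With this choice, a subset whose per-feature histograms match the full data set scores close to 1, and features with larger influential weights dominate the result.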

Regarding Claim 7:
Step 2A Prong 1: See the rejection of claim 6 above. The same rationale applies to this dependent claim. 
wherein the quality of the data partition is obtained utilizing the following formula:
[Formula image: media_image1.png] (Mathematical formula: Abstract ideas include mathematical concepts such as mathematical relationships, mathematical formulas or equations, and mathematical calculations)
wherein q is the quality of the data partition, s_i is the distribution similarity, and w_i is the influential weight of the feature representative F_i (Mathematical formula: Abstract ideas include mathematical concepts such as mathematical relationships, mathematical formulas or equations, and mathematical calculations)
Step 2A Prong 2 and Step 2B: The claim does not include additional elements

Regarding Claim 8:
Step 2A Prong 1: See the rejection of claim 1 above. The same rationale applies to this dependent claim. 
Step 2A Prong 2: The additional elements do not integrate the judicial exception into a practical application.
wherein partitioning, by one or more processing units, the feature representative data records in the two or more clusters to form a third number of feature representative data subsets comprises: (insignificant extra-solution activity to the judicial exception: storing and retrieving data from memory – See MPEP § 2106.05(g))
randomly sampling, by one or more processing units, the feature representative data records in each cluster of the two or more clusters to form the third number of feature representative data subsets (insignificant extra-solution activity to the judicial exception: storing and retrieving data from memory – See MPEP § 2106.05(g))
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
wherein partitioning, by one or more processing units, the feature representative data records in the two or more clusters to form a third number of feature representative data subsets comprises: (Insignificant Extra Solution Activity: storing and retrieving data from memory is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))
randomly sampling, by one or more processing units, the feature representative data records in each cluster of the two or more clusters to form the third number of feature representative data subsets (Insignificant Extra Solution Activity: storing and retrieving data from memory is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))
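
The partitioning step of claim 8 (randomly sampling the records in each cluster to form a number of subsets) can be pictured with a short sketch. It is offered as an assumption-laden illustration and not as the claimed method: cluster labels are taken as already computed, records are dealt round-robin after a per-cluster shuffle so that each subset receives a near-equal share of every cluster, and the name `stratified_partition` is hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def stratified_partition(cluster_labels: Sequence[Hashable],
                         num_subsets: int,
                         seed: int = 0) -> List[List[int]]:
    """Split record indices into subsets that preserve cluster proportions."""
    rng = random.Random(seed)
    # Group record indices by the cluster they were assigned to.
    by_cluster: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        by_cluster[label].append(idx)

    subsets: List[List[int]] = [[] for _ in range(num_subsets)]
    for members in by_cluster.values():
        # Random sampling within the cluster: shuffle, then deal out.
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            subsets[pos % num_subsets].append(idx)
    return subsets
```

Dealing each shuffled cluster across the subsets is one simple way to keep the cluster mix of every subset close to that of the full data set, which is the kind of feature distribution consistency the claims refer to.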

Regarding Claim 9:
Step 2A Prong 1: See the rejection of claim 2 above. The same rationale applies to this dependent claim. 
Step 2A Prong 2: The additional elements do not integrate the judicial exception into a practical application.
the features from the data subsets and the original data set is selected from the group consisting of categorical variables and continuous variables. (insignificant extra-solution activity to the judicial exception: Receiving or transmitting data over a network – See MPEP § 2106.05(g))
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
the features from the data subsets and the original data set are one of categorical variables and continuous variables. (Insignificant Extra Solution Activity: Receiving or transmitting data over a network is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))

Regarding Claim 10:
Step 2A Prong 1: See the rejection of claim 1 above. The same rationale applies to this dependent claim. 
Step 2A Prong 2: The additional elements do not integrate the judicial exception into a practical application.
wherein the original data set is selected from the group consisting of: an insurance domain, a banking domain, a healthcare domain, a financial domain, an entertainment domain, and a business domain.  (Generally linking the use of a judicial exception to a particular technological environment or field of use – See MPEP § 2106.05(h))
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
wherein the original data set is selected from the group consisting of: an insurance domain, a banking domain, a healthcare domain, a financial domain, an entertainment domain, and a business domain.  (Generally linking the use of a judicial exception to a particular technological environment or field of use – See MPEP § 2106.05(h))

Regarding Claim 11:
Step 1: Independent claim 11 recites a computer program product and therefore falls under one of the four statutory categories of patent-eligible subject matter. 
Step 2A Prong 1: 
program instructions to determine a feature representative data set having a plurality of feature representative data records, each feature representative data record having values of a second number of feature representatives, (Mental process: Determining a feature representative data set is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “program instructions”, nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can look at data and create a feature representative data set entirely in their mind or with the assistance of pen and paper)
program instructions to compute an influential weight for a feature representative in at least one of the feature representative data subsets, (Mental process: Computing a weight is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “program instructions”, nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can look at a representative data set, i.e., the second number of feature representatives, and evaluate an influential weight for one feature representative of the set).
wherein the influential weight of the feature representative in at least one of the feature representative data subsets is computed by: program instructions to randomly change the value of one or more feature representative records and fixing values of other feature representatives records; (Mental process: Computing a weight that quantifies an effect of the feature representative by randomly changing the value of a feature representative is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “program instructions”, nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can look at a representative data set, change one datum, and evaluate the influential weight for the one feature representative of the set)
program instructions to determine an accuracy of prediction of the autoencoder utilizing the one or more changed feature representative records and the fixed feature representative records; (Mental process: Determining an accuracy of prediction using changed and unchanged feature representative records is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “program instructions”, nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can look at the output of an autoencoder neural network and evaluate its accuracy).
program instructions to evaluate a quality of feature representative data subset based on the influential weight associated with the feature representative; and (Mental process: Evaluating a quality of feature representative data subset is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components; nothing in this claim element precludes the step from practically being performed in the mind. For example, a “quality” could be that the data all falls within a certain range or that the data is more than 8 bits)
Step 2A Prong 2: The additional elements do not integrate the judicial exception into a practical application.
A computer program product comprising (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising: (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
program instructions to obtain an original data set including a plurality of data records, each data record in the original data set having values of a first number of features; (insignificant extra-solution activity to the judicial exception: Receiving or transmitting data over a network – See MPEP § 2106.05(g))
wherein the second number of feature representatives is obtained by a trained autoencoder neural network with values of the first number of features as inputs, and wherein the second number of feature representatives is smaller than the first number, and (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
wherein the autoencoder reduces a size of features associated with the original data set; (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
program instructions to segment the plurality of feature representative data records into two or more clusters based on the values of the second number of feature representatives; (insignificant extra-solution activity to the judicial exception: Arranging a hierarchy of groups or sorting information is well-understood, routine, conventional activity – See MPEP § 2106.05(g))
program instructions to partition the feature representative data records in the two or more clusters to form a predefined number of stratified feature representative data subsets; and (insignificant extra-solution activity to the judicial exception: storing and retrieving data from memory – See MPEP § 2106.05(g))
program instructions to obtain the influential weight of the feature representative based on the accuracy of the prediction; and (insignificant extra-solution activity to the judicial exception: Receiving or transmitting data over a network – See MPEP § 2106.05(g))
program instructions to, responsive to the quality of feature representative data subset, train a supervised machine learning model with the feature representative data subsets. (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
A computer program product comprising (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising: (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
program instructions to obtain an original data set including a plurality of data records, each data record in the original data set having values of a first number of features; (Insignificant Extra Solution Activity: Receiving or transmitting data over a network is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))
wherein the second number of feature representatives is obtained by a trained autoencoder neural network with values of the first number of features as inputs, and wherein the second number of feature representatives is smaller than the first number of feature representatives; (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
wherein the autoencoder reduces a size of features associated with the original data set; (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
program instructions to segment the plurality of feature representative data records into two or more clusters based on the values of the second number of feature representatives; (Insignificant Extra Solution Activity: Arranging a hierarchy of groups or sorting information is well-understood, routine, conventional activity – see Berkheimer evidence from MPEP 2106.05(d))
program instructions to partition the feature representative data records in the two or more clusters to form a predefined number of feature representative data subsets; and (Insignificant Extra Solution Activity: storing and retrieving data from memory is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))
program instructions to obtain the influential weight of the feature representative based on the accuracy of the prediction; and (Insignificant Extra Solution Activity: Receiving or transmitting data over a network is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))
program instructions to, responsive to the quality of feature representative data subset, train a supervised machine learning model with the feature representative data subsets. (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))

Regarding Claim 12:
Step 2A Prong 1: See the rejection of claim 11 above. The same rationale applies to this dependent claim. 
Step 2A Prong 2: The additional elements do not integrate the judicial exception into a practical application.
wherein the program instructions stored on the one or more computer readable storage media further comprise: (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
program instructions to obtain a third number of data subsets of the original data set according to the number of feature representative data subsets.   (insignificant extra-solution activity to the judicial exception: Receiving or transmitting data over a network – See MPEP § 2106.05(g))
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
wherein the program instructions stored on the one or more computer readable storage media further comprise: (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
program instructions to obtain a third number of data subsets of the original data set according to the number of feature representative data subsets.   (Insignificant Extra Solution Activity: Receiving or transmitting data over a network is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))

Regarding Claim 15: 
Step 1: Independent claim 15 recites a computer system and therefore falls under one of the four statutory categories of patent-eligible subject matter.
Step 2A Prong 1:  
program instructions to determine a feature representative data set having a plurality of feature representative data records, each feature representative data record having values of a second number of feature representatives, (Mental process: determining a feature representative data set is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “program instructions”, nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can look at records and determine a feature representative data set of those records entirely in their minds or with the assistance of a pen and paper)
program instructions to compute an influential weight for a feature representative in at least one of the feature representative data subsets, (Mental process: Computing a weight is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “program instructions”, nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can look at a representative data set, i.e., the second number of feature representatives, and evaluate an influential weight for one feature representative of the set).
wherein the influential weight of the feature representative in at least one of the feature representative data subsets is computed by: program instructions to randomly change the value of one or more feature representative records and fixing values of other feature representative records; (Mental process: Computing a weight that quantifies an effect of the feature representative by randomly changing the value of a feature representative is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “program instructions”, nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can look at a representative data set, change one datum, and evaluate that influential weight for the one feature representative of the set)
program instructions to determine an accuracy of prediction of the autoencoder utilizing the one or more changed feature representative records and the fixed feature representative records; (Mental process: Determining an accuracy of prediction using changed and unchanged feature representative records is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “program instructions”, nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can look at the output of an autoencoder neural network and evaluate its accuracy).
program instructions to evaluate a quality of feature representative data subset based on the influential weight associated with the feature representative; and (Mental process: Evaluating a quality of feature representative data subset is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components; nothing in this claim element precludes the step from practically being performed in the mind. For example, a “quality” could be that the data all falls within a certain range or that the data is more than 8 bits).
Step 2A Prong 2: The additional elements do not integrate the judicial exception into a practical application.
A computer system comprising: one or more computer processors; one or more computer readable storage media; (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
and program instructions stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the program instructions comprising: program instructions to obtain an original data set including a plurality of data records, each data record in the original data set having values of a first number of features; (insignificant extra-solution activity to the judicial exception: Receiving or transmitting data over a network – See MPEP § 2106.05(g))
wherein the second number of feature representatives is obtained by a trained autoencoder neural network with values of the first number of features as inputs, and wherein the second number of feature representatives is smaller than the first number, and (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
wherein the autoencoder reduces a size of features associated with the original data set; (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
program instructions to segment the plurality of feature representative data records into two or more clusters based on the values of the second number of feature representatives; (insignificant extra-solution activity to the judicial exception: Arranging a hierarchy of groups or sorting information is well-understood, routine, conventional activity – See MPEP § 2106.05(g))
program instructions to partition the feature representative data records in the two or more clusters to form a predefined number of stratified feature representative data subsets; and  (insignificant extra-solution activity to the judicial exception: storing and retrieving data from memory – See MPEP § 2106.05(g))
program instructions to obtain the influential weight of the feature representative based on the accuracy of the prediction;  (insignificant extra-solution activity to the judicial exception: Receiving or transmitting data over a network – See MPEP § 2106.05(g)) 
program instructions to, responsive to the quality of feature representative data subset, train a supervised machine learning model with the feature representative data subsets. (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
A computer system comprising: one or more computer processors; one or more computer readable storage media; (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
and program instructions stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the program instructions comprising: program instructions to obtain an original data set including a plurality of data records, each data record in the original data set having values of a first number of features; (Insignificant Extra Solution Activity: Receiving or transmitting data over a network is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))
wherein the second number of feature representatives is obtained by a trained autoencoder neural network with values of the first number of features as inputs, and wherein the second number of feature representatives is smaller than the first number of feature representatives; (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
wherein the autoencoder reduces a size of features associated with the original data set; (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
program instructions to segment the plurality of feature representative data records into two or more clusters based on the values of the second number of feature representatives; (Insignificant Extra Solution Activity: Arranging a hierarchy of groups or sorting information is well-understood, routine, conventional activity – see Berkheimer evidence from MPEP 2106.05(d))
program instructions to partition the feature representative data records in the two or more clusters to form a predefined number of stratified feature representative data subsets; and (Insignificant Extra Solution Activity: storing and retrieving data from memory is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))
program instructions to obtain the influential weight of the feature representative based on the accuracy of the prediction; and (Insignificant Extra Solution Activity: Receiving or transmitting data over a network is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))
program instructions to, responsive to the quality of feature representative data subset, train a supervised machine learning model with the feature representative data subsets. (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
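
For illustration only, and not as part of the claims or of any cited reference: the weight computation recited in claim 15 (randomly change one feature representative, hold the others fixed, and measure the resulting prediction accuracy) resembles what is commonly called permutation importance. The following is a minimal Python sketch under stated assumptions. The function name `influential_weights`, the `predict` callable, and the choice to report the mean accuracy drop as the weight are all hypothetical; the claims do not specify how the weight is derived from the accuracy.

```python
import random


def influential_weights(records, labels, predict, n_repeats=5, seed=0):
    """Hypothetical sketch: weight of feature j = mean accuracy drop
    observed when column j is shuffled and all other columns are fixed."""
    rng = random.Random(seed)

    def accuracy(rows):
        hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
        return hits / len(rows)

    baseline = accuracy(records)
    weights = []
    for j in range(len(records[0])):
        drops = []
        for _ in range(n_repeats):
            # Randomly change the values of feature j only.
            column = [row[j] for row in records]
            rng.shuffle(column)
            perturbed = [row[:j] + [v] + row[j + 1:]
                         for row, v in zip(records, column)]
            drops.append(baseline - accuracy(perturbed))
        weights.append(sum(drops) / n_repeats)
    return weights
```

Under this assumed definition, a feature representative that the predictor ignores receives a weight of zero, because shuffling it cannot change any prediction.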

Regarding Claim 16:
Step 2A Prong 1: See the rejection of claim 15 above. The same rationale applies to this dependent claim. 
Step 2A Prong 2: The additional elements do not integrate the judicial exception into a practical application.
The program instructions further comprise: program instructions to obtain a third number of data subsets of the original data set according to the number of feature representative data subsets.  (insignificant extra-solution activity to the judicial exception: Receiving or transmitting data over a network – See MPEP § 2106.05(g))
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
The program instructions further comprise: program instructions to obtain a third number of data subsets of the original data set according to the number of feature representative data subsets.  (Insignificant Extra Solution Activity: Receiving or transmitting data over a network is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))

Regarding Claim 19:
Step 2A Prong 1: See the rejection of claim 15 above. The same rationale applies to this dependent claim. 
The computer system of claim 15, wherein the program instructions further comprise: program instructions to evaluate a quality of data partition based on the influential weights and the feature representative data subsets.  (Mental process: evaluating a quality of data partition is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “program instructions”, nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can look at data partition and evaluate a quality of it entirely in their mind)
Step 2A Prong 2 and Step 2B: The claim does not include additional elements.

Regarding Claim 20: 
Step 2A Prong 1: See the rejection of claim 19 above. The same rationale applies to this dependent claim. 
The computer system of claim 19, wherein evaluating a quality of data partition based on the influential weights and the partition of the feature representative data set further comprises: (Mental process: evaluating a quality of data partition is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “program instructions”, nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can look at data partition and evaluate a quality of it entirely in their mind)
for each feature representative $F_i$, program instructions to measure a distribution similarity $s_i$ of the feature representative $F_i$ between the respective feature representative data subsets and the feature representative data set; and (Mental process: Evaluating a distribution similarity is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind; nothing in this claim element precludes the step from practically being performed in the mind. For example, a person can evaluate a quality of data partition just by looking at the data subsets and evaluating the distribution similarity)
Step 2A Prong 2: The additional elements do not integrate the judicial exception into a practical application.
program instructions to obtain the quality of the data partition based on the distribution similarity $s_i$ and the influential weight of the feature representative $F_i$. (insignificant extra-solution activity to the judicial exception: storing and retrieving data from memory – See MPEP § 2106.05(g))
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
program instructions to obtain the quality of the data partition based on the distribution similarity $s_i$ and the influential weight of the feature representative $F_i$. (Insignificant Extra Solution Activity: storing and retrieving data from memory is well-understood, routine, conventional activity – see Berkheimer evidence MPEP § 2106.05(d))
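
For illustration only: claim 20 recites obtaining a partition quality from the distribution similarity $s_i$ and the influential weight of each feature representative $F_i$, but the quoted claim language does not say how the two are combined. The Python sketch below assumes, hypothetically, that $s_i$ is a histogram-intersection score averaged over the subsets and that the quality is the weight-normalized sum of the $s_i$. The function names, the bin count, and both formulas are assumptions made for this example and are not drawn from the application or the cited art.

```python
def histogram(values, lo, hi, bins):
    """Normalized histogram of values over [lo, hi] with equal-width bins."""
    width = (hi - lo) / bins or 1.0  # guard against a constant column
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / len(values) for c in counts]


def distribution_similarity(subset_col, full_col, bins=10):
    """Assumed s_i: histogram intersection, 1.0 when the shapes match."""
    lo, hi = min(full_col), max(full_col)
    p = histogram(subset_col, lo, hi, bins)
    q = histogram(full_col, lo, hi, bins)
    return sum(min(a, b) for a, b in zip(p, q))


def partition_quality(subsets, full, weights, bins=10):
    """Assumed quality: sum_i(w_i * s_i) / sum_i(w_i), where s_i is the
    mean similarity of feature i across all subsets."""
    score = 0.0
    for i in range(len(full[0])):
        full_col = [row[i] for row in full]
        sims = [distribution_similarity([row[i] for row in s], full_col, bins)
                for s in subsets]
        score += weights[i] * (sum(sims) / len(sims))
    return score / sum(weights)
```

With these assumptions, subsets whose per-feature distributions mirror the full data set score near 1, and subsets drawn from only one region of the data score lower.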

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 6, 8-12, 15-16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over S. Ryu, H. Choi, H. Lee and H. Kim ("Convolutional Autoencoder Based Feature Extraction and Clustering for Customer Load Analysis," in IEEE Transactions on Power Systems, vol. 35, no. 2, pp. 1048-1060, March 2020, doi: 10.1109/TPWRS.2019.2936293; hereinafter “Ryu”) in view of Tavakoli et al. (“Clustering Time Series Data through Autoencoder-based Deep Learning Models”, arXiv:1807.04001v1 [cs.LG], 11 Jul 2018; hereinafter “Tavakoli”) and further in view of Jing et al. (“Stratified feature sampling method for ensemble clustering of high dimensional data,” Pattern Recognition, Volume 48, Issue 11, 2015, Pages 3688-3702, ISSN 0031-3203; hereinafter “Jing”).

Claim 1, Ryu teaches: 
A computer-implemented method comprising: Obtaining…an original data set including a plurality of data records, each data record in the original data set having values of a first number of features; (Ryu, sec. II(A): “The dataset includes the 1-hour interval load data of 2016 to 2017 from residential customers dispersed all around the country” ).
Determining…a feature representative data set having a plurality of feature representative data records, (Ryu, sec. III(A): “Among the participants, we extract 1,405 households that have full records from March 1, 2016, to February 23, 2017 (i.e., 360 days). We select the number of days to be 360 because it is the largest number of days in a year while having small prime numbers (2,3,5) that can be used as strides in the convolutional layers. Also, by starting from March 1, one YLP covers all four seasons, and each season occupies a region of load image in order.”)
each feature representative data record having values of a second number of feature representatives, (Ryu, sec. I 4th paragraph: “After dimensionality reduction, compressed metering data (i.e., extracted features) can be utilized for smart grid applications.” Examiner notes that Ryu teaches data with features and compressed feature data such that the compressed data is the second number representative of features)
wherein the second number of feature representatives is obtained by a trained autoencoder neural network with values of the first number of features as inputs, and wherein the second number of feature representatives is smaller than the first number of feature representatives; (Ryu, secs. II, III(B) and III(C): “In the encoding step, we determine the structure of CAE and train the network with the preprocessed data. Then normalized [Yearly Load Profiles] are encoded to lower dimensional space by the CAE” “As the name suggests, CAE is the combination of autoencoder and convolutional neural network (CNN). An autoencoder is an artificial neural network which is composed of an encoder and a decoder neural network. When the input data x is fed into the autoencoder, it is nonlinearly transformed into an encoded vector z while passing through multiple fully-connected layers in the encoder network” “We select the final model structure to 1-d CAE with 4/1 layout that has the lowest RMSE on average (0.65), and we further train the selected layout for the comparison with other dimensionality reduction methods.” Examiner notes that Ryu teaches data with features and compressed feature data such that the compressed data is the second number representative of features)
wherein the autoencoder reduces a size of features associated with the original data set; (Ryu sec. I, second to last paragraph: “For feature extraction of YLP, we design CAE structure considering the characteristics of time series metering data. Through the CAE, YLPs in 8,640-dimensional space are efficiently compressed to lower dimensional vectors, which can be restored to original YLPs with less reconstruction errors than other methods”)
segmenting…the plurality of feature representative data records into two or more clusters based on the values of the second number of feature representatives; (Ryu, sec. I, 4th paragraph: “After dimensionality reduction, compressed metering data (i.e., extracted features) can be utilized for smart grid applications. Specifically, clustering analysis holds important position in load data mining with various applications, e.g., bad data identification, load forecasting, tariff setting, etc. [12]. Clustering after dimensionality reduction is known as indirect clustering [13] and has advantages in reducing computational complexity and feature extraction.”)
computing, by one or more processing units, an influential weight for a feature representative in at least one of the feature representative data subsets, (Ryu, sec. III(B): “and $W_{encoder} \in \mathbb{R}^{d'}$, $b_{encoder} \in \mathbb{R}^{d'}$ are weights and bias of hidden layer, respectively.”)
responsive to the quality of feature representative data subset, training, by one or more processing units, a supervised machine learning model with the feature representative data subset. (Ryu, sec. III(B): “An autoencoder is an artificial neural network which is composed of an encoder and a decoder neural network. When the input data x is fed into the autoencoder, it is nonlinearly transformed into an encoded vector z while passing through multiple fully-connected layers in the encoder network. Next, from the encoded vector z, the reconstruction x is obtained through the decoder network…In training, the autoencoder updates the weights in order to minimize the reconstruction error.”)
Ryu does not explicitly disclose, but Tavakoli teaches: 
…by one or more processing units…(Tavakoli, sec. VIII(A): “The experiments were executed on a Mac computer with OS X El Capital 10.11.2 operating system with 2.8 GHz Intel Core i7 and 16GB 1600 MHz DDR3”), 
determining, by one or more processing units (Tavakoli, sec. VIII(A): “The experiments were executed on a Mac computer with OS X El Capital 10.11.2 operating system with 2.8 GHz Intel Core i7 and 16GB 1600 MHz DDR3”), an accuracy of prediction of the autoencoder (Tavakoli, sec. I: “Deep learning is capable of optimizing a prediction model by iteratively learning new features through various internal neural networks and the neurons incorporated in each layer. To address both problems simultaneously, an autoencoder-based deep learning algorithm is utilized, in which the autoencoder not only take into accounts the hidden features but also preserves the features that are salient for computation and prediction…The results show achieving an accuracy of 87.5% in correctly predicting the cluster labels of time series data”) 
obtaining…the influential weight of the feature representative based on the accuracy of the prediction; (Tavakoli sec. VI: “The adjustment of weights for the internal layers and their nodes are decided and optimized in a repetitive manner where the loss function on the output (i.e., reconstructed input) is used as a means to measure the accuracy of the clustering.” Examiner notes that Tavakoli teaches using (i.e. obtaining) weights for the step of measuring accuracy of clustering and further adjustment such that the adjusted weight is obtained based on (i.e. influences) the accuracy). 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Tavakoli into Ryu. Ryu teaches leveraging autoencoders to capture long-term characteristics for clustering and analysis; Tavakoli teaches use of autoencoders to learn known and hidden features of time series data for clustering purposes. One of ordinary skill would have been motivated to combine the teachings of Tavakoli into Ryu in order to improve accuracy in clustering and predicting labels for unseen data (Tavakoli, Abstract). 
Ryu, as modified, does not explicitly disclose, but Jing teaches: 
partitioning…the feature representative data records in the two or more clusters (Jing, abstract: “In this paper, we propose a stratified sampling method for generating subspace component data sets in ensemble clustering of high dimensional data.”) to form a predefined number of stratified feature representative data subsets (Jing, abstract: “Instead of randomly sampling a subset of features for each component data set, in this method we first cluster the features of high dimensional data into a few feature groups called feature strata. Using stratified sampling, we randomly sample some features from each feature stratum and merge the sampled features from different feature strata to generate a component data set.”); and
wherein the feature representative data subsets provide feature distribution consistency as compared to the original data set; (Jing, sec. 3: “That is, stratified sampling with optimal allocation gives a smaller variance than random sampling. Therefore, the component data set by stratified sampling is more representative to the whole data set than the component data set by random sampling.”)
wherein the influential weight of the feature representative quantifies an effect of the feature representative on prediction accuracy and (Jing, sec. 2 and Fig. 2: “For the 100 low dimensional component data sets from each sampling method, we used the k-means algorithm to cluster each data set into 3 clusters and computed the clustering accuracy of the result. We divided the 100 clustering results into 6 accuracy groups of (0,0.5], (0.5,0.6], (0.6,0.7], (0.7,0.8], (0.8,0.9], (0.9,1]). Fig. 2 plots the distributions of the clustering results in accuracy groups on the six data sets with sampling rate $q = 1\%$.”) is computed by: randomly changing, by one or more processing units, the value of one or more feature representative records and fixing values of other feature representative records; (Jing, sec. 2: “The random sampling method randomly selected p features from X to generate a low dimensional component data set. Because any feature has the same chance to be selected in every random sampling, the component data sets may have partial common features.” Examiner notes that Jing teaches multiple sets of records, each containing feature records different from each other (i.e. random change) and same feature records as each other (i.e. fixed)). 
utilizing the one or more changed feature representative records and the fixed feature representative records; (Jing, sec. 2 and Fig. 2: “For the 100 low dimensional component data sets from each sampling method, we used the k-means algorithm to cluster each data set into 3 clusters and computed the clustering accuracy of the result. We divided the 100 clustering results into 6 accuracy groups of (0,0.5], (0.5,0.6], (0.6,0.7], (0.7,0.8], (0.8,0.9], (0.9,1]). Fig. 2 plots the distributions of the clustering results in accuracy groups on the six data sets with sampling rate $q = 1\%$.”)
evaluating, by one or more processing units, a quality of feature representative data subset based on the influential weight associated with the feature representative; (Jing, sec. 2 and Fig. 2: “For the 100 low dimensional component data sets from each sampling method, we used the k-means algorithm to cluster each data set into 3 clusters and computed the clustering accuracy of the result. We divided the 100 clustering results into 6 accuracy groups of (0,0.5], (0.5,0.6], (0.6,0.7], (0.7,0.8], (0.8,0.9], (0.9,1]). Fig. 2 plots the distributions of the clustering results in accuracy groups on the six data sets with sampling rate $q = 1\%$.” Examiner notes that Jing teaches accuracy as a quality of feature representative data subsets)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Jing into Ryu, as modified. Jing teaches a stratified sampling method for generating subspace component data sets in ensemble clustering of high dimensional data. One of ordinary skill would have been motivated to combine the teachings of Jing into Ryu, as modified, in order to have better representations of the clustering structure in the original data set (Jing, abstract). 
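
For illustration only: the partitioning limitation of claim 1 (records already segmented into clusters are divided into a predefined number of stratified subsets) can be pictured with the Python sketch below. It is one assumed way to stratify records by cluster and is not the method of Ryu, Tavakoli, or Jing; Jing, as quoted above, stratifies features rather than records. The function name `stratified_partition` and the round-robin dealing scheme are hypothetical.

```python
import random
from collections import defaultdict


def stratified_partition(cluster_labels, n_subsets, seed=0):
    """Hypothetical sketch: deal the record indices of each cluster
    round-robin into n_subsets, so every subset receives a near-equal
    share of every cluster."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        by_cluster[label].append(idx)

    subsets = [[] for _ in range(n_subsets)]
    offset = 0  # carry the dealing position across clusters to balance sizes
    for label in sorted(by_cluster):
        members = by_cluster[label]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            subsets[(offset + pos) % n_subsets].append(idx)
        offset += len(members)
    return subsets
```

Because each cluster is spread evenly, the cluster proportions inside each subset approximate those of the full record set, which is the sense of "stratified" assumed here.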

Claim 2, Ryu teaches: 
The computer-implemented method of claim 1, further comprising: obtaining…data subsets of the original data set according to the number of feature representative data subsets.  (Ryu, sec. II(A) and IV(A): “The dataset includes the 1-hour interval load data of 2016 to 2017 from residential customers dispersed all around the country”  “To check the seasonal and daily characteristics, observation period is either year, seasons (summer and winter; Jul. ∼ Aug., Jan. ∼ Feb.) and daily peaks (8 am/pm).” Examiner notes that Ryu teaches partitioning records by predefined time periods).  
Ryu does not explicitly disclose but Tavakoli teaches:
…by one or more processing units… (Tavakoli, sec. VIII(A): “The experiments were executed on a Mac computer with OS X El Capital 10.11.2 operating system with 2.8 GHz Intel Core i7 and 16GB 1600 MHz DDR3”), 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Tavakoli into Ryu, as set forth above with respect to claim 1.

Claim 6, Ryu teaches:
The computer-implemented method of claim 5, wherein evaluating…a quality of data partition based on the influential weights and the partition of the feature representative data set further comprises: for each feature representative $F_i$, measuring a distribution similarity $s_i$ of the feature representative $F_i$ between the respective feature representative data subsets and the feature representative data set; and (Ryu, sec. IV: “Within our framework one may use other clustering techniques such as partitioning around medoids, hierarchical clustering, or self-organizing map. Given $z_n$, $n = 1, \ldots, N$ and K, the K-means clustering is the following optimization problem [39]: minimize $P(W, C) = \sum_{k=1}^{K} \sum_{n=1}^{N} w_{n,k} \lVert z_n - c_k \rVert_2^2$ … where W={wn,k} is an N×K partition matrix, and C={c1,…,cK} is a set of cluster centers, respectively… In this regard, the clustering validation index (CVI) can provide guidance for the goodness of clustering and the selection of cluster number K. However, the selection of K is rather a user's choice depending on the purpose of applications [14].” Examiner notes that the broadest reasonable interpretation of a “distribution similarity” is a distance as taught by Ryu where the centroid may be set as needed by the user as taught by Ryu.)
obtaining the quality of the data partition based on the distribution similarity $s_i$ and the influential weight of the feature representative $F_i$. (Ryu, sec. IV: “Within our framework one may use other clustering techniques such as partitioning around medoids, hierarchical clustering, or self-organizing map. Given $z_n$, $n = 1, \ldots, N$ and K, the K-means clustering is the following optimization problem [39]: minimize $P(W, C) = \sum_{k=1}^{K} \sum_{n=1}^{N} w_{n,k} \lVert z_n - c_k \rVert_2^2$ … where W={wn,k} is an N×K partition matrix, and C={c1,…,cK} is a set of cluster centers, respectively.” Examiner notes that the broadest reasonable interpretation of a “quality” is an attribute or characteristic such as a cluster center as taught by Ryu).
Ryu does not explicitly disclose, but Tavakoli teaches:
…by one or more processing units… (Tavakoli, sec. VIII(A): “The experiments were executed on a Mac computer with OS X El Capital 10.11.2 operating system with 2.8 GHz Intel Core i7 and 16GB 1600 MHz DDR3”),
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Tavakoli into Ryu, as set forth above with respect to claim 1.
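For reference, the K-means objective quoted from Ryu, sec. IV, can be evaluated directly for a given partition. The following is a minimal sketch assuming hard assignments (each record belongs to exactly one cluster, so the partition-matrix entry is 1 for the assigned cluster and 0 otherwise). The function name `kmeans_objective` and the sample points are hypothetical and do not come from Ryu.

```python
def kmeans_objective(points, centers, assignment):
    """Sum of squared Euclidean distances from each point to its center.

    assignment[n] = k encodes a hard partition: the partition-matrix
    entry for (n, k) is 1 and all other entries in row n are 0.
    """
    total = 0.0
    for z, k in zip(points, assignment):
        total += sum((zi - ci) ** 2 for zi, ci in zip(z, centers[k]))
    return total


# Hypothetical data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]

good = kmeans_objective(points, centers, [0, 0, 1, 1])
bad = kmeans_objective(points, centers, [0, 1, 0, 1])
```

A partition that matches the natural grouping yields a lower objective than one that mixes the groups, which is the sense in which the quoted passage treats the value as something to minimize.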

Claim 8, Ryu teaches:
randomly sampling…the feature representative data records in each cluster of the two or more clusters to form the third number of feature representative data subsets. (Ryu, fig. 15: “Illustration of the center and the samples of distinctive clusters. The first column shows the cluster centers and the remaining columns are random samples in each cluster. (a) Cluster 1, low overall loads. (b) Cluster 3, increase in summer afternoon. (c) Cluster 10, increase in summer and weak daily peaks. (d) Cluster 11, increase in summer and strong daily peaks. (e) Cluster 12, seasonality and day-time constant loads.” Examiner notes that Ryu teaches two or more clusters as well as random sampling of each cluster to form another subset of data)
Ryu does not explicitly disclose, but Tavakoli teaches:
The computer-implemented method of claim 1, wherein partitioning, by one or more processing units (Tavakoli, sec. VIII(A): “The authors implemented the algorithms in Python 2.7.13, the anaconda version…The experiments were executed on a Mac computer with OS X El Capital 10.11.2 operating system with 2.8 GHz Intel Core i7 and 16GB 1600 MHz DDR3”), the feature representative data records in the two or more clusters to form a third number of feature representative data subsets comprises: (Tavakoli, sec I: “A simple and typical clustering algorithm (e.g., KMeans) takes as input a numerical vector representing the original data and measures the distance between data items using a simple distance metric (e.g., Euclidean). The assignment of data observations to different groups is then optimized with respect to the adjustment and optimization, which is an repetitive process.” Examiner notes that Tavakoli teaches a repetitive process of adjusting and optimizing groups which, with each adjustment and optimization, would form second, third, fourth, etc., number of data subsets). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Tavakoli into Ryu, as modified, as set forth above with respect to claim 1.
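As an illustration of the limitation at issue in claim 8 (randomly sampling records within each cluster to form a set number of subsets), a record-level stratified split can be sketched as follows. This is a generic sketch under stated assumptions. It is not the feature-strata sampling described in Jing, nor a procedure disclosed in Ryu or Tavakoli. The function name `stratified_subsets` and the example labels are hypothetical.

```python
import random


def stratified_subsets(cluster_labels, num_subsets, seed=0):
    """Deal record indices into num_subsets groups, cluster by cluster.

    Records in each cluster are shuffled and then distributed round-robin,
    so every subset receives a near-equal share of every cluster.
    """
    rng = random.Random(seed)
    by_cluster = {}
    for idx, label in enumerate(cluster_labels):
        by_cluster.setdefault(label, []).append(idx)

    subsets = [[] for _ in range(num_subsets)]
    for label in sorted(by_cluster):
        members = by_cluster[label]
        rng.shuffle(members)
        for position, idx in enumerate(members):
            subsets[position % num_subsets].append(idx)
    return subsets


# Hypothetical cluster labels: six records in cluster 0, three in cluster 1.
cluster_labels = [0] * 6 + [1] * 3
subsets = stratified_subsets(cluster_labels, num_subsets=3)
```

With these labels each of the three subsets holds two records from cluster 0 and one from cluster 1, so the cluster proportions of the full set carry over to every subset.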

Claim 9, Ryu teaches:
The computer-implemented method of claim 2, wherein the features from the data subsets and the original data set is selected from the group consisting of categorical variables and continuous variables.  (Ryu, sec. II(A) and IV(A): “The dataset includes the 1-hour interval load data of 2016 to 2017 from residential customers dispersed all around the country” “To check the seasonal and daily characteristics, observation period is either year, seasons (summer and winter; Jul. ∼ Aug., Jan. ∼ Feb.) and daily peaks (8 am/pm).” Examiner notes that Ryu teaches categorical (seasons) as well as continuous features (daily peaks) in a group).  

Claim 10, Tavakoli teaches: 
The computer-implemented method of claim 1, wherein the original data set is selected from the group consisting of: an insurance domain, a banking domain, a healthcare domain, a financial domain, an entertainment domain, and a business domain. (Tavakoli, abstract: “The paper reports a case study in which financial and stock time series data of selected 70 stock indices are clustered into distinct groups using the introduced two-stage procedure”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Tavakoli into Ryu, as modified, as set forth above with respect to claim 1.

Claim 11, Ryu teaches: 
…obtain an original data set including a plurality of data records, each data record in the original data set having values of a first number of features; (Ryu, sec. II(A): “The dataset includes the 1-hour interval load data of 2016 to 2017 from residential customers dispersed all around the country”).
…determine a feature representative data set having a plurality of feature representative data records, (Ryu, sec. III(A): “Among the participants, we extract 1,405 households that have full records from March 1, 2016, to February 23, 2017 (i.e., 360 days). We select the number of days to be 360 because it is the largest number of days in a year while having small prime numbers (2,3,5) that can be used as strides in the convolutional layers. Also, by starting from March 1, one YLP covers all four seasons, and each season occupies a region of load image in order.”)
each feature representative data record having values of a second number of feature representatives, (Ryu, sec. I 4th paragraph: “After dimensionality reduction, compressed metering data (i.e., extracted features) can be utilized for smart grid applications.” Examiner notes that Ryu teaches data with features and compressed feature data such that the compressed data is the second number representative of features)
wherein the second number of feature representatives is obtained by a trained autoencoder neural network with values of the first number of features as inputs, and wherein the second number of feature representatives is smaller than the first number of feature representatives; (Ryu, secs. II, III(B) and III(C): “In the encoding step, we determine the structure of CAE and train the network with the preprocessed data. Then normalized [Yearly Load Profiles] are encoded to lower dimensional space by the CAE” “As the name suggests, CAE is the combination of autoencoder and convolutional neural network (CNN). An autoencoder is an artificial neural network which is composed of an encoder and a decoder neural network. When the input data x is fed into the autoencoder, it is nonlinearly transformed into an encoded vector z while passing through multiple fully-connected layers in the encoder network” “We select the final model structure to 1-d CAE with 4/1 layout that has the lowest RMSE on average (0.65), and we further train the selected layout for the comparison with other dimensionality reduction methods.” Examiner notes that Ryu teaches data with features and compressed feature data such that the compressed data is the second number representative of features)
wherein the autoencoder reduces a size of features associated with the original data set; (Ryu sec. I, second to last paragraph: “For feature extraction of YLP, we design CAE structure considering the characteristics of time series metering data. Through the CAE, YLPs in 8,640-dimensional space are efficiently compressed to lower dimensional vectors, which can be restored to original YLPs with less reconstruction errors than other methods”)
…segment the plurality of feature representative data records into two or more clusters based on the values of the second number of feature representatives; (Ryu, sec. I, 4th paragraph: “After dimensionality reduction, compressed metering data (i.e., extracted features) can be utilized for smart grid applications. Specifically, clustering analysis holds important position in load data mining with various applications, e.g., bad data identification, load forecasting, tariff setting, etc. [12]. Clustering after dimensionality reduction is known as indirect clustering [13] and has advantages in reducing computational complexity and feature extraction.”)
…to compute an influential weight for a feature representative in at least one of the feature representative data subsets, (Ryu, sec. III(B): “and $W_{encoder} \in \mathbb{R}^{d'}$, $b_{encoder} \in \mathbb{R}^{d'}$ are weights and bias of hidden layer, respectively.”)
program instructions to, responsive to the quality of feature representative data subset, train a supervised machine learning model with the feature representative data subsets. (Ryu, sec. III(B): “An autoencoder is an artificial neural network which is composed of an encoder and a decoder neural network. When the input data x is fed into the autoencoder, it is nonlinearly transformed into an encoded vector z while passing through multiple fully-connected layers in the encoder network. Next, from the encoded vector z, the reconstruction x is obtained through the decoder network…In training, the autoencoder updates the weights in order to minimize the reconstruction error.”)
Ryu does not explicitly disclose, but Tavakoli teaches:
A computer program product comprising: one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising: (Tavakoli, sec. VIII(A): “The authors implemented the algorithms in Python 2.7.13, the anaconda version…The experiments were executed on a Mac computer with OS X El Capital 10.11.2 operating system with 2.8 GHz Intel Core i7 and 16GB 1600 MHz DDR3”),
program instructions to… (Tavakoli, sec. VIII(A): “The authors implemented the algorithms in Python 2.7.13, the anaconda version…”)
…determine an accuracy of prediction of the autoencoder (Tavakoli, sec. I: “Deep learning is capable of optimizing a prediction model by iteratively learning new features through various internal neural networks and the neurons incorporated in each layer. To address both problems simultaneously, an autoencoder-based deep learning algorithm is utilized, in which the autoencoder not only take into accounts the hidden features but also preserves the features that are salient for computation and prediction…The results show achieving an accuracy of 87.5% in correctly predicting the cluster labels of time series data”)
…obtain the influential weight of the feature representative based on the accuracy of the prediction; and (Tavakoli sec. VI: “The adjustment of weights for the internal layers and their nodes are decided and optimized in a repetitive manner where the loss function on the output (i.e., reconstructed input) is used as a means to measure the accuracy of the clustering.” Examiner notes that Tavakoli teaches using (i.e. obtaining) weights for the step of measuring accuracy of clustering and further adjustment such that the adjusted weight is obtained based on (i.e. influences) the accuracy). 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Tavakoli into Ryu, as set forth above with respect to claim 1.
Ryu as modified does not explicitly disclose, but Jing teaches: 
…partition the feature representative data records in the two or more clusters (Jing, abstract: “ In this paper, we propose a stratified sampling method for generating subspace component data sets in ensemble clustering of high dimensional data.”)  to form a predefined number of stratified feature representative data subsets, (Jing, abstract: “Instead of randomly sampling a subset of features for each component data set, in this method we first cluster the features of high dimensional data into a few feature groups called feature strata. Using stratified sampling, we randomly sample some features from each feature stratum and merge the sampled features from different feature strata to generate a component data set.”)
wherein the feature representative data subsets provide feature distribution consistency as compared to the original data set; (Jing, sec. 3: “ That is, stratified sampling with optimal allocation gives a smaller variance than random sampling. Therefore, the component data set by stratified sampling is more representative to the whole data set than the component data set by random sampling.”)
wherein the influential weight of the feature representative in at quantifies an effect of the feature representative on prediction accuracy and (Jing, sec. 2 and Fig. 2: “For the 100 low dimensional component data sets from each sampling method, we used the k-means algorithm to cluster each data set into 3 clusters and computed the clustering accuracy of the result. We divided the 100 clustering results into 6 accuracy groups of (0,0.5], (0.5,0.6], (0.6,0.7], (0.7,0.8], (0.8,0.9], (0.9,1]). Fig. 2 plots the distributions of the clustering results in accuracy groups on the six data sets with sampling rate q = 1%.”) is computed by: program instructions to randomly change the value of one or more feature representative records and fixing values of other feature representatives records; (Jing, sec. 2: “The random sampling method randomly selected p features from X to generate a low dimensional component data set. Because any feature has the same chance to be selected in every random sampling, the component data sets may have partial common features.” Examiner notes that Jing teaches multiple sets of records, each containing feature records different from each other (i.e. random change) and same feature records as each other (i.e. fixed)).
utilizing the one or more changed feature representative records and the fixed feature representative records; (Jing, sec. 2 and Fig. 2: “For the 100 low dimensional component data sets from each sampling method, we used the k-means algorithm to cluster each data set into 3 clusters and computed the clustering accuracy of the result. We divided the 100 clustering results into 6 accuracy groups of (0,0.5], (0.5,0.6], (0.6,0.7], (0.7,0.8], (0.8,0.9], (0.9,1]). Fig. 2 plots the distributions of the clustering results in accuracy groups on the six data sets with sampling rate q = 1%.”)
…evaluate a quality of feature representative data subset based on the influential weight associated with the feature representative; and (Jing, sec. 2 and Fig. 2: “For the 100 low dimensional component data sets from each sampling method, we used the k-means algorithm to cluster each data set into 3 clusters and computed the clustering accuracy of the result. We divided the 100 clustering results into 6 accuracy groups of (0,0.5], (0.5,0.6], (0.6,0.7], (0.7,0.8], (0.8,0.9], (0.9,1]). Fig. 2 plots the distributions of the clustering results in accuracy groups on the six data sets with sampling rate q = 1%.” Examiner notes that Jing teaches accuracy as a quality of feature representative data subsets)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Jing into Ryu, as modified, as set forth above with respect to claim 1.

Claim 12, Ryu teaches: 
The computer program product of claim 11, wherein the program instructions stored on the one or more computer readable storage media further comprise: …obtain a third number of data subsets of the original data set according to the number of feature representative data subsets.  (Ryu, sec. II(A) and IV(A): “The dataset includes the 1-hour interval load data of 2016 to 2017 from residential customers dispersed all around the country”  “To check the seasonal and daily characteristics, observation period is either year, seasons (summer and winter; Jul. ∼ Aug., Jan. ∼ Feb.) and daily peaks (8 am/pm).” Examiner notes that Ryu teaches partitioning records by predefined time periods).  
Ryu does not explicitly disclose, but Tavakoli teaches: 
program instructions to…(Tavakoli, sec. VIII(A): “The authors implemented the algorithms in Python 2.7.13, the anaconda version…”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Tavakoli into Ryu, as set forth above with respect to claim 1.

Claim 15, Ryu teaches: 
…obtain an original data set including a plurality of data records, each data record in the original data set having values of a first number of features; (Ryu, sec. II(A): “The dataset includes the 1-hour interval load data of 2016 to 2017 from residential customers dispersed all around the country” ).
…determine a feature representative data set having a plurality of feature representative data records, (Ryu, sec. III(A): “Among the participants, we extract 1,405 households that have full records from March 1, 2016, to February 23, 2017 (i.e., 360 days). We select the number of days to be 360 because it is the largest number of days in a year while having small prime numbers (2,3,5) that can be used as strides in the convolutional layers. Also, by starting from March 1, one YLP covers all four seasons, and each season occupies a region of load image in order.”)
each feature representative data record having values of a second number of feature representatives, (Ryu, sec. I 4th paragraph: “After dimensionality reduction, compressed metering data (i.e., extracted features) can be utilized for smart grid applications.” Examiner notes that Ryu teaches data with features and compressed feature data such that the compressed data is the second number representative of features)
wherein the second number of feature representatives is obtained by a trained autoencoder neural network with values of the first number of features as inputs, and wherein the second number of feature representatives is smaller than the first number of feature representatives; (Ryu, secs. II, III(B) and III(C): “In the encoding step, we determine the structure of CAE and train the network with the preprocessed data. Then normalized [Yearly Load Profiles] are encoded to lower dimensional space by the CAE” “As the name suggests, CAE is the combination of autoencoder and convolutional neural network (CNN). An autoencoder is an artificial neural network which is composed of an encoder and a decoder neural network. When the input data x is fed into the autoencoder, it is nonlinearly transformed into an encoded vector z while passing through multiple fully-connected layers in the encoder network” “We select the final model structure to 1-d CAE with 4/1 layout that has the lowest RMSE on average (0.65), and we further train the selected layout for the comparison with other dimensionality reduction methods.” Examiner notes that Ryu teaches data with features and compressed feature data such that the compressed data is the second number representative of features)
wherein the autoencoder reduces a size of features associated with the original data set; (Ryu sec. I, second to last paragraph: “For feature extraction of YLP, we design CAE structure considering the characteristics of time series metering data. Through the CAE, YLPs in 8,640-dimensional space are efficiently compressed to lower dimensional vectors, which can be restored to original YLPs with less reconstruction errors than other methods”)
…segment the plurality of feature representative data records into two or more clusters based on the values of the second number of feature representatives; (Ryu, sec. I, 4th paragraph: “After dimensionality reduction, compressed metering data (i.e., extracted features) can be utilized for smart grid applications. Specifically, clustering analysis holds important position in load data mining with various applications, e.g., bad data identification, load forecasting, tariff setting, etc. [12]. Clustering after dimensionality reduction is known as indirect clustering [13] and has advantages in reducing computational complexity and feature extraction.”)
…to compute an influential weight for a feature representative in at least one of the feature representative data subsets, (Ryu, sec. III(B): “and $W_{encoder} \in \mathbb{R}^{d'}$, $b_{encoder} \in \mathbb{R}^{d'}$ are weights and bias of hidden layer, respectively.”)
responsive to the quality of feature representative data subsets, training, by one or more processing units, a supervised machine learning model with the feature representative data subsets. (Ryu, sec. III(B): “An autoencoder is an artificial neural network which is composed of an encoder and a decoder neural network. When the input data x is fed into the autoencoder, it is nonlinearly transformed into an encoded vector z while passing through multiple fully-connected layers in the encoder network. Next, from the encoded vector z, the reconstruction x is obtained through the decoder network…In training, the autoencoder updates the weights in order to minimize the reconstruction error.”)
Ryu does not explicitly disclose, but Tavakoli teaches:
A computer system comprising: one or more computer processors; one or more computer readable storage media; (Tavakoli, sec. VIII(A): “The experiments were executed on a Mac computer with OS X El Capital 10.11.2 operating system with 2.8 GHz Intel Core i7 and 16GB 1600 MHz DDR3”), 
and program instructions stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the program instructions comprising: (Tavakoli, sec. VIII(A): “The authors implemented the algorithms in Python 2.7.13, the anaconda version…The experiments were executed on a Mac computer with OS X El Capital 10.11.2 operating system with 2.8 GHz Intel Core i7 and 16GB 1600 MHz DDR3”), 
program instructions to…(Tavakoli, sec. VIII(A): “The authors implemented the algorithms in Python 2.7.13, the anaconda version…”)
… determine an accuracy of prediction of the autoencoder (Tavakoli, sec. I: “Deep learning is capable of optimizing a prediction model by iteratively learning new features through various internal neural networks and the neurons incorporated in each layer. To address both problems simultaneously, an autoencoder-based deep learning algorithm is utilized, in which the autoencoder not only take into accounts the hidden features but also preserves the features that are salient for computation and prediction…The results show achieving an accuracy of 87.5% in correctly predicting the cluster labels of time series data”)
…obtain the influential weight of the feature representative based on the accuracy of the prediction; and (Tavakoli sec. VI: “The adjustment of weights for the internal layers and their nodes are decided and optimized in a repetitive manner where the loss function on the output (i.e., reconstructed input) is used as a means to measure the accuracy of the clustering.” Examiner notes that Tavakoli teaches using (i.e. obtaining) weights for the step of measuring accuracy of clustering and further adjustment such that the adjusted weight is obtained based on (i.e. influences) the accuracy). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Tavakoli into Ryu, as set forth above with respect to claim 1.
Ryu as modified does not explicitly disclose, but Jing teaches: 
…partition the feature representative data records in the two or more clusters (Jing, abstract: “In this paper, we propose a stratified sampling method for generating subspace component data sets in ensemble clustering of high dimensional data.”) to form a predefined number of stratified feature representative data subsets (Jing, abstract: “Instead of randomly sampling a subset of features for each component data set, in this method we first cluster the features of high dimensional data into a few feature groups called feature strata. Using stratified sampling, we randomly sample some features from each feature stratum and merge the sampled features from different feature strata to generate a component data set.”); and 
wherein the feature representative data subsets provide feature distribution consistency as compared to the original data set; (Jing, sec. 3: “That is, stratified sampling with optimal allocation gives a smaller variance than random sampling. Therefore, the component data set by stratified sampling is more representative to the whole data set than the component data set by random sampling.”)
wherein the influential weight of the feature representative quantifies an effect of the feature representative on prediction accuracy and (Jing, sec. 2 and Fig. 2: “For the 100 low dimensional component data sets from each sampling method, we used the k-means algorithm to cluster each data set into 3 clusters and computed the clustering accuracy of the result. We divided the 100 clustering results into 6 accuracy groups of (0,0.5], (0.5,0.6], (0.6,0.7], (0.7,0.8], (0.8,0.9], (0.9,1]). Fig. 2 plots the distributions of the clustering results in accuracy groups on the six data sets with sampling rate $q = 1\%$.”) is computed by: program instructions to randomly change the value of one or more feature representative records and fixing values of other feature representatives records; (Jing, sec. 2: “The random sampling method randomly selected p features from X to generate a low dimensional component data set. Because any feature has the same chance to be selected in every random sampling, the component data sets may have partial common features.” Examiner notes that Jing teaches multiple sets of records, each containing feature records different from each other (i.e. random change) and same feature records as each other (i.e. fixed))
utilizing the one or more changed feature representative records and the fixed feature representative records; (Jing, sec. 2 and Fig. 2: “For the 100 low dimensional component data sets from each sampling method, we used the k-means algorithm to cluster each data set into 3 clusters and computed the clustering accuracy of the result. We divided the 100 clustering results into 6 accuracy groups of (0,0.5], (0.5,0.6], (0.6,0.7], (0.7,0.8], (0.8,0.9], (0.9,1]). Fig. 2 plots the distributions of the clustering results in accuracy groups on the six data sets with sampling rate $q = 1\%$.”) 
…evaluate a quality of feature representative data subset based on the influential weight associated with the feature representative; and (Jing, sec. 2 and Fig. 2: “For the 100 low dimensional component data sets from each sampling method, we used the k-means algorithm to cluster each data set into 3 clusters and computed the clustering accuracy of the result. We divided the 100 clustering results into 6 accuracy groups of (0,0.5], (0.5,0.6], (0.6,0.7], (0.7,0.8], (0.8,0.9], (0.9,1]). Fig. 2 plots the distributions of the clustering results in accuracy groups on the six data sets with sampling rate $q = 1\%$.” Examiner notes that Jing teaches accuracy as a quality of feature representative data subsets)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Jing into Ryu, as modified, as set forth above with respect to claim 1.
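
For illustration only, the claimed step of randomly changing the values of one feature representative while fixing the others, and then measuring the effect on prediction accuracy, can be sketched as a permutation-style computation. This is a minimal sketch under stated assumptions and is not taken from the application or from Ryu, Tavakoli, or Jing: the function names, the toy records, and the threshold model are all hypothetical, and shuffling a column is only one possible way to "randomly change" values.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(rows)

def influential_weights(model, rows, labels, seed=0):
    """Accuracy drop per feature when only that feature's values are shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    weights = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        rng.shuffle(column)  # randomly change the values of feature j
        # Rebuild each record with the shuffled value; all other features stay fixed.
        perturbed = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
        weights.append(baseline - accuracy(model, perturbed, labels))
    return weights

def model(row):
    """Hypothetical stand-in for a trained model: thresholds feature 0 only."""
    return int(row[0] > 0.5)

# Hypothetical toy data in which the label depends only on feature 0.
rows = [[0.1, 5.0], [0.9, 5.0], [0.2, 7.0], [0.8, 7.0], [0.3, 6.0], [0.7, 6.0]]
labels = [0, 1, 0, 1, 0, 1]
weights = influential_weights(model, rows, labels)
```

Under these assumptions the weight of feature 1 is zero because the stand-in model never reads it, while the weight of feature 0 reflects however much the shuffle degrades accuracy.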

Claim 16, Ryu teaches: 
The computer system of claim 15, wherein the program instructions further comprise: 
program instructions to obtain a third number of data subsets of the original data set according to the number of feature representative data subsets.  (Ryu, fig. 15: “Illustration of the center and the samples of distinctive clusters. The first column shows the cluster centers and the remaining columns are random samples in each cluster. (a) Cluster 1, low overall loads. (b) Cluster 3, increase in summer afternoon. (c) Cluster 10, increase in summer and weak daily peaks. (d) Cluster 11, increase in summer and strong daily peaks. (e) Cluster 12, seasonality and day-time constant loads.” Examiner notes that Ryu teaches two or more clusters as well as random sampling of each cluster to form another subset of data)
Ryu does not explicitly disclose, but Tavakoli teaches: 
program instructions to…(Tavakoli, sec. VIII(A): “The authors implemented the algorithms in Python 2.7.13, the anaconda version…”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Tavakoli into Ryu, as set forth above with respect to claim 1.
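
For illustration only, one way to read the claim 16 limitation is that the clustered feature representative records are dealt into a fixed number of stratified subsets, and the same record indices are then used to form an equal number of subsets of the original data set. The sketch below assumes that reading; the round-robin allocation, the function name, and the placeholder records are hypothetical and are not drawn from the application or the cited references.

```python
from collections import defaultdict

def stratified_partition(cluster_labels, n_subsets):
    """Deal record indices from each cluster round-robin into n_subsets."""
    by_cluster = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        by_cluster[label].append(idx)
    subsets = [[] for _ in range(n_subsets)]
    slot = 0
    for label in sorted(by_cluster):
        for idx in by_cluster[label]:
            subsets[slot % n_subsets].append(idx)
            slot += 1
    return subsets

# Hypothetical cluster assignments for ten feature representative records.
cluster_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
# Placeholder original records, aligned by index with the encoded records.
original_records = [f"record-{i}" for i in range(len(cluster_labels))]

index_subsets = stratified_partition(cluster_labels, n_subsets=2)
# The same indices select the matching subsets of the original data set.
original_subsets = [[original_records[i] for i in s] for s in index_subsets]
```

In this toy case each subset holds two records from cluster 0 and three from cluster 1, matching the 4:6 cluster proportion of the whole set.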

Claim 19, Ryu teaches:
The computer system of claim 15, wherein the program instructions further comprise…evaluate a quality of data partition based on the influential weights and the feature representative data subsets.  (Ryu, sec. IV: “Within our framework one may use other clustering techniques such as partitioning around medoids, hierarchical clustering, or self-organizing map. Given $z_n$, $n = 1, \ldots, N$ and K, the K-means clustering is the following optimization problem [39]: minimize $P(W, C) = \sum_{k=1}^{K} \sum_{n=1}^{N} w_{n,k} \lVert z_n - c_k \rVert_2^2$ … where $W = \{w_{n,k}\}$ is an N×K partition matrix, and $C = \{c_1, \ldots, c_K\}$ is a set of cluster centers, respectively.” Examiner notes that the broadest reasonable interpretation of a “quality” is an attribute or characteristic such as a cluster center as taught by Ryu).
Ryu does not explicitly disclose, but Tavakoli teaches: 
program instructions to…(Tavakoli, sec. VIII(A): “The authors implemented the algorithms in Python 2.7.13, the anaconda version…”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Tavakoli into Ryu, as set forth above with respect to claim 1.
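
For illustration only, the K-means objective quoted from Ryu, sec. IV, can be evaluated directly for a given assignment of encoded vectors to cluster centers. The sketch assumes a hard partition, that is, each $w_{n,k}$ is 1 when record n is assigned to cluster k and 0 otherwise; the points, centers, and function name are hypothetical.

```python
def kmeans_objective(points, centers, assignment):
    """P(W, C): sum of squared Euclidean distances from each point to its assigned center."""
    total = 0.0
    for z, k in zip(points, assignment):
        total += sum((zi - ci) ** 2 for zi, ci in zip(z, centers[k]))
    return total

points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]  # hypothetical encoded vectors z_n
centers = [(0.0, 1.0), (4.0, 1.0)]                          # hypothetical cluster centers c_k
assignment = [0, 0, 1, 1]                                   # w_{n,k} = 1 iff assignment[n] == k

objective = kmeans_objective(points, centers, assignment)          # 4.0
swapped = kmeans_objective(points, centers, [1, 1, 0, 0])          # 68.0
```

The value measures how tightly records sit around their assigned centers; it does not by itself compare a subset's feature distribution against the full data set.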

Claim 20, Ryu teaches: 
The computer system of claim 19, wherein evaluating a quality of data partition based on the influential weights and the partition of the feature representative data set further comprises: for each feature representative $F_i$ program instructions to measure a distribution similarity $s_i$ of the feature representative $F_i$ between the respective feature representative data subsets and the feature representative data set; and (Ryu, sec. IV: “Within our framework one may use other clustering techniques such as partitioning around medoids, hierarchical clustering, or self-organizing map. Given $z_n$, $n = 1, \ldots, N$ and K, the K-means clustering is the following optimization problem [39]: minimize $P(W, C) = \sum_{k=1}^{K} \sum_{n=1}^{N} w_{n,k} \lVert z_n - c_k \rVert_2^2$ … where $W = \{w_{n,k}\}$ is an N×K partition matrix, and $C = \{c_1, \ldots, c_K\}$ is a set of cluster centers, respectively… In this regard, the clustering validation index (CVI) can provide guidance for the goodness of clustering and the selection of cluster number K. However, the selection of K is rather a user's choice depending on the purpose of applications [14].” Examiner notes that the broadest reasonable interpretation of a “distribution similarity” is a distance as taught by Ryu where the centroid may be set as needed by the user as taught by Ryu.)
…obtain the quality of the data partition based on the distribution similarity $s_i$ and the influential weight $w_i$ of the feature representative $F_i$. (Ryu, sec. IV: “Within our framework one may use other clustering techniques such as partitioning around medoids, hierarchical clustering, or self-organizing map. Given $z_n$, $n = 1, \ldots, N$ and K, the K-means clustering is the following optimization problem [39]: minimize $P(W, C) = \sum_{k=1}^{K} \sum_{n=1}^{N} w_{n,k} \lVert z_n - c_k \rVert_2^2$ … where $W = \{w_{n,k}\}$ is an N×K partition matrix, and $C = \{c_1, \ldots, c_K\}$ is a set of cluster centers, respectively.” Examiner notes that the broadest reasonable interpretation of a “quality” is an attribute or characteristic such as a cluster center as taught by Ryu).
Ryu does not explicitly disclose: 
program instructions to…
However, Tavakoli teaches: 
program instructions to…(Tavakoli, sec. VIII(A): “The authors implemented the algorithms in Python 2.7.13, the anaconda version…”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Tavakoli into Ryu, as set forth above with respect to claim 1.
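
For illustration only, a per-feature distribution similarity $s_i$ between a data subset and the full data set can be sketched as a histogram intersection. Neither the claims as quoted here nor the cited references specify this metric; the choice of histogram intersection, the bin count, the function names, and the sample values are all assumptions made for the example.

```python
def histogram(values, lo, hi, bins):
    """Normalized histogram of values over [lo, hi] using equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        b = min(int((v - lo) / width), bins - 1)  # clamp the right edge into the last bin
        counts[b] += 1
    return [c / len(values) for c in counts]

def distribution_similarity(subset_values, full_values, bins=4):
    """Histogram intersection in [0, 1]; 1.0 means identical binned distributions."""
    lo, hi = min(full_values), max(full_values)
    if hi == lo:  # a constant feature has the same distribution everywhere
        return 1.0
    p = histogram(subset_values, lo, hi, bins)
    q = histogram(full_values, lo, hi, bins)
    return sum(min(a, b) for a, b in zip(p, q))

full = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical values of one feature F_i in the full set
balanced = [1, 3, 5, 7]           # subset spread across the full range
skewed = [1, 2, 3, 4]             # subset drawn from the lower half only

s_balanced = distribution_similarity(balanced, full)  # 1.0
s_skewed = distribution_similarity(skewed, full)      # 0.5
```

A subset that mirrors the full set scores 1.0, while the subset drawn from only half the range scores 0.5 under this particular metric.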

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Ryu in view of Tavakoli, in view of Jing, and further in view of Sharma, Aditya (“Activation Functions in Deep Learning – A Complete Overview”, LearnOpenCV, 30 Oct 2017; hereinafter “Sharma”).

Claim 7, Ryu teaches:
The computer-implemented method of claim 6, wherein the quality of the data partition is obtained utilizing the following formula:  
    [Formula image not reproduced: media_image1.png]
 (Sharma, sec. 1, second figure: “In the above figure, $x_1, \ldots, x_n$ is the signal vector that gets multiplied with the weight $w_1, w_2, \ldots, w_n$. This is followed by accumulation (i.e., summation + addition of bias $b$). Finally, an activation function $f$ is applied to this sum.” Examiner notes that the equation claimed is the same as taught by Sharma where the bias, $b$, is zero)
wherein $q$ is the quality of the data partition, $s_i$ is the distribution similarity and $w_i$ is the influential weight of the feature representative $F_i$. (Sharma, sec. 1, second figure: “An example of a neuron showing the input $x_1$ - $x_n$, their corresponding weights $w_1$ - $w_n$, a bias $b$ and the activation function $f$ applied to the weighted sum of the inputs.”) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Sharma into Ryu, as modified. Sharma teaches different activation functions. One of ordinary skill would have been motivated to combine the teachings of Sharma into Ryu, as modified, in order to determine which activation function is better (Sharma, second paragraph). 
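
For illustration only, the Examiner's note that the claimed equation matches Sharma's neuron with a zero bias can be sketched as a weighted sum of the distribution similarities $s_i$ and influential weights $w_i$. The claimed formula itself appears only as an image in this action, so the weighted-sum form is inferred from that note; the identity activation, the function name, and the numeric values are assumptions.

```python
def neuron_output(inputs, weights, bias=0.0, activation=lambda t: t):
    """Sharma-style neuron: activation(sum_i w_i * x_i + b)."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)

similarities = [1.0, 0.5, 0.75]   # hypothetical s_i values, one per feature representative
influences = [0.5, 0.25, 0.25]    # hypothetical w_i values for the same features

# With zero bias and an identity activation, the neuron reduces to sum_i w_i * s_i.
q = neuron_output(similarities, influences)  # 0.8125
```

Any non-identity activation or non-zero bias would change the result, so the equivalence holds only under the stated assumptions.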

Response to Applicant Remarks and Arguments
Claim Rejections – 35 U.S.C. §101
Beginning at the bottom of page 11 of applicant’s remarks, applicant argues that the pending claims recite patent-eligible improvements to a computer technology. In particular, applicant argues that the claimed invention solves the problem of imbalanced data. However, applicant’s claims include processes of observation, evaluation, judgment, or opinion, each of which may be performed entirely within the human mind or with the assistance of a pen and paper, as set forth above.
At the bottom of page 12 of applicant’s remarks, applicant further argues that independent claims 1, 11, and 15 are analogous to USPTO Example 45. However, applicant’s manipulation of data is not analogous to the mold injection method claimed in Example 45. While the claims in Example 45 recited specific processes for improving the physical functioning of a mold injection system, applicant’s claims involve randomly changing values and determining an accuracy of prediction. Randomization and determination of accuracy neither integrate the judicial exception into a practical application nor amount to significantly more. 

Claim Rejections – 35 U.S.C. §103  
	Beginning at the bottom of page 15 of applicant’s remarks, applicant argues that the cited art does not teach the claims. Applicant specifically argues that Ryu and Tavakoli do not teach the claim limitations asserted in claim 1. However, in light of applicant’s amendments, claim 1 now stands rejected as unpatentable over Ryu in view of Tavakoli, and further in view of Jing for the reasons set forth in the rejection above. 
	At the top of page 17, applicant specifically argues that Ryu and Tavakoli do not teach “computing, by one or more processing units, an influential weight for a feature representative in at least one of the feature representative data subsets, where the influential weight of the feature representative quantifies an effect of the feature representative on prediction accuracy…” Applicant argues that the weights are not “influential weights”. However, it is unclear how applicant distinguishes an influential weight from a standard weight. All weights create a bias on a feature wherein such weight influences the feature; in a neural network, the various weights on the various layers manipulating the features ultimately affect prediction accuracy. Where there is any weight at all, that weight will influence accuracy. Where the weight is zero, the weight could be considered non-influential, as a zero weight may be ignored. Given this lack of clarity, a person of ordinary skill in the art could reasonably interpret any non-zero weight as an “influential weight”. 
	At the bottom of page 17 of applicant’s remarks, applicant argues that independent claims 11 and 15 distinguish over the prior art similarly to claim 1. Applicant further argues that dependent claims 2-10, 12-14, and 16-20 are similarly not obvious due to their dependency from claims 1, 11, and 15. However, dependent claims 3-5, 13-14, and 17-18 have been cancelled. Nevertheless, as to independent claims 11 and 15 and the remaining dependent claims: Applicant does not set forth any additional arguments as to the aforementioned claims, which therefore remain rejected for the same reasons as claim 1. As the prior art rejections of independent claims 1, 11, and 15 have not been overcome, the dependent claims remain rejected as well. 
 
Claim 7 Rejection – 35 U.S.C. §103  
	At the bottom of page 15 of applicant’s remarks, applicant further argues that claim 7 is allowable as a result of its dependency on claim 1. However, as claim 1 stands rejected under 35 U.S.C. §101 and 35 U.S.C. §103, claim 7 likewise remains rejected at least as a result of its dependency on claim 1 and also for the additional reasons set forth above. 

Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sally T. Ley whose telephone number is (571)272-3406. The examiner can normally be reached Monday - Thursday, 10:00am - 6:00pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/STL/Examiner, Art Unit 2147                                                                                                                                                                                                        
/ERIC NILSSON/Primary Examiner, Art Unit 2151
Prosecution Timeline

Apr 25, 2025: Non-Final Rejection mailed (§101, §103)
Jun 30, 2025: Interview Requested
Jul 16, 2025: Examiner Interview Summary
Jul 16, 2025: Applicant Interview (Telephonic)
Jul 25, 2025: Response Filed
Aug 21, 2025: Final Rejection mailed (§101, §103)
Sep 16, 2025: Interview Requested
Oct 21, 2025: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

17/981,796
Patent 12632746
A METHOD AND APPARATUS FOR DISPLAYING CATEGORIZED CARBON EMISSIONS
3y 6m to grant Granted May 19, 2026
16/733,393
Patent 12443830
COMPRESSED WEIGHT DISTRIBUTION IN NETWORKS OF NEURAL PROCESSORS
5y 9m to grant Granted Oct 14, 2025
16/835,892
Patent 12135927
EXPERT-IN-THE-LOOP AI FOR MATERIALS DISCOVERY
4y 7m to grant Granted Nov 05, 2024
17/992,958
Patent 11880776
GRAPH NEURAL NETWORK (GNN)-BASED PREDICTION SYSTEM FOR TOTAL ORGANIC CARBON (TOC) IN SHALE
1y 2m to grant Granted Jan 23, 2024
Study what changed to get past this examiner. Based on 4 most recent grants.
Prosecution Projections

Expected OA Rounds: 4-5
Grant Probability: 19%
Grant Probability With Interview: 53% (+33.3%)
Median Time to Grant: 4y 8m (~0m remaining)
PTA Risk: High
Based on 36 resolved cases by this examiner. Grant probability derived from career allowance rate.