Prosecution Insights
Last updated: April 19, 2026
Application No. 18/214,024

Systems and Methods for Programmatic Labeling of Training Data for Machine Learning Models via Clustering

Non-Final OA §101 §103 §112 §DP
Filed
Jun 26, 2023
Examiner
HADDAD, MAJD MAHER
Art Unit
2125
Tech Center
2100 — Computer Architecture & Software
Assignee
Snorkel AI Inc.
OA Round
1 (Non-Final)
Grant Probability
Favorable
OA Rounds
1-2
To Grant
3y 3m

Examiner Intelligence

Career Allow Rate
0% (grants only 0% of cases; 0 granted / 0 resolved; -55.0% vs TC avg)
Interview Lift
+0.0% (minimal lift; based on resolved cases with interview)
Avg Prosecution
3y 3m (typical timeline)
Total Applications
21 (career history; 21 currently pending across all art units)

Statute-Specific Performance

§101
36.1%
-3.9% vs TC avg
§103
44.6%
+4.6% vs TC avg
§102
1.2%
-38.8% vs TC avg
§112
10.8%
-29.2% vs TC avg
Black line = Tech Center average estimate • Based on career data from 0 resolved cases

Office Action

§101 §103 §112 §DP
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-20 are presented for examination.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on January 14, 2023 and October 3, 2023 were filed. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Drawings

The drawings are objected to because Figures 2, 3(a), 3(c), 3(d), and 3(e) are difficult to examine, as the annotations and words are too small to be legible. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C.
112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 8, 14, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. The phrase “the method proceeds as described” renders the scope of the claims unclear, as it fails to specify which limitations are being incorporated.

The following is a quotation of 35 U.S.C. 112(d):

(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:

Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claims 8, 14, and 20 are rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C.
112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which they depend, or for failing to include all the limitations of the claim upon which they depend. These claims recite “instead of” language which attempts to replace or remove a limitation of the base claim, rather than further limiting and incorporating all of the limitations of the claim upon which it depends. Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Claim 1

Step 1: The claim recites a method; therefore, it is directed to the statutory category of processes.

Step 2A, Prong 1: The claim recites, inter alia:

[G]enerating a real-valued representation for each datapoint in a dataset: This limitation is a mental process because it involves analyzing information and creating a representation of a datapoint by expressing it numerically, which can be performed in the human mind.

based on a similarity between the generated representations, forming one or more groups or clusters of datapoints: This limitation recites a mental process because it involves grouping representations into clusters, which can be performed in the human mind.
[R]epresenting each formed group or cluster by a unique identifier: This limitation recites a mental process because it assigns a label to each group or cluster, which is mentally performable.

for each new datapoint… and determining a most likely cluster or group to which the new datapoint is assigned: This is a mental process because it involves determining which group the new datapoint belongs to.

[A]ssigning a label to the new datapoint based on the identifier of the cluster or group to which the new datapoint is assigned: This is a mental process because it applies a label to an item based on a comparison, which can be performed in the human mind.

Step 2A, Prong 2: This judicial exception is not integrated into a practical application because the additional elements are as follows:

[T]raining a machine learning model, comprising… and using a plurality of new datapoints and the new datapoints' assigned labels to train a machine learning model: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).

[U]sing the new datapoint as input to each trained classifier: This is mere data gathering recited at a high level of generality, and thus insignificant extra-solution activity (MPEP 2106.05(g)).

for each group or cluster, training a classifier to classify a datapoint as either inside or outside the group or cluster: This limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception, which does not amount to significantly more than the exception itself (MPEP 2106.05(h)).
[S]toring each trained classifier and associating the stored trained classifier with the cluster or group's unique identifier: This limitation is merely a post-solution step of storing data, a nominal addition that does not meaningfully limit the claim. The storing is recited at a high level of generality, and simply implementing the abstract idea in a generic method is not a practical application of the abstract idea. Therefore, the storing step is insignificant extra-solution activity. See MPEP 2106.05(g).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows:

[T]raining a machine learning model, comprising… and using a plurality of new datapoints and the new datapoints' assigned labels to train a machine learning model: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, and cannot provide an inventive concept (MPEP 2106.05(f)).

[U]sing the new datapoint as input to each trained classifier: This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. As discussed above, this receiving step amounts to no more than mere data gathering. Receiving data over a network is well-understood, routine, conventional activity. See MPEP 2106.05(d), subsection II(i). This cannot provide an inventive concept.
for each group or cluster, training a classifier to classify a datapoint as either inside or outside the group or cluster: This limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception, which does not amount to significantly more than the exception itself and cannot provide an inventive concept (MPEP 2106.05(h)).

[S]toring each trained classifier and associating the stored trained classifier with the cluster or group's unique identifier: This limitation is merely a post-solution step of storing data, a nominal addition that does not meaningfully limit the claim. The storing is recited at a high level of generality, and simply implementing the abstract idea in a generic method is not a practical application of the abstract idea. Therefore, the storing step is insignificant extra-solution activity. See MPEP 2106.05(g).

The elements in combination as an ordered whole still do not amount to significantly more than the judicial exception (i.e., the abstract ideas of mental processes for generating representations, grouping data based on similarity, and assigning identifiers). The claim merely describes a process of analyzing data, forming clusters based on similarity comparisons, labeling those clusters, and assigning new datapoints to the most likely cluster, which constitutes the abstract idea of data evaluation and organization. The recitation of training a machine learning model using classifiers, receiving datapoints as input, and storing data merely indicates a technological environment in which the abstract ideas are applied, without improving the functioning of a computer or any machine learning technology itself. Therefore, the claim as a whole remains focused on the abstract idea and fails Step 2B of the eligibility analysis.

Claim 2

Step 1: A process, as above.
Step 2A, Prong 1: The claim recites, inter alia: the real-valued representation for each datapoint in a dataset is generated by an embedding process: This limitation is a mental process because it involves generating a representation of data and expressing characteristics of the datapoint numerically, which is mentally performable.

Step 2A, Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A, Prong 2. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 3

Step 1: A process, as above.

Step 2A, Prong 1: The claim recites, inter alia: the embedding process is a text embedding process: This limitation is a mental process because it involves analyzing textual information and representing it using numbers.

Step 2A, Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A, Prong 2. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 4

Step 1: A process, as above.

Step 2A, Prong 1: The claim recites, inter alia: the unique identifier is based on one or more attributes of a datapoint or datapoints in the cluster or group: This limitation is a mental process because it involves evaluating characteristics of information and assigning a label based on the characteristics of the group.
Step 2A, Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A, Prong 2. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 5

Step 1: A process, as above.

Step 2A, Prong 1: The claim recites, inter alia: determining a most likely cluster or group to which the new datapoint should be assigned further comprises determining the cluster or group associated with the trained classifier having the highest level of certainty in its output: This limitation is a mental process because it involves determining which group a datapoint should be assigned to and which classifier has the highest level of certainty, which is mentally performable.

Step 2A, Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A, Prong 2. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 6

Step 1: A process, as above.

Step 2A, Prong 1: The claim recites, inter alia: the similarity between the generated representations is determined based on a metric: This limitation recites a mathematical concept because it uses a mathematical equation to calculate the similarity between representations.
Step 2A, Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A, Prong 2. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 7

Step 1: A process, as above.

Step 2A, Prong 1: The claim recites, inter alia: the metric is one of Manhattan distance, Euclidean distance, or Cosine distance: This limitation recites a mathematical concept because it uses a mathematical equation to calculate the similarity between representations.

Step 2A, Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A, Prong 2. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 8

Step 1: A process, as above.

Step 2A, Prong 1: The claim recites, inter alia: instead of generating a real-valued representation for each datapoint in a dataset, a plurality of real-valued representations for each datapoint in a dataset are generated, and for each of the plurality of representations, the method proceeds as described: This is a mental process because it involves creating numerical representations of datapoints.

Step 2A, Prong 2 and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A, Prong 2. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 9

Step 1: The claim recites a system; therefore, it is directed to the statutory category of machines.

Step 2A, Prong 1: The claim recites, inter alia: generate a real-valued representation for each datapoint in a dataset; based on a similarity between the generated representations: This limitation recites a mental process because it involves creating a numerical representation for each datapoint.

Step 2A, Prong 2: This judicial exception is not integrated into a practical application because the additional elements are as follows:

A system, comprising: one or more electronic processors configured to execute a set of computer-executable instructions; and one or more non-transitory electronic data storage media containing the set of computer-executable instructions, wherein when executed, the instructions cause the one or more electronic processors: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows:

A system, comprising: one or more electronic processors configured to execute a set of computer-executable instructions; and one or more non-transitory electronic data storage media containing the set of computer-executable instructions, wherein when executed, the instructions cause the one or more electronic processors: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, and cannot provide an inventive concept (MPEP 2106.05(f)).

The remainder of claim 9 recites limitations identical to claim 1; therefore, claim 9 is rejected using the same rationale as claim 1. Claim 10 recites limitations identical to claim 2 and is rejected using the same rationale. Claim 11 recites limitations identical to claim 4 and is rejected using the same rationale. Claim 12 recites limitations identical to claim 5 and is rejected using the same rationale. Claim 13 recites limitations identical to claims 6 and 7 and is rejected using the same rationale. Claim 14 recites limitations identical to claim 8 and is rejected using the same rationale.

Claim 15

Step 1: The claim recites a non-transitory computer medium; therefore, it is directed to the statutory category of manufactures.

Step 2A, Prong 1: The claim recites, inter alia: generate a real-valued representation for each datapoint in a dataset: This limitation recites a mental process because it involves creating a numerical representation for each datapoint.
Step 2A, Prong 2: This judicial exception is not integrated into a practical application because the additional elements are as follows:

One or more non-transitory computer-readable media comprising a set of computer-executable instructions that when executed by one or more programmed electronic processors, cause the processors to: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows:

One or more non-transitory computer-readable media comprising a set of computer-executable instructions that when executed by one or more programmed electronic processors, cause the processors to: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, and cannot provide an inventive concept (MPEP 2106.05(f)).

The remainder of claim 15 recites limitations identical to claim 1; therefore, claim 15 is rejected using the same rationale as claim 1. Claim 16 recites limitations identical to claim 2 and is rejected using the same rationale. Claim 17 recites limitations identical to claim 4 and is rejected using the same rationale. Claim 18 recites limitations identical to claim 5 and is rejected using the same rationale. Claim 19 recites limitations identical to claims 6 and 7 and is rejected using the same rationale. Claim 20 recites limitations identical to claim 8 and is rejected using the same rationale.
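Eligibility aside, the workflow the rejection walks through (embed each datapoint, group by similarity, give each group a unique identifier, train an inside/outside scorer per group, then label a new datapoint by the most confident group) can be sketched in a few lines. This is a minimal illustration, not the applicant's implementation: the `embed` function, the greedy distance-threshold grouping, and the centroid-distance "classifier" are all hypothetical stand-ins.

```python
import math

def embed(datapoint):
    # Hypothetical embedding: map an (x, y) tuple to a real-valued vector.
    return [float(datapoint[0]), float(datapoint[1])]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def form_clusters(vectors, threshold=2.0):
    # Greedy grouping: a vector joins the first cluster whose centroid
    # is within `threshold`; otherwise it starts a new cluster.
    clusters = {}  # unique identifier -> list of member vectors
    for v in vectors:
        for cid, members in clusters.items():
            centroid = [sum(c) / len(members) for c in zip(*members)]
            if euclidean(v, centroid) <= threshold:
                members.append(v)
                break
        else:
            clusters[f"cluster-{len(clusters)}"] = [v]
    return clusters

def train_inside_outside_classifiers(clusters):
    # Stand-in "classifier" per cluster: score a point by its negative
    # distance to the cluster centroid (higher score = more likely inside).
    classifiers = {}
    for cid, members in clusters.items():
        centroid = [sum(c) / len(members) for c in zip(*members)]
        classifiers[cid] = lambda v, c=centroid: -euclidean(v, c)
    return classifiers

def label_new_datapoint(datapoint, classifiers):
    v = embed(datapoint)
    # Run every stored classifier; the cluster identifier with the
    # highest score becomes the new datapoint's label.
    return max(classifiers, key=lambda cid: classifiers[cid](v))

dataset = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
clusters = form_clusters([embed(d) for d in dataset])
classifiers = train_inside_outside_classifiers(clusters)
print(label_new_datapoint((0.5, 0.5), classifiers))    # a point near the first group
print(label_new_datapoint((10.5, 10.2), classifiers))  # a point near the second group
```

A real system would substitute a learned embedding and trained binary classifiers for the stand-ins, but the labeling loop is structurally the same.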
Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.
Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-3 and 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Xie (“Unsupervised Deep Embedding for Clustering Analysis”, 2016) in view of Rifkin (“In Defense of One-Vs-All Classification”, 2004) and Wang (CN-110020147-A).

Regarding claim 1, Xie teaches based on a similarity between the generated representations, forming one or more groups or clusters of datapoints (Page 3, Section 3.1: “In the first step, we compute a soft assignment between the embedded points and the cluster centroids.” Xie teaches the soft assignment of embedded points to centroids based on similarity in the embedding space to group similar points into clusters.); representing each formed group or cluster by a unique identifier (Page 2, Section 3: “Consider the problem of clustering a set of n points {xi ∈ X}, i = 1, ..., n, into k clusters, each represented by a centroid µj, j = 1, ..., k.
Instead of clustering directly in the data space X, we propose to first transform the data with a non-linear mapping fθ : X → Z, where θ are learnable parameters and Z is the latent feature space… The proposed algorithm (DEC) clusters data by simultaneously learning a set of k cluster centers {µj ∈ Z}, j = 1, ..., k, in the feature space Z and the parameters θ of the DNN that maps data points into Z.”, Page 5, Section 4.2: “For all algorithms we set the number of clusters to the number of ground-truth categories… ci is the cluster assignment produced by the algorithm, and m ranges over all possible one-to-one mappings between clusters and labels.” Each cluster is represented by a specific centroid ci, which corresponds to a unique identifier for that cluster.); assigning a label to the new datapoint based on the identifier of the cluster or group to which the new datapoint is assigned (Page 3, Section 3.1: “In the first step, we compute a soft assignment between the embedded points and the cluster centroids...
embedding, α are the degrees of freedom of the Student’s t distribution and qij can be interpreted as the probability of assigning sample i to cluster j (i.e., a soft assignment).”, Page 5, Section 4.2: “For all algorithms we set the number of clusters to the number of ground-truth categories… ci is the cluster assignment produced by the algorithm, and m ranges over all possible one-to-one mappings between clusters and labels.” Xie’s Deep Embedded Clustering algorithm assigns each datapoint to a specific cluster index ci, which serves as its label.); and using a plurality of new datapoints and the new datapoints' assigned labels to train a machine learning model (Page 2, Section 3: “Consider the problem of clustering a set of n points {xi ∈ X}, i = 1, ..., n, into k clusters, each represented by a centroid µj, j = 1, ..., k… The proposed algorithm (DEC) clusters data by simultaneously learning a set of k cluster centers {µj ∈ Z}, j = 1, ..., k, in the feature space Z and the parameters θ of the DNN that maps data points into Z.”, Page 3, Section 3.1.2: “Specifically, our model is trained by matching the soft assignment to the target distribution.” Xie teaches assigning n datapoints to labeled clusters, which is the process of training the machine learning model.).

Xie does not teach for each group or cluster, training a classifier to classify a datapoint as either inside or outside the group or cluster; storing each trained classifier and associating the stored trained classifier with the cluster or group's unique identifier; or for each new datapoint, using the new datapoint as input to each trained classifier and determining a most likely cluster or group to which the new datapoint is assigned.
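The soft assignment Xie is quoted on computes, for each embedded point, a Student's t kernel against every centroid and normalizes across centroids. A minimal sketch of that formula (the two-dimensional point and centroids below are made up for illustration):

```python
def soft_assignment(z, centroids, alpha=1.0):
    # qj is proportional to (1 + ||z - mu_j||^2 / alpha) ** (-(alpha + 1) / 2),
    # a Student's t kernel, normalized over all centroids so the qj sum to 1.
    kernels = []
    for mu in centroids:
        sq_dist = sum((zi - mi) ** 2 for zi, mi in zip(z, mu))
        kernels.append((1.0 + sq_dist / alpha) ** (-(alpha + 1) / 2))
    total = sum(kernels)
    return [k / total for k in kernels]

# A point sitting close to the first of two centroids.
q = soft_assignment([0.1, 0.0], [[0.0, 0.0], [5.0, 5.0]])
print(q)  # most of the probability mass goes to the first centroid
```

With alpha = 1 (as in the paper's experiments), the kernel reduces to 1 / (1 + squared distance), which is why nearby centroids dominate the assignment.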
Rifkin, in the same field of endeavor, teaches for each group or cluster, training a classifier to classify a datapoint as either inside or outside the group or cluster (Page 2, Introduction: “One of the simplest multiclass classification schemes built on top of real-valued binary classifiers is to train N different binary classifiers, each one trained to distinguish the examples in a single class from the examples in all remaining classes. When it is desired to classify a new example, the N classifiers are run, and the classifier which outputs the largest (most positive) value is chosen.” Each binary classifier treats datapoints belonging to its assigned class as being either inside or outside the group when classifying new datapoints. When classifying a new datapoint, each classifier mapped to a class takes the point as input and determines whether the datapoint belongs to that particular class.); storing each trained classifier and associating the stored trained classifier with the cluster or group's unique identifier (Page 2, Introduction: “…train N different binary classifiers, each one trained to distinguish the examples in a single class from the examples in all remaining classes.”, Page 11, Section 3.1.2: “On the test set, the one-vs-all system has an error rate…”, Page 9, Section 3.1: “There is an additional problem with the claim that the single-machine approach requires fewer support vectors, which is that in the OVA (or AVA) case, it is computationally easy to “reuse” support vectors that appear in multiple machines, leading to a large reduction in the total computational costs.” Each binary classifier is trained for a specific class and is labeled with that class. The classifier and its corresponding class are saved when the model is run to predict new datapoints. Rifkin trains N binary classifiers, each corresponding to a specific class, evaluates them on a test set, and reuses the SVM classifiers.
If the classifiers are reused, then the learned parameters were stored in memory to predict new datapoints.); for each new datapoint, using the new datapoint as input to each trained classifier and determining a most likely cluster or group to which the new datapoint is assigned (Page 2, Introduction: “When it is desired to classify a new example, the N classifiers are run, and the classifier which outputs the largest (most positive) value is chosen. This scheme will be referred to as the “one-vs-all…”” Rifkin teaches using the new datapoint as input to each trained classifier to see which classifier produces the highest score, in order to determine the class to which the datapoint belongs.).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Xie’s teaching of generating and clustering embedded representations of datapoints into groups identified by labels with Rifkin’s teaching of training a separate classifier for each class to determine which class a new example belongs to, in order to provide a more efficient mechanism for assigning new datapoints to the most likely cluster/class (Introduction of Rifkin).

Xie and Rifkin do not teach [a] method of training a machine learning model, comprising: generating a real-valued representation for each datapoint in a dataset.
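Rifkin's one-vs-all rule is compact when stated in code: run every binary classifier on the new example and pick the class whose classifier outputs the largest (most positive) value. The linear scorers below are hypothetical stand-ins for trained binary classifiers:

```python
def one_vs_all_predict(x, classifiers):
    # Run every binary classifier on x; the class whose classifier
    # outputs the largest (most positive) value wins.
    return max(classifiers, key=lambda label: classifiers[label](x))

# Hypothetical scorers standing in for N trained binary classifiers.
classifiers = {
    "class-A": lambda x: x[0] - x[1],
    "class-B": lambda x: x[1] - x[0],
    "class-C": lambda x: -abs(x[0]) - abs(x[1]),
}
print(one_vs_all_predict([3.0, 1.0], classifiers))  # class-A scores highest
print(one_vs_all_predict([1.0, 4.0], classifiers))  # class-B scores highest
```

The same rule implements the "highest level of certainty" selection recited in claim 5, treating each classifier's raw output as its confidence.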
Wang, in the same field of endeavor, teaches [a] method of training a machine learning model, comprising: generating a real-valued representation for each datapoint in a dataset (Page 5 Paragraph 3, “S31, using word2vec (a word representation as tool real-valued vector) of the first intermediate data to calculate the vector of each of the words;”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Xie and Rifkin’s teachings with Wang’s real-valued representation for each datapoint in order to enhance the model’s learning and prediction capability (Page 7 Paragraph 3 of Wang). Regarding claim 2, Xie teaches the real-valued representation for each datapoint in a dataset is generated by an embedding process (Page 3 Section 3.1.1, “where zi = fθ(xi) ∈ Z corresponds to xi ∈ X after embedding, α…”, Page 2 Introduction, “Consider the problem of clustering a set of n points {xi ∈ X}n i=1 into k clusters, each represented by a centroid µj, j = 1,...,k. Instead of clustering directly in the data space X, we propose to first transform the data with a non linear mapping fθ : X → Z, where θ are learnable parameters and Z is the latent feature space.” Xie generates a latent representation for each datapoint by embedding it into a feature space using a deep neural network.). Regarding claim 3, Xie teaches the embedding process is a text embedding process (Page 1 Introduction, “Our experiments show significant improvements over state of-the-art clustering methods in terms of both accuracy and running time on image and textual datasets.”, Page 4 Section 4.1, “We evaluate the proposed method (DEC) on one text dataset… We then computed tf-idf features on the 2000 most frequently occurring word stems.” Xie explicitly teaches that the embedding process is performed on both images and text and specifically uses tf-idf to embed the text datapoints.). 
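The tf-idf text-embedding step quoted from Xie can be illustrated with a minimal sketch. The three-document corpus below is invented for illustration (Xie computes tf-idf over the 2000 most frequent word stems of a real dataset); the point is only that each document becomes a real-valued vector, which is the representation the clustering step consumes.

```python
import math
from collections import Counter

# Invented toy corpus standing in for a text dataset.
docs = [
    "machine learning model training data",
    "training data labels for the model",
    "clusters of embedded data points",
]
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})

# Document frequency: in how many documents each term appears.
N = len(tokenized)
df = {w: sum(w in doc for doc in tokenized) for w in vocab}

def tfidf(doc):
    """Real-valued representation of one document: term frequency scaled by
    inverse document frequency, one component per vocabulary term."""
    counts = Counter(doc)
    return [counts[w] / len(doc) * math.log(N / df[w]) for w in vocab]

vectors = [tfidf(doc) for doc in tokenized]
```

Terms that appear in every document (here "data") receive zero weight, so the resulting vectors emphasize the words that distinguish documents, which is what makes them useful inputs to a similarity-based clustering step.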
Regarding claim 5, Xie does not teach determining a most likely cluster or group to which the new datapoint should be assigned further comprises determining the cluster or group associated with the trained classifier having the highest level of certainty in its output. Rifkin, in the same field of endeavor, teaches determining a most likely cluster or group to which the new datapoint should be assigned further comprises determining the cluster or group associated with the trained classifier having the highest level of certainty in its output (Page 2 Introduction, “to train N different binary classifiers, each one trained to distinguish the examples in a single class from the examples in all remaining classes. When it is desired to classify a new example, the N classifiers are run, and the classifier which outputs the largest (most positive) value is chosen.” Rifkin teaches determining which class a datapoint belongs to by inputting the new example to each classifier and selecting the classifier with the highest output (i.e., the greatest certainty) that the datapoint belongs to its class.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Xie’s clustering framework with Rifkin’s teaching of selecting the classifier with the largest (most positive) output in order to provide a decision rule for determining the most likely cluster a datapoint belongs to, thereby improving the accuracy of the machine learning model (Section 3.1.5 of Rifkin). Regarding claim 6, Xie teaches the similarity between the generated representations is determined based on a metric (Page 3 Section 3.1.1, “we use the Student’s t-distribution as a kernel to measure the similarity between embedded point zi and centroid µj”). Claims 4, 7, 9-13, and 15-19 are rejected under 35 U.S.C. 
103 as being unpatentable over Xie (“Unsupervised Deep Embedding for Clustering Analysis”, 2016) in view of Rifkin (“In Defense of One-Vs-All Classification”, 2004), Wang (CN-110020147-A), and Haghighat (US 20240370740 A1). Regarding claim 4, Xie does not teach the unique identifier is based on one or more attributes of a datapoint or datapoints in the cluster or group. Haghighat, in the same field of endeavor, teaches the unique identifier is based on one or more attributes of a datapoint or datapoints in the cluster or group (Paragraph 32, “the CNN 22 can evaluate the proximity of subsequent new vectors… using the Euclidean distance between the evaluated new vector 40 and the unregistered new vectors… If the evaluated new vector 40 is closest to a previously-added new vector 40, with a corresponding confidence level, then the two new vectors 40 can be considered as corresponding with a new class and may be used to establish a new cluster 42 associated with that new class… The new class can then be registered, including by associating a generic or preliminary identifier with the class and establishing a covariance matrix of the new vectors 40 in the new cluster 42.” Haghighat teaches that feature vectors representing datapoint attributes are evaluated using the Euclidean distance to determine proximity and establish a new cluster, which creates a new class and associates a preliminary identifier with it. The identifier is created based on the attributes of the datapoints forming the cluster.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Xie’s clustering framework with Haghighat’s teaching of creating a new class that associates an identifier with a cluster based on the feature vectors of datapoints in order to improve the organization of clustered data (Paragraph 31 of Haghighat). 
Regarding claim 7, Xie does not teach the metric is one of Manhattan distance, Euclidean distance, or Cosine distance. Haghighat, in the same field of endeavor, teaches the metric is one of Manhattan distance, Euclidean distance, or Cosine distance (Paragraph 32, “the CNN 22 can evaluate the proximity of subsequent new vectors… using the Euclidean distance between the evaluated new vector 40 and the unregistered new vectors… If the evaluated new vector 40 is closest to a previously-added new vector 40, with a corresponding confidence level, then the two new vectors 40 can be considered as corresponding with a new class and may be used to establish a new cluster 42 associated with that new class… The new class can then be registered, including by associating a generic or preliminary identifier with the class and establishing a covariance matrix of the new vectors 40 in the new cluster 42.” Haghighat teaches that feature vectors representing datapoint attributes are evaluated using the Euclidean distance to determine proximity and establish a new cluster, which creates a new class and associates a preliminary identifier with it. The identifier is created based on the attributes of the datapoints forming the cluster.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Xie’s clustering framework with Haghighat’s teaching of creating a new class that associates an identifier with a cluster based on the Euclidean distance of feature vectors in order to improve the accuracy and organization of clustered data (Paragraphs 31 and 32 of Haghighat). 
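The similarity machinery discussed for claims 6 and 7 can be sketched directly: the three distance metrics recited in claim 7, plus the Student's t-distribution kernel quoted from Xie for claim 6 (with alpha = 1, the value DEC fixes). The example vectors and centroids are invented for illustration.

```python
import numpy as np

def manhattan(a, b):
    return float(np.abs(np.asarray(a) - np.asarray(b)).sum())    # L1 distance

def euclidean(a, b):
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))  # L2 distance

def cosine_distance(a, b):
    # 1 minus cosine similarity: 0 for parallel vectors, up to 2 for opposite ones.
    a, b = np.asarray(a), np.asarray(b)
    return float(1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def soft_assignments(Z, mu, alpha=1.0):
    """DEC-style soft assignment q[i, j]: Student's t kernel between embedded
    point z_i and centroid mu_j, normalized across the clusters."""
    sq = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # squared distances
    q = (1.0 + sq / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

Z = np.array([[0.1, 0.0], [2.9, 3.1]])    # two embedded datapoints
mu = np.array([[0.0, 0.0], [3.0, 3.0]])   # two cluster centroids
q = soft_assignments(Z, mu)               # each row sums to 1
```

Each row of `q` is a probability distribution over clusters, and the column with the largest entry identifies the most likely cluster for that embedded datapoint.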
Regarding claim 9, Xie does not teach a system, comprising: one or more electronic processors configured to execute a set of computer-executable instructions; and one or more non-transitory electronic data storage media containing the set of computer-executable instructions, wherein when executed, the instructions cause the one or more electronic processors to… Haghighat, in the same field of endeavor, teaches [a] system, comprising: one or more electronic processors configured to execute a set of computer-executable instructions; and one or more non-transitory electronic data storage media containing the set of computer-executable instructions, wherein when executed, the instructions cause the one or more electronic processors to (Paragraph 24 of Haghighat, “In one implementation, the controller 18 can be a microprocessor that executes a program, stored in the memory 20, for operation of the oven 10. Alternatively, the controller 18 can be an application-specific integrated circuit (“ASIC”). The memory 20 can be packaged with the controller 18 (i.e. a “system-on-chip” configuration) or can be connected with the controller 18 by associated circuitry and/or wiring.”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Xie’s clustering framework with Haghighat’s teaching of a system comprising a processor and non-transitory memory to execute instructions in order to implement Xie’s method in a computing environment that executes the clustering and classification steps (Paragraph 24 of Haghighat). The remainder of claim 9 recites identical limitations to claim 1. Therefore, claim 9 is rejected using the same rationale as claim 1. Claim 10 recites identical limitations to claim 2. Therefore, claim 10 is rejected using the same rationale as claim 2. Claim 11 recites identical limitations to claim 4. Therefore, claim 11 is rejected using the same rationale as claim 4. 
Claim 12 recites identical limitations to claim 5. Therefore, claim 12 is rejected using the same rationale as claim 5. Claim 13 recites identical limitations to claims 6 and 7. Therefore, claim 13 is rejected using the same rationale as claims 6 and 7. Regarding claim 15, Xie does not teach [o]ne or more non-transitory computer-readable media comprising a set of computer-executable instructions that when executed by one or more programmed electronic processors, cause the processors to… Haghighat, in the same field of endeavor, teaches [o]ne or more non-transitory computer-readable media comprising a set of computer-executable instructions that when executed by one or more programmed electronic processors, cause the processors to… (Paragraph 24 of Haghighat, “In one implementation, the controller 18 can be a microprocessor that executes a program, stored in the memory 20, for operation of the oven 10. Alternatively, the controller 18 can be an application-specific integrated circuit (“ASIC”). The memory 20 can be packaged with the controller 18 (i.e. a “system-on-chip” configuration) or can be connected with the controller 18 by associated circuitry and/or wiring.”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Xie’s clustering framework with Haghighat’s teaching of a processor and non-transitory memory that execute instructions in order to implement Xie’s method in a computing environment that executes the clustering and classification steps (Paragraph 24 of Haghighat). The remainder of claim 15 recites identical limitations to claim 1. Therefore, claim 15 is rejected using the same rationale as claim 1. Claim 16 recites identical limitations to claim 2. Therefore, claim 16 is rejected using the same rationale as claim 2. Claim 17 recites identical limitations to claim 4. Therefore, claim 17 is rejected using the same rationale as claim 4. 
Claim 18 recites identical limitations to claim 5. Therefore, claim 18 is rejected using the same rationale as claim 5. Claim 19 recites identical limitations to claims 6 and 7. Therefore, claim 19 is rejected using the same rationale as claims 6 and 7. Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Xie (“Unsupervised Deep Embedding for Clustering Analysis”, 2016) in view of Rifkin (“In Defense of One-Vs-All Classification”, 2004), Wang (CN-110020147-A), and Romano (US 11741133 B1). Regarding claim 8, Xie does not teach instead of generating a real-valued representation for each datapoint in a dataset, a plurality of real-valued representations for each datapoint in a dataset are generated, and for each of the plurality of representations, the method proceeds as described. Romano, in the same field of endeavor, teaches instead of generating a real-valued representation for each datapoint in a dataset, a plurality of real-valued representations for each datapoint in a dataset are generated, and for each of the plurality of representations, the method proceeds as described (Paragraph 5 of Romano, “…breaking down the first asset file into a first plurality of asset rows; for each asset row of the first plurality of asset rows: extracting one or more features from the asset row, encoding each of the one or more features, and embedding each encoded feature with a generational time stamp;” Romano teaches that for each asset row (datapoint), the system extracts one or more features and embeds each encoded feature, thus creating a plurality of embedded representations (real-valued) for each datapoint.). 
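Romano's per-row scheme, as characterized above, amounts to producing several real-valued representations per datapoint rather than one. The sketch below uses an invented encoder and invented rows (Romano's actual feature extraction, encoding, and generational time stamps are not specified here); it illustrates only the "plurality of representations per datapoint" structure.

```python
import time

# Invented asset rows standing in for datapoints.
rows = [
    {"name": "pump-1", "status": "active", "zone": "A"},
    {"name": "valve-7", "status": "idle", "zone": "B"},
]

def encode(value):
    """Toy encoder mapping one extracted feature to a small real-valued vector."""
    return [hash(value) % 997 / 997.0, len(value) / 10.0]

# A plurality of representations per datapoint: one embedded, time-stamped
# vector per extracted feature, rather than a single vector per row.
representations = [
    [{"feature": k, "vector": encode(v), "ts": time.time()} for k, v in row.items()]
    for row in rows
]
```

Under the claimed method, each of these per-feature representations would then be clustered and classified in the same way as a single-representation datapoint.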
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Xie’s clustering framework with Romano’s teaching of generating multiple feature-based representations for each datapoint in order to enhance the datapoint representations and improve the classification performance (Paragraph 3 of Romano). Claims 14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Xie (“Unsupervised Deep Embedding for Clustering Analysis”, 2016) in view of Rifkin (“In Defense of One-Vs-All Classification”, 2004), Haghighat (US 20240370740 A1), and Romano (US 11741133 B1). Claim 14 recites identical limitations to claim 8. Therefore claim 14 is rejected using the same rationale as claim 8. Claim 20 recites identical limitations to claim 8. Therefore, claim 20 is rejected using the same rationale as claim 8. Double Patenting The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969). 
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13. The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. 
For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer. Claims 1-7, 9, and 15 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-7, 8, and 12 of copending U.S. Patent Application No. 18/389,404. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims recite substantially the same method steps and differ only in minor variations, such as referring to multiple datapoints or including alternative forms of models.

Instant application, claim 1: A method of training a machine learning model, comprising: generating a real-valued representation for each datapoint in a dataset; based on a similarity between the generated representations, forming one or more groups or clusters of datapoints; representing each formed group or cluster by a unique identifier; for each group or cluster, training a classifier to classify a datapoint as either inside or outside the group or cluster; storing each trained classifier and associating the stored trained classifier with the cluster or group's unique identifier; for each new datapoint, using the new datapoint as input to each trained classifier and determining a most likely cluster or group to which the new datapoint is assigned; assigning a label to the new datapoint based on the identifier of the cluster or group to which the new datapoint is assigned; and using a plurality of new datapoints and the new datapoints' assigned labels to train a machine learning model.

Reference application (18/389,404), claim 1: A method of training a model, comprising: generating a real-valued representation for each datapoint in a dataset; based on a similarity between the generated representation for multiple datapoints, forming one or more groups or clusters of datapoints; representing each formed group or cluster by a unique identifier; for each group or cluster, training a classifier to classify a datapoint as either inside or outside the group or cluster; storing each trained classifier and associating the stored trained classifier with the cluster or group's unique identifier; for each new datapoint, using the new datapoint as input to each trained classifier and determining a most likely cluster or group to which the new datapoint is assigned; assigning a label to the new datapoint based on the identifier of the cluster or group to which the new datapoint is assigned; and using a plurality of new datapoints and the new datapoints' assigned labels to train a machine learning or other form of model.

Instant application, claim 2 and reference application, claim 2 (identical): The method of claim 1, wherein the real-valued representation for each datapoint in a dataset is generated by an embedding process.

Instant application, claim 3 and reference application, claim 3 (identical): The method of claim 2, wherein the embedding process is a text embedding process.

Instant application, claim 4 and reference application, claim 4 (identical): The method of claim 1, wherein the unique identifier is based on one or more attributes of a datapoint or datapoints in the cluster or group.

Instant application, claim 5 and reference application, claim 5 (identical): The method of claim 1, wherein determining a most likely cluster or group to which the new datapoint should be assigned further comprises determining the cluster or group associated with the trained classifier having the highest level of certainty in its output.

Instant application, claim 6 and reference application, claim 6 (identical): The method of claim 1, wherein the similarity between the generated representations is determined based on a metric.

Instant application, claim 7 and reference application, claim 7 (identical): The method of claim 6, wherein the metric is one of Manhattan distance, Euclidean distance, or Cosine distance.

Instant application, claim 9: A system, comprising: one or more electronic processors configured to execute a set of computer-executable instructions; and one or more non-transitory electronic data storage media containing the set of computer-executable instructions, wherein when executed, the instructions cause the one or more electronic processors to generate a real-valued representation for each datapoint in a dataset; based on a similarity between the generated representations, form one or more groups or clusters of datapoints; represent each formed group or cluster by a unique identifier; for each group or cluster, train a classifier to classify a datapoint as either inside or outside the group or cluster; store each trained classifier and associate the stored trained classifier with the cluster or group's unique identifier; for each new datapoint, use the new datapoint as input to each trained classifier and determine a most likely cluster or group to which the new datapoint is assigned; assign a label to the new datapoint based on the identifier of the cluster or group to which the new datapoint is assigned; and use a plurality of new datapoints and the new datapoints' assigned labels to train a machine learning model.

Reference application, claim 8: A system, comprising: one or more electronic processors configured to execute a set of computer-executable instructions; and one or more non-transitory electronic data storage media containing the set of computer-executable instructions, wherein when executed, the instructions cause the one or more electronic processors to generate a real-valued representation for each datapoint in a dataset; based on a similarity between the generated representation for multiple datapoints, form one or more groups or clusters of datapoints; represent each formed group or cluster by a unique identifier; for each group or cluster, train a classifier to classify a datapoint as either inside or outside the group or cluster; store each trained classifier and associate the stored trained classifier with the cluster or group's unique identifier; for each new datapoint, use the new datapoint as input to each trained classifier and determine a most likely cluster or group to which the new datapoint is assigned; assign a label to the new datapoint based on the identifier of the cluster or group to which the new datapoint is assigned; and use a plurality of new datapoints and the new datapoints' assigned labels to train a machine learning or other form of model.

Instant application, claim 15: One or more non-transitory computer-readable media comprising a set of computer-executable instructions that when executed by one or more programmed electronic processors, cause the processors to: generate a real-valued representation for each datapoint in a dataset; based on a similarity between the generated representations, form one or more groups or clusters of datapoints; represent each formed group or cluster by a unique identifier; for each group or cluster, train a classifier to classify a datapoint as either inside or outside the group or cluster; store each trained classifier and associate the stored trained classifier with the cluster or group's unique identifier; for each new datapoint, use the new datapoint as input to each trained classifier and determine a most likely cluster or group to which the new datapoint is assigned; assign a label to the new datapoint based on the identifier of the cluster or group to which the new datapoint is assigned; and use a plurality of new datapoints and the new datapoints' assigned labels to train a machine learning model.

Reference application, claim 12: One or more non-transitory computer-readable media comprising a set of computer-executable instructions that when executed by one or more programmed electronic processors, cause the processors to: generate a real-valued representation for each datapoint in a dataset; based on a similarity between the generated representation for multiple datapoints, form one or more groups or clusters of datapoints; represent each formed group or cluster by a unique identifier; for each group or cluster, train a classifier to classify a datapoint as either inside or outside the group or cluster; store each trained classifier and associate the stored trained classifier with the cluster or group's unique identifier; for each new datapoint, use the new datapoint as input to each trained classifier and determine a most likely cluster or group to which the new datapoint is assigned; assign a label to the new datapoint based on the identifier of the cluster or group to which the new datapoint is assigned; and use a plurality of new datapoints and the new datapoints' assigned labels to train a machine learning or other form of model.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAJD MAHER HADDAD, whose telephone number is (571) 272-2265. The examiner can normally be reached Monday through Friday, 8:00 am to 5:00 pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /M.M.H./Examiner, Art Unit 2125 /KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125

Prosecution Timeline

Jun 26, 2023
Application Filed
Mar 04, 2026
Non-Final Rejection — §101, §103, §112 (current)
