Last updated: May 29, 2026
Application No. 18/525,447
LARGE-SCALE TEXT CLUSTER METHODS AND APPARATUSES

Non-Final OA §101 §112
Filed: Nov 30, 2023
Priority: Dec 02, 2022 (CN 202211538156.6)
Examiner: YEN, ERIC L
Art Unit: 2658
Tech Center: 2600 (Communications)
Assignee: Alipay (Hangzhou) Information Technology Co., Ltd.
OA Round: 1 (Non-Final)
Interview Optional

Interview lift (+11.6%) is below the 15.0% threshold. A written response is recommended.
Based on 772 resolved cases, 2023–2026
Examiner Intelligence

YEN, ERIC L
Career Allowance Rate: 85% (above average), 657 granted / 772 resolved, +23.1% vs TC avg
Interview Lift: +11.6% (moderate), resolved cases with interview
Avg Prosecution (typical timeline): 2y 9m, 9 currently pending
Total Applications (career history): 776 across all art units
Statute-Specific Performance

§101: 9.9% (-30.1% vs TC avg)
§103: 48.1% (+8.1% vs TC avg)
§102: 2.0% (-38.0% vs TC avg)
§112: 22.8% (-17.2% vs TC avg)
Based on career data from 772 resolved cases
Office Action

§101 §112
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
As per Claim 1 (and similarly claims 12 and 13):
“M” in “M similar texts” is interpreted as an integer.
As per Claim 6:
“C” is also interpreted as an integer (which is greater than M).
“to obtain several first candidate class clusters” in the last 2 lines of claim 6 is interpreted as a result of “for any first central text in the several central texts, determining, from the similarity matrix, C similar texts with maximum similarities corresponding to the first central text, and using a similar text with a similarity greater than a second threshold in the C similar texts and the first central text as a corresponding first candidate class cluster” (i.e. a result of the entire “for-loop”, not the result of one loop in the “for-loop”).
Claim Objections
Claim 13 is objected to because of the following informalities: 
“cause processor” in line 2 of claim 13 seems like it should be --cause the processor-- (grammar).
“a plurality of texts to-be-clustered texts” in line 4 of claim 13 seems like it should be either --a plurality of texts to-be-clustered-- (probably this one since it matches claims 1 and 12), or --a plurality of to-be-clustered texts--, or --to-be-clustered texts-- (probably not this one because claim 13 later recites “the plurality of texts”).
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

As per Claim 1 (and similarly claims 12-13):
“determining, from the similarity matrix, M similar texts with maximum similarities respectively corresponding to the plurality of texts; and using the corresponding texts as selected central texts when the similarities corresponding to the M similar texts are greater than a first threshold” in lines 6-9 of claim 1 is unclear, because:
1. “respectively corresponding to the plurality of texts” in line 7 of claim 1 can refer to either “maximum similarities” in line 6 of claim 1 (such that the M similar texts have maximum similarities, where the maximum similarities respectively correspond to the plurality of texts) or to “M similar texts” (such that the M similar texts have maximum similarities and where the M similar texts respectively correspond to the plurality of texts), and it is not clear which one of “maximum similarities” or “M similar texts” is the one that “respectively corresponding to the plurality of texts” in line 7 of claim 1 is supposed to refer to.
2. it is not clear if “the corresponding texts” in line 7 of claim 1 is supposed to refer to “M similar texts” in line 6 of claim 1 or “the plurality of texts” in line 7 of claim 1 (under the interpretation that the M similar texts respectively correspond to the plurality of texts, the “M similar texts” can be “the corresponding texts” because they correspond to “the plurality of texts” and “the plurality of texts” can also be “the corresponding texts” because they correspond to the “M similar texts”).
3. it is not clear how the M similar texts or the maximum similarities are supposed to “respectively [correspond] to the plurality of texts”. Applicant’s Specification (paragraphs 74-77 [especially paragraph 76]) seems to indicate that Applicant intends to claim where each of “the plurality of texts” has a respective set of “M similar texts” (e.g. if “the plurality of texts” includes text 1, text 2, text 3, and text 4 and M=2, then a set of 2 texts that are most similar to text 1 is determined, a set of 2 texts that are most similar to text 2 is determined, a set of 2 texts that are most similar to text 3 is determined, a set of 2 texts that are most similar to text 4 is determined), but, as claimed, only one set of “M similar texts” is determined (the claims do not recite that “M similar texts” are determined for each of “the plurality of texts”), and there is no claimed one-to-one correspondence between the one set of “M similar texts” and “the plurality of texts” and/or no ordering of either of the “M similar texts” or “the plurality of texts” which would result in a “respective correspondence” between the “M similar texts” and “the plurality of texts” (so that if, for example, there are 7 similar texts and 5 texts in the plurality of texts, it is not clear how those 7 texts are supposed to correspond to one or more of the 5 texts because the quantity of texts in the similar texts and the quantity of texts among the plurality of texts are not equal, and even if there were 5 similar texts and 5 texts in the plurality of texts, it is not clear which one of the 5 similar texts is supposed to correspond to which one of the 5 texts in the plurality of texts [as opposed to where there are 3 similar texts ordered 1, 2, and 3, and where there are 3 texts in the plurality of texts ordered 1, 2, and 3, then the similar texts can “respectively correspond” to the plurality of texts because similar text 1 can correspond to text 1 of the plurality of texts, similar text 2 can correspond to text 2 of the plurality of texts, and similar text 3 can correspond to text 3 of the plurality of texts]). Similarly, there is no claimed one-to-one correspondence between the one set of “maximum similarities” corresponding to the one set of “M similar texts” that are determined in claim 1 (as currently claimed) and “the plurality of texts”, and there is no ordering of the “maximum similarities” and “the plurality of texts”, and therefore it is not clear how the “maximum similarities” are supposed to “respectively [correspond]” to “the plurality of texts” when the number of similarities and the number of “the plurality of texts” is unequal, and/or when there is no particular ordering of the similarities and the texts.
and
4. even assuming lines 6-9 of claim 1 are interpreted as determining one set of “M similar texts” for each of “the plurality of texts”, this would lead to an ambiguity for “the similarities corresponding to the M similar texts” and “the M similar texts” in the 4th to last line of claim 1 because it would not be clear which set of M similar texts corresponding to which one of the plurality of texts is the one that “the M similar texts” in the 4th to last line of claim 1 is supposed to refer to, and it would not be clear which set of similarities corresponding to which set of M similar texts corresponding to which one of the plurality of texts is the one that “the similarities corresponding to the M similar texts” in the 4th to last line of claim 1 is supposed to refer to (e.g. if there are texts 1 and 2 in “the plurality of texts”, then text 1 has M similar texts that have respective similarities, and text 2 has M similar texts that have respective similarities, and it is not clear which of text 1 or text 2’s similarities and similar texts are the ones that “the M similar texts” in the 4th to last line of claim 1 and “the similarities corresponding to the M similar texts” in the 4th to last line of claim 1 are supposed to refer to).
It is also not clear, as claimed, if “in the similarity matrix” in the last 2 lines of claim 1 is supposed to refer to “the central texts” in the 2nd to last line of claim 1 or to “data corresponding to the central texts” in the 2nd to last line of claim 1.
If “in the similarity matrix” in the last 2 lines of claim 1 refers to “the central texts”, then “the central texts in the similarity matrix” in the last 2 lines of claim 1 lacks antecedent basis.  Nothing in claim 1 establishes that any of the “selected central texts” are “in the similarity matrix”, and Applicant’s Specification (Table 1) also seems to indicate that the similarity matrix includes only similarities between texts and does not include the texts themselves.
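
For illustration only, the per-text reading that item 3 attributes to the Specification (each of the plurality of texts has its own set of M most similar texts) can be sketched as below. This is not the applicant's implementation: the function name `top_m_similar`, the similarity values, and the choice to exclude a text from its own neighbor list are all assumptions made for the example.

```python
def top_m_similar(sim, m):
    """For each text i, return the indices of the m texts most similar to i.

    Assumption: a text is never counted as its own neighbor.
    """
    result = {}
    for i, row in enumerate(sim):
        others = [j for j in range(len(row)) if j != i]
        # Highest similarity first; ties broken by lower index for determinism.
        others.sort(key=lambda j: (-row[j], j))
        result[i] = others[:m]
    return result


# Hypothetical similarity values; indices 0..3 stand for texts 1..4.
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]

# With M = 2, every text gets its own set of 2 similar texts,
# matching the "text 1 ... text 4" example in item 3.
neighbors = top_m_similar(sim, 2)
```

Under this reading there are as many sets of M similar texts as there are texts, which is the source of the antecedent ambiguity raised in item 4.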

As per Claim 2:
“the texts” in line 5 of claim 2 is ambiguous (Applicant appears to have intended to refer to “the plurality of texts” but this phrase, as claimed, can also refer to “M similar texts” or “the corresponding texts” or “selected central texts” in claim 1).

As per Claim 3:
“the step of determining, from the similarity matrix, M similar texts with maximum similarities respectively corresponding to the plurality of texts” in lines 1-3 of claim 3 is unclear for the reasons discussed above in the 112 rejection of claim 1 pertaining to “determining, from the similarity matrix, M similar texts with maximum similarities respectively corresponding to the plurality of texts” (i.e. any amendments made to the “determining… M similar texts…” limitation in claim 1 to address the 112 issues should also be made to the “determining… M similar texts…” limitation in lines 1-3 of claim 3 to make sure that the language is consistent for antecedent basis purposes).

As per Claim 4:
“wherein the step of using the corresponding texts as selected central texts when the similarities corresponding to the M similar texts are greater than a first threshold” in lines 1-3 of claim 4 includes the same issues as “using the corresponding texts as selected central texts when the similarities corresponding to the M similar texts are greater than a first threshold” in lines 7-9 of claim 1 (including where it is not clear which set of texts “the corresponding texts” is supposed to refer to and if one set of M similar texts is determined for each of the plurality of texts, then it is not clear which set of M similar texts and which M-similar-text set’s similarities are the ones that “the M similar texts” and “the similarities corresponding to the M similar texts” are supposed to refer to).
“the text” in line 4 of claim 4 is ambiguous (it can refer to any one of “the plurality of texts”) and “the M similar texts corresponding to the text” in line 4 of claim 4 lacks antecedent basis, and even assuming “the text” is interpreted in a “for-loop” manner (i.e. as referring to a respective/given one of “the plurality of texts” for a “current loop”), “the M similar texts” in claim 1 does not, as claimed, correspond to any particular one of the plurality of texts (Claim 1 recites where the M similar texts somehow “respectively correspond” to the whole group of “the plurality of texts” [and not to any particular one of the texts in the group])
“the text” in line 5 of claim 4 is ambiguous (this phrase can refer to any one of “the plurality of texts”, any one of “the M similar texts”, any one of “the corresponding texts”, or any one of “selected central texts”)
“the selected central text” in line 5 of claim 4 is ambiguous (claim 1 recites multiple “selected central texts” and it is not clear which of the multiple selected central texts is the one that “the selected central text” in line 5 of claim 4 is supposed to refer to).

As per Claim 5:
It is not clear if “in the similarity matrix” in line 2 of claim 5 is supposed to refer to “data” in line 2 of claim 5 or to “the central texts” in line 2 of claim 5 (where, if “in the similarity matrix” in line 2 of claim 5 refers to “the central texts”, then “the central texts in the similarity matrix” lacks antecedent basis [same issue as discussed in the 112 rejection of claim 1])
“several” in line 3 of claim 5, in line 4 of claim 5, and in line 5 of claim 5, and in line 7 of claim 5 is unclear (a common definition of several is “more than two” but “less than many” which establishes the lower boundary of “several”, but it is not clear what qualifies as “too many” to be considered “several”).

As per Claim 6:
“several” in line 2 of claim 6, in line 3 of claim 6, and in the 2nd to last line of claim 6 is unclear (same issue as discussed in the last paragraph of the 112 rejection of claim 5).

As per Claim 7:
“several” in line 3 of claim 7, and in line 5 of claim 7 is unclear (same issue as discussed in the last paragraph of the 112 rejection of claim 5).

As per Claim 8:
“several” in line 2 of claim 8, in line 3 of claim 8, and in line 5 of claim 8 is unclear (same issue as discussed in the last paragraph of the 112 rejection of claim 5).
	“the texts comprised in the several first candidate class clusters” in lines 3-4 of claim 8 lacks antecedent basis (Claim 5 recites “separately determining similar texts of several central texts from the similarity matrix, to obtain several first candidate class clusters”, but claim 5 does not specifically recite where the obtained first candidate class clusters comprise texts [as opposed to being separate cluster non-text data generated based in part on the separately determined similar texts])

As per Claim 10:
“several” in line 2 of claim 10, and in line 4 of claim 10 is unclear (same issue as discussed in the last paragraph of the 112 rejection of claim 5).

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 

As an initial observation:
Applicant’s Specification does describe where “it is desirable to more quickly cluster texts in a large-scale scenario” (paragraph 3) and “For example, when the plurality of texts are 10 million texts,” (paragraph 74) and other examples where the data being processed is too large to be practically performed by a human, but the claims are not limited to only those embodiments, such that the following rejections are proper because the claims include, within their scope, “smaller scale” embodiments which can be practically performed by a human.

As per Claims 1, 12, and 13:
Claim 1 recites:
A text clustering method, (mental process, a human can group documents/texts together based on similarity or other criteria)
comprising: determining, by using a semantic representation model, semantic vectors respectively corresponding to a plurality of texts to-be-clustered; (mental process, a human can look at the contents of texts that he/she is assigned to group/cluster, and think of and/or write down numerical vector representations of the meaning of the texts’ contents, where the knowledge in the human’s brain that is used to mentally determine numerical vector representations of texts can be interpreted as “a semantic representation model”)
determining a similarity matrix between the plurality of texts based on the semantic vectors of the plurality of texts; (mental process, a human can think of and/or write down a table listing mentally-determined quantities representing similarities between the to-be-clustered texts)
determining, from the similarity matrix, M similar texts with maximum similarities respectively corresponding to the plurality of texts; and using the corresponding texts as selected central texts when the similarities corresponding to the M similar texts are greater than a first threshold; (mental process, a human can look at the matrix and find an integer number of highest similarity values and mentally compare those similarity values to a threshold, can identify texts corresponding to those found highest similarity values as similar texts, and can choose texts whose highest similarity values all exceed the threshold to be “central texts”)
and clustering the to-be-clustered texts based on data corresponding to the central texts in the similarity matrix (mental process, a human can group texts together based on the chosen central texts’ similarities in the similarity matrix)
Claim 12 recites “A non-transitory computer-readable storage medium having stored therein instructions that, when executed by a processor of a computing device, cause the processor to:” perform the method steps of claim 1 (which are mental processes, as discussed above)
Claim 13 recites “A computing device, comprising a memory and a processor, wherein the memory stores executable instructions that, in response to execution by the processor, cause processor to:” perform steps identical in substance to the method steps of claim 1 (which are mental processes, as discussed above)
This judicial exception is not integrated into a practical application because:
For claim 1, there are no additional elements in the claim. 
For claims 12-13, the preambles are directed to generic computer implementation of the mental processes (see “even if an element does not integrate a judicial exception into a practical application or amount to significantly more on its own (e.g., because it is merely a generic computer component performing generic computer functions)” in MPEP 2106.07[b], “claims that amount to nothing more than an instruction to apply the abstract idea using a generic computer do not render an abstract idea eligible” and “For example, an examiner could explain that implementing an abstract idea on a generic computer, does not integrate the abstract idea into a practical application in Step 2A Prong Two or add significantly more in Step 2B, similar to how the recitation of the computer in the claim in Alice amounted to mere instructions to apply the abstract idea of intermediated settlement on a generic computer” in MPEP 2106.05[f], “Examples that the courts have indicated may not be sufficient to show an improvement in computer-functionality… iii. Mere automation of manual processes, such as using a generic computer to process an application for financing a purchase, Credit Acceptance Corp. v. Westlake Services, 859 F.3d 1044, 1055, 123 USPQ2d 1100, 1108-09 (Fed. Cir. 2017) or speeding up a loan-application process by enabling borrowers to avoid physically going to or calling each lender and filling out a loan application, LendingTree, LLC v. Zillow, Inc., 656 Fed. App'x 991, 996-97 (Fed. Cir. 2016) (non-precedential)” and “Merely adding generic computer components to perform the method is not sufficient. Thus, the claim must include more than mere instructions to perform the method on a generic component or machinery to qualify as an improvement to an existing technology” in MPEP 2106.05[a], MPEP 2106.04[a][2] III., “In bracket 3, explain why the combination of additional elements fails to integrate the judicial exception into a practical application. For example, if the claim is directed to an abstract idea with additional generic computer elements, explain that the generically recited computer elements do not add a meaningful limitation to the abstract idea because they amount to simply implementing the abstract idea on a computer” in MPEP 2106.07[a][1]).
The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception because:
For claim 1, there are no additional elements in the claim. 
For claims 12-13, the preambles are directed to generic computer implementation of the mental processes.
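
For illustration only, the second recited step of claim 1 (determining a similarity matrix from the semantic vectors) might look like the sketch below. Claim 1 as quoted does not say how similarity is measured, so the use of cosine similarity is an assumption, as are the names `cosine` and `similarity_matrix` and the toy two-dimensional vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Pairwise similarity table: entry [i][j] compares text i with text j."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Hypothetical semantic vectors for three texts.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sim = similarity_matrix(vectors)
```

The resulting table holds only similarity values, not the texts themselves, which is consistent with the examiner's reading of Table 1 of the Specification.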

As per Claim 2:
wherein the step of determining semantic vectors respectively corresponding to the plurality of texts comprises: determining, by using the semantic representation model, semantic vectors that are respectively corresponding to the plurality of texts and that comprise global semantic information of the texts (mental process, a human can, as part of mentally determining semantic vector representations of texts, determine, using the mental knowledge for determining semantic vector representations of texts [which can be interpreted as “the semantic representation model”] vectors containing “global semantic information” that the human mentally identifies by reading the texts.)

As per Claim 3:
wherein the step of determining, from the similarity matrix, M similar texts with maximum similarities respectively corresponding to the plurality of texts comprises: determining, from the similarity matrix, the M similar texts with the maximum similarities respectively corresponding to the plurality of texts by using a parallel computing tool encapsulated by a deep learning framework, or by constructing an index by a vector retrieval engine (mental process with generic computer implementation; a human can, as part of looking at the matrix and finding an integer number of highest similarity values and mentally comparing those similarity values to a threshold, identifying texts corresponding to those found highest similarity values as similar texts, and choosing texts whose highest similarity values all exceed the threshold to be “central texts”, find an integer number of texts with the highest similarity values for to-be-clustered texts, or can think of and/or write down an index. “by using a parallel computing tool encapsulated by a deep learning framework” and “by a vector retrieval engine” are directed to generic computer implementation.)

As per Claim 4:
	wherein the step of using the corresponding texts as selected central texts when the similarities corresponding to the M similar texts are greater than a first threshold comprises: for any of the plurality of texts, comparing a minimum similarity of the M similar texts corresponding to the text with the first threshold, and using the text as the selected central text when the minimum similarity is greater than the first threshold (mental process; a human can, as part of “using the corresponding texts as selected central texts when the similarities corresponding to the M similar texts are greater than a first threshold” [a mental process as discussed in the 101 rejection of claim 1], look at the similarity values of a group of M similar texts for one of the to-be-clustered texts, identify the smallest value, compare the smallest value to the threshold, determine that the smallest value is higher than the threshold, and, as a consequence, choose the one of the to-be-clustered texts as a central text)
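
For illustration only, the test recited in claim 4 (compare the minimum similarity among a text's M similar texts with the first threshold) can be sketched as below. The function name `select_central_texts`, the similarity values, the thresholds, and the exclusion of a text from its own neighbor list are assumptions, not details taken from the application.

```python
def select_central_texts(sim, m, first_threshold):
    """Return indices of texts whose M highest similarities all exceed the threshold."""
    central = []
    for i, row in enumerate(sim):
        # Similarities of text i to every other text, highest first, cut to M entries.
        top = sorted((row[j] for j in range(len(row)) if j != i), reverse=True)[:m]
        # If the smallest of the M values exceeds the threshold, all M do.
        if top and min(top) > first_threshold:
            central.append(i)
    return central


# Hypothetical similarity values for four texts.
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]

strict = select_central_texts(sim, 2, 0.75)  # only text 0 qualifies
loose = select_central_texts(sim, 2, 0.65)   # texts 0, 1 and 2 qualify
```

Checking only the minimum is equivalent to checking that every one of the M similarities is greater than the first threshold.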

As per Claim 5:
wherein the step of clustering the to-be-clustered texts based on data corresponding to the central texts in the similarity matrix comprises: separately determining similar texts of several central texts from the similarity matrix, to obtain several first candidate class clusters; combining first candidate class clusters with a cross-text to obtain several second candidate class clusters; and separately performing secondary fine clustering on the several second candidate class clusters based on texts respectively comprised in the second candidate class clusters, to obtain a class cluster for clustering the to-be-clustered texts (mental process, a human can, as part of grouping/clustering texts based on similarity matrix information, determine groups of similar texts corresponding to selected central texts based on similarity matrix information, and can call those groups of similar texts “first candidate class clusters”, and can mentally determine and/or write down a combination of the information in the clusters and the information in a cross-text and can call the results of the combination “second candidate class clusters”, and can perform another grouping/clustering process based on texts in the second candidate class clusters to obtain a class cluster)

As per Claim 6:
wherein the step of separately determining similar texts of several central texts from the similarity matrix comprises: for any first central text in the several central texts, determining, from the similarity matrix, C similar texts with maximum similarities corresponding to the first central text, and using a similar text with a similarity greater than a second threshold in the C similar texts and the first central text as a corresponding first candidate class cluster, to obtain several first candidate class clusters, wherein C is greater than M (mental process, a human can, as part of determining similar texts for central texts based on similarity matrix information, determine C similar texts to be similar to one of the central texts, and can put the one of the central texts and a text whose similarity to the one of the central texts is greater than another threshold into a cluster, and can repeat the process for multiple other central texts [thereby leading to “several first candidate class clusters”])
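
For illustration only, the loop recited in claim 6 can be sketched as below: for each central text, take its C most similar texts, keep those above a second threshold, and group them with the central text. Reading “a similar text with a similarity greater than a second threshold” as every such text is an assumption, as are the function name, the similarity values, and the thresholds.

```python
def first_candidate_clusters(sim, central, c, second_threshold):
    """Map each central text to a cluster containing it and its qualifying neighbors."""
    clusters = {}
    for i in central:
        row = sim[i]
        # The C most similar other texts, highest similarity first.
        top_c = sorted(
            (j for j in range(len(row)) if j != i),
            key=lambda j: (-row[j], j),
        )[:c]
        # Keep only neighbors above the second threshold, plus the central text itself.
        clusters[i] = {i} | {j for j in top_c if row[j] > second_threshold}
    return clusters


# Hypothetical similarity values for four texts.
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]

# C = 3 is greater than an assumed M = 2, as claim 6 requires.
clusters = first_candidate_clusters(sim, central=[0], c=3, second_threshold=0.5)
```

The returned mapping is the result of the whole loop, in line with the examiner's interpretation of “to obtain several first candidate class clusters” in the Claim Interpretation section.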

As per Claim 7:
wherein the step of combining first candidate class clusters with a cross-text comprises: sorting the several first candidate class clusters in descending order of quantities of comprised texts; and sequentially performing cross-text determining on the sorted several first candidate class clusters, and performing class cluster combination based on a determining result (mental process; a human can, as part of combining class cluster text information with cross-text information, count the number of texts in each cluster and sort the clusters based on how many texts are in each cluster, can identify cross-text based on the sorted clusters, and can combine texts/clusters into class clusters based on something that he/she mentally determined)
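
For illustration only, the sorting and combining steps of claim 7 can be sketched as below. The sketch assumes that a “cross-text” is a text that appears in more than one candidate cluster; the Office Action does not define the term, so that reading, the function name `merge_overlapping`, and the sample clusters are all assumptions.

```python
def merge_overlapping(clusters):
    """Sort clusters by size (largest first) and fold each into the first cluster it overlaps."""
    ordered = sorted((set(c) for c in clusters), key=len, reverse=True)
    merged = []
    for cluster in ordered:
        for target in merged:
            if target & cluster:   # a shared text was found
                target |= cluster  # combine the two clusters
                break
        else:
            # No overlap with any earlier cluster: keep it as its own cluster.
            merged.append(cluster)
    return merged


# Hypothetical first candidate class clusters of text identifiers.
candidates = [{1, 2}, {3, 4, 5}, {5, 6}, {7}]
combined = merge_overlapping(candidates)
```

This single pass does not re-check clusters that only become connected after a later merge; a fuller treatment would repeat the pass or use a union-find structure.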

As per Claim 8:
wherein the step of sequentially performing cross-text determining on the sorted several first candidate class clusters comprises: determining hash values of identifiers of the texts comprised in the several first candidate class clusters; and sequentially performing cross-text determining on the sorted several first candidate class clusters based on matching between the hash values (mental process, a human can, as part of determining cross-text for sorted class clusters, mentally apply a mental hash algorithm to text identifiers for texts in the clusters, and can match mentally-determined hash values to determine cross-text for the sorted clusters in a sequential order [e.g. based on an order that the sorted clusters are sorted into]).
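
For illustration only, matching hash values of text identifiers to detect a shared text, as recited in claim 8, can be sketched as below. The use of SHA-1 from Python's standard `hashlib` module, the helper names, and the sample identifiers are assumptions; the claim does not name a hash function.

```python
import hashlib


def id_hash(text_id):
    """Hash a text identifier to a fixed-length hexadecimal string."""
    return hashlib.sha1(text_id.encode("utf-8")).hexdigest()


def shares_text(cluster_a, cluster_b):
    """Return True if any identifier hash in cluster_b matches one in cluster_a."""
    hashes_a = {id_hash(t) for t in cluster_a}
    return any(id_hash(t) in hashes_a for t in cluster_b)


# Hypothetical text identifiers.
overlap = shares_text(["doc-1", "doc-2"], ["doc-2", "doc-3"])
disjoint = shares_text(["doc-1"], ["doc-3"])
```

Storing the hashes in a set makes each membership check a constant-time lookup on average, which is one plausible reason to compare hash values rather than the identifiers directly.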

As per Claim 9:
after the performing class cluster combination based on a determining result, further comprising: for any combined first candidate class cluster, upon determining that a quantity of texts comprised in the combined first candidate class cluster is greater than a predetermined quantity threshold, stopping continuing to perform combination on the combined first candidate class cluster (mental process, a human can, after combining texts/clusters into class clusters, count how many texts are in each of a plurality of clusters and decide to stop adding information to the plurality of clusters when he/she determines that any of the plurality of clusters has too many texts).
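
For illustration only, the stopping condition of claim 9 can be sketched as a size cap on combined clusters. As before, treating a “cross-text” as a text shared by two clusters is an assumption, as are the function name `merge_with_cap`, the sample clusters, and the cap values.

```python
def merge_with_cap(clusters, max_size):
    """Merge overlapping clusters, but stop merging into any cluster larger than max_size."""
    ordered = sorted((set(c) for c in clusters), key=len, reverse=True)
    merged = []
    for cluster in ordered:
        for target in merged:
            # A combined cluster that already exceeds the cap is closed to further merges.
            if len(target) > max_size:
                continue
            if target & cluster:
                target |= cluster
                break
        else:
            merged.append(cluster)
    return merged


# Hypothetical candidate clusters of text identifiers.
candidates = [{1, 2, 3}, {3, 4}, {4, 5}]
capped = merge_with_cap(candidates, max_size=3)
uncapped = merge_with_cap(candidates, max_size=10)
```

With the cap at 3, the first merge grows the cluster to four texts, so the last cluster is kept separate even though it shares text 4.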

As per Claim 10:
wherein the step of separately performing secondary fine clustering on the several second candidate class clusters comprises: separately performing, by using a hierarchical clustering algorithm, secondary fine clustering on the several second candidate class clusters based on texts respectively comprised in the second candidate class clusters (mental process, a human can, as part of clustering text clusters, use a mental “hierarchical clustering algorithm” to cluster second candidate class clusters based on texts in the second candidate class clusters).
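
For illustration only, a hierarchical clustering pass over the texts of one second candidate class cluster can be sketched as below. Claim 10 as quoted does not specify the algorithm's details, so the bottom-up (agglomerative) approach, the single-linkage rule, the stopping threshold, and the function name `single_linkage` are all assumptions.

```python
def single_linkage(items, sim, threshold):
    """Repeatedly merge the two most similar groups until no pair exceeds the threshold."""
    clusters = [[i] for i in items]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: group similarity is the best pairwise similarity.
                s = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        if best[0] <= threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical similarity values for the four texts of one candidate cluster.
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
fine = single_linkage([0, 1, 2, 3], sim, threshold=0.75)
```

The pairwise search is quadratic in the number of groups per merge, which is workable only because it runs inside each candidate cluster rather than over all texts.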

As per Claim 11:
wherein M is a value in a predetermined range, or M is determined based on a total quantity of the plurality of texts (mental process, a human can decide for M to be a value that is within a range of values, or a human can count the plurality of texts and decide M to be a particular value appropriate for how many texts there are in the plurality of texts).
Allowable Subject Matter
The following is a statement of reasons for the indication of allowable subject matter:  
As per Claim 1 (and similarly claims 12-13, and consequently claims 2-11, which depend on claim 1), the prior art of record does not teach or suggest the combination of all limitations in claim 1, including (i.e. in combination with the remaining limitations in claim 1) A text clustering method, comprising: determining, by using a semantic representation model, semantic vectors respectively corresponding to a plurality of texts to-be-clustered; determining a similarity matrix between the plurality of texts based on the semantic vectors of the plurality of texts; determining, from the similarity matrix, M similar texts with maximum similarities respectively corresponding to the plurality of texts; and using the corresponding texts as selected central texts when the similarities corresponding to the M similar texts are greater than a first threshold; and clustering the to-be-clustered texts based on data corresponding to the central texts in the similarity matrix.
	L. Wang and Y. Yu, "Study of Text Clustering in Semantic Web", 2019, 2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), Dalian, China, 2019, pp. 1287-1293, teaches text clustering using a text semantic similarity matrix based on calculating semantic similarity of words (Abstract; Section III, first paragraph).  This reference does not appear to describe at least selecting central texts when similarities of M similar texts are greater than a first threshold.
	CN112182206A teaches “clustering a plurality of texts according to N gravity center texts to determine N classification clusters, wherein the texts comprise N gravity center texts, one gravity center text is used for indicating one clustering gravity center, and N is a positive integer” (see Google Translation).  This reference does not appear to describe how the gravity center texts are determined/identified/selected to be gravity center texts.
	CN109783816B teaches in the Background, generating initial cluster centers by “randomly selecting K samples from the n sample set” (see Google Translation).  This reference describes where weighting of texts is performed based on TF-IDF (representing term frequency, which appears to quantify appearance/occurrence of terms [“In a given document, term frequency (TF) refers to the frequency with which a given word appears in the document”], and not necessarily what the terms mean).  “The calculation formula for text similarity” in this reference also does not appear to specifically use semantic vectors.  This reference appears to describe “Step 1023: Calculate the similarity between all the texts and other texts according to the method for calculating the similarity between the first text and other texts. Step 1024: construct a feature item matrix of the similarities between all the texts and other texts based on the calculated similarities between all the texts and other texts. This embodiment uses a feature item matrix to represent the similarities between all texts and other texts, so that the representation of text similarity is more intuitive and regular. Step 103: determining the cluster center of all the texts according to the similarity between all the texts and other texts”, which appears to describe a similarity matrix containing information pertaining to the similarities between all texts in a text set.  
This reference does describe comparing a value to a threshold to identify a “transition point” which is then used to determine “cluster centers of all the texts” by “determin[ing] the text with the largest number of items among the items included in the transition points of all the texts as the first cluster center” and “clustering all the texts according to the cluster centers”, but the value compared to the threshold appears to be one average gradient of N similarity-ranked items, and not where multiple similarities corresponding to similar texts are each greater than a threshold.
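The distinction drawn above, between thresholding a single derived value and requiring each of several similarities to exceed a threshold, can be illustrated with a minimal sketch. It assumes a precomputed similarity matrix; the function name `select_central_texts`, the sample matrix, and the parameter values are hypothetical and are taken neither from the application nor from any cited reference. It shows only one possible reading of "selecting central texts when similarities of M similar texts are greater than a first threshold".

```python
from typing import List


def select_central_texts(sim: List[List[float]], m: int, first_threshold: float) -> List[int]:
    """Return indices of texts whose M most similar neighbours all exceed the threshold."""
    centers = []
    for i, row in enumerate(sim):
        # Similarities to every other text, highest first (self-similarity excluded).
        others = sorted((s for j, s in enumerate(row) if j != i), reverse=True)
        top_m = others[:m]
        # Each of the M similarities must individually clear the threshold;
        # a test on one aggregate value would be a different criterion.
        if len(top_m) == m and all(s > first_threshold for s in top_m):
            centers.append(i)
    return centers


# Hypothetical 4-text similarity matrix (symmetric, 1.0 on the diagonal).
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.8, 0.3, 1.0, 0.2],
    [0.1, 0.2, 0.2, 1.0],
]
centers = select_central_texts(sim, m=2, first_threshold=0.7)
```

With these illustrative values only text 0 qualifies: texts 1 and 2 each have one strong neighbour but a weak second one, so they fail the per-similarity test.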
	CN107943982B teaches “In the step S200, k texts in the text set to be clustered are selected as cluster centers. This means that the remaining (N-k) texts in the text set to be clustered will be classified into these k clusters by the subsequent steps. The k texts are acquired as cluster centers, the k texts which are manually specified are acquired as cluster centers by a computer, or the k texts are randomly acquired as cluster centers by the computer, which is not limited in the present application” (see Google Translation).  This reference does not appear to describe acquiring cluster centers based on maximum similarities being greater than a threshold.
	CN114741517A teaches “a similarity threshold may be preset, so that multiple sample texts may be clustered according to a similarity (such as a text similarity and a semantic similarity) between the multiple sample texts, so as to obtain at least one target cluster. Wherein the similarity between the various samples belonging to the same target cluster is greater than a similarity threshold” (see Google Translation).  This reference does not appear to use the similarities which are greater than a threshold to select center texts.
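The quoted property, that every pair of samples in a target cluster has similarity above a preset threshold, can be sketched as follows. The greedy assignment used here is an assumption chosen for brevity; the quoted passage does not say how the reference actually forms its clusters, and the name `threshold_clusters` and the sample data are hypothetical.

```python
from typing import List


def threshold_clusters(sim: List[List[float]], threshold: float) -> List[List[int]]:
    """Greedily group texts so that all pairs inside a cluster exceed the threshold."""
    clusters: List[List[int]] = []
    for i in range(len(sim)):
        for cluster in clusters:
            # Join the first cluster whose every member is similar enough to text i.
            if all(sim[i][j] > threshold for j in cluster):
                cluster.append(i)
                break
        else:
            # No compatible cluster exists, so text i starts a new one.
            clusters.append([i])
    return clusters


# Hypothetical symmetric similarity matrix for four sample texts.
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.8, 0.3, 1.0, 0.2],
    [0.1, 0.2, 0.2, 1.0],
]
clusters = threshold_clusters(sim, threshold=0.7)
```

Note that this procedure groups texts by threshold without designating any text as a center, which is consistent with the examiner's observation about this reference.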
	2019/0197129 teaches “calculating a text similarity between the text to be compared and the target text based on a semantic similarity algorithm for short text” (Abstract).
	2017/0220545 teaches “each cluster includes a cluster centroid which is a document determined to have the lowest average distance” (paragraph 74).
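A centroid defined as the document with the lowest average distance can be sketched as below. This assumes the average is taken over distances to the other members of the same cluster, which the quoted sentence does not spell out; the function name `cluster_centroid` and the distance values are hypothetical.

```python
from typing import List


def cluster_centroid(dist: List[List[float]], members: List[int]) -> int:
    """Return the member with the lowest average distance to the other members."""

    def avg_distance(i: int) -> float:
        others = [j for j in members if j != i]
        # A single-member cluster has no other members; treat its average as zero.
        return sum(dist[i][j] for j in others) / len(others) if others else 0.0

    return min(members, key=avg_distance)


# Hypothetical pairwise distances between three documents.
dist = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]
centroid = cluster_centroid(dist, members=[0, 1, 2])
```

Here document 1 is selected because its average distance to the others (1.0) is lower than that of documents 0 and 2 (1.5 each).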
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC YEN whose telephone number is (571)272-4249. The examiner can normally be reached M-F 12:00PM -8:30PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, RICHEMOND DORVIL can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
EY 2/13/2026
/ERIC YEN/           Primary Examiner, Art Unit 2658
Prosecution Timeline

Nov 30, 2023: Application Filed
Feb 24, 2026: Non-Final Rejection mailed, §101, §112 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

19/273,406 (Patent 12639346): Method and System for Multi-Level Artificial Intelligence Supercomputer Design. 10m to grant; granted May 26, 2026.
19/285,289 (Patent 12639347): Method and System for Multi-Level Artificial Intelligence Supercomputer Design. 10m to grant; granted May 26, 2026.
18/387,470 (Patent 12632500): INFORMATION PROCESSING APPARATUS METHOD FOR RESPONDING TO USER INQUIRY VIA INFORMATION SOURCE OR OPERATOR. 2y 6m to grant; granted May 19, 2026.
18/481,741 (Patent 12632479): Method and System for Multi-Level Artificial Intelligence Supercomputer Design. 2y 7m to grant; granted May 19, 2026.
18/638,592 (Patent 12608553): SYSTEM FOR SEMANTIC ANALYSIS AND AUTOMATIC SOLUTION OF MATHEMATICAL APPLICATION PROBLEM. 2y 0m to grant; granted Apr 21, 2026.
Based on the 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 85%
With Interview (+11.6%): 97%
Median Time to Grant: 2y 9m (~2m remaining)
PTA Risk: Low
Based on 772 resolved cases by this examiner. Grant probability derived from career allowance rate.