DETAILED ACTION
In response to communication filed on 06 October 2025, claims 2-3 are canceled. Claims 1 and 17-18 are amended. Claims 1, 4-18 are pending.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments, see “Claim Rejections – 35 U.S.C. § 101”, filed 06 October 2025, have been carefully considered and are not considered to be persuasive.
APPLICANT’S ARGUMENT: Applicant argues that the claims have been amended to overcome the rejection under 35 U.S.C. § 101. In particular, by performing clustering, the information processing apparatus is able to compare images of a group that has a high similarity to a main group, a group that has low similarity to the main group and a group that has similarity between the high and low group to the main group. This comparison function improves work efficiency by preventing judgement errors.
EXAMINER’S RESPONSE: Examiner has carefully considered the argument but respectfully disagrees. The functionality of comparing images has been identified as an abstract idea. Comparing specific data structures against a criterion can be performed as the mental process of evaluation. According to MPEP 2106.05(a)(II), “However, it is important to keep in mind that an improvement in the abstract idea itself (e.g. a recited fundamental economic concept) is not an improvement in technology”. Therefore, the process of comparing, which has already been identified as an abstract idea, cannot be considered an improvement in technology. As a result, the above argument is not considered to be persuasive.
Applicant’s arguments, see “Claim Rejections – 35 U.S.C. § 103”, filed 06 October 2025, have been carefully considered and are not considered to be persuasive.
APPLICANT’S ARGUMENT: Applicant argues that none of the cited references discloses "identifying a second cluster based on a first degree of similarity, which is the degree of similarity between each of the clusters generated and the representative cluster, and a second degree of similarity, which is the degree of similarity between each of the clusters generated and the first cluster, the first degree of similarity and the second degree of similarity of the second cluster being values in a middle range between a minimum degree of the first degree of similarity and a maximum degree of the first degree of similarity". Moreover, since Chen, Ma, and Wang are publications related to different technical themes, there is no motivation to combine them.
EXAMINER’S RESPONSE: Examiner has carefully considered the argument but respectfully disagrees. Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references. To a person of ordinary skill in the art, under the broadest reasonable interpretation in light of the specification, the first degree of similarity may reasonably be interpreted as an overlap score that measures how many duplicate items are included in different clusters. Likewise, the second degree of similarity may reasonably be interpreted as a similarity between search items from the cluster to the nearest neighbor cluster. Chen teaches, in cols 7, 13 and 19, first clusters, the generation of second clusters, the identification of duplicate search items in different clusters (the first degree of similarity in the current application), and a similarity between search items from the cluster to the nearest neighbor cluster (the second degree of similarity in the current application). However, Chen does not explicitly teach a degree of similarity between each of the clusters generated, for which Ma has been incorporated. Ma discloses object clusters and also teaches correlation between a first cluster and a second cluster (i.e. each cluster) in cols 8 and 11. Both Chen and Ma relate to processing clusters and hence belong to the same field of technology.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of correlation and similarity between clusters as being disclosed and taught by Ma, in the system taught by Chen to yield the predictable results of efficiently applying clusters to retrieve information in response to user-generated queries (see Ma, [col 4 lines 29-40] “During the initialization, database objects having similar feature vectors are clustered into common clusters. The feature vectors can be low level features such as color, size, shape, and texture of the subject matter of stored images within an image database… the more responsive and efficient the system is in performing retrieval in response to user-generated queries. The initialization of the database is indicative of system-perceived relationships among the objects and among the clusters”). Further, the combination of Chen and Ma does not explicitly teach the first degree of similarity and the second degree of similarity of the second cluster being values in a middle range between a minimum degree of the first degree of similarity and a maximum degree of the first degree of similarity. As a result, Wang has been incorporated. Wang discloses a similarity metric and also appears to be in the same field of technology as Chen, where data is processed to determine similarity information. Wang, in [0095] and [0104]-[0106], teaches how the median intensities are determined from the maximum and minimum values for the purpose of computing the similarity metric.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of values in middle range of minimum value and maximum value as being disclosed and taught by Wang, in the system taught by the proposed combination of Chen and Ma to yield the predictable results of efficiently quantifying degree of similarity for further analysis (see Wang, [0022] “The similarity metric quantifies the degree of similarity of a feature space surrounding the candidate cell to a feature space of the HD map. The similarity metric can be based, at least in part, on a mean of the intensity attribute of the candidate cell, and the variance of the elevation attribute of the cell. In an embodiment, the first candidate cell similarity metric can be computed. A lookup table can be generated using data produced by the first candidate computed similarity metric. The lookup table can be used to lookup an approximation of the similarity metric for the second, and subsequent, candidate cells”). Thus a combination of Chen, Ma and Wang teaches the above argued limitation. Also, according to MPEP (2144 – IV), “The reason or motivation to modify the reference may often suggest what the inventor has done, but for a different purpose or to solve a different problem. It is not necessary that the prior art suggest the combination to achieve the same advantage or result discovered by applicant. See, e.g., In re Kahn, 441 F.3d 977, 987, 78 USPQ2d 1329, 1336 (Fed. Cir. 2006) (motivation question arises in the context of the general problem confronting the inventor rather than the specific problem solved by the invention)”. As a result, Chen, Ma and Wang appear to be focused towards a similar field of technology with respect to cluster processing and then data processing geared towards determining similarities. As a result, the above argument is not considered to be persuasive.
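For illustration only, the interpretation applied above (an overlap score as the first degree of similarity, and a nearest-neighbor similarity as the second) can be sketched as follows. The sketch assumes clusters are given as lists of item identifiers. The function names are hypothetical, and Jaccard similarity is used only as a stand-in for the embedding similarity Chen mentions; this is not Chen's implementation.

```python
def overlap_score(clusters):
    """Count items that appear in more than one cluster.

    Hypothetical reading of "how many duplicate search items are
    included in different clusters".
    """
    owners = {}
    for idx, cluster in enumerate(clusters):
        for item in set(cluster):
            owners.setdefault(item, set()).add(idx)
    return sum(1 for idxs in owners.values() if len(idxs) > 1)


def jaccard(a, b):
    """Jaccard similarity of two item collections (assumed stand-in measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbor_similarity(clusters, index):
    """Similarity between one cluster and its most similar other cluster."""
    return max(
        jaccard(clusters[index], other)
        for i, other in enumerate(clusters)
        if i != index
    )


clusters = [["a", "b", "c"], ["b", "c", "d"], ["e"]]
print(overlap_score(clusters))                   # items shared by several clusters
print(nearest_neighbor_similarity(clusters, 0))  # similarity to nearest neighbor
```

In this example, items "b" and "c" each appear in two clusters, and the nearest neighbor of the first cluster is the second cluster.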
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 4-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1:
Claims 1, 4-16 are recited as being directed to an “apparatus”. Claim 17 is recited as being directed to a “method”. Claim 18 is recited as being directed to a “computer readable form”. Thus claims 1, 4-18 have been identified to be directed towards the appropriate statutory category. Below is further analysis related to step 2.
Regarding claim 1,
Step 2A: Prong One:
Claim 1 recites limitations:
performing clustering on a data group based on a feature value of each of a plurality of pieces of data;
determining a representative cluster among clusters generated;
identifying a first cluster based on a degree of similarity between each of the clusters generated and the representative cluster;
identifying a second cluster based on a first degree of similarity, which is the degree of similarity between each of the clusters generated and the representative cluster, and a second degree of similarity, which is the degree of similarity between each of the clusters generated and the first cluster, the first degree of similarity and the second degree of similarity of the second cluster being values in a middle range between a minimum degree of the first degree of similarity and a maximum degree of the first degree of similarity;
selecting at least one piece of data for display from among the representative cluster, the first cluster, and the second cluster; and
…for comparison among the plurality of the selected at least one piece of data.
These claim limitations appear to be reciting a “Mental Process” including evaluation and observation which may be performed in a human mind.
A human being can apply evaluation to mentally cluster data based on a feature value of each of a plurality of pieces of data. A human mind can evaluate to determine a representative cluster among the generated clusters. A human mind can evaluate to identify a first cluster based on a degree of similarity. A human being can apply evaluation to determine a second cluster based on a first degree of similarity between each of the clusters generated and the representative cluster and a second degree of similarity, where the first degree of similarity and the second degree of similarity are values in the middle range between the minimum and maximum degrees of similarity. A human mind can observe to select at least one piece of data for display from among the representative cluster, the first cluster, and the second cluster. A human being can apply evaluation to compare among the plurality of selected pieces of data.
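For illustration only, the recited identification steps can be sketched as follows. The claim does not fix a clustering algorithm, a similarity measure, or the bounds of the "middle range", so the sketch makes several assumptions: the clusters are already generated, similarity is the cosine similarity between cluster centroids, the representative cluster is the largest cluster (as in claim 11), the first cluster is the one least similar to the representative cluster (as in claim 4), and the middle range trims a hypothetical margin of 25% from each end of the range of first-degree similarities. The function names are likewise hypothetical.

```python
import math


def centroid(cluster):
    """Mean feature vector of a cluster of equal-length vectors."""
    n = len(cluster)
    return [sum(v[d] for v in cluster) / n for d in range(len(cluster[0]))]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def identify_clusters(clusters, margin=0.25):
    """Return indices (representative, first, second); second may be None."""
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    cents = [centroid(c) for c in clusters]
    # Assumption: representative cluster = cluster with the most data.
    rep = max(range(len(clusters)), key=lambda i: len(clusters[i]))
    others = [i for i in range(len(clusters)) if i != rep]
    # First degree of similarity: each cluster vs. the representative cluster.
    first_sim = {i: cosine(cents[i], cents[rep]) for i in others}
    # Assumption: first cluster = lowest similarity to the representative.
    first = min(others, key=first_sim.get)
    lo, hi = min(first_sim.values()), max(first_sim.values())
    band_lo = lo + margin * (hi - lo)
    band_hi = hi - margin * (hi - lo)
    for i in others:
        if i == first:
            continue
        # Second degree of similarity: each cluster vs. the first cluster.
        second_sim = cosine(cents[i], cents[first])
        if band_lo <= first_sim[i] <= band_hi and band_lo <= second_sim <= band_hi:
            return rep, first, i
    return rep, first, None


clusters = [
    [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],  # largest cluster
    [[0.0, 1.0]],                          # orthogonal to the largest
    [[1.0, 0.1], [1.0, 0.1]],              # nearly parallel to the largest
    [[1.0, 1.0]],                          # halfway between
]
print(identify_clusters(clusters))
```

When no cluster falls inside the band, the sketch returns None for the second cluster, which corresponds to the "second cluster not identified" condition addressed in claims 13 and 14.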
Step 2A: Prong Two:
The abstract idea does not appear to be integrated into a practical application with the recitation of the following claim language.
Claim 1 further recites limitations:
An information processing apparatus comprising: a processor; and
a memory storing one or more programs configured to be executed by the processor, the one or more programs including instructions for:
These claim limitations appear merely to add the use of generic computer components that merely execute the abstract idea within a computer device (see MPEP 2106.05(b)) and do not appear to integrate the abstract idea into a practical application.
Claim 1 further recites limitations:
displaying a plurality of the selected at least one piece of data on a display device…
These claim limitations as a whole have been identified as insignificant extra-solution activity. Per MPEP 2106.05(g) “An example of post-solution activity is an element that is not integrated into the claim as a whole, e.g., a printer that is used to output a report of fraudulent transactions, which is recited in a claim to a computer programmed to analyze and manipulate information about credit card transactions in order to detect whether the transactions were fraudulent” and “Whether the limitation amounts to necessary data gathering and outputting, (i.e., all uses of the recited judicial exception require such data gathering or data output)”. Similarly, the claim limitations above, taken as a whole, appear to recite an output that presents the data generated in the other limitations and do not appear to integrate the abstract idea into a practical application.
Step 2B:
The claim does not appear to amount to significantly more than the abstract idea with the recitation of the following claim language.
Claim 1 further recites limitations:
An information processing apparatus comprising: a processor; and
a memory storing one or more programs configured to be executed by the processor, the one or more programs including instructions for:
These claim limitations appear merely to add the use of generic computer components that merely execute the abstract idea within a computer device (see MPEP 2106.05(b)) and do not appear to amount to significantly more.
Claim 1 further recites limitations:
displaying a plurality of the selected at least one piece of data on a display device…
These claim limitations as a whole have been identified as insignificant extra-solution activity. Per MPEP 2106.05(g) “An example of post-solution activity is an element that is not integrated into the claim as a whole, e.g., a printer that is used to output a report of fraudulent transactions, which is recited in a claim to a computer programmed to analyze and manipulate information about credit card transactions in order to detect whether the transactions were fraudulent” and “Whether the limitation amounts to necessary data gathering and outputting, (i.e., all uses of the recited judicial exception require such data gathering or data output)”. Similarly the claim limitations as a whole above appear to be reciting an output of presenting the generated data and appear to be conventional computer functionality. Also, MPEP 2106.05(d)(II) has identified “Presenting offers and gathering statistics, OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93” as conventional computer technology. Similarly, the claim limitations identified above appear to be reciting an output of presenting the generated data. As a result, these claim limitations as a whole do not appear to amount to significantly more than the abstract idea itself.
Claim 18 incorporates substantively all the limitations of claim 1 in a computer-readable medium form and is rejected under the same rationale. The additional claim limitations, “A non-transitory recording medium that records a program that causes a computer to execute an information processing method, the method comprising:”, are addressed in Step 2A: Prong Two, as these claim limitations appear merely to add the use of generic computer components that merely execute the abstract idea within a computer device (see MPEP 2106.05(b)) and do not appear to integrate the abstract idea into a practical application. In Step 2B, these claim limitations likewise appear merely to add the use of generic computer components that merely execute the abstract idea within a computer device (see MPEP 2106.05(b)) and do not appear to amount to significantly more.
Regarding claim 17,
Step 2A: Prong One:
Claim 17 recites limitations:
performing clustering on a data group based on a feature value of each of a plurality of pieces of data;
determining a representative cluster among clusters generated by the clustering;
performing first identification to identify a first cluster based on a degree of similarity between each of the clusters generated by the clustering and the representative cluster;
performing second identification to identify a second cluster based on a first degree of similarity, which is the degree of similarity between each of the clusters generated by the clustering and the representative cluster, and a second degree of similarity, which is the degree of similarity between each of the clusters generated by the clustering and the first cluster;
the first degree of similarity and the second degree of similarity of the second cluster being values in a middle range between a minimum degree of the first degree of similarity and a maximum degree of the first degree of similarity;
selecting at least one piece of data for display from each of the representative cluster, the first cluster, and the second cluster; and
… for comparison among the plurality of the selected at least one piece of data.
These claim limitations appear to be reciting a “Mental Process” including evaluation and observation which may be performed in a human mind.
A human being can apply evaluation to mentally cluster data based on a feature value of each of a plurality of pieces of data. A human mind can evaluate to determine a representative cluster among the generated clusters. A human mind can evaluate to identify a first cluster based on a degree of similarity. A human being can apply evaluation to determine a second cluster based on a first degree of similarity between each of the clusters generated and the representative cluster and a second degree of similarity, where the first degree of similarity and the second degree of similarity are values in the middle range between the minimum and maximum degrees of similarity. A human mind can observe to select at least one piece of data for display from each of the representative cluster, the first cluster, and the second cluster. A human being can apply evaluation to compare among the plurality of selected pieces of data.
Step 2A: Prong Two:
The abstract idea does not appear to be integrated into a practical application with the recitation of the following claim language.
Claim 17 further recites limitations:
An information processing method comprising:
These claim limitations appear merely to add the use of generic computer components that merely execute the abstract idea within a computer device (see MPEP 2106.05(b)) and do not appear to integrate the abstract idea into a practical application.
Claim 17 further recites limitations:
displaying a plurality of the selected at least one piece of data on a display device…
These claim limitations as a whole have been identified as insignificant extra-solution activity. Per MPEP 2106.05(g) “An example of post-solution activity is an element that is not integrated into the claim as a whole, e.g., a printer that is used to output a report of fraudulent transactions, which is recited in a claim to a computer programmed to analyze and manipulate information about credit card transactions in order to detect whether the transactions were fraudulent” and “Whether the limitation amounts to necessary data gathering and outputting, (i.e., all uses of the recited judicial exception require such data gathering or data output)”. Similarly, the claim limitations above, taken as a whole, appear to recite an output that presents the data generated in the other limitations and do not appear to integrate the abstract idea into a practical application.
Step 2B:
The claim does not appear to amount to significantly more than the abstract idea with the recitation of the following claim language.
Claim 17 further recites limitations:
An information processing method comprising:
These claim limitations appear merely to add the use of generic computer components that merely execute the abstract idea within a computer device (see MPEP 2106.05(b)) and do not appear to amount to significantly more.
Claim 17 further recites limitations:
displaying a plurality of the selected at least one piece of data on a display device…
These claim limitations as a whole have been identified as insignificant extra-solution activity. Per MPEP 2106.05(g) “An example of post-solution activity is an element that is not integrated into the claim as a whole, e.g., a printer that is used to output a report of fraudulent transactions, which is recited in a claim to a computer programmed to analyze and manipulate information about credit card transactions in order to detect whether the transactions were fraudulent” and “Whether the limitation amounts to necessary data gathering and outputting, (i.e., all uses of the recited judicial exception require such data gathering or data output)”. Similarly the claim limitations as a whole above appear to be reciting an output of presenting the generated data and appear to be conventional computer functionality. Also, MPEP 2106.05(d)(II) has identified “Presenting offers and gathering statistics, OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93” as conventional computer technology. Similarly, the claim limitations identified above appear to be reciting an output of presenting the generated data. As a result, these claim limitations as a whole do not appear to amount to significantly more than the abstract idea itself.
Regarding claims 4-5 and 10-16,
Step 2A and 2B:
Claim 4 further recites limitations:
wherein the first cluster is identified as, a cluster having a lowest degree of similarity to the representative cluster.
Claim 5 further recites limitations:
wherein the first cluster among clusters excluding a predetermined number or predetermined ratio of clusters having lower degrees of similarity to the representative cluster is identified.
Claim 10 further recites limitations:
wherein the representative cluster is determined as, a cluster having a higher degree of similarity between feature values of a plurality of pieces of data included in the cluster having the higher degree of similarity.
Claim 11 further recites limitations:
wherein the representative cluster is determined as, a cluster including a larger number of pieces of data.
Claim 12 further recites limitations:
further comprising acquiring a target data group,
wherein each acquired piece of the plurality of pieces of data is associated with rank information indicating a rank regarding likelihood of being a target, and
wherein the representative cluster is determined as- a cluster including a larger number of data having the rank information indicating smaller numbers.
Claim 13 further recites limitations:
wherein, with the second cluster not identified, re-clustering on the data group is performed.
Claim 14 further recites limitations:
wherein, with the second cluster not identified, the first cluster among clusters excluding a predetermined number or predetermined ratio of clusters having lower degrees of similarity to the representative cluster is re-identified.
Claim 15 further recites limitations:
wherein a degree of similarity between the clusters is dependent upon performing clustering.
Claim 16 further recites limitations:
wherein the data group as a processing target is an image group.
These claim limitations appear to be reciting a “Mental Process” including evaluation and observation which may be performed in a human mind.
A human being can apply evaluation to identify, as the first cluster, the cluster having the lowest degree of similarity to the representative cluster. A human being can evaluate to identify the first cluster from among clusters excluding a predetermined number or predetermined ratio of clusters having lower degrees of similarity to the representative cluster. A human being can apply evaluation to determine the representative cluster as a cluster having a higher degree of similarity between feature values of a plurality of pieces of data included in the cluster. A human mind can apply evaluation to determine the representative cluster as a cluster including a larger number of pieces of data. A human being can apply evaluation to acquire a target data group in which each acquired piece of the plurality of pieces of data is associated with rank information, and to determine the representative cluster as a cluster including a larger number of data having the rank information indicating smaller numbers. A human being can evaluate to re-cluster the data group when the second cluster is not identified. A human mind can apply evaluation to re-identify, when the second cluster is not identified, the first cluster from among clusters excluding a predetermined number or predetermined ratio of clusters having lower degrees of similarity to the representative cluster. A human mind can evaluate to apply algorithms wherein a degree of similarity between the clusters is dependent upon performing clustering. A human being can apply evaluation to determine that the data group as a processing target is an image group.
There are no additional claim limitations that integrate into a practical application or amount to significantly more than the abstract idea.
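For illustration only, the first-cluster identification of claims 4 and 5 can be sketched as follows. The sketch assumes the degree of similarity of each cluster to the representative cluster has already been computed and is supplied as a mapping. The function name and the parameters exclude_count and exclude_ratio are hypothetical, and taking the larger of the two exclusions when both are supplied is an assumption, since the claim recites them in the alternative.

```python
import math


def identify_first_cluster(similarities, exclude_count=0, exclude_ratio=0.0):
    """Pick the least similar cluster after dropping the lowest-ranked ones.

    similarities maps a cluster label to its degree of similarity to the
    representative cluster. With no exclusion this is the claim 4 case;
    with an exclusion it is the claim 5 case.
    """
    ranked = sorted(similarities, key=similarities.get)  # lowest similarity first
    skip = max(exclude_count, math.floor(len(ranked) * exclude_ratio))
    remaining = ranked[skip:]
    return remaining[0] if remaining else None


sims = {"A": 0.1, "B": 0.4, "C": 0.8, "D": 0.9}
print(identify_first_cluster(sims))                     # claim 4 case
print(identify_first_cluster(sims, exclude_count=1))    # claim 5, by number
print(identify_first_cluster(sims, exclude_ratio=0.5))  # claim 5, by ratio
```

Calling the function again with a larger exclusion is one way to picture the re-identification recited in claim 14 when no second cluster is identified.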
Regarding claims 6-9,
Claim 6 further recites limitations:
further comprising performing control to display the at least one piece of data for display such that the at least one piece of data is arranged for each of the clusters to which the at least one piece of data for display belongs.
Claim 7 further recites limitations:
further comprising performing control to display the at least one piece of data for display in order of the representative cluster, the second cluster, and the first cluster.
Claim 8 further recites limitations:
further comprising displaying a user interface (UI) to cause a user to input whether the at least one piece of data for display is an identical type of data.
Claim 9 further recites limitations:
further comprising displaying a UI to cause a user to input whether each of the at least one piece of data for display is a target type of data.
Step 2A: Prong Two:
These claim limitations as a whole have been identified as insignificant extra-solution activity. Per MPEP 2106.05(g) “An example of post-solution activity is an element that is not integrated into the claim as a whole, e.g., a printer that is used to output a report of fraudulent transactions, which is recited in a claim to a computer programmed to analyze and manipulate information about credit card transactions in order to detect whether the transactions were fraudulent” and “Whether the limitation amounts to necessary data gathering and outputting, (i.e., all uses of the recited judicial exception require such data gathering or data output)”. Similarly, the claim limitations above, taken as a whole, appear to recite an output that presents the data generated in the other limitations and do not appear to integrate the abstract idea into a practical application.
Step 2B:
These claim limitations as a whole have been identified as insignificant extra-solution activity. Per MPEP 2106.05(g) “An example of post-solution activity is an element that is not integrated into the claim as a whole, e.g., a printer that is used to output a report of fraudulent transactions, which is recited in a claim to a computer programmed to analyze and manipulate information about credit card transactions in order to detect whether the transactions were fraudulent” and “Whether the limitation amounts to necessary data gathering and outputting, (i.e., all uses of the recited judicial exception require such data gathering or data output)”. Similarly the claim limitations as a whole above appear to be reciting an output of presenting the generated data and appear to be conventional computer functionality. Also, MPEP 2106.05(d)(II) has identified “Presenting offers and gathering statistics, OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93” as conventional computer technology. Similarly, the claim limitations identified above appear to be reciting an output of presenting the generated data. As a result, these claim limitations as a whole do not appear to amount to significantly more than the abstract idea itself.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 4, 6-7, 9-10, 13, and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 10,496,691 B1, hereinafter “Chen”) in view of Ma et al. (US 6,347,313 B1, hereinafter “Ma”) further in view of Wang (US 2018/0143647 A1, hereinafter “Wang”).
Regarding claim 1, Chen teaches
An information processing apparatus comprising: a processor; and (see Chen, [col 3 lines 32-35] “the server 110 can include one or more processors formed in a substrate configured to execute one or more machine executable instructions or pieces of software, firmware, or a combination thereof”).
a memory storing one or more programs configured to be executed by the processor, the one or more programs including instructions for: (see Chen, [col 3 lines 38-44] “The server 110 can also include an operating system and one or more computer memories… The memory may include any type of storage device that stores information in a format that can be read and/or executed by the one or more processors”).
performing clustering on a data group based on a feature value of each of a plurality of pieces of data; (see Chen, [col 8 lines 10-18] “The clustering result of each round of distance-based clustering may be evaluated using the evaluation criteria described above. In some implementations, the system may adjust the similarity score, e.g., the embedding similarity, for cluster pairs that include entities are related in the entity ontology, e.g., are synonyms, hypernyms, or co-hypernyms to favor similarity (e.g., reducing the distance value)”; [col 9 lines 44-48] “An embedding model is used to represent things, such as search items, in a feature vector, also known as an embedding. A classifier, such as a WALS model, can be provided features for an item and the classifier generates an embedding that represents the item”).
determining a representative cluster among clusters generated; (see Chen, [col 6 lines 19-22] “the system may use many different clustering methods in parallel to produce different candidate final cluster results and select as the final cluster result the candidate with the best evaluation criteria, e.g., cluster score” – candidate cluster has been interpreted as representative cluster).
identifying a first cluster (see Chen, [col 19 lines 5-7] “generating first clusters from items responsive to a query each cluster representing an entity in a knowledge base and including items mapped to the entity”) based on a degree of similarity… (see Chen, [col 7 lines 22-29] “The cluster evaluation metric can also be based on balance. Balance is a measure of how proportional or uniform the clusters are in terms of conversion size, and is calculated as the entropy of the conversion distribution across clusters. A high balance score means the clusters are of equal or similar size. In other words, high balance indicates the most popular/top-ranked search items appear fairly evenly across the clusters”) each of the clusters generated and (see Chen, [col 13 lines 9-] “The clusters generated at each round may be evaluated using the evaluation metrics outlined”) the representative cluster; (see Chen, [col 6 lines 19-22] “the system may use many different clustering methods in parallel to produce different candidate final cluster results and select as the final cluster result the candidate with the best evaluation criteria, e.g., cluster score” – candidate cluster has been interpreted as representative cluster).
identifying a second cluster (see Chen, [col 19 lines 23-27] “may also include generating second clusters by applying a clustering methodology to the first clusters and calculating a respective cluster score for each second cluster as a regression of a coverage score, a balance score, an overlap score, a silhouette score, and a silhouette ratio”) based on a first degree of similarity, which is the degree of similarity… (see Chen, [col 7 lines 30-31] “An overlap score measures how many duplicate search items are included in different clusters”) each of the clusters generated and (see Chen, [col 13 lines 9-] “The clusters generated at each round may be evaluated using the evaluation metrics outlined”) the representative cluster, and (see Chen, [col 6 lines 19-22] “the system may use many different clustering methods in parallel to produce different candidate final cluster results and select as the final cluster result the candidate with the best evaluation criteria, e.g., cluster score” – candidate cluster has been interpreted as representative cluster) a second degree of similarity, which is the degree of similarity… (see Chen, [col 7 lines 44-48] “second a similarity between search items from the cluster to the nearest neighbor cluster. Nearest neighbor is understood to be another cluster that is most similar to the cluster, usually determined by a similarity score (e.g., embedding similarity or other similarity score)”) each of the clusters generated and (see Chen, [col 13 lines 9-] “The clusters generated at each round may be evaluated using the evaluation metrics outlined”) the first cluster, (see Chen, [col 19 lines 5-7] “generating first clusters from items responsive to a query each cluster representing an entity in a knowledge base and including items mapped to the entity”) the first degree of similarity (see Chen, [col 7 lines 30-31] “An overlap score measures how many duplicate search items are included in different clusters”) and the second degree of similarity (see Chen, [col 7 lines 44-48] “second a similarity between search items from the cluster to the nearest neighbor cluster. Nearest neighbor is understood to be another cluster that is most similar to the cluster, usually determined by a similarity score (e.g., embedding similarity or other similarity score)”) of the second cluster… (see Chen, [col 19 lines 23-27] “may also include generating second clusters by applying a clustering methodology to the first clusters and calculating a respective cluster score for each second cluster as a regression of a coverage score, a balance score, an overlap score, a silhouette score, and a silhouette ratio”) of the first degree of similarity… (see Chen, [col 7 lines 30-31] “An overlap score measures how many duplicate search items are included in different clusters”) of the first degree of similarity; (see Chen, [col 7 lines 30-31] “An overlap score measures how many duplicate search items are included in different clusters”).
… for display from among (see Chen, [col 9 lines 19-26] “The clustering engine 122 decides on a final set of clusters (e.g., selecting final clusters from the clustering method that produces the highest quality clusters), the result engine 124 may generate information used to display the responsive search items to the query requestor as search results. The result engine 124 may organize responsive search items by cluster. FIG. 2 illustrates an example user interface 200 displaying search items organized by cluster”; [col 13 lines 28-30] “The system may use the selected final cluster candidate to organize the search result that is presented to the user”) the representative cluster, (see Chen, [col 6 lines 19-22] “the system may use many different clustering methods in parallel to produce different candidate final cluster results and select as the final cluster result the candidate with the best evaluation criteria, e.g., cluster score” – candidate cluster has been interpreted as representative cluster) the first cluster, and (see Chen, [col 19 lines 5-7] “generating first clusters from items responsive to a query each cluster representing an entity in a knowledge base and including items mapped to the entity”) the second cluster; and (see Chen, [col 19 lines 23-27] “may also include generating second clusters by applying a clustering methodology to the first clusters and calculating a respective cluster score for each second cluster as a regression of a coverage score, a balance score, an overlap score, a silhouette score, and a silhouette ratio”).
displaying results… (see Chen, [col 9 lines 19-26] “The clustering engine 122 decides on a final set of clusters (e.g., selecting final clusters from the clustering method that produces the highest quality clusters), the result engine 124 may generate information used to display the responsive search items to the query requestor as search results. The result engine 124 may organize responsive search items by cluster. FIG. 2 illustrates an example user interface 200 displaying search items organized by cluster”; [col 13 lines 28-30] “The system may use the selected final cluster candidate to organize the search result that is presented to the user”) for comparison among clusters (see Chen, [col 11 lines 35-45] “The cluster score is then computed for the AC cluster and the BD cluster. Cluster E remains by itself in the second round/level. The cluster scores are then compared to the cluster scores for the individual clusters, namely A, B, C, D, and E… the system may compare the AC cluster score to the cluster score for A and C. If the cluster score of AC is not better than A and C… If the cluster score for B and D is better (e.g., higher), than the cluster scores for B and D alone, the clustering may be kept”).
Chen does not explicitly teach a degree of similarity between each of the clusters generated and the representative cluster; the degree of similarity between each of the clusters generated and the representative cluster; the degree of similarity between each of the clusters generated and the first cluster; the second degree of similarity being values in a middle range between a minimum degree of the first degree of similarity and a maximum degree of the first degree of similarity; selecting at least one piece of data for display from among the representative cluster, the first cluster, and the second cluster; and displaying a plurality of the selected at least one piece of data on a display device for comparison among the plurality of the selected at least one piece of data.
However, Ma discloses object clusters and also teaches
correlation between first cluster and second cluster (see Ma, [col 8 lines 35-51] “For each candidate cluster, a set of clusters is selected that are highly correlated with the candidate cluster… If D is approximately equal to 1, then the first cluster and the second cluster are considered to be close to one another”).
correlation between first cluster and second cluster (see Ma, [col 8 lines 35-51] “For each candidate cluster, a set of clusters is selected that are highly correlated with the candidate cluster… If D is approximately equal to 1, then the first cluster and the second cluster are considered to be close to one another”).
correlation between two clusters (see Ma, [col 11 lines 14-18] “If two clusters are identified as having centroids that are closely located within the feature space and the two clusters have a high correlation value in the correlation matrix”).
selecting at least one piece of data from the cluster for display at the user computer (see Ma, [col 5 lines 31-34] “Database objects are selected from the cluster which has a centroid closest to the query object and the selected database objects are displayed at the user computer 18”).
a plurality of the selected at least one piece of data on a display device… the plurality of the selected at least one piece of data (see Ma, [col 5 lines 31-34] “Database objects are selected from the cluster which has a centroid closest to the query object and the selected database objects are displayed at the user computer 18”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of similarity between clusters, selecting at least one piece of data from the cluster for display, a predetermined range between minimum and maximum values, the range being a middle range, a low degree of similarity, a target type of data, similarity between feature values of data pieces, re-clustering, and the data being an image group, as disclosed and taught by Ma, in the system taught by Chen to yield the predictable result of efficiently applying clusters to retrieve information in response to user-generated queries (see Ma, [col 4 lines 29-40] “During the initialization, database objects having similar feature vectors are clustered into common clusters. The feature vectors can be low level features such as color, size, shape, and texture of the subject matter of stored images within an image database… the more responsive and efficient the system is in performing retrieval in response to user-generated queries. The initialization of the database is indicative of system-perceived relationships among the objects and among the clusters”).
The proposed combination of Chen and Ma does not explicitly teach the second degree of similarity being values in a middle range between a minimum degree of the first degree of similarity and a maximum degree of the first degree of similarity.
However, Wang discloses a similarity metric and teaches
being values in a middle range between a minimum degree of first candidate cell… and a maximum degree of first candidate cell (see Wang, [0095] “The similarity score represents a degree of similarity between the candidate cell feature space and the HD map feature space 430”; [0104]-[0106] “building a lookup table 313 to determine a similarity score of a candidate cell feature space to an HD map feature space… a range of median intensities can be determined for use in accessing the lookup table 313. The range of median intensities can be determined during a similarity metric computation of a first candidate cell by determining a minimum and maximum median intensity value found during the similarity metric computation… the range of mean intensity values can be rounded to 0, 1, 2”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of values in a middle range between a minimum value and a maximum value, as disclosed and taught by Wang, in the system taught by the proposed combination of Chen and Ma to yield the predictable result of efficiently quantifying a degree of similarity for further analysis (see Wang, [0022] “The similarity metric quantifies the degree of similarity of a feature space surrounding the candidate cell to a feature space of the HD map. The similarity metric can be based, at least in part, on a mean of the intensity attribute of the candidate cell, and the variance of the elevation attribute of the cell. In an embodiment, the first candidate cell similarity metric can be computed. A lookup table can be generated using data produced by the first candidate computed similarity metric. The lookup table can be used to lookup an approximation of the similarity metric for the second, and subsequent, candidate cells”).
Claim 18 incorporates substantially all the limitations of claim 1 in a computer-readable medium form (see Chen, [col 20 lines 24-30] “the terms "machine-readable medium" "computer-readable medium" refers to any non-transitory computer program product, apparatus and/or device (e.g., magnetic discs, optical disks… used to provide machine instructions and/or data to a programmable processor”) and is rejected under the same rationale.
Regarding claim 17, Chen teaches
An information processing method comprising: (see Chen, [col 3 lines 20-21] “various clustering methods”).
performing clustering on a data group based on a feature value of each of a plurality of pieces of data; (see Chen, [col 8 lines 10-18] “The clustering result of each round of distance-based clustering may be evaluated using the evaluation criteria described above. In some implementations, the system may adjust the similarity score, e.g., the embedding similarity, for cluster pairs that include entities are related in the entity ontology, e.g., are synonyms, hypernyms, or co-hypernyms to favor similarity (e.g., reducing the distance value)”; [col 9 lines 44-48] “An embedding model is used to represent things, such as search items, in a feature vector, also known as an embedding. A classifier, such as a WALS model, can be provided features for an item and the classifier generates an embedding that represents the item”).
determining a representative cluster among clusters generated by the clustering; (see Chen, [col 6 lines 19-22] “the system may use many different clustering methods in parallel to produce different candidate final cluster results and select as the final cluster result the candidate with the best evaluation criteria, e.g., cluster score” – candidate cluster has been interpreted as representative cluster).
performing first identification to identify a first cluster (see Chen, [col 19 lines 5-7] “generating first clusters from items responsive to a query each cluster representing an entity in a knowledge base and including items mapped to the entity”) based on a degree of similarity… (see Chen, [col 7 lines 22-29] “The cluster evaluation metric can also be based on balance. Balance is a measure of how proportional or uniform the clusters are in terms of conversion size, and is calculated as the entropy of the conversion distribution across clusters. A high balance score means the clusters are of equal or similar size. In other words, high balance indicates the most popular/top-ranked search items appear fairly evenly across the clusters”) each of the clusters generated by the clustering and (see Chen, [col 13 lines 9-] “The clusters generated at each round may be evaluated using the evaluation metrics outlined”) the representative cluster; (see Chen, [col 6 lines 19-22] “the system may use many different clustering methods in parallel to produce different candidate final cluster results and select as the final cluster result the candidate with the best evaluation criteria, e.g., cluster score” – candidate cluster has been interpreted as representative cluster).
performing second identification to identify a second cluster (see Chen, [col 19 lines 23-27] “may also include generating second clusters by applying a clustering methodology to the first clusters and calculating a respective cluster score for each second cluster as a regression of a coverage score, a balance score, an overlap score, a silhouette score, and a silhouette ratio”) based on a first degree of similarity, which is the degree of similarity… (see Chen, [col 7 lines 30-31] “An overlap score measures how many duplicate search items are included in different clusters”) each of the clusters generated by the clustering and (see Chen, [col 13 lines 9-] “The clusters generated at each round may be evaluated using the evaluation metrics outlined”) the representative cluster, and (see Chen, [col 6 lines 19-22] “the system may use many different clustering methods in parallel to produce different candidate final cluster results and select as the final cluster result the candidate with the best evaluation criteria, e.g., cluster score” – candidate cluster has been interpreted as representative cluster) a second degree of similarity, which is the degree of similarity… (see Chen, [col 7 lines 44-48] “second a similarity between search items from the cluster to the nearest neighbor cluster. Nearest neighbor is understood to be another cluster that is most similar to the cluster, usually determined by a similarity score (e.g., embedding similarity or other similarity score)”) each of the clusters generated by the clustering and (see Chen, [col 13 lines 9-] “The clusters generated at each round may be evaluated using the evaluation metrics outlined”) the first cluster; (see Chen, [col 19 lines 5-7] “generating first clusters from items responsive to a query each cluster representing an entity in a knowledge base and including items mapped to the entity”).
the first degree of similarity and (see Chen, [col 7 lines 30-31] “An overlap score measures how many duplicate search items are included in different clusters”) the second degree of similarity (see Chen, [col 7 lines 44-48] “second a similarity between search items from the cluster to the nearest neighbor cluster. Nearest neighbor is understood to be another cluster that is most similar to the cluster, usually determined by a similarity score (e.g., embedding similarity or other similarity score)”) of the second cluster… (see Chen, [col 19 lines 23-27] “may also include generating second clusters by applying a clustering methodology to the first clusters and calculating a respective cluster score for each second cluster as a regression of a coverage score, a balance score, an overlap score, a silhouette score, and a silhouette ratio”) of the first degree of similarity… (see Chen, [col 7 lines 30-31] “An overlap score measures how many duplicate search items are included in different clusters”) of the first degree of similarity; (see Chen, [col 7 lines 30-31] “An overlap score measures how many duplicate search items are included in different clusters”).
… for display from (see Chen, [col 9 lines 19-26] “The clustering engine 122 decides on a final set of clusters (e.g., selecting final clusters from the clustering method that produces the highest quality clusters), the result engine 124 may generate information used to display the responsive search items to the query requestor as search results. The result engine 124 may organize responsive search items by cluster. FIG. 2 illustrates an example use