DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 06/28/2023. The submission is in compliance with the provisions of 37 CFR § 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 15-20 are rejected under 35 U.S.C. § 101 as directed to non-statutory subject matter.
Claims 15-20 are rejected under 35 U.S.C. 101 as not falling within one of the four statutory categories of invention because the broadest reasonable interpretation of the instant claims in light of the specification encompasses transitory signals. However, transitory signals are not within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Shukla et al. (US 20240220511, hereinafter Shukla) in view of Raghavan et al. (US 20240378492, hereinafter Raghavan) and Zhong et al. (US 20210303638, hereinafter Zhong).
Regarding Claim 1, Shukla discloses a computer-implemented method comprising:
receiving, by at least one processor, a plurality of entity record datasets associated with one or more entities, wherein each entity record dataset comprises at least one thousand data elements ([0077], FIG. 4, #402, predictive data analysis computing entity 106 receives classification input data comprising one or more unstructured data elements [can scale from thousands to billions] that describes a dataset comprising one or more electronic data records provided as input to a machine learning model);
utilizing, by the at least one processor, a computer-based merge module to resolve a candidate entity record from a plurality of entity record datasets ([0078], FIG. 4, #402, the predictive data analysis computing entity 106 generates, using an NLP machine learning model, for each of the one or more unstructured data elements, one or more NLP candidate classification labels based at least in part on the unstructured data element, and extracts features from the one or more unstructured data elements and determines potential mappings for standardized classification labels via NLP candidate classification labels);
wherein the computer-based merge module is configured to utilize at least one trained language learning model to determine a set of embeddings for the plurality of entity record datasets ([0099], determines D classifications for D unstructured data elements based at least in part on D unstructured data-wide embedded representations for the D unstructured data elements);
determining, by the at least one processor, a classification of the set of embeddings based on at least one similarity measure ([0100] FIG. 7, generating one or more NLP candidate classification labels to generate NLP candidate classification labels for unstructured data elements);
wherein the at least one similarity measure is utilized to determine:
a set of low similarity embeddings associated with the classification of the embeddings ([0102], document/sentence/word similarity measuring techniques are used to generate the one or more synonymous words, such as word mover's distance, support vector machine, bag-of-words, term frequency-inverse document frequency, latent semantic analysis, linear discriminant analysis, mixed sample data augmentation, combinatory categorical grammar, and Naïve Bayes);
merging, by the at least one processor, the plurality of entity record datasets ([0049], NLP machine learning model and the structured data classification machine learning model may be combined by selectively merging prediction outputs generated by each model).
Shukla does not explicitly disclose utilizing, by the at least one processor, a clustering engine to form a set of low similarity feature groups from at least the set of low similarity embeddings group; determining, by the at least one processor, at least one search space group rule based on the low similarity feature groups; or utilizing, by the at least one processor, the search space group rule to eliminate at least one entity record from the plurality of entity record datasets.
Raghavan teaches utilizing, by the at least one processor, a clustering engine to form a set of low similarity feature groups from at least the set of low similarity embeddings group ([0088], similarity engine 210 may cluster the plurality of entities based on the representations using any suitable method e.g. using density-based clustering, centroid-based clustering, distribution based clustering and/or hierarchical clustering);
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the machine learning system of Shukla to incorporate the teaching of a clustering engine forming a set of low similarity feature groups, as taught by Raghavan ([0088]), in order to apply multiple filters to the stores before they are input to the trained machine learning model, thus reducing the number of stores that are input to the model and reducing processing time (Raghavan, [0009]).
Shukla and Raghavan do not explicitly disclose determining, by the at least one processor, at least one search space group rule based on the low similarity feature groups; or utilizing, by the at least one processor, the search space group rule to eliminate at least one entity record from the plurality of entity record datasets.
Zhong teaches determining, by the at least one processor, at least one search space group rule based on the low similarity feature groups ([0020], searching a hierarchy of clusters of the standardized entity embeddings, wherein the lowest level of the hierarchy contains the largest number of clusters and the smallest clusters, and the highest level of the hierarchy contains the smallest number of clusters and the largest clusters; [0087], the hierarchy is searched in a top-down fashion, starting with the largest clusters at a highest (e.g., root) level of the hierarchy and ending with a cluster at the lowest (e.g., leaf) level of the hierarchy; the clusters at the highest level are inserted into a priority queue that orders the clusters by ascending distances between the clusters' centers and the first embedding); and utilizing, by the at least one processor, the search space group rule to eliminate at least one entity record from the plurality of entity record datasets ([0087], the hierarchy of clusters is searched for a subset of embeddings that are within a threshold proximity to the first embedding in a vector space (operation 404), and the cluster is then searched for the subset of embeddings that are within a threshold distance to the first embedding in the vector space; [0103], FIG. 6, at least a portion of the ranking of the documents by the relevance scores is outputted (operation 616)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the machine learning system of Shukla and Raghavan to incorporate the teaching of a search space group rule based on the low similarity feature groups, as taught by Zhong ([0020]), in order to provide systems for improving accuracy, resource consumption, latency, and scalability associated with large-scale and/or online scoring by machine learning models (Zhong, [0004]).
Regarding Claim 2, Shukla in view of Raghavan and Zhong discloses the computer-implemented method of claim 1,
Shukla discloses wherein the at least one similarity measure is utilized to determine a set of high similarity embeddings group associated with the classification of the set of embeddings ([0100], FIG. 7, generating one or more NLP candidate classification labels to generate NLP candidate classification labels for unstructured data elements; [0102], document/sentence/word similarity measuring techniques are used to generate the one or more synonymous words, such as word mover's distance, support vector machine, bag-of-words, term frequency-inverse document frequency, latent semantic analysis, linear discriminant analysis, mixed sample data augmentation, combinatory categorical grammar, and Naïve Bayes).
Regarding Claim 3, Shukla in view of Raghavan and Zhong discloses the computer-implemented method of claim 2,
Raghavan teaches wherein the clustering engine is utilized to form a set of high similarity feature groups from the set of high similarity embeddings group ([0088], similarity engine 210 may cluster the plurality of entities based on the representations using any suitable method, e.g., using density-based clustering, centroid-based clustering, distribution-based clustering, and/or hierarchical clustering). The same rationale and motivation for obviousness apply as set forth above in claim 1.
Regarding Claim 4, Shukla in view of Raghavan and Zhong discloses the computer-implemented method of claim 3,
Zhong teaches wherein a search space group rule is determined based on the high similarity feature groups ([0020], searching a hierarchy of clusters of the standardized entity embeddings, wherein the lowest level of the hierarchy contains the largest number of clusters and the smallest clusters, and the highest level of the hierarchy contains the smallest number of clusters and the largest clusters; [0087], the hierarchy is searched in a top-down fashion, starting with the largest clusters at a highest (e.g., root) level of the hierarchy and ending with a cluster at the lowest (e.g., leaf) level of the hierarchy; the clusters at the highest level are inserted into a priority queue that orders the clusters by ascending distances between the clusters' centers and the first embedding). The same rationale and motivation for obviousness apply as set forth above in claim 1.
Regarding Claim 5, Shukla in view of Raghavan and Zhong discloses the computer-implemented method of claim 4,
Zhong teaches wherein the search space group rule is utilized to eliminate at least one entity record from the plurality of entity record datasets ([0087], the hierarchy of clusters is searched for a subset of embeddings that are within a threshold proximity to the first embedding in a vector space (operation 404), and the cluster is then searched for the subset of embeddings that are within a threshold distance to the first embedding in the vector space; [0103], FIG. 6, at least a portion of the ranking of the documents by the relevance scores is outputted (operation 616)). The same rationale and motivation for obviousness apply as set forth above in claim 1.
Regarding Claim 6, Shukla in view of Raghavan and Zhong discloses the computer-implemented method of claim 5,
Shukla discloses wherein the plurality of entity record datasets are merged ([0049], NLP machine learning model and the structured data classification machine learning model may be combined by selectively merging prediction outputs generated by each model).
Regarding Claim 7, Shukla in view of Raghavan and Zhong discloses the computer-implemented method of claim 3,
Shukla discloses wherein the merge module determines an entity merge based on the features of the candidate entity record and at least the set of high similarity feature groups ([0049], NLP machine learning model and the structured data classification machine learning model may be combined by selectively merging prediction outputs generated by each model; [0102], generating embeddings for the one or more extracted features and determining a distance measure).
Regarding Claims 8-14, system claims 8-14 recite substantially the same subject matter as corresponding method claims 1-7, and the rejections of claims 1-7 are incorporated herein for the same reasons as set forth above.
Regarding Claims 15-20, computer-readable storage medium claims 15-20 recite substantially the same subject matter as corresponding method claims 1-7, and the rejections of claims 1-7 are incorporated herein for the same reasons as set forth above.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Samuel D Fereja whose telephone number is (469) 295-9243. The examiner can normally be reached 8 AM-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DAVID CZEKAJ can be reached at (571) 272-7327. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SAMUEL D FEREJA/Primary Examiner, Art Unit 2487