DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on May 1, 2024, May 9, 2025, and December 16, 2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter because a computer program product, under the broadest reasonable interpretation, covers software per se, which is ineligible, as indicated by paragraph 0067 of the application as filed. The specification is silent concerning the nature of the computer program product; thus, the broadest reasonable interpretation of a computer program product is software per se.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1-10 and 14-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ramaswamy et al. (US 10,789,410 B1) in view of Roustant et al. (US 2019/0361985 A1).
As to claim 1, Ramaswamy discloses a method for screening data instances based on a target text of a target corpus [Column 1, lines 28-32], the method comprising:
determining, by a screening device [Quality Control Service 102 on FIG. 1] and for each data instance of a plurality of data instances, a word score and an n-gram score [“The quality control service identifies the n-grams and words present within the documents of a training corpus and determines a confidence score that ranks the word score.” Column 10, lines 28-46];
filtering, by the screening device, the plurality of data instances based on the word score and the n-gram score corresponding to each data instance, and at least one or more of a threshold word score or a threshold n-gram score, to generate a short list of data instances [“The classification module filters multiple confidence scores for multiple languages which exceed a threshold and generates a short list of potential typos identified by the quality control service.” Column 2, lines 17-30 and Column 5, lines 42-67].
Ramaswamy fails to disclose providing at least one data instance of the short list and an indication of its corresponding term similarity score.
However, Roustant teaches providing, by the screening device, at least one data instance of the short list and an indication of its corresponding term similarity score, wherein the corresponding term similarity score was determined using a term overlap function [“The asymmetrical rank overlap scores the similarity of the short list and ranks the items of the reference list as more important than those of the second list.” Paragraph 0044].
Ramaswamy and Roustant are analogous art because both are directed to natural language processing systems. One of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to modify Ramaswamy with the teachings of Roustant so that Ramaswamy's short list of potential typos would include the corresponding term similarity scores, for the purpose of ranking the entries of the list, by combining prior art elements according to known methods to yield predictable results.
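For context only, the screening flow recited in claim 1 (per-instance word and n-gram scores, threshold filtering into a short list) can be sketched as follows. This is an illustrative sketch by the editor; the function names, scoring choices, and threshold values are hypothetical and are not drawn from the claims, Ramaswamy, or Roustant.

```python
def char_ngram_set(text, n=3):
    """Set of character n-grams of the lowercased text (hypothetical helper)."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def word_ngram_scores(instance, target_words, target_ngrams, n=3):
    """Word score and n-gram score: counts of terms shared with the target text."""
    word_score = len(set(instance.lower().split()) & target_words)
    ngram_score = len(char_ngram_set(instance, n) & target_ngrams)
    return word_score, ngram_score

def screen(instances, target_text, word_threshold=1, ngram_threshold=3, n=3):
    """Filter instances whose word score or n-gram score meets a threshold,
    generating a short list of (instance, combined score) pairs."""
    target_words = set(target_text.lower().split())
    target_ngrams = char_ngram_set(target_text, n)
    short_list = []
    for inst in instances:
        w, g = word_ngram_scores(inst, target_words, target_ngrams, n)
        if w >= word_threshold or g >= ngram_threshold:
            # the combined score stands in for the claimed term similarity score
            short_list.append((inst, w + g))
    return short_list
```

For example, screening the instances ["payment to John Smith", "grocery run"] against the target text "John Smith" retains only the first instance on the short list.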
As to claim 2, Ramaswamy discloses the method of claim 1, wherein the target corpus is a watch list, the target text corresponds to an entity listed on the watch list, and each data instance is a transaction [“The classification module assigns multiple confidence scores for multiple languages and generates a short list of potential typos identified by the quality control service.” Column 2, lines 17-30 and Column 5, lines 42-67].
As to claim 3, Ramaswamy discloses the method of claim 1, further comprising: identifying, by the screening device, a count of k data instances of the short list that have term similarity scores indicating they are the k data instances of the short list that are most similar to the target text [“The classification module assigns multiple confidence scores for multiple languages and generates a short list of potential typos identified by the quality control service.” Column 2, lines 17-30 and Column 5, lines 42-67]; and
providing, by the screening device, the k data instances and an indication of their corresponding term similarity scores [“The classification module assigns multiple confidence scores for multiple languages and generates a short list of potential typos identified by the quality control service.” Column 2, lines 17-30 and Column 5, lines 42-67].
As to claim 4, Ramaswamy discloses the method of claim 1, wherein a term weight is determined by a product of (a) a number of times a term appears in the target text divided by a number of terms in the target text [“The quality control service identifies the n-grams and words present within the documents of a training corpus and determines a confidence score that determines the weight of the term.” Column 10, lines 28-46] and
(b) a number of target texts in the target corpus divided by a number of target texts in the target corpus comprising the term, the term being a word or an n-gram [“The quality control service identifies the n-grams and words present within the documents of a training corpus and determines a confidence score from the target texts.” Column 10, lines 28-46].
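The term weight recited in claim 4 is a product of a term frequency factor (a) and an inverse document frequency factor (b), without the conventional logarithm. A minimal sketch, with names chosen by the editor for illustration only:

```python
def term_weight(term, target_text_terms, corpus_texts):
    """Claim 4 term weight: (a) occurrences of the term in the target text
    divided by the number of terms in the target text, multiplied by
    (b) the number of target texts in the corpus divided by the number of
    target texts containing the term."""
    tf = target_text_terms.count(term) / len(target_text_terms)
    # the claim presumes the term occurs in at least one target text
    containing = sum(1 for terms in corpus_texts if term in terms)
    idf = len(corpus_texts) / containing
    return tf * idf
```

With a corpus of two target texts [["john", "smith"], ["john", "doe"]], the term "smith" in the target text ["john", "smith"] receives weight (1/2) × (2/1) = 1.0, while "john" receives (1/2) × (2/2) = 0.5.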
As to claim 5, Ramaswamy discloses the method of claim 1, wherein an n-gram is a portion of a character string comprising n sequential characters of the character string [“The n-gram is a contiguous sequence of n items from a given sequence of text.” Column 2, lines 31-53].
As to claim 6, Ramaswamy discloses the method of claim 5, where n is equal to three [“The n-gram is a contiguous sequence of n items from a given sequence of text. An n-gram of size 3 is a trigram.” Column 2, lines 31-53].
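As claims 5-6 recite, an n-gram here is n sequential characters of a character string, with n = 3 (trigrams). An illustrative sketch (the helper name is the editor's, not from the claims or references):

```python
def char_ngrams(text, n=3):
    """All n-grams of n sequential characters of the string (claims 5-6)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]
```

For example, char_ngrams("green") yields ["gre", "ree", "een"]; a string shorter than n yields no n-grams.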
As to claim 7, Ramaswamy discloses the method of claim 1, wherein a data instance is included in the short list when the data instance comprises at least one of (a) at least one word in common with the target text that has a word weight greater than a word weight threshold [“The quality control service identifies the n-grams and words present within the documents of a training corpus and determines a confidence score that has a greater weight.” Column 10, lines 28-46]. The examiner addresses only alternative (a) because the claim recites the alternatives in the form “at least one of.”
As to claim 8, Ramaswamy discloses the method of claim 1, further comprising: analyzing, by the screening device, the target corpus to generate a word dictionary and an n-gram dictionary for the target corpus, wherein the word dictionary and the n-gram dictionary comprise a respective dictionary of at least two term dictionaries [“The quality control service identifies the n-grams and words present within the documents of a training corpus and determines a confidence score covering multiple terms in the corpus text.” Column 10, lines 28-46].
As to claim 9, Ramaswamy discloses the method of claim 8, further comprising: determining, by the screening device and based on a frequency of a word in the target corpus, a word weight for the word [“The quality control service identifies the n-grams and words present within the documents of a training corpus and determines a confidence score that determines the weight of the term.” Column 10, lines 28-46]; and
determining, by the screening device and based on a frequency of an n-gram in the target corpus, an n-gram weight for the n-gram [“The quality control service identifies the n-grams and words present within the documents of a training corpus and determines a confidence score that determines the weight of the term.” Column 10, lines 28-46].
As to claim 10, Ramaswamy discloses the method of claim 8, but fails to disclose determining the term similarity scores for the data instance and the target text using word and n-gram overlap functions.
However, Roustant teaches wherein determining the term similarity scores for the data instance and the target text comprises: determining, by the screening device, a word overlap function between words present in at least a portion of the data instance and words present in the target text [“The asymmetrical rank overlap scores the similarity of the words and ranks the items of the reference list as more important than those of the second list.” Paragraph 0044];
determining, by the screening device, an n-gram overlap function between n-grams present in at least a portion of the data instance and n-grams present in the target text [“The asymmetrical rank overlap scores the similarity of the short list and ranks the distance of the most important items relative to the second list.” Paragraph 0044]; and
providing, by the screening device, a result of the word overlap function as a word similarity score and a result of the n-gram overlap function as an n-gram similarity score [“The asymmetrical rank overlap scores the similarity of the short list and ranks the items of the reference list as more important than those of the second list.” Paragraph 0044].
Ramaswamy and Roustant are analogous art because both are directed to natural language processing systems. One of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to modify Ramaswamy with the teachings of Roustant so that Ramaswamy's short list of potential typos would include the corresponding term similarity scores, for the purpose of ranking the entries of the list, by combining prior art elements according to known methods to yield predictable results.
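The word and n-gram overlap functions of claim 10 can be illustrated as a set intersection, optionally weighted by term weights. The sketch below is the editor's illustration under those assumptions; the names are hypothetical and do not represent the method of either cited reference:

```python
def overlap_score(a_terms, b_terms, weights=None):
    """Overlap function: count of shared terms, or the sum of their
    weights when a term-weight mapping is supplied."""
    shared = set(a_terms) & set(b_terms)
    if weights is None:
        return len(shared)
    return sum(weights.get(t, 0.0) for t in shared)

def similarity_scores(instance, target_text, n=3):
    """Word similarity score and n-gram similarity score between a data
    instance and the target text (claim 10)."""
    word_sim = overlap_score(instance.lower().split(), target_text.lower().split())
    inst = instance.lower()
    tgt = target_text.lower()
    ngram_sim = overlap_score(
        [inst[i:i + n] for i in range(len(inst) - n + 1)],
        [tgt[i:i + n] for i in range(len(tgt) - n + 1)],
    )
    return word_sim, ngram_sim
```

For instance, similarity_scores("wire to john smith", "john smith") returns (2, 8): two shared words and eight shared character trigrams.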
As to claim 14, Ramaswamy discloses an apparatus [Quality Control Service 102 on FIG. 1] for screening data instances based on a target text of a target corpus, the apparatus comprising:
processing circuitry [Processing units 104 on FIG. 1] configured to:
determine, for each data instance of a plurality of data instances, a word score and an n-gram score [“The quality control service identifies the n-grams and words present within the documents of a training corpus and determines a confidence score that ranks the word score.” Column 10, lines 28-46];
filter the plurality of data instances based on the word score and the n-gram score corresponding to each data instance, and at least one or more of a threshold word score or a threshold n-gram score, to generate a short list of data instances [“The classification module filters multiple confidence scores for multiple languages which exceed a threshold and generates a short list of potential typos identified by the quality control service.” Column 2, lines 17-30 and Column 5, lines 42-67].
Ramaswamy fails to disclose providing at least one data instance of the short list and an indication of its corresponding term similarity score.
However, Roustant teaches providing at least one data instance of the short list and an indication of its corresponding term similarity score, wherein the corresponding term similarity score was determined using a term overlap function [“The asymmetrical rank overlap scores the similarity of the short list and ranks the items of the reference list as more important than those of the second list.” Paragraph 0044].
Ramaswamy and Roustant are analogous art because both are directed to natural language processing systems. One of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to modify Ramaswamy with the teachings of Roustant so that Ramaswamy's short list of potential typos would include the corresponding term similarity scores, for the purpose of ranking the entries of the list, by combining prior art elements according to known methods to yield predictable results.
As to claim 15, Ramaswamy discloses the apparatus of claim 14, wherein the target corpus is a watch list, the target text corresponds to an entity listed on the watch list, and each data instance is a transaction [“The classification module assigns multiple confidence scores for multiple languages which exceed a threshold and generates a short list of potential typos identified by the quality control service.” Column 2, lines 17-30 and Column 5, lines 42-67].
As to claim 16, Ramaswamy discloses the apparatus of claim 14, wherein the processing circuitry is further configured to: identify a count of k data instances of the short list that have term similarity scores indicating they are the k data instances of the short list that are most similar to the target text [“The classification module assigns multiple confidence scores for multiple languages and generates a short list of potential typos identified by the quality control service.” Column 2, lines 17-30 and Column 5, lines 42-67]; and
provide the k data instances and an indication of their corresponding term similarity scores [“The classification module assigns multiple confidence scores for multiple languages and generates a short list of potential typos identified by the quality control service.” Column 2, lines 17-30 and Column 5, lines 42-67].
As to claim 17, Ramaswamy discloses the apparatus of claim 14, wherein a term weight is determined by a product of (a) a number of times a term appears in the target text divided by a number of terms in the target text [“The quality control service identifies the n-grams and words present within the documents of a training corpus and determines a confidence score that determines the weight of the term.” Column 10, lines 28-46] and
(b) a number of target texts in the target corpus divided by a number of target texts in the target corpus comprising the term, the term being a word or an n-gram [“The quality control service identifies the n-grams and words present within the documents of a training corpus and determines a confidence score from the target texts.” Column 10, lines 28-46].
As to claim 18, Ramaswamy discloses the apparatus of claim 14, wherein an n-gram is a portion of a character string comprising n sequential characters of the character string [“The n-gram is a contiguous sequence of n items from a given sequence of text.” Column 2, lines 31-53].
As to claim 19, Ramaswamy discloses the apparatus of claim 18, where n is equal to three [“The n-gram is a contiguous sequence of n items from a given sequence of text. An n-gram of size 3 is a trigram.” Column 2, lines 31-53].
As to claim 20, Ramaswamy discloses a computer program product for screening data instances based on a target text of a target corpus, comprising at least one non-transitory storage medium, the at least one non-transitory storage medium storing computer executable instructions [Column 6, line 65 to column 7, line 33], the computer executable instructions comprising computer executable code configured to, when executed by processing circuitry of an apparatus, cause the apparatus to:
determine, for each data instance of a plurality of data instances, a word score and an n-gram score [“The quality control service identifies the n-grams and words present within the documents of a training corpus and determines a confidence score that ranks the word score.” Column 10, lines 28-46];
filter the plurality of data instances based on the word score and the n-gram score corresponding to each data instance, and at least one or more of a threshold word score or a threshold n-gram score, to generate a short list of data instances [“The classification module filters multiple confidence scores for multiple languages which exceed a threshold and generates a short list of potential typos identified by the quality control service.” Column 2, lines 17-30 and Column 5, lines 42-67].
Ramaswamy fails to disclose providing at least one data instance of the short list and an indication of its corresponding term similarity score.
However, Roustant teaches providing at least one data instance of the short list and an indication of its corresponding term similarity score, wherein the corresponding term similarity score was determined using a term overlap function [“The asymmetrical rank overlap scores the similarity of the short list and ranks the items of the reference list as more important than those of the second list.” Paragraph 0044].
Ramaswamy and Roustant are analogous art because both are directed to natural language processing systems. One of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to modify Ramaswamy with the teachings of Roustant so that Ramaswamy's short list of potential typos would include the corresponding term similarity scores, for the purpose of ranking the entries of the list, by combining prior art elements according to known methods to yield predictable results.
Allowable Subject Matter
Claims 11-13 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 11,501,067 B1. Although the claims at issue are not identical, they are not patentably distinct from each other because each claim of the instant application is taught by the corresponding claims of the U.S. Patent.
Patented claim 1 recites a method for screening data instances based on a target text of a target corpus that includes the feature of providing, by the screening device, at least one data instance of the short list and an indication of the corresponding similarity score.
Pending claim 1 recites a method for screening data instances based on a target text of a target corpus that includes the similar feature of providing, by the screening device, at least one data instance of the short list and an indication of the corresponding similarity score.
Therefore, patented claim 1 anticipates pending claim 1.
Pending claims 2-20 recite limitations similar to those of patented claims 2-20, as shown in the table below.
Pending claims:
1. A method for screening data instances based on a target text of a target corpus, the method comprising: determining, by a screening device and for each data instance of a plurality of data instances, a word score and an n-gram score; filtering, by the screening device, the plurality of data instances based on the word score and the n-gram score corresponding to each data instance, and at least one or more of a threshold word score or a threshold n-gram score, to generate a short list of data instances; and providing, by the screening device, at least one data instance of the short list and an indication of its corresponding term similarity score, wherein the corresponding term similarity score was determined using a term overlap function.
2. The method of claim 1, wherein the target corpus is a watch list, the target text corresponds to an entity listed on the watch list, and each data instance is a transaction.
3. The method of claim 1, further comprising: identifying, by the screening device, a count of k data instances of the short list that have term similarity scores indicating they are the k data instances of the short list that are most similar to the target text; and providing, by the screening device, the k data instances and an indication of their corresponding term similarity scores.
4. The method of claim 1, wherein a term weight is determined by a product of (a) a number of times a term appears in the target text divided by a number of terms in the target text and (b) a number of target texts in the target corpus divided by a number of target texts in the target corpus comprising the term, the term being a word or an n-gram.
5. The method of claim 1, wherein an n-gram is a portion of a character string comprising n sequential characters of the character string.
6. The method of claim 5, where n is equal to three.
7. The method of claim 1, wherein a data instance is included in the short list when the data instance comprises at least one of (a) at least one word in common with the target text that has a word weight greater than a word weight threshold or (b) at least one n-gram in common with the target text that has an n-gram weight greater than an n-gram weight threshold.
8. The method of claim 1, further comprising: analyzing, by the screening device, the target corpus to generate a word dictionary and an n-gram dictionary for the target corpus, wherein the word dictionary and the n-gram dictionary comprise a respective dictionary of at least two term dictionaries.
9. The method of claim 8, further comprising: determining, by the screening device and based on a frequency of a word in the target corpus, a word weight for the word; and determining, by the screening device and based on a frequency of an n-gram in the target corpus, an n-gram weight for the n-gram.
10. The method of claim 8, wherein determining the term similarity scores for the data instance and the target text comprises: determining, by the screening device, a word overlap function between words present in at least a portion of the data instance and words present in the target text; determining, by the screening device, an n-gram overlap function between n-grams present in at least a portion of the data instance and n-grams present in the target text; and providing, by the screening device, a result of the word overlap function as a word similarity score and a result of the n-gram overlap function as an n-gram similarity score.
11. The method of claim 10, wherein determining the term similarity scores for the data instance and the target text further comprises: generating, by the screening device, a word vector for the at least a portion of the data instance, wherein each element of the word vector corresponds to a word in the word dictionary and when a word in the word dictionary is present in the at least a portion of the data instance, an element of the word vector corresponding to the word has a non-zero value; and generating, by the screening device, an n-gram vector for the at least a portion of the data instance, wherein each element of the n-gram vector corresponds to an n-gram in the n-gram dictionary and when an n-gram in the n-gram dictionary is present in the at least a portion of the data instance, an element of the n-gram vector corresponding to the n-gram present in the at least a portion of the data instance has a non-zero value, wherein the word overlap function between words present in the at least a portion of the data instance and words present in the target text is a dot product between the word vector for the at least a portion of the data instance and a word vector corresponding to the target text, and wherein the n-gram overlap function between the n-grams present in the at least a portion of the data instance and the n-grams present in the target text is a dot product between the n-gram vector for the at least a portion of the data instance and a n-gram vector corresponding to the target text.
12. The method of claim 11, wherein the non-zero value of the word vector is equal to a word weight corresponding to the word for the target text and the non-zero value of the n-gram vector is equal to the n-gram weight corresponding to the n-gram for the target text.
13. The method of claim 1, further comprising: determining, by the screening device, an average term weight for the target text; determining, by the screening device, a standard deviation of term weight for the target text; and determining, by the screening device, one or more of the threshold word scores or the threshold n-gram scores based on the average term weight and the standard deviation of term weight.
14. An apparatus for screening data instances based on a target text of a target corpus, the apparatus comprising: processing circuitry configured to: determine, for each data instance of a plurality of data instances, a word score and an n-gram score; filter the plurality of data instances based on the word score and the n-gram score corresponding to each data instance, and at least one or more of a threshold word score or a threshold n-gram score, to generate a short list of data instances; and provide at least one data instance of the short list and an indication of its corresponding term similarity score, wherein the corresponding term similarity score was determined using a term overlap function.
15. The apparatus of claim 14, wherein the target corpus is a watch list, the target text corresponds to an entity listed on the watch list, and each data instance is a transaction.
16. The apparatus of claim 14, wherein the processing circuitry is further configured to: identify a count of k data instances of the short list that have term similarity scores indicating they are the k data instances of the short list that are most similar to the target text; and provide the k data instances and an indication of their corresponding term similarity scores.
17. The apparatus of claim 14, wherein a term weight is determined by a product of (a) a number of times a term appears in the target text divided by a number of terms in the target text and (b) a number of target texts in the target corpus divided by a number of target texts in the target corpus comprising the term, the term being a word or an n-gram.
18. The apparatus of claim 14, wherein an n-gram is a portion of a character string comprising n sequential characters of the character string.
19. The apparatus of claim 18, where n is equal to three.
20. A computer program product for screening data instances based on a target text of a target corpus, comprising at least one non-transitory storage medium, the at least one non-transitory storage medium storing computer executable instructions, the computer executable instructions comprising computer executable code configured to, when executed by processing circuitry of an apparatus, cause the apparatus to: determine, for each data instance of a plurality of data instances, a word score and an n-gram score; filter the plurality of data instances based on the word score and the n-gram score corresponding to each data instance, and at least one or more of a threshold word score or a threshold n-gram score, to generate a short list of data instances; and provide at least one data instance of the short list and an indication of its corresponding term similarity score, wherein the corresponding term similarity score was determined using a term overlap function.
1. A method for screening data instances based on a target text of a target corpus, the method comprising: analyzing, by a processor of a screening device, a target corpus to generate a word dictionary and an n-gram dictionary for the target corpus, the target corpus comprising the target text; based on a frequency of a word in the target corpus, determining, by the screening device, a word weight for the word; based on a frequency of an n-gram in the target corpus, determining, by the screening device, an n-gram weight for the n-gram; for each data instance of a plurality of data instances, determining, by the screening device, a word score and an n-gram score for the data instance and the target text based on the determined word and n-gram weights; filtering, by the screening device, the plurality of data instances based on the word score and the n-gram score corresponding to each data instance, to generate a short list of data instances; determining, by the screening device, word and n-gram similarity scores between each data instance of the short list and the target text based on a term overlap function between a term present in at least a portion of the data instance and the term present in the target text and a corresponding term weight, the term being a respective word or n-gram; and providing, by the screening device, at least one data instance of the short list and an indication of the corresponding similarity score.
2. The method of claim 1, further comprising: identifying, by the screening device, a count of k data instances of the short list that have similarity scores indicating they are the k data instances of the short list that are most similar to the target text; and providing the k data instances and an indication of the corresponding similarity scores.
3. The method of claim 1, wherein the target corpus is a watch list, the target text corresponds to an entity listed on the watch list, and the data instance is a transaction.
4. The method of claim 1, wherein a term weight is determined by the product of (a) a number of times the term appears in the target text divided by a number of terms in the target text and (b) a number of target texts in the target corpus divided by a number of target texts in the target corpus comprising the term, the term being a word or an n-gram.
5. The method of claim 1, wherein an n-gram is a portion of a character string comprising n sequential characters of the character string.
6. The method of claim 5, wherein n is equal to three.
7. The method of claim 1, wherein a data instance is included in the short list when the data instance comprises at least one of (a) at least one word in common with the target text that has a word weight greater than a word weight threshold or (b) at least one n-gram in common with the target text that has an n-gram weight greater than an n-gram weight threshold.
8. The method of claim 1, wherein determining the similarity scores for the data instance and the target text comprises: determining a word overlap function between words present in at least a portion of the data instance and words present in the target text; determining an n-gram overlap function between the n-grams present in at least a portion of the data instance and n-grams present in the target text; and providing a result of the word overlap function as the word similarity score and a result of the n-gram overlap function as the n-gram similarity score.
9. The method of claim 8, wherein determining the similarity scores for the data instance and the target text further comprises: generating a word vector for the at least a portion of the data instance, wherein each element of the word vector corresponds to a word in the word dictionary and when a word in the word dictionary is present in the at least a portion of the data instance, an element of the word vector corresponding to the word has a non-zero value; and generating an n-gram vector for the at least a portion of the data instance, wherein each element of the n-gram vector corresponds to an n-gram in the n-gram dictionary and when an n-gram in the n-gram dictionary is present in the at least a portion of the data instance, an element of the n-gram vector corresponding to the n-gram present in the at least a portion of the data instance has a non-zero value, wherein the word overlap function between words present in the at least a portion of the data instance and words present in the target text is a dot product between the word vector for the at least a portion of the data instance and a word vector corresponding to the target text; wherein the n-gram overlap function between n-grams present in the at least a portion of the data instance and n-grams present in the target text is a dot product between the n-gram vector for the at least a portion of the data instance and an n-gram vector corresponding to the target text.
10. The method of claim 9, wherein the non-zero value of the word vector is equal to the word weight corresponding to the word for the target text and the non-zero value of the n-gram vector is equal to the n-gram weight corresponding to the n-gram for the target text.
11. An apparatus for screening data instances based on a target text of a target corpus, the apparatus comprising: processing circuitry configured to: analyze a target corpus to generate a word dictionary and an n-gram dictionary for the target corpus, the target corpus comprising the target text; based on a frequency of a word in the target corpus, determine a word weight for the word; based on a frequency of an n-gram in the target corpus, determine an n-gram weight for the n-gram; for each data instance of a plurality of data instances, determine a word score and an n-gram score for the data instance and the target text based on the determined word and n-gram weights; filter the plurality of data instances based on the word score and the n-gram score corresponding to each data instance, to generate a short list of data instances; determine word and n-gram similarity scores between each data instance of the short list and the target text based on a term overlap function between a term present in at least a portion of the data instance and the term present in the target text and a corresponding term weight, the term being a respective word or n-gram; and provide at least one data instance of the short list and an indication of the corresponding similarity score.
12. The apparatus of claim 11, wherein the processing circuitry is further configured to: identify a count of k data instances of the short list that have similarity scores indicating they are the k data instances of the short list that are most similar to the target text; and provide the k data instances and an indication of the corresponding similarity scores.
13. The apparatus of claim 11, wherein the target corpus is a watch list, the target text corresponds to an entity listed on the watch list, and the data instance is a transaction.
14. The apparatus of claim 11, wherein a term weight is determined by the product of (a) a number of times the term appears in the target text divided by a number of terms in the target text and (b) a number of target texts in the target corpus divided by a number of target texts in the target corpus comprising the term, the term being a word or an n-gram.
15. The apparatus of claim 11, wherein an n-gram is a portion of a character string comprising n sequential characters of the character string.
16. The apparatus of claim 15, wherein n is equal to three.
17. The apparatus of claim 11, wherein a data instance is included in the short list when the data instance comprises at least one of (a) at least one word in common with the target text that has a word weight greater than a word weight threshold or (b) at least one n-gram in common with the target text that has an n-gram weight greater than an n-gram weight threshold.
18. The apparatus of claim 11, wherein determining the similarity scores for the data instance and the target text further comprises: generating a word vector for the at least a portion of the data instance, wherein each element of the word vector corresponds to a word in the word dictionary and when a word in the word dictionary is present in the at least a portion of the data instance, an element of the word vector corresponding to the word has a non-zero value; and generating an n-gram vector for the at least a portion of the data instance, wherein each element of the n-gram vector corresponds to an n-gram in the n-gram dictionary and when an n-gram in the n-gram dictionary is present in the at least a portion of the data instance, an element of the n-gram vector corresponding to the n-gram has a non-zero value, wherein the word overlap function between words present in the at least a portion of the data instance and words present in the target text is a dot product between the word vector for the at least a portion of the data instance and a word vector corresponding to the target text; wherein the n-gram overlap function between n-grams present in the at least a portion of the data instance and n-grams present in the target text is a dot product between the n-gram vector for the at least a portion of the data instance and an n-gram vector corresponding to the target text.
19. The apparatus of claim 18, wherein the non-zero value of the word vector is equal to the word weight corresponding to the word for the target text and the non-zero value of the n-gram vector is equal to the n-gram weight corresponding to the n-gram for the target text.
20. A computer program product for screening data instances based on a target text of a target corpus, comprising at least one non-transitory storage medium, the at least one non-transitory storage medium storing computer executable instructions, the computer executable instructions comprising computer executable code configured to, when executed by processing circuitry of an apparatus, cause the apparatus to: analyze a target corpus to generate a word dictionary and an n-gram dictionary for the target corpus, the target corpus comprising the target text; based on a frequency of a word in the target corpus, determine a word weight for the word; based on a frequency of an n-gram in the target corpus, determine an n-gram weight for the n-gram; for each data instance of a plurality of data instances, determine a word score and an n-gram score for the data instance and the target text based on the determined word and n-gram weights; filter the plurality of data instances based on the word score and the n-gram score corresponding to each data instance, to generate a short list of data instances; determine word and n-gram similarity scores between each data instance of the short list and the target text based on a term overlap function between a term present in at least a portion of the data instance and the term present in the target text and a corresponding term weight, the term being a respective word or n-gram; and provide at least one data instance of the short list and an indication of the corresponding similarity score.
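The independent claims above recite a two-stage pipeline: weight words and character trigrams by frequency, filter the data instances to a short list via weighted overlap scores, then rank the short list by dot-product similarity against the target text. The following is a minimal, hypothetical sketch of that pipeline under stated assumptions: the first watch-list entry is treated as the target text, and all names (`screen`, `trigrams`, the example transactions) are illustrative only and do not appear in the application or the record.

```python
from collections import Counter

def trigrams(s):
    # Claims 5/18 define an n-gram as n sequential characters; claims 6/19 set n = 3.
    s = s.lower()
    return [s[i:i + 3] for i in range(len(s) - 2)]

def words(s):
    return s.lower().split()

def weights(terms_per_text):
    # Claims 4/17: weight = (tf in target text / terms in target text)
    #            * (texts in corpus / texts in corpus containing the term).
    n_texts = len(terms_per_text)
    df = Counter(t for text in terms_per_text for t in set(text))
    target = terms_per_text[0]  # assumption: first corpus text is the target text
    counts = Counter(target)
    return {t: (counts[t] / len(target)) * (n_texts / df[t]) for t in counts}

def score(instance_terms, weight_map):
    # Dot product of weighted term vectors (claims 9/18): only terms shared
    # with the target text contribute a non-zero element (claim 10/19).
    return sum(weight_map.get(t, 0.0) for t in instance_terms)

def screen(corpus_texts, instances, k=2, threshold=0.0):
    word_w = weights([words(t) for t in corpus_texts])
    gram_w = weights([trigrams(t) for t in corpus_texts])
    # Filter to a short list, then rank by combined similarity and return top k.
    short = [i for i in instances
             if score(words(i), word_w) > threshold
             or score(trigrams(i), gram_w) > threshold]
    return sorted(short,
                  key=lambda i: score(words(i), word_w) + score(trigrams(i), gram_w),
                  reverse=True)[:k]

# Illustrative watch list (target corpus) and transactions (data instances).
watch_list = ["acme holdings", "globex corp"]
txns = ["payment to acme hldings ltd", "grocery store purchase", "wire to glbex"]
print(screen(watch_list, txns, k=2))  # only the first transaction survives filtering
```

Note how the trigram channel tolerates the misspelling "hldings": several of its trigrams still match "holdings", so the transaction scores above the threshold even where exact word matching alone might be weaker.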
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892 form.
For example: Raiskin (US 2021/0311728 A1) discloses automatically detecting positions of various webpage elements within a webpage when the webpage is rendered, based on analyzing the programming code of the webpage using graph-based and NLP-based techniques.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GERALD GAUTHIER whose telephone number is (571)272-7539. The examiner can normally be reached 8:00 AM to 4:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CAROLYN R EDWARDS can be reached at (571) 270-7136. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/GERALD GAUTHIER/Primary Examiner, Art Unit 2692
January 6, 2026