Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 2, 11, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Tandecki (PG Pub. No. 2021/0319785 A1) in view of Bhattacharjee (PG Pub. No. 2017/0177743 A1).
Regarding Claim 1, Tandecki discloses an electronic device for record linkage, comprising:
a memory storing one or more instructions (see Tandecki, Fig. 8, for memory 750); and
a processor (see Tandecki, Fig. 8, for processor 710) configured to execute the one or more instructions to:
generate one or more vectors of one or more strings from a reference database based on a modified Levenshtein distance (see Tandecki, paragraph [0022], where at step 204, pairs of training vectors are created for multiple pairs of the encoded training words using the CNN and a twin of the CNN; the pairs can be randomly selected from the training dictionaries to train for large word edit distances, and pairs can be created by adding noise to a word to train for small word edit distances; at step 206, a Similarity Metric (SM) is calculated for each of the multiple pairs of the plurality of training words; the SM can be calculated based on an Edit Distance (ED) (e.g., Levenshtein ED)); and
generate a vector database for spelling similarity based on the one or more vectors (see Tandecki, paragraph [0074], where the system for classifying words in a batch of words can … create dictionary vectors for each of a plurality of dictionary words using a neural network (NN), store each dictionary vector along with a classification indicator corresponding to the associated dictionary word [it is the position of the Examiner that the phrase ‘for spelling similarity’ constitutes a statement of intended use]).
Tandecki does not disclose:
wherein the modified Levenshtein distance is based on one or more parameters, comprising at least one of: a first number of insertions, a second number of deletions, a third number of replacements, or a fourth number of matches; and
wherein one or more of the one or more parameters comprise a predefined weight.
Bhattacharjee discloses:
wherein the modified Levenshtein distance is based on one or more parameters, comprising at least one of: a first number of insertions, a second number of deletions, a third number of replacements, or a fourth number of matches (see Bhattacharjee, paragraph [0042], where the Levenshtein distance is a string metric for measuring the minimum number of single-character edits required to change one textblock into the other textblock); and
wherein one or more of the one or more parameters comprise a predefined weight (see Bhattacharjee, paragraph [0003], where searching may also include combining weighted results of the approximate string-match with weighted results of the exact string match to generate match scores for each of the function signatures [it is the position of the Examiner that weighting Levenshtein single-character edits against exact character matches is not patentably distinguishable from weighting one or more parameters]).
Both Tandecki and Bhattacharjee disclose Levenshtein edit distance. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to apply the definition of Levenshtein edit distance in Bhattacharjee to the Levenshtein edit distance implemented in Tandecki, because the definition in Bhattacharjee provides technical enablement for Tandecki without altering Tandecki's principle of operation, and thus constitutes simple substitution of one prior art element for another to yield predictable results (see MPEP 2143(I)(B)).
Regarding Claim 2, Tandecki in view of Bhattacharjee discloses the electronic device according to Claim 1, wherein:
Tandecki does not disclose:
the modified Levenshtein distance is based on a plurality of parameters comprising a plurality of predefined weights; and
wherein one or more of the plurality of predefined weights are different.
Bhattacharjee discloses:
the modified Levenshtein distance is based on a plurality of parameters comprising a plurality of predefined weights (see Bhattacharjee, paragraph [0003], where searching may also include combining weighted results of the approximate string-match with weighted results of the exact string match to generate match scores for each of the function signatures [it is the position of the Examiner that weighting Levenshtein single-character edits against exact character matches is not patentably distinguishable from weighting one or more parameters]); and
wherein one or more of the plurality of predefined weights are different (see Bhattacharjee, paragraph [0040], where the weight for each score can be a ratio that determines how much each score contributes to the overall combined score; by way of example, some embodiments can multiply one of the two scores by a weight (W), and can multiply the other score by (1-W)).
Both Tandecki and Bhattacharjee disclose Levenshtein edit distance. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to apply the definition of Levenshtein edit distance in Bhattacharjee to the Levenshtein edit distance implemented in Tandecki, because the definition in Bhattacharjee provides technical enablement for Tandecki without altering Tandecki's principle of operation, and thus constitutes simple substitution of one prior art element for another to yield predictable results (see MPEP 2143(I)(B)).
Regarding Claim 11, Tandecki discloses a method for record linkage, comprising:
generate one or more vectors of one or more strings from a reference database based on a modified Levenshtein distance (see Tandecki, paragraph [0022], where at step 204, pairs of training vectors are created for multiple pairs of the encoded training words using the CNN and a twin of the CNN; the pairs can be randomly selected from the training dictionaries to train for large word edit distances, and pairs can be created by adding noise to a word to train for small word edit distances; at step 206, a Similarity Metric (SM) is calculated for each of the multiple pairs of the plurality of training words; the SM can be calculated based on an Edit Distance (ED) (e.g., Levenshtein ED)); and
generate a vector database for spelling similarity based on the one or more vectors (see Tandecki, paragraph [0074], where the system for classifying words in a batch of words can … create dictionary vectors for each of a plurality of dictionary words using a neural network (NN), store each dictionary vector along with a classification indicator corresponding to the associated dictionary word [it is the position of the Examiner that the phrase ‘for spelling similarity’ constitutes a statement of intended use]).
Tandecki does not disclose:
wherein the modified Levenshtein distance is based on one or more parameters, comprising at least one of: a first number of insertions, a second number of deletions, a third number of replacements, or a fourth number of matches; and
wherein one or more of the one or more parameters comprise a predefined weight.
Bhattacharjee discloses:
wherein the modified Levenshtein distance is based on one or more parameters, comprising at least one of: a first number of insertions, a second number of deletions, a third number of replacements, or a fourth number of matches (see Bhattacharjee, paragraph [0042], where the Levenshtein distance is a string metric for measuring the minimum number of single-character edits required to change one textblock into the other textblock); and
wherein one or more of the one or more parameters comprise a predefined weight (see Bhattacharjee, paragraph [0003], where searching may also include combining weighted results of the approximate string-match with weighted results of the exact string match to generate match scores for each of the function signatures [it is the position of the Examiner that weighting Levenshtein single-character edits against exact character matches is not patentably distinguishable from weighting one or more parameters]).
Both Tandecki and Bhattacharjee disclose Levenshtein edit distance. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to apply the definition of Levenshtein edit distance in Bhattacharjee to the Levenshtein edit distance implemented in Tandecki, because the definition in Bhattacharjee provides technical enablement for Tandecki without altering Tandecki's principle of operation, and thus constitutes simple substitution of one prior art element for another to yield predictable results (see MPEP 2143(I)(B)).
Regarding Claim 12, Tandecki discloses a non-transitory computer readable medium, for record linkage, containing computer program code configured to cause a processor to:
generate one or more vectors of one or more strings from a reference database based on a modified Levenshtein distance (see Tandecki, paragraph [0022], where at step 204, pairs of training vectors are created for multiple pairs of the encoded training words using the CNN and a twin of the CNN; the pairs can be randomly selected from the training dictionaries to train for large word edit distances, and pairs can be created by adding noise to a word to train for small word edit distances; at step 206, a Similarity Metric (SM) is calculated for each of the multiple pairs of the plurality of training words; the SM can be calculated based on an Edit Distance (ED) (e.g., Levenshtein ED)); and
generate a vector database for spelling similarity based on the one or more vectors (see Tandecki, paragraph [0074], where the system for classifying words in a batch of words can … create dictionary vectors for each of a plurality of dictionary words using a neural network (NN), store each dictionary vector along with a classification indicator corresponding to the associated dictionary word [it is the position of the Examiner that the phrase ‘for spelling similarity’ constitutes a statement of intended use]).
Tandecki does not disclose:
wherein the modified Levenshtein distance is based on one or more parameters, comprising at least one of: a first number of insertions, a second number of deletions, a third number of replacements, or a fourth number of matches; and
wherein one or more of the one or more parameters comprise a predefined weight.
Bhattacharjee discloses:
wherein the modified Levenshtein distance is based on one or more parameters, comprising at least one of: a first number of insertions, a second number of deletions, a third number of replacements, or a fourth number of matches (see Bhattacharjee, paragraph [0042], where the Levenshtein distance is a string metric for measuring the minimum number of single-character edits required to change one textblock into the other textblock); and
wherein one or more of the one or more parameters comprise a predefined weight (see Bhattacharjee, paragraph [0003], where searching may also include combining weighted results of the approximate string-match with weighted results of the exact string match to generate match scores for each of the function signatures [it is the position of the Examiner that weighting Levenshtein single-character edits against exact character matches is not patentably distinguishable from weighting one or more parameters]).
Both Tandecki and Bhattacharjee disclose Levenshtein edit distance. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to apply the definition of Levenshtein edit distance in Bhattacharjee to the Levenshtein edit distance implemented in Tandecki, because the definition in Bhattacharjee provides technical enablement for Tandecki without altering Tandecki's principle of operation, and thus constitutes simple substitution of one prior art element for another to yield predictable results (see MPEP 2143(I)(B)).
Claims 3, 5, and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Tandecki and Bhattacharjee as applied to Claims 1, 2, 11, and 12 above, and further in view of Gil (US Patent No. 11,694,276 B1).
Regarding Claim 3, Tandecki in view of Bhattacharjee discloses the electronic device according to Claim 1, further comprising:
a storage (see Tandecki, Fig. 8, for data memory 770), wherein the electronic device is further configured to execute the one or more instructions to:
Tandecki does not disclose:
receive a user input of one or more candidate records;
perform one or more vector searches of the candidate records against the vector database; and
write a result of the one or more vector searches to the storage.
Gil discloses:
receive a user input of one or more candidate records (see Gil, Claim 11, where the method comprises … an entered record submitted to be matched with a dataset record on a data storage);
perform one or more vector searches of the candidate records against the vector database (see Gil, column 7, lines 49-55, where the Fellegi-Sunter algorithm compares the similarity of two records; this comparison is done on a field-by-field basis (i.e., level by level), calculating the probability that the field matches and the probability that the field does not match; the probabilities are then summed to determine a match score; the Fellegi-Sunter algorithm considers the binary comparison vector); and
write a result of the one or more vector searches to the storage (see Gil, Claim 11, where the method comprises … saving a location of the one of the dataset records as a matching record if the score is above a previous highest score).
Both Tandecki and Gil are directed to record association and linking. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the Levenshtein-based record association technique in Tandecki with the Fellegi-Sunter-based record linking technique in Gil, because they are directed to the same field of endeavor and their operations do not interfere with each other; thus, their combination is predictable (see MPEP 2143(I)(B)).
Regarding Claim 5, Tandecki in view of Bhattacharjee and Gil discloses the electronic device according to Claim 3, wherein:
Tandecki does not explicitly disclose that the result is based on a similarity search. Gil discloses the result is based on a similarity search (see Gil, column 7, lines 49-55, where the Fellegi-Sunter algorithm compares the similarity of two records; this comparison is done on a field-by-field basis (i.e., level by level), calculating the probability that the field matches and the probability that the field does not match; the probabilities are then summed to determine a match score; the Fellegi-Sunter algorithm considers the binary comparison vector).
Both Tandecki and Gil are directed to record association and linking. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the Levenshtein-based record association technique in Tandecki with the Fellegi-Sunter-based record linking technique in Gil, because they are directed to the same field of endeavor and their operations do not interfere with each other; thus, their combination is predictable (see MPEP 2143(I)(B)).
Regarding Claim 6, Tandecki in view of Bhattacharjee and Gil discloses the electronic device according to Claim 5, wherein:
Tandecki does not disclose the similarity search is a Fellegi Sunter comparison. Gil discloses the similarity search is a Fellegi Sunter comparison (see Gil, column 7, lines 49-55, where the Fellegi-Sunter algorithm compares the similarity of two records; this comparison is done on a field-by-field basis (i.e., level by level), calculating the probability that the field matches and the probability that the field does not match; the probabilities are then summed to determine a match score; the Fellegi-Sunter algorithm considers the binary comparison vector).
Both Tandecki and Gil are directed to record association and linking. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the Levenshtein-based record association technique in Tandecki with the Fellegi-Sunter-based record linking technique in Gil, because they are directed to the same field of endeavor and their operations do not interfere with each other; thus, their combination is predictable (see MPEP 2143(I)(B)).
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Tandecki, Bhattacharjee, and Gil as applied to Claims 3, 5, and 6 above, and further in view of Chen (PG Pub. No. 2015/0058019 A1).
Regarding Claim 4, Tandecki in view of Bhattacharjee and Gil discloses the electronic device according to Claim 3, wherein:
Tandecki does not disclose the electronic device is further configured to execute the one or more instructions to display a visualization plotting the results of the search on a display. Chen discloses the electronic device is further configured to execute the one or more instructions to display a visualization plotting the results of the search on a display (see Chen, paragraph [0215], where Fig. 15 shows a plot useful for visualizing how the speaker voices and expressions are related; the plot of Fig. 15 is shown in 3 dimensions but can be extended to higher dimension orders).
Gil discloses vector-based similarity searching, but does not explicitly disclose displaying the results of the vector space similarity search. Chen discloses visualization of a vector space similarity search. Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Gil with Chen, because adding a visual plot provides a benefit known in the art that does not interfere with the operation of Gil and thus provides a predictable benefit to Gil (see MPEP 2143(I)(C)).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Tandecki, Bhattacharjee, and Gil as applied to Claims 3, 5, and 6 above, and further in view of Chowdhary (PG Pub. No. 2025/0278407 A1).
Regarding Claim 7, Tandecki in view of Bhattacharjee and Gil discloses the electronic device according to Claim 3, wherein:
Tandecki does not disclose performing one or more vector searches includes using a specialized vector search database. Chowdhary discloses performing one or more vector searches includes using a specialized vector search database (see Chowdhary, paragraph [0028], where examples of vector stores include Pgvector, Pinecone, Qdrant, and other extant variations).
Gil discloses vector-based similarity searching, but does not explicitly disclose storage of vectors in a special-purpose vector database. Chowdhary discloses a special-purpose vector database. Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to apply the special-purpose vector database in Chowdhary to the vector-based similarity search system of Gil, because this provides a known benefit that does not interfere with the operations of Gil and thus constitutes simple substitution of one known element for another to obtain predictable results (see MPEP 2143(I)(B)).
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Tandecki, Bhattacharjee, and Gil as applied to Claims 3, 5, and 6 above, and further in view of Nurvitadhi (PG Pub. No. 2018/0189675 A1).
Regarding Claim 8, Tandecki in view of Bhattacharjee and Gil discloses the electronic device according to Claim 3, wherein:
Tandecki does not disclose the electronic device further comprises a vector accelerator, and wherein the one or more vector searches is performed using the vector accelerator. Nurvitadhi discloses the electronic device further comprises a vector accelerator, and wherein the one or more vector searches is performed using the vector accelerator (see Nurvitadhi, paragraph [0093], where, because web-scale k-means clustering algorithms typically utilize matrix and vector operations (as well as other operations), some embodiments use a matrix/vector accelerator architecture 100).
Gil discloses vector-based similarity searching, but does not explicitly disclose a vector accelerator. Nurvitadhi discloses a vector accelerator. Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to apply the vector accelerator in Nurvitadhi to the vector-based similarity search system of Gil, because this provides a known benefit that does not interfere with the operations of Gil and thus constitutes simple substitution of one known element for another to obtain predictable results (see MPEP 2143(I)(B)).
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Tandecki and Bhattacharjee as applied to Claims 1, 2, 11, and 12 above, and further in view of Hong (PG Pub. No. 2016/0027437 A1).
Regarding Claim 9, Tandecki in view of Bhattacharjee discloses the electronic device according to Claim 1, wherein:
Tandecki does not disclose the one or more vectors are multidimensional. Hong discloses the one or more vectors are multidimensional (see Hong, paragraph [0070], where the speech recognition apparatus may calculate a phonetic distance between words based on a distance calculation method that is modified from Levenshtein distance; see also paragraph [0074], where the speech recognition apparatus may apply a multidimensional scaling (MDS) method to the inter-word distance matrix and may arrange, at one point on an N-dimensional embedding space, an embedding vector to which each word is mapped).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to substitute the single-dimensional vectors of Tandecki with multidimensional vectors of Hong as they are both well known implementations of vectors and thus constitute simple substitution of one known element for another to obtain predictable results (see MPEP 2143(I)(B)).
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Tandecki and Bhattacharjee as applied to Claims 1, 2, 11, and 12 above, and further in view of Wu (US Patent No. 9,740,858 B1).
Regarding Claim 10, Tandecki in view of Bhattacharjee discloses the electronic device according to Claim 1, wherein:
Tandecki does not disclose the modified Levenshtein distance is calculated based on one or more fixed strings. Wu discloses the modified Levenshtein distance is calculated based on one or more fixed strings (see Wu, column 7, lines 22-25, where the base ratio of the target string and the reference string is determined (step 402); in one embodiment, the base ratio of the target and reference strings is calculated using the Levenshtein algorithm [it is the position of the Examiner that a reference string is not patentably distinguishable from a fixed string]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to substitute the pairwise-based vector comparison in Tandecki with the reference-vector-based comparison in Wu for the benefit of comparing known data to incoming data to identify suspicious data (see Wu, Abstract).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARHAD AGHARAHIMI whose telephone number is (571)272-9864. The examiner can normally be reached M-F 9am - 5pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Apu Mofiz, can be reached at 571-272-4080. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/FARHAD AGHARAHIMI/Examiner, Art Unit 2161
/APU M MOFIZ/Supervisory Patent Examiner, Art Unit 2161