DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination - 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. The applicant’s submission for the RCE filed on 9 January 2026 has been entered.
Remarks
This action is in response to the applicant’s RCE filed 9 January 2026, which responds to the USPTO Office action mailed 9 October 2025. Claims 1, 9, 12, 15 and 16 are amended. Claims 1-20 are currently pending.
Response to Arguments
With respect to the 35 U.S.C. § 103 rejections of claims 1-20, the applicant’s arguments are moot in view of the new grounds of rejection necessitated by the applicant’s amendments.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 5, 6, 12, 13 and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Yang et al., US 2020/0250197 A1 (hereinafter “Yang”) in view of Tamayo-Rios et al., US 2020/0364243 A1 (hereinafter “Tamayo-Rios”), and further in view of Patel et al., US 2022/0101182 A1 (hereinafter “Patel”).
Claim 1: Yang teaches a record-matching computing system comprising:
a processing device (Yang, [Fig. 6], [0066] note a processor);
a data repository for storing data records regarding entities, wherein each data record comprises a numerical identifier (Yang, [Fig. 1] note Databases 130, [0014] note Numerous historical records have been digitized, indexed, and placed online in various databases. Record data in databases may include birth records, death records, marriage records, adoption records, census records, obituary records, etc.); and
a non-transitory computer-readable storage medium having program code executable by the processing device to perform operations comprising (Yang, [Fig. 6] note 616, [0074] note storage unit 616 includes a computer-readable medium 622 on which is stored instructions 624 embodying any one or more of the methodologies or functions described herein):
receiving a query record comprising a first value of the numerical identifier (Yang, [0016] note client device 110 receives user queries via the user interface… user queries may be formatted to include multiple character strings provided via the one or more search fields. For example, one search field is configured to input a date range while another search field is configured to input a family name (also referred to as a last name)); and
searching the data records for a record matching the query record, the searching comprising (Yang, [0017] note search system 120 searches for records based on user queries received from the client device 110):
retrieving a reference record from the data records, the reference record comprising a second value of the numerical identifier (Yang, [0019] note record retrieval module 150 searches for and retrieves records using the specified characteristics and/or the expanded characteristics);
generating matching attributes for the query record and the reference record, wherein the matching attributes comprise one or more of (Yang, [0018] note query processing module 140 processes a user query to generate an enhanced query including one or more expanded characteristics… query processing module 140 may also expand the user query to create multiple expanded characteristics that are derived from specified characteristics in the user query):
a numerical identifier score measuring a degree of matching between the first value of the numerical identifier and the second value of the numerical identifier, a name identifier score measuring a degree of matching between a query name for a name identifier of the query record and a reference name for the name identifier of the reference record, an address identifier score measuring a degree of matching between a first address for an address identifier in the query record and a second address for an address identifier in the reference record, a date identifier score measuring a degree of matching between a first value of a date identifier in the query record and a second value of the date identifier in the reference record, or a compound score generated based on two or more of the numerical identifier score, the name identifier score, the address identifier score, and the date identifier score (Yang, [0044] note User queries may generally be noisy due to typos or misinformation. Common errors, for example, for a user query to search for a year or two off of the actual year of a birth date, or to mistake a middle name for a first name, [0045] note Because of the noise in both the queries and content, a certain amount of fuzziness must be allowed to achieve acceptable recall… fuzziness expansion can include a calculation of an edit distance, or a number of edits. Some example search clauses include names, places, and dates, [0046], [0047]; i.e. the examiner interprets an edit distance as reading on a degree of matching);
determining, using a machine learning model, a match classification for the reference record and the query record based on the matching attributes (Yang, [0020] note record ranking module 160 ranks records retrieved from the various databases 130. The record ranking module 160 may use a machine learning model to rank the records across different search results. The machine learning model is trained to assign a weight to the search result returned from each of the databases based on the characteristics (specified and/or expanded) that are indicated in the user query, [0048]-[0051] note evaluation metrics for ranking),
returning the reference record as a match to the query record based on the match classification indicating the match (Yang, [0020] note machine learning model is trained to assign a weight to the search result returned from each of the databases based on the characteristics (specified and/or expanded) that are indicated in the user query, [0024] note the machine learning model used to rank and combine records for a particular user query, [0028] note combined results 265 are returned to the client device 110).
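For context on the interpretation above, the edit distance cited from Yang [0045] is the standard Levenshtein string metric. The following minimal sketch is illustrative only; it is not drawn from the cited reference or the claimed invention:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions turning string a into b."""
    prev = list(range(len(b) + 1))  # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# A year "one or two off" (Yang, [0044]) yields a small distance:
# edit_distance("1985", "1986") == 1
```

A lower distance corresponds to a higher degree of matching, which is the mapping the rejection relies on.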
Yang does not explicitly teach wherein the machine learning model is trained by a training process comprising: training the machine learning model using a plurality of training samples comprising matching labels indicating a match or a no-match between pairs of data records; identifying a subset of the plurality of training samples as misclassified training samples based on the subset of the plurality of training samples differing from predicted classifications output by the machine learning model; correcting the matching labels of the misclassified training samples; and re-training the machine learning model using the plurality of training samples with the corrected matching labels.
However, Tamayo-Rios teaches wherein the machine learning model is trained by a training process comprising: training the machine learning model using a plurality of training samples comprising matching labels indicating a match or a no-match between pairs of data records (Tamayo-Rios, [0018] note Deep neural networks are a class of machine learning algorithm, [0019] note linking records from different databases using a deep learning model, [0032] note retraining may further use the plurality of generated negative vectors associated with the one of the records flagged as not matching other recommended records to improve performance of the trained model for similarity scoring, [0041] note training data generated by the simulator may be adjusted to have a predetermined proportion of matching and non-matching records);
identifying a subset of the plurality of training samples as misclassified training samples based on the subset of the plurality of training samples differing from predicted classifications output by the machine learning model (Tamayo-Rios, [0032] note When the selection indicates that the one of the records does not match the requested record, the modifying the weights based on the received selection may include re-training the trained model using just the one of the records having the same cluster identifier flagged as not matching the requested record);
re-training the machine learning model using the plurality of training samples (Tamayo-Rios, [0032] note re-training the trained model using just the one of the records having the same cluster identifier flagged as not matching the requested record, [0043] note The score-outputting model may be optimized by modifying feature weights applied to differences between extracted features of the records of the sub-group and the extracted features of the sample record to minimize an error function between determined similarity scores and ground truths of the training data at step 525).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the machine learning model of Yang with the model re-training of Tamayo-Rios according to known methods (i.e. re-training the machine learning model based on matching and non-matching records). The motivation for doing so is that re-training may improve performance of the trained model (Tamayo-Rios, [0032]).
Yang and Tamayo-Rios do not explicitly teach correcting the matching labels of the misclassified training samples, or re-training using the training samples with the corrected matching labels.
However, Patel teaches this (Patel, [0020] note FIG. 1 illustrates a method for assessing the quality of a dataset, used in the building or training of a machine-learning model, across multiple attributes of the dataset and providing recommendations and explanations for attributes that have low quality scores. At 101, the system obtains a dataset that is intended to be used in building or training a machine-learning model, [0033] note if a data point has an incorrect label, then the data point will cause the quality score with respect to the label purity attribute to be lower. Thus, removing that data point or correcting the data label will cause an increase in the quality score, [0037] note in making a recommendation for remediating the noisy and/or confusing points the system may recommend a different feature extraction, addition of features that will assist in differentiating between class samples, recommend correct labels for mislabeled points, or the like, [0038] note The user may also manually correct or modify some of the data points or data labels in order to increase the data quality score. Once any recommendations or modifications are integrated into the dataset, the dataset may then be employed for use in building a machine-learning model).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the machine learning model re-training of Yang and Tamayo-Rios with the label correction of Patel according to known methods (i.e. re-training the machine learning model based on corrected labels). The motivation for doing so is that correcting the data label will cause an increase in the quality score (Patel, [0033]).
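The claimed training process (train, identify misclassified samples, correct their labels, re-train) can be sketched with a toy threshold classifier. The classifier, data, and names below are hypothetical illustrations, not the applicant's implementation or anything disclosed by the cited references:

```python
def train(samples):
    """Learn a similarity threshold: a pair is a 'match' if score >= threshold.
    Picks the candidate threshold with the fewest training errors."""
    best_t, best_err = 0.0, len(samples) + 1
    for t in sorted({s for s, _ in samples}):
        err = sum((score >= t) != label for score, label in samples)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def misclassified(samples, threshold):
    """Indices of samples whose label disagrees with the model's prediction."""
    return [i for i, (score, label) in enumerate(samples)
            if (score >= threshold) != label]

# (similarity_score, match_label) pairs; the (0.90, False) sample is mislabeled.
samples = [(0.95, True), (0.90, False), (0.20, False), (0.10, False), (0.85, True)]

model = train(samples)
bad = misclassified(samples, model)
# Correct the matching labels of the flagged samples (here: flip them).
corrected = [(s, not l) if i in bad else (s, l)
             for i, (s, l) in enumerate(samples)]
model = train(corrected)  # re-train on the full set with corrected labels
```

After re-training on the corrected labels, the toy model classifies every training sample consistently with its label, which mirrors the quality improvement Patel [0033] attributes to label correction.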
Claim 2: Yang, Tamayo-Rios and Patel teach the record-matching computing system of claim 1, wherein the matching attributes further comprise an address attribute generated based on a geographical distance between the first address and the second address (Yang, [0018] note a specified location may also be expanded geographically to one or more other regions in proximity of the specified location).
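A geographical distance between two addresses, as recited in claim 2, is commonly computed from geocoded coordinates with the haversine formula. The sketch below is a generic illustration; the function name and constants are assumptions, not drawn from Yang:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points,
    e.g. the geocoded first and second addresses of a record pair."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))
```

A small distance between the two addresses would support an address attribute indicating a likely match, consistent with Yang's geographic expansion to regions "in proximity of the specified location."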
Claim 3: Yang, Tamayo-Rios and Patel teach the record-matching computing system of claim 2, wherein the matching attributes further comprise an address frequency attribute indicating a number of records in the data records having a same address as the second address (Yang, [0014] note an address database, [0018] note query processing module 140 identifies specified characteristics in the user query. This may be done for example by identifying different types of information (names, dates, locations, etc.) included in the user query, [0019] note record retrieval module 150 searches for and retrieves records using the specified characteristics and/or the expanded characteristics).
Claim 5: Yang, Tamayo-Rios and Patel teach the record-matching computing system of claim 1, wherein the matching attributes further comprise a name frequency attribute indicating a frequency of a last name in the reference name (Yang, [0016] note another search field is configured to input a family name (also referred to as a last name), [0018] note query processing module 140 identifies specified characteristics in the user query. This may be done for example by identifying different types of information (names, dates, locations, etc.) included in the user query, [0019] note record retrieval module 150 searches for and retrieves records using the specified characteristics and/or the expanded characteristics).
Claim 6: Yang, Tamayo-Rios and Patel teach the record-matching computing system of claim 1, wherein the numerical identifier score is generated based on one or more of a keyboard distance between mismatching digits of the first value and the second value of the numerical identifier or a probability distribution of errors over digits of the numerical identifier (Yang, [0044] note User queries may generally be noisy due to typos or misinformation. Common errors, for example, for a user query to search for a year or two off of the actual year of a birth date, or to mistake a middle name for a first name, [0045] note Because of the noise in both the queries and content, a certain amount of fuzziness must be allowed to achieve acceptable recall… fuzziness expansion can include a calculation of an edit distance, or a number of edits. Some example search clauses include names, places, and dates, [0046], [0047]).
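A keyboard-distance-based numerical identifier score, as recited in claim 6, can be illustrated with a telephone-keypad layout: a mistyped digit on an adjacent key is penalized less than a distant substitution. The layout and scoring formula below are hypothetical, not taken from the cited references:

```python
# Row/column coordinates of digits on a standard telephone keypad.
KEYPAD = {'1': (0, 0), '2': (0, 1), '3': (0, 2),
          '4': (1, 0), '5': (1, 1), '6': (1, 2),
          '7': (2, 0), '8': (2, 1), '9': (2, 2),
          '0': (3, 1)}

def keypad_distance(a, b):
    """Manhattan distance between two digit keys on the keypad."""
    (r1, c1), (r2, c2) = KEYPAD[a], KEYPAD[b]
    return abs(r1 - r2) + abs(c1 - c2)

def numerical_identifier_score(first, second):
    """Score in (0, 1]; higher means closer match. Only mismatching
    digits contribute a penalty. Assumes equal-length identifiers."""
    penalty = sum(keypad_distance(a, b)
                  for a, b in zip(first, second) if a != b)
    return 1.0 / (1.0 + penalty)
```

For example, '1239' versus '1238' differs by one adjacent-key digit (9 vs. 8), yielding a mild penalty, whereas a distant substitution such as 1 for 0 would lower the score further.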
Claim 12: Yang teaches a method that includes one or more processing devices performing operations comprising:
receiving a query record comprising a first value of a numerical identifier (Yang, [0016] note client device 110 receives user queries via the user interface… user queries may be formatted to include multiple character strings provided via the one or more search fields. For example, one search field is configured to input a date range while another search field is configured to input a family name (also referred to as a last name)); and
searching a set of data records for a record matching the query record, the searching comprising (Yang, [0017] note search system 120 searches for records based on user queries received from the client device 110):
retrieving a reference record from the set of data records, the reference record comprising a second value of the numerical identifier (Yang, [0019] note record retrieval module 150 searches for and retrieves records using the specified characteristics and/or the expanded characteristics);
generating matching attributes for the query record and the reference record, wherein the matching attributes comprise one or more of (Yang, [0018] note query processing module 140 processes a user query to generate an enhanced query including one or more expanded characteristics… query processing module 140 may also expand the user query to create multiple expanded characteristics that are derived from specified characteristics in the user query):
a numerical identifier score measuring a degree of matching between the first value of the numerical identifier and the second value of the numerical identifier, a name identifier score measuring a degree of matching between a query name for a name identifier of the query record and a reference name for the name identifier of the reference record, an address identifier score measuring a degree of matching between a first address for an address identifier in the query record and a second address for an address identifier in the reference record, a date identifier score measuring a degree of matching between a first value of a date identifier in the query record and a second value of the date identifier in the reference record, or a compound score generated based on two or more of the numerical identifier score, the name identifier score, the address identifier score, and the date identifier score (Yang, [0044] note User queries may generally be noisy due to typos or misinformation. Common errors, for example, for a user query to search for a year or two off of the actual year of a birth date, or to mistake a middle name for a first name, [0045] note Because of the noise in both the queries and content, a certain amount of fuzziness must be allowed to achieve acceptable recall… fuzziness expansion can include a calculation of an edit distance, or a number of edits. Some example search clauses include names, places, and dates, [0046], [0047]; i.e. the examiner interprets an edit distance as reading on a degree of matching);
determining, using a machine learning model, a match classification for the reference record and the query record based on the matching attributes (Yang, [0020] note record ranking module 160 ranks records retrieved from the various databases 130. The record ranking module 160 may use a machine learning model to rank the records across different search results. The machine learning model is trained to assign a weight to the search result returned from each of the databases based on the characteristics (specified and/or expanded) that are indicated in the user query, [0048]-[0051] note evaluation metrics for ranking); and
returning the reference record as a match to the query record based on the match classification indicating the match (Yang, [0020] note machine learning model is trained to assign a weight to the search result returned from each of the databases based on the characteristics (specified and/or expanded) that are indicated in the user query, [0024] note the machine learning model used to rank and combine records for a particular user query, [0028] note combined results 265 are returned to the client device 110).
Yang does not explicitly teach wherein the machine learning model is trained by a training process comprising: training the machine learning model using a plurality of training samples comprising matching labels indicating a match or a no-match between pairs of data records; identifying a subset of the plurality of training samples as misclassified training samples based on the subset of the plurality of training samples differing from predicted classifications output by the machine learning model; correcting the matching labels of the misclassified training samples; and re-training the machine learning model using the plurality of training samples with the corrected matching labels.
However, Tamayo-Rios teaches wherein the machine learning model is trained by a training process comprising: training the machine learning model using a plurality of training samples comprising matching labels indicating a match or a no-match between pairs of data records (Tamayo-Rios, [0018] note Deep neural networks are a class of machine learning algorithm, [0019] note linking records from different databases using a deep learning model, [0032] note retraining may further use the plurality of generated negative vectors associated with the one of the records flagged as not matching other recommended records to improve performance of the trained model for similarity scoring, [0041] note training data generated by the simulator may be adjusted to have a predetermined proportion of matching and non-matching records);
identifying a subset of the plurality of training samples as misclassified training samples based on the subset of the plurality of training samples differing from predicted classifications output by the machine learning model (Tamayo-Rios, [0032] note When the selection indicates that the one of the records does not match the requested record, the modifying the weights based on the received selection may include re-training the trained model using just the one of the records having the same cluster identifier flagged as not matching the requested record);
re-training the machine learning model using the plurality of training samples (Tamayo-Rios, [0032] note re-training the trained model using just the one of the records having the same cluster identifier flagged as not matching the requested record, [0043] note The score-outputting model may be optimized by modifying feature weights applied to differences between extracted features of the records of the sub-group and the extracted features of the sample record to minimize an error function between determined similarity scores and ground truths of the training data at step 525).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the machine learning model of Yang with the model re-training of Tamayo-Rios according to known methods (i.e. re-training the machine learning model based on matching and non-matching records). The motivation for doing so is that re-training may improve performance of the trained model (Tamayo-Rios, [0032]).
Yang and Tamayo-Rios do not explicitly teach correcting the matching labels of the misclassified training samples, or re-training using the training samples with the corrected matching labels.
However, Patel teaches this (Patel, [0020] note FIG. 1 illustrates a method for assessing the quality of a dataset, used in the building or training of a machine-learning model, across multiple attributes of the dataset and providing recommendations and explanations for attributes that have low quality scores. At 101, the system obtains a dataset that is intended to be used in building or training a machine-learning model, [0033] note if a data point has an incorrect label, then the data point will cause the quality score with respect to the label purity attribute to be lower. Thus, removing that data point or correcting the data label will cause an increase in the quality score, [0037] note in making a recommendation for remediating the noisy and/or confusing points the system may recommend a different feature extraction, addition of features that will assist in differentiating between class samples, recommend correct labels for mislabeled points, or the like, [0038] note The user may also manually correct or modify some of the data points or data labels in order to increase the data quality score. Once any recommendations or modifications are integrated into the dataset, the dataset may then be employed for use in building a machine-learning model).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the machine learning model re-training of Yang and Tamayo-Rios with the label correction of Patel according to known methods (i.e. re-training the machine learning model based on corrected labels). The motivation for doing so is that correcting the data label will cause an increase in the quality score (Patel, [0033]).
Claim 13: Yang, Tamayo-Rios and Patel teach the method of claim 12, wherein the numerical identifier score is generated based on one or more of a keyboard distance between mismatching digits of the first value and the second value of the numerical identifier or a probability distribution of errors over digits of the numerical identifier (Yang, [0044] note User queries may generally be noisy due to typos or misinformation. Common errors, for example, for a user query to search for a year or two off of the actual year of a birth date, or to mistake a middle name for a first name, [0045] note Because of the noise in both the queries and content, a certain amount of fuzziness must be allowed to achieve acceptable recall… fuzziness expansion can include a calculation of an edit distance, or a number of edits. Some example search clauses include names, places, and dates, [0046], [0047]).
Claim 16: Yang teaches a non-transitory computer-readable storage medium having program code executable by a processing device to perform operations comprising:
receiving a query record comprising a first value of a numerical identifier (Yang, [0016] note client device 110 receives user queries via the user interface… user queries may be formatted to include multiple character strings provided via the one or more search fields. For example, one search field is configured to input a date range while another search field is configured to input a family name (also referred to as a last name)); and
searching a set of data records for a record matching the query record, the searching comprising (Yang, [0017] note search system 120 searches for records based on user queries received from the client device 110):
retrieving a reference record from the set of data records, the reference record comprising a second value of the numerical identifier (Yang, [0019] note record retrieval module 150 searches for and retrieves records using the specified characteristics and/or the expanded characteristics);
generating matching attributes for the query record and the reference record, wherein the matching attributes comprise one or more of (Yang, [0018] note query processing module 140 processes a user query to generate an enhanced query including one or more expanded characteristics… query processing module 140 may also expand the user query to create multiple expanded characteristics that are derived from specified characteristics in the user query):
a numerical identifier score measuring a degree of matching between the first value of the numerical identifier and the second value of the numerical identifier, a name identifier score measuring a degree of matching between a query name for a name identifier of the query record and a reference name for the name identifier of the reference record, an address identifier score measuring a degree of matching between a first address for an address identifier in the query record and a second address for an address identifier in the reference record, a date identifier score measuring a degree of matching between a first value of a date identifier in the query record and a second value of the date identifier in the reference record, or a compound score generated based on two or more of the numerical identifier score, the name identifier score, the address identifier score, and the date identifier score (Yang, [0044] note User queries may generally be noisy due to typos or misinformation. Common errors, for example, for a user query to search for a year or two off of the actual year of a birth date, or to mistake a middle name for a first name, [0045] note Because of the noise in both the queries and content, a certain amount of fuzziness must be allowed to achieve acceptable recall… fuzziness expansion can include a calculation of an edit distance, or a number of edits. Some example search clauses include names, places, and dates, [0046], [0047]; i.e. the examiner interprets an edit distance as reading on a degree of matching);
determining, using a machine learning model, a match classification for the reference record and the query record based on the matching attributes (Yang, [0020] note record ranking module 160 ranks records retrieved from the various databases 130. The record ranking module 160 may use a machine learning model to rank the records across different search results. The machine learning model is trained to assign a weight to the search result returned from each of the databases based on the characteristics (specified and/or expanded) that are indicated in the user query, [0048]-[0051] note evaluation metrics for ranking); and
returning the reference record as a match to the query record based on the match classification indicating the match (Yang, [0020] note machine learning model is trained to assign a weight to the search result returned from each of the databases based on the characteristics (specified and/or expanded) that are indicated in the user query, [0024] note the machine learning model used to rank and combine records for a particular user query, [0028] note combined results 265 are returned to the client device 110).
Yang does not explicitly teach wherein the machine learning model is trained by a training process comprising: training the machine learning model using a plurality of training samples comprising matching labels indicating a match or a no-match between pairs of data records; identifying a subset of the plurality of training samples as misclassified training samples based on the subset of the plurality of training samples differing from predicted classifications output by the machine learning model; correcting the matching labels of the misclassified training samples; and re-training the machine learning model using the plurality of training samples with the corrected matching labels.
However, Tamayo-Rios teaches wherein the machine learning model is trained by a training process comprising: training the machine learning model using a plurality of training samples comprising matching labels indicating a match or a no-match between pairs of data records (Tamayo-Rios, [0018] note Deep neural networks are a class of machine learning algorithm, [0019] note linking records from different databases using a deep learning model, [0032] note retraining may further use the plurality of generated negative vectors associated with the one of the records flagged as not matching other recommended records to improve performance of the trained model for similarity scoring, [0041] note training data generated by the simulator may be adjusted to have a predetermined proportion of matching and non-matching records);
identifying a subset of the plurality of training samples as misclassified training samples based on the subset of the plurality of training samples differing from predicted classifications output by the machine learning model (Tamayo-Rios, [0032] note When the selection indicates that the one of the records does not match the requested record, the modifying the weights based on the received selection may include re-training the trained model using just the one of the records having the same cluster identifier flagged as not matching the requested record);
re-training the machine learning model using the plurality of training samples (Tamayo-Rios, [0032] note re-training the trained model using just the one of the records having the same cluster identifier flagged as not matching the requested record, [0043] note The score-outputting model may be optimized by modifying feature weights applied to differences between extracted features of the records of the sub-group and the extracted features of the sample record to minimize an error function between determined similarity scores and ground truths of the training data at step 525).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the machine learning model of Yang with the model re-training of Tamayo-Rios according to known methods (i.e. re-training the machine learning model based on matching and non-matching records). The motivation for doing so is that re-training may improve performance of the trained model (Tamayo-Rios, [0032]).
Yang and Tamayo-Rios do not explicitly teach correcting the matching labels of the misclassified training samples; and re-training with the corrected matching labels.
However, Patel teaches this (Patel, [0020] note FIG. 1 illustrates a method for assessing the quality of a dataset, used in the building or training of a machine-learning model, across multiple attributes of the dataset and providing recommendations and explanations for attributes that have low quality scores. At 101, the system obtains a dataset that is intended to be used in building or training a machine-learning model, [0033] note if a data point has an incorrect label, then the data point will cause the quality score with respect to the label purity attribute to be lower. Thus, removing that data point or correcting the data label will cause an increase in the quality score, [0037] note in making a recommendation for remediating the noisy and/or confusing points the system may recommend a different feature extract, addition of features that will assist in differentiating between class samples, recommend correct labels for mislabeled points, or the like, [0038] note The user may also manually correct or modify some of the data points or data labels in order to increase the data quality score. Once any recommendations or modifications are integrated into the dataset, the dataset may then be employed for use in building a machine-learning model).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the machine learning model re-training of Yang and Tamayo-Rios with the label correction of Patel according to known methods (i.e. re-training the machine learning model based on corrected labels). Motivation for doing so is that correcting the data label will cause an increase in the quality score (Patel, [0033]).
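For clarity of the mapping, the claimed training process taught by the combination (train on labeled pairs, flag samples the model misclassifies, correct those labels, and re-train with the corrected labels) can be sketched as follows. This is an illustrative toy sketch only; the threshold "model," the similarity feature, and the label-flip correction are assumptions for illustration and are not taken from any cited reference.

```python
# Illustrative sketch of the claimed loop: train, flag misclassified
# samples, correct their labels, and re-train with the corrected labels.
# The "model" here is a toy similarity threshold; all names are assumptions.

def train(samples):
    """Fit a threshold midway between the class means of a similarity score."""
    match = [s for s, y in samples if y == 1]
    no_match = [s for s, y in samples if y == 0]
    return (sum(match) / len(match) + sum(no_match) / len(no_match)) / 2

def predict(threshold, similarity):
    return 1 if similarity >= threshold else 0

# (similarity, match-label) pairs; the last label is deliberately wrong
samples = [(0.9, 1), (0.8, 1), (0.2, 0), (0.1, 0), (0.95, 0)]

threshold = train(samples)
misclassified = [i for i, (s, y) in enumerate(samples)
                 if predict(threshold, s) != y]          # flags index 4
corrected = [(s, 1 - y) if i in misclassified else (s, y)
             for i, (s, y) in enumerate(samples)]        # flip flagged labels
threshold = train(corrected)                             # re-train
```

After re-training on the corrected labels, the toy model classifies every sample consistently with its corrected label, which mirrors the stated motivation of improving the trained model's performance.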
Claim 17: Yang, Tamayo-Rios and Patel teach the non-transitory computer-readable storage medium of claim 16, wherein the matching attributes further comprise an address attribute generated based on a geographical distance between the first address and the second address (Yang, [0018] note a specified location may also be expanded geographically to one or more other regions in proximity of the specified location).
Claim 18: Yang, Tamayo-Rios and Patel teach the non-transitory computer-readable storage medium of claim 16, wherein the matching attributes further comprise a name frequency attribute indicating a frequency of a last name in the reference name (Yang, [0016] note another search field is configured to input a family name (also referred to as a last name), [0018] note query processing module 140 identifies specified characteristics in the user query. This may be done for example by identifying different types of information (names, dates, locations, etc.) included in the user query, [0019] note record retrieval module 150 searches for and retrieves records using the specified characteristics and/or the expanded characteristics).
Claim 19: Yang, Tamayo-Rios and Patel teach the non-transitory computer-readable storage medium of claim 16, wherein the numerical identifier score is generated based on one or more of a keyboard distance between mismatching digits of the first value and the second value of the numerical identifier or a probability distribution of errors over digits of the numerical identifier (Yang, [0044] note User queries may generally be noisy due to typos or misinformation. Common errors, for example, for a user query to search for a year or two off of the actual year of a birth date, or to mistake a middle name for a first name, [0045] note Because of the noise in both the queries and content, a certain amount of fuzziness must be allowed to achieve acceptable recall… fuzziness expansion can include a calculation of an edit distance, or a number of edits… Some example search clauses include names, places, and dates, [0046], [0047]).
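The "keyboard distance" concept recited in claim 19 can be illustrated with a short sketch. The phone-style keypad layout and the Manhattan-distance scoring below are assumptions chosen for illustration; they are not taken from Yang, which teaches edit-distance fuzziness more generally.

```python
# Illustrative digit "keyboard distance": mismatching digits that sit close
# together on a keypad are more plausible typos, so they score as "nearer."
# Keypad layout and scoring choices are assumptions, not from any reference.

# positions on a phone-style keypad: 1 2 3 / 4 5 6 / 7 8 9 / 0
KEYPAD = {'1': (0, 0), '2': (0, 1), '3': (0, 2),
          '4': (1, 0), '5': (1, 1), '6': (1, 2),
          '7': (2, 0), '8': (2, 1), '9': (2, 2),
          '0': (3, 1)}

def digit_distance(a, b):
    (r1, c1), (r2, c2) = KEYPAD[a], KEYPAD[b]
    return abs(r1 - r2) + abs(c1 - c2)  # Manhattan distance on the keypad

def identifier_score(first, second):
    """Sum keypad distances over mismatching digit positions (lower = closer)."""
    return sum(digit_distance(x, y) for x, y in zip(first, second) if x != y)

print(identifier_score("12345", "12354"))  # transposed '45' -> 1 + 1 = 2
```

A transposition of adjacent keypad digits scores low, while a mismatch between distant digits (e.g. '1' versus '9') scores high, which is the behavior the claimed numerical identifier score describes.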
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Yang, Tamayo-Rios and Patel in view of Schumacher et al., US 2008/0005106 A1 (hereinafter “Schumacher”).
Claim 4: Yang, Tamayo-Rios and Patel do not explicitly teach the record-matching computing system of claim 2, wherein the reference record further comprises a list of past addresses, and wherein the matching attributes further comprise an address attribute indicating whether a phone area code in the query record matches a state indicated in the second address or the list of past addresses.
However, Schumacher teaches this (Schumacher, [0043] note master entity index system may include a master entity index (MEI) 32 that processes, updates and stores data records about one or more entities, [0064] note The MEI may also be queried about the past history of changes of the data in the data records so that, for example, the past addresses for a particular entity may be displayed, [0084] note confidence level may be calculated based on a scoring routine, which may use historical data about a particular attribute, such as a last address, [0270] note The address-by-phone discrepancy tables are calculated by calculating edit distance on the address/phone subset of the matched set. As with the unmatched probability tables, two attributes are considered simultaneously. When comparing two members from the address/phone subset of the matched set, several distances can be obtained, for instance, both a phone and an address distance (if both members have at least one valid value for both address and phone), only a phone distance (if one member has no valid address but both have a valid phone values), only an address distance (if one member has no valid phone but both have a valid address), or null).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the databases of Yang, Tamayo-Rios and Patel with the address-by-phone discrepancy tables of Schumacher according to known methods (i.e. determining discrepancies between address and phone information). Motivation for doing so is that this processes data records to increase accuracy (Schumacher, [0047]).
Claims 7-10, 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Yang, Tamayo-Rios and Patel in view of Chickering et al., US 2016/0162802 A1 (hereinafter “Chickering”).
Claim 7: Yang, Tamayo-Rios and Patel do not explicitly teach the record-matching computing system of claim 1, wherein updating the matching labels of the misclassified training samples comprises: generating two or more auxiliary classifications for each of the misclassified training samples using two or more auxiliary models; and updating the matching labels of the misclassified training samples based on the two or more auxiliary classifications; and re-training with the updated matching labels.
However, Chickering teaches this (Chickering, [Fig. 2], [0038] note FIG. 2 is a flowchart showing aspects of one illustrative method for active machine learning 200… The active machine learning system 110 can direct the auxiliary machine learning model 112 to select one or more sample unlabeled observations 116 for processing and outputting a score, [0039] note the featuring component 122 can refine the target machine learning model capacity based on the score… if the score indicates the unlabeled observation belongs to a class, one or more possible features 126 may be extracted from a labeled observation such as labeled observation 132 for training the target machine learning model, [0040] note Upon refinement of the capacity of the target machine learning model 114, the method 200 further includes retraining the auxiliary machine learning model with labeled observations 132, [0057] note output scores of one or more auxiliary machine learning models, [Fig. 6] note Auxiliary Machine Learning Model(s) 122).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the machine learning model of Yang, Tamayo-Rios and Patel with the auxiliary machine learning models of Chickering according to known methods (i.e. training the machine learning model using auxiliary machine learning models). Motivation for doing so is that this can increase efficiency in training limited-capacity models (Chickering, [0014]).
Claim 8: Yang, Tamayo-Rios, Patel and Chickering teach the record-matching computing system of claim 7, wherein updating the matching labels of the misclassified training samples comprises:
assigning the matching label of a training sample to be a value based on determining that each of the two or more auxiliary classifications has the value (Chickering, [Fig. 2], [0038] note FIG. 2 is a flowchart showing aspects of one illustrative method for active machine learning 200… The active machine learning system 110 can direct the auxiliary machine learning model 112 to select one or more sample unlabeled observations 116 for processing and outputting a score, [0039] note the featuring component 122 can refine the target machine learning model capacity based on the score… if the score indicates the unlabeled observation belongs to a class, one or more possible features 126 may be extracted from a labeled observation such as labeled observation 132 for training the target machine learning model).
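The claim 8 rule mapped above (a flagged sample's label is assigned a value only when every auxiliary classification has that value) can be sketched as follows; the function name and the keep-original fallback on disagreement are illustrative assumptions, not from Chickering.

```python
# Sketch of claim 8's unanimity rule: a flagged sample's matching label is
# assigned a value only when each auxiliary classification has that value.

def resolve_label(current_label, auxiliary_votes):
    if len(set(auxiliary_votes)) == 1:   # all auxiliary models agree
        return auxiliary_votes[0]        # adopt the unanimous value
    return current_label                 # disagreement: keep the original

print(resolve_label(0, [1, 1, 1]))  # unanimous "match" -> label becomes 1
print(resolve_label(0, [1, 0, 1]))  # split vote -> label stays 0
```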
Claim 9: Yang, Tamayo-Rios, Patel and Chickering teach the record-matching computing system of claim 7, wherein the training process further comprises:
prior to generating the two or more auxiliary classifications, training the two or more auxiliary models using the plurality of training samples; and after updating the matching labels of the misclassified training samples, re-training the two or more auxiliary models using the plurality of training samples with the updated matching labels (Chickering, [0038] note method 200 of active machine learning can include initiating an active learning process with the auxiliary machine learning model 112, at block 202, [0039] note the featuring component 122 can refine the target machine learning model capacity based on the score, at block 204, [0040] note Upon refinement of the capacity of the target machine learning model 114, the method 200 further includes retraining the auxiliary machine learning model with labeled observations 132, at block 206).
Claim 10: Yang, Tamayo-Rios, Patel and Chickering teach the record-matching computing system of claim 7, wherein training the two or more auxiliary models using the plurality of training samples comprises training the two or more auxiliary models using a subset of the training matching attributes for each of the plurality of training samples (Chickering, [0059] note a subset labelset of unlabeled observations can be identified for the particular pool or pools of observations. Thereafter, the auxiliary machine learning model 112 can be configured to select sample unlabeled observations only from the subset labelset to increase diversity in the unlabeled samples).
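The arrangement of claim 10 (each auxiliary model trained on only a subset of the matching attributes) can be sketched as follows. The attribute names and the projection helper are illustrative assumptions; Chickering's teaching is the more general use of a subset to diversify the auxiliary models.

```python
# Sketch of claim 10: each auxiliary model is trained on a different subset
# of the matching attributes, so the auxiliaries make decorrelated errors.
# Attribute names and the projection helper are illustrative assumptions.

def project(samples, attribute_indices):
    """Restrict each sample's attribute vector to the chosen subset."""
    return [([attrs[i] for i in attribute_indices], label)
            for attrs, label in samples]

# attribute vector: [name_score, address_score, identifier_score]
samples = [([0.9, 0.8, 0.7], 1), ([0.2, 0.3, 0.1], 0)]

view_a = project(samples, [0, 2])  # auxiliary model A: name + identifier
view_b = project(samples, [1])     # auxiliary model B: address only
```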
Claim 14: Yang, Tamayo-Rios and Patel do not explicitly teach the method of claim 12, wherein updating the matching labels of the misclassified training samples comprises: generating two or more auxiliary classifications for each of the misclassified training samples using two or more auxiliary models; and updating the matching labels of the misclassified training samples based on the two or more auxiliary classifications; and re-training with the updated matching labels.
However, Chickering teaches this (Chickering, [Fig. 2], [0038] note FIG. 2 is a flowchart showing aspects of one illustrative method for active machine learning 200… The active machine learning system 110 can direct the auxiliary machine learning model 112 to select one or more sample unlabeled observations 116 for processing and outputting a score, [0039] note the featuring component 122 can refine the target machine learning model capacity based on the score… if the score indicates the unlabeled observation belongs to a class, one or more possible features 126 may be extracted from a labeled observation such as labeled observation 132 for training the target machine learning model, [0040] note Upon refinement of the capacity of the target machine learning model 114, the method 200 further includes retraining the auxiliary machine learning model with labeled observations 132, [0057] note output scores of one or more auxiliary machine learning models, [Fig. 6] note Auxiliary Machine Learning Model(s) 122).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the machine learning model of Yang, Tamayo-Rios and Patel with the auxiliary machine learning models of Chickering according to known methods (i.e. training the machine learning model using auxiliary machine learning models). Motivation for doing so is that this can increase efficiency in training limited-capacity models (Chickering, [0014]).
Claim 15: Yang, Tamayo-Rios, Patel and Chickering teach the method of claim 14, wherein the training process further comprises:
prior to generating the two or more auxiliary classifications, training the two or more auxiliary models using a subset of the training matching attributes of the plurality of training samples and the matching labels; and after updating the matching labels of the misclassified training samples, re-training the two or more auxiliary models using the subset of the training matching attributes of the plurality of training samples with the corrected matching labels (Chickering, [0038] note method 200 of active machine learning can include initiating an active learning process with the auxiliary machine learning model 112, at block 202, [0039] note the featuring component 122 can refine the target machine learning model capacity based on the score, at block 204, [0040] note Upon refinement of the capacity of the target machine learning model 114, the method 200 further includes retraining the auxiliary machine learning model with labeled observations 132, at block 206).
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Yang, Tamayo-Rios, Patel and Chickering in further view of WALTERS et al., US 2020/0012886 A1 (hereinafter “Walters”).
Claim 11: Yang, Tamayo-Rios, Patel and Chickering do not explicitly teach the record-matching computing system of claim 7, wherein the two or more auxiliary models comprises two or more of a naive Bayes model, a multi-layer perceptron model, a random forest model, or a support vector machine (SVC) model, and wherein the machine learning model is one of a decision tree model, a random forest model, or a repeated incremental pruning to produce error reduction (RIPPER) model.
However, Walters teaches this (Walters, [0042] note a plurality of embedding network layers 304a, 304b, 304c, 304d, and 304n to classify and cluster data 302, [Fig. 4], [0057] note in the method of FIG. 4, generating preliminary clustered data based on the received data may include passing an embedding network layer output comprising clustered data to subsequent embedding network layers, [0068] note Machine-learning models may include… a random forest model… a support vector machine (SVM) model, [0075] note a Bayesian model).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the machine learning models of Yang, Tamayo-Rios, Patel and Chickering with the machine learning models of Walters according to known methods (i.e. classifying data based on applying machine learning models including a random forest model). Motivation for doing so is that this reduces the dimensionality of clustered data, leading to improved accuracy and efficiency (Walters, [0010]).
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Yang, Tamayo-Rios and Patel in view of Walters.
Claim 20: Yang, Tamayo-Rios and Patel do not explicitly teach the non-transitory computer-readable storage medium of claim 16, wherein the machine learning model is one of a decision tree model, a random forest model, or a repeated incremental pruning to produce error reduction (RIPPER) model.
However, Walters teaches this (Walters, [0042] note a plurality of embedding network layers 304a, 304b, 304c, 304d, and 304n to classify and cluster data 302, [Fig. 4], [0057] note in the method of FIG. 4, generating preliminary clustered data based on the received data may include passing an embedding network layer output comprising clustered data to subsequent embedding network layers, [0068] note Machine-learning models may include… a random forest model… a support vector machine (SVM) model, [0075] note a Bayesian model).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the machine learning models of Yang, Tamayo-Rios and Patel with the machine learning models of Walters according to known methods (i.e. classifying data based on applying machine learning models including a random forest model). Motivation for doing so is that this reduces the dimensionality of clustered data, leading to improved accuracy and efficiency (Walters, [0010]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
FUJITA et al., US 2021/0350283 A1 - Data of a sample that has been identified to be highly likely to be in the mislabeled state may be relabeled to remain as the teacher data without being excluded.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Giuseppi Giuliani whose telephone number is (571)270-7128. The examiner can normally be reached Monday-Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kavita Stanley can be reached at (571)272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/GIUSEPPI GIULIANI/Primary Examiner, Art Unit 2153