Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to the amendment filed on 09/08/2025.
Claims 1-20 are pending and claims 1-20 have been amended.
The rejection of claims 1-20 under 35 U.S.C. § 101 set forth in the previous Office action is hereby withdrawn.
Claim Interpretation
Amended claims 1-20 recite the limitation, “recognizing that the data quality analysis will not cause slower performance based on sizes of the data source when performed using a matching data quality configuration set,..” (See Applicants’ disclosure, [0023] – “(t)he advantage of applying fingerprint dimension weight is improvement of the fingerprint matching process.”)
Applicant indicated that support for the amendments to claims 1, 8, and 15 can be found at least in Figures 3A, 5C, and 6, and paragraphs [0028] and [0032] of the disclosure as filed.
Applicants’ disclosure discloses the following.
[0023] Figure 5C is a chart depicting matching a source fingerprint with various target fingerprints. Fingerprint matching 550 matches source fingerprint 560 with target fingerprints 570. During the fingerprint matching process, the process builds leading keys according to a fingerprint dimension weight. In one embodiment, this is performed by scanning all of the fingerprint dimensions, selecting the most common dimension as the leading dimension. Keys are used (first, second or third leading keys) to build the tree. All of the fingerprint dimensions are scanned, the process selects the dimension as the leading key to build the tree based on weight being larger than a threshold. The advantage of applying fingerprint dimension weight is improvement of the fingerprint matching process.
Reference is made to MPEP 2111.04.II, Contingent Limitations: “The broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) precedent are not met.” See Ex parte Schulhauser, Appeal 2013-007847 (PTAB April 28, 2016) for an analysis of contingent claim limitations in the context of both method claims and system claims.
It appears that the invention improves how organizations check and maintain the quality of their data. Traditionally, analyzing large volumes of data for errors and inconsistencies can be slow and resource-intensive, sometimes causing system crashes or delays. The proposed method creates “fingerprints” of data sources using specific configuration sets. These fingerprints are unique summaries or profiles of the data’s structure and content. When a new dataset needs to be analyzed, the system quickly compares its fingerprint to previously stored fingerprints. If a match is found, the system uses the matching configuration to analyze the data efficiently. If not, the system creates a new fingerprint and updates its repository for future use. This approach speeds up data quality checks, reduces system bottlenecks, and allows the system to learn and improve over time by automatically updating its configuration sets.
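For illustration only, the workflow summarized above can be sketched as follows. This is a minimal sketch and is not taken from Applicants’ disclosure: the names `Repository`, `make_fingerprint`, and `select_config` are hypothetical, the fingerprint is assumed to be a simple tuple of attribute values, and an exact-match dictionary lookup stands in for whatever matching process the disclosure contemplates.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    # Hypothetical store: fingerprint tuple -> name of a data quality configuration set.
    configs: dict = field(default_factory=dict)


def make_fingerprint(data_source: dict, attributes: list) -> tuple:
    # Assumption: a fingerprint is just the values of the chosen attributes, in order.
    return tuple(data_source.get(name) for name in attributes)


def select_config(data_source: dict, attributes: list,
                  fallback_config: str, repo: Repository):
    """Return (configuration set to use, whether a stored fingerprint matched)."""
    fingerprint = make_fingerprint(data_source, attributes)
    if fingerprint in repo.configs:
        # Match: reuse the configuration set already stored for this fingerprint.
        return repo.configs[fingerprint], True
    # No match: use the selected configuration set and record it for future use.
    repo.configs[fingerprint] = fallback_config
    return fallback_config, False


repo = Repository()
source = {"columns": 5, "rows": 100}
first = select_config(source, ["columns", "rows"], "cfg_a", repo)
second = select_config(source, ["columns", "rows"], "cfg_b", repo)
```

In this sketch the first call finds no stored fingerprint and records `cfg_a`; the second call, on a data source with the same attribute values, reuses `cfg_a` rather than the newly offered `cfg_b`.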
The step of “recognizing” as recited in the claims is a step that is not being performed. There is no description in Applicants’ disclosure as to how a user recognizes that the analysis would or would not slow performance.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 1 is reproduced below for convenience.
Claim 1. (Currently Amended) A computer-implemented method, implemented by an information handling system that includes a processor and a memory, the method comprising:
receiving a data source;
retrieving one or more fingerprint configuration sets corresponding to the received data source;
generating one or more data source fingerprints corresponding to the data source based on a set of fingerprint attributes included in the fingerprint configuration sets by:
scanning fingerprint dimensions; selecting a most common dimension as a leading dimension; building a tree based on weight of the leading dimension being larger than a threshold;
comparing the generated data source fingerprints to a plurality of stored fingerprints in a repository before performing a data quality analysis that can cause delays and performance issues;
in response to the comparing identifying a match:
recognizing that the data quality analysis will not cause slower performance based on sizes of the data source when performed using a matching data quality configuration set, retrieving the matching data quality configuration set from the repository; and performing the data quality analysis on the received data source using the retrieved data quality configuration set; and
in response to the comparing not identifying a match: performing the data quality analysis on the received data source using a selected one of the fingerprint configuration sets; and updating the repository with the selected fingerprint configuration set as the data quality configuration set corresponding to the received data source.
Applicants apparently relied on Fig. 5C and par. [0023] to support the limitation, “scanning fingerprint dimensions; selecting a most common dimension as a leading dimension; building a tree based on weight of the leading dimension being larger than a threshold..”
As can be seen below, paragraph [0023] does not describe how a tree is built. Fig. 5C shows two tables, 550 and 570; however, there is no adequate description of how the elements in 550 map to the elements in 570 in a one-to-many relationship. Fig. 5C, without a proper description, is not a tree.
Applicants’ disclosure:
[0023] Figure 5C is a chart depicting matching a source fingerprint with various target fingerprints. Fingerprint matching 550 matches source fingerprint 560 with target fingerprints 570. During the fingerprint matching process, the process builds leading keys according to a fingerprint dimension weight. In one embodiment, this is performed by scanning all of the fingerprint dimensions, selecting the most common dimension as the leading dimension. Keys are used (first, second or third leading keys) to build the tree. All of the fingerprint dimensions are scanned, the process selects the dimension as the leading key to build the tree based on weight being larger than a threshold. The advantage of applying fingerprint dimension weight is improvement of the fingerprint matching process.
Claims 2-20 incorporate the deficiencies as stated above.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
In amended claim 1, it is unclear how the step of “building a tree” is realized. A tree cannot be built from two tables that have rows (Applicants’ disclosure, Fig. 5C and [0023]) unless the one-to-many relationships between the elements of tables 550 and 570 are described.
Claims 2-20 incorporate the deficiencies as stated above.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Regensburger et al., US 20200320224 (hereinafter Regensburger), in view of US PG PUB 20090307273, published 10-December-2009 (hereinafter Johnson), and further in view of US PG PUB 20070198609, published 23-August-2007 (hereinafter Black).
Regarding claim 1, Regensburger discloses, a computer-implemented method, implemented by an information handling system that includes a processor and a memory, the method comprising:
receiving a data source (e.g. data source 125, Regensburger: [0053]-[0056] and Fig. 3);
retrieving one or more fingerprint configuration sets corresponding to the received data source (e.g. FIG. 3 shows an example of the relationships between a data source 125, a fingerprint 310, and a set of ridges 315-335 [as fingerprint configuration sets]. As shown in this example, the basic numeric ridge 315, the cardinality ridge 320, the PG (Postgres) stats ridge 325, the string ridge 330, and the sensitivity ridge 335 make up the fingerprint 310 that describes and represents the data source 125. In various implementations, the ridges may include numeric statistics, descriptive statistics, timestamp statistics, lists of the most frequently occurring values, other frequency metrics, etc. that are used [as retrieving] in measuring, estimating, or calculating the information loss, Regensburger: [0053]-[0056] and Fig. 3);
generating one or more data source fingerprints corresponding to the data source based on a set of fingerprint attributes included in the fingerprint configuration sets (e.g. generating fingerprint 310 corresponding to the data source 125 based on the set of fingerprint attributes (i.e., "Basic numeric ridge" 315, "Cardinality Ridge" 320, "PG Stats Ridge" 325, "String Ridge" 330, "Sensitivity Ridge" 335) included in the configuration set, Regensburger: Fig. 3);
comparing the generated data source fingerprints to a plurality of stored fingerprints in a repository (e.g. the fingerprint cache 221 [as a plurality of stored fingerprints in a repository] stores at least one fingerprint, which is a series of measurements and artifacts about the data source 125. At block 417, the system does a check to determine whether [as comparing] a fingerprint 310 of the data source 125 exists, Regensburger: [0053], [0073]-[0079] and Figs 4A-B);
in response to the comparing identifying a match:
retrieving a data quality configuration set from the repository (e.g. the fingerprint cache 221 [as a plurality of stored fingerprints in a repository] stores at least one fingerprint, which is a series of measurements and artifacts about the data source 125 and is retrieved, Regensburger: [0053]); and
performing a data quality analysis on the received data source using the retrieved data quality configuration set (e.g. if the fingerprint exists in fingerprint cache 221, determine information loss, using the policy and the fingerprint, Regensburger: [0053], [0073]-[0079] and Figs 4A-B); and in response to the comparing not identifying a match (e.g. if a fingerprint does not exist, Regensburger: [0076]):
performing the data quality analysis on the received data source using a selected one of the fingerprint configuration sets (e.g. performing the process as shown at Fig. 4B, which includes determining the ridge statistic for the data source at step 449, Regensburger: [0053], [0073]-[0079] and Figs 4A-B); and
updating the repository with the selected fingerprint configuration set as the data quality configuration set corresponding to the received data source (e.g. storing the result at step 453, Regensburger: [0053], [0073]-[0079] and Figs 4A-B).
With respect to claim 1, Regensburger does not explicitly indicate the steps of scanning fingerprint dimensions; selecting a most common dimension as a leading dimension; building a tree based on weight of the leading dimension being larger than a threshold; comparing the generated data source fingerprints to a plurality of stored fingerprints in a repository before performing a data quality analysis that can cause delays and performance issues; in response to the comparing identifying a match: recognizing that the data quality analysis will not cause slower performance based on sizes of the data source when performed using a matching data quality configuration set, and retrieving the matching data quality configuration set from the repository.
Johnson discloses a system and method for monitoring and processing data from data sources (see Johnson, abstract). Johnson, in Fig. 4, shows processing of data from a data source (402), creating first and second metadata fingerprints (406-408), comparing the first and second metadata fingerprints (410), and determining whether the two metadata fingerprints are within a certain tolerance (412), wherein the specified tolerance provides a set of percentages as a range, and an analysis of deviation from the average percentage (Johnson, [0049]).
In par. [0033], Johnson teaches monitoring and logging fingerprints related to records by amount of data or size (number of bytes) and the number of records received within a period of time. Johnson, [0033]: “Horizontal axis of graph 200 may represent any particular period of time. For example, particular periods of time may be an hour, 12 hours, a day, a week, or a month. The vertical axis of graph 200 may represent metadata information about data stored in log file container 24, event log file container 32, and/or annotated log file container 36. Examples of the metadata information may be events per unit time, records per unit time, and/or bytes per unit time.” In [0050], Johnson teaches, “…(t)he difference between the value of baseline metadata fingerprint and comparison metadata fingerprint is greater than 10% and a log entry may be created …. the tolerance may be set to 2 times of the standard deviation.” If “the difference between baseline metadata fingerprint and comparison metadata fingerprint is one time of the standard deviation. No log entry would be generated in such a circumstance.” In Johnson, par. [0045], “(t)he baseline fingerprint may use time on the horizontal axis and the criteria on the vertical axis. Additionally, a metadata fingerprint may be created according to a running total of the number of bytes, records, and/or events received up to a particular time of day. For example, at time 00:00 for a particular day, the number of bytes received is set to 0. At 05:00, the total number of bytes received since 00:00 is 10,000; at 10:00 the total number of bytes received since 00:00 may be 50,000.” The number of bytes/records received during a period of time of the day is equated with the “weight” as claimed, and the “standard deviation” is equated with the “threshold” as claimed. Examiner notes that a deviation of less than 10% is not recorded in the log (see Johnson, [0050]).
In other words, Johnson monitors and logs fingerprints based on the amount of data and the period over which records are received, and weighs the difference between the fingerprint values against a tolerance, such as a 10% deviation or a multiple of the standard deviation.
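The tolerance comparison described above can be sketched as follows, for illustration only. This is a minimal sketch under stated assumptions and is not Johnson’s implementation: the function names are hypothetical, each metadata fingerprint is reduced to a single number (for example, bytes received in a period), the baseline is assumed to be non-zero, and the population standard deviation is used because the cited passages do not say which form Johnson applies.

```python
from statistics import mean, pstdev


def percent_difference(baseline: float, comparison: float) -> float:
    # Relative difference between the two fingerprint values, in percent.
    # Assumption: baseline is non-zero.
    return abs(comparison - baseline) / baseline * 100.0


def exceeds_percent_tolerance(baseline: float, comparison: float,
                              tolerance_pct: float = 10.0) -> bool:
    # A log entry would be created only when the difference is greater than the tolerance.
    return percent_difference(baseline, comparison) > tolerance_pct


def exceeds_stddev_tolerance(history: list, comparison: float,
                             multiple: float = 2.0) -> bool:
    # Alternative tolerance: a multiple of the standard deviation of past values.
    return abs(comparison - mean(history)) > multiple * pstdev(history)


# 15% above the baseline exceeds a 10% tolerance; 5% does not.
log_large_change = exceeds_percent_tolerance(10_000, 11_500)
log_small_change = exceeds_percent_tolerance(10_000, 10_500)

# With hypothetical past values, a deviation of about one standard deviation is
# within a tolerance of two standard deviations; a larger deviation is not.
past_values = [100, 110, 90, 100]
log_within_two_sigma = exceeds_stddev_tolerance(past_values, 107)
log_beyond_two_sigma = exceeds_stddev_tolerance(past_values, 120)
```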
It would have been obvious to incorporate the teachings of Johnson in Regensburger because Johnson recognizes the problem with network failure and network utilization (Johnson, [0002], “Information technology managers often must monitor and manage an information technology architecture consisting of a large number of individual components such as, routers, firewalls, servers, and personal computers for failures, security breaches, and network utilization. These individual components often generate status messages about their current state of operation that are stored in log files”) and suggests how to optimize the use of computational resources (Johnson, [0013], “(t)herefore, the content of the data may be less important than the amount or type of data received. For example, metadata analysis may be concerned with the amount of data such as the number of bytes received, the number of events detected, or the number of records recorded. This may reduce computational and resource demands in security monitoring applications by reducing the need to analyze the content of received data.”).
With respect to claim 1, the Regensburger and Johnson (RJ) combination does not explicitly teach the step of building a tree based on weight of the leading dimension. In other words, the Regensburger-Johnson combination does not teach the arrangement of fingerprints in a tree.
Black teaches a system for performing the discovery of fingerprints and collecting data regarding the fingerprints (Black, Fig. 8). In collecting the fingerprint attribute data, Black scans the task queue (Black, [0082], “….periodically or continually scans the task queue ….. retrieves the data and stores it in the next available row of the column in collected data tables 128..”; [0109], “… block 132 in FIG. 8…… continuously or periodically scans the task queue 134 in FIG. 8, and when it finds that a task is present…. put it into the collected data store 128.”). More importantly, Black teaches ([0112], Fig. 14) the use of a search index involving a tree structure that stores fingerprints at different levels of the hierarchical tree structure (Black, [0130], “…the fingerprint search pointer index to point at the next fingerprint to be processed... line 209 of FIG. 14 …. . each fingerprint is used to collect attribute data and analyze it, configuration data is checked to make sure the fingerprint is "turned on", i.e., the system administrator wants new attribute data gathered …. This can be done at every level of the hierarchical organization of fingerprints shown in FIG. 14…”).
It would have been obvious to adapt the teachings of Black in Regensburger-Johnson because Black recognizes the difficulty of conducting time-consuming operations of logging all records (see Black, [0004]) and suggests logging the records incrementally (Black, [0010], “need only to check the incremental data file for changes”), thus improving the efficiency of the system.
Regarding claim 2, Regensburger further discloses: applying each of the fingerprint attributes to one or more characteristics of the received data source, the applying resulting in a fingerprint component value corresponding to each of the fingerprint attributes (e.g. FIG. 3 shows an example of the relationships between a data source 125, a fingerprint 310, and a set of ridges 315-335 [as fingerprint attributes]. As shown in this example, the basic numeric ridge 315, the cardinality ridge 320, the PG (Postgres) stats ridge 325, the string ridge 330, and the sensitivity ridge 335 make up the fingerprint 310 that describes and represents the data source 125. In various implementations, the ridges may include numeric statistics, descriptive statistics, timestamp statistics, lists of the most frequently occurring values, other frequency metrics, etc. that are used in measuring, estimating, or calculating the information loss caused by a policy 217, e.g., when applied to a specified dataset (e.g., to a column) of the data source 125, Regensburger: [0056] [0063]).
Regarding claim 3, Regensburger further discloses: retrieving a weighting value corresponding to one or more fingerprint attributes included in the fingerprint configuration sets (e.g. FIG. 3 shows an example of the relationships between a data source 125, a fingerprint 310, and a set of ridges 315-335 [as fingerprint configuration sets]. As shown in this example, the basic numeric ridge 315, the cardinality ridge 320, the PG (Postgres) stats ridge 325, the string ridge 330, and the sensitivity ridge 335 make up the fingerprint 310 that describes and represents the data source 125. In various implementations, the ridges may include numeric statistics, descriptive statistics, timestamp statistics, lists of the most frequently occurring values, other frequency metrics, etc. that are used [as retrieving] in measuring, estimating, or calculating the information loss, Regensburger: [0053]-[0056] and Fig. 3); and adjusting the values corresponding to each of the fingerprint attributes by the retrieved weighting values corresponding to the fingerprint attributes (e.g. if the calculated information-loss estimate number does not match the target information-loss number plus or minus the tolerances, then the admin user 135 and/or the admin device 140 may iteratively provide a new or adjusted policy 217 at block 405 for the process 400 to evaluate, until the target information-loss number is reached, Regensburger: [0053]-[0056], [0087]).
Regarding claim 4, Regensburger further discloses: identifying one or more close fingerprint configuration sets based on one or more characteristics of the received data source, wherein the identified fingerprint configuration sets are the retrieved fingerprint configuration sets (e.g. In various embodiments, the process 450 will generate one or more of the ridges 315, 320, 330, 335 for a dataset (e.g., a column) of the data source 125, depending on the type of data in the dataset, Regensburger: [0053]-[0056], [0091]); and performing the data quality analysis using each of the retrieved fingerprint configuration sets (e.g. performing the process as shown at Fig. 4B, which includes determining the ridge statistic for the data source at step 449, Regensburger: [0053], [0073]-[0079] and Figs 4A-B).
Regarding claim 5, Regensburger further discloses: based on the performance of the data quality analysis, selecting one of the close fingerprint configuration set as a matching fingerprint configuration set corresponding to the received data source, (e.g. if the fingerprint exists in fingerprint cache 221, determine information loss, using the policy and the fingerprint, Regensburger: [0053], [0073]-[0079] and Figs 4A-B); generating a plurality of configuration set fingerprint mappings based on processing the received data source with the matching fingerprint configuration set, (e.g. generating fingerprint 310 corresponding to the data source 125 based on the set of fingerprint attributes (i.e., "Basic numeric ridge" 315, "Cardinality Ridge" 320, "PG Stats Ridge" 325, "String Ridge" 330, "Sensitivity Ridge" 335) included in the configuration set, Regensburger: Fig. 3); and adding the plurality of configuration set fingerprint mappings to the repository (e.g. the processes 400, 450, and 460 of FIGS. 4A-C are presented for conciseness and clarity of explanation, and that blocks and operations may be added to, deleted from, reordered, performed in parallel, or modified within process 400 without departing from the principles of the invention. For example, in the process 400, blocks may be added to compare the information-loss estimate to a target information-loss number and either loop back to the top to try a different policy or automatically apply the policy. Other variations are possible within the scope of the invention, Regensburger: [0113]).
Regarding claim 6, Regensburger further discloses: based on the performance of the data quality analysis, determining that none of the close fingerprint configuration sets matches the received data source (e.g. if a fingerprint does not exist, performing the process as shown at Fig. 4B, which includes determining the ridge statistic for the data source at step 449, Regensburger: [0053], [0073]-[0079] and Figs 4A-B); and responsively: creating a new configuration set corresponding to the received data source (e.g. if a fingerprint does not exist (or optionally if it is older than the time-in-the past threshold (e.g., more than 14 days old)) (317 No), then a new fingerprint 310 is generated from the data source 125, for example, using the process for fingerprint generation as shown in FIG. 4B, Regensburger: [0076] and Fig. 4B); generating a plurality of configuration set fingerprint mappings based on processing the received data source with the new configuration set (Regensburger: [0076] and Fig. 4B); and adding the plurality of configuration set fingerprint mappings to the repository (e.g. storing the result at step 453, Regensburger: [0053], [0073]-[0079] and Figs 4A-B).
Regarding claim 7, Regensburger further discloses, wherein at least one of the fingerprint attributes selected from the fingerprint configuration sets is selected from the group consisting of a number of columns, a number of rows, a data volume, a number of tables, a value contained in a particular cell, a value contained in a particular column, a value contained in a particular row, and a value of one or more tables (e.g. the information loss produced by the policy 217 can be measured, calculated, or evaluated for a single column of the data source 125 that is represented by the fingerprint 310. While in other implementations, the information loss produced by the policy 217 can be measured, calculated, or evaluated for multiple columns, e.g., on a column by column basis, and the fingerprint 310 may have different ridges for different columns of the data source 125. In some implementations, the system 115 may weight some columns more heavily than others in quantifying the information loss, Regensburger: [0078]).
Claims 8-14 are essentially the same as claims 1-7 except that they are directed to an apparatus rather than a method. Therefore, claims 8-14 are rejected under the same rationale as applied to claims 1-7 above.
Claims 15-20 are essentially the same as claims 1-7 except that they are directed to a computer program product rather than a method. Therefore, claims 15-20 are rejected under the same rationale as applied to claims 1-7 above.
Response to Arguments
Applicant’s arguments with respect to amended claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
The Johnson and Black references have been applied to address the amended aspects of the claims.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HOSAIN T ALAM whose telephone number is (571)272-3978. The examiner can normally be reached Mon-Thu, 8:00 - 4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HOSAIN T ALAM/Supervisory Patent Examiner, Art Unit 2132