DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 25 September 2025 has been entered. Applicant amended claims 14-20. Claims 1-20 remain pending.
Applicant's amendment to the claims overcomes the 35 USC 112(b) rejection of 19 May 2025. Therefore, the 35 USC 112(b) rejection of 19 May 2025 has been withdrawn.
Response to Arguments
Regarding the 35 USC 112(b) rejection:
Applicant’s arguments, filed 25 September 2025, with respect to 35 USC 112(b) rejection of 19 May 2025 have been fully considered and are persuasive. The 35 USC 112(b) rejection of 19 May 2025 has been withdrawn.
Regarding the 35 USC 103 rejection:
Applicant's arguments filed 25 September 2025 have been fully considered but they are not persuasive.
Applicant’s argument 1:
Applicant respectfully submits that Hodgman, Butler, Hoa, and Bezzi, either alone or in combination, fail to teach or suggest at least the following features as recited in independent claim 1. In particular, independent claim 1 recites "causing, by the classifier component, information that classifies the at least one data field as a sensitive data field to be stored in a data catalog without sending content of any data field in the first set of data fields to the data catalog...".
Applicant’s support:
Hodgman, at best, describes that the classifiers can include an asset classifier, a user classifier, and a threat classifier, among other such classifiers. The asset classifier is trained to analyze asset data to classify an asset into a physical classification or a role classification, whereas a role classification can include software development, medical services, or finance. However, Hodgman fails to teach or suggest storing a classified data field in a data catalog without sending content of any data field to the data catalog.
In contrast, the present application expressly distinguishes between a "data field" (i.e., the category of data) and the "content" or "data item" (i.e., the actual value stored in that field). In particular, paragraph [0026] of the as-filed specification recites "Each datastore 14 may comprise a plurality of records that contain a same set of data fields. The term data field refers to a portion of a record in a datastore. The term "content" or "data item", as used herein, refers to the actual data that is stored in a data field. For example, a Name data field of a first record in the datastore 14-1 may store the name "Bob Johnson". "Bob Johnson" is the content of the Name data field for the first record, and may also be referred to as the data item of the Name data field for the first record. The Name data field of a second record in the datastore 14-1 may store the name "John Smith". Thus, the data fields are the same in each record of the datastore 14-1, but the data items (i.e., content) of the data fields may differ from record to record. Solely as an example, the datastore 14-1 may be a "customer" datastore and may include a set of data fields such as a Name data field, an address data field, a city data field, a state data field, and a zip code data field. The datastores 14 may number in the hundreds or thousands."
Thus, Hodgman fails to teach or suggest storing only categories of the content in a data catalog without storing the actual content.
Examiner’s remarks:
Paragraph 26 of Applicant's specification recites "The term data field refers to a portion of a record in a datastore". Furthermore, it is the examiner's interpretation that the classifier component causes "the classified information of the at least one data field (a portion of a record in a datastore) as a sensitive data field" to be stored, but the classifier component may not necessarily directly send "content…". "Cause" and "to be stored" do not necessarily recite a positive storage of the data field by the classifier component itself without sending content of any data field in the first set of data fields, but rather a potential future storage of the data by another component or application. Therefore, applying the broadest reasonable interpretation in light of the specification, the examiner maintains the rejection, providing the interpretation as best understood. Note: The examiner has also corrected a typographical error in the office action where the examiner duplicated "causing information that classifies the at least one data field as a sensitive data field to be stored in a data catalog…".
Applicant’s argument 2:
Applicant respectfully submits that Hodgman, Butler, Hoa, and Bezzi, either alone or in combination, fail to teach or suggest at least the following features as recited in independent claim 1. In particular, independent claim 1 recites "determining, by the data security component based on the data catalog, that the query requested content from a sensitive data field...".
Applicant’s support 2: see pages 13-14 of Applicant’s remarks
Examiner’s remarks:
Paragraph 54 of Hodgman discloses "Security threat data and new vulnerability definition data can include, for example, known data describing various security threats directed to users, assets, or a combination thereof, as well as data that may potentially pose a security threat to users and assets….". Information/an identifier of the user or of a security threat can be a sensitive data field, or "a portion of a record in a datastore", of a data catalog. Therefore, applying the broadest reasonable interpretation in light of the specification, the examiner maintains the rejection as pointed out, providing the interpretation as best understood.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f):
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f), is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f), is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f), except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f), except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f), because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:
Claim 14 recitation of “one or more computing devices operable to receive…; analyze…; cause…; subsequently access…; determine…; and store…” is interpreted per the description disclosed in paragraph 28 of Applicant’s specification.
Claim 17 recitation of “the one or more computing devices are further operable to …obtain…; parse…; remove…; send…” is interpreted per the description disclosed in paragraph 28 of Applicant’s specification.
Claim 18 recitation of “one or more computing devices to receive…; analyze…; cause…; subsequently access…; determine…; and store…” is interpreted per the description disclosed in paragraph 28 of Applicant’s specification.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 14-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hodgman et al., US 20200053115 (hereinafter Hodgman), in view of Butler et al., US 20200380212 (hereinafter Butler), in further view of Hoa, US 20200293681 (hereinafter Hoa), and in further view of Bezzi, US 20150007249 (hereinafter Bezzi).
As to claim 1, Hodgman teaches a method (paragraph 1 discloses the disclosure pertains to method of analyzing data), comprising:
receiving, by a classifier component executing on one or more processor devices (paragraph 55 discloses a classifier may execute any suitable machine learning procedures, rule-based classification techniques, heuristic techniques, or some combination thereof), an instruction to determine whether any [data] in a first datastore are sensitive [data type] in which sensitive data is stored (paragraph 23 discloses the instructions of the non-transitory computer readable storage medium, when executed by the at least one processor, further enable the computing system to provide an asset classifier, a user classifier, and a threat classifier, wherein the classifiers classify/determine different categories for the data, such as classifying an asset as a role asset, or classifying a user as being associated with one of an employee type, a group type, or a role type. Role asset, employee type, and role type are sensitive data types. As shown in Figures 2-3 and paragraphs 49-50, the classifier 204 within the threat analysis system receives the data from a data source 104 via interface 216. Paragraph 6 reveals at least one data store in a service provider environment maintains at least three data sets from a plurality of data sources, each data set including information for one of assets, users, or security threats. Paragraph 7 reveals wherein the asset data set includes first identification information identifying individual devices on a network, the user data set includes second identifying information identifying user accounts associated with the individual devices, and the threat data set includes third identification information identifying threats to one of a device or a user account), the first datastore comprising a first data structure comprising a plurality of records (Figure 3 and paragraph 54 reveal the threat analysis system receives data from a number of data sources such as data warehouses. Paragraph 6 reveals at least one data store in a service provider environment maintains at least three data sets from a plurality of data sources/records, each data set including information for one of assets, users, or security threats. Data sets/data warehouses are a type of data structure);
analyzing, by the classifier component, the first set of [data] and determining that at least one [data] is a sensitive [data type] (paragraph 55 discloses the classifier is trained to analyze the data and classify the data into role classification (medical services, finances), employee type classification because the analyzed asset data can include, for example, information that identifies an electronic device, service, or other resource of a provider, and user data include data from network logs, organization chart information, employment records. See Figure 4, step 404. Role classification of medical services and finances and employee type are sensitive data types);
causing, by the classifier component, information that classifies the at least one data field as a sensitive [data type] to be stored in a data catalog without sending content of any data field in the first set of data fields to the data catalog (paragraph 55 reveals the dataset is analyzed by the classifier to augment the data into classification types. The classifier is trained to analyze the data and classify the data into a role classification (medical services, finances) or an employee type classification because the analyzed asset data can include, for example, information that identifies an electronic device, service, or other resource of a provider, and the user data include data from network logs, organization chart information, employment records. Paragraphs 52 and 56 reveal the augmented data from the classifier is stored in various data catalogs. Augmenting data to the data catalog involves enriching the dataset via classification into a classified data type/metadata and inputting the classified data type/metadata into a catalog. See Figure 4, step 404);
subsequently accessing, by a data security component executing on the one or more processor devices (paragraph 52 discloses the query is received from a query source and directed to a query component/data security component. Paragraph 21 discloses a non-transitory computer readable storage medium stores instructions that, when executed by at least one processor of a computing system, cause the computing system to receive a query associated with a subject, the subject being at least one of an asset, a user, or a security threat), a query [associated with] the first datastore, the query including a data field name that identifies the at least one data field (paragraph 62 discloses subsequent to steps 404 and 406 of Figure 4, a query associated with a subject/data field is received. The query can be automated or manual. The subject includes information such as an identifier or other data associated with a particular data type (asset, user, or security threat data). Paragraph 52 discloses the query is received from a query source and directed to a query component/data security component. The query is associated with a subject of a data source);
determining, by the data security component based on the data catalog, that the query requested content (paragraphs 52 and 62 disclose the query is analyzed by the query component to determine a subject associated with the query or at least identify a type of query. The subject can include information such as an identifier or other data associated with a particular asset or user. Mapping information, such as a lookup table, can be used to tag or otherwise identify at least one of an asset, a user, or a security threat associated with the subject. The query component/data security component can direct the query to an appropriate correlator component based on the subject of the query. The mapping information can be used to determine insights between the catalog information based on at least one of the asset, the user, or the security threat associated with the subject).
While Hodgman teaches classifying the data in a data source, Hodgman does not teach that the data pertains to data field(s), and thus does not teach receiving instructions to determine whether any data fields in a first datastore are sensitive data fields; each record comprising a first set of data fields; analyzing the first set of data fields and determining that at least one data field is a sensitive data field; causing information that classifies the at least one data field as a sensitive data field to be stored in a data catalog; a query made to a first datastore; determining that the query requested content from a sensitive data field; and storing, by the data security component, information that the query requested the content from the sensitive data field.
Butler teaches receiving instructions to determine whether any data fields in a first datastore are sensitive data fields (paragraphs 34-35 reveal an execution system configured to profile source data received from data sources, classify the source data, and associate portions of the source data with labels representing the semantic meaning of those portions of the source data. A portion of the source data can include a field in the source data. Paragraph 44 reveals the label can indicate whether a particular field includes sensitive data such as PII); each record comprising a first set of data fields (paragraphs 5 and 34-35 disclose a portion of the source data can include data fields); analyzing the first set of data fields and determining that at least one data field is a sensitive data field (paragraphs 36-40 disclose classifying each field as having a data type and determining a label for the data field. Paragraph 44 reveals the label can indicate whether a particular field includes sensitive data such as PII); causing information that classifies the at least one data field as a sensitive data field to be stored in a data catalog (paragraph 39 discloses a load data module sends the classified labels/label index to a reference database/catalog).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Hodgman’s teachings of analyzing and classifying the data from a first data source with Butler’s teachings of classifying and labeling data fields from a data source, such that applications that use the generated labels of data sets can include data quality enforcement, personal data anonymization, data masking, personally identifiable information (PII) reports, test data management, data set annotation, and so forth. Furthermore, such modification allows the system administrator to know and understand what data is in the data set stored on the system, such as for regulatory reasons (paragraph 6 of Butler).
The combination of Hodgman in view of Butler does not teach a query made to a first datastore; determining that the query requested content from a sensitive data field, and storing, by the data security component, information that the query requested the content from the sensitive data field.
Hoa teaches accessing, by a data security component, a query made to a first datastore and determining that the query requested content from a sensitive data field (paragraph 66 discloses a security engine/data security component receives/accesses a request to access data in a personnel database. The security engine determines that the requested data is stored in one or more database columns (data fields) of the personnel database corresponding to sensitive information).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to further modify Hodgman’s teachings of analyzing the data, classifying the data, and accessing a query, as modified by Butler’s teachings of classifying and labeling data fields from a data source, with Hoa’s teachings of determining whether a query requested content from a sensitive data field, to efficiently restrict and track the queried access of sensitive data in the database without detrimentally impacting the database or the security of the stored data and to improve the auditing of access to such sensitive information (paragraph 2 of Hoa).
The combination of Hodgman in view of Butler and Hoa does not teach storing, by the data security component, information that the query requested the content from the sensitive data field.
Bezzi teaches storing, by the data security component, information that the query requested the content from the sensitive data field (paragraphs 31-32 disclose a processor/data security component stores results of the query in a temporary data store; one of the columns of the results of the query may include data associated with a sensitive identifier of a social security number. Paragraphs 16 and 22-24 disclose the data stored in the database tables and the columns of the table may be classified as identifiers and sensitive attributes and are associated with a privacy risk).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to further modify Hodgman’s teachings of analyzing the data, classifying the data, and accessing a query, as modified by Butler’s teachings of classifying and labeling data fields from a data source and Hoa’s teachings of determining whether a query requested content from a sensitive data field, to further include storing the query result as taught by Bezzi to prevent users from retrieving the data from the databases as soon as the database provides the information. By storing the resultant query information first, the system can further secure the query result data for anonymization as needed according to specific needs/authorizations of the requestor to prevent data leakage (paragraphs 3 and 5 of Bezzi).
As to claim 2, the combination of Hodgman in view of Butler and Bezzi teaches wherein analyzing, by the classifier component, the first set of data fields and determining that the at least one data field is a sensitive data field (see claim 1 mapping above) comprises: accessing, by the classifier component, a subset of records of the plurality of records (Butler: paragraphs 51 and 58 disclose the classification module accesses the fields of the data source and generates profile data from the datasets/records. The profile data module can discover fields by identifying rows of tables in the source data, finding field names, references to fields, or using any similar process. The profile data module determines statistical attribute(s) of the data fields and generates profile data including those statistical attributes. The profile data identifies patterns in the source data. More specifically, the profile data includes statistics about the values of data fields of tables of the source data. For example, the profile data can include information specifying whether the data values of a data field include numerical data, character strings, etc. For example, the statistics about the data values can include a maximum value, a minimum value, a standard deviation, a mean, and so forth of the values that are included in each of the data fields (if the data are numerical). In some implementations, the statistics about the data can include how many digits or characters are in each entry of the data values. The profile data is the subset of records); determining that content stored in the at least one data field in each record in the subset of records comprises sensitive data (Butler: paragraphs 58-59 disclose the classification module classifies the data fields using the profile data. Paragraphs 36-40 disclose classifying each field as having a data type and determining a label for the data field. Paragraph 44 reveals the label can indicate whether a particular field includes sensitive data such as PII); and in response to determining that the content stored in the at least one data field in each record in the subset of records comprises sensitive data, determining that the at least one data field is a sensitive data field (Butler: paragraphs 58-59 disclose the classification module classifies the data fields using the profile data. Paragraphs 36-40 disclose classifying each field as having a data type and determining a label for the data field. Paragraph 44 reveals the label can indicate whether a particular field includes sensitive data such as PII). Motivation similar to the motivation presented in claim 1.
As to claim 3, the combination of Hodgman in view of Butler and Bezzi teaches wherein determining that the content stored in the at least one data field in each record in the subset of records (see claim 1 and claim 2 mapping above) comprises sensitive data further comprises: processing the content with at least one regular expression (Butler: paragraphs 35 and 51 disclose for the profiling of the data, a profile data module identifies patterns in the source data. More specifically, the profile data includes statistics about the values of data fields of tables of the source data. For example, the profile data can include information specifying whether the data values of a data field include numerical data, character strings, etc. For example, the statistics about the data values can include a maximum value, a minimum value, a standard deviation, a mean, and so forth of the values that are included in each of the data fields (if the data are numerical). In some implementations, the statistics about the data can include how many digits or characters are in each entry of the data values. For example, the data profile can indicate that each data value of a data field includes seven (or ten) numbers, which may provide a contextual clue indicating that the data field includes telephone numbers); and determining, based on processing the content with the at least one regular expression, that the at least one data item is a sensitive data item (Butler: paragraphs 36 and 44 disclose each field is classified based on the profile data and is labeled. A label index provides a quick reference for the downstream applications to determine the meaning of the data values of the dataset without the downstream application having to analyze the dataset. For example, an application need only refer to the label index to determine the semantic meaning of a field. The label index can indicate whether a particular field includes personally identifying information (PII)). 
Motivation similar to the motivation presented in claim 1.
As to claim 4, the combination of Hodgman in view of Butler and Bezzi teaches wherein analyzing, by the classifier component, the first set of data fields and determining that the at least one data field is a sensitive data field (see claim 1 mapping above) comprises: accessing a datastore schema that identifies data field names that correspond respectively to the data fields in the first set of data fields (Butler: paragraphs 58-62 disclose for each field, the classification module is configured to look up the label index including existing labels for discovered fields of the source data from a reference database/datastore schema. A data dictionary database is further used if there is not a match with the existing labels from the reference database); comparing the data field names to predetermined words; and based on comparing the data field names to the predetermined words (Butler: paragraph 65 reveals a testing module performs classification tests on the field names to determine how to label the data field. Examples are shown in paragraphs 65-69, which disclose using the data content of the fields to determine a data type of the field. Identifying a data type of a field can involve determining that the data are numerical. The profile data also indicates that each entry in the data field is 13-18 characters long. This may indicate to the testing module that the data field may be a credit card number data field. To confirm this, one or more pattern tests can be executed by the testing module against the data of the suspect data field. For example, the first 4-6 digits for each entry can be checked against a table of issuer codes. The last number can include a check digit defined by a Luhn test. If a threshold percentage of the entries for the data field satisfy each of these patterns, the testing module can conclude that the field holds credit card numbers, and associate the field name with the appropriate label and probability.
For the pattern matching logic, both the data itself of a given field and the patterns of the data in the field (e.g., identified in the profile data) can be used to discern which pattern tests to run and what labels to apply to the given data field), determining that the at least one data field is a sensitive data field (Butler: paragraphs 65-69 disclose using the data content of the fields to determine a data type of the field; identifying a data type of a field can involve determining that the data are numerical. The profile data also indicates that each entry in the data field is 13-18 characters long, which indicates to the testing module that the data field may be a credit card number data field/sensitive data field. Paragraphs 36 and 44 further disclose each field is classified based on the profile data and is labeled. A label index provides a quick reference for the downstream applications to determine the meaning of the data values of the dataset without the downstream application having to analyze the dataset. For example, an application need only refer to the label index to determine the semantic meaning of a field. The label index can indicate whether a particular field includes personally identifying information (PII)). Motivation similar to the motivation presented in claim 1.
As to claim 5, the combination of Hodgman in view of Butler and Bezzi teaches wherein the classifier component (Hodgman: Figure 2, reference number 204 and paragraphs 49-50) executes in a restricted computing environment requiring authorization to access the first datastore (Hodgman: paragraph 49 reveals the classifier component is in the threat analysis system 202. Paragraph 46 reveals the threat analysis system 202 receives user authentication data that is associated with an access policy that identifies access rights of a user, including access to one or more assets), and wherein the data security component executes in an environment external to the restricted computing environment and has no access to the first datastore (Hodgman: Figure 2, reference 218 “query source”/data security component is external to the threat analysis system 202 that contains the classifier component 204. The query source includes authorized users of a service provider and does not have direct access to the data source 104).
As to claim 14, Hodgman teaches a computer system (Figure 6 and paragraph 64 disclose basic components of a computing device in accordance with the disclosure; paragraph 1 discloses the disclosure pertains to a system and method of analyzing data) comprising: one or more computing devices operable to (paragraph 64 discloses the computing device includes at least one central processor for executing instructions that can be stored in at least one memory device or element. The instructions, when executed by the processor, can enable the processor to implement the method):
Receive (paragraph 55 discloses a classifier may execute any suitable machine learning procedures, rule-based classification techniques, heuristic techniques, or some combination thereof), an instruction to determine whether any [data] in a first datastore are sensitive [data type] in which sensitive data is stored (paragraph 23 discloses the instructions of the non-transitory computer readable storage medium, when executed by the at least one processor, further enable the computing system to provide an asset classifier, a user classifier, and a threat classifier, wherein the classifiers classify/determine different categories for the data, such as classifying an asset as a role asset, and classifying a user as being associated with one of an employee type, a group type, or a role type. Role asset, employee type, and role type are sensitive data types. As shown in Figures 2-3 and paragraphs 49-50, the classifier 204 within the threat analysis system receives the data from a data source 104 via interface 216. Paragraph 6 reveals at least one data store in a service provider environment maintains at least three data sets from a plurality of data sources, each data set including information for one of assets, users, or security threats. Paragraph 7 reveals wherein the asset data set includes first identification information identifying individual devices on a network, the user data set includes second identifying information identifying user accounts associated with the individual devices, and the threat data set includes third identification information identifying threats to one of a device or a user account), the first datastore comprising a first data structure comprising a plurality of records, (Figure 3 and paragraph 54 reveal the threat analysis system receives data from a number of data sources such as data warehouses. 
Paragraph 6 reveals at least one data store in a service provider environment maintains at least three data sets from a plurality of data sources/records, each data set including information for one of assets, users, or security threats. Data sets/data warehouses are a type of data structure);
Analyze (paragraph 55 discloses the classifier is trained to analyze the data and classify the data into a role classification (medical services, finances) or an employee type classification because the analyzed asset data can include, for example, information that identifies an electronic device, service, or other resource of a provider, and the user data include data from network logs, organization chart information, and employment records. See Figure 4, step 404. The role classifications of medical services and finances, and the employee type classification, are sensitive data types);
cause information that classifies the at least one data field as a sensitive [data type] to be stored in a data catalog without sending content of any data field in the first set of data fields to the data catalog (paragraph 55 reveals the dataset is analyzed by the classifier to augment the data into classification types. The classifier is trained to analyze the data and classify the data into a role classification (medical services, finances) or an employee type classification because the analyzed asset data can include, for example, information that identifies an electronic device, service, or other resource of a provider, and the user data include data from network logs, organization chart information, and employment records. Paragraphs 52 and 56 reveal the augmented data from the classifier is stored in various data catalogs. Augmenting data to the data catalog involves enriching the dataset via classification into a classified data type/metadata and inputting the classified data type/metadata into a catalog. See Figure 4, step 404);
subsequently access (paragraph 52 discloses the query is received from a query source and directed to a query component/data security component. Paragraph 21 discloses a non-transitory computer readable storage medium stores instructions that, when executed by at least one processor of a computing system, cause the computing system to receive a query associated with a subject, the subject being at least one of an asset, a user, or a security threat), a query [associated with] the first datastore, the query including a data field name that identifies the at least one data field (paragraph 62 discloses subsequent to steps 404 and 406 of Figure 4, a query associated with a subject/data field is received. The query can be automated or manual. The subject includes information such as an identifier or other data associated with a particular data type (asset, user, or security threat data). Paragraph 52 discloses the query is received from a query source and directed to a query component/data security component. The query is associated with a subject of a data source);
determine that the query requested content (paragraphs 52 and 62 disclose the query is analyzed by the query component to determine a subject associated with the query or at least identify a type of query. The subject can include information such as an identifier or other data associated with a particular asset or user. Mapping information, such as a lookup table, can be used to tag or otherwise identify at least one of an asset, a user, or a security threat associated with the subject. The query component/data security component can direct the query to an appropriate correlator component based on the subject of the query. The mapping information can be used to determine insights between the catalog information based on at least one of the asset, the user, or the security threat associated with the subject).
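Solely for illustration (this sketch is not drawn from any cited reference or from the claims, and every name in it is hypothetical), the claimed flow of consulting a catalog of field classifications — which holds only metadata, never field content — to decide whether a query requests a sensitive data field, and recording that fact, could be sketched as:

```python
# Hypothetical sketch; the catalog maps (datastore, field name) to a
# classification label and contains no data items (field content).
catalog = {
    ("customer", "ssn"): "sensitive",
    ("customer", "city"): "non-sensitive",
}
audit_log = []  # records that a query requested sensitive content

def check_query(datastore, requested_fields):
    """Return the requested fields classified as sensitive, and log the
    access if any sensitive field was requested."""
    hits = [f for f in requested_fields
            if catalog.get((datastore, f)) == "sensitive"]
    if hits:
        audit_log.append({"datastore": datastore, "sensitive_fields": hits})
    return hits

print(check_query("customer", ["name", "ssn"]))  # -> ['ssn']
```

The point of the sketch is that the determination is made entirely from the catalog's field-name metadata; the data security component never needs access to the datastore's contents.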
While Hodgman teaches classifying the data in a data source, Hodgman does not teach that the data pertains to data field(s), and thus does not teach receiving instructions to determine whether any data fields in a first datastore are sensitive data fields; each record comprising a first set of data fields; analyzing the first set of data fields and determining that at least one data field is a sensitive data field; causing information that classifies the at least one data field as a sensitive data field to be stored in a data catalog; a query made to a first datastore; determining that the query requested content from a sensitive data field; and storing information that the query requested the content from the sensitive data field.
Butler teaches receiving instructions to determine whether any data fields in a first datastore are sensitive data fields (paragraphs 34-35 reveal an execution system configured to profile source data received from data sources, classify the source data, and associate portions of the source data with labels representing the semantic meaning of those portions of the source data. A portion of the source data can include a field in the source data. Paragraph 44 reveals the label can indicate whether a particular field includes sensitive data such as PII); each record comprising a first set of data fields (paragraphs 5 and 34-35 disclose a portion of the source data can include data fields); analyzing the first set of data fields and determining that at least one data field is a sensitive data field (paragraphs 36-40 disclose classifying each field as having a data type and determining a label for the data field. Paragraph 44 reveals the label can indicate whether a particular field includes sensitive data such as PII); causing information that classifies the at least one data field as a sensitive data field to be stored in a data catalog (paragraph 39 discloses the load data module sends the classified labels/label index to a reference database/catalog).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the data from a first data source in Hodgman’s teachings of analyzing and classifying the data with Butler’s teachings of classifying and labeling data fields from a data source, so that applications can use the generated labels of data sets for data quality enforcement, personal data anonymization, data masking, personally identifiable information (PII) reports, test data management, data set annotation, and so forth. Furthermore, such a modification allows the system administrator to know and understand what data is in the data set stored on the system, such as for regulatory reasons (paragraph 6 of Butler).
The combination of Hodgman in view of Butler does not teach a query made to a first datastore; determining that the query requested content from a sensitive data field, and storing information that the query requested the content from the sensitive data field.
Hoa teaches accessing, by a data security component, a query made to a first datastore and determining that the query requested content from a sensitive data field (paragraph 66 discloses a security engine/data security component receives/accesses a request to access data in a personnel database. The security engine determines that the requested data is stored in one or more database columns (data fields) of the personnel database corresponding to sensitive information).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to further modify Hodgman’s teachings of analyzing the data, classifying the data, and accessing a query, as already modified in view of Butler’s teachings of classifying and labeling data fields from a data source, with Hoa’s teachings of determining whether a query requested content from a sensitive data field, to efficiently restrict and track queried access of sensitive data in the database without detrimentally impacting the database or the security of the stored data, and to improve the auditing of access to such sensitive information (paragraph 2 of Hoa).
The combination of Hodgman in view of Butler and Hoa does not teach storing information that the query requested the content from the sensitive data field.
Bezzi teaches storing information that the query requested the content from the sensitive data field (paragraphs 31-32 disclose a processor/data security component stores results of the query in a temporary data store before anonymization of the query results; one of the columns of the results of the query may include data associated with a sensitive identifier such as a social security number. Paragraphs 16 and 22-24 disclose the data stored in the database tables and the columns of the table may be classified as identifiers and sensitive attributes and are associated with a privacy risk).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to further modify Hodgman’s teachings of analyzing the data, classifying the data, and accessing a query, as already modified by Butler’s teachings of classifying and labeling data fields from a data source and Hoa’s teachings of determining whether a query requested content from a sensitive data field, to further include storing the query result as taught by Bezzi, to prevent users from retrieving the data from the databases as soon as the database provides the information. By storing the resultant query information first, the system can further secure the data for anonymization as needed according to the specific needs/authorizations of the requestor to prevent data leakage (paragraphs 3 and 5 of Bezzi).
As to claim 15, the combination of Hodgman in view of Butler and Bezzi teaches wherein analyzing the first set of data fields and determining that the at least one data field is a sensitive data field (see claim 14 mapping above) comprises: access a subset of records of the plurality of records (Butler: paragraphs 51 and 58 disclose the classification module accesses the fields of the data source and generates profile data from the datasets/records. The profile data module can discover fields by identifying rows of tables in the source data, finding field names or references to fields, or using any similar process. The profile data module determines statistical attribute(s) of the data fields and generates profile data including those statistical attributes. The profile data identifies patterns in the source data. More specifically, the profile data includes statistics about the values of data fields of tables of the source data. For example, the profile data can include information specifying whether the data values of a data field include numerical data, character strings, etc. The statistics about the data values can include a maximum value, a minimum value, a standard deviation, a mean, and so forth of the values that are included in each of the data fields (if the data are numerical). In some implementations, the statistics about the data can include how many digits or characters are in each entry of the data values. The profile data is the subset of records); determine that content stored in the at least one data field in each record in the subset of records comprises sensitive data (Butler: paragraphs 58-59 disclose the classification module classifies the data fields using the profile data. Paragraphs 36-40 disclose classifying each field as having a data type and determining a label for the data field. 
Paragraph 44 reveals the label can indicate whether a particular field includes sensitive data such as PII); and in response to determining that the content stored in the at least one data field in each record in the subset of records comprises sensitive data, determine that the at least one data field is a sensitive data field (Butler: paragraphs 58-59 disclose the classification module classifies the data fields using the profile data. Paragraphs 36-40 disclose classifying each field as having a data type and determining a label for the data field. Paragraph 44 reveals the label can indicate whether a particular field includes sensitive data such as PII). Motivation similar to the motivation presented in claim 14.
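Solely for illustration (this sketch is not drawn from Butler or the claims; the field name, sample size, and SSN-like pattern are assumptions), the claim 15 flow — accessing a subset of records, determining that the content of a data field in each sampled record comprises sensitive data, and concluding that the field itself is sensitive — could be sketched as:

```python
import re

# Hypothetical sensitive-data pattern: a US SSN-like value. The pattern
# choice is illustrative only.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def field_is_sensitive(records, field_name, sample_size=3):
    """Access only a subset of records; the field is deemed sensitive
    only if every sampled data item matches the sensitive pattern."""
    subset = records[:sample_size]
    return all(SSN_PATTERN.match(rec[field_name]) is not None for rec in subset)

records = [
    {"ssn": "123-45-6789"},
    {"ssn": "987-65-4321"},
    {"ssn": "111-22-3333"},
]
print(field_is_sensitive(records, "ssn"))  # -> True
```

The design point mirrors the mapping above: the classification decision is reached from a sample (profile) of the records rather than by transmitting or exhaustively scanning every data item.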
As to claim 16, the combination of Hodgman in view of Butler and Bezzi teaches wherein the classification of the at least one data field as a sensitive data field is performed in a restricted computing environment requiring authorization to access the first datastore (Hodgman: paragraph 49 reveals the classifier component is in the threat analysis system component 202. Paragraph 46 reveals the threat analysis system 202 receives user authentication data that is associated with an access policy that identifies access rights of a user, including access to one or more assets), and wherein the determination based on the catalog is performed in an environment external to the restricted computing environment that has no access to the first datastore (Hodgman: Figure 2, reference 218 “query source”/data security component is external to the threat analysis system 202 that contains the classifier component 204. The query source includes authorized users of a service provider and does not have direct access to the data source 104).
As to claim 18, Hodgman teaches a non-transitory computer-readable storage medium that includes executable instructions operable to cause one or more computing devices to (Figure 6 and paragraph 64 disclose basic components of a computing device in accordance with the disclosure; paragraph 1 discloses the disclosure pertains to a system and method of analyzing data; paragraph 64 discloses the computing device includes at least one central processor for executing instructions that can be stored in at least one memory device or element. The instructions, when executed by the processor, can enable the processor to implement the method):
Receive (paragraph 55 discloses a classifier may execute any suitable machine learning procedures, rule-based classification techniques, heuristic techniques, or some combination thereof), an instruction to determine whether any [data] in a first datastore are sensitive [data type] in which sensitive data is stored (paragraph 23 discloses the instructions of the non-transitory computer readable storage medium, when executed by the at least one processor, further enable the computing system to provide an asset classifier, a user classifier, and a threat classifier, wherein the classifiers classify/determine different categories for the data, such as classifying an asset as a role asset, and classifying a user as being associated with one of an employee type, a group type, or a role type. Role