DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to the Application filed on November 17, 2022. Claims 1-20 are pending in the case. Claims 1, 2, and 17 are the independent claims.
This action is non-final.
Claim Rejections – 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims under pre-AIA 35 U.S.C. 103(a), the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were made absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was not commonly owned at the time a later invention was made in order for the examiner to consider the applicability of pre-AIA 35 U.S.C. 103(c) and potential pre-AIA 35 U.S.C. 102(e), (f) or (g) prior art under pre-AIA 35 U.S.C. 103(a).
Claims 1-9 and 11-20 are rejected under 35 U.S.C. 103 as being unpatentable over Thompson et al. (US 11003645 B2) in view of U et al. (US 20240320252 A1).
With respect to claim 1, Thompson teaches a system for integrating disparate feature groups during feature engineering, comprising: one or more processors; and a non-transitory computer readable medium comprising instructions that when executed by the one or more processors causes operations (e.g. col. 20, data pipeline system enabling data transformation systems to perform transformations, such as to raw data sets, to produce output transformed data; col. 55 lines 32-50, computer system upon which described embodiments are implemented, including memory storing information and instructions executed by processor; i.e. where the output data is the result of applying a data transformation to existing data, such as raw data, this appears to be analogous to feature engineering (Examiner notes that the specification of the instant application describes a similar process of feature engineering in paragraphs 0002 and 0045, i.e. of extracting features from raw data, such as by transforming the raw data)) comprising:
receiving, via a user interface, a user request for a first modification for an integrated structure for an integrated feature graph for a feature engineering pipeline management system, wherein the integrated structure defines integrated feature lineages in the integrated feature graph (e.g. col. 16 lines 11-41, data pipeline performing multi-step transformation of data to produce output data sets; visual node graph displayed through user interface to convey dependency relationships between resources such as data sets; node in visual graph corresponds to actual node in graph data structure; col. 22 lines 11-26, nodes of the graph represent information and edges of the graph represent relationships between the nodes; data corresponding to the graph interacted with visually through graphical user interface; user interacting with data objects by placing, dragging, linking and deleting visual entities on a graphical user interface; col. 22 lines 55-56, data items stored via graph-related data structures; col. 24 line 1-col. 25 line 30, user interface including visual node graph including nodes representing resources such as each column in a column lineage and edges connecting the nodes representing dependency relationship between the resources represented by the nodes, such as a derivation dependency (transformation, etc.) between columns represented by the nodes; user interface configured to receive user inputs from the user to update the visual node graph and user interface in response to the user input, including the user editing column relationships and column metadata; col. 26 line 23-col. 27 line 5, describing column metadata for columns (i.e. represented by nodes) as including various types of column lineage metadata describing the column’s provenance including recent/current column lineage metadata (reflecting immediate relationship between source and target columns including applied transformations), existing column lineage metadata (including the history of transformation of the column through its life cycle before the most recent transformation); col. 27 line 57-col. 28 line 14, second point of time after data set created, transformation code N resulting in transformation of column Y of data set B into column Z of data set C; current column lineage metadata information indicating that the column Z was derived from column Y using transformation code on specific date and time by person Q; i.e. a user input/request is received (such as from a person Q, requesting a data transformation to column Y to produce column Z) via a user interface to modify a graph structure for a data/feature engineering pipeline (i.e. where the data is to undergo a series of transformations to produce output data sets, this is considered analogous to feature engineering), where the graph structure includes structures representing data element/table/feature lineage information),
wherein the integrated feature lineages comprise a plurality of nodes and a linear relationships between the plurality of nodes, and wherein a respective node of the plurality of nodes corresponds to a respective data engineering transformation occurring at the respective node (e.g. col. 22 lines 11-26, nodes of the graph represent different information including data transformations and edges of graph represent relationships between the nodes; col. 26 line 23-col. 27 line 5, describing column metadata for columns (i.e. represented by nodes) as including various types of column lineage metadata describing the column’s provenance including recent/current column lineage metadata (reflecting immediate relationship between source and target columns including applied transformations), existing column lineage metadata (including the history of transformation of the column through its life cycle before the most recent transformation));
determining a first structure node in the integrated structure corresponding to the first modification (e.g. col. 24 line 1-col. 25 line 30, user interface including visual node graph including nodes representing resources such as each column in a column lineage and edges connecting the nodes representing dependency relationship between the resources represented by the nodes, such as a derivation dependency (transformation, etc.) between columns represented by the nodes; col. 27 line 57-col. 28 line 14, second point of time after data set created, transformation code N resulting in transformation of column Y of data set B into column Z of data set C; current column lineage metadata information indicating that the column Z was derived from column Y using transformation code on specific date and time by person Q; i.e. the user, such as person Q, has requested a new column, represented by a structure node, be created by a requested modification, such as a data transformation which causes the resulting column/node to be added to the graph, resulting in a determination of the corresponding new column/node for the requested modification, such as a new column Z and its corresponding node according to the applied transformation);
retrieving, from a feature engineering knowledge database, a first structure for a first feature group, wherein the first structure defines a first feature lineage for the first feature group (e.g. col. 14 lines 31-57, drilling into particular node/column of graph within represented data set; accessing data describing lineage of selected column, including columns upstream and downstream of the selected column and transformations applied at each step of the pipeline; implemented using column metadata associated with columns of the dataset, the column metadata including column lineage metadata describing column provenance such that column metadata describes the origins and history of a column throughout the lifecycle, from the source column in the source dataset to the target column in a target dataset from a particular data transformation step in the data pipeline; col. 17 lines 31-44, indicating that immutable history data recording and transformation actions, referred to as the catalog, are stored in a database; col. 18 lines 42-67, the catalog provides data set provenance at the transaction level of granularity, including transformations associated with the transactions; i.e. accessing column provenance/lineage metadata for a particular column/node, which identifies other upstream and downstream columns/nodes in the corresponding provenance/lineage, where the lineage/provenance including the corresponding columns/nodes collectively forms a first structure for a first feature group, including a defined feature lineage for the first feature group, and this corresponding lineage/provenance data is stored in, and retrieved from, a catalog in a database storing the data transformation pipeline information, analogous to a feature engineering knowledge database);
determining that the first structure that corresponds to the first structure node (e.g. col. 22 lines 11-26, nodes of the graph represent different information including data transformations and edges of graph represent relationships between the nodes; col. 26 line 23-col. 27 line 5, describing column metadata for columns (i.e. represented by nodes) as including various types of column lineage metadata describing the column’s provenance including recent/current column lineage metadata (reflecting immediate relationship between source and target columns including applied transformations), existing column lineage metadata (including the history of transformation of the column through its life cycle before the most recent transformation); col. 27 line 57-col. 28 line 14, second point of time after data set created, transformation code N resulting in transformation of column Y of data set B into column Z of data set C; current column lineage metadata information indicating that the column Z was derived from column Y using transformation code on specific date and time by person Q; i.e. a structure for the requested data lineage/transformation is also determined, as reflected by it being present in the graph as a node, or as an edge representing the relationship between the newly created column/feature and another column/feature which is utilized in the transformation to generate the newly created column/feature);
determining a second structure node in the integrated structure, wherein the second structure node is shared by the first structure and a second structure, wherein the second structure defines a second feature lineage (e.g. col. 24 line 1-col. 25 line 30, user interface including visual node graph including nodes representing resources such as each column in a column lineage and edges connecting the nodes representing dependency relationship between the resources represented by the nodes, such as a derivation dependency (transformation, etc.) between columns represented by the nodes; col. 27 line 27-56, transformation code applied to data set A/initial column to transform column X of data set A into column Y of data set B, and target column metadata also generated for target column Y; column Y initially has corresponding current column lineage metadata indicating that column Y was derived from column X using transformation code at target date and time by person M; all target column metadata stored as column metadata; i.e. a second node, corresponding to column Y (i.e. the data item/feature which is transformed to acquire new column Z) is also determined, as reflected by it being present in the graph structure as a node, and the second node (representing column Y) is shared by both the first structure (i.e. the data lineage/relationship representing the transformation applied to column Y to acquire column Z) and a second structure defining a second feature lineage (i.e. the data lineage/relationship representing the transformation applied to column X to acquire column Y));
generating an updated first structure based on the first modification (e.g. col. 27 line 56-col. 28 line 14, target column metadata including column lineage metadata is generated for target column Z, where the complete column lineage metadata for column Z includes current column lineage metadata (identifying currently derived relationship between column Y and column Z) and the existing column lineage metadata of column Y, indicating that column Z was derived from column Y using transformation code, where the existing column lineage metadata of target column Z is carried over from the complete column lineage metadata for column Y, and the current and existing column lineage metadata of target column Z is combined to form a complete column lineage of the column Z; i.e. the data structure representing lineage information for the column Z feature/node in the graph is updated to include corresponding lineage and transformation information);
merging the updated first structure and the second structure at the second structure node to generate an updated integrated structure (e.g. col. 27 lines 6-13, retrieving existing column lineage metadata; combining or concatenating current column lineage metadata with the existing column lineage metadata to form complete column lineage metadata for particular column; col. 28 lines 28-25, in addition to tracing upstream provenance of target column, going back and determining downstream lineage of column and updating the column lineage metadata for the column to additionally capture that downstream lineage; determining downstream lineage of columns periodically or at a set time; col. 29, lines 7-8, saving downstream column lineage metadata for each target column; i.e. in order to determine the complete column lineage metadata for a given column, such as column Y, the downstream column lineage metadata (i.e. the same metadata describing the transformation of column Y to acquire column Z as previously discussed) may be acquired and combined/concatenated/merged with the existing column lineage metadata for column Y, analogous to merging the updated first structure and the second structure at the second structure node; Examiner additionally notes that col. 41 lines 18-35 indicate that the nodes and corresponding relationship representations (i.e. edges) may be graphically combined into a single representation according to user selection); and
in response to generating an updated integrated structure, generating for display, on the user interface, a notification corresponding to the updated integrated structure (e.g. col. 29 lines 9-20, generating visual node graph for conveying all dependency relationships associated with selected column or data set; col. 44 lines 55-61, visual node graph continually updated to reflect new developments as they occur, such as addition of resources and nodes, changes to resources represented by nodes, and changes to dependency relationships; i.e. after updating the graph structure corresponding to the data/feature processing pipeline, such as to include new column/node Z and associated lineage metadata information (including associated transformations, etc.), the visual node graph may be generated/updated and displayed to the user, effectively providing a notification regarding the updated structure).
Thompson does not explicitly disclose that the feature engineering is of training data for artificial intelligence models. However, U teaches that the feature engineering is of training data for artificial intelligence models (e.g. paragraph 0003, automatically generating features that may be used by one or more machine learning pipelines from raw data; data structures of raw data, such as plurality of columns of raw data; calculating input feature for machine learning model based on data elements of the data structure; determining, generating, and engineering features (measurable attributes depicted by a column in a raw dataset) that may be input to a machine learning pipeline; paragraph 0063-0064, engineered features selected for inclusion in training dataset for machine learning model).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Thompson and U in front of him to have modified the teachings of Thompson (directed to column lineage for resource dependency system and graphical user interface), to incorporate the teachings of U (directed to feature engineering based on semantic types) to include the capability to utilize the datasets and associated processes (i.e. of Thompson) for use in feature engineering of data for training artificial intelligence models (as taught by U). One of ordinary skill would have been motivated to perform such a modification in order to provide domain-specific feature engineering without coding or pre-existent domain knowledge required by conventional systems, and to enable features to be ranked and optimized based on a target machine learning pipeline as described in U (paragraph 0019).
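For illustration only, the column lineage bookkeeping relied upon in the rejection of claim 1 (current column lineage metadata concatenated with the existing column lineage metadata of the source column, as characterized above for columns X, Y, and Z) may be sketched as follows. The sketch is not taken from Thompson, U, or the instant application; the names `LineageStep`, `derive`, and `lineage`, and the in-memory dictionary standing in for the catalog, are assumptions made solely for this example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class LineageStep:
    """One derivation: a target column produced from a source column."""
    source: str
    target: str
    transformation: str
    author: str


# Hypothetical store: column name -> complete lineage, oldest step first.
lineage: Dict[str, Tuple[LineageStep, ...]] = {}


def derive(source: str, target: str, transformation: str,
           author: str) -> Tuple[LineageStep, ...]:
    """Record a derivation and return the target's complete lineage."""
    current = LineageStep(source, target, transformation, author)
    existing = lineage.get(source, ())  # carried over from the source column
    complete = existing + (current,)    # existing lineage + current step
    lineage[target] = complete
    return complete


# Column Y derived from column X, then column Z derived from column Y.
derive("X", "Y", "transformation code (X to Y)", "person M")
complete_z = derive("Y", "Z", "transformation code N", "person Q")
for step in complete_z:
    print(f"{step.source} -> {step.target} by {step.author}")
```

Under these assumptions, the complete lineage of column Z contains the X-to-Y step followed by the Y-to-Z step, with column Y being the node common to both steps.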
With respect to claim 2, Thompson teaches a method for integrating disparate feature groups during feature engineering (e.g. col. 20, data pipeline system enabling data transformation systems to perform transformations, such as to raw data sets, to produce output transformed data; i.e. where the output data is the result of applying a data transformation to existing data, such as raw data, this appears to be analogous to feature engineering (Examiner notes that the specification of the instant application describes a similar process of feature engineering in paragraphs 0002 and 0045, i.e. of extracting features from raw data, such as by transforming the raw data)) models, comprising:
receiving, via a user interface, a user request for a first modification for an integrated structure for an integrated feature graph for a feature engineering pipeline management system, wherein the integrated structure defines integrated feature lineages in the integrated feature graph (e.g. col. 16 lines 11-41, data pipeline performing multi-step transformation of data to produce output data sets; visual node graph displayed through user interface to convey dependency relationships between resources such as data sets; node in visual graph corresponds to actual node in graph data structure; col. 22 lines 11-26, nodes of the graph represent information and edges of the graph represent relationships between the nodes; data corresponding to the graph interacted with visually through graphical user interface; user interacting with data objects by placing, dragging, linking and deleting visual entities on a graphical user interface; col. 22 lines 55-56, data items stored via graph-related data structures; col. 24 line 1-col. 25 line 30, user interface including visual node graph including nodes representing resources such as each column in a column lineage and edges connecting the nodes representing dependency relationship between the resources represented by the nodes, such as a derivation dependency (transformation, etc.) between columns represented by the nodes; user interface configured to receive user inputs from the user to update the visual node graph and user interface in response to the user input, including the user editing column relationships and column metadata; col. 26 line 23-col. 27 line 5, describing column metadata for columns (i.e. represented by nodes) as including various types of column lineage metadata describing the column’s provenance including recent/current column lineage metadata (reflecting immediate relationship between source and target columns including applied transformations), existing column lineage metadata (including the history of transformation of the column through its life cycle before the most recent transformation); col. 27 line 57-col. 28 line 14, second point of time after data set created, transformation code N resulting in transformation of column Y of data set B into column Z of data set C; current column lineage metadata information indicating that the column Z was derived from column Y using transformation code on specific date and time by person Q; i.e. a user input/request is received (such as from a person Q, requesting a data transformation to column Y to produce column Z) via a user interface to modify a graph structure for a data/feature engineering pipeline (i.e. where the data is to undergo a series of transformations to produce output data sets, this is considered analogous to feature engineering), where the graph structure includes structures representing data element/table/feature lineage information);
determining a first structure node in the integrated structure corresponding to the first modification (e.g. col. 24 line 1-col. 25 line 30, user interface including visual node graph including nodes representing resources such as each column in a column lineage and edges connecting the nodes representing dependency relationship between the resources represented by the nodes, such as a derivation dependency (transformation, etc.) between columns represented by the nodes; col. 27 line 57-col. 28 line 14, second point of time after data set created, transformation code N resulting in transformation of column Y of data set B into column Z of data set C; current column lineage metadata information indicating that the column Z was derived from column Y using transformation code on specific date and time by person Q; i.e. the user, such as person Q, has requested a new column, represented by a structure node, be created by a requested modification, such as a data transformation which causes the resulting column/node to be added to the graph, resulting in a determination of the corresponding new column/node for the requested modification, such as a new column Z and its corresponding node according to the applied transformation);
determining a first structure that corresponds to the first structure node, wherein the first structure defines a first feature lineage (e.g. col. 22 lines 11-26, nodes of the graph represent different information including data transformations and edges of graph represent relationships between the nodes; col. 26 line 23-col. 27 line 5, describing column metadata for columns (i.e. represented by nodes) as including various types of column lineage metadata describing the column’s provenance including recent/current column lineage metadata (reflecting immediate relationship between source and target columns including applied transformations), existing column lineage metadata (including the history of transformation of the column through its life cycle before the most recent transformation); col. 27 line 57-col. 28 line 14, second point of time after data set created, transformation code N resulting in transformation of column Y of data set B into column Z of data set C; current column lineage metadata information indicating that the column Z was derived from column Y using transformation code on specific date and time by person Q; i.e. a structure for the requested data lineage/transformation is also determined, as reflected by it being present in the graph as a node, or as an edge representing the relationship between the newly created column/feature and another column/feature which is utilized in the transformation to generate the newly created column/feature);
determining a second structure node in the integrated structure shared by the first structure and a second structure, wherein the second structure defines a second feature lineage (e.g. col. 24 line 1-col. 25 line 30, user interface including visual node graph including nodes representing resources such as each column in a column lineage and edges connecting the nodes representing dependency relationship between the resources represented by the nodes, such as a derivation dependency (transformation, etc.) between columns represented by the nodes; col. 27 line 27-56, transformation code applied to data set A/initial column to transform column X of data set A into column Y of data set B, and target column metadata also generated for target column Y; column Y initially has corresponding current column lineage metadata indicating that column Y was derived from column X using transformation code at target date and time by person M; all target column metadata stored as column metadata; i.e. a second node, corresponding to column Y (i.e. the data item/feature which is transformed to acquire new column Z) is also determined, as reflected by it being present in the graph structure as a node, and the second node (representing column Y) is shared by both the first structure (i.e. the data lineage/relationship representing the transformation applied to column Y to acquire column Z) and a second structure defining a second feature lineage (i.e. the data lineage/relationship representing the transformation applied to column X to acquire column Y));
generating an updated first structure based on the first modification and generating an updated integrated structure (e.g. col. 27 line 56-col. 28 line 14, target column metadata including column lineage metadata is generated for target column Z, where the complete column lineage metadata for column Z includes current column lineage metadata (identifying currently derived relationship between column Y and column Z) and the existing column lineage metadata of column Y, indicating that column Z was derived from column Y using transformation code, where the existing column lineage metadata of target column Z is carried over from the complete column lineage metadata for column Y, and the current and existing column lineage metadata of target column Z is combined to form a complete column lineage of the column Z; i.e. the data structure representing lineage information for the column Z feature/node in the graph is updated to include corresponding lineage and transformation information);
merging the updated first structure and the second structure at the second structure node to generate an updated integrated structure (e.g. col. 27 lines 6-13, retrieving existing column lineage metadata; combining or concatenating current column lineage metadata with the existing column lineage metadata to form complete column lineage metadata for particular column; col. 28 lines 28-25, in addition to tracing upstream provenance of target column, going back and determining downstream lineage of column and updating the column lineage metadata for the column to additionally capture that downstream lineage; determining downstream lineage of columns periodically or at a set time; col. 29, lines 7-8, saving downstream column lineage metadata for each target column; i.e. in order to determine the complete column lineage metadata for a given column, such as column Y, the downstream column lineage metadata (i.e. the same metadata describing the transformation of column Y to acquire column Z as previously discussed) may be acquired and combined/concatenated/merged with the existing column lineage metadata for column Y, analogous to merging the updated first structure and the second structure at the second structure node; Examiner additionally notes that col. 41 lines 18-35 indicate that the nodes and corresponding relationship representations (i.e. edges) may be graphically combined into a single representation according to user selection);
in response to generating an updated integrated structure, generating for display, on the user interface, a notification corresponding to the updated integrated structure (e.g. col. 29 lines 9-20, generating visual node graph for conveying all dependency relationships associated with selected column or data set; col. 44 lines 55-61, visual node graph continually updated to reflect new developments as they occur, such as addition of resources and nodes, changes to resources represented by nodes, and changes to dependency relationships; i.e. after updating the graph structure corresponding to the data/feature processing pipeline, such as to include new column/node Z and associated lineage metadata information (including associated transformations, etc.), the visual node graph may be generated/updated and displayed to the user, effectively providing a notification regarding the updated structure).
Thompson does not explicitly disclose that the feature engineering is of training data for artificial intelligence models. However, U teaches that the feature engineering is of training data for artificial intelligence models (e.g. paragraph 0003, automatically generating features that may be used by one or more machine learning pipelines from raw data; data structures of raw data, such as plurality of columns of raw data; calculating input feature for machine learning model based on data elements of the data structure; determining, generating, and engineering features (measurable attributes depicted by a column in a raw dataset) that may be input to a machine learning pipeline; paragraph 0063-0064, engineered features selected for inclusion in training dataset for machine learning model).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Thompson and U in front of him to have modified the teachings of Thompson (directed to column lineage for resource dependency system and graphical user interface), to incorporate the teachings of U (directed to feature engineering based on semantic types) to include the capability to utilize the datasets and associated processes (i.e. of Thompson) for use in feature engineering of data for training artificial intelligence models (as taught by U). One of ordinary skill would have been motivated to perform such a modification in order to provide domain-specific feature engineering without coding or pre-existent domain knowledge required by conventional systems, and to enable features to be ranked and optimized based on a target machine learning pipeline as described in U (paragraph 0019).
With respect to claim 3, Thompson in view of U teaches all of the limitations of claim 2 as previously discussed, and Thompson further teaches wherein determining the first structure node in the integrated structure corresponding to the first modification comprises: determining an engineered feature corresponding to the first modification; determining that the engineered feature corresponds to the first feature lineage; and selecting the first structure from a plurality of structures in the integrated structure based on determining that the engineered feature corresponds to the first feature lineage (e.g. col. 14 lines 31-57, drilling into particular node/column of graph within represented data set; accessing data describing lineage of selected column, including columns upstream and downstream of the selected column and transformations applied at each step of the pipeline; implemented using column metadata associated with columns of the dataset, the column metadata including column lineage metadata describing column provenance such that column metadata describes the origins and history of a column throughout the lifecycle, from the source column in the source dataset to the target column in a target dataset from a particular data transformation step in the data pipeline; col. 32 lines 19-56, user using user interface to drill down into any particular node of the visual node graph and select any particular column of data within the data set; accessing provenance/lineage information of the selected column and generating user interface displaying updated visual node graph including representations of selected node, selected column within the node, edges connecting it with other columns, etc.; user interface including features for informing user of data transformations applied to columns at each step of the data pipeline; i.e. where the column/node is the result of applying a data transformation to existing data, such as raw data, this appears to be analogous to an engineered feature (Examiner notes that the specification of the instant application describes a similar process of feature engineering in paragraphs 0002 and 0045, i.e. of extracting features from raw data, such as by transforming the raw data), such that, for the given column/node (feature) which is selected by the user for addition or modification, the system determines (based on the user’s input) the column represented by the relevant node structure in the graph, and further determines the data/feature lineage which corresponds to this column/node along with the corresponding data structure representing the data/feature lineage for the column/node (i.e. the data structure representing the metadata for that column/node indicating other columns/nodes in the lineage along with corresponding transformations, and its potential representation in the graph, such as an edge connecting the column/node to other columns/nodes)).
With respect to claim 4 Thompson in view of U teaches all of the limitations of claim 2 as previously discussed, and Thompson further teaches wherein determining the first structure node in the integrated structure corresponding to the first modification comprises: determining a plurality of nodes in the first structure; and determining whether each of the plurality of nodes is shared with another structure in the integrated structure (e.g. col. 31 lines 20-31, dependency manager operating in conjunction with column metadata manager to retrieve column provenance/lineage from the column metadata to generate a graph representing the column lineage of all relevant columns, where a node in the graph represents a column and an edge between two nodes indicates column lineage between the two columns represented by those two nodes; storing graph as dependency relationships; col. 37 lines 5-23, determining selected column from dataset; resource dependency system determining target columns in other datasets that are dependent upon the selected column and source columns from other data sets from which the selected column depends; i.e. the system determines, for a given node/column, a plurality of all relevant nodes based on lineage/dependency information between the nodes, where the lineage/dependency information (which itself is represented as a data structure as previously cited) indicates that each of the nodes/columns are shared with other structures).
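For illustration only, the determination mapped above (identifying the nodes of a first structure and checking whether each is shared with another structure) can be sketched as follows. This is a minimal sketch and not Thompson's implementation or the claimed method; the function name `shared_status`, the lineage names, and the representation of each lineage as a set of column names are all assumptions made for the example.

```python
from typing import Dict, Set


def shared_status(first: str, lineages: Dict[str, Set[str]]) -> Dict[str, bool]:
    """For each node of the lineage named `first`, report whether any
    other lineage in the integrated structure also contains that node."""
    # Collect every node that appears in a lineage other than `first`.
    others: Set[str] = set()
    for name, nodes in lineages.items():
        if name != first:
            others |= nodes
    return {node: node in others for node in sorted(lineages[first])}


# Hypothetical integrated structure: two lineages sharing columns X and Y.
lineages = {
    "lineage_1": {"X", "Y", "Z"},
    "lineage_2": {"X", "Y", "W"},
}
status = shared_status("lineage_1", lineages)
print(status)
```

Under these assumptions, columns X and Y are reported as shared and column Z as belonging to the first lineage only.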
With respect to claim 5 Thompson in view of U teaches all of the limitations of claim 2 as previously discussed, and Thompson further teaches determining a third structure node in the integrated structure, wherein the third structure node is shared by the first structure and a third structure, wherein the third structure defines a third feature lineage; and merging the updated first structure and the third structure at the third structure node to generate the updated integrated structure (e.g. Fig. 3B, showing that a given column/node may share a common edge with multiple other columns/nodes, such as column/edge User_id in Dataset-1 which shares outgoing edges with multiple other columns/nodes and column/edge User_id+Vendor_id which shares incoming edges with multiple other columns/nodes in the graph; col. 24 line 1-col. 25 line 30, user interface including visual node graph including nodes representing resources such as each column in a column lineage and edges connecting the nodes representing dependency relationship between the resources represented by the nodes, such as a derivation dependency (transformation, etc.) between columns represented by the nodes; col. 28 lines 39-67, after generation of column Z, using column Z as source column for generating target column Z1; current column lineage metadata for column Z1 indicating that column Z was a source column and data transformation was applied in order to generate target column Z1; col. 27 lines 6-13, retrieving existing column lineage metadata; combining or concatenating current column lineage metadata with the existing column lineage metadata to form complete column lineage metadata for particular column; col. 28 lines 28-25, in addition to tracing upstream provenance of target column, going back and determining downstream lineage of column and updating the column lineage metadata for the column to additionally capture that downstream lineage; determining downstream lineage of columns periodically or at a set time; col. 29, lines 7-8, saving downstream column lineage metadata for each target column; i.e. a third node may exist in the graph which shares both the first edge/lineage (i.e. from a same column/node as the new/modified column/node, such as a new column Z1 which shares a lineage with column Z in that they both ultimately inherit from an upstream column Y or another column at a same hierarchical level in the graph which also inherits from a same parent in the lineage as column Z as shown in Fig. 3B) and an additional edge/lineage (i.e. to a different column/node, such as a different downstream column/node), and corresponding combination/concatenation/merging may be performed with respect to these edges/lineages as well).
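For illustration only, the combining or concatenating of current column lineage metadata with existing column lineage metadata, as characterized in the cited passages, can be sketched as follows. This is a minimal sketch under stated assumptions and not Thompson's actual data model; the `LineageStep` record, the `complete_lineage` helper, and the transformation labels are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LineageStep:
    """One derivation step: `target` was derived from `source`."""
    source: str
    target: str
    transformation: str  # hypothetical label for the applied code


def complete_lineage(existing: List[LineageStep],
                     current: LineageStep) -> List[LineageStep]:
    """Concatenate the existing (carried-over) lineage of the source
    column with the current step to form the target's complete lineage."""
    return [*existing, current]


# Column X -> Y -> Z -> Z1, following the column names used in the citations.
y_lineage = complete_lineage([], LineageStep("X", "Y", "code_M"))
z_lineage = complete_lineage(y_lineage, LineageStep("Y", "Z", "code_N"))
z1_lineage = complete_lineage(z_lineage, LineageStep("Z", "Z1", "code_P"))

for step in z1_lineage:
    print(f"{step.source} -> {step.target} via {step.transformation}")
```

In this sketch the lineage of column Z1 carries over the full lineage of column Z, so the two lineages coincide on every step up to and including column Z.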
With respect to claim 6 Thompson in view of U teaches all of the limitations of claim 2 as previously discussed, and Thompson further teaches wherein generating the updated first structure based on the first modification further comprises: receiving an updated feature engineering transformation for a current structure node in the first feature lineage; and replacing a current feature engineering transformation for the current structure node in the first feature lineage with the updated feature engineering transformation (e.g. col. 8 lines 14-19, column metadata indicates column lineage of columns and transformation code applied to generate the columns; displaying the transformation code applied to generate the selected column through GUI; col. 25 lines 23-30, user interface receiving inputs from user and updating the visual node graph in response to user input; user viewing visual node graph displaying column lineage and editing column relationships/column metadata; col. 26 lines 6-7, updates to the data or transformation applied; i.e. the user can modify the transformation applicable to a column/node in the graph, such as by changing a corresponding relationship (i.e. modifying the source data that the transformation is applied to obtain the target column/node) or by editing the underlying transformation code itself, and this modification results in an updated feature engineering transformation for the current/target column/node within its respective lineage which effectively replaces the original/current transformation).
With respect to claim 7 Thompson in view of U teaches all of the limitations of claim 2 as previously discussed, and Thompson further teaches wherein generating the updated first structure based on the first modification further comprises: receiving an updated structure node for the first feature lineage; and replacing a current structure node in the first feature lineage with the updated structure node (e.g. col. 22 lines 11-26, nodes of the graph represent different information including dependency relationships, datasets, data transformations, etc.; data corresponding to the graph can be updated in various ways, allowing user to interact with data objects by placing, dragging, linking, and deleting visual entities on a graphical user interface; col. 25 lines 23-30, receiving user inputs and updating visual graph node; user viewing visual node graph displaying column lineage and editing column relationships or column metadata; col. 44 lines 53-61, visual node graph continually updated to reflect new developments as they occur, such as addition of resources and nodes, changes to the resources represented by nodes (such as properties and attributes of those resources) or changes to the dependency relationships between the underlying resources; i.e. the user may change/update a given node and/or its underlying resource (such as a corresponding table or data element), resulting in an updated node in a given lineage, where the updated node effectively replaces the node which existed prior to the updating).
With respect to claim 8 Thompson in view of U teaches all of the limitations of claim 2 as previously discussed, and Thompson further teaches wherein generating the updated first structure based on the first modification further comprises: receiving an updated feature transformer data for a current structure node in the first feature lineage; and replacing current feature transformer data for the current structure node in the first feature lineage with the updated feature transformer data (e.g. col. 8 lines 14-19, column metadata indicates column lineage of columns and transformation code applied to generate the columns; displaying the transformation code applied to generate the selected column through GUI; col. 25 lines 23-30, user interface receiving inputs from user and updating the visual node graph in response to user input; user viewing visual node graph displaying column lineage and editing column relationships/column metadata; col. 26 lines 6-7, updates to the data or transformation applied; i.e. the user can modify the transformer data applicable to a column/node in the graph, such as by changing a corresponding relationship (i.e. modifying the source data that the transformation is applied to obtain the target column/node) or by editing the underlying transformation code itself, and this modification results in updated feature engineering transformer data for the current/target column/node within its respective lineage which effectively replaces the original/current transformation).
With respect to claim 9 Thompson in view of U teaches all of the limitations of claim 8 as previously discussed, and Thompson further teaches receiving, via the user interface, a user selection of the current structure node; and in response to the user selection of the current structure node, generating for display, on the user interface, native data, for the updated first structure, and the updated feature transformer data that describes, in a human-readable format, a transformation of the native data at the current structure node (e.g. col. 7 lines 11-27, accessing column metadata including column lineage and transformation code applied to generate columns and displaying the column metadata through the GUI; displaying transformation code applied to generate selected column through the GUI; col. 32 lines 19-56, user using user interface to drill down into any particular node of the visual node graph and select any particular column of data within the data set; accessing provenance/lineage information of the selected column and generating user interface displaying updated visual node graph including representations of selected node, selected column within the node, edges connecting it with other columns, etc.; user interface including features for informing user of data transformations applied to columns at each step of the data pipeline; col. 36 lines 19-37, describing user interface of Figs. 3A-B, user interface includes selectable tab 330, labeled “Dataset code,” where selecting it opens additional information panel displaying code associated with the dataset, including describing the transformations or functions used to generate a target column/particular selected column; displaying cumulative transformation code associated with generating the target column; describing transformations to directly generate column, or all collective transformations applied upstream in the data pipeline that results in generation of target column; col. 39 lines 8-12, user interface allowing user to select specific node to view transformation code applied in generating the corresponding column at that step or all the cumulative transformations applied up to that step; col. 41 line 64-col. 42 line 8, selectable tab 530 of UI including code tab to display code associated with particular node of visual node graph such as code used to generate a resource that the node represents (e.g. transformation or functions applied to upstream data to produce a data set, etc.)).
With respect to claim 11 Thompson in view of U teaches all of the limitations of claim 2 as previously discussed, and Thompson further teaches receiving a first user request corresponding to an engineered feature; and in response to the first user request, generating for display, on the user interface, a first result to the first user request, wherein the first result describes, in a human-readable format, whether the engineered feature is an output of a feature lineage in the updated integrated structure (e.g. col. 7 lines 11-27, accessing column metadata including column lineage and transformation code applied to generate columns and displaying the column metadata through the GUI; displaying transformation code applied to generate selected column through the GUI; col. 32 lines 19-56, user using user interface to drill down into any particular node of the visual node graph and select any particular column of data within the data set; accessing provenance/lineage information of the selected column and generating user interface displaying updated visual node graph including representations of selected node, selected column within the node, edges connecting it with other columns, etc.; user interface including features for informing user of data transformations applied to columns at each step of the data pipeline; col. 36 lines 19-37, describing user interface of Figs. 3A-B, user interface includes selectable tab 330, labeled “Dataset code,” where selecting it opens additional information panel displaying code associated with the dataset, including describing the transformations or functions used to generate a target column/particular selected column; displaying cumulative transformation code associated with generating the target column; describing transformations to directly generate column, or all collective transformations applied upstream in the data pipeline that results in generation of target column; col. 39 lines 8-12, user interface allowing user to select specific node to view transformation code applied in generating the corresponding column at that step or all the cumulative transformations applied up to that step; i.e. where the column/node is the result of applying a data transformation to existing data, such as raw data, this appears to be analogous to an engineered feature (Examiner notes that the specification of the instant application describes a similar process of feature engineering in paragraphs 0002 and 0045, i.e. of extracting features from raw data, such as by transforming the raw data), such that selection of the column/node by the user and displaying the column/node’s provenance/lineage is analogous to providing a human-readable description that the table/node (feature) is an output of a feature lineage in the data structure).
With respect to claim 12 Thompson in view of U teaches all of the limitations of claim 2 as previously discussed, and Thompson further teaches receiving a second user request corresponding to a feature transformation; and in response to the second user request, generating for display, on the user interface, a second result to the second user request, wherein the second result describes, in a human-readable format, whether the feature transformation corresponds to any feature transformer data in the updated integrated structure (e.g. col. 32 lines 19-56, user using user interface to drill down into any particular node of the visual node graph and select any particular column of data within the data set; accessing provenance/lineage information of the selected column and generating user interface displaying updated visual node graph including representations of selected node, selected column within the node, edges connecting it with other columns, etc.; user interface including features for informing user of data transformations applied to columns at each step of the data pipeline; col. 36 lines 19-37, describing user interface of Figs. 3A-B, user interface includes selectable tab 330, labeled “Dataset code,” where selecting it opens additional information panel displaying code associated with the dataset, including describing the transformations or functions used to generate a target column/particular selected column; displaying cumulative transformation code associated with generating the target column; describing transformations to directly generate column, or all collective transformations applied upstream in the data pipeline that results in generation of target column; i.e. the user interface may display information related to data transformations applied to various columns, and the user may provide a selection/request to view associated transformation code for a particular column (analogous to a second request corresponding to a feature transformation) and in response the transformation code is displayed (analogous to generating for display a second result describing feature transformer data which corresponds to the feature transformation, where generating this code for display corresponding to a particular node (at the associated transformation which results in the node) also constitutes a description that the transformer code (feature transformer data) does correspond to the feature transformation)).
With respect to claim 13 Thompson in view of U teaches all of the limitations of claim 2 as previously discussed, and Thompson further teaches receiving a third user request corresponding to an engineered feature; and in response to the third user request, generating for display, on the user interface, a third result to the third user request, wherein the third result describes, in a human-readable format, whether the engineered feature corresponds to the first feature lineage or the second feature lineage (e.g. col. 7 lines 11-27, accessing column metadata including column lineage and transformation code applied to generate columns and displaying the column metadata through the GUI; displaying transformation code applied to generate selected column through the GUI; col. 32 lines 19-56, user using user interface to drill down into any particular node of the visual node graph and select any particular column of data within the data set; accessing provenance/lineage information of the selected column and generating user interface displaying updated visual node graph including representations of selected node, selected column within the node, edges connecting it with other columns, etc.; user interface including features for informing user of data transformations applied to columns at each step of the data pipeline; col. 36 lines 19-37, describing user interface of Figs. 3A-B, user interface includes selectable tab 330, labeled “Dataset code,” where selecting it opens additional information panel displaying code associated with the dataset, including describing the transformations or functions used to generate a target column/particular selected column; displaying cumulative transformation code associated with generating the target column; describing transformations to directly generate column, or all collective transformations applied upstream in the data pipeline that results in generation of target column; col. 39 lines 8-12, user interface allowing user to select specific node to view transformation code applied in generating the corresponding column at that step or all the cumulative transformations applied up to that step; i.e. where the column/node is the result of applying a data transformation to existing data, such as raw data, this appears to be analogous to an engineered feature (Examiner notes that the specification of the instant application describes a similar process of feature engineering in paragraphs 0002 and 0045, i.e. of extracting features from raw data, such as by transforming the raw data), such that selection of the column/node by the user and displaying the column/node’s provenance/lineage is analogous to providing a human-readable description that the table/node (feature) is an output of at least a first feature lineage in the data structure).
With respect to claim 14 Thompson in view of U teaches all of the limitations of claim 2 as previously discussed, and Thompson further teaches wherein the integrated feature graph comprises a knowledge graph of the integrated structure, and wherein generating the knowledge graph comprises determining a plurality of structure nodes for the integrated structure and graphically representing a relationship of the plurality of structure nodes (e.g. col. 37 lines 5-23, determining selected column from dataset; resource dependency system determining target columns in other datasets that are dependent upon the selected column and source columns from other data sets from which the selected column depends; col. 38 lines 13-24, mapping out visual node graph on basis of determined dependency relationships and building/updating corresponding graph that maps out column provenance/lineage relationships; columns represented as nodes on the graph and edges connecting nodes correspond to column provenance/lineage relationships between nodes; this graph serves as visual basis for the visual node graph that is presented to the user).
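For illustration only, a graph in which columns are nodes and lineage relationships are edges, together with a lookup of the columns downstream of a selected column, can be sketched as follows. This is a minimal sketch and not Thompson's implementation; the helper names `build_graph` and `all_downstream` and the example edge list are assumptions introduced for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Tuple[Graph, Graph]:
    """Build upstream and downstream adjacency maps from
    (source column, target column) lineage pairs."""
    upstream: Graph = defaultdict(set)
    downstream: Graph = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return upstream, downstream


def all_downstream(column: str, downstream: Graph) -> Set[str]:
    """Return every column that depends, directly or transitively,
    on the selected column."""
    seen: Set[str] = set()
    stack = [column]
    while stack:
        for nxt in downstream.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Hypothetical lineage edges using the column names from the citations.
upstream, downstream = build_graph([("X", "Y"), ("Y", "Z"), ("Z", "Z1")])
print(sorted(all_downstream("Y", downstream)))
print(sorted(upstream["Z"]))
```

The adjacency maps play the role of the stored dependency relationships, and a visual node graph could be rendered from them; rendering is outside the scope of this sketch.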
With respect to claim 15 Thompson in view of U teaches all of the limitations of claim 2 as previously discussed, and Thompson further teaches wherein receiving the user request for the first modification comprises: receiving a first user update to feature transformer data, wherein the feature transformer data describes, in a human-readable format, a transformation of native data at a current structure node in the integrated structure; generating updated feature transformer data; and storing the updated feature transformer data (e.g. col. 8 lines 14-19, column metadata indicates column lineage of columns and transformation code applied to generate the columns; displaying the transformation code applied to generate the selected column through GUI; col. 25 lines 22-30, user interface configured to receive inputs from user and update visual graph node and UI in response; user viewing visual node graph, displaying column lineage and editing column relationships/column metadata such as via column metadata manager 206; col. 26 lines 6-7, updates to the data or transformation applied; i.e. the user may update/edit, via the GUI, column metadata for a selected column, including relevant transformation code, and this update/edit to the column metadata, including the corresponding transformation, is subsequently applied, analogous to receiving a user update to human-readable transformer data (i.e. human-editable transformation code) for a current structure node (i.e. column/node) and generating and storing updated feature transformer data).
With respect to claim 16 Thompson in view of U teaches all of the limitations of claim 2 as previously discussed, and Thompson further teaches wherein receiving the user request for the first modification comprises: receiving a second user update to a current structure in the integrated structure; and generating the first structure by updating the current structure (e.g. col. 22 lines 11-26, nodes of the graph represent information and edges of the graph represent relationships between the nodes; data corresponding to the graph interacted with visually through graphical user interface; user interacting with data objects by placing, dragging, linking and deleting visual entities on a graphical user interface; col. 25 lines 22-30, user interface configured to receive inputs from user and update visual graph node and UI in response; user viewing visual node graph, displaying column lineage and editing column relationships/column metadata such as via column metadata manager 206; col. 44 lines 55-61, visual node graph continually updated to reflect new developments as they occur, such as addition of resources and nodes, changes to resources represented by nodes, and changes to dependency relationships; i.e. the user interaction requesting the modification may include modifications/updates to various data entities represented in the visual graph in the UI, including updates to relationships and column metadata, where column relationships and metadata, including provenance/lineage information, are stored in a corresponding data structure, such that updating this information is analogous to a user update to a current structure which causes generation of a new/first structure based on the updates (i.e. such as a user changing provenance, lineage, dependency, or other relationship for a given column/node in the graph, resulting in an updated version)).
With respect to claim 17, Thompson teaches a non-transitory, computer-readable medium comprising instructions that, when executed by one or more processors, cause operations (e.g. col. 52, lines 29-35, computer program product including computer readable storage medium having computer readable program instructions for causing processor to carry out aspects of disclosure) comprising:
receiving, via a user interface, a user request for a first modification for an integrated structure for an integrated feature graph for a feature engineering pipeline management system, wherein the integrated structure defines integrated feature lineages in the integrated feature graph (e.g. col. 16 lines 11-41, data pipeline performing multi-step transformation of data to produce output data sets; visual node graph displayed through user interface to convey dependency relationships between resources such as data sets; node in visual graph corresponds to actual node in graph data structure; col. 20, data pipeline system enabling data transformation systems to perform transformations, such as to raw data sets, to produce output transformed data; col. 22 lines 11-26, nodes of the graph represent information and edges of the graph represent relationships between the nodes; data corresponding to the graph interacted with visually through graphical user interface; user interacting with data objects by placing, dragging, linking and deleting visual entities on a graphical user interface; col. 22 lines 55-56, data items stored via graph-related data structures; col. 24 line 1-col. 25 line 30, user interface including visual node graph including nodes representing resources such as each column in a column lineage and edges connecting the nodes representing dependency relationship between the resources represented by the nodes, such as a derivation dependency (transformation, etc.) between columns represented by the nodes; user interface configured to receive user inputs from the user to update the visual node graph and user interface in response to the user input, including the user editing column relationships and column metadata; col. 26 line 23-col. 27 line 5, describing column metadata for columns (i.e. represented by nodes) as including various types of column lineage metadata describing the column’s provenance including recent/current column lineage metadata (reflecting immediate relationship between source and target columns including applied transformations), existing column lineage metadata (including the history of transformation of the column through its life cycle before the most recent transformation); col. 27 line 57-col. 28 line 14, second point of time after data set created, transformation code N resulting in transformation of column Y of data set B into column Z of data set C; current column lineage metadata information indicating that the column Z was derived from column Y using transformation code on specific date and time by person Q; i.e. a user input/request is received (such as from a person Q, requesting a data transformation to column Y to produce column Z) via a user interface to modify a graph structure for a data/feature engineering pipeline (i.e. where the data is to undergo a series of transformations to produce output data sets, this is considered analogous to feature engineering), where the graph structure includes structures representing data element/table/feature lineage information (Examiner notes that the specification of the instant application describes a similar process of feature engineering in paragraphs 0002 and 0045, i.e. of extracting features from raw data, such as by transforming the raw data));
determining a first structure node in the integrated structure corresponding to the first modification (e.g. col. 24 line 1-col. 25 line 30, user interface including visual node graph including nodes representing resources such as each column in a column lineage and edges connecting the nodes representing dependency relationship between the resources represented by the nodes, such as a derivation dependency (transformation, etc.) between columns represented by the nodes; col. 27 line 57-col. 28 line 14, second point of time after data set created, transformation code N resulting in transformation of column Y of data set B into column Z of data set C; current column lineage metadata information indicating that the column Z was derived from column Y using transformation code on specific date and time by person Q; i.e. the user, such as person Q, has requested a new column, represented by a structure node, be created by a requested modification, such as a data transformation which causes the resulting column/node to be added to the graph, resulting in a determination of the corresponding new column/node for the requested modification, such as a new column Z and its corresponding node according to the applied transformation);
determining a first structure that corresponds to the first structure node, wherein the first structure defines a first feature lineage (e.g. col. 22 lines 11-26, nodes of the graph represent different information including data transformations and edges of graph represent relationships between the nodes; col. 26 line 23-col. 27 line 5, describing column metadata for columns (i.e. represented by nodes) as including various types of column lineage metadata describing the column’s provenance including recent/current column lineage metadata (reflecting immediate relationship between source and target columns including applied transformations), existing column lineage metadata (including the history of transformation of the column through its life cycle before the most recent transformation); col. 27 line 57-col. 28 line 14, second point of time after data set created, transformation code N resulting in transformation of column Y of data set B into column Z of data set C; current column lineage metadata information indicating that the column Z was derived from column Y using transformation code on specific date and time by person Q; i.e. a structure for the requested data lineage/transformation is also determined, as reflected by it being present in the graph as a node, or as an edge representing the relationship between the newly created column/feature and another column/feature which is utilized in the transformation to generate the newly created column/feature);
determining a second structure node in the integrated structure, wherein the second structure node is shared by the first structure and a second structure, wherein the second structure defines a second feature lineage (e.g. col. 24 line 1-col. 25 line 30, user interface including visual node graph including nodes representing resources such as each column in a column lineage and edges connecting the nodes representing dependency relationship between the resources represented by the nodes, such as a derivation dependency (transformation, etc.) between columns represented by the nodes; col. 27 lines 27-56, transformation code applied to data set A/initial column to transform column X of data set A into column Y of data set B, and target column metadata also generated for target column Y; column Y initially has corresponding current column lineage metadata indicating that column Y was derived from column X using transformation code at target date and time by person M; all target column metadata stored as column metadata; i.e. a second node, corresponding to column Y (i.e. the data item/feature which is transformed to acquire new column Z) is also determined, as reflected by it being present in the graph structure as a node, and the second node (representing column Y) is shared by both the first structure (i.e. the data lineage/relationship representing the transformation applied to column Y to acquire column Z) and a second structure defining a second feature lineage (i.e. the data lineage/relationship representing the transformation applied to column X to acquire column Y));
generating an updated first structure based on the first modification (e.g. col. 27 line 56-col. 28 line 14, target column metadata including column lineage metadata is generated for target column Z, where the complete column lineage metadata for column Z includes current column lineage metadata (identifying currently derived relationship between column Y and column Z) and the existing column lineage metadata of column Y, indicating that column Z was derived from column Y using transformation code, where the existing column lineage metadata of target column Z is carried over from the complete column lineage metadata for column Y, and the current and existing column lineage metadata of target column Z is combined to form a complete column lineage of the column Z; i.e. the data structure representing lineage information for the column Z feature/node in the graph is updated to include corresponding lineage and transformation information);
merging the updated first structure and the second structure at the second structure node to generate an updated integrated structure (e.g. col. 27 lines 6-13, retrieving existing column lineage metadata; combining or concatenating current column lineage metadata with the existing column lineage metadata to form complete column lineage metadata for particular column; col. 28 lines 28-25, in addition to tracing upstream provenance of target column, going back and determining downstream lineage of column and updating the column lineage metadata for the column to additionally capture that downstream lineage; determining downstream lineage of columns periodically or at a set time; col. 29, lines 7-8, saving downstream column lineage metadata for each target column; i.e. in order to determine the complete column lineage metadata for a given column, such as column Y, the downstream column lineage metadata (i.e. the same metadata describing the transformation of column Y to acquire column Z as previously discussed) may be acquired and combined/concatenated/merged with the existing column lineage metadata for column Y, analogous to merging the updated first structure and the second structure at the second structure node; Examiner additionally notes that col. 41 lines 18-35 indicate that the nodes and corresponding relationship representations (i.e. edges) may be graphically combined into a single representation according to user selection); and
in response to generating an updated integrated structure, generating for display, on the user interface, a notification corresponding to the updated integrated structure (e.g. col. 29 lines 9-20, generating visual node graph for conveying all dependency relationships associated with selected column or data set; col. 44 lines 55-61, visual node graph continually updated to reflect new developments as they occur, such as addition of resources and nodes, changes to resources represented by nodes, and changes to dependency relationships; i.e. after updating the graph structure corresponding to the data/feature processing pipeline, such as to include new column/node Z and associated lineage metadata information (including associated transformations, etc.), the visual node graph may be generated/updated and displayed to the user, effectively providing a notification regarding the updated structure).
Assuming arguendo that Thompson does not explicitly disclose feature engineering, U teaches feature engineering (e.g. paragraph 0003, automatically generating features that may be used by one or more machine learning pipelines from raw data; data structures of raw data, such as plurality of columns of raw data; calculating input feature for machine learning model based on data elements of the data structure; determining, generating, and engineering features (measurable attributes depicted by a column in a raw dataset) that may be input to a machine learning pipeline; paragraph 0063-0064, engineered features selected for inclusion in training dataset for machine learning model).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Thompson and U in front of him to have modified the teachings of Thompson (directed to column lineage for resource dependency system and graphical user interface), to incorporate the teachings of U (directed to feature engineering based on semantic types) to include the capability to utilize the datasets and associated processes (i.e. of Thompson) for use in feature engineering of data for training artificial intelligence models (as taught by U). One of ordinary skill would have been motivated to perform such a modification in order to provide domain-specific feature engineering without coding or pre-existent domain knowledge required by conventional systems, and to enable features to be ranked and optimized based on a target machine learning pipeline as described in U (paragraph 0019).
With respect to claim 18, Thompson in view of U teaches all of the limitations of claim 17, and Thompson further teaches wherein determining the first structure node in the integrated structure corresponding to the first modification comprises: determining an engineered feature corresponding to the first modification; determining that the engineered feature corresponds to the first feature lineage; and selecting the first structure from a plurality of structures in the integrated structure based on determining that the engineered feature corresponds to the first feature lineage (e.g. col. 14 lines 31-57, drilling into particular node/column of graph within represented data set; accessing data describing lineage of selected column, including columns upstream and downstream of the selected column and transformations applied at each step of the pipeline; implemented using column metadata associated with columns of the dataset, the column metadata including column lineage metadata describing column provenance such that column metadata describes the origins and history of a column throughout the lifecycle, from the source column in the source dataset to the target column in a target dataset from a particular data transformation step in the data pipeline; col. 32 lines 19-56, user using user interface to drill down into any particular node of the visual node graph and select any particular column of data within the data set; accessing provenance/lineage information of the selected column and generating user interface displaying updated visual node graph including representations of selected node, selected column within the node, edges connecting it with other columns, etc.; user interface including features for informing user of data transformations applied to columns at each step of the data pipeline; i.e. where the column/node is the result of applying a data transformation to existing data, such as raw data, this appears to be analogous to an engineered feature (Examiner notes that the specification of the instant application describes a similar process of feature engineering in paragraphs 0002 and 0045, i.e. of extracting features from raw data, such as by transforming the raw data), such that, for the given column/node (feature) which is selected by the user for addition or modification, the system determines (based on the user’s input) the column represented by the relevant node structure in the graph, and further determines the data/feature lineage which corresponds to this column/node along with the corresponding data structure representing the data/feature lineage for the column/node (i.e. the data structure representing the metadata for that column/node indicating other columns/nodes in the lineage along with corresponding transformations, and its potential representation in the graph, such as an edge connecting the column/node to other columns/nodes)).
With respect to claim 19, Thompson in view of U teaches all of the limitations of claim 17 as previously discussed, and Thompson further teaches wherein determining the first structure node in the integrated structure corresponding to the first modification comprises: determining a plurality of nodes in the first structure; and determining whether each of the plurality of nodes is shared with another structure in the integrated structure (e.g. col. 31 lines 20-31, dependency manager operating in conjunction with column metadata manager to retrieve column provenance/lineage from the column metadata to generate a graph representing the column lineage of all relevant columns, where a node in the graph represents a column and an edge between two nodes indicates column lineage between the two columns represented by those two nodes; storing graph as dependency relationships; col. 37 lines 5-23, determining selected column from dataset; resource dependency system determining target columns in other datasets that are dependent upon the selected column and source columns from other data sets from which the selected column depends; i.e. the system determines, for a given node/column, a plurality of all relevant nodes based on lineage/dependency information between the nodes, where the lineage/dependency information (which itself is represented as a data structure as previously cited) indicates that each of the nodes/columns are shared with other structures).
With respect to claim 20, Thompson in view of U teaches all of the limitations of claim 17 as previously discussed, and Thompson further teaches the operations further comprising: determining a third structure node in the integrated structure, wherein the third structure node is shared by the first structure and a third structure, wherein the third structure defines a third feature lineage; and merging the updated first structure and the third structure at the third structure node to generate the updated integrated structure (e.g. Fig. 3B, showing that a given column/node may share a common edge with multiple other columns/nodes, such as column/edge User_id in Dataset-1 which shares outgoing edges with multiple other columns/nodes and column/edge User_id+Vendor_id which shares incoming edges with multiple other columns/nodes in the graph; col. 24 line 1-col. 25 line 30, user interface including visual node graph including nodes representing resources such as each column in a column lineage and edges connecting the nodes representing dependency relationship between the resources represented by the nodes, such as a derivation dependency (transformation, etc.) between columns represented by the nodes; col. 28 line 39-67, after generation of column Z, using column Z as source column for generating target column Z1; current column lineage metadata for column Z1 indicating that column Z was a source column and data transformation was applied in order to generate target column Z1; col. 27 lines 6-13, retrieving existing column lineage metadata; combining or concatenating current column lineage metadata with the existing column lineage metadata to form complete column lineage metadata for particular column; col. 28 lines 28-25, in addition to tracing upstream provenance of target column, going back and determining downstream lineage of column and updating the column lineage metadata for the column to additionally capture that downstream lineage; determining downstream lineage of columns periodically or at a set time; col. 29, lines 7-8, saving downstream column lineage metadata for each target column; i.e. a third node may exist in the graph which shares both the first edge/lineage (i.e. from a same column/node as the new/modified column/node, such as a new column Z1 which shares a lineage with column Z in that they both ultimately inherit from an upstream column Y or another column at a same hierarchical level in the graph which also inherits from a same parent in the lineage as column Z as shown in Fig. 3B) and an additional edge/lineage (i.e. to a different column/node, such as a different downstream column/node), and corresponding combination/concatenation/merging may be performed with respect to these edges/lineages as well).
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Thompson in view of U, further in view of Patel et al. (US 11029948 B1).
With respect to claim 10, Thompson in view of U teaches all of the limitations of claim 2 as previously discussed. Thompson and U do not explicitly disclose validating feature lineages in the updated integrated structure; and selecting the notification from a plurality of notifications based on validating the feature lineages. However, Patel teaches validating feature lineages in the updated integrated structure; and selecting the notification from a plurality of notifications based on validating the feature lineages (e.g. col. 13 lines 31-53, data lineage and provenance engine configured to record data lineage and provenance and validate the data provenance output against requirements; generating graphical representation of data element relationships and dependencies within and across data processing pipelines; col. 14 lines 11-22, data change discovery and alerting engine monitoring changes in data lineage and determining data dependencies at each node of the data lineage; comparing data lineage and provenance captured by DLP engine and detecting changes during subsequent runs; generating alert when change in data lineage or provenance is discovered; col. 16 lines 26-39, continuously monitoring for changes to the data lineage and the data provenance; generating alert based on receiving indication of changes to data lineage and data provenance; transmitting control signals to cause computing device associated with user to display the alert indicating the one or more changes; i.e. validating and continuously monitoring changes to data lineage and provenance and generating corresponding alerts/notifications based on the validation and monitoring, where there are multiple possible alerts/notifications (i.e. one or more alerts corresponding to specific changes to data lineage and specific changes to data provenance) such that a given alert/notification is selected from a plurality of alerts/notifications for presentation to the user).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Thompson and U in front of him to have modified the teachings of Thompson (directed to column lineage for resource dependency system and graphical user interface) and U (directed to feature engineering based on semantic types), to incorporate the teachings of Patel (directed to normalizing data dependency effects in a network environment) to include the capability to validate and continuously monitor data/feature lineage and provenance attributes and provide at least one alert/notification, from multiple possible alerts, based on the validation and monitoring of the data lineage/provenance (i.e. such as an alert regarding a particular change in data provenance, an alert regarding a particular change in data lineage, etc.). One of ordinary skill would have been motivated to perform such a modification in order to address a need for a system for normalizing data dependency effects across an electronic network environment as described in Patel (col. 1 lines 7-19).
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain,” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). Further, a reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including nonpreferred embodiments. Merck & Co. v. Biocraft Laboratories, 874 F.2d 804, 10 USPQ2d 1843 (Fed. Cir.), cert. denied, 493 U.S. 975 (1989). See also Upsher-Smith Labs. v. Pamlab, LLC, 412 F.3d 1319, 1323, 75 USPQ2d 1213, 1215 (Fed. Cir. 2005); Celeritas Technologies Ltd. v. Rockwell International Corp., 150 F.3d 1354, 1361, 47 USPQ2d 1516, 1522-23 (Fed. Cir. 1998).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JEREMY L STANLEY whose telephone number is (469)295-9105. The examiner can normally be reached on Monday-Friday from 9:00 AM to 5:00 PM CST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached at telephone number (571) 270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from Patent Center and the Private Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from Patent Center or Private PAIR. Status information for unpublished applications is available through Patent Center and Private PAIR for authorized users only. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) Form at https://www.uspto.gov/patents/uspto-automated-interview-request-air-form.
/JEREMY L STANLEY/
Primary Examiner, Art Unit 2127